Special: Learning from imbalanced data

May 3, 2018 · London, United Kingdom

Note: Please remember to sign up with Skills Matter: https://skillsmatter.com/meetups/10908-london-data-science-journal-club-may

Imbalanced data is a frequently occurring problem. For instance, in fraud detection most transactions are not fraudulent. When detecting genes in DNA sequences, most DNA does not code for a gene. Similar problems occur in the insurance business and in web traffic analytics. Machine learning from imbalanced classes is challenging because standard metrics such as accuracy can give counterintuitive results, and some of our favourite ML tools won't work as expected.
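To make the point about counterintuitive metrics concrete, here is a minimal sketch in plain Python. The data is made up for illustration (1000 transactions, 10 of them fraudulent), and the helper names are our own, not from any library. A "classifier" that never flags fraud scores 99% accuracy while catching no fraud at all.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    """Fraction of true positives (e.g. frauds) that were detected."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred):
    """Fraction of true negatives that were correctly left alone."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of recall and specificity; 0.5 means no better than guessing."""
    return (recall(y_true, y_pred) + specificity(y_true, y_pred)) / 2


# Hypothetical data: 10 fraudulent transactions (1) among 1000.
y_true = [1] * 10 + [0] * 990
# Trivial baseline that always predicts "not fraud".
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)

print(f"accuracy={acc:.2f} recall={rec:.2f} balanced_accuracy={bal:.2f}")
```

Accuracy looks excellent here only because the majority class dominates the count. Recall and balanced accuracy show that the baseline has learned nothing about the rare class, which is one reason to look beyond accuracy when classes are imbalanced.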

In this meeting of the London Data Science Journal Club, we will try to understand the different techniques for dealing with imbalanced data. Instead of focusing on a single paper, we will split up into groups, and each group will pick a particular algorithm, metric or piece of software related to imbalanced data.

Towards the end of the meetup, we will get together again to discuss our results.

Below are some literature references to get started, but please do come along and tell us about your own experiences. Please post any useful links you find in the comments!

N. Chawla: Data Mining for Imbalanced Datasets
https://www3.nd.edu/~dial/publications/chawla2005data.pdf

T. Hoens & N. Chawla: Imbalanced Datasets: From Sampling To Classifiers
https://www3.nd.edu/~dial/publications/hoens2013imbalanced.pdf

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
http://www.jmlr.org/papers/volume18/16-365/16-365.pdf

King & Zeng: Logistic Regression in Rare Events Data
https://gking.harvard.edu/files/0s.pdf
Also interesting:
https://statisticalhorizons.com/logistic-regression-for-rare-events

A note about the Journal Club format:
1. There is no speaker at Journal Club.
2. There is NO speaker at Journal Club.
3. We split into small groups of 6 people and discuss the papers. For the first hour the groups are random to make sure everyone is on the same page. Afterwards we split into blog/paper/code groups to go deeper.
4. Volunteers sometimes seed the discussion by guiding through the paper highlights for 5 mins. You are very welcome to volunteer in the comments.
5. Reading the materials in advance is really helpful. If you don't have time, please come anyway. We need this group to learn together.
