Note: Please remember to sign up with Skills Matter: https://skillsmatter.com/meetups/10908-london-data-science-journal-club-may
Imbalanced data is a frequently occurring problem. For instance, in fraud detection, most transactions are not fraudulent. When detecting genes in DNA sequences, most DNA does not code for a gene. Similar problems occur in the insurance business and in web traffic analytics. Machine learning from imbalanced classes is challenging because standard metrics can give counterintuitive results, and some of our favourite ML tools won't work as expected.
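To make the point about counterintuitive metrics concrete, here is a minimal sketch in plain Python. The numbers are hypothetical (1000 transactions, 10 of them fraudulent) and the "classifier" is a deliberately useless one that always predicts the majority class. It is only an illustration, not taken from any of the papers below.

```python
# Hypothetical data: 1000 transactions, of which 10 are fraudulent (label 1).
y_true = [1] * 10 + [0] * 990

# A "classifier" that always predicts the majority class (not fraud).
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the classifier found."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall on fraud = {rec:.2f}")
# prints: accuracy = 0.99, recall on fraud = 0.00
```

The model scores 99% accuracy while catching no fraud at all, which is why metrics such as recall and precision tend to be more informative than accuracy on imbalanced problems.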
In this meeting of the London Data Science Journal Club, we will try to understand the different techniques to deal with imbalanced data. Instead of focusing on a single paper, we will split up into groups and pick a particular algorithm, metric or software related to imbalanced data.
Towards the end of the meetup, we will get together again to discuss our results.
Below are some literature references to get started, but please do come along and tell us about your own experiences. Please post any useful links you find in the comments!
N. Chawla: Data Mining for Imbalanced Datasets
T. Hoens & N. Chawla: Imbalanced Datasets: From Sampling to Classifiers
Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
King & Zeng: Logistic Regression in Rare Events Data
A note about the Journal Club format:
1. There is no speaker at Journal Club.
2. There is NO speaker at Journal Club.
3. We split into small groups of 6 people and discuss the papers. For the first hour the groups are random to make sure everyone is on the same page. Afterwards we split into blog/paper/code groups to go deeper.
4. Volunteers sometimes seed the discussion by walking through the paper highlights for 5 minutes. You are very welcome to volunteer in the comments.
5. Reading the materials in advance is really helpful. If you don't have time, please come anyway. We need this group to learn together.