Hello world! And a special hi to Kaggle aficionados and data science enthusiasts out there! This next meetup is going to be really interesting! We will use a Kaggle challenge to focus on model evaluation and interpretation. We’re also happy to say that the meetup will be longer than usual, meaning we will have time to really study those models thoroughly 🧐
WHAT WE WILL DO
There are two types of reactions when a model produces 99.99% accuracy on your training set: (1) “this is the best model in the world and I’m a data science genius!” and (2) “nice! But could this thing be overfitting the training data?”. Overfitting is a common problem in data science, and it is just one aspect of model evaluation. In this meetup, we will deliberately skip the initial data cleaning and feature engineering part of the process and zoom in on model evaluation and interpretation.
Using data from a Kaggle challenge, we propose to first split the training set into training, validation and test subsets, in order to tune and evaluate different types of models. How do these models perform versus a simpler baseline model? Which of the models is doing a better job at predicting the results?
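For those who want a feel for this step beforehand, here is a minimal sketch of a three-way split and a majority-class baseline, written with the Python standard library only. The toy data, the 60/20/20 proportions and the helper names (three_way_split, majority_baseline, accuracy) are assumptions made for illustration and are not the material we will use in the meetup. In practice, scikit-learn's train_test_split and DummyClassifier cover the same ground.

```python
import random
from collections import Counter


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows and cut them into train, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def majority_baseline(train_labels):
    """Return a predictor that always answers with the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common


def accuracy(predict, rows):
    """Fraction of (features, label) rows that the predictor gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Toy data, made up for illustration: the label is 1 when the single feature exceeds 7.
data = [((i,), int(i > 7)) for i in range(10)] * 10  # 100 rows, 20% positives

train, val, test = three_way_split(data)

# The baseline only looks at the training labels.
baseline = majority_baseline([y for _, y in train])

# A hypothetical "real" model that happens to know the rule behind the toy data.
threshold_model = lambda x: int(x[0] > 7)

# Compare candidates on the validation set; keep the test set for the final check.
baseline_val_acc = accuracy(baseline, val)
model_val_acc = accuracy(threshold_model, val)
print(f"baseline: {baseline_val_acc:.2f}  model: {model_val_acc:.2f}")
```

The point of the baseline is to give the accuracy numbers some context: on imbalanced data, a model that always predicts the majority class can look impressive, so any candidate model should clearly beat it on the validation set.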
If we have time, we would like to have a look at feature importance in the models we evaluated in the first part. Knowing which features are most relevant allows us to use simpler models and to understand how these models work and how predictions are made.
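One common way to estimate feature importance is permutation importance: shuffle one feature column at a time and measure how much the score drops. The sketch below is only an illustration of that idea using the Python standard library; the toy data, the stand-in model and the function names are assumptions, and the meetup may well use a different technique. scikit-learn offers a ready-made version as sklearn.inspection.permutation_importance.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data, made up for illustration: feature 0 decides the label, feature 1 is noise.
noise = random.Random(1)
X = [(i % 2, noise.random()) for i in range(200)]
y = [row[0] for row in X]

# Hypothetical stand-in for a fitted model: it only looks at feature 0.
model = lambda row: row[0]

importances = permutation_importance(model, X, y)
print(importances)
```

A feature the model ignores gets an importance of zero, because shuffling it cannot change any prediction. Features with importance close to zero are candidates for removal when looking for a simpler model.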
To bring you up to speed with these topics, we recommend having a look at the video of Lesson 3 of Introduction to Machine Learning for Coders (http://course.fast.ai/lessonsml1/lesson3.html)
We will be hosted by NOVA IMS, the information management school of Universidade Nova de Lisboa (http://www.novaims.unl.pt/)
WHAT YOU NEED FOR THE MEETUP
You will definitely need
(1) to bring your own laptop
(2) to sign up to Kaggle: https://www.kaggle.com/account/login
WE ALSO RECOMMEND
(1) joining the group's Slack channel: https://goo.gl/R6dpng
(2) installing Anaconda https://conda.io/docs/user-guide/install/index.html
Note that, although we recommend that beginners install Anaconda and work with Python, you are free to use whichever tool you prefer.
(3) reading some background material on model evaluation and interpretation (for example, the video of Lesson 3 of Introduction to Machine Learning for Coders http://course.fast.ai/lessonsml1/lesson3.html)