Data Workshop with Vladimir
Saturday, July 9th, 11:00 – 16:00 (4 sessions with pizza break).
GE Healthcare ul. Życzkowskiego 20
Let me invite you to Data Workshop #3.
You can check how the previous workshops went: Data Workshop #2 - http://www.meetup.com/datakrk/events/231590232/ and Data Workshop #1 - http://www.meetup.com/datakrk/events/230392309/.
Let’s focus on a very important topic: Evaluating Machine Learning Models. Without understanding this key area, the rest of ML is useless… really.
“We don’t have better algorithms. We just have more data.” (c) Google's Research Director Peter Norvig. The question is: is it an axiom? And how can you carefully check that? In general, all of machine learning (data science) is about a lot of hypotheses, but which one is better? Of course you can guess, but in reality it depends on the challenge, and this is why you should carefully verify them one by one. And once you understand that, the question is how to verify them properly.
Let me give an example.
You’re a pilot and you have a lot of instruments. One of them, the altimeter [https://en.wikipedia.org/wiki/Altimeter], measures altitude (roughly speaking, how close the ground is). But you have a few other ways to measure it, e.g. just look out of the window and estimate by sight how far away the ground is :). Which option do you prefer (especially if you’re on board)?
What you will learn
You will learn key concepts and pitfalls. What types of metrics exist for regression, classification and ranking? How do you validate results to avoid overfitting? What is the difference between hold-out validation, cross-validation and bootstrap, and which is better? How do they work inside, on an intuitive level? Which is better: to underestimate or to overestimate? Why is one metric good for one case and bad for another? And finally, what does “good” or “bad” mean, and for whom?
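If you want a first feel for how hold-out validation, cross-validation and bootstrap differ, here is a minimal sketch in plain Python. It only shows how each scheme splits row indices into train and test parts; the function names (holdout_split, kfold_splits, bootstrap_split) are made up for this illustration and are not taken from scikit-learn or from the workshop materials.

```python
import random


def holdout_split(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 once and cut them into (train, test)."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


def kfold_splits(n, k=5, seed=0):
    """Yield k (train, test) pairs; every index lands in a test set exactly once."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def bootstrap_split(n, seed=0):
    """Draw n indices with replacement; indices never drawn form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


if __name__ == "__main__":
    train, test = holdout_split(20)
    print("hold-out:", len(train), "train /", len(test), "test")
    for fold_no, (train, test) in enumerate(kfold_splits(20, k=5)):
        print("fold", fold_no, "test indices:", sorted(test))
    train, test = bootstrap_split(20)
    print("bootstrap: unique train rows =", len(set(train)), ", test rows =", len(test))
```

Hold-out scores the model on a single split, k-fold averages over k splits that together cover every row, and bootstrap trains on a resampled copy of the data and tests on the rows that were left out.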
GE Healthcare will provide pizza during the lunch break
About the speaker
I like traveling, also in the IT world. I have worked in different areas of IT (with different technologies). A lot of things have happened in that time… I don’t remember all of them, but for the last 2-3 years I have spent my time working with data. I was involved in building infrastructure for Big Data, I prepared ETL (Hadoop stuff), analyzed data (sales forecasting) and so on. In my free time I learn from MOOCs (Coursera, Udacity, edX and so on) and books, and I take part in Kaggle competitions. I love solving problems related to data.
Prerequisites
• Basic knowledge of Python
• Install Anaconda ( http://continuum.io/downloads ) or manually install these packages: ipython, scikit-learn, pandas, ggplot
• Install ml_metrics (pip install ml_metrics) - https://github.com/benhamner/Metrics/tree/master/Python/ml_metrics
• Use this script to verify your environment - https://github.com/dataworkshop/prerequisite
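If you just want a quick sanity check before running the script above, a rough sketch like the one below can tell you which packages Python cannot find. This is not the script from the dataworkshop/prerequisite repository; the helper name missing_packages and the mapping from install names to import names (for example scikit-learn imports as sklearn) are assumptions made for this illustration.

```python
import importlib.util

# Assumed mapping: name used when installing -> name used when importing.
REQUIRED = {
    "ipython": "IPython",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "ggplot": "ggplot",
    "ml_metrics": "ml_metrics",
}


def missing_packages(required=REQUIRED):
    """Return the install names whose import module cannot be located."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

It only checks that each package can be located, not that its version matches what the workshop needs.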
Please bring your laptop with you
Please come an hour early if you need help with setting up the environment!
HOW TO GET TO THE MEETING?
By public transportation
You can reach our office by tram 4, 5, 9, 10, 52 or 72. The nearest stop is 'AWF'. From there, walk along the Politechnika Krakowska buildings (Życzkowskiego street). The Avia building is located at the end of this small street. Remember that total travel time from the city center may be around 30 minutes.
On Saturday there will be plenty of parking spaces next to Życzkowskiego street.