Surely we have entered the era of “big data”. Thanks to, for instance, activities on social media, the volume of data created by humankind is growing faster than ever before.
With “big data” used more and more as a marketing buzzword, what many fail to put into perspective is how big “big” really is. What is even more bizarre is that few keep in mind that “big” or “small” is only meaningful comparatively. A terabyte of data is a huge amount for a commonplace PC but probably wouldn’t move the needle for a cloud service provider.
In this presentation, Fei will share lessons learned from dealing with data that is “big” for a home PC in a recent Kaggle competition. Kaggle is a social site where sponsors post enterprise data science modelling challenges, often with generous prize money, and data science experts, enthusiasts, and beginners from all over the globe compete to solve them. The competition Fei entered asked participants to predict whether or not a Chinese mobile user installs an app after being exposed to an ad for it. Although the training set covers only about three days of data, its billions of records caused plenty of head-scratching moments on a local desktop computer, especially during feature engineering.
These lessons should help anyone dealing with “big” data on a home PC before giving up and surrendering to cloud computing prematurely.
Fei Zhan has a PhD in theoretical physics from Augsburg University, Germany, and worked as a research fellow on strongly correlated systems of cold atoms at the University of Queensland. Since then he has worked as a data scientist, analyst, and engineer on data projects in a variety of industries and businesses.
Our Sponsor this month:
AARNet Pty Ltd (APL) is the not-for-profit company that operates Australia's Academic and Research Network (AARNet). Its shareholders are 38 Australian universities and the CSIRO.
AARNet provides high-capacity Internet and other communications services for the nation's research and education community, including universities, health and other research organisations, schools, vocational training providers and cultural institutions. AARNet serves over one million end users who access the network for teaching, learning and research.
For further information, please visit: www.aarnet.edu.au (http://www.aarnet.edu.au/)
Location and Time:
Our event space is sponsored by Thoughtworks:
We're a software consultancy and community of passionate, purpose-led individuals. We think disruptively to deliver technology to address our clients' toughest challenges, all while seeking to revolutionise the IT industry and create positive social change.
The event is at the Thoughtworks offices, level 19, 127 Creek Street, with official kick-off at 6:00 pm.