How to become a Data Scientist
Posted: 2 January, 2015
When I was studying Statistics, I thought it was the degree with the most opportunities for the future. But when I started working, I saw that I might have been wrong. In fact, my first job was as a junior Java developer, which had nothing to do with my studies. Some 15 years later, this may have changed.
Advances in technology have increased computing power exponentially. Put simply, in the area of data management, computers now let us do two main things:
- The first is to store all the data. The challenge is clear when we consider that every 60 seconds Google receives over 4,000,000 search queries, YouTube users upload 71 hours of new video, Pinterest users pin 3,472 photos, Facebook users share 2,460,000 pieces of content and Twitter users share 277,000 tweets (Infographic How much Data is Generated every minute).
- The second is to exploit this data by applying all kinds of new and advanced data management techniques, such as algorithms for predicting patterns, or parallel processing over terabytes of data to extract and process the valuable information. For example, Genetic Algorithms (GA) imitate natural evolution to search for the best solutions; you can find a simple GA exercise using R on the R-bloggers site.
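To make the Genetic Algorithm idea a bit more concrete, here is a minimal sketch in Python. It is not the R exercise from R-bloggers; the toy problem (the classic "OneMax" task of evolving a bit string full of 1s), the function names and all the parameter values are my own assumptions, chosen only for illustration.

```python
import random

def fitness(bits):
    """OneMax: count the 1s; the optimum is an all-ones string."""
    return sum(bits)

def tournament(population, rng, k=3):
    """Selection: pick the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(n_bits=20, pop_size=30, generations=50, seed=0):
    """Evolve a population and return (best individual, best fitness per generation)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        # Elitism: the current best survives unchanged into the next generation.
        next_gen = [best]
        while len(next_gen) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            next_gen.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

if __name__ == "__main__":
    best, history = run_ga()
    print("best fitness per generation:", history)
    print("best individual:", best)
```

The three ingredients borrowed from nature are selection (fitter individuals are more likely to become parents), crossover (children mix the genes of two parents) and mutation (small random changes). Because the best individual is always carried over, the best fitness can never decrease from one generation to the next.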
Advances in technology also let us connect all kinds of devices fitted with sensors, and these sensors transmit all kinds of data over the internet so that systems can act on the processed data. This is the Internet of Things, or IoT. In Europe there are several initiatives that promote the IoT world with lots of resources, guides, grants, … (Internet of the Things Europe Initiatives). These connections will produce huge quantities of data, so we will need petabytes of storage, the best computing performance and the most advanced applications to process it. Here again are the two constants: storage and data management techniques.
Thanks to this evolution, we are changing the public sector by applying this technology to smart cities, eHealth, agriculture, etc. In the case of smart cities, Barcelona ranks first in Spain and fourth in Europe, with projects like Intelligent Traffic Lights or Apps4bcn.
Thanks to my background as a statistician and this new era of data management, several months ago I took up a new hobby: “Data Scientist”. My curiosity started on Coursera with the Machine Learning course taught by Andrew Ng, a data scientist and one of the co-founders and chairman of Coursera. This course was a little too intensive for me and I couldn’t dedicate the time needed to learn this fantastic material; I hope to come back to it in the medium term.
I also delved into the world of Hadoop by taking all kinds of courses on the Big Data University site, where you will find courses covering the Hadoop environment, such as HBase, Hive, Pig or Jaql.
New courses related to Statistics and Data Science then appeared on Coursera, such as Statistics One, Web Intelligence and Big Data, or Computing for Data Analysis, the latter taught by Roger D. Peng from Johns Hopkins University. Around that time I found an interesting learning path for data scientists, also from Johns Hopkins University: “The Data Science Specialization”. I have taken several of its courses, just listening to the lectures and doing the exercises I found interesting or needed at the time.
One of my best investments was buying the book “Developing Analytic Talent: Becoming a Data Scientist” by Dr. Vincent Granville, a visionary data scientist with 15 years of experience in big data, predictive modeling, and digital and business analytics. In it you will find everything about the data science world: what a data scientist is, the path to becoming a horizontal DS, and lots of extraordinary resources to increase your knowledge of data science.
The latest thing I’ve found on the web is Kaggle, a site with all kinds of competitions and tutorials to increase your knowledge of data science. You can even compete to win and earn money, but that is not my objective, least of all right now: I have a very long road ahead to become a competitive DS.
With all this information, and other less interesting material, I decided to plan better how to enjoy my hobby. I have focused my goals on three challenges:
- Data Science Apprenticeship: going deeper into the DS world with the book and the path laid out by Dr. Granville. There you can also take part in real projects, putting all the knowledge you have acquired into practice.
- In parallel, Coursera Data Science Specialization.
- And in the medium term, I will continue with Kaggle competitions.
In this blog I’ll share how I enjoy my hobby while following the plan explained above. I’ll keep you informed.