But, What Is a Data Scientist?
Posted: 17 January, 2015
I have started explaining how to become a Data Scientist, but … what is a Data Scientist? Is there an official university degree or professional career with this name? Do you need a certification to prove your skills as a Data Scientist? What are the main skills you need to be called a Data Scientist?
Data Scientist is one of the most recently emerged jobs, and maybe the sexiest job of the century, as you can see in detail in this Forbes article.
A really funny description of a Data Scientist that I found in a tweet: “Person who is better at statistics than any software engineer and better at software engineering than any statistician”.
So, do you need to study Statistics? Do you also need to study Software Engineering? Is there a degree in Data Science?
There are a lot of courses, master’s programs and technical trainings you can find on the internet. For example, in Barcelona there is an interesting Data Science Master awarded by the Graduate School of Economics, the Autonomous University of Barcelona and the Pompeu Fabra University. According to these universities, the program is aimed at:
- Graduates in Economics and Business with solid background and keen interest in quantitative methods
- Graduates in Statistics, Mathematics, Engineering, Computer Science, and Physics with the ambition to work with real-world problems and data
- Programming professionals who want to acquire analytical, quantitative tools to leverage their experience
- Aspiring PhD students looking for rigorous training in quantitative and analytical methods
It is difficult to tie this career to one, two or three degrees. But we can look at the skills acquired in this master’s program for further clues:
- Seize the opportunity of data-driven value creation within an organization
- Recognize appropriate statistical methodologies and optimization techniques for complex problems
- Work with database management systems and distributed processing in a cloud computing environment
- Gain experience analyzing Big Data from the Internet of Things (industrial sensor data), the Internet of People (social and location data) and business transaction data
- Communicate data analysis results effectively with data presentation and aesthetic charting skills
- Work in a data-driven, heterogeneous and research-oriented environment
Again, we have here a lot of different skills from different professional backgrounds, so it is quite difficult to have a single degree for a Data Scientist (DS) that does not build on some other technical degree.
In my opinion, the people best positioned to become a DS are those who understand the business side, whatever the industry. If you have a very good understanding of the business and a technical background, the DS path will help you increase your value in the organization. It is part of a data scientist’s mission to understand the business needs and evaluate potential solutions that can deliver a return on investment to the business. So, through customer engagement teams, DS work with customers to identify use cases and proofs of concept, and to identify the challenges, helping them start on the journey.
This is the most amazing part of a DS’s work: helping customers better understand their business strategy so they can improve the ROI from the data they have inside their organizations and, above all, outside them, mainly data from social media. Companies have to understand the market to manage their strategies, which is why they need all the information they can analyze to guide them. Here you can find the best solutions from Data Scientist experts.
If you explain it that way, you can see that DS are the key to improving your position in the market, so you must understand that these people have to be really well prepared and must have a lot of experience in both the business and the technical world.
In this article by Jeffrey Leek you can find the types of analysis a DS should know:
- Descriptive: The discipline of quantitatively describing the main features of a collection of data. In essence, it describes a set of data.
- Exploratory: An approach to analyzing data sets to find previously unknown relationships.
- Inferential: To use a relatively small sample of data to say something about a bigger population.
- Predictive: In essence, to use the data on some objects to predict values for another object.
- Causal: To find out what happens to one variable when you change another.
- Mechanistic: Understand the exact changes in variables that lead to changes in other variables for individual objects.
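To make the first types in the list above a bit more concrete, here is a minimal Python sketch of a descriptive, an inferential and a predictive analysis. It is only an illustration under stated assumptions: the function names (`describe`, `mean_confidence_interval`, `fit_line`) and the spend/sales numbers are hypothetical and do not come from the article, the confidence interval uses a simple normal approximation, and the prediction is a plain least-squares line rather than anything a real project would stop at.

```python
import math
import statistics


def describe(values):
    """Descriptive: summarize the main features of the data we have."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def mean_confidence_interval(sample, z=1.96):
    """Inferential: estimate a range for the population mean from a sample.

    Assumption: a normal approximation (z=1.96 gives roughly 95%).
    """
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * std_error, mean + z * std_error


def fit_line(xs, ys):
    """Predictive: fit y = a + b*x by ordinary least squares."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical data, invented for illustration only.
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [2.1, 3.9, 6.2, 7.8, 10.1]

    print("descriptive:", describe(sales))
    print("inferential:", mean_confidence_interval(sales))

    intercept, slope = fit_line(spend, sales)
    print("predictive, sales at spend=6:", intercept + slope * 6.0)
```

Causal and mechanistic analyses are left out on purpose: they usually need a designed experiment or a model of the underlying process, which does not fit in a short sketch.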
Turning back to the origin of the Data Scientist, in this article by Vincent Granville on Data Science Central you will find that there are several different categories of Data Scientist:
- Those strong in statistics: they sometimes develop new statistical theories for big data, that even traditional statisticians are not aware of. They are expert in statistical modeling, experimental design, sampling, clustering, data reduction, confidence intervals, testing, modeling, predictive modeling and other related techniques.
- Those strong in mathematics: NSA (National Security Agency) or defense/military people working on big data, astronomers, and operations research people doing analytic business optimization (inventory management and forecasting, pricing optimization, supply chain, quality control, yield optimization) as they collect, analyse and extract value out of data.
- Those strong in data engineering, Hadoop, database/memory/file systems optimization and architecture, APIs, Analytics as a Service, optimization of data flows, data plumbing.
- Those strong in machine learning / computer science (algorithms, computational complexity)
- Those strong in business, ROI optimization, decision sciences, involved in some of the tasks traditionally performed by business analysts in bigger companies (dashboards design, metric mix selection and metric definitions, ROI optimization, high-level database design)
- Those strong in production code development, software engineering (they know a few programming languages)
- Those strong in visualization
- Those strong in GIS, spatial data, data modeled by graphs, graph databases
- Those strong in a few of the above.
In this article you will also find a really interesting mind map of what a “health” Data Scientist is. Please look at it to better understand what a DS is; it is really illustrative.
But you might think that, to solve a problem, a single DS can help us regardless of the technical solution she or he must apply. Does this DS really know everything from a technical perspective?
Well, again on Data Science Central, you can read this article to better understand that you can be a Vertical DS or a Horizontal DS:
- Vertical data scientists have very deep knowledge in some narrow field. They might be computer scientists very familiar with computational complexity of all sorting algorithms. Or a statistician who knows everything about eigenvalues, singular value decomposition and its numerical stability, and asymptotic convergence of maximum pseudo-likelihood estimators. Or a software engineer with years of experience writing Python code (including graphic libraries) applied to API development and web crawling technology. Or a database guy with strong data modeling, data warehousing, graph databases, Hadoop and NoSQL expertise. Or a predictive modeler expert in Bayesian networks, SAS and SVM.
- Horizontal data scientists are a blend of business analysts, statisticians, computer scientists and domain experts. They combine vision with technical knowledge. They might not be expert in eigenvalues, generalized linear models and other semi-obsolete statistical techniques, but they know about more modern, data-driven techniques applicable to unstructured, streaming, and big data. They can design robust, efficient, simple, replicable and scalable code and algorithms.
In my opinion, Vertical Data Scientists are people who are starting out on this path: as you can read, a statistician or a software engineer with some DS expertise, or even a database person with strong BI and Big Data knowledge. A Horizontal Data Scientist is more of an expert on the business side, with the technical knowledge necessary to understand the best solution to apply to each need.
My best recommendation is to build a Data Science team with varied expertise, so that you have the best solution for every business case.
Finally, in this image you will find the path you could follow to gain the knowledge you need to be a Vertical or Horizontal Data Scientist. Enjoy it!