Data science and data analysis are among the most confusing terms. Most people have a hard time differentiating them from one another.
In the past few years, I interviewed with several companies for data science and data analytics positions. Going into those interviews, I was excited about the chance to practice data science at big, respected companies. But my enthusiasm started to fade when I heard questions such as the following:
- Do you know how to deal with databases?
- What do you know about data warehousing?
- Do you have prior experience gathering data and ensuring its quality?
- What do you know about relational databases and UML?
Do not get me wrong. Knowing the answer to each of the questions above is important. But questions of that type indicate, at least to me, that there is some confusion, even among professionals, between “data science” and “data engineering”.
What is Data Science?
Data science has several definitions. The simplest is “the science that aims to discover knowledge through data analysis”. According to Wikipedia, “Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured”.
Data scientists use machine learning algorithms, mathematical models and statistical models to study a given phenomenon and, where possible, come up with recommendations. Data science can use both descriptive and predictive models to extract insights and identify business opportunities and threats. It can also help find solutions to pollution issues, energy issues and resource optimization challenges.
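To make the descriptive versus predictive distinction concrete, here is a minimal sketch in Python. The monthly figures are invented for illustration, and the helper names `describe` and `fit_line` are my own, not part of any library. The descriptive step summarizes what the data looks like, while the predictive step fits a straight line by ordinary least squares and uses it to estimate a value that has not been observed yet.

```python
from statistics import mean

# Hypothetical monthly data: advertising spend (in thousands) and units sold.
ad_spend = [10, 12, 15, 18, 20, 24]
units_sold = [110, 128, 155, 190, 205, 250]


def describe(values):
    """Descriptive model: summarize what the data looks like."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Predictive model: least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(units_sold)
slope, intercept = fit_line(ad_spend, units_sold)

# Estimate sales for a spend level that is not in the data.
forecast = slope * 30 + intercept

print(summary)
print(f"forecast at spend=30: {forecast:.1f}")
```

A real project would likely use richer models and far more data, but the split is the same: one part describes the past, the other estimates the future.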
What is Data Engineering?
We can define engineering as the field that uses science and technology to design and build all sorts of systems that solve specific problems. From this general definition, we can say that data engineering is the field that focuses on building systems dedicated to processing, analyzing and aggregating data.
A data engineer uses SQL, UML, and object-oriented programming languages such as Python, C++ and Java.
Using all of those skills, a data engineer prepares the infrastructure needed for connecting, aggregating and processing data. Building data processing applications, relational database management systems (RDBMSs) and big data frameworks that use MapReduce algorithms are all examples of data engineering projects.
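As a rough illustration of that kind of work, the sketch below uses Python's built-in `sqlite3` module to define two related tables, load a few records, and aggregate them with a join. The table names, columns and rows are assumptions made up for this example. Production data engineering involves much larger systems, but the idea of storing data and logically connecting it is the same.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: orders reference customers through a foreign key.
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")

# Hypothetical source records.
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 35.5), (2, 12.0)],
)
conn.commit()

# Connect and aggregate: order count and total amount per customer.
totals = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(totals)
conn.close()
```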
What is Data Analysis & Who is a Data Analyst?
Data analysis has many definitions. When I read some of them, I begin to wonder: am I the one to blame for not understanding them, or are they just too complicated?
The most essential definition of data analysis you should grasp is the following: data analysis is the process of transforming data into information. It answers the following questions:
- What happened?
- Who made it happen?
- When did it happen?
- How did it happen?
- What is happening?
- Who is making it happen?
- How is it happening?
From this definition, we can say that the SEO analysis we do for websites is one type of data analysis. A data analyst works to answer different questions related to the field of study and presents the answers in the form of reports and dashboards. To build them, the data analysis team usually consumes the structured and semi-structured data available to the organization.
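A small sketch of what "transforming data into information" can look like in practice follows. The visit log is invented for illustration, loosely in the spirit of website analysis. The code counts visits per day and per traffic source, which answers simple versions of "what happened", "when did it happen" and "who made it happen".

```python
from collections import Counter
from datetime import date

# Hypothetical page-visit log: (day, traffic source, page).
visits = [
    (date(2024, 1, 1), "search", "/home"),
    (date(2024, 1, 1), "social", "/pricing"),
    (date(2024, 1, 2), "search", "/home"),
    (date(2024, 1, 2), "search", "/blog"),
    (date(2024, 1, 2), "email", "/pricing"),
    (date(2024, 1, 3), "search", "/blog"),
]

# What happened, and when? Count visits per day.
visits_per_day = Counter(day for day, _, _ in visits)

# Who made it happen? Count visits per traffic source.
visits_per_source = Counter(source for _, source, _ in visits)

busiest_day, busiest_count = visits_per_day.most_common(1)[0]
top_source, top_count = visits_per_source.most_common(1)[0]

# A tiny text "report" built from the raw records.
print(f"Total visits: {len(visits)}")
print(f"Busiest day: {busiest_day} ({busiest_count} visits)")
print(f"Top source: {top_source} ({top_count} visits)")
```

In a real team the same counts would typically come from SQL queries and end up in a BI dashboard rather than in printed lines.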
What Is The Difference Between A Data Analyst And A Data Scientist?
Differentiating between a data analyst and a data scientist is a bit tricky, because both seek to achieve business goals through data. They just use different methods and approaches to achieve those goals.
- As I said before, a data analyst uses the structured and semi-structured data that is already stored in the company’s databases.
- Needs SQL, OOP and data visualization skills to perform the job.
- Uses RDBMSs and BI tools such as Microsoft SQL Server, Teradata, Tableau and Power BI.
- Produces output in the form of reports or dashboards.
- A data scientist uses every available resource for analysis: internal and external data, whether structured, semi-structured or unstructured.
- Uses R, SAS, SPSS and other analytics tools in addition to the data analyst's tools.
- Produces output in the form of descriptive, predictive, diagnostic or prescriptive models.
Data science overlaps with data engineering and data analysis in some aspects and complements them in others. To understand that, just imagine the following scenario:
The data engineer prepares the data, stores it and logically connects its pieces with each other. The data analyst pulls the data, explores it and builds useful, informative reports out of it. The data scientist finds pain points or points of interest in the prepared reports and then focuses on providing deep, advanced analytics about those points.