The term “Data Engineering” is often paired with the term “Data Science”.

According to Kelle O’Neal and Charles Roe:
“Data Science allows enterprises the ability to turn their data assets into a narrative.  Data Science allows that narrative to be expanded across timelines, in different data spaces that trace from the past into the future, with much more involved questions and answers about an enterprise, different potential outcomes, and repercussions based on recommendations. Data Science employs a range of mathematical, business, and scientific techniques to solve complex problems about an organisation’s data assets.” [2]

In contrast, the Data Engineer focuses on the process from data curation to dissemination, while the Data Scientist focuses on the analytics, extracting knowledge from the data.

To achieve quality data capture, near-real-time accessibility and meaningful analytics, neither role can function without the other, and effective teamwork optimises the value of each. As such, an analytics team is composed of distinct roles and capabilities [1]:

  • Data Engineers (in areas such as database architecture, database development, machine learning architecture, ETL scripting, etc.)

  • Data Scientists

  • Business Analysts

Data Engineering brings together the broad expertise of these roles to ensure the data are curated and accessible to the Data Scientist, a process that is becoming increasingly complex in today’s environment. Expertise in curating big data and data of varying formats (structured and unstructured) is therefore a critical core competency for optimising the potential impact of these digital assets (i.e. the data).

The Data Scientist works deep in the data, using various tools and techniques to discover patterns that may drive decision making for the business. Making the best use of the available data to reach accurate conclusions can deliver greater value to the organisation. As an example, per Tom Eunice’s post, “a fraud-detection algorithm may be very accurate when based on many months of historical data. However, months of historical data may not always be available. Designing a fraud-detection model that is still accurate using historical data from only a few days would be of more use and more practical to implement.” [1]

The Business Analyst helps the Data Scientist understand the meaning of the data and the relevance of any discovered relationships. The work begins with uncovering relationships in the data; further investigation then identifies meaningful patterns that may reveal information that would otherwise have remained unknown. [1]

The full complement of roles in an analytics team is what drives business value. One discipline without the other (e.g. data engineering without data science) will result in missed opportunities.




REFERENCES
[1] Eunice, Tom. “Do Data Scientists Need Data Management.” IBM Big Data & Analytics Hub, IBM, 2015, www.ibmbigdatahub.com/blog/do-data-scientists-need-data-management.

[2] O'Neal, Kelle, and Charles Roe. “Business Intelligence versus Data Science: A DATAVERSITY 2015 Report.” DATAVERSITY, DATAVERSITY, 2015, http://whitepapers.dataversity.net/content54237/.