The Emergence and Future of the Data Engineer

The professional world for data experts has been evolving for quite some time. However, in more recent years there has been a fundamental shift both in the technical landscape and the culture surrounding data professionals. These recent evolutions in data management have led to the creation of the field called data engineering. Before I delve into what data engineering looks like today, it helps to discuss how we got here.

Prior to the arrival of "big data" systems, data professionals typically carried job titles such as "Database Developer", "Database Administrator", "Data Architect", and "BI Developer". Keep in mind these jobs still exist today. I have personally carried all of these titles at one point in my career before building and selling my own systems integrator. Each of these job titles typically has some degree of overlap depending on the employing organization's IT culture. Below I briefly define each of these yesteryear roles:

You may notice the absence of the data scientist. That's because that role became prominent only in the past 10 years. It was the business analyst who did the bulk of the number crunching, mostly from a historical perspective (a.k.a. business intelligence). All of these roles generally worked on symmetric multiprocessing (SMP) database systems. Examples include PostgreSQL, MySQL, SQL Server, Oracle, and DB2. However, as massively parallel processing (MPP) databases became more prevalent in the 2000s, the above roles began working with ever-increasing data volumes. An example of an MPP database is Greenplum Database, an open source MPP data warehouse with a PostgreSQL heritage. Enterprises continued to invest more in their data processing systems with the aim of uncovering insights in these huge volumes of data that would give them a competitive advantage over rivals.

While the roles I described earlier are still important to this day, they were not sufficient in a big data world. Data engineering was born with the mainstream adoption of "big data" architectures and systems, most notably Apache Hadoop. In addition, many other non-database frameworks emerged across the data processing and persistence landscape. Unlike the data professionals of the past, data engineers must be able to develop across a large variety of languages and processing frameworks.

The first formal data engineers were primarily focused on Apache Hadoop. They were paid to conduct data wrangling experiments leveraging experimental software developed by some of the largest data crunching organizations such as Facebook, Google, and Yahoo. By 2010, mainstream enterprises were adopting Hadoop. The field of data engineering went from a small niche to becoming a major trend in IT shops across the country. This was the first generation of formal data engineers.

Another important theme that drove the new field was the clear distinction between those who worked with data processing systems and those who derived advanced analytics from them. No longer was it enough to collate, cleanse, and display what had happened in the past. Companies needed the ability both to forecast what would happen (predictive insights) and to determine what action to take on such future events (prescriptive insights). While these advanced analytics were possible in the past, few had the necessary skills, and thus the data scientist role was born. The data scientist provides the perfect complement to the data engineer.

Today, the second generation of data engineers is firmly established. Modern-day data engineers have to be even more adaptable as the pace of technological change continues to increase. Specialized data processing systems rule the industry landscape, and the ability to adapt quickly is a critical skill.


