The professional world for data experts has been evolving for quite some time. However, in more recent years there has been a fundamental shift both in the technical landscape and the culture surrounding data professionals. These recent evolutions in data management have led to the creation of the field called data engineering. Before I delve into what data engineering looks like today, it helps to discuss how we got here.
Prior to the arrival of "big data" systems, data professionals typically carried job titles such as "Database Developer", "Database Administrator", "Data Architect", and "BI Developer". Keep in mind these jobs still exist today. I have personally carried all of these titles at one point in my career before building and selling my own systems integrator. Each of these job titles typically has some degree of overlap depending on the employing organization's IT culture. Below I briefly define each of these yesteryear roles:
You may notice the absence of the data scientist. That's because that role became prominent only in the past 10 years. It was the business analyst who did the bulk of the number crunching, mostly from a historical perspective (a.k.a. business intelligence). All of these roles generally worked on symmetric multiprocessing (SMP) database systems. Examples include PostgreSQL, MySQL, SQL Server, Oracle, and DB2. However, as massively parallel processing (MPP) databases became more prevalent in the 2000s, the above roles began working with ever-increasing data volumes. An example of an MPP database is Greenplum Database, an open source MPP data warehouse with a PostgreSQL heritage. Enterprises continued to invest more in their data processing systems, aiming to uncover insights in these huge volumes of data that would yield a competitive advantage over rivals.
While the roles I described earlier are still important to this day, they were not sufficient in a big data world. Data engineering was born with the mainstream adoption of "big data" architectures and systems, most notably Apache Hadoop. In addition, many other non-database frameworks emerged across the data processing and persistence landscape. Unlike the data professionals of the past, data engineers must have the ability to develop across a large variety of languages and processing frameworks.
The first formal data engineers were primarily focused on Apache Hadoop. They were paid to conduct data wrangling experiments leveraging experimental software developed by some of the largest data crunching organizations such as Facebook, Google, and Yahoo. By 2010, mainstream enterprises were adopting Hadoop. The field of data engineering went from a small niche to becoming a major trend in IT shops across the country. This was the first generation of formal data engineers.
Another important theme that drove the new field was the clear distinction between those who worked with data processing systems and those who derived advanced analytics from them. No longer was it enough to collate, cleanse, and display what had happened in the past. Companies needed the ability both to forecast what would happen (predictive insights) and to decide what action to take on such future events (prescriptive insights). While these advanced analytics were possible in the past, few had the necessary skills; hence the birth of the data scientist. The data scientist provides the perfect complement to the data engineer.
Today, the second generation of data engineers is firmly established. Modern day data engineers have to be even more adaptable as the pace of technology continues to increase. Specialized data processing systems rule the industry landscape and the ability to quickly adapt is a critical skill.