What does your next data-driven project have to do with data stewardship?
Well, actually, a lot, if you want to get the most out of your data. Many companies today are filling their data lakes with vast amounts of structured and unstructured data. But they tend to forget an important fact: on average, organizations believe that 32 percent of their data is inaccurate. Addressing this data quality issue before your data lake turns into a data swamp sounds like a must, not an option, right? That is where data stewardship comes into play.
Data stewardship is becoming a critical requirement for deriving successful data-driven insight across the enterprise. Cleaner data gets used more widely, and it reduces the costs associated with “bad data quality,” such as decisions made on the basis of incorrect analytics.
If you think of all the data you need to work with each day, you know that it is often incomplete and sometimes incorrect. You may be able to fix it yourself because you know it well, but that process does not scale when you are dealing with vast amounts of data, or when other groups “bring their own data” that only they know how to validate. Also, let’s not forget that using email or Excel to resolve data quality issues one by one is inefficient, not to mention the risks that come with uncontrolled copies of potentially sensitive data proliferating across file folders throughout the enterprise. You need purpose-built tools, processes and policies to manage data quality effectively and sustainably.
As a critical component of data governance, data stewardship is the process of managing the lifecycle of data from curation to retirement. Data stewardship is about defining and maintaining data models, documenting the data, cleansing the data, and defining the rules and policies. It enables the implementation of well-defined data governance processes covering several activities including monitoring, reconciliation, refining, deduplication, cleansing and aggregation to help deliver quality data to applications and end users.
In addition to improving data integrity, data stewardship helps ensure that data is used consistently throughout the organization, and it reduces data ambiguity through metadata and semantics. Simply put, data stewardship reduces “bad data” in your company, which translates to better decision-making and the elimination of the costs incurred when using incorrect information.
Traditionally, data stewardship tasks are assigned to a team of data experts, the so-called data stewards. The challenge is that a company has few data stewards, and they are generally dedicated to high-risk projects such as regulatory compliance. In the absence of data stewards, nobody knows who is accountable for data quality. That leads to a frustrating situation in which organizations are fully aware that almost one third of their data assets are inaccurate, yet nobody acts on it.
With more data-driven projects, more “bring your own data” projects driven by lines of business, and increased use of data by data workers in roles such as data science, marketing and operations, there is a need to rethink data stewardship. Next-generation data stewardship tools need to evolve to support:
With Talend Winter ’17, we are proud to launch a new capability: the Talend Data Stewardship app, a comprehensive tool for configuring and managing data assets that addresses the quality challenges holding your data-driven projects back.
Talend Data Stewardship is more than a tool for data stewards with specialized data expertise: IT can use it to empower business users to curate their own data through a point-and-click, Excel-like interface. With Talend Data Stewardship, you can manage and quickly resolve any data integrity issue to achieve “trusted” data across the enterprise.