As data-driven business models, digital transformations, big data analytics and the like continue to rise, they are challenging the conventional data governance process.
They also provide an opportunity to place data governance at the center of important business changes, according to participants in last week's Enterprise Data Governance Online 2017 webinar.
Among the most challenging new developments is the data lake, which, in its most basic form, eschews upfront curation and categorization of data. Curation, which includes cleansing data and assuring its consistency, is among the hallmarks of the data governance process.
Effective data governance can be applied to a Hadoop data lake, according to Shannon Fuller, director of data governance at Carolinas HealthCare System, based in Charlotte, N.C. The data-lake path was chosen for an innovative big data project, he said, because it could encourage more rapid application development and create a common repository, while protecting patients' information and intellectual property.
"We decided this would not be another data warehouse," Fuller said. "It would be stand-alone assets available to the whole organization."
One road to reports, another to sandbox

Fuller said his organization is using a twofold path that carefully prepares curated data sets for both business users and data scientists. Driving the project is Carolinas HealthCare's push to look at a patient's overall treatment plan, taking disparate data into account and making decisions on compensation models.

Fuller described his operation as an IBM InfoSphere shop, but said the pilot data lake was built using Microsoft's HDInsight and Azure Data Lake Store. Tresata software was used to catalog some of the source data, he said. Once treated, data is pushed back into the Azure Data Lake Store to be further analyzed, or to feed reports and executive dashboards.