Automate the business intelligence pipeline

Demand for data from today’s business users is growing rapidly, and in two ways. First, business users have exhausted the opportunities in the data they already hold; they want more sources of data to find new value, and they want that data to be accurate enough to deliver reliable analytic outcomes. Second, the number of data-savvy business analysts is larger than ever and growing fast. To satisfy this increasing demand, IT departments must field a continuous stream of data requests, big and small.

While business executives see tremendous potential in putting more data at more users’ fingertips, organizations struggle to deliver that data. Traditional data integration tools, developed in the 1990s to populate data warehouses, are too brittle to scale to many data sources, and they are prohibitively time-consuming and costly.

At the same time, new big data platforms solve only the data collection part of the problem. The data loaded into Hadoop is never enterprise-ready; it requires preparation before it reaches a business user. The reality is that most enterprises are dealing with hundreds or even thousands of sources, and the only path to delivering clean, unified data across all of them is automation.

Let’s take a simple example from the insurance industry. Business analysts want to understand the claim risk from an approaching flood. They pull the customers in the affected geographic region and filter for those with active home insurance. Immediately they need to drill into high property values in sparsely populated areas, which requires individual policies to be classified into more granular categories. Then they want a broad, accurate view so they can correlate customers who hold both home and car insurance (recorded in a different system, of course).

They also want to enrich the data with up-to-date property value information, so they need to bring in an external benchmark of real estate pricing. Finally, this analysis needs to be done in near-real time to be actionable; waiting a quarter, or even a full month, is out of the question. At every step, the analysts need to work from clean, trusted data in order to draw correct conclusions and make data-driven decisions. To summarize, the challenges to be met are classifying records into granular categories, matching entities across separate systems, enriching internal data with external sources, and delivering clean, trusted results in near-real time.
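To make those steps concrete, here is a minimal sketch in pandas of the analysis the analysts are trying to run. The table layouts, column names, thresholds, and the tiny inline DataFrames are hypothetical; they stand in for the two separate policy systems and the external pricing benchmark.

```python
import pandas as pd

# Tiny stand-ins for the home policy system, the separate auto policy
# system, and the external real-estate benchmark. All values are made up.
home = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["affected", "affected", "other"],
    "status": ["active", "active", "active"],
    "zip_code": ["19103", "19460", "19103"],
    "pop_density": [11000, 40, 11000],
})
auto = pd.DataFrame({"customer_id": [1, 2], "vehicle": ["sedan", "truck"]})
prices = pd.DataFrame({"zip_code": ["19103", "19460"],
                       "est_value": [450_000, 900_000]})

# 1. Customers in the affected region with active home coverage.
at_risk = home[(home["region"] == "affected") & (home["status"] == "active")]

# 2. Enrich with up-to-date property values from the external benchmark.
at_risk = at_risk.merge(prices, on="zip_code", how="left")

# 3. Drill into high property values in sparsely populated areas.
exposed = at_risk[(at_risk["est_value"] > 500_000) &
                  (at_risk["pop_density"] < 100)]

# 4. Correlate with customers who also hold car insurance (separate system).
print(exposed.merge(auto, on="customer_id", how="inner"))
```

The last join is the step that quietly depends on mastering: it only works if the same customer is recognized as the same entity in both systems.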

The two most challenging aspects of automating the delivery of data across many different sources are mastering and classification. Competent ETL engineers can handle basic transforms such as look-ups or minor calculations quickly and easily. But advanced tasks, such as identifying global corporate entities or product categories across millions of records, can’t be easily scripted and maintained. We’ve all heard the example of matching “I.B.M.” to “International Business Machines,” but the problem is actually much harder: IBM has hundreds of subsidiaries, brands, and products. Mastering all of those together and packaging the result so a businessperson can use it is no small matching feat.
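To see why this is harder than simple fuzzy matching, consider a quick sketch using plain string similarity. The names are illustrative; the point is that punctuation variants of the same company already score poorly, and subsidiaries or brands share almost no text with the parent, so a string metric alone can never master them.

```python
# A minimal sketch of why string similarity is not enough for mastering.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity between two lowercased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# The textbook case: punctuation variants of the same name score low...
print(similarity("I.B.M.", "International Business Machines"))

# ...and a brand in the same corporate family (e.g. Cognos, an IBM brand)
# shares almost no characters with the parent, so no string metric will
# link them. Mastering needs curated relationships and learned signals.
print(similarity("Cognos Inc.", "International Business Machines"))
```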

At Tamr, we’ve built a solution to automate these complex tasks and ensure rich, accurate data. Tamr uses a machine-driven, human-guided workflow to keep the automation efficient, accurate, and trustworthy. Its machine learning algorithms predict how individual records should be classified and matched (as products, organizations, or individuals). For instance, if an invoice comes in with the description “Latex Gloves,” our algorithm might classify it as “Laboratory Supplies” and match it to the product “Rubber Gloves.” Tamr uses the entire record to make these predictions, from the description to the price to less obvious signals such as who created the invoice.
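As a rough illustration of the idea (not Tamr’s actual implementation), here is a minimal scikit-learn sketch that classifies toy invoice records using both the free-text description and the record’s creator as signals. The field values, categories, and model choice are assumptions for demonstration only.

```python
# Minimal record-classification sketch: multiple fields of a record feed
# a supervised model that predicts a category for each new record.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy invoice records: free-text description plus who created the invoice.
# In practice every available field (price, dates, source system) would
# also feed the model.
records = [
    "latex gloves ordered_by:lab_manager",
    "nitrile gloves ordered_by:lab_manager",
    "printer paper ordered_by:office_admin",
    "ballpoint pens ordered_by:office_admin",
]
labels = ["Laboratory Supplies", "Laboratory Supplies",
          "Office Supplies", "Office Supplies"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(records, labels)

# Predict a category for a new invoice line; shared terms such as "gloves"
# and the creator field pull it toward the laboratory category.
print(model.predict(["rubber gloves ordered_by:lab_manager"]))
```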

To ensure that these predictions are accurate, we provide a workflow for experts and users to give feedback, and Tamr’s algorithms are built to iterate on that feedback. Under the covers, we use supervised learning techniques to tune weights and improve accuracy.
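Here is a minimal sketch of that feedback loop, assuming a scikit-learn-style model such as the one above: the least confident predictions are routed to a human expert, and the corrected labels are folded back into training. The confidence threshold and the ask_expert helper are hypothetical.

```python
# One machine-driven, human-guided training round: predict, route the
# uncertain cases to an expert, retrain on the corrected labels.
import numpy as np

def feedback_iteration(model, X_labeled, y_labeled, X_unlabeled, ask_expert,
                       confidence_threshold=0.8):
    # Score every unlabeled record.
    probs = model.predict_proba(X_unlabeled)
    confidence = probs.max(axis=1)

    # Send only the uncertain cases to a human expert for review.
    uncertain = np.where(confidence < confidence_threshold)[0]
    new_labels = [ask_expert(X_unlabeled[i]) for i in uncertain]

    # Fold the expert feedback back into the training set and retrain,
    # so each round of review improves the next round of predictions.
    X_labeled = X_labeled + [X_unlabeled[i] for i in uncertain]
    y_labeled = y_labeled + new_labels
    model.fit(X_labeled, y_labeled)
    return model, X_labeled, y_labeled
```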


