Smart Data Platform – The Future of Big Data Technology

Data processing and analytical modelling are major bottlenecks in today's Big Data world, because human intelligence is still needed to decide the relationships between data, the required data engineering tasks, and the analytical models and their parameters. This article discusses how a Smart Data Platform can help solve these problems.

The concept of Big Data has been in vogue for about five years now. Judging by Google Trends, big data gained visibility rapidly and consistently from 2011 to 2015, after which its trendiness gradually flat-lined. The truth is that big data has moved beyond the "visionary" stage of development. People are now waiting for big data to be applied across numerous industries and to generate a tremendous amount of value. TalkingData has been cultivating the field of big data in China for five years. Having experienced rapid growth, we lead the big data applications industry for many traditional sectors. However, our growth has placed tremendous demands on our R&D, consulting, and data science resources. In order to ensure optimal service quality, we have had to turn away many potential clients, because the value-realization process is extremely expensive. Aside from basic hardware and software investments, the biggest cost comes from human resources: a great deal of manpower is needed to build and maintain such applications, and every change to an application's goals requires further resources.

What medium- and small-sized businesses and traditional-sector players really need is a relatively cheap, no-frills version of big data: a big data platform that drastically lowers the barrier to entry. The Smart Data Platform is such a platform. It will drastically reduce a business's cost to build, operate, and maintain its data platform. Businesses will be able to make their core operations more efficient at minimal marginal cost; what's more, they will be able to bolster their earnings from small cases and small scenarios without incurring prohibitively high expenses.

The idea of a Smart Data Platform encompasses data management, data engineering, and data science. Right now, big data's biggest bottlenecks are data processing and analytical modelling. TalkingData has been working on a solution to these two problems, and here we want to discuss their future outlook.

Currently, data processing relies almost entirely on individual human minds. Humans are needed to decide how to cleanse, correct, standardize, and aggregate similar data, not to mention how to identify data relationships. Before the arrival of big data, few regarded this as a problem. Since big data became "hot" in 2012, however, 204 papers on data processing have been submitted to conferences such as VLDB and SIGMOD. Even so, we are only beginning to tackle the problem of smart data processing, and there is no mature open-source project or commercial product available. Drawing on our practical experience with and follow-up research on this topic, TalkingData has divided smart data processing into two phases: data relationship identification and data item aggregation.

Data relationship identification involves first identifying all the metadata in a set of tables/files, and then using the relationships between the metadata to identify the relationships between the tables/files themselves. To automate this process, we must first tackle three problems.

The first and simplest of the three is how to directly identify metadata. This can be achieved by establishing rules based on human experience. For example, if we want to identify cell phone number fields, we can establish rules based on how cell phone numbers are usually named. Of course, it is unrealistic to expect pre-established rules to cover every scenario, and this is where active learning comes in: when a case is uncertain, the user can intervene and make a decision, which the computer then uses to establish new rules, as in the sketch below.
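To make the idea concrete, here is a minimal Python sketch of rule-based metadata identification with an active-learning fallback. The name rules, the 11-digit value pattern, and the 0.8 threshold are illustrative assumptions for this example, not TalkingData's actual implementation.

import re

# Illustrative name rules: substrings that usually indicate a phone number column.
PHONE_NAME_RULES = ["phone", "mobile", "msisdn", "cell"]

# Illustrative value rule: an 11-digit pattern such as a Chinese mobile number.
PHONE_VALUE_PATTERN = re.compile(r"^1\d{10}$")

def classify_column(name, values):
    """Return 'phone', 'uncertain', or 'other' for one column of raw values."""
    name_hit = any(rule in name.lower() for rule in PHONE_NAME_RULES)
    sample = [str(v) for v in values if v is not None][:100]
    value_ratio = (
        sum(bool(PHONE_VALUE_PATTERN.match(v)) for v in sample) / len(sample)
        if sample else 0.0
    )
    if name_hit and value_ratio > 0.8:
        return "phone"        # name and values agree: confident label
    if name_hit or value_ratio > 0.8:
        return "uncertain"    # the rules disagree: defer to a human
    return "other"

def label_table(columns):
    """columns: dict mapping column name -> list of values. Asks the user when uncertain."""
    labels = {}
    for name, values in columns.items():
        label = classify_column(name, values)
        if label == "uncertain":
            # Active-learning step: the human decision becomes a new rule.
            if input(f"Is '{name}' a phone number field? [y/n] ").strip().lower() == "y":
                PHONE_NAME_RULES.append(name.lower())
                label = "phone"
            else:
                label = "other"
        labels[name] = label
    return labels

In the same spirit, once columns in two different tables are both labelled as phone number fields, that shared metadata can serve as a candidate link between the tables, which is the relationship-identification step described above.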
