Data preparation is usually the most time-consuming part of a data analysis project. To get good results, follow the six steps here, starting with Understand the Business Needs, Get to Know the Data, and Wrangle, Munge, and Mash Up.
Garbage in, garbage out. In this age of big and unstructured data analytics, good data preparation is a must: without it, you risk invalid results or being blocked from analyses that would benefit your business. Preparing data properly may take up to 80% of the time of a data analysis initiative. So, to optimize results, follow the six steps below.
Step 1 – Understand the Business Needs
Ask! Get the ultimate beneficiaries of your data preparation to tell you what business insights or knowledge they want from the data available. Check that enterprise goals translate into appropriate business questions and key performance indicators (KPIs), which can then be mapped onto the data and analytics to be used. Don’t get sucked into a “proof of concept” project that has no valid, useful business benefit.
Step 2 – Get to Know the Data
Find out where the data lives and how it will be accessed, and whether it falls into the category of simple, diversified, big or complex data. These categories are determined by the overall volume of data and the number of tables. The data you need may be in Excel files, in a data warehouse, or in a CRM system. You’ll need the right credentials to access the data, and the right software and hardware resources to process it.
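As a rough illustration of the two-axis idea above, here is a minimal Python sketch. The text gives no cutoffs and does not say which combination maps to which label, so the function name `classify_dataset`, both thresholds, and the mapping itself (few tables and low volume as "simple", many tables as "diversified", high volume as "big", both as "complex") are assumptions to adjust for your own environment.

```python
def classify_dataset(total_rows: int, table_count: int,
                     row_threshold: int = 1_000_000,
                     table_threshold: int = 10) -> str:
    """Place a data set in one of four categories by volume and table count.

    The thresholds and the label mapping are illustrative assumptions,
    not values taken from the article.
    """
    high_volume = total_rows >= row_threshold
    many_tables = table_count >= table_threshold

    if high_volume and many_tables:
        return "complex"
    if high_volume:
        return "big"
    if many_tables:
        return "diversified"
    return "simple"


if __name__ == "__main__":
    # A couple of spreadsheets with a few thousand rows in total.
    print(classify_dataset(total_rows=5_000, table_count=2))
    # A warehouse with dozens of tables and hundreds of millions of rows.
    print(classify_dataset(total_rows=300_000_000, table_count=40))
```

Knowing the category early can help you judge whether a spreadsheet tool is enough or whether you need more capable software and hardware.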
Time to take out the garbage. Identify or amend your data sources to ensure they are complete, accurate, and current.
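A first pass at "complete" and "current" can often be automated. The sketch below uses only the Python standard library and flags rows with missing required fields or with a last-updated date older than a chosen age. The function name `audit_records`, the field names, and the 90-day limit are hypothetical choices for illustration. Accuracy is not covered here, because it usually requires comparison against a trusted reference source.

```python
from datetime import date, timedelta


def audit_records(records, required_fields, date_field, max_age_days, today=None):
    """Return the indexes of incomplete and stale rows.

    records: list of dicts, one per row.
    required_fields: keys that must be present and non-empty.
    date_field: key holding a datetime.date of the last update.
    max_age_days: rows last updated longer ago than this count as stale.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    incomplete, stale = [], []

    for index, row in enumerate(records):
        # Completeness: every required field must hold a non-empty value.
        if any(row.get(field) in (None, "") for field in required_fields):
            incomplete.append(index)

        # Currency: only rows that carry a real date can be judged stale.
        updated = row.get(date_field)
        if isinstance(updated, date) and updated < cutoff:
            stale.append(index)

    return {"incomplete": incomplete, "stale": stale}


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = [
        {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
        {"id": 2, "email": "", "updated": date(2024, 5, 20)},
        {"id": 3, "email": "c@example.com", "updated": date(2023, 1, 1)},
    ]
    print(audit_records(sample, ["id", "email"], "updated", 90,
                        today=date(2024, 6, 1)))
```

Rows flagged this way can then be corrected at the source or excluded before analysis.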