We talked yesterday, in Part 1 of our four-part blog series on big data and analytics, about data virtualization and how it helps bring big data together from many sources. Now, let’s talk about preparing that raw data for analysis.
We call this process data preparation.
Clean Data Is Vital
Data preparation can be sheer drudgery. By a commonly cited estimate, it accounts for as much as 80 percent of the time that data scientists spend on the job.
But good data preparation is vital. Without proper data preparation, it’s impossible to explore data or make predictions based on it.
Getting the data you need ready to use can be challenging even after you have located it using data virtualization.
Preparation typically involves several cleanup steps. You identify duplicate data or blank fields and eliminate them. You fix misspellings. You split columns or add data.
The aim of data preparation is to make the data used for analysis as error-free as possible. The old IT saying, “Garbage in, garbage out,” is never truer than with data preparation. You want your data to be as clean as it can be, or you won’t get the results you need.
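To make those cleanup steps concrete, here is a minimal sketch in Python of what they might look like for a small set of records. It is only an illustration, not a description of any particular product. The field names (name, state), the SPELLING_FIXES table, and the clean_records function are all hypothetical and chosen for this example.

```python
from typing import Dict, Iterable, List

# Hypothetical correction table; a real project would maintain its own.
SPELLING_FIXES = {"Calfornia": "California", "Texs": "Texas"}


def clean_records(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply basic cleanup steps to rows that carry 'name' and 'state' fields."""
    cleaned = []
    seen = set()
    for row in rows:
        # Trim stray whitespace and treat missing values as blank.
        name = (row.get("name") or "").strip()
        state = (row.get("state") or "").strip()

        # Eliminate rows with blank fields.
        if not name or not state:
            continue

        # Fix known misspellings.
        state = SPELLING_FIXES.get(state, state)

        # Split the single name column into first and last name.
        first, _, last = name.partition(" ")
        last = last.strip()

        # Eliminate duplicates, compared case-insensitively after cleanup.
        key = (first.lower(), last.lower(), state.lower())
        if key in seen:
            continue
        seen.add(key)

        cleaned.append({"first_name": first, "last_name": last, "state": state})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"name": "Ana Diaz", "state": "Calfornia"},
        {"name": "ana diaz ", "state": "California"},
        {"name": "", "state": "Texas"},
        {"name": "Bo Li", "state": "Texs"},
    ]
    for record in clean_records(raw):
        print(record)
```

In this sketch, duplicates are checked only after the other fixes are applied, so that two rows differing only by a misspelling or stray whitespace are recognized as the same record.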
Data Preparation in Real Life
Imagine that a data analyst is presenting a profit-and-loss overview. The analyst has data from different SKUs, geos, bundles, channels, sales partners, and margin calculations.
The traditional way to make a dashboard for the presentation is to create pivot tables by hand. Hunting down and picking up the right data to create graphs and charts is laborious and time-consuming.
These days, new approaches to data preparation are automating many of the processes to shorten the time and effort involved.
Maybe you are dealing with billions of megabytes of data. Or just a few thousand project photos covering five years. Or you are a data analyst with an amount of data that is somewhere in between.
Automated data preparation takes care of a lot of the busy work in any of these examples.
Automated Data Preparation Helps Everyone
Data professionals love automated data preparation because it frees them from routine chores. Instead of wasting time cleaning data by hand, they leave it to automated data preparation. They spend their time analyzing the data for business intelligence (BI).
Executives love automated data preparation because it lets them use data when and where they want. Thanks to automation, BI is now self-service and much more agile.
IT departments also appreciate automated data preparation. It frees them from legacy extract, transform, and load (ETL) tasks related to manual data preparation. Automated data preparation lightens the burden on IT departments and turns them into BI enablers.
The next step in making money from data is providing data that makes sense and is useful to the organization. That is a process known as data exploration. We discuss that in Part 3 of this blog series.
For more information, please visit the Cisco Data and Analytics page.
Learning@Cisco product manager Neeraj Chadha has more than 20 years of experience in the networking industry. Over that time, he has functioned as a software developer and network engineer, and in various aspects of product management. Currently, he guides the overall product strategy and evolution of Cisco courseware and certifications around Wireless, Collaboration, and Big Data and Analytics. Neeraj's primary areas of focus include technology trends, digital transformation, continuing education, and product strategy.