In the two previous installments of this four-part blog series on big data and analytics, we have talked about data virtualization and data preparation.

You have located all of your data through data virtualization. You undertook thorough data preparation in order to make it as clean as possible.

In these posts, we discussed how these are the first crucial steps toward making the data from the Internet of Things (IoT) usable.

Now it’s time for the next step: data exploration.

Data exploration is the first step in data analysis. It usually involves describing the main characteristics of a set of data. With data exploration, you scan your data sets to discover whether they contain valuable connections.
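To make "describing the main characteristics" a little more concrete, here is a minimal sketch in Python using only the standard library. The column names and sample numbers are hypothetical, invented purely for illustration, and the Pearson correlation is just one simple way to look for a "connection" between two columns; it is not a method prescribed by this series.

```python
from statistics import mean, median, pstdev


def describe(values):
    """Summarize the main characteristics of one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": pstdev(values),  # population standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sample data (assumption, not from a real data set).
quarterly_sales = [100.0, 120.0, 111.6, 130.0, 125.0]
ad_spend = [10.0, 14.0, 12.0, 16.0, 15.0]

if __name__ == "__main__":
    print(describe(quarterly_sales))
    print(pearson(quarterly_sales, ad_spend))
```

A correlation near 1 or -1 only hints that two columns move together. It does not explain why, which is where the context discussed in the rest of this post comes in.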

Think again about the business analyst with the graphs mentioned in my first post on data virtualization.

If someone is presenting a graph in a meeting, that data needs context to be meaningful. If the presenter simply describes the chart (20 percent up this quarter, 7 percent down the next), that isn’t helping anyone else in the meeting understand the meaning behind the numbers.

'Why' Is More Helpful Than 'What'

What is behind that sales fluctuation? What does that data really mean? Data that shows sales fluctuations but nothing else is missing context.

Data exploration helps establish context for any data, so the business can extract meaning from it and answer complex questions. In the big-data era, data that establishes context comes from an endless variety of human, machine, or device sources.

This kind of data can be divided into four categories:

  1. Identity: This includes name, description, meaning, structure, possible values, and similar context-setting information. This data covers the most basic, independent attributes of a business issue, like sales volume changes. It is the starting point for any use.
  2. Source: Where the data comes from, where it is now, its ownership, guarantees of integrity, etc. The data source helps define the boundaries of acceptable use.
  3. Timeliness: How current is the data involved in the business question (for example, sales volume changes)? And for how long will it remain relevant?
  4. Usage: Usage refers to any process that can be or has been applied to the business matter. This process can alter its content, relevance, or structure and thus influence its further use.
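One way to picture these four categories is as a small record that travels with a data set. The sketch below is only an illustration under stated assumptions: the class name `DataContext`, its field names, and the sample values are all hypothetical and do not come from any Cisco product or library.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DataContext:
    """Hypothetical container for the four kinds of context data."""

    # 1. Identity: what the data is.
    name: str
    description: str
    # 2. Source: where it comes from and who owns it.
    origin: str
    owner: str
    # 3. Timeliness: when it was collected and how long it stays relevant.
    collected_on: date
    relevant_for: timedelta
    # 4. Usage: processes that have been applied to the data so far.
    applied_processes: list = field(default_factory=list)

    def is_current(self, today: date) -> bool:
        """Return True while the data is still inside its relevance window."""
        return today <= self.collected_on + self.relevant_for

    def record_usage(self, process: str) -> None:
        """Note a process that altered or consumed the data."""
        self.applied_processes.append(process)


# Hypothetical example values (assumptions for illustration only).
context = DataContext(
    name="sales_volume",
    description="Quarterly sales volume by region",
    origin="point-of-sale export",
    owner="sales operations",
    collected_on=date(2016, 1, 4),
    relevant_for=timedelta(days=90),
)
context.record_usage("currency normalization")
```

Keeping the usage history as a plain list is a deliberate simplification. A real system would likely also record who ran each process and when.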


Exploration Makes Data Prediction Possible

Establishing the data’s identity, source, timeliness, and usage helps set it in a context with meaning. This is an essential task of data exploration.

Data exploration sets us up for the final step in finding the opportunity in big data: data prediction, the topic of the fourth and final post in this series.

I welcome your thoughts on these posts. Please chime in with feedback in the comments below.

And, for more information, visit the Cisco Data and Analytics page.




Learning@Cisco product manager Neeraj Chadha has more than 20 years of experience in the networking industry. Over that time, he has functioned as a software developer and network engineer, and in various aspects of product management. Currently, he guides the overall product strategy and evolution of Cisco courseware and certifications around Wireless, Collaboration, and Big Data and Analytics. Neeraj's primary areas of focus include technology trends, digital transformation, continuing education, and product strategy.