Big Data vs. Smart Data, The Data Science That Will Help Your Business

Participation in the digital world has changed the customer and consumer profile. People who once knew little about the market beyond what the media and their peer groups told them have become increasingly informed and demanding buyers.

This forces companies to move from being reactive to proactive: detecting the needs of potential and current customers before the customers themselves notice them, and thereby staying ahead of the competition.

To get to know this new consumer, companies turn to Big Data and, more recently, Smart Data. Both concepts are gaining ground, and both let companies analyze the digital footprint their customers leave behind to detect opportunities and needs, offering customers what they need when they need it.

However, because these terms are still relatively new, many companies lack the experience or resources to take full advantage of them and have to turn to technology consultants for advice.

To make that task easier, this article introduces the concepts of Big Data and Smart Data, the elements related to them, and examples of companies that put them into practice today.

Big Data vs. Smart Data extraction methods

Following the definition given by Michael Frampton (2014), Big Data is data whose size and complexity are so great that traditional analysis tools cannot process it in an acceptable time or at an acceptable cost.

This size and complexity create problems in collecting, storing, processing, and analyzing the data. However, software frameworks such as Hadoop solve part of this problem.

Another way of defining Big Data is, as Diya Soubra (2012) notes, through the so-called “Gartner 3Vs” defined by Doug Laney (2001) in his article “3D Data Management: Controlling Data Volume, Velocity, and Variety”.

Volume

The total size of the data. The constant and growing use of the internet and social networks generates large volumes of information that must be stored and processed before it can be used.

Velocity

The time it takes to collect and process the information. These volumes of information must be stored and processed in real time. By analyzing data in real time, a company can be more agile and competitive, predicting events with a small margin of error and so anticipating the market and the customer.

Variety

The different data types that make up the databases (images, numerical databases, comments on social networks).

To these three Vs it is worth adding a fourth: veracity, the integrity of the data. No matter how large and diverse the data may be, if it is not truthful, any analysis carried out on it will be meaningless because the results will not reflect reality.

Juan Martín (2017), for his part, defines Smart Data as the transformation of large collections of data into available, usable information that answers open questions and serves a specific purpose.

The purpose of Smart Data is to convert large volumes of Big Data into valuable, relevant information that is ready to use in real time, through analysis and the subsequent interpretation of the results.

In addition, unlike Big Data, Smart Data is described through five Vs: volume, velocity, variety, veracity (the integrity of the sources used), and the value of the data. The last is the most important, since it is what defines the concept.

Types of data they store

Both Big Data and Smart Data can collect data of various kinds. Luis Joyanes (2016) classifies them as structured, semi-structured, and unstructured.

Structured data

Data with a defined, specific format and fixed fields. The information it contains is known a priori and is generated in a specific order. Relational databases, spreadsheets, and files fall into this category.

Unstructured data

Data without a predefined format, stored as documents or objects that do not share a common structure. This is the most complex data to analyze. Examples of unstructured data, which lacks fixed fields, include images, audio and video files, messages, and emails.

Semi-structured data

A combination of the previous two. This data does not have a fixed format but contains elements such as labels or markers that identify the items it includes. Reading it requires rules that indicate how to act after each segment of information is read. HTML and XML documents fall into this category.
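
To make the three categories more concrete, here is a minimal Python sketch that reads one tiny sample of each. The customer, order, and review data are invented for illustration, and only the standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed fields known in advance, so every row parses the same way.
structured = "customer_id,country,total\n1,ES,120.50\n2,FR,80.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: no fixed schema, but tags mark what each element is.
# Note that the second order has no <note> element at all.
semi_structured = """
<orders>
  <order id="1"><item>laptop</item><note>gift wrap</note></order>
  <order id="2"><item>phone</item></order>
</orders>
"""
root = ET.fromstring(semi_structured)
orders = [
    {"id": order.get("id"), **{child.tag: child.text for child in order}}
    for order in root.iter("order")
]

# Unstructured: free text with no fields or tags. Only generic operations
# (here, a word count) apply until further analysis extracts structure.
unstructured = "Great laptop, arrived quickly. Would buy again!"
word_count = len(unstructured.split())

print(rows)
print(orders)
print(word_count)
```

The structured rows all share the same three fields, the two semi-structured orders end up with different keys, and the unstructured review yields nothing beyond a raw count without further processing.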

What is needed to start a Big Data or Smart Data project?

As Carlos Pérez (2016) notes, carrying out a Big Data project, and therefore a Smart Data one, requires a series of essential elements:

Human resources

A distinction is made between people who have the technical knowledge and people who know the business or the sector in which it operates.

Technological infrastructures

Hardware and software with the size and power necessary to store and process the data behind Big Data and Smart Data projects.

It should be noted that, unlike Big Data, Smart Data does not require large volumes of data, so the infrastructure it needs does not have to be as powerful or have as much storage capacity.

Data sources

These include traditional information-gathering systems, databases, and historical data, as well as newer sources such as the internet, social networks, and open data published by public institutions.

Big Data and the problem of dimensionality

As José Antonio Guerrero (2016) indicates, one of the problems that affects Big Data is related to the dimensionality of the data.

The problem of dimensionality in Big Data can be defined as the possible adverse effects produced by an increase in the number of variables relative to the number of observations.

High dimensionality can lead to overfitting: as more variables are introduced, the model becomes more complex and fits the noise in the training data, so its predictions on new data are poor.

Furthermore, collinearity (when a variable is a linear combination of others already in the model) can affect the algorithms used or the stability of the solutions. Three types of methods for dealing with the dimensionality problem are described below:

Filtering methods

They choose variables according to a criterion that is independent of the algorithm used to fit the model. They are quick to apply, but they can reject a variable because its main effect is not significant even though it matters in interaction with other variables. Examples of filtering methods include correlation and hypothesis tests.
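
As an illustration of filtering by correlation, the sketch below keeps only the variables whose Pearson correlation with the response reaches a threshold. The feature names, the sales figures, and the 0.5 threshold are assumptions chosen for the example, not values taken from any of the cited authors.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column has zero variance and would divide by zero here.
    return cov / sqrt(var_x * var_y)

def correlation_filter(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target reaches the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical data: ad spend tracks sales closely, store id does not.
features = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_id": [3, 1, 3, 1, 2],
}
sales = [12, 19, 33, 41, 48]

selected = correlation_filter(features, sales)
print(selected)
```

Each variable is scored on its own, which is what makes the method fast and also what makes it blind to variables that only matter in combination with others.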

Wrapper methods

They differ from filtering methods in that they look for the set of variables that gives the best fit with a specific algorithm. An example is the stepwise (or step-by-step) procedures used in multiple linear regression (forward selection and backward elimination).
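
The stepwise idea can be sketched as a simplified forward selection around ordinary least squares. This is an illustration under stated assumptions, not a statistical package's procedure: the stopping rule (each new variable must explain at least 5% of the total variation) stands in for the F-tests or information criteria that real stepwise routines typically use, and the data are invented, with each factor coded as -1 (off) or +1 (on).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sse(columns, y):
    """Sum of squared errors of a least squares fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations; fails if columns are collinear
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def forward_selection(features, y, min_gain=0.05):
    """Greedily add the feature that most reduces the error of the fitted model."""
    total = sse([], y)  # error of the intercept-only model
    selected, best = [], total
    remaining = list(features)
    while remaining and total > 0:
        scores = {name: sse([features[f] for f in selected + [name]], y)
                  for name in remaining}
        candidate = min(scores, key=scores.get)
        if (best - scores[candidate]) / total < min_gain:
            break  # no remaining feature improves the fit enough
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected

# Hypothetical weekly data: sales respond to discount and advertising only.
features = {
    "discount":    [-1, -1, -1, -1, 1, 1, 1, 1],
    "advertising": [-1, -1, 1, 1, -1, -1, 1, 1],
    "random_flag": [-1, 1, -1, 1, -1, 1, -1, 1],
}
sales = [4.9, 5.1, 9.1, 8.9, 11.1, 10.9, 14.9, 15.1]

selected = forward_selection(features, sales)
print(selected)
```

Unlike the correlation filter, every candidate is judged by refitting the model itself, which is why these methods cost more to run as the number of variables grows.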

Extraction methods

They seek to convert a set of initial variables into a smaller set that retains most of the information. Their main advantage is that they do not use information from the response variable, so they can also be applied to unlabeled data. Their drawback is that the results are harder to interpret.

Principal component analysis (PCA), correspondence analysis, and cluster analysis are examples of this type of method.
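
As a rough sketch of how PCA compresses variables, the code below finds the first principal component of a small dataset by power iteration on its covariance matrix. The data are invented (page views are exactly twice the visits), and a production analysis would normally rely on a numerical library rather than this hand-rolled loop.

```python
from math import sqrt

def first_principal_component(rows, iterations=100):
    """Return the direction of maximum variance and each row's score along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    # Assumes the data has some variance and that the starting vector is not
    # orthogonal to the leading direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Hypothetical (visits, page_views) pairs; the two variables are collinear.
rows = [(1, 2), (2, 4), (3, 6), (4, 8)]

direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

Two collinear variables collapse into a single score per observation with no loss of information, and no response variable was needed. The interpretability cost is also visible: the new variable is a weighted blend of visits and page views rather than either one.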

Practical applications of Big Data today

Currently, numerous companies carry out Big Data activities. Bernard Marr (2016) collects some examples:

Walmart

In 2004, ahead of Hurricane Frances, Walmart analyzed sales data from earlier storms and found that the approaching bad weather increased demand not only for emergency equipment but also for Strawberry Pop-Tarts in various locations. As a result of this discovery, extra supplies of the product were sent to stores in the hurricane's path, where they sold as expected.

Netflix

Its Big Data projects combine the data it holds with various analytical techniques, which allows the platform to recommend the content best suited to each user's tastes. This recommendation system is based on a content tagging process. Netflix pays thousands of viewers to watch content and tag various aspects of it. This has allowed it to create 80,000 new microgenres that are recommended to viewers.
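
To give a feel for how tags can drive recommendations, here is a deliberately simple sketch that ranks unwatched titles by how many tags they share with something the user has already watched. It is not Netflix's actual system: the titles, the tags, and the choice of Jaccard similarity are all assumptions made for the example.

```python
def jaccard(a, b):
    """Share of tags two titles have in common (0 = none, 1 = identical).

    Assumes at least one of the two tag sets is non-empty.
    """
    return len(a & b) / len(a | b)

def recommend(watched, catalog, top_n=2):
    """Rank unwatched titles by their best tag overlap with any watched title."""
    scores = {
        title: max(jaccard(tags, catalog[w]) for w in watched)
        for title, tags in catalog.items()
        if title not in watched
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical catalog: each title carries the tags assigned by human viewers.
catalog = {
    "Title A": {"heist", "dark", "ensemble"},
    "Title B": {"heist", "dark", "period"},
    "Title C": {"dark", "romance", "period"},
    "Title D": {"cartoon", "family"},
}

suggestions = recommend({"Title A"}, catalog)
print(suggestions)
```

The finer the tagging, the more specific the overlap becomes, which is the intuition behind grouping content into very narrow microgenres.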

Narrative Science

The company started out generating automated game reports for the Big Ten Network and currently produces business and financial news for Forbes, MasterCard, and the UK's National Health Service. To do this, it uses a process known as Natural Language Generation (NLG), which consists of taking information and figures from databases and automatically building stories that appear to have been written by people. At MediaRoom Solutions, we work on NLG projects financed by the Google DNI initiative.
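
In its most basic form, going from a database row to a sentence can be sketched with templates and a few rules. The example below is a toy illustration, not Narrative Science's technology: the team names, the field names, and the thresholds that pick the verb are all hypothetical.

```python
def game_report(game):
    """Turn one row of (hypothetical) box-score data into a one-sentence report."""
    home, away = game["home"], game["away"]
    home_pts, away_pts = game["home_points"], game["away_points"]
    if home_pts == away_pts:
        return f"{home} and {away} tied {home_pts}-{away_pts}."
    winner, loser = (home, away) if home_pts > away_pts else (away, home)
    margin = abs(home_pts - away_pts)
    # The size of the margin decides the wording, so the text reflects the data.
    verb = "edged" if margin <= 3 else "beat" if margin <= 14 else "routed"
    high, low = max(home_pts, away_pts), min(home_pts, away_pts)
    return f"{winner} {verb} {loser} {high}-{low}."

close_game = {"home": "Lions", "away": "Bears", "home_points": 27, "away_points": 24}
blowout = {"home": "Lions", "away": "Bears", "home_points": 10, "away_points": 42}

print(game_report(close_game))
print(game_report(blowout))
```

Commercial NLG systems go much further in choosing which facts to mention and how to vary the language, but the principle of deriving wording from figures is the same.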

Practical applications of Smart Data today

Juan Martín (2017), for his part, explains that examples of Smart Data applications can be found both in business settings and in everyday life.

At the business level, organizations use the findings of Smart Data analytics to improve people's daily lives. Such is the case of IBM and Microsoft, which work together to reduce the number of traffic jams that form on the roads every day, or Google, which, through Google Flu Trends (now discontinued), sought to analyze search data to detect the spread of epidemics.

Smart Data is also applicable to individual departments within a company. For example, a Human Resources department could learn more about potential candidates from the Smart Data provided by social networks.

In addition, because Smart Data does not involve a large amount of data, SMEs can use it too, for example to retain customers by creating personalized offers from the data collected in their CRM.

In everyday life, Smart Data applications can be found on Smart TVs, which use the data collected from television viewing to suggest multimedia content, such as series or movies, adapted to the user's tastes.

So-called smart cities also use Smart Data to manage the use of public services.

Sectors such as health can use Smart Data to monitor their patients' health better and provide more appropriate treatments based on information about their vital signs and clinical results.
