Ask your data

How big is big?

2/6/2017

Image: Goðafoss, Iceland
Big data has become a big and, as far as I can tell, somewhat controversial term. Google returns 325,000,000 search results when you enter "big data", with some 300,000 search requests per month according to Keywords Everywhere. That result count is far higher than what one gets when searching for Internet of things (a term with high buzzword potential, 222,000,000 results) or data science (75,600,000 hits).

Some years ago Google contributed to the big data hype by launching Google Flu Trends (GFT). The aim of GFT was simple: by targeting specific terms within the millions of Google search requests, aggregating them and plugging them into a linear model, it should be possible to monitor and predict the spread of the flu in various countries. The underlying idea of GFT was that modern means of gathering and storing data make statistical sampling, i.e. obtaining a representative sample to work with, obsolete because it is possible to observe the population as a whole. Two articles I can recommend in this regard, one from the Financial Times and another from Wired, discuss the fallacies that eventually stopped Google from continuing the project. The gist of the methodological criticism was that, apart from weaknesses in the model Google had applied, the approach was essentially "theory-free", i.e. it neglected any assumptions about causality and simply focused on recognizing patterns in the incredible amounts of data available. A large number of "false positives" might be a consequence of that, which is particularly interesting in a marketing and fundraising context. I am quite sure you have received (digital) advertising that made you think: OK, not so far-fetched to send me that, but why me of all people (wedding products, babywear, health problems from sleeping disorders to incontinence)?

I think that small and medium-sized companies in particular, in industries where the amounts of data flowing through systems are not as large as for retailers or digital businesses, should definitely deal with big data, but in a pragmatic and down-to-earth manner. Their focus should be on how to use the data that is already available or obtainable at low cost and with minimal side effects. I find Andrew Joss's article on how to gradually build a 360-degree view of the customer highly interesting in this regard. One of the main conclusions I drew from it is that it might be worthwhile to take a close look at which (unstructured) data is actually already available and to develop ways to analyze it.

A handful of terms starting with the letter V are often referred to in the big data debate. They are far from being a "recipe" for succeeding with big data, but they might help you reflect on and discuss big data issues.

Volume
The main characteristic that makes data big is its sheer volume. The amount of data created daily is incredibly high and continuously growing. Estimates put it at 2.5 quintillion bytes per day, a figure often illustrated as enough to fill 10 million Blu-ray discs, although at 25 GB per single-layer disc it works out closer to 100 million. These volumes have been growing, and continue to grow, in small and medium-sized companies across industries as well.
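As a rough sanity check on estimates like this, the arithmetic fits in a few lines of Python. The disc capacity of 25 GB (single-layer, decimal units) is my own assumption and not part of the quoted estimate. With it, 2.5 quintillion bytes come out at 100 million discs per day, and a count of 10 million discs would imply 250 GB per disc.

```python
# Sanity check: how many Blu-ray discs would one day's worth of data fill?
# Assumptions (not from the quoted estimate): decimal units and a
# single-layer Blu-ray capacity of 25 GB.
BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion bytes
BLU_RAY_BYTES = 25_000_000_000             # 25 GB per disc


def discs_needed(total_bytes: int, disc_bytes: int = BLU_RAY_BYTES) -> int:
    """Number of discs required to hold total_bytes, rounded up."""
    return -(-total_bytes // disc_bytes)  # ceiling division on integers


def implied_disc_size(total_bytes: int, disc_count: int) -> int:
    """Capacity each disc would need if disc_count discs held total_bytes."""
    return total_bytes // disc_count


if __name__ == "__main__":
    print(f"{discs_needed(BYTES_PER_DAY):,} discs at 25 GB each")
    print(f"{implied_disc_size(BYTES_PER_DAY, 10_000_000):,} bytes per disc "
          "if only 10 million discs were used")
```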
 
Variety
Data comes in different types. A basic way to classify them is to differentiate between structured data, i.e. highly organized data (names and addresses, payment transactions or quantitative survey results like Net Promoter Scores), and its opposite, unstructured data. Unstructured data is mostly qualitative, such as interactional data from social media (likes, shares, comments, tweets), stored customer contacts or survey feedback given as free text. Structured and unstructured data differ fundamentally when it comes to data processing and analysis.
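The difference in processing effort can be illustrated with a minimal Python sketch. All records, field names and comments below are made up for illustration. The structured ratings can be averaged directly, whereas the free-text comments first have to be broken into words before anything can be counted, and even then a simple word count is only a crude starting point for text analysis.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical structured records: every field has a fixed name and type.
survey_ratings = [
    {"customer_id": 1, "rating": 9},
    {"customer_id": 2, "rating": 6},
    {"customer_id": 3, "rating": 10},
]

# Hypothetical unstructured feedback: free text without predefined fields.
survey_comments = [
    "Great service, very friendly staff!",
    "Delivery was slow, but the staff was friendly.",
    "Slow website. Great product though.",
]


def average_rating(records):
    """Structured data: the value of interest can be read off directly."""
    return mean(record["rating"] for record in records)


def word_counts(texts):
    """Unstructured data: lowercase and split into words before counting."""
    words = []
    for text in texts:
        words.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(words)


avg = average_rating(survey_ratings)
counts = word_counts(survey_comments)

if __name__ == "__main__":
    print(f"Average rating: {avg:.2f}")
    print("Most frequent words:", counts.most_common(5))
```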

Veracity
Veracity, our third V-word, reflects the overall "trustworthiness" of data. Aspects to consider for each data source are authenticity, completeness, reproducibility and, last but not least, the reliability of underlying models and assumptions, particularly when it comes to things like scores or indirectly measured socio-demographics such as income.

Velocity
The penultimate V represents what it takes to process big data with IT systems. This is where processing and response times as well as overall system performance come into play. Aiming to work with larger amounts of data and/or more complex data definitely sets a minimum baseline for the underlying IT infrastructure.

Value
This V, last but not least, asks the "so what?" question. The search for patterns and correlations is an essential and legitimate technique, but it should not be practiced for its own sake, otherwise one ends up with spurious correlations. It is questionable whether the mere abundance of data makes up for ignoring theory or for flawed overall reasoning.
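How easily spurious correlations arise when one searches without a hypothesis can be shown with a small simulation. The sketch below is purely illustrative and uses random numbers, not real data. The sample size, the number of candidate series and the seed are arbitrary assumptions. It correlates one random "target" series with many unrelated random series and reports the strongest correlation found, which tends to grow the more candidates are screened.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(n_candidates, n_points, seed=0):
    """Largest absolute correlation between a random target series and
    n_candidates independent random series of length n_points."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Same seed, so the first 10 candidates are identical in both runs.
few = strongest_chance_correlation(n_candidates=10, n_points=20)
many = strongest_chance_correlation(n_candidates=10_000, n_points=20)

if __name__ == "__main__":
    print(f"Strongest correlation among 10 random series:     {few:.2f}")
    print(f"Strongest correlation among 10,000 random series: {many:.2f}")
```

None of the candidate series has anything to do with the target, so any strong correlation the search turns up is pure chance. A prior hypothesis about which variables should matter, and why, is what keeps such findings in check.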

Where does this all take us?
Technology will lead the way. Continuous progress in the area of data storage and processing capacities, paired with the ubiquity of devices that collect data at points of sale, in homes and even on bodies, will turn big data into an even bigger topic. However, big data success stories might well begin with small first steps, like asking yourself "What does big data mean for me and us?"