BI City

Derick Jose

"How to leverage new age data solutions to help cut through 'Big Data Smog'"

Derick Jose
Big Data Entrepreneur
Derick started his career with Fujitsu in Pune, then moved to the US for 6 years, where he worked predominantly in the telecom sector. He then joined MindTree, initially as part of the DW group with BI Solutions, and moved on to pre-sales and consulting along the way. He now works as a Big Data entrepreneur at Flutura.

"Big Data smog" is fast descending on us as our ability to generate data dramatically exceeds our ability to extract signal from noise! We have more medical sensors, automobile sensors, RFID chips and cell phone towers emitting event data in real time today than at any other point in history. We also have more human-generated data on digital platforms as more consumers research, configure, purchase and emotionally bond online. This trend is only going to accelerate and increase the ‘data fog density’ as the velocity and variety of data sources increase. As with every problem, it is also a blessing in disguise, as it sharpens the focus on intelligence extraction rather than on the traditional outlook of managing voluminous data.

Here are 2 real-life examples, drawn from our experience, of the possibilities enabled by Big Data.

Scenario -1:  A leading online travel provider
We were engaged by a leading online travel provider who wanted to answer seemingly simple questions. Which are the fastest growing city corridors for morning flights on Mondays (official trips), and which are the equivalent corridors for Friday evenings (leisure trips)? Does the probability of booking increase if airline X comes in the first 3 response ranks? Which are the most searched “Business corridors” and “Leisure corridors” that also have a high rate of booking drops, so that more searchers can be helped to close the booking transaction? Essentially, they wanted to tap into the intelligence in search logs to infer and monetize the intent of a consumer. These expressions of consumer intent were trapped in terabytes of search log files which were being regularly flushed without any extraction of intelligence.
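To make two of these questions concrete, here is a minimal sketch in Python of how they might be answered once the search logs are parsed. The record layout (`corridor`, `ranked_airlines`, `booked`) and the sample rows are purely hypothetical and are not the travel provider's actual log format; a real implementation would run over terabytes of logs on a distributed platform rather than an in-memory list.

```python
# Hypothetical parsed search-log records: the airlines in the order they were
# shown to the searcher, and whether the search ended in a booking.
SEARCH_LOG = [
    {"corridor": "BLR-DEL", "ranked_airlines": ["X", "A", "B", "C"], "booked": True},
    {"corridor": "BLR-DEL", "ranked_airlines": ["A", "B", "C", "X"], "booked": False},
    {"corridor": "BOM-GOI", "ranked_airlines": ["B", "X", "A", "C"], "booked": True},
    {"corridor": "BOM-GOI", "ranked_airlines": ["A", "C", "B", "X"], "booked": False},
    {"corridor": "BLR-DEL", "ranked_airlines": ["C", "A", "X", "B"], "booked": False},
    {"corridor": "BOM-GOI", "ranked_airlines": ["A", "B", "C", "X"], "booked": True},
]


def booking_rate_by_top_n(log, airline, top_n=3):
    """Booking rate split by whether `airline` appeared in the first `top_n` ranks."""
    counts = {True: [0, 0], False: [0, 0]}  # in_top_n -> [bookings, searches]
    for record in log:
        in_top = airline in record["ranked_airlines"][:top_n]
        counts[in_top][1] += 1
        if record["booked"]:
            counts[in_top][0] += 1
    return {
        in_top: (bookings / searches if searches else None)
        for in_top, (bookings, searches) in counts.items()
    }


def drop_rate_by_corridor(log):
    """Share of searches per corridor that did not end in a booking."""
    searches, drops = {}, {}
    for record in log:
        corridor = record["corridor"]
        searches[corridor] = searches.get(corridor, 0) + 1
        if not record["booked"]:
            drops[corridor] = drops.get(corridor, 0) + 1
    return {c: drops.get(c, 0) / n for c, n in searches.items()}


if __name__ == "__main__":
    print(booking_rate_by_top_n(SEARCH_LOG, "X"))
    print(drop_rate_by_corridor(SEARCH_LOG))
```

A gap between the two booking rates only suggests that rank position matters; it does not prove it, since searches where airline X ranks high may differ in other ways as well.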

Scenario -2:  A leading telecom provider
This was another unique scenario we were called into, in which a leading mobile service provider wanted to collect and weave a rich tapestry of atomic event data emitted by various devices in their network, such as routers, firewalls and switches. By doing so they wanted to do real-time pattern matching to comply with regulatory needs and proactively infer malicious intent from external users. They had questions like “Is there a pattern decodable from the sequence of alarm flags set off across multiple event streams?” and “Does alarm event X have a greater probability of adverse event outcomes than alarm event Y?”
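The last question can be framed as comparing two conditional probabilities. The sketch below, in Python, estimates them from labelled history. The event tuples and alarm names are hypothetical, and it assumes each alarm has already been matched to whether an adverse outcome followed, which in practice is the hard part of the data engineering.

```python
from collections import Counter

# Hypothetical history: (alarm_type, adverse_outcome_followed)
ALARM_EVENTS = [
    ("X", True), ("X", True), ("X", False), ("X", True),
    ("Y", False), ("Y", True), ("Y", False), ("Y", False),
]


def adverse_rate_by_alarm(events):
    """Estimate P(adverse outcome | alarm type) as a simple observed frequency."""
    totals = Counter()
    adverse = Counter()
    for alarm, was_adverse in events:
        totals[alarm] += 1
        if was_adverse:
            adverse[alarm] += 1
    return {alarm: adverse[alarm] / totals[alarm] for alarm in totals}


if __name__ == "__main__":
    rates = adverse_rate_by_alarm(ALARM_EVENTS)
    for alarm, rate in sorted(rates.items()):
        print(f"alarm {alarm}: {rate:.2f}")
```

With only a handful of events per alarm type these frequencies are noisy, so a production version would also want confidence intervals or a minimum sample size before acting on the comparison.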

In both the above examples, organizations are hungry to answer previously unanswerable questions which could spell the difference between life and death in a hyper-competitive market environment.

In both situations, the problem could not have been solved using traditional data management solutions and tools; they would have been impractical considering the sheer volume and velocity at which data hits the infrastructure.

So what is driving today’s new age Big Data solutions?

There are 3 key silent but potent forces at work.

Driving force-1:  Democratization of machine learning algorithms like scoring, collaborative filtering and text mining, using tools like Mahout and R instead of million-dollar statistical packages.

Driving force-2:  Commoditization of storage and computing power using technologies like Hadoop, Hive, Sqoop and Flume, which makes big data processing accessible to a wider base of companies, not necessarily the ones with more spending power.

Driving force-3:  The ability to monetize data becoming a core component of the business model, resulting in a huge capability chasm between leaders and laggards. Netflix displaced Blockbuster; Amazon displaced Borders.

Once these 3 driving forces intersect, they are going to unleash a tsunami of new data intelligence within the organization. So what can organizations do to get a head start on the journey if they don’t want to be left behind in the intelligence extraction race?

There are 3 immediate things an organization can do to get started on the journey.

Action-1:  Develop cross-functional capability spanning 3 important skill sets

1)  Ability to extract behavioral patterns using advanced machine learning (clustering, classification, collaborative filtering, support vector machines, multidimensional scaling, text mining, Bayesian classifiers etc.)

2)  Ability to tame high velocity data using core data engineering skills

3)  Ability to have an experimental mindset and observe human behavior in the ‘live digital lab’ environment

Action-2:  Curate high value business use cases which distance you from competition

1)  Ability to sense and respond to search intelligence is a powerful differentiating capability

2)  Ability to sense and respond to high value machine data patterns is a differentiator

Action-3:  Set up a Big Data ‘Sandbox’ where one can ‘play around’ with data

To conclude, Jackson Brown Jr. famously said, “Nothing is more expensive than a missed opportunity.” The confluence of democratized machine learning, commodity computing and business-impacting use cases enables the organization to intercept an unprecedented opportunity which can be monetized.