Sparse Data is the New Black
February 28, 2014 | by Xively
I recently read an amazing article by Christopher Mims discussing the value and challenges of sparse data in addition to Big Data. (And of course, don’t click on that link until AFTER you’ve read this post!) As I read it, the gist of Mims’s article is that companies are collecting more and more sparse data, i.e., occasional, disparate data points as opposed to a constant stream of data, and that this data can create new insights when amassed in very large amounts across numerous devices or users. This may not sound very exciting…unless you are in any way, shape, or form interested in the Internet of Things.
Why? Because the vast majority of IoT data is sparse. Be it for practical, computational, transmission, or storage reasons, the truth is that many products likely don’t need millisecond precision on data coming from connected-device sensors. This is great for keeping your messaging and storage requirements lean, but it proves challenging once you’re ready to actually do something with that data. So the question is: How do you handle lots and lots of little data coming in from across different sensors, devices, locations, product types, data types, and so on? Enter sparse data processing.
For those unfamiliar with sparse data, the idea is simple. If you have a matrix of numbers, a sparse matrix is one composed mainly of zeros – in other words, it’s mostly empty. Algorithmically speaking, you probably don’t want to waste time computing on each and every one of those zeros, so some very smart mathematicians and computer scientists have found ways around this. Where this sparse data becomes really powerful is when you keep adding more rows to the matrix (i.e. increased dimensionality). Each of those rows could be a new sensor reading over time, user data exhaust, 3rd party API data, or just about anything else that may help you make some sort of prediction.
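To make the "don't waste time on the zeros" idea concrete, here is a minimal sketch in Python. It assumes a simple dictionary-of-keys layout, where only the non-zero cells are stored. The `SparseMatrix` class, the sensor layout, and the sample values are all hypothetical and exist only for illustration; a real product would more likely lean on an established sparse-matrix library.

```python
class SparseMatrix:
    """Dictionary-of-keys sparse matrix: only non-zero cells are stored."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)
        self.cells = {}  # (row, col) -> non-zero value

    def set(self, row, col, value):
        # Storing a zero would defeat the purpose, so drop the cell instead.
        if value != 0:
            self.cells[(row, col)] = value
        else:
            self.cells.pop((row, col), None)

    def get(self, row, col):
        # Anything that was never stored is implicitly zero.
        return self.cells.get((row, col), 0)

    def density(self):
        n_rows, n_cols = self.shape
        return len(self.cells) / (n_rows * n_cols)

    def dot(self, vector):
        """Matrix-vector product that visits only the stored cells."""
        result = [0.0] * self.shape[0]
        for (row, col), value in self.cells.items():
            result[row] += value * vector[col]
        return result


# Hypothetical data: 3 sensors (rows) x 8 time slots (columns).
readings = SparseMatrix(3, 8)
readings.set(0, 1, 41.5)  # humidity sensor reported in slot 1
readings.set(0, 6, 43.0)  # ...and again in slot 6
readings.set(2, 3, 19.2)  # temperature sensor reported in slot 3

weights = [1.0] * 8
row_sums = readings.dot(weights)  # work done is proportional to 3 cells, not 24
```

The matrix here has 24 cells but only 3 stored values, so the product loops three times. That gap is what keeps sparse processing cheap as more sensors and time slots are added.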
Now let’s think about how all this sparse data can transform customer-facing products or internally used connected devices. If you sell a connected product that lets users see home humidity levels on an iPhone, this data is likely very sparse. Once you add time or other feature vectors such as external temperature, light sensor information, and other data to the mix, you can begin to apply machine learning techniques to perform predictive analytics in real time. Your sparse data set may illuminate clusters of activity that wouldn’t be noticeable from a pretty HTML5 visualization of that data. In addition to finding trends from clustered data, a natural next step would be to close the feedback loop. Once trained on sensor data from the past, your product might be able to automagically trigger an actuator event or send you anticipatory warning messages. This is where ubiquitous computing, context-aware computing, ambient intelligence, or whatever you want to call it becomes interesting.
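As a rough sketch of what closing the feedback loop could look like, the Python below "trains" on past humidity readings and fires a callback when a new reading looks unusual. To keep it short, it swaps real machine learning for a plain statistical threshold (mean plus two standard deviations). The function names, the `on_alert` callback, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def train_threshold(history, k=2.0):
    """Derive an alert level from past (timestamp, value) readings.

    This is a simple mean + k * stdev rule, standing in for a trained model.
    """
    values = [value for _, value in history]
    return mean(values) + k * stdev(values)


def check_reading(reading, threshold, on_alert):
    """Call on_alert (e.g. an actuator or a push message) if the reading is high."""
    timestamp, value = reading
    if value > threshold:
        on_alert(timestamp, value)
        return True
    return False


# Hypothetical sparse history: irregular timestamps (seconds), humidity in %.
history = [(0, 40.0), (900, 42.0), (4200, 41.0), (9000, 43.0), (9600, 39.0)]
threshold = train_threshold(history)

alerts = []  # stand-in for a real actuator or notification service
triggered = check_reading((12000, 55.0), threshold,
                          lambda t, v: alerts.append((t, v)))
```

Note that the readings arrive at irregular intervals and nothing in the sketch requires otherwise, which is the point for sparse sensor data. A production system would presumably use the additional feature vectors mentioned above rather than a single humidity series.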
The applications of sparse data discovery and action for real business problems such as better supply chain management, predictive maintenance, or smart building management become far more interesting when you see the potential lying between all those zeros.