An O’Reilly Radar Report: What is Data Science?
The analytics market is enormous: it is estimated at around $70 billion today and is growing at an astounding 14-20% per year.[1] And everyone is getting in, from the technology heavyweights to the warehousing specialists to the visualization tool vendors to industry-specific companies across all sectors.
The utility industry is no different. Energy Central’s Utility Analytics Institute forecasts that utilities worldwide will spend over $2 billion on analytics,[2] and Pike Research forecasts annual spending of over $4 billion by 2015, a 65% compound annual growth rate (CAGR) from 2010.[3]
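For readers less familiar with CAGR, the sketch below applies the standard definition, CAGR = (end / start)^(1 / years) - 1, to the Pike Research figures quoted above. Treating "over $4 billion" as exactly $4 billion is an assumption, and the implied 2010 value it prints is arithmetic on the quoted numbers, not a figure reported by Pike Research.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

def implied_start(end_value: float, rate: float, years: float) -> float:
    """Starting value that reaches end_value after `years` of growth at `rate`."""
    return end_value / (1.0 + rate) ** years

end_2015 = 4.0e9       # assumption: "over $4 billion" treated as exactly $4 billion
rate = 0.65            # 65% CAGR, as quoted
years = 2015 - 2010    # growth period in years

start_2010 = implied_start(end_2015, rate, years)
print(f"Implied 2010 spend: ${start_2010 / 1e9:.2f} billion")
print(f"Round-trip CAGR check: {cagr(start_2010, end_2015, years):.0%}")
```

With these inputs the implied 2010 base is roughly $0.33 billion, which shows how steep a 65% annual rate is: spending grows more than twelvefold (1.65^5 ≈ 12.2) over five years.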
The intelligence gained from analyzing data from new devices and sensors is essential if utilities hope to realize the promised benefits of their smart grid initiatives.
The problem, though, is not finding data. It is figuring out what to do with it.
Skyrocketing data volumes within utilities are well publicized. The move to Advanced Metering Infrastructure (AMI) will produce a staggering 3,000 times as much data as manual monthly reads. Add in data from distribution automation devices, smart substations, home networks, wide area measurement systems and demand response systems, among others, and utilities will have plenty of data at their disposal. But what will they do with it?
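As a rough check on the 3,000-fold figure, the arithmetic below counts meter reads per meter per month. It assumes 15-minute interval reads and a 30-day month; both are illustrative assumptions rather than figures from this report, and real AMI deployments vary in read interval and also send events and diagnostics that add further volume.

```python
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 15   # assumption: AMI interval-read period
DAYS_PER_MONTH = 30     # assumption: length of a billing month

def reads_per_month(interval_minutes: int, days: int = DAYS_PER_MONTH) -> int:
    """Number of interval reads a single meter produces in one month."""
    return (MINUTES_PER_DAY // interval_minutes) * days

manual_reads = 1                               # one manual read per month
ami_reads = reads_per_month(INTERVAL_MINUTES)  # 96 reads/day * 30 days
print(f"AMI reads per meter per month: {ami_reads}")
print(f"Increase over manual reads: {ami_reads // manual_reads}x")
```

Under these assumptions the script reports 2880 reads, roughly 3,000 times a single manual monthly read. Note that this counts readings, not bytes.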
- Will they use network instrumentation data to isolate root causes and eliminate unnecessary truck rolls?
- Will they combine metering data with SCADA data to detect energy theft?
- Will they use distribution data to predict asset failures and make repairs proactively?
- Will they use weather data to track storms and Geographic Information System (GIS) data to optimize crew dispatch?
- Will they use synchrophasor data to provide the situational awareness that was missing during the 2003 Northeast blackout?
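The energy theft question is often approached as an energy balance. Energy delivered into a feeder, as measured by SCADA, is compared with the sum of what the meters downstream of it record, and a gap much larger than expected technical losses is a lead worth investigating. The sketch below is a minimal illustration of that general idea, not EMC’s method; the `FeederInterval` type, the 5% technical-loss allowance and the 3% margin are hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field

@dataclass
class FeederInterval:
    """Hypothetical record: one feeder over one time interval."""
    feeder_id: str
    scada_kwh: float                                      # energy into the feeder (SCADA)
    meter_kwh: list[float] = field(default_factory=list)  # downstream meter reads (AMI)

def unaccounted_fraction(interval: FeederInterval) -> float:
    """Share of feeder energy that downstream meters do not account for."""
    if interval.scada_kwh <= 0:
        return 0.0
    metered = sum(interval.meter_kwh)
    return (interval.scada_kwh - metered) / interval.scada_kwh

def flag_suspect_feeders(intervals, technical_loss: float = 0.05,
                         margin: float = 0.03) -> list[str]:
    """Return feeders whose unaccounted energy exceeds expected losses plus a margin."""
    threshold = technical_loss + margin
    return [iv.feeder_id for iv in intervals if unaccounted_fraction(iv) > threshold]

intervals = [
    FeederInterval("F1", 1000.0, [300.0, 320.0, 330.0]),  # 5% unaccounted
    FeederInterval("F2", 1000.0, [250.0, 260.0, 270.0]),  # 22% unaccounted
]
print(flag_suspect_feeders(intervals))  # ['F2']
```

A real analysis would also need accurate meter-to-feeder connectivity (often from GIS) and time-aligned intervals; a flagged feeder is a candidate for field investigation, not proof of theft.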
EMC’s Data Scientist Team has been working with utility companies not only to define their most pressing questions but also to apply the latest mathematical and statistical methods to answering them.
Once the questions and methods have been defined, the next issue to resolve is the underlying technology.
According to Cowen and Company, “the vast majority of data growth is coming in the form of data sets that are not well suited for traditional relational database vendors like Oracle. To capitalize on the big data trend, a new breed of Big Data companies has emerged, leveraging commodity hardware, open source, and proprietary technology to capture and analyze these new data sets.”[4]
Most traditional relational databases were designed for Online Transaction Processing (OLTP) workloads. They were not designed for the large-scale, complex, multi-dimensional analytics that modern analytic data warehouses, such as EMC Greenplum, are built to handle. Greenplum’s fundamentally different architecture, with features like shared-nothing massively parallel processing (MPP), was designed from the ground up to support large-scale analytics that answer today’s most complex questions, such as those facing utilities on their smart grid journey.
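The core idea of a shared-nothing MPP design can be sketched in a few lines: rows are hash-distributed across segments by a distribution key, each segment aggregates only the rows it owns, and a coordinator merges the partial results. In this toy version, threads stand in for independent servers, and `NUM_SEGMENTS`, `segment_for` and the meter-reading rows are hypothetical; it illustrates the general technique, not Greenplum’s implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical segment count

def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash."""
    return crc32(key.encode("utf-8")) % num_segments

def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Hash-distribute (meter_id, kwh) rows so each segment owns a disjoint subset."""
    segments = [[] for _ in range(num_segments)]
    for meter_id, kwh in rows:
        segments[segment_for(meter_id, num_segments)].append((meter_id, kwh))
    return segments

def local_aggregate(segment_rows):
    """Runs on one segment: sums kWh per meter using only that segment's rows."""
    totals = defaultdict(float)
    for meter_id, kwh in segment_rows:
        totals[meter_id] += kwh
    return dict(totals)

def parallel_sum_by_meter(rows, num_segments: int = NUM_SEGMENTS):
    """Coordinator: scatter rows, aggregate on every segment in parallel, gather."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_aggregate, segments))
    # Rows were distributed by meter_id, so each meter lives on exactly one
    # segment and merging the partial results is a plain union of dictionaries.
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged

readings = [("m1", 1.5), ("m2", 2.0), ("m1", 0.5), ("m3", 4.0)]
print(sorted(parallel_sum_by_meter(readings).items()))
# [('m1', 2.0), ('m2', 2.0), ('m3', 4.0)]
```

`zlib.crc32` is used instead of Python’s built-in `hash` because string hashing is randomized per process, and a distribution function must send the same key to the same segment every time.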
[3] Pike Research. “Smart Grid Data Analytics.” Published 4Q 2010.
[4] Cowen and Company. “Big Data: A New Breed of Database Vendor Means Trouble For The Existing Order.” July 1, 2011.