Archive for the ‘Downstream’ Category

Warning! Smart Big Data Analytics People in the Room

David Holmes

CTO & Chief Industry Executive, Global Oil & Gas Program at EMC

As Chief Industry Executive for the Global Oil & Gas Program, David is responsible for developing EMC’s Oil and Gas upstream solutions and product positioning strategy in conjunction with the Brazil Research Center and Global CTO Organization. He works with partners and clients to identify oil and gas business needs and designs solution architectures to support these opportunities. David has served on a number of industry committees including the European ECIM organization and SPE’s “Petabytes in Asset Management.” He has delivered numerous technical papers at conferences around the world and holds a patent for his work on the remote visualization of geotechnical applications.

Recently, I attended the Society of Petroleum Engineers Forum event on Big Data Analytics in Dubai, UAE. Forum events are industry-led and have no sponsorship; they bring together 50 thought leaders from vendors, oilfield service companies and oil companies to look at the challenges and opportunities related to a particular topic. It’s pretty exhausting being in a room full of smart people for four days, and my brain definitely needed the weekend to cool down.

But over four days of workshops and discussions, a clear theme emerged: the lack of an integrated approach to big data analytics. Companies complained of a lack of joined-up thinking and of business stakeholders investing in bespoke point solutions that only increased the complexity and challenges of delivering future solutions. It was pretty cool to be able to talk holistically about a range of solutions that address infrastructure, data integration, data quality, data analytics, data persistence, the role of the cloud and the third platform, as well as some top-notch PaaS and agile development smarts. EMC has all of these, available (as is our wont) either piece by piece or as a fully engineered solution wrapped up in the ribbon that is the EMC Federation Business Data Lake.

However, the implementation of an integrated big data analytics capability across the enterprise has consequences beyond those I had anticipated and at all levels of the business:


  1. Strategically – One attendee talked of his frustration at the lack of consistent adoption of big data analytics to support portfolio management. A comprehensive approach would allow companies to dynamically manage their portfolio of assets supporting the regular review of business strategy based on changing market conditions. Optimizing portfolio management has an ROI running into the hundreds of millions if not billions of dollars. One speaker talked passionately about how “bias is the mortal enemy of upstream performance.” Big data analytics should help remove bias and support rational decision making.
  2. Operationally – Many companies are introducing big data analytics tools to address particular workflows or challenges. While these solutions might address particular high-value problems, often they are not being implemented in a joined-up way. Almost all of the attendees supported having a centralized big data analytics function, with data engineers embedded in asset teams but operating as part of a central group on a common set of platforms.
  3. Tactically – There was quite a lot of talk about how big data analytics could be commoditized to support smaller opportunities. One example given was that you might save $40,000 a year by analyzing water purchasing contracts and linking this to your reservoir model. But that only makes sense if you can run a project to implement such a solution for less than $60K (assuming a 100% ROI over 3 years and an 18-month payback). The only way you can support small projects is to have all of the infrastructure and resources in place already.
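The water-contract example above is easy to sanity-check with a back-of-the-envelope calculation; the figures below are the hypothetical ones from the discussion, not real project data:

```python
# Hypothetical figures from the forum discussion, not real project data.
annual_saving = 40_000   # $ per year saved by analyzing water purchasing contracts
project_cost = 60_000    # $ one-off cost to implement the solution

payback_months = project_cost / annual_saving * 12
roi_3yr_pct = (annual_saving * 3 - project_cost) / project_cost * 100

print(f"Payback: {payback_months:.0f} months")    # 18 months
print(f"3-year ROI: {roi_3yr_pct:.0f}%")          # 100%
```

A $60K project saving $40K a year pays back in exactly 18 months and doubles its money over three years, which is the threshold quoted above.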

Of course a lot of talk at the event was around the oil price. But did this put people off looking at technology projects? Not really. As one person put it: “Oil companies of all sizes are facing an existential crisis; the company that is first to effectively leverage big data analytics across its enterprise will have a material competitive advantage over its competitors. Then everyone else will have to follow suit.”

HDFS Everywhere!

David Holmes


Challenges

Increasingly, oil and gas companies are looking to big data and analytics to provide a new approach to answering some of their hardest questions. One of the foundation components of this is the Hadoop Distributed File System (HDFS). HDFS is a unifying persistence layer for many of the big data and analytical tools on the market (from Pivotal and other vendors). Whilst many companies have looked to Hadoop clusters to provide both storage and compute, EMC has recognized that there are a number of challenges associated with this approach, including:

  1. If storage sits inside a Hadoop cluster, there must be a (potentially time-consuming) ETL task to move data from where it sits into the cluster. As soon as the ETL process is complete, the copy is out of sync with its source.
  2. In order to increase storage it is also necessary to increase compute. This can create an imbalance between compute and storage capacity, further exacerbated by the need to buy Hadoop distribution licenses for each node.
  3. Because HDFS is designed to run on cheap commodity hardware, it provides “eventual consistency” of data and ensures availability by maintaining three (or more) copies of all data. This leads to much greater raw storage requirements than traditional storage environments (under 33% usable capacity).
  4. All metadata requests to an HDFS cluster must be directed to a single NameNode. Although it is possible to configure a standby NameNode in Active/Passive mode, the failover process is weak and recovery is not straightforward.
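The capacity arithmetic behind point 3 is worth sketching out; the cluster size below is illustrative, and the 20% parity overhead is an example figure rather than any specific product’s number:

```python
# Illustrative capacity arithmetic; cluster size and parity overhead are examples.
raw_tb = 300      # raw cluster capacity in TB (example figure)
replicas = 3      # default HDFS replication factor

usable_tb = raw_tb / replicas
print(f"3-way replication: {usable_tb:.0f} TB usable of {raw_tb} TB raw "
      f"({usable_tb / raw_tb:.0%})")              # 100 TB usable (33%)

# A parity-based protection scheme with, say, 20% overhead keeps far more:
parity_overhead = 0.2
print(f"Parity protection: {1 / (1 + parity_overhead):.0%} usable")  # 83% usable
```

With three full copies of every block, a cluster can never offer more than a third of its raw capacity, which is why parity-protected alternatives are so much more space-efficient.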

Solutions

To address these challenges, EMC has developed three storage solutions (with a fourth coming soon):

  • EMC Isilon provides high-performance HDFS storage as an additional protocol. This means that any data copied to the Isilon cluster using CIFS or NFS can be made available through HDFS. The storage is much more efficient because data protection is achieved using Isilon’s built-in protection, so only one copy of each data file (plus parity) is created. In addition, each Isilon node runs as both a NameNode and a DataNode, so performance and availability are much higher and there is no single point of failure.
  • EMC Elastic Cloud Storage (ECS) provides a very scalable geo-distributed object store which fully supports HDFS. ECS is available either as an appliance (with low cost EMC commodity hardware) or as software (in a ‘bring your own tin’ model). ECS is highly compelling for companies looking to build vast geo-distributed object data stores and also for archiving workflows (especially for seismic acquisition data).
  • EMC ViPR Data Services (VDS) enables commodity and other vendor storage systems to be exposed using the HDFS protocol. So for storage systems that do not natively support HDFS, you can use VDS to layer on top of this storage and make the data available via HDFS.

Benefits

Using these technologies, EMC makes it very easy to deliver on an ‘HDFS Everywhere’ strategy, but what are the compelling reasons for doing this?

  1. By making the entire multi-vendor storage real estate available through HDFS, big data and analytical tools can be layered on top of the enterprise persistence layer allowing in-place analytics without having to perform any ETL tasks. This capability delivers cost reduction, reduced cycle times and increased productivity.
  2. As companies seek to deploy the new generation of cloud-native applications, it is essential (particularly in oil and gas) to have an integrated environment for old and new applications sitting on top of common persistence layers. This is a key characteristic of contemporary IT systems as companies look to embrace bimodal IT strategies.

Summary

At EMC we are increasingly hearing from oil and gas companies that to achieve their efficiency targets and cost reductions, they need a concise roadmap to enable them to consolidate their legacy applications with an environment that supports and embraces the next generation of mobile, big data analytical apps. HDFS Everywhere is one element of the strategy to achieve this.

For many oil companies, the ability to run big data analytics against all of their structured, semi-structured and unstructured data is compelling. Removing the need to carry out complex ETL tasks, and the analytical latency they inevitably introduce, enables new analytics use cases and gives legacy vendors an easy roadmap for migrating their applications to the 3rd Platform.

PS If you’d like to know more, swing by our booth #2511 at SEG in New Orleans (18–21 October 2015).