Author Archive

David Holmes


CTO & Chief Industry Executive, Global Oil & Gas Program at EMC
As Chief Industry Executive for the Global Oil & Gas Program, David is responsible for developing EMC’s oil and gas upstream solutions and product positioning strategy in conjunction with the Brazil Research Center and Global CTO Organization. He works with partners and clients to identify oil and gas business needs and designs solution architectures to support these opportunities. David has served on a number of industry committees, including the European ECIM organization and SPE’s “Petabytes in Asset Management.” He has delivered numerous technical papers at conferences around the world and holds a patent for his work on the remote visualization of geotechnical applications.

Warning! Smart Big Data Analytics People in the Room

Recently, I attended the Society of Petroleum Engineers Forum event on Big Data Analytics in Dubai, UAE. Forum events are industry-led and have no sponsorship; they bring together 50 thought leaders from vendors, oilfield service companies and oil companies to look at the challenges and opportunities related to a particular topic. It’s pretty exhausting being in a room full of smart people for four days, and my brain definitely needed the weekend to cool down.

But over four days of workshops and discussions, a clear theme emerged: the lack of an integrated approach to big data analytics. Companies complained of a lack of joined-up thinking, and of business stakeholders investing in bespoke point solutions that only increased the complexity and challenges of delivering future solutions. It was pretty cool to be able to talk holistically about a range of solutions that address infrastructure, data integration, data quality, data analytics, data persistence, the role of the cloud and the third platform, as well as some top-notch PaaS and agile development smarts. EMC has all of these, available (as is our wont) either piece by piece or as a fully engineered solution wrapped up in the ribbon that is the EMC Federation Business Data Lake.

However, the implementation of an integrated big data analytics capability across the enterprise has consequences beyond those I had anticipated and at all levels of the business:


  1. Strategically – One attendee talked of his frustration at the lack of consistent adoption of big data analytics to support portfolio management. A comprehensive approach would allow companies to dynamically manage their portfolio of assets supporting the regular review of business strategy based on changing market conditions. Optimizing portfolio management has an ROI running into the hundreds of millions if not billions of dollars. One speaker talked passionately about how “bias is the mortal enemy of upstream performance.” Big data analytics should help remove bias and support rational decision making.
  2. Operationally – Many companies are introducing big data analytics tools to address particular workflows or challenges. While these solutions might address particular high-value problems, often they are not being implemented in a joined-up way. Almost all of the attendees supported having a centralized big data analytics function, with data engineers embedded in asset teams but operating as part of a central group on a common set of platforms.
  3. Tactically – There was quite a lot of talk about how big data analytics could be commoditized to support smaller opportunities. One example given was that you might save $40,000 a year by analyzing water purchasing contracts and linking this to your reservoir model. But that only makes sense if you can run a project to implement such a solution for less than $60,000 (assuming a 100% ROI over three years and an 18-month payback), as the quick calculation below shows. The only way you can support small projects is to have all of the infrastructure and resources in place already.
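To make the arithmetic concrete, here is a minimal sketch (in Python, using only the figures quoted above) of the affordability test such a project would need to pass:

```python
# Sanity check on the water-contract example: a $40K/year saving, with a
# project only viable if it returns 100% ROI over 3 years and pays back
# within 18 months.

annual_saving = 40_000          # $/year from the water-contract analytics
horizon_years = 3
payback_target_months = 18

# 100% ROI over 3 years: total benefit must be twice the project cost,
# so cost can be at most half of the 3-year benefit.
max_cost_roi = annual_saving * horizon_years / 2                # $60,000

# 18-month payback: cost can be at most 1.5 years of savings.
max_cost_payback = annual_saving * payback_target_months / 12   # $60,000

max_viable_cost = min(max_cost_roi, max_cost_payback)
print(f"Maximum viable project cost: ${max_viable_cost:,.0f}")
```

Both constraints land on the same $60K ceiling, which is why small opportunities only pencil out when the platform and people are already in place.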

Of course, a lot of talk at the event was around the oil price. But did this put people off looking at technology projects? Not really. As one person put it: “Oil companies of all sizes are facing an existential crisis; the company that is first to effectively leverage big data analytics across its enterprise will have a material competitive advantage over its competitors. Then everyone else will have to follow suit.”

And it’s cheaper too!

Maximize G&G Application Performance & Lower TCO Simultaneously

 

As oil and gas companies wrestle with delivering dramatic reductions in their operating and capital budgets, whilst maintaining a razor-sharp focus on safe and efficient operations, many people are asking how, or even if, IT can support companies in these challenging times. But whatever answer we might come up with, the starting point in today’s climate must be to deliver material reductions in cost. Oh, and you had better remember that at the last count the industry has 250,000 fewer employees, so make sure you figure out how to transform user productivity.

So how can we drive up user productivity and reduce costs simultaneously? Recently we have been doing a lot of engineering work on next generation geoscience systems. Interpretation and modelling applications tend to be heavily workstation-oriented where individual users are equipped with expensive self-contained computing resources. When data is required, a low-latency high-bandwidth transfer from network storage to the workstation is needed to achieve useful levels of performance and productivity.

However, as data velocity and volume increase, sustaining workstation-based applications becomes a real challenge. Individual workstations need ever-increasing levels of computing power, memory and storage to cope, and delivering the required high-bandwidth, low-latency I/O becomes eye-wateringly expensive.


This rather expensive exercise places IT under constant pressure to provision computing resources for the largest workload expected at any time, which means that workstations are often either over-specified compared to average workloads or under-specified, leading to user frustration and inefficient working practices. The overall result is that computing power, storage and memory cannot be correctly balanced against workloads, since the high-end resources are not always needed and cannot be shared.


There is also a penalty in workload throughput. We have observed cases where geoscientists need to wait as long as 30 minutes to load projects into their applications. This has a negative impact on team productivity and agility, particularly when seismic data forms a critical part of the workflow.

The core strategy to address these challenges is the centralization of computational resources (both CPU and GPU) inside the data center using Converged Infrastructure. Essentially, this approach takes the enormous amount of computational power that is out on workstations and relocates it back into the data center. There are a couple of key reasons why this is beneficial:

  1. Operational Efficiency – the workstation-oriented approach leaves much of the computing resource underutilized and requires an expensive IT support mechanism. Having a shared set of central resources enables better provisioning of appropriate resources to users with thin-client devices (see the rough sizing sketch after this list) – far more with less
  2. Efficient Network Utilization – by co-locating computational resources with the data, we remove the need to shift large volumes of data over networks to individual workstations, giving wider, easier access to a rich data set – again, far more with less
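To illustrate the provisioning effect, here is a minimal sketch with entirely hypothetical numbers (user count, core counts, concurrency and headroom are all assumptions, not measurements) showing why a shared central pool can be far smaller than the sum of per-workstation peaks:

```python
import math

# Hypothetical sizing inputs -- replace with your own measurements.
users = 50
peak_cores_per_user = 16   # what each dedicated workstation must be sized for
avg_cores_per_user = 3     # typical sustained demand per user
peak_concurrency = 0.30    # assumed fraction of users peaking simultaneously
headroom = 1.25            # safety margin on the shared pool

# Dedicated workstations must each cover their own peak.
dedicated_cores = users * peak_cores_per_user

# A shared pool only needs to cover average demand plus concurrent peaks.
pool_cores = math.ceil(
    users * (avg_cores_per_user
             + peak_concurrency * (peak_cores_per_user - avg_cores_per_user))
    * headroom
)

print(f"Dedicated workstations: {dedicated_cores} cores")   # 800
print(f"Shared central pool:    {pool_cores} cores")        # 432
```

Under these (illustrative) assumptions, the central pool needs roughly half the cores, while every user can still burst to the full peak when they need it.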


TCO studies have shown that EMC’s Petrotechnical Appliance – a Converged Infrastructure solution – can achieve cost savings in excess of 35% by migrating from distributed workstation computing to centralized computing delivered through VDI. Simultaneously, end users experience a more consistent delivery of computing resources matched to individual workload demands, without needing expensive workstation upgrades – so higher end-user productivity and lower costs!

 

The EMC Petrotechnical Appliance is based on the industry-leading Vblock® Converged Infrastructure from VCE. It is being increasingly adopted by oil & gas companies and is recommended by leading Oil Field Services companies as a key ingredient for optimizing geoscience operations, particularly in the current oil & gas economic climate.

HDFS Everywhere!

Challenges

Increasingly, oil and gas companies are looking to big data and analytics to provide a new approach to answering some of their hardest questions. One of the foundational components of this is the Hadoop Distributed File System (HDFS). HDFS is a unifying persistence layer for many of the big data and analytical tools on the market (Pivotal’s and other vendors’). Whilst many companies have looked to Hadoop clusters to provide both storage and compute, EMC has recognized a number of challenges associated with this approach, including:

  1. If storage sits inside a Hadoop cluster, there must be a (potentially time-consuming) ETL task to move data from where it sits into the cluster. And as soon as the ETL process is complete, the data is out of sync.
  2. In order to increase storage, it is also necessary to increase compute. This can create an imbalance between compute and storage capacity, further exacerbated by the need to buy Hadoop distribution licenses for each node.
  3. Because Hadoop HDFS is designed to run on cheap commodity hardware, it provides “eventual consistency” of data and ensures availability by maintaining three (or more) copies of all data. This leads to much greater raw storage requirements than traditional storage environments – less than 33% usable capacity (the sketch after this list puts numbers on it).
  4. All metadata requests to a Hadoop-HDFS cluster must be directed to a single NameNode. Although it is possible to configure a standby NameNode in Active/Passive mode, the failover process is weak and recovery is not straightforward.
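To put numbers on challenge 3, here is a minimal sketch comparing usable capacity under n-way replication with a parity-protected layout (the 16+2 stripe is a hypothetical example, not a statement about any specific Isilon configuration):

```python
def usable_fraction_replication(copies: int) -> float:
    """Usable fraction of raw capacity under n-way replication."""
    return 1 / copies

def usable_fraction_parity(data_stripes: int, parity_stripes: int) -> float:
    """Usable fraction under a data-plus-parity scheme (e.g. N+2)."""
    return data_stripes / (data_stripes + parity_stripes)

# Default HDFS triple replication: one usable byte per three raw bytes.
print(f"3x replication: {usable_fraction_replication(3):.0%} usable")

# Hypothetical 16+2 parity layout, for comparison.
print(f"16+2 parity:    {usable_fraction_parity(16, 2):.0%} usable")
```

The sub-33% figure quoted above falls straight out of the triple-copy design, while parity protection keeps the vast majority of raw capacity usable.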

Solutions

To address these challenges, EMC has developed three storage solutions (with a fourth coming soon):

  • EMC Isilon provides high-performance HDFS storage as an additional protocol. This means that any data copied to the Isilon cluster using CIFS or NFS can be made available through HDFS (see the sketch after this list). The storage is much more efficient, as data protection is achieved using Isilon’s built-in protection, so only one copy of each data file (plus parity) is created. In addition, each Isilon node runs as both a NameNode and a DataNode, so there is much higher performance and availability, with no single point of failure.
  • EMC Elastic Cloud Storage (ECS) provides a very scalable geo-distributed object store which fully supports HDFS. ECS is available either as an appliance (with low cost EMC commodity hardware) or as software (in a ‘bring your own tin’ model). ECS is highly compelling for companies looking to build vast geo-distributed object data stores and also for archiving workflows (especially for seismic acquisition data).
  • EMC ViPR Data Services (VDS) enables commodity and other vendor storage systems to be exposed using the HDFS protocol. So for storage systems that do not natively support HDFS, you can use VDS to layer on top of this storage and make the data available via HDFS.
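As a minimal sketch of the multiprotocol idea, the snippet below reads back, over HDFS, a file that was written to the same cluster over NFS or CIFS. It uses the open-source HdfsCLI (`hdfs`) Python package against a WebHDFS endpoint; the hostname, port, user and paths are all hypothetical, so check your own cluster’s HDFS configuration:

```python
from hdfs import InsecureClient

# Hypothetical cluster endpoint and service account.
client = InsecureClient('http://isilon.example.com:8082', user='geouser')

# List a directory that was populated over NFS or CIFS...
print(client.list('/projects/seismic'))

# ...and read one of its files back through the HDFS protocol -- no ETL.
with client.read('/projects/seismic/survey_headers.csv') as reader:
    print(reader.read(1024))
```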

Benefits

Using these technologies, EMC makes it very easy to deliver on an ‘HDFS Everywhere’ strategy, but what are the compelling reasons for doing this?

  1. By making the entire multi-vendor storage real estate available through HDFS, big data and analytical tools can be layered on top of the enterprise persistence layer, allowing in-place analytics without having to perform any ETL tasks (see the sketch after this list). This capability delivers cost reduction, reduced cycle times and increased productivity.
  2. As companies seek to deploy the new generation of cloud native applications, it is essential (particularly in oil and gas) to be able to have an integrated environment for old and new applications sitting on top of common persistence layers. This is an essential characteristic of contemporary IT systems as companies look to embrace Bi-modal IT strategies.
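To make the in-place idea concrete, here is a hedged sketch that points Spark directly at data exposed over HDFS by the storage layer, rather than copying it into a dedicated Hadoop cluster first. The cluster name, path and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-place-analytics").getOrCreate()

# Read the data where it already lives -- no copy, no ETL window.
df = spark.read.csv(
    "hdfs://isilon.example.com:8020/production/daily_rates.csv",
    header=True, inferSchema=True)

# A simple aggregation over the in-place data set.
df.groupBy("well_id").avg("oil_bbl_per_day").show()
```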

Summary

At EMC, we are increasingly hearing from oil and gas companies that, to achieve their efficiency targets and cost reductions, they need a concise roadmap for consolidating their legacy applications into an environment that also supports and embraces the next generation of mobile, big data analytical apps. HDFS Everywhere is one element of the strategy to achieve this.

For many oil companies, the ability to run big data analytics against all of their structured, semi-structured and unstructured data is compelling. Removing the need to carry out complex ETL tasks – and the analytical latency they inevitably introduce – enables new analytics use cases and gives legacy vendors an easy roadmap for migrating their applications to the 3rd Platform.

P.S. If you’d like to know more, swing by our booth (#2511) at SEG in New Orleans, 18–21 October 2015.