HDFS Everywhere!

David Holmes

CTO & Chief Industry Executive, Global Oil & Gas Program at EMC
As Chief Industry Executive for the Global Oil & Gas Program, David is responsible for developing EMC’s Oil and Gas upstream solutions and product positioning strategy in conjunction with the Brazil Research Center and Global CTO Organization. He works with partners and clients to identify oil and gas business needs and designs solution architectures to support these opportunities. David has served on a number of industry committees including the European ECIM organization and SPE’s “Petabytes in Asset Management.” He has delivered numerous technical papers at conferences around the world and holds a patent for his work on the remote visualization of geotechnical applications.

Challenges

Increasingly, oil and gas companies are looking to big data and analytics to provide a new approach to answering some of their hardest questions. One of the foundation components of this approach is the Hadoop Distributed File System (HDFS). HDFS is a unifying persistence layer for many of the big data and analytical tools on the market (from Pivotal and other vendors). Whilst many companies have looked to Hadoop clusters to provide both storage and compute, EMC has recognized that there are a number of challenges associated with this approach, including:

  1. If storage sits inside a Hadoop cluster, there must be a (potentially time-consuming) ETL task to get data from where it sits into the cluster. As soon as the ETL process is complete, the copy in the cluster starts to fall out of sync with the source.
  2. In order to increase storage it is also necessary to increase compute. This can create an imbalance between compute and storage capacity, which can be further exacerbated by the need to buy Hadoop distribution licenses for each node.
  3. Because Hadoop HDFS is designed to run on cheap commodity hardware, it provides “eventual consistency” of data, and ensures availability by maintaining three (or more) copies of all data. This leads to much greater raw storage requirements than traditional storage environments (at most about 33% of raw capacity is usable).
  4. All metadata requests to a Hadoop-HDFS cluster must be directed to a single NameNode. Although it is possible to configure a standby NameNode in Active/Passive mode, the failover process is weak and recovery is not straightforward.
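
Challenges 2 and 3 interact, and the arithmetic can be sketched roughly as follows. This is a minimal illustration that assumes three-way replication; the 1,000 TB target and the 48 TB of raw disk per node are hypothetical figures chosen for the example, not taken from any EMC or Hadoop sizing guide.

```python
import math


def usable_fraction(replicas: int) -> float:
    """Share of raw capacity that holds unique data when every block is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 / replicas


def nodes_required(usable_tb: float, raw_tb_per_node: float, replicas: int = 3) -> int:
    """Number of nodes needed to store `usable_tb` of unique data under replication."""
    raw_tb = usable_tb * replicas  # every block is written `replicas` times
    return math.ceil(raw_tb / raw_tb_per_node)


if __name__ == "__main__":
    target_tb = 1000.0    # hypothetical amount of unique data to store
    per_node_tb = 48.0    # hypothetical raw disk capacity per node

    nodes = nodes_required(target_tb, per_node_tb)
    print(f"Usable fraction with 3 copies: {usable_fraction(3):.0%}")
    print(f"Nodes required for {target_tb:.0f} TB: {nodes}")
```

Because storage and compute scale together in this model, every node counted here also brings CPU and, where applicable, a distribution license, whether or not the extra compute is needed.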

Solutions

To address these challenges, EMC has developed three storage solutions (with a fourth coming soon):

  • EMC Isilon provides high performance HDFS storage as an additional protocol. This means that any data copied to the Isilon cluster using CIFS or NFS can be made available through HDFS. The storage is much more efficient because data protection is achieved using Isilon’s built-in protection, so only one copy of each data file (plus parity) is created. In addition, each Isilon node runs as both a NameNode and a DataNode, which gives much higher performance and availability with no single point of failure.
  • EMC Elastic Cloud Storage (ECS) provides a very scalable geo-distributed object store which fully supports HDFS. ECS is available either as an appliance (with low cost EMC commodity hardware) or as software (in a ‘bring your own tin’ model). ECS is highly compelling for companies looking to build vast geo-distributed object data stores and also for archiving workflows (especially for seismic acquisition data).
  • EMC ViPR Data Services (VDS) enables commodity and other vendor storage systems to be exposed using the HDFS protocol. So for storage systems that do not natively support HDFS, you can use VDS to layer on top of this storage and make the data available via HDFS.
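
The efficiency difference between keeping full copies and keeping one copy plus parity can be illustrated with some simple arithmetic. This is a minimal sketch that assumes a generic "N data units + M parity units" layout; the 8+2 figure is a hypothetical example and is not a statement of how Isilon or ECS is actually configured.

```python
def parity_usable_fraction(data_units: int, parity_units: int) -> float:
    """Usable share of raw capacity for a generic N data + M parity layout."""
    if data_units <= 0 or parity_units < 0:
        raise ValueError("data_units must be positive and parity_units non-negative")
    return data_units / (data_units + parity_units)


def replication_usable_fraction(copies: int) -> float:
    """Usable share of raw capacity when every file is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 / copies


def raw_tb_needed(usable_tb: float, usable_fraction: float) -> float:
    """Raw capacity required to hold `usable_tb` of unique data."""
    return usable_tb / usable_fraction


if __name__ == "__main__":
    unique_data_tb = 1000.0  # hypothetical amount of unique data

    parity = parity_usable_fraction(8, 2)        # hypothetical 8+2 layout
    replicated = replication_usable_fraction(3)  # three full copies

    print(f"8+2 parity:  {parity:.0%} usable, "
          f"{raw_tb_needed(unique_data_tb, parity):.0f} TB raw")
    print(f"3x replicas: {replicated:.0%} usable, "
          f"{raw_tb_needed(unique_data_tb, replicated):.0f} TB raw")
```

The actual ratio depends on the protection level chosen, so the numbers above should be read as an order-of-magnitude comparison only.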

Benefits

Using these technologies, EMC makes it very easy to deliver on an ‘HDFS Everywhere’ strategy, but what are the compelling reasons for doing this?

  1. By making the entire multi-vendor storage real estate available through HDFS, big data and analytical tools can be layered on top of the enterprise persistence layer allowing in-place analytics without having to perform any ETL tasks. This capability delivers cost reduction, reduced cycle times and increased productivity.
  2. As companies seek to deploy the new generation of cloud native applications, it is essential (particularly in oil and gas) to have an integrated environment in which old and new applications sit on top of common persistence layers. This is a key characteristic of contemporary IT systems as companies look to embrace bimodal IT strategies.

Summary

At EMC we are increasingly hearing from oil and gas companies that to achieve their efficiency targets and cost reductions, they need a concise roadmap to enable them to consolidate their legacy applications with an environment that supports and embraces the next generation of mobile, big data analytical apps. HDFS Everywhere is one element of the strategy to achieve this.

For many oil companies, the ability to run big data analytics against all their structured, semi-structured and unstructured data is compelling. Removing the need for complex ETL tasks, and the analytical latency they inevitably introduce, enables analytics use cases and gives legacy vendors an easy roadmap to start migrating their applications to the 3rd Platform.

PS: If you’d like to know more, swing by our booth #2511 at SEG in New Orleans (18-21 October 2015).

 

Welcome to EMC Energy!

Tim Voyt

Mr. Timothy Voyt has more than 20 years of experience in providing technology and services to the Energy industry, both domestically and internationally. Mr. Voyt joined EMC in April of 2005 as the Oil and Gas Director and has global responsibility for all aspects of EMC’s Oil and Gas Vertical Program. Prior to joining EMC, Mr. Voyt served as Executive Vice President of Operations for Tobin International, where he had operational accountability for all of Tobin's Data Products, Software Products, Services, and International Operating Divisions. Mr. Voyt also spent more than 9 years with Landmark Graphics building and managing Landmark’s pre- and post-sales services businesses within North America, Europe, Africa, and Russia.

Welcome to the EMC Energy Blog.

Energy: when has it not been a hot topic? From the early days of computer-aided exploration systems to recent advances in seismic acquisition and smart grid technologies, the world of energy continues on what seems like a never-ending boom.

I have been fortunate to be immersed in these technologies affecting the highly-competitive energy industry and would like to offer you a look into this world.

Contributing to this blog will be the many subject matter experts who reside within EMC as well as their peers, colleagues, and thought leaders from around the Energy industry. So, without further ado, let me begin by (more…)