Archive for the ‘eDiscovery’ Category

HDFS Everywhere!

David Holmes

CTO & Chief Industry Executive, Global Oil & Gas Program at EMC
As Chief Industry Executive for the Global Oil & Gas Program, David is responsible for developing EMC’s Oil and Gas upstream solutions and product positioning strategy in conjunction with the Brazil Research Center and Global CTO Organization. He works with partners and clients to identify oil and gas business needs and designs solution architectures to support these opportunities. David has served on a number of industry committees including the European ECIM organization and SPE’s “Petabytes in Asset Management.” He has delivered numerous technical papers at conferences around the world and holds a patent for his work on the remote visualization of geotechnical applications.


Increasingly, oil and gas companies are looking to big data and analytics for a new approach to answering some of their hardest questions. One of the foundation components of this approach is the Hadoop Distributed File System (HDFS). HDFS is a unifying persistence layer for many of the big data and analytical tools on the market (both Pivotal’s and other vendors’). Whilst many companies have looked to Hadoop clusters to provide both storage and compute, EMC has recognized that there are a number of challenges associated with this approach, including:

  1. If storage sits inside a Hadoop cluster, there must be a (potentially time-consuming) ETL task to get data from where it sits into the cluster. As soon as the ETL process is complete, the data is out of sync.
  2. In order to increase storage, it is also necessary to increase compute. This can create an imbalance between compute and storage capacity, which can be further exacerbated by the need to buy Hadoop distribution licenses for each node.
  3. Because Hadoop HDFS is designed to run on cheap commodity hardware, it provides “eventual consistency” of data, and ensures availability by maintaining three (or more) copies of all data. This leads to much greater raw storage requirements than traditional storage environments (33% usable capacity or less).
  4. All metadata requests to a Hadoop HDFS cluster must be directed to a single NameNode. Although it is possible to configure a standby NameNode in Active/Passive mode, the failover process is weak and recovery is not straightforward.
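
To put the raw storage overhead in point 3 into numbers, here is a minimal Python sketch. It assumes simple n-way replication and ignores filesystem overhead, temporary space and compression. The function names and the 100 TB figure are purely illustrative and are not taken from any Hadoop tool.

```python
def usable_fraction(replication_factor: int) -> float:
    """Fraction of raw capacity left for unique data under n-way replication."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return 1.0 / replication_factor


def raw_capacity_needed(data_tb: float, replication_factor: int) -> float:
    """Raw terabytes required to hold data_tb of unique data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return data_tb * replication_factor


if __name__ == "__main__":
    # Illustrative assumption: 100 TB of unique data, kept as 3 or 4 copies.
    for copies in (3, 4):
        fraction = usable_fraction(copies)
        raw_tb = raw_capacity_needed(100.0, copies)
        print(f"{copies} copies: {fraction:.0%} usable, {raw_tb:.0f} TB raw")
```

With three copies, one third of the raw capacity holds unique data, and every extra copy pushes that figure lower.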


To address these challenges, EMC has developed three storage solutions (with a fourth coming soon):

  • EMC Isilon provides high performance HDFS storage as an additional protocol. This means that any data copied to the Isilon cluster using CIFS or NFS can be made available through HDFS. The storage is much more efficient because data protection is achieved using Isilon’s built-in protection, so only one copy of each data file (plus parity) is created. In addition, each Isilon node runs as both a NameNode and a DataNode, which gives much higher performance and availability and removes the single point of failure.
  • EMC Elastic Cloud Storage (ECS) provides a very scalable geo-distributed object store which fully supports HDFS. ECS is available either as an appliance (with low-cost EMC commodity hardware) or as software (in a ‘bring your own tin’ model). ECS is highly compelling for companies looking to build vast geo-distributed object data stores and also for archiving workflows (especially for seismic acquisition data).
  • EMC ViPR Data Services (VDS) enables commodity and other vendor storage systems to be exposed using the HDFS protocol. So for storage systems that do not natively support HDFS, you can use VDS to layer on top of this storage and make the data available via HDFS.
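
As a rough illustration of why parity-based protection needs less raw capacity than keeping full copies, the Python sketch below computes usable capacity for a stripe made of data units plus parity units. The 8+2 layout is a hypothetical example chosen for easy arithmetic, not a statement of how Isilon or ECS actually lays out data, and the function names are illustrative only.

```python
def parity_usable_fraction(data_units: int, parity_units: int) -> float:
    """Fraction of raw capacity holding unique data in a data+parity stripe."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need at least one data unit and no negative parity")
    return data_units / (data_units + parity_units)


def parity_raw_capacity_needed(data_tb: float, data_units: int,
                               parity_units: int) -> float:
    """Raw terabytes required to hold data_tb of unique data plus parity."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need at least one data unit and no negative parity")
    return data_tb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    # Hypothetical layout: 8 data units protected by 2 parity units.
    fraction = parity_usable_fraction(8, 2)
    raw_tb = parity_raw_capacity_needed(100.0, 8, 2)
    print(f"8+2 stripe: {fraction:.0%} usable, {raw_tb:.0f} TB raw for 100 TB")
```

Under these assumptions, 100 TB of data needs 125 TB of raw capacity, against 300 TB if three full copies were kept.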


Using these technologies, EMC makes it very easy to deliver on an ‘HDFS Everywhere’ strategy, but what are the compelling reasons for doing this?

  1. By making the entire multi-vendor storage real estate available through HDFS, big data and analytical tools can be layered on top of the enterprise persistence layer allowing in-place analytics without having to perform any ETL tasks. This capability delivers cost reduction, reduced cycle times and increased productivity.
  2. As companies seek to deploy the new generation of cloud native applications, it is essential (particularly in oil and gas) to be able to have an integrated environment for old and new applications sitting on top of common persistence layers. This is an essential characteristic of contemporary IT systems as companies look to embrace bimodal IT strategies.


At EMC we are increasingly hearing from oil and gas companies that to achieve their efficiency targets and cost reductions, they need a concise roadmap to enable them to consolidate their legacy applications with an environment that supports and embraces the next generation of mobile, big data analytical apps. HDFS Everywhere is one element of the strategy to achieve this.

For many oil companies, the ability to run big data analytics against all their structured, semi-structured and unstructured data is compelling. Removing the need for complex ETL tasks, and the analytical latency that inevitably comes with them, enables analytics use cases and gives legacy vendors an easy roadmap to start migrating their applications to the 3rd Platform.

PS: If you’d like to know more, swing by our booth #2511 at SEG in New Orleans (18-21 October 2015).


Spills, Meltdowns and Environmental Remediation: The ROI of Being Prepared when Litigation Strikes

Heidi Maher, Esq.

Principal, eDiscovery and Compliance Legal Team
Heidi Maher is an eDiscovery advisor in EMC’s Compliance & eDiscovery Practice where she leverages her legal experience along with EMC’s unique technology to help organizations address challenges related to e-discovery, compliance, and records management. By serving as a liaison between legal, IT, and other key business units, Ms. Maher helps organizations implement internal procedures and technology solutions that minimize the risk and expense associated with compliance and e-discovery. She has drafted discovery readiness plans for multinationals and advised corporations and law firms on privacy laws and best practices for international data transfers. Ms. Maher has conducted numerous CLE courses, webinars, workshops, and industry conference presentations, and has authored articles on e-discovery and compliance in publications such as Digital Discovery & Electronic Evidence. She has been a member of Working Groups 1 and 6 of The Sedona Conference, a well-known e-discovery think tank, and was a project leader for the Electronic Discovery Reference Model (EDRM). Ms. Maher is also a frequent contributor to EMC’s eDiscovery blog.
Prior to EMC, Ms. Maher gained extensive litigation and technology experience as a legal consultant with RenewData Corp. where she helped educate lawyers and litigation professionals in corporations and law firms about technology and its role in litigation. Prior to that, she was a felony prosecutor, Assistant State Attorney General, and an attorney in private practice working on complex multi-million-dollar class-action and mass-tort litigation at the largest law firm in Austin, Texas. Ms. Maher received her J.D. from Baylor School of Law and her B.S. from the University of Texas at Austin. She is licensed to practice law in the Eastern and Western District Courts of Texas as well as the Fifth Circuit Court of Appeals.
Recent Articles and Blogs:

  • Categorizing eDiscovery: A Practical Framework for Managing Your Information
  • Legal Hold Guidelines for every Legal Department
  • RULE 502: Friend or Foe?
  • Money, Greed, Bribery & Corruption: the Cost of International Business???
  • eDiscovery StraightTalk “Is Forensic Collection Mandatory for All Civil Litigation?” – Issue 7
  • Internal Investigations drive eDiscovery Activity
  • eDiscovery StraightTalk with Heidi Maher, Esq. – Issue 2
  • The X-Files: Issues Surrounding Exotic Forms of Electronically Stored Information. - Digital Discovery & e-Evidence, Bureau of National Affairs (co-author)
  • E-Discovery From Across the Pond: Data Transfers from the European Union to the U.S. - Digital Discovery & e-Evidence, Bureau of National Affairs (co-author)

IQPC’s well-organized and well-attended eDiscovery for Oil and Gas Seminar is September 26-27 in Houston. Thought leaders from Hess, Anadarko, BP, TransCanada and Valero, among others, will be on hand to speak to the unique and not-so-unique eDiscovery challenges facing the oil and gas sector. Recent record profits and catastrophic events have put the industry in the spotlight not just for lawsuits but also for government investigations and regulatory oversight.

Organizations already have difficulties responding to eDiscovery requests in a timely and cost-efficient manner. Because the industry is inherently global, it faces the added challenge of determining how best to bring information back from certain countries. Many foreign countries, especially those in the European Union, have blocking statutes and other privacy laws that prohibit the transfer of data to the United States. The topic I am speaking about is (more…)