Friday, July 13, 2012
Predictive Coding - Is It the Way of the Future for Discovery of Electronically Stored Information?
By Mike Palumbo
The May 21, 2012, edition of The National Law Journal featured several articles on recent technological and legal trends in electronic discovery. Coincidentally, the Monday, June 18, 2012, edition of the Wall Street Journal (Marketplace section) also discussed similar topics. The information in these articles is important for lawyers and their clients, and their salient points are summarized here. The essential message is that, while searching electronically stored information (ESI) is a complex, costly task, the process is being made more efficient by "technology-assisted review" (TAR).
"Keyword" searching has been the standard method of culling through massive amounts of electronically stored data. In keyword searching, documents are loaded into a program and lawyers input search terms to find relevant documents. However, courts and commentators have long recognized the limitations of keyword searches as an efficient means of obtaining relevant documents from ESI. Although many courts have been skeptical of keyword searching, few alternatives have won their general approval. Courts and litigants have therefore been looking for an efficient computerized review regime that can gain general acceptance. Recently, the focus of that search has been on various forms of TAR, including "predictive coding," a software tool that uses algorithms to automatically tag documents.
A technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege. A human examines and codes only those documents the computer identifies - a tiny fraction of the entire collection. Using the results of this human review, the computer codes the remaining documents in the collection for responsiveness or privilege. A technology-assisted review process may involve, in whole or in part, the use of one or more approaches including, but not limited to, keyword search, Boolean search, conceptual search, clustering, machine learning, relevance ranking, predictive coding and sampling.
Predictive coding has been analogized to a spam filter that separates wanted documents from unwanted ones. It sidesteps the problem that a party rarely knows all the good keywords in advance. Instead, the party starts with documents known to be relevant, and the software looks for other documents that share the same buckets of keywords and metadata. Keywords have friends, and studies have shown that using documents as the means for finding other similar documents is far more accurate than guessing at keywords. In effect, this method avoids the traditional keyword dance that parties can spend weeks, months or longer fighting about.
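To make the spam-filter analogy concrete, the following is a minimal sketch of the idea, not a description of how any commercial predictive coding product works. It assumes a simple naive Bayes word-frequency model of the kind commonly used in spam filters. The sample documents, the labels "responsive" and "not_responsive", and the function names are all hypothetical and exist only for illustration. A human codes a small seed set, and the program scores the unreviewed documents by how closely their vocabulary matches each coded group.

```python
import math
import re
from collections import Counter

# Hypothetical seed set: a handful of documents a human reviewer has coded.
SEED_SET = [
    ("Severance terms for the employees we are terminating next week", "responsive"),
    ("Layoff list attached; HR will notify terminated employees Friday", "responsive"),
    ("Lunch menu for the cafeteria next week", "not_responsive"),
    ("Reminder: parking garage closed Friday for cleaning", "not_responsive"),
]

# Hypothetical unreviewed documents the computer will code.
UNREVIEWED = [
    "HR needs the final severance numbers for terminated employees",
    "Cafeteria will be closed for cleaning",
]


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(seed_set):
    """Count words and documents per label from human-coded (text, label) pairs."""
    word_counts = {"responsive": Counter(), "not_responsive": Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return word_counts, doc_counts


def responsiveness_score(text, word_counts, doc_counts):
    """Return naive Bayes log-odds; a positive score leans 'responsive'.

    Assumes the seed set contains at least one document of each label.
    """
    vocab = set(word_counts["responsive"]) | set(word_counts["not_responsive"])
    totals = {label: sum(counts.values()) for label, counts in word_counts.items()}
    score = math.log(doc_counts["responsive"] / doc_counts["not_responsive"])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no evidence
        for label, sign in (("responsive", 1), ("not_responsive", -1)):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            p = (word_counts[label][word] + 1) / (totals[label] + len(vocab))
            score += sign * math.log(p)
    return score


def rank_unreviewed(seed_set, documents):
    """Score every unreviewed document and sort the most likely responsive first."""
    word_counts, doc_counts = train(seed_set)
    scored = [(responsiveness_score(d, word_counts, doc_counts), d) for d in documents]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, doc in rank_unreviewed(SEED_SET, UNREVIEWED):
        tag = "responsive" if score > 0 else "not_responsive"
        print(f"{score:+.2f}  {tag:15s} {doc}")
```

In this toy example the severance email scores above zero and the cafeteria notice scores below it, even though no one typed "severance" as a search term. The coded documents supplied the vocabulary, which is the sense in which documents, rather than guessed keywords, drive the search.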
As noted, predictive coding has been in the news recently. According to one blog writer, "The e-discovery community has been abuzz since late last week (April 26, 2012), when District Judge Carter issued a decision in da Silva Moore v. Publicis Groupe SA, (S.D.N.Y. April 25, 2012), a Title VII class action gender discrimination case, with the potential of millions of documents, affirming Magistrate Judge Peck's order approving an ESI protocol that provides for the use of predictive coding." According to the writer, this decision is the first judicial opinion to endorse predictive coding as a defensible way to review massive amounts of ESI. The e-discovery community believes that the decision will encourage others to use the methodology and thereby replace the armies of contract lawyers now used to review ESI. Devin Scanlon, a consultant with MODUS, a provider of advanced data analytics, notes, "Computer assisted and advanced analytics driven work flows are quickly gaining traction. Last year, everyone wanted to know the theory behind it. This year, everyone wants to use it."
In da Silva Moore, the parties were generally in agreement regarding the use of predictive coding. By contrast, in the Virginia case Global Aerospace Inc. v. Landow Aviation, the court allowed a party to proceed with predictive coding despite the objections of another party. Landow Aviation involved the collapse of the roofs of three jet hangars, which resulted in the destruction of 14 private jets. Realizing that litigation was inevitable, the owner of the hangars preserved about 8,000 gigabytes of ESI, which, according to the Wall Street Journal, was enough data to fill the hard drives of eight new desktop computers. The more typical approach is to hire temporary contract lawyers at significantly reduced hourly rates to do the initial review of the documents, at an anticipated cost of more than one dollar a document. The owner rejected that approach in favor of TAR, which was expected to be more accurate and less costly (an estimated one-tenth of the cost of manual review), and asked the court to allow it to use predictive coding for the initial work. When agreement on the production methodology could not be reached, Landow's lawyers filed a motion to allow the firm to use predictive coding to cull the collection.
On Monday, April 23, 2012, Judge James H. Chamblin of the 20th Judicial Circuit of Virginia's Loudoun Circuit Court entered a protective order allowing the defendants, over objection, to use predictive coding as their selected method for processing and producing documents from a collection that exceeds 2 million documents. Judge Chamblin expressly reserved the right of any receiving party to challenge the continued use of predictive coding should the production prove to be inaccurate or incomplete.
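The cost figures reported in Landow Aviation can be put side by side with a little arithmetic. The sketch below is only a rough illustration: it treats the reported "more than one dollar a document" and "exceeds 2 million documents" as exactly $1.00 and 2,000,000, so the results are floors rather than actual case costs, and the function and parameter names are hypothetical.

```python
def review_costs(num_documents, manual_cost_per_doc=1.00, tar_cost_divisor=10):
    """Compare manual review cost with a TAR estimate.

    Assumptions (lower bounds taken from the reported figures):
      - manual review costs at least $1.00 per document
      - TAR costs roughly one-tenth of manual review
    """
    manual = num_documents * manual_cost_per_doc
    tar = manual / tar_cost_divisor
    return manual, tar, manual - tar


if __name__ == "__main__":
    manual, tar, savings = review_costs(2_000_000)
    print(f"Manual review: ${manual:,.0f}")
    print(f"TAR estimate:  ${tar:,.0f}")
    print(f"Difference:    ${savings:,.0f}")
```

On these assumptions, manual review of the collection would cost at least $2,000,000 against roughly $200,000 for TAR, which helps explain why the owner was willing to litigate the choice of method.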
Another case in which the parties dispute the appropriate method of culling through e-discovery is Kleen Products, LLC v. Packaging Corporation of America, No. 10 C 5711 (N.D. Ill. Feb. 21, 2012). The parties' briefs reveal that the Kleen plaintiffs want predictive coding to be used because they view Boolean searches as inaccurate and outmoded. The defendants, on the other hand, had already invested heavily in the use of Boolean search terms. Plaintiffs asked the court to order defendants to redo their previous productions, and to conduct all future productions, using alternative technology.
One blog discussing Kleen Products noted: "The defendants cited a publication by The Sedona Conference from 2007, "Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery," saying "by far the most commonly used search methodology today is the use of 'keyword searches.'" While the defendants accurately claimed that the type of protocol they used has been held mandatory in some cases, those cases predate the arrival of computer-assisted review technology. The defendants also made the assertion - correct at the time - that no court had spoken on the issue of computer-assisted review." (Applied Discovery, 3/8/12) This was in February, before the da Silva Moore opinion. The plaintiffs replied by analogizing the defendants' election of keyword searching to "choosing a horse as a mode of transportation . . . because it is the best available horse, even though technology has evolved and a superior form of transportation - the automobile - is now available." (Id.)
The judge in Kleen Products conducted two hearings on the issue. However, following two full days of expert witness testimony regarding the adequacy of the initial productions, the court asked the parties to try to reach a compromise on the "Boolean" keyword approach. The judge apparently reasoned that having the parties work out a mutually agreeable approach based on what defendants had already implemented was preferable to scheduling yet another full day of expert testimony. The judge did note that, according to Section 6 of the Sedona Conference's Best Practices for E-Discovery, the responding party is the best judge of the procedures to be used for producing its own documents. In other words, the judge was saying that, absent some extreme situation, the opposing party could not dictate the method used for production.
Another blogger summarized the lessons to be learned from the Kleen Products case as follows: "Kleen Products illustrates that keyword search is not dead. Instead, keyword search should be viewed as one of many tools in the Litigator's Toolbelt™ that can be used with other tools such as email threading, advanced filtering technology, and even predictive coding tools. Finally, litigators should take note that regardless of the tools they select, they must be prepared to defend their process and use of those tools or risk the scrutiny of judges and opposing parties." (e-discovery 2.0, 6/5/12)
To conclude, e-discovery continues to evolve. Lawyers and clients are urged to keep up with the current developments.
___________________________________________________
For a description of the details of the predictive coding process, see the July 1, 2012, article by Ralph Losey, author of the well-known and respected blog The e-discovery Team (http://e-discoveryteam.com), where he sets out how he conducted a search of approximately 700,000 emails and attachments that had been gathered during the Enron litigation. Mr. Losey searched for documents relating to involuntary termination of Enron employees using Kroll Ontrack's Inview software, which he has described as having one of the industry's strongest predictive coding abilities. In da Silva Moore, the software used was Recommind's Axcelerate suite. Other predictive coding software includes kCura's Relativity (via Content Analyst) and Lateral Data (via Quantum).