Friday, July 13, 2012
Predictive Coding - Is It the Way of the Future for Discovery of Electronically Stored Information?
By Mike Palumbo
The May 21, 2012, edition of The National Law Journal featured several articles on recent technological and legal trends in electronic discovery. Coincidentally, the Monday, June 18, 2012, edition of the Wall Street Journal (Marketplace Section) also discussed similar topics. The information in these articles is important for lawyers and their clients, and their salient points are summarized here. The essential message is that, while searching electronically stored information (ESI) is a complex, costly task, the process is being made more efficient by "technology-assisted review" (TAR).
"Keyword" searching has been the
standard method of culling through massive amounts of electronically stored
data. In keyword searching, documents are loaded into a program and lawyers
input search terms to find relevant documents.
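In its simplest form, keyword culling is just a text match over the collection. A minimal sketch, using hypothetical document snippets and search terms (not drawn from any actual production):

```python
# Illustrative sketch of keyword culling over a small document collection.
# Document texts and search terms are invented for illustration.
documents = {
    "doc1.txt": "Quarterly severance figures for terminated employees",
    "doc2.txt": "Cafeteria menu for the week of May 21",
    "doc3.txt": "Email re: involuntary termination policy update",
}

search_terms = ["severance", "termination", "layoff"]

def keyword_hits(docs, terms):
    """Return the names of documents containing any search term."""
    hits = []
    for name, text in docs.items():
        lowered = text.lower()
        if any(term.lower() in lowered for term in terms):
            hits.append(name)
    return hits

print(keyword_hits(documents, search_terms))  # doc1.txt and doc3.txt match
```

The weakness is apparent even at this scale: a query for "termination" finds nothing in a relevant document that only says an employee was "let go," so results depend entirely on guessing the right terms.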
However, courts and commentators have long recognized the limitations of keyword searches as an efficient means of obtaining relevant documents from ESI. Although many courts have been skeptical of keyword searching, few alternatives have gained general judicial approval. So courts and litigants have been looking for an efficient computerized review regime that can gain general acceptance. Recently, the focus of that search has been on various forms of TAR, including "predictive coding," a software tool that uses algorithms to automatically tag documents.
A technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege. A human examines and codes only those documents the computer identifies - a tiny fraction of the entire collection. Using the results of this human review, the computer codes the remaining documents in the collection for responsiveness or privilege. A technology-assisted review process may involve, in whole or in part, one or more approaches including, but not limited to, keyword search, Boolean search, conceptual search, clustering, machine learning, relevance ranking, predictive coding and sampling.
Predictive coding has been analogized to a spam filter, separating wanted documents from unwanted documents. Predictive coding sidesteps the fact that a party does not know all the good keywords: instead, the party identifies documents containing clusters of keywords and metadata found in known relevant documents, and then uses those documents to search for more like them. Keywords have friends, and studies have shown that using documents as the means for finding other similar documents is far more accurate than guessing at keywords. In effect, this method avoids the traditional keyword dance that parties can spend weeks, months or longer fighting about.
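To make the spam-filter analogy concrete, here is a deliberately simplified, hypothetical sketch of the ranking idea: a reviewer codes a few seed documents as relevant, the program builds a vocabulary profile from those seeds, and uncoded documents are then ranked by how strongly they match that profile. Commercial predictive coding tools use far more sophisticated machine-learning models; the document texts and names below are invented for illustration.

```python
# Simplified sketch of predictive-coding-style ranking (hypothetical data).
# A human codes a small seed set; the rest of the collection is scored by
# how closely its vocabulary matches the relevant seeds.
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_profile(seed_docs):
    """Aggregate term counts from human-coded relevant documents."""
    profile = Counter()
    for doc in seed_docs:
        profile.update(tokenize(doc))
    return profile

def score(doc, profile):
    """Score a document by its shared vocabulary with the relevant profile."""
    words = set(tokenize(doc))
    return sum(profile[w] for w in words)

relevant_seeds = [
    "severance package for involuntary termination",
    "termination notice and severance terms",
]
uncoded = {
    "memo1": "proposed severance terms for the termination round",
    "memo2": "holiday party planning and catering budget",
}

profile = build_profile(relevant_seeds)
ranked = sorted(uncoded, key=lambda n: score(uncoded[n], profile), reverse=True)
print(ranked)  # memo1 ranks above memo2
```

Because the coded seed documents supply the vocabulary, no one has to guess search terms in advance; highly ranked documents can be fed back in as new seeds, which is the human-computer loop described above.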
As noted, predictive coding has been in the news recently. According to one blog writer, "The e-discovery community has been abuzz since late last week (April 26, 2012), when District Judge Carter issued a decision in da Silva Moore v. Publicis Groupe SA (S.D.N.Y. April 25, 2012), a Title VII class action gender discrimination case, with the potential of millions of documents, affirming Magistrate Judge Peck's order approving an ESI protocol that provides for the use of predictive coding." According to the writer, this decision is the first judicial opinion to endorse predictive coding as a defensible way to review massive amounts of ESI. The ESI community believes that it will encourage others to use the methodology and thereby replace the armies of contract lawyers now used to review ESI.[1] Devin Scanlon, a consultant with MODUS, a provider of advanced data analytics, notes, "Computer assisted and advanced analytics driven work flows are quickly gaining traction. Last year, everyone wanted to know the theory behind it. This year, everyone wants to use it."
In da Silva Moore, the parties were generally in agreement regarding the use of predictive coding. However, a ruling in the Virginia case Global Aerospace Inc. v. Landow Aviation resulted in an order allowing a party to proceed with predictive coding despite the objections of another party. Landow Aviation involved the collapse of the roofs of three jet hangars, which resulted in the destruction of 14 private jets. Realizing that litigation was inevitable, the owner of the hangars preserved about 8,000 gigabytes of ESI, which, according to the Wall Street Journal, was enough data to fill the hard drives of 8 new desktop computers. The more typical approach involves hiring temporary contract lawyers at significantly reduced hourly rates to do the initial review of the documents, at an anticipated cost of more than one dollar a document. Rejecting that approach in favor of TAR, which was expected to be more accurate and less costly (an estimated one-tenth the cost of manual review), the owner asked the court for permission to use predictive coding for the initial work. When agreement on the production methodology could not be reached, Landow's lawyers filed a motion to allow the firm to use predictive coding to cull the collection.
On Monday, April 23, 2012, Judge James H. Chamblin of the 20th Judicial Circuit of Virginia's Loudoun Circuit Court entered a protective order allowing the defendants, over objection, to use predictive coding as their selected method for processing and producing documents from the collection, which exceeds 2 million documents. Judge Chamblin expressly reserved the right of any receiving party to challenge the continued use of predictive coding should the production prove to be inaccurate or incomplete.
Another case in which the parties dispute the appropriate method of culling through e-discovery is Kleen Products, LLC v. Packaging Corporation of America, No. 10 C 5711 (N.D. Ill. Feb. 21, 2012). The parties' briefs reveal that the Kleen plaintiffs want predictive coding to be used because they view Boolean searches as inaccurate and outmoded. The defendants, on the other hand, had already invested heavily in the use of Boolean search terms. Plaintiffs asked the court to order defendants to redo their previous productions, and conduct all future productions, using the alternative technology.
One blog discussing Kleen Products noted: "The defendants cited a publication by The Sedona Conference from 2007, "Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery," saying "by far the most commonly used search methodology today is the use of 'keyword searches.'" While the defendants accurately claimed that the type of protocol they used has been held mandatory in some cases, those cases predate the arrival of computer-assisted review technology. The defendants also made the assertion - correct at the time - that no court had spoken on the issue of computer-assisted review." (Applied Discovery, 3/8/12) This was in February, before the da Silva Moore opinion. The plaintiffs replied by analogizing the defendants' election of keyword searching to "choosing a horse as a mode of transportation . . . because it is the best available horse, even though technology has evolved and a superior form of transportation - the automobile - is now available." (Id.)
The judge in Kleen Products conducted two hearings on the issue. However, following two full days of expert witness testimony regarding the adequacy of the initial productions, the court asked the parties to try to reach a compromise on the "Boolean" keyword approach. The judge apparently reasoned that having the parties work out a mutually agreeable approach based on what defendants had already implemented was preferable to scheduling yet another full day of expert testimony. The judge did note that, according to Section 6 of the Sedona Conference's Best Practices for E-Discovery, the responding party is the best judge of the procedures to be used for producing its own documents. In other words, the judge was saying that, absent some extreme situation, the opposing party could not dictate the method used for production.
Another blogger summarized the lessons to be learned from the Kleen Products case as follows: "Kleen Products illustrates that keyword search is not dead. Instead, keyword search should be viewed as one of many tools in the Litigator's Toolbelt™ that can be used with other tools such as email threading, advanced filtering technology, and even predictive coding tools. Finally, litigators should take note that regardless of the tools they select, they must be prepared to defend their process and use of those tools or risk the scrutiny of judges and opposing parties." (e-discovery 2.0, 6/5/12)
To conclude, e-discovery continues to evolve, and lawyers and clients are urged to keep up with current developments.
___________________________________________________
[1] For a description of the details of the predictive coding process, see the July 1, 2012, article by Ralph Losey, author of the well-known and respected blog The e-discovery Team (http://e-discoveryteam.com), where he sets out how he conducted a search of approximately 700,000 emails and attachments that had been gathered during the Enron litigation. Mr. Losey searched for documents relating to involuntary termination of Enron employees using Kroll Ontrack's Inview software, which he has described as having one of the industry's strongest predictive coding capabilities. In da Silva Moore, the software used was Recommind's Axcelerate suite. Other predictive coding software includes kCura's Relativity (via Content Analyst) and Lateral Data (via Quantum).