
Predictive Coding Is Becoming The Best Practices Standard for Reviewing Relevant ESI


on Sun, 07/29/2012 - 18:49

e-discovery, predictive coding
Two US District Court judges are blazing the path that will very likely set the best practices standard for the entire legal community in the area of e-discovery production.  

Judge Andrew Peck is most certainly the Lewis and Clark of e-discovery.  His recent opinion in Monique Da Silva Moore et al. v. Publicis Groupe & MSL Group, et al., No. 11 Civ. 1279 (ALC) (AJP) (S.D. New York, June 15, 2012) states for the first time that "linear manual review is simply too expensive where, as here, there are over three million emails to review. Moreover, while some lawyers still consider manual review to be the 'gold standard,' that is a myth, as statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review."

Judge Peck based this conclusion on two very significant, detailed and empirical studies, which one can read in their original publication form.  The details, however, are not the subject of this post.

In National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, No. 10 Civ. 3488 (SAS) (S.D. New York, July 13, 2012), Judge Scheindlin takes control.

For the purposes of this post, it is important to review National Day in more detail.  In National Day, several US government agencies including the FBI, ICE, and the DHS thwarted the plaintiffs’ attempts to obtain relevant discovery information.

After the defendants failed to comply with their obligations, Judge Scheindlin ordered them to produce the records on a new “drop dead date”.  The defendants then conducted an extensive keyword search, using hundreds of employees who expended thousands of hours, which resulted in the production of tens of thousands of responsive records.  Yet the plaintiffs argued that the searches were not reasonably designed to uncover all responsive records and thus were inadequate.

Judge Scheindlin agreed.  She pointed out that keyword searching alone is almost never enough to find all relevant documents and suggested that "predictive coding" is a much more accurate search process.

"Simple keyword searching is often not enough: 'Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.' There is increasingly strong evidence that '[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.' As Judge Andrew Peck -- one of this Court's experts in e-discovery -- recently put it: 'In too many cases, however, the way lawyers choose keywords is the equivalent of the child's game of 'Go Fish' ... keyword searches usually are not very effective...

Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.” National Day at [Google Scholar Citation Currently Unavailable]

Judge Scheindlin went further in describing the newer methodology of predictive coding:

"There are emerging best practices for dealing with these [keyword search] shortcomings and they are explained in detail elsewhere. And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.

Through iterative learning, these methods (known as 'computer-assisted' or 'predictive' coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches. In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies' unsupported assertions that their lay custodians have designed and conducted a reasonable search."  [Google Scholar Citation Currently Unavailable]

Judge Scheindlin's reasoning is persuasive, even though some have discounted predictive coding because it relies on computer-generated search algorithms.  What is emerging, however, is that predictive coding picks up what we mere mortals miss.

Following Judge Peck and Judge Scheindlin, many judges have accepted predictive coding technology as a best practice standard, citing the consistently high accuracy rates mentioned above as well as its iterative training and sampling processes.

Predictive Coding History

This shift in discovery practice is significant because it was not that long ago that managing discovery centered around managing, storing and reviewing physical documents one by one.  The discovery process then evolved to managing the physical documents but storing them electronically through rudimentary scanning technology.  It continued its evolution with Optical Character Recognition (OCR) scanning technology, which allowed for the keyword searches we have become so familiar with in the post-Google era.

Early OCR technology ran into two problems: its limited recognition capability and the quality of the underlying scanned document.  The evolution continued as documents began to originate electronically rather than physically.  Documents prepared electronically were inherently of high quality, and at the same time OCR scanning technology for physical documents was improving.

At that time, and some may argue even today, keyword searching was the most effective way to locate a document.  However, the larger the number of documents, the more false positives were found.  Keyword searching also demands precision, which means relevant documents may be missed because of simple misspellings or misinterpreted meanings.
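To make the two keyword problems concrete, here is a minimal Python sketch. The four documents, their relevance labels and the function names are all invented for illustration; they do not come from either opinion. The sketch runs a plain keyword search and then measures how many hits were actually relevant (precision) and how many relevant documents were found (recall).

```python
# Hypothetical mini-corpus: (text, is_relevant) pairs invented for illustration.
DOCS = [
    ("Please review the merger agreement before Friday", True),
    ("The merjer paperwork is attached", True),  # misspelling of "merger"
    ("Lunch plans: the new place near Merger Street", False),
    ("Quarterly parking pass renewal", False),
]


def keyword_search(docs, keyword):
    """Return indices of documents whose text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [i for i, (text, _) in enumerate(docs) if kw in text.lower()]


def precision_recall(docs, hits):
    """Precision: share of hits that are relevant. Recall: share of relevant docs that were hit."""
    relevant = {i for i, (_, rel) in enumerate(docs) if rel}
    hit_set = set(hits)
    true_pos = len(hit_set & relevant)
    precision = true_pos / len(hit_set) if hit_set else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall


hits = keyword_search(DOCS, "merger")
precision, recall = precision_recall(DOCS, hits)
print(hits, precision, recall)  # [0, 2] 0.5 0.5
```

In this toy set the search returns the street-address email (a false positive) and misses the misspelled "merjer" email (a missed relevant document), so both precision and recall come out at one half.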

Technology was starting to allow for documents to be "tagged."   Tagging documents became increasingly important because tagging allowed documents to be organized if they were not already organized.  Tagging helped to mitigate the effect of locating false positives.  Corporate and litigation departments started tagging documents based on content, relevancy, keyword meaning, type, etc.  However, tagging is extremely time intensive and requires significant human input and thus is expensive.  

Now enters predictive coding.  For the uninitiated, the term may sound quite foreign.  It may be helpful to rephrase the term "predictive coding" as "predictive tagging."  Judge Peck and Judge Scheindlin allude to predictive coding but never quite define it.

Predictive coding (or predictive tagging) is software technology that automatically predicts how documents should be classified (or tagged) based on a limited, but significant, level of human input.  The idea is similar to creating exemplars or templates for forms, pleadings or contracts.
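For readers who want to see the idea in miniature, the following Python sketch trains a tiny naive Bayes text classifier on a handful of human-tagged "seed" documents and then predicts tags for documents no one has reviewed. This is only an illustration under stated assumptions: the class name, the seed documents and the tags are hypothetical, and naive Bayes is just one simple statistical model chosen here for brevity. Commercial predictive coding tools use their own, far more sophisticated methods.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Illustrative multinomial naive Bayes with add-one smoothing (hypothetical helper)."""

    def __init__(self):
        self.word_counts = {}        # tag -> Counter of word frequencies
        self.doc_counts = Counter()  # tag -> number of seed documents
        self.vocab = set()

    def train(self, tagged_docs):
        """Learn word frequencies from (text, tag) pairs tagged by a human reviewer."""
        for text, tag in tagged_docs:
            self.doc_counts[tag] += 1
            counts = self.word_counts.setdefault(tag, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the tag with the highest log-probability for the given text."""
        total_docs = sum(self.doc_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n_docs in self.doc_counts.items():
            counts = self.word_counts[tag]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Hypothetical seed set: the "limited, but significant" human input.
seed = [
    ("Draft of the merger agreement with revised indemnity terms", "responsive"),
    ("Board approved the merger timeline and due diligence list", "responsive"),
    ("Office holiday party is on Friday, bring snacks", "not responsive"),
    ("Reminder: parking garage closed this weekend", "not responsive"),
]

model = TinyNaiveBayes()
model.train(seed)
print(model.predict("Revised due diligence checklist for the merger"))
print(model.predict("Parking garage reminder for the holiday weekend"))
```

In practice the reviewer would sample the predictions, correct the wrong ones, add them to the seed set and retrain, which is the iterative loop the courts describe.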

And there you have it… predictive coding.  With this historical background in mind, predictive coding makes more sense to the non-tech savvy professional.

Practical Application

Law firms must become intimately familiar with predictive coding in order to advise clients on their internal obligations to manage and properly store information.  The ability to automatically rank and then “code” or “tag” electronic documents based on criteria such as relevance and privilege has the potential to save companies millions in e-discovery costs.

The potential cost savings of predictive coding technology are so great that corporate legal departments inevitably will expect the law firms that represent them to learn how to use the technology. Just as the shift from paper to electronic files was driven by the potential for cost savings, so too is the buzz surrounding predictive coding technology.

As the days of linear document review rapidly disappear, savvy e-discovery practitioners give law firms a necessary edge in acquiring new clients and retaining current ones.

Understanding keyword searching and Boolean methodology is no longer enough.  Predictive coding requires the coder to understand the meaning, content, purpose and principles behind the documents.

Calls to Action

The material discussed above is academic without specific calls to action.  Law firms have a phenomenal opportunity to move ahead of their competitors by showing their clients the value of this knowledge.   

  1. Law firms should conduct free predictive coding seminars/webinars for clients and their in-house counsel on the importance of a proper unified information collection system and unified information preservation system.
  2. Law firms should create their own set of predictive coding e-discovery protocols and systems in addition to their other e-discovery protocols and systems.
  3. Law firms should then recommend software products for their clients that will also integrate with the law firm's e-discovery systems.
  4. Law firms should train internal employees (lawyers and paralegals) as well as client employees on the principles, use and implementation of these protocols.

Acting on these four calls to action will create an opportunity to stand out against others vying to acquire new clients while showing current clients how using this technology could save significant dollars.  Law firms need to invest time in understanding the future of predictive coding or risk becoming less relevant.


Malpractice is never a good motivation for education.  However, the time is coming when failing to educate clients on the importance of predictive coding e-discovery will become a malpractice event.  Attorneys across the country are already being sanctioned for failure to adhere to related e-discovery best practices.  The indications are clear.  Lawyers have a duty to explore the use of this most advanced e-discovery tool.

Advanced as predictive coding is, many questions are still left unanswered.  These questions will no doubt be resolved in yet-to-be-decided cases in both federal and state court.

Predictive coding is simply another e-discovery tool in an arsenal of many e-discovery tools.  Perhaps the most important reason to implement predictive coding is found in Federal Rule of Civil Procedure 1, which states that the "[FRCP] should be construed and administered to secure the just, speedy, and inexpensive determination of every action and proceeding."

If the reader is interested in learning more about how to deal with these and other related issues, please post a comment below or email gerald[at]
