TAR eDiscovery orders and opinions have made some pretty big splashes in the last five years, and the recent FCA US LLC v. Cummins, Inc. order, despite being brief, was no exception. The court took up the question of whether culling a data set with keyword searches prior to the application of Technology Assisted Review (i.e., TAR or Predictive Coding) is the preferred method. The answer, in the court’s opinion, was simple but powerful: it is not.
Some have described this decision as a “nightmare.” Others have less vividly decried it as likely to impede much needed progress in the use of advanced analytics. While I understand the causes for concern, I find it hard to disagree with the court’s decision based on my understanding of the relevant judicial precedent and the gravity of the flaws associated with keyword search culling.
Personally, I don’t believe that TAR judicial history to date, apart from the circumstance- and proportionality-based rulings in In re Biomet (Apr. 18, 2013) and Bridgestone (July 22, 2014), supports another outcome. I am also confident that the rulings in those cases would have been in line with the FCA decision had the question of keyword search culling arisen prior to the undertaking of substantial discovery efforts. The fact that the court in Biomet felt the need to clearly separate that issue from its ruling seems to support that line of thinking.
"The issue before me today isn’t whether predictive coding is a better way of doing things than keyword searching prior to predictive coding. I must decide whether Biomet’s procedure satisfies its discovery obligations and, if so, whether it must also do what the Steering Committee seeks.” In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., 2013 U.S. Dist. LEXIS 84440 (N.D. Ind. Apr. 18, 2013)
In his 2012 Da Silva Moore opinion, the case that started it all, Judge Peck stated, “[t]he goal is for the review method to result in higher recall and higher precision than another review method, at a cost proportionate to the ‘value’ of the case.” He then provided analysis on how TAR could meet or exceed current performance in those areas through improved coding consistency and more effective and efficient retrieval of relevant information. Taking a hard look at those two points of analysis, in my mind, is critical to understanding the court’s order in FCA.
Judge Peck’s analysis regarding improved decision consistency noted that human error was largely to blame for inaccuracies in the application of coding, and cited studies attributing as little as 5% of the difference found in coding to disagreement in reviewer judgment. While the analysis rightly notes that human error potentially affects both precision and recall, it does not frame the potential improvements offered via TAR as critical to practitioners meeting discovery obligations in suitable cases. This makes sense to me given that human error and disagreement over document decisions during review, in my experience, don’t contribute substantially to lost recall, especially where strong quality control processes are in place.
Keyword search, on the other hand, was identified as a critical area in need of immediate and considerable improvement, both generally and in the matter at hand. Judge Peck, citing the Blair and Maron study and the TREC studies, noted that keyword search retrieval may result in recall rates as low as 20%, and that TAR could substantially improve upon the well-known recall limitations of that method in suitable cases. He also noted that keyword search generally results in poor precision, which regularly inflates review costs and weighs on the discovery process. Finally, Judge Peck cited to several recent cases to drive home the severity of the recall and precision shortcomings associated with keyword search, and to make clear the associated risks to practitioners in meeting discovery obligations.
Although the findings and analysis offered in the Da Silva Moore opinion have been echoed several times in more recent relevant cases, the emphasis on keyword search deficiencies seems to have faded over time, with more weight being placed on the reviewer consistency analysis. That shift likely started in part when Judge Peck’s order was reviewed by the district court. The reviewing court affirmed his ruling, but did so with a much greater emphasis on the human error component of his analysis, and only a brief mention of the keyword search concerns.
“However, even if all parties here were willing to entertain the notion of manually reviewing the documents, such review is prone to human error and marred with inconsistencies from the various attorneys’ determination of whether a document is responsive. Judge Peck concluded that under the circumstances of this particular case, the use of the predictive coding software as specified in the ESI protocol is more appropriate than keyword searching.” Da Silva Moore, et al. v. Publicis Groupe, 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012)
The greater focus on reviewer consistency as a justification for TAR use was almost certainly furthered by an industry-wide desire to get these more advanced TAR tools into wider use. Arguing improved consistency as a justification for TAR use would appear to be a much less risky proposition than arguing the need for improved completeness, given the additional obligations that could logically result from succeeding on the latter argument. The problem with that line of thinking, strictly speaking in terms of recall, is that using TAR on a data set which has been culled via keywords arguably compounds the loss of relevant data, unless the achieved TAR project recall is exceptionally high.
While TAR promises to be an improvement in the areas of precision and recall in suitable cases, the technology is not without limitations that must be recognized. We know that TAR leaves behind relevant data, just like keywords and straight linear review. That is why we have recall scores – to identify and codify what constitutes “good enough” in terms of relevant data retrieval.
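To make the recall and precision terminology concrete, here is a minimal sketch using purely hypothetical counts (these numbers are illustrative assumptions, not figures from any cited case or study):

```python
def recall(retrieved_relevant: int, total_relevant: int) -> float:
    """Fraction of all relevant documents the retrieval method actually found."""
    return retrieved_relevant / total_relevant

def precision(retrieved_relevant: int, total_retrieved: int) -> float:
    """Fraction of the retrieved documents that are actually relevant."""
    return retrieved_relevant / total_retrieved

# Hypothetical collection: 10,000 relevant documents exist in the population.
# A keyword search returns 50,000 documents, of which 2,000 are relevant.
print(recall(2_000, 10_000))     # 0.2  -> the ~20% recall floor Judge Peck cited
print(precision(2_000, 50_000))  # 0.04 -> low precision inflates review costs
```

Low recall means relevant material is left behind; low precision means reviewers must wade through mostly irrelevant documents to find what was retrieved.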
The strength of TAR (i.e., Predictive Coding) is that it arguably leaves behind less relevant material than those other methods when deployed on suitable starting data sets. We all must remember that discovery does not have a standard of perfection, but rather requires a reasonable good faith effort. Therefore, the goal is to have an approach that is better than the available alternatives, not perfect.
However, arguing that the use of TAR significantly improves review of an already incomplete set of documents derived from keyword search misses the point of the original Da Silva Moore ruling. Recall may be marginally better to start due to improved consistency, but TAR still can’t find relevant documents that aren’t in the review universe. That is not to say that search terms can’t still be used prior to TAR. There is always room to negotiate such things among the parties, and for the court to make exceptions and adjustments considering proportionality.
In my opinion, the human error contribution to recall loss during review is not comparable to the inadequacies associated with keyword culling in discovery. They are not on the same spectrum in terms of their impact. Yes, TAR has been found to be comparable to, or more accurate than, manual review in terms of consistency when properly implemented on like data sets. However, any gain in completeness via improved consistency becomes insignificant when weighed against the additional relevant data likely lost under the agreed-upon TAR recall rate in most cases. Those gains become even more insignificant when considering the amount of data likely lost to keyword search retrieval in advance of TAR or manual review.
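The compounding effect described above is simple arithmetic: recall multiplies across sequential retrieval stages, because TAR can only find relevant documents that survived the keyword cull. A quick sketch with hypothetical rates (again, illustrative assumptions, not measured figures):

```python
def combined_recall(culling_recall: float, tar_recall: float) -> float:
    # Overall recall is the product of the recall rates of each stage:
    # documents lost at the keyword-culling stage are unrecoverable by TAR.
    return culling_recall * tar_recall

# Hypothetical rates: keyword culling retains 60% of the relevant documents,
# and the TAR project then achieves 80% recall on what remains.
overall = combined_recall(0.60, 0.80)
print(f"{overall:.0%}")  # 48% -> less than half of the relevant data is retrieved
```

Even with a respectable TAR recall rate, the process as a whole retrieves well under the rate the parties likely believed they had agreed to, which is the core of the concern with keyword culling prior to TAR.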