By Sara Skeens and Josh Tolles
Welcome to part three of our Creative Analytics series. Part one provided a suggested roadmap for getting more comfortable with analytics tools and exploring more creative uses. In part two, we discussed some of the challenges common to the presentation phase of the EDRM, which require us to look for creative solutions. This brings us to part three – the solutions. In this post we will provide more detail on a few key tools and techniques that we deploy to overcome those common challenges. This final installment is intended to serve as the closing primer for our co-hosted webinar with kCura taking place tomorrow, Wednesday, July 13th – Leveraging Analytics for Depo & Trial Prep. Please tune in as we put things into a more visual, workflow-based perspective.
Narrowing The Field - Making The Most of Your Time
Deposition and trial preparation typically begins as production review ends (in some cases the two processes can overlap, adding complications). Here you are usually faced with making sense of two distinct data sets – the documents you produced and the productions you received. Traditional fact-finding efforts rely on reviewer coding and supplemental keyword searches. These techniques are a great place to start, but they can be highly inefficient and almost always suffer in terms of completeness.
One helpful early approach is to limit your fact-finding data set to unique content as much as possible. Analyzing duplicate content is a painful drain on resources. Whether a false keyword hit or a true hot document, you generally need only one good look within the four corners to assess its value. This can be a bit counterintuitive, especially if you have been working under family-based coding guidelines during your review efforts. However, it is best to start small when time is of the essence: identify key individual documents as quickly as possible, then build context around those items later.
Achieving the smallest, most unique starting data set can be accomplished via a layering of analytics tools and metadata culling techniques. The three most common forms of analysis in play here are hash values, e-mail threading, and textual near duplicate identification. These tools can be mixed and matched to test progressive volume reduction strategies. The order in which these tools should be layered requires a full understanding of how each technology works and a strong understanding of the composition of your data set. Once identified, this reduction strategy can be used going forward and in reverse (more on that below).
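To make the layering idea concrete, here is a minimal, illustrative Python sketch (not a platform-specific recipe): exact duplicates are collapsed with a content hash, and a simple word-shingle Jaccard score stands in for a commercial textual near-duplicate engine. The function name, shingle size, and similarity threshold are all our own hypothetical choices.

```python
import hashlib

def dedupe_and_group(docs, shingle_size=3, threshold=0.8):
    """Layered reduction: exact-hash dedupe, then crude near-duplicate grouping.

    `docs` maps doc_id -> extracted text. This is an illustrative stand-in for
    real hash-value and textual near-duplicate analytics, not production code.
    """
    # Layer 1: collapse exact duplicates by content hash.
    seen, unique = {}, {}
    for doc_id, text in docs.items():
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen[digest] = doc_id
            unique[doc_id] = text

    # Layer 2: group textual near-duplicates among the surviving uniques.
    def shingles(text):
        words = text.lower().split()
        return {tuple(words[i:i + shingle_size])
                for i in range(max(1, len(words) - shingle_size + 1))}

    groups = []  # each group is a list of doc_ids that are near-duplicates
    for doc_id, text in unique.items():
        s = shingles(text)
        for group in groups:
            rep = shingles(unique[group[0]])  # compare against group representative
            jaccard = len(s & rep) / len(s | rep) if (s | rep) else 0.0
            if jaccard >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups
```

Reviewing one representative per group, rather than every copy, is the "start small" strategy described above; the group membership is retained so context can be rebuilt around key items later.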
Casting a Wide Net – Confidence In Completeness
Moving efficiently through your starting set of data is only half of the battle. Effective preparation in this phase requires confidence in the completeness of your efforts. Reviewer coding and supplemental searches discussed above, while helpful, are almost always incomplete. The identification of key documents from those pools should serve more as a foundation for a more exhaustive effort that overcomes the deficiencies of standard key words and the inconsistencies of human judgment.
There are a variety of analytics tools available that empower us to do just that. For example, the same structured analytics deployed during the data reduction efforts above can be used to find and gather duplicates, textual near-duplicates, and other non-unique items that are substantially related to our identified key documents. This allows us to quickly identify documents that may have been miscoded or misunderstood during production review. These items can provide critical additional context around possession and control of the associated information.
Conceptual analytics tools can be leveraged to suggest pockets of undiscovered information, identify inconsistent treatment of similar content, and offer insights for improving our keyword search efforts. With a core set of documents in place, the fact development team can leverage these tools to find new documents and/or validate their efforts to date in a highly controlled manner. A few examples include:
- Clustering document sets and mapping vetted key documents to guide additional quality assurance review of high-priority clusters.
- Using text excerpts from vetted key documents to power categorization sets or RAR projects to bubble up highly similar documents for quality assurance review.
- Expanding fact-finding search terms by leveraging keyword expansion and terms found in key documents.
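The second bullet above can be sketched in miniature: score each unreviewed document against an excerpt from a vetted key document and bubble the most similar items to the top. This toy example uses a bag-of-words TF-IDF cosine score as a stand-in for a platform's categorization or conceptual-similarity engine; the function name and data are hypothetical.

```python
import math
from collections import Counter

def rank_by_similarity(excerpt, corpus):
    """Rank corpus documents by cosine similarity to a key-document excerpt.

    `corpus` maps doc_id -> text. TF-IDF over whitespace tokens is a crude,
    illustrative proxy for conceptual categorization, not a real engine.
    """
    docs = {"__excerpt__": excerpt, **corpus}
    n = len(docs)
    tokenized = {d: t.lower().split() for d, t in docs.items()}

    # Document frequency per term, for IDF weighting.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    def tfidf(words):
        tf = Counter(words)
        return {w: (c / len(words)) * math.log(n / df[w]) for w, c in tf.items()}

    def cosine(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    query = tfidf(tokenized["__excerpt__"])
    scores = {d: cosine(query, tfidf(w)) for d, w in tokenized.items()
              if d != "__excerpt__"}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Documents surfacing near the top of such a ranking are candidates for the quality assurance review described above, while the highest-weight shared terms suggest keyword expansion candidates.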
Making Sense of It All – Organizing the Chaos
One of the most valuable benefits of analytics in the production review phase is just as applicable here – logical organization. Analytics tools excel at organizing documents by both textual and conceptual similarity. The ability to organize the documents in your fact-finding universe into e-mail thread groups, near-duplicate groups, or clusters is incredibly valuable as you work through your narrowed subsets. Don't assume that you have to abandon one organizational method in favor of another. It may require some massaging, but these methods can and should be stacked to provide even greater time and effort efficiencies.
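Stacking organizational methods can be illustrated with a small sketch: documents are first bucketed into thread groups, then into near-duplicate groups within each thread. Here a normalized subject line stands in for a real e-mail threading engine, and a short body fingerprint stands in for near-duplicate analysis; the field names and keys are hypothetical.

```python
import re

def stack_organization(emails):
    """Two stacked layers: thread group, then near-duplicate key within it.

    `emails` is a list of dicts with illustrative `subject` and `body` keys.
    Returns {thread_key: {dup_key: [emails]}} for layered review.
    """
    def thread_key(subject):
        # Strip reply/forward prefixes and normalize case and whitespace,
        # a crude proxy for true e-mail threading.
        s = re.sub(r"^(?:(?:re|fw|fwd)\s*:\s*)+", "", subject.strip(), flags=re.I)
        return re.sub(r"\s+", " ", s).lower()

    def dup_key(body):
        # Crude textual fingerprint: first 8 normalized words of the body.
        return " ".join(body.lower().split()[:8])

    organized = {}
    for e in emails:
        organized.setdefault(thread_key(e["subject"]), {}) \
                 .setdefault(dup_key(e["body"]), []).append(e)
    return organized
```

A reviewer can then walk one thread at a time and, within it, look at one representative per near-duplicate group – the two layers compound rather than compete.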
Final Thoughts – Waiting for Wednesday
It is important to understand that there is no fixed formula for applying the available tools here. Every matter is subject to different constraints, particularly time constraints. The first step in any effort like this is typically to identify and organize all documents that are textually similar to known key documents. Whether and to what degree we deploy additional fact development and quality checking tools and workflows will depend upon several factors. The most important takeaway here is that the use of analytics gives fact development teams the ability to adapt as they evaluate documents and further refine their strategy. Again, please join us on July 13th as we discuss these topics in more detail.