"All keyword searches are not created equal"
A recent federal civil case, Victor Stanley, Inc. v. Creative Pipe, Inc. (F.Supp.2d ----, 2008 WL 2221841 (D. Md.)), has raised interesting questions about the standard of reasonableness when using keyword searches to select or exclude ESI from a large collection. The fact pattern is a long, tangled story about using searches to segregate privileged ESI. It boils down to the waiver of privilege for 165 contested documents that were produced because they did not get hits from the search engine used by a computer forensic expert.
The legal ramifications of the case are better left to attorney interpretation, but U.S. Magistrate Judge Paul Grimm made several interesting statements that give insight into how the increasingly sophisticated judiciary is raising the bar on search. The first leads in with, "there is a growing body of literature that highlights the risks associated with conducting an unreliable or inadequate keyword search or relying exclusively on such searches for privilege review."
So the bedrock of retrieval, the venerated keyword search, does not stand all on its own. In this case, the seventy-odd search terms were neither disclosed to nor agreed upon by the opposing party. The terms were created by counsel and client, then executed by their hired gun. All the items that could not be indexed were manually reviewed for privilege, but it appears that no one reviewed the 'searchable' items that did not hit on any of the search terms before they were produced.
Judge Grimm continues, "Common sense suggests that even a properly designed and executed keyword search may prove to be over-inclusive or under-inclusive, resulting in the identification of documents as privileged which are not, and non-privileged which, in fact, are. The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents determined to be privileged and those determined not to be in order to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive."
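To make the sampling idea concrete, here is a minimal sketch in Python of how one might pull a random sample from the documents that got no keyword hits and estimate how many privileged items slipped through. This is an illustration only, not the method used by either party or prescribed by the court. The function names, the 95% confidence level and the 5% margin of error are all assumptions chosen for the example, and the sample-size formula is the standard textbook one with a finite population correction.

```python
import math
import random


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Textbook sample size for estimating a proportion, with finite
    population correction. Defaults (95% confidence, +/-5% margin,
    worst-case p=0.5) are illustrative assumptions."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_sample(doc_ids, n, seed=0):
    """Pick n document IDs at random, without replacement.
    A fixed seed makes the draw repeatable so it can be documented."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), n)


def elusion_estimate(reviewed_flags, z=1.96):
    """reviewed_flags holds one bool per sampled no-hit document:
    True if manual review found it privileged after all.
    Returns (rate, low, high) using a normal-approximation interval."""
    n = len(reviewed_flags)
    rate = sum(reviewed_flags) / n
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Hypothetical collection: 10,000 documents with no keyword hits.
no_hit_ids = range(10_000)
n = sample_size(len(no_hit_ids))
to_review = draw_sample(no_hit_ids, n)
```

The sampled items go to a human reviewer, and the results feed `elusion_estimate`. If the estimated rate of missed privileged documents is too high for comfort, the search terms need another look before anything is produced.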
Upon receiving the production, the opposing party used a "readily-available desktop search tool" to find some of the formerly privileged items. It is unfortunate that the order does not specify the technology used by each side, but it is clear that quite a bit of effort was expended to manually wade through PDF and TIFF files that could have been searched with the right tools. Even hiring a 'computer forensic expert' did not save the Defendants from their own decisions.
Keywords are easy to understand and even easier to misuse. Beyond the science of 'precision and recall', there needs to be process, communication and metrics to reach the standard of reasonable due diligence: the 'comfort level' that Judge Grimm cites. The process needs to identify the known exceptions to your chosen technology. Have you asked about or tested the ability to search across different formats of email, files, databases, images, voice, video and languages? Do you even know the composition of your ESI collection and how that will affect any searches?
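For readers who have not met 'precision and recall' before, the two metrics are simple to compute once a reviewed set exists to compare against. The sketch below is a generic illustration in Python. The function name and the document IDs are made up for the example and do not come from the case.

```python
def precision_recall(hits, privileged):
    """hits: IDs the keyword search flagged as privileged.
    privileged: IDs a human review confirmed as privileged.
    Precision = share of hits that are truly privileged.
    Recall    = share of truly privileged items the search found."""
    hits, privileged = set(hits), set(privileged)
    true_positives = len(hits & privileged)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(privileged) if privileged else 0.0
    return precision, recall


# Hypothetical example: the search flags four documents,
# while manual review finds six that are actually privileged.
search_hits = {1, 2, 3, 4}
truly_privileged = {3, 4, 5, 6, 7, 8}
precision, recall = precision_recall(search_hits, truly_privileged)
```

In this made-up example half of the hits are correct, but the search finds only a third of the privileged documents. Low recall of that kind is what leads to privileged items being produced by mistake.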