The Roots of Search Engines - Why You Should Care What Is Behind Your Enterprise Search


Enterprise-level discovery requires enterprise-level technology; without it, discovery becomes 'unduly burdensome' by definition for all but the smallest cases. Recent opinions from federal judges and appellate courts on both coasts have made it clear that the discovery process, technology and personnel are under increasing scrutiny. Plaintiffs' counsel are reading the same opinions from Judge John Facciola (United States v. O'Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008)) and the 9th Circuit (Quon v. Arch Wireless). Many will interpret them as a signal to initiate Daubert-style hearings that force corporations to defend their search engines, communication systems and overall discovery effort when responding to discovery demands.

Corporate counsel may then turn to the vendor that sold them an 'enterprise solution' and demand an expert to defend the search engine. Some vendors have made the serious investment to validate their technology through third-party testing or by recruiting resident experts. The reality of our industry is that there have been few, if any, public challenges to the accuracy and completeness of software sold to the e-discovery market. Until recently, most of this software was purchased and run by service providers, who had an interest in defending the products they proposed and operated behind the magic curtain.

The majority of applications now competing for the corporate e-discovery spend were originally created for information management, knowledge management, process analytics, storage management, disaster recovery and other non-litigation business needs. Enterprise search engines mostly grew out of research into scaling internet search engines that began in the early 1990s. The problem with web-based search is that the systems are optimized for speed and relevance rather than completeness. A recent Autonomy release drew attention to the engine 'jump out' issue, where some systems stop searching or skip indexes altogether when they judge a match unlikely.
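To make the speed-versus-completeness trade-off concrete, here is a toy Python sketch. It does not describe how any particular product behaves; the shard layout, the `search_shards` function and its `stop_after` parameter are all hypothetical names invented for illustration. It simply shows how an engine that stops once it has "enough" hits can return a different answer from one that visits every index.

```python
def search_shards(shards, term, stop_after=None):
    """Search a list of index shards in order.

    shards: list of dicts mapping a term to the set of documents containing it
    stop_after: hypothetical shortcut; if set, stop visiting shards once
                this many hits have been collected
    """
    hits = []
    visited = 0
    for shard in shards:
        visited += 1
        hits.extend(sorted(shard.get(term, set())))
        if stop_after is not None and len(hits) >= stop_after:
            break  # the "jump out": remaining shards are never consulted
    return hits, visited


# Three made-up index shards.
shards = [
    {"merger": {"a.eml", "b.eml"}},
    {"merger": {"c.doc"}, "lunch": {"d.doc"}},
    {"merger": {"e.pst"}},
]

complete, shards_complete = search_shards(shards, "merger")
fast, shards_fast = search_shards(shards, "merger", stop_after=2)

print(complete, shards_complete)
print(fast, shards_fast)
```

For a web user who only wants the best few results, the shortcut is harmless. For a discovery request, the documents in the shards that were never visited are responsive ESI that was silently left out.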

Not all applications that use search engines created for the internet are unsuitable for discovery search. The index engine's 'schema', or configuration, controls how it crawls through files to extract the information needed to produce search results, as well as how it behaves while running a search. Many enterprise applications, such as email archives and document management systems, use open source or OEM'd index engines like FAST, IDOL and Lucene. With a few exceptions like Autonomy's IDOL platform, most of these index engines depend on outside software to recognize divergent types of ESI (electronically stored information) and extract the text as discrete words (a process called tokenization). The configuration and options chosen can drop critical ESI from the index and search, or can bring the system to its knees by forcing it to index file types that contain no text or character sets that it cannot handle.
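As a rough illustration of how configuration decides what reaches the index, the following Python sketch tokenizes extracted text into an inverted index and keeps an exception list of everything it skipped. It is a minimal sketch under stated assumptions, not any vendor's implementation: `INDEXABLE_EXTENSIONS`, `tokenize` and `build_index` are hypothetical names, and the text extraction that real systems delegate to outside software is assumed to have already happened.

```python
import os
import re
from collections import defaultdict

# Hypothetical configuration: the only file types the indexer will attempt.
INDEXABLE_EXTENSIONS = {".txt", ".eml", ".html"}

TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def build_index(documents, indexable=INDEXABLE_EXTENSIONS):
    """Build an inverted index and record every file left out of it.

    documents: mapping of file name -> extracted text (None or "" if the
               extraction step produced nothing)
    """
    index = defaultdict(set)
    exceptions = []  # (file name, reason) for files that never got indexed
    for name, text in documents.items():
        extension = os.path.splitext(name)[1].lower()
        if extension not in indexable:
            exceptions.append((name, "file type not configured for indexing"))
            continue
        if not text:
            exceptions.append((name, "no text extracted"))
            continue
        for token in tokenize(text):
            index[token].add(name)
    return index, exceptions


# Made-up sample collection.
docs = {
    "memo.txt": "Merger terms attached.",
    "scan.tif": None,
    "notes.docx": "Merger discussion notes",
    "mail.eml": "",
}
index, exceptions = build_index(docs)

print(sorted(index["merger"]))
for name, reason in exceptions:
    print(name, "-", reason)
```

In this sample, notes.docx mentions the merger but never appears in a search for "merger", because its file type was not configured for indexing. The exception list is what lets you see that gap rather than discover it in front of a judge.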

The critical takeaway from these new cases is that you need to know the capabilities and limitations of your chosen search engine(s): not just what the software provider tells you it can do, but how it actually performs on your ESI. Think about standing before a court and answering the question, "How do you know that your search got everything that matched your criteria?" Judges are not seeking some impossible level of absolute certainty, but they do expect to hear about your reasonable effort to validate your tools.
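One reasonable way to start answering that question is to run the engine against a sample of your own ESI and compare its results with a slow but exhaustive scan of the same sample. The Python sketch below is one possible approach, not a standard or court-approved protocol; `brute_force_matches`, `validate_search` and the `toy_engine` stand-in are hypothetical names, and a real test would call your actual search engine in place of `toy_engine`.

```python
import re


def brute_force_matches(documents, term):
    """Scan every document's raw text for the term, bypassing any index."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return {name for name, text in documents.items() if pattern.search(text)}


def validate_search(documents, term, engine_search):
    """Compare an engine's results with a brute-force scan of the same sample."""
    expected = brute_force_matches(documents, term)
    returned = set(engine_search(term))
    missed = expected - returned
    recall = 1.0 if not expected else len(expected & returned) / len(expected)
    return {
        "expected": len(expected),
        "returned": len(returned),
        "missed": sorted(missed),
        "recall": recall,
    }


# Made-up sample of extracted text.
sample = {
    "a.txt": "The merger closes Friday.",
    "b.txt": "Lunch menu for Friday.",
    "c.txt": "Re: MERGER timeline",
}


def toy_engine(term):
    """Stand-in for a real engine call; this one is case-sensitive on purpose."""
    return [name for name, text in sample.items() if term in text]


report = validate_search(sample, "merger", toy_engine)
print(report)
```

Here the stand-in engine misses c.txt because of a case-sensitivity quirk, and the report names the missed file. A documented test of this kind, repeated across representative file types and search terms, is the sort of evidence that shows a reasonable validation effort.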


Entry Sponsorship

This entry is sponsored by Autonomy ZANTAZ

About Autonomy ZANTAZ Blog

Autonomy ZANTAZ is the leader in the archiving, eDiscovery and Proactive Information Risk Management markets. It is the only vendor that offers an entire spectrum of Proactive Information Risk Management solutions, ranging from real-time policy management, records management and consolidated archiving to early case assessment, enterprise legal hold and EDD, review and production. ZANTAZ solutions run on a common platform, IDOL, which supports more than 100 languages and 1,000 file types. ZANTAZ solutions are available as hosted services, on-site software or a combination of both. ZANTAZ customers include 9 of the 10 top global law firms, 11 of the Fortune 25 and 14 of the top 20 financial securities firms. Customers include Abbott Laboratories, Capital One, JMP Securities, Johnson & Johnson, Liberty Mutual, Linklaters, Philip Morris International and the US Department of the Interior.