Wednesday, February 29, 2012

A Stream of Auto-Classification Consciousness by Randolph Kahn, ESQ.

“But how will a court judge our use of auto-classification technology to do the heavy lifting regarding what information was a record and what was junk?”


“I want to be comfortable with our decision that using algorithmic classification software technology to apply our records retention rules and clean up the contents of our shared drive won’t get us flogged by a regulator or a court.”


“I am concerned that if we get rid of this data without having our employees review it manually, we will be open to attack in court.”


We have empirical data to support the proposition that employees classify and code information far worse than computers do, by a long shot. Yet most companies continue to rely on their employees to manage information. “[T]echnology-assisted process, in which only a small fraction of the document collection is ever examined by humans, can yield higher recall and/or precision than an exhaustive manual review process, in which the entire document collection is examined and coded by humans.” (“Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” Maura R. Grossman, J.D., Ph.D. and Gordon V. Cormack, Ph.D.)
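For readers unfamiliar with the two measures in that quote: precision is the share of items flagged as records that really are records, and recall is the share of all true records that were found. The short Python sketch below shows the arithmetic. The document IDs and both review results are invented for illustration only; they are not figures from the Grossman and Cormack study, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(flagged, true_records):
    """Return (precision, recall) for a set of flagged document IDs.

    flagged: IDs a reviewer or tool marked as records.
    true_records: IDs that really are records (the "gold standard").
    """
    correct = len(flagged & true_records)
    precision = correct / len(flagged) if flagged else 0.0
    recall = correct / len(true_records) if true_records else 0.0
    return precision, recall


# Hypothetical toy data, made up for this example.
true_records = {1, 2, 3, 4, 5, 6, 7, 8}
manual_review = {1, 2, 3, 4, 9, 10}           # finds 4 of 8, with 2 false hits
assisted_review = {1, 2, 3, 4, 5, 6, 7, 11}   # finds 7 of 8, with 1 false hit

for name, flagged in [("manual", manual_review), ("assisted", assisted_review)]:
    p, r = precision_recall(flagged, true_records)
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

In this made-up case the assisted review scores higher on both measures, which is the kind of comparison the quoted study reports; real results depend entirely on the collection and the tool.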


Most big companies have petabytes of structured and unstructured content, amounting to billions of files. Do you think a judge would say it was "reasonable" to expect employees to classify and review billions of files before they could be purged? According to the Council of Information Auto-Classification’s “The Information Explosion Survey”, 98% of organizations reported rapid information growth that they predict will extend into the future, and that growth is creating a variety of challenges and consequences. Half of the respondents indicated they are forced to recreate information previously created because they cannot find it. 74% of the organizations stated valuable information is being lost (i.e., can’t find, disposed of, misplaced) due to the lack of proper technology solutions. 73% of respondents reported their organization misses business opportunities because they can’t efficiently access information.


Technology is amazingly powerful at uncovering value from information and connecting the dots. At the same time, people are powerless to make sense of the mountain of data in front of them. People bet their lives on the Human Genome Project, which was made possible by technology unearthing the connections in data, yet you are not sure whether you should use auto-classification technology to determine if an email is a record.


In an article entitled “Search, Forward: Will manual document review and keyword searches be replaced by computer-assisted coding?”, US Federal Magistrate Judge Andrew Peck wrote, “[p]erhaps they are looking for an opinion concluding that: ‘It is the opinion of the court that the use of predictive coding is a proper and acceptable means of conducting searches under the Federal Rules of Civil Procedure, and furthermore that the software provided for this purpose by [Insert name of your favorite vendor] is the software of choice in this court.’ If so, it will be a long wait… Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval. In my opinion, computer-assisted coding should be used in those cases where it will help ‘secure the just, speedy, and inexpensive’ (Fed. R. Civ. P. 1) determination of cases in our e-discovery world.”


And Judge Peck strikes again in Moore v. Publicis Groupe. In his February 22, 2012 order in the case, he wrote: “Computer-assisted review appears to be better than the available alternatives, and thus should be used in appropriate cases. While this Court recognizes that computer-assisted review is not perfect, the Federal Rules of Civil Procedure do not require perfection… Counsel no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review.”


If computer-assisted review is OK for finding relevant information for a lawsuit, why shouldn’t you be comfortable using these types of technologies to classify records? Clearly, applying records management rules is a far less risky proposition than responding to discovery. Auto-classification is a way to better manage and, if appropriate, defensibly dispose of huge volumes of data that people cannot handle manually. The courts are now making that decision easier.