Wednesday, November 28, 2012

Information Management

A mind set change. A transformation. A Friend. A Better use of Limited Budgets.

Having just finished rereading a Chicago Tribune article entitled “Getting to Zero”, I remain perplexed about how aspirational information management really is. The article recounts the need for employees to get their work email inbox down to ZERO messages, or as close as possible. In other words, having a clean inbox means that employees will “feel more organized and less stressed by the daily email avalanche”. No doubt having fewer email messages to read frees up time. But for you to have fewer emails to deal with, either they shouldn’t get directed your way to begin with or they must already have been dealt with. I can’t make the business use of email go away. But I can help get rid of the email clutter once it’s there.

A mind set change. Records managers, you will hate what I am going to say. Employees aren’t going to classify and code email messages according to their retention value, and if they did, they would get it wrong most of the time. Change your thinking, because the way you think about the problem, even if intellectually correct, is practically unreasonable. So while some email may have longer term value, the great preponderance of it has no on-going value after a very short period of time. For all of those messages, I want the system to blow them away right after they are no longer needed. That will make your email box volume go way down real quick. Maybe not zero, but way less. For the few messages that have on-going business value, there has to be a simple way to deal with those. While imperfect and contrary to what I once thought, email as a business communication needs to have one retention period that the system can manage without employee involvement. Easy to implement and use. Imperfect for one of a kind content that truly has ongoing business value.
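To make the single-retention-period idea concrete, here is a minimal sketch. The 90-day period, the `Message` class, and the `in_holding_area` flag are all assumptions made up for illustration; they do not describe any particular email system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical value: the right length for the single period is a policy decision.
RETENTION_DAYS = 90


@dataclass
class Message:
    subject: str
    received: date
    # Hypothetical flag: the employee moved the message to the approved holding place.
    in_holding_area: bool = False


def purge(mailbox, today, retention_days=RETENTION_DAYS):
    """Return the messages that survive; the system deletes everything else."""
    cutoff = today - timedelta(days=retention_days)
    return [m for m in mailbox if m.in_holding_area or m.received >= cutoff]


mailbox = [
    Message("Lunch on Friday?", date(2012, 6, 1)),
    Message("Ace Leasing signed contract", date(2012, 6, 1), in_holding_area=True),
    Message("Status update", date(2012, 11, 20)),
]
kept = purge(mailbox, today=date(2012, 11, 28))
print([m.subject for m in kept])
```

The point of the sketch is that the employee makes at most one simple decision (keep this one), and the system does everything else on a clock.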

A transformation. Transform a problem into a business solution and victory. Too much email means an unhappy employee, an overburdened email system, a workforce stretched thin, lower customer satisfaction, greater risk to private information, higher litigation response costs and risks, CIO budget wasted on storing extraneous digital data debris, etc. If I can get rid of all email but the few with long term business or legal value, then I will solve a whole bunch of problems at once. The thing most companies forget about is users’ needs, so give employees a place to temporarily house needed messages and prohibit messages from being stored outside that environment.

Imagine a world with only a few emails. Imagine having a better relationship with employees, customers, email administrators and the CIO.

Friday, September 21, 2012

Information Classification

Information classification is the process of arranging information with shared characteristics. It is a lot harder than meets the eye. And if it’s the employees classifying data (increasingly a non-issue as there is way too much information), it’s more like an art than a science and more like contextual guesstimating than measuring. It is even harder when a user must determine what is a formal record (required for legal or regulatory reasons) vs. what is not. And there is a really good explanation as to why that’s the reality.

Today around the globe, employees do business, real business, in Facebook, Twitter, blogs, SharePoint, text messages and email. As email has been the business tool of choice for many years, and as billions of messages are used in business every day, it’s a good place to start to explore just why 100% exactitude in classifying is not a reality.

Let’s delve into an example to start to understand just how complicated the mere act of classifying information can be. 

Lily, the manager of the sales support unit, gets the following email from Teddy, the head of the leasing business unit. You make the call: is it a record, and if so, how should it be classified?

“Thanks for overseeing the Ace Leasing deal. I thought your assistant manager, Dylan, did a good job and I think he is ready for bigger challenges and a boost in pay. It would have been useful if he brought contracting in sooner. We should really think about how to make the documentation process touch fewer hands and be simpler overall. Also, we need to get implementation services involved ASAP. Please have Riley from contracting confirm the pricing, as it wasn’t on the attached proposal.

Best, Teddy. 
BTW-say hi to your daughter Cooper.”

Emails like this one, millions of them, are sent every day, all day long. If you were asked to classify it, would you say it’s a record requiring long term retention? If you did, what kind of record is it?

If you had any employee determine what the business value of the email was, they could classify it in many different, albeit CORRECT, ways. Most employees predictably classify information from a parochial perspective, based on their own work experience. If Lily, the recipient, classified it, her view would be colored by the utility of the email for her job or department. In that case maybe it’s a sales record which should be put in the Ace Leasing file. On the other hand, as a manager she may see it as an HR related record, which recommends advancing Dylan and getting him a pay raise. Maybe it should even go into Lily’s personnel folder as being complimentary of her good management of her unit. Maybe the email is a record for the contracting department or instructions for the implementation of the project. Maybe it’s also a record for the Business Process Improvement team to fix the business process, as management thinks it’s broken. Fact is, it could properly be classified as all those types of records. All these different records have different retention periods associated with them. And further, depending upon who classifies and what business unit they are from, the result may be substantially different.
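Here is a small sketch of why that matters. The category names, the retention periods in years, and the "longest period wins" rule are all assumptions invented for illustration; a real retention schedule would supply its own.

```python
# Hypothetical record categories and retention periods, in years.
RETENTION_YEARS = {
    "sales_deal_file": 6,
    "hr_personnel": 7,
    "contracting": 10,
    "process_improvement": 2,
}

# Plausible labels for Teddy's one email, depending on who is asked (illustrative).
labels_by_classifier = {
    "Lily (sales support)": ["sales_deal_file"],
    "HR": ["hr_personnel"],
    "Riley (contracting)": ["contracting"],
    "Process improvement team": ["process_improvement"],
}


def retention_years(labels):
    """Assumed rule: when several categories apply, keep for the longest period."""
    return max(RETENTION_YEARS[label] for label in labels)


# Each person's "correct" answer gives the same email a different lifespan.
per_person = {who: retention_years(labels) for who, labels in labels_by_classifier.items()}

# Treating every plausible label as valid and applying the assumed rule.
all_labels = [label for labels in labels_by_classifier.values() for label in labels]
combined = retention_years(all_labels)

print(per_person)
print(combined)
```

Same email, four defensible labels, four different retention outcomes. Any scheme that relies on a single employee's pick inherits that spread.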

Not surprisingly, employees are not particularly good at classifying information, even the smart ones, and if they don’t need to do it, they won’t, and don’t even care. Now imagine each employee touches 100 information nuggets daily that need classification. This partly explains why classification is so difficult. It also makes the point that there are many subjective right answers. I believe many records could be properly classified in different correct ways. We sometimes think there is only one right way.

For almost a decade I have been thinking about the use of auto-classification technology to classify and manage information. I used to think it wasn’t ready for prime time. Today it is really powerful when used properly. I then got hung up on lawyers attacking it, given a known failure rate. I got over that, as they attack everything anyway, and reasonableness and information volumes dictate relying on technology to do the heavy classification lifting. Given information volumes, expecting employees to do the classifying is like asking your auditors to count the grains of sand on the beach and classify them according to size and shape. And now I am down to how effective the technology has to be to allow your classification to be done by a computer. There are no hard and fast rules about confidence ratings or efficacy scores (sometimes referred to as an F-score), even though most people would be substantially comforted if there were simple rules for what was good or good enough.
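For readers who have not met the F-score, it is the standard way to combine precision (of the items the tool called records, how many really were) and recall (of the real records, how many the tool found). A minimal sketch follows; the counts are made-up numbers for a hypothetical test sample, not results from any real tool.

```python
def f_score(true_pos, false_pos, false_neg, beta=1.0):
    """Standard F-beta score; beta=1 weights precision and recall equally."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical sample: the tool flagged 100 items as records and 80 really were,
# while it missed 20 real records.
score = f_score(true_pos=80, false_pos=20, false_neg=20)
print(round(score, 2))
```

Note that the formula tells you how to compute the number, not what number is good enough. That part is still a judgment call.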

I know employees are not good at classification.  I know that employees don’t have time to do it and even if they did, they usually won’t get it right.  I know people classify information in different ways and rarely are consistent from employee to employee.  I know information volumes for most big businesses are growing at 20-50% per year.  I know computers can do classification.  I know it is not simple or cheap to do auto-classification.  I know it takes upfront effort to get auto-classification right.  I know that a company can’t dispose of business information without some diligence process to ensure that records are retained and evidence is preserved.  I know that I have concluded that every big business needs to consider defensible disposition of information using technology to make it happen.  In the end, I know people will attack the process and they will attack the auto-classification soft underbelly—the failure rate, the confidence score, the F-Score.  I used to think it had to be above 90% to be good enough. Then I thought well maybe 80% is good enough.

Well, I have changed my thinking because the paradigm bounding my thoughts on this topic is flawed. As the classification tool crawls, it uses linguistic and numerical analysis to determine what something is and how to properly classify it. In the end, if the software tells me it believes it’s correct with a confidence score of 51% or higher, that means the software probably got it right, but there may be another category that is also a good option. In the end people do exactly what the technology does, but we hold technology to a different and higher standard. I am not sure what the right confidence score is, but I think we need to give technology a chance and not look for reasons to dismiss its utility. Nothing’s perfect, including your employees.
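A rough sketch of what acting on a confidence score might look like. The `decide` function, the category names and the scores are hypothetical, and the 0.51 threshold is just the figure from the paragraph above, not a recommended setting.

```python
def decide(scores, threshold=0.51):
    """Pick the top-scoring category and report the runner-up as a plausible alternative.

    `scores` maps category name to the tool's confidence (0.0 to 1.0).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else None
    return {
        # None means the tool was not confident enough to classify on its own.
        "label": best_label if best_score >= threshold else None,
        "confidence": best_score,
        "also_plausible": runner_up,
    }


# Hypothetical scores for Teddy's email from the earlier example.
teddy_email = {"sales_deal_file": 0.55, "hr_personnel": 0.30, "contracting": 0.15}
result = decide(teddy_email)
print(result)
```

The runner-up is the software's version of Lily saying "it could also go in the HR file". People do the same thing; they just don't report a number.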


Wednesday, June 6, 2012

Kahn’s 4 Keys to Defensible Disposition

With virtually no companies methodically applying retention rules to their ever-growing information heaps, and no practical way for employees to discern what is needed and what is digital data debris, you need to be thinking about how you will defensibly dispose of info crud.  After all, “innocent” technology folks have been forced to defend claims of destruction of evidence for merely recycling systems to make room for more stuff.  So here are Kahn’s 4 Keys to Defensible Disposition.   

Kahn’s 4 Keys to Defensible Disposition
1.    There is sufficient diligence (including review, audit, analysis by human and/or technology) to determine that the information subject to disposition is no longer needed for records retention or legal purposes.
2.    The analysis and diligence process is managed by individuals without any personal interest or incentive in the disposition of the specific content subject to disposition, and any disposition is undertaken with agreement and oversight by the law department and relevant business unit heads.
3.    The disposition process followed is documented, routinized and repeatable and all disposition actions taken are authorized, final, complete and irreversible.
4.    Prior to any disposition, the affected business unit heads and the legal representative receive sufficient notice of the proposed disposition actions that they can immediately stop the disposition process if questions arise as to the appropriateness or legality of the disposition.
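One way to picture the four keys is as a gate that a disposition request must pass before anything is deleted. This is only a sketch: the `DispositionRequest` fields, the party names and the audit log format are all hypothetical, and a real process would involve far more than boolean flags.

```python
from dataclasses import dataclass, field

# Hypothetical names for the parties who must approve and be notified.
REQUIRED_PARTIES = {"legal", "business_unit_head"}


@dataclass
class DispositionRequest:
    batch_id: str
    diligence_complete: bool        # key 1: review/audit/analysis was done
    reviewer_has_interest: bool     # key 2: reviewer must be disinterested
    approvals: set = field(default_factory=set)     # key 2: agreement and oversight
    notified: set = field(default_factory=set)      # key 4: advance notice given
    objections: list = field(default_factory=list)  # key 4: anyone can stop it


def blockers(req):
    """List every reason the request cannot proceed; an empty list means go."""
    problems = []
    if not req.diligence_complete:
        problems.append("key 1: diligence not complete")
    if req.reviewer_has_interest:
        problems.append("key 2: reviewer has a personal interest")
    if not REQUIRED_PARTIES <= req.approvals:
        problems.append("key 2: missing approval")
    if not REQUIRED_PARTIES <= req.notified:
        problems.append("key 4: missing notification")
    if req.objections:
        problems.append("key 4: objection raised")
    return problems


def dispose(req, audit_log):
    """Key 3: record every decision, whether the batch is disposed or blocked."""
    problems = blockers(req)
    audit_log.append((req.batch_id, "blocked" if problems else "disposed", problems))
    return not problems


log = []
ok = dispose(DispositionRequest("shared-drive-2009", True, False,
                                approvals={"legal", "business_unit_head"},
                                notified={"legal", "business_unit_head"}), log)
stopped = dispose(DispositionRequest("mail-archive-2008", True, False,
                                     approvals={"legal", "business_unit_head"},
                                     notified={"legal", "business_unit_head"},
                                     objections=["possible litigation hold"]), log)
print(log)
```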

Wednesday, February 29, 2012

A Stream of Auto-Classification Consciousness by Randolph Kahn, ESQ.

“But how will a court judge our use of auto-classification technology to do the heavy lifting regarding what information was a record and what was junk?”

“I want to be comfortable with our decision that using algorithmic classification software technology to apply our records retention rules and clean up the contents of our shared drive won’t get us flogged by a regulator or a court.”

“I am concerned that if we get rid of this data without having our employees review it manually, that we are open to attack in a court.”

We have empirical data to support the proposition that employees classify and code information way worse than computers, by a long shot. Yet most companies continue to rely on their employees to manage information. “[T]echnology-assisted process, in which only a small fraction of the document collection is ever examined by humans, can yield higher recall and/or precision than an exhaustive manual review process, in which the entire document collection is examined and coded by humans.” (“Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review”, Maura R. Grossman, J.D., Ph.D. and Gordon V. Cormack, Ph.D.)

Most big companies have petabytes of structured and unstructured content, which is billions of files. Do you think a judge would say it was "reasonable" to expect employees to classify and review billions of files before they could be purged? According to the Council of Information Auto-Classification’s “The Information Explosion Survey”, 98% of organizations reported rapid information growth that they predict will extend into the future, and that growth is creating a variety of challenges and consequences. Half of the respondents indicated they are forced to recreate information previously created because they cannot find it. 74% of the organizations stated valuable information is being lost (i.e. can’t find, disposed of, misplaced) due to the lack of proper technology solutions. 73% of respondents reported their organization misses business opportunities because they can’t efficiently access information.

Technology is amazingly powerful at uncovering value from information and connecting dots, while people are powerless to make sense of the mountain of data in front of them. People bet their lives on the Genome project, made possible by technology unearthing the connections in data, but you are not sure if you should use auto-classification technology to determine if an email is a record.

In an article entitled “Search, Forward: Will manual document review and keyword searches be replaced by computer-assisted coding?”, US Federal Magistrate Judge Andrew Peck wrote, “[p]erhaps they are looking for an opinion concluding that: “It is the opinion of the court that the use of predictive coding is a proper and acceptable means of conducting searches under the Federal Rules of Civil Procedure, and furthermore that the software provided for this purpose by [Insert name of your favorite vendor] is the software of choice in this court.” If so, it will be a long wait… Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval. In my opinion, computer-assisted coding should be used in those cases where it will help “secure the just, speedy, and inexpensive” (Fed. R. Civ. P. 1) determination of cases in our e-discovery world.”

And Judge Peck strikes again in Moore v. Publicis Groupe. In his February 22, 2012 order in the case, he wrote: “Computer-assisted review appears to be better than the available alternatives, and thus should be used in appropriate cases. While this Court recognizes that computer-assisted review is not perfect, the Federal Rules of Civil Procedure do not require perfection…Counsel no longer have to worry about being the “first” or “guinea pig” for judicial acceptance of computer assisted review.”

If computer-assisted review is OK for finding relevant information for a lawsuit, why shouldn’t you be comfortable with using these types of technologies to classify records? Clearly applying records management rules is a far less risky proposition than responding to discovery. Auto-classification is a way to better manage and, if appropriate, defensibly dispose of huge volumes of data when people can’t. The courts are now making that decision easier.