Despite the effectiveness of content/document/records management solutions, only five to 10 percent of unstructured files are actually stored in managed repositories. The remainder represents a huge portion of your organization’s knowledge. The few proactive organizations that have tried to apply records management best practices to shared drives find the effort exhausting. Shutting down a shared drive is almost as unachievable a goal as creating the paperless office. Even though the cost of storage continually drops, the cost of managing this information is growing, particularly for IT staff.

Question: Why should you clean up and migrate off your shared drives onto a content repository?
Answer: There’s a lot of useless stuff out there, and it costs money to keep it there.

Let’s look at the three kinds of content on most shared drives.

1. Roughly a quarter to a third of storage space is eTrash—stuff that has no value to the organization and consists of:

    • Temporary files created from crashed applications, process logs, and automatic backup functions.
    • Applications and install files that no longer work.
    • Duplicates – but remember that in most cases, one of the copies should be kept and you will need to find a way to identify which one.
    • Backups – duplicates or copies of something that is important.
    • Zip files – users zip both files and folders to make backups or to compress files for emailing. Both of these activities often result in duplicates because originals are not deleted.
    • Expired records that have reached their desired or required retention period but have not been purged. Purging them is easier said than done. Many organizations err in keeping everything forever because they are afraid of destroying files that should be kept longer. The law doesn’t expect perfection when it comes to records retention, only reasonable effort and good intentions.
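
The duplicates bullet above says you will need a way to identify which copy to keep. One possible approach, sketched here under stated assumptions, is to group files by a hash of their contents and list each group oldest first. Treating byte-identical files as duplicates and the oldest copy as the likely original are both assumptions made for illustration, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each content hash to the paths that share it (groups of 2+ only).

    Paths in each group are sorted by modification time, oldest first.
    Assumption: the oldest copy is the candidate to keep; verify before deleting.
    """
    by_hash = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue  # unreadable file: skip it rather than guess
    return {
        digest: sorted(paths, key=os.path.getmtime)
        for digest, paths in by_hash.items()
        if len(paths) > 1
    }
```

The output is a review list, not a delete list: legal holds and retention rules still decide what can actually be removed.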

2. Another 25 to 33 percent of storage consists of blocked files that are probably valuable but that should not be migrated. This information may need to be preserved, but not in a content repository.

  • Grouped documents – nested or grouped folders are sets of related documents that are linked by their relative position in a folder hierarchy. They rely on each file being in the right location in the file system: HTML presentations, compound documents, CAD files, databases, or applications. It is not that they can’t exist in a content repository; they just cannot be migrated as they are, because the links between them would no longer work.
  • Saved programs – users store applications, install files, and system files on shared drives for a number of reasons. Applications, however, need all of their code and referenced files available in an active state in order to work. A content repository is not that kind of environment.
  • Stored databases – people also create individual or workgroup databases that need to be in a live state and have access to all referenced tables and resources in order to work.
  • Archived emails – email inboxes get too big, so staff archive messages to laptops. Laptops get lost or stolen, so users archive on shared network drives as personal folders, commonly called “PST files” for their “.pst” suffix. These files can become huge. At one recent client, they constituted one percent of files by type, but they hogged 80 percent of storage space. All that important information was “too important” to delete and is now sitting in a PST file, where it is not easily retrieved, shared, or reused. The PST files themselves should not be migrated, just the content within.
  • Organized media – iPod libraries and company picnic photos are all worth keeping. However, because of their file size and lack of corporate value, they should not be migrated or stored on servers that are backed up and restored in an emergency.

3. Inventory-resistant “stuff.” This is good, valuable, discoverable content that hasn’t been captured yet because it is hard to categorize and migrate. In this category, you find:

  • Risky content – confidential information such as employee or customer social security numbers or credit cards.
  • Vital content – content needed to run the organization, but intermingled with less important content.
  • Legal hold – content that should be preserved for litigation purposes.

 

Here are some of the costs of keeping files on your shared drives.

  • Staff productivity costs – information workers waste 3.5 hours each week on searches that don’t turn up the right information. This is partly due to poorly indexed and tagged documents, and partly due to the time required to search through up to three times as much stuff as they need to. Assuming a 40-hour week, that is 8.75 percent of paid time; for 1,000 staff at $60K per year, it comes to $5.25M annually.
  • Producing information for e-discovery – companies spend roughly $200 per GB for each e-discovery culling case. One case per day against 1 terabyte (1,000 GB) of storage = $73M per year.
  • Network storage costs are a logical and necessary expense, but small in comparison. Companies spend roughly $10 to $30 per gigabyte per month to save, back up, and restore content. At $15 per gigabyte per month, 1 terabyte (1,024 GB) = $184,320 per year.
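
The totals in the list above can be reproduced with simple arithmetic. The sketch below does so under assumptions inferred from the totals rather than stated in the original: a 40-hour work week, 1,000 GB per terabyte for the e-discovery figure, and $15 per gigabyte per month with 1,024 GB per terabyte for the storage figure. The function names are hypothetical.

```python
def search_waste_cost(staff, salary, wasted_hours_per_week, hours_per_week=40):
    """Annual salary cost of time lost to unproductive searches.

    Assumption: a 40-hour week, so 3.5 wasted hours is 8.75% of paid time.
    """
    return staff * salary * wasted_hours_per_week / hours_per_week


def ediscovery_cost(gigabytes, cost_per_gb, cases_per_year):
    """Annual e-discovery culling cost across all cases."""
    return gigabytes * cost_per_gb * cases_per_year


def storage_cost(gigabytes, cost_per_gb_month, months=12):
    """Annual cost to save, back up, and restore content."""
    return gigabytes * cost_per_gb_month * months


if __name__ == "__main__":
    print(f"Search waste: ${search_waste_cost(1000, 60_000, 3.5):,.0f}")  # $5,250,000
    print(f"E-discovery:  ${ediscovery_cost(1000, 200, 365):,.0f}")       # $73,000,000
    print(f"Storage:      ${storage_cost(1024, 15):,.0f}")                # $184,320
```

Substituting your own headcount, salary, case volume, and storage rates shows how quickly the productivity and e-discovery costs outgrow the storage bill.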

 

Recommendations: create an information management plan

Organizations need to establish an information management plan that not only incorporates business goals and current technologies but also identifies the unstructured information stored on shared drives:

  • The first step is to know what you have.
  • Don’t make it your goal to migrate all your content to a repository.
  • Provide a means for staff to continue to store all types of information, but not in the repository.
  • Consider records management, disaster recovery requirements, legal preservation, and IT concerns when cleaning up and migrating data.
  • Before you delete or migrate anything, make sure you have a good understanding of your legal preservation requirements. You don’t want to delete information without appropriate protections in place.
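
As a starting point for knowing what you have, the following minimal sketch walks a shared drive and totals file counts and bytes per category. The extension-to-category map is hypothetical and deliberately small; it loosely follows the eTrash and blocked-file examples earlier in this article and would need tuning for any real environment. Anything it does not recognize is marked for human review.

```python
import os
from collections import Counter

# Hypothetical mapping for illustration only; extend it for your environment.
CATEGORY_BY_EXTENSION = {
    ".tmp": "etrash", ".bak": "etrash", ".log": "etrash", ".zip": "etrash",
    ".pst": "blocked", ".exe": "blocked", ".msi": "blocked", ".mdb": "blocked",
}


def classify(path):
    """Return a category for a file based on its extension (case-insensitive)."""
    extension = os.path.splitext(path)[1].lower()
    return CATEGORY_BY_EXTENSION.get(extension, "review")


def inventory(root):
    """Walk root and return (file counts, total bytes) keyed by category."""
    counts, sizes = Counter(), Counter()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable: skip it
            category = classify(path)
            counts[category] += 1
            sizes[category] += size
    return counts, sizes
```

Comparing counts with sizes surfaces cases like the PST example above, where a small share of files consumes most of the space. The categories are only candidates: nothing should be deleted or migrated on the strength of an extension alone, especially where legal preservation applies.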


Brian Tuemmler is a director at Gimmal.

This article was originally published at http://www.aiim.org/Infonomics/ArticleView.aspx?ID=36572.