The proliferation of digital content continues
unabated. A recent IDC study estimates
that “…from 2005 to 2020, the digital universe will grow by a factor of 300,
from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200
gigabytes for every man, woman, and child in 2020). From now until 2020, the
digital universe will about double every two years…” At the same time, the
level of investment in IT infrastructure and services to manage such
unprecedented growth in digital content is anticipated to grow by only 40%.
A report by Oxford Economics titled “The New Digital Economy” estimates that in
2013 the “total size of the digital economy is about $20.4 trillion, equivalent
to roughly 13.8% of all sales flowing through the world economy…”.
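
The IDC figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 exabyte = 10^9 gigabytes) and smooth exponential growth between the 2005 and 2020 endpoints, and all variable names are hypothetical rather than taken from the study.

```python
import math

# Figures quoted from the IDC study
START_EB = 130          # size of the digital universe in 2005, in exabytes
END_EB = 40_000         # projected size in 2020, in exabytes
YEARS = 2020 - 2005
GB_PER_PERSON = 5_200   # quoted per-capita figure for 2020

# Assumption: decimal units, 1 exabyte = 10**9 gigabytes
GB_PER_EB = 10**9

growth_factor = END_EB / START_EB
total_gb = END_EB * GB_PER_EB

# Assumption: smooth exponential growth between the two endpoints
doubling_time_years = YEARS * math.log(2) / math.log(growth_factor)

# Population implied by dividing the total by the per-capita figure
implied_population = total_gb / GB_PER_PERSON

print(f"growth factor:       {growth_factor:.0f}x")
print(f"total gigabytes:     {total_gb:.2e}")
print(f"doubling time:       {doubling_time_years:.2f} years")
print(f"implied population:  {implied_population:.2e}")
```

Under these assumptions the growth factor comes to roughly 308, close to the quoted factor of 300, the total equals 40 trillion gigabytes, and the implied doubling time is about 1.8 years, in line with the quoted “about double every two years”.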
Given the accelerated growth of the digital economy, the
widening gap between the unmanaged growth of digital content and the investments
necessary to harness it is becoming unsustainable and is creating significant
challenges for both private and public sector organizations. These challenges span security, privacy and
operational risks. The IDC noted that “much
of the digital universe is unprotected,” even as organizations must meet increasingly complex data privacy
regimes. The cost of remedying data breaches and complying with
e-discovery requests can be prohibitive, and such events may also damage organizational reputation,
brand and competitive advantage. It is estimated that the cost of remedying a data breach is $200 per record,
and that the cost of collecting,
reviewing and producing documents pursuant to an e-discovery request can run into
the millions of dollars, particularly in highly litigated industries.
While most organizations have well-defined content
lifecycle and records management systems and policies in place, they continue to
lack clear insight into what content they have accumulated over time. A
particular challenge is managing legacy data in file systems, older versions of
document repositories and email systems. A recent AIIM survey found that 61% of
respondents indicated that “organizational assets are not leveraged to
maximum effect” and 46% “consider that storage media and IT infrastructure will
be swamped with uncontrolled content if no actions are taken…” A study by Haystac
Associates, a software and services company focused on information governance
best practices, found that “most organizations don’t know where all their
data is and lack tools to systematically filter it. The amount of time spent on searching for
content is estimated at 24% which may be considerably reduced if the data is
properly cleansed, organized and well identified. Understanding where the data
is located is a necessary starting point for a digital landfill clean-up…”
The Haystac analysis is particularly instructive in
that it provides a systematic foundation for the content inventory and cleanup
process. The process begins with a content identification phase that uses
advanced tools to crawl, index and classify content repositories against
organizational taxonomies, which may be based on subject, function, hybrid or
faceted classification schemes. The
second phase of a digital landfill cleanup project is the content analysis phase,
the objective of which is to determine the value and relevance of the documents identified
in the initial content identification phase.
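
The identification phase described above (crawl, index, classify) can be sketched in a few lines. This is a minimal illustration, not the Haystac tooling: the taxonomy, the keyword matching on file names, and the names `TAXONOMY`, `InventoryRecord`, `classify` and `crawl` are all hypothetical, and a real tool would index document content rather than file names alone.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical subject taxonomy: category -> keywords matched against file names.
TAXONOMY = {
    "finance": ("invoice", "budget"),
    "hr": ("resume", "payroll"),
}


@dataclass
class InventoryRecord:
    path: Path
    size_bytes: int
    modified: float  # POSIX timestamp of last modification
    category: str


def classify(name: str, taxonomy: dict = TAXONOMY) -> str:
    """Return the first taxonomy category whose keyword appears in the name."""
    lowered = name.lower()
    for category, keywords in taxonomy.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unclassified"


def crawl(root: Path) -> list:
    """Walk a file-system repository and build a classified inventory."""
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            records.append(
                InventoryRecord(path, stat.st_size, stat.st_mtime, classify(path.name))
            )
    return records
```

Calling `crawl(Path("/some/share"))` on a hypothetical file share would return one record per file, which is the kind of inventory the analysis phase then works from.
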
The analysis may encompass a number of variables, such as the age of the
document, its organizational value, the authors who created the
document and for what purpose, the application in which the document was
created (this is particularly relevant from the perspective of long-term
preservation and longevity standards), the version level and number of versions,
and its business and archival value consistent with organizational retention and
archival policies. The third and final
phase of the digital landfill project is the content cleanup phase, the
objective of which is to determine what should be kept, what should be
retained because of its business value, what should be migrated to a system of
record as part of a managed repository, and what should be preserved and
archived in compliance with record retention and archival policies and
regulatory mandates. The outcomes of the content cleanup
phase may be illustrated in the following diagram:
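
In code, the analysis and cleanup phases might be sketched as a small set of rules applied to each inventoried document. Everything here is an assumption made for illustration: the field names, the seven-year retention period, the order in which the rules are applied, and the `REVIEW` outcome, which is not one of the categories named in the text.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Disposition(Enum):
    RETAIN = "retain for business value"
    MIGRATE = "migrate to system of record"
    ARCHIVE = "preserve and archive"
    REVIEW = "flag for further review"  # assumption: not named in the text


@dataclass
class AnalyzedDocument:
    name: str
    last_modified: date
    has_business_value: bool
    is_official_record: bool
    has_archival_value: bool


# Hypothetical retention period; a real value comes from the retention policy.
RETENTION_YEARS = 7


def age_in_years(doc: AnalyzedDocument, today: date) -> float:
    return (today - doc.last_modified).days / 365.25


def decide(doc: AnalyzedDocument, today: date) -> Disposition:
    """Apply the rules in a fixed (assumed) order of precedence."""
    if doc.is_official_record:
        return Disposition.MIGRATE
    if doc.has_archival_value:
        return Disposition.ARCHIVE
    if doc.has_business_value and age_in_years(doc, today) <= RETENTION_YEARS:
        return Disposition.RETAIN
    return Disposition.REVIEW
```

In practice the rule order and thresholds would be dictated by the organization's retention and archival policies and by regulatory mandates; the point of the sketch is only that each document ends up with exactly one disposition.
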
Organizations often focus on the go-forward
strategy for harnessing the value of their knowledge assets and defer the tough
decisions about how to address their legacy content, their digital
landfill. The confluence of IT
consolidation, budget cutbacks and the changing composition of the workforce
means that content inventory and cleanup ought to rank much higher among
IT/IM project priorities. The downstream costs associated with
the continued growth of unmanaged content repositories far outweigh the
investments required for content cleanup initiatives.