In the past five years, the telecom industry has seen an explosion in
file-based storage. This growth has been driven by an expanding subscriber base
and the launch of database services.
The growth in file-based storage has fueled the need for better management
approaches, including migration tools that drive better placement of data.
Alongside this growth, data custodians and legal and compliance professionals
need to retrieve and move data more quickly. However, the lack of content-aware
file services has made content-based search and migration of corporate data
challenging.
At a simplistic level, file services can be viewed as the storage
infrastructure serving unstructured or file-based data. A more comprehensive
view of file services, however, includes the advanced functions performed on
file data, such as virtualization, archiving, and migration, that help manage
unstructured data.
Beyond these functions, there is a growing need for file services to leverage
content services that enable content-based indexing, classification, and search
of file-based data. Business, legal, and regulatory demands are driving the
integration of file and content services. File-based storage infrastructure
must become more cognizant of the relevance of the data it stores, based on
topic, keyword, customer, or custodian, and must enable content-triggered
migrations in support of legal holds and retention policies. Moreover, telecom
operators need to perform federated, content-based searches across systems and
applications.
Situation Overview
The past five years have seen an explosion in file-based storage to accompany
large investments made in block storage infrastructure. Traditionally,
block-based storage is suited to highly random I/O environments such as
structured databases and transactional applications. A long-standing
architecture for mission-critical applications, block-based storage is fast and
efficient and provides high levels of reliability and availability, with
features such as provisioning, virtualization, replication, and migration.
File-based storage uses standard network protocols such as CIFS and NFS over
IP. When an application sends a request to a file-based storage system, the
request refers to a file by name, and the storage system manages the underlying
file system. File-based storage services provide functions such as file
organization, sharing, virtualization, replication, migration, and archiving.
A challenge with both file-based and block-based storage is that they lack any
awareness of the content within the data they store. Block storage operates at
the block level; file storage operates at the file system level. Neither
approach understands the value of this data to the organization. Firms
increasingly need storage systems that are more knowledgeable about the content
of the data they hold. This knowledge enables more intelligent, content-centric
policies for migration, search, preservation, retention, and disposition. It is
no longer enough for storage systems to provide only the right levels of
performance, reliability, and availability for an application; they also need
to provide content services such as indexing, classification, and search.
File-level services such as migration can benefit from a full-content indexed
repository, calling on it for policy-based management and migration, federated
search, legal holds, retention, classification, and storage tiering based on
content-triggered rules.
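As an illustration of what a full-content indexed repository does, the sketch
below builds a minimal inverted index in Python that maps each keyword to the
set of files containing it. This is the structure a content service would
consult when a migration, legal-hold, or retention rule asks which files mention
a given customer or topic. The class name ContentIndex, its simple tokenizer,
and the sample paths are hypothetical and do not correspond to any vendor's
product or API; real content services add text extraction for many file types,
stemming, and persistence.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class ContentIndex:
    """Hypothetical minimal full-content index: keyword -> set of file paths."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> Set[str]:
        # Lowercase and split on any run of characters that is not a-z or 0-9.
        return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}

    def add(self, path: str, text: str) -> None:
        # Record every distinct token of the file's extracted text.
        for token in self._tokenize(text):
            self._postings[token].add(path)

    def search(self, *keywords: str) -> Set[str]:
        # Return the paths that contain ALL of the keywords (AND semantics).
        sets = [self._postings.get(k.lower(), set()) for k in keywords]
        return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    index = ContentIndex()
    index.add("/nas/billing/2009-q1.txt", "Roaming charges for customer ACME")
    index.add("/nas/legal/hold-17.txt", "Litigation hold: customer ACME records")
    print(sorted(index.search("customer", "acme")))
```

Because lookups go from keyword to files rather than scanning every file, a rule
such as "all files mentioning customer ACME" can be answered without rereading
the stored data, which is what makes content-triggered policies practical at
scale.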
The telecom industry has taken some steps towards becoming more content-aware.
Software solutions exist that allow data to be classified and indexed. Hardware
or appliance solutions exist that have built-in classification but lack native
indexing and search functions. These tools are components of the desired
result, but they require manual integration and management of disparate
technologies.
File and Content Services
There are several ways in which telecom operators, as well as enterprises, can
address file and content services:
Software-based solution: Applying third-party software solutions for content
services allows data to be classified, indexed, and searched. Leveraging a
third-party application gives a firm the flexibility to select the
best-of-breed solution for its specific environment and application workload.
For example, specialized content services solutions are tuned for indexing and
searching specific file types, such as audio files.
This is a workable solution, but it comes with a few challenges. Because the
content services are delivered separately from the storage platform, the
technical team must manually integrate and intervene to manage storage
resources and to enable features such as content-based archiving or migration.
Any action that must be taken as a result of content services relies on the
hardware or on other third-party software to execute it. The result is a
disconnect between the information driving the action and the action itself.
Hardware-based solution: Another approach is to leverage a file mover API
native to the file-based storage system. The API provides a window into the
organization of files within the file-based storage system, but requires
integration with a third-party application for policy controls and data
movement. The third-party application provides the policy engine that schedules
and moves data between storage tiers based on policy.
Integrated solution: A third approach to integrating file and content services
is to allow file-level services to leverage content services within the storage
infrastructure itself. Integrating a file migration service with content
service search results enables content-triggered migration of data between
higher-performance file tiers and active archive storage tiers.
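The following Python sketch illustrates a content-triggered migration rule of
the kind described above: it selects files on the higher-performance tier whose
content contains all of a policy's keywords and whose last-modified time is
older than a threshold, and returns a migration plan toward the active archive
tier. The FileRecord and MigrationPolicy types, the tier names "primary" and
"archive", and the plan_migration function are hypothetical; a simple substring
match stands in for a real content index, and the sketch only plans moves
rather than performing them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class FileRecord:
    """Hypothetical per-file metadata plus extracted text held by a content service."""
    path: str
    tier: str
    modified: datetime
    text: str


@dataclass
class MigrationPolicy:
    """Hypothetical rule: move files that contain all keywords AND are older than min_age."""
    keywords: Tuple[str, ...]
    min_age: timedelta
    source_tier: str = "primary"
    target_tier: str = "archive"

    def matches(self, rec: FileRecord, now: datetime) -> bool:
        content = rec.text.lower()
        # Content condition: a substring match stands in for a real index lookup.
        has_keywords = all(k.lower() in content for k in self.keywords)
        # Metadata condition: time elapsed since last modification.
        old_enough = now - rec.modified >= self.min_age
        return rec.tier == self.source_tier and has_keywords and old_enough


def plan_migration(records: List[FileRecord], policy: MigrationPolicy,
                   now: datetime) -> List[Tuple[str, str, str]]:
    """Return (path, from_tier, to_tier) for each match; nothing is moved here."""
    return [(r.path, r.tier, policy.target_tier)
            for r in records if policy.matches(r, now)]


if __name__ == "__main__":
    now = datetime(2010, 1, 1)
    records = [
        FileRecord("/nas/cdr/2008-01.csv", "primary", datetime(2008, 1, 31),
                   "CDR export for customer ACME"),
        FileRecord("/nas/cdr/2009-12.csv", "primary", datetime(2009, 12, 31),
                   "CDR export for customer ACME"),
    ]
    policy = MigrationPolicy(keywords=("acme",), min_age=timedelta(days=365))
    print(plan_migration(records, policy, now))
    # [('/nas/cdr/2008-01.csv', 'primary', 'archive')]
```

Combining a content condition with a metadata condition in one rule is what
distinguishes a content-triggered policy from a purely age- or size-based
tiering rule.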
Additionally, a centralized content service enables federated search across
different storage tiers, which satisfies business requirements for legal and
regulatory investigations. However, not all vendors have integrated file and
content services. Critical questions in determining the scope of integration of
file and content services include:
Performance: Is migration directly from the file tier to a content tier
supported without any processing by third-party data movement software?
Cost: Is an additionally priced third-party application required to provide the
policy engine that triggers migration, or is migration provided natively?
Maintenance: If a third-party application is required, does it support the
storage vendor's APIs?
Content awareness: Can a migration policy be triggered by a combination of
content or keyword search results and metadata attributes?
Chain of custody: If a third-party application is required, what is the impact
during storage migrations, and are third-party chain of custody certifications
required?
Transparency: Is the migration from one tier to another transparent to the
client and to expected access paths? Is the backup application migration-aware,
or must migrated files be recalled during routine backup processes?
Findability: How easy is it to find content based on keywords across different
file storage tiers, including an active archive tier? Are multiple searches
required, one per application, or is a federated search available?
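To illustrate the findability question, the following Python sketch shows a
federated search: one keyword query is sent to every storage tier in parallel,
and the hits are merged into a single list tagged with the tier they came from,
so an investigator issues one search instead of one per tier or application.
The make_tier and federated_search functions and the sample tiers are
hypothetical; each toy tier does a substring scan where a real deployment would
query its content index.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical per-tier search interface: keyword -> list of matching paths.
TierSearch = Callable[[str], List[str]]


def make_tier(files: Dict[str, str]) -> TierSearch:
    """Build a toy tier whose search is a case-insensitive substring match."""
    def search(keyword: str) -> List[str]:
        k = keyword.lower()
        return sorted(p for p, text in files.items() if k in text.lower())
    return search


def federated_search(tiers: Dict[str, TierSearch],
                     keyword: str) -> List[Tuple[str, str]]:
    """Send one query to every tier in parallel and merge (tier, path) hits."""
    # max_workers must be at least 1, even when no tiers are configured.
    with ThreadPoolExecutor(max_workers=max(1, len(tiers))) as pool:
        futures = {name: pool.submit(fn, keyword) for name, fn in tiers.items()}
        hits = [(name, path)
                for name, fut in futures.items()
                for path in fut.result()]
    return sorted(hits)


if __name__ == "__main__":
    tiers = {
        "primary": make_tier({"/nas/cdr/2010-01.csv": "customer ACME roaming"}),
        "archive": make_tier({"/arc/cdr/2007-06.csv": "customer ACME dispute",
                              "/arc/cdr/2007-07.csv": "customer Globex"}),
    }
    for tier, path in federated_search(tiers, "acme"):
        print(tier, path)
```

Querying the tiers concurrently keeps the response time close to that of the
slowest single tier rather than the sum of all tiers, which matters when an
active archive tier is slower than primary storage.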
Akhilesh Shukla
akhileshs@cybermedia.co.in