Cost elements of digital preservation http://www.leeds.ac.uk/cedars/documents/CIW01r.html

Cost elements of digital preservation


Kelly Russell and Ellis Weinberger

draft of 31 May 2000

1 Introduction
Although not a great deal is known about the costs of preserving complex digital objects over time, there is an accepted or perceived
wisdom within the library community that it will be more expensive and more intensive than preservation of traditional library materials.
Although it may be too early to make meaningful comparisons of the costs of digital vs traditional preservation, one thing is certain: the
costs of preservation of digital materials will be different from those for other materials and will require resource commitments of a different
nature on an ongoing basis. The ongoing costs of digital preservation are also likely to span a more extended timeframe than traditional
preservation and it may be the case that different technical strategies will prescribe quite different costing timeframes and schedules. This
document will attempt to identify some of the main cost elements that libraries can expect to encounter when considering digital
preservation as part of their ongoing collection management function. It is divided into two parts: part one will provide an introduction
and overview of some of the general issues associated with the costs of digital preservation and part two will provide a detailed
breakdown of specific cost elements.

This paper makes use of a number of quite specific terms, many of which are based on the Open Archival Information System (OAIS) reference
model. Some of these terms are defined in Annex A. For more detailed discussion of OAIS and its terminology please refer to the OAIS
reference manual.

1.1 A Timeframe for Digital Preservation

The costs of preservation always represent an ongoing commitment - whether for digital or traditional materials. However, there is a
growing realisation that the time between an object’s "creation" and its preservation is shrinking rapidly for digital materials. Preservation
will need to be addressed increasingly at the time of acquisition or even creation of the digital resource.

For these new digital materials it is not yet clear what commitment over the long term will mean for libraries. In part this will depend on
the archiving model in which the preservation occurs and how responsibility is allocated, the technical strategy chosen for preservation
and the type of access required. Regardless of these variations, digital preservation will require ongoing resources. It is important to
recognise that different technical strategies for preservation and for access have different cost timeframes. For example, if an archive
adopts a migration strategy which will move the digital object into current software, action (and therefore resources) will be necessary
each time a software upgrade occurs. By comparison, if another type of migration strategy is adopted, where materials are migrated into
standard formats on ingest into the archive, then action (and therefore resources) to migrate that object will be required less frequently.
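
The difference between the two migration timeframes can be sketched with a small calculation. This is only an illustration: the 30-year horizon, the 3-year upgrade cycle and the 10-year life of a standard format are assumptions chosen for the example, and the function name count_actions is hypothetical.

```python
def count_actions(horizon_years, interval_years, at_ingest=0):
    """Count preservation actions over a planning horizon.

    One action is assumed each time `interval_years` elapses, plus any
    actions carried out at ingest.
    """
    return at_ingest + horizon_years // interval_years


# All figures are illustrative assumptions, not Cedars findings.
HORIZON_YEARS = 30
UPGRADE_INTERVAL = 3      # assumed gap between software upgrades
STANDARD_INTERVAL = 10    # assumed life of a standard format

# Strategy 1: migrate into current software at every upgrade.
per_upgrade = count_actions(HORIZON_YEARS, UPGRADE_INTERVAL)

# Strategy 2: migrate once into a standard format on ingest, then only
# when that format is itself superseded.
normalised = count_actions(HORIZON_YEARS, STANDARD_INTERVAL, at_ingest=1)

print(per_upgrade, normalised)
```

Under these assumptions the first strategy needs action far more often than the second, although the sketch says nothing about how much each individual action costs.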

1.2 The lifecycle of a digital resource

Unlike other more traditional library materials, digital resources represent a continuum; where a book is published, put on a shelf to be
accessed and preservation occurs only when the object begins to deteriorate, digital materials are created only to require some sort of
ongoing "re-creation" (migration, refreshing onto new media etc.) in order to ensure access is preserved. For digital materials the link
between creation and preservation is much more important because decisions about the way a digital object is created influence how (or
indeed whether) it can be preserved. Likewise, decisions taken at the time of preservation can impact on how (or indeed whether) the
material can be accessed in future. Therefore the "costs" of preservation start at creation of the resource. In this sense the creation of a
digital object is the true starting point for digital preservation. For libraries involved in digitisation projects this means preservation of the
digital files must be considered when the project begins – and it must be budgeted for! Many other digital resources are created outside
the library however and work with publishers and other content creators will be critical to encourage the adoption of appropriate
standards and technologies which will help rather than hinder preservation.

1.3 Cost/Benefit

The costs of preserving digital materials need to be considered in light of the relative benefits. Digital preservation will inevitably be about
trade-offs. As with investments of any kind, what you put in tends to be reflected in what you get out. Decisions to save money could
compromise the completeness of the preservation. However, incurring enormous costs to preserve a complex digital object to which no one
requests access is also undesirable. This suggests a preservation strategy which is appropriate to the perceived value of the digital object.
However, the long-term value of digital materials can be difficult to determine – particularly when rapidly changing technology requires
decisions about long-term value before this has a chance to be proved through a period of use! Analysing the benefits of preservation is
inextricably linked with policies for selection of materials for archiving.

1.4 Selection of Material for Digital Preservation

In considering the issue of selection of materials it is important to consider both existing collection management policies within the
institution and an object’s suitability as part of the collection, as well as technical considerations to do with the specific digital object and its
requirements for continuing access. These will be considered separately below but it should be clear that they must be considered
together.

1.41 Collection Management Policy Issues

Preservation is part of a suite of activities associated with collection management including selection, organisation and access. As such for
all materials these functions impact on one another. However for digital materials, as has been suggested above, creation/acquisition and
preservation are inextricably linked and decisions about preserving materials for the long term should reflect selection policy for the
collection as a whole. If, for example, a library maintains a selection policy which describes areas of specialisation for the collection as a
whole, it is this material which should be considered for preservation – whether digital or not. However, where no formal
selection policy exists for the whole collection, a good place to start *for some types* of objects is at the point of selection for digitisation.
A great deal of work is currently ongoing on selection criteria for digitisation which might be of use when considering the long-term value
of all types of digital content.

1.42 Technical Considerations

Preservation of digital material necessarily involves consideration of technical issues and what will be necessary to render the object from
bits and bytes into a meaningful digital object. However, there are a number of levels for consideration between a digital object’s bits and
bytes and the functionality and properties that make the digital object what it is to a user. For digital materials simply maintaining a
bytestream does not necessarily ensure the digital material will be preserved at a level acceptable to the archive and its users. "Access"
can be at a variety of levels for digital materials, ranging from access to the full range of functionality and content to simple access to the
‘bare bones’ intellectual content.

The level at which a digital resource is archived and maintained will depend on value judgements made by the archivist. In the Cedars
project this is called assessing a digital object’s "significant properties". Determining the significant properties of a digital object will
dictate the amount of information or "metadata" (including detailed technical metadata called "representation information") that must be
stored alongside the bytestream to ensure the object is accessible to that level. A digital object’s significant properties are not assumed to
be empirical; archives will make judgements at levels appropriate to fulfil their preservation responsibilities and meet the needs of the
archive's user communities. For example, in some cases archives will need to ensure exact replication of a digital object for legal
purposes. This requires preservation of the object’s full functionality and will have significant associated costs. These costs need to be
weighed against the desirability/necessity of preserving the object. For digital materials the preservation of complex functionality may
prove considerably more costly than preservation of the basic intellectual content. In general, the more complex the digital object, the
more involved (and resource intensive) the digital preservation. The question that must be asked is whether the object’s perceived
long-term value is worth the expense of preserving the ‘bells and whistles’.

One way of reducing costs of preservation is by encouraging the use of standards or system-independent file formats either on creation
(preferable) or on migration. Material created in this type of environment will require less preservation action to ensure access is
maintained over time.

1.5 Collaborative Approaches

It is unlikely that in the UK (or elsewhere) libraries will be able to rely on duplication of effort to ensure the preservation of digital
materials. The level of commitment, resource and expertise required to archive digital material will mean co-ordination across the library
sector will be critical. As with all things, there are economies of scale associated with a collaborative approach. Cooperative collection
management at the point of selection, acquisition and access to traditional library collections is already taking place across existing
consortia and proving very effective. For digital preservation, collaboration may be carried out at different stages or in relation to various
aspects of digital preservation and this might significantly reduce the costs for a single organisation. For example, collaboration might
take place for selection of materials, for copyright negotiations or for administration of the archives. There are a variety of options to be
explored. However, it should be noted that cost influences may differ (or not) depending on whether the collaboration is occurring
regionally, nationally, or internationally.

2 Cost Elements for Digital Preservation


The cost of preserving an object will depend on many factors. As suggested above there will be a multitude of other considerations which
will impact on how these costs are made manifest and it may be that any decision-making based on the following elements is best
expressed as a matrix.

The elements have been listed according to the order in which they will tend to occur within the collection manager’s workflow. The
elements below are those which are most closely associated with preservation of a digital object and ensuring the object remains
accessible over the long term. However it should be recognised that for digital materials it is not always easy (or even possible) to
separate costs of preservation from costs of access. It may be that an institution’s investment in technical infrastructure for providing
access to digital materials also supports a preservation function and in this sense the cost is shared across both preservation and access.
Likewise, costs for providing resource discovery and delivery of materials from the archive may also vary depending on the extent to
which the archive is integrated into existing collection management functions where access arrangements are shared across a range of
collections. While acknowledging that it is not always possible in practice to distinguish between preservation costs and costs for providing
access, this list of cost elements attempts to focus on preservation activities specifically.

1. Selecting a particular digital object for preservation.


It is likely that there will be two different representative groups involved in the selection of material for long-term preservation. These are
Collection Managers (e.g. archivists, subject specialists) and Systems Managers who will need to act in consultation with one another on
issues relating to the long-term retention of digital materials. The collection manager can provide advice about usage or about the relative
value of the object to the overall collection. The systems manager can discuss the cost of specific technical issues such as required
conversion, migration or even emulation as well as the necessary technical metadata or representation information.

Selection decisions may be based on existing policy documents or, in some cases, taken on an object by object or collection by collection
basis. More time will be required if there is no existing policy for selection. There may be collaborative agreements (across consortia)
about preservation responsibility which, in time, may make this less time-consuming (and therefore less costly), but which can be very costly at
the outset. As mentioned above, where possible this activity should reflect the library’s collection management policies (i.e. preservation
decisions may be made on acquisition of the material). There are a number of selection policies for digital materials available such as
Guidelines for the Selection of Online Australian Publications Intended for Preservation by the National Library of Australia and the
Berkeley Digital Library Sunsite.

2. Negotiating the right to preserve the object

This will include the time of the negotiator and the time of the person drafting and exchanging the agreements. This may also include
detailed consideration of the object to assess all the relevant rights holders including rights holders of software and underlying
technologies. It may be that in the case of some materials, the publisher does not own the rights for the underlying technology and this
will require separate negotiations. Based on the Cedars Project experience, this is likely to be a lengthy process.

3. Negotiating the right to provide access to the preserved object

There will also be time required for negotiating access arrangements for materials stored in the archive if end-users are to have short-term
or near-term access to archived materials. This may not apply to all archives. Like negotiations for preservation, negotiations for access
may require considerable time and expertise.

4. Determining the appropriate technical strategy for preservation and continuing access.

This will include the time taken to ensure the digital object is adequately prepared for archiving as well as the resources for agreeing on a
specific preservation strategy for continuing access (e.g. migration or emulation). This will require detailed consideration of the digital
object to determine its Significant Properties and, based on this decision, determining the underlying technical requirements for
preservation. This element may include the cost of purchase or design of any software or hardware needed to prepare an object for
archiving. Resources will also be required to determine the best technical strategy for providing continuing access to material in the
archive – i.e. migration, emulation. This will be determined by agreement on an object’s significant properties. For example, an object
which requires preservation of its "look and feel" may require the development or enhancement of emulation tools.

5. Validating the completeness of the object on delivery to the archive.

This will include the time taken to obtain any necessary documentation and the time spent checking the object received against
documentation received relating to the object. For many digital objects this may require significant human resource.
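
Part of this check can be automated where the documentation includes a list of files with checksums. The following is a minimal sketch under that assumption; the manifest format (file name mapped to a SHA-256 digest) and the function names are hypothetical and are not taken from the Cedars project. It does not replace human inspection of whether the object is intellectually complete.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_delivery(directory, manifest):
    """Compare delivered files against a manifest.

    `manifest` maps relative file names to expected SHA-256 digests
    (a hypothetical format used only for this illustration).
    Returns (missing, corrupted, unexpected) as sorted lists of names.
    """
    root = Path(directory)
    delivered = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    expected = set(manifest)
    missing = sorted(expected - delivered)
    unexpected = sorted(delivered - expected)
    corrupted = sorted(
        name for name in expected & delivered
        if sha256_of(root / name) != manifest[name]
    )
    return missing, corrupted, unexpected
```

Anything reported as missing, corrupted or unexpected would still need to be followed up with the depositor, which is where much of the human cost is likely to lie.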

6. Producing Metadata.

This will include study of the documentation provided with the object and/or an inspection of the item itself and will draw upon
information gathered when determining the technical strategy (element 4). Depending on how (or whether) the archiving function is integrated with
existing collection management activities, some metadata may be collected or incorporated from existing cataloging or other metadata
records. Development of appropriate representation information or detailed technical metadata should also be represented in the
preservation metadata and will require specific technical expertise. Metadata costs will also need to accommodate the gathering of rights
management information. See elements 2 and 3 above on the rights to preserve and provide access to an object.

7. Storing files.

This will include maintenance and purchase of hardware, software, and transfer of files from generation to generation of storage media as
well as the periodic inspection of stored files and of the storage media itself. The creation of backup copies etc will also be included in this
element.

8. Administering the archive.

This will include the costs involved in following the developments in technology and law which will make a difference to preservation of
the object, and updating the archive. It may also include the costs of changing the archive system in accordance with changes in archive
policy. It should also include staff costs (salaries, overheads, training/retraining/skills upgrading), insurance, building overheads
(heat/light/air/security protection), certification/compliance etc.
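
One way to work with the eight elements is to record, for each, a one-off cost and a recurring annual cost and to total them over a planning period. The sketch below assumes that simple split; the class and function names are hypothetical, and every figure is a placeholder in arbitrary units rather than data from the Cedars project.

```python
from dataclasses import dataclass


@dataclass
class CostElement:
    name: str
    one_off: float   # cost incurred once, e.g. at selection or ingest
    annual: float    # recurring cost per year


def total_cost(elements, years):
    """Sum one-off costs plus recurring costs over the given number of years."""
    return sum(e.one_off + e.annual * years for e in elements)


# Element names follow the list above; all figures are placeholders
# in arbitrary units, not measured costs.
elements = [
    CostElement("Selection", 2, 0),
    CostElement("Rights to preserve", 5, 0),
    CostElement("Rights to provide access", 3, 0),
    CostElement("Technical strategy", 4, 1),
    CostElement("Validation", 2, 0),
    CostElement("Metadata", 3, 1),
    CostElement("Storage", 1, 2),
    CostElement("Administration", 0, 3),
]

for years in (1, 10):
    print(years, total_cost(elements, years))
```

An institution would substitute its own estimates, and might add further columns (for example one per technical strategy) to arrive at the kind of matrix suggested at the start of this section.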

Annex A Definitions of Terms

Action: any activity associated with preservation requiring resources

Resources: Funding commitment either in the form of direct payment or human time and expertise

Collection Manager: used broadly to mean librarian, archivist, subject specialist etc.

Systems Manager: used broadly to mean technical specialist

Preservation strategy: a digital preservation strategy is a particular technical approach to the preservation of digital materials. Broadly
speaking there are three main technical approaches to preserving digital materials: technology preservation, technology emulation and
data migration. The first two focus on the technology itself. In each of these, it is understood that, in order to preserve the functionality of
any digital resource, there must be a preservation action taken to preserve the technical environment which originally created and ran it.
Data migration strategies focus on the need to maintain the digital files in a format which is accessible using "current technology" and
require regular migration from one technical environment to a newer one. The appropriateness of a digital preservation strategy for a given
object will be determined by agreement on that object’s "significant properties".

Significant Properties: Those technical characteristics agreed by the archive or by the collection manager to be most important for
preserving the digital object over time. For digital materials simply maintaining a bytestream does not necessarily ensure the digital
material will be preserved at a level acceptable to the archive and its users. A digital object’s significant properties are not assumed to be
empirical; archives will make judgements at levels appropriate to fulfil their preservation responsibilities and meet the needs of the
archive's user communities. For Cedars, it is the creation and maintenance of the detailed metadata associated with the object’s
significant properties which is the backbone of an archive’s preservation function.

Significant Properties: A simple example. If an archive takes deposit of a PDF electronic journal and decides
that the significant properties are only the text within the journal, there may be no need to store information about
the PDF environment but only to include information about retrieving (or rendering) an ASCII text file. These are
decisions that must be made by the collection manager or archivist (often in consultation with technicians over
what is possible and the associated costs).

Significant Properties: A more complex example. An electronic journal is published via the web as HTML.
The "significant properties" are deemed to include the hypertext links (internal) as well as the multimedia
functions (e.g. sound and video clips). It is at this level of functionality (full) that preservation will occur. Although
end-users currently access the journal in HTML, these pages are created on the fly from SGML. For archive
purposes the archive takes the SGML files. Therefore the information (or representation network) which is
developed includes technical descriptions of the objects including information about the systems and the software
necessary to run the video and sound as well as less complex information about retrieving the text and images.

Metadata for Preservation

The effective use of digital resources in an archive will rely on a robust system of resource description – for the purposes of resource
discovery, managing access and ensuring preservation of the resources. Metadata research continues to generate interest
world-wide; to date, most activity has focused on metadata for resource discovery. However, there is increasing awareness that effective
digital archives will depend on the creation and storage of relevant descriptive information (metadata) required to support a chosen
preservation strategy (i.e. migration, emulation or technology preservation). This information will need to describe the data in detail
including file format, and software and hardware platforms. It may also contain information about rights management and access control.
Specifically preservation metadata will take two forms:

descriptive information, which includes general resource description as well as rights management information and
descriptions of actions taken for the purposes of preservation

representation information, which maps the stored data into more meaningful concepts – i.e. systems information which
renders simple bits and bytes into a meaningful digital object. For example, the ASCII definition, which maps data (bits) into readable
symbols.
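
The ASCII example can be made concrete with a few lines of code. In this sketch the representation information is reduced to the name of a character encoding, which is a simplification for illustration; real representation information would normally be far more detailed.

```python
# A stored bytestream: on its own, just a sequence of 8-bit values.
bytestream = bytes([0x43, 0x65, 0x64, 0x61, 0x72, 0x73])

# Representation information, here reduced to the name of the character
# encoding needed to interpret the bytes (an assumption for illustration).
representation_info = {"encoding": "ascii"}

# Applying the ASCII definition maps each byte to a readable symbol.
text = bytestream.decode(representation_info["encoding"])
print(text)
```

Without the record stating that the bytes are ASCII, a future user would have to guess how to interpret them; with it, the mapping from bits to symbols is unambiguous.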

The Open Archival Information System Reference Model

The Open Archival Information System Reference Model has been developed by the Consultative Committee for Space Data Systems (CCSDS) to provide a
conceptual framework and reference tool for defining a digital archive. It describes a specific functional model of both people and systems
requirements for implementing a digital archive. The reference model could also be applied to a non-digital archive. The OAIS is
undergoing the ISO process and its publication as a standard is expected later this year. The Cedars project has provided a
demonstrator project based on it. The importance of OAIS to the archiving community is undeniable but its usefulness to research
libraries and archives is largely unexplored. The NEDLIB project is also implementing the OAIS model within the context of the deposit of
electronic materials for archiving.
