What do we mean by a data archive?
A place to securely hold digital research materials (data) of any sort, along with documentation that helps explain what they are and how to use them (metadata). The application of consistent archiving policies, preservation techniques and discovery tools further increases the long-term availability and usefulness of the data. This is the main difference between storage and archiving of data. ORA-Data is the University of Oxford’s research data archive.
An archive is for stable (completed) versions of the data, not a research workspace. Once data are deposited, they remain in that state and are attributable via a persistent identifier. Active (‘live’) data that are constantly being worked on should not be deposited in an archive. Stable snapshots of longitudinal data can be deposited.
Why should I use a data archive?
This is an option when you have chosen to store your data in an actively curated environment for a significant period and to disseminate details of your data. An external archive can also be a solution when your funder or journal publisher encourages you to make your data accessible or link it to publications. Some journals, particularly in the biosciences, suggest named public archives.
Is there already an archive appropriate for your subject area?
Some disciplines are well served by established and well-known data archives. Examples include the UK Data Archive, Dryad, GenBank, EMBL-EBI, and the Natural Environment Research Council (NERC) Open Research Archive. Deposit in some of these archives depends on the funding body or publisher. Over a thousand specialist archives are listed in the Re3Data registry of research data repositories. Some Oxford departments also have well-established data stores that have served their research group for a significant time. Even so, some disciplines have no obvious location for archiving data.
ORA-Data as an Oxford University data archive
As well as being a catalogue of data produced by Oxford researchers, ORA-Data is a data archiving service provided by the University of Oxford. ORA-Data accepts data from any discipline, especially data that underpins a publication. It also accepts datasets that must be deposited to comply with a funder’s policy but for which there is no suitable archive.
The aim, therefore, is not to hold all research data produced by Oxford researchers; ORA-Data will co-exist with established archives.
ORA-Data preserves stable versions of data and can assign a DOI to collections if necessary, making them citable. Each collection has a freely available online record so the data can be found. The metadata contained in these records complies with international standards. Researchers often have other means of storing data open to them, but these may lack some of the benefits of depositing with an archive.
Oxford University IT Services’ HFS Back-up and Archiving Service
IT Services offers two services based on its HFS (Hierarchical File Server) infrastructure. One of these is a back-up service to protect against data loss. Researchers can back up their computers either manually or on a scheduled basis and restore files or folders in the event of disaster. The other is a long-standing private archiving service. It is quite different from ORA-Data, and the two should not be confused. The HFS Archiving service enables researchers to save data to, and restore it from, the HFS. Unlike datasets designated as open access in ORA-Data, only the individual or research group that deposited the data can access it again in the future. Minimal contextual information is stored, there is no means to search the archive, and data held therein cannot be referenced.
Personal storage on an external hard drive
An external hard drive is inexpensive but should only be seen as a temporary or short-term storage option. The difference between storing your data on a drive and in a reputable archive is that:
- A drive does not benefit from multiple backups in multiple locations
- Preservation and curation actions will not be carried out to ensure continued accessibility of content over time
- Using a drive places all the responsibility for doing this properly on the drive owner
- The data are not assigned a DOI, and the data creator bears the consequences
- The data are not discoverable, and the data creator bears the consequences
- All hard drives fail eventually
Cloud storage should be seen as an online equivalent of hard drive storage rather than as an alternative to archival preservation. Pricing and convenience of use can be attractive, but the terms and conditions of each provider should be examined in detail. It is common for such services to state clearly that they accept no liability for security breaches or data loss. In addition, most cloud storage services do not offer appropriate citation or access options.
Criteria for selecting a data archive
1. Long-term accessibility
A data archive should agree to store your data for a significant period of time. It should also undertake to ensure data will remain findable and accessible for this period. It should give details of scenarios and procedures where data will be removed or deleted. Check the policies and terms and conditions of the archive carefully to ensure that it will retain your data for as long as you require, or at least give you sufficient notice of removal.
See, for example, terms such as “Company may terminate your access to all or any part of the Service at any time, with or without cause, with or without notice, effective immediately, which may result in the forfeiture and destruction of all information associated with your account, including User Submissions” and “Company reserves the right to … discontinue the Service (including the availability of any feature, database, or content) at any time by posting a notice on the Site, on or through the Service, or by sending an email to the email address associated with your account.” [Figshare http://figshare.com/terms]
2. A record for the data
Two of the minimum requirements for datasets, set by funders and journal publishers, are that the dataset can be cited and found. To this end, the dataset should have a persistent, meaningful and discoverable record. The metadata describing the dataset should comply with common standards. This is likely to include the DataCite minimum metadata set for data citation. ORA-Data enables data creators to assign rich metadata to their dataset, which allows them to comply with funder and publisher requirements as well as to receive credit and acknowledgement for their data.
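As a rough illustration of what a minimum metadata check might look like, the sketch below tests a record for the six mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type). The dictionary layout, the `missing_fields` helper and the example values are assumptions made for illustration; they are not ORA-Data's deposit format.

```python
# Field names mirror the mandatory DataCite properties; the flat dict layout
# is an assumption made for this sketch, not a real deposit format.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


example_record = {
    "identifier": "10.1234/example-doi",  # placeholder, not a real DOI
    "creators": ["Researcher, A."],
    "titles": ["Example survey dataset"],
    "publisher": "Example University",
    "publicationYear": 2015,
    # "resourceType" is deliberately left out to show the check failing
}

print(missing_fields(example_record))
```

A record that returns an empty list has every mandatory field filled in. The check says nothing about the quality of the descriptions themselves.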
3. Digital Object Identifiers (DOIs)
It is rapidly becoming the norm that datasets require a DOI (the digital equivalent of an ISBN) so they can be cited and found. Some major funders recommend, although they do not require, that a DOI be used as the unique identifier. Does your selected archive assign DOIs to deposited datasets?
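To show how a DOI supports citation in practice, the sketch below turns a bare DOI into a link through the public `https://doi.org/` resolver. The `doi_to_url` helper and the example DOI are hypothetical, and the pattern is a loose heuristic rather than the full DOI syntax.

```python
import re

# Loose pattern: "10.", a registrant code of 4 to 9 digits, "/", then a suffix.
# This is a common heuristic, not the complete DOI syntax specification.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link via the https://doi.org/ resolver."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("10.1234/example.5678"))  # placeholder DOI for illustration
```

Because the link goes through the resolver rather than pointing at a particular web address, a citation can stay valid even if the archive later moves the dataset's landing page.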
4. Meeting funder’s requirements
Does the archive enable you to meet your funder’s requirements? The requirements might include elements such as:
- Taking active steps to preserve the data for sharing
- Ensuring access for a given period (e.g. 10 years after last access)
- Assignment of sufficient metadata to describe and locate the data
- Assignment of a DOI or other unique identifier
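The requirements listed above can be treated as a simple checklist when comparing candidate archives. The sketch below is one way to do that. The `ArchiveProfile` fields, the `unmet_requirements` helper and the example values are all hypothetical, and a real assessment would depend on reading each archive's published policies.

```python
from dataclasses import dataclass


@dataclass
class ArchiveProfile:
    """Hypothetical summary of what an archive's published policies promise."""
    name: str
    actively_preserves: bool
    retention_years: int
    rich_metadata: bool
    assigns_identifier: bool


def unmet_requirements(archive: ArchiveProfile, required_years: int) -> list[str]:
    """List the funder requirements that the archive does not satisfy."""
    problems = []
    if not archive.actively_preserves:
        problems.append("no active preservation")
    if archive.retention_years < required_years:
        problems.append(f"retention shorter than {required_years} years")
    if not archive.rich_metadata:
        problems.append("insufficient metadata")
    if not archive.assigns_identifier:
        problems.append("no DOI or other unique identifier")
    return problems


# Illustrative values only; they do not describe any real service.
candidate = ArchiveProfile("Example cloud drive", False, 1, False, False)
print(unmet_requirements(candidate, required_years=10))
```

An empty list means the candidate meets every requirement encoded here. Any funder-specific conditions beyond these four would need to be added as further checks.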
The pros and cons of archiving, along with alternative solutions, are illustrated in the following table:
| | Long-term preservation | Secure and multiple location backups | Documentation to support data | Data discoverability | Citation (inc. DOI) | Online access to data |
|---|---|---|---|---|---|---|
| IT Services HFS | ✘ (access lost if you relocate to another institution) | ✔ | Unlikely and not structured to common standards | ✘ | ✘ | ✘ |
| Department or other local group storage | Will require investigation and confirmation | ✔ | Unlikely and not structured to common standards | ✘ | ✔ (if linked to record in ORA-Data) | Will require investigation and confirmation by you |
| Subject or national archive, e.g. NERC/UKDA | ✔ | ✔ | ✔ | ✔ | ✔ | |
| Journal archive/Dryad | Not guaranteed. Reliant on 3rd party terms and conditions | ✔ | ✔ | ✔ | ✔ | |
| Personal hard drive/memory stick/flash drive | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Cloud-based services | Not guaranteed. Reliant on 3rd party terms and conditions | Not guaranteed. Reliant on 3rd party terms and conditions | Some but generally not comprehensive | Limited | Some provide a DOI but not all | ✔ |
The Digital Curation Centre’s guide “Where to keep research data” also offers some helpful guidelines for evaluating data repositories.