- Definition of data
- What is RDM?
- RDM workflow
- Oxford policy
- Funder requirements
- Frequently asked questions
The University of Oxford Policy on the Management of Research Data and Records uses the following definition for data:
“the recorded information (regardless of the form or the media in which they may exist) necessary to support or validate a research project’s observations, findings or outputs”.
In practice, the nature of research data can vary widely depending on discipline: it can be textual, numerical, qualitative, quantitative, final, preliminary, physical, digital or print.
- Research data management is a general term covering how you organize, structure, store, and care for the information used or generated during a research project. It includes:
- Planning how your data will be looked after – many funders now require data plans as part of applications
- How you deal with information on a day-to-day basis over the lifetime of a project
- What happens to data in the longer term – what you do with it after the project concludes
- Research data management is an issue that needs to be considered at various points in the research lifecycle – particularly during the planning stages, when you start the work, when you publish your research, and as you approach the end of a project.
The following sections provide concise advice about what to consider at each of these points, and where you can turn to for help.
I have an idea for a new research project
- Consider what technologies you will need to use. Discuss with your colleagues, departmental IT officer, the Bodleian Libraries, and IT Services Research Support team as necessary.
- Consider where you will store the data the project generates so that it can be securely shared between colleagues and collaborators.
- Consult your subject librarian regarding appropriate mechanisms for preserving your data outputs and facilitating continued use beyond the project.
- Complete a data management plan (DMP). This will help you think through the key issues relating to research data management and any costs involved. The DCC provides a handy online tool for building a plan.
- If applying for funding, confirm the project bid and budget with Research Services. Bear in mind that there is a charge for using ORA-Data.
I have authorization to begin my research project
I have published some of my research
- If there is data underpinning your publication, consider how you can make it available.
- Note that in some circumstances there will be good reason not to make your data fully open.
- Consult the requirements of your funder to determine data archiving and access expectations.
- A short guide to funding agency expectations is available on the DCC website – see also the Funder Requirements page on this site.
- The University Policy on the Management of Research Data and Records covers both funded and ‘unfunded’ research.
- Create a record for your data in ORA-Data. This applies wherever you have deposited the actual data.
I have almost completed my research project
- If you have created data which may have commercial value (not already covered in the terms of the research contract), consult Research Services.
- If you have created data that may be useful to others, consider depositing it in an appropriate data repository (as above) and create a record for it in ORA-Data. ORA-Data acts as a central data catalogue which will feed information to global resource discovery services and help increase your research profile.
Data management is a key part of responsible research. Good practice in managing your data benefits you, your fellow researchers and the wider public:
- Funding and regulatory body requirements are met.
- Research data remains accurate, authentic, reliable and complete.
- Duplication of effort is kept to a minimum.
- Research data keeps its integrity and research results may be replicated.
- Data security is enhanced, thus minimising the risk of data loss.
- In July 2012, Council approved a new Policy on the Management of Research Data and Records. Some key messages:
- Research data is the information needed ‘to support or validate a research project’s observations, findings or outputs’
- Research data should be:
- Accurate, complete, identifiable, retrievable, and securely stored
- Able to be made available to others
- Research data should be retained for ‘as long as they are of continuing value to the researcher and the wider research community’ – but a minimum of three years
- Specific requirements from funders take precedence
- Researchers are responsible for:
- Developing and documenting clear data management procedures
- Planning for the ongoing custodianship of their data
- Ensuring that legal, ethical, and funding body requirements are met
- Policy applies to University staff and doctoral students
- Most funders have some form of policy regarding managing research data, from requiring data management plans at the proposal stage, through to expectations on depositing and sharing your data. The extent and detail of these policies can vary; the Funder Requirements page gives summary requirements and advice from some of Oxford’s main sponsors.
Is it ok simply to instruct people who want to look at my data to call me or send an email?
One of the principles behind funder expectations is that ‘funded research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner.’
A simple direction to interested parties to “contact the author” would not normally be considered sufficient. Decisions about data archiving, preservation and possible future sharing of data need to be made.
What is metadata?
In the context of research data management, ‘metadata’ is the contextual information about the data that will help others to find and understand it. This will usually include information such as ‘who created the data’, ‘what is the data about’, ‘are there any restrictions regarding who can use the data and in what circumstances’, and so forth. Different disciplines will generally find different metadata fields useful. Library catalogues are essentially catalogues of metadata.
In some disciplines research publications are often the richest source of information about how a particular dataset was derived, so it is important to link articles to data. In others it may be necessary to develop additional documentation about how the data was collected, organised and used.
If the ‘result’ in a paper is the demonstration of a novel piece of software, does the software count as data?
Probably not. If data has been generated as a result of running software code, then it may be helpful to provide a link to that code in the metadata, but the software itself would not constitute data in most situations.
If custom written software is used to process the data do instructions need to be provided on use of this software?
If the software is essential to validating the research findings then adequate information should be provided to enable its re-running by third parties. This may involve taking additional steps to preserve the software in addition to the data itself.
I produce computer code and/or simulations based on data supplied by other research groups. How do funder expectations affect me?
You may wish to preserve your code (and in some instances the environment in which it was run) in order to adhere to the general spirit of making research more widely available. In this instance consider asking the data creators to deposit their data in an appropriate repository that will assign a Digital Object Identifier (DOI) to the data, so that you can then reference it.
The Software Sustainability Institute may be of interest.
Will the University offer help and support to meet this requirement?
The University will support researchers in meeting funder (ESRC, EPSRC, MRC etc.) expectations or requirements via advice and guidance. It offers services and infrastructure to support various aspects of research data management.
Enquiries relating to research data management should be directed to email@example.com, where they will be reviewed and addressed by a cross-departmental team of staff from IT Services, the Bodleian Library, Research Services, and the Oxford eResearch Centre.
Information about software, services, and good practice is available on the rest of this website.
I need to complete an information security questionnaire in order to access and use a sensitive dataset. How should I go about completing the questionnaire?
Such questionnaires will usually seek reassurances about the security of the computing facilities and infrastructure that you will be using. There may be restrictions regarding where you can access the sensitive data, such as a requirement that you must be within the physical University department in which you are based (and therefore behind appropriate firewalls) or that you will only ever access the data from a specific computer terminal. If this is the case then your departmental IT support staff will probably need to answer many of the technical questions relating to infrastructure and storage, including the set-up of the computers, how they are maintained, and how they connect to the rest of the University and ultimately the outside world.
If you need to refer to the University Information Security (IS) Policy, note that it also covers the University policy on the protection of confidential information, which you may be asked about separately. IS policies are written to be aligned with ISO 27001. Specifically, selected baseline controls are based upon the UCISA Information Security Toolkit which, in turn, is based upon ISO 27002. Details regarding the University’s alignment with ISO standards are given in the IS Policy, specifically in section “2. Aims and Commitments”. Baseline security standards based on the ISO 27002 control set can be found in the University’s own Information Security Toolkit.
Information about data encryption, passwords, email, mobile devices, and other general security issues is available from the InfoSec team at University of Oxford IT Services. Take a look at the InfoSec website – specific information about encryption is also available.
Further questions on this can still be sent to the RDO contact email, firstname.lastname@example.org – the team reviewing messages includes members of IT Services.
I have data that I need to deposit in an ‘appropriate’ data archive. How do I find such an archive?
An extensive directory of data repositories is available from re3data.org. These range from very generic commercially-provided repositories such as Figshare, to narrowly-defined subject-specific repositories. Generally speaking, it’s better to use a subject-specific repository than a generic one, as they will have staff that understand the data and can help curate it properly as time passes.
Some repositories request that the data they receive is accompanied by metadata (contextual information about the data) in a particular format, so it’s worth getting in touch with appropriate repositories before you get too far into the data gathering process. It’s much easier to document your data as you gather it rather than leaving it until the end of a project – and documenting data during a project can also help you and any collaborators find relevant information more quickly whilst you are still working on it. Feel free to discuss this with your Subject Librarian or arrange to meet with one of the RDM support team by emailing email@example.com.
Unfortunately, subject-specific data repositories do not exist for many fields. If there is no appropriate disciplinary repository for your data, you can meet funder requirements by depositing it in Oxford’s institutional data repository: ORA-Data.
Even if you deposit your data to an external data repository, you should still create a record for it in ORA-Data, so that the University can keep track of research outputs. This is increasingly expected by research funders, and can help with the assessment of impact.
Why should I deposit my data in ORA-Data rather than another data repository?
ORA-Data is the University of Oxford’s institutional data repository. This does not, however, mean that all research data you wish to preserve should go into ORA-Data. If there is a specialist data repository for your discipline, you should under normal circumstances deposit your research data there rather than in ORA-Data. You can find out more information about specialist data repositories from re3data.org.
You should ONLY deposit your data in ORA-Data if there is no more appropriate specialized data repository in your field. You should however create a record for your data in ORA-Data even if you deposit the data itself elsewhere. This helps the University know what and where it is in the event of an audit, but it also improves the visibility of your data, as ORA-Data records are indexed by search engines such as Google.
You may wish to deposit your data in a general data repository such as Dryad or Zenodo, or a commercially-provided alternative such as Figshare. These may be convenient, but there are reasons why they are not advised as alternatives to specialist repositories or ORA-Data for data underlying published research conclusions:
- ORA-Data, and to some degree specialist repositories, are likely to have better longevity than free generalist and/or commercial repositories.
- ORA-Data and specialist repositories are likely to be able to offer a better level of post-deposit curation in the future than free services (things like format migrations and integrity checking).
- ORA-Data and (most) specialist repositories include a metadata review to ensure that minimum standards are met.
- With ORA-Data the data will be held within Oxford, so there are no concerns about legal jurisdictions; specialist repositories may be based outside of the UK, but usually make their terms and conditions very clear.
- Commercial and free services may lack strong long-term business models. Check their terms and conditions to see what will happen to your data should things turn sour.
Finally, some journal publishers accept data deposits alongside the articles they publish. If depositing in a publisher’s data repository, check that the terms and conditions meet your funder’s minimum expectations, and create a record in ORA-Data.
For more detailed advice, see the Archives and Other Options page.
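The "integrity checking" mentioned in the comparison above usually means confirming that files have not changed since they were deposited. The following is a minimal sketch of how a researcher might do something similar before deposit, using only the Python standard library. The function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical and chosen for illustration; this is not how ORA-Data or any particular repository implements its checks.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file under `folder` (by relative path) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """List files that were added, removed, or altered since the manifest was built."""
    current = build_manifest(folder)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

Keeping a manifest of this kind alongside a dataset lets you, or anyone who later downloads the data, confirm that the files still match what was originally deposited.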
Can I use ORA-Data at any time, or only at the end of a project?
You can deposit data into ORA-Data or create records for data in ORA-Data at any point in a project. Some funders such as the EPSRC indicate that a record describing your research data should normally be made available within 12 months of the data being generated, even if access to the dataset itself is restricted.
Remember that if you need to pay to deposit the data your project has produced, you will need to complete the deposit process and payment before your grant expires.
Can I deposit my data in free archives such as Figshare, Zenodo, or Dryad?
Yes, but… Be careful that the repository can meet the terms and conditions of your funder, and remember to add a record in ORA-Data.
Generalist and commercial services such as Zenodo and Figshare are usually free, offer the immediate assignment of a Digital Object Identifier on deposit, and may come with visualization tools.
They may also come with significant risks, however, particularly with regard to their longevity. Free services in particular should be handled with caution – what measures do they have in place to guarantee that they and your data will still be around in three years? In ten? Check the terms and conditions to see what will happen to your data if the service disappears.
The data I’m producing needs to remain private, so ideas about archiving and sharing don’t apply to me, right?
Funders expect data to be archived whenever possible. This does not necessarily mean it will always be shared publicly or without restrictions. In such cases it is recommended you produce a publicly-accessible record of your research data in ORA-Data. The record for the data should indicate why access to the data is restricted or not possible. For example, Expectation vi from the EPSRC guidelines on archiving and sharing data states: “Where access to the data is restricted the published metadata should also give the reason and summarize the conditions which must be satisfied for access to be granted. For example ‘commercially confidential’ data, in which a business organization has a legitimate interest, might be made available to others subject to a suitable legally enforceable non-disclosure agreement.”
If you believe that even publishing a record about your data would be problematic (e.g. it might expose matters of national security), please contact firstname.lastname@example.org.
I need to include a Digital Object Identifier (DOI) for my data in my article submission. How and when do I get one?
Most data repositories will assign a unique identifier, most commonly a Digital Object Identifier, when you deposit your data with them.
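Once a repository has assigned a DOI, it is normally cited as a resolvable link via the doi.org resolver. The sketch below shows one way to tidy a DOI into that form. The function name `doi_to_url` is hypothetical, the pattern is a deliberately loose check rather than a full validation of DOI syntax, and the DOI used in the usage note is made up for illustration.

```python
import re

# Loose check (an assumption, not the full DOI syntax): "10.", a numeric
# registrant code, a slash, then a non-empty suffix without whitespace.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written before the "10." part.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ link for a DOI given in any common form."""
    cleaned = doi.strip()
    for prefix in _PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    if not _DOI_PATTERN.match(cleaned):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "https://doi.org/" + cleaned
```

For example, `doi_to_url("doi:10.1234/example.5678")` gives a link that can be pasted into an article submission or a data availability statement.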
I’m putting together a project bid and need to complete a data management plan. How do I go about doing this?
Most of the major funding bodies provide a data management plan (DMP) template as well as guidance for completing the plan. You can find summary information about funders’ DMP requirements on the Funder Requirements page and the Digital Curation Centre website. Before going any further, it’s also worth visiting your funder’s own website to ensure you are referring to the latest versions of the template and guidelines where available.
The Digital Curation Centre offers a useful web service for completing DMPs: DMPonline. This presents the technical plan template as an online form, alongside guidance and tips for completing each entry from both the funder and the University (where such advice is available).
If you would like more detailed advice as to what to include in a DMP, arrange to meet with one of the RDM support team by writing to email@example.com.
I would like to store my data securely within the University. What are my options?
This applies to data in current use rather than the archiving of data as a project concludes or after it’s finished. Data storage in Oxford is usually provided at the departmental level. Ask your departmental IT Officer if they have any server space that you or your research group could use. Failing that, the NSMS team at IT Services offer managed server space for a fee. See the NSMS website for further information. Some departments outsource their data storage to NSMS, but they may spare you from the cost if you will not be generating a large amount of data yourself.
Some of the tools and services provided by the University for researchers include space for data and document storage. The SharePoint service, for instance, provides 25 GB. A list of other University services that come with data storage space is also available.
IT Services provides a University-wide, centrally-funded, backup and long-term file storage service for staff and postgraduate students – known as the HFS (Hierarchical File Server). It provides an automated backup service for personal computers and servers alike.
The University is furthermore undertaking a project to scope and provide dedicated storage for research data. This is due to conclude in late 2015/2016.
How much detail is required when documenting data?
There is a certain minimum amount of information about your data (metadata) that is required to ensure it can be properly cited: the names of those responsible for creating the data; a title; a publisher; and the publication year. This will generally be required during the data deposit process, along with information about how and why the data was generated. Most data repositories have a form to fill out when depositing data, and it is a good idea to see what information they ask for before you get too far into a project.
It is always sensible to add some documentation to a dataset whilst you are still working on it – explain abbreviations, and add notes about data that seem odd or which may cause confusion to those not involved in generating the data. Not only will this assist researchers who may wish to look at the data in future, but it will also help you and other members of your team to understand it if you need to revisit it yourself in a year or two’s time.
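As an illustration of the minimum citation metadata described above (creators, title, publisher, and publication year), here is a minimal sketch of a record that can be checked for completeness before deposit. The class name `DatasetRecord`, its field names, and the citation layout are assumptions made for this example; individual repositories define their own deposit forms and citation formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Minimum citation metadata for a dataset (illustrative field names)."""
    creators: List[str]
    title: str
    publisher: str
    publication_year: int
    # Free-text documentation, e.g. abbreviations explained or odd values flagged.
    notes: Dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are empty."""
        missing = []
        if not [c for c in self.creators if c.strip()]:
            missing.append("creators")
        if not self.title.strip():
            missing.append("title")
        if not self.publisher.strip():
            missing.append("publisher")
        if self.publication_year <= 0:
            missing.append("publication_year")
        return missing

    def citation(self) -> str:
        """Format a simple citation; this layout is an assumption, not a standard."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"missing metadata: {missing}")
        names = "; ".join(self.creators)
        return f"{names} ({self.publication_year}). {self.title}. {self.publisher}."
```

Filling in a record like this while the project is still running, including the `notes` entries for abbreviations and unusual values, makes the eventual deposit form much quicker to complete.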
The data upon which I based my analysis was not generated by me or my team. What should I do?
You do not need to deposit existing data, or data belonging to a third party, unless you have materially altered or added to it. If you have substantially altered the data – and this includes restructuring it so as to enable analysis – then whether you can or should deposit it will depend largely on the intellectual property (IP) rights vested in the data and the licence it was published under (if applicable). Unless it is clear that you have the right to publish the data in its modified form, email firstname.lastname@example.org for further advice.