Make your data count

Frequently asked questions from Oxford researchers

General questions
Details of the EPSRC expectations
What do I need to preserve?
Software and computer code
ORA-Data and other archives
How do I…?

General questions

  • Why do I need to manage my research data?

    The EPSRC have listed the benefits that they anticipate will arise as a result of the implementation of their expectations at https://www.epsrc.ac.uk/about/standards/researchdata/scope/.

    In recent years there has been a significant drive by research funders and in some cases researchers themselves to encourage greater openness with research data. This has been in response to concerns about the non-reproducibility of research and the potential for malpractice, but also in part to facilitate data reuse and aggregation. Another driver is the principle, as defined by Research Councils UK, that, ‘publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.’ (http://www.rcuk.ac.uk/research/datapolicy/)

    In order to enable greater openness, research data needs to be discoverable, accessible, and described in such a way that it is intelligible to others. It also needs to be preserved and curated for the long term (where there is value in doing so).

    Besides benefiting the broader research community, data that is well managed is also likely to have more immediate benefits to the research group that created it and their collaborators. By storing, documenting, and preserving data efficiently, it should be easier for researchers to find the data they need when they need it.

    If you are interested in learning more about the motivating factors behind the drive towards research data management, take a look at:

  • What ‘research data’ am I expected to preserve?

    The question of what research data needs to be preserved beyond the end of a research project can be difficult to address. The University Policy defines such data as:

    The recorded information (regardless of the form or the media in which they may exist) necessary to support or validate a research project’s observations, findings or outputs.

    In other words, only data that underpins a published research output needs to be preserved. This includes the numbers behind any charts or graphs in the publication and the data required to justify quantitative statements.

    There is no strict requirement for a researcher or research group to preserve anything in addition to this, although in some instances it may be beneficial to preserve additional data if that data can potentially be re-used by other researchers, not least because this tends to lead to increased citations and may open up possibilities for future collaboration.

    In some disciplines data may undergo a number of processes and refinements before any research outputs can be produced from it. In such situations, it would normally only be the ‘finished’ data that needs to be preserved in order to validate a project’s outputs. It is unlikely that ‘raw’ data will need to be preserved.

  • How much detail is required when documenting data?

    There is a certain minimum amount of information about your data (metadata) that is required to ensure it can be properly cited: the names of those responsible for creating the data; a title; a publisher; and the publication year. This will generally be required during the data deposit process, along with information about how and why the data was generated. Most data repositories have a form to fill out when depositing data, and it is a good idea to see what information they ask for before you get too far into a project.
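    To make that minimum set concrete, the following is a minimal Python sketch that checks a record contains the four citation fields listed above and renders a citation from them. The class name, field names and citation layout are illustrative assumptions (the layout is loosely modelled on the DataCite citation style), not the format of the ORA-Data deposit form. The creator names and title are placeholders, and the DOI is the example one used elsewhere on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical container for the minimum metadata needed to cite a dataset."""
    creators: List[str]               # names of those responsible for creating the data
    title: str
    publisher: str
    publication_year: Optional[int]
    identifier: Optional[str] = None  # e.g. a DOI, once a repository has assigned one

    def missing_fields(self) -> List[str]:
        """Return the names of any required citation fields that are empty."""
        missing = []
        if not self.creators:
            missing.append("creators")
        if not self.title.strip():
            missing.append("title")
        if not self.publisher.strip():
            missing.append("publisher")
        if self.publication_year is None:
            missing.append("publication_year")
        return missing

    def citation(self) -> str:
        """Render 'Creators (Year): Title. Publisher. Identifier' (illustrative layout)."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"Cannot cite dataset; missing: {', '.join(missing)}")
        text = f"{'; '.join(self.creators)} ({self.publication_year}): {self.title}. {self.publisher}."
        if self.identifier:
            text += f" {self.identifier}"
        return text


# Placeholder values for illustration only.
record = DatasetMetadata(
    creators=["Example, A.", "Example, B."],
    title="Example dataset title",
    publisher="University of Oxford",
    publication_year=2015,
    identifier="http://dx.doi.org/10.5287/bodleian:dr26xx55s",
)
print(record.citation())
```

    Checking for missing fields before rendering mirrors the advice above: it is easier to notice a gap while you are still working on the data than at the point of deposit.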

    It is always sensible to add some documentation to a dataset whilst you are still working on it – explain abbreviations, and add notes about data that seem odd or which may cause confusion to those not involved in generating the data. Not only will this assist researchers who may wish to look at the data in future, but it will also help you and other members of your team to understand it if you need to revisit it yourself in a year or two’s time.

  • What is metadata?

    In the context of research data management, ‘metadata’ is the contextual information about the data that will help others to find and understand it. This will usually include information such as ‘who created the data’, ‘what is the data about’, ‘are there any restrictions regarding who can use the data and in what circumstances’, and so forth. Different disciplines will generally find different metadata fields useful. Library catalogues are essentially catalogues of metadata.

    Research publications are often the richest source of information about how a particular dataset was derived, so it is important to link articles to data.

  • Are the other UK Research Councils planning to introduce similar rules to the EPSRC with regard to data preservation and sharing?

    Most already have similar rules in place. Please refer to http://researchdata.ox.ac.uk/funder-requirements/ to find out more.

    The University has been making a specific effort to raise awareness of the EPSRC requirements because of the May 2015 deadline and the fact that it is research institutions that are being held responsible for ensuring data is preserved and documented.

  • Will the University offer help and support to meet these requirements?

    The University will support researchers in meeting the EPSRC expectations via advice and guidance, and by offering services and infrastructure to support various aspects of research data management.

    Enquiries relating to research data management should be directed to researchdata@ox.ac.uk, where they will be reviewed and addressed by a cross-departmental team of staff from IT Services, the Bodleian Library, Research Services, and the Oxford eResearch Centre.

    Information about software, services, and good practice is available from http://researchdata.ox.ac.uk/.

Details of the EPSRC expectations

  • When do these new rules start? Is it EPSRC grants awarded after 1 May 2015, or any EPSRC-funded work after that date?

    The EPSRC rules apply to EPSRC-funded research published after 1 May 2015, even if the grant was awarded before that date.

    The Clarifications of EPSRC Expectations on Research Data Management state that ‘If the data directly supports research findings published after 1st May 2015 the expectation will be met if the published findings contain appropriate data citations … In such cases the cited data/supporting documentation is expected to be accessible online no later than the date of first online publication of the article’.

  • EPSRC requires all published papers to include a statement about how the underlying data may be accessed. What should this look like?

    The data statement is essentially a citation of the data underpinning your paper. There is no particular format that data statements need to comply with, nor is any guidance provided by the EPSRC as to where in a paper the statement should appear. The paper abstract, references section, or acknowledgements of funding would all be appropriate places in which to include the data statement, assuming the publisher does not have a specific policy.

    The statement needs to include ‘how and on what terms any supporting research data may be accessed’. In practice, this should include a permanent link to the data, or, if the data itself is not publicly available, a link to the catalogue record for the data and a brief summary of access restrictions.

    Examples might include:

    ‘All data underpinning this paper are available via the University of Oxford data repository: http://dx.doi.org/10.5287/bodleian:dr26xx55s.’

    ‘All data supporting this study are provided as supplementary information accompanying this paper: [include a URL if there is one].’

    ‘Due to agreements signed with commercial partners, access to supporting data is restricted. Further information about the data and conditions for access are available via the University of Oxford’s institutional data repository, ORA-Data: http://dx.doi.org/10.5287/bodleian:dr26xx55s’.

  • The data I’m producing needs to remain private. Does this mean the EPSRC expectations don’t apply to me?

    Even if there are valid reasons why your data cannot be shared publicly, you still need to produce a publicly accessible record of your research data in ORA-Data, and you still need to include a statement in publications based on the restricted data explaining that access is restricted and mentioning any exceptions. The public record for the data should indicate why access to the data is restricted. Expectation vi states: “Where access to the data is restricted the published metadata should also give the reason and summarize the conditions which must be satisfied for access to be granted. For example ‘commercially confidential’ data, in which a business organization has a legitimate interest, might be made available to others subject to a suitable legally enforceable non-disclosure agreement.”

    If you believe that even publishing a record about your data would be problematic (e.g. it might expose matters of national security), please contact researchdata@ox.ac.uk.

  • Is it OK simply to instruct people who want to look at my data to call me or send an email?

    In most cases this is unlikely to be sufficient.

    One of the principles behind the EPSRC expectations is that ‘EPSRC-funded research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner.’

    The Clarifications of EPSRC Expectations on Research Data Management go on to state that ‘If compelling legal or ethical reasons exist to protect access to the data these should be noted in the statement included in the published research paper. A simple direction to interested parties to “contact the author” would not normally be considered sufficient.’

  • All of the data underpinning our publication is available in the supplementary information that the journal provides – does that mean we’re already EPSRC compliant?

    Provided that the data in the supplementary information is publicly available (not just to subscribers) and the journal can provide preservation and access for at least 10 years, you are halfway there. You will still need to create a record for the data in ORA-Data, however.

  • I am supposed to deposit data and create records for data underpinning published research. At what point is research considered ‘published’?

    It is sensible to submit the data underpinning an article and complete a record (or records) for that data either upon submission or at the point of acceptance. This is because you will need to include a statement in the article about how and on what terms the underlying data may be accessed, ideally including a digital object identifier (issued at the point of data deposit) and this is likely to be your last opportunity to get this in the journal-ready article. It is also likely to be easier to find and document the data at or close to the time of submission.

    If you deposit data underpinning an article that is subsequently rejected, you may request that the data is removed and that the corresponding record in ORA-Data is edited to indicate that data is not publicly accessible.

    If the data has been submitted as supplementary information to the publisher who has subsequently rejected the paper, the corresponding record in ORA-Data can be ‘hidden’ so that it does not appear in searches. Please email ora@bodleian.ox.ac.uk in such an instance.

  • I’m publishing an EPSRC-funded research paper, but it’s primarily theoretical – no data was generated during the research. How should I indicate this in the data statement?

    The EPSRC acknowledge in the clarifications to their expectations ‘that not all research papers are supported by research data, and will therefore rely on researchers making informed judgements about when it is appropriate to include such a statement’. If you wish to make it clear to the EPSRC that you are aware of their requirements but that they do not apply to a particular publication, simply indicate this in the paper where you credit their funding. There is no specific formula to use, although something along the lines of ‘This paper complies with EPSRC requirements on data management. The research described here is not based upon data generated by the authors in the course of EPSRC-funded research’ should do.

  • The data upon which I based my analysis was not generated by me or my team. What should I do?

    You do not need to deposit existing data or data belonging to a third party unless you have materially altered or added to it. If you have substantially altered the data – and this includes restructuring it so as to enable analysis – then whether you can or should deposit it will depend largely on the intellectual property (IP) rights vested in the data and the licence it was published under (if applicable). Unless it is clear that you have the right to publish the data in its modified form, email researchdata@ox.ac.uk for further advice.

What do I need to preserve?

  • I have thousands of datasets – surely I can’t be expected to document and deposit all of them!

    Almost all data repositories, including ORA-Data, allow you to deposit multiple files as a single ‘object’. Provided that a single description can reasonably describe a collection of datasets, it is fine to deposit those data together. And remember: you are only obliged to deposit data that underpins published research.

  • Do the EPSRC data expectations apply to doctoral theses?

    The EPSRC expectations do not specifically mention doctoral theses, but there is nothing to suggest they are exempt. It is therefore advised that at the very least data underpinning a doctoral thesis should be made available in the same manner as data underpinning other research publications. Remember that data can be embargoed to provide the author with a ‘limited period of privileged access to the data they collect to allow them to work on and publish their results. The length of this period will depend on the scientific discipline and the nature of the research’. In other words, you do not need to make your data immediately available on the submission of your thesis, although you should still create a record (or records) describing the data in ORA-Data.

  • Do the EPSRC expectations apply to conference papers?

    Yes, if those conference papers are published. Not if they are merely being presented at a conference. There are substantial disciplinary differences regarding the status of conference papers, so if this distinction is not as clear-cut as it might seem in your field, contact researchdata@ox.ac.uk.

  • Do the EPSRC expectations apply to data generated as part of research training at Doctoral Training Centres?

    The EPSRC expectations apply to data produced at Doctoral Training Centres, but remember that there is no obligation to preserve and record data that neither underpins published research nor is deemed to be of sufficient value to be worth preserving independently for future re-use.

Software and computer code

  • If the ‘result’ in a paper is the demonstration of a novel piece of software, does the software count as data?

    Probably not. If data has been generated as a result of running software code, then it may be helpful to provide a link to that code in the metadata, but the software itself would not constitute data in most situations.

  • If custom-written software is used to process the data, do instructions need to be provided on how to use this software?

    If the software is essential to validating the research findings then adequate information should be provided to enable its re-running by third parties. This may involve taking additional steps to preserve the software in addition to the data itself. However, note the following clarification from the EPSRC:

    “It is accepted that there may be cases in which it may not be possible or cost effective to preserve research data. This will depend on the type and scale of the data, their role in validating published results, and their predicted long term usefulness for further research. For example, in the case of simulated data or outputs of models, it may be more effective to preserve the means to recreate the data by preserving the generating code and environment, rather than preserving the data themselves. Provided that the ability to validate published research findings is not fundamentally compromised, a deliberate decision to dispose of research data at an appropriate time is acceptable in these cases.”

  • I produce computer code and/or simulations based on data supplied by other research groups. How do the EPSRC expectations affect me?

    The EPSRC expectations say very little about code. The only mention of code in the Clarifications of EPSRC Expectations is in the context of when it may be appropriate to not preserve the data generated by a project because that data can be more easily regenerated by re-running code [Clarifications to expectation vii].

    If your code does not generate data, but rather consumes data produced by others, it is reasonable to assume that there is no expectation on you to document and preserve the code as part of the EPSRC Policy Framework on Research Data, although there may be an expectation to do so in some other part of your funding agreement.

    You may of course wish to preserve your code (and in some instances the environment in which it was run) in order to adhere to the general spirit of making research more widely available. In this instance consider asking the data creators to deposit their data in an appropriate repository that will assign a Digital Object Identifier (DOI) to the data, so that you can then reference it.

    The Software Sustainability Institute may be of interest.

ORA-Data and other archives

  • I have data that I need to deposit in an ‘appropriate’ data archive. How do I find such an archive?

    An extensive directory of data repositories is available from Re3data. These range from very generic commercially-provided repositories such as Figshare, to narrowly-defined subject-specific repositories. Generally speaking, it’s better to use a subject-specific repository than a generic one, as subject-specific repositories are more likely to have staff who understand the data and can help curate it properly over time.

    Some repositories request that the data they receive is accompanied by metadata (contextual information about the data) in a particular format, so it’s worth getting in touch with appropriate repositories before you get too far into the data gathering process. It’s much easier to document your data as you gather it rather than leaving it until the end of a project – and documenting data during a project can also help you and any collaborators find relevant information more quickly whilst you are still working on it. Feel free to discuss this with your Subject Librarian or arrange to meet with one of the RDM support team by emailing researchdata@ox.ac.uk.

    Unfortunately, subject-specific data repositories do not exist for many fields. If there is no appropriate disciplinary repository for your data, you can meet funder requirements by depositing it in Oxford’s institutional data repository: ORA-Data.

    Even if you deposit your data in an external data repository, you should still create a record for it in ORA-Data, so that the University can keep track of research outputs. This is increasingly expected by research funders, and can help with the assessment of impact.

  • When should I deposit my data in ORA-Data rather than another data repository?

    ORA-Data is the University of Oxford’s institutional data repository. This does not, however, mean that all research data you wish to preserve should go into ORA-Data. If there is a specialist data repository for your discipline, you should under normal circumstances deposit your research data there rather than in ORA-Data. You can find out more information about specialist data repositories from Re3data. Take a look at the EPSRC Decision Tree for a deposit workflow.

    You should ONLY deposit your data in ORA-Data if there is no more appropriate specialized data repository in your field. You should, however, create a record for your data in ORA-Data even if you deposit the data itself elsewhere. This helps the University know what data exists and where it is held in the event of an audit, and it also improves the visibility of your data, as ORA-Data records are indexed by search engines such as Google.

    You may wish to deposit your data in a general data repository such as Dryad or Zenodo, or a commercially-provided alternative such as Figshare. These may be convenient, but there are reasons why they are not advised as alternatives to specialist repositories or ORA-Data for data underlying published research conclusions:

    • ORA-Data, and to some degree specialist repositories, are likely to have better longevity than free generalist and/or commercial repositories.
    • ORA-Data and specialist repositories are likely to be able to offer a better level of post-deposit curation in the future than free services (things like format migrations and integrity checking).
    • ORA-Data and (most) specialist repositories include a metadata review to ensure that minimum standards are met.
    • With ORA-Data the data will be held within Oxford, so there are no concerns about legal jurisdictions; specialist repositories may be based outside of the UK, but usually make their terms and conditions very clear.
    • Commercial and free services may lack strong long-term business models. Check their terms and conditions to see what will happen to your data should things turn sour.

    Finally, some journal publishers accept data deposits alongside the articles they publish. If depositing in a publisher’s data repository, check that the terms and conditions meet your funder’s minimum expectations, and create a record in ORA-Data.

    More detailed advice regarding archive options is available from http://researchdata.ox.ac.uk/preserving-your-data/archives-and-other-options/.

  • Can I use ORA-Data at any time, or only at the end of a project?

    You can deposit data into ORA-Data or create records for data in ORA-Data at any point in a project. Indeed, the EPSRC expectations indicate that a record describing your research data should normally be made available within 12 months of the data being generated, even if access to the dataset itself is restricted.

    Remember that if you need to pay to deposit the data your project has produced, you will need to complete the deposit process and payment before your grant expires.

  • What’s the point of creating a record in ORA-Data – the data is already referenced in the published article?

    The article will usually provide the richest source of information about the data, but the EPSRC expectations require that a separate metadata record is created nonetheless. The metadata record serves several functions:

    • Provides a more succinct summary of the information required to know whether the data itself will be of interest apart from the article – some data may be of interest to disciplines other than the researcher’s own
    • Increases visibility of the data and the article(s) it underpins
    • Makes it possible to search for data independently from articles
    • Provides structured citation information
    • Renders information machine-readable
    • Enables the University to track the data generated by its researchers and departments (which can be used in the REF)
    • May increase impact

    The team behind ORA-Data are looking to enhance the service by ‘harvesting’ metadata records from other systems, but for the time being researchers will need to complete ORA-Data metadata records by hand.

  • I will want to add links to research papers after submitting my data to ORA-Data – is this possible?

    Yes – but initially it will need to be done by Bodleian staff on your behalf. When you submit data to ORA-Data and a DOI is assigned, there are some fields that cannot thereafter be edited. This is an obligation under the terms of the DOI issue. After all, it wouldn’t be a permanent object identifier if the record could be edited to describe a fundamentally different object. Other fields, however, can be amended after creation, and the ‘relationships’ field falls into this category.

    From 1st May 2015, ORA-Data records can only be edited by Bodleian staff once they have been submitted. If there are errors or omissions in the record, you will need to contact ora@bodleian.ox.ac.uk. The Bodleian is working on a limited editing interface that will enable record creators to edit their own records post-submission, which will be available in due course. The Bodleian is also considering ways to automatically populate the relationship field with associated papers that reference the dataset once they are assigned their own DOIs.

  • If I put the data underpinning a thesis onto ORA-Data will that affect the future publication potential of the research?

    We have no evidence that making your data freely available will affect future publications. Many publishers (for example Elsevier) do not mind if the text of your thesis has been made freely available online prior to publication of your monograph or other publication with them. However, there are some publishers who will not publish a monograph if the text of the thesis it is based on has been made freely available online. If in doubt, contact your publisher or ora@bodleian.ox.ac.uk, or see the ORA help page about Pre-Publication Concerns.

  • My data is too big (and too costly) to transfer from the temporary remote/cloud storage where it currently exists into ORA-Data. What should I do?

    The EPSRC accept that in some circumstances it may not be cost-effective to try to preserve very large amounts of data: “It is accepted that there may be cases in which it may not be possible or cost effective to preserve research data. This will depend on the type and scale of the data, their role in validating published results, and their predicted long term usefulness for further research. … Provided that the ability to validate published research findings is not fundamentally compromised, a deliberate decision to dispose of research data at an appropriate time is acceptable in these cases.”

    If in doubt as to whether your circumstances would justify not preserving the data in question, write to researchdata@ox.ac.uk.

  • Who, or what, is a data steward?

    When completing a record in ORA-Data you will be asked to name a ‘data steward’. You may also hear the term in other contexts. A data steward is essentially a person, or preferably a role, who can be contacted about the data in the event of the creators and depositors moving away from the University. This may be necessary due to the relatively long periods for which research data now needs to be preserved.

    A data steward may be able to help inform decisions such as whether there is a case to be made for retaining or deleting data after a specified minimum preservation period is up, or they may be able to answer questions as to whether it is reasonable to share that data in situations unforeseen at the time of deposit. This is preferable to having deposits that cannot be used simply because there is no one to make a decision.

    Whilst someone in a position such as head of department would make an ideal data steward, this may not be practical in reality. Other positions that might be appropriate for this role include a dedicated departmental data manager or a subject librarian.

  • Can I deposit my data in free archives such as Figshare, Zenodo, or Dryad?

    Yes, but… Be careful that the repository can meet the terms and conditions of your funder, and remember to add a record in ORA-Data.

    Generalist and commercial services such as Zenodo and Figshare are usually free, they offer the immediate assignment of a Digital Object Identifier on deposit, and may come with visualization tools.

    They may also come with significant risks, however, particularly with regards to their longevity. Free services in particular should be handled with caution – what measures do they have in place to guarantee that they and your data will still be around in three years? In ten? Check the terms and conditions to see what will happen to your data if the service disappears.

How do I…?

  • I’m putting together a project bid and need to complete a data management plan. How do I go about doing this?

    Most of the major funding bodies provide a data management plan (DMP) template as well as guidance for completing the plan. You can find summary information about funders’ DMP requirements here on the Research Data Oxford website (http://researchdata.ox.ac.uk/funder-requirements/) and on the Digital Curation Centre website (http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies). Before going any further, visit your funder’s website to ensure you are referring to the latest versions of the template and guidelines where available.

    The Digital Curation Centre offers a useful web service for completing DMPs: the DMPonline tool. This presents the technical plan template as an online form, alongside guidance and tips for completing each entry from both the funder and the University (where such advice is available).

    If you would like more detailed advice as to what to include in a DMP, arrange to meet with one of the RDM support team by writing to researchdata@ox.ac.uk.

  • I need to complete an information security questionnaire in order to access and use a sensitive dataset. How should I go about completing the questionnaire?

    Such questionnaires will usually seek reassurances about the security of the computing facilities and infrastructure that you will be using. There may be restrictions regarding where you can access the sensitive data, such as a requirement that you must be within the physical University department in which you are based (and therefore behind appropriate firewalls) or that you will only ever access the data from a specific computer terminal. If this is the case then your departmental IT support staff will probably need to answer many of the technical questions relating to infrastructure and storage, including the set-up of the computers, how they are maintained, and how they connect to the rest of the University and ultimately the outside world.

    If you need to refer to the University Information Security (IS) Policy, it is available at http://www.it.ox.ac.uk/infosec/ispolicy/. This also covers the University policy on the protection of confidential information, which you may be asked about separately. IS policies are written to be aligned with ISO 27001. Specifically, selected baseline controls are based upon the UCISA Information Security Toolkit which, in turn, is based upon ISO 27002. Details regarding the University’s alignment with ISO standards are given in the IS policy above – specifically section “2. Aims and Commitments”. Baseline security standards based on the ISO 27002 control set can be found at http://www.it.ox.ac.uk/policies-and-guidelines/is-toolkit.

    Information about data encryption, passwords, email, mobile devices, and other general security issues is available from the InfoSec team at University of Oxford IT Services. Take a look at the InfoSec website – or for information about encryption in particular, see http://www.it.ox.ac.uk/policies-and-guidelines/is-toolkit/encryption.

    Further questions on this can still be sent to the RDO contact email, researchdata@ox.ac.uk – the team reviewing messages includes members of IT Services.

  • I would like to store my data securely within the University. What are my options?

    Storage for data in current use (as opposed to archiving for data after a project concludes) is usually provided at the departmental level in Oxford. Ask your departmental IT Officer if they have any server space that you or your research group could use. Failing that, the NSMS team at IT Services offer managed server space for a fee. See the NSMS website for further information. Some departments outsource their data storage to NSMS, but they may spare you the cost if you will not be generating a large amount of data yourself.

    Some of the tools and services provided by the University for researchers include space for data and document storage. The SharePoint service, for instance, provides 25 GB. A list of other University services that come with data storage space is also available.

    IT Services provides a University-wide, centrally-funded, backup and long-term file storage service for staff and postgraduate students – known as the HFS (Hierarchical File Server). It provides an automated backup service for personal computers and servers alike.

    The University is furthermore undertaking a project to scope and provide dedicated storage for research data.

  • I need to include a Digital Object Identifier (DOI) for my data in my article submission. How and when do I get one?

    Most data repositories will assign a unique identifier, most commonly a Digital Object Identifier, when you deposit your data with them.