Make your data count

Thoughts from RDMF12 in Leicester – What data do we preserve?

The theme of the Digital Curation Centre’s 12th Research Data Management Forum was ‘Linking Data and Repositories’, although in practice speakers covered a wide range of topics, from convincing researchers of the value of data preservation and curation to the new JISC investments. One of the good things about the RDMF workshops is the way that topical issues confronting several organizations at the same moment tend to bubble to the top of conversation. Having left for the workshop immediately after one of the Oxford Research Data Management Working Group meetings, I was pleasantly surprised that one of the issues raised at that meeting was also confronted at the RDMF: the question of exactly what data researchers should be preserving.

Confronted with increasingly stringent funder requirements relating to research data management, one of the responses we’re hearing from researchers is ‘but what data do I actually need to preserve?’ Given that good research data management can take time, people are understandably nervous of expectations that seem to imply they must document and deposit absolutely all of the information they are collecting or generating. The University Policy does offer a bit of clarity in this regard: “Research data and records are defined as the recorded information (regardless of the form or the media in which they may exist) necessary to support or validate a research project’s observations, findings or outputs”. This may be helpful in narrowing the scope of what needs to be curated. The implication is, for instance, that in most cases there is no need to preserve ‘raw’ data when it is the more refined, processed data that directly underpins research outputs (although it would clearly be important to explain how the processed data has been derived). The policy does, nevertheless, leave something to the judgement of the researcher(s) responsible for the data. This is in part deliberate, as it is extremely hard to provide a single simple definition that is clear and unambiguous across all academic disciplines, and definitions of ‘data’ can vary widely.

Questions of data selection or ‘triage’ are not new to the RDM community, but whereas previously they tended to be relegated in favour of topics of greater immediacy to institutions, it seems that this is now an issue whose time has come.

One of the striking outcomes of RDMF12 was that universities in the UK now all seem to be heading towards the same essential set of components to provide data management infrastructure. From university after university, delegates heard tell of a slightly different combination of repository and related technologies, but all leading towards integrated systems intended to do the same basic things. The unsolved issues were much more about engaging researchers and encouraging them to use these infrastructures.

Jonathan Tedds (University of Leicester) got the workshop going in the first session with his turn as an (only partially acted) truculent scientist, lamenting the pointlessness of the extra bureaucracy and management interference that ‘research data management’ involved. It effectively brought home to the audience the many not-entirely-unreasonable objections that researchers have to new funder requirements, several of which concerned the matter of what counted as ‘research data’ and whether preserving certain data really had any value.

Over the past few years there has been much, often tedious, talk about ‘carrots and sticks’, and which approach is best in terms of encouraging compliance (the very word ‘compliance’ can have a curious effect on some). Conversations at RDMF12 steered very much towards engaging with researchers, encouraging them to see the benefits of data curation, and working together to understand and define what data would actually have value beyond the immediate project. Simon Hodson’s conclusion emphasized the need to select data sensibly when putting policies into practice, and so avoid the prospect of infuriating researchers whilst generating ‘digital landfill’. This is something we’ll now be following up at Oxford as we come up with clearer guidelines and support as to what data should actually be preserved by researchers in various disciplines, and why.