Make your data count

Report from the Research Data Alliance Plenary Meeting no.4

Since the Research Data Alliance began in 2012 it has rapidly grown to become the forum for discussing and coordinating efforts to improve research data management practices between organisations and academic disciplines at the international scale. Whereas the International Digital Curation Conference provides the traditional forum for those working in the field of research data management to introduce new developments, tools, and ideas, the RDA Plenaries consist of a series of workshops by Interest Groups which may then spawn Working Groups of volunteers who are tasked with implementing specific projects. This format is obviously particularly well-suited to encouraging international collaboration and the development of generally-accepted standards. The success of the RDA is attested by the rapid growth of attendance at the plenary meetings: over 500 people attended the 4th Plenary in Amsterdam.

This was the first RDA plenary meeting that I had attended, and many others were new to the RDA too. Two disadvantages of the rapid growth and sheer scale of the event were that some delegates were inevitably not up to speed with many of the issues up for discussion, and that ten or so parallel sessions were going on at any one time. This made it hard to know what to attend and whether discussions would be building on prior work or effectively starting again from scratch. The quality of the workshops thus varied depending on the convenors and delegates that attended. A session on data management planning seemed to be curiously unaware of the tools already developed by the Digital Curation Centre, and a workshop about sustainability was slowed by delegates posing the same big questions that delegates to this sort of event have been posing for the last four years. But despite these inevitable frustrations, there was much of interest.

The conference began with EU assurances of concern and of money. Robert Jan Smits (Director-General of Research and Innovation for the European Commission) indicated that the reproducibility of research is at present only 10% to 30% and it was clear that scientists needed to start treating their data with the same care as their publications if this situation were to be improved. A video presentation by Neelie Kroes (departing Vice President of the European Commission responsible for the Digital Agenda for Europe) was followed by Professor Barend Mons’s keynote considering ‘FAIR’ (Findable, Accessible, Interoperable, Reusable) data and other issues du jour.

The keynote was perhaps not as controversial as billed, although it did raise some points of contention that were returned to throughout the conference. Firstly, Mons chose to advocate the term ‘data stewardship’ over ‘data management’, although to my mind the latter (which involves the researcher or creator of the data) is necessary for the former (where some sort of specialist custodian assumes responsibility) to be possible; secondly, he took the approach that the research paper should be considered supplementary to the data rather than vice versa; thirdly, he placed a major emphasis on the machine readability of data, in particular the encoding of every single scientific assertion as an RDF triple with provenance and licence information. This may have been informed by Mons’s background in bioinformatics. To me, coming from a background in the humanities, it sounded utopian. On a more pragmatic note, Mons added that he thought 5% of total research funding should go to data stewardship.

Keynote over, the assembled throng dispersed to the many parallel working groups, interest groups, and birds of a feather sessions. I opted for interest groups on service management, sustainability, and active data management plans. Of these, the first was the most interesting, despite including rather a large element of sales pitch. There has been little discussion so far amongst the research data management community about how to actually run services, partly because most of the effort has been focussed on establishing what services are needed rather than how to run them. The conveners emphasised the importance of having a service management system in place before a service becomes well established, although the pitch was aimed more at services being offered by research groups than by institutional service departments such as IT Services or Libraries, where solid service management practices are (or should be) already in place. Whilst the session was informative, I couldn’t help but feel that in most cases research groups might be better advised to speak to their IT Services about setting up supported services rather than simply doing it themselves only to find they lacked the time or reward structures to manage and maintain them. On a related note, the Sustainability Interest Group, which was concerned mostly with software and eResearch infrastructure, exposed a significant gap in understanding and expectations regarding the provision of tools developed at the research group level and institutionally-supported tools and services – a gap that IT Services departments should perhaps be trying to close by getting more closely involved with researchers.

Tuesday’s keynote by Christine Borgman (UCLA) began by taking issue with Barend Mons’s keynote the day before, insisting that ‘publications are not simply containers for data’ but rather arguments that are supported by data. Furthermore, she stressed the nature of data as compound objects with frequently uncertain ownership, and criticised metadata models as frequently being too ‘heavyweight’ for individual researchers to use. Approaching research data management from a librarian’s perspective, she criticised popular online tools such as FigShare for their short-term business models and lack of sustainability. Her talk underscored some of the differences in attitudes and approaches towards data management and curation in different disciplines, stating memorably that in the humanities many researchers regard their data almost as a ‘dowry’ that they can bring to a new institution when they move. I should state that in practice I’ve spoken to few researchers who think of their data in quite such terms.

The Interest Group on the legal interoperability of research data proved more interesting than I thought it might do. The issue of ensuring that data can be combined and reused across various jurisdictions is very important given the international nature of much research these days, but it’s not a subject I’d heard addressed before. This practical-minded interest group had already been working on a set of principles to govern interoperability, which were discussed in some detail. The emphasis of the work was on intellectual property rather than licences, as licences are (it seems) essentially enforceable under copyright rather than as contractual agreements (at least in the U.S.). I may not have followed all of the legal niceties in the discussion, but I shall certainly follow the outcomes of the 1-day workshop that select members of the group will stage in Washington DC on the 21st October.

Before attending the RDA Plenary, I had only really been actively following the discussions of the Long-tail of Data Interest Group. Their double session consisted of case studies and discussion of practical steps to help the international research data management community. Lots of recommendations for researchers and institutions arose from the case studies, including: training students in data sharing; data CVs and impact portfolios; integrated research data management one-stop-shops (which Oxford already has to some extent); supporting platforms for sharing failed experiments; providing data visualisation tools and platforms; integrating data management tools into researchers’ day-to-day workflows; providing free data storage (up to a point); providing institutional data management plan templates; advertising the existence of data that’s already being shared; establishing academic community data-sharing interest groups; filming and sharing data ‘success stories’; ensuring links between papers and data are made and maintained; and league tables of data deposits by academic departments within institutions, to encourage an element of competition. Suggestions for collective ‘working group’ projects to assist the community included developing plug-ins for popular software already used by researchers in order to facilitate data sharing and easy deposit to data archives, and improving awareness of the tools and services that already exist. I seem to have volunteered myself to collate information about existing tools and share it more broadly with the Interest Group, which I shall try to make a start on over the next few months.

On the final morning of the conference I attended the science stream workshop on knowledge networks, which was interesting albeit rather too theoretical for my taste. The presentations and conversations focussed on the possibility of creating a new sort of science gateway which would maximise utility for researchers by balancing the scope of search engines such as Google with the relevance of specialist disciplinary gateways. It will be good if it works.

The final plenary session consisted of a panel discussion featuring the great and good of various nations to emphasize the need for international action and collaboration to address the challenges of research data management. A brief but lively discussion about the extent to which researchers should share resources with the private sector ensued, but did not lead anywhere in particular. With a blast of the Beach Boys and a video extolling the delights of San Diego, the conference closed with the message that the next RDA Plenary would be held in California in March 2015. All things considered, attending the conference was well worthwhile, although in future it might be useful to provide a little more information about what each parallel session will cover in order to help direct delegates to the most relevant sessions.