Data sharing - privacy versus speed of access

In a paper published in EPJ Data Science, Barbara Jasny, deputy editor for commentary at Science magazine in Washington, DC, USA, looks at the history of the debates surrounding data access during and after the human genome "war".

In this context, she outlines current challenges in accessing information affecting research, particularly with regard to the social sciences, personalised medicine and sustainability.

The trouble is that most researchers do not currently share their data. This is due to both research practices and research culture. Scientists who withhold data put forward various justifications.

These include the prohibitive amount of work involved, the need to withhold data prior to publication to retain a competitive advantage, and constraints attached to raw data received under confidentiality agreements.

The author focuses particularly on data sharing during the human genome sequencing race. The competition to present the first complete sequence of a human genome was perceived at the time as a battle. Jasny frames it as pitting 'free genome data access advocates', government-controlled scientists funded primarily by the NIH and the UK Wellcome Trust, against the US company Celera, which she says intended to exploit the data commercially. It's a bit of revisionist history. To most, it was slow government versus the nimble private sector, and only when the government sector was being beaten despite spending 10 times as much did it agree to jointly take credit. Celera would not have done that if it had intended to 'exploit' the genome in the way it is being characterized here.

Data access battles intensified further after the publication of the draft genome in 2000. Although the public research initiative made its data available, there were still government-mandated conditions on publishing research results based on that data, just like today. The data thus became truly free to use only after some delay, again just like today.

Jasny concludes that two forces are currently shaping the research community: first, the need to protect the privacy of individuals' information; and second, the push towards open access to data, which public funding agencies are increasingly mandating, with the same delay as was present in 2000, and assuming the members of the president's party in Congress don't continue to block open science.