Paleonet: A reply to an Open Letter in Support of Digital Data Archiving

jonathan antcliffe jonathanantcliffe at
Mon Mar 28 10:59:35 GMT 2011

Dear all

I very much endorse the ethic behind this campaign. However, I think that there
are many issues that must be carefully addressed if this campaign is to achieve
the good it is setting out to do. I feel very strongly that we should not rush
into this as a community, as tempting as the lure of data always is, but take
time to reflect on the wider implications that this proposal would have for our
science.


Firstly, the release of raw data during publication is always a risk to the
scientist who publishes. It is a risk because that data has potentially taken a
lot of time and money to produce, and it is almost certain that the authors do
not intend to use it solely in this one publication. Thus releasing data always
jeopardises future research plans, as others now have access to the data. If
data were made more usable than is already the case then this risk would
increase. I fear that this would greatly reduce the willingness of scientists to
produce interim reports or multiple papers (lowering citations for everyone and
making it harder for us all to compete for funding), so that they do not publish
until much more complete statements are ready. This could actually slow the rate
of publication and the ready availability of data. I think it is a real concern
that this will empower the richer and more established institutions at the
expense of those with less access to funding. To those with more funding the
loss of data to a competitor halfway through a project is less of a worry than
to those with less funding. It could similarly empower those in established
positions at the expense of those beginning their careers, particularly new research


Secondly, most of the data is already available and published. It is just not
published in the way that some argue is most convenient to them. Data
acquisition is a hugely time-consuming process, and I see no compelling argument
why those who spend their time generating detailed data by analysing material
all over the world should alter how they present data so that those doing
meta-analyses, who have not produced any original data themselves, can produce
papers far more rapidly and thereby outcompete their peers for jobs. This motion
as it currently stands could mandate an institutional bias in favour of the
re-analyser over the data generator.


Thirdly, the repeated generation of data is hugely positive and essential to the
scientific process; it is not a waste of government research funds. The open
letter makes much of the need for reanalysis, with which I agree wholeheartedly.
However, regeneration of data is even more important. There is no point
reanalysing bad data, and we don’t know if it is bad unless it can be
regenerated. Thus reanalysis should proceed from, not preclude, regenerating
data.


Fourthly, there is a serious issue regarding compulsion and how it relates to
data archiving space. Ultimately there is a finite amount of digital archiving
space available due to cost. Phylogenetic data tables are small, photographs not
as small, CAT scan data bigger, synchrotron data enormous. It cannot be the
position of a journal that you must archive images online (and will not be
allowed to publish until you do) unless your machine produced a data set that is
too large for the servers to cope with. In effect: we want all your photographs,
but if you work on synchrotrons then don’t worry about it. Again this
potentially amounts to an institutional bias, this time in favour of scientists
with large research grants who can afford to use large expensive machines,
against those with less research funding who do photography and drawing but
would then be compelled to hand it all over...


Fifthly, I agree with Jere Lipps that the arguments regarding funding have not
been properly explored and, further to his remarks, that there are also strong
implications here. Is data archiving something that we will have to cost into
grants, or is it always just going to be paid for centrally by the government
for all research produced in their country? We need to admit that palaeontology
is not swimming in funding in comparison to other sciences. So I must condemn
sentiments put forward when a signatory of the open letter made an argument at a
recent conference, stating that research councils should not have to repeatedly
fund the same work. I challenge anyone to find two projects funded within five
years of each other in palaeontology in the UK with the express same aims,
outcomes, and, critically, also a complete lack of mutual illumination.
Otherwise such funding is in the best scientific tradition of testability and
data regeneration. In comparison to most sciences we receive very little public
money, and I will not endorse anything that implies that we need even less
public money or that we are wasting public money. At a time of such sweeping
funding cuts it seems that such statements amount to us voluntarily putting our
head on the block. If we were to examine the number of citations per pound of
public money spent then I am sure that palaeontology would rank very favourably
against other sciences. Much research is done from private funding, whilst
living off small salaries or no salary at all. This raises a serious question of
the cost of archiving for those who have funded data acquisition out of their
own pocket. Should we then be compelled to pay the government/journals to take
ownership of data that we have privately funded and produced whilst being unable
to get hold of the small amounts of public money available?


Sixthly, there is no mention of the legion of data already published. Vast
amounts of it are still not easily available, even as PDF. This should be our
primary focus in terms of making use of centralised government money for
archiving. How much more useful would a PDF archive of published field guides
be, or an effort to translate works published outside of the English-speaking
world to make use of this enormous resource of knowledge? If such government
money were available for archiving we would need to think very carefully about
how we could gain maximum benefit from it as a community.


Seventhly, no reference is made to the elephant in the room, the access to fossil
or restricted field sites, though the vast majority are protected for very good
reasons related to conservation. This remains the real problem in the
availability of palaeontological data.







Jonathan Antcliffe 

Commission Research Fellow

of Earth Sciences

of Bristol

