Selecting Research Data for Curation
By Cosmas Fletcher Mbewe
As a novice in research data
management, one of my naive assumptions was that saving all data would require
nothing more than purchasing additional hard drives. Lee and Stvilia (2017),
however, soon shattered that illusion. In their interviews with institutional
repository staff, they make it clear that while well-endowed
universities are willing to store just about any form of data, there is no
established process for assessing which types of data should be stored in the
long term. For instance, one interviewee reported that the university had a
ten-year retention policy, but none of the datasets had ever reached that mark.
So what prevents us from keeping
everything? Whyte and Wilson (2010) present an important counterargument
to my earlier "storage is cheap" view: while the cost of storage
keeps falling, the expenses related to metadata generation, backups, and
maintenance remain
considerable. The DCC Curation Lifecycle Model created by Higgins (2008) has
helped me see that appraisal should be the basis of every curatorial practice.
The next question is what makes data
valuable. Whyte and Wilson (2010) suggest criteria such as
relevance to the organization's mission, scientific
importance, uniqueness, redistributability, non-replicability, economic value,
and documentation. However, according to Lee and Stvilia (2017), many
repositories rely on informal ReadMe metadata, spending as much as 70%–80% of
their budget on this. Similarly, Tenopir, Birch, and Allard (2012) found
gaps between the needs of researchers and the services offered by libraries.
These are not purely technological issues; they also reveal a clash
between our aspirations and what is actually possible.
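To make the criteria above more concrete, here is a minimal sketch of how a review panel might record them as a yes/no checklist for one dataset. The field names follow the criteria attributed to Whyte and Wilson (2010) in this post; the class name, the function name, and the example answers are my own hypothetical choices, not something taken from their guide. The sketch deliberately produces no score or verdict, because the decision stays with the people doing the appraisal.

```python
from dataclasses import dataclass, fields


@dataclass
class AppraisalChecklist:
    """Yes/no answers for one dataset (hypothetical structure, for illustration)."""
    relevance_to_mission: bool
    scientific_importance: bool
    uniqueness: bool
    redistributability: bool
    non_replicability: bool
    economic_value: bool
    documentation: bool


def summarize(checklist):
    """Split the criteria into those met and those not met, in field order."""
    met = [f.name for f in fields(checklist) if getattr(checklist, f.name)]
    unmet = [f.name for f in fields(checklist) if not getattr(checklist, f.name)]
    return {"met": met, "unmet": unmet}


# Made-up answers for an imaginary dataset, not real appraisal results.
example = AppraisalChecklist(
    relevance_to_mission=True,
    scientific_importance=True,
    uniqueness=True,
    redistributability=False,
    non_replicability=True,
    economic_value=False,
    documentation=False,
)

summary = summarize(example)
print("Criteria met:", ", ".join(summary["met"]))
print("Needs discussion:", ", ".join(summary["unmet"]))
```

A checklist like this only structures the conversation. The unmet items are prompts for researchers, librarians, and subject specialists to discuss, not automatic grounds for rejecting a dataset.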
Most alarming of all is the lack of
expertise. Subject specialists were found in only five of the thirteen organizations
studied by Lee and Stvilia (2017). How could a central organization without
such expertise possibly make judgments on research data in different
disciplines? According to Borgman, Wallis, and Enyedy (2007), it is critical to
understand the context within which the scientific communities produce data. I
strongly believe that data appraisal needs to be a joint effort among
researchers, librarians, and subject specialists. Data citations and usage
rates can inform the decision but should not determine it.
To preserve everything is utopian;
to preserve nothing is a disaster. My point is that sound data appraisal is a
precondition for digital preservation that holds up in the long run. Without proper appraisal procedures, any attempt to preserve research data may well fail.
References
Borgman, C. L., Wallis, J. C., & Enyedy, N. (2007). Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries, 7(1–2), 17–30.
Higgins, S. (2008). The DCC Curation Lifecycle Model. International Journal of Digital Curation, 3(1), 134–140.
Lee, D. J., & Stvilia, B. (2017). Practices of research data curation in institutional repositories: A qualitative view from repository staff. PLoS ONE, 12(3), e0173987.
Tenopir, C., Birch, B., & Allard, S. (2012). Academic libraries and research data services. Association of College and Research Libraries.
Whyte, A., & Wilson, A. (2010). How to appraise and select research data for curation. Digital Curation Centre.
