Selecting Research Data for Curation

By Cosmas Fletcher Mbewe

As a novice in research data management, I naively assumed that saving all data would require nothing more than buying additional hard drives. Lee and Stvilia (2017) soon shattered that illusion. In their qualitative study of institutional repository staff, they make it clear that while well-endowed universities are willing to store just about any form of data, there is no established process for assessing which types of data should be kept for the long term. For instance, one interviewee reported that the university had a ten-year retention policy, but none of the datasets had yet reached that mark.

What, then, prevents us from keeping everything? Whyte and Wilson (2010) present an important counterargument to my earlier "storage is cheap" view: while the cost of storage keeps falling, the expenses of metadata creation, backups, and maintenance remain considerable. The DCC Curation Lifecycle Model described by Higgins (2008) helped me see that appraisal should underpin every curatorial practice.

The next question is what makes data valuable. Whyte and Wilson (2010) suggest criteria such as relevance to the organization's mission, scientific importance, uniqueness, redistributability, non-replicability, economic value, and documentation. However, according to Lee and Stvilia (2017), many repositories work with informal ReadMe metadata, spending up to 70%–80% of their budget on this. Similarly, Tenopir, Birch, and Allard (2012) found gaps between the needs of researchers and the services offered by libraries. These are not purely technological issues; they also reflect the clash between our aspirations and what is actually possible.
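To make the criteria above more concrete, here is a minimal sketch of how an appraisal checklist might be recorded. It is only an illustration under my own assumptions: the class name `Appraisal`, the identifier spellings, and the example dataset name are hypothetical, and neither Whyte and Wilson (2010) nor any repository prescribes this form.

```python
from dataclasses import dataclass, field

# Criteria as summarised in the paragraph above (after Whyte and Wilson, 2010).
# The identifier spellings are my own, chosen only for this sketch.
CRITERIA = (
    "relevance_to_mission",
    "scientific_importance",
    "uniqueness",
    "redistributability",
    "non_replicability",
    "economic_value",
    "documentation",
)


@dataclass
class Appraisal:
    """Hypothetical record of which criteria reviewers judged a dataset to meet."""

    dataset: str
    met: set = field(default_factory=set)

    def unmet(self):
        # Criteria still needing discussion, in the order listed above.
        return [c for c in CRITERIA if c not in self.met]

    def summary(self):
        # A prompt for human discussion, not an automatic keep/discard verdict.
        count = len(self.met & set(CRITERIA))
        pending = ", ".join(self.unmet()) or "none"
        return f"{self.dataset}: {count}/{len(CRITERIA)} criteria met; to review: {pending}"


# Example with a made-up dataset name.
appraisal = Appraisal("field_survey_2016", {"uniqueness", "documentation"})
print(appraisal.summary())
```

The sketch deliberately stops at listing what remains to be reviewed. The judgment itself stays with the people doing the appraisal.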

Most alarming of all is the lack of expertise. Subject specialists were found in only five of the thirteen organizations studied by Lee and Stvilia (2017). How could a central unit without such expertise possibly judge research data from different disciplines? According to Borgman, Wallis, and Enyedy (2007), it is critical to understand the context in which scientific communities produce data. I strongly believe that data appraisal needs to be a joint effort among researchers, librarians, and subject specialists. Data citations and usage rates can inform the decision but should not determine it.

To preserve everything is utopian; to preserve nothing is a disaster. My point is that sound data appraisal is a precondition for digital preservation that holds up in the long run. Without proper appraisal procedures, any attempt to preserve research records may prove disastrous.

References

Borgman, C. L., Wallis, J. C., & Enyedy, N. (2007). Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries, 7(1-2), 17–30.

Higgins, S. (2008). The DCC Curation Lifecycle Model. International Journal of Digital Curation, 3(1), 134–140.

Lee, D. J., & Stvilia, B. (2017). Practices of research data curation in institutional repositories: A qualitative view from repository staff. PLoS ONE, 12(3), e0173987.

Tenopir, C., Birch, B., & Allard, S. (2012). Academic libraries and research data services. Association of College and Research Libraries.

Whyte, A., & Wilson, A. (2010). How to appraise and select research data for curation. Digital Curation Centre.


Picture: depicting a thorough scrutiny of data to be appraised


