
USING AND REUSING DATA

At first, I thought that data curation involved only tasks such as keeping data secure and backed up and migrating data formats when necessary. I was completely off the mark. Data stored on a server is of no value if no one can access it.

Data usage

Data usage is fairly straightforward. A biologist analysing her own field observations is one case; a student retrieving a dataset and comparing it with the data presented in a study is another. Data reuse is more complicated and more challenging. Reusing data means taking data from another source and applying it to a problem the original producers of the dataset did not have in mind. Examples include using census data for migration analysis or combining three clinical studies in a meta-analysis.

Data reuse

According to Lee & Stvilia (2017), the majority of users engage in activities such as searching, browsing, and downloading content from repositories. Some repositories provide further functions, including social networking tools for sharing datasets via Facebook or Twitter, full-text search, and bookmarking. One repository offers a user-created “bookshelf” where users save their favorite datasets, which proved quite useful in teaching.

However, many repositories do little beyond offering downloads. In their study, Lee & Stvilia (2017) reported that at least one repository staff member noted that the repository had no built-in analytics tools. It simply provides the data and leaves users to analyse it on their own systems.

Reuse, moreover, does not happen automatically. Borgman, Wallis & Enyedy (2007) studied reuse among habitat ecologists and found that successful reuse depends on an understanding of everyday scientific practices. It does not work by merely tossing data over the wall.

Challenges

As Whyte & Wilson (2010) note, keeping everything makes the data less accessible. Poor metadata causes headaches for repositories, as described in Lee & Stvilia's (2017) study. Researchers have to misuse basic Dublin Core fields because no other fields are available to fit their needs. One curator spends 80 percent of their time tracking down documentation about the data.
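To make the metadata problem concrete, here is a minimal Python sketch. It is not drawn from any of the cited studies. It assumes a hypothetical record that uses a few simple Dublin Core element names (title, creator, date, description) and shows how domain details such as units end up packed into the free-text description, where software cannot reliably find them. The record values and the helper names `find_field` and `parse_description` are invented for illustration.

```python
# Hypothetical dataset record using a few simple Dublin Core elements.
# All values are invented for illustration.
record = {
    "title": "Stream temperature observations",
    "creator": "Example Researcher",
    "date": "2015",
    "description": "Units: degrees Celsius; Instrument: handheld probe; Site: upper creek",
}


def find_field(record, name):
    """Return the value of a named field, or None if it is not a field of its own."""
    return record.get(name)


def parse_description(text):
    """Best-effort recovery of 'Key: value' pairs packed into a free-text description."""
    pairs = {}
    for part in text.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            pairs[key.strip().lower()] = value.strip()
    return pairs


# The units are not a field of their own, so a direct lookup finds nothing.
units_direct = find_field(record, "units")

# They can only be recovered by guessing at the layout of the description.
units_recovered = parse_description(record["description"]).get("units")
```

The recovery step works only because this description happens to follow a "Key: value" pattern. A researcher who writes the same information as an ordinary sentence leaves the curator to find it by hand.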

What, then, makes the difference? DOIs help researchers cite datasets correctly. One repository gives citation guidance for every dataset; another lets researchers create profiles that track the downloads and usage of their datasets. The DCC Curation Lifecycle Model (Higgins, 2008) recognizes this by treating "access, use, and reuse" as an activity that runs throughout the lifecycle rather than one done at the end.
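As an illustration of what per-dataset citation guidance might look like, the following Python sketch builds a citation string from basic metadata. It is an assumption-laden example, not any repository's actual method: the function name `format_dataset_citation`, the loosely APA-like layout, and the sample values (including the placeholder DOI `10.1234/example`) are all hypothetical.

```python
def format_dataset_citation(creators, year, title, repository, doi):
    """Build a loosely APA-like citation string for a dataset.

    creators is a list of names already in 'Family, I.' form.
    """
    if len(creators) > 1:
        # Join all but the last name with commas, then add '& last name'.
        authors = ", ".join(creators[:-1]) + ", & " + creators[-1]
    else:
        authors = creators[0]
    return f"{authors} ({year}). {title} [Data set]. {repository}. https://doi.org/{doi}"


# Hypothetical metadata; the DOI is a placeholder, not a real identifier.
citation = format_dataset_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2016,
    title="Example stream survey",
    repository="Example Repository",
    doi="10.1234/example",
)
print(citation)
```

Generating the string from the same metadata the repository already holds means the citation and the DOI stay consistent with the dataset record.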

Conclusion

The repositories that actually work involve researchers in metadata creation from the beginning. This includes giving researchers preview environments in which they can see in advance how their descriptions will appear. Successful repositories are judged not by the amount of data they store but by how useful that data becomes. Storing data that no one uses is nothing but digital hoarding.

References

Borgman, C. L., Wallis, J. C., & Enyedy, N. (2007). Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries, 7(1–2), 17–30.

Higgins, S. (2008). The DCC Curation Lifecycle Model. International Journal of Digital Curation, 3(1), 134–140.

Lee, D. J., & Stvilia, B. (2017). Practices of research data curation in institutional repositories: A qualitative view from repository staff. PLoS ONE, 12(3), e0173987.

Tenopir, C., Birch, B., & Allard, S. (2012). Academic libraries and research data services. Association of College and Research Libraries.

Whyte, A., & Wilson, A. (2010). How to appraise and select research data for curation. Digital Curation Centre.