 STORING DATA: THE CURATOR'S CRUCIAL STEP

A data repository may bring to mind passive digital storage, a place where data simply sits and collects digital dust. This view is far from reality. Storing data in a repository is not merely a final administrative task; it is the process that preserves current scientific achievements for future generations. If data is not stored properly, then no matter how well a curator has cleaned the dataset, it will become obsolete and unusable within a few years.

How does a repository differ from cloud storage or a personal drive on one's computer? The difference lies in trustworthiness. A trustworthy data repository supports the FAIR Principles of Findability, Accessibility, Interoperability, and Reusability (Wilkinson et al., 2016), and so gives data a much longer lifespan than the research project that produced it.

My research made me realize that establishing trust requires standardization. Data should be stored where a persistent identifier such as a DOI is assigned and where good preservation metadata is kept. For example, many professional repositories use the PREMIS standard for this purpose. Unlike traditional descriptive metadata, PREMIS records digital provenance, including information about fixity checks, migration actions, and access rights. The Society of American Archivists (n.d.) describes PREMIS as the international de facto standard for digital preservation metadata. With this information, it becomes possible to establish whether a file has been changed since it was deposited.
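To make the idea of a fixity check concrete, here is a minimal sketch in Python. It assumes that the repository recorded a SHA-256 checksum when the file was deposited; the function names `sha256_of` and `fixity_ok` are my own illustrations and are not part of PREMIS or of any particular repository software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so that large datasets do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at deposit (hypothetical helper)."""
    return sha256_of(path) == recorded_digest.lower()
```

A curator could run `fixity_ok` on a schedule and log each result as a preservation event. A result of `False` would suggest that the file has changed or become corrupted since deposit.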

What surprised me was that storing data involves far more than putting files away. Cornell University Library's data curation practice follows a workflow called CURATE(D): Check, Understand, Request, Augment, Transform, Evaluate, and Document (Cornell Data Services, n.d.). Sometimes the curator needs to transform file formats or augment metadata so that future researchers can understand what the data means. Higgins (2018) explains that such active participation should be a regular part of the digital curation lifecycle, which shows that storage is much more than simply keeping files.

The repository chosen for storage should also be certified. The requirements of a reputable certification program (CoreTrustSeal, 2020) make clear that truly storing data means planning for long-term sustainability: the archive is audited, and it must have policies for disaster recovery, file integrity checking, and succession in case the host organization can no longer maintain the repository. According to Caplan (2021), preservation metadata standards such as PREMIS further support this process.

In conclusion, storage is one of the most important steps towards ensuring accountability in digital curation. My analysis makes it evident that data storage is not the end but the beginning of the afterlife of curated data.

References

Caplan, P. (2021). Understanding PREMIS. Library Resources & Technical Services, 65(1), 22–31.

CoreTrustSeal. (2020). CoreTrustSeal trustworthy data repositories requirements. CoreTrustSeal Board.

Cornell Data Services. (n.d.). Data curation service. Cornell University Library. Retrieved May 20, 2026, from https://data.research.cornell.edu/data-management/archiving-and-preservation/data-curation-services/

Higgins, S. (2018). The DCC Curation Lifecycle Model. International Journal of Digital Curation, 13(1), 235–245.

Society of American Archivists. (n.d.). Preservation metadata: Implementation strategies (PREMIS). In SAA Dictionary. Retrieved May 20, 2026, from https://dictionary.archivists.org/entry/preservation-metadata-implementation-strategies.html

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018.

Picture: The process of storing data


 

 
