
STORING DATA: THE CURATOR'S CRUCIAL STEP

A data repository may be imagined as passive digital storage, a place where data simply sits and collects digital dust. This view, however, is far from reality. Storing data in a repository is not merely a final administrative task but an active process that preserves current scientific achievements for future generations. If data is not stored properly, then no matter how carefully a curator has cleaned the dataset, it can become obsolete and unusable within a few years.

How does a repository differ from cloud storage or a personal drive on one's computer? The difference lies in trustworthiness. A trustworthy data repository applies the FAIR Principles of Findability, Accessibility, Interoperability, and Reusability (Wilkinson et al., 2016), giving the data a lifespan much longer than that of the original research project.

My research made me realize that establishing trust requires standardization. Data should be deposited in a repository that assigns a persistent identifier, such as a DOI, and maintains good preservation metadata. Many professional repositories, for example, use the PREMIS standard for this purpose. Unlike traditional descriptive metadata, PREMIS records digital provenance, including fixity checks, migration actions, and rights information. As the Society of American Archivists (n.d.) points out, PREMIS is the de facto international standard for digital preservation metadata. Because fixity information is recorded, it becomes possible to establish whether a file has been changed since it was deposited.
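To make the idea of a fixity check more concrete, here is a minimal Python sketch, assuming a repository records a SHA-256 digest when a file is deposited and recomputes it later. The function names `sha256_of` and `fixity_event` are hypothetical, and the returned dictionary only borrows a few PREMIS semantic unit names (`eventType`, `messageDigestAlgorithm`, `messageDigest`, `eventOutcome`); it is an illustration, not a schema-valid PREMIS record.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_event(path: Path, expected_digest: str) -> dict:
    """Recompute the digest and compare it with the one recorded at deposit.

    Returns a simplified, PREMIS-inspired event dict (hypothetical structure,
    not schema-valid PREMIS XML).
    """
    actual = sha256_of(path)
    return {
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "object": str(path),
        "messageDigestAlgorithm": "SHA-256",
        "messageDigest": actual,
        "eventOutcome": "pass" if actual == expected_digest else "fail",
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "survey.csv"
        data_file.write_text("id,value\n1,42\n")
        recorded = sha256_of(data_file)  # digest stored at deposit time
        print(fixity_event(data_file, recorded)["eventOutcome"])  # pass
        data_file.write_text("id,value\n1,43\n")  # simulate an unnoticed change
        print(fixity_event(data_file, recorded)["eventOutcome"])  # fail
```

Running the script should print `pass` and then `fail`, because the second check happens after the file's contents were altered. In a real repository such events would be stored with the object's provenance record rather than printed.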

What surprised me was that storage involves far more than simply placing files in a repository. Cornell University Library's data curation practice, for instance, follows the CURATE(D) steps: Check, Understand, Request, Augment, Transform, Evaluate, and Document (Cornell Data Services, n.d.). A curator may need to transform file formats or augment metadata so that future researchers can understand what the data means. Higgins (2018) explains that such active intervention is a regular part of the digital curation lifecycle, which shows that storage is much more than keeping files.

The repository should also be certified. Reviewing the requirements of a reputable certification program (CoreTrustSeal, 2020) makes clear that truly storing data means planning for long-term sustainability: the archive must be audited and must have policies for disaster recovery, file integrity checking, and succession in case the host organization can no longer maintain the repository. According to Caplan (2021), preservation metadata standards such as PREMIS further support this process.
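The audit and file-integrity requirements can likewise be illustrated with a small Python sketch. It assumes a hypothetical manifest file, kept outside the archived directory, with one `digest  relative/path` line per file, a layout similar to BagIt payload manifests. The names `write_manifest`, `files_under`, and `audit` are illustrative only and are not part of any certification program or tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (whole file read at once)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_under(root: Path) -> dict:
    """Map each file's POSIX-style relative path to its Path object."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}


def write_manifest(root: Path, manifest: Path) -> None:
    """Write one 'digest  relative/path' line per file, sorted by path."""
    lines = [f"{sha256_of(p)}  {rel}" for rel, p in sorted(files_under(root).items())]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def audit(root: Path, manifest: Path) -> dict:
    """Compare the files now under root with the digests in the manifest."""
    expected = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            digest, rel = line.split("  ", 1)  # a hex digest never contains spaces
            expected[rel] = digest
    present = files_under(root)
    return {
        "missing": sorted(rel for rel in expected if rel not in present),
        "altered": sorted(
            rel for rel, p in present.items()
            if rel in expected and sha256_of(p) != expected[rel]
        ),
        "unexpected": sorted(rel for rel in present if rel not in expected),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "archive"
        root.mkdir()
        (root / "a.csv").write_text("x\n1\n")
        (root / "b.txt").write_text("readme\n")
        manifest = Path(tmp) / "manifest-sha256.txt"  # stored outside the archive
        write_manifest(root, manifest)
        (root / "a.csv").write_text("x\n2\n")  # simulated corruption
        (root / "b.txt").unlink()  # simulated loss
        print(audit(root, manifest))
        # {'missing': ['b.txt'], 'altered': ['a.csv'], 'unexpected': []}
```

Keeping the manifest separate from the data matters in this design: if it sat inside the audited directory it would be reported as an unexpected file, and if it were lost together with the data there would be nothing left to audit against.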

In conclusion, storage can be considered one of the most important steps toward ensuring accountability in digital curation. My analysis shows that data storage is not the end of the curation process but the beginning of the curated data's afterlife.

References

Caplan, P. (2021). Understanding PREMIS. Library Resources & Technical Services, 65(1), 22–31.

CoreTrustSeal. (2020). CoreTrustSeal trustworthy data repositories requirements. CoreTrustSeal Board.

Cornell Data Services. (n.d.). Data curation service. Cornell University Library. Retrieved May 20, 2026, from https://data.research.cornell.edu/data-management/archiving-and-preservation/data-curation-services/

Higgins, S. (2018). The DCC Curation Lifecycle Model. International Journal of Digital Curation, 13(1), 235–245.

Society of American Archivists. (n.d.). Preservation metadata: Implementation strategies (PREMIS). In SAA Dictionary. Retrieved May 20, 2026, from https://dictionary.archivists.org/entry/preservation-metadata-implementation-strategies.html

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018.

Picture: The process of storing data