STORING DATA: THE CURATOR'S CRUCIAL STEP
A data repository may come to mind as passive digital storage, a mere place
for data to sit and collect digital dust. This view, however, is far from
reality. Storing data in a repository is not merely a final administrative
task but an important process that preserves current scientific achievements
for future generations. If data is not stored properly, then no matter how
well a curator cleans the dataset, it will become obsolete and useless within
several years.
How does a repository differ from cloud storage or a personal drive on one's
computer? The difference lies in trustworthiness. A trustworthy data
repository incorporates the FAIR Principles of Findability, Accessibility,
Interoperability, and Reusability (Wilkinson et al., 2016), thus giving data
a much longer lifespan than the initial research work.
The research I undertook made me realize that establishing trust requires
standardization. It is necessary to choose a place to store the data that
provides a persistent identifier, such as a DOI, along with good preservation
metadata. For example, most professional repositories use the PREMIS standard
for this purpose. Unlike traditional descriptive metadata, PREMIS tracks
digital provenance, which includes information about fixity checks, migration
actions, and access rights. As the Society of American Archivists (n.d.)
points out, PREMIS is the international de facto standard for digital
preservation metadata. This makes it possible to establish whether a document
has been changed after being deposited.
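To make the idea of a fixity check concrete, here is a minimal Python sketch. It assumes the repository stored a SHA-256 checksum when the file was deposited. The function names (sha256_of, fixity_check) and the fields of the returned record are my own hypothetical choices for illustration; they only loosely resemble a preservation event and are not the actual PREMIS schema.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path: Path, expected_digest: str) -> dict:
    """Compare the file's current digest with the one stored at deposit.

    The returned dict is a hypothetical, simplified event record,
    not a real PREMIS structure.
    """
    actual = sha256_of(path)
    return {
        "event_type": "fixity check",
        "event_date_time": datetime.now(timezone.utc).isoformat(),
        "object": path.name,
        "algorithm": "SHA-256",
        "outcome": "pass" if actual == expected_digest else "fail",
    }
```

A curator could call sha256_of once at deposit, keep the digest alongside the file, and run fixity_check on a schedule. A "fail" outcome would signal that the bytes no longer match what was originally deposited.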
What I found rather surprising was that storage involves much more than
simply keeping documents. Cornell University Library's data curation practice
follows a workflow called CURATE(D): Check, Understand, Request, Augment,
Transform, Evaluate, and Document (Cornell Data Services, n.d.). Sometimes
the curator needs to transform file formats or augment metadata so that
future researchers will understand what the data means. Higgins (2018)
explains that such active participation should be a regular part of the
digital curation cycle, which shows that the act of storage is far more than
simply saving files.
The repository used for data storage should also be certified. Reviewing the
criteria of a reputable certification program (CoreTrustSeal, 2020) makes it
clear that truly storing data means preparing for long-term sustainability:
the archive is audited on its policies for disaster recovery, file integrity
checking, and succession in case the host organization becomes unable to
maintain the repository. According to Caplan (2021), the process is further
facilitated by the use of preservation metadata standards such as PREMIS.
In conclusion, this stage can be considered one of the most important steps
towards ensuring accountability in digital curation. My analysis makes it
evident that data storage is not the end but the beginning of the afterlife
of curated data.
References
Caplan, P. (2021). Understanding PREMIS. Library Resources & Technical Services, 65(1), 22–31.
CoreTrustSeal. (2020). CoreTrustSeal trustworthy data repositories requirements. CoreTrustSeal Board.
Cornell Data Services. (n.d.). Data curation service. Cornell University Library. Retrieved May 20, 2026, from https://data.research.cornell.edu/data-management/archiving-and-preservation/data-curation-services/
Higgins, S. (2018). The DCC Curation Lifecycle Model. International Journal of Digital Curation, 13(1), 235–245.
Society of American Archivists. (n.d.). Preservation metadata: Implementation strategies (PREMIS). In SAA Dictionary. Retrieved May 20, 2026, from https://dictionary.archivists.org/entry/preservation-metadata-implementation-strategies.html
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018.
