The concept of the Data Vault service is to take research data that is no longer being actively used and archive it in long-term archival storage. To facilitate this, two processes need to take place as the data is prepared for storage in the vault:
Metadata needs to be provided with any package that is being archived, so that the data can be found and understood, and so that any compliance requirements (for example rights or retention) can be met correctly. Metadata needs to be applied at different levels, for example to the complete vault or container for a project, to the deposits made into that vault, and to individual files.
Rather than copying large structures of files into the archival storage, it has been decided to compile them into single packages. This means that only single files need to be stored, and each package can include extra information, such as checksums of the files it contains and a copy of the metadata. BagIt seems to be the obvious choice for this, and BagIt libraries are available in many programming languages.
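The idea of applying metadata at the vault, deposit and file levels can be sketched as a small data structure. This is a minimal Python sketch, not the project's design: the class names (`Vault`, `Deposit`, `FileRecord`), the metadata keys, and the rule that a more specific level overrides a more general one are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical: a single archived file with its own file-level metadata."""
    path: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Deposit:
    """Hypothetical: one deposit into a vault, with deposit-level metadata."""
    name: str
    metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)


@dataclass
class Vault:
    """Hypothetical: the container for a project, with vault-level metadata."""
    name: str
    metadata: dict = field(default_factory=dict)
    deposits: list = field(default_factory=list)


def effective_metadata(vault: Vault, deposit: Deposit, record: FileRecord) -> dict:
    """Merge metadata from vault, then deposit, then file.

    Assumption for illustration only: when the same key appears at more
    than one level, the more specific level (file > deposit > vault) wins.
    """
    merged = dict(vault.metadata)
    merged.update(deposit.metadata)
    merged.update(record.metadata)
    return merged


# Example: a file that overrides the vault's (illustrative) retention period.
vault = Vault("project-x", {"owner": "Dr A", "retention": "10 years"})
deposit = Deposit("2014-survey", {"description": "Survey responses"})
record = FileRecord("responses.csv", {"format": "text/csv", "retention": "20 years"})
deposit.files.append(record)
vault.deposits.append(deposit)
print(effective_metadata(vault, deposit, record))
```

Whether lower levels should override, extend, or be forbidden from contradicting vault-level metadata (for rights and retention in particular) is exactly the kind of question the metadata investigation would need to settle.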
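To show what a BagIt package actually contains, the following Python sketch builds a minimal bag by hand with only the standard library, following the layout in the BagIt specification (RFC 8493): a `bagit.txt` declaration, the payload files under `data/`, a `manifest-sha256.txt` listing a checksum for each payload file, and a `bag-info.txt` carrying descriptive metadata. It then compresses the whole bag into a single `.tar.gz` file for storage. The function names, the choice of SHA-256, and the use of tar/gzip as the single-file container are assumptions for this example; in practice one of the existing BagIt libraries (for example the Library of Congress's bagit-python) would be used, and this sketch omits details such as tag manifests and escaping of unusual file names.

```python
import hashlib
import shutil
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source: Path, bag_dir: Path, info: dict) -> Path:
    """Copy `source` into a minimal BagIt-style bag at `bag_dir` (sketch only)."""
    payload = bag_dir / "data"
    shutil.copytree(source, payload)  # payload files live under data/

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Payload manifest: one "<checksum> <path relative to bag root>" line per file.
    # Assumes file names contain no newline or '%' characters (no escaping done).
    lines = [
        f"{sha256_of(f)} {f.relative_to(bag_dir).as_posix()}"
        for f in sorted(p for p in payload.rglob("*") if p.is_file())
    ]
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # A copy of the descriptive metadata travels inside the package.
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{key}: {value}\n" for key, value in info.items()), encoding="utf-8"
    )
    return bag_dir


def package_bag(bag_dir: Path, archive: Path) -> Path:
    """Compress the whole bag into one .tar.gz file for archival storage."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(bag_dir, arcname=bag_dir.name)
    return archive


# Example: bag a small folder and package it as a single file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "project-files"
    src.mkdir()
    (src / "results.csv").write_bytes(b"id,value\n1,42\n")
    bag = make_bag(src, Path(tmp) / "deposit-bag", {"Source-Organization": "Example University"})
    archive = package_bag(bag, Path(tmp) / "deposit-bag.tar.gz")
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    with tarfile.open(archive) as tar:
        members = sorted(tar.getnames())
```

Because the manifest records a checksum for every payload file, the archived package can later be unpacked and each file re-hashed to confirm that nothing changed while it sat in storage.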
As with the evolving project plan, two openly editable documents have been created to discuss these issues. Please contribute if you have thoughts on either of them!
- Metadata investigation:
- Packaging investigation: