{"id":827,"date":"2013-12-20T10:39:40","date_gmt":"2013-12-20T10:39:40","guid":{"rendered":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/?p=827"},"modified":"2022-03-09T16:31:53","modified_gmt":"2022-03-09T16:31:53","slug":"thinking-about-a-data-vault","status":"publish","type":"post","link":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/2013\/12\/20\/thinking-about-a-data-vault\/","title":{"rendered":"Thinking about a Data Vault"},"content":{"rendered":"<p>[Reposted from\u00a0<a href=\"https:\/\/libraryblogs.is.ed.ac.uk\/blog\/2013\/12\/20\/thinking-about-a-data-vault\/\">https:\/\/libraryblogs.is.ed.ac.uk\/blog\/2013\/12\/20\/thinking-about-a-data-vault\/<\/a>]<\/p>\n<p>In a recent blog post, we looked at the <a title=\"The four quadrants of research data curation systems\" href=\"https:\/\/libraryblogs.is.ed.ac.uk\/2013\/12\/06\/the-four-quadrants-of-research-data-curation-systems\/\">four quadrants of research data curation systems<\/a>.\u00a0 This categorised systems that manage or describe research data assets by whether their primary role is to store metadata or data, and whether the information is for private or public use.\u00a0 Four systems were then put into these quadrants.\u00a0 We then started to investigate further the requirements of a Data Asset Register in <a title=\"Thinking about Research Data Asset Registers\" href=\"https:\/\/libraryblogs.is.ed.ac.uk\/2013\/12\/12\/thinking-about-research-data-asset-registers\/\">another blog post<\/a>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-1853\" src=\"https:\/\/libraryblogs.is.ed.ac.uk\/wp-content\/uploads\/2013\/12\/quadrants3.png\" alt=\"quadrants3\" width=\"480\" height=\"360\" \/><\/p>\n<p>This blog post will look at the requirements and characteristics of a Data Vault, and how this component fits into the data curation system landscape.<\/p>\n<p><strong>What?<\/strong><\/p>\n<p>The first aspect to consider is what exactly is a Data Vault?\u00a0 For the 
purposes of this blog post, we\u2019ll simply consider it to be a safe, private store of data that is only accessible by the data creator or their representative.\u00a0 For simplicity, it could be considered very similar to a safety deposit box within a bank vault.\u00a0 However, beyond the basic concept, this analogy starts to break down quite quickly, as we\u2019ll discuss later.<\/p>\n<p><strong>Why?<\/strong><\/p>\n<p>There are different use cases where a Data Vault would be useful.\u00a0 A few are described here:<\/p>\n<ul>\n<li>A paper has been published, and according to the research funder\u2019s rules, the data underlying the paper must be made available upon request.\u00a0 It is therefore important to store a date-stamped <a href=\"http:\/\/www.ed.ac.uk\/schools-departments\/records-management-section\/records-management\/staff-guidance\/retention-schedules\/golden-copy\">golden-copy<\/a> of the data associated with the paper.\u00a0 Even if the author\u2019s own copy of the data is subsequently modified, the data at the point of publication is still available.<\/li>\n<li>Data containing personal information, perhaps medical records, needs to be stored securely.\u00a0 The data is \u2018complete\u2019 and unlikely to change, yet hasn\u2019t reached the point where it should be deleted.<\/li>\n<li>Data analysis of a data set has been completed, and the research finished.\u00a0 The data may need to be accessed again, but is unlikely to change, so needn\u2019t be stored in the researcher\u2019s active data store.\u00a0 An example might be a set of completed crystallography analyses, which, whilst still useful, will not need to be re-analysed.<\/li>\n<li>Data is subject to retention rules and must be kept securely for a given period of time.<\/li>\n<\/ul>\n<p><strong>How?<\/strong><\/p>\n<p>Clearly, the storage characteristics for a Data Vault are different to an open data repository or active data filestore for working data.\u00a0 The following is a list of some of 
the characteristics a Data Vault will need, or could use:<\/p>\n<ul>\n<li>Write-once file system: The file system should only allow each file to be written once.\u00a0 Deleting a file should require extra effort so that it is hard to remove files.<\/li>\n<li>Versioning:\u00a0 If the same file needs to be stored again, then rather than overwriting the existing file, it should be stored alongside the file as a new version.\u00a0 This should be an automatic function.<\/li>\n<li>File security: Only the data owner or their delegate can access the data.<\/li>\n<li>Storage security: The Data Vault should only be accessible through the local university network, not the wider Internet.\u00a0 This reduces the vectors of attack, which is important given the potential sensitivity of the data contained within the Data Vault.<\/li>\n<li>Additional security: Encrypt the data, either via key management by the depositors, or within the storage system itself?<\/li>\n<li>Upload and access:\u00a0 Options include via a web interface (issues with very large files), special shared folders, dedicated upload facilities (e.g. GridFTP), or an API for integration with automated workflows.<\/li>\n<li>Integration: How would the Data Vault integrate with the <a href=\"https:\/\/libraryblogs.is.ed.ac.uk\/blog\/2013\/12\/12\/thinking-about-research-data-asset-registers\/\">Data Asset Register<\/a>?\u00a0 Could the register be the main user interface for accessing the Data Vault?<\/li>\n<li>Description:\u00a0 What level of description, or metadata, is required for data sets stored in the Data Vault, to ensure that they can be found and understood in the future?<\/li>\n<li>Assurance:\u00a0 Facilities to ensure that the file uploaded by the researcher is intact and correct when it reaches the vault, and periodic checks to ensure that the file has not become corrupted.\u00a0 What about more active preservation functions, including file format migration to keep files up to date (e.g. 
convert Word 95 documents to Word 2013 format)?<\/li>\n<li>Speed:\u00a0 Can the file system be much slower, perhaps a Hierarchical Storage Management (HSM) system that stores frequently accessed data on disk, but relegates older or less frequently accessed data to slower storage mechanisms such as tape?\u00a0 Access might then be slow (it takes a few minutes for the data to be automatically retrieved from the tape) but the cost of the service is much lower.<\/li>\n<li>Allocation: How much allocation should each person be given, or should it be unrestricted so as to encourage use?\u00a0 What about costing for additional space?\u00a0 Costings may be hard, because if the data is to be kept in perpetuity, then whole-life costing will be needed.\u00a0 If allocation is free, how to stop it being used for routine backups of data rather than golden-copy data?<\/li>\n<li>Who:\u00a0 Who is allowed access to the Data Vault to store data?<\/li>\n<li>Review periods:\u00a0 How to remind data owners what data they have in the Data Vault so that they can review their holdings, and remove unneeded data?<\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-2175\" src=\"https:\/\/libraryblogs.is.ed.ac.uk\/wp-content\/uploads\/2013\/12\/Data-Vault.png\" alt=\"Data Vault\" width=\"480\" height=\"469\" \/><\/p>\n<p>Feedback on these issues and discussion points is very welcome!\u00a0 We will post further updates on this blog as these services develop.<\/p>\n<p>Image available from\u00a0<a href=\"http:\/\/dx.doi.org\/10.6084\/m9.figshare.873617\">http:\/\/dx.doi.org\/10.6084\/m9.figshare.873617<\/a><\/p>\n<p>Tony Weir, Head of Unix Section, IT Infrastructure<br \/>\nStuart Lewis, Head of Research and Learning Services, Library &amp; University Collections.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[Reposted from\u00a0https:\/\/libraryblogs.is.ed.ac.uk\/blog\/2013\/12\/20\/thinking-about-a-data-vault\/] In a recent blog 
post, we looked at the four quadrants of research data curation systems.\u00a0 This categorised systems that manage or describe research data assets by whether their primary role is to store metadata or data, and &hellip; <a href=\"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/2013\/12\/20\/thinking-about-a-data-vault\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":175,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false},"categories":[10],"tags":[40,57,62,92],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/posts\/827"}],"collection":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/users\/175"}],"replies":[{"embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/comments?post=827"}],"version-history":[{"count":1,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/posts\/827\/revisions"}],"predecessor-version":[{"id":3799,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/posts\/827\/revisions\/3799"}],"wp:attachment":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/media?parent=827"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/categories?post=827"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/datablog\/wp-json\/wp\/v2\/tags?post=827"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}