SWORD stands for “Simple Web-service Offering Repository Deposit” and is a “profile” of AtomPub (RFC 5023) which is a RESTful API that allows non-Dataverse software to deposit files and metadata into a Dataverse installation. Client libraries are available in Python, Java, R, Ruby, and PHP.
Contents:
Introduced in Dataverse Network (DVN) 3.6, the SWORD API was formerly known as the “Data Deposit API” and data-deposit/v1 appeared in the URLs. For backwards compatibility these URLs continue to work (with deprecation warnings). Due to architectural changes and security improvements (especially the introduction of API tokens) in Dataverse 4.0, a few backward incompatible changes were necessarily introduced and for this reason the version has been increased to v1.1. For details, see Backward incompatible changes.
Dataverse implements most of SWORDv2, which is specified at http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html . Please reference the SWORDv2 specification for expected HTTP status codes (i.e. 201, 204, 404, etc.), headers (i.e. “Location”), etc. For a quick introduction to SWORD, the two minute video at http://cottagelabs.com/news/intro-to-sword-2 is recommended.
As a profile of AtomPub, XML is used throughout SWORD. As of Dataverse 4.0 datasets can also be created via JSON using the “native” API. SWORD is limited to the dozen or so fields listed below in the crosswalk, but the native API allows you to populate all metadata fields available in Dataverse.
For better security than in DVN 3.x, usernames and passwords are no longer accepted. The use of an API token is required.
Differences in Dataverse 4 from DVN 3.x lead to a few minor backward incompatible changes in the Dataverse implementation of SWORD, which are listed below. Old v1 URLs should continue to work but the Service Document will contain a deprecation warning and responses will contain v1.1 URLs. See also Known issues.
The service document enumerates the dataverses (“collections” from a SWORD perspective) the user can deposit data into. The “collectionPolicy” element for each dataverse contains the Terms of Use. Any user with an API token can use this API endpoint. Institution-wide Shibboleth groups are not respected because membership in such a group can only be set via a browser.
curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/service-document
To create a dataset, you must have the “Dataset Creator” role (the AddDataset permission) on a dataverse. Practically speaking, you should first retrieve the service document to list the dataverses into which you are authorized to deposit data.
curl -u $API_TOKEN: --data-binary "@path/to/atom-entry-study.xml" -H "Content-Type: application/atom+xml" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS
Example Atom entry (XML)
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:dcterms="http://purl.org/dc/terms/">
<!-- some embedded metadata -->
<dcterms:title>Roasting at Home</dcterms:title>
<dcterms:creator>Peets, John</dcterms:creator>
<dcterms:creator affiliation="Coffee Bean State University">Stumptown, Jane</dcterms:creator>
<!-- Dataverse controlled vocabulary subject term -->
<dcterms:subject>Chemistry</dcterms:subject>
<!-- keywords -->
<dcterms:subject>coffee</dcterms:subject>
<dcterms:subject>beverage</dcterms:subject>
<dcterms:subject>caffeine</dcterms:subject>
<dcterms:description>Considerations before you start roasting your own coffee at home.</dcterms:description>
<!-- Producer with financial or admin responsibility of the data -->
<dcterms:publisher>Coffee Bean State University</dcterms:publisher>
<dcterms:contributor type="Funder">CaffeineForAll</dcterms:contributor>
<!-- production date -->
<dcterms:date>2013-07-11</dcterms:date>
<!-- kind of data -->
<dcterms:type>aggregate data</dcterms:type>
<!-- List of sources of the data collection-->
<dcterms:source>Stumptown, Jane. 2011. Home Roasting. Coffeemill Press.</dcterms:source>
<!-- related materials -->
<dcterms:relation>Peets, John. 2010. Roasting Coffee at the Coffee Shop. Coffeemill Press</dcterms:relation>
<!-- geographic coverage -->
<dcterms:coverage>United States</dcterms:coverage>
<dcterms:coverage>Canada</dcterms:coverage>
<!-- license and restrictions -->
<dcterms:license>NONE</dcterms:license>
<dcterms:rights>Downloader will not use the Materials in any way prohibited by applicable laws.</dcterms:rights>
<!-- related publications -->
<dcterms:isReferencedBy holdingsURI="http://dx.doi.org/10.1038/dvn333" agency="DOI" IDNo="10.1038/dvn333">Peets, J., & Stumptown, J. (2013). Roasting at Home. New England Journal of Coffee, 3(1), 22-34.</dcterms:isReferencedBy>
</entry>
DC (terms: namespace) | Dataverse DB Element | Required | Note |
---|---|---|---|
dcterms:title | title | Y | Title of the Dataset. |
dcterms:creator | authorName (LastName, FirstName) | Y | Author(s) for the Dataset. |
dcterms:subject | subject (Controlled Vocabulary) OR keyword | Y | Controlled Vocabulary list is in our User Guide > Metadata References. |
dcterms:description | dsDescriptionValue | Y | Describing the purpose, scope or nature of the Dataset. Can also use dcterms:abstract. |
dcterms:publisher | producerName | Person or agency financially or administratively responsible for the Dataset | |
dcterms:contributor | datasetContactEmail | Y | Contact Email is required so will need to add an attribute type=”Contact”. Also used for Funder: add attribute type=”Funder” which maps to contributorName. |
dcterms:date | productionDate (YYYY-MM-DD or YYYY-MM or YYYY) | Production date of Dataset. | |
dcterms:type | kindOfData | Type of data included in the file: survey data, census/enumeration data, aggregate data, clinical. | |
dcterms:source | dataSources | List of books, articles, data files if any that served as the sources for the Dataset. | |
dcterms:relation | relatedMaterial | Any related material (journal article citation is not included here - see: dcterms:isReferencedBy below). | |
dcterms:coverage | otherGeographicCoverage | General information on the geographic coverage of the Dataset. | |
dcterms:license | license | Set the license to CC0 (default in Dataverse for new Datasets), otherwise enter “NONE” and fill in the dcterms:rights field. | |
dcterms:rights | termsofuse | If not using CC0, enter any terms of use or restrictions for the Dataset. | |
dcterms:isReferencedBy | publicationCitation | The publication (journal article, book, other work) that uses this dataset (include citation, permanent identifier (DOI), and permanent URL). |
You must have permission to add datasets in a dataverse (the dataverse should appear in the service document) to list the datasets inside. Institution-wide Shibboleth groups are not respected because membership in such a group can only be set via a browser.
curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS
You must have EditDataset permission (Contributor role or above such as Curator or Admin) on the dataset to add files.
curl -u $API_TOKEN: --data-binary @path/to/example.zip -H "Content-Disposition: filename=example.zip" -H "Content-Type: application/zip" -H "Packaging: http://purl.org/net/sword/package/SimpleZip" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/study/doi:TEST/12345
You must have ViewUnpublishedDataset permission (Contributor role or above such as Curator or Admin) on the dataset to view its Atom entry.
Contains data citation (bibliographicCitation), alternate URI (persistent URI of study), edit URI, edit media URI, statement URI.
curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345
Contains title, author, feed of file entries, latestVersionState, locked boolean, updated timestamp. You must have ViewUnpublishedDataset permission (Contributor role or above such as Curator or Admin) on the dataset to display the statement.
curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/statement/study/doi:TEST/12345
You must have EditDataset permission (Contributor role or above such as Curator or Admin) on the dataset to delete files.
curl -u $API_TOKEN: -X DELETE https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/123
Please note that ALL metadata (title, author, etc.) will be replaced, including fields that can not be expressed with “dcterms” fields. You must have EditDataset permission (Contributor role or above such as Curator or Admin) on the dataset to replace metadata.
curl -u $API_TOKEN: --upload-file "path/to/atom-entry-study2.xml" -H "Content-Type: application/atom+xml" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345
You must have the DeleteDatasetDraft permission (Contributor role or above such as Curator or Admin) on the dataset to delete it. Please note that if the dataset has never been published you will be able to delete it completely but if the dataset has already been published you will only be able to delete post-publication drafts, never a published version.
curl -u $API_TOKEN: -i -X DELETE https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345
This API endpoint is the same as the “list datasets in a dataverse” endpoint documented above and the same permissions apply but it is documented here separately to point out that you can look for a boolean called dataverseHasBeenReleased to know if a dataverse has been released, which is required for publishing a dataset.
curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS
The cat /dev/null and --data-binary @- arguments are used to send zero-length content to the API, which is required by the upstream library to process the In-Progress: false header. You must have the PublishDataverse permission (Admin role) on the dataverse to publish it.
cat /dev/null | curl -u $API_TOKEN: -X POST -H "In-Progress: false" --data-binary @- https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/dataverse/$DATAVERSE_ALIAS
The cat /dev/null and --data-binary @- arguments are used to send zero-length content to the API, which is required by the upstream library to process the In-Progress: false header. You must have the PublishDataset permission (Curator or Admin role) on the dataset to publish it.
cat /dev/null | curl -u $API_TOKEN: -X POST -H "In-Progress: false" --data-binary @- https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345