SWORD API

SWORD stands for “Simple Web-service Offering Repository Deposit” and is a “profile” of AtomPub (RFC 5023). It is a RESTful API that allows non-Dataverse software to deposit files and metadata into a Dataverse installation. Client libraries are available in Python, Java, R, Ruby, and PHP.

Introduced in Dataverse Network (DVN) 3.6, the SWORD API was formerly known as the “Data Deposit API” and data-deposit/v1 appeared in the URLs. For backwards compatibility these URLs will continue to work (with deprecation warnings). Due to architectural changes and security improvements (especially the introduction of API tokens) in Dataverse 4.0, a few backward incompatible changes were necessary, and for this reason the version has been increased to v1.1. For details, see Backward incompatible changes.

Dataverse implements most of SWORDv2, which is specified at http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html . Please refer to the SWORDv2 specification for expected HTTP status codes (e.g. 201, 204, 404), headers (e.g. “Location”), etc. For a quick introduction to SWORD, the two minute video at http://cottagelabs.com/news/intro-to-sword-2 is recommended.
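
One way to observe the status codes mentioned above is curl's -w '%{http_code}' option, which prints the numeric code of the response. The sketch below is an illustration only: describe_status is a hypothetical helper, and the phrases it prints are the standard HTTP reason phrases for the three codes named above, not anything specific to Dataverse.

```shell
# describe_status CODE
# Hypothetical helper: maps the three status codes named above to their
# standard HTTP reason phrases. Which operation returns which code is
# defined by the SWORDv2 specification, not by this sketch.
describe_status() {
  case "$1" in
    201) echo "Created" ;;
    204) echo "No Content" ;;
    404) echo "Not Found" ;;
    *)   echo "other (see the SWORDv2 specification)" ;;
  esac
}

# Example use (not run here, since it needs a live installation):
#   status=$(curl -s -o /dev/null -w '%{http_code}' -u "$API_TOKEN:" \
#     "https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/service-document")
#   describe_status "$status"
```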

As a profile of AtomPub, SWORD uses XML throughout. As of Dataverse 4.0, datasets can also be created via JSON using the “native” API.

Backward incompatible changes

For better security, usernames and passwords are no longer accepted. The use of an API token is required.

In addition, differences in Dataverse 4.0 have led to a few minor backward incompatible changes in the Dataverse implementation of SWORD, which are listed below. Old v1 URLs should continue to work but the Service Document will contain a deprecation warning and responses will contain v1.1 URLs. See also Known issues.

New features as of v1.1

  • Dataverse 4.0 supports API tokens and they must be used rather than a username and password. In the curl examples below, you will see curl -u $API_TOKEN: showing that you should send your API token as the username and nothing as the password. For example, curl -u 54b143b5-d001-4254-afc0-a1c0f6a5b5a7:.
  • Dataverses can be published via SWORD.
  • Dataset versions will only be increased to the next minor version (e.g. 1.1) rather than a major version (2.0) if possible. This depends on the nature of the change.
  • “Author Affiliation” can now be populated with an XML attribute. For example: <dcterms:creator affiliation="Coffee Bean State University">Stumptown, Jane</dcterms:creator>
  • “Contributor” can now be populated and the “Type” (Editor, Funder, Researcher, etc.) can be specified with an XML attribute. For example: <dcterms:contributor type="Funder">CaffeineForAll</dcterms:contributor>
  • “License” can now be set with dcterms:license and the possible values are “CC0” and “NONE”. “License” interacts with “Terms of Use” (dcterms:rights) in that if you include dcterms:rights in the XML, the license will be set to “NONE”. If you don’t include dcterms:rights, the license will default to “CC0”. It is invalid to specify “CC0” as a license and also include dcterms:rights; an error will be returned. For backwards compatibility, dcterms:rights is allowed to be blank (i.e. <dcterms:rights></dcterms:rights>) but blank values will not be persisted to the database and the license will be set to “NONE”.
  • “Contact E-mail” is automatically populated from the dataset owner’s email.
  • “Subject” uses our controlled vocabulary list of subjects. This list is in the Citation Metadata of our User Guide > Metadata References. Any subject term that does not match the controlled vocabulary list is put in “Keyword” instead. If Subject is empty it is automatically populated with “N/A”.
  • Zero-length files are now allowed (but not necessarily encouraged).
  • “Depositor” and “Deposit Date” are auto-populated.
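
The interaction between “License” and “Terms of Use” described in the list above can be summarized as a small decision function. This is a sketch of the stated rules only, not the server's implementation: resolve_license and its arguments are hypothetical names, and the case of an explicit “NONE” without dcterms:rights is not covered by the text, so the sketch simply passes the explicit value through as an assumption.

```shell
# resolve_license LICENSE HAS_RIGHTS
#   LICENSE:    value of dcterms:license ("CC0", "NONE", or "" if omitted)
#   HAS_RIGHTS: "yes" if a dcterms:rights element is present (even a blank
#               one), otherwise "no"
# Prints the license that results from the rules above, or fails for the
# combination the text calls invalid.
resolve_license() {
  license=$1
  has_rights=$2

  # "CC0" together with dcterms:rights is invalid; an error is returned.
  if [ "$license" = "CC0" ] && [ "$has_rights" = "yes" ]; then
    echo "error: CC0 cannot be combined with dcterms:rights" >&2
    return 1
  fi

  if [ "$has_rights" = "yes" ]; then
    # Including dcterms:rights (blank or not) sets the license to NONE.
    echo "NONE"
  elif [ -n "$license" ]; then
    # Assumption: an explicit license without dcterms:rights is kept as given.
    echo "$license"
  else
    # Without dcterms:rights the license defaults to CC0.
    echo "CC0"
  fi
}
```

For example, resolve_license "" no prints CC0, while resolve_license CC0 yes fails with an error message.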

curl examples

Retrieve SWORD service document

The service document enumerates the dataverses (“collections” from a SWORD perspective) the user can deposit data into. The “collectionPolicy” element for each dataverse contains the Terms of Use.

curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/service-document
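
The -u $API_TOKEN: form used in these examples is HTTP Basic authentication with the API token as the username and an empty password. For clients that need to build the header themselves, the following sketch shows the equivalent Authorization header. It assumes a base64 command-line utility is available; basic_auth_header is a hypothetical helper name.

```shell
# basic_auth_header TOKEN
# Prints the Authorization header that corresponds to `curl -u TOKEN:`,
# that is, Basic auth with TOKEN as the username and an empty password.
basic_auth_header() {
  # Basic credentials are base64("username:password"); the password is empty.
  # tr removes any line wrapping that base64 may add for long input.
  encoded=$(printf '%s:' "$1" | base64 | tr -d '\n')
  echo "Authorization: Basic $encoded"
}

# Example use (not run here, since it needs a live installation):
#   curl -H "$(basic_auth_header "$API_TOKEN")" \
#     "https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/service-document"
```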

Create a dataset with an Atom entry

curl -u $API_TOKEN: --data-binary "@path/to/atom-entry-study.xml" -H "Content-Type: application/atom+xml" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS

Example Atom entry (XML)

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dcterms="http://purl.org/dc/terms/">
   <!-- some embedded metadata -->
   <dcterms:title>Roasting at Home</dcterms:title>
   <dcterms:creator>Peets, John</dcterms:creator>
   <dcterms:creator affiliation="Coffee Bean State University">Stumptown, Jane</dcterms:creator>
   <!-- Dataverse controlled vocabulary subject term -->
   <dcterms:subject>Chemistry</dcterms:subject>
   <!-- keywords -->
   <dcterms:subject>coffee</dcterms:subject>
   <dcterms:subject>beverage</dcterms:subject>
   <dcterms:subject>caffeine</dcterms:subject>
   <dcterms:description>Considerations before you start roasting your own coffee at home.</dcterms:description>
   <!-- Producer with financial or admin responsibility of the data -->
   <dcterms:publisher>Coffee Bean State University</dcterms:publisher>
   <dcterms:contributor type="Funder">CaffeineForAll</dcterms:contributor>
   <!-- production date -->
   <dcterms:date>2013-07-11</dcterms:date>
   <!-- kind of data -->
   <dcterms:type>aggregate data</dcterms:type>
   <!-- List of sources of the data collection-->
   <dcterms:source>Stumptown, Jane. 2011. Home Roasting. Coffeemill Press.</dcterms:source>
   <!-- related materials -->
   <dcterms:relation>Peets, John. 2010. Roasting Coffee at the Coffee Shop. Coffeemill Press</dcterms:relation>
   <!-- geographic coverage -->
   <dcterms:coverage>United States</dcterms:coverage>
   <dcterms:coverage>Canada</dcterms:coverage>
   <!-- license and restrictions -->
   <dcterms:license>NONE</dcterms:license>
   <dcterms:rights>Downloader will not use the Materials in any way prohibited by applicable laws.</dcterms:rights>
   <!-- related publications -->
   <dcterms:isReferencedBy holdingsURI="http://dx.doi.org/10.1038/dvn333" agency="DOI" IDNo="10.1038/dvn333">Peets, J., &amp; Stumptown, J. (2013). Roasting at Home. New England Journal of Coffee, 3(1), 22-34.</dcterms:isReferencedBy>
</entry>

Dublin Core Terms (DC Terms) Qualified Mapping - Dataverse DB Element Crosswalk

DC (terms: namespace) | Dataverse DB Element | Required | Note
dcterms:title | title | Y | Title of the Dataset.
dcterms:creator | authorName (LastName, FirstName) | Y | Author(s) for the Dataset.
dcterms:subject | subject (Controlled Vocabulary) OR keyword | Y | Controlled Vocabulary list is in our User Guide > Metadata References.
dcterms:description | dsDescriptionValue | Y | Describing the purpose, scope or nature of the Dataset. Can also use dcterms:abstract.
dcterms:publisher | producerName | | Person or agency financially or administratively responsible for the Dataset.
dcterms:contributor | datasetContactEmail | Y | Contact Email is required, so add the attribute type="Contact". Also used for Funder: add the attribute type="Funder", which maps to contributorName.
dcterms:date | productionDate (YYYY-MM-DD or YYYY-MM or YYYY) | | Production date of Dataset.
dcterms:type | kindOfData | | Type of data included in the file: survey data, census/enumeration data, aggregate data, clinical.
dcterms:source | dataSources | | List of books, articles, data files if any that served as the sources for the Dataset.
dcterms:relation | relatedMaterial | | Any related material (journal article citation is not included here - see: dcterms:isReferencedBy below).
dcterms:coverage | otherGeographicCoverage | | General information on the geographic coverage of the Dataset.
dcterms:license | license | | Set the license to CC0 (default in Dataverse for new Datasets), otherwise enter “NONE” and fill in the dcterms:rights field.
dcterms:rights | termsofuse | | If not using CC0, enter any terms of use or restrictions for the Dataset.
dcterms:isReferencedBy | publicationCitation | | The publication (journal article, book, other work) that uses this dataset (include citation, permanent identifier (DOI), and permanent URL).
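
As a rough illustration of the crosswalk, the sketch below prints an Atom entry containing only title, creator, subject, and description, four of the fields marked as required. The helper names xml_escape and minimal_atom_entry are hypothetical. The sketch leaves out dcterms:contributor with type="Contact" on the assumption that the contact e-mail is auto-populated as described under New features; add it if your installation requires it.

```shell
# xml_escape TEXT
# Escapes the characters that are not allowed in XML element content.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# minimal_atom_entry TITLE CREATOR SUBJECT DESCRIPTION
# Prints an Atom entry with the dcterms elements for the four arguments.
minimal_atom_entry() {
  cat <<EOF
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dcterms="http://purl.org/dc/terms/">
   <dcterms:title>$(xml_escape "$1")</dcterms:title>
   <dcterms:creator>$(xml_escape "$2")</dcterms:creator>
   <dcterms:subject>$(xml_escape "$3")</dcterms:subject>
   <dcterms:description>$(xml_escape "$4")</dcterms:description>
</entry>
EOF
}

# Example use: write an entry to a file that can be sent with --data-binary.
#   minimal_atom_entry "Roasting at Home" "Peets, John" "Chemistry" \
#     "Considerations before you start roasting your own coffee at home." \
#     > atom-entry-study.xml
```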

List datasets in a dataverse

curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS

Add files to a dataset with a zip file

curl -u $API_TOKEN: --data-binary @path/to/example.zip -H "Content-Disposition: filename=example.zip" -H "Content-Type: application/zip" -H "Packaging: http://purl.org/net/sword/package/SimpleZip" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/study/doi:TEST/12345

Display a dataset atom entry

The Atom entry contains the data citation (bibliographicCitation), alternate URI (persistent URI of the study), edit URI, edit media URI, and statement URI.

curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345

Display a dataset statement

The statement contains the title, author, feed of file entries, latestVersionState, locked boolean, and updated timestamp.

curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/statement/study/doi:TEST/12345

Delete a file by database id

curl -u $API_TOKEN: -X DELETE https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/123

Replacing metadata for a dataset

Please note that ALL metadata (title, author, etc.) will be replaced, including fields that cannot be expressed with “dcterms” fields.

curl -u $API_TOKEN: --upload-file "path/to/atom-entry-study2.xml" -H "Content-Type: application/atom+xml" https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345

Delete a dataset

curl -u $API_TOKEN: -i -X DELETE https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345

Determine if a dataverse has been published

Look for a dataverseHasBeenReleased boolean.

curl -u $API_TOKEN: https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS
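
To pick the boolean out of the response in a script, a text filter along the following lines may be enough. The exact shape of the response is not shown here, so this sketch assumes the value appears as the text of a dataverseHasBeenReleased element (possibly carrying attributes) on a single line; dataverse_released is a hypothetical helper name. A real XML parser is more robust than sed if one is available.

```shell
# dataverse_released
# Reads a response on standard input and prints the text of the
# dataverseHasBeenReleased element, assuming it looks like
#   <dataverseHasBeenReleased ...>true</dataverseHasBeenReleased>
dataverse_released() {
  sed -n 's/.*<dataverseHasBeenReleased[^>]*>\([^<]*\)<\/dataverseHasBeenReleased>.*/\1/p'
}

# Example use (not run here, since it needs a live installation):
#   curl -s -u "$API_TOKEN:" \
#     "https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/$DATAVERSE_ALIAS" \
#     | dataverse_released
```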

Publish a dataverse

The cat /dev/null and --data-binary @- arguments are used to send zero-length content to the API, which is required by the upstream library to process the In-Progress: false header.

cat /dev/null | curl -u $API_TOKEN: -X POST -H "In-Progress: false" --data-binary @- https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/dataverse/$DATAVERSE_ALIAS

Publish a dataset

The cat /dev/null and --data-binary @- arguments are used to send zero-length content to the API, which is required by the upstream library to process the In-Progress: false header.

cat /dev/null | curl -u $API_TOKEN: -X POST -H "In-Progress: false" --data-binary @- https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:TEST/12345
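
All of the curl examples above share one base path and differ only in the resource suffix. A script that issues several of these calls could collect the patterns in one place, as in the sketch below. The function name sword_url and the resource labels passed as its second argument are hypothetical; the URL patterns themselves are copied from the examples above.

```shell
# sword_url HOST KIND [ID]
#   HOST: host name of the Dataverse installation
#   KIND: one of the hypothetical labels in the case statement below
#   ID:   dataverse alias, dataset identifier (e.g. doi:TEST/12345),
#         or file database id, depending on KIND
sword_url() {
  host=$1
  kind=$2
  id=$3
  base="https://$host/dvn/api/data-deposit/v1.1/swordv2"
  case "$kind" in
    service-document) echo "$base/service-document" ;;
    collection)       echo "$base/collection/dataverse/$id" ;;
    edit-dataverse)   echo "$base/edit/dataverse/$id" ;;
    edit-dataset)     echo "$base/edit/study/$id" ;;
    edit-media)       echo "$base/edit-media/study/$id" ;;
    statement)        echo "$base/statement/study/$id" ;;
    file)             echo "$base/edit-media/file/$id" ;;
    *) echo "unknown resource kind: $kind" >&2; return 1 ;;
  esac
}

# Example use (not run here, since it needs a live installation):
#   curl -u "$API_TOKEN:" "$(sword_url "$HOSTNAME" statement doi:TEST/12345)"
```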

Known issues

Roadmap

These are features we’d like to add in the future:

Bug fixes in v1.1