Dataset Migration API

The Dataverse software includes several ways to add Datasets originally created elsewhere, in addition to its Harvesting capabilities. These include the SWORD API (see the SWORD API guide) and the /dataverses/{id}/datasets/:import methods for JSON and DDI (see the Native API guide).

This experimental migration API offers an additional option with some potential advantages:

  • Metadata can be specified using the json-ld format used in the OAI-ORE metadata export. Please note that the json-ld generated by the OAI-ORE metadata export is not directly compatible with the Migration API: the OAI-ORE export nests resource metadata under an ore:describes wrapper, whereas the Dataset Migration API requires the metadata to be at the root level. Please check the example file below for reference.

    • If you need a tool to convert OAI-ORE exported json-ld into a format compatible with the Dataset Migration API, or if you need to generate compatible json-ld from sources other than an existing Dataverse installation, the BaseX database engine, used together with the XQuery language, provides an efficient solution. Please see the example script transform-oai-ore-jsonld.xq for a simple conversion from exported OAI-ORE json-ld to a version compatible with the Dataset Migration API.

  • Existing publication dates and PIDs are maintained. This is currently limited to the case where the PID can be managed by the Dataverse software, e.g. where the authority and shoulder match those the software is configured for.

  • Updating the PID at the provider can be done immediately or later (with other existing APIs).

  • Adding files can be done via the standard APIs, including using direct-upload to S3.
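
The structural difference described in the first bullet above (metadata nested under ore:describes versus metadata at the root level) can also be sketched with jq, as an alternative to the BaseX/XQuery script. This is only a minimal sketch and not the conversion the example script performs: it assumes jq is installed, the function name ore_to_migration and the file names are hypothetical, and it does nothing more than lift the object under ore:describes to the root and carry over the top-level @context. Whether any other keys must be dropped or rewritten is not covered here, so compare the result against the example file before using it.

```shell
# Hypothetical helper: lift the metadata nested under "ore:describes" to the
# root level and keep the top-level "@context". Requires jq.
# $1 = path to a json-ld file produced by the OAI-ORE metadata export.
ore_to_migration() {
  jq '."ore:describes" + {"@context": ."@context"}' "$1"
}

# Example (file names are placeholders):
#   ore_to_migration oai-ore-export.json > dataset-migrate.jsonld
```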

This API consists of two calls: one to create an initial Dataset version, and one to ‘republish’ the dataset through Dataverse with a specified publication date. Both calls require superuser permissions.

These calls can be used in concert with other API calls to add files, update metadata, etc. before the ‘republish’ step is done.

Start Migrating a Dataset into a Dataverse Collection

Note

This action requires a Dataverse installation account with superuser permissions.

To import a dataset with an existing persistent identifier (PID), the provided json-ld metadata should include it.

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export DATAVERSE_ID=root

curl -H 'Content-Type: application/ld+json' -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets/:startmigration" --upload-file dataset-migrate.jsonld

An example json-ld file is available at dataset-migrate.jsonld. Note that you need to replace the PID in the sample file with one supported by your Dataverse installation.

You also need to replace the dataverse.siteUrl in the json-ld @context with your current Dataverse site URL. This is necessary to define a local URI for metadata terms originating from community metadata blocks (in the case of the example file, from the Social Sciences and Humanities and Geospatial blocks).

In Dataverse 6.5 and earlier, community metadata blocks do not assign a default global URI to the terms they define, in contrast to the citation metadata block, whose terms have global URIs defined.

Publish a Migrated Dataset

The call above creates a Dataset. Once it is created, other APIs can be used to add files, add additional metadata, etc. When a version is complete, the following call can be used to publish it with its original publication date.

Note

This action requires a Dataverse installation account with superuser permissions.

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org

curl -H 'Content-Type: application/ld+json' -H X-Dataverse-key:$API_TOKEN -X POST -d '{"schema:datePublished": "2020-10-26","@context":{ "schema":"http://schema.org/"}}' "$SERVER_URL/api/datasets/{id}/actions/:releasemigrated"

datePublished is the only metadata supported in this call.
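
Since the request body carries a single field, it can be generated rather than typed by hand. The following is a minimal sketch: the function name release_payload is hypothetical, and the YYYY-MM-DD check simply mirrors the date format used in the example above. It is an assumption of this sketch, not a documented constraint of the API.

```shell
# Hypothetical helper: print the json-ld body for :releasemigrated.
# Only schema:datePublished is included, since no other metadata is supported.
# $1 = publication date, assumed here to look like 2020-10-26.
release_payload() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *)
      echo "expected a date like 2020-10-26, got: $1" >&2
      return 1
      ;;
  esac
  printf '{"schema:datePublished": "%s","@context":{ "schema":"http://schema.org/"}}\n' "$1"
}
```

The output can then be passed to curl in place of the literal body, for example with -d "$(release_payload 2020-10-26)".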

An optional query parameter, updatepidatprovider (default false), can be set to true to automatically update the metadata and targetUrl of the PID at the provider. With this set to true, the PID will redirect to this dataset rather than to the dataset in the source repository.

curl -H 'Content-Type: application/ld+json' -H X-Dataverse-key:$API_TOKEN -X POST -d '{"schema:datePublished": "2020-10-26","@context":{ "schema":"http://schema.org/"}}' "$SERVER_URL/api/datasets/{id}/actions/:releasemigrated?updatepidatprovider=true"
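
When scripting many migrations, the two URL variants shown above can be built from one place. This is a small sketch only: the function name release_url and the dataset id 42 used in the example are hypothetical, and the id must be replaced with that of the dataset created by the first call.

```shell
# Hypothetical helper: build the :releasemigrated URL.
# $1 = server URL, $2 = dataset id,
# $3 = "true" to update the PID at the provider (defaults to false).
release_url() {
  url="$1/api/datasets/$2/actions/:releasemigrated"
  if [ "${3:-false}" = "true" ]; then
    url="$url?updatepidatprovider=true"
  fi
  printf '%s\n' "$url"
}

# Example (42 is a placeholder id):
#   release_url "$SERVER_URL" 42 true
```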

If the parameter is omitted or left at false, other existing APIs can be used to update the PID at the provider later, e.g. Update Target URL for a Published Dataset at the PID provider.