Native API

Dataverse 4 exposes most of its GUI functionality via a REST-based API. This section describes that functionality. Most API endpoints require an API token that can be passed as the X-Dataverse-key HTTP header or in the URL as the key query parameter.

Note

CORS Some API endpoint allow CORS (cross-origin resource sharing), which makes them usable from scripts runing in web browsers. These endpoints are marked with a CORS badge.

Warning

Dataverse 4’s API is versioned at the URI - all API calls may include the version number like so: http://server-address/api/v1/.... Omitting the v1 part would default to the latest API version (currently 1). When writing scripts/applications that will be used for a long time, make sure to specify the API version, so they don’t break when the API is upgraded.

Endpoints

Dataverses

Generates a new dataverse under $id. Expects a JSON content describing the dataverse, as in the example below. If $id is omitted, a root dataverse is created. $id can either be a dataverse id (long) or a dataverse alias (more robust).

POST http://$SERVER/api/dataverses/$id?key=$apiKey

Download the JSON example file and modified to create dataverses to suit your needs. The fields name, alias, and dataverseContacts are required. The controlled vocabulary for dataverseType is

  • JOURNALS
  • LABORATORY
  • ORGANIZATIONS_INSTITUTIONS
  • RESEARCHERS
  • RESEARCH_GROUP
  • RESEARCH_PROJECTS
  • TEACHING_COURSES
  • UNCATEGORIZED
{
  "name": "Scientific Research",
  "alias": "science",
  "dataverseContacts": [
    {
      "contactEmail": "pi@example.edu"
    },
    {
      "contactEmail": "student@example.edu"
    }
  ],
  "affiliation": "Scientific Research University",
  "description": "We do all the science.",
  "dataverseType": "LABORATORY"
}

CORS View data about the dataverse identified by $id. $id can be the id number of the dataverse, its alias, or the special value :root.

GET http://$SERVER/api/dataverses/$id

Deletes the dataverse whose ID is given:

DELETE http://$SERVER/api/dataverses/$id?key=$apiKey

CORS Lists all the DvObjects under dataverse id.

GET http://$SERVER/api/dataverses/$id/contents

All the roles defined directly in the dataverse identified by id:

GET http://$SERVER/api/dataverses/$id/roles?key=$apiKey

CORS List all the facets for a given dataverse id.

GET http://$SERVER/api/dataverses/$id/facets?key=$apiKey

Creates a new role under dataverse id. Needs a json file with the role description:

POST http://$SERVER/api/dataverses/$id/roles?key=$apiKey

List all the role assignments at the given dataverse:

GET http://$SERVER/api/dataverses/$id/assignments?key=$apiKey

Assigns a new role, based on the POSTed JSON.

POST http://$SERVER/api/dataverses/$id/assignments?key=$apiKey

POSTed JSON example:

{
  "assignee": "@uma",
  "role": "curator"
}

Delete the assignment whose id is $id:

DELETE http://$SERVER/api/dataverses/$id/assignments/$id?key=$apiKey

CORS Get the metadata blocks defined on the passed dataverse:

GET http://$SERVER/api/dataverses/$id/metadatablocks?key=$apiKey

Sets the metadata blocks of the dataverse. Makes the dataverse a metadatablock root. The query body is a JSON array with a list of metadatablocks identifiers (either id or name).

POST http://$SERVER/api/dataverses/$id/metadatablocks?key=$apiKey

Get whether the dataverse is a metadata block root, or does it uses its parent blocks:

GET http://$SERVER/api/dataverses/$id/metadatablocks/isRoot?key=$apiKey

Set whether the dataverse is a metadata block root, or does it uses its parent blocks. Possible values are true and false (both are valid JSON expressions).

PUT http://$SERVER/api/dataverses/$id/metadatablocks/isRoot?key=$apiKey

Note

Previous endpoints GET http://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey and POST http://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey are deprecated, but supported.

Create a new dataset in dataverse id. The post data is a Json object, containing the dataset fields and an initial dataset version, under the field of "datasetVersion". The initial versions version number will be set to 1.0, and its state will be set to DRAFT regardless of the content of the json object. Example json can be found at data/dataset-create-new.json.

POST http://$SERVER/api/dataverses/$id/datasets/?key=$apiKey

Publish the Dataverse pointed by identifier, which can either by the dataverse alias or its numerical id.

POST http://$SERVER/api/dataverses/$identifier/actions/:publish?key=$apiKey

Datasets

Note Creation of new datasets is done with a POST onto dataverses. See Dataverses section.

Note In all commands below, dataset versions can be referred to as:

  • :draft the draft version, if any
  • :latest either a draft (if exists) or the latest published version.
  • :latest-published the latest published version
  • x.y a specific version, where x is the major version number and y is the minor version number.
  • x same as x.0

Note

Datasets can be accessed using persistent identifiers. This is done by passing the constant :persistentId where the numeric id of the dataset is expected, and then passing the actual persistent id as a query parameter with the name persistentId.

Example: Getting the dataset whose DOI is 10.5072/FK2/J8SJZB

GET http://$SERVER/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB

Getting its draft version:

GET http://$SERVER/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB

CORS Show the dataset whose id is passed:

GET http://$SERVER/api/datasets/$id?key=$apiKey

Delete the dataset whose id is passed:

DELETE http://$SERVER/api/datasets/$id?key=$apiKey

CORS List versions of the dataset:

GET http://$SERVER/api/datasets/$id/versions?key=$apiKey

CORS Show a version of the dataset. The Dataset also include any metadata blocks the data might have:

GET http://$SERVER/api/datasets/$id/versions/$versionNumber?key=$apiKey

CORS Export the metadata of the current published version of a dataset in various formats see Note below:

GET http://$SERVER/api/datasets/export?exporter=ddi&persistentId=$persistentId

Note

Supported exporters (export formats) are ddi, oai_ddi, dcterms, oai_dc, and dataverse_json.

CORS Lists all the file metadata, for the given dataset and version:

GET http://$SERVER/api/datasets/$id/versions/$versionId/files?key=$apiKey

CORS Lists all the metadata blocks and their content, for the given dataset and version:

GET http://$SERVER/api/datasets/$id/versions/$versionId/metadata?key=$apiKey

CORS Lists the metadata block block named blockname, for the given dataset and version:

GET http://$SERVER/api/datasets/$id/versions/$versionId/metadata/$blockname?key=$apiKey

Updates the current draft version of dataset $id. If the dataset does not have an draft version - e.g. when its most recent version is published, a new draft version is created. The invariant is - after a successful call to this command, the dataset has a DRAFT version with the passed data. The request body is a dataset version, in json format.

PUT http://$SERVER/api/datasets/$id/versions/:draft?key=$apiKey

Publishes the dataset whose id is passed. The new dataset version number is determined by the most recent version number and the type parameter. Passing type=minor increases the minor version number (2.3 is updated to 2.4). Passing type=major increases the major version number (2.3 is updated to 3.0):

POST http://$SERVER/api/datasets/$id/actions/:publish?type=$type&key=$apiKey

Note

POST should be used to publish a dataset. GET is supported for backward compatibility but is deprecated and may be removed: https://github.com/IQSS/dataverse/issues/2431

Deletes the draft version of dataset $id. Only the draft version can be deleted:

DELETE http://$SERVER/api/datasets/$id/versions/:draft?key=$apiKey

Sets the dataset field type to be used as the citation date for the given dataset (if the dataset does not include the dataset field type, the default logic is used). The name of the dataset field type should be sent in the body of the reqeust. To revert to the default logic, use :publicationDate as the $datasetFieldTypeName. Note that the dataset field used has to be a date field:

PUT http://$SERVER/api/datasets/$id/citationdate?key=$apiKey

Restores the default logic of the field type to be used as the citation date. Same as PUT with :publicationDate body:

DELETE http://$SERVER/api/datasets/$id/citationdate?key=$apiKey

List all the role assignments at the given dataset:

GET http://$SERVER/api/datasets/$id/assignments?key=$apiKey

Create a Private URL (must be able to manage dataset permissions):

POST http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey

Get a Private URL from a dataset (if available):

GET http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey

Delete a Private URL from a dataset (if it exists):

DELETE http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey

Add a file to an existing Dataset. Description and tags are optional:

POST http://$SERVER/api/datasets/$id/add?key=$apiKey

A more detailed “add” example using curl:

curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@data.tsv' -F 'jsonData={"description":"My description.","categories":["Data"]}' "https://example.dataverse.edu/api/datasets/:persistentId/add?persistentId=$PERSISTENT_ID"

Example python code to add a file. This may be run by changing these parameters in the sample code:

  • dataverse_server - e.g. https://dataverse.harvard.edu
  • api_key - See the top of this document for a description
  • persistentId - Example: doi:10.5072/FK2/6XACVA
  • dataset_id - Database id of the dataset

In practice, you only need one the dataset_id or the persistentId. The example below shows both uses.

from datetime import datetime
import json
import requests  # http://docs.python-requests.org/en/master/

# --------------------------------------------------
# Update the 4 params below to run this code
# --------------------------------------------------
dataverse_server = 'https://your dataverse server' # no trailing slash
api_key = 'api key'
dataset_id = 1  # database id of the dataset
persistentId = 'doi:10.5072/FK2/6XACVA' # doi or hdl of the dataset

# --------------------------------------------------
# Prepare "file"
# --------------------------------------------------
file_content = 'content: %s' % datetime.now()
files = {'file': ('sample_file.txt', file_content)}

# --------------------------------------------------
# Using a "jsonData" parameter, add optional description + file tags
# --------------------------------------------------
params = dict(description='Blue skies!',
            categories=['Lily', 'Rosemary', 'Jack of Hearts'])

params_as_json_string = json.dumps(params)

payload = dict(jsonData=params_as_json_string)

# --------------------------------------------------
# Add file using the Dataset's id
# --------------------------------------------------
url_dataset_id = '%s/api/datasets/%s/add?key=%s' % (dataverse_server, dataset_id, api_key)

# -------------------
# Make the request
# -------------------
print '-' * 40
print 'making request: %s' % url_dataset_id
r = requests.post(url_dataset_id, data=payload, files=files)

# -------------------
# Print the response
# -------------------
print '-' * 40
print r.json()
print r.status_code

# --------------------------------------------------
# Add file using the Dataset's persistentId (e.g. doi, hdl, etc)
# --------------------------------------------------
url_persistent_id = '%s/api/datasets/:persistentId/add?persistentId=%s&key=%s' % (dataverse_server, persistentId, api_key)

# -------------------
# Update the file content to avoid a duplicate file error
# -------------------
file_content = 'content2: %s' % datetime.now()
files = {'file': ('sample_file2.txt', file_content)}


# -------------------
# Make the request
# -------------------
print '-' * 40
print 'making request: %s' % url_persistent_id
r = requests.post(url_persistent_id, data=payload, files=files)

# -------------------
# Print the response
# -------------------
print '-' * 40
print r.json()
print r.status_code

Files

Note

Please note that files can be added via the native API but the operation is performed on the parent object, which is a dataset. Please see the “Datasets” endpoint above for more information.

Replace an existing file where id is the database id of the file to replace. Note that metadata such as description and tags are not carried over from the file being replaced:

POST http://$SERVER/api/files/{id}/replace?key=$apiKey

A more detailed “replace” example using curl (note that forceReplace is for replacing one file type with another):

curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@data.tsv' -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' "https://example.dataverse.edu/api/files/$FILE_ID/replace"

Example python code to replace a file. This may be run by changing these parameters in the sample code:

  • dataverse_server - e.g. https://dataverse.harvard.edu
  • api_key - See the top of this document for a description
  • file_id - Database id of the file to replace (returned in the GET API for a Dataset)
from datetime import datetime
import json
import requests  # http://docs.python-requests.org/en/master/

# --------------------------------------------------
# Update params below to run code
# --------------------------------------------------
dataverse_server = 'http://127.0.0.1:8080' # no trailing slash
api_key = 'some key'
file_id = 1401  # id of the file to replace

# --------------------------------------------------
# Prepare replacement "file"
# --------------------------------------------------
file_content = 'content: %s' % datetime.now()
files = {'file': ('replacement_file.txt', file_content)}

# --------------------------------------------------
# Using a "jsonData" parameter, add optional description + file tags
# --------------------------------------------------
params = dict(description='Sunset',
            categories=['One', 'More', 'Cup of Coffee'])

# -------------------
# IMPORTANT: If the mimetype of the replacement file differs
#   from the origina file, the replace will fail
#
#  e.g. if you try to replace a ".csv" with a ".png" or something similar
#
#  You can override this with a "forceReplace" parameter
# -------------------
params['forceReplace'] = True


params_as_json_string = json.dumps(params)

payload = dict(jsonData=params_as_json_string)

print 'payload', payload
# --------------------------------------------------
# Replace file using the id of the file to replace
# --------------------------------------------------
url_replace = '%s/api/v1/files/%s/replace?key=%s' % (dataverse_server, file_id, api_key)

# -------------------
# Make the request
# -------------------
print '-' * 40
print 'making request: %s' % url_replace
r = requests.post(url_replace, data=payload, files=files)

# -------------------
# Print the response
# -------------------
print '-' * 40
print r.json()
print r.status_code

Builtin Users

Builtin users are known as “Username/Email and Password” users in the Account Creation + Management of the User Guide. Dataverse stores a password (encrypted, of course) for these users, which differs from “remote” users such as Shibboleth or OAuth users where the password is stored elsewhere. See also “Auth Modes: Local vs. Remote vs. Both” in the Configuration section of the Installation Guide. It’s a valid configuration of Dataverse to not use builtin users at all.

Creating a Builtin User

For security reasons, builtin users cannot be created via API unless the team who runs the Dataverse installation has populated a database setting called BuiltinUsers.KEY, which is described under “Securing Your Installation” and “Database Settings” in the Configuration section of the Installation Guide. You will need to know the value of BuiltinUsers.KEY before you can proceed.

To create a builtin user via API, you must first construct a JSON document. You can download user-add.json or copy the text below as a starting point and edit as necessary.

{
  "firstName": "Lisa",
  "lastName": "Simpson",
  "userName": "lsimpson",
  "affiliation": "Springfield",
  "position": "Student",
  "email": "lsimpson@mailinator.com"
}

Place this user-add.json file in your current directory and run the following curl command, substituting variables as necessary. Note that both the password of the new user and the value of BuiltinUsers.KEY are passed as query parameters:

curl -d @user-add.json -H "Content-type:application/json" "$SERVER_URL/api/builtin-users?password=$NEWUSER_PASSWORD&key=$BUILTIN_USERS_KEY"

Retrieving the API Token of a Builtin User

To retrieve the API token of a builtin user, given that user’s password, use the curl command below:

curl "$SERVER_URL/api/builtin-users/$DV_USER_NAME/api-token?password=$DV_USER_PASSWORD"

Roles

Creates a new role in dataverse object whose Id is dataverseIdtf (that’s an id/alias):

POST http://$SERVER/api/roles?dvo=$dataverseIdtf&key=$apiKey

Shows the role with id:

GET http://$SERVER/api/roles/$id

Deletes the role with id:

DELETE http://$SERVER/api/roles/$id

Explicit Groups

Explicit groups list their members explicitly. These groups are defined in dataverses, which is why their API endpoint is under api/dataverses/$id/, where $id is the id of the dataverse.

Create a new explicit group under dataverse $id:

POST http://$server/api/dataverses/$id/groups

Data being POSTed is json-formatted description of the group:

{
 "description":"Describe the group here",
 "displayName":"Close Collaborators",
 "aliasInOwner":"ccs"
}

List explicit groups under dataverse $id:

GET http://$server/api/dataverses/$id/groups

Show group $groupAlias under dataverse $dv:

GET http://$server/api/dataverses/$dv/groups/$groupAlias

Update group $groupAlias under dataverse $dv. The request body is the same as the create group one, except that the group alias cannot be changed. Thus, the field aliasInOwner is ignored.

PUT http://$server/api/dataverses/$dv/groups/$groupAlias

Delete group $groupAlias under dataverse $dv:

DELETE http://$server/api/dataverses/$dv/groups/$groupAlias

Bulk add role assignees to an explicit group. The request body is a JSON array of role assignee identifiers, such as @admin, &ip/localhosts or :authenticated-users:

POST http://$server/api/dataverses/$dv/groups/$groupAlias/roleAssignees

Add a single role assignee to a group. Request body is ignored:

PUT http://$server/api/dataverses/$dv/groups/$groupAlias/roleAssignees/$roleAssigneeIdentifier

Remove a single role assignee from an explicit group:

DELETE http://$server/api/dataverses/$dv/groups/$groupAlias/roleAssignees/$roleAssigneeIdentifier

Shibboleth Groups

Management of Shibboleth groups via API is documented in the Shibboleth section of the Installation Guide.

Info

CORS Get the Dataverse version. The response contains the version and build numbers:

GET http://$SERVER/api/info/version

Get the server name. This is useful when a Dataverse system is composed of multiple Java EE servers behind a load balancer:

GET http://$SERVER/api/info/server

For now, only the value for the :DatasetPublishPopupCustomText setting from the Configuration section of the Installation Guide is exposed:

GET http://$SERVER/api/info/settings/:DatasetPublishPopupCustomText

Metadata Blocks

CORS Lists brief info about all metadata blocks registered in the system:

GET http://$SERVER/api/metadatablocks

CORS Return data about the block whose identifier is passed. identifier can either be the block’s id, or its name:

GET http://$SERVER/api/metadatablocks/$identifier

Admin

This is the administrative part of the API. For security reasons, it is absolutely essential that you block it before allowing public access to a Dataverse installation. Blocking can be done using settings. See the post-install-api-block.sh script in the scripts/api folder for details. See also “Blocking API Endpoints” under “Securing Your Installation” in the Configuration section of the Installation Guide.

List all settings:

GET http://$SERVER/api/admin/settings

Sets setting name to the body of the request:

PUT http://$SERVER/api/admin/settings/$name

Get the setting under name:

GET http://$SERVER/api/admin/settings/$name

Delete the setting under name:

DELETE http://$SERVER/api/admin/settings/$name

List the authentication provider factories. The alias field of these is used while configuring the providers themselves.

GET http://$SERVER/api/admin/authenticationProviderFactories

List all the authentication providers in the system (both enabled and disabled):

GET http://$SERVER/api/admin/authenticationProviders

Add new authentication provider. The POST data is in JSON format, similar to the JSON retrieved from this command’s GET counterpart.

POST http://$SERVER/api/admin/authenticationProviders

Show data about an authentication provider:

GET http://$SERVER/api/admin/authenticationProviders/$id

Enable or disable an authentication provider (denoted by id):

PUT http://$SERVER/api/admin/authenticationProviders/$id/enabled

Note

The former endpoint, ending with :enabled (that is, with a colon), is still supported, but deprecated.

Check whether an authentication proider is enabled:

GET http://$SERVER/api/admin/authenticationProviders/$id/enabled

The body of the request should be either true or false. Content type has to be application/json, like so:

curl -H "Content-type: application/json"  -X POST -d"false" http://localhost:8080/api/admin/authenticationProviders/echo-dignified/:enabled

Deletes an authentication provider from the system. The command succeeds even if there is no such provider, as the postcondition holds: there is no provider by that id after the command returns.

DELETE http://$SERVER/api/admin/authenticationProviders/$id/

List all global roles in the system.

GET http://$SERVER/api/admin/roles

Creates a global role in the Dataverse installation. The data POSTed are assumed to be a role JSON.

POST http://$SERVER/api/admin/roles

List all users:

GET http://$SERVER/api/admin/authenticatedUsers

List user whose identifier (without the @ sign) is passed:

GET http://$SERVER/api/admin/authenticatedUsers/$identifier

Sample output using “dataverseAdmin” as the identifier:

{
  "authenticationProviderId": "builtin",
  "persistentUserId": "dataverseAdmin",
  "position": "Admin",
  "id": 1,
  "identifier": "@dataverseAdmin",
  "displayName": "Dataverse Admin",
  "firstName": "Dataverse",
  "lastName": "Admin",
  "email": "dataverse@mailinator.com",
  "superuser": true,
  "affiliation": "Dataverse.org"
}

Create an authenticateUser:

POST http://$SERVER/api/admin/authenticatedUsers

POSTed JSON example:

{
  "authenticationProviderId": "orcid",
  "persistentUserId": "0000-0002-3283-0661",
  "identifier": "@pete",
  "firstName": "Pete K.",
  "lastName": "Dataversky",
  "email": "pete@mailinator.com"
}

Toggles superuser mode on the AuthenticatedUser whose identifier (without the @ sign) is passed.

POST http://$SERVER/api/admin/superuser/$identifier

List all role assignments of a role assignee (i.e. a user or a group):

GET http://$SERVER/api/admin/assignments/assignees/$identifier

Note that identifier can contain slashes (e.g. &ip/localhost-users).

List permissions a user (based on API Token used) has on a dataverse or dataset:

GET http://$SERVER/api/admin/permissions/$identifier

The $identifier can be a dataverse alias or database id or a dataset persistent ID or database id.

List a role assignee (i.e. a user or a group):

GET http://$SERVER/api/admin/assignee/$identifier

The $identifier should start with an @ if it’s a user. Groups start with &. “Built in” users and groups start with :. Private URL users start with #.

IpGroups

Lists all the ip groups:

GET http://$SERVER/api/admin/groups/ip

Adds a new ip group. POST data should specify the group in JSON format. Examples are available at the data folder. Using this method, an IP Group is always created, but its alias might be different than the one appearing in the JSON file, to ensure it is unique.

POST http://$SERVER/api/admin/groups/ip

Creates or updates the ip group $groupAlias.

POST http://$SERVER/api/admin/groups/ip/$groupAlias

Returns a the group in a JSON format. $groupIdtf can either be the group id in the database (in case it is numeric), or the group alias.

GET http://$SERVER/api/admin/groups/ip/$groupIdtf

Deletes the group specified by groupIdtf. groupIdtf can either be the group id in the database (in case it is numeric), or the group alias. Note that a group can be deleted only if there are no roles assigned to it.

DELETE http://$SERVER/api/admin/groups/ip/$groupIdtf

Dataset Integrity

Recalculate the UNF value of a dataset version, if it’s missing, by supplying the dataset version database id:

POST http://$SERVER/api/admin/datasets/integrity/{datasetVersionId}/fixmissingunf