Building External Tools
External tools can provide additional features that are not part of the Dataverse Software itself, such as data exploration. Thank you for your interest in building an external tool for the Dataverse Software!
Introduction
External tools are additional applications the user can access or open from your Dataverse installation to preview, explore, and manipulate data files and datasets. The term “external” is used to indicate that the tool is not part of the main Dataverse Software.
Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct the URLs that your tool needs to operate. For example, if you’ve deployed your tool to fabulousfiletool.com, your tool might want the ID of a file and the siteUrl of the Dataverse installation, like this: https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org
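On the tool side, the first step is usually to read those query parameters back out of the launch URL. The following is a minimal Python sketch of that step, assuming your tool is free to use the standard library; the function name read_launch_parameters is hypothetical and not part of the Dataverse Software.

```python
from urllib.parse import urlparse, parse_qs


def read_launch_parameters(launch_url):
    """Return the query parameters of a launch URL as a flat dict."""
    query = urlparse(launch_url).query
    # parse_qs returns a list of values per key; keep the first one.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = read_launch_parameters(
    "https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org"
)
print(params["fileId"])
print(params["siteUrl"])
```

Note that every value arrives as a string, so a tool that needs a numeric file ID has to convert it itself.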
In short, you will be creating a manifest in JSON format that describes not only how to construct URLs for your tool, but also what types of files your tool operates on, where it should appear in the Dataverse installation web interfaces, etc.
The possibilities for external tools are endless. Let’s look at some examples to get your creative juices flowing. Then we’ll look at a complete list of parameters you can use when creating the manifest file for your tool.
If you’re still looking for more information on external tools, you can also watch a video introduction called Background on the External Tool Framework (slides) from the 2020 Dataverse Community Meeting.
Examples of External Tools
Note: This is the same list that appears in the External Tools section of the Admin Guide.
Tool | Type | Scope | Description |
---|---|---|---|
Data Explorer | explore | file | A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/dataverse-data-explorer-v2 for the instructions on adding Data Explorer to your Dataverse. |
Whole Tale | explore | dataset | A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the Whole Tale User Guide. |
Binder | explore | dataset | Binder allows you to spin up custom computing environments in the cloud (including Jupyter notebooks) with the files from your dataset. See https://github.com/IQSS/dataverse-binder-redirect for installation instructions. |
File Previewers | explore | file | A set of tools that display the content of files - including audio, html, Hypothes.is annotations, images, PDF, Markdown, text, video, tabular data, spreadsheets, GeoJSON, zip, and NcML files - allowing them to be viewed without downloading the file. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through GitHub are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/gdcc/dataverse-previewers |
Data Curation Tool | configure | file | A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions. |
Ask the Data | query | file | Ask the Data is an experimental tool that allows you to ask natural language questions about the data contained in Dataverse tables (tabular data). See the README.md file at https://github.com/IQSS/askdataverse/tree/main/askthedata for the instructions on adding Ask the Data to your Dataverse installation. |
TurboCurator by ICPSR | configure | dataset | TurboCurator generates metadata improvements for title, description, and keywords. It relies on OpenAI’s ChatGPT and ICPSR best practices. See the TurboCurator Dataverse Administrator page for more details on how it works and on adding TurboCurator to your Dataverse installation. |
JupyterHub | explore | file | The Dataverse-to-JupyterHub Data Transfer Connector is a tool that simplifies the transfer of data between Dataverse repositories and the cloud-based platform JupyterHub. It is designed for researchers, scientists, and data analysts, facilitating collaboration on projects by seamlessly moving datasets and files. The tool is a lightweight client-side web application built using React and relies on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. Currently optimized for small to medium-sized files, future plans include extending support for larger files and signed Dataverse endpoints. For more details, you can refer to the external tool manifest: https://forgemia.inra.fr/dipso/eosc-pillar/dataverse-jupyterhub-connector/-/blob/master/externalTools.json |
How External Tools Are Presented to Users
An external tool can appear in your Dataverse installation in a variety of ways:
as an explore, preview, query or configure option for a file
as an explore or configure option for a dataset
as an embedded preview on the file landing page
See also the Testing External Tools section of the Admin Guide for some perspective on how Dataverse installations will expect to test your tool before announcing it to their users.
Creating an External Tool Manifest
External tools must be expressed in an external tool manifest file, a specific JSON format a Dataverse installation requires. As the author of an external tool, you are expected to provide this JSON file and installation instructions on a web page for your tool.
Examples of Manifests
Let’s look at a few examples of external tool manifests (both at the file level and at the dataset level) before we dive into how they work.
External Tools for Files
fabulousFileTool.json is a file level tool (both an “explore” tool and a “preview” tool) that operates on tabular files:
{
  "displayName": "Fabulous File Tool",
  "description": "A non-existent tool that is fabulous fun for files!",
  "toolName": "fabulous",
  "scope": "file",
  "types": [
    "explore",
    "preview"
  ],
  "toolUrl": "https://fabulousfiletool.com",
  "contentType": "text/tab-separated-values",
  "httpMethod": "GET",
  "toolParameters": {
    "queryParameters": [
      {
        "fileid": "{fileId}"
      },
      {
        "datasetPid": "{datasetPid}"
      },
      {
        "locale": "{localeCode}"
      }
    ]
  },
  "allowedApiCalls": [
    {
      "name": "retrieveDataFile",
      "httpMethod": "GET",
      "urlTemplate": "/api/v1/access/datafile/{fileId}",
      "timeOut": 270
    }
  ]
}
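To make the role of toolUrl and queryParameters concrete, here is a minimal Python sketch of how a launch URL could be assembled from a manifest like the one above. It is an illustration only, not the code a Dataverse installation runs; the function name build_launch_url and the sample DOI are hypothetical.

```python
from urllib.parse import urlencode


def build_launch_url(manifest, values):
    """Append the manifest's query parameters to its toolUrl.

    `values` maps reserved words without braces (e.g. "fileId")
    to the content that should replace them.
    """
    pairs = []
    for entry in manifest["toolParameters"]["queryParameters"]:
        for key, template in entry.items():
            reserved_word = template.strip("{}")  # "{fileId}" -> "fileId"
            pairs.append((key, values[reserved_word]))
    # urlencode percent-encodes characters such as ":" and "/" in values.
    return manifest["toolUrl"] + "?" + urlencode(pairs)


manifest = {
    "toolUrl": "https://fabulousfiletool.com",
    "toolParameters": {
        "queryParameters": [
            {"fileid": "{fileId}"},
            {"datasetPid": "{datasetPid}"},
            {"locale": "{localeCode}"},
        ]
    },
}

# Hypothetical values for illustration only.
launch_url = build_launch_url(
    manifest,
    {"fileId": 42, "datasetPid": "doi:10.5072/FK2/ABC123", "localeCode": "en"},
)
print(launch_url)
```

The keys on the left of each pair ("fileid", "datasetPid", "locale") are what your tool receives, so your tool should read exactly those names.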
auxFileTool.json is a file level preview tool that operates on auxiliary files associated with a data file (note the “requirements” section):
{
  "displayName": "AuxFileViewer",
  "description": "Show an auxiliary file from a dataset file.",
  "toolName": "auxPreviewer",
  "scope": "file",
  "types": [
    "preview"
  ],
  "toolUrl": "https://example.com/AuxFileViewer.html",
  "toolParameters": {
    "queryParameters": [
      {
        "fileid": "{fileId}"
      }
    ]
  },
  "requirements": {
    "auxFilesExist": [
      {
        "formatTag": "myFormatTag",
        "formatVersion": "0.1"
      }
    ]
  },
  "contentType": "application/foobar"
}
External Tools for Datasets
dynamicDatasetTool.json is a dataset level explore tool:
{
  "displayName": "Dynamic Dataset Tool",
  "description": "Dazzles! Dizzying!",
  "scope": "dataset",
  "types": [
    "explore"
  ],
  "toolUrl": "https://dynamicdatasettool.com/v2",
  "toolParameters": {
    "queryParameters": [
      {
        "PID": "{datasetPid}"
      },
      {
        "locale": "{localeCode}"
      }
    ]
  },
  "allowedApiCalls": [
    {
      "name": "retrieveDatasetJson",
      "httpMethod": "GET",
      "urlTemplate": "/api/v1/datasets/{datasetId}",
      "timeOut": 10
    }
  ]
}
Terminology
Term | Definition |
---|---|
external tool manifest | A JSON file that defines the URL constructed by a Dataverse installation when users click explore or configure tool options. External tool makers are asked to host this JSON file on a website (no app store yet, sorry) and explain how to install and use the tool. Examples include |
displayName | The name of the tool in the Dataverse installation web interface. For example, “Data Explorer”. |
description | The description of the tool, which appears in a popup (for configure tools only) so the user who clicked the tool can learn about the tool before being redirected to the tool in a new tab in their browser. HTML is supported. |
scope | Whether the external tool appears and operates at the file level or the dataset level. Note that a file level tool must also specify the type of file it operates on (see “contentType” below). |
types | Whether the external tool is an explore tool, a preview tool, a query tool, a configure tool or any combination of these (multiple types are supported for a single tool). Configure tools require an API token because they make changes to data files (files within datasets). The older “type” keyword that allows you to pass a single type as a string is deprecated but still supported. |
toolUrl | The base URL of the tool before query parameters are added. |
contentType | File level tools operate on a specific file type (content type or MIME type such as “application/pdf”) and this must be specified. Dataset level tools do not use contentType. |
toolParameters | httpMethod, queryParameters, and allowedApiCalls are supported and described below. |
httpMethod | Either |
queryParameters | Key/value combinations that can be appended to the toolUrl. For example, once substitution takes place (described below) the user may be redirected to |
query parameter keys | An arbitrary string to associate with a value that is populated with a reserved word (described below). As the author of the tool, you have control over what “key” you would like to be passed to your tool. For example, if you want to have your tool receive and operate on the query parameter “dataverseFileId=42” instead of just “fileId=42”, that’s fine. |
query parameter values | A mechanism for substituting reserved words with dynamic content. For example, in your manifest file, you can use a reserved word (described below) such as |
reserved words | A set of strings surrounded by curly braces such as |
allowedApiCalls | An array of objects defining callbacks the tool is allowed to make to the Dataverse API. If the dataset or file being accessed is not public, the callback URLs will be signed to allow the tool access for a defined time. |
allowedApiCalls name | A name the tool will use to identify this callback URL such as |
allowedApiCalls urlTemplate | The relative URL for the callback using reserved words to indicate where values should be dynamically substituted such as |
allowedApiCalls httpMethod | Which HTTP method the specified callback uses such as |
allowedApiCalls timeOut | For non-public datasets and datafiles, how many minutes the signed URLs given to the tool should be valid for. Must be an integer. |
requirements | Resources your tool needs to function. For now, the only requirement you can specify is that one or more auxiliary files exist (see auxFilesExist in the External Tools for Files example). Currently, requirements only apply to preview tools. If the requirements are not met, the preview tool is not shown. |
auxFilesExist | An array containing formatTag and formatVersion pairs for each auxiliary file that your tool needs to download to function properly. For example, a required aux file could have a |
toolName | A name of an external tool that is used to differentiate between external tools and also used in Bundle.properties for localization in the Dataverse installation web interface. For example, the toolName for Data Explorer is |
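Several of the definitions above read as simple rules (scope is file or dataset, file level tools must specify contentType, timeOut must be an integer). As a rough aid while drafting a manifest, they could be checked with a short Python sketch like the one below. The function name check_manifest is hypothetical, and the checks a Dataverse installation actually performs when a tool is registered may be stricter or different.

```python
KNOWN_TYPES = {"explore", "preview", "query", "configure"}


def check_manifest(manifest):
    """Return a list of problems found in a manifest dict (empty if none)."""
    problems = []

    scope = manifest.get("scope")
    if scope not in ("file", "dataset"):
        problems.append("scope must be 'file' or 'dataset'")
    if scope == "file" and "contentType" not in manifest:
        problems.append("file level tools must specify contentType")

    # Accept the deprecated single-string "type" keyword as well as "types".
    types = manifest.get("types")
    if types is None and "type" in manifest:
        types = [manifest["type"]]
    for tool_type in types or []:
        if tool_type not in KNOWN_TYPES:
            problems.append(f"unknown type: {tool_type}")

    for call in manifest.get("allowedApiCalls", []):
        if "timeOut" in call:
            time_out = call["timeOut"]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(time_out, bool) or not isinstance(time_out, int):
                problems.append(
                    f"timeOut for {call.get('name')} must be an integer"
                )

    return problems


good = {
    "scope": "file",
    "types": ["explore", "preview"],
    "contentType": "text/tab-separated-values",
    "allowedApiCalls": [{"name": "retrieveDataFile", "timeOut": 270}],
}
bad = {
    "scope": "file",
    "types": ["explore", "browse"],
    "allowedApiCalls": [{"name": "retrieveDataFile", "timeOut": "270"}],
}
print(check_manifest(good))
print(check_manifest(bad))
```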
Reserved Words
Reserved word | Status | Description |
---|---|---|
 | optional | The URL of the Dataverse installation from which the tool was launched. For example, |
 | depends | The database ID of a file the user clicks “Explore” or “Configure” on. For example, |
 | depends | The Persistent ID (DOI or Handle) of a file the user clicks “Explore” or “Configure” on. For example, |
 | optional | The Dataverse installation’s API token of the user launching the external tool, if available. Please note that API tokens should be treated with the same care as a password. For example, |
 | depends | The database ID of the dataset. For example, |
 | depends | The Persistent ID (DOI or Handle) of the dataset. For example, |
 | optional | The friendly version number ( or :draft ) of the dataset version the file level tool is being launched from. For example, |
 | optional | The code for the language (“en” for English, “fr” for French, etc.) that the user has selected from the language toggle in a Dataverse installation. See also Internationalization. |
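Reserved words also appear inside the urlTemplate of an allowedApiCalls entry, such as "/api/v1/access/datafile/{fileId}" in the file level example manifest. The following Python sketch shows one way such a template could be filled in and combined with the URL of the installation. It is an assumption-laden illustration, not the Dataverse implementation, and the helper name fill_template is hypothetical.

```python
import re


def fill_template(template, values):
    """Replace each {word} in the template with values[word], if known."""

    def replace(match):
        word = match.group(1)
        # Leave unknown reserved words untouched so the gap stays visible.
        return str(values.get(word, match.group(0)))

    return re.sub(r"\{(\w+)\}", replace, template)


site_url = "https://demo.dataverse.org"
callback_url = site_url + fill_template(
    "/api/v1/access/datafile/{fileId}", {"fileId": 42}
)
print(callback_url)
```

Leaving unknown words in place, rather than raising an error, is a design choice of this sketch; a stricter tool might prefer to fail fast.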
Internationalization of Your External Tool
The name and description of your tool can be localized and made available in different languages in your Dataverse installation’s web interface. Use the toolName parameter in the manifest JSON file and then add that toolName to Bundle.properties.
For example, if the toolName of your external tool is fabulous, then the lines in Bundle.properties should be:
externaltools.fabulous.displayname=Fabulous File Tool
externaltools.fabulous.description=Fabulous Fun for Files!
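If you maintain several tools, the two lines per tool can be derived from the manifest itself. The sketch below is one possible way to do that in Python; the function name bundle_lines is hypothetical, and the key pattern is simply copied from the example lines above.

```python
def bundle_lines(manifest):
    """Build the two Bundle.properties lines for a tool's manifest."""
    tool_name = manifest["toolName"]
    return [
        f"externaltools.{tool_name}.displayname={manifest['displayName']}",
        f"externaltools.{tool_name}.description={manifest['description']}",
    ]


lines = bundle_lines(
    {
        "toolName": "fabulous",
        "displayName": "Fabulous File Tool",
        "description": "Fabulous Fun for Files!",
    }
)
print("\n".join(lines))
```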
Using Example Manifests to Get Started
Again, you can use fabulousFileTool.json or dynamicDatasetTool.json as a starting point for your own manifest file.
Additional working examples, including ones using Signed URLs, are available at https://github.com/gdcc/dataverse-previewers .
Testing Your External Tool
As the author of an external tool, you are not expected to learn how to install and operate a Dataverse installation. There’s a very good chance your tool can be added to a server Dataverse Community developers use for testing if you reach out on any of the channels listed under Getting Help in the Developer Guide.
By all means, if you’d like to install a Dataverse installation yourself, a number of developer-centric options are available. For example, there’s a script to spin up a Dataverse installation on EC2 at https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible . The process for using curl to add your external tool to your Dataverse installation is documented under Managing External Tools in the Admin Guide.
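If you prefer a script to curl, the same kind of request can be prepared in Python. The sketch below only builds the request and does not send it. The endpoint path /api/admin/externalTools and the default server address are assumptions of this sketch, so please confirm them against Managing External Tools in the Admin Guide before relying on them; the function name build_registration_request is hypothetical.

```python
import json
import urllib.request


def build_registration_request(manifest, server="http://localhost:8080"):
    """Prepare (but do not send) a POST request carrying the manifest."""
    body = json.dumps(manifest).encode("utf-8")
    return urllib.request.Request(
        # Assumed endpoint; check the Admin Guide for your installation.
        url=server + "/api/admin/externalTools",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_registration_request({"displayName": "Fabulous File Tool"})
print(request.get_method(), request.full_url)
# To actually send it against a running installation:
# urllib.request.urlopen(request)
```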
Spreading the Word About Your External Tool
Adding Your Tool to the Inventory of External Tools
Once you’ve gotten your tool working, please make a pull request to update the list of tools above! You are also welcome to download dataverse-external-tools.tsv, add your tool to the TSV file, create an issue at https://github.com/IQSS/dataverse/issues , and then upload your TSV file there.
Unless your tool runs entirely in a browser, you may have integrated server-side software with your Dataverse installation. If so, please double check that your software is listed in the Integrations section of the Admin Guide and if not, please open an issue or pull request to add it. Thanks!
If you’ve thought to yourself that there ought to be an app store for Dataverse Software external tools, you’re not alone. Please see https://github.com/IQSS/dataverse/issues/5688 :)
Demoing Your External Tool
https://demo.dataverse.org is the place to play around with the Dataverse Software and your tool can be included. Please email support@dataverse.org to start the conversation about adding your tool. Additionally, you are welcome to open an issue at https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible which already includes a number of the tools listed above.
Announcing Your External Tool
You are welcome to announce your external tool at https://groups.google.com/forum/#!forum/dataverse-community
If you’re too shy, we’ll do it for you. We’ll probably tweet about it too. Thank you for your contribution to the Dataverse Project!