Configuration ============= Now that you've successfully logged into your Dataverse installation with a superuser account after going through a basic :doc:`installation-main`, you'll need to secure and configure your installation. Settings within your Dataverse installation itself are managed via JVM options or by manipulating values in the ``setting`` table directly or through API calls. Once you have finished securing and configuring your Dataverse installation, you may proceed to the :doc:`/admin/index` for more information on the ongoing administration of a Dataverse installation. Advanced configuration topics are covered in the :doc:`shibboleth` and :doc:`oauth2` sections. .. contents:: |toctitle| :local: .. _securing-your-installation: Securing Your Installation -------------------------- Changing the Superuser Password +++++++++++++++++++++++++++++++ The default password for the "dataverseAdmin" superuser account is "admin", as mentioned in the :doc:`installation-main` section, and you should change it, of course. .. _blocking-api-endpoints: Blocking API Endpoints ++++++++++++++++++++++ The :doc:`/api/native-api` contains a useful but potentially dangerous set of API endpoints called "admin" that allows you to change system settings, make ordinary users into superusers, and more. The "builtin-users" endpoints let admins do tasks such as creating a local/builtin user account if they know the key defined in :ref:`BuiltinUsers.KEY`. By default in the code, most of these API endpoints can be operated on remotely and a number of endpoints do not require authentication. However, the endpoints "admin" and "builtin-users" are limited to localhost out of the box by the installer, using the JvmSettings :ref:`dataverse.api.blocked.endpoints` and :ref:`dataverse.api.blocked.policy`. .. note:: The database settings :ref:`:BlockedApiEndpoints` and :ref:`:BlockedApiPolicy` are deprecated and will be removed in a future version. Please use the JvmSettings mentioned above instead. It is **very important** to keep the block in place for the "admin" endpoint, and to leave the "builtin-users" endpoint blocked unless you need to access it remotely. Documentation for the "admin" endpoint is spread across the :doc:`/api/native-api` section of the API Guide and the :doc:`/admin/index`. Given how important it is to avoid exposing the "admin" and "builtin-user" APIs, sites using a proxy, e.g. Apache or Nginx, should also consider blocking them through rules in the proxy. The following examples may be useful: Apache/Httpd Rule: Rewrite lines added to /etc/httpd/conf.d/ssl.conf. They can be the first lines inserted after the RewriteEngine On statement: .. code-block:: apache RewriteRule ^/api/(admin|builtin-users) - [R=403,L] RewriteRule ^/api/(v[0-9]*)/(admin|builtin-users) - [R=403,L] Nginx Configuration Rule: .. code-block:: nginx location ~ ^/api/(admin|v1/admin|builtin-users|v1/builtin-users) { deny all; return 403; } If you are using a load balancer or a reverse proxy, there are some additional considerations. If no additional configurations are made and the upstream is configured to redirect to localhost, the API will be accessible from the outside, as your installation will register as origin the localhost for any requests to the endpoints "admin" and "builtin-users". 
To prevent this, you have two options: - If your upstream is configured to redirect to localhost, you will need to set the :ref:`JVM option ` to the name of the header your proxy adds to carry the original client address (for example, the ``X-Forwarded-For`` header discussed below) and configure, from the load balancer side, the chosen header to be populated with the client IP address. - Another solution is to set the upstream to the client IP address. In this case no further configuration is needed. For more information on configuring blocked API endpoints, see :ref:`dataverse.api.blocked.endpoints` and :ref:`dataverse.api.blocked.policy` in the JvmSettings documentation. .. note:: It's also possible to prevent file uploads via API by adjusting the :ref:`:UploadMethods` database setting. Forcing HTTPS +++++++++++++ To avoid having your users send credentials in the clear, it's strongly recommended to force all web traffic to go through HTTPS (port 443) rather than HTTP (port 80). The ease with which one can install a valid SSL cert into Apache compared with the same operation in Payara might be a compelling enough reason to front Payara with Apache. In addition, Apache can be configured to rewrite HTTP to HTTPS with rules such as those found at https://wiki.apache.org/httpd/RewriteHTTPToHTTPS or in the section on :doc:`shibboleth`. .. _user-ip-addresses-proxy-security: Recording User IP Addresses +++++++++++++++++++++++++++ By default, the Dataverse installation captures the IP address from which requests originate. This is used for multiple purposes including controlling access to the admin API, IP-based user groups, and Make Data Count reporting. When the Dataverse installation is configured behind a proxy such as a load balancer, this default setup may not capture the correct IP address. In this case all the incoming requests will be logged in the access logs, MDC logs, etc. as if they were all coming from the IP address(es) of the load balancer itself. Proxies usually save the original address in an added HTTP header, from which it can be extracted. For example, AWS LB records the "true" original address in the standard ``X-Forwarded-For`` header. If your Dataverse installation is running behind an IP-masking proxy, but you would like to use IP groups, or record the true geographical location of the incoming requests with Make Data Count, you may enable the IP address lookup from the proxy header using the JVM option ``dataverse.useripaddresssourceheader``, described further below. Before doing so, however, you must absolutely **consider the security risks involved**! This option must be enabled **only** on a Dataverse installation that is in fact fully behind a proxy that properly, and consistently, adds the ``X-Forwarded-For`` (or a similar) header to every request it forwards. Consider the implications of activating this option on a Dataverse installation that is not running behind a proxy, *or running behind one, but still accessible from insecure locations bypassing the proxy*: anyone can now add the header above to an incoming request, supplying an arbitrary IP address that the Dataverse installation will trust as the true origin of the call, giving an attacker an easy way to, for example, get into a privileged IP group. The implications could be even more severe if an attacker were able to pretend to be coming from ``localhost``, if a Dataverse installation is configured to trust localhost connections for unrestricted access to the admin API!
We have addressed this by making it so that a Dataverse installation should never accept ``localhost``, ``127.0.0.1``, ``0:0:0:0:0:0:0:1``, etc. when supplied in such a header. But if you still find this risk unacceptable, you may want to consider turning off open localhost access to the API (see :ref:`Securing Your Installation ` for more information). This is how to verify that your proxy or load balancer, etc. is handling the originating address headers properly and securely: Make sure access logging is enabled in your application server (Payara) configuration (in the ``domain.xml``). Add the address header to the access log format. For example, on a system behind AWS ELB, you may want to use something like ``%client.name% %datetime% %request% %status% %response.length% %header.referer% %header.x-forwarded-for%``. Once enabled, access the Dataverse installation from outside the LB. You should now see the real IP address of your remote client in the access log. For example, something like: ``"1.2.3.4" "01/Jun/2020:12:00:00 -0500" "GET /dataverse.xhtml HTTP/1.1" 200 81082 "NULL-REFERER" "128.64.32.16"`` In this example, ``128.64.32.16`` is your remote address (that you should verify), and ``1.2.3.4`` is the address of your LB. If you're not seeing your remote address in the log, do not activate the JVM option! Also, verify that all the entries in the log have this header populated. The only entries in the access log that you should be seeing without this header (logged as ``"NULL-HEADER-X-FORWARDED-FOR"``) are local requests, made from localhost, etc. In this case, since the request is not coming through the proxy, the local IP address should be logged as the primary one (as the first value in the log entry, ``%client.name%``). If you see any requests coming in from remote, insecure subnets without this header - do not use the JVM option! Once you are ready, enable the :ref:`JVM option `. Verify that the remote locations are properly tracked in your MDC metrics, and/or your IP groups are working. As a final test, if your Dataverse installation is allowing unrestricted localhost access to the admin API, imitate an attack in which a malicious request pretends to be coming from ``127.0.0.1``. Try the following from a remote, insecure location: ``curl https://your.dataverse.edu/api/admin/settings --header "X-FORWARDED-FOR: 127.0.0.1"`` First of all, confirm that access is denied! If you are in fact able to access the settings API from a location outside the proxy, **something is seriously wrong**, so please let us know, and stop using the JVM option. Otherwise, check the access log entry for the header value. What you should see is something like ``"127.0.0.1, 128.64.32.16"``, where the second address is the real IP of your remote client. The fact that the "fake" ``127.0.0.1`` you sent over is present in the header is perfectly ok. This is the proper proxy behavior - it preserves any incoming values in the ``X-Forwarded-For`` header, if supplied, and adds the detected incoming address to it, *on the right*. It is only this rightmost comma-separated value that the Dataverse installation should ever use. Still feel like activating this option in your configuration? - Have fun and be safe! .. _PrivacyConsiderations: Privacy Considerations ++++++++++++++++++++++ Email Privacy ^^^^^^^^^^^^^ Out of the box, your Dataverse installation will list email addresses of the contacts for datasets when users visit a dataset page and click the "Export Metadata" button.
Additionally, out of the box, the Dataverse installation will list email addresses of Dataverse collection contacts via API (see :ref:`View a Dataverse Collection ` in the :doc:`/api/native-api` section of the API Guide). If you would like to exclude these email addresses from export, set :ref:`:ExcludeEmailFromExport <:ExcludeEmailFromExport>` to true. Additional Recommendations ++++++++++++++++++++++++++ Run Payara as a User Other Than Root ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ See the :ref:`payara` section of :doc:`prerequisites` for details and init scripts for running Payara as non-root. Related to this is that you should remove ``/root/.payara/pass`` to ensure that Payara isn't ever accidentally started as root. Without the password, Payara won't be able to start as root, which is a good thing. .. _secure-password-storage: Secure Password Storage ^^^^^^^^^^^^^^^^^^^^^^^ In development or demo scenarios, we suggest not to store passwords in files permanently. We recommend the use of at least environment variables or production-grade mechanisms to supply passwords. In a production setup, permanently storing passwords as plaintext should be avoided at all cost. Environment variables are dangerous in shared environments and containers, as they may be easily exploited; we suggest not to use them. Depending on your deployment model and environment, you can make use of the following techniques to securely store and access passwords. **Password Aliases** A `password alias`_ allows you to have a plaintext reference to an encrypted password stored on the server, with the alias being used wherever the password is needed. This method is especially useful in a classic deployment, as it does not require any external secrets management. Password aliases are consumable as a MicroProfile Config source and can be referrenced by their name in a `property expression`_. You may also reference them within a `variable substitution`_, e.g. in your ``domain.xml``. Creation example for an alias named *my.alias.name*: .. code-block:: shell echo "AS_ADMIN_ALIASPASSWORD=changeme" > /tmp/p.txt asadmin create-password-alias --passwordfile "/tmp/p.txt" "my.alias.name" rm /tmp/p.txt Note: omitting the ``--passwordfile`` parameter allows creating the alias in an interactive fashion with a prompt. **Secrets Files** Payara has a builtin MicroProfile Config source to consume values from files in a directory on your filesystem. This `directory config source`_ is most useful and secure with external secrets management in place, temporarily mounting cleartext passwords as files. Examples are Kubernetes / OpenShift `Secrets `_ or tools like `Vault Agent `_. Please follow the `directory config source`_ documentation to learn about its usage. **Cloud Providers** Running Dataverse on a cloud platform or running an external secret management system like `Vault `_ enables accessing secrets without any intermediate storage of cleartext. Obviously this is the most secure option for any deployment model, but it may require more resources to set up and maintain - your mileage may vary. Take a look at `cloud sources`_ shipped with Payara to learn about their usage. Enforce Strong Passwords for User Accounts ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Your Dataverse installation only stores passwords (as salted hash, and using a strong hashing algorithm) for "builtin" users. You can increase the password complexity rules to meet your security needs. 
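For example, here is a minimal sketch of adjusting one of those complexity rules, the :ref:`:PVMinLength` setting listed below, through the admin settings API (this assumes the admin API is reachable from localhost and uses ``10`` purely as an illustrative value):

.. code-block:: shell

  # Require at least 10 characters for "builtin" account passwords (illustrative value)
  curl -X PUT -d 10 http://localhost:8080/api/admin/settings/:PVMinLength

  # Confirm the current value of the setting
  curl http://localhost:8080/api/admin/settings/:PVMinLength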
If you have configured your Dataverse installation to allow login from remote authentication providers such as Shibboleth, ORCID, GitHub or Google, you do not have any control over those remote providers' password complexity rules. See the :ref:`auth-modes` section below for more on login options. Even if you are satisfied with the out-of-the-box password complexity rules the Dataverse Software ships with, for the "dataverseAdmin" account you should use a strong password so the hash cannot easily be cracked through dictionary attacks. Password complexity rules for "builtin" accounts can be adjusted with a variety of settings documented below. Here's a list: - :ref:`:PVMinLength` - :ref:`:PVMaxLength` - :ref:`:PVNumberOfConsecutiveDigitsAllowed` - :ref:`:PVCharacterRules` - :ref:`:PVNumberOfCharacteristics` - :ref:`:PVDictionaries` - :ref:`:PVGoodStrength` - :ref:`:PVCustomPasswordResetAlertMessage` .. _samesite-cookie-attribute: SameSite Cookie Attribute ^^^^^^^^^^^^^^^^^^^^^^^^^ The SameSite cookie attribute is defined in an upcoming revision to `RFC 6265 `_ (HTTP State Management Mechanism) called `6265bis `_ ("bis" meaning "repeated"). The possible values are "None", "Lax", and "Strict". "Strict" is intended to help prevent Cross-Site Request Forgery (CSRF) attacks, as described in the RFC proposal and an OWASP `cheat sheet `_. We don't recommend "None" for security reasons. By default, Payara doesn't send the SameSite cookie attribute, which browsers should interpret as "Lax" according to `MDN `_. Dataverse installations are explicitly set to "Lax" out of the box by the installer (in the case of a "classic" installation) or through the base image (in the case of a Docker installation). For classic, see :ref:`http.cookie-same-site-value` and :ref:`http.cookie-same-site-enabled` for how to change the values. For Docker, you must rebuild the :doc:`base image `. See also Payara's `documentation `_ for the settings above. To inspect cookie attributes like SameSite, you can use ``curl -s -I http://localhost:8080 | grep JSESSIONID``, for example, looking for the "Set-Cookie" header. .. _ongoing-security: Ongoing Security of Your Installation +++++++++++++++++++++++++++++++++++++ As with any application, you should keep up to date with patches to both the Dataverse software and the platform (usually Linux) it runs on. Dataverse releases are announced on the dataverse-community_ mailing list, the Dataverse blog_, and in chat.dataverse.org_. .. _dataverse-community: https://groups.google.com/g/dataverse-community .. _blog: https://dataverse.org/blog .. _chat.dataverse.org: https://chat.dataverse.org In addition to these public channels, you can subscribe to receive security notices via email from the Dataverse team. These notices are sent to the ``contact_email`` in the installation spreadsheet_ and you can open an issue in the dataverse-installations_ repo to add or change the contact email. Security notices are also sent to people and organizations that prefer to remain anonymous. To be added to this private list, please email support@dataverse.org. .. _spreadsheet: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0 .. _dataverse-installations: https://github.com/IQSS/dataverse-installations For additional details about security practices by the Dataverse team, see the :doc:`/developers/security` section of the Developer Guide. ..
_reporting-security-issues: Reporting Security Issues +++++++++++++++++++++++++ If you have a security issue to report, please email it to security@dataverse.org. .. _network-ports: Network Ports ------------- Remember how under "Decisions to Make" in the :doc:`prep` section we mentioned you'll need to make a decision about whether or not to introduce a proxy in front of the Dataverse Software such as Apache or nginx? The time has come to make that decision. The need to redirect HTTP (port 80) to HTTPS (port 443) for security has already been mentioned above and the fact that Payara puts these services on 8080 and 8181, respectively, was touched on in the :doc:`installation-main` section. In production, you don't want to tell your users to use your Dataverse installation on ports 8080 and 8181. You should have them use the standard HTTPS port, which is 443. Your decision to proxy or not should primarily be driven by which features of the Dataverse Software you'd like to use. If you'd like to use Shibboleth, the decision is easy because proxying or "fronting" Payara with Apache is required. The details are covered in the :doc:`shibboleth` section. Even if you have no interest in Shibboleth, you may want to front your Dataverse installation with Apache or nginx to simplify the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including some `notes used by the Dataverse Project team `_, but the process of adding a certificate to Payara is arduous and not for the faint of heart. The Dataverse Project team cannot provide much help with adding certificates to Payara beyond linking to `tips `_ on the web. Still not convinced you should put Payara behind another web server? Even if you manage to get your SSL certificate into Payara, how are you going to run Payara on low ports such as 80 and 443? Are you going to run Payara as root? Bad idea. This is a security risk. Under "Additional Recommendations" in "Securing Your Installation" above, you are advised to configure Payara to run as a user other than root. There's also the issue of serving a production-ready version of robots.txt. By using a proxy such as Apache, this is a one-time "set it and forget it" step as explained below in the "Going Live" section. If you are convinced you'd like to try fronting Payara with Apache, the :doc:`shibboleth` section should be a good resource for you. If you really don't want to front Payara with any proxy (not recommended), you can configure Payara to run HTTPS on port 443 like this: ``./asadmin set server-config.network-config.network-listeners.network-listener.http-listener-2.port=443`` What about port 80? Even if you don't front your Dataverse installation with Apache, you may want to let Apache run on port 80 just to rewrite HTTP to HTTPS as described above. You can use a similar command as above to change the HTTP port that Payara uses from 8080 to 80 (substitute ``http-listener-1.port=80``). Payara can be used to enforce HTTPS on its own without Apache, but configuring this is an exercise for the reader.
Answers here may be helpful: https://stackoverflow.com/questions/25122025/glassfish-v4-java-7-port-unification-error-not-able-to-redirect-http-to If you are running an installation with Apache and Payara on the same server, and would like to restrict Payara from responding to any requests to port 8080 from external hosts (in other words, not through Apache), you can restrict the HTTP listener (``http-listener-1``) to localhost only with: ``./asadmin set server-config.network-config.network-listeners.network-listener.http-listener-1.address=127.0.0.1`` You should **NOT** use the configuration option above if you are running in a load-balanced environment, or otherwise have the web server on a different host than the application server. Root Dataverse Collection Permissions ------------------------------------- The user who creates a Dataverse collection is given the "Admin" role on that Dataverse collection. The root Dataverse collection is created automatically for you by the installer and the "Admin" is the superuser account ("dataverseAdmin") we used in the :doc:`installation-main` section to confirm that we can log in. These next steps of configuring the root Dataverse collection require the "Admin" role on the root Dataverse collection, but not the much more powerful superuser attribute. In short, users with the "Admin" role are subject to the permission system. A superuser, on the other hand, completely bypasses the permission system. You can give non-superusers the "Admin" role on the root Dataverse collection if you'd like them to configure the root Dataverse collection. In order for non-superusers to start creating Dataverse collections or datasets, you need to click "Edit" then "Permissions" and make choices about which users can add Dataverse collections or datasets within the root Dataverse collection. (There is an API endpoint for this operation as well.) Again, the user who creates a Dataverse collection will be granted the "Admin" role on that Dataverse collection. Non-superusers who are not "Admin" on the root Dataverse collection will not be able to do anything useful until the root Dataverse collection has been published. As the person installing the Dataverse Software, you may or may not be a local metadata expert. You may want to have others sign up for accounts and grant them the "Admin" role at the root Dataverse collection to configure metadata fields, templates, browse/search facets, guestbooks, etc. For more on these topics, consult the :doc:`/user/dataverse-management` section of the User Guide. .. _pids-configuration: Persistent Identifiers and Publishing Datasets ---------------------------------------------- Persistent identifiers (PIDs) are a required and integral part of the Dataverse Software. They provide a URL that is guaranteed to resolve to the datasets or files they represent. The Dataverse Software currently supports creating identifiers using any of several PID types. The most appropriate PIDs for public data are DOIs (e.g., provided by DataCite or EZID) and Handles. Dataverse also supports PermaLinks, which could be useful for intranet or catalog use cases. A DOI provider called "FAKE" is recommended only for testing and development purposes. Dataverse can be configured with one or more PID providers, each of which can mint and manage PIDs with a given protocol (e.g., doi, handle, permalink) using a specific service provider/account (e.g.
with DataCite, EZId, or HandleNet) to manage an authority/shoulder combination, aka a "prefix" (PermaLinks also support custom separator characters as part of the prefix), along with an optional list of individual PIDs (with different authority/shoulders) that can be managed with that account. Dataverse automatically manages assigning PIDs and making them findable when datasets are published. There are also :ref:`API calls that allow updating the PID target URLs and metadata of already-published datasets manually if needed `, e.g. if a Dataverse instance is moved to a new URL or when the software is updated to generate additional metadata or address schema changes at the PID service. Note that while some forms of PIDs (Handles, PermaLinks) are technically case sensitive, common practice is to avoid creating PIDs that differ only by case. Dataverse treats PIDs of all types as case-insensitive (as DOIs are by definition). This means that Dataverse will find datasets (in search, to display dataset pages, etc.) when the PIDs entered do not match the case of the original but will have a problem if two PIDs that differ only by case exist in one instance. Testing PID Providers +++++++++++++++++++++ By default, the installer configures the Fake DOI provider as the registration provider. Unlike other DOI Providers, the Fake Provider does not involve any external resolution service and is not appropriate for use beyond development and testing. You may wish instead to test with PermaLinks or with a DataCite test account (which uses DataCite's test infrastructure and will help assure your Dataverse instance can make network connections to DataCite). DataCite requires that you register for a test account, which will have a username, password, and your own prefix (please contact support@datacite.org for a test account; you may wish to `contact the GDCC `_ instead - GDCC is able to provide DataCite accounts with a group discount and can also provide test accounts). Once you receive the login name, password, and prefix for the account, configure the credentials as described below. Alternatively, you may wish to configure other providers for testing: - EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite. - The PermaLink provider, like the FAKE DOI provider, does not involve an external account. Unlike the Fake DOI provider, the PermaLink provider creates PIDs that begin with "perma:", making it clearer that they are not DOIs, and that do resolve to the local dataset/file page in Dataverse, making them useful for some production use cases. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide. Provider-specific configuration is described below. Once all is configured, you will be able to publish datasets and files, but **the persistent identifiers will not be citable** as they, with the exception of PermaLinks, will not redirect to your dataset page in Dataverse. Note that any datasets or files created using a test configuration cannot be directly migrated to a production PID provider and would need to be created again once a valid PID Provider is configured. Once you are done testing, to properly configure persistent identifiers for a production installation, an account and associated namespace (e.g.
authority/shoulder) must be acquired for a fee from a DOI or HDL provider. (As noted above, PermaLinks may be appropriate for intranet and catalog use cases.) **DataCite** (https://www.datacite.org) is the recommended DOI provider (see https://dataversecommunity.global for more on joining DataCite through the Global Dataverse Community Consortium) but **EZID** (http://ezid.cdlib.org) is an option for the University of California according to https://www.cdlib.org/cdlinfo/2017/08/04/ezid-doi-service-is-evolving/ . **Handle.Net** (https://www.handle.net) is the HDL provider. Once you have your DOI or Handle account credentials and a prefix, configure your Dataverse installation using the settings below. Configuring PID Providers +++++++++++++++++++++++++ There are two required global settings to configure PID providers - the list of ids of providers and which one of those should be the default. Per-provider settings are also required - some that are common to all types and some type specific. All of these settings are defined to be compatible with the MicroProfile specification which means that 1. Any of these settings can be set via system properties (see :ref:`jvm-options` for how to do this), environment variables, or other MicroProfile Config mechanisms supported by the app server. `See Payara docs for supported sources `_. 2. Remember to protect your secrets. For passwords, use an environment variable (bare minimum), a password alias named the same as the key (OK) or use the `"dir config source" of Payara `_ (best). Alias creation example: .. code-block:: shell echo "AS_ADMIN_ALIASPASSWORD=changeme" > /tmp/p.txt asadmin create-password-alias --passwordfile /tmp/p.txt dataverse.pid.datacite1.datacite.password rm /tmp/p.txt 3. Environment variables follow the key, replacing any dot, colon, dash, etc. with an underscore "_" and using all uppercase letters. Example: ``dataverse.pid.default-provider`` -> ``DATAVERSE_PID_DEFAULT_PROVIDER`` Global Settings ^^^^^^^^^^^^^^^ The following two global settings are required to configure PID Providers in the Dataverse software: .. _dataverse.pid.providers: dataverse.pid.providers ^^^^^^^^^^^^^^^^^^^^^^^ A comma-separated list of the ids of the PID providers to use. IDs should be simple unique text strings, e.g. datacite1, perma1, etc. IDs are used to scope the provider-specific settings but are not directly visible to users. .. _dataverse.pid.default-provider: dataverse.pid.default-provider ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ID of the default PID provider to use. .. _dataverse.spi.pidproviders.directory: dataverse.spi.pidproviders.directory ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The path to the directory where JAR files containing additional types of PID Providers can be added. Dataverse includes providers that support DOIs (DataCite, EZId, or FAKE), Handles, and PermaLinks. PID provider JAR files added to this directory can replace any of these or add new PID Providers. Per-Provider Settings ^^^^^^^^^^^^^^^^^^^^^ Each Provider listed by id in the dataverse.pid.providers setting must be configured with the following common settings and any settings that are specific to the provider type. .. _dataverse.pid.*.type: dataverse.pid.*.type ^^^^^^^^^^^^^^^^^^^^ The Provider type, currently one of ``datacite``, ``ezid``, ``FAKE``, ``hdl``, or ``perma``. The type defines which protocol a service supports (DOI, Handle, or PermaLink) and, for DOI Providers, which DOI service is used. ..
_dataverse.pid.*.label: dataverse.pid.*.label ^^^^^^^^^^^^^^^^^^^^^ A human-readable label for the provider. .. _dataverse.pid.*.authority: dataverse.pid.*.authority ^^^^^^^^^^^^^^^^^^^^^^^^^ .. _dataverse.pid.*.shoulder: dataverse.pid.*.shoulder ^^^^^^^^^^^^^^^^^^^^^^^^ In general, PIDs are of the form ``<protocol>:<authority>/<shoulder>*`` where ``*`` is the portion unique to an individual PID. PID Providers must define the authority and shoulder (with the protocol defined by the ``dataverse.pid.*.type`` setting) that define the set of existing PIDs they can manage and the prefix they can use when minting new PIDs. (Often an account with a PID service provider will be limited to using a single authority/shoulder. If your PID service provider account allows more than one combination that you wish to use in Dataverse, configure multiple PID Providers, one for each combination.) .. _dataverse.pid.*.identifier-generation-style: dataverse.pid.*.identifier-generation-style ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ By default, PID Providers in Dataverse generate a random 6 character string, pre-pended by the Shoulder if set, to use as the identifier for a Dataset. Set this to ``storedProcGenerated`` to generate instead a custom *unique* identifier (again pre-pended by the Shoulder if set) through a database stored procedure or function (the assumed default setting is ``randomString``). When using the ``storedProcGenerated`` setting, a stored procedure or function must be created in the database. As a first example, the script below (downloadable :download:`here `) produces sequential numerical values. You may need to make some changes to suit your system setup, see the comments for more information: .. literalinclude:: ../_static/util/createsequence.sql :language: plpgsql As a second example, the script below (downloadable :download:`here `) produces sequential 8 character identifiers from a base36 representation of the current timestamp. .. literalinclude:: ../_static/util/identifier_from_timestamp.sql :language: plpgsql Note that the SQL in these example scripts is Postgres-specific. If necessary, it can be reimplemented in any other SQL flavor - the standard JPA code in the application simply expects the database to have a saved function ("stored procedure") named ``generateIdentifierFromStoredProcedure()`` returning a single ``varchar`` argument. Please note that this setting interacts with the ``dataverse.pid.*.datafile-pid-format`` setting below to determine how datafile identifiers are generated. .. _dataverse.pid.*.datafile-pid-format: dataverse.pid.*.datafile-pid-format ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This setting controls the way that the "identifier" component of a file's persistent identifier (PID) relates to the PID of its "parent" dataset - for a given PID Provider. By default, the identifier for a file is dependent on its parent dataset. For example, if the identifier of a dataset is "TJCLKP", the identifier for a file within that dataset will consist of the parent dataset's identifier followed by a slash ("/"), followed by a random 6 character string, yielding "TJCLKP/MLGWJO". Identifiers in this format are what you should expect if you leave ``dataverse.pid.*.datafile-pid-format`` undefined or set it to ``DEPENDENT`` and have not changed the ``dataverse.pid.*.identifier-generation-style`` setting from its default. Alternatively, the identifier for File PIDs can be configured to be independent of Dataset PIDs using the setting ``INDEPENDENT``.
In this case, file PIDs will not contain the PIDs of their parent datasets, and their PIDs will be generated the exact same way that datasets' PIDs are, based on the ``dataverse.pid.*.identifier-generation-style`` setting described above (random 6 character strings or custom unique identifiers through a stored procedure, pre-pended by any shoulder). The chart below shows examples from each possible combination of parameters from the two settings. ``dataverse.pid.*.identifier-generation-style`` can be either ``randomString`` (the default) or ``storedProcGenerated`` and ``dataverse.pid.*.datafile-pid-format`` can be either ``DEPENDENT`` (the default) or ``INDEPENDENT``. In the examples below the "identifier" for the dataset is "TJCLKP" for ``randomString`` and "100001" for ``storedProcGenerated`` (when using sequential numerical values, as described in :ref:`dataverse.pid.*.identifier-generation-style` above), or "krby26qt" for ``storedProcGenerated`` (when using base36 timestamps, as described in :ref:`dataverse.pid.*.identifier-generation-style` above).

+-----------------+---------------+----------------------+---------------------+
|                 | randomString  | storedProcGenerated  | storedProcGenerated |
|                 |               |                      |                     |
|                 |               | (sequential numbers) | (base36 timestamps) |
+=================+===============+======================+=====================+
| **DEPENDENT**   | TJCLKP/MLGWJO | 100001/1             | krby26qt/1          |
+-----------------+---------------+----------------------+---------------------+
| **INDEPENDENT** | MLGWJO        | 100002               | krby27pz            |
+-----------------+---------------+----------------------+---------------------+

As seen above, in cases where ``dataverse.pid.*.identifier-generation-style`` is set to ``storedProcGenerated`` and ``dataverse.pid.*.datafile-pid-format`` is set to ``DEPENDENT``, each file within a dataset will be assigned a number *within* that dataset starting with "1". Otherwise, if ``dataverse.pid.*.datafile-pid-format`` is set to ``INDEPENDENT``, each file within the dataset is assigned a new PID, which is the next available identifier provided from the database stored procedure. In our example: "100002" when using sequential numbers or "krby27pz" when using base36 timestamps. .. _dataverse.pid.*.managed-list: dataverse.pid.*.managed-list ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. _dataverse.pid.*.excluded-list: dataverse.pid.*.excluded-list ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ With at least some PID services, it is possible for the authority (permission) to manage specific individual PIDs to be transferred between accounts. To handle these cases, the individual PIDs, written in the standard format, e.g. doi:10.5072/FK2ABCDEF, can be added to the comma-separated ``managed`` or ``excluded`` list for a given provider. For entries on the ``managed-list``, Dataverse will assume this PID Provider/account can update the metadata and landing URL for the PID at the service provider (even though it does not match the provider's authority/shoulder settings). Conversely, Dataverse will assume that PIDs on the ``excluded-list`` cannot be managed/updated by this provider (even though they match the provider's authority/shoulder settings). These settings are optional with the default assumption that these lists are empty. ..
_dataverse.pid.*.datacite: DataCite-specific Settings ^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.datacite.mds-api-url ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.datacite.rest-api-url ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.datacite.username ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.datacite.password ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ PID Providers of type ``datacite`` require four additional parameters that define how the provider connects to DataCite. DataCite has two APIs that are used in Dataverse: ``dataverse.pid.*.datacite.mds-api-url`` is the base URL of the `DataCite MDS API `_, used to mint and manage DOIs. Current valid values are "https://mds.datacite.org" (production) and "https://mds.test.datacite.org" (testing, the default). The `DataCite REST API `_ is also used, for :ref:`PIDs API ` information retrieval and for :doc:`/admin/make-data-count`. Current valid values for ``dataverse.pid.*.datacite.rest-api-url`` are "https://api.datacite.org" (production) and "https://api.test.datacite.org" (testing, the default). DataCite uses `HTTP Basic authentication `_ for `Fabrica `_ and their APIs. You need to provide the same credentials (``username``, ``password``) to the Dataverse software so it can mint and manage DOIs for you. As noted above, you should use one of the more secure options for setting the password. CrossRef-specific Settings ^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.crossref.url ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.crossref.rest-api-url ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.crossref.username ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.crossref.password ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.crossref.depositor ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.crossref.depositor-email ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ CrossRef is an experimental provider. PID Providers of type ``crossref`` require six additional parameters that define how the provider connects to CrossRef. CrossRef has two APIs that are used in Dataverse: the `CrossRef `_ API used to mint and manage DOIs (its base URL is set via ``dataverse.pid.*.crossref.url``; the current valid production value is "https://doi.crossref.org") and the REST API (``dataverse.pid.*.crossref.rest-api-url``; the current valid production value is "https://api.crossref.org"). ``dataverse.pid.*.crossref.username=crusername`` ``dataverse.pid.*.crossref.password=secret`` ``dataverse.pid.*.crossref.depositor=xyz`` ``dataverse.pid.*.crossref.depositor-email=xyz@example.com`` CrossRef uses `HTTP Basic authentication `_ for its APIs. XML files can be POSTed to CrossRef (see the `Post URL `_ documentation), where they are added to the submission queue to await processing. The REST API (see the `Rest API `_ documentation) allows searching and reuse of CrossRef members' metadata. You need to provide the same credentials (``username``, ``password``) to the Dataverse software so it can mint and manage DOIs for you. As noted above, you should use one of the more secure options for setting the password. Depositor and Depositor Email are used for the generation and distribution of Depositor reports. .. _dataverse.pid.*.ezid: EZId-specific Settings ^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.ezid.api-url ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.ezid.username ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.ezid.password ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Note that use of `EZId `_ is limited primarily to University of California institutions. If you have an EZId account, you will need to configure the ``api-url`` and your account ``username`` and ``password``.
As above, you should use one of the more secure options for setting the password. .. _dataverse.pid.*.permalink: PermaLink-specific Settings ^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.permalink.base-url ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.permalink.separator ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ PermaLinks are a simple PID option intended for intranet and catalog use cases. They can be used without an external service or be configured with the ``base-url`` of a resolution service. PermaLinks also allow a custom ``separator`` to be used. Note: - If you configure ``base-url``, it should include a "/" after the hostname like this: ``https://demo.dataverse.org/``. - When using multiple PermaLink providers, you should avoid ambiguous authority/separator/shoulder combinations that would result in the same overall prefix. - Configuring PermaLink providers differing only by their separator values is not supported. - In general, PermaLink authority/shoulder values should be alphanumeric. For other cases, admins may need to consider the potential impact of special characters in S3 storage identifiers, resolver URLs, exports, etc. .. _dataverse.pid.*.handlenet: Handle-specific Settings ^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.handlenet.index ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.handlenet.independent-service ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.handlenet.auth-handle ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.handlenet.key.path ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dataverse.pid.*.handlenet.key.passphrase ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to `Handle.Net documentation `_. Configure your Handle.Net ``index`` to be used when registering new persistent identifiers. Defaults to ``300``. Indices are used to separate concerns within the Handle system. To add data to an index, authentication is mandatory. See also chapter 1.4 "Authentication" of the `Handle.Net Technical Documentation `__. Handle.Net servers use a public key authentication method where the public key is stored in a handle itself and the matching private key is provided from a key file. Typically, the absolute path ends like ``handle/svr_1/admpriv.bin``. The key file may (and should) be encrypted with a passphrase (used for encryption with AES-128). See also chapter 1.4 "Authentication" of the `Handle.Net Technical Documentation `__. Provide an absolute ``key.path`` to a private key file authenticating requests to your Handle.Net server. Provide a ``key.passphrase`` to decrypt the private key file at ``dataverse.pid.*.handlenet.key.path``. Set ``independent-service`` to true if you want to use a Handle service which is set up to work 'independently' (no communication with the Global Handle Registry). By default this setting is false. Set ``auth-handle`` to the authentication handle (in prefix/suffix form) to be used on a global handle service when the public key is NOT stored in the default handle. This setting is optional. If the public key is, for instance, stored in handle ``21.T12996/USER01``, ``auth-handle`` should be set to this value. .. _pids-doi-configuration: Backward-compatibility for Single PID Provider Installations ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ While using the PID Provider configuration settings described above is recommended, Dataverse installations only using a single PID Provider can use the settings below instead.
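For comparison with those legacy settings, here is a minimal sketch of the recommended per-provider style described above, assuming a hypothetical DataCite test account configured under the provider id ``datacite1`` (the authority ``10.5072`` and shoulder ``FK2`` are placeholders; use the values issued with your account, and supply the password via the ``dataverse.pid.datacite1.datacite.password`` alias shown earlier):

.. code-block:: shell

  # Declare the provider id and make it the default
  ./asadmin create-jvm-options "-Ddataverse.pid.providers=datacite1"
  ./asadmin create-jvm-options "-Ddataverse.pid.default-provider=datacite1"

  # Common per-provider settings (placeholder authority/shoulder values)
  ./asadmin create-jvm-options "-Ddataverse.pid.datacite1.type=datacite"
  ./asadmin create-jvm-options "-Ddataverse.pid.datacite1.label=DataCiteTest"
  ./asadmin create-jvm-options "-Ddataverse.pid.datacite1.authority=10.5072"
  ./asadmin create-jvm-options "-Ddataverse.pid.datacite1.shoulder=FK2"

  # DataCite-specific settings pointing at the test infrastructure
  ./asadmin create-jvm-options "-Ddataverse.pid.datacite1.datacite.mds-api-url=https\://mds.test.datacite.org"
  ./asadmin create-jvm-options "-Ddataverse.pid.datacite1.datacite.rest-api-url=https\://api.test.datacite.org"
  ./asadmin create-jvm-options "-Ddataverse.pid.datacite1.datacite.username=YOUR_DATACITE_USERNAME"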
In general, these legacy settings mirror those above except for not including a PID Provider id. Configuring Your Dataverse Installation for a Single DOI Provider ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Here are the configuration options for DOIs.: **JVM Options for DataCite:** - :ref:`dataverse.pid.datacite.mds-api-url` - :ref:`dataverse.pid.datacite.rest-api-url` - :ref:`dataverse.pid.datacite.username` - :ref:`dataverse.pid.datacite.password` **JVM Options for EZID:** As stated above, with very few exceptions (e.g. University of California), you will not be able to use this provider. - :ref:`dataverse.pid.ezid.api-url` - :ref:`dataverse.pid.ezid.username` - :ref:`dataverse.pid.ezid.password` **Database Settings:** - :ref:`:DoiProvider <:DoiProvider>` - :ref:`:Protocol <:Protocol>` - :ref:`:Authority <:Authority>` - :ref:`:Shoulder <:Shoulder>` - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false) .. _pids-handle-configuration: Configuring Your Dataverse Installation for a Single Handle Provider ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Here are the configuration options for handles. Most notably, you need to change the ``:Protocol`` setting, as it defaults to DOI usage. **JVM Options:** - :ref:`dataverse.pid.handlenet.key.path` - :ref:`dataverse.pid.handlenet.key.passphrase` - :ref:`dataverse.pid.handlenet.index` **Database Settings:** - :ref:`:Protocol <:Protocol>` - :ref:`:Authority <:Authority>` - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) - :ref:`:IndependentHandleService <:IndependentHandleService>` (optional) - :ref:`:HandleAuthHandle <:HandleAuthHandle>` (optional) Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to `Handle.Net documentation `_. .. _permalinks: Configuring Your Dataverse Installation for a Single PermaLink Provider ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Here are the configuration options for PermaLinks: **JVM Options:** - :ref:`dataverse.pid.permalink.base-url` **Database Settings:** - :ref:`:Protocol <:Protocol>` - :ref:`:Authority <:Authority>` - :ref:`:Shoulder <:Shoulder>` - :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional) - :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional) - :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false) You must restart Payara after making changes to these settings. .. _auth-modes: Auth Modes: Local vs. Remote vs. Both ------------------------------------- There are three valid configurations or modes for authenticating users to your Dataverse installation: Local Only Auth +++++++++++++++ Out of the box, your Dataverse installation is configured in "local only" mode. The "dataverseAdmin" superuser account mentioned in the :doc:`/installation/installation-main` section is an example of a local account. Internally, these accounts are called "builtin" because they are built in to the Dataverse Software application itself. Both Local and Remote Auth ++++++++++++++++++++++++++ The ``authenticationproviderrow`` database table controls which "authentication providers" are available within a Dataverse installation. Out of the box, a single row with an id of "builtin" will be present. 
For each user in a Dataverse installation, the ``authenticateduserlookup`` table will have a value under ``authenticationproviderid`` that matches this id. For example, the default "dataverseAdmin" user will have the value "builtin" under ``authenticationproviderid``. Why is this important? Users are tied to a specific authentication provider but conversion mechanisms are available to switch a user from one authentication provider to the other. As explained in the :doc:`/user/account` section of the User Guide, a graphical workflow is provided for end users to convert from the "builtin" authentication provider to a remote provider. Conversion from a remote authentication provider to the builtin provider can be performed by a sysadmin with access to the "admin" API. See the :doc:`/api/native-api` section of the API Guide for how to list users and authentication providers as JSON. Adding and enabling a second authentication provider (:ref:`native-api-add-auth-provider` and :ref:`api-toggle-auth-provider`) will result in the Log In page showing additional providers for your users to choose from. By default, the Log In page will show the "builtin" provider, but you can adjust this via the :ref:`conf-default-auth-provider` configuration option. Further customization can be achieved by setting :ref:`conf-allow-signup` to "false", thus preventing users from creating local accounts via the web interface. Please note that local accounts can also be created through the API by enabling the ``builtin-users`` endpoint (:ref:`:BlockedApiEndpoints`) and setting the ``BuiltinUsers.KEY`` database setting (:ref:`BuiltinUsers.KEY`). To configure Shibboleth see the :doc:`shibboleth` section and to configure OAuth see the :doc:`oauth2` section. Remote Only Auth ++++++++++++++++ As for the "Remote only" authentication mode, it means that: - Shibboleth or OAuth has been enabled. - ``:AllowSignUp`` is set to "false" to prevent users from creating local accounts via the web interface. - ``:DefaultAuthProvider`` has been set to use the desired authentication provider - The "builtin" authentication provider has been disabled (:ref:`api-toggle-auth-provider`). Note that disabling the "builtin" authentication provider means that the API endpoint for converting an account from a remote auth provider will not work. Converting directly from one remote authentication provider to another (i.e. from GitHub to Google) is not supported. Conversion from remote is always to "builtin". Then the user initiates a conversion from "builtin" to remote. Note that longer term, the plan is to permit multiple login options to the same Dataverse installation account per https://github.com/IQSS/dataverse/issues/3487 (so all this talk of conversion will be moot) but for now users can only use a single login option, as explained in the :doc:`/user/account` section of the User Guide. In short, "remote only" might work for you if you only plan to use a single remote authentication provider such that no conversion between remote authentication providers will be necessary. .. _bearer-token-auth: Bearer Token Authentication --------------------------- Bearer tokens are defined in `RFC 6750`_ and can be used as an alternative to API tokens. This is an experimental feature hidden behind a feature flag. .. _RFC 6750: https://tools.ietf.org/html/rfc6750 To enable bearer tokens, you must install and configure Keycloak (for now, see :ref:`oidc-dev` in the Developer Guide) and enable ``api-bearer-auth`` under :ref:`feature-flags`. 
You can test that bearer tokens are working by following the example under :ref:`bearer-tokens` in the API Guide. .. _smtp-config: SMTP/Email Configuration ------------------------ The installer prompts you for some basic options to configure Dataverse to send email using your SMTP server, but in many cases, extra configuration may be necessary. Make sure the :ref:`dataverse.mail.system-email` has been set. Email will not be sent without it. A hint will be logged about this fact. If you want to separate system email from your support team's email, take a look at :ref:`dataverse.mail.support-email`. Then check the list of commonly used settings at the top of :ref:`dataverse.mail.mta`. If you have trouble, consider turning on debugging with :ref:`dataverse.mail.debug`. .. _database-persistence: Database Persistence -------------------- The Dataverse software uses a PostgreSQL database to store objects users create. You can configure basic and advanced settings for the PostgreSQL database connection with the help of MicroProfile Config API. Basic Database Settings +++++++++++++++++++++++ 1. Any of these settings can be set via system properties (see :ref:`jvm-options` starting at :ref:`dataverse.db.name`), environment variables or other MicroProfile Config mechanisms supported by the app server. `See Payara docs for supported sources `_. 2. Remember to protect your secrets. See :ref:`secure-password-storage` for more information. 3. Environment variables follow the key, replacing any dot, colon, dash, etc. into an underscore "_" and all uppercase letters. Example: ``dataverse.db.host`` -> ``DATAVERSE_DB_HOST`` .. list-table:: :widths: 15 60 25 :header-rows: 1 :align: left * - MPCONFIG Key - Description - Default * - dataverse.db.host - The PostgreSQL server to connect to. - ``localhost`` * - dataverse.db.port - The PostgreSQL server port to connect to. - ``5432`` * - dataverse.db.user - The PostgreSQL user name to connect with. - | ``dataverse`` | (installer sets to ``dvnapp``) * - dataverse.db.password - The PostgreSQL users password to connect with. **Please note the safety advisory above.** - *No default* * - dataverse.db.name - The PostgreSQL database name to use for the Dataverse installation. - | ``dataverse`` | (installer sets to ``dvndb``) * - dataverse.db.parameters - Connection parameters, such as ``sslmode=require``. See `Postgres JDBC docs `_ Note: you don't need to provide the initial "?". - *Empty string* Advanced Database Settings ++++++++++++++++++++++++++ The following options are useful in many scenarios. You might be interested in debug output during development or monitoring performance in production. You can find more details within the Payara docs: - `User Guide: Connection Pool Configuration `_ - `Tech Doc: Advanced Connection Pool Configuration `_. Connection Validation ^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :widths: 15 60 25 :header-rows: 1 :align: left * - MPCONFIG Key - Description - Default * - dataverse.db.is-connection-validation-required - ``true``: Validate connections, allow server to reconnect in case of failure. - false * - dataverse.db.connection-validation-method - | The method of connection validation: | ``table|autocommit|meta-data|custom-validation``. - *Empty string* * - dataverse.db.validation-table-name - The name of the table used for validation if the validation method is set to ``table``. 
- *Empty string* * - dataverse.db.validation-classname - The name of the custom class used for validation if the ``validation-method`` is set to ``custom-validation``. - *Empty string* * - dataverse.db.validate-atmost-once-period-in-seconds - Specifies the time interval in seconds between successive requests to validate a connection at most once. - ``0`` (disabled) Connection & Statement Leaks ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :widths: 15 60 25 :header-rows: 1 :align: left * - MPCONFIG Key - Description - Default * - dataverse.db.connection-leak-timeout-in-seconds - Specify timeout when connections count as "leaked". - ``0`` (disabled) * - dataverse.db.connection-leak-reclaim - If enabled, leaked connection will be reclaimed by the pool after connection leak timeout occurs. - ``false`` * - dataverse.db.statement-leak-timeout-in-seconds - Specifiy timeout when statements should be considered to be "leaked". - ``0`` (disabled) * - dataverse.db.statement-leak-reclaim - If enabled, leaked statement will be reclaimed by the pool after statement leak timeout occurs. - ``false`` Logging & Slow Performance ^^^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :widths: 15 60 25 :header-rows: 1 :align: left * - MPCONFIG Key - Description - Default * - dataverse.db.statement-timeout-in-seconds - Timeout property of a connection to enable termination of abnormally long running queries. - ``-1`` (disabled) * - dataverse.db.slow-query-threshold-in-seconds - SQL queries that exceed this time in seconds will be logged. - ``-1`` (disabled) * - dataverse.db.log-jdbc-calls - When set to true, all JDBC calls will be logged allowing tracing of all JDBC interactions including SQL. - ``false`` .. _file-storage: File Storage ------------ By default, a Dataverse installation stores all data files (files uploaded by end users) on the filesystem at ``/usr/local/payara6/glassfish/domains/domain1/files``. This path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or afterward by reconfiguring the ``dataverse.files.\.directory`` JVM option described below. A Dataverse installation can alternately store files in a Swift or S3-compatible object store, or on a Globus endpoint, and can now be configured to support multiple stores at once. With a multi-store configuration, the location for new files can be controlled on a per-Dataverse collection basis. A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web or Globus accessible trusted remote store. A Dataverse installation can be configured to allow out of band upload by setting the ``dataverse.files.\.upload-out-of-band`` JVM option to ``true``. By default, Dataverse supports uploading files via the :ref:`add-file-api`. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server). With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the :ref:`Adding the Uploaded file to the Dataset ` API call (described in the :doc:`/developers/s3-direct-upload-api` page) used to add metadata and inform Dataverse that a new file has been added to the relevant store. The following sections describe how to set up various types of stores and how to configure for multiple stores. 
Multi-store Basics ++++++++++++++++++ To support multiple stores, a Dataverse installation now requires an id, type, and label for each store (even for a single store configuration). These are configured by defining two required JVM options: .. code-block:: none ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.type=<type>" ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.label=<label>"