Advanced installations are not officially supported, but here we are at least documenting some tips and tricks that you might find helpful. You can find a diagram of an advanced installation in the Preparation section.
You should be conscious of the following when running multiple app servers.
- By default, logos uploaded by users are written to /usr/local/payara5/glassfish/domains/domain1/docroot/logos on whichever app server handled the upload.
- By default, the sitemap is written to /usr/local/payara5/glassfish/domains/domain1/docroot/sitemap on the app server that generated it.
- Password aliases (dataverse.db.password, etc.) are stored per app server.
If you have successfully installed multiple app servers behind a load balancer, you might like to know which server a user has landed on. A straightforward solution is to place a file called host.txt in a directory that is served up by Apache, such as /var/www/html, and then configure Apache not to proxy requests for /host.txt to the app server. Here are some example commands on RHEL/CentOS 7 that accomplish this:
[root@server1 ~]# vim /etc/httpd/conf.d/ssl.conf
[root@server1 ~]# grep host.txt /etc/httpd/conf.d/ssl.conf
ProxyPassMatch ^/host.txt !
[root@server1 ~]# systemctl restart httpd.service
[root@server1 ~]# echo $HOSTNAME > /var/www/html/host.txt
[root@server1 ~]# curl https://dataverse.example.edu/host.txt
server1.example.edu
You would repeat the steps above for all of your app servers. If users seem to be having a problem with a particular server, you can ask them to visit https://dataverse.example.edu/host.txt and let you know what they see there (e.g. “server1.example.edu”) to help you know which server to troubleshoot.
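Once host.txt is in place on every app server, you could also fetch it repeatedly through the load balancer and count the distinct answers, which gives a quick picture of how requests are being spread. The sketch below is a hypothetical helper, not part of Dataverse; it assumes curl is installed and that the load balancer does not pin these cookie-less requests to a single server. The function names and the default of 20 requests are illustrative.

```shell
# Hypothetical helper: read one hostname per line on stdin and print
# "count hostname" pairs, most frequent first.
tally_hosts() {
    sort | uniq -c | sort -rn
}

# Hypothetical helper: request host.txt N times (default 20) through the
# load balancer at the given base URL and tally which app server answered.
probe_hosts() {
    base_url="$1"
    n="${2:-20}"
    i=0
    while [ "$i" -lt "$n" ]; do
        # host.txt ends with a newline (written by echo), so each answer
        # lands on its own line.
        curl -s "$base_url/host.txt"
        i=$((i + 1))
    done | tally_hosts
}
```

For example, probe_hosts https://dataverse.example.edu 50 should list each of your app servers; if only one name ever appears, the load balancer may be using sticky sessions or may not be routing to the other servers.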
Please note that the Network Ports section under Configuration has more information on fronting your app server with Apache. The Shibboleth section talks about the use of ProxyPassMatch.
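Apache evaluates ProxyPass and ProxyPassMatch rules in the order they appear in the configuration, and the first matching rule wins. An exclusion such as ProxyPassMatch ^/host.txt ! therefore only takes effect if it comes before the catch-all ProxyPass / rule that forwards traffic to the app server. The following is a hypothetical helper (not shipped with Dataverse or Apache) that checks this ordering in a single config file; it assumes each directive sits on its own line and does not follow Include directives.

```shell
# Hypothetical check: confirm that "ProxyPassMatch <regex> !" appears before the
# catch-all "ProxyPass / ..." line in one Apache config file.
# Prints OK, MISSING (no such exclusion) or WRONG_ORDER (exclusion comes too late).
# Usage: check_exclusion_order /etc/httpd/conf.d/ssl.conf '^/host.txt'
check_exclusion_order() {
    conf="$1"
    # Pass the regex through the environment so awk does not reinterpret backslashes.
    WANT="$2" awk '
        # Line number of the first exclusion for the requested regex.
        $1 == "ProxyPassMatch" && $2 == ENVIRON["WANT"] && $3 == "!" && !excl { excl = NR }
        # Line number of the first catch-all ProxyPass for "/".
        $1 == "ProxyPass" && $2 == "/" && !catchall { catchall = NR }
        END {
            if (!excl) { print "MISSING"; exit 1 }
            if (catchall && catchall < excl) { print "WRONG_ORDER"; exit 1 }
            print "OK"
        }
    ' "$conf"
}
```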
As of Dataverse v5.0 we offer an experimental optimization for the multi-file, download-as-zip functionality. If this option (:CustomZipDownloadServiceUrl) is enabled, instead of enforcing the size limit on multi-file zipped downloads (as normally specified by the option :ZipDownloadLimit), we attempt to serve all the files that the user requested (that they are authorized to download), but the request is redirected to a standalone zipper service running as a cgi-bin executable under Apache. This moves these potentially long-running jobs completely outside the application server (Payara) and prevents worker threads from becoming locked serving them. Since zipping is also a CPU-intensive task, it is possible to run this service on a different host system, freeing cycles on the main application server. (The system running the service needs access to the database as well as to the storage filesystem and/or S3 bucket.)
Please consult the scripts/zipdownload/README.md in the Dataverse 5 source tree for more information.
To install: follow the instructions in the file above to build ZipDownloadService-v1.0.0.jar. It will also be available pre-built as part of the Dataverse release on GitHub. Copy it, together with the shell script scripts/zipdownload/cgi-bin/zipdownload, to the cgi-bin directory of the chosen Apache server (/var/www/cgi-bin is the standard location).
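The copy step, together with making the wrapper script executable, can be sketched as a small function. This is a hypothetical helper, not part of the Dataverse distribution; it assumes you pass the path to the built or downloaded jar and the top of the Dataverse source tree, and it defaults the target directory to /var/www/cgi-bin. Run it as a user allowed to write there (typically root).

```shell
# Hypothetical install helper: copy the zipper jar and its CGI wrapper script
# into Apache's cgi-bin directory and make the wrapper executable.
# Arguments: path to ZipDownloadService-v1.0.0.jar, Dataverse source tree,
#            optional cgi-bin directory (assumed default: /var/www/cgi-bin).
install_zipper() {
    jar="$1"
    src_dir="$2"
    cgi_dir="${3:-/var/www/cgi-bin}"
    cp "$jar" "$cgi_dir/" || return 1
    cp "$src_dir/scripts/zipdownload/cgi-bin/zipdownload" "$cgi_dir/" || return 1
    # Apache can only run the wrapper as a CGI program if it is executable.
    chmod 755 "$cgi_dir/zipdownload"
}
```

You still need to edit the copied zipdownload script afterwards to set the database credentials.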
Make sure the shell script (zipdownload) is executable, and edit it to configure the database access credentials. Note that the executable does not need access to the entire Dataverse database. A security-conscious admin can create a dedicated database user with access to just one table: CUSTOMZIPSERVICEREQUEST.
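A dedicated PostgreSQL role could be created along these lines. This is a sketch under assumptions: the role name is hypothetical, and the privileges (SELECT and DELETE on the table) are a guess at what the zipper needs, so confirm them against scripts/zipdownload/README.md before relying on this. The helper only prints SQL, so you can review it before piping it into psql.

```shell
# Hypothetical helper: print SQL creating a restricted role for the zipper
# service, limited to the customzipservicerequest table.
# ASSUMPTION: SELECT and DELETE are the privileges the service needs.
# Avoid single quotes in the password; it is inserted into the SQL verbatim.
zipper_role_sql() {
    role="$1"
    password="$2"
    cat <<EOF
CREATE ROLE $role LOGIN PASSWORD '$password';
GRANT SELECT, DELETE ON customzipservicerequest TO $role;
EOF
}
```

For example, zipper_role_sql zipper 'change-me' | sudo -u postgres psql -d dvndb, assuming your Dataverse database is named dvndb. The same role name and password then go into the zipdownload script.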
You may need to make extra Apache configuration changes to make sure /cgi-bin/zipdownload is accessible from the outside. For example, if this is the same Apache that’s in front of your Dataverse Payara instance, you will need to add another pass-through statement to your configuration:
ProxyPassMatch ^/cgi-bin/zipdownload !
Test this by accessing it directly at <SERVER URL>/cgi-bin/zipdownload. You should get a “404 No such download job!” response. If instead you are getting an “internal server error”, this may be an SELinux issue; try setenforce Permissive. If you are getting a generic Dataverse “not found” page, review the ProxyPassMatch rule you have added.
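The troubleshooting hints above can be turned into a small check. This is a hypothetical helper that assumes curl is installed and that the wrapper is reachable as /cgi-bin/zipdownload; it looks for the “No such download job!” text in the body rather than relying only on the status code, and maps the other outcomes described above to short labels.

```shell
# Hypothetical helper: interpret a response from /cgi-bin/zipdownload.
# Prints OK, CHECK_SELINUX, CHECK_PROXY or UNEXPECTED.
classify_zip_response() {
    status="$1"
    body="$2"
    case "$body" in
        # The zipper answered: no job key was given, which is expected here.
        *"No such download job!"*) echo "OK"; return 0 ;;
    esac
    case "$status" in
        500) echo "CHECK_SELINUX" ;;   # internal server error: try setenforce Permissive
        404) echo "CHECK_PROXY" ;;     # likely the Dataverse "not found" page
        *) echo "UNEXPECTED" ;;        # includes 000 when curl could not connect
    esac
}

# Fetch the endpoint (e.g. check_zipper https://dataverse.example.edu) and classify it.
check_zipper() {
    server_url="$1"
    tmp=$(mktemp) || return 1
    # -w prints the HTTP status code; -o writes the response body to $tmp.
    status=$(curl -s -o "$tmp" -w '%{http_code}' "$server_url/cgi-bin/zipdownload")
    classify_zip_response "$status" "$(cat "$tmp")"
    rm -f "$tmp"
}
```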
To activate in Dataverse:
curl -X PUT -d '/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl
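To confirm the setting took effect, or to switch the optimization off again, the same admin settings endpoint also accepts GET and DELETE for a single setting. The wrapper functions below are a sketch; their names are hypothetical, and they assume the admin API is reachable at http://localhost:8080 as in the command above.

```shell
# Base URL of the Dataverse admin settings API (assumed; override via API=...).
API=${API:-http://localhost:8080/api/admin/settings}

# Show the current value of :CustomZipDownloadServiceUrl.
show_zip_setting() {
    curl -s "$API/:CustomZipDownloadServiceUrl"
}

# Delete the setting so multi-file zip downloads are served by the application
# server again, subject to :ZipDownloadLimit.
disable_zip_service() {
    curl -s -X DELETE "$API/:CustomZipDownloadServiceUrl"
}
```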