Application Base Image

Contents:

Supported Image Tags
Image Contents
Build Instructions
- Automated Builds & Publishing
- Processor Architecture and Multiarch
Tunables
Locations
Exposed Ports
Entry & Extension Points
Other Hints

A “base image” offers you a pre-installed and pre-tuned application server to deploy Dataverse software to. Basic functionality like executing scripts at container boot, monitoring, memory tweaks, etc. is all added at this layer, so the application image can focus on the app itself.

NOTE: The base image does not contain the Dataverse application itself.

Within the main repository, you may find the base image’s files at <git root>/modules/container-base. This Maven module uses the Maven Docker Plugin to build and ship the image. You may use, extend, or alter this image to your liking and/or host it in a different registry if you want to.

NOTE: This image is created, maintained and supported by the Dataverse community on a best-effort basis. IQSS does not offer support on how to deploy or run it; please reach out to the community for help on using it. You might be interested in taking a look at Docker, Kubernetes, and Containers, which links to some (community-based) efforts.

Supported Image Tags

This image is sourced from the main upstream code repository of the Dataverse software. Development and maintenance of the image’s code happens there (again, by the community). Community-supported image tags are based on the two most important upstream branches:

The unstable tag corresponds to the develop branch, where pull requests are merged. (Dockerfile)
The stable tag corresponds to the master branch, where releases are cut from. (Dockerfile)

Image Contents

The base image provides:

Eclipse Temurin JRE using Java 11
Payara Community Application Server
CLI tools necessary to run Dataverse (e.g. curl or jq; see also Prerequisites in the Installation Guide)
Linux tools for analysis, monitoring and so on
Jattach (attach to running JVM)
wait-for (tool to “wait for” a service to be available)
dumb-init (see below for details)

This image is created as a “multi-arch image” (see below).

It inherits (is built on) an Ubuntu environment from the upstream base image of Eclipse Temurin. You are free to change the JRE/JDK image to your liking (see below).

Build Instructions

From here on, we assume you have Docker, Docker Desktop, Moby or a remote Docker host configured, up and running.

Simply build the Maven module with the “container” profile (ct) activated, either from the project’s Git root:

mvn -Pct -f modules/container-base install

Or change into the module directory and execute:

cd modules/container-base && mvn -Pct install

You can change the build using these Maven parameters, to use …:

… a different tag only: add -Dbase.image.tag=tag.
    Note: default is develop
… a different image name and tag: add -Dbase.image=name:tag.
    Note: default is gdcc/base:${base.image.tag}
… a different image registry than Docker Hub: add -Ddocker.registry=registry.example.org (see also DMP docs on registries)
… a different Payara version: add -Dpayara.version=V.YYYY.R.
… a different Temurin JRE version A: add -Dtarget.java.version=A (e.g. 11, 17, …).
    Note: must resolve to an available image tag A-jre of Eclipse Temurin! (See also Docker Hub search example)
… a different Java distribution: add -Djava.image="name:tag" with a precise reference to an image available locally or remotely.
… a different UID/GID for the payara user/group: add -Dbase.image.uid=1234 (or -Dbase.image.gid=1234)
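The parameters above can be combined in a single invocation. The following is a minimal sketch, assuming you run it from the Git root; the image name, the UID/GID values and the variable names (BASE_IMAGE, TARGET_JAVA_VERSION, PAYARA_UID, PAYARA_GID, DRY_RUN) are hypothetical. By default it only prints the Maven command, so you can review it before running it with DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: build the base image with a custom name, Java version and UID/GID.
# All values below are hypothetical examples; run this from the Git root.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
set -euo pipefail

BASE_IMAGE="${BASE_IMAGE:-myorg/dataverse-base:java17}"  # hypothetical name:tag
TARGET_JAVA_VERSION="${TARGET_JAVA_VERSION:-17}"         # "17-jre" must exist for Temurin
PAYARA_UID="${PAYARA_UID:-1234}"
PAYARA_GID="${PAYARA_GID:-1234}"

cmd=(
  mvn -Pct -f modules/container-base install
  "-Dbase.image=${BASE_IMAGE}"
  "-Dtarget.java.version=${TARGET_JAVA_VERSION}"
  "-Dbase.image.uid=${PAYARA_UID}"
  "-Dbase.image.gid=${PAYARA_GID}"
)

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Print the command with shell quoting so it can be copied as-is.
  printf '%q ' "${cmd[@]}"
  echo
else
  "${cmd[@]}"
fi
```

The sketch deliberately avoids a plain JAVA_VERSION variable, since Java base images such as Eclipse Temurin may already define an environment variable of that name.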

Automated Builds & Publishing

To make reuse as simple as possible, the image is built with a GitHub Action within the IQSS repository and then pushed to the gdcc/base repository on Docker Hub. It is built and pushed on every change to its sources, plus uncached, scheduled nightly builds to make sure security updates find their way in.

Note: For the GitHub Action to be able to push to Docker Hub, two repository secrets (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN) have been added by IQSS admins to their repository.

Processor Architecture and Multiarch

This image is created as a “multi-arch image”, supporting the most common architectures Dataverse usually runs on: AMD64 (Windows/Linux/…) and ARM64 (Apple M1/M2), by using the Maven Docker Plugin’s BuildX mode.

Building the image via mvn -Pct package or mvn -Pct install as above will only build for the architecture of the Docker machine’s CPU.

Only mvn -Pct deploy will trigger building for all enabled architectures. To build non-native code on your build machine, however, you will need to set up a cross-platform builder.

On Linux, you should install qemu-user-static (preferably via your package management) on the host and run docker run --rm --privileged multiarch/qemu-user-static --reset -p yes to enable that builder. The Docker plugin will set up everything else for you.
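To check which single architecture a plain mvn -Pct package or mvn -Pct install run will produce, you can ask the Docker daemon (which may be a remote host) for its CPU architecture. This is a small sketch; the docker_platform helper is hypothetical and only maps the two architectures this image targets to Docker's platform notation.

```shell
#!/usr/bin/env bash
# Sketch: report which platform "mvn -Pct install" will build for, i.e. the
# CPU architecture of the Docker daemon rather than of your workstation.

# docker_platform ARCH: translate a kernel architecture name into the
# platform notation used by Docker BuildX. Only AMD64 and ARM64 are mapped.
docker_platform() {
  case "$1" in
    x86_64 | amd64) echo "linux/amd64" ;;
    aarch64 | arm64) echo "linux/arm64" ;;
    *)
      echo "architecture not covered by this sketch: ${1:-unknown}" >&2
      return 1
      ;;
  esac
}

# Query the daemon only when the Docker CLI exists and the daemon answers.
if arch="$(docker info --format '{{.Architecture}}' 2>/dev/null)"; then
  echo "Native build platform: $(docker_platform "$arch")"
fi
```

Any platform not reported here needs the QEMU-based cross-platform builder described above before mvn -Pct deploy can build it.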

Tunables

The base image provides a Payara domain suited for production use, but can also be used during development. Many settings have been carefully selected for best performance and stability of the Dataverse application.

As with any service, you should always monitor its metrics and make use of the tuning capabilities the base image provides. These are mostly based on environment variables (very common with containers) and come with sane defaults.

Env. variable	Default	Type	Description
`DEPLOY_PROPS`	(empty)	String	Set to add arguments to generated asadmin deploy commands.
`PREBOOT_COMMANDS`	[preboot]	Abs. path	Provide path to file with `asadmin` commands to run before boot of application server. See also Pre/postboot script docs.
`POSTBOOT_COMMANDS`	[postboot]	Abs. path	Provide path to file with `asadmin` commands to run after boot of application server. See also Pre/postboot script docs.
`JVM_ARGS`	(empty)	String	Additional arguments to pass to application server’s JVM on start.
`MEM_MAX_RAM_PERCENTAGE`	`70.0`	Percentage	Maximum amount of container’s allocated RAM to be used as heap space. Make sure to leave some room for native memory, OS overhead etc!
`MEM_XSS`	`512k`	Size	Tune the maximum JVM stack size.
`MEM_MIN_HEAP_FREE_RATIO`	`20`	Integer	Make the heap shrink aggressively and grow conservatively. See also run-java-sh recommendations.
`MEM_MAX_HEAP_FREE_RATIO`	`40`	Integer	Make the heap shrink aggressively and grow conservatively. See also run-java-sh recommendations.
`MEM_MAX_GC_PAUSE_MILLIS`	`500`	Milliseconds	Shorter pause times might result in lots of collections causing overhead without much gain. This needs monitoring and tuning. It’s a complex matter.
`MEM_METASPACE_SIZE`	`256m`	Size	Initial size of memory reserved for class metadata, also used as the threshold that triggers a garbage collection once it is exceeded.
`MEM_MAX_METASPACE_SIZE`	`2g`	Size	The metaspace’s size will not outgrow this limit.
`ENABLE_DUMPS`	`0`	Bool, `0|1`	If enabled, the argument(s) given in `JVM_DUMPS_ARG` will be added to the JVM starting up. This enables dumping the heap to `${DUMPS_DIR}` (see below) in “out of memory” cases. (You should back this location with disk space / ramdisk, so it does not write into an overlay filesystem!)
`JVM_DUMPS_ARG`	[dump-option]	String	Can be fine-tuned for more fine-grained control of dumping behaviour.
`ENABLE_JMX`	`0`	Bool, `0|1`	Allow insecure JMX connections, enable AMX and tune all JMX monitoring levels to `HIGH`. See also Payara Docs - Basic Monitoring. A basic JMX service is enabled by default in Payara, exposing basic JVM MBeans, but notably no Payara MBeans.
`ENABLE_JDWP`	`0`	Bool, `0|1`	Enable the “Java Debug Wire Protocol” to attach a remote debugger to the JVM in this container. Listens on port 9009 when enabled. Numerous tutorials on using it are available online.
`ENABLE_RELOAD`	`0`	Bool, `0|1`	Enable the dynamic “hot” reloads of files when changed in a deployment. Useful for development, when new artifacts are copied into the running domain.
`DATAVERSE_HTTP_TIMEOUT`	`900`	Seconds	See Application Server Settings `http.request-timeout-seconds`. Note: can also be set using any other MicroProfile Config Sources available via `dataverse.http.timeout`.

preboot: ${CONFIG_DIR}/pre-boot-commands.asadmin
postboot: ${CONFIG_DIR}/post-boot-commands.asadmin
dump-option: -XX:+HeapDumpOnOutOfMemoryError
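To get a feeling for what MEM_MAX_RAM_PERCENTAGE means in absolute numbers, the arithmetic can be sketched as below. This is an approximation, assuming the container has a memory limit (e.g. docker run --memory 4g); the JVM rounds and aligns the real heap size, and max_heap_mib is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: estimate the maximum heap the JVM derives from MEM_MAX_RAM_PERCENTAGE
# for a given container memory limit. The JVM aligns the real value, so
# treat the result as an approximation.

# max_heap_mib LIMIT_MIB PERCENTAGE: print the approximate heap size in MiB
# (fractions are truncated).
max_heap_mib() {
  awk -v limit="$1" -v pct="$2" 'BEGIN { printf "%d\n", limit * pct / 100 }'
}

# Default of 70.0 on a container limited to 4 GiB: prints 2867.
max_heap_mib 4096 70.0
```

With the default of 70.0 % of a 4 GiB limit, roughly 1.2 GiB remain for metaspace, thread stacks (see MEM_XSS), other native memory and OS overhead.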

Locations

These environment variables represent certain locations and might be reused in your scripts etc. None of these variables are meant to be reconfigurable; they reflect the state of the filesystem layout!

Writeable at build time:

The overlay filesystem of Docker and other container technologies is not meant for performance-critical I/O. You should avoid writing data anywhere in the file tree at runtime, except for well-known locations with mounted volumes backing them (see below).

The locations below are meant to be written to when you build a container image, either this base or anything building upon it. You can also use these for references in scripts, etc.

Env. variable	Value	Description
`HOME_DIR`	`/opt/payara`	Home base to Payara and the application
`PAYARA_DIR`	`${HOME_DIR}/appserver`	Installation directory of Payara server
`SCRIPT_DIR`	`${HOME_DIR}/scripts`	Any scripts like the container entrypoint, init scripts, etc
`CONFIG_DIR`	`${HOME_DIR}/config`	Payara Server configurations like pre/postboot command files go here (Might be reused for Dataverse one day)
`DEPLOY_DIR`	`${HOME_DIR}/deployments`	Any EAR or WAR file, exploded WAR directory, etc. placed here is autodeployed on start
`DOMAIN_DIR`	`${PAYARA_DIR}/glassfish/domains/${DOMAIN_NAME}`	Path to the root of the Payara domain that applications will be deployed into. Usually `${DOMAIN_NAME}` will be `domain1`.
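Images building upon this base can add their own asadmin commands at build time, for example to the post-boot command file in CONFIG_DIR. The sketch below assumes the Payara command file format of one asadmin command per line (without the asadmin prefix) and appends rather than overwrites, so existing commands stay in place. The helper name and the system property are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a build-time step (e.g. invoked from a RUN instruction in an image
# building upon this base) that appends asadmin commands to the post-boot
# command file. The system property below is a hypothetical example.

# add_postboot_commands FILE: append asadmin commands (one per line, without
# the "asadmin" prefix) to FILE, creating its directory if needed.
add_postboot_commands() {
  local file="$1"
  mkdir -p "$(dirname "$file")" || return 1
  cat >> "$file" <<'EOF'
create-system-properties my.example.property=42
EOF
}

# CONFIG_DIR is defined by the base image; skip when run elsewhere.
if [[ -n "${CONFIG_DIR:-}" ]]; then
  add_postboot_commands "${POSTBOOT_COMMANDS:-${CONFIG_DIR}/post-boot-commands.asadmin}"
fi
```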

Writeable at runtime:

The locations below are defined as Docker volumes by the base image. They will by default get backed by an “anonymous volume”, but you can (and should) bind-mount a host directory or named Docker volume in these places to avoid data loss, gain performance and/or use a network file system.

Notes:
1. On Kubernetes you still need to provide volume definitions for these places in your deployment objects!
2. You should not write data into these locations at build time, as it will be shadowed by the mounted volumes!

Env. variable	Value	Description
`STORAGE_DIR`	`/dv`	This place is writeable by the Payara user, making it usable as a place to store research data, customizations or other data. Images inheriting the base image should create distinct folders here, backed by different mounted volumes.
`SECRETS_DIR`	`/secrets`	Mount secrets or other configuration here; they are picked up automatically by the Directory Config Source. See also various Configuration options involving secrets.
`DUMPS_DIR`	`/dumps`	Default location where heap dumps will be stored (see above). You should mount some storage here (disk or ephemeral).
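When using plain Docker, backing these locations with explicit storage instead of anonymous volumes might look like the sketch below. The volume names dv-data and dv-dumps, the host secrets directory and the variable names are hypothetical; by default the script only prints the docker run command.

```shell
#!/usr/bin/env bash
# Sketch: run a container with named volumes and a bind mount behind the
# runtime locations. Names and paths are hypothetical placeholders.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
set -euo pipefail

APP_IMAGE="${APP_IMAGE:-gdcc/base:unstable}"   # or your application image
SECRETS_HOST_DIR="${SECRETS_HOST_DIR:-$PWD/secrets}"

cmd=(docker run --detach)
# Named volume for persistent data below STORAGE_DIR.
cmd+=(--volume dv-data:/dv)
# Secrets for the Directory Config Source, mounted read-only at SECRETS_DIR.
cmd+=(--volume "${SECRETS_HOST_DIR}:/secrets:ro")
# Heap dumps land in DUMPS_DIR, but only when ENABLE_DUMPS=1 is set.
cmd+=(--volume dv-dumps:/dumps --env ENABLE_DUMPS=1)
cmd+=("$APP_IMAGE")

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  printf '%q ' "${cmd[@]}"
  echo
else
  "${cmd[@]}"
fi
```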

Exposed Ports

The default ports that are exposed by this image are:

8080 - HTTP listener
4848 - Admin Service HTTPS listener
8686 - JMX listener
9009 - “Java Debug Wire Protocol” port (when ENABLE_JDWP=1)

The HTTPS listener (on port 8181) is deactivated during the build, as the application server always needs to sit behind a reverse proxy that handles SSL/TLS termination. This saves memory and some CPU cycles!
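For local development, the ports above might be published as sketched below, with remote debugging switched on via ENABLE_JDWP=1. Binding the admin and debug ports to 127.0.0.1 on the host is a precautionary choice of this sketch, not a requirement of the image; the image reference and variable names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: start a local development container with remote debugging.
# HTTP (8080) is published normally; admin (4848) and JDWP (9009) are only
# reachable from the host itself. DRY_RUN=1 (default) only prints the command.
set -euo pipefail

APP_IMAGE="${APP_IMAGE:-gdcc/base:unstable}"   # or your application image

cmd=(docker run --rm)
cmd+=(--publish 8080:8080)
cmd+=(--publish 127.0.0.1:4848:4848)
# The debug port is only opened when ENABLE_JDWP=1 is set.
cmd+=(--env ENABLE_JDWP=1 --publish 127.0.0.1:9009:9009)
cmd+=("$APP_IMAGE")

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  printf '%q ' "${cmd[@]}"
  echo
else
  "${cmd[@]}"
fi
```

Once the server is up, point your IDE's remote debugger at localhost:9009.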

Entry & Extension Points

By default, the entrypoint shell script provided by this base image will:

Run any scripts named ${SCRIPT_DIR}/init_* or located in the ${SCRIPT_DIR}/init.d/ directory for initialization before the application server starts.
Run an executable script ${SCRIPT_DIR}/startInBackground.sh in the background - if present.
Run the application server startup scripting in foreground (${SCRIPT_DIR}/startInForeground.sh).
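As an illustration of the first extension point, a hypothetical init script that prepares distinct folders below STORAGE_DIR at container start might look like this. The file name and folder names are assumptions; adapt them to your application image. The sketch avoids exit and set -e, so it behaves the same whether the entrypoint executes or sources it.

```shell
#!/usr/bin/env bash
# Sketch of a hypothetical ${SCRIPT_DIR}/init.d/01-storage-dirs.sh that
# creates distinct folders below STORAGE_DIR at container start.
# The folder names used below are examples only.

# create_storage_dirs BASE NAME...: create each NAME (and parents) below BASE.
create_storage_dirs() {
  local base="$1"
  shift
  local name
  for name in "$@"; do
    mkdir -p "${base}/${name}" || return 1
  done
}

# STORAGE_DIR is defined by the base image; skip when run elsewhere.
if [[ -n "${STORAGE_DIR:-}" ]]; then
  create_storage_dirs "$STORAGE_DIR" uploads docroot temp
fi
```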

If you need a script that runs in parallel under the supervision of dumb-init, e.g. to wait for the application to deploy before executing something, this is your point of extension: simply provide the ${SCRIPT_DIR}/startInBackground.sh executable script with your application image.
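A minimal sketch of such a startInBackground.sh follows. It assumes your application image runs Dataverse, so it polls the /api/info/version API endpoint using curl (shipped with the base image); the timeout values and the follow-up task are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch of ${SCRIPT_DIR}/startInBackground.sh: wait until the application
# answers over HTTP, then run follow-up work. URL, timeouts and the
# follow-up step are assumptions to adapt to your application image.

# wait_for_http URL TIMEOUT_SECONDS INTERVAL_SECONDS
# Returns 0 once URL responds without an HTTP error, or 1 after roughly
# TIMEOUT_SECONDS of unsuccessful attempts.
wait_for_http() {
  local url="$1" timeout="$2" interval="$3" waited=0
  until curl --silent --fail --max-time 5 --output /dev/null "$url"; do
    if (( waited >= timeout )); then
      echo "Gave up waiting for ${url} after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# SCRIPT_DIR is defined by the base image; skip when run elsewhere.
if [[ -n "${SCRIPT_DIR:-}" ]]; then
  if wait_for_http "http://localhost:8080/api/info/version" 600 5; then
    echo "Application is up, running post-deployment tasks"
    # run-your-post-deployment-task   (hypothetical placeholder)
  fi
fi
```

Remember to mark the script as executable when adding it to your image (e.g. chmod +x).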

Other Hints

By default, domain1 is configured to use the G1 garbage collector (G1GC).

For running a Java application within a Linux-based container, support for cgroups is essential. It has been included and activated by default since Java 8u192, and in Java 11 LTS and later. If you are interested in more details, you can read about them in a few places like https://developers.redhat.com/articles/2022/04/19/java-17-whats-new-openjdks-container-awareness, https://www.eclipse.org/openj9/docs/xxusecontainersupport, etc. The other memory defaults are inspired by run-java-sh recommendations.
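To verify that container support is active in a given JVM, you can inspect its final flag values. The jvm_flag helper below is a hypothetical sketch that parses the output of java -XX:+PrintFlagsFinal -version, where each flag line lists type, name, "=" and value.

```shell
#!/usr/bin/env bash
# Sketch: check that container support is enabled in the JVM.

# jvm_flag NAME: read "java -XX:+PrintFlagsFinal -version" output on stdin
# and print the final value of the named flag (empty if not present).
jvm_flag() {
  awk -v name="$1" '$2 == name { print $4 }'
}

# Run the check only where a java binary is available (e.g. in the container).
if command -v java >/dev/null 2>&1; then
  java -XX:+PrintFlagsFinal -version 2>/dev/null | jvm_flag UseContainerSupport
fi
```

Inside a running container this could be invoked via docker exec. Note that a freshly started java process does not carry the application server's JVM options, so values like MaxHeapSize reported this way do not reflect MEM_MAX_RAM_PERCENTAGE.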