Application Base Image

The base image contains Payara and other dependencies that the Dataverse software runs on. It is the foundation for the Dataverse Application Image. Note that some dependencies, such as PostgreSQL and Solr, run in their own containers and are not part of the base image.

A “base image” offers you a pre-installed and pre-tuned application server to deploy the Dataverse software to. Basic functionality such as executing scripts at container boot, monitoring and memory tweaks is added at this layer, so that the application image can focus on the application itself.

NOTE: The base image does not contain the Dataverse application itself.

Within the main repository, you may find the base image’s files at <git root>/modules/container-base. This Maven module uses the Maven Docker Plugin to build and ship the image. You may use, extend, or alter this image to your liking and/or host in some different registry if you want to.

NOTE: This image is created, maintained and supported by the Dataverse community on a best-effort basis. IQSS will not offer support on how to deploy or run it; please reach out to the community (Getting Help) for help on using it. You might also be interested in Docker, Kubernetes, and Containers, which links to some (community-based) efforts.

Supported Image Tags

This image is sourced from the main upstream code repository of the Dataverse software. Development and maintenance of the image’s code happen there (again, by the community). Community-supported image tags are based on the two most important upstream branches:

  • The unstable tag corresponds to the develop branch, where pull requests are merged. (Dockerfile)

  • The alpha tag corresponds to the master branch, where releases are cut from. (Dockerfile)
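As a small illustration of the tag mapping above, the following sketch resolves an upstream branch name to the matching image reference. The function name base_image_for_branch is hypothetical and not part of the image or its build; the image name gdcc/base is the default named under Build Instructions.

```shell
# Hypothetical helper: map an upstream branch to its community-supported image tag.
base_image_for_branch() {
    case "$1" in
        develop) echo "gdcc/base:unstable" ;;
        master)  echo "gdcc/base:alpha" ;;
        *)       echo "no community-supported tag for branch '$1'" >&2; return 1 ;;
    esac
}

base_image_for_branch develop
```

The printed reference can be handed to docker pull, for instance docker pull "$(base_image_for_branch develop)".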

Image Contents

The base image provides:

This image is created as a “multi-arch image”, see below.

It inherits (is built on) an Ubuntu environment from the upstream base image of Eclipse Temurin. You are free to change the JRE/JDK image to your liking (see below).

Build Instructions

From here on, we assume you have Docker, Docker Desktop, Moby or some remote Docker host configured, up and running.

Simply execute the Maven module’s packaging target with the “container” profile activated, either from the project’s Git root:

mvn -Pct -f modules/container-base install

Or move to the module and execute:

cd modules/container-base && mvn -Pct install

Some additional notes, using Maven parameters to change the build and use …:

  • … a different tag only: add -Dbase.image.tag=tag.
    Note: default is unstable
  • … a different image name and tag: add -Dbase.image=name:tag.
    Note: default is gdcc/base:${base.image.tag}
  • … a different image registry than Docker Hub: add -Ddocker.registry=registry.example.org (see also DMP docs on registries)

  • … a different Payara version: add -Dpayara.version=V.YYYY.R.

  • … a different Temurin JRE version A: add -Dtarget.java.version=A (e.g. 11, 17, …).
    Note: must resolve to an available image tag A-jre of Eclipse Temurin! (See also Docker Hub search example)
  • … a different Java Distribution: add -Djava.image="name:tag" with a precise reference to an image available locally or remotely.

  • … a different UID/GID for the payara user/group: add -Dbase.image.uid=1234 (or .gid)
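The parameters above can be combined freely. The following sketch assembles such a command line without running it; base_image_build_cmd is a hypothetical helper name, and the property values (mytag, 17, 1234) are placeholders chosen for illustration only.

```shell
# Hypothetical helper: print (but do not run) the Maven command for a customized build.
# Each argument is a "property=value" pair taken from the list above.
base_image_build_cmd() {
    cmd="mvn -Pct -f modules/container-base install"
    for prop in "$@"; do
        cmd="$cmd -D$prop"
    done
    printf '%s\n' "$cmd"
}

# Example: custom tag, Java 17 and a different UID for the payara user.
base_image_build_cmd base.image.tag=mytag target.java.version=17 base.image.uid=1234
```

Printing the command first makes it easy to review the chosen properties before starting a build. Values containing spaces would need additional quoting, which this sketch does not handle.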

Automated Builds & Publishing

To make reuse as simple as possible, the image is built by a GitHub Action within the IQSS repository and then pushed to the gdcc/base repository on Docker Hub. It is built and pushed on every edit to its sources. In addition, uncached scheduled nightly builds make sure security updates find their way in.

Note: For the GitHub Action to be able to push to Docker Hub, two repository secrets (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN) have been added by IQSS admins to their repository.

Processor Architecture and Multiarch

This image is created as a “multi-arch image”, supporting the most common architectures Dataverse usually runs on: AMD64 (Windows/Linux/…) and ARM64 (Apple M1/M2), by using Maven Docker Plugin’s BuildX mode.

Building the image via mvn -Pct package or mvn -Pct install as above will only build for the architecture of the Docker machine’s CPU.

Only mvn -Pct deploy will trigger building on all enabled architectures (and will try to push the images to a registry, which is Docker Hub by default).

You can specify which architectures you would like to build for by passing them as a comma-separated list: mvn -Pct deploy -Ddocker.platforms="linux/amd64,linux/arm64". The shown configuration is the default and may be omitted.

However, to enable building for non-native architectures on your build machine, you will need to set up a cross-platform builder!

On Linux, you should install qemu-user-static (preferably via your package management) on the host and run docker run --rm --privileged multiarch/qemu-user-static --reset -p yes to enable that builder. The Docker plugin will set up everything else for you.
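To judge whether a cross-platform builder is needed at all, it helps to compare the requested platforms with the native one. The sketch below does this for a given machine name as reported by uname -m. Both function names are hypothetical, and the sketch assumes the Docker machine is the local machine, which does not hold for a remote Docker host.

```shell
# Hypothetical helper: translate a `uname -m` machine name into a Docker platform string.
docker_platform_for() {
    case "$1" in
        x86_64|amd64)  echo "linux/amd64" ;;
        aarch64|arm64) echo "linux/arm64" ;;
        *)             echo "unknown" ;;
    esac
}

# Print every requested platform that is not native to the given machine and
# would therefore be built through emulation.
# $1 = machine name, $2 = comma-separated platform list
foreign_platforms() {
    native=$(docker_platform_for "$1")
    echo "$2" | tr ',' '\n' | while read -r platform; do
        [ "$platform" = "$native" ] || echo "$platform"
    done
}

foreign_platforms "$(uname -m)" "linux/amd64,linux/arm64"
```

If the function prints nothing, all requested platforms are native and no emulation is involved.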

The upstream CI workflows publish images supporting AMD64 and ARM64 (see e.g. tag details on Docker Hub).

Tunables

The base image provides a Payara domain suited for production use, but can also be used during development. Many settings have been carefully selected for best performance and stability of the Dataverse application.

As with any service, you should always monitor any metrics and make use of the tuning capabilities the base image provides. These are mostly based on environment variables (very common with containers) and provide sane defaults.

| Env. variable | Default | Type | Description |
|---|---|---|---|
| DEPLOY_PROPS | (empty) | String | Set to add arguments to generated asadmin deploy commands. |
| PREBOOT_COMMANDS | [preboot] | Abs. path | Provide path to file with asadmin commands to run before boot of application server. See also Pre/postboot script docs. |
| POSTBOOT_COMMANDS | [postboot] | Abs. path | Provide path to file with asadmin commands to run after boot of application server. See also Pre/postboot script docs. |
| JVM_ARGS | (empty) | String | Additional arguments to pass to application server’s JVM on start. |
| MEM_MAX_RAM_PERCENTAGE | 70.0 | Percentage | Maximum amount of container’s allocated RAM to be used as heap space. Make sure to leave some room for native memory, OS overhead etc! |
| MEM_XSS | 512k | Size | Tune the maximum JVM stack size. |
| MEM_MIN_HEAP_FREE_RATIO | 20 | Integer | Make the heap shrink aggressively and grow conservatively. See also run-java-sh recommendations. |
| MEM_MAX_HEAP_FREE_RATIO | 40 | Integer | Make the heap shrink aggressively and grow conservatively. See also run-java-sh recommendations. |
| MEM_MAX_GC_PAUSE_MILLIS | 500 | Milliseconds | Shorter pause times might result in lots of collections causing overhead without much gain. This needs monitoring and tuning. It’s a complex matter. |
| MEM_METASPACE_SIZE | 256m | Size | Initial size of memory reserved for class metadata, also used as trigger to run a garbage collection once passing this size. |
| MEM_MAX_METASPACE_SIZE | 2g | Size | The metaspace’s size will not outgrow this limit. |
| ENABLE_DUMPS | 0 | Bool, 0 or 1 | If enabled, the argument(s) given in JVM_DUMPS_ARG will be added to the JVM starting up. This means it will enable dumping the heap to ${DUMPS_DIR} (see below) in “out of memory” cases. (You should back this location with disk space / ramdisk, so it does not write into an overlay filesystem!) |
| JVM_DUMPS_ARG | [dump-option] | String | Can be fine-tuned for more fine-grained control of the dumping behaviour. |
| ENABLE_JMX | 0 | Bool, 0 or 1 | Allow insecure JMX connections, enable AMX and tune all JMX monitoring levels to HIGH. See also Payara Docs - Basic Monitoring. A basic JMX service is enabled by default in Payara, exposing basic JVM MBeans, but especially no Payara MBeans. |
| ENABLE_JDWP | 0 | Bool, 0 or 1 | Enable the “Java Debug Wire Protocol” to attach a remote debugger to the JVM in this container. Listens on port 9009 when enabled. Search the internet for numerous tutorials to use it. |
| ENABLE_RELOAD | 0 | Bool, 0 or 1 | Enable the dynamic “hot” reloads of files when changed in a deployment. Useful for development, when new artifacts are copied into the running domain. |
| DATAVERSE_HTTP_TIMEOUT | 900 | Seconds | See Application Server Settings http.request-timeout-seconds. Note: can also be set using any other MicroProfile Config Sources available via dataverse.http.timeout. |

[preboot] ${CONFIG_DIR}/pre-boot-commands.asadmin

[postboot] ${CONFIG_DIR}/post-boot-commands.asadmin

[dump-option] -XX:+HeapDumpOnOutOfMemoryError
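To get a feeling for MEM_MAX_RAM_PERCENTAGE, the following sketch estimates the resulting maximum heap for a given container memory limit. It assumes the percentage is applied directly to the container’s memory limit, as the description in the table states; max_heap_mib is a hypothetical helper name, and the 4096 MiB limit is only an example value.

```shell
# Hypothetical helper: estimate the maximum heap (in MiB) for a container memory limit.
# $1 = container memory limit in MiB, $2 = MEM_MAX_RAM_PERCENTAGE (defaults to 70.0)
max_heap_mib() {
    awk -v limit="$1" -v pct="${2:-70.0}" 'BEGIN { printf "%d\n", limit * pct / 100 }'
}

max_heap_mib 4096        # default percentage with a 4 GiB limit
max_heap_mib 4096 50.0   # a more conservative setting
```

With a 4096 MiB limit, the default leaves roughly 2867 MiB for the heap and about 1229 MiB for metaspace, thread stacks, native memory and OS overhead. Lowering the percentage to 50.0 yields 2048 MiB of heap.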

Locations

These environment variables represent certain locations and might be reused in your scripts etc. None of these variables are meant to be reconfigurable; they reflect the state of the filesystem layout!

Writeable at build time:

The overlay filesystem of Docker and other container technologies is not meant to be used for any performance-sensitive I/O. You should avoid writing data anywhere in the file tree at runtime, except for well-known locations with mounted volumes backing them (see below).

The locations below are meant to be written to when you build a container image, either this base or anything building upon it. You can also use these for references in scripts, etc.

| Env. variable | Value | Description |
|---|---|---|
| HOME_DIR | /opt/payara | Home base to Payara and the application |
| PAYARA_DIR | ${HOME_DIR}/appserver | Installation directory of Payara server |
| SCRIPT_DIR | ${HOME_DIR}/scripts | Any scripts like the container entrypoint, init scripts, etc |
| CONFIG_DIR | ${HOME_DIR}/config | Payara Server configurations like pre/postboot command files go here (Might be reused for Dataverse one day) |
| DEPLOY_DIR | ${HOME_DIR}/deployments | Any EAR or WAR file, exploded WAR directory etc are autodeployed on start |
| DOMAIN_DIR | ${PAYARA_DIR}/glassfish/domains/${DOMAIN_NAME} | Path to root of the Payara domain applications will be deployed into. Usually ${DOMAIN_NAME} will be domain1. |
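Because the locations are defined relative to each other, it can help to see them fully resolved. The sketch below copies the values from the table and prints two derived paths. Inside the container the variables are already set, so the assignments here exist only to make the example runnable elsewhere, and DOMAIN_NAME=domain1 is the usual value rather than a guaranteed one.

```shell
# Values copied from the table above; inside the container they are preset.
HOME_DIR=/opt/payara
PAYARA_DIR="${HOME_DIR}/appserver"
SCRIPT_DIR="${HOME_DIR}/scripts"
CONFIG_DIR="${HOME_DIR}/config"
DEPLOY_DIR="${HOME_DIR}/deployments"
DOMAIN_NAME=domain1   # usual value according to the table
DOMAIN_DIR="${PAYARA_DIR}/glassfish/domains/${DOMAIN_NAME}"

# Print two fully resolved paths: the domain root and the default preboot file.
printf 'DOMAIN_DIR=%s\n' "$DOMAIN_DIR"
printf 'preboot=%s\n' "${CONFIG_DIR}/pre-boot-commands.asadmin"
```

This prints /opt/payara/appserver/glassfish/domains/domain1 for the domain root and /opt/payara/config/pre-boot-commands.asadmin for the default preboot command file.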

Writeable at runtime:

The locations below are defined as Docker volumes by the base image. They will by default get backed by an “anonymous volume”, but you can (and should) bind-mount a host directory or named Docker volume in these places to avoid data loss, gain performance and/or use a network file system.

Notes:

  1. On Kubernetes you still need to provide volume definitions for these places in your deployment objects!

  2. You should not write data into these locations at build time - it will be shadowed by the mounted volumes!

| Env. variable | Value | Description |
|---|---|---|
| STORAGE_DIR | /dv | This place is writeable by the Payara user, making it usable as a place to store research data, customizations or other data. Images inheriting the base image should create distinct folders here, backed by different mounted volumes. |
| SECRETS_DIR | /secrets | Mount secrets or other files here; they are picked up automatically by the Directory Config Source. See also various Configuration options involving secrets. |
| DUMPS_DIR | /dumps | Default location where heap dumps will be stored (see above). You should mount some storage here (disk or ephemeral). |
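One way to back all three runtime locations is to bind-mount sub-directories of a single host directory. The sketch below only prints the matching docker run flags. The helper name runtime_volume_flags and the host path /srv/dataverse are assumptions made for this illustration; named Docker volumes work just as well.

```shell
# Hypothetical helper: print `docker run` volume flags that back the three
# runtime locations with sub-directories of one host directory.
# $1 = host directory (an assumption for this sketch)
runtime_volume_flags() {
    echo "-v $1/dv:/dv -v $1/secrets:/secrets -v $1/dumps:/dumps"
}

runtime_volume_flags /srv/dataverse
```

The output can be spliced into a command such as docker run $(runtime_volume_flags /srv/dataverse) gdcc/base:unstable. The substitution is left unquoted on purpose so the shell splits it into separate arguments, which means host paths containing spaces are not supported by this sketch.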

Exposed Ports

The default ports that are exposed by this image are:

  • 8080 - HTTP listener

  • 4848 - Admin Service HTTPS listener

  • 8686 - JMX listener

  • 9009 - “Java Debug Wire Protocol” port (when ENABLE_JDWP=1)

The HTTPS listener (on port 8181) is deactivated during the build, because the application server always needs to be reverse-proxied and SSL/TLS termination is handled at the proxy. This saves memory and some CPU cycles!
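Exposed ports are not reachable from the host until they are published. As a cautious sketch, the helper below prints docker run flags that always publish the HTTP listener and add the debug port only when JDWP is requested. The name port_flags is hypothetical, and leaving the admin (4848) and JMX (8686) ports unpublished is a choice made for this example, not a recommendation from the image maintainers.

```shell
# Hypothetical helper: print `docker run` flags for publishing ports.
# HTTP is always published; the debug port only together with ENABLE_JDWP=1.
# $1 = value intended for ENABLE_JDWP (defaults to 0)
port_flags() {
    flags="-p 8080:8080"
    if [ "${1:-0}" = "1" ]; then
        flags="$flags -e ENABLE_JDWP=1 -p 9009:9009"
    fi
    echo "$flags"
}

port_flags      # production-like: HTTP only
port_flags 1    # development: HTTP plus remote debugging
```

Coupling the environment variable and the port mapping in one place avoids publishing port 9009 while the debugger is switched off inside the container.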

Entry & Extension Points

By default, the entrypoint shell script provided by this base image will:

  • Run any scripts named ${SCRIPT_DIR}/init_* or located in the ${SCRIPT_DIR}/init.d/ directory for initialization before the application server starts.

  • Run an executable script ${SCRIPT_DIR}/startInBackground.sh in the background - if present.

  • Run the application server startup scripting in foreground (${SCRIPT_DIR}/startInForeground.sh).

If you need to create some scripting that runs in parallel under supervision of dumb-init, e.g. to wait for the application to deploy before executing something, this is your point of extension: simply provide the ${SCRIPT_DIR}/startInBackground.sh executable script with your application image.
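The startup order described above can be sketched as follows. This is not the entrypoint script shipped with the image, only a simplified model of the three steps. The function name run_startup_sequence is hypothetical, and skipping non-executable files is an assumption of this sketch.

```shell
# Simplified model of the startup order described above; NOT the real entrypoint.
# $1 = script directory (the role ${SCRIPT_DIR} plays inside the container)
run_startup_sequence() {
    script_dir="$1"
    # 1. Initialization scripts: init_* first, then everything in init.d/
    for f in "$script_dir"/init_* "$script_dir"/init.d/*; do
        # Skip unmatched glob patterns and non-executable files (assumption).
        [ -f "$f" ] && [ -x "$f" ] || continue
        "$f"
    done
    # 2. Optional background script, started without waiting for it.
    if [ -x "$script_dir/startInBackground.sh" ]; then
        "$script_dir/startInBackground.sh" &
    fi
    # 3. Application server startup, kept in the foreground.
    "$script_dir/startInForeground.sh"
}
```

The model shows why startInBackground.sh is the right extension point for work that has to happen while the server is starting: it is launched before the foreground script and runs concurrently with it, whereas init scripts always finish before the server starts.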

Other Hints

By default, domain1 is enabled to use the G1GC garbage collector.

For running a Java application within a Linux-based container, support for CGroups is essential. It has been included and activated by default since Java 8u192, Java 11 LTS and later. If you are interested in more details, you can read about them in a few places like https://developers.redhat.com/articles/2022/04/19/java-17-whats-new-openjdks-container-awareness, https://www.eclipse.org/openj9/docs/xxusecontainersupport, etc. The other memory defaults are inspired by the run-java-sh recommendations.
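To check which memory limit the container runtime has actually applied, you can read it from the cgroup filesystem. The sketch below looks in the usual cgroup v2 and v1 locations. The helper name container_memory_limit is hypothetical, and the optional root argument exists only so the function can be tried outside a container.

```shell
# Hypothetical helper: print the memory limit set for this container via cgroups.
# $1 = cgroup filesystem root (defaults to /sys/fs/cgroup)
container_memory_limit() {
    root="${1:-/sys/fs/cgroup}"
    if [ -r "$root/memory.max" ]; then
        cat "$root/memory.max"                      # cgroup v2; "max" means unlimited
    elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
        cat "$root/memory/memory.limit_in_bytes"    # cgroup v1
    else
        echo "unknown"
    fi
}

container_memory_limit
```

The value is reported in bytes. It is the figure that percentage-based heap settings such as MEM_MAX_RAM_PERCENTAGE relate to.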