Application Base Image
The base image contains Payara and other dependencies that the Dataverse software runs on. It is the foundation for the Dataverse Application Image. Note that some dependencies, such as PostgreSQL and Solr, run in their own containers and are not part of the base image.
A “base image” offers you a pre-installed and pre-tuned application server to deploy the Dataverse software to. Basic functionality like executing scripts at container boot, monitoring, and memory tweaks is all added at this layer, so the application image can focus on the application itself.
NOTE: The base image does not contain the Dataverse application itself.
Within the main repository, you may find the base image’s files at `<git root>/modules/container-base`.
This Maven module uses the Maven Docker Plugin to build and ship the image.
You may use, extend, or alter this image to your liking and/or host in some different registry if you want to.
NOTE: This image is created, maintained, and supported by the Dataverse community on a best-effort basis. IQSS does not offer support for deploying or running it; please reach out to the community (Getting Help) for help using it. You might be interested in taking a look at Docker, Kubernetes, and Containers, which links to some community-based efforts.
Supported Image Tags
This image is sourced from the main upstream code repository of the Dataverse software. Development and maintenance of the image’s code happens there (again, by the community). Community-supported image tags are based on the two most important upstream branches:
- The `unstable` tag corresponds to the `develop` branch, where pull requests are merged. (Dockerfile)
- The `alpha` tag corresponds to the `master` branch, where releases are cut from. (Dockerfile)
Image Contents
The base image provides:
- CLI tools necessary to run Dataverse (i.e. `curl` or `jq` - see also Prerequisites in the Installation Guide)
- Linux tools for analysis, monitoring, and so on
- Jattach (attach to a running JVM)
- wait-for (tool to “wait for” a service to be available)
This image is created as a “multi-arch image”, see below.
It inherits (is built on) an Ubuntu environment from the upstream base image of Eclipse Temurin. You are free to change the JRE/JDK image to your liking (see below).
Build Instructions
From here on, we assume you have Docker, Docker Desktop, Moby, or some remote Docker host configured, up, and running.
Simply execute the Maven module’s packaging target with the “container” profile activated, either from the project’s Git root:
mvn -Pct -f modules/container-base install
Or move to the module and execute:
cd modules/container-base && mvn -Pct install
Some additional notes on using Maven parameters to change the build and use …:

- … a different tag only: add `-Dbase.image.tag=tag`. Note: default is `unstable`.
- … a different image name and tag: add `-Dbase.image=name:tag`. Note: default is `gdcc/base:${base.image.tag}`.
- … a different image registry than Docker Hub: add `-Ddocker.registry=registry.example.org` (see also DMP docs on registries).
- … a different Payara version: add `-Dpayara.version=V.YYYY.R`.
- … a different Temurin JRE version `A`: add `-Dtarget.java.version=A` (i.e. `11`, `17`, …). Note: must resolve to an available image tag `A-jre` of Eclipse Temurin! (See also the Docker Hub search example.)
- … a different Java distribution: add `-Djava.image="name:tag"` with a precise reference to an image available locally or remotely.
- … a different UID/GID for the `payara` user/group: add `-Dbase.image.uid=1234` (or `.gid`).
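The parameters above can be combined; as a sketch (the tag and version values here are illustrative assumptions, not recommendations), a customized build invocation could be assembled like this:

```shell
# Compose a Maven invocation that builds the base image with a custom tag,
# Java version, and Payara version (values are illustrative assumptions).
MVN_ARGS="-Pct -f modules/container-base install"
MVN_ARGS="$MVN_ARGS -Dbase.image.tag=testing"
MVN_ARGS="$MVN_ARGS -Dtarget.java.version=17"
MVN_ARGS="$MVN_ARGS -Dpayara.version=6.2024.6"

# Print the final command instead of executing it, so this sketch works
# without Maven or a repository checkout at hand.
echo "mvn $MVN_ARGS"
```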
Automated Builds & Publishing
To make reuse as simple as possible, the image is built with a GitHub Action within the IQSS repository and then pushed to the Docker Hub gdcc/base repository. It is built and pushed on every edit to its sources, plus uncached scheduled nightly builds, to make sure security updates find their way in.
Note: For the GitHub Action to be able to push to Docker Hub, two repository secrets (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN) have been added by IQSS admins to their repository.
Processor Architecture and Multiarch
This image is created as a “multi-arch image”, supporting the most common architectures Dataverse usually runs on: AMD64 (Windows/Linux/…) and ARM64 (Apple M1/M2), by using Maven Docker Plugin’s BuildX mode.
Building the image via `mvn -Pct package` or `mvn -Pct install` as above will only build for the architecture of the Docker machine’s CPU.

Only `mvn -Pct deploy` will trigger building on all enabled architectures (and will try to push the images to a registry, which is Docker Hub by default).

You can specify which architectures you would like to build for and include, as a comma-separated list: `mvn -Pct deploy -Ddocker.platforms="linux/amd64,linux/arm64"`. The shown configuration is the default and may be omitted.
Yet, to enable building with non-native code on your build machine, you will need to set up a cross-platform builder.

On Linux, you should install `qemu-user-static` (preferably via your package manager) on the host and run `docker run --rm --privileged multiarch/qemu-user-static --reset -p yes` to enable that builder. The Docker plugin will set up everything else for you.

The upstream CI workflows publish images supporting AMD64 and ARM64 (see e.g. tag details on Docker Hub).
Tunables
The base image provides a Payara domain suited for production use, but can also be used during development. Many settings have been carefully selected for best performance and stability of the Dataverse application.
As with any service, you should always monitor any metrics and make use of the tuning capabilities the base image provides. These are mostly based on environment variables (very common with containers) and provide sane defaults.
| Env. variable | Default | Type | Description |
|---|---|---|---|
| `DEPLOY_PROPS` | (empty) | String | Set to add arguments to generated `asadmin deploy` commands. |
| `PREBOOT_COMMANDS` | `${CONFIG_DIR}/pre-boot-commands.asadmin` | Abs. path | Provide path to file with `asadmin` commands to run before boot of the application server. |
| `POSTBOOT_COMMANDS` | `${CONFIG_DIR}/post-boot-commands.asadmin` | Abs. path | Provide path to file with `asadmin` commands to run after boot of the application server. |
| `JVM_ARGS` | (empty) | String | Additional arguments to pass to the application server’s JVM on start. |
| `MEM_MAX_RAM_PERCENTAGE` | `70` | Percentage | Maximum amount of the container’s allocated RAM to be used as heap space. Make sure to leave some room for native memory, OS overhead, etc.! |
| `MEM_XSS` | `512k` | Size | Tune the maximum JVM stack size. |
| `MEM_MIN_HEAP_FREE_RATIO` | `20` | Integer | Make the heap shrink aggressively and grow conservatively. See also run-java-sh recommendations. |
| `MEM_MAX_HEAP_FREE_RATIO` | `40` | Integer | Make the heap shrink aggressively and grow conservatively. See also run-java-sh recommendations. |
| `MEM_MAX_GC_PAUSE_MILLIS` | `500` | Milliseconds | Shorter pause times might result in lots of collections causing overhead without much gain. This needs monitoring and tuning. It’s a complex matter. |
| `MEM_METASPACE_SIZE` | `256m` | Size | Initial size of memory reserved for class metadata, also used as a trigger to run a garbage collection once this size is passed. |
| `MEM_MAX_METASPACE_SIZE` | `2g` | Size | The metaspace’s size will not outgrow this limit. |
| `ENABLE_DUMPS` | `0` | Bool, `0` or `1` | If enabled, the argument(s) given in `JVM_DUMPS_ARG` will be added to the JVM on startup. |
| `JVM_DUMPS_ARG` | `-XX:+HeapDumpOnOutOfMemoryError` | String | Can be fine-tuned for more fine-grained control of dumping behaviour. |
| `ENABLE_JMX` | `0` | Bool, `0` or `1` | Allow insecure JMX connections, enable AMX, and tune all JMX monitoring levels to `HIGH`. |
| `ENABLE_JDWP` | `0` | Bool, `0` or `1` | Enable the “Java Debug Wire Protocol” to attach a remote debugger to the JVM in this container. Listens on port 9009 when enabled. Search the internet for numerous tutorials on using it. |
| `ENABLE_RELOAD` | `0` | Bool, `0` or `1` | Enable dynamic “hot” reloads of files when changed in a deployment. Useful for development, when new artifacts are copied into the running domain. |
| `DATAVERSE_HTTP_TIMEOUT` | `900` | Seconds | See Application Server Settings. Note: can also be set using any other MicroProfile Config Sources available via `dataverse.http.timeout`. |
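As an illustrative sketch, a few tunables could be overridden when starting a container from the base image. `ENABLE_JDWP` comes from this guide; the two memory variable names are assumed from the base image’s naming scheme, and the values are examples, not tuning advice:

```shell
# Compose (but only print) a docker run command overriding a few tunables.
# MEM_MAX_RAM_PERCENTAGE and MEM_MAX_GC_PAUSE_MILLIS are assumed names;
# the chosen values are examples, not tuning recommendations.
TUNING="-e MEM_MAX_RAM_PERCENTAGE=60"
TUNING="$TUNING -e MEM_MAX_GC_PAUSE_MILLIS=300"
TUNING="$TUNING -e ENABLE_JDWP=1"

# Printing keeps the sketch runnable without a Docker daemon present.
echo "docker run --rm -it $TUNING gdcc/base:unstable"
```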
Locations
These environment variables represent certain locations and might be reused in your scripts, etc. None of these variables are meant to be reconfigurable; they reflect state in the filesystem layout!
Writeable at build time:
The overlay filesystem of Docker and other container technologies is not meant to be used for any performance IO. You should avoid writing data anywhere in the file tree at runtime, except for well known locations with mounted volumes backing them (see below).
The locations below are meant to be written to when you build a container image, either this base or anything building upon it. You can also use these for references in scripts, etc.
| Env. variable | Value | Description |
|---|---|---|
| `HOME_DIR` | `/opt/payara` | Home base to Payara and the application |
| `PAYARA_DIR` | `${HOME_DIR}/appserver` | Installation directory of the Payara server |
| `SCRIPT_DIR` | `${HOME_DIR}/scripts` | Any scripts like the container entrypoint, init scripts, etc. |
| `CONFIG_DIR` | `${HOME_DIR}/config` | Payara Server configurations like pre/postboot command files go here (might be reused for Dataverse one day) |
| `DEPLOY_DIR` | `${HOME_DIR}/deployments` | Any EAR or WAR file, exploded WAR directory, etc. are autodeployed on start |
| `DOMAIN_DIR` | `${PAYARA_DIR}/glassfish/domains/${DOMAIN_NAME}` | Path to the root of the Payara domain applications will be deployed into. Usually `${DOMAIN_NAME}` is `domain1`. |
Writeable at runtime:
The locations below are defined as Docker volumes by the base image. They will by default get backed by an “anonymous volume”, but you can (and should) bind-mount a host directory or named Docker volume in these places to avoid data loss, gain performance and/or use a network file system.
Notes:
1. On Kubernetes you still need to provide volume definitions for these places in your deployment objects!
2. You should not write data into these locations at build time - it will be shadowed by the mounted volumes!
| Env. variable | Value | Description |
|---|---|---|
| `STORAGE_DIR` | `/dv` | This place is writeable by the Payara user, making it usable as a place to store research data, customizations, or other files. Images inheriting the base image should create distinct folders here, backed by different mounted volumes. |
| `SECRETS_DIR` | `/secrets` | Mount secrets or other files here to have them picked up automatically by the Directory Config Source. See also the various Configuration options involving secrets. |
| `DUMPS_DIR` | `/dumps` | Default location where heap dumps will be stored (see above). You should mount some storage here (disk or ephemeral). |
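As a sketch, these writeable locations could be backed by a named volume and bind mounts when running the image. The volume name and host paths are hypothetical, and the in-container paths `/dv`, `/secrets`, and `/dumps` assume the image defaults:

```shell
# Compose (but only print) a docker run command backing the writeable
# runtime locations. Volume name and host paths are hypothetical; the
# container paths /dv, /secrets, and /dumps assume the image defaults.
MOUNTS="-v dataverse-data:/dv"
MOUNTS="$MOUNTS -v /srv/secrets:/secrets:ro"
MOUNTS="$MOUNTS -v /tmp/dumps:/dumps"

# Printing keeps the sketch runnable without a Docker daemon present.
echo "docker run --rm -it $MOUNTS gdcc/base:unstable"
```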
Exposed Ports
The default ports that are exposed by this image are:
- 8080 - HTTP listener
- 4848 - Admin Service HTTPS listener
- 8686 - JMX listener
- 9009 - “Java Debug Wire Protocol” port (when `ENABLE_JDWP=1`)
The HTTPS listener (on port 8181) is deactivated during the build, as you will always need to reverse-proxy the application server and handle SSL/TLS termination there anyway. Save the memory and some CPU cycles!
Entry & Extension Points
The entrypoint shell script provided by this base image will by default ensure to:

- Run any scripts named `${SCRIPT_DIR}/init_*` or in the `${SCRIPT_DIR}/init.d/*` directory for initialization before the application server starts.
- Run an executable script `${SCRIPT_DIR}/startInBackground.sh` in the background - if present.
- Run the application server startup scripting in the foreground (`${SCRIPT_DIR}/startInForeground.sh`).
If you need to create some scripting that runs in parallel under supervision of `dumb-init`, e.g. to wait for the application to deploy before executing something, this is your point of extension: simply provide the `${SCRIPT_DIR}/startInBackground.sh` executable script with your application image.
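As a minimal sketch of such an extension (the exact `wait-for` invocation and the follow-up action are illustrative assumptions), an application image could ship a script like this:

```shell
# Create a hypothetical startInBackground.sh: wait until the application
# answers on the HTTP listener, then run some post-deploy task. The
# wait-for invocation and the follow-up task are illustrative assumptions.
cat > startInBackground.sh <<'EOF'
#!/bin/sh
# wait-for ships with the base image; block (up to 120s) until port 8080 answers.
wait-for localhost:8080 -t 120
echo "Application server is up - running post-deploy task"
EOF
chmod +x startInBackground.sh
```

In an application image, this file would be copied to `${SCRIPT_DIR}/startInBackground.sh`.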
Other Hints
By default, `domain1` is enabled to use the `G1GC` garbage collector.
For running a Java application within a Linux-based container, support for cgroups is essential. It has been included and activated by default since Java 8u192 and in Java 11 LTS and later. If you are interested in more details, you can read about them in a few places like https://developers.redhat.com/articles/2022/04/19/java-17-whats-new-openjdks-container-awareness, https://www.eclipse.org/openj9/docs/xxusecontainersupport, etc. The other memory defaults are inspired by the run-java-sh recommendations.