Monitoring¶
Once you’re in production, you’ll want to set up some monitoring. This page may serve as a starting point, and you are encouraged to share your own ideas with the Dataverse community!
Operating System Monitoring¶
In production you’ll want to monitor the usual suspects such as CPU, memory, free disk space, etc. There are a variety of tools in this space but we’ll highlight Munin below because it’s relatively easy to set up.
Munin¶
http://munin-monitoring.org says, “A default installation provides a lot of graphs with almost no work.” On RHEL or CentOS 7, you can try the following steps.
Enable the EPEL yum repo (if you haven’t already):
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Install Munin:
yum install munin
Start Munin:
systemctl start munin-node.service
Configure Munin to start at boot:
systemctl enable munin-node.service
Create a username/password (e.g. “admin” for both):
htpasswd /etc/munin/munin-htpasswd admin
Assuming you are fronting your app server with Apache, prevent Apache from proxying “/munin” traffic to the app server by adding the following line to your Apache config:
ProxyPassMatch ^/munin !
Then reload Apache to pick up the config change:
systemctl reload httpd.service
Test auth for the web interface:
curl http://localhost/munin/ -u admin:admin
At this point, graphs should start being generated for disk, network, processes, system, etc.
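If the auth test above does not behave as expected, the HTTP status code is often the quickest clue. The following is a minimal sketch, not part of Munin: the function names `munin_status_hint` and `munin_http_status` and the wording of the messages are made up for illustration, and the default URL assumes Apache is serving Munin on localhost as in the steps above.

```shell
#!/usr/bin/env bash
# Sketch: turn the HTTP status of the Munin page into a short diagnosis.
# The function names and messages are illustrative, not part of Munin.

# Map an HTTP status code to a human-readable hint.
munin_status_hint() {
    case "$1" in
        200) echo "ok - Munin web interface is reachable" ;;
        401) echo "alert - credentials rejected, check /etc/munin/munin-htpasswd" ;;
        *)   echo "alert - unexpected HTTP status $1, check the Apache config" ;;
    esac
}

# Fetch only the status code for the Munin page.
# curl prints 000 when it cannot connect at all.
munin_http_status() {
    curl -s -o /dev/null -w '%{http_code}' -u "$1:$2" "${3:-http://localhost/munin/}"
}
```

With the “admin” example credentials from above, `munin_status_hint "$(munin_http_status admin admin)"` prints a single line that can be fed to whatever alerting you use.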
HTTP Traffic¶
HTTP traffic can be monitored from the client side, the server side, or both.
Monitoring HTTP Traffic from the Client Side¶
HTTP traffic for web clients that have cookies enabled (most browsers) can be tracked by Google Analytics (https://www.google.com/analytics/) and Matomo (formerly “Piwik”; https://matomo.org/) as explained in the Web Analytics Code section of the Installation Guide.
To track analytics beyond pageviews, style classes have been added for end user action buttons, which include:
btn-compute, btn-contact, btn-download, btn-explore, btn-export, btn-preview, btn-request, btn-share
Monitoring HTTP Traffic from the Server Side¶
A wide variety of solutions is available for monitoring HTTP traffic from the server side. The following are merely suggestions, and pull requests that add more ideas to this page are certainly welcome! Are you excited about the ELK stack (Elasticsearch, Logstash, and Kibana)? The TICK stack (Telegraf, InfluxDB, Chronograf, and Kapacitor)? GoAccess? Prometheus? Graphite? Splunk? Please consider sharing your work with the Dataverse community!
AWStats¶
AWStats is a venerable tool for monitoring web traffic based on Apache access logs. On RHEL/CentOS 7, you can try the following steps.
Enable the EPEL yum repo:
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Install AWStats:
yum install awstats
Assuming you are using HTTPS rather than HTTP (and you should!), edit /etc/awstats/awstats.standalone.conf and change LogFile="/var/log/httpd/access_log" to LogFile="/var/log/httpd/ssl_access_log". In the same file, change LogFormat=1 to LogFormat=4. Make both of these changes (LogFile and LogFormat) in /etc/awstats/awstats.localhost.localdomain.conf as well.
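The two config edits above can also be scripted. This is a minimal sketch under a few assumptions: `update_awstats_conf` is a hypothetical helper name, the `LogFile=` and `LogFormat=` assignments are assumed to start at the beginning of a line, and any existing value is overwritten (not only the defaults quoted above). A `.bak` copy of the original file is kept.

```shell
#!/usr/bin/env bash
# Sketch: apply the LogFile and LogFormat changes to one AWStats config file.
# update_awstats_conf is a hypothetical helper name.

update_awstats_conf() {
    local conf="$1" tmp
    tmp=$(mktemp) || return 1
    # Rewrite only the LogFile and LogFormat assignments; leave other lines alone.
    sed -e 's|^LogFile=.*|LogFile="/var/log/httpd/ssl_access_log"|' \
        -e 's|^LogFormat=.*|LogFormat=4|' \
        "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Keep a backup, then overwrite in place so ownership and permissions are preserved.
    cp -p "$conf" "$conf.bak" && cat "$tmp" > "$conf"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Run as root, it would be called once per file, for example `update_awstats_conf /etc/awstats/awstats.standalone.conf` and then `update_awstats_conf /etc/awstats/awstats.localhost.localdomain.conf`.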
Process the logs:
/usr/share/awstats/tools/awstats_updateall.pl now
Please note that load balancers (such as Amazon’s ELB) might interfere with the LogFormat mentioned above. To start troubleshooting errors such as “AWStats did not find any valid log lines that match your LogFormat parameter”, you might need to bump up the value of NbOfLinesForCorruptedLog in the config files above and retry while you iterate on your Apache and AWStats config.
Please note that the Dataverse Project team has attempted to parse Glassfish/Payara logs using AWStats, but it did not work out of the box. Questions have been posted at https://stackoverflow.com/questions/49134154/what-logformat-definition-does-awstats-require-to-parse-glassfish-http-access-logs and https://sourceforge.net/p/awstats/discussion/43428/thread/9b1befda/ and can be followed up on some day.
Database Connection Pool Used by App Server¶
https://github.com/IQSS/dataverse/issues/2595 contains some information on enabling monitoring of app servers, which is disabled by default. It’s a TODO to document what to do here if there is sufficient interest.
actionlogrecord¶
There is a database table called actionlogrecord that captures events that may be of interest. See https://github.com/IQSS/dataverse/issues/2729 for more discussion around this table.
An Important Note about ActionLogRecord Table:¶
Please note that in a busy production installation this table will be growing constantly. See the note on How to Keep ActionLogRecord in Trim in the Troubleshooting section of the guide.
Edit Draft Versions Logging¶
Changes made to draft versions of datasets are logged in a folder called logs/edit-drafts. See https://github.com/IQSS/dataverse/issues/5145 for more information on this logging.
Solr Indexing Failures Logging¶
Failures occurring during the indexing of Dataverse collections and datasets are logged in a folder called logs/process-failures. This logging will include instructions for manually re-running the failed processes. It may be advantageous to set up an automated job that watches for new entries in this log folder so that the failed indexing can be re-run.
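One way to watch the failure log folder is a small script run from cron. The sketch below is only an illustration: `recent_failure_logs` and `check_failure_logs` are hypothetical helper names, the folder is passed as an argument because the full path to logs/process-failures depends on your installation, and the 60-minute default window is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: report files in a failure log folder modified within the last N minutes.
# Both function names are hypothetical; pass the full path of your
# logs/process-failures folder as the first argument.

recent_failure_logs() {
    local dir="$1" minutes="${2:-60}"
    [ -d "$dir" ] || return 0   # nothing to report if the folder does not exist yet
    find "$dir" -type f -mmin "-$minutes" -print
}

check_failure_logs() {
    local recent
    recent=$(recent_failure_logs "$@")
    if [ -n "$recent" ]; then
        echo "alert - new indexing failure logs:" # put real alert command here
        echo "$recent"
        return 1
    fi
}
```

Matching the window to the cron interval (for example, a 60-minute window for an hourly job) keeps each new file from being reported more than once or missed.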
EJB Timers¶
Should you be interested in monitoring the EJB timers, this script may be used as an example:
#!/usr/bin/env bash
# Example monitoring script for EJB timers.
# Currently assumes that there are two timers.
# Real monitoring commands should replace the echo statements for production use.

# Fetch the timer status page; curl exits non-zero if the service is unreachable.
r0=$(curl -s http://localhost:8080/ejb-timer-service-app/timer)
if [ $? -ne 0 ]; then
    echo "alert - no timer service" # put real alert command here
fi

# Check that the page reports the expected number of timers.
if ! printf '%s\n' "$r0" | grep -q "There are 2 active persistent timers on this container"; then
    echo "alert - no active timers" # put real alert command here
fi
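The script above hardcodes the expected number of timers in the matched sentence. As a hedged variation, the count can be extracted and compared against a parameter instead. This sketch assumes the timer page always uses the sentence “There are N active persistent timers on this container” (the wording for other counts has not been verified); `active_timer_count` and `check_timer_count` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: extract the number of active persistent timers from the timer page text.
# Assumes the page contains the sentence matched by the script above,
# "There are N active persistent timers on this container".

active_timer_count() {
    # Reads the page on stdin and prints N, or nothing if the sentence is absent.
    sed -n 's/.*There are \([0-9][0-9]*\) active persistent timers on this container.*/\1/p' | head -n 1
}

check_timer_count() {
    # Reads the page on stdin and compares the extracted count to the expected one.
    local expected="$1" actual
    actual=$(active_timer_count)
    if [ "$actual" != "$expected" ]; then
        echo "alert - expected $expected active timers, found ${actual:-none}" # put real alert command here
        return 1
    fi
}
```

It could be used as `curl -s http://localhost:8080/ejb-timer-service-app/timer | check_timer_count 2`, which stays silent when the count matches.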
AWS RDS¶
Some installations of Dataverse use AWS’s “database as a service” offering called RDS (Relational Database Service) so it’s worth mentioning some monitoring tips here.
There are two documents that are especially worth reviewing:
Monitoring an Amazon RDS DB instance: The official documentation.
Performance Monitoring Workshop for RDS PostgreSQL and Aurora PostgreSQL: A workshop that steps through practical examples and even includes labs featuring tools to generate load.
Tips:
Enable Performance Insights. The product page includes a video from 2017 that is still compelling today. For example, the Top SQL tab shows the SQL queries that are contributing the most to database load. There’s also a video from 2018 mentioned in the overview that’s worth watching.
Note that Performance Insights is only available for PostgreSQL 10 and higher (also mentioned in docs). Version 11 has digest statistics enabled automatically but there’s an extra step for version 10.
Performance Insights policies describes how to give access to Performance Insights to someone who doesn’t have full access to RDS (AmazonRDSFullAccess).
Enable the slow query log and consider using pgbadger to analyze the log files. Set log_min_duration_statement to “5000”, for example, to log all queries that take 5 seconds or more. See enable query logging in the user guide or slides from the workshop for details. Using pgbadger is also mentioned as a common DBA task.
Use CloudWatch. CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance. It’s a separate service to log into so access can be granted more freely than to RDS. See CloudWatch docs.
Use Enhanced Monitoring. Enhanced Monitoring gathers its metrics from an agent on the instance. See Enhanced Monitoring docs.
It’s possible to view and act on RDS Events such as snapshots, parameter changes, etc. See Working with Amazon RDS events for details.
RDS monitoring is available via API and the aws command line tool. For example, see Retrieving metrics with the Performance Insights API.
To play with monitoring RDS using a server configured by dataverse-ansible, set use_rds to true to skip some steps that aren’t necessary when using RDS. See also the Deployment section of the Developer Guide.
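Two of the tips above can be sketched with the aws command line tool. This is an illustration under assumptions, not a recommended setup: the wrapper function names are hypothetical, “my-params” and “my-db” in the usage note are placeholders, the parameter group is assumed to be a custom one already attached to the instance (default parameter groups cannot be modified), and the 5-minute period is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: two aws CLI calls related to the tips above.
# The wrapper function names are hypothetical.

# Set log_min_duration_statement (in milliseconds) in a custom DB parameter group.
set_slow_query_threshold() {
    local group="$1" millis="${2:-5000}"
    aws rds modify-db-parameter-group \
        --db-parameter-group-name "$group" \
        --parameters "ParameterName=log_min_duration_statement,ParameterValue=$millis,ApplyMethod=immediate"
}

# Fetch average CPU utilization from CloudWatch for a time window, in 5-minute buckets.
# start and end are ISO 8601 timestamps such as 2024-01-01T00:00:00Z.
get_rds_cpu() {
    local instance="$1" start="$2" end="$3"
    aws cloudwatch get-metric-statistics \
        --namespace AWS/RDS \
        --metric-name CPUUtilization \
        --dimensions "Name=DBInstanceIdentifier,Value=$instance" \
        --start-time "$start" --end-time "$end" \
        --period 300 --statistics Average
}
```

For example, `set_slow_query_threshold my-params 5000` followed by `get_rds_cpu my-db 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z` would apply the 5-second threshold mentioned above and then return one hour of CPU datapoints, given credentials with the necessary permissions.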