# Deploying the dashboards
> **Warning:** Any changes you make via the Grafana UI will be overwritten the
> next time you run `deploy.py`. To make changes, edit the jsonnet files and
> deploy again.
## Pre-requisites
1. Locally, you need to have `jsonnet` installed. The `grafonnet` library is
   already vendored in, using `jsonnet-bundler`.

2. A recent version of prometheus installed on your cluster. Currently, it is
   assumed that your prometheus instance is installed using the prometheus
   helm chart, with `kube-state-metrics`, `node-exporter` and `cadvisor`
   enabled. In addition, you should scrape metrics from the hub instance as
   well.

   > **Tip:** If you're using a prometheus chart older than version `14.*`,
   > you can deploy the dashboards available prior to the upgrade, in the
   > `1.0` tag.

3. `kube-state-metrics` must be configured to add some labels to metrics
   since version 2.0. If deployed with the prometheus helm chart, the config
   should look like this:

   ```yaml
   kube-state-metrics:
     metricLabelsAllowlist:
       # to select jupyterhub component pods and get the hub usernames
       - pods=[app,component,hub.jupyter.org/username]
       # allowing all labels is probably fine for nodes, since they don't churn much, unlike pods
       - nodes=[*]
   ```

   > **Tip:** Make sure this is indented correctly where it should be!

4. A recent version of Grafana, with a prometheus data source already added.

5. An API key with 'admin' permissions. This is per-organization; you can
   make a new one by going to the configuration pane for your Grafana (the
   gear icon on the left bar) and selecting 'API Keys'. The admin permission
   is needed to query the list of data sources, so we can auto-populate
   template variable options (such as the list of hubs).
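Before running the deploy script, you can sanity-check that the API key works by listing data sources through Grafana's HTTP API, the same admin-level call used to auto-populate template variables. This is an optional sketch using only the Python standard library; the `GRAFANA_URL` and `GRAFANA_TOKEN` environment variables are assumptions for this example, not something the dashboards themselves require.

```python
# Sanity-check a Grafana API token by listing data sources.
# Hitting /api/datasources requires admin permissions, so a successful
# response confirms the token is sufficient for the deploy script.
import json
import os
import urllib.request


def datasources_request(grafana_url: str, token: str) -> urllib.request.Request:
    """Build an authenticated request for Grafana's data source listing."""
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    req = datasources_request(os.environ["GRAFANA_URL"], os.environ["GRAFANA_TOKEN"])
    with urllib.request.urlopen(req) as resp:
        # Each entry includes the data source's name and type (e.g. "prometheus")
        for ds in json.load(resp):
            print(ds["name"], ds["type"])
```

If this prints your prometheus data source(s), the token has the permissions the deploy script needs.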
## Additional prometheus exporters
Some very useful metrics (such as home directory free space) require additional collectors to be installed in your cluster, customized to your needs.
### Per-user home directory metrics (size, last modified, total entries, etc)
When using a shared home directory for users, it is helpful to collect information on each user's home directory: total size, the last time any files within it were modified, etc. This helps you notice when a single user is using a lot of space, often accidentally! The [prometheus-dirsize-exporter](https://github.com/yuvipanda/prometheus-dirsize-exporter) can be deployed to collect this information efficiently for querying. Here is an example YAML for deployment:
```yaml
# To provide data for the jupyterhub/grafana-dashboards dashboard about per-user
# home directories in the shared volume, which contains users' home folders etc,
# we deploy prometheus-dirsize-exporter to collect this data for the prometheus
# server to scrape.
#
# This is based on the Deployment manifest in jupyterhub/grafana-dashboards'
# readme: https://github.com/jupyterhub/grafana-dashboards#additional-collectors
#
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-dirsize-metrics
  labels:
    app: jupyterhub
    component: shared-dirsize-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyterhub
      component: shared-dirsize-metrics
  template:
    metadata:
      annotations:
        # This enables prometheus to actually scrape metrics from here
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
      labels:
        app: jupyterhub
        component: shared-dirsize-metrics
    spec:
      containers:
        - name: dirsize-exporter
          # From https://github.com/yuvipanda/prometheus-dirsize-exporter
          image: quay.io/yuvipanda/prometheus-dirsize-exporter:v1.2
          resources:
            # Provide *very few* resources for this collector, as it can
            # balloon up (especially in CPU) quite easily. We are quite ok
            # with the collection taking a while as long as we aren't costing
            # too much CPU or RAM
            requests:
              memory: 16Mi
              cpu: 0.01
            limits:
              cpu: 0.05
              memory: 128Mi
          command:
            - dirsize-exporter
            - /shared-volume
            - "250" # Use only 250 io operations per second at most
            - "120" # Wait 120 minutes (2h) between runs
            - --port=8000
          ports:
            - containerPort: 8000
              name: dirsize-metrics
              protocol: TCP
          securityContext:
            allowPrivilegeEscalation: false
            runAsGroup: 0
            runAsUser: 1000
          volumeMounts:
            - name: shared-volume
              mountPath: /shared-volume
              readOnly: true
      securityContext:
        fsGroup: 65534
      volumes:
        # This is the volume that we will mount and monitor. You should
        # reference a shared volume containing home directories etc. This is
        # often a PVC bound to a PV referencing an NFS server.
        - name: shared-volume
          persistentVolumeClaim:
            claimName: home-nfs
```
You will likely only need to adjust the `claimName` above to use this example.
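Once the exporter is running and being scraped, you can explore the collected data directly in Prometheus before relying on the dashboards. The query below is an illustrative sketch only: the metric name shown is an assumption, so check the exporter's `/metrics` endpoint for the names it actually exposes in your deployed version.

```promql
# Top 10 largest home directories by total size in bytes
# (metric name is an assumption; verify against the exporter's /metrics output)
topk(10, dirsize_total_size_bytes)
```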
## Deploy the dashboards
There's a helper `deploy.py` script that can deploy the dashboards to any Grafana installation.
```bash
# note the leading space in the command below, it makes the
# sensitive command not be stored in your shell history
 export GRAFANA_TOKEN="<API-TOKEN-FOR-YOUR-GRAFANA>"
./deploy.py <your-grafana-url>
```
This creates a folder called 'JupyterHub Default Dashboards' in your Grafana and adds a couple of dashboards to it. The Activity dashboard is unique in that it repeats rows of panels for every prometheus data source accessible by Grafana.
If your Grafana instance uses a self-signed certificate, pass the `--no-tls-verify` flag when executing the `deploy.py` script. For example:

```bash
./deploy.py <your-grafana-url> --no-tls-verify
```