Step-by-step procedure to install KubeLedger on Kubernetes and OpenShift.
Prerequisites
- Kubernetes v1.19+ or OpenShift 4.x+
- Helm 3.x
- Kubernetes Metrics Server deployed in your cluster
- (Optional) NVIDIA DCGM Exporter for GPU metrics
Verify Metrics Server
Before installing, ensure the Metrics Server is running:
# Check if metrics-server is deployed
kubectl -n kube-system get deploy | grep metrics-server
# Verify it's working
kubectl top nodes
If Metrics Server is not installed, deploy the upstream release manifest:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
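Metrics Server usually needs a short while after deployment before `kubectl top nodes` returns data. The sketch below polls that command until it succeeds; the function name `wait_for_metrics_server`, the attempt count, and the delay are illustrative choices, not part of KubeLedger.

```shell
# Poll `kubectl top nodes` until the Metrics API answers or the attempts run out.
# wait_for_metrics_server and its defaults are hypothetical, chosen for illustration.
wait_for_metrics_server() {
  local attempts="${1:-10}"   # maximum number of tries
  local delay="${2:-6}"       # seconds to sleep between tries
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl top nodes >/dev/null 2>&1; then
      echo "metrics-server is serving node metrics"
      return 0
    fi
    echo "metrics not available yet (attempt $i/$attempts)" >&2
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "metrics-server did not become ready" >&2
  return 1
}
```

Calling `wait_for_metrics_server 20 5` would try for roughly 100 seconds before giving up.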
Installation
Option 1: Kustomize (Quick Start)
This approach deploys KubeLedger with default settings. Review the resources in ./manifests/kubeledger/kustomize/resources/.
For advanced customization, use Helm instead.
Default settings:
- Persistent volume with a 1Gi storage request
- Standard Kubernetes (not OpenShift)
- Pod runs with UID/GID 4583
# Create namespace and deploy
kubectl create ns kubeledger
kubectl -n kubeledger apply -k ./manifests/kubeledger/kustomize
# Wait for the pod to start
kubectl -n kubeledger get po -w
Option 2: Helm
# Add the Helm repository
helm repo add kubeledger https://realopslabs.github.io/kubeledger
helm repo update
# Install latest version
helm install kubeledger kubeledger/kubeledger \
--namespace kubeledger \
--create-namespace
# Or install a specific version
helm install kubeledger kubeledger/kubeledger \
--version 1.0.0 \
--namespace kubeledger \
--create-namespace
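For scripted or repeated runs, `helm upgrade --install` may be more convenient than `helm install`, because it installs the release if it is missing and upgrades it otherwise. The wrapper below is a minimal sketch: `install_kubeledger` is a hypothetical name, and the `--wait --timeout 5m` values are assumptions to adjust for your cluster. It assumes the `kubeledger` repository was added as shown above.

```shell
# Idempotent install: safe to run whether or not the release already exists.
# install_kubeledger is a hypothetical wrapper; the 5m timeout is an assumed default.
install_kubeledger() {
  local version="${1:-}"   # optional chart version, e.g. 1.0.0
  # Build the helm argument list in the function's positional parameters.
  set -- upgrade --install kubeledger kubeledger/kubeledger \
    --namespace kubeledger --create-namespace \
    --wait --timeout 5m
  if [ -n "$version" ]; then
    set -- "$@" --version "$version"
  fi
  helm "$@"
}
```

`install_kubeledger` installs the latest chart, while `install_kubeledger 1.0.0` pins the version.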
Configuration
Create a values.yaml file and customize your installation parameters.
image:
  repository: ghcr.io/realopslabs/kubeledger
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 80
persistence:
  enabled: true
  size: 1Gi
  storageClass: ""  # Uses default storage class
dcgm:
  enabled: false
  endpoint: "http://dcgm-exporter.gpu-operator:9400/metrics"
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
Common Customizations
| Setting | Description |
|---|---|
| .dataVolume.persist: false | Use emptyDir for local testing |
| .dataVolume.persist: true | Use persistent volume (default) |
| .dataVolume.capacity | Persistent volume size (default: 1Gi) |
| .dataVolume.storageClass | Storage class name (uses cluster default if unset) |
| .securityContext.openshift: true | Enable OpenShift mode (binds nonroot-v2 SCC) |
| .dcgm.enable: true | Enable DCGM integration for GPU monitoring |
| .dcgm.endpoint | DCGM metrics endpoint URL (e.g., http://dcgm-exporter.gpu-operator.svc:9400/metrics) |
| .resources.requests.cpu | CPU request |
| .resources.requests.memory | Memory request |
| .envs | Environment variables (see Configuration Settings) |
Install with custom values:
helm install kubeledger oci://ghcr.io/realopslabs/charts/kubeledger \
--namespace kubeledger \
--create-namespace \
-f values.yaml
OpenShift Installation
# Create project
oc new-project kubeledger
# Install with OpenShift-specific settings
helm install kubeledger oci://ghcr.io/realopslabs/charts/kubeledger \
--namespace kubeledger \
--set securityContext.openshift=true
Verify Installation
# Check pod status
kubectl -n kubeledger get pods
# Check events if no pod appears
kubectl -n kubeledger get ev
# Check logs
kubectl -n kubeledger logs -l app=kubeledger
# Port-forward to access the dashboard
kubectl -n kubeledger port-forward svc/kubeledger 5483:80
Open http://localhost:5483 in your browser.
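The manual checks above can be combined into one step that waits for the rollout and prints events and logs only when something goes wrong. This is a sketch under the assumption that the chart creates a Deployment named `kubeledger` (the name used in the Troubleshooting commands); `wait_for_kubeledger` is a hypothetical helper name.

```shell
# Wait for the Deployment to become available; on failure, dump events and logs.
# wait_for_kubeledger is a hypothetical helper; the 120s default is an assumption.
wait_for_kubeledger() {
  local ns="kubeledger"
  local timeout="${1:-120s}"   # how long to wait for the rollout
  if kubectl -n "$ns" rollout status deployment/kubeledger --timeout="$timeout"; then
    echo "kubeledger is ready"
    return 0
  fi
  # Rollout failed or timed out: show recent context, then report failure.
  echo "rollout did not complete; recent events and logs follow" >&2
  kubectl -n "$ns" get events --sort-by=.lastTimestamp >&2
  kubectl -n "$ns" logs -l app=kubeledger --tail=50 >&2
  return 1
}
```

Once `wait_for_kubeledger` returns successfully, the port-forward command above should connect immediately.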
GPU Metrics (Optional)
Ensure DCGM Exporter is deployed:
# Check if DCGM Exporter is running
kubectl get daemonset -A | grep dcgm
If DCGM Exporter is not installed:
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
--namespace gpu-operator \
--create-namespace
Enable GPU metrics in KubeLedger:
helm upgrade kubeledger oci://ghcr.io/realopslabs/charts/kubeledger \
--namespace kubeledger \
--set dcgm.enabled=true \
--set dcgm.endpoint=http://dcgm-exporter.gpu-operator:9400/metrics
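Before or after enabling the integration, it can help to confirm that the DCGM endpoint is reachable from inside the cluster. The sketch below runs a throwaway pod in the `kubeledger` namespace and counts lines starting with `DCGM_FI_`, the prefix DCGM Exporter uses for its metrics. The helper name `check_dcgm_endpoint`, the pod name `dcgm-check`, and the `curlimages/curl` image are assumptions for illustration.

```shell
# Fetch the DCGM endpoint from a temporary in-cluster pod and count DCGM samples.
# check_dcgm_endpoint, the pod name and the curl image are hypothetical choices.
check_dcgm_endpoint() {
  local url="${1:-http://dcgm-exporter.gpu-operator:9400/metrics}"
  local count
  # --rm -i deletes the pod once curl exits; grep -c prints 0 when nothing matches.
  count="$(kubectl -n kubeledger run dcgm-check --rm -i --restart=Never \
    --image=curlimages/curl --command -- \
    curl -fsS --max-time 10 "$url" 2>/dev/null | grep -c '^DCGM_FI_')" || true
  if [ "${count:-0}" -gt 0 ]; then
    echo "DCGM endpoint reachable: $count metric samples"
    return 0
  fi
  echo "no DCGM metrics returned from $url" >&2
  return 1
}
```

A failure here points at the exporter, DNS, or a network policy rather than at KubeLedger itself.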
Upgrade
helm repo update
helm upgrade kubeledger kubeledger/kubeledger \
--namespace kubeledger
Uninstall
helm uninstall kubeledger -n kubeledger
kubectl delete namespace kubeledger
Prometheus Integration
KubeLedger exposes metrics at /metrics. Add the following scrape config:
- job_name: 'kubeledger'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_namespace]
      regex: kubeledger
      action: keep
    - source_labels: [__meta_kubernetes_pod_label_app]
      regex: kubeledger
      action: keep
    - source_labels: [__address__]
      regex: (.+):.*
      replacement: ${1}:5483
      target_label: __address__
Troubleshooting
Pod stuck in CrashLoopBackOff
# Check logs
kubectl logs -f deployment/kubeledger -n kubeledger
# Verify RBAC permissions
kubectl auth can-i get pods --as=system:serviceaccount:kubeledger:kubeledger
No data appearing in dashboard
- Wait 5-10 minutes for initial data collection
- Verify the pod can reach the Kubernetes API
- Confirm Metrics Server is working:
kubectl top nodes
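The checks in this list can be scripted. The sketch below tests the service account's RBAC access with the same `kubectl auth can-i` query used earlier in this section, then queries the Metrics API directly (the aggregated API that `kubectl top` relies on). `diagnose_no_data` is a hypothetical helper name, and the exact permissions KubeLedger needs may be broader than the single check shown.

```shell
# Run two quick checks and report each result; returns non-zero if any check fails.
# diagnose_no_data is a hypothetical helper, not part of KubeLedger.
diagnose_no_data() {
  local sa="system:serviceaccount:kubeledger:kubeledger"
  local failed=0
  # RBAC: can the KubeLedger service account read pods?
  if kubectl auth can-i get pods --as="$sa" >/dev/null 2>&1; then
    echo "OK:   service account can get pods"
  else
    echo "FAIL: service account cannot get pods"
    failed=1
  fi
  # Metrics API: served by Metrics Server through the API aggregation layer.
  if kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes >/dev/null 2>&1; then
    echo "OK:   Metrics API responds"
  else
    echo "FAIL: Metrics API not available"
    failed=1
  fi
  return "$failed"
}
```

If both checks pass and the dashboard is still empty after the initial collection window, the pod logs are the next place to look.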
Metrics not appearing in Prometheus
- Ensure the /metrics endpoint is accessible
- Check ServiceMonitor/PodMonitor configuration if using Prometheus Operator
- Verify network policies allow Prometheus to scrape the pod
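One way to confirm the first point is to fetch /metrics through a temporary port-forward. This sketch assumes the metrics are served on the same Service port (80) as the dashboard, which this guide does not state explicitly; `probe_metrics` is a hypothetical helper name.

```shell
# Port-forward the Service briefly, fetch /metrics, and count sample lines.
# probe_metrics is hypothetical; serving /metrics on Service port 80 is an assumption.
probe_metrics() {
  local settle="${1:-2}"   # seconds to wait for the port-forward to come up
  local body count pf_pid
  kubectl -n kubeledger port-forward svc/kubeledger 5483:80 >/dev/null 2>&1 &
  pf_pid=$!
  sleep "$settle"
  body="$(curl -fsS --max-time 10 http://localhost:5483/metrics 2>/dev/null)" || body=""
  # Always stop the background port-forward.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  # Count lines that are neither comments (# HELP / # TYPE) nor blank.
  count="$(printf '%s\n' "$body" | grep -cv -e '^#' -e '^$')" || true
  if [ "${count:-0}" -gt 0 ]; then
    echo "$count metric samples served at /metrics"
    return 0
  fi
  echo "no metrics returned from /metrics" >&2
  return 1
}
```

If this succeeds locally but Prometheus still shows no targets, the scrape configuration or a network policy is the more likely cause.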
