Grafana + Prometheus: The Complete Self-Hosted Monitoring Stack

Look, I'm going to be real with you. If you're not monitoring your infrastructure, you're basically flying blind. And when something breaks at 3am (it will), you'll wish you had set this up months ago.
The good news? Getting production-grade monitoring running takes about the same time as your morning coffee. Here's how to stop paying DataDog $200/month and run the same thing yourself for ~$15.
Why Prometheus + Grafana (And Why Together)
Prometheus collects metrics. Grafana visualizes them. They're the peanut butter and jelly of self-hosted monitoring, and they've been battle-tested by companies running infrastructure way bigger than yours or mine.
What you get:
- Real-time system metrics (CPU, RAM, disk, network)
- Custom application metrics (request latency, error rates, business KPIs)
- Alerting that actually works (Slack, email, PagerDuty)
- Time-series data storage with efficient compression
- Beautiful dashboards that make you look like you know what you're doing
The best part? Both are open-source with no per-user fees or license costs. Your infrastructure bill stays predictable as you scale.
The Full Stack Setup
Here's the docker-compose.yml that gets everything running. No magic, no vendor lock-in, just containers and configuration:
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "172.17.0.1:9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert.rules.yml:/etc/prometheus/alert.rules.yml  # alert rules, created in the alerting section below
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    restart: always
  grafana:
    image: grafana/grafana:latest
    ports:
      - "172.17.0.1:3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
      - GF_SERVER_ROOT_URL=https://your-domain.com
    restart: always
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "172.17.0.1:9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    restart: always
volumes:
  prometheus-data:
  grafana-data:
What's happening here:
- Prometheus scrapes metrics every 15 seconds and stores them for 30 days
- Grafana connects to Prometheus as a data source and visualizes everything
- Node Exporter exposes system metrics (CPU, RAM, disk) that Prometheus collects
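Bring it up and sanity-check the endpoints. The ports are bound to the Docker bridge IP, so these curls assume you're running them from the host itself:
docker compose up -d
# Prometheus exposes a readiness endpoint
curl -s http://172.17.0.1:9090/-/ready
# Node Exporter should dump raw metrics
curl -s http://172.17.0.1:9100/metrics | head
If both respond, the stack is healthy and you can move on to configuration.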
Prometheus Configuration
Create prometheus.yml in your project directory:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Add your app metrics here
  - job_name: 'my-app'
    static_configs:
      - targets: ['172.17.0.1:8080']
Prometheus will now scrape metrics from itself, the node exporter, and any app you point it at (assuming your app exposes metrics at /metrics).
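If your app serves metrics somewhere other than the default /metrics, or one job needs a tighter interval, both can be overridden per scrape job. A quick sketch (the path and interval here are hypothetical, not part of the setup above):
  - job_name: 'my-app-custom'
    metrics_path: '/internal/metrics'  # hypothetical non-default path
    scrape_interval: 5s                # overrides the 15s global for this job only
    static_configs:
      - targets: ['172.17.0.1:8080']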
Setting Up Grafana Dashboards
Once your stack is running, open Grafana in your browser: https://your-domain.com if it sits behind your reverse proxy, or http://172.17.0.1:3000 straight from the host while testing. Then:
Add Prometheus data source:
- Settings → Data Sources → Add Prometheus
- URL: http://prometheus:9090
- Save & Test
Import pre-built dashboard:
- Dashboards → Import → Enter ID 1860 (Node Exporter Full)
- Select Prometheus data source
- Click Import
You now have a production-ready dashboard showing CPU, RAM, disk, network, and system load. Took about 2 minutes.
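If you'd rather not click through the UI every time you rebuild the container, Grafana can also provision the data source from a file at startup. A minimal sketch; mount it into the container at /etc/grafana/provisioning/datasources/prometheus.yml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
Dashboards can be provisioned the same way from files under /etc/grafana/provisioning/dashboards.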
The Part Everyone Messes Up: Alerting
Metrics without alerts are just pretty graphs. Here's how to get notified while there's still time to act, instead of finding out when the server is already down:
Add to your prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - 'alert.rules.yml'
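Heads-up: the compose file above doesn't define an alertmanager service yet, so this block has nothing to talk to. A minimal service to add under services: (a sketch; the official image reads /etc/alertmanager/alertmanager.yml by default):
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "172.17.0.1:9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    restart: always
Restart Prometheus after editing its config so the alerting block takes effect.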
Create alert.rules.yml:
groups:
  - name: system_alerts
    interval: 30s
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for 5 minutes"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 85% for 5 minutes"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 15
        for: 10m
        annotations:
          summary: "Disk space running low"
          description: "Less than 15% disk space remaining"
These alerts fire when CPU hits 80%, memory hits 85%, or disk space drops below 15%. Adjust thresholds based on your setup.
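The rules above only flip alerts into a firing state; routing them to Slack, email, or PagerDuty is Alertmanager's job. A minimal alertmanager.yml sketch for Slack (the webhook URL and channel are placeholders; email and PagerDuty receivers follow the same pattern):
route:
  receiver: 'slack-notifications'
  group_by: ['alertname', 'instance']
  group_wait: 30s
  repeat_interval: 4h

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder webhook
        channel: '#alerts'
        send_resolved: true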
Cost Comparison: SaaS vs Self-Hosted
Let's be honest about the money:
| Expense | DataDog/New Relic | Self-Hosted (Elestio) |
|---|---|---|
| License/Platform Fee | $100-200/month | $0 (open-source) |
| Infrastructure | Included | $15/month (2 CPU / 4GB RAM) |
| Per-Host Fees | $15-31/host | $0 |
| Total (5 hosts) | $175-355/month | $15/month |
| Annual Savings | - | $1,920-4,080 |
Running this on Elestio costs about $15/month for a 2 CPU / 4GB RAM instance. That handles monitoring for 10-20 servers easily. DataDog charges you per host, per metric, and per user. The bill gets stupid fast.
Troubleshooting Common Issues
Prometheus not scraping targets:
- Check Status → Targets in the Prometheus UI
- Verify network connectivity: curl http://node-exporter:9100/metrics
- Check firewall rules aren't blocking scrape ports
Grafana dashboard shows "No Data":
- Verify Prometheus data source connection (Settings → Data Sources → Test)
- Check Prometheus is successfully scraping: Status → Targets should show "UP"
- Verify metric names in dashboard queries match your Prometheus metrics
Disk fills up quickly:
- Prometheus' default retention is 15 days (the compose file above raises it to 30d); adjust --storage.tsdb.retention.time
- Configure metric relabeling to drop high-cardinality labels (see the sketch after this list)
- Consider using remote storage for long-term retention
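For the relabeling bullet, here's a sketch of what dropping an unwanted metric looks like in a prometheus.yml scrape job (the metric name is just an example; pick ones you never actually graph):
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
    metric_relabel_configs:
      # drop series before they hit storage
      - source_labels: [__name__]
        regex: 'node_cpu_guest_seconds_total'
        action: drop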
Alerts not firing:
- Check Alertmanager is running and reachable
- Verify alert rules syntax: promtool check rules alert.rules.yml
- Check Prometheus logs for evaluation errors
Deploy on Elestio (The Easy Way)
If you don't want to manage this yourself:
- Create Grafana instance on Elestio
- Create Prometheus instance on Elestio
- Both come pre-configured with SSL, backups, and automatic updates
- Connect Grafana to Prometheus (Elestio provides internal URLs)
- Import dashboards and you're done
The whole thing takes maybe 10 minutes, and you get professional infrastructure without the ops overhead.
What You've Built
You now have:
- Production-grade monitoring stack collecting system and application metrics
- Real-time dashboards visualizing CPU, RAM, disk, network, and custom metrics
- Proactive alerts that warn you before disasters happen
- Complete control over your data (no vendor lock-in)
- ~$2,000-4,000/year saved compared to SaaS alternatives
This setup scales from side projects to serious production workloads. Add more exporters for databases, web servers, or custom apps. Build dashboards for business metrics. Set up federated Prometheus for multi-datacenter monitoring.
The infrastructure monitoring problem? Solved. Now go fix the bugs your dashboards are about to reveal.
Deploy Grafana on Elestio: Get Started
Deploy Prometheus on Elestio: Get Started
Thanks for reading ❤️