Monitoring Laravel in Production — CloudWatch, Prometheus & Grafana

Deploying is not the finish line. How many times have you received a message like "Hey, the site is really slow" or "Why can't I place an order?" without having any idea there was a problem until users complained?

Monitoring helps you detect issues before users encounter them. This article walks you through setting up a comprehensive observability system for Laravel, from simple to advanced.

The Three Pillars of Observability

┌─────────────────────────────────────────┐
│            OBSERVABILITY                │
├──────────┬──────────┬───────────────────┤
│  LOGS    │ METRICS  │    TRACES         │
│          │          │                   │
│  "What   │ "How     │  "Where did the   │
│ happened"│ much,    │  request go, how  │
│          │ how long"│  long at each hop"│
│          │          │                   │
│ Monolog  │Prometheus│  OpenTelemetry    │
│CloudWatch│ Grafana  │  Jaeger/Zipkin    │
└──────────┴──────────┴───────────────────┘

Logs: Record events. "404 request at 15:30", "Payment failed for user #123".
Metrics: Aggregated numbers. "CPU 80%", "500 requests/second", "Average response time 200ms".
Traces: Track a single request's lifecycle across services. "Request → Controller → Database (150ms) → Redis (5ms) → Response".

Part 1: Logging — The Foundation

Configuring Monolog for Production

// config/logging.php
'channels' => [
    'stack' => [
        'driver' => 'stack',
        'channels' => ['daily', 'cloudwatch'],
        'ignore_exceptions' => false,
    ],

    'daily' => [
        'driver' => 'daily',
        'path' => storage_path('logs/laravel.log'),
        'level' => env('LOG_LEVEL', 'info'),
        'days' => 14,
        'replace_placeholders' => true,
    ],

    'cloudwatch' => [
        'driver' => 'custom',
        'via' => App\Logging\CloudWatchLoggerFactory::class,
        'level' => env('LOG_LEVEL', 'info'),
        'retention' => 30,
        'group_name' => env('CLOUDWATCH_LOG_GROUP', '/laravel/production'),
        'stream_name' => env('CLOUDWATCH_LOG_STREAM', 'application'),
    ],
],

Creating the CloudWatch Logger

composer require aws/aws-sdk-php maxbanton/cwh

// app/Logging/CloudWatchLoggerFactory.php
namespace App\Logging;

use Aws\CloudWatchLogs\CloudWatchLogsClient;
use Maxbanton\Cwh\Handler\CloudWatch;
use Monolog\Formatter\JsonFormatter;
use Monolog\Logger;

class CloudWatchLoggerFactory
{
    public function __invoke(array $config): Logger
    {
        $client = new CloudWatchLogsClient([
            'region'  => config('services.aws.region', 'ap-southeast-1'),
            'version' => 'latest',
        ]);

        // Positional arguments: client, log group, log stream,
        // retention in days, batch size (records buffered per request)
        $handler = new CloudWatch(
            $client,
            $config['group_name'],
            $config['stream_name'],
            $config['retention'],
            25,
        );

        // JSON format for easy querying on CloudWatch Insights
        $handler->setFormatter(new JsonFormatter());

        $logger = new Logger('cloudwatch');
        $logger->pushHandler($handler);

        return $logger;
    }
}

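Why JSON format? CloudWatch Logs Insights discovers the fields of JSON log events automatically, so you can filter and aggregate on them with its pipe-based query syntax. Note that Monolog's JsonFormatter writes the level name to level_name (level holds the numeric value) and serializes a logged exception under context.exception. With JSON format, you can:

# Find the most recent errors (set the time range, e.g. the last hour, in the query editor)
fields @timestamp, message, context.exception.class, context.exception.message
| filter level_name = "ERROR"
| sort @timestamp desc
| limit 50

# Count errors by exception class
fields context.exception.class
| filter level_name = "ERROR"
| stats count(*) as error_count by context.exception.class
| sort error_count desc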

Structured Logging — Add Context

Don't log bare messages. Attach context so an entry can be debugged without reproducing the failure:

// ❌ Hard to debug
Log::error('Payment failed');

// ✅ Easy to debug
Log::error('Payment failed', [
    'user_id'    => $user->id,
    'order_id'   => $order->id,
    'amount'     => $order->total,
    'gateway'    => 'stripe',
    'error_code' => $e->getCode(),
    'error_msg'  => $e->getMessage(),
    'trace_id'   => request()->header('X-Request-ID'),
]);
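
To see why the context array pays off, it helps to look at the shape of the resulting log line. The following is a minimal sketch in plain PHP, not Monolog itself: formatRecord() is a hypothetical helper that mimics, in simplified form, what a JSON formatter emits for one record.

```php
<?php
// Hypothetical helper (not part of Laravel or Monolog): serializes one log
// record to a single JSON line, similar in spirit to a JSON formatter.
function formatRecord(string $levelName, string $message, array $context): string
{
    return json_encode([
        'message'    => $message,
        'context'    => $context,
        'level_name' => $levelName,
        'datetime'   => gmdate('c'),
    ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
}

// The same event, once without and once with context.
$bare = formatRecord('ERROR', 'Payment failed', []);
$rich = formatRecord('ERROR', 'Payment failed', [
    'user_id'  => 123,
    'order_id' => 456,
    'gateway'  => 'stripe',
]);

echo $bare, PHP_EOL, $rich, PHP_EOL;
```

Only the second line can be filtered by fields such as context.order_id or context.gateway; the first can only be matched by its message text.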

Middleware to Attach Request ID

// app/Http/Middleware/RequestId.php
namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Str;
use Illuminate\Support\Facades\Log;

class RequestId
{
    public function handle(Request $request, Closure $next)
    {
        $requestId = $request->header('X-Request-ID', (string) Str::uuid());

        // Attach to all log entries
        Log::shareContext([
            'request_id' => $requestId,
            'ip'         => $request->ip(),
            'url'        => $request->fullUrl(),
            'method'     => $request->method(),
        ]);

        $response = $next($request);

        $response->headers->set('X-Request-ID', $requestId);

        return $response;
    }
}

Explanation of Log::shareContext(): Available since Laravel 9, this method adds the given context to every log entry written afterwards during the request, on every logging channel. You no longer need to pass $requestId everywhere you log.
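
The idea behind shared context can be sketched in a few lines of plain PHP. TinyLogger below is a hypothetical stand-in written for illustration, not Laravel's implementation; it only shows how context registered once ends up on every later record.

```php
<?php
// Hypothetical stand-in for a logger with shared context (illustration only).
final class TinyLogger
{
    /** @var array<string, mixed> context attached to every record */
    private array $shared = [];

    /** @var array<int, array{message: string, context: array}> */
    public array $records = [];

    public function shareContext(array $context): void
    {
        $this->shared = array_merge($this->shared, $context);
    }

    public function error(string $message, array $context = []): void
    {
        // In this sketch, per-call keys override shared keys of the same name.
        $this->records[] = [
            'message' => $message,
            'context' => array_merge($this->shared, $context),
        ];
    }
}

$log = new TinyLogger();
$log->shareContext(['request_id' => 'req-42', 'method' => 'POST']);

$log->error('Payment failed', ['order_id' => 456]);
$log->error('Refund failed'); // no context passed, request_id is still attached

print_r($log->records);
```

Both records carry request_id, so all entries of one request can be found with a single filter, even when the individual log calls pass no context at all.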

Part 2: Metrics with Prometheus

Prometheus is an open-source monitoring system that works on a pull model: the Prometheus server periodically "scrapes" metrics from your application.

Installation

composer require promphp/prometheus_client_php

Creating the Metrics Service

// app/Services/MetricsService.php
namespace App\Services;

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\InMemory;
use Prometheus\Storage\Redis;

class MetricsService
{
    private CollectorRegistry $registry;

    public function __construct()
    {
        // Use Redis to persist metrics between requests
        // InMemory is only suitable for testing
        $this->registry = new CollectorRegistry(
            new Redis([
                'host' => config('database.redis.default.host'),
                'port' => config('database.redis.default.port'),
            ])
        );
    }

    /**
     * Count total HTTP requests by method, path, status
     */
    public function recordHttpRequest(
        string $method,
        string $path,
        int $statusCode,
        float $duration
    ): void {
        // Counter: only increments, never decreases
        $counter = $this->registry->getOrRegisterCounter(
            'laravel',
            'http_requests_total',
            'Total HTTP requests',
            ['method', 'path', 'status']
        );
        // Label values are strings, so cast the status code explicitly
        $counter->inc([$method, $path, (string) $statusCode]);

        // Histogram: response time distribution
        $histogram = $this->registry->getOrRegisterHistogram(
            'laravel',
            'http_request_duration_seconds',
            'Response time (seconds)',
            ['method', 'path'],
            // Buckets: 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s
            [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
        );
        $histogram->observe($duration, [$method, $path]);
    }

    /**
     * Measure database query count and duration
     */
    public function recordDatabaseQuery(float $duration, string $connection): void
    {
        $histogram = $this->registry->getOrRegisterHistogram(
            'laravel',
            'database_query_duration_seconds',
            'Database query duration',
            ['connection'],
            [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5]
        );
        $histogram->observe($duration, [$connection]);
    }

    /**
     * Count queue jobs
     */
    public function recordQueueJob(string $job, string $status): void
    {
        $counter = $this->registry->getOrRegisterCounter(
            'laravel',
            'queue_jobs_total',
            'Total queue jobs',
            ['job', 'status']
        );
        $counter->inc([$job, $status]);
    }

    /**
     * Gauge: current value (can go up/down)
     */
    public function setQueueSize(string $queue, int $size): void
    {
        $gauge = $this->registry->getOrRegisterGauge(
            'laravel',
            'queue_size',
            'Number of pending jobs in queue',
            ['queue']
        );
        $gauge->set($size, [$queue]);
    }

    /**
     * Render metrics in Prometheus text format
     */
    public function render(): string
    {
        $renderer = new RenderTextFormat();
        return $renderer->render($this->registry->getMetricFamilySamples());
    }
}

Explanation of metric types:

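Counter: Only increments. Example: total requests, total errors. It starts again from 0 when its storage is reset; rate() tolerates such resets.
Histogram: Measures value distribution by counting observations into buckets. Example: response time. Percentiles (p50, p95, p99) are not stored; they are estimated at query time with histogram_quantile().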
Gauge: Current value, can go up or down. Example: queue size, memory usage.
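
How a histogram turns raw durations into buckets, and how a percentile is later estimated from those buckets, can be sketched in plain PHP. This is a simplified illustration under stated assumptions, not the client library's or Prometheus's code: cumulativeBuckets() and estimateQuantile() are hypothetical names, and the interpolation is a reduced version of what histogram_quantile() does.

```php
<?php
/**
 * Count observations into cumulative buckets: each bucket holds the number
 * of observations less than or equal to its upper bound ("le").
 *
 * @param  float[] $observations
 * @param  float[] $bounds ascending upper bounds, without +Inf
 * @return array<int, array{0: float, 1: int}> list of [upperBound, count]
 */
function cumulativeBuckets(array $observations, array $bounds): array
{
    $buckets = [];
    foreach ([...$bounds, INF] as $le) {
        $count = count(array_filter($observations, fn (float $v): bool => $v <= $le));
        $buckets[] = [$le, $count];
    }
    return $buckets;
}

/**
 * Estimate a quantile by linear interpolation inside the bucket that
 * contains the target rank.
 */
function estimateQuantile(float $q, array $buckets): float
{
    $total = $buckets[count($buckets) - 1][1];
    if ($total === 0) {
        return NAN; // nothing observed yet
    }
    $rank = $q * $total;
    $lowerBound = 0.0;
    $lowerCount = 0;
    foreach ($buckets as [$upperBound, $count]) {
        if ($count >= $rank) {
            $inBucket = $count - $lowerCount;
            if (is_infinite($upperBound) || $inBucket === 0) {
                return $lowerBound; // no finite range to interpolate in
            }
            return $lowerBound + ($upperBound - $lowerBound) * ($rank - $lowerCount) / $inBucket;
        }
        $lowerBound = $upperBound;
        $lowerCount = $count;
    }
    return NAN;
}

$durations = [0.02, 0.04, 0.08, 0.2, 0.3, 0.4, 0.6, 0.9, 1.5, 3.0];
$buckets = cumulativeBuckets($durations, [0.1, 0.5, 1.0, 5.0]);

$p50 = estimateQuantile(0.50, $buckets);
$p95 = estimateQuantile(0.95, $buckets);
printf("p50 = %.3fs, p95 = %.3fs\n", $p50, $p95);
```

With these sample values the estimated p95 is 4.0 seconds although no request took longer than 3.0 seconds, because the estimate can only interpolate between the 1s and 5s bucket bounds. Bucket boundaries should therefore be placed close to the latencies you care about.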

Middleware to Collect Metrics

// app/Http/Middleware/CollectMetrics.php
namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use App\Services\MetricsService;

class CollectMetrics
{
    public function __construct(
        private MetricsService $metrics,
    ) {}

    public function handle(Request $request, Closure $next)
    {
        $start = microtime(true);

        $response = $next($request);

        $duration = microtime(true) - $start;

        // Normalize path to avoid cardinality explosion
        // /blog/my-post → /blog/{slug}
        $path = $this->normalizePath($request->route());

        $this->metrics->recordHttpRequest(
            method: $request->method(),
            path: $path,
            statusCode: $response->getStatusCode(),
            duration: $duration,
        );

        return $response;
    }

    private function normalizePath($route): string
    {
        if (!$route) {
            return 'unknown';
        }

        // Use route URI pattern instead of actual path
        // Example: /blog/{slug} instead of /blog/my-actual-post
        return '/' . ltrim($route->uri(), '/');
    }
}

Why normalizePath()? If you label metrics with actual paths (/blog/post-1, /blog/post-2, ...), every distinct path becomes its own set of time series. Thousands of posts mean thousands of series, which drives up memory use and slows queries. This is called cardinality explosion. Always group by route pattern: /blog/{slug}.
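
The effect is easy to demonstrate with a small, self-contained sketch. countSeries() is a hypothetical helper that counts distinct label combinations, and the regular expression stands in for the route pattern that $route->uri() provides in the middleware.

```php
<?php
/**
 * Count the distinct (method, path) label combinations, i.e. the number of
 * time series a counter with these two labels would produce.
 *
 * @param array<int, array{0: string, 1: string}> $requests list of [method, path]
 */
function countSeries(array $requests, callable $pathLabel): int
{
    $series = [];
    foreach ($requests as [$method, $path]) {
        $series[$method . ' ' . $pathLabel($path)] = true;
    }
    return count($series);
}

// Simulate one request to each of 1000 different blog posts.
$requests = [];
for ($i = 1; $i <= 1000; $i++) {
    $requests[] = ['GET', "/blog/post-$i"];
}

// Raw path as label versus route pattern as label.
$raw = countSeries($requests, fn (string $path): string => $path);
$normalized = countSeries(
    $requests,
    fn (string $path): string => (string) preg_replace('#^/blog/[^/]+$#', '/blog/{slug}', $path)
);

echo "raw paths: $raw series, route pattern: $normalized series", PHP_EOL;
```

For a histogram the difference is multiplied further, since every label combination produces one series per bucket plus the _sum and _count series.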

Route Endpoint for Prometheus Scraping

// routes/web.php
use App\Services\MetricsService;

Route::get('/metrics', function (MetricsService $metrics) {
    return response($metrics->render(), 200, [
        'Content-Type' => 'text/plain; version=0.0.4',
    ]);
// Protect this endpoint! auth.basic checks credentials against the users
// table (email + password by default); alternatively, restrict by IP.
})->middleware('auth.basic');
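
For orientation, this is roughly what the scraper receives from that endpoint. The sketch below renders a counter in the Prometheus text exposition format by hand; renderCounter() is a hypothetical helper for illustration, and in the application this work is done by RenderTextFormat.

```php
<?php
/**
 * Render one counter family in the Prometheus text exposition format.
 *
 * @param array<int, array{0: array<string, string>, 1: int|float}> $samples list of [labels, value]
 */
function renderCounter(string $name, string $help, array $samples): string
{
    $lines = ["# HELP $name $help", "# TYPE $name counter"];
    foreach ($samples as [$labels, $value]) {
        $pairs = [];
        foreach ($labels as $key => $labelValue) {
            // Escape backslash, double quote and newline inside label values.
            $escaped = str_replace(['\\', '"', "\n"], ['\\\\', '\\"', '\\n'], (string) $labelValue);
            $pairs[] = $key . '="' . $escaped . '"';
        }
        $lines[] = $name . '{' . implode(',', $pairs) . '} ' . $value;
    }
    return implode("\n", $lines) . "\n";
}

$output = renderCounter(
    'laravel_http_requests_total',
    'Total HTTP requests',
    [
        [['method' => 'GET', 'path' => '/blog/{slug}', 'status' => '200'], 42],
        [['method' => 'POST', 'path' => '/orders', 'status' => '500'], 3],
    ]
);

echo $output;
```

Each line after the HELP and TYPE comments is one time series: the metric name, its label set, and the current value. Prometheus adds the timestamp itself when it scrapes.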

Listening to Database Queries

// app/Providers/AppServiceProvider.php
use Illuminate\Support\Facades\DB;
use App\Services\MetricsService;

public function boot(): void
{
    // Only enable in production: each query adds a write to the metrics store
    if (app()->isProduction()) {
        DB::listen(function ($query) {
            app(MetricsService::class)->recordDatabaseQuery(
                duration: $query->time / 1000, // ms → seconds
                connection: $query->connectionName,
            );
        });
    }
}

Part 3: Grafana Dashboards

Grafana connects to Prometheus to display metrics as beautiful charts.

Setup with Docker Compose

# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/dashboards:/etc/grafana/provisioning/dashboards
      - ./monitoring/datasources:/etc/grafana/provisioning/datasources
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
      - GF_INSTALL_PLUGINS=grafana-clock-panel

  # Node Exporter: system metrics (CPU, RAM, Disk)
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'

volumes:
  prometheus_data:
  grafana_data:

Prometheus Config

# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Metrics from Laravel app
  - job_name: 'laravel'
    metrics_path: /metrics
    basic_auth:
      username: prometheus
      password: your-metrics-password
    static_configs:
      - targets: ['your-app-server:80']
        labels:
          app: laravel-blog
          env: production

  # System metrics
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  # The exporters below must be deployed separately; they are not part of
  # the Compose file above.
  # Nginx metrics
  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']

  # MySQL metrics
  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql-exporter:9104']

  # Redis metrics
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

Grafana Datasource Auto-provisioning

# monitoring/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

Useful PromQL Queries for Dashboards

# 1. Request rate (requests/second, summed over all label combinations)
sum(rate(laravel_http_requests_total[5m]))

# 2. Error rate (% of requests returning 5xx)
sum(rate(laravel_http_requests_total{status=~"5.."}[5m]))
/
sum(rate(laravel_http_requests_total[5m])) * 100

# 3. Response time percentile 95 (across all endpoints)
histogram_quantile(0.95,
  sum by (le) (rate(laravel_http_request_duration_seconds_bucket[5m]))
)

# 4. Response time percentile 99 (across all endpoints)
histogram_quantile(0.99,
  sum by (le) (rate(laravel_http_request_duration_seconds_bucket[5m]))
)

# 5. Slowest endpoints
topk(10, 
  histogram_quantile(0.95, 
    sum by (path, le) (
      rate(laravel_http_request_duration_seconds_bucket[5m])
    )
  )
)

# 6. Database query rate
rate(laravel_database_query_duration_seconds_count[5m])

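# 7. Slow queries (> 100ms) per second: all queries minus those within the 0.1s bucket
sum(rate(laravel_database_query_duration_seconds_count[5m]))
-
sum(rate(laravel_database_query_duration_seconds_bucket{le="0.1"}[5m]))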

# 8. Current queue size
laravel_queue_size

# 9. Failed jobs rate
rate(laravel_queue_jobs_total{status="failed"}[5m])

# 10. CPU usage (from node-exporter)
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
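
Most of these queries rely on rate(), which turns an ever-growing counter into a per-second value. The sketch below shows the core idea in plain PHP under simplifying assumptions: simpleRate() is a hypothetical function, and unlike the real rate() it does not extrapolate to the edges of the time window.

```php
<?php
/**
 * Per-second rate of a counter over a window of samples.
 * A drop in value is treated as a counter reset, so counting resumes from 0.
 *
 * @param array<int, array{0: int, 1: float}> $samples list of [unixTimestamp, counterValue], oldest first
 */
function simpleRate(array $samples): float
{
    $increase = 0.0;
    for ($i = 1; $i < count($samples); $i++) {
        $delta = $samples[$i][1] - $samples[$i - 1][1];
        // After a reset the new value is itself the increase since the reset.
        $increase += $delta >= 0 ? $delta : $samples[$i][1];
    }

    $elapsed = $samples[count($samples) - 1][0] - $samples[0][0];

    return $elapsed > 0 ? $increase / $elapsed : 0.0;
}

// Scraped every 15 seconds; the counter is reset between t=30 and t=45.
$samples = [
    [0, 100.0],
    [15, 130.0],
    [30, 160.0],
    [45, 10.0],
    [60, 40.0],
];

$rate = simpleRate($samples);
printf("%.3f requests/second\n", $rate);
```

This is also why dashboards should graph rate() of a counter rather than its raw value: the raw value depends on when the counter last started from zero, while the rate does not.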

Part 4: Alerting — Notifications When Things Go Wrong

Prometheus Alert Rules

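# monitoring/alerts.yml
# Prometheus only evaluates this file if prometheus.yml lists it under
# rule_files and the file is mounted into the Prometheus container.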
groups:
  - name: laravel-alerts
    rules:
      # Error rate > 5% for 5 minutes
      - alert: HighErrorRate
        expr: |
          sum(rate(laravel_http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(laravel_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Abnormally high error rate"
          description: "Error rate is at {{ $value | humanizePercentage }}. Threshold: 5%"

      # Response time p95 > 2 seconds
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            rate(laravel_http_request_duration_seconds_bucket[5m])
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High response time"
          description: "P95 latency: {{ $value | humanizeDuration }}"

      # Queue size > 1000 jobs
      - alert: QueueBacklog
        expr: laravel_queue_size > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Queue backlog detected"
          description: "{{ $value }} jobs waiting to be processed"

      # Disk usage > 85%
      - alert: DiskSpaceLow
        expr: |
          (node_filesystem_size_bytes - node_filesystem_free_bytes)
          / node_filesystem_size_bytes > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk space running low"
          description: "Disk usage: {{ $value | humanizePercentage }}"

      # Server down
      - alert: ServerDown
        expr: up{job="laravel"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Laravel server is not responding"

Sending Alerts via Slack

# monitoring/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'slack-notifications'
  routes:
    - matchers:
        - severity="critical"
      receiver: 'slack-critical'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#monitoring'
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

  - name: 'slack-critical'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts-critical'
        title: '🚨 {{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

Part 5: Health Check Endpoint

Create a simple endpoint to check application health:

// routes/web.php
Route::get('/health', function () {
    $checks = [];
    $healthy = true;

    // Check database
    try {
        DB::connection()->getPdo();
        $checks['database'] = 'ok';
    } catch (\Throwable $e) {
        // Report details to the log instead of exposing them on a public endpoint
        report($e);
        $checks['database'] = 'failed';
        $healthy = false;
    }

    // Check Redis
    try {
        Cache::store('redis')->put('health-check', true, 10);
        $checks['redis'] = 'ok';
    } catch (\Throwable $e) {
        // Report details to the log instead of exposing them on a public endpoint
        report($e);
        $checks['redis'] = 'failed';
        $healthy = false;
    }

    // Check disk space
    $freeSpace = disk_free_space('/');
    $totalSpace = disk_total_space('/');
    $usagePercent = round((1 - $freeSpace / $totalSpace) * 100, 1);
    $checks['disk'] = $usagePercent . '% used';
    if ($usagePercent > 90) {
        $healthy = false;
    }

    // Check queue
    try {
        $queueSize = Queue::size('default');
        $checks['queue_size'] = $queueSize;
        if ($queueSize > 1000) {
            $healthy = false;
        }
    } catch (\Exception $e) {
        $checks['queue'] = 'failed';
        $healthy = false;
    }

    return response()->json([
        'status' => $healthy ? 'healthy' : 'unhealthy',
        'checks' => $checks,
        'timestamp' => now()->toISOString(),
    ], $healthy ? 200 : 503);
});

Part 6: Laravel Pulse Integration (Laravel 10+)

If you don't want to set up Prometheus + Grafana, Laravel Pulse is a simpler first-party option:

composer require laravel/pulse
php artisan vendor:publish --provider="Laravel\Pulse\PulseServiceProvider"
php artisan migrate

Pulse automatically tracks:

Slow requests
Slow queries
Slow jobs
Exceptions
Cache hits/misses
Queue throughput

Access the dashboard at /pulse.

Comparison:

                    Laravel Pulse        Prometheus + Grafana
Setup               5 minutes            2-4 hours
Custom metrics      Limited              Unlimited
Long-term storage   Database             Prometheus TSDB
Alerting            ❌                   ✅
Multi-server        ⚠️                   ✅
Best for            Small/medium apps    Large apps, teams

Production Monitoring Checklist

Logging: JSON format, CloudWatch/ELK, structured context
Request ID: Every request has a unique ID throughout its lifecycle
Metrics: Response time, error rate, throughput
Health check: /health endpoint for ALB/uptime monitor
Alerting: Slack/email when error rate is high or server is down
Dashboard: Grafana or Pulse with key panels
Log rotation: Don't let log files eat all your disk space
Uptime monitoring: External service (UptimeRobot, Pingdom)

Conclusion

Monitoring isn't "nice to have" — it's mandatory for any production application. Start simple:

Week 1: Structured logging + CloudWatch/stderr
Week 2: Health check endpoint + UptimeRobot
Week 3: Laravel Pulse or basic Prometheus metrics
Week 4: Grafana dashboards + Slack alerts

You don't need to set up everything on day one. But at minimum, have logging with context and a health check endpoint before going live.

"If you can't measure it, you can't improve it." (commonly attributed to Peter Drucker, and lived by every SRE)