Integrating LLMs in Laravel: A Production Guide
Integrating AI into Laravel is easy. Making it production-ready is hard. Users hate waiting 10 seconds for a spinner. APIs time out. Tokens cost money.
This guide goes beyond simple Http::post calls and explores how to build robust, streaming AI features in Laravel.
1. Choosing the Right Abstraction
Don't lock yourself into OpenAI. In 2026, models change weekly. You might want to swap GPT-4 for Claude 3.5 Sonnet or a local Llama 3 model without rewriting your app.
Recommended Package: Echo Labs Prism
Prism is the "Laravel interface for AI". It provides a fluent driver-based API, similar to Mail or Notification channels.
composer require echolabs/prism
use EchoLabs\Prism\Prism;
use EchoLabs\Prism\Enums\Provider;
$response = Prism::text()
    ->using(Provider::OpenAI, 'gpt-4o')
    ->withPrompt('Explain Laravel Middleware')
    ->generate();

echo $response->text;
If you switch to Anthropic later, you just change the Provider enum.
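For example, swapping to Anthropic touches only the using() call. A sketch assuming Prism's Provider enum includes Anthropic; the model identifier here is illustrative, so check the current model list:

```php
use EchoLabs\Prism\Prism;
use EchoLabs\Prism\Enums\Provider;

// Same fluent call as before — only the provider and model change.
$response = Prism::text()
    ->using(Provider::Anthropic, 'claude-3-5-sonnet-latest') // model name illustrative
    ->withPrompt('Explain Laravel Middleware')
    ->generate();

echo $response->text;
```

Your prompts, application logic, and response handling stay untouched; the driver does the translation.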
2. The Power of Streaming (Server-Sent Events)
AI models generate text token-by-token. You should show this to the user immediately. Waiting for the full response makes the application feel frozen.
Laravel supports "Streamed Responses" natively.
// routes/web.php
Route::get('/chat-stream', function () {
    return response()->stream(function () {
        // Pseudo-code using direct API for clarity
        $stream = OpenAI::chat()->createStreamed([
            'model' => 'gpt-4o',
            'messages' => [['role' => 'user', 'content' => 'Tell me a story']],
        ]);

        foreach ($stream as $response) {
            $text = $response->choices[0]->delta->content;

            if ($text) {
                echo "data: " . json_encode(['text' => $text]) . "\n\n";

                if (ob_get_level() > 0) {
                    ob_flush(); // Guard: ob_flush() warns if no buffer is active
                }
                flush();
            }
        }

        echo "data: [DONE]\n\n";
    }, 200, [
        'Content-Type' => 'text/event-stream',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no', // Critical for Nginx!
    ]);
});
On the frontend (JS), use EventSource to consume this.
3. Handling Context Windows & Token Limits
Sending your entire database to the LLM will:
- Crash the request (Context Limit Exceeded).
- Bankrupt you (Cost).
Calculation Strategy
Before sending, estimate tokens. A rough rule of thumb: 1 token ≈ 4 characters.
$maxContext = 128000;     // e.g. gpt-4o's context window
$reservedForReply = 4096; // the model's answer counts against the same window

// Rough heuristic only — use a real tokenizer for billing-critical paths.
$estimatedTokens = (int) ceil(strlen($prompt) / 4);

if ($estimatedTokens > $maxContext - $reservedForReply) {
    throw new Exception("Prompt too large!");
}
For Chat History, implement a "sliding window". Only send the last 10 messages, or summarize older messages.
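A minimal sliding window is just array bookkeeping: keep the system prompt, drop the oldest turns. A sketch, where the 10-message cutoff is the rule of thumb above, not a fixed constant:

```php
// Keep the system message(s) plus only the most recent $window turns.
function slidingWindow(array $messages, int $window = 10): array
{
    $system = array_values(array_filter(
        $messages,
        fn (array $m) => $m['role'] === 'system'
    ));

    $turns = array_values(array_filter(
        $messages,
        fn (array $m) => $m['role'] !== 'system'
    ));

    // array_slice with a negative offset keeps the tail of the conversation.
    return array_merge($system, array_slice($turns, -$window));
}
```

The system prompt is pinned because dropping it changes the model's behavior mid-conversation; only the user/assistant turns age out.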
4. Robust Error Handling
AI APIs are unstable. They rate-limit frequently.
Never call them in a synchronous controller without a try-catch block.
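For synchronous calls, Laravel's HTTP client lets you combine a timeout, a couple of quick retries, and a graceful fallback. A sketch against a plain chat-completions endpoint (the config key and fallback copy are illustrative):

```php
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Http\Client\RequestException;
use Illuminate\Support\Facades\Http;

try {
    $response = Http::withToken(config('services.openai.key')) // illustrative config key
        ->timeout(30)   // don't hold the user's request open forever
        ->retry(2, 500) // two quick retries, 500 ms apart, for transient failures
        ->post('https://api.openai.com/v1/chat/completions', [
            'model' => 'gpt-4o',
            'messages' => [['role' => 'user', 'content' => $prompt]],
        ])
        ->throw(); // turn 4xx/5xx into a catchable exception

    $text = $response->json('choices.0.message.content');
} catch (ConnectionException|RequestException $e) {
    // Degrade gracefully instead of surfacing a 500 page.
    $text = 'The AI service is temporarily unavailable. Please try again.';
}
```

Keep synchronous retries short; anything that needs long backoffs belongs in a queued job, as below.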
For background jobs, use Laravel's retry mechanism with exponential backoff.
// app/Jobs/GenerateSummary.php
public int $tries = 5;
public array $backoff = [10, 30, 60, 120]; // Wait longer each time

public function handle()
{
    try {
        // Call AI
    } catch (RateLimitException $e) {
        $this->release(30); // Try again in 30s (released attempts count toward $tries)
    }
}
5. Structured Data (JSON Mode)
Don't ask the AI to "return a list separated by commas". Ask for JSON.
OpenAI has a dedicated json_object mode.
$response = Http::withToken($key)->post(..., [
    'response_format' => ['type' => 'json_object'],
    'messages' => [
        ['role' => 'system', 'content' => 'You output JSON only.'],
        ['role' => 'user', 'content' => 'Extract name and email from text...'],
    ],
]);

$data = json_decode($response['choices'][0]['message']['content']);
// Reliable PHP object!
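Even in JSON mode, validate before trusting the payload: json_decode with JSON_THROW_ON_ERROR turns malformed output into a catchable exception instead of a silent null. A sketch, where the expected keys match the extraction prompt above:

```php
// Decode model output defensively and check the keys we actually asked for.
function decodeModelJson(string $raw): array
{
    try {
        $data = json_decode($raw, associative: true, flags: JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new RuntimeException('Model returned invalid JSON: ' . $e->getMessage());
    }

    foreach (['name', 'email'] as $key) {
        if (!array_key_exists($key, $data)) {
            throw new RuntimeException("Missing expected key: {$key}");
        }
    }

    return $data;
}
```

On failure you can re-prompt the model with the error message, or fall back; either way, bad JSON never reaches your database.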
Conclusion
- Abstract: Use Prism or similar to avoid vendor lock-in.
- Stream: Improve UX by showing chunks of text.
- Queue: Handle long-running tasks in background jobs.
- Retry: Expect API failures and handle them gracefully.
AI integration is less about the "AI" and more about solid "Systems Engineering".