Building an AI Chatbot with Laravel and Streaming Responses
Most AI chatbot tutorials show you how to send a prompt and wait for a complete response. In production, that means users stare at a spinner for 5-15 seconds. Streaming fixes this — tokens appear as they're generated, just like ChatGPT.
This guide builds a complete chatbot: streaming backend with Server-Sent Events, conversation memory, multi-provider support, and a lightweight frontend with vanilla JS.
Architecture
┌──────────┐ POST /chat ┌──────────────┐ Stream ┌───────────┐
│ Browser │ ──────────────────▶│ Laravel │ ──────────▶│ OpenAI/ │
│ (SSE) │ ◀─────────────────│ Controller │ ◀──────────│ Claude │
│ │ text/event-stream │ │ Chunks │ │
└──────────┘ └──────────────┘ └───────────┘
Key decisions:
- Server-Sent Events (SSE) over WebSockets — simpler, HTTP-native, no extra infrastructure
- Session-based memory — no database for conversations (kept simple)
- Provider abstraction — swap OpenAI ↔ Anthropic without touching controllers
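For reference, an SSE stream is plain text: each event is one or more `data:` lines followed by a blank line. The sketch below shows roughly what travels over the wire in this guide, assuming one JSON payload per event. `formatSseEvent` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: serialize one payload as a single SSE event.
// An event is terminated by a blank line, hence the trailing "\n\n".
function formatSseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// What the browser receives for a two-token reply, then a completion marker.
const wire = [
  formatSseEvent({ token: 'Hello' }),
  formatSseEvent({ token: ' world' }),
  formatSseEvent({ done: true }),
].join('');

console.log(wire);
```

Because every event is self-delimiting, the client can render each token as soon as its blank line arrives.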
Install Dependencies
composer require openai-php/laravel
# The Anthropic provider in this guide calls the API through Laravel's built-in
# HTTP client, so it needs no extra Composer package.
# .env
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
# Or Anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
# Which provider to bind: openai (default) or anthropic
AI_PROVIDER=openai
The Chat Service
Provider Interface
// app/Services/AI/ChatProviderInterface.php
namespace App\Services\AI;
use Generator;
interface ChatProviderInterface
{
/**
* Send messages and get a complete response.
*
* @param array<int, array{role: string, content: string}> $messages
*/
public function chat(array $messages, string $model = ''): string;
/**
* Send messages and stream the response token by token.
*
* @param array<int, array{role: string, content: string}> $messages
* @return Generator<int, string, void, void>
*/
public function stream(array $messages, string $model = ''): Generator;
}
OpenAI Implementation
// app/Services/AI/OpenAIChatProvider.php
namespace App\Services\AI;
use Generator;
use OpenAI\Laravel\Facades\OpenAI;
class OpenAIChatProvider implements ChatProviderInterface
{
public function chat(array $messages, string $model = ''): string
{
$model = $model ?: config('services.openai.model', 'gpt-4o');
$response = OpenAI::chat()->create([
'model' => $model,
'messages' => $messages,
'max_tokens' => 2048,
]);
return $response->choices[0]->message->content ?? '';
}
public function stream(array $messages, string $model = ''): Generator
{
$model = $model ?: config('services.openai.model', 'gpt-4o');
$stream = OpenAI::chat()->createStreamed([
'model' => $model,
'messages' => $messages,
'max_tokens' => 2048,
]);
foreach ($stream as $response) {
$delta = $response->choices[0]->delta->content ?? '';
if ($delta !== '') {
yield $delta;
}
}
}
}
Anthropic Implementation
// app/Services/AI/AnthropicChatProvider.php
namespace App\Services\AI;
use Generator;
use Illuminate\Support\Facades\Http;
class AnthropicChatProvider implements ChatProviderInterface
{
private string $baseUrl = 'https://api.anthropic.com/v1';
public function chat(array $messages, string $model = ''): string
{
$model = $model ?: config('services.anthropic.model', 'claude-sonnet-4-20250514');
$response = Http::withHeaders($this->headers())
->post("{$this->baseUrl}/messages", [
'model' => $model,
'max_tokens' => 2048,
'messages' => $this->formatMessages($messages),
'system' => $this->extractSystem($messages),
]);
return $response->json('content.0.text', '');
}
public function stream(array $messages, string $model = ''): Generator
{
$model = $model ?: config('services.anthropic.model', 'claude-sonnet-4-20250514');
$response = Http::withHeaders($this->headers())
->withOptions(['stream' => true])
->post("{$this->baseUrl}/messages", [
'model' => $model,
'max_tokens' => 2048,
'messages' => $this->formatMessages($messages),
'system' => $this->extractSystem($messages),
'stream' => true,
]);
$body = $response->getBody();
$buffer = '';
while (!$body->eof()) {
$buffer .= $body->read(1024);
while (($pos = strpos($buffer, "\n")) !== false) {
$line = substr($buffer, 0, $pos);
$buffer = substr($buffer, $pos + 1);
if (!str_starts_with($line, 'data: ')) {
continue;
}
$data = json_decode(substr($line, 6), true);
if ($data === null) {
continue;
}
if (($data['type'] ?? '') === 'content_block_delta') {
$text = $data['delta']['text'] ?? '';
if ($text !== '') {
yield $text;
}
}
if (($data['type'] ?? '') === 'message_stop') {
return;
}
}
}
}
private function headers(): array
{
return [
'x-api-key' => config('services.anthropic.api_key'),
'anthropic-version' => '2023-06-01',
'Content-Type' => 'application/json',
];
}
private function formatMessages(array $messages): array
{
return array_values(array_filter(
$messages,
fn (array $msg) => $msg['role'] !== 'system'
));
}
private function extractSystem(array $messages): string
{
foreach ($messages as $msg) {
if ($msg['role'] === 'system') {
return $msg['content'];
}
}
return 'You are a helpful assistant.';
}
}
Service Provider Binding
// app/Providers/AppServiceProvider.php
use App\Services\AI\ChatProviderInterface;
use App\Services\AI\OpenAIChatProvider;
use App\Services\AI\AnthropicChatProvider;
public function register(): void
{
$this->app->singleton(ChatProviderInterface::class, function () {
$provider = config('services.ai.default', 'openai');
return match ($provider) {
'anthropic' => new AnthropicChatProvider(),
default => new OpenAIChatProvider(),
};
});
}
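// config/services.php
'ai' => ['default' => env('AI_PROVIDER', 'openai')],
// Keys read by the providers above via config('services.*')
'openai' => ['model' => env('OPENAI_MODEL', 'gpt-4o')],
'anthropic' => [
'api_key' => env('ANTHROPIC_API_KEY'),
'model' => env('ANTHROPIC_MODEL', 'claude-sonnet-4-20250514'),
],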
Streaming Controller
This is the core — returning a StreamedResponse with Server-Sent Events:
// app/Http/Controllers/ChatController.php
namespace App\Http\Controllers;
use App\Services\AI\ChatProviderInterface;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
use Symfony\Component\HttpFoundation\StreamedResponse;
class ChatController extends Controller
{
public function __construct(
private ChatProviderInterface $chatProvider,
) {}
public function index()
{
return view('chat.index');
}
public function stream(Request $request): StreamedResponse
{
$request->validate([
'message' => ['required', 'string', 'max:2000'],
]);
// Rate limiting: 20 messages per minute per session
$key = 'chat:' . $request->session()->getId();
if (RateLimiter::tooManyAttempts($key, 20)) {
abort(429, 'Too many messages. Please wait a moment.');
}
RateLimiter::hit($key, 60);
$userMessage = $request->input('message');
// Build conversation from session
$conversation = $request->session()->get('chat_history', []);
$conversation[] = ['role' => 'user', 'content' => $userMessage];
// Prepare messages with system prompt
$messages = array_merge(
[['role' => 'system', 'content' => $this->systemPrompt()]],
$this->trimConversation($conversation),
);
return new StreamedResponse(function () use ($messages, $conversation, $request) {
$fullResponse = '';
// Send each token as an SSE event
foreach ($this->chatProvider->stream($messages) as $token) {
$fullResponse .= $token;
echo "data: " . json_encode(['token' => $token]) . "\n\n";
if (ob_get_level() > 0) {
ob_flush();
}
flush();
}
// Send completion event
echo "data: " . json_encode(['done' => true]) . "\n\n";
if (ob_get_level() > 0) {
ob_flush();
}
flush();
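// Save assistant response to session. The session middleware has already
// written the session by the time this callback runs, so persist it explicitly.
$conversation[] = ['role' => 'assistant', 'content' => $fullResponse];
$request->session()->put('chat_history', $conversation);
$request->session()->save();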
}, 200, [
'Content-Type' => 'text/event-stream',
'Cache-Control' => 'no-cache',
'Connection' => 'keep-alive',
'X-Accel-Buffering' => 'no', // Disable Nginx buffering
]);
}
public function clear(Request $request)
{
$request->session()->forget('chat_history');
return response()->json(['status' => 'cleared']);
}
private function systemPrompt(): string
{
return <<<'PROMPT'
You are a helpful technical assistant for a Laravel developer blog.
Answer questions about Laravel, PHP, DevOps, and web development.
Be concise, use code examples when helpful, and format with Markdown.
If you're unsure, say so. Do not make up information.
PROMPT;
}
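/**
* Keep only the last N messages to stay within token limits.
* The window always starts on a user message, because some providers
* (Anthropic, for example) reject a conversation that opens with an assistant turn.
*/
private function trimConversation(array $conversation, int $maxMessages = 20): array
{
// A negative offset larger than the array simply returns the whole array
$trimmed = array_slice($conversation, -$maxMessages);
while ($trimmed !== [] && $trimmed[0]['role'] !== 'user') {
array_shift($trimmed);
}
return $trimmed;
}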
}
Routes
// routes/web.php
Route::get('/chat', [ChatController::class, 'index'])->name('chat.index');
Route::post('/chat/stream', [ChatController::class, 'stream'])->name('chat.stream');
Route::post('/chat/clear', [ChatController::class, 'clear'])->name('chat.clear');
Frontend: Vanilla JS with SSE
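No React, no Vue, just Blade and vanilla JavaScript. The browser's EventSource API only issues GET requests, so the client sends the POST with fetch() and reads the event stream from the response body: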
{{-- resources/views/chat/index.blade.php --}}
@extends('layouts.app')
@section('content')
<div class="max-w-3xl mx-auto px-4 py-8">
<div class="flex items-center justify-between mb-6">
<h1 class="text-2xl font-bold dark:text-white">AI Chat</h1>
<button id="clear-btn"
class="text-sm text-gray-500 hover:text-red-500 transition">
Clear History
</button>
</div>
{{-- Message Container --}}
<div id="messages"
class="space-y-4 mb-6 max-h-[60vh] overflow-y-auto scroll-smooth">
<div class="text-gray-400 text-center py-8" id="empty-state">
Ask me anything about Laravel, PHP, or web development.
</div>
</div>
{{-- Input Form --}}
<form id="chat-form" class="flex gap-3">
@csrf
<input type="text"
id="message-input"
name="message"
placeholder="Type your message..."
maxlength="2000"
autocomplete="off"
class="flex-1 rounded-lg border border-gray-300 dark:border-gray-600
bg-white dark:bg-gray-800 px-4 py-3
text-gray-900 dark:text-white
focus:ring-2 focus:ring-indigo-500 focus:border-transparent
outline-none transition"
required>
<button type="submit"
id="send-btn"
class="bg-indigo-600 hover:bg-indigo-700 text-white px-6 py-3
rounded-lg font-medium transition disabled:opacity-50
disabled:cursor-not-allowed">
Send
</button>
</form>
</div>
<script>
document.addEventListener('DOMContentLoaded', () => {
const form = document.getElementById('chat-form');
const input = document.getElementById('message-input');
const messages = document.getElementById('messages');
const sendBtn = document.getElementById('send-btn');
const clearBtn = document.getElementById('clear-btn');
const emptyState = document.getElementById('empty-state');
let isStreaming = false;
form.addEventListener('submit', async (e) => {
e.preventDefault();
const message = input.value.trim();
if (!message || isStreaming) return;
// Remove empty state (looked up each time, since "Clear History" re-creates it)
document.getElementById('empty-state')?.remove();
// Add user message
appendMessage('user', message);
input.value = '';
setStreaming(true);
// Create assistant message container
const assistantEl = appendMessage('assistant', '');
const contentEl = assistantEl.querySelector('.message-content');
try {
const response = await fetch('{{ route("chat.stream") }}', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-CSRF-TOKEN': '{{ csrf_token() }}',
'Accept': 'text/event-stream',
},
body: JSON.stringify({ message }),
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let fullText = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop(); // Keep incomplete line
for (const line of lines) {
if (!line.startsWith('data: ')) continue;
try {
const data = JSON.parse(line.slice(6));
if (data.token) {
fullText += data.token;
contentEl.innerHTML = renderMarkdown(fullText);
scrollToBottom();
}
if (data.done) {
// Streaming complete
}
} catch {
// Skip malformed JSON
}
}
}
} catch (error) {
contentEl.textContent = `Error: ${error.message}. Please try again.`;
contentEl.classList.add('text-red-500');
} finally {
setStreaming(false);
input.focus();
}
});
clearBtn.addEventListener('click', async () => {
await fetch('{{ route("chat.clear") }}', {
method: 'POST',
headers: {
'X-CSRF-TOKEN': '{{ csrf_token() }}',
},
});
messages.innerHTML = `
<div class="text-gray-400 text-center py-8" id="empty-state">
Ask me anything about Laravel, PHP, or web development.
</div>
`;
});
function appendMessage(role, content) {
const wrapper = document.createElement('div');
wrapper.className = `flex ${role === 'user' ? 'justify-end' : 'justify-start'}`;
const bubble = document.createElement('div');
bubble.className = role === 'user'
? 'bg-indigo-600 text-white rounded-2xl rounded-br-md px-4 py-3 max-w-[80%]'
: 'bg-gray-100 dark:bg-gray-800 text-gray-900 dark:text-gray-100 rounded-2xl rounded-bl-md px-4 py-3 max-w-[80%]';
const contentEl = document.createElement('div');
contentEl.className = 'message-content prose dark:prose-invert prose-sm max-w-none';
contentEl.innerHTML = content ? renderMarkdown(content) : '<span class="animate-pulse">●●●</span>';
bubble.appendChild(contentEl);
wrapper.appendChild(bubble);
messages.appendChild(wrapper);
scrollToBottom();
return wrapper;
}
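function escapeHtml(text) {
return text
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;');
}
function renderMarkdown(text) {
// Escape HTML first: the result is assigned to innerHTML, so raw model or user
// text must never be able to inject markup. Then apply basic Markdown.
// For production, use a library like marked.js together with an HTML sanitizer.
return escapeHtml(text)
// Code blocks
.replace(/```(\w+)?\n([\s\S]*?)```/g, '<pre><code class="language-$1">$2</code></pre>')
// Inline code
.replace(/`([^`]+)`/g, '<code>$1</code>')
// Bold
.replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
// Italic
.replace(/\*(.+?)\*/g, '<em>$1</em>')
// Line breaks
.replace(/\n/g, '<br>');
}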
function scrollToBottom() {
messages.scrollTop = messages.scrollHeight;
}
function setStreaming(state) {
isStreaming = state;
sendBtn.disabled = state;
input.disabled = state;
}
// Focus input on load
input.focus();
});
</script>
@endsection
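The reader loop above splits on single newlines, which works because the controller emits exactly one `data:` line per event. A slightly more general approach is to split on the blank line that terminates each event. The sketch below is one way to do that; `createSseParser` is a hypothetical name, and it assumes every event carries a JSON payload as in this guide.

```javascript
// Hypothetical helper: incremental parser for "data: <json>" SSE events.
// Call the returned function with each decoded text chunk from the stream.
function createSseParser(onEvent) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    // Events are separated by a blank line; the last piece may be incomplete.
    const frames = buffer.split('\n\n');
    buffer = frames.pop();
    for (const frame of frames) {
      const payload = frame
        .split('\n')
        .filter((line) => line.startsWith('data: '))
        .map((line) => line.slice(6))
        .join('\n');
      if (payload === '') continue;
      let data;
      try {
        data = JSON.parse(payload);
      } catch {
        continue; // Skip frames whose payload is not valid JSON
      }
      onEvent(data);
    }
  };
}

// Usage: network chunks can split an event at any position.
const received = [];
const feed = createSseParser((event) => received.push(event));
feed('data: {"tok');
feed('en":"Hi"}\n\ndata: {"done"');
feed(':true}\n\n');
console.log(received);
```

In the page script, `feed(decoder.decode(value, { stream: true }))` would take the place of the manual `split('\n')` handling.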
Nginx Configuration
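Nginx buffers upstream responses by default, so SSE events sit in the buffer instead of reaching the browser as they are produced. Disable buffering for the streaming route: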
location /chat/stream {
proxy_pass http://127.0.0.1:8000;
proxy_buffering off;
proxy_cache off;
proxy_set_header Connection '';
proxy_http_version 1.1;
chunked_transfer_encoding off;
# Increase timeout for long streams
proxy_read_timeout 120s;
}
Or handle it in the response header (already done in our controller), which also covers PHP-FPM setups that use fastcgi_pass instead of proxy_pass:
'X-Accel-Buffering' => 'no', // This tells Nginx to disable buffering
Conversation Memory
Session-Based (Simple)
Already implemented in the controller. Conversations live in the session and disappear when the session expires.
Database-Based (Persistent)
For persistent conversations across sessions:
// database/migrations/create_chat_conversations_table.php
Schema::create('chat_conversations', function (Blueprint $table) {
$table->id();
$table->string('session_id')->index();
$table->json('messages');
$table->string('title')->nullable();
$table->timestamps();
});
// app/Models/ChatConversation.php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
class ChatConversation extends Model
{
protected $fillable = ['session_id', 'messages', 'title'];
protected function casts(): array
{
return [
'messages' => 'array',
];
}
}
Token Counting and Cost Control
Prevent runaway API costs:
// app/Services/AI/TokenCounter.php
namespace App\Services\AI;
class TokenCounter
{
/**
* Rough estimate: ~4 characters per token for English text.
*/
public static function estimate(string $text): int
{
return (int) ceil(mb_strlen($text) / 4);
}
public static function estimateMessages(array $messages): int
{
$total = 0;
foreach ($messages as $message) {
$total += self::estimate($message['content']);
$total += 4; // Message overhead
}
return $total;
}
}
Use in the controller:
// Before sending to API
$estimatedTokens = TokenCounter::estimateMessages($messages);
$maxInputTokens = 4000;
if ($estimatedTokens > $maxInputTokens) {
// Trim oldest messages until within budget
while (TokenCounter::estimateMessages($messages) > $maxInputTokens && count($messages) > 2) {
array_splice($messages, 1, 1); // Remove oldest non-system message
}
}
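The same heuristic is easy to mirror in the browser, for example to show a rough budget next to the input field. The sketch below ports the estimate and the trimming loop to JavaScript so the arithmetic can be followed step by step. The function names are hypothetical, and the 4-characters-per-token and 4-token overhead figures are the rough assumptions from `TokenCounter`, not real tokenizer output.

```javascript
// Rough estimate, same heuristic as TokenCounter: ~4 characters per token.
function estimateTokens(text) {
  // Spread into code points so multi-byte characters count once, like mb_strlen.
  return Math.ceil([...text].length / 4);
}

// Sum of per-message estimates plus 4 tokens of assumed per-message overhead.
function estimateMessages(messages) {
  return messages.reduce((total, m) => total + estimateTokens(m.content) + 4, 0);
}

// Remove the oldest non-system message until the estimate fits the budget.
function trimToBudget(messages, maxTokens) {
  const trimmed = [...messages];
  while (estimateMessages(trimmed) > maxTokens && trimmed.length > 2) {
    trimmed.splice(1, 1); // index 0 is the system prompt
  }
  return trimmed;
}

const history = [
  { role: 'system', content: 'You are helpful.' }, // 16 chars -> 4 + 4 = 8
  { role: 'user', content: 'a'.repeat(40) },       // 40 chars -> 10 + 4 = 14
  { role: 'assistant', content: 'b'.repeat(40) },  // 40 chars -> 10 + 4 = 14
  { role: 'user', content: 'c'.repeat(8) },        // 8 chars  -> 2 + 4 = 6
];
const before = estimateMessages(history); // 8 + 14 + 14 + 6 = 42
const kept = trimToBudget(history, 20);   // drops the two oldest turns, leaving 14
console.log(before, kept.length, estimateMessages(kept));
```

The estimate is deliberately crude. It is good enough to cap costs, but a real tokenizer is needed wherever exact limits matter.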
Error Handling
// In ChatController::stream()
return new StreamedResponse(function () use ($messages, $conversation, $request) {
try {
$fullResponse = '';
foreach ($this->chatProvider->stream($messages) as $token) {
$fullResponse .= $token;
echo "data: " . json_encode(['token' => $token]) . "\n\n";
if (ob_get_level() > 0) {
ob_flush();
}
flush();
}
echo "data: " . json_encode(['done' => true]) . "\n\n";
// Save to session (explicit save: the session middleware has already run)
$conversation[] = ['role' => 'assistant', 'content' => $fullResponse];
$request->session()->put('chat_history', $conversation);
$request->session()->save();
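} catch (\OpenAI\Exceptions\ErrorException $e) {
// getCode() does not carry the HTTP status. Recent openai-php/client releases
// expose it via getStatusCode(); check the version you have installed.
$error = match ($e->getStatusCode()) {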
429 => 'Rate limited by AI provider. Please try again in a moment.',
500, 503 => 'AI service is temporarily unavailable.',
default => 'An error occurred. Please try again.',
};
echo "data: " . json_encode(['error' => $error]) . "\n\n";
} catch (\Throwable $e) {
report($e);
echo "data: " . json_encode(['error' => 'An unexpected error occurred.']) . "\n\n";
}
if (ob_get_level() > 0) {
ob_flush();
}
flush();
}, 200, [
'Content-Type' => 'text/event-stream',
'Cache-Control' => 'no-cache',
'Connection' => 'keep-alive',
'X-Accel-Buffering' => 'no',
]);
Handle errors in the frontend:
// In the SSE reader loop
if (data.error) {
contentEl.textContent = data.error;
contentEl.classList.add('text-red-500');
setStreaming(false);
return;
}
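One way to keep the reader loop small is to move the per-event branching into a pure function that folds each payload into a state object. The sketch below illustrates the idea. `applyEvent` and the shape of `state` are hypothetical and not part of the page script above.

```javascript
// Hypothetical reducer: fold one parsed SSE payload into the UI state.
// Errors take priority and end the stream; tokens append; done finishes.
function applyEvent(state, data) {
  if (data.error) {
    return { ...state, error: data.error, finished: true };
  }
  if (data.token) {
    return { ...state, text: state.text + data.token };
  }
  if (data.done) {
    return { ...state, finished: true };
  }
  return state;
}

// Usage: a stream that fails after two tokens.
let state = { text: '', error: null, finished: false };
const events = [
  { token: 'Hel' },
  { token: 'lo' },
  { error: 'AI service is temporarily unavailable.' },
  { token: 'ignored' },
];
for (const event of events) {
  state = applyEvent(state, event);
  if (state.finished) break; // stop reading once the server reports an outcome
}
console.log(state);
```

The DOM updates (`textContent`, the `text-red-500` class, `setStreaming(false)`) can then be derived from `state` in one place.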
Testing
// tests/Feature/ChatTest.php
namespace Tests\Feature;
use App\Services\AI\ChatProviderInterface;
use Tests\TestCase;
class ChatTest extends TestCase
{
public function test_chat_page_loads(): void
{
$response = $this->get('/chat');
$response->assertStatus(200);
$response->assertSee('AI Chat');
}
public function test_stream_requires_message(): void
{
$response = $this->postJson('/chat/stream', []);
$response->assertStatus(422);
}
public function test_stream_returns_event_stream(): void
{
// Mock the provider
$mock = $this->mock(ChatProviderInterface::class);
$mock->shouldReceive('stream')
->once()
->andReturnUsing(function () {
yield 'Hello';
yield ' world';
});
$response = $this->post('/chat/stream', [
'message' => 'Hi',
]);
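// Symfony may append a charset to text/* content types, so match on the prefix
$this->assertStringStartsWith('text/event-stream', $response->headers->get('content-type'));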
}
public function test_clear_removes_chat_history(): void
{
$this->session(['chat_history' => [
['role' => 'user', 'content' => 'Hello'],
]]);
$response = $this->postJson('/chat/clear');
$response->assertJson(['status' => 'cleared']);
}
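public function test_rate_limiting(): void
{
// Test requests do not carry the session cookie forward, so each request may get
// a fresh session ID and its own limiter key. Force the limiter into its
// exhausted state instead of sending 21 requests.
\Illuminate\Support\Facades\RateLimiter::shouldReceive('tooManyAttempts')
->once()
->andReturn(true);
$response = $this->post('/chat/stream', [
'message' => 'One message too many',
]);
$response->assertStatus(429);
}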
}
Production Checklist
| Item | Status |
|---|---|
| API keys in .env, not in code | Required |
| Rate limiting per session | Required |
| Input validation and max length | Required |
| Nginx buffering disabled | Required |
| Error handling with user-friendly messages | Required |
| Token counting / conversation trimming | Recommended |
| CSRF protection on POST endpoints | Required |
| Conversation persistence (database) | Optional |
| Response content filtering | Recommended |
| Monitoring API costs | Recommended |
Summary
The streaming chatbot pattern:
- Provider interface → swap AI providers without code changes
- StreamedResponse → SSE delivers tokens in real-time
- Session memory → conversations persist across messages
- Vanilla JS + fetch → read the stream, render incrementally
- Nginx config → disable buffering for SSE
- Rate limiting → protect against abuse and cost overrun
The biggest gotcha is Nginx buffering. If streaming "works locally but not in production," check proxy_buffering and X-Accel-Buffering first.
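A quick way to tell streaming from buffering is to record when each chunk arrives in the reader loop (for example with `performance.now()`) and look at the pattern. The sketch below summarizes such a list of arrival times. `summarizeArrivals` is a hypothetical helper, and the two thresholds are assumptions to tune for your own setup, not established values.

```javascript
// Hypothetical helper: given arrival times (ms since the request started) of each
// chunk read from response.body, report time-to-first-chunk and the spread.
function summarizeArrivals(arrivalsMs) {
  if (arrivalsMs.length === 0) {
    return { firstChunkMs: null, spreadMs: 0, likelyBuffered: false };
  }
  const firstChunkMs = arrivalsMs[0];
  const spreadMs = arrivalsMs[arrivalsMs.length - 1] - firstChunkMs;
  // Assumed thresholds: a late first chunk followed by everything at once
  // suggests a proxy held the response back until it was complete.
  const likelyBuffered = firstChunkMs > 2000 && spreadMs < 50;
  return { firstChunkMs, spreadMs, likelyBuffered };
}

// Chunks trickling in over time: streaming is working.
const streamed = summarizeArrivals([300, 450, 610, 790]);
// Long wait, then all chunks within a few milliseconds: probably buffered.
const buffered = summarizeArrivals([4200, 4201, 4203]);
console.log(streamed, buffered);
```

If production shows the second pattern while local development shows the first, the proxy configuration is the likely culprit.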