Building RAG with Laravel: Chat with Your Data

· 5 min read

Generative AI models like GPT-4 are incredibly powerful, but they share a fatal flaw: they don't know your private data. They can't answer questions about your company's internal PDFs, your users' latest transaction history, or your specific documentation.

Retrieval-Augmented Generation (RAG) solves this by injecting relevant data into the prompt before the AI generates an answer.

In this deep dive, we'll build a complete "Chat with your Documentation" system using Laravel, OpenAI, and PostgreSQL with pgvector.

The Architecture of RAG

RAG isn't a single technology; it's a pipeline.

  1. Ingestion: Reading files (PDF, Markdown, HTML).
  2. Chunking: Splitting text into manageable pieces.
  3. Embedding: Converting text chunks into vector arrays (e.g., [0.12, -0.98, ...] ).
  4. Storage: Saving vectors in a database.
  5. Retrieval: Finding the most similar vectors to a user's question.
  6. Synthesis: Generating an answer.
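
Before diving into code, it helps to see the whole pipeline as one flow. The sketch below is pseudocode in PHP syntax, using the service names we'll build in the rest of this post (`$llm->answer()` stands in for the chat-completion call in Step 5):

```php
// Bird's-eye pseudocode of the six steps above
$text   = file_get_contents('docs/guide.md');        // 1. Ingestion
$chunks = $chunker->chunk($text);                    // 2. Chunking

foreach ($chunks as $chunk) {
    $document->chunks()->create([
        'content'   => $chunk,
        'embedding' => $embedder->generate($chunk),  // 3. Embedding
    ]);                                              // 4. Storage
}

$relevant = $searcher->search($userQuestion);        // 5. Retrieval
$answer   = $llm->answer($userQuestion, $relevant);  // 6. Synthesis
```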

Step 1: Database Setup with pgvector

We'll use PostgreSQL's pgvector extension. It allows us to store vectors directly in our relational database, making joins and management trivial.

First, enable the extension and create a migration.

// database/migrations/2026_02_27_000000_create_document_chunks_table.php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        // Enable pgvector extension
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

        Schema::create('documents', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->string('path');
            $table->timestamps();
        });

        Schema::create('document_chunks', function (Blueprint $table) {
            $table->id();
            $table->foreignId('document_id')->constrained()->cascadeOnDelete();
            $table->text('content'); // The actual text
            $table->vector('embedding', 1536); // OpenAI text-embedding-3-small uses 1536 dimensions; the vector() column type is provided by the pgvector-php Laravel package
            $table->timestamps();
        });
    }
};

Step 2: Intelligent Chunking Strategy

This is where most RAG implementations fail. Chunk too small and you lose context; chunk too large and you dilute the signal the retrieval system needs.

Naive approach: split every 1,000 characters, regardless of meaning. Better approach: split along paragraph or semantic-section boundaries.
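For contrast, the naive version is a one-liner, which is exactly why it is so common; it cheerfully splits mid-word and mid-sentence:

```php
// Naive chunking: fixed-size character windows that ignore structure
$text = str_repeat('All work and no play makes Jack a dull boy. ', 100); // 4,400 chars
$chunks = str_split($text, 1000); // Boundaries fall wherever they fall

echo count($chunks);      // 5
echo strlen($chunks[0]);  // 1000
```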

Let's implement a robust Chunking Service.

// app/Services/ChunkingService.php

namespace App\Services;

class ChunkingService
{
    public function chunk(string $text, int $maxTokens = 500): array
    {
        // Simple implementation: split by double newlines (paragraphs)
        $paragraphs = explode("\n\n", $text);
        $chunks = [];
        $currentChunk = "";

        foreach ($paragraphs as $paragraph) {
            // Rough estimation: 1 token ~= 4 chars
            if (strlen($currentChunk . $paragraph) / 4 > $maxTokens) {
                if (!empty($currentChunk)) {
                    $chunks[] = trim($currentChunk);
                }
                $currentChunk = $paragraph;
            } else {
                $currentChunk .= "\n\n" . $paragraph;
            }
        }

        if (!empty($currentChunk)) {
            $chunks[] = trim($currentChunk);
        }

        return $chunks;
    }
}
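The 1 token ≈ 4 characters estimate inside `chunk()` is a common rule of thumb for English text. A quick sanity check of that heuristic on its own:

```php
// The same rough token estimate ChunkingService uses: 1 token ~= 4 characters
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

$paragraph = str_repeat('Retrieval-augmented generation in Laravel. ', 20); // 860 chars
echo estimateTokens($paragraph); // 215 tokens, well under the 500-token default
```

Real token counts vary by tokenizer; for precise budgets you'd use a tokenizer library, but for chunk sizing this estimate is usually close enough.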

Step 3: Embedding & Storage

We need a service to talk to OpenAI's Embedding API.

// app/Services/EmbeddingService.php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    public function generate(string $text): array
    {
        $response = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'input' => str_replace("\n", ' ', $text), // Newlines can degrade embedding quality
                'model' => 'text-embedding-3-small',
            ])
            ->throw(); // Surface API errors instead of silently returning null

        return $response->json('data.0.embedding');
    }
}

Now, the Ingestion Job:

// app/Jobs/ProcessDocument.php

namespace App\Jobs;

use App\Models\Document;
use App\Services\ChunkingService;
use App\Services\EmbeddingService;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;

class ProcessDocument implements ShouldQueue
{
    use Queueable;

    public function __construct(public Document $document) {}

    public function handle(ChunkingService $chunker, EmbeddingService $embedder): void
    {
        $text = file_get_contents($this->document->path);
        $chunks = $chunker->chunk($text);

        foreach ($chunks as $chunk) {
            $this->document->chunks()->create([
                'content' => $chunk,
                'embedding' => $embedder->generate($chunk), // The pgvector package handles the array conversion
            ]);
        }
    }
}
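
A throughput note: the OpenAI embeddings endpoint also accepts an array of inputs, so a job processing many chunks can embed them in one HTTP call instead of one call per chunk. A sketch of a batch variant (the `generateBatch` method is our own addition to `EmbeddingService`, not something shown above):

```php
// Hypothetical batch variant of EmbeddingService::generate()
public function generateBatch(array $texts): array
{
    $response = Http::withToken(config('services.openai.key'))
        ->post('https://api.openai.com/v1/embeddings', [
            'input' => array_map(fn ($t) => str_replace("\n", ' ', $t), $texts),
            'model' => 'text-embedding-3-small',
        ])
        ->throw();

    // The API returns embeddings in the same order as the inputs
    return array_map(fn ($row) => $row['embedding'], $response->json('data'));
}
```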

Step 4: Semantic Search (The "Retrieval" in RAG)

When a user asks a question, we don't search for keywords. We search for meaning.

  1. Embed the user's question.
  2. Calculate Cosine Distance (<=>) between the question vector and all database vectors.
  3. Take the top K results.
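
The `<=>` operator is pgvector's cosine distance: 1 minus the cosine similarity of the two vectors. Here is the same computation in plain PHP, to demystify what the database is ranking by:

```php
// Cosine distance, as pgvector's <=> operator computes it: 1 - cosine similarity
function cosineDistance(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

echo cosineDistance([1.0, 0.0], [1.0, 0.0]); // 0 - identical direction
echo cosineDistance([1.0, 0.0], [0.0, 1.0]); // 1 - orthogonal, unrelated
```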

// app/Services/SearchService.php

namespace App\Services;

use App\Models\DocumentChunk;
use Illuminate\Database\Eloquent\Collection;

class SearchService
{
    public function __construct(private EmbeddingService $embeddings) {}

    public function search(string $query, int $limit = 5): Collection
    {
        $queryVector = $this->embeddings->generate($query);

        // pgvector expects vectors as a string literal like '[0.12,-0.98,...]'
        // (packages such as pgvector-php can do this conversion for you)
        $vectorString = '[' . implode(',', $queryVector) . ']';

        return DocumentChunk::query()
            ->select('*')
            ->selectRaw('embedding <=> ? as distance', [$vectorString])
            ->orderBy('distance') // Smaller distance = closer meaning
            ->limit($limit)
            ->get();
    }
}

Step 5: The "Generation" Phase

Finally, we construct the prompt. This is often called "Grounding".

// app/Http/Controllers/ChatController.php

public function ask(Request $request, SearchService $searcher)
{
    $question = $request->input('question');
    
    // 1. Retrieve Context
    $relevantChunks = $searcher->search($question);
    $context = $relevantChunks->pluck('content')->implode("\n---\n");

    // 2. Construct System Prompt
    $systemPrompt = <<<EOT
You are a helpful assistant for our documentation. 
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
$context
EOT;

    // 3. Call LLM
    $response = Http::withToken(config('services.openai.key'))
        ->post('https://api.openai.com/v1/chat/completions', [
            'model' => 'gpt-4o',
            'messages' => [
                ['role' => 'system', 'content' => $systemPrompt],
                ['role' => 'user', 'content' => $question],
            ],
        ]);

    return response()->json([
        'answer' => $response->json('choices.0.message.content'),
        'sources' => $relevantChunks->pluck('document.title')->unique()
    ]);
}

Advanced Tips

  1. Hybrid Search: Vectors are bad at exact matches (like SKU numbers). Combine vector search with keyword search, e.g. Postgres full-text search (to_tsvector/plainto_tsquery) or a simple ILIKE '%...%' filter, for best results.
  2. Reranking: Retrieve 20 chunks via vector search, then use a "Reranker Model" (like Cohere Rerank) to strictly sort the top 5 most relevant ones before sending to GPT.
  3. Metadata Filtering: If a user asks about "Finance", filter your DB query (e.g. category_id = 'finance') before doing the vector search; shrinking the candidate set both speeds up the query and improves relevance.
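
Tips 1 and 3 compose naturally in a single Eloquent query: apply the metadata filter first, require a keyword match, then rank survivors by vector distance. A sketch, assuming a hypothetical `category` column on `document_chunks` (the schema above doesn't include one):

```php
// Hybrid + filtered retrieval sketch (assumes a 'category' column)
$vectorString = '[' . implode(',', $queryVector) . ']';

$results = DocumentChunk::query()
    ->where('category', 'finance')                        // Tip 3: metadata filter
    ->whereRaw('content ILIKE ?', ['%' . $keyword . '%']) // Tip 1: exact keyword match
    ->selectRaw('*, embedding <=> ? as distance', [$vectorString])
    ->orderBy('distance')
    ->limit(20) // Over-fetch so a reranker (Tip 2) can pick the final top 5
    ->get();
```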

Conclusion

Building RAG in Laravel is powerful because you leverage your existing data models and logic. You don't need a separate Python microservice. With pgvector, your AI memory lives right alongside your transactional data.
