Fine-tuning AI Models for PHP/Laravel: Building Your Own Coding Assistant

Introduction

LLMs such as GPT, Claude, and open-source models are trained on many programming languages at once. If you want a model that specializes in PHP/Laravel and follows your team's coding style, that's where fine-tuning comes in.

Why Fine-tune?

Factor                 | Base Model          | Fine-tuned Model
-----------------------|---------------------|------------------------
Laravel best practices | Basic knowledge     | Deep expertise
Team conventions       | Unknown             | Follows them
Domain knowledge       | Generic             | Specific
Response format        | Varied              | Consistent
Inference cost         | High (large models) | Lower (smaller models)

When to Fine-tune?

Fine-tune when you:

  • Need the model to follow specific coding standards
  • Have specialized domain knowledge the base model lacks
  • Want to reduce latency and cost by using a smaller model
  • Need a consistent output format

Don't fine-tune when:

  • You only need to update the model's knowledge (use RAG instead)
  • Your dataset is too small (as a rule of thumb, fewer than about 1,000 examples)
  • Your requirements change frequently

Preparing the Dataset

Data Sources

// app/Services/DatasetPreparation/DatasetCollector.php
namespace App\Services\DatasetPreparation;

class DatasetCollector
{
    public function collectFromCodebase(string $path): array
    {
        $examples = [];
        
        // 1. Collect from code comments/docblocks
        $examples = array_merge($examples, $this->extractDocblocks($path));
        
        // 2. Collect from tests (input/output pairs)
        $examples = array_merge($examples, $this->extractFromTests($path));
        
        // 3. Collect from git commits
        $examples = array_merge($examples, $this->extractFromCommits($path));
        
        return $examples;
    }
    
    protected function extractDocblocks(string $path): array
    {
        $finder = new \Symfony\Component\Finder\Finder();
        $finder->files()->in($path)->name('*.php');
        
        $examples = [];
        
        foreach ($finder as $file) {
            $content = $file->getContents();
            
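            // Match each docblock plus the function signature that follows it.
            // A regex keeps the example short; a real parser such as nikic/PHP-Parser is more robust.
            // parseDocblock() and extractFunction() are helper methods not shown here.
            preg_match_all('/\/\*\*((?:(?!\*\/).)*)\*\/\s*(?:(?:public|protected|private|static|final|abstract)\s+)*function\s+(\w+)/s',
                $content, $matches, PREG_SET_ORDER);
            
            foreach ($matches as $match) {
                $docblock = $this->parseDocblock($match[1]);
                $functionName = $match[2];
                
                if (!empty($docblock['description'])) {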
                    $examples[] = [
                        'instruction' => "Write a PHP function named {$functionName}: {$docblock['description']}",
                        'output' => $this->extractFunction($content, $functionName)
                    ];
                }
            }
        }
        
        return $examples;
    }
}
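
The collector above calls parseDocblock() without showing it. The following is a minimal sketch of what such a helper could look like, written as a standalone function so it can be run on its own. The function body and the shape of the returned array are assumptions for illustration, not the original implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the parseDocblock() helper.
 * Input: the text between the opening and closing docblock markers.
 * Output: ['description' => string, 'tags' => array<string, list<string>>]
 */
function parseDocblock(string $raw): array
{
    $description = [];
    $tags = [];

    foreach (preg_split('/\R/', $raw) as $line) {
        // Drop the leading " * " decoration and surrounding whitespace.
        $line = trim(preg_replace('/^\s*\*\s?/', '', $line));

        if ($line === '') {
            continue;
        }

        if (preg_match('/^@(\w+)\s*(.*)$/', $line, $m)) {
            // Tag line such as "@param Foo $bar": group by tag name.
            $tags[$m[1]][] = $m[2];
        } elseif ($tags === []) {
            // Only text before the first tag counts as the description.
            $description[] = $line;
        }
    }

    return [
        'description' => implode(' ', $description),
        'tags' => $tags,
    ];
}

$parsed = parseDocblock("\n     * Store a new post.\n     *\n     * @param  StorePostRequest \$request\n     * @return JsonResponse\n     ");
echo $parsed['description'], PHP_EOL; // prints "Store a new post."
```

Methods whose docblock has no free-text description yield an empty string, which the collector can use to skip them.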

Data Format for Fine-tuning

OpenAI format (JSONL):

{"messages": [{"role": "system", "content": "You are a Laravel expert..."}, {"role": "user", "content": "Create a migration for users table"}, {"role": "assistant", "content": "<?php\n\nuse Illuminate\\Database\\Migrations..."}]}
{"messages": [{"role": "system", "content": "You are a Laravel expert..."}, {"role": "user", "content": "Write a controller for CRUD posts"}, {"role": "assistant", "content": "<?php\n\nnamespace App\\Http\\Controllers..."}]}
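
Because a single malformed line can cause an upload to be rejected, it can help to check the file locally first. The sketch below is one possible validator; the function name validateChatJsonl and the specific checks are assumptions, and it only covers the three roles used in this article.

```php
<?php
declare(strict_types=1);

/**
 * Return a list of human-readable problems found in chat-format JSONL.
 * An empty list means every line passed these (deliberately basic) checks.
 */
function validateChatJsonl(string $jsonl): array
{
    $problems = [];
    $allowedRoles = ['system', 'user', 'assistant'];

    foreach (explode("\n", trim($jsonl)) as $index => $line) {
        $lineNo = $index + 1;
        $record = json_decode($line, true);

        if (!is_array($record) || !isset($record['messages']) || !is_array($record['messages'])) {
            $problems[] = "Line {$lineNo}: not a JSON object with a 'messages' array";
            continue;
        }

        $roles = [];
        foreach ($record['messages'] as $message) {
            $role = $message['role'] ?? null;
            $content = $message['content'] ?? null;

            if (!in_array($role, $allowedRoles, true)) {
                $problems[] = "Line {$lineNo}: unexpected role " . json_encode($role);
            }
            if (!is_string($content) || trim($content) === '') {
                $problems[] = "Line {$lineNo}: empty or non-string content";
            }
            $roles[] = $role;
        }

        // Each training example should end with the answer the model is meant to learn.
        if (end($roles) !== 'assistant') {
            $problems[] = "Line {$lineNo}: last message should come from the assistant";
        }
    }

    return $problems;
}

$sample = '{"messages":[{"role":"user","content":"Create a migration"},{"role":"assistant","content":"Schema::create(...)"}]}';
echo count(validateChatJsonl($sample)), PHP_EOL; // prints 0
```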

Alpaca format (for open-source models):

{
    "instruction": "Create a Laravel migration for a posts table with title, content, and user_id fields",
    "input": "",
    "output": "<?php\n\nuse Illuminate\\Database\\Migrations\\Migration;\nuse Illuminate\\Database\\Schema\\Blueprint;\nuse Illuminate\\Support\\Facades\\Schema;\n\nreturn new class extends Migration\n{\n    public function up(): void\n    {\n        Schema::create('posts', function (Blueprint $table) {\n            $table->id();\n            $table->foreignId('user_id')->constrained()->cascadeOnDelete();\n            $table->string('title');\n            $table->text('content');\n            $table->timestamps();\n        });\n    }\n\n    public function down(): void\n    {\n        Schema::dropIfExists('posts');\n    }\n};"
}
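
If the same examples should feed both a hosted and an open-source fine-tune, one record shape can be converted into the other. A minimal sketch, assuming the Alpaca keys shown above; the function name alpacaToChat is made up for this example.

```php
<?php
declare(strict_types=1);

/**
 * Convert one Alpaca-style record into the chat "messages" shape.
 * alpacaToChat() is an illustrative name, not part of any library.
 */
function alpacaToChat(array $record, string $systemPrompt): array
{
    $userContent = $record['instruction'];

    // Alpaca keeps optional context in "input"; append it when present.
    if (($record['input'] ?? '') !== '') {
        $userContent .= "\n\n" . $record['input'];
    }

    return [
        'messages' => [
            ['role' => 'system', 'content' => $systemPrompt],
            ['role' => 'user', 'content' => $userContent],
            ['role' => 'assistant', 'content' => $record['output']],
        ],
    ];
}

$chat = alpacaToChat(
    ['instruction' => 'Create a migration for a posts table', 'input' => '', 'output' => 'Schema::create(...)'],
    'You are a Laravel expert.'
);
echo json_encode($chat, JSON_UNESCAPED_SLASHES), PHP_EOL;
```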

Dataset Generation Tool

// app/Console/Commands/GenerateTrainingDataset.php
namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\File;

class GenerateTrainingDataset extends Command
{
    protected $signature = 'ai:generate-dataset {--source=} {--output=training_data.jsonl}';
    protected $description = 'Generate training dataset from Laravel codebase';

    public function handle(): int
    {
        $source = $this->option('source') ?? base_path();
        $output = $this->option('output');
        
        $this->info('Collecting examples from codebase...');
        
        $examples = [];
        
        // 1. Controllers → CRUD operations
        //    (only this collector is shown below; the other three follow the same pattern)
        $examples = array_merge($examples, $this->collectControllerExamples($source));
        
        // 2. Models → Eloquent patterns
        $examples = array_merge($examples, $this->collectModelExamples($source));
        
        // 3. Tests → Expected behavior
        $examples = array_merge($examples, $this->collectTestExamples($source));
        
        // 4. Blade templates → View patterns
        $examples = array_merge($examples, $this->collectBladeExamples($source));
        
        // Convert to JSONL
        $this->info('Converting to JSONL format...');
        
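        $lines = [];
        foreach ($examples as $example) {
            // Throw instead of silently writing an empty line when encoding fails (e.g. invalid UTF-8)
            $lines[] = json_encode(
                $this->formatForOpenAI($example),
                JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
            );
        }
        
        File::put(storage_path($output), implode("\n", $lines) . "\n");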
        
        $this->info("Generated " . count($examples) . " examples to {$output}");
        
        return self::SUCCESS;
    }
    
    protected function formatForOpenAI(array $example): array
    {
        return [
            'messages' => [
                [
                    'role' => 'system',
                    'content' => $this->getSystemPrompt()
                ],
                [
                    'role' => 'user', 
                    'content' => $example['instruction']
                ],
                [
                    'role' => 'assistant',
                    'content' => $example['output']
                ]
            ]
        ];
    }
    
    protected function getSystemPrompt(): string
    {
        return <<<PROMPT
You are an expert Laravel developer. Follow these conventions:
- Target PHP 8.3+ and use modern features (readonly properties, typed properties, enums)
- Follow PSR-12 coding standard
- Use strict typing (declare(strict_types=1))
- Prefer dependency injection over facades
- Use descriptive variable and method names
- Write comprehensive PHPDoc when needed
- Follow Laravel best practices and conventions
PROMPT;
    }
    
    protected function collectControllerExamples(string $path): array
    {
        $examples = [];
        $controllerPath = $path . '/app/Http/Controllers';
        
        if (!is_dir($controllerPath)) {
            return $examples;
        }
        
        foreach (File::allFiles($controllerPath) as $file) {
            $content = $file->getContents();
            $className = pathinfo($file->getFilename(), PATHINFO_FILENAME);
            
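            // Extract public methods with their docblocks. The recursive group (?3)
            // matches balanced braces, so nested blocks (if, foreach, closures) are
            // captured whole. Braces inside strings or comments can still confuse it.
            preg_match_all(
                '/\/\*\*((?:(?!\*\/).)*)\*\/\s*(public\s+function\s+\w+\([^)]*\)[^{]*(\{(?:[^{}]++|(?3))*\}))/s',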
                $content,
                $matches,
                PREG_SET_ORDER
            );
            
            foreach ($matches as $match) {
                $docblock = trim($match[1]);
                $method = $match[2];
                
                // Parse purpose from docblock
                if (preg_match('/@purpose\s+(.+)/', $docblock, $purposeMatch)) {
                    $examples[] = [
                        'instruction' => "In Laravel, " . trim($purposeMatch[1]),
                        'output' => $method
                    ];
                }
            }
        }
        
        return $examples;
    }
}
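
Before writing the file, it is usually worth removing duplicate examples and holding some back for evaluation. The sketch below shows one way to do both; the function names, the 10% default, and the fixed seed are assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

/** Drop examples whose instruction and output match after whitespace normalization. */
function dedupeExamples(array $examples): array
{
    $seen = [];
    $unique = [];

    foreach ($examples as $example) {
        // Collapse whitespace so trivially reformatted copies hash the same.
        $normalized = preg_replace('/\s+/', ' ', trim($example['instruction'] . "\n" . $example['output']));
        $key = sha1($normalized);

        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $unique[] = $example;
        }
    }

    return $unique;
}

/** Shuffle with a fixed seed and split into train and validation sets. */
function splitDataset(array $examples, float $validationRatio = 0.1, int $seed = 42): array
{
    mt_srand($seed); // makes shuffle() repeatable for a given PHP version
    shuffle($examples);

    $validationCount = (int) round(count($examples) * $validationRatio);

    return [
        'validation' => array_slice($examples, 0, $validationCount),
        'train' => array_slice($examples, $validationCount),
    ];
}
```

Splitting after deduplication matters: otherwise a copy of a training example can leak into the validation set and inflate the evaluation results.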

Data Augmentation

Expand the dataset by creating variations of each example:

// app/Services/DatasetPreparation/DataAugmenter.php
namespace App\Services\DatasetPreparation;

class DataAugmenter
{
    public function augment(array $example): array
    {
        $augmented = [$example];
        
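        // 1. Rephrase instruction (skipped when no synonym applied, to avoid an exact duplicate)
        $rephrased = $this->rephrase($example['instruction']);
        if ($rephrased !== $example['instruction']) {
            $augmented[] = [
                'instruction' => $rephrased,
                'output' => $example['output']
            ];
        }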
        
        // 2. Add context variations
        $augmented[] = [
            'instruction' => "As a Laravel developer, " . lcfirst($example['instruction']),
            'output' => $example['output']
        ];
        
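        // 3. Add error scenario (introduceError() is a helper not shown here;
        //    it should return a deliberately broken copy of the code)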
        if (str_contains($example['output'], 'function')) {
            $augmented[] = [
                'instruction' => "Fix this Laravel code: " . $this->introduceError($example['output']),
                'output' => $example['output']
            ];
        }
        
        return $augmented;
    }
    
    protected function rephrase(string $instruction): string
    {
        $replacements = [
            'Create' => ['Write', 'Generate', 'Implement', 'Build'],
            'function' => ['method', 'function'],
            'controller' => ['controller class', 'HTTP controller'],
        ];
        
        foreach ($replacements as $original => $alternatives) {
            if (str_contains($instruction, $original)) {
                $instruction = str_replace(
                    $original, 
                    $alternatives[array_rand($alternatives)], 
                    $instruction
                );
                break;
            }
        }
        
        return $instruction;
    }
}
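
The augmenter calls introduceError(), which is not shown. One simple way to implement it is a deterministic mutation that breaks the syntax. The sketch below is an assumption about what such a helper might do, not the original code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical introduceError(): return a broken copy of $code.
 * Removes the last semicolon, or failing that the last closing brace.
 */
function introduceError(string $code): string
{
    foreach ([';', '}'] as $token) {
        $pos = strrpos($code, $token);

        if ($pos !== false) {
            // Cut out exactly one character at the last occurrence.
            return substr($code, 0, $pos) . substr($code, $pos + 1);
        }
    }

    // Nothing to remove: return the input unchanged so callers can detect it.
    return $code;
}

echo introduceError('return $a + $b;'), PHP_EOL; // prints "return $a + $b"
```

A caller would normally compare the result with the input and skip the "Fix this" example when nothing changed. Real bug-fix data benefits from more varied mutations (wrong variable names, missing imports), which are left out here.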

Fine-tuning with OpenAI

Upload Dataset

// app/Services/FineTuning/OpenAIFineTuner.php
namespace App\Services\FineTuning;

use OpenAI\Laravel\Facades\OpenAI;

class OpenAIFineTuner
{
    public function uploadDataset(string $filePath): string
    {
        $response = OpenAI::files()->upload([
            'purpose' => 'fine-tune',
            'file' => fopen($filePath, 'r'),
        ]);
        
        return $response->id;
    }
    
    public function createFineTuneJob(string $fileId, array $options = []): string
    {
        $response = OpenAI::fineTuning()->createJob([
            'training_file' => $fileId,
            'model' => $options['base_model'] ?? 'gpt-4o-mini-2024-07-18',
            'hyperparameters' => [
                'n_epochs' => $options['epochs'] ?? 3,
                'batch_size' => $options['batch_size'] ?? 'auto',
                'learning_rate_multiplier' => $options['learning_rate'] ?? 'auto',
            ],
            'suffix' => $options['suffix'] ?? 'laravel-assistant',
        ]);
        
        return $response->id;
    }
    
    public function checkJobStatus(string $jobId): array
    {
        $job = OpenAI::fineTuning()->retrieveJob($jobId);
        
        return [
            'status' => $job->status,
            'model' => $job->fineTunedModel,
            'trained_tokens' => $job->trainedTokens,
            'error' => $job->error?->message,
        ];
    }
    
    public function listJobs(): array
    {
        $response = OpenAI::fineTuning()->listJobs(['limit' => 10]);
        
        return array_map(fn($job) => [
            'id' => $job->id,
            'status' => $job->status,
            'model' => $job->fineTunedModel,
            'created_at' => $job->createdAt,
        ], $response->data);
    }
}
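
The job status above reports trained_tokens, and that number grows with both file size and epoch count. A rough estimate before uploading can avoid surprises. The sketch below uses the common heuristic of about four characters per token, which is only an approximation (code often tokenizes less efficiently); the function name is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Rough size estimate for chat-format training records.
 * $charsPerToken = 4.0 is a rule of thumb, not a tokenizer.
 */
function estimateTrainingTokens(array $records, int $epochs = 3, float $charsPerToken = 4.0): array
{
    $bytes = 0;

    foreach ($records as $record) {
        foreach ($record['messages'] as $message) {
            $bytes += strlen($message['content']); // byte length, close enough for an estimate
        }
    }

    $tokensPerEpoch = (int) ceil($bytes / $charsPerToken);

    return [
        'examples' => count($records),
        'tokens_per_epoch' => $tokensPerEpoch,
        'trained_tokens' => $tokensPerEpoch * $epochs,
    ];
}

$estimate = estimateTrainingTokens([
    ['messages' => [
        ['role' => 'user', 'content' => 'abcd'],
        ['role' => 'assistant', 'content' => 'abcdefgh'],
    ]],
]);
echo $estimate['trained_tokens'], PHP_EOL; // prints 9 (12 bytes / 4 = 3 tokens, times 3 epochs)
```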

Artisan Commands

// app/Console/Commands/FineTuneModel.php
namespace App\Console\Commands;

use App\Services\FineTuning\OpenAIFineTuner;
use Illuminate\Console\Command;

class FineTuneModel extends Command
{
    protected $signature = 'ai:fine-tune 
                            {--dataset= : Path to training dataset}
                            {--model=gpt-4o-mini-2024-07-18 : Base model}
                            {--epochs=3 : Number of training epochs}
                            {--suffix=laravel : Model suffix}';
    
    protected $description = 'Start a fine-tuning job for Laravel code assistant';

    public function handle(OpenAIFineTuner $fineTuner): int
    {
        $datasetPath = $this->option('dataset') ?? storage_path('training_data.jsonl');
        
        if (!file_exists($datasetPath)) {
            $this->error("Dataset not found: {$datasetPath}");
            return self::FAILURE;
        }
        
        $this->info('Uploading dataset...');
        $fileId = $fineTuner->uploadDataset($datasetPath);
        $this->info("File uploaded: {$fileId}");
        
        $this->info('Creating fine-tune job...');
        $jobId = $fineTuner->createFineTuneJob($fileId, [
            'base_model' => $this->option('model'),
            'epochs' => (int) $this->option('epochs'),
            'suffix' => $this->option('suffix'),
        ]);
        
        $this->info("Fine-tune job created: {$jobId}");
        $this->info("Monitor with: php artisan ai:fine-tune-status {$jobId}");
        
        return self::SUCCESS;
    }
}

// app/Console/Commands/FineTuneStatus.php
namespace App\Console\Commands;

use App\Services\FineTuning\OpenAIFineTuner;
use Illuminate\Console\Command;

class FineTuneStatus extends Command
{
    protected $signature = 'ai:fine-tune-status {job?}';
    protected $description = 'Check fine-tuning job status';

    public function handle(OpenAIFineTuner $fineTuner): int
    {
        $jobId = $this->argument('job');
        
        if ($jobId) {
            $status = $fineTuner->checkJobStatus($jobId);
            
            $this->table(
                ['Property', 'Value'],
                collect($status)->map(fn($v, $k) => [$k, $v ?? 'N/A'])->toArray()
            );
            
            if ($status['status'] === 'succeeded') {
                $this->info("Model ready: {$status['model']}");
                $this->info("Add to .env: OPENAI_FINE_TUNED_MODEL={$status['model']}");
            }
        } else {
            $jobs = $fineTuner->listJobs();
            
            $this->table(
                ['ID', 'Status', 'Model', 'Created'],
                $jobs
            );
        }
        
        return self::SUCCESS;
    }
}
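
Instead of re-running the status command by hand, a script can poll until the job reaches a terminal state. The following is a sketch of such a loop; waitForJob and its parameters are made up for this example, and the status fetcher is passed in as a callable so the loop can be tried without calling the API. In the application, the callable would wrap checkJobStatus($jobId).

```php
<?php
declare(strict_types=1);

/**
 * Poll $fetchStatus until the job reports a terminal status.
 * $sleep is injectable so tests and demos do not actually wait.
 */
function waitForJob(callable $fetchStatus, int $maxAttempts = 60, int $delaySeconds = 30, ?callable $sleep = null): array
{
    $sleep ??= 'sleep';
    $terminal = ['succeeded', 'failed', 'cancelled'];
    $last = 'unknown';

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();
        $last = $status['status'];

        if (in_array($last, $terminal, true)) {
            return $status;
        }

        // No point in sleeping after the final attempt.
        if ($attempt < $maxAttempts) {
            $sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Job still '{$last}' after {$maxAttempts} checks");
}

// Demo with canned responses instead of real API calls.
$responses = [['status' => 'queued'], ['status' => 'running'], ['status' => 'succeeded']];
$final = waitForJob(
    function () use (&$responses): array { return array_shift($responses); },
    maxAttempts: 5,
    delaySeconds: 0,
    sleep: static function (int $seconds): void {},
);
echo $final['status'], PHP_EOL; // prints "succeeded"
```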

Fine-tuning Open-Source Models

Using Unsloth (Llama, Mistral)

# scripts/finetune_llama.py
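from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load base model (4-bit quantized to reduce GPU memory use)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Setup LoRA: train small adapter matrices instead of the full weights
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Load dataset: Alpaca-style records with "instruction" and "output" keys
# (not the OpenAI chat-format JSONL generated earlier)
dataset = load_dataset("json", data_files="training_data.json", split="train")

# Format each record into a single training string stored in a "text" column.
# The EOS token marks where a response ends, so the model learns to stop.
def format_prompt(example):
    text = f"""### Instruction:
{example['instruction']}

### Response:
{example['output']}""" + tokenizer.eos_token
    return {"text": text}

dataset = dataset.map(format_prompt)

# Training
# Note: these keyword arguments match older TRL releases; newer ones move some
# of them (for example dataset_text_field and max_seq_length) into SFTConfig.
use_bf16 = torch.cuda.is_bf16_supported()

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,
        learning_rate=2e-4,
        fp16=not use_bf16,  # fall back to fp16 on GPUs without bf16 support
        bf16=use_bf16,
        logging_steps=1,
        output_dir="outputs",
        optim="adamw_8bit",
    ),
)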

trainer.train()

# Save model
model.save_pretrained_merged("laravel-llama-3-8b", tokenizer, save_method="merged_16bit")
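
To get a feel for how little LoRA actually trains with the settings above, the adapter size can be worked out by hand. Each adapted weight matrix of shape (in, out) gets two low-rank matrices with r × in and out × r entries, so r × (in + out) trainable parameters. The sketch below does this arithmetic in PHP, assuming the published Llama 3 8B dimensions (hidden size 4096, key/value projection size 1024, MLP size 14336, 32 layers); the function name is made up for the example.

```php
<?php
declare(strict_types=1);

/**
 * Count LoRA trainable parameters: rank * (in + out) per adapted matrix,
 * summed over the modules of one layer and multiplied by the layer count.
 */
function loraTrainableParams(int $rank, array $moduleShapes, int $layers): int
{
    $perLayer = 0;

    foreach ($moduleShapes as [$in, $out]) {
        $perLayer += $rank * ($in + $out);
    }

    return $perLayer * $layers;
}

// Dimensions assumed from the Llama 3 8B configuration.
$hidden = 4096;
$kv = 1024;     // 8 key/value heads * 128 dims (grouped-query attention)
$mlp = 14336;

$shapes = [
    'q_proj' => [$hidden, $hidden],
    'k_proj' => [$hidden, $kv],
    'v_proj' => [$hidden, $kv],
    'o_proj' => [$hidden, $hidden],
    'gate_proj' => [$hidden, $mlp],
    'up_proj' => [$hidden, $mlp],
    'down_proj' => [$mlp, $hidden],
];

$trainable = loraTrainableParams(16, $shapes, 32);
echo number_format($trainable), PHP_EOL; // prints 41,943,040
```

That is roughly 0.5% of the model's approximately 8 billion parameters, which is why this setup fits on a single GPU. Doubling r doubles the adapter size.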

Laravel Integration with Local Model

// app/Services/AI/LocalModelService.php
namespace App\Services\AI;

use Illuminate\Support\Facades\Http;

class LocalModelService
{
    public function __construct(
        private string $endpoint = 'http://localhost:11434/api/generate'
    ) {}
    
    public function generate(string $prompt, array $options = []): string
    {
        $response = Http::timeout(120)->post($this->endpoint, [
            'model' => config('ai.local_model', 'laravel-llama'),
            'prompt' => $this->formatPrompt($prompt),
            'stream' => false,
            'options' => [
                'temperature' => $options['temperature'] ?? 0.7,
                'top_p' => $options['top_p'] ?? 0.9,
                'num_predict' => $options['max_tokens'] ?? 2048,
            ],
        ]);
        
        if ($response->failed()) {
            throw new \RuntimeException('Local model request failed');
        }
        
        return $response->json('response');
    }
    
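    public function stream(string $prompt, callable $callback): void
    {
        $response = Http::withOptions(['stream' => true])
            ->timeout(120)
            ->post($this->endpoint, [
                'model' => config('ai.local_model', 'laravel-llama'),
                'prompt' => $this->formatPrompt($prompt),
                'stream' => true,
            ]);
        
        // Read the PSR-7 body incrementally; body() would buffer the whole reply first
        $body = $response->toPsrResponse()->getBody();
        $buffer = '';
        
        while (!$body->eof()) {
            $buffer .= $body->read(1024);
            
            // The server sends one JSON object per line; handle every complete line
            while (($pos = strpos($buffer, "\n")) !== false) {
                $line = trim(substr($buffer, 0, $pos));
                $buffer = substr($buffer, $pos + 1);
                
                if ($line === '') continue;
                
                $data = json_decode($line, true);
                if (isset($data['response'])) {
                    $callback($data['response'], $data['done'] ?? false);
                }
            }
        }
    }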
    
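    // Keep this template in sync with the one used for training; the training
    // script above uses only the Instruction/Response sections.
    protected function formatPrompt(string $prompt): string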
    {
        return <<<PROMPT
### System:
You are an expert Laravel developer following PSR-12 and Laravel best practices.

### Instruction:
{$prompt}

### Response:
PROMPT;
    }
}
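
Streaming responses from the local endpoint arrive as newline-delimited JSON, and a network chunk can end in the middle of a line. The sketch below isolates the buffering logic in a small class so it can be tested without an HTTP call. The class name NdjsonBuffer is an assumption; the response and done keys are the ones used in the service above.

```php
<?php
declare(strict_types=1);

/** Accumulates raw chunks and yields decoded objects once a full line is available. */
final class NdjsonBuffer
{
    private string $buffer = '';

    /** @return list<array> objects completed by this chunk (possibly none) */
    public function push(string $chunk): array
    {
        $this->buffer .= $chunk;
        $decoded = [];

        while (($pos = strpos($this->buffer, "\n")) !== false) {
            $line = trim(substr($this->buffer, 0, $pos));
            $this->buffer = substr($this->buffer, $pos + 1);

            if ($line === '') {
                continue;
            }

            $data = json_decode($line, true);
            if (is_array($data)) {
                $decoded[] = $data;
            }
        }

        return $decoded;
    }
}

$buffer = new NdjsonBuffer();
$first = $buffer->push('{"response":"Hel');                  // incomplete line, nothing yet
$second = $buffer->push('lo","done":false}' . "\n" . '{"response":"!","done":true}' . "\n");

foreach ($second as $event) {
    echo $event['response'];
}
echo PHP_EOL; // prints "Hello!"
```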

Deploying Fine-tuned Model

API Service

// app/Services/AI/CodeAssistantService.php
namespace App\Services\AI;

use OpenAI\Laravel\Facades\OpenAI;

class CodeAssistantService
{
    private string $model;
    
    public function __construct()
    {
        $this->model = config('ai.fine_tuned_model', 'gpt-4o-mini');
    }
    
    public function generateCode(string $instruction): string
    {
        $response = OpenAI::chat()->create([
            'model' => $this->model,
            'messages' => [
                [
                    'role' => 'system',
                    'content' => $this->getSystemPrompt()
                ],
                [
                    'role' => 'user',
                    'content' => $instruction
                ]
            ],
            'temperature' => 0.3,
            'max_tokens' => 2000,
        ]);
        
        return $response->choices[0]->message->content;
    }
    
    public function reviewCode(string $code): array
    {
        $response = OpenAI::chat()->create([
            'model' => $this->model,
            'messages' => [
                [
                    'role' => 'system',
                    'content' => 'Review PHP/Laravel code and suggest improvements. Return JSON with: issues, suggestions, improved_code.'
                ],
                [
                    'role' => 'user',
                    'content' => "Review this code:\n\n```php\n{$code}\n```"
                ]
            ],
            'response_format' => ['type' => 'json_object'],
        ]);
        
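        // json_decode() returns null on malformed JSON; fall back to an empty array
        // so the declared array return type is always honored
        return json_decode($response->choices[0]->message->content, true) ?? [];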
    }
    
    public function explainCode(string $code): string
    {
        return $this->generateCode("Explain this Laravel code:\n\n```php\n{$code}\n```");
    }
    
    protected function getSystemPrompt(): string
    {
        return <<<PROMPT
You are a Laravel code assistant fine-tuned on high-quality Laravel codebases.

Guidelines:
- Generate clean, readable PHP 8.3+ code
- Follow PSR-12 coding standard
- Use Laravel conventions and best practices
- Include proper type hints and return types
- Add PHPDoc for complex methods
- Handle errors appropriately
- Consider security implications
PROMPT;
    }
}
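
Even with a JSON response format requested, the review payload may be malformed or miss keys, so callers benefit from a normalized shape. A minimal sketch, assuming the three keys named in the review prompt above; parseReview and the extra parse_error flag are illustrative additions.

```php
<?php
declare(strict_types=1);

/**
 * Decode a review response into a predictable structure.
 * Unknown or missing keys fall back to empty defaults.
 */
function parseReview(string $content): array
{
    $defaults = ['issues' => [], 'suggestions' => [], 'improved_code' => null];

    try {
        $data = json_decode($content, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException) {
        return $defaults + ['parse_error' => true];
    }

    if (!is_array($data)) {
        return $defaults + ['parse_error' => true];
    }

    $improved = $data['improved_code'] ?? null;

    return [
        // Cast so a single string still becomes a one-element list.
        'issues' => array_values((array) ($data['issues'] ?? [])),
        'suggestions' => array_values((array) ($data['suggestions'] ?? [])),
        'improved_code' => is_string($improved) ? $improved : null,
        'parse_error' => false,
    ];
}

$review = parseReview('{"issues":["N+1 query in index()"],"suggestions":"Eager load comments"}');
echo count($review['issues']), ' issue(s), ', count($review['suggestions']), ' suggestion(s)', PHP_EOL;
```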

Artisan Command for Code Generation

// app/Console/Commands/AIGenerate.php
namespace App\Console\Commands;

use App\Services\AI\CodeAssistantService;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\File;

class AIGenerate extends Command
{
    protected $signature = 'ai:generate 
                            {type : Type of code (controller, model, migration, etc)}
                            {name : Name of the class/file}
                            {--description= : Additional description}';
    
    protected $description = 'Generate Laravel code using AI';

    public function handle(CodeAssistantService $assistant): int
    {
        $type = $this->argument('type');
        $name = $this->argument('name');
        $description = $this->option('description');
        
        $prompt = $this->buildPrompt($type, $name, $description);
        
        $this->info("Generating {$type}...");
        
        $code = $assistant->generateCode($prompt);
        
        // Extract code from markdown if present
        if (preg_match('/```php\n(.*?)\n```/s', $code, $matches)) {
            $code = $matches[1];
        }
        
        $path = $this->getPath($type, $name);
        
        if (File::exists($path)) {
            if (!$this->confirm("File exists. Overwrite?")) {
                return self::FAILURE;
            }
        }
        
        File::ensureDirectoryExists(dirname($path));
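        // Strip a leading "<?php" as a literal prefix. ltrim() takes a character list,
        // so it would also eat leading "p" or "h" characters of the code itself.
        File::put($path, "<?php\n\n" . preg_replace('/^\s*<\?php\s*/', '', $code));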
        
        $this->info("Generated: {$path}");
        
        return self::SUCCESS;
    }
    
    protected function buildPrompt(string $type, string $name, ?string $description): string
    {
        $prompts = [
            'controller' => "Create a Laravel resource controller named {$name}Controller",
            'model' => "Create a Laravel Eloquent model named {$name}",
            'migration' => "Create a Laravel migration for {$name} table",
            'request' => "Create a Laravel Form Request named {$name}Request",
            'service' => "Create a Laravel service class named {$name}Service",
            'action' => "Create a Laravel Action class named {$name}",
            'job' => "Create a Laravel Job named {$name}",
            'event' => "Create a Laravel Event named {$name}",
            'listener' => "Create a Laravel Listener named {$name}",
        ];
        
        $prompt = $prompts[$type] ?? "Create a Laravel {$type} named {$name}";
        
        if ($description) {
            $prompt .= ". Description: {$description}";
        }
        
        return $prompt;
    }
    
    protected function getPath(string $type, string $name): string
    {
        $paths = [
            'controller' => app_path("Http/Controllers/{$name}Controller.php"),
            'model' => app_path("Models/{$name}.php"),
            'migration' => database_path('migrations/' . date('Y_m_d_His') . "_create_{$name}_table.php"),
            'request' => app_path("Http/Requests/{$name}Request.php"),
            'service' => app_path("Services/{$name}Service.php"),
            'action' => app_path("Actions/{$name}.php"),
            'job' => app_path("Jobs/{$name}.php"),
            'event' => app_path("Events/{$name}.php"),
            'listener' => app_path("Listeners/{$name}.php"),
        ];
        
        return $paths[$type] ?? app_path("{$name}.php");
    }
}
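
Model output often wraps code in a markdown fence and may or may not include the opening PHP tag. PHP's ltrim() takes a character list rather than a prefix, so it is not a safe way to remove that tag. The sketch below shows one way to normalize the output; extractPhpCode is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Pull PHP source out of a model reply and return it with exactly one opening tag.
 */
function extractPhpCode(string $raw): string
{
    // Prefer the first fenced block; `{3} matches a run of three backticks.
    if (preg_match('/`{3}(?:php)?[ \t]*\R(.*?)\R`{3}/s', $raw, $m)) {
        $raw = $m[1];
    }

    // Remove one leading opening tag as a literal prefix, not as a character set.
    $code = preg_replace('/^\s*<\?php\b\s*/', '', $raw, 1);

    return "<?php\n\n" . rtrim($code) . "\n";
}

$fence = str_repeat('`', 3);
$reply = "Here you go:\n{$fence}php\n<?php\n\nphpinfo();\n{$fence}\nDone.";

echo extractPhpCode($reply);
```

With this input, a character-list trim would reduce the body to "info();" because the letters of "phpinfo" also appear in the tag. The regex version keeps the call intact.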

Evaluation and Monitoring

Quality Metrics

// app/Services/AI/ModelEvaluator.php
namespace App\Services\AI;

use Illuminate\Support\Facades\Process;

class ModelEvaluator
{
    public function evaluateGeneration(string $generated, string $expected): array
    {
        return [
            'syntax_valid' => $this->checkSyntax($generated),
            'similarity' => $this->calculateSimilarity($generated, $expected),
            'follows_conventions' => $this->checkConventions($generated),
            'passes_phpstan' => $this->runPhpStan($generated),
        ];
    }
    
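    protected function checkSyntax(string $code): bool
    {
        // Prepend the opening tag only when the generated code lacks one
        if (!str_starts_with(ltrim($code), '<?php')) {
            $code = "<?php\n" . $code;
        }
        
        $tempFile = tempnam(sys_get_temp_dir(), 'php_');
        file_put_contents($tempFile, $code);
        
        // Escape the path in case the temp directory contains spaces
        $result = Process::run('php -l ' . escapeshellarg($tempFile));
        
        unlink($tempFile);
        
        return $result->successful();
    }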
    
    protected function checkConventions(string $code): array
    {
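        $checks = [
            'has_strict_types' => str_contains($code, 'declare(strict_types=1)'),
            // preg_match() returns 1, 0, or false, so compare to get real booleans
            'has_type_hints' => preg_match('/function \w+\([^)]*\w+ \$/', $code) === 1,
            'has_return_type' => preg_match('/\): \??\w+/', $code) === 1,
            // runPint() and runPhpStan() are helper methods not shown here
            'follows_psr12' => $this->runPint($code),
        ];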
        
        return $checks;
    }
    
    protected function calculateSimilarity(string $a, string $b): float
    {
        // Normalize code
        $a = preg_replace('/\s+/', ' ', $a);
        $b = preg_replace('/\s+/', ' ', $b);
        
        similar_text($a, $b, $percent);
        
        return $percent / 100;
    }
}
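
A single evaluation says little on its own; the per-example results are more useful when aggregated over a held-out set. The sketch below assumes each result has the keys returned by evaluateGeneration() above; the function name summarizeEvaluations is made up for the example.

```php
<?php
declare(strict_types=1);

/** Aggregate per-example evaluation results into pass rates and a mean similarity. */
function summarizeEvaluations(array $results): array
{
    $count = count($results);

    if ($count === 0) {
        return ['count' => 0, 'syntax_pass_rate' => 0.0, 'phpstan_pass_rate' => 0.0, 'mean_similarity' => 0.0];
    }

    // Share of results where the given boolean check passed.
    $passRate = static fn (string $key): float =>
        count(array_filter($results, static fn (array $r): bool => (bool) ($r[$key] ?? false))) / $count;

    return [
        'count' => $count,
        'syntax_pass_rate' => $passRate('syntax_valid'),
        'phpstan_pass_rate' => $passRate('passes_phpstan'),
        'mean_similarity' => array_sum(array_column($results, 'similarity')) / $count,
    ];
}

$summary = summarizeEvaluations([
    ['syntax_valid' => true,  'passes_phpstan' => true,  'similarity' => 0.5],
    ['syntax_valid' => true,  'passes_phpstan' => false, 'similarity' => 0.5],
    ['syntax_valid' => true,  'passes_phpstan' => true,  'similarity' => 1.0],
    ['syntax_valid' => false, 'passes_phpstan' => false, 'similarity' => 0.0],
]);

echo $summary['syntax_pass_rate'], PHP_EOL; // prints 0.75
```

Comparing these numbers for the base model and the fine-tuned model on the same validation set gives a first indication of whether the fine-tune helped.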

Logging and Analytics

// app/Services/AI/UsageLogger.php
namespace App\Services\AI;

use Illuminate\Support\Facades\DB;

class UsageLogger
{
    public function log(string $model, string $prompt, string $response, array $metadata = []): void
    {
        DB::table('ai_usage_logs')->insert([
            'model' => $model,
            'prompt_tokens' => $metadata['prompt_tokens'] ?? 0,
            'completion_tokens' => $metadata['completion_tokens'] ?? 0,
            'total_tokens' => $metadata['total_tokens'] ?? 0,
            'latency_ms' => $metadata['latency_ms'] ?? 0,
            'prompt_hash' => md5($prompt),
            'response_quality' => $metadata['quality_score'] ?? null,
            'created_at' => now(),
        ]);
    }
    
    public function getStats(string $model, string $period = 'day'): array
    {
        // first() returns an object (or null); cast so the array return type is honored
        return (array) DB::table('ai_usage_logs')
            ->where('model', $model)
            ->where('created_at', '>=', now()->sub($period, 1))
            ->selectRaw('
                COUNT(*) as requests,
                SUM(total_tokens) as total_tokens,
                AVG(latency_ms) as avg_latency,
                AVG(response_quality) as avg_quality
            ')
            ->first();
    }
}

Conclusion

Fine-tuning LLMs for PHP/Laravel brings several benefits:

  • Higher code quality: the model learns from your team's best practices
  • Consistency: output matches your coding standards
  • Domain knowledge: the model picks up your business logic and terminology
  • Cost efficiency: a smaller fine-tuned model can be competitive with a larger general-purpose one

The workflow, step by step:

  1. Collect - Gather high-quality code from your codebase
  2. Curate - Select and format the data
  3. Augment - Expand the dataset with variations
  4. Train - Fine-tune with appropriate hyperparameters
  5. Evaluate - Assess on a held-out test set
  6. Deploy - Integrate into your workflow
  7. Monitor - Track quality and iterate
