Fine-tuning AI Models for PHP/Laravel: Building Your Own Coding Assistant
Introduction
LLMs like GPT, Claude, or open-source models are trained on diverse programming languages. But what if you want a model specialized in PHP/Laravel with your team's coding style? That's where fine-tuning comes in.
Why Fine-tune?
| Factor | Base Model | Fine-tuned Model |
|---|---|---|
| Laravel best practices | Basic knowledge | Deep expertise |
| Team conventions | Unknown | Follows them |
| Domain knowledge | Generic | Specific |
| Response format | Varied | Consistent |
| Inference cost | High (large models) | Lower (smaller models) |
When to Fine-tune?
Fine-tune when you:
- Need the model to follow specific coding standards
- Have specialized domain knowledge the base model lacks
- Want to reduce latency or cost by using a smaller model
- Need a consistent output format
Don't fine-tune when:
- You only need a knowledge update (use RAG instead)
- Your dataset is too small (fewer than about 1,000 examples)
- Your requirements change frequently
Preparing the Dataset
Data Sources
// app/Services/DatasetPreparation/DatasetCollector.php
namespace App\Services\DatasetPreparation;
class DatasetCollector
{
public function collectFromCodebase(string $path): array
{
$examples = [];
// 1. Collect from code comments/docblocks
$examples = array_merge($examples, $this->extractDocblocks($path));
// 2. Collect from tests (input/output pairs)
$examples = array_merge($examples, $this->extractFromTests($path));
// 3. Collect from git commits
$examples = array_merge($examples, $this->extractFromCommits($path));
return $examples;
}
protected function extractDocblocks(string $path): array
{
$finder = new \Symfony\Component\Finder\Finder();
$finder->files()->in($path)->name('*.php');
$examples = [];
foreach ($finder as $file) {
$content = $file->getContents();
// Match docblocks with a regex (a real parser such as nikic/PHP-Parser would be more robust)
preg_match_all('/\/\*\*(.*?)\*\/\s*(public|protected|private)?\s*function\s+(\w+)/s',
$content, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$docblock = $this->parseDocblock($match[1]);
$functionName = $match[3];
if ($docblock['description']) {
$examples[] = [
'instruction' => "Write a PHP function named {$functionName}: {$docblock['description']}",
'output' => $this->extractFunction($content, $functionName)
];
}
}
}
return $examples;
}
}
Data Format for Fine-tuning
OpenAI format (JSONL):
{"messages": [{"role": "system", "content": "You are a Laravel expert..."}, {"role": "user", "content": "Create a migration for users table"}, {"role": "assistant", "content": "<?php\n\nuse Illuminate\\Database\\Migrations..."}]}
{"messages": [{"role": "system", "content": "You are a Laravel expert..."}, {"role": "user", "content": "Write a controller for CRUD posts"}, {"role": "assistant", "content": "<?php\n\nnamespace App\\Http\\Controllers..."}]}
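Before uploading, it can help to sanity-check each JSONL line. The sketch below is a minimal, hypothetical validator: the function name `validateChatLine` and its rules are assumptions for illustration, not part of any SDK, and it only checks the structure shown above (a `messages` array whose last entry comes from the assistant).

```php
<?php

declare(strict_types=1);

/**
 * Validate one line of a chat-format JSONL training file.
 * Returns a list of problems; an empty list means the line looks usable.
 * These rules are an illustrative minimum, not the provider's full validation.
 */
function validateChatLine(string $line): array
{
    $record = json_decode($line, true);
    if (!is_array($record) || !isset($record['messages']) || !is_array($record['messages'])) {
        return ['line is not a JSON object with a "messages" array'];
    }

    $problems = [];
    // Assumption: only the three roles used in this article are expected.
    $allowedRoles = ['system', 'user', 'assistant'];

    foreach ($record['messages'] as $i => $message) {
        $role = $message['role'] ?? null;
        $content = $message['content'] ?? null;

        if (!in_array($role, $allowedRoles, true)) {
            $problems[] = "message {$i}: unexpected role";
        }
        if (!is_string($content) || trim($content) === '') {
            $problems[] = "message {$i}: empty content";
        }
    }

    // The final message is the completion the model should learn to produce.
    $last = end($record['messages']);
    if (($last['role'] ?? null) !== 'assistant') {
        $problems[] = 'last message should come from the assistant';
    }

    return $problems;
}
```

Running it over every line of the file and printing the line number next to each problem is usually enough to catch truncated or malformed records early.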
Alpaca format (for open-source models):
{
"instruction": "Create a Laravel migration for a posts table with title, content, and user_id fields",
"input": "",
"output": "<?php\n\nuse Illuminate\\Database\\Migrations\\Migration;\nuse Illuminate\\Database\\Schema\\Blueprint;\nuse Illuminate\\Support\\Facades\\Schema;\n\nreturn new class extends Migration\n{\n public function up(): void\n {\n Schema::create('posts', function (Blueprint $table) {\n $table->id();\n $table->foreignId('user_id')->constrained()->cascadeOnDelete();\n $table->string('title');\n $table->text('content');\n $table->timestamps();\n });\n }\n\n public function down(): void\n {\n Schema::dropIfExists('posts');\n }\n};"
}
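If the same examples need to feed both pipelines, the two formats can be converted mechanically. The following is a small sketch under stated assumptions: `alpacaToChat` and `toJsonlLine` are hypothetical helper names, and appending a non-empty `input` to the instruction is one possible convention rather than a fixed rule.

```php
<?php

declare(strict_types=1);

/**
 * Convert one Alpaca-style record into a chat-format record.
 * A non-empty "input" field is appended to the instruction as extra context.
 */
function alpacaToChat(array $example, string $systemPrompt): array
{
    $userContent = $example['instruction'];
    if (($example['input'] ?? '') !== '') {
        $userContent .= "\n\n" . $example['input'];
    }

    return [
        'messages' => [
            ['role' => 'system', 'content' => $systemPrompt],
            ['role' => 'user', 'content' => $userContent],
            ['role' => 'assistant', 'content' => $example['output']],
        ],
    ];
}

/** Encode a record as a single JSONL line (one JSON object, newline-terminated). */
function toJsonlLine(array $record): string
{
    $flags = JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE;

    return json_encode($record, $flags) . "\n";
}
```

Because `json_encode` escapes newlines inside strings, each record stays on one physical line even when the generated PHP spans many lines.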
Dataset Generation Tool
// app/Console/Commands/GenerateTrainingDataset.php
namespace App\Console\Commands;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\File;
class GenerateTrainingDataset extends Command
{
protected $signature = 'ai:generate-dataset {--source=} {--output=training_data.jsonl}';
protected $description = 'Generate training dataset from Laravel codebase';
public function handle(): int
{
$source = $this->option('source') ?? base_path();
$output = $this->option('output');
$this->info('Collecting examples from codebase...');
$examples = [];
// 1. Controllers → CRUD operations
$examples = array_merge($examples, $this->collectControllerExamples($source));
// 2. Models → Eloquent patterns
$examples = array_merge($examples, $this->collectModelExamples($source));
// 3. Tests → Expected behavior
$examples = array_merge($examples, $this->collectTestExamples($source));
// 4. Blade templates → View patterns
$examples = array_merge($examples, $this->collectBladeExamples($source));
// Convert to JSONL
$this->info('Converting to JSONL format...');
$jsonl = '';
foreach ($examples as $example) {
$jsonl .= json_encode($this->formatForOpenAI($example)) . "\n";
}
File::put(storage_path($output), $jsonl);
$this->info("Generated " . count($examples) . " examples to {$output}");
return self::SUCCESS;
}
protected function formatForOpenAI(array $example): array
{
return [
'messages' => [
[
'role' => 'system',
'content' => $this->getSystemPrompt()
],
[
'role' => 'user',
'content' => $example['instruction']
],
[
'role' => 'assistant',
'content' => $example['output']
]
]
];
}
protected function getSystemPrompt(): string
{
return <<<PROMPT
You are an expert Laravel developer. Follow these conventions:
- Target PHP 8.3+ and use modern features (readonly properties, typed properties, enums)
- Follow PSR-12 coding standard
- Use strict typing (declare(strict_types=1))
- Prefer dependency injection over facades
- Use descriptive variable and method names
- Write comprehensive PHPDoc when needed
- Follow Laravel best practices and conventions
PROMPT;
}
protected function collectControllerExamples(string $path): array
{
$examples = [];
$controllerPath = $path . '/app/Http/Controllers';
if (!is_dir($controllerPath)) {
return $examples;
}
foreach (File::allFiles($controllerPath) as $file) {
$content = $file->getContents();
$className = pathinfo($file->getFilename(), PATHINFO_FILENAME);
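// Extract public methods with their docblocks. Group 3 matches the body
// recursively, so nested braces (if/foreach blocks) do not cut the method short.
preg_match_all(
'/\/\*\*(.*?)\*\/\s*(public\s+function\s+\w+\([^)]*\)[^{]*(\{(?:[^{}]++|(?3))*\}))/s',
$content,
$matches,
PREG_SET_ORDER
);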
foreach ($matches as $match) {
$docblock = trim($match[1]);
$method = $match[2];
// Parse purpose from docblock
if (preg_match('/@purpose\s+(.+)/', $docblock, $purposeMatch)) {
$examples[] = [
'instruction' => "In Laravel, " . trim($purposeMatch[1]),
'output' => $method
];
}
}
}
return $examples;
}
}
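Regular expressions can misjudge where a method ends when the body contains nested blocks or braces inside strings. As an alternative sketch, PHP's built-in tokenizer (`PhpToken::tokenize`, PHP 8.0+) can locate a method and count braces token by token. The helper name `extractMethodSource` is hypothetical, and the returned source starts at the `function` keyword, so visibility modifiers and docblocks are not included.

```php
<?php

declare(strict_types=1);

/**
 * Return the source of the first function or method named $methodName,
 * starting at the `function` keyword, or null if it is absent or has no body.
 */
function extractMethodSource(string $source, string $methodName): ?string
{
    $tokens = PhpToken::tokenize($source);
    $count = count($tokens);
    // Token ids that open a brace pair; "{$x}" and "${x}" inside strings also close with "}".
    $openers = [ord('{'), T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES];

    for ($i = 0; $i < $count; $i++) {
        if ($tokens[$i]->id !== T_FUNCTION) {
            continue;
        }

        // The next meaningful token must be the requested name (closures have "(" here).
        $j = $i + 1;
        while ($j < $count && $tokens[$j]->isIgnorable()) {
            $j++;
        }
        if ($j >= $count || $tokens[$j]->text !== $methodName) {
            continue;
        }

        $depth = 0;
        $code = '';
        for ($k = $i; $k < $count; $k++) {
            $token = $tokens[$k];
            $code .= $token->text;

            if (in_array($token->id, $openers, true)) {
                $depth++;
            } elseif ($token->id === ord('}')) {
                $depth--;
                if ($depth === 0) {
                    return $code; // matching close of the method body
                }
            } elseif ($depth === 0 && $token->id === ord(';')) {
                break; // abstract or interface method: no body to extract
            }
        }
    }

    return null;
}
```

Comparing token ids rather than raw text matters here: a brace inside a quoted string is part of a string token and is therefore never counted.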
Data Augmentation
Enhance the dataset by creating variations of each example:
// app/Services/DatasetPreparation/DataAugmenter.php
namespace App\Services\DatasetPreparation;
class DataAugmenter
{
public function augment(array $example): array
{
$augmented = [$example];
// 1. Rephrase instruction
$augmented[] = [
'instruction' => $this->rephrase($example['instruction']),
'output' => $example['output']
];
// 2. Add context variations
$augmented[] = [
'instruction' => "As a Laravel developer, " . lcfirst($example['instruction']),
'output' => $example['output']
];
// 3. Add error scenario
if (str_contains($example['output'], 'function')) {
$augmented[] = [
'instruction' => "Fix this Laravel code: " . $this->introduceError($example['output']),
'output' => $example['output']
];
}
return $augmented;
}
protected function rephrase(string $instruction): string
{
$replacements = [
'Create' => ['Write', 'Generate', 'Implement', 'Build'],
'function' => ['method', 'function'],
'controller' => ['controller class', 'HTTP controller'],
];
foreach ($replacements as $original => $alternatives) {
if (str_contains($instruction, $original)) {
$instruction = str_replace(
$original,
$alternatives[array_rand($alternatives)],
$instruction
);
break;
}
}
return $instruction;
}
}
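One caveat with augmentation is that every variant shares the same output, so splitting into training and validation sets after augmenting can leak near-identical examples across the split. A minimal sketch of one way to avoid this is to deduplicate and split first, then augment only the training portion. The function name `dedupeAndSplit`, the 10% default ratio, and the seed are all assumptions for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Remove duplicate examples (ignoring whitespace differences), then split
 * the remainder into train and validation sets.
 *
 * @param array<int, array{instruction: string, output: string}> $examples
 * @return array{train: array, validation: array}
 */
function dedupeAndSplit(array $examples, float $validationRatio = 0.1, int $seed = 42): array
{
    $unique = [];
    foreach ($examples as $example) {
        // Collapse whitespace so formatting-only differences count as duplicates.
        $normalized = preg_replace('/\s+/', ' ', trim($example['instruction'] . "\n" . $example['output']));
        $unique[sha1($normalized)] ??= $example;
    }
    $unique = array_values($unique);

    // A seeded shuffle keeps the split reproducible on the same PHP version.
    mt_srand($seed);
    shuffle($unique);

    $validationCount = (int) round(count($unique) * $validationRatio);

    return [
        'train' => array_slice($unique, $validationCount),
        'validation' => array_slice($unique, 0, $validationCount),
    ];
}
```

The `train` portion can then be passed through the augmenter, while `validation` stays untouched for evaluation.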
Fine-tuning with OpenAI
Upload Dataset
// app/Services/FineTuning/OpenAIFineTuner.php
namespace App\Services\FineTuning;
use OpenAI\Laravel\Facades\OpenAI;
use Illuminate\Support\Facades\Storage;
class OpenAIFineTuner
{
public function uploadDataset(string $filePath): string
{
$response = OpenAI::files()->upload([
'purpose' => 'fine-tune',
'file' => fopen($filePath, 'r'),
]);
return $response->id;
}
public function createFineTuneJob(string $fileId, array $options = []): string
{
$response = OpenAI::fineTuning()->createJob([
'training_file' => $fileId,
'model' => $options['base_model'] ?? 'gpt-4o-mini-2024-07-18',
'hyperparameters' => [
'n_epochs' => $options['epochs'] ?? 3,
'batch_size' => $options['batch_size'] ?? 'auto',
'learning_rate_multiplier' => $options['learning_rate'] ?? 'auto',
],
'suffix' => $options['suffix'] ?? 'laravel-assistant',
]);
return $response->id;
}
public function checkJobStatus(string $jobId): array
{
$job = OpenAI::fineTuning()->retrieveJob($jobId);
return [
'status' => $job->status,
'model' => $job->fineTunedModel,
'trained_tokens' => $job->trainedTokens,
'error' => $job->error?->message,
];
}
public function listJobs(): array
{
$response = OpenAI::fineTuning()->listJobs(['limit' => 10]);
return array_map(fn($job) => [
'id' => $job->id,
'status' => $job->status,
'model' => $job->fineTunedModel,
'created_at' => $job->createdAt,
], $response->data);
}
}
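Fine-tuning jobs run asynchronously, so a caller typically polls until the job reaches a terminal status. The sketch below shows one hypothetical polling helper; `waitForJob` is an illustrative name, it takes any callable that returns an array with a `status` key (for example a closure wrapping `checkJobStatus`), and the attempt count and delay are arbitrary defaults.

```php
<?php

declare(strict_types=1);

/**
 * Poll a status callback until the job finishes or attempts run out.
 *
 * @param callable(): array $fetchStatus returns an array with a "status" key
 * @param callable(int): mixed|null $sleep injectable for tests; defaults to sleep()
 */
function waitForJob(
    callable $fetchStatus,
    int $maxAttempts = 60,
    int $delaySeconds = 30,
    ?callable $sleep = null
): array {
    $sleep ??= 'sleep';
    // Statuses after which the job will not change any more.
    $terminal = ['succeeded', 'failed', 'cancelled'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();

        if (in_array($status['status'] ?? null, $terminal, true)) {
            return $status;
        }

        // Do not sleep after the final attempt.
        if ($attempt < $maxAttempts) {
            $sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Job did not finish after {$maxAttempts} attempts");
}
```

Inside an Artisan command this could be called as `waitForJob(fn () => $fineTuner->checkJobStatus($jobId))`, although for long-running jobs a scheduled status check is often a better fit than a blocking loop.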
Artisan Commands
// app/Console/Commands/FineTuneModel.php
namespace App\Console\Commands;
use App\Services\FineTuning\OpenAIFineTuner;
use Illuminate\Console\Command;
class FineTuneModel extends Command
{
protected $signature = 'ai:fine-tune
{--dataset= : Path to training dataset}
{--model=gpt-4o-mini-2024-07-18 : Base model}
{--epochs=3 : Number of training epochs}
{--suffix=laravel : Model suffix}';
protected $description = 'Start a fine-tuning job for Laravel code assistant';
public function handle(OpenAIFineTuner $fineTuner): int
{
$datasetPath = $this->option('dataset') ?? storage_path('training_data.jsonl');
if (!file_exists($datasetPath)) {
$this->error("Dataset not found: {$datasetPath}");
return self::FAILURE;
}
$this->info('Uploading dataset...');
$fileId = $fineTuner->uploadDataset($datasetPath);
$this->info("File uploaded: {$fileId}");
$this->info('Creating fine-tune job...');
$jobId = $fineTuner->createFineTuneJob($fileId, [
'base_model' => $this->option('model'),
'epochs' => (int) $this->option('epochs'),
'suffix' => $this->option('suffix'),
]);
$this->info("Fine-tune job created: {$jobId}");
$this->info("Monitor with: php artisan ai:fine-tune-status {$jobId}");
return self::SUCCESS;
}
}
// app/Console/Commands/FineTuneStatus.php
namespace App\Console\Commands;
use App\Services\FineTuning\OpenAIFineTuner;
use Illuminate\Console\Command;
class FineTuneStatus extends Command
{
protected $signature = 'ai:fine-tune-status {job?}';
protected $description = 'Check fine-tuning job status';
public function handle(OpenAIFineTuner $fineTuner): int
{
$jobId = $this->argument('job');
if ($jobId) {
$status = $fineTuner->checkJobStatus($jobId);
$this->table(
['Property', 'Value'],
collect($status)->map(fn($v, $k) => [$k, $v ?? 'N/A'])->toArray()
);
if ($status['status'] === 'succeeded') {
$this->info("Model ready: {$status['model']}");
$this->info("Add to .env: OPENAI_FINE_TUNED_MODEL={$status['model']}");
}
} else {
$jobs = $fineTuner->listJobs();
$this->table(
['ID', 'Status', 'Model', 'Created'],
$jobs
);
}
return self::SUCCESS;
}
}
Fine-tuning Open-Source Models
Using Unsloth (Llama, Mistral)
# scripts/finetune_llama.py
from unsloth import FastLanguageModel, is_bfloat16_supported
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
# Load base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Setup LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
# Load dataset (Alpaca-style records with "instruction" and "output" fields)
dataset = load_dataset("json", data_files="training_data.json", split="train")
# Format each record into a single "text" field.
# Appending the EOS token teaches the model where a response ends.
def format_prompt(example):
    text = f"""### Instruction:
{example['instruction']}
### Response:
{example['output']}"""
    return {"text": text + tokenizer.eos_token}
dataset = dataset.map(format_prompt)
# Training (these arguments follow the older TRL SFTTrainer API;
# newer TRL releases move several of them into SFTConfig)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,
        learning_rate=2e-4,
        # Use bf16 where the GPU supports it, otherwise fall back to fp16
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        output_dir="outputs",
        optim="adamw_8bit",
    ),
)
trainer.train()
# Save model
model.save_pretrained_merged("laravel-llama-3-8b", tokenizer, save_method="merged_16bit")
Laravel Integration with Local Model
// app/Services/AI/LocalModelService.php
namespace App\Services\AI;
use Illuminate\Support\Facades\Http;
class LocalModelService
{
public function __construct(
private string $endpoint = 'http://localhost:11434/api/generate'
) {}
public function generate(string $prompt, array $options = []): string
{
$response = Http::timeout(120)->post($this->endpoint, [
'model' => config('ai.local_model', 'laravel-llama'),
'prompt' => $this->formatPrompt($prompt),
'stream' => false,
'options' => [
'temperature' => $options['temperature'] ?? 0.7,
'top_p' => $options['top_p'] ?? 0.9,
'num_predict' => $options['max_tokens'] ?? 2048,
],
]);
if ($response->failed()) {
throw new \RuntimeException('Local model request failed');
}
return $response->json('response');
}
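public function stream(string $prompt, callable $callback): void
{
$response = Http::withOptions(['stream' => true])
->timeout(120)
->post($this->endpoint, [
'model' => config('ai.local_model', 'laravel-llama'),
'prompt' => $this->formatPrompt($prompt),
'stream' => true,
]);
if ($response->failed()) {
throw new \RuntimeException('Local model request failed');
}
// Read the body incrementally; calling body() would buffer the whole response first
$body = $response->toPsrResponse()->getBody();
$buffer = '';
while (!$body->eof()) {
$buffer .= $body->read(1024);
// Each complete line is one JSON object
while (($pos = strpos($buffer, "\n")) !== false) {
$line = trim(substr($buffer, 0, $pos));
$buffer = substr($buffer, $pos + 1);
if ($line === '') continue;
$data = json_decode($line, true);
if (isset($data['response'])) {
$callback($data['response'], $data['done'] ?? false);
}
}
}
}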
protected function formatPrompt(string $prompt): string
{
// Keep this template identical to the one used in the training script:
// the model was only trained on "### Instruction:" / "### Response:" sections.
return <<<PROMPT
### Instruction:
{$prompt}
### Response:
PROMPT;
}
}
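Streaming responses from the local endpoint arrive as newline-delimited JSON, and a network chunk can end in the middle of a line. The following sketch isolates that buffering concern in a small class so it can be tested without an HTTP call. `NdjsonBuffer` is a hypothetical name, not part of Laravel or Ollama.

```php
<?php

declare(strict_types=1);

/**
 * Accumulates raw chunks and yields decoded JSON objects once a full line is available.
 */
final class NdjsonBuffer
{
    private string $buffer = '';

    /**
     * Add a chunk and return every object completed by it.
     *
     * @return array<int, array<string, mixed>>
     */
    public function push(string $chunk): array
    {
        $this->buffer .= $chunk;
        $decoded = [];

        // Consume complete lines; anything after the last newline stays buffered.
        while (($pos = strpos($this->buffer, "\n")) !== false) {
            $line = trim(substr($this->buffer, 0, $pos));
            $this->buffer = substr($this->buffer, $pos + 1);

            if ($line === '') {
                continue;
            }

            $data = json_decode($line, true);
            if (is_array($data)) {
                $decoded[] = $data;
            }
        }

        return $decoded;
    }
}
```

A streaming reader would call `push()` with each chunk it reads and forward the `response` field of every returned object to its callback.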
Deploying Fine-tuned Model
API Service
// app/Services/AI/CodeAssistantService.php
namespace App\Services\AI;
use OpenAI\Laravel\Facades\OpenAI;
class CodeAssistantService
{
private string $model;
public function __construct()
{
$this->model = config('ai.fine_tuned_model', 'gpt-4o-mini');
}
public function generateCode(string $instruction): string
{
$response = OpenAI::chat()->create([
'model' => $this->model,
'messages' => [
[
'role' => 'system',
'content' => $this->getSystemPrompt()
],
[
'role' => 'user',
'content' => $instruction
]
],
'temperature' => 0.3,
'max_tokens' => 2000,
]);
return $response->choices[0]->message->content;
}
public function reviewCode(string $code): array
{
$response = OpenAI::chat()->create([
'model' => $this->model,
'messages' => [
[
'role' => 'system',
'content' => 'Review PHP/Laravel code and suggest improvements. Return JSON with: issues, suggestions, improved_code.'
],
[
'role' => 'user',
'content' => "Review this code:\n\n```php\n{$code}\n```"
]
],
'response_format' => ['type' => 'json_object'],
]);
// Fall back to an empty array if the model returns invalid JSON
return json_decode($response->choices[0]->message->content, true) ?? [];
}
public function explainCode(string $code): string
{
return $this->generateCode("Explain this Laravel code:\n\n```php\n{$code}\n```");
}
protected function getSystemPrompt(): string
{
return <<<PROMPT
You are a Laravel code assistant fine-tuned on high-quality Laravel codebases.
Guidelines:
- Generate clean, readable PHP 8.3+ code
- Follow PSR-12 coding standard
- Use Laravel conventions and best practices
- Include proper type hints and return types
- Add PHPDoc for complex methods
- Handle errors appropriately
- Consider security implications
PROMPT;
}
}
Artisan Command for Code Generation
// app/Console/Commands/AIGenerate.php
namespace App\Console\Commands;
use App\Services\AI\CodeAssistantService;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\File;
class AIGenerate extends Command
{
protected $signature = 'ai:generate
{type : Type of code (controller, model, migration, etc)}
{name : Name of the class/file}
{--description= : Additional description}';
protected $description = 'Generate Laravel code using AI';
public function handle(CodeAssistantService $assistant): int
{
$type = $this->argument('type');
$name = $this->argument('name');
$description = $this->option('description');
$prompt = $this->buildPrompt($type, $name, $description);
$this->info("Generating {$type}...");
$code = $assistant->generateCode($prompt);
// Extract code from markdown if present
if (preg_match('/```php\n(.*?)\n```/s', $code, $matches)) {
$code = $matches[1];
}
$path = $this->getPath($type, $name);
if (File::exists($path)) {
if (!$this->confirm("File exists. Overwrite?")) {
return self::FAILURE;
}
}
File::ensureDirectoryExists(dirname($path));
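// ltrim() treats its second argument as a character mask, so strip the open tag with a regex instead
File::put($path, "<?php\n\n" . preg_replace('/^\s*<\?php\s*/', '', $code));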
$this->info("Generated: {$path}");
return self::SUCCESS;
}
protected function buildPrompt(string $type, string $name, ?string $description): string
{
$prompts = [
'controller' => "Create a Laravel resource controller named {$name}Controller",
'model' => "Create a Laravel Eloquent model named {$name}",
'migration' => "Create a Laravel migration for {$name} table",
'request' => "Create a Laravel Form Request named {$name}Request",
'service' => "Create a Laravel service class named {$name}Service",
'action' => "Create a Laravel Action class named {$name}",
'job' => "Create a Laravel Job named {$name}",
'event' => "Create a Laravel Event named {$name}",
'listener' => "Create a Laravel Listener named {$name}",
];
$prompt = $prompts[$type] ?? "Create a Laravel {$type} named {$name}";
if ($description) {
$prompt .= ". Description: {$description}";
}
return $prompt;
}
protected function getPath(string $type, string $name): string
{
$paths = [
'controller' => app_path("Http/Controllers/{$name}Controller.php"),
'model' => app_path("Models/{$name}.php"),
'migration' => database_path('migrations/' . date('Y_m_d_His') . "_create_{$name}_table.php"),
'request' => app_path("Http/Requests/{$name}Request.php"),
'service' => app_path("Services/{$name}Service.php"),
'action' => app_path("Actions/{$name}.php"),
'job' => app_path("Jobs/{$name}.php"),
'event' => app_path("Events/{$name}.php"),
'listener' => app_path("Listeners/{$name}.php"),
];
return $paths[$type] ?? app_path("{$name}.php");
}
}
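Model output may arrive wrapped in a Markdown fence, with or without a leading `<?php` tag, so it is worth normalizing before writing it to disk. The helper below is a minimal sketch; `normalizeGeneratedPhp` is a hypothetical name. Note that `ltrim($code, "<?php\n")` is not suitable for this job, because the second argument is a set of characters to strip rather than a prefix.

```php
<?php

declare(strict_types=1);

/**
 * Return generated code with exactly one leading "<?php" tag and a trailing newline.
 */
function normalizeGeneratedPhp(string $raw): string
{
    // Build the fence marker at runtime so this example does not contain a literal fence.
    $fence = str_repeat('`', 3);

    // Prefer the contents of the first fenced block if the model answered in Markdown.
    $pattern = '/' . $fence . '(?:php)?[ \t]*\n(.*?)\n' . $fence . '/s';
    if (preg_match($pattern, $raw, $match) === 1) {
        $raw = $match[1];
    }

    // Remove a leading open tag as an exact prefix, not as a character mask.
    $body = preg_replace('/^\s*<\?php\b\s*/', '', $raw, 1) ?? $raw;

    return "<?php\n\n" . rtrim($body) . "\n";
}
```

With a character mask, code that begins with `public` after the tag would lose its leading `p`; the prefix-based regex leaves it intact.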
Evaluation and Monitoring
Quality Metrics
// app/Services/AI/ModelEvaluator.php
namespace App\Services\AI;
use Illuminate\Support\Facades\Process;
class ModelEvaluator
{
public function evaluateGeneration(string $generated, string $expected): array
{
return [
'syntax_valid' => $this->checkSyntax($generated),
'similarity' => $this->calculateSimilarity($generated, $expected),
'follows_conventions' => $this->checkConventions($generated),
'passes_phpstan' => $this->runPhpStan($generated),
];
}
protected function checkSyntax(string $code): bool
{
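$tempFile = tempnam(sys_get_temp_dir(), 'php_');
// Prepend the open tag only when the generated code lacks one
$source = preg_match('/^\s*<\?php\b/', $code) ? $code : "<?php\n" . $code;
file_put_contents($tempFile, $source);
// The array form avoids shell-quoting problems with the temp path
$result = Process::run(['php', '-l', $tempFile]);
unlink($tempFile);
return $result->successful();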
}
protected function checkConventions(string $code): array
{
$checks = [
'has_strict_types' => str_contains($code, 'declare(strict_types=1)'),
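// preg_match() returns 1, 0, or false, so compare explicitly to get booleans
'has_type_hints' => preg_match('/function \w+\([^)]*\w+ \$/', $code) === 1,
'has_return_type' => preg_match('/\)\s*:\s*\??[\w\\\\|]+/', $code) === 1,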
'follows_psr12' => $this->runPint($code),
];
return $checks;
}
protected function calculateSimilarity(string $a, string $b): float
{
// Normalize code
$a = preg_replace('/\s+/', ' ', $a);
$b = preg_replace('/\s+/', ' ', $b);
similar_text($a, $b, $percent);
return $percent / 100;
}
}
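For a quick in-process check that avoids spawning a `php -l` process per sample, the tokenizer can parse code directly. The sketch below assumes the `tokenizer` extension (enabled by default); `hasValidSyntax` and `syntaxPassRate` are hypothetical names. It only catches parse errors, so compile-time problems that `php -l` reports may slip through.

```php
<?php

declare(strict_types=1);

/**
 * Check whether a snippet parses as PHP, adding the open tag if it is missing.
 */
function hasValidSyntax(string $code): bool
{
    if (preg_match('/^\s*<\?php\b/', $code) !== 1) {
        $code = "<?php\n" . $code;
    }

    try {
        // TOKEN_PARSE runs the parser, which throws ParseError on invalid syntax.
        token_get_all($code, TOKEN_PARSE);

        return true;
    } catch (ParseError $e) {
        return false;
    }
}

/**
 * Fraction of samples that parse, as a number between 0 and 1.
 *
 * @param array<int, string> $samples
 */
function syntaxPassRate(array $samples): float
{
    if ($samples === []) {
        return 0.0;
    }

    $valid = count(array_filter($samples, 'hasValidSyntax'));

    return $valid / count($samples);
}
```

Tracking the pass rate over a fixed set of held-out prompts gives a simple number to compare between the base model and each fine-tuned version.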
Logging and Analytics
// app/Services/AI/UsageLogger.php
namespace App\Services\AI;
use Illuminate\Support\Facades\DB;
class UsageLogger
{
public function log(string $model, string $prompt, string $response, array $metadata = []): void
{
DB::table('ai_usage_logs')->insert([
'model' => $model,
'prompt_tokens' => $metadata['prompt_tokens'] ?? 0,
'completion_tokens' => $metadata['completion_tokens'] ?? 0,
'total_tokens' => $metadata['total_tokens'] ?? 0,
'latency_ms' => $metadata['latency_ms'] ?? 0,
'prompt_hash' => md5($prompt),
'response_quality' => $metadata['quality_score'] ?? null,
'created_at' => now(),
]);
}
public function getStats(string $model, string $period = 'day'): array
{
$stats = DB::table('ai_usage_logs')
->where('model', $model)
->where('created_at', '>=', now()->sub($period, 1))
->selectRaw('
COUNT(*) as requests,
SUM(total_tokens) as total_tokens,
AVG(latency_ms) as avg_latency,
AVG(response_quality) as avg_quality
')
->first();
// first() returns an object (or null), so cast it to match the array return type
return (array) $stats;
}
}
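Average latency can hide slow outliers, so a percentile is a useful companion metric. As a rough sketch, the nearest-rank method below could be applied to the `latency_ms` values pulled from the log table. The function name `percentile` and the choice of nearest-rank (rather than an interpolating method) are assumptions for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Nearest-rank percentile: the smallest value such that at least
 * $p percent of the samples are less than or equal to it.
 *
 * @param array<int, int|float> $values
 */
function percentile(array $values, float $p): float
{
    if ($values === []) {
        throw new InvalidArgumentException('Cannot take a percentile of an empty list');
    }
    if ($p <= 0 || $p > 100) {
        throw new InvalidArgumentException('Percentile must be in the range (0, 100]');
    }

    sort($values);

    // Rank is 1-based; multiply before dividing to limit floating-point error.
    $rank = (int) ceil($p * count($values) / 100);

    return (float) $values[$rank - 1];
}
```

For example, `percentile($latencies, 95)` reports the latency that 95% of requests stayed at or below.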
Conclusion
Fine-tuning an LLM for PHP/Laravel offers several benefits:
- Higher code quality: the model learns from your team's best practices
- Consistency: output matches your coding standards
- Domain knowledge: the model picks up patterns specific to your business logic
- Cost efficiency: a smaller fine-tuned model can be competitive with a larger general-purpose one
Recommended Workflow
- Collect - Gather high-quality code from codebase
- Curate - Select and format data
- Augment - Enhance dataset
- Train - Fine-tune with appropriate hyperparameters
- Evaluate - Assess on test set
- Deploy - Integrate into workflow
- Monitor - Track quality and iterate