Database Sharding Strategies Cho Laravel
Khi single database không chịu nổi tải — hàng tỷ rows, hàng nghìn writes/giây — bạn cần chia dữ liệu qua nhiều databases. Đây là sharding (horizontal partitioning across multiple database instances).
Bài viết này đi từ cơ bản đến implementation thực tế: khi nào cần shard, chọn strategy nào, implement với Eloquent, và giải quyết các thách thức cross-shard.
Khi Nào Cần Sharding?
Chắc là chưa cần. Đây là checklist escalation trước khi nghĩ tới sharding:
- Tối ưu query —
EXPLAIN ANALYZE, thêm indexes, rewrite slow queries - Read replicas — Tách read/write workload. Laravel hỗ trợ sẵn trong
database.php - Caching — Redis/Memcached cho hot data
- Table partitioning — Partition by range/hash trong cùng một DB (MySQL, PostgreSQL hỗ trợ native)
- Vertical scaling — Server mạnh hơn, NVMe SSD, tăng RAM
- Archive old data — Move old rows sang cold storage
Mỗi bước trên giải quyết vấn đề mà không tăng complexity. Sharding thêm rất nhiều complexity — đừng chọn sớm.
Shard khi:
- Single table 100M+ rows VÀ tăng nhanh (millions/ngày)
- Write throughput vượt khả năng server (>10K writes/giây sustained)
- Yêu cầu data locality (GDPR: EU data phải ở EU)
- Multi-tenant SaaS cần isolation hoàn toàn
Sharding Strategies
| Strategy | Cách hoạt động | Phù hợp cho | Trade-off |
|---|---|---|---|
| Hash-based | hash(key) % N → Shard N | Phân bổ đều | Khó rebalance khi thêm shard |
| Range-based | ID 1-1M → Shard 1, 1M-2M → Shard 2 | Time-series, sequential | Hot spots ở shard mới nhất |
| Tenant-based | Mỗi tenant → DB riêng | SaaS | Quá nhiều DBs khi scale |
| Geographic | EU users → EU shard | Data residency, latency | Cross-region queries chậm |
| Directory-based | Lookup table nói key ở shard nào | Flexible routing | Lookup table = single point of failure |
Chọn Strategy Nào?
Bạn cần data isolation giữa customers? → Tenant-based
Dữ liệu có yếu tố geographic/legal? → Geographic
Data phân bổ đều, không hotspots? → Hash-based
Time-series hoặc sequential? → Range-based
Không chắc? → Hash-based (safe default)
Config Shard Connections
// config/database.php
'connections' => [
'shard_0' => [
'driver' => 'mysql',
'host' => env('SHARD_0_HOST', '10.0.1.10'),
'port' => env('SHARD_0_PORT', '3306'),
'database' => env('SHARD_0_DATABASE', 'app_shard_0'),
'username' => env('SHARD_0_USERNAME', 'shard_user'),
'password' => env('SHARD_0_PASSWORD'),
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
],
'shard_1' => [
'driver' => 'mysql',
'host' => env('SHARD_1_HOST', '10.0.1.11'),
'port' => env('SHARD_1_PORT', '3306'),
'database' => env('SHARD_1_DATABASE', 'app_shard_1'),
'username' => env('SHARD_1_USERNAME', 'shard_user'),
'password' => env('SHARD_1_PASSWORD'),
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
],
'shard_2' => [
'driver' => 'mysql',
'host' => env('SHARD_2_HOST', '10.0.1.12'),
'port' => env('SHARD_2_PORT', '3306'),
'database' => env('SHARD_2_DATABASE', 'app_shard_2'),
'username' => env('SHARD_2_USERNAME', 'shard_user'),
'password' => env('SHARD_2_PASSWORD'),
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
],
// Global DB cho non-sharded data (users, config, etc.)
'global' => [
'driver' => 'mysql',
'host' => env('GLOBAL_DB_HOST', '10.0.1.1'),
'database' => env('GLOBAL_DB_DATABASE', 'app_global'),
'username' => env('GLOBAL_DB_USERNAME'),
'password' => env('GLOBAL_DB_PASSWORD'),
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
],
],
Lưu ý global database: không phải mọi table cần shard. Users, settings, permissions — giữ ở global DB. Chỉ shard bảng thực sự lớn (orders, events, logs).
Shard Manager
// app/Services/ShardManager.php
namespace App\Services;
class ShardManager
{
/** @var array<int, string> shard index → connection name */
private array $connections;
private int $shardCount;
public function __construct()
{
$shards = config('sharding.connections', [
0 => 'shard_0',
1 => 'shard_1',
2 => 'shard_2',
]);
$this->connections = $shards;
$this->shardCount = count($shards);
}
/**
* Hash-based: phẩn bổ đều dựa trên key.
* Dùng CRC32 vì nhanh và phân bổ tốt cho sharding.
*/
public function getConnectionForKey(int|string $key): string
{
$shardIndex = crc32((string) $key) % $this->shardCount;
// crc32 có thể return số âm trên 32-bit systems
$shardIndex = abs($shardIndex) % $this->shardCount;
return $this->connections[$shardIndex];
}
public function allConnections(): array
{
return $this->connections;
}
public function shardCount(): int
{
return $this->shardCount;
}
/**
* Xác định shard cho nhiều keys — nhóm theo connection
* Hữu ích khi batch query: thay vì query từng key, nhóm keys cùng shard
*/
public function groupKeysByConnection(array $keys): array
{
$grouped = [];
foreach ($keys as $key) {
$conn = $this->getConnectionForKey($key);
$grouped[$conn][] = $key;
}
return $grouped;
}
}
groupKeysByConnection() là optimization quan trọng. Ví dụ bạn cần fetch orders cho 100 users — thay vì 100 queries riêng lẻ, nhóm users theo shard rồi query 3 lần (1 per shard).
Sharded Model Trait
// app/Models/Concerns/Shardable.php
namespace App\Models\Concerns;
use App\Services\ShardManager;
use Illuminate\Database\Eloquent\Builder;
use Illuminate\Support\Collection;
trait Shardable
{
/**
* Mỗi model define shard key của mình.
* Thường là user_id hoặc tenant_id.
*/
abstract public function getShardKey(): int|string;
/**
* Override save: tự động route đến đúng shard.
*/
public function save(array $options = []): bool
{
$manager = app(ShardManager::class);
$connection = $manager->getConnectionForKey($this->getShardKey());
$this->setConnection($connection);
return parent::save($options);
}
/**
* Override delete: route đến đúng shard.
*/
public function delete(): ?bool
{
$manager = app(ShardManager::class);
$connection = $manager->getConnectionForKey($this->getShardKey());
$this->setConnection($connection);
return parent::delete();
}
/**
* Query trên shard cụ thể — dùng khi biết shard key.
*/
public static function onShard(int|string $shardKey): Builder
{
$manager = app(ShardManager::class);
return static::on($manager->getConnectionForKey($shardKey));
}
/**
* Query tất cả shards — CẢNH BÁO: chậm, dùng cho admin/reports.
* Cân nhắc: kết quả KHÔNG sorted, cần sort ở application level.
*/
public static function onAllShards(): Collection
{
$manager = app(ShardManager::class);
$results = collect();
foreach ($manager->allConnections() as $connection) {
$shardResults = static::on($connection)->get();
$results = $results->merge($shardResults);
}
return $results;
}
/**
* Fan-out query với WHERE clause — trên tất cả shards.
* Chạy parallelize nếu có thể.
*/
public static function queryAllShards(\Closure $queryBuilder): Collection
{
$manager = app(ShardManager::class);
$results = collect();
foreach ($manager->allConnections() as $connection) {
$query = static::on($connection);
$queryBuilder($query);
$results = $results->merge($query->get());
}
return $results;
}
/**
* Batch query: nhóm keys theo shard, query mỗi shard 1 lần.
*/
public static function findManyAcrossShards(array $shardKeys, string $keyColumn = 'user_id'): Collection
{
$manager = app(ShardManager::class);
$grouped = $manager->groupKeysByConnection($shardKeys);
$results = collect();
foreach ($grouped as $connection => $keys) {
$shardResults = static::on($connection)
->whereIn($keyColumn, $keys)
->get();
$results = $results->merge($shardResults);
}
return $results;
}
}
Trait này giải quyết 4 use cases:
- Single-shard query: biết shard key → query trực tiếp, nhanh
- Save/delete: tự động route, model code không cần biết sharding exists
- Fan-out query: query tất cả shards (admin reports)
- Batch query: smart grouping, giảm số queries
Sử Dụng
class Order extends Model
{
use Shardable;
// UUIDs — bắt buộc khi sharding (tránh ID conflict)
use HasUuids;
protected $fillable = ['user_id', 'total', 'status'];
public function getShardKey(): int|string
{
return $this->user_id;
}
}
// Tạo order — auto-routes đến đúng shard
$order = new Order([
'user_id' => 42,
'total' => 99_900,
'status' => 'pending',
]);
$order->save(); // → shard_X dựa trên hash(42)
// Query trên shard cụ thể (nhanh nhất)
$orders = Order::onShard(42)
->where('user_id', 42)
->where('status', 'completed')
->orderByDesc('created_at')
->get();
// Fan-out — CẢNH BÁO: query TẤT CẢ shards, dùng cho admin
$topOrders = Order::queryAllShards(function ($query) {
$query->where('total', '>', 100_000)
->where('created_at', '>=', now()->subMonth());
})->sortByDesc('total')->take(100);
// Batch: lấy orders cho nhiều users (smart grouping)
$userIds = [42, 99, 150, 200, 311];
$allOrders = Order::findManyAcrossShards($userIds);
Tenant-Based Sharding (SaaS)
Tenant-based phù hợp nhất cho SaaS: mỗi tenant có database riêng → data isolation hoàn toàn.
// app/Services/TenantShardManager.php
class TenantShardManager
{
/**
* Dynamically configure connection cho tenant.
* Không cần define trước tất cả connections trong config.
*/
public function connectTenant(Tenant $tenant): string
{
$connectionName = "tenant_{$tenant->id}";
config([
"database.connections.{$connectionName}" => [
'driver' => 'mysql',
'host' => $tenant->db_host,
'port' => $tenant->db_port ?? 3306,
'database' => $tenant->db_name,
'username' => $tenant->db_username,
'password' => decrypt($tenant->db_password),
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
],
]);
return $connectionName;
}
}
// Middleware để set tenant connection cho mỗi request
class SetTenantConnection
{
public function handle(Request $request, Closure $next)
{
$tenant = $request->route('tenant')
?? Tenant::where('domain', $request->getHost())->first();
if (!$tenant) {
abort(404, 'Tenant not found');
}
$connection = app(TenantShardManager::class)->connectTenant($tenant);
// Set default connection cho request này
config(['database.default' => $connection]);
return $next($request);
}
}
Thách Thức Và Giải Pháp
1. Auto-Increment ID Conflicts
IDs conflict giữa shards (cả shard_0 và shard_1 đều có id=1). Giải pháp:
// Dùng UUIDs — recommended
class Order extends Model
{
use HasUuids; // Laravel built-in
}
// Hoặc Snowflake IDs nếu cần sortable
// Format: timestamp + worker_id + sequence
class SnowflakeId
{
private const EPOCH = 1700000000000; // Custom epoch
public static function generate(int $workerId = 0): int
{
$timestamp = (int)(microtime(true) * 1000) - self::EPOCH;
$sequence = random_int(0, 4095);
return ($timestamp << 22) | ($workerId << 12) | $sequence;
}
}
UUID vs Snowflake:
- UUID: simple, no coordination needed, nhưng index performance kém hơn (random)
- Snowflake: sortable, better index performance, nhưng cần coordinate worker IDs
2. Cross-Shard Joins
Không thể JOIN qua databases. Hai cách giải quyết:
// Cách 1: Denormalize — lưu luôn data cần thiết
class Order extends Model
{
// Thay vì join users table:
// KHÔNG: $order->user->name
// CÓ: lưu user_name, user_email trực tiếp trong orders
protected $fillable = ['user_id', 'user_name', 'user_email', 'total'];
}
// Cách 2: Application-level join
$orders = Order::onShard($userId)->where('user_id', $userId)->get();
$products = Product::whereIn('id', $orders->pluck('product_id'))->get()->keyBy('id');
$ordersWithProducts = $orders->map(function ($order) use ($products) {
$order->product = $products[$order->product_id] ?? null;
return $order;
});
Denormalization có trade-off: data duplication và consistency challenges. Nhưng với sharding, đây thường là lựa chọn đúng — join ở application level chậm hơn nhiều.
3. Cross-Shard Transactions
Distributed transactions rất khó đảm bảo ACID. Dùng Saga Pattern:
class TransferFundsSaga
{
public function execute(int $fromUserId, int $toUserId, int $amount): void
{
// Step 1: Debit from source (shard A)
$debit = Order::onShard($fromUserId)
->where('user_id', $fromUserId)
->first();
DB::connection(app(ShardManager::class)->getConnectionForKey($fromUserId))
->transaction(function () use ($fromUserId, $amount) {
$affected = Wallet::on($this->getConnection($fromUserId))
->where('user_id', $fromUserId)
->where('balance', '>=', $amount)
->decrement('balance', $amount);
if ($affected === 0) {
throw new InsufficientFundsException();
}
});
// Step 2: Credit to destination (shard B)
try {
DB::connection(app(ShardManager::class)->getConnectionForKey($toUserId))
->transaction(function () use ($toUserId, $amount) {
Wallet::on($this->getConnection($toUserId))
->where('user_id', $toUserId)
->increment('balance', $amount);
});
} catch (\Exception $e) {
// Compensating action — rollback step 1
DB::connection(app(ShardManager::class)->getConnectionForKey($fromUserId))
->transaction(function () use ($fromUserId, $amount) {
Wallet::on($this->getConnection($fromUserId))
->where('user_id', $fromUserId)
->increment('balance', $amount);
});
throw $e;
}
}
}
Migrate & Seed Tất Cả Shards
class MigrateShards extends Command
{
protected $signature = 'shards:migrate {--seed} {--fresh}';
protected $description = 'Run migrations trên tất cả shards';
public function handle(): int
{
$manager = app(ShardManager::class);
$failed = [];
foreach ($manager->allConnections() as $index => $connection) {
$this->info("Migrating shard {$index} ({$connection})...");
try {
$args = ['--database' => $connection, '--force' => true];
if ($this->option('fresh')) {
$this->call('migrate:fresh', $args);
} else {
$this->call('migrate', $args);
}
if ($this->option('seed')) {
$this->call('db:seed', ['--database' => $connection]);
}
$this->info(" ✓ Shard {$index} done.");
} catch (\Exception $e) {
$this->error(" ✗ Shard {$index} failed: {$e->getMessage()}");
$failed[] = $index;
}
}
if (!empty($failed)) {
$this->error('Failed shards: ' . implode(', ', $failed));
return Command::FAILURE;
}
return Command::SUCCESS;
}
}
Monitoring & Health Check
class ShardHealthCheck extends Command
{
protected $signature = 'shards:health';
public function handle(): int
{
$manager = app(ShardManager::class);
$rows = [];
foreach ($manager->allConnections() as $index => $connection) {
try {
$start = microtime(true);
$result = DB::connection($connection)->select('SELECT 1');
$latency = round((microtime(true) - $start) * 1000, 1);
$count = DB::connection($connection)
->table('orders')
->count();
$rows[] = [$index, $connection, '✓ OK', "{$latency}ms", number_format($count)];
} catch (\Exception $e) {
$rows[] = [$index, $connection, '✗ DOWN', '-', '-'];
}
}
$this->table(
['Shard', 'Connection', 'Status', 'Latency', 'Rows'],
$rows
);
return Command::SUCCESS;
}
}
Output:
+-------+------------+--------+---------+------------+
| Shard | Connection | Status | Latency | Rows |
+-------+------------+--------+---------+------------+
| 0 | shard_0 | ✓ OK | 1.2ms | 33,451,200 |
| 1 | shard_1 | ✓ OK | 1.1ms | 33,129,800 |
| 2 | shard_2 | ✓ OK | 1.3ms | 33,580,100 |
+-------+------------+--------+---------+------------+
Nếu row counts lệch quá nhiều (>20%), shard key distribution có vấn đề — cần xem lại hash function.
Testing
class ShardableTest extends TestCase
{
public function test_order_routes_to_correct_shard(): void
{
$order = new Order(['user_id' => 42, 'total' => 100]);
$order->save();
$expected = app(ShardManager::class)->getConnectionForKey(42);
$this->assertEquals($expected, $order->getConnectionName());
}
public function test_same_key_always_routes_to_same_shard(): void
{
$manager = app(ShardManager::class);
// Deterministic: same key → same shard, always
$conn1 = $manager->getConnectionForKey(42);
$conn2 = $manager->getConnectionForKey(42);
$this->assertSame($conn1, $conn2);
}
public function test_keys_distribute_across_shards(): void
{
$manager = app(ShardManager::class);
$distribution = [];
for ($i = 0; $i < 10000; $i++) {
$conn = $manager->getConnectionForKey($i);
$distribution[$conn] = ($distribution[$conn] ?? 0) + 1;
}
// Expect roughly even distribution (±20%)
$expected = 10000 / $manager->shardCount();
foreach ($distribution as $conn => $count) {
$this->assertEqualsWithDelta($expected, $count, $expected * 0.2,
"Shard {$conn} has uneven distribution: {$count}");
}
}
public function test_batch_query_groups_efficiently(): void
{
$manager = app(ShardManager::class);
$grouped = $manager->groupKeysByConnection([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
// Nên có <= shardCount groups
$this->assertLessThanOrEqual($manager->shardCount(), count($grouped));
// Tổng keys phải = input
$totalKeys = array_sum(array_map('count', $grouped));
$this->assertEquals(10, $totalKeys);
}
}
Kết Luận
Sharding là giải pháp cuối cùng cho scale — nhưng khi cần thì phải biết implement đúng. Checklist:
- Exhaust alternatives first — optimize, cache, replicas, partitioning
- Chọn shard key cẩn thận — thường
user_idhoặctenant_id, phải immutable - UUIDs cho primary keys — tránh ID conflicts
- Hash-based cho phân bổ đều — safe default
- Denormalize — tránh cross-shard joins, accept data duplication
- Global DB cho shared data — users, config, permissions
- Saga pattern cho cross-shard transactions — compensating actions
- Monitor distribution — alert khi skew >20%
- Test shard routing — deterministic, distributed evenly
- Migrate/seed tất cả shards — automation commands