Database Sharding Strategies Cho Laravel

· 13 min read

Khi single database không chịu nổi tải — hàng tỷ rows, hàng nghìn writes/giây — bạn cần chia dữ liệu qua nhiều databases. Đây là sharding (horizontal partitioning across multiple database instances).

Bài viết này đi từ cơ bản đến implementation thực tế: khi nào cần shard, chọn strategy nào, implement với Eloquent, và giải quyết các thách thức cross-shard.

Khi Nào Cần Sharding?

Chắc là chưa cần. Đây là checklist escalation trước khi nghĩ tới sharding:

  1. Tối ưu queryEXPLAIN ANALYZE, thêm indexes, rewrite slow queries
  2. Read replicas — Tách read/write workload. Laravel hỗ trợ sẵn trong database.php
  3. Caching — Redis/Memcached cho hot data
  4. Table partitioning — Partition by range/hash trong cùng một DB (MySQL, PostgreSQL hỗ trợ native)
  5. Vertical scaling — Server mạnh hơn, NVMe SSD, tăng RAM
  6. Archive old data — Move old rows sang cold storage

Mỗi bước trên giải quyết vấn đề mà không tăng complexity. Sharding thêm rất nhiều complexity — đừng chọn sớm.

Shard khi:

  • Single table 100M+ rows VÀ tăng nhanh (millions/ngày)
  • Write throughput vượt khả năng server (>10K writes/giây sustained)
  • Yêu cầu data locality (GDPR: EU data phải ở EU)
  • Multi-tenant SaaS cần isolation hoàn toàn

Sharding Strategies

Strategy Cách hoạt động Phù hợp cho Trade-off
Hash-based hash(key) % N → Shard N Phân bổ đều Khó rebalance khi thêm shard
Range-based ID 1-1M → Shard 1, 1M-2M → Shard 2 Time-series, sequential Hot spots ở shard mới nhất
Tenant-based Mỗi tenant → DB riêng SaaS Quá nhiều DBs khi scale
Geographic EU users → EU shard Data residency, latency Cross-region queries chậm
Directory-based Lookup table nói key ở shard nào Flexible routing Lookup table = single point of failure

Chọn Strategy Nào?

Bạn cần data isolation giữa customers? → Tenant-based
Dữ liệu có yếu tố geographic/legal? → Geographic
Data phân bổ đều, không hotspots? → Hash-based
Time-series hoặc sequential? → Range-based
Không chắc? → Hash-based (safe default)

Config Shard Connections

// config/database.php
'connections' => [
    'shard_0' => [
        'driver' => 'mysql',
        'host' => env('SHARD_0_HOST', '10.0.1.10'),
        'port' => env('SHARD_0_PORT', '3306'),
        'database' => env('SHARD_0_DATABASE', 'app_shard_0'),
        'username' => env('SHARD_0_USERNAME', 'shard_user'),
        'password' => env('SHARD_0_PASSWORD'),
        'charset' => 'utf8mb4',
        'collation' => 'utf8mb4_unicode_ci',
    ],
    'shard_1' => [
        'driver' => 'mysql',
        'host' => env('SHARD_1_HOST', '10.0.1.11'),
        'port' => env('SHARD_1_PORT', '3306'),
        'database' => env('SHARD_1_DATABASE', 'app_shard_1'),
        'username' => env('SHARD_1_USERNAME', 'shard_user'),
        'password' => env('SHARD_1_PASSWORD'),
        'charset' => 'utf8mb4',
        'collation' => 'utf8mb4_unicode_ci',
    ],
    'shard_2' => [
        'driver' => 'mysql',
        'host' => env('SHARD_2_HOST', '10.0.1.12'),
        'port' => env('SHARD_2_PORT', '3306'),
        'database' => env('SHARD_2_DATABASE', 'app_shard_2'),
        'username' => env('SHARD_2_USERNAME', 'shard_user'),
        'password' => env('SHARD_2_PASSWORD'),
        'charset' => 'utf8mb4',
        'collation' => 'utf8mb4_unicode_ci',
    ],
    // Global DB cho non-sharded data (users, config, etc.)
    'global' => [
        'driver' => 'mysql',
        'host' => env('GLOBAL_DB_HOST', '10.0.1.1'),
        'database' => env('GLOBAL_DB_DATABASE', 'app_global'),
        'username' => env('GLOBAL_DB_USERNAME'),
        'password' => env('GLOBAL_DB_PASSWORD'),
        'charset' => 'utf8mb4',
        'collation' => 'utf8mb4_unicode_ci',
    ],
],

Lưu ý global database: không phải mọi table cần shard. Users, settings, permissions — giữ ở global DB. Chỉ shard bảng thực sự lớn (orders, events, logs).

Shard Manager

// app/Services/ShardManager.php

namespace App\Services;

class ShardManager
{
    /** @var array<int, string> shard index → connection name */
    private array $connections;
    private int $shardCount;

    public function __construct()
    {
        $shards = config('sharding.connections', [
            0 => 'shard_0',
            1 => 'shard_1',
            2 => 'shard_2',
        ]);

        $this->connections = $shards;
        $this->shardCount = count($shards);
    }

    /**
     * Hash-based: phẩn bổ đều dựa trên key.
     * Dùng CRC32 vì nhanh và phân bổ tốt cho sharding.
     */
    public function getConnectionForKey(int|string $key): string
    {
        $shardIndex = crc32((string) $key) % $this->shardCount;

        // crc32 có thể return số âm trên 32-bit systems
        $shardIndex = abs($shardIndex) % $this->shardCount;

        return $this->connections[$shardIndex];
    }

    public function allConnections(): array
    {
        return $this->connections;
    }

    public function shardCount(): int
    {
        return $this->shardCount;
    }

    /**
     * Xác định shard cho nhiều keys — nhóm theo connection
     * Hữu ích khi batch query: thay vì query từng key, nhóm keys cùng shard
     */
    public function groupKeysByConnection(array $keys): array
    {
        $grouped = [];
        foreach ($keys as $key) {
            $conn = $this->getConnectionForKey($key);
            $grouped[$conn][] = $key;
        }
        return $grouped;
    }
}

groupKeysByConnection() là optimization quan trọng. Ví dụ bạn cần fetch orders cho 100 users — thay vì 100 queries riêng lẻ, nhóm users theo shard rồi query 3 lần (1 per shard).

Sharded Model Trait

// app/Models/Concerns/Shardable.php

namespace App\Models\Concerns;

use App\Services\ShardManager;
use Illuminate\Database\Eloquent\Builder;
use Illuminate\Support\Collection;

trait Shardable
{
    /**
     * Mỗi model define shard key của mình.
     * Thường là user_id hoặc tenant_id.
     */
    abstract public function getShardKey(): int|string;

    /**
     * Override save: tự động route đến đúng shard.
     */
    public function save(array $options = []): bool
    {
        $manager = app(ShardManager::class);
        $connection = $manager->getConnectionForKey($this->getShardKey());
        $this->setConnection($connection);

        return parent::save($options);
    }

    /**
     * Override delete: route đến đúng shard.
     */
    public function delete(): ?bool
    {
        $manager = app(ShardManager::class);
        $connection = $manager->getConnectionForKey($this->getShardKey());
        $this->setConnection($connection);

        return parent::delete();
    }

    /**
     * Query trên shard cụ thể — dùng khi biết shard key.
     */
    public static function onShard(int|string $shardKey): Builder
    {
        $manager = app(ShardManager::class);
        return static::on($manager->getConnectionForKey($shardKey));
    }

    /**
     * Query tất cả shards — CẢNH BÁO: chậm, dùng cho admin/reports.
     * Cân nhắc: kết quả KHÔNG sorted, cần sort ở application level.
     */
    public static function onAllShards(): Collection
    {
        $manager = app(ShardManager::class);
        $results = collect();

        foreach ($manager->allConnections() as $connection) {
            $shardResults = static::on($connection)->get();
            $results = $results->merge($shardResults);
        }

        return $results;
    }

    /**
     * Fan-out query với WHERE clause — trên tất cả shards.
     * Chạy parallelize nếu có thể.
     */
    public static function queryAllShards(\Closure $queryBuilder): Collection
    {
        $manager = app(ShardManager::class);
        $results = collect();

        foreach ($manager->allConnections() as $connection) {
            $query = static::on($connection);
            $queryBuilder($query);
            $results = $results->merge($query->get());
        }

        return $results;
    }

    /**
     * Batch query: nhóm keys theo shard, query mỗi shard 1 lần.
     */
    public static function findManyAcrossShards(array $shardKeys, string $keyColumn = 'user_id'): Collection
    {
        $manager = app(ShardManager::class);
        $grouped = $manager->groupKeysByConnection($shardKeys);
        $results = collect();

        foreach ($grouped as $connection => $keys) {
            $shardResults = static::on($connection)
                ->whereIn($keyColumn, $keys)
                ->get();
            $results = $results->merge($shardResults);
        }

        return $results;
    }
}

Trait này giải quyết 4 use cases:

  • Single-shard query: biết shard key → query trực tiếp, nhanh
  • Save/delete: tự động route, model code không cần biết sharding exists
  • Fan-out query: query tất cả shards (admin reports)
  • Batch query: smart grouping, giảm số queries

Sử Dụng

class Order extends Model
{
    use Shardable;

    // UUIDs — bắt buộc khi sharding (tránh ID conflict)
    use HasUuids;

    protected $fillable = ['user_id', 'total', 'status'];

    public function getShardKey(): int|string
    {
        return $this->user_id;
    }
}
// Tạo order — auto-routes đến đúng shard
$order = new Order([
    'user_id' => 42,
    'total' => 99_900,
    'status' => 'pending',
]);
$order->save(); // → shard_X dựa trên hash(42)

// Query trên shard cụ thể (nhanh nhất)
$orders = Order::onShard(42)
    ->where('user_id', 42)
    ->where('status', 'completed')
    ->orderByDesc('created_at')
    ->get();

// Fan-out — CẢNH BÁO: query TẤT CẢ shards, dùng cho admin
$topOrders = Order::queryAllShards(function ($query) {
    $query->where('total', '>', 100_000)
          ->where('created_at', '>=', now()->subMonth());
})->sortByDesc('total')->take(100);

// Batch: lấy orders cho nhiều users (smart grouping)
$userIds = [42, 99, 150, 200, 311];
$allOrders = Order::findManyAcrossShards($userIds);

Tenant-Based Sharding (SaaS)

Tenant-based phù hợp nhất cho SaaS: mỗi tenant có database riêng → data isolation hoàn toàn.

// app/Services/TenantShardManager.php

class TenantShardManager
{
    /**
     * Dynamically configure connection cho tenant.
     * Không cần define trước tất cả connections trong config.
     */
    public function connectTenant(Tenant $tenant): string
    {
        $connectionName = "tenant_{$tenant->id}";

        config([
            "database.connections.{$connectionName}" => [
                'driver' => 'mysql',
                'host' => $tenant->db_host,
                'port' => $tenant->db_port ?? 3306,
                'database' => $tenant->db_name,
                'username' => $tenant->db_username,
                'password' => decrypt($tenant->db_password),
                'charset' => 'utf8mb4',
                'collation' => 'utf8mb4_unicode_ci',
            ],
        ]);

        return $connectionName;
    }
}
// Middleware để set tenant connection cho mỗi request

class SetTenantConnection
{
    public function handle(Request $request, Closure $next)
    {
        $tenant = $request->route('tenant')
            ?? Tenant::where('domain', $request->getHost())->first();

        if (!$tenant) {
            abort(404, 'Tenant not found');
        }

        $connection = app(TenantShardManager::class)->connectTenant($tenant);

        // Set default connection cho request này
        config(['database.default' => $connection]);

        return $next($request);
    }
}

Thách Thức Và Giải Pháp

1. Auto-Increment ID Conflicts

IDs conflict giữa shards (cả shard_0 và shard_1 đều có id=1). Giải pháp:

// Dùng UUIDs — recommended
class Order extends Model
{
    use HasUuids;  // Laravel built-in
}

// Hoặc Snowflake IDs nếu cần sortable
// Format: timestamp + worker_id + sequence
class SnowflakeId
{
    private const EPOCH = 1700000000000; // Custom epoch

    public static function generate(int $workerId = 0): int
    {
        $timestamp = (int)(microtime(true) * 1000) - self::EPOCH;
        $sequence = random_int(0, 4095);
        return ($timestamp << 22) | ($workerId << 12) | $sequence;
    }
}

UUID vs Snowflake:

  • UUID: simple, no coordination needed, nhưng index performance kém hơn (random)
  • Snowflake: sortable, better index performance, nhưng cần coordinate worker IDs

2. Cross-Shard Joins

Không thể JOIN qua databases. Hai cách giải quyết:

// Cách 1: Denormalize — lưu luôn data cần thiết
class Order extends Model
{
    // Thay vì join users table:
    // KHÔNG: $order->user->name
    // CÓ: lưu user_name, user_email trực tiếp trong orders
    protected $fillable = ['user_id', 'user_name', 'user_email', 'total'];
}

// Cách 2: Application-level join
$orders = Order::onShard($userId)->where('user_id', $userId)->get();
$products = Product::whereIn('id', $orders->pluck('product_id'))->get()->keyBy('id');

$ordersWithProducts = $orders->map(function ($order) use ($products) {
    $order->product = $products[$order->product_id] ?? null;
    return $order;
});

Denormalization có trade-off: data duplication và consistency challenges. Nhưng với sharding, đây thường là lựa chọn đúng — join ở application level chậm hơn nhiều.

3. Cross-Shard Transactions

Distributed transactions rất khó đảm bảo ACID. Dùng Saga Pattern:

class TransferFundsSaga
{
    public function execute(int $fromUserId, int $toUserId, int $amount): void
    {
        // Step 1: Debit from source (shard A)
        $debit = Order::onShard($fromUserId)
            ->where('user_id', $fromUserId)
            ->first();

        DB::connection(app(ShardManager::class)->getConnectionForKey($fromUserId))
            ->transaction(function () use ($fromUserId, $amount) {
                $affected = Wallet::on($this->getConnection($fromUserId))
                    ->where('user_id', $fromUserId)
                    ->where('balance', '>=', $amount)
                    ->decrement('balance', $amount);

                if ($affected === 0) {
                    throw new InsufficientFundsException();
                }
            });

        // Step 2: Credit to destination (shard B)
        try {
            DB::connection(app(ShardManager::class)->getConnectionForKey($toUserId))
                ->transaction(function () use ($toUserId, $amount) {
                    Wallet::on($this->getConnection($toUserId))
                        ->where('user_id', $toUserId)
                        ->increment('balance', $amount);
                });
        } catch (\Exception $e) {
            // Compensating action — rollback step 1
            DB::connection(app(ShardManager::class)->getConnectionForKey($fromUserId))
                ->transaction(function () use ($fromUserId, $amount) {
                    Wallet::on($this->getConnection($fromUserId))
                        ->where('user_id', $fromUserId)
                        ->increment('balance', $amount);
                });

            throw $e;
        }
    }
}

Migrate & Seed Tất Cả Shards

class MigrateShards extends Command
{
    protected $signature = 'shards:migrate {--seed} {--fresh}';
    protected $description = 'Run migrations trên tất cả shards';

    public function handle(): int
    {
        $manager = app(ShardManager::class);
        $failed = [];

        foreach ($manager->allConnections() as $index => $connection) {
            $this->info("Migrating shard {$index} ({$connection})...");

            try {
                $args = ['--database' => $connection, '--force' => true];

                if ($this->option('fresh')) {
                    $this->call('migrate:fresh', $args);
                } else {
                    $this->call('migrate', $args);
                }

                if ($this->option('seed')) {
                    $this->call('db:seed', ['--database' => $connection]);
                }

                $this->info("  ✓ Shard {$index} done.");
            } catch (\Exception $e) {
                $this->error("  ✗ Shard {$index} failed: {$e->getMessage()}");
                $failed[] = $index;
            }
        }

        if (!empty($failed)) {
            $this->error('Failed shards: ' . implode(', ', $failed));
            return Command::FAILURE;
        }

        return Command::SUCCESS;
    }
}

Monitoring & Health Check

class ShardHealthCheck extends Command
{
    protected $signature = 'shards:health';

    public function handle(): int
    {
        $manager = app(ShardManager::class);
        $rows = [];

        foreach ($manager->allConnections() as $index => $connection) {
            try {
                $start = microtime(true);
                $result = DB::connection($connection)->select('SELECT 1');
                $latency = round((microtime(true) - $start) * 1000, 1);

                $count = DB::connection($connection)
                    ->table('orders')
                    ->count();

                $rows[] = [$index, $connection, '✓ OK', "{$latency}ms", number_format($count)];
            } catch (\Exception $e) {
                $rows[] = [$index, $connection, '✗ DOWN', '-', '-'];
            }
        }

        $this->table(
            ['Shard', 'Connection', 'Status', 'Latency', 'Rows'],
            $rows
        );

        return Command::SUCCESS;
    }
}

Output:

+-------+------------+--------+---------+------------+
| Shard | Connection | Status | Latency | Rows       |
+-------+------------+--------+---------+------------+
| 0     | shard_0    | ✓ OK   | 1.2ms   | 33,451,200 |
| 1     | shard_1    | ✓ OK   | 1.1ms   | 33,129,800 |
| 2     | shard_2    | ✓ OK   | 1.3ms   | 33,580,100 |
+-------+------------+--------+---------+------------+

Nếu row counts lệch quá nhiều (>20%), shard key distribution có vấn đề — cần xem lại hash function.

Testing

class ShardableTest extends TestCase
{
    public function test_order_routes_to_correct_shard(): void
    {
        $order = new Order(['user_id' => 42, 'total' => 100]);
        $order->save();

        $expected = app(ShardManager::class)->getConnectionForKey(42);
        $this->assertEquals($expected, $order->getConnectionName());
    }

    public function test_same_key_always_routes_to_same_shard(): void
    {
        $manager = app(ShardManager::class);

        // Deterministic: same key → same shard, always
        $conn1 = $manager->getConnectionForKey(42);
        $conn2 = $manager->getConnectionForKey(42);
        $this->assertSame($conn1, $conn2);
    }

    public function test_keys_distribute_across_shards(): void
    {
        $manager = app(ShardManager::class);
        $distribution = [];

        for ($i = 0; $i < 10000; $i++) {
            $conn = $manager->getConnectionForKey($i);
            $distribution[$conn] = ($distribution[$conn] ?? 0) + 1;
        }

        // Expect roughly even distribution (±20%)
        $expected = 10000 / $manager->shardCount();
        foreach ($distribution as $conn => $count) {
            $this->assertEqualsWithDelta($expected, $count, $expected * 0.2,
                "Shard {$conn} has uneven distribution: {$count}");
        }
    }

    public function test_batch_query_groups_efficiently(): void
    {
        $manager = app(ShardManager::class);
        $grouped = $manager->groupKeysByConnection([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);

        // Nên có <= shardCount groups
        $this->assertLessThanOrEqual($manager->shardCount(), count($grouped));

        // Tổng keys phải = input
        $totalKeys = array_sum(array_map('count', $grouped));
        $this->assertEquals(10, $totalKeys);
    }
}

Kết Luận

Sharding là giải pháp cuối cùng cho scale — nhưng khi cần thì phải biết implement đúng. Checklist:

  1. Exhaust alternatives first — optimize, cache, replicas, partitioning
  2. Chọn shard key cẩn thận — thường user_id hoặc tenant_id, phải immutable
  3. UUIDs cho primary keys — tránh ID conflicts
  4. Hash-based cho phân bổ đều — safe default
  5. Denormalize — tránh cross-shard joins, accept data duplication
  6. Global DB cho shared data — users, config, permissions
  7. Saga pattern cho cross-shard transactions — compensating actions
  8. Monitor distribution — alert khi skew >20%
  9. Test shard routing — deterministic, distributed evenly
  10. Migrate/seed tất cả shards — automation commands

Bình luận