MySQL Indexing Strategies — Designing Effective Indexes for Every Use Case

An index is like a "table of contents" that helps MySQL find data thousands of times faster than scanning the entire table. But indexes aren't a case of "more is better" — each index has costs in storage and write performance.

This article will help you deeply understand how indexes work and design index strategies appropriate for each specific use case.

How Do Indexes Work?

B-Tree Index (Default)

MySQL InnoDB uses B+Tree as the primary data structure for indexes. Think of it like a branching tree:

                    [50]
                   /    \
              [20,30]   [70,80]
              /  |  \    /  |  \
           [10][25][35][60][75][90]
            ↓   ↓   ↓   ↓   ↓   ↓
          Data Data Data Data Data Data

B+Tree characteristics:

All data resides in leaf nodes
Leaf nodes are linked together (linked list)
Tree depth is very shallow (typically 3-4 levels for millions of rows)
Efficient for: =, <, >, <=, >=, BETWEEN, LIKE 'prefix%'

Search example: Find id = 35

Start from root: 35 < 50 → go left
30 < 35 → go right
Find leaf node containing 35 → Return data

Only 3 operations to find among millions of records!

Hash Index

CREATE TABLE sessions (
    session_id VARCHAR(64),
    user_id INT,
    data TEXT,
    INDEX USING HASH (session_id)  -- Only works with MEMORY engine
) ENGINE=MEMORY;

Hash Index only supports equality = comparisons, not range queries.

InnoDB uses Adaptive Hash Index automatically — MySQL creates hash indexes in memory for frequently accessed B-Tree pages.

Full-Text Index

CREATE TABLE articles (
    id INT PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    FULLTEXT INDEX idx_content (title, content)
);

-- Full-text search
SELECT * FROM articles 
WHERE MATCH(title, content) AGAINST('mysql performance' IN NATURAL LANGUAGE MODE);

Used for text search, not exact match.

Primary Key vs Secondary Index

Primary Key (Clustered Index)

In InnoDB, Primary Key IS the table. Data is organized in Primary Key order:

Primary Key Index (Clustered):
[1] → {id:1, name:"Alice", email:"alice@example.com", ...entire row}
[2] → {id:2, name:"Bob", email:"bob@example.com", ...entire row}
[3] → {id:3, name:"Charlie", email:"charlie@example.com", ...entire row}

Best practices for Primary Key:

Use AUTO_INCREMENT integer (avoid UUID if possible)
Never update Primary Key
Keep Primary Key compact

Why avoid UUID as Primary Key?

-- ❌ BAD: Random UUID causes constant page splits
CREATE TABLE users (
    id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
    name VARCHAR(100)
);

-- ✅ GOOD: Auto-increment, inserts always at the end
CREATE TABLE users (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    uuid CHAR(36) UNIQUE DEFAULT (UUID()),
    name VARCHAR(100)
);

Secondary Index

Secondary indexes contain index value + Primary Key, not the entire row:

Secondary Index on email:
["alice@example.com"] → PK: 1
["bob@example.com"] → PK: 2
["charlie@example.com"] → PK: 3

When querying: SELECT * FROM users WHERE email = 'bob@example.com'
1. Search secondary index → PK = 2
2. Use PK to lookup in clustered index → Get full row

Step 2 is called "bookmark lookup" or "key lookup" — this is why Covering Indexes are important.

Composite Index (Multi-column Index)

Syntax

CREATE INDEX idx_country_city ON addresses(country, city);
CREATE INDEX idx_user_status_created ON orders(user_id, status, created_at);

"Leftmost Prefix" Rule

A composite index can only be used if the query uses columns from left to right, continuously:

-- Index: (country, city, district)

-- ✅ Can use index
SELECT * FROM addresses WHERE country = 'VN';
SELECT * FROM addresses WHERE country = 'VN' AND city = 'Hanoi';
SELECT * FROM addresses WHERE country = 'VN' AND city = 'Hanoi' AND district = 'CauGiay';

-- ❌ Cannot use index
SELECT * FROM addresses WHERE city = 'Hanoi';  -- Missing country
SELECT * FROM addresses WHERE country = 'VN' AND district = 'CauGiay';  -- Skips city
SELECT * FROM addresses WHERE district = 'CauGiay';  -- Only last column

Column Order in Composite Index

Golden rule: Put columns with high selectivity (fewer duplicate values) first.

-- Assume: status has only 3 values (pending, completed, cancelled)
--         user_id has thousands of different values

-- ✅ GOOD: user_id first (high selectivity)
CREATE INDEX idx_user_status ON orders(user_id, status);

-- ❌ LESS OPTIMAL: status first (low selectivity)
CREATE INDEX idx_status_user ON orders(status, user_id);

Exception: If query pattern always has status = ? but user_id is optional, put status first.

Composite Index for ORDER BY

-- Frequent query
SELECT * FROM products 
WHERE category_id = 5 
ORDER BY created_at DESC 
LIMIT 20;

-- Optimal index: WHERE columns + ORDER BY columns
CREATE INDEX idx_category_created ON products(category_id, created_at DESC);

Note on sort direction:

-- Index: (a ASC, b DESC)
-- ✅ Works with: ORDER BY a ASC, b DESC
-- ✅ Works with: ORDER BY a DESC, b ASC (backward scan)
-- ❌ Does NOT work with: ORDER BY a ASC, b ASC

Covering Index

A Covering Index is an index containing ALL columns the query needs, no need to go back to the clustered index.

-- Query
SELECT user_id, status, total_amount 
FROM orders 
WHERE user_id = 123 AND status = 'completed';

-- Covering Index
CREATE INDEX idx_user_status_amount ON orders(user_id, status, total_amount);

In EXPLAIN:

Extra: Using index  -- ← Sign of Covering Index

When to use Covering Index?

Query runs very frequently
Only needs a few columns
Performance is highest priority

Trade-off: Larger index = more storage + slower INSERT/UPDATE.

Partial Index (MySQL 8.0+)

MySQL 8.0 supports Functional Index that can achieve similar effects to partial indexes:

-- Only index unprocessed orders (assuming processed_at NULL means unprocessed)
CREATE INDEX idx_unprocessed ON orders((CASE WHEN processed_at IS NULL THEN 1 END));

-- Index only first 10 chars of a column
CREATE INDEX idx_email_prefix ON users((LEFT(email, 10)));

Index for JSON Columns

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    attributes JSON,
    INDEX idx_brand ((CAST(attributes->>'$.brand' AS CHAR(50)) COLLATE utf8mb4_bin))
);

-- Query using index
SELECT * FROM products 
WHERE attributes->>'$.brand' = 'Apple';

Anti-patterns to Avoid

1. Function on Indexed Column

-- ❌ BAD: Index not used
SELECT * FROM users WHERE YEAR(created_at) = 2024;
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- ✅ GOOD: Index is used
SELECT * FROM users 
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

SELECT * FROM users WHERE email = 'john@example.com';  -- If collation is case-insensitive

2. Implicit Type Conversion

-- phone_number is VARCHAR
-- ❌ BAD: MySQL converts phone_number to number for comparison
SELECT * FROM users WHERE phone_number = 0912345678;

-- ✅ GOOD: Compare same type
SELECT * FROM users WHERE phone_number = '0912345678';

3. Leading Wildcard in LIKE

-- ❌ BAD: Full table scan
SELECT * FROM products WHERE name LIKE '%phone%';

-- ✅ OK: Can use index
SELECT * FROM products WHERE name LIKE 'phone%';

-- ✅ BETTER: Use Full-Text Search for flexible searching
SELECT * FROM products WHERE MATCH(name) AGAINST('phone');

4. OR with Different Columns

-- ❌ BAD: Usually can't use index effectively
SELECT * FROM products WHERE category_id = 5 OR brand_id = 10;

-- ✅ GOOD: Use UNION
SELECT * FROM products WHERE category_id = 5
UNION
SELECT * FROM products WHERE brand_id = 10;

5. NOT IN / NOT EXISTS / <>

-- ❌ Index less effective with negative conditions
SELECT * FROM users WHERE status != 'deleted';
SELECT * FROM orders WHERE user_id NOT IN (1, 2, 3);

-- ✅ Consider restructuring query or accept scan if exclusion condition filters few records

6. Too Many Indexes

-- ❌ BAD: Every INSERT/UPDATE must update all indexes
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    category_id INT,
    brand_id INT,
    price DECIMAL(10,2),
    stock INT,
    created_at DATETIME,
    updated_at DATETIME,
    INDEX idx_name (name),
    INDEX idx_category (category_id),
    INDEX idx_brand (brand_id),
    INDEX idx_price (price),
    INDEX idx_stock (stock),
    INDEX idx_created (created_at),
    INDEX idx_updated (updated_at),
    INDEX idx_category_brand (category_id, brand_id),
    INDEX idx_category_price (category_id, price),
    INDEX idx_brand_price (brand_id, price)
    -- ... too many!
);

Guideline: Start with few indexes, add based on actual slow queries.

Analyzing Index Usage

Check if Index is Being Used

-- View all indexes of a table
SHOW INDEX FROM orders;

-- View index usage statistics (MySQL 8.0+)
SELECT 
    object_schema,
    object_name,
    index_name,
    count_read,
    count_write
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'your_database'
ORDER BY count_read DESC;

Find Unused Indexes

-- Indexes not read but still written to (candidates to drop)
SELECT 
    object_schema,
    object_name,
    index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
    AND count_read = 0
    AND count_write > 0
    AND object_schema = 'your_database';

Find Duplicate Indexes

SELECT 
    t.table_schema,
    t.table_name,
    GROUP_CONCAT(t.index_name) as duplicate_indexes,
    t.column_names
FROM (
    SELECT 
        table_schema,
        table_name,
        index_name,
        GROUP_CONCAT(column_name ORDER BY seq_in_index) as column_names
    FROM information_schema.statistics
    WHERE table_schema = 'your_database'
    GROUP BY table_schema, table_name, index_name
) t
GROUP BY t.table_schema, t.table_name, t.column_names
HAVING COUNT(*) > 1;

Index Strategy for Common Use Cases

Use Case 1: E-commerce Product Listing

-- Query pattern
SELECT * FROM products 
WHERE category_id = ? 
  AND price BETWEEN ? AND ?
  AND status = 'active'
ORDER BY created_at DESC
LIMIT 20;

-- Recommended indexes
CREATE INDEX idx_category_status_created ON products(category_id, status, created_at DESC);
CREATE INDEX idx_category_status_price ON products(category_id, status, price);

Use Case 2: User Authentication

-- Query pattern
SELECT * FROM users WHERE email = ? AND deleted_at IS NULL;

-- Recommended index (covering if possible)
CREATE UNIQUE INDEX idx_email ON users(email) WHERE deleted_at IS NULL;  -- PostgreSQL syntax
-- MySQL equivalent:
CREATE INDEX idx_email_deleted ON users(email, deleted_at);

Use Case 3: Analytics / Reporting

-- Query pattern
SELECT 
    DATE(created_at) as date,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
WHERE created_at >= ? AND created_at < ?
GROUP BY DATE(created_at);

-- Recommended index
CREATE INDEX idx_created_at ON orders(created_at);
-- Consider summary tables for heavy reporting

Use Case 4: Activity Feed / Timeline

-- Query pattern
SELECT * FROM activities 
WHERE user_id IN (?, ?, ?) 
ORDER BY created_at DESC 
LIMIT 50;

-- Recommended index
CREATE INDEX idx_user_created ON activities(user_id, created_at DESC);

Use Case 5: Full-Text Search

-- Query pattern
SELECT * FROM articles WHERE title LIKE '%keyword%' OR content LIKE '%keyword%';

-- Better approach: Full-Text Index
CREATE FULLTEXT INDEX idx_ft_articles ON articles(title, content);

SELECT * FROM articles 
WHERE MATCH(title, content) AGAINST('keyword' IN NATURAL LANGUAGE MODE);

Index Maintenance

Rebuild Index (Optimize Table)

-- Rebuild indexes and reclaim space
OPTIMIZE TABLE orders;

-- Or with ALTER TABLE (online in MySQL 8.0)
ALTER TABLE orders ENGINE=InnoDB;

Analyze Table

-- Update index statistics (helps optimizer choose correct index)
ANALYZE TABLE orders;

Monitor Index Size

SELECT 
    table_name,
    index_name,
    ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
    AND database_name = 'your_database'
ORDER BY stat_value DESC;

Indexing in Laravel

Migration

Schema::create('orders', function (Blueprint $table) {
    $table->id();
    $table->foreignId('user_id')->constrained();
    $table->string('status');
    $table->decimal('total_amount', 10, 2);
    $table->timestamps();
    
    // Composite index
    $table->index(['user_id', 'status', 'created_at']);
    
    // Unique index
    $table->unique(['user_id', 'order_number']);
});

// Add index to existing table
Schema::table('products', function (Blueprint $table) {
    $table->index(['category_id', 'price'], 'idx_category_price');
});

Raw Index for Complex Cases

Schema::table('products', function (Blueprint $table) {
    // Descending index (MySQL 8.0+)
    DB::statement('CREATE INDEX idx_category_created ON products(category_id, created_at DESC)');
    
    // Functional index (MySQL 8.0+)
    DB::statement('CREATE INDEX idx_year ON products((YEAR(created_at)))');
});

Index Design Checklist

When designing indexes for a new table:

Identify query patterns
- Which queries run most frequently?
- Which columns in WHERE clauses?
- ORDER BY which columns?
- JOIN on which columns?
Primary Key
- Using AUTO_INCREMENT integer?
- If UUID needed, separate column with UNIQUE index?
Foreign Keys
- All FK columns have indexes?
Composite Index
- Column order correct (high selectivity first)?
- Includes ORDER BY columns?
Covering Index
- Can hottest query use covering index?
Verification
- EXPLAIN for important queries
- No duplicate/redundant indexes?

Conclusion

Indexes are powerful tools but not a "silver bullet". Understanding how indexes work helps you:

Design better schemas from the start
Debug performance issues faster
Avoid common anti-patterns

Key takeaways:

B+Tree index is efficient for range queries
Primary Key = Clustered Index in InnoDB
Composite index follows "leftmost prefix rule"
Covering index avoids bookmark lookup
Avoid functions on indexed columns
Start with few indexes, add based on actual queries

In the next article, we'll explore MySQL Slow Query Log — how to find and fix slow queries in production.