The Challenge
The client ran a B2B e-commerce platform with a MySQL database holding 4.7 million product records. Despite having a powerful server — a Dell PowerEdge R730 with 48 cores and 504GB RAM — MySQL was pegged at 76% CPU continuously. Pages were timing out. The slow query log was being flooded.
Their hosting provider had a simple suggestion: upgrade to a larger server for $8,400/year. We had a different hypothesis: the problem wasn't hardware — it was missing indexes and poorly structured queries doing full table scans on millions of rows.
The real problem: A single "similar products" query was executing with no index, forcing MySQL to read every one of 4.7 million rows — on every product page load. With any meaningful traffic, this alone could saturate a 48-core server.
Diagnosing the Slow Queries
We started with the slow query log and SHOW PROCESSLIST. Within an hour we had a ranked list of offenders. Here's what we found:
| Query | Time (before) | Rows Scanned | Problem |
|---|---|---|---|
| Similar products lookup | 17.0s | 4,700,000 | Full table scan — no index on mfg_id + category |
| Category product count | 17.0s | 2,400,000 | NOT IN subquery forcing full scan |
| FULLTEXT search (broad terms) | 1.2s | 4,700,000 | Terms like "support", "renewal" matching everything |
| Email domain lookup | 2.2s | 722,000 | LIKE '%@domain' — leading wildcard kills indexes |
| Tracking verification | 1.1s | 90,000 | No index on tracking_number column |
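The diagnostic setup itself is only a few statements. A minimal sketch (the threshold values here are illustrative, not the client's actual settings):

```sql
-- Enable the slow query log and set an illustrative threshold
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;   -- log anything slower than 1 second

-- Live view of what is running right now
SHOW FULL PROCESSLIST;

-- Ranked offenders without waiting for the log (MySQL 8.0
-- performance_schema; timer values are in picoseconds)
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1e12 AS total_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

The digest summary table is often faster to triage than the raw slow log because it groups structurally identical queries together.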
The Solutions
1. Composite Index for Similar Products
The "similar products" query filtered by manufacturer ID and category, then sorted by sales rank. It was doing a full table scan because the index on mfg_id alone wasn't selective enough without the second column. We created a composite index covering both filter columns plus the sort column:
-- EXPLAIN: type=ALL, rows=4721843, Extra=Using filesort
CREATE INDEX idx_similar_mfg
ON products (mfg_id, category_id, top_selling)
ALGORITHM=INPLACE LOCK=NONE;
-- After: index range scan, 96 rows examined
-- EXPLAIN: type=ref, rows=96, Extra=Using index condition
-- Query time: 17.0s → 0.001s
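For context, the query shape this index serves would look something like the following sketch. The literal IDs and the LIMIT are placeholders; the table and column names come from the index definition above:

```sql
-- Hypothetical shape of the "similar products" query. The composite
-- index (mfg_id, category_id, top_selling) satisfies both equality
-- filters AND the sort, so MySQL reads rows pre-ordered from the
-- index with no filesort.
SELECT product_id, name
FROM products
WHERE mfg_id = 1234           -- placeholder manufacturer ID
  AND category_id = 56        -- placeholder category ID
ORDER BY top_selling DESC
LIMIT 12;
```

Column order in the index matters: the two equality columns come first, the sort column last, so the index prefix narrows the row set before the ordered scan begins.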
2. Replacing NOT IN with LEFT JOIN
The category count query used a NOT IN (SELECT ...) subquery. MySQL often handles this pattern poorly: when the optimizer cannot convert it to an antijoin (for example, when the subquery column is nullable), it may evaluate the subquery for every row in the outer table. Rewriting it as a LEFT JOIN ... WHERE ... IS NULL anti-join let the optimizer use indexes on both sides:
-- Before: NOT IN subquery
SELECT category_id,
       COUNT(*) AS cnt
FROM products
WHERE product_id NOT IN (
    SELECT product_id
    FROM discontinued
)
GROUP BY category_id;

-- After: LEFT JOIN anti-join
SELECT p.category_id,
       COUNT(*) AS cnt
FROM products p
LEFT JOIN discontinued d
       ON p.product_id = d.product_id
WHERE d.product_id IS NULL
GROUP BY p.category_id;
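An equivalent rewrite, shown here as an alternative sketch, uses NOT EXISTS. MySQL 8.0 can also convert this form into an antijoin, and it sidesteps the NULL trap of NOT IN (if discontinued.product_id can ever be NULL, NOT IN silently returns an empty result):

```sql
-- Alternative anti-join form: same result, NULL-safe semantics
SELECT p.category_id,
       COUNT(*) AS cnt
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM discontinued d
    WHERE d.product_id = p.product_id
)
GROUP BY p.category_id;
```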
3. Fixing Broad FULLTEXT Queries
FULLTEXT search was being used for operational queries — things like finding all records related to "renewal" or "support." These terms matched hundreds of thousands of rows, making FULLTEXT's ranking overhead enormous. We replaced the FULLTEXT fallback with a category-based random selection query that used the existing category index, eliminating the 1+ second FULLTEXT scans entirely.
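The write-up above doesn't show the replacement query; a plausible sketch of the category-indexed random pick (table, column, and ID values are assumptions) would be:

```sql
-- Hypothetical replacement: pull a handful of rows from one category
-- via the existing category index instead of a broad FULLTEXT scan.
-- ORDER BY RAND() is acceptable here only because the index has
-- already narrowed the row set to one category; never apply it
-- directly to millions of rows.
SELECT product_id, name
FROM products
WHERE category_id = 56        -- placeholder category ID
ORDER BY RAND()
LIMIT 10;
```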
4. Stored Generated Column for Email Domain Lookup
The contacts table had email addresses but no domain column. Queries were using WHERE email LIKE '%@example.com' — the leading % wildcard prevents any index use. We added a stored generated column that MySQL auto-populates:
ALTER TABLE contacts
ADD COLUMN email_domain VARCHAR(100)
GENERATED ALWAYS AS (
SUBSTRING_INDEX(email, '@', -1)
) STORED,
ADD INDEX idx_sender_domain (email_domain);
-- Query now: WHERE email_domain = 'example.com'
-- Lookup time: 2.2s → 0.0003s (scans 1 row, not 722K)
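Verifying that the new index is actually picked up is a one-liner with EXPLAIN (the selected column is an assumption; the expected plan shape is in the trailing comment):

```sql
EXPLAIN
SELECT contact_id              -- placeholder column name
FROM contacts
WHERE email_domain = 'example.com';
-- Expect: type=ref, key=idx_sender_domain (not type=ALL)
```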
5. Right-Sizing the InnoDB Buffer Pool
Perhaps the most shocking finding: the buffer pool was configured at 200GB, but the entire database was only 30GB. The oversized pool was wasting memory that could be used for OS file cache and PHP workers, and was actually increasing contention on InnoDB's internal mutex structures.
# Before
innodb_buffer_pool_size = 200G     # 6.7x the 30GB database!
innodb_buffer_pool_instances = 16
log_queries_not_using_indexes = ON # flooding slow log with sub-ms queries

# After (pool size resized online via SET GLOBAL; the instance count
# is not a dynamic variable and took effect at the next restart)
innodb_buffer_pool_size = 96G      # 3x DB size, the recommended ratio
innodb_buffer_pool_instances = 12  # one per 8GB of pool
log_queries_not_using_indexes = OFF # only log queries > long_query_time
Why 3x the database size? The rule of thumb is buffer pool = 1.5x–3x your working dataset. With 96GB, the entire 30GB dataset fits in memory with room for indexes and hot page buffers. More than 3x provides no performance benefit and wastes memory that other processes need.
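The resize itself needs no mysqld restart. In MySQL 8.0, SET PERSIST additionally writes the value to mysqld-auto.cnf so it survives the next restart; InnoDB performs the resize incrementally in chunks:

```sql
-- Resize the buffer pool online (expression arithmetic is allowed
-- in SET statements); InnoDB resizes chunk by chunk in the background
SET PERSIST innodb_buffer_pool_size = 96 * 1024 * 1024 * 1024;

-- Watch progress until it reports completion
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';
```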
6. Top-Selling Facets Cached to Disk
Manufacturer pages ran an aggregation query to count top-selling products per facet — a 17-second query that ran on every page load. We cached the results to /tmp with a 10-minute TTL. The first request after expiry takes 17 seconds (usually triggered by a cron job, not a visitor). Every subsequent request reads from the flat file in under 1ms.
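The aggregation being cached presumably resembles a GROUP BY over the same sales-rank column used in the earlier composite index. A sketch, with the rank-threshold semantics of top_selling assumed:

```sql
-- Hypothetical shape of the per-facet aggregation whose result is
-- cached to /tmp; refreshed by cron every 10 minutes rather than
-- recomputed on every page load.
SELECT category_id,
       COUNT(*) AS top_sellers
FROM products
WHERE mfg_id = 1234            -- placeholder manufacturer ID
  AND top_selling <= 100       -- assumed: lower rank = better seller
GROUP BY category_id;
```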
Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Similar products query | 17.0s | 0.001s | 99.99% faster |
| Category count query | 17.0s | 0.008s | 99.95% faster |
| Email domain lookup | 2.2s | 0.0003s | 99.99% faster |
| Tracking verification | 1.1s | 0.002s | 99.8% faster |
| MySQL CPU usage | 76% | 15% | 80% reduction |
| Buffer pool size | 200GB | 96GB | 104GB freed |
| Server upgrade cost | $8,400/yr | $0 | Cancelled |
Tech Stack
MySQL 8.0 • InnoDB • PHP 8.1 • AlmaLinux 8.10 • Dell PowerEdge R730 • EXPLAIN ANALYZE • information_schema.INNODB_METRICS
Key Takeaway
Hardware is rarely the real problem. Before spending money on a server upgrade, run EXPLAIN on your slowest queries. In this case, five indexes and three query rewrites eliminated the need for an $8,400 annual server upgrade. The entire engagement cost less than one month of the proposed upgrade.