The Server
This wasn't an underpowered machine. The hardware was genuinely enterprise-grade: a Dell PowerEdge R730 with a 48-core Xeon E5-2687W and 504GB of RAM.
The problem wasn't hardware capacity — it was configuration that hadn't been revisited since the server was first provisioned. The MySQL defaults had been scaled up for the hardware without regard to the actual database size.
InnoDB Buffer Pool: 200GB for a 30GB Database
The most glaring issue was the innodb_buffer_pool_size setting of 200GB. A common sizing guideline is 1.5x–3x the working dataset. With a 30GB database, 96GB is already generous: it holds the entire dataset in memory with plenty of room for indexes and hot pages.
Beyond roughly 3x, extra pool memory buys no performance. Worse, an oversized buffer pool adds InnoDB overhead of its own: the internal structures that manage the pool require locking, and more pool instances mean more lock coordination.
Why 96GB specifically? We set innodb_buffer_pool_instances to 12, which makes each instance exactly 8GB. A widely used guideline is to keep each instance between 1GB and 8GB, so 96GB / 12 = 8GB per instance sits right at the upper bound of that range.
SHOW VARIABLES LIKE 'innodb_buffer_pool%';
-- innodb_buffer_pool_size: 214748364800 (200 GB)
-- innodb_buffer_pool_instances: 16
-- Resize live — MySQL 5.7.5+ supports online buffer pool resize
SET GLOBAL innodb_buffer_pool_size = 96 * 1024 * 1024 * 1024;
-- Note: innodb_buffer_pool_instances is NOT dynamic; it cannot be
-- changed with SET GLOBAL and must be set in my.cnf (12 here),
-- taking effect at the next restart
-- Monitor resize progress via the documented status variable
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';
-- Also update my.cnf so it persists after restart
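The persistent form of the same settings is a short my.cnf fragment (the file path varies by distro; /etc/my.cnf.d/ is typical on AlmaLinux):

```ini
# /etc/my.cnf.d/mysql-server.cnf (path is an assumption)
[mysqld]
innodb_buffer_pool_size      = 96G
# Not dynamic: the new instance count applies at the next restart
innodb_buffer_pool_instances = 12
```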
Slow Query Log De-Noising
The slow query log was configured with log_queries_not_using_indexes = ON. In theory this is useful — it catches queries that do full table scans. In practice, on a busy e-commerce site, it was logging thousands of queries per hour that took 0.003 seconds — well under the long_query_time threshold of 1 second, but logged anyway because they didn't use an index.
The result was a slow query log that was impossible to read. The genuinely slow 17-second queries were buried in thousands of sub-millisecond entries. We disabled log_queries_not_using_indexes at runtime, and the log immediately became actionable — showing only what was actually slow:
SET GLOBAL log_queries_not_using_indexes = OFF;
-- Ensure long_query_time is meaningful
SET GLOBAL long_query_time = 1.0;
-- Before: 4,800 lines/hour, mostly sub-1ms queries
-- After: 12 lines/hour, all genuinely slow queries
-- Now you can actually read and act on the slow query log
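As with the buffer pool, the runtime change should be mirrored in my.cnf so a restart doesn't silently re-enable the noise (the slow_query_log line assumes the log was already enabled, as it was here):

```ini
[mysqld]
slow_query_log                = ON
long_query_time               = 1.0
log_queries_not_using_indexes = OFF
```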
Online Index Creation: 8+ Indexes, Zero Downtime
MySQL InnoDB supports ALGORITHM=INPLACE LOCK=NONE for index creation — meaning the table remains fully readable and writable while the index is built. We created 8+ indexes across multiple tables without taking the site offline or even slowing it down perceptibly:
CREATE INDEX idx_similar_mfg ON products
(mfg_id, category_id, top_selling)
ALGORITHM=INPLACE LOCK=NONE;
CREATE INDEX idx_top_selling ON products
(top_selling, category_id)
ALGORITHM=INPLACE LOCK=NONE;
-- email_maildir_ingest (103K rows)
CREATE INDEX idx_email_domain ON email_maildir_ingest
(email_domain)
ALGORITHM=INPLACE LOCK=NONE;
-- tracking_details (90K rows)
CREATE INDEX idx_tracking_num ON tracking_details
(tracking_number)
ALGORITHM=INPLACE LOCK=NONE;
-- contacts (722K rows)
CREATE INDEX idx_sender_domain ON contacts
(email_domain)
ALGORITHM=INPLACE LOCK=NONE;
-- All created while site was live and serving traffic
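After each build, it is worth confirming the optimizer actually picks the new index. A quick EXPLAIN check does this; the query shape and the id/name columns below are illustrative, not the site's actual query:

```sql
EXPLAIN
SELECT id, name
FROM products
WHERE mfg_id = 42
  AND category_id = 7
ORDER BY top_selling DESC
LIMIT 20;
-- Expect key = idx_similar_mfg with access type ref;
-- type = ALL would mean a full table scan and an unused index
```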
Top-Selling Facets Cache
Manufacturer category pages ran an aggregation query on every load to count products per facet. This was a 17-second query — necessary for correct counts, but not necessary on every single page request.
The solution: cache the result to a flat file in /tmp with a 10-minute TTL. A background cron job warms the cache before it expires. Visitors always get a fast response from the file. The slow aggregation runs at most once every 10 minutes, triggered by cron rather than by a real visitor.
$ttl = 600; // 10 minutes
if (file_exists($cacheFile) &&
    (time() - filemtime($cacheFile)) < $ttl) {
    // Cache hit: serve the pre-built facet counts from /tmp
    $facets = json_decode(file_get_contents($cacheFile), true);
} else {
    // Cache miss: run the 17s aggregation and rebuild the file.
    // The cron warmer normally refreshes the cache before the TTL
    // expires, so real visitors rarely reach this branch.
    $facets = $db->query($heavyAggregationQuery)->fetchAll();
    file_put_contents($cacheFile, json_encode($facets));
}
// Visitor response time: sub-millisecond on a cache hit
// Cache refresh: 17s, in the background, not visitor-facing
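The warmer itself can be a plain crontab entry that runs just inside the 10-minute TTL, so the file is refreshed before it can go stale (the script path and name are hypothetical):

```
# Refresh every 9 minutes, just inside the 10-minute TTL
*/9 * * * * /usr/bin/php /var/www/scripts/warm_facet_cache.php
```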
Deep Pagination from Bot Crawlers
The server logs showed frequent OFFSET 100000+ queries being executed — these came from search engine crawlers paginating deep into manufacturer product listings. MySQL's LIMIT x OFFSET y is inefficient at high offsets because it must scan and discard y rows before returning x.
We evaluated whether to block this with robots.txt or an X-Robots-Tag: noindex header, but the deep pages had genuine SEO value (long-tail product searches). Instead, we let the crawlers proceed and addressed the query cost by ensuring the underlying queries used their indexes efficiently, limiting the damage to page-build time rather than preventing crawling.
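To make the OFFSET cost concrete, here is the shape of a crawler-driven query (table and column names are illustrative):

```sql
-- Crawler requesting a deep page of a manufacturer listing:
SELECT id, name
FROM products
WHERE mfg_id = 42
ORDER BY id
LIMIT 24 OFFSET 100000;
-- Without a usable index: a full scan plus a filesort of every
-- matching row, on every page request.
-- With an index on (mfg_id, id): an ordered index scan with no
-- sort step. MySQL still discards 100,000 rows, but the per-page
-- cost stays proportional to the offset rather than to the table.
```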
Results
| Setting / Metric | Before | After |
|---|---|---|
| InnoDB buffer pool | 200GB | 96GB |
| Buffer pool instances | 16 | 12 |
| log_queries_not_using_indexes | ON | OFF |
| Slow log entries/hour | ~4,800 | ~12 |
| MySQL CPU usage | 76% | 15% |
| Indexes created | 0 new | 8+ online |
| Downtime | N/A | Zero |
| Slowest query | 17.0s | 0.001s |
Key principle: All runtime changes were applied with SET GLOBAL first to verify the effect, then written to my.cnf for persistence. This meant we could observe the impact on live traffic before committing the change — and roll back instantly if needed.
Tech Stack
MySQL 8.0 • InnoDB • AlmaLinux 8.10 • Dell PowerEdge R730 • 48-core Xeon E5-2687W • 504GB RAM • PHP 8.1 • performance_schema • information_schema.INNODB_METRICS