Query Optimization with MySQL

Optimization of query is joint effort of you and mysql.

6 min readOct 10, 2019

For insights of mysql’s architecture take a quick read — MySQL Logical Architecture

MySQL parses queries to create an internal structure (the parse tree), and then applies a variety of optimizations. These can include rewriting the query, determining the order in which it will read tables, choosing which indexes to use, and so on. You can pass hints to the optimizer through special keywords in the query, affecting its decision making process. You can also ask the server to explain various aspects of optimization.
This lets you know what decisions the server is making and gives you a reference point for reworking queries, schemas, and settings to make everything run as efficiently as possible.

Here are some types of optimizations MySQL knows how to do:

Reordering joins

Tables don’t always have to be joined in the order you specify in the query. Determining the best join order is an important optimization.

Converting OUTER JOINs to INNER JOINs

An OUTER JOIN doesn’t necessarily have to be executed as an OUTER JOIN. Some factors, such as the WHERE clause and table schema, can actually cause an OUTER JOIN to be equivalent to an INNER JOIN. MySQL can recognize this and rewrite the join, which makes it eligible for reordering.

Applying algebraic equivalence rules

MySQL applies algebraic transformations to simplify and order expressions. It can also fold and reduce constants, eliminating impossible constraints and constant conditions. For example, the term (5=5 AND a>5) will reduce to just a>5. Similarly, (a<b AND b=c) AND a=5 becomes b>5 AND b=c AND a=5. These rules are very useful for writing conditional queries.

COUNT(), MIN(), and MAX() optimizations

Indexes and column nullability can often help MySQL optimize away these expressions. For example, to find the minimum value of a column that’s leftmost in a B-Tree index, MySQL can just request the first row in the index. It can even do this in the query optimization stage, and treat the value as a constant for the rest of the query. Similarly, to find the maximum value in a B-Tree index, the server reads the last row. Likewise, COUNT(*) queries without a WHERE clause can often be optimized away on some storage engines (such as MyISAM, which keeps an exact count of rows in the table at all times).

Covering indexes

MySQL can sometimes use an index to avoid reading row data, when the index contains all the columns the query needs it returns the data from index itself.

Subquery optimization

MySQL can convert some types of subqueries into more efficient alternative forms, reducing them to index lookups instead of separate queries.

Early termination

MySQL can stop processing a query (or a step in a query) as soon as it fulfills the query or step. The obvious case is a LIMIT clause, but there are several other kinds of early termination. For instance, if MySQL detects an impossible condition, it can abort the entire query. You can see this in the following example:

his query stopped during the optimization step, but MySQL can also terminate the execution early in some other cases. The server can use this optimization when the query execution engine recognizes the need to retrieve distinct values, or to stop when a value doesn’t exist.

IN() list comparisons

In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).

MySQL performs more optimizations than we could fit into this entire Blog but it gives a general idea about the intelligence it has.

Table and index statistics

The server layer, which contains the query optimizer, doesn’t store statistics
on data and indexes. That’s a job for the storage engines, because each storage engine might keep different kinds of statistics (or keep them in a different way). Some engines, such as Archive engine, don’t keep statistics at all!
Because the server doesn’t store statistics, the MySQL query optimizer has to ask the engines for statistics on the tables in a query. The engines provide the optimizer with statistics such as the number of pages per table or index, the cardinality of tables and indexes, the length of rows and keys, and key distribution information. The optimizer can use this information to help it decide on the best execution plan.

The MySQL query optimizer uses two API calls to ask the storage engines how index values are distributed when deciding how to use indexes.

The first is the records_in_range() call, which accepts range end points and returns the number of records in that range. This can be exact for some storage engines such as MyISAM, but is only an estimate for InnoDB.
The second API call is info(), which can return various types of data, including index cardinality (approximately how many records there are for each key value).

When the storage engine provides the optimizer with inexact information about the number of rows a query might examine, or when the query plan is too complex to know exactly how many rows will be matched at various stages, the optimizer uses the index statistics to estimate the number of rows. MySQL’s optimizer is cost-based, and the main cost metric is how much data the query will access. If the statistics were never generated, or if they are out of date, the optimizer can make bad decisions. The solution is to run ANALYZE TABLE, which regenerates the statistics.

In InnoDb The statistics are generated by sampling a few random pages in the index and assuming that the rest of the index looks similar.
The number of pages sampled is eight in older InnoDB versions, but in more recent versions it can be controlled with the innodb_stats_sample_pages variable. Setting this to a value larger than eight can in theory help generate more representative index statistics, especially on very large tables, but your mileage may vary.

IMPORTANT NOTE:

Before trying to write fast queries, remember that it’s all about response time. Queries are tasks, but they are composed of subtasks, and those subtasks consume time. To optimize a query, you must optimize its subtasks by eliminating them, making them happen fewer times, or making them happen more quickly.

Once you’re sure your queries retrieve only the data you need, you can look for queries that examine too much data while generating results. In MySQL, the simplest query cost metrics are:
• Response time
• Number of rows examined
• Number of rows returned

None of these metrics is a perfect way to measure query cost, but they reflect roughly how much data MySQL must access internally to execute a query and translate approximately into how fast the query runs. All three metrics are logged in the slow query log, so looking at the slow query log is one of the best ways to find queries that examine too much data.

When you’re thinking about the cost of a query, consider the cost of finding a single row in a table. MySQL can use several access methods to find and return a row. Some require examining many rows, but others might be able to generate the result without examining any.
The access method(s) appear in the type column in EXPLAIN’s output. The access types range from a full table scan to index scans, range scans, unique index lookups, and constants. Each of these is faster than the one before it, because it requires reading less data. You don’t need to memorize the access types, but you should understand the general concepts of scanning a table, scanning an index, range accesses, and single- value accesses.

************************ THANK YOU ***************************

For more details refer High Performance MySQL book by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko.