Tags: optimisation
Making queries more efficient 2) Optimisation of sql.
By admin on Mar 19, 2009 | In Formula 1 - More advanced stuff, Highway Maintenance - DBA stuff | Send feedback »
One major aid in examining code that you think is operating inefficiently, is to use the EXPLAIN command.
mysql manual
The EXPLAIN command tells you how Mysql intends to execute a particular statement. We can determine from the results whether Mysql has chosen an optimal way of getting the data. Two of the columns are particularly worth looking at. The first is the Type column (that shows the access type Mysql is using for the query)
In order of least efficient access to most efficient, these are the values that should appear in the Type column:
All - index - range - index_subquery - unique_subquery - index_merge - ref_or_null - ref - eq_ref - const - system
The next is the Extra column (this shows extra information concerning this particular access strategy).
Now if we use the Explain command on a query (looking at a data set of 2921 records, recorded against 50 cities) we get:
Example script
Code:
explain | |
select a.* from city a | |
where a.city_id in | |
(select b.city_id | |
from people b | |
group by b.city_id | |
having max(b.income) > 68500) \G |
*************** 1. row ****
id: 1
select_type: PRIMARY
table: a
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 50
Extra: Using where
*************** 2. row ****
id: 2
select_type: DEPENDENT SUBQUERY
table: b
type: index
possible_keys: NULL
key: city_id
key_len: 3
ref: NULL
rows: 2921
Extra: Using filesort
2 rows in set (0.06 sec)
Note that the subquery row in the Extra column states that "using filesort" is needed. If Filesort is shown it means that the sort can't be used from an index ie the sort is carried out in memory (and sometimes continued on disk if there are memory issues). This is inefficient, and will increasingly take up more resources as the table increases. Even with a small data set, having to sort 2921 rows may have an adverse impact.
In this case, there already is an index on the field city_id in the people table. In fact it's not the index issue that is making this query slow - it's the fact that Mysql processes subqueries very inefficiently.Now, according to the following link, it looks like Mysql version 6.0.x is much more efficient at processing subqueries using the 'IN' clause
In the meantime, the best way to optimise this type of query is by writing it in a Join format.
Code:
explain | |
select a.state, a.capital, a.largest, a.city_id | |
from city a | |
inner join people p on a.city_id = p.city_id | |
group by a.state, a.capital, a.largest, a.city_id | |
having max(p.income) > 68500 \G |
Now if we run the Explain command again we get:
**************** 1. row ***
id: 1
select_type: SIMPLE
table: a
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 50
Extra: Using temporary; Using filesort
**************** 2. row ***
id: 1
select_type: SIMPLE
table: p
type: ref
possible_keys: city_id
key: city_id
key_len: 3
ref: cookbook.a.city_id
rows: 40
Extra: Using where
Yikes - the first row still uses filesort! However this is no longer performed on the inefficient subquery; moreover it only applies to the 50 rows brought back in the city grouping (in the initial statement the filesort was performed on 2921 rows in the subquery). Incidentally, this query ran on average by just .2 of a second faster; however in database terms 2921 records is relatively small, and the larger the data-set, the greater the gain in speed will be by using table joins.
Of course, although re-writing your queries to use joins will help immensely, it doesn't mean that the role of indexes has to be forgotten. On the contrary, the importance of using indexes correctly is crucial. I hope to look in more detail at this in my next article. :
Making queries more efficient 1) Locating queries that run slowly
By admin on Mar 17, 2009 | In Formula 1 - More advanced stuff, Highway Maintenance - DBA stuff | 1 feedback »
One of the most important tasks for those in charge of administering a database system (and also for many developers) is to make sure queries run efficiently. In this first part of several articles on optimising queries, I'm going to show how to establish which queries are taking a long time to execute. This is not always obvious, particularly if you are the administrator of the system and not the user.
The Mysql system uses something called the Slow query log, which can be extremely useful in optimising system performance. However by default, Mysql does not log slow queries. In order to use this log, you need to go into your my.cnf file (this may be in /etc/mysql/ or in /opt/lampp/etc/ if you are using a lampp server). You then need to make sure the following two lines (in the [mysqld] section) are enabled:
Code:
log-slow-queries = /var/log/mysql/mysql-slow.log | |
long_query_time = 2 |
In the above line I have entered 2 (for seconds), you can change this if you wish - I understand the default time in Mysql is 20 seconds. A word about the path-name set for log-slow-queries - if you leave this path out, the error log will be written in your data directory, and the file name will be [systemname]-slow.log. You can find the location of your data directory by calling up mysql and running the following command:
Code:
SHOW VARIABLES LIKE 'datadir'; |
One more thing. You will need to make sure that a file already exists in /var/log/mysql called mysql-slow.log (use the command touch to create this file if you're using a Unix or Linux machine) and that the file permissions allow users to write to it. All being well, any queries that take place after the mysql server is restarted, will be written to this log if they take longer than the specified period of time.
If you're interested in finding out more on Mysql Server logs, here's the link to the relevant part of the manual:
Mysql manual link
Well, once we've established which queries are taking a long time to complete, we need to see how we can make them run faster. In forthcoming articles I will look at:
* Changing the coding - this can involve a better use of indexes. It can also mean re-writing the query to use a join which is usually far more efficient in Mysql than using sub-queries. I have to admit this has been one of my weak points. I will also show how useful "Explain plan" can be.
* Poor indexing choices and Poor schema design. Sometimes the way you create indexes or design database tables can have a negative impact on performance issues.
One thing I have to point out is that databases change (through necessity) over time, so unfortunately performance may have to be checked over at regular intervals. However, we can go too far in constantly changing things to gain speed. We should be realistic in these matters. Is it worthwhile spending 4 hours of a dba's time to make a query run 1 second quicker, if that query is only run once a day or once a week? Common-sense (as in all things) should be taken into consideration.
Well, I hope to have the next article on query optimisation fairly soon. Until then, enjoy your coding!