Category: Highway Maintenance - DBA stuff
Quick Perl solution.
By admin on May 18, 2009 | In Daily Commute - Standard hints/tips, Highway Maintenance - DBA stuff | Send feedback »
A very quick entry this time. I was having problems getting Perl to use the DBI module with Mysql. I kept getting this error message:
Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'
Well, after doing quite a bit of googling I found the answer. My database is set up on a Apache server, which means the mysqld socket is located in a different place. To find out where it is located, you go into Mysql and run the following command:
Code:
show variables like 'socket%'; |
It seems that the location on my machine was /opt/lampp/var/mysql/mysqld.sock
so, I then changed my Data Source connect line in the Perl program to read:
Code:
$dsn = "DBI:mysql:$database:localhost:3306:mysql_socket=/opt/lampp/var/mysql/mysqld.sock"; |
It seems it's a common stumbling block, so I hope this helps others to continue with their Perl programming.
Making queries more efficient (3) - The importance of indexes
By admin on Apr 10, 2009 | In Formula 1 - More advanced stuff, Highway Maintenance - DBA stuff | Send feedback »
It's often said that poor use of indexes (or indeed a missing index) is the fastest way to kill a system. An index works by providing a pre-sorted list that the query can access, instead of having to perform a large sort in memory. It must be remembered that indexes come at a cost though, mainly in using up disk storage space (and also in increasing INSERT and UPDATE times). One important thing to note when creating an index, is that it's preferable to minimize the size of indexed fields. An index on a small int field is far better than an index say, on a varchar(100) field. If it's necessary to index a varchar field, consider limiting the index to the first x places, for example:
Code:
CREATE INDEX addr_line_indx ON members (addr_line1(15)); |
The index generally works on a direct WHERE condition for example where colA = 123, or when looking for a min(field) or max(field) value. The index can also be used for LIKE comparisons, so long as the comparison string does not start with a wildcard. So, using "WHERE fielda LIKE 'FARSC%'" will use an index on that field, the following won't: "WHERE fielda LIKE '%FARSCAPE%'". To read up more on how Mysql uses indexes:
Another factor to consider in putting an index on a field, is that of selectivity. If there are too many data records of the same value (eg if the field records gender, 'M' or 'F') then Mysql will find it quicker to load all the records into memory and scan through them, instead of using the index. The Mysql database keeps statistics on table indexes so it can make an informed guess as to when to use an index or not. So, it only makes sense to use indexes where the field has a high selectivty ie the data set contains mostly or entirely unique values.
One thing to remember in creating an index is it doesn't have to restricted to just one column. For example, if you query often on two columns firstname and lastname, you can create a multi-column (or composite) index:
Code:
CREATE INDEX name_idx ON members (lastname, firstname) |
When you use a composite index you have to remember that the index is only used on the second field mentioned, when the WHERE clause includes the first field. It may be necessary in some cases to create a further separate index on that second field, when you know queries will only specify that second field in the WHERE clause. Also, if you use OR (ie WHERE lastname = 'SMITH' OR firstname = 'PETE') then the index is ignored. You can read more about multiple indexes here:
mysql manual on multiple indexes
Sometimes an index is not used in a query because a function or conversion is carried out on the field.
If you can isolate an indexed field on the left side of a WHERE condition, you can then persuade Mysql to use the index in the query. The best example of this is something I found in the Book Pro Mysql (Kruckenberg and Pipes):
Bad performance:
Code:
SELECT * FROM customer_orders | |
WHERE TO_DAYS(order_created) - TO_DAYS(NOW()) <= 7; |
Good performance (the index on order_created field will be used):
Code:
SELECT * FROM customer_orders | |
WHERE order_created >= DATE_SUB(NOW(), INTERVAL 7 DAY); |
There will be cases however where you can't perform a search on a field without using a function or conversion of that field. In these cases, if the query is needed often, it may be more efficient to create an additional column containing the calculated value of that column (always assuming that the extra storage space isn't an issue). For example (again from the ProMysql book), the following SQL will not use an index, even if one exists on the email_address column:
Code:
SELECT * FROM customers | |
WHERE email_address LIKE '%aol.com'; |
The workaround for this is to create an additional column containing the reverse of the email address, and then index that column. We can then utilise the index by means of the following query:
Code:
SELECT * FROM customers | |
WHERE email_address_reversed | |
LIKE CONCAT(REVERSE('%aol.com'), '%'); |
Of course, as well as the additional disk space required for the additional column and index, we also need to ensure that the email_address_reversed field is populated regularly. This could be via the use of triggers on updates and inserts. Or you could run a table update routine at regular intervals. It all depends on how often you need the query to run efficiently.
There's an important consideration I'm only going to touch upon in this article. Sometimes you will need to persuade mysql to use the right index. Although it will normally guess correctly via looking at the query and analysing statistics, there are cases where it chooses the inefficient option. This subject is too large to include in this article, but I can recommend you read the following link showing 7 ways to convince mysql to use the right index
7-ways-to-convince-mysql-to-use-the-right-index
One last thing that may make your queries run faster. If your table has had a lot of deletes, inserts and updates, it may well be worth using the command OPTIMIZE TABLE, for example
Code:
OPTIMIZE TABLE large_order_table; |
This will effectively de-fragment a mysql database ie it will reduce the size of the table on disk and reduce query response time. It also keeps the index statistics current which helps Mysql decide when to use a particular index on a column.
Well, I feel I've shown a lot of different ways in how to optimize your use of indexes in Mysql. Please feel free to leave comments if you find this information of use (or where you disagree!).
Until next time - Happy coding!
Making queries more efficient 2) Optimisation of sql.
By admin on Mar 19, 2009 | In Formula 1 - More advanced stuff, Highway Maintenance - DBA stuff | Send feedback »
One major aid in examining code that you think is operating inefficiently, is to use the EXPLAIN command.
mysql manual
The EXPLAIN command tells you how Mysql intends to execute a particular statement. We can determine from the results whether Mysql has chosen an optimal way of getting the data. Two of the columns are particularly worth looking at. The first is the Type column (that shows the access type Mysql is using for the query)
In order of least efficient access to most efficient, these are the values that should appear in the Type column:
All - index - range - index_subquery - unique_subquery - index_merge - ref_or_null - ref - eq_ref - const - system
The next is the Extra column (this shows extra information concerning this particular access strategy).
Now if we use the Explain command on a query (looking at a data set of 2921 records, recorded against 50 cities) we get:
Example script
Code:
explain | |
select a.* from city a | |
where a.city_id in | |
(select b.city_id | |
from people b | |
group by b.city_id | |
having max(b.income) > 68500) \G |
*************** 1. row ****
id: 1
select_type: PRIMARY
table: a
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 50
Extra: Using where
*************** 2. row ****
id: 2
select_type: DEPENDENT SUBQUERY
table: b
type: index
possible_keys: NULL
key: city_id
key_len: 3
ref: NULL
rows: 2921
Extra: Using filesort
2 rows in set (0.06 sec)
Note that the subquery row in the Extra column states that "using filesort" is needed. If Filesort is shown it means that the sort can't be used from an index ie the sort is carried out in memory (and sometimes continued on disk if there are memory issues). This is inefficient, and will increasingly take up more resources as the table increases. Even with a small data set, having to sort 2921 rows may have an adverse impact.
In this case, there already is an index on the field city_id in the people table. In fact it's not the index issue that is making this query slow - it's the fact that Mysql processes subqueries very inefficiently.Now, according to the following link, it looks like Mysql version 6.0.x is much more efficient at processing subqueries using the 'IN' clause
In the meantime, the best way to optimise this type of query is by writing it in a Join format.
Code:
explain | |
select a.state, a.capital, a.largest, a.city_id | |
from city a | |
inner join people p on a.city_id = p.city_id | |
group by a.state, a.capital, a.largest, a.city_id | |
having max(p.income) > 68500 \G |
Now if we run the Explain command again we get:
**************** 1. row ***
id: 1
select_type: SIMPLE
table: a
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 50
Extra: Using temporary; Using filesort
**************** 2. row ***
id: 1
select_type: SIMPLE
table: p
type: ref
possible_keys: city_id
key: city_id
key_len: 3
ref: cookbook.a.city_id
rows: 40
Extra: Using where
Yikes - the first row still uses filesort! However this is no longer performed on the inefficient subquery; moreover it only applies to the 50 rows brought back in the city grouping (in the initial statement the filesort was performed on 2921 rows in the subquery). Incidentally, this query ran on average by just .2 of a second faster; however in database terms 2921 records is relatively small, and the larger the data-set, the greater the gain in speed will be by using table joins.
Of course, although re-writing your queries to use joins will help immensely, it doesn't mean that the role of indexes has to be forgotten. On the contrary, the importance of using indexes correctly is crucial. I hope to look in more detail at this in my next article. :