Powerful string manipulation
By admin on Feb 7, 2009 | In Daily Commute - Standard hints/tips, Showroom - Examples of Mysql usage | Send feedback »
Hello everyone, I hope your Mysql work is progressing nicely.
In this article I want to look at string manipulation,. To start with I will look at the ability to move through a string one character at a time. I find that the best way to break apart a string is by using a cartesian join on a table that has enough rows to join against each character. So on my database I like to have a table (in this case called pivotnumber) that has just one field called id, populated with ascending numbers 1, 2, 3 etc.
So for example, if I wanted to display a text field vertically I could use the following code:
Code:
mysql> select substr(c.cat,it.pos,1) as L | |
-> from (select cat from categories where cat like 'WORK%') c, | |
-> (select id as pos from pivotnumber) it | |
-> where it.pos <= length(c.cat); | |
+---+ | |
| L | | |
+---+ | |
| W | | |
| o | | |
| r | | |
| k | | |
+---+ | |
4 rows in set |
The where clause in Line 4 makes sure that the join is only carried out for as many characters in the cat field.
Of course this is just a cosmetic function, but walking through the string can prove very useful if performing data analysis:
Another benefit from using this cross join is you can imitate a loop within normal sql. Merely change the where clause to indicate how many loops you wish to perform.
Here's an interesting use of this method, that I gained from the fantastic SQL Cookbook by Anthony Molinaro. The code below uses the function GROUP_CONCAT (that returns a concatenated string) together with the string walking method, to place all the characters within a field in alphabetical order.
Code:
mysql> select ename, group_concat(c order by c separator '') | |
-> from ( | |
-> select ename, substr(a.ename,it.pos,1) c | |
-> from emp a, | |
-> (select id pos from pivotnumber) it | |
-> where it.pos <= length(a.ename) | |
-> ) x | |
-> group by ename; | |
+---------+-----------------------------------------+ | |
| ename | group_concat(c order by c separator '') | | |
+---------+-----------------------------------------+ | |
| ADAMS | AADMS | | |
| ALLEN | AELLN | | |
| BLAKE | ABEKL | | |
| CLARK | ACKLR | | |
| CLINTON | CILNNOT | | |
| FORD | DFOR | | |
| JAMES | AEJMS | | |
| JONES | EJNOS | | |
| KING | GIKN | | |
| MARTIN | AIMNRT | | |
| SCOTT | COSTT | | |
| SMITH | HIMST | | |
| TURNER | ENRRTU | | |
| WARD | AADDRRWW | | |
+---------+-----------------------------------------+ |
Lastly, I must mention a couple of useful ways of manipulating strings, that doesn't require the above methods. Often we import fields that contain a full name, perhaps in the format 'Lastname, Firstname'. It's a fairly simple matter to extract the first bit ie the Lastname up to the comma. We use the LOCATE function to find the position of the first comma, and then use substring to extract the string up to that point.
Code:
mysql> select fullname, | |
-> substring(fullname, 1, LOCATE(',', fullname)-1) as surname | |
-> from people; | |
+-------------------+-----------+ | |
| fullname | surname | | |
+-------------------+-----------+ | |
| Blanco, Martin | Blanco | | |
| Black, Clint | Black | | |
| Lopez, Julia | Lopez | | |
| Zimmerman, Robert | Zimmerman | | |
+-------------------+-----------+ | |
4 rows in set |
To extract the first name which is at the end of the string, requires a bit more cunning. Find the location of the space prior to the last bit of the name (remember the surname may have two elements) by reversing the whole string (using REVERSE ) and using the LOCATE function. Then use this value in the RIGHT function (which takes a substring of the string from the right). Like this!
Code:
mysql> select fullname, | |
-> right(fullname, LOCATE(',', REVERSE(fullname)) -2) as FirstName | |
-> from people; | |
+-------------------+-----------+ | |
| fullname | FirstName | | |
+-------------------+-----------+ | |
| Blanco, Martin | Martin | | |
| Black, Clint | Clint | | |
| Lopez, Julia | Julia | | |
| Zimmerman, Robert | Robert | | |
+-------------------+-----------+ | |
4 rows in set |
Of course, there may be other complications, but a combination of the various string functions can normally solve the data issues with names. For more information, please look at the Mysql documentation page here:
Mysql Reference Page
I hope I've given a flavour of what can be achieved using string functions in MySql. If there is a particular subject you want me to cover in this blog, please leave a request in a comment, and I'll do my best to oblige.
If you're interested in getting the SQL Cookbook (a book I thoroughly recommend) the Amazon UK link page is here:
Amazon link SQL Cookbook
Until next time, happy coding!
Feedback awaiting moderation
This post has 1 feedback awaiting moderation...
Leave a comment
| « Finding duplicates | Checking for non-text entries. » |