• Español
  • Blog

The road to using Mysql

Getting the most from the open source database Mysql
  • Home
  • Contact
  • Log in
  • The road to using Mysql

  • A resource blog for those seeking to utilise Mysql, by Mark Nessfield. Useful tips and links for all who need to utilise their Mysql database to the max.

  • Contents

    • Creating websites with Perl (cgi)
    • Calculating distance with Mysql
    • Project Euler Q3 with Mysql
    • Project Euler Q2 with Mysql
    • Project Euler Q1 with Mysql
    • A quick look at using dates in Mysql
    • Bumper tricks
    • Date Issues importing from an Excel csv file
    • Quick Perl solution.
    • More on Variables
    • Making queries more efficient (3) - The importance of indexes
    • Change of host
    • Making queries more efficient 2) Optimisation of sql.
    • Making queries more efficient 1) Locating queries that run slowly
    • Finding duplicates
    • Powerful string manipulation
    • Checking for non-text entries.
    • More useful GROUP BY stuff.
    • Beautiful code
    • Getting Innodb to work
  • Categories

    • All
    • Daily Commute - Standard hints/tips
    • Formula 1 - More advanced stuff
    • Highway Maintenance - DBA stuff
    • Off-Road - Unrelated (?) stuff
    • Showroom - Examples of Mysql usage
    • Uncategorized
    • Recently
    • Archives
    • Categories
    • Latest comments
  • Search

  • XML Feeds

    • RSS 2.0: Posts, Comments
    • Atom: Posts, Comments
    What is RSS?

Creating websites with Perl (cgi)

By admin on Nov 14, 2009 | In Formula 1 - More advanced stuff, Showroom - Examples of Mysql usage | Send feedback »

The normal way to produce a website that accesses data from a Mysql database is via the scripting language PHP. I wondered if it were possible to use Perl instead, and found that with the use of a couple of modules, it's not so hard to get started. (Here's a brief introduction to using modules with web programming.
Here's the Perl I wrote (calling the file empnew.cgi). I made sure the program was executable, and placed the file in a directory on my web server (make sure your path points to this directory):


#!/usr/bin/perl
use CGI;
use HTML::Template;
use DBI;
use strict;

my $dsn = 'dbi:mysql:one:localhost:3306:mysql_socket=/opt/lampp/var/mysql/mysqld.sock';
# the database here is called 'one'

my $CGI = CGI->new();
my $user = 'root';
my $pass = 'passwordx';


my $dbh = DBI->connect($dsn, $user, $pass)
or die "Can’t connect to the DB: $DBI::errstr\n";

my $sth = $dbh->prepare("select empno, ename, job, sal from emp order by ename");
$sth->execute;



# Prepare a data structure for HTML::Template
my $rows;
push @{$rows}, $_ while $_ = $sth->fetchrow_hashref();

#
# initiate the template and substitute the values:

my $template = HTML::Template->new(filename => 'empnew.tmpl',
die_on_bad_params => 0, );
# call param to fill in the loop with the loop data by reference.
$template->param(ROWS => $rows);


print $CGI->header();
# print the template
print $template->output();

$dbh->disconnect();

The module CGI deals quickly and efficiently with http headers.

The module HTML::Template that is called, allows you to state the Html layout in a separate file. In our example this calls empnew.tmpl

Code:

<!-- empnew.tmpl -->
<html>
<head>
<title>Employee Listing</title>
</head>
<body>
<h1>Employees</h1>
<table border=1>
            <tr>
              <td bgcolor="#ffff00"><font COLOR="#800500"><b>Emp Number</b></font></td>
              <td bgcolor="#ffff00"><font COLOR="#800500"><b>Emp Name</b></font></td>
              <td bgcolor="#ffff00"><font COLOR="#800500"><b>Job</b></font></td>
              <td bgcolor="#ffff00"><font COLOR="#800500"><b>Salary</b></font></td>
            </tr>
 
<!-- TMPL_LOOP NAME=ROWS -->
   <tr>
      <td><!-- TMPL_VAR NAME=EMPNO --></td>
      <td><!-- TMPL_VAR NAME=ENAME --></td>
      <td><!-- TMPL_VAR NAME=JOB --></td>
      <td><!-- TMPL_VAR NAME=SAL --></td>
   </tr>
<!-- /TMPL_LOOP -->
</table>
</body>
</html>

There are basically 3 new kinds of tags to be aware of when making a .tmpl file for access by HTML::Template. These are

Code:

* VARIABLES defined by <tmp_var>
* LOOPS defined by <tmpl_loop>
* CONDITIONALS you can use <tmpl_if> <tmpl_unless> or <tmpl_else>
 
Variable tags such as <TMPL_VAR NAME=FOO> will be replaced by whatever value is assigned to parameters when calling the HTMP::Template to the parameter FOO

The other tags can be standard html formatting stuff. Here's a good introduction to using the HTML::Template

Finally, in order to access data from the Mysql database we need to use the DBI module. You can see from the Perl script that we connect to the database using parameters given (not a good idea from the security pov to have the password in this script in real life - better to bring in as a parameter from a file that can't be seen by public eyes), prepare the sql script, and then execute. For more information on starting with DBI see this tutorial

Once this is setup, it's fairly simple to change the layout by editing the tmpl file directly without needing to change the program code.

I hope this has been of interest! Good luck with your web programming!

Tags: cgi, dbi, html, perl, sql

Calculating distance with Mysql

By admin on Sep 30, 2009 | In Formula 1 - More advanced stuff, Showroom - Examples of Mysql usage | Send feedback »

After the recent posting of data in Wikileaks, it's claimed that here in the UK we can now freely access our longitude and latitude co-ordinates for each post code. However, some of the data in their files may well be deliberately corrupted - you may be infringing copyright law by using that data, and the mistakes will prove that you've used illegal data. Free the Postcode is perhaps a better place to get free data - http://www.freethepostcode.org/
Now, it's possible to calculate distances between each point using the following calculation (originally from the fine book Pro Mysql):

The distance d between two points (x1,y1) and (x2,y2) can be calculated from the following equation (so long as x values are latitude and y values are longitude in radians, r is the radius of the sphere which is 3956 in miles):

Code:

d= acos(sin(x1)sin(x2)+cos(x1)cos(x2)*cos(y2-y1)) * r

Initially I downloaded relevant postcode data file onto my Linux system. I then (by using regex in egrep) selected those post codes beginning NG and a single digit (Nottingham - I'm not interested in loading every single postcode at this stage) and use the shell command cut to extract only those fields I'm after. In this case it's a comma-delimited file so we use the -d switch to let the function cut know that commas are the delimiter.

Code:

egrep 'NG[0-9]+ [0-9]'  freepostcodesfile | cut -d , -f 1,14,15 > ng.txt

I did the same for OX postcodes so I had some data in two different towns to play around with.

Prior to inserting the postcode data I used the following script to create the table (I'll explain the last three columns a little later).

Code:

CREATE TABLE postcodes ( postcode  CHAR(8) NOT NULL
, longitude  DECIMAL(9,6) NOT NULL
, latitude   DECIMAL(9,6) NOT NULL
, rad_xaxis   DECIMAL(9,6)
, rad_yaxis   DECIMAL(9,6)
, rad_zaxis   DECIMAL(9,6)
, PRIMARY KEY(postcode));

To insert the data into my table I used :

LOAD DATA LOCAL INFILE '/home/usname/gis/ng.txt' INTO TABLE postcodes FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' (postcode, longitude, latitude);

and again for the ox file (Oxford postcodes)
LOAD DATA LOCAL INFILE '/home/usname/gis/ox.txt' INTO TABLE postcodes FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' (postcode, longitude, latitude);

After putting our data into a table we can then use the trig functions available in Mysql to calculate distances between two points. However we have to be aware of time-consuming functions, that make the selection of records from the table where the distance is below a certain figure (ie after the calculation of all relevant records) can be very cumbersome. It's better to precompute and store much of this in the database table where we're recording the postcodes. To start with we need to change the latitude and longitude figures given from degrees into radians. This is done using the following formula:

radians= degrees*(PI/180)

Luckily we can obtain PI in Mysql by using PI()
What I will do here is precompute and populate 3 extra fields for each record (in radians) and store them as rad_xaxis, rad_yaxis and rad_zaxis, as per the following definition:

xaxis = cos(radians(Lat)) * cos(radians(Lon))
yaxis = cos(radians(Lat)) * sin(radians(Lon))
zaxis = sin(radians(Lat))

So we update our calculated fields as per the following:

Code:

update postcodes set rad_xaxis
= cos((latitude * (PI() /180))) * cos((longitude * (PI() /180)));
update postcodes set rad_yaxis
= cos((latitude * (PI() /180))) * sin((longitude * (PI() /180)));
update postcodes set rad_zaxis
= sin((latitude * (PI() /180)));

The distance between points can then be calculated using SQL loosely like (acos( xaxis * $xaxis + yaxis * $yaxis + zaxis * $zaxis ) * 3956) (where those starting with a $ are precomputed for the start point in question in the same manner as above)
So to get my distance between say the post code OX1 1AB (central Oxford) and those postcodes starting 'OX4 3Y%'

Code:

select b.postcode, acos(b.rad_xaxis * a.rad_xaxis + b.rad_yaxis * a.rad_yaxis + b.rad_zaxis * a.rad_zaxis) * 3956 AS distance
from postcodes a, postcodes b
where a.postcode = 'OX1 1AB'
and b.postcode like 'OX4 3Y%'
and b.latitude > 0;

Note I don't want to calculate where the file concerned didn't have data, so add to the where clause the restriction that the latitude figure is greater than 0.
I have done some calculations using this formula (for example where the fixed post code is NG7 4ER in Nottingham), and it seems to be more accurate over greater distances - I'm not sure that you could trust it for under 5 miles. Even so it's very responsive and you can add different parts of the UK and have a distance calculator for different areas easily available. It should be remembered that as the Earth is not a perfect sphere, our calculations here will not be totally accurate, but we seem to be within 200 yards for the distance between Oxford and Nottingham (as the crow flies).

Now it's often the case that you won't want all the distances between lots of random postcodes (even within your home city). However, suppose you want to bring back all supermarkets (or takeaway shops or pubs etc) within 6 miles of your designated postcode? So long as you have the supermarkets recorded in a table along with the correct post code (make sure the format is exactly the same as the postcode table) then you can use the following statement:

Code:

select b.postcode, acos(b.rad_xaxis * a.rad_xaxis
     + b.rad_yaxis * a.rad_yaxis
     + b.rad_zaxis * a.rad_zaxis) * 3956 AS distance,
     s.name, s.address
     from postcodes a, postcodes b, shops s
     where b.postcode = s.postcode and a.postcode = 'OX1 1AB'
     and b.latitude > 0
     and s.shoptype = 'Supermarket'
and acos(b.rad_xaxis *a.rad_xaxis
+ b.rad_yaxis * a.rad_yaxis
+ b.rad_zaxis * a.rad_zaxis ) * 3956 <= 6 order by distance \G

This brought back the following records (those records I've recorded in my shops database table - in real life obviously there are others within the specified distance):

******* 1. row *******
postcode: OX2 7BY
distance: 2.66911298653386
name: Midcounties Co-Op
address: 228-240 Banbury Road, Oxford
******* 2. row *******
postcode: OX4 3XQ
distance: 3.50779225250805
name: Midcounties Co-Op
address: 21 Templars Square, Cowley
******* 3. row *******
postcode: OX3 8RA
distance: 3.88607301210902
name: Midcounties Co-Op
address: Atkyns Road, Headington
******* 4. row *******
postcode: OX4 6XJ
distance: 4.67029242514653
name: Tesco
address: Oxford Retail Park, Cowley
4 rows in set (0.01 sec)

How good is that? I'll have to round the figures though - it doesn't make any sense to calculate to 15 decimal points! Hope you get as much enjoyment from using this technique as I do. Until next time - Enjoy your Mysql coding!

Tags: gis, load data, postcodes, promysql, regexp

Project Euler Q3 with Mysql

By admin on Sep 7, 2009 | In Formula 1 - More advanced stuff, Showroom - Examples of Mysql usage | Send feedback »

Here is the question set for us this time from The Euler Project

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143

In Mysql you can calulate prime numbers using a "numbers" table eg:

Code:

select * from t500 a where a.id != 1
and not exists
(select * from t500 b where b.id !=1  
and b.id < a.id and mod(a.id,b.id) = 0)

But this is probably not efficient or realistic for larger numbers (and we are probably looking at big numbers in this exercise).
Another way to approach this is to work out the smallest divisor of the original number starting from 2, divide out that number, and again (starting from 2) find the next smallest divisor (ie bigger than or equal to the previous result) of that new number. Carry on doing this until we can go no further and we should end up with the largest prime factor (as for instance a factor of 8 would be broken down by 2 twice). I hope the following Mysql code makes sense; we increment the variable var1 by 1 each time and test (via the mod function) whether it's a factor of the variable bignuma. If it is, we divide out var1 to get a smaller bignuma, and start incrementing var1 again from 2.

Code:

set @var1=2;
set @varreset=2;
set @bignuma= 600851475143;
 
*** a1factor displays prime factors in bignuma ***
select a1factor from
(select  
case when mod(@bignuma,@var1) = 0
   then @var1:=@varreset
   else @var1:=@var1+1  end as a1factor,
case when mod(@bignuma,@var1) = 0
   then @bignuma:=@bignuma/@var1
   end as new_factor
from orders where @var1<@bignuma ) tmp
where new_factor is not null
order by a1factor desc;
 
# This produces the values: 6857, 1471, 839, 71

In my three blog entries on iterating through numbers I've used a table that I've already created and populated in my database (in this case it's called orders). Although it's relatively straightforward to create a table yourself with lots of rows, there are system tables you can use for this purpose. I recently found this on the web - openark blog mysql - a very good read. If they are installed you can use rows from mysql.help_topic, otherwise the author suggests you can use tables from the data dictionary INFORMATION_SCHEMA instead like COLLATIONS or COLUMNS. You do need to check first that there are enough rows to suit your requirement though. As this method can also be slow I would recommend that you create a table with the required number of rows for iteration purposes where possible. Here's one way you can populate a table with a range of numbers in Mysql ( taken from a question on Stack overflow by a user known as PittsburghDBA). This generates 1000 rows (from 0 to 999) using the combination of using a cross join method with the UNION keyword, but you can always adapt it to suit yourself.

Code:

INSERT INTO
    myTable
    (
    nr
    )
SELECT
    SEQ.SeqValue
FROM
(
SELECT
    (HUNDREDS.SeqValue + TENS.SeqValue + ONES.SeqValue) SeqValue
FROM
    (
    SELECT 0  SeqValue
    UNION ALL
    SELECT 1 SeqValue
    UNION ALL
    SELECT 2 SeqValue
    UNION ALL
    SELECT 3 SeqValue
    UNION ALL
    SELECT 4 SeqValue
    UNION ALL
    SELECT 5 SeqValue
    UNION ALL
    SELECT 6 SeqValue
    UNION ALL
    SELECT 7 SeqValue
    UNION ALL
    SELECT 8 SeqValue
    UNION ALL
    SELECT 9 SeqValue
    ) ONES
CROSS JOIN
    (
    SELECT 0 SeqValue
    UNION ALL
    SELECT 10 SeqValue
    UNION ALL
    SELECT 20 SeqValue
    UNION ALL
    SELECT 30 SeqValue
    UNION ALL
    SELECT 40 SeqValue
    UNION ALL
    SELECT 50 SeqValue
    UNION ALL
    SELECT 60 SeqValue
    UNION ALL
    SELECT 70 SeqValue
    UNION ALL
    SELECT 80 SeqValue
    UNION ALL
    SELECT 90 SeqValue
    ) TENS
CROSS JOIN
    (
    SELECT 0 SeqValue
    UNION ALL
    SELECT 100 SeqValue
    UNION ALL
    SELECT 200 SeqValue
    UNION ALL
    SELECT 300 SeqValue
    UNION ALL
    SELECT 400 SeqValue
    UNION ALL
    SELECT 500 SeqValue
    UNION ALL
    SELECT 600 SeqValue
    UNION ALL
    SELECT 700 SeqValue
    UNION ALL
    SELECT 800 SeqValue
    UNION ALL
    SELECT 900 SeqValue
    ) HUNDREDS
) SEQ


I hope in the last 3 blog entries I've given some idea on how to approach solving the problems in the Euler Project with Mysql. Although I'll continue looking at the problems myself, I won't be publishing most of the solutions here, so it's up to you if you continue or not (with or without Mysql). Good luck!

Tags: data dictionary, project euler, union, variables
1 2 3 4 5 6 7 8 9 >>
  • Some of the best links you'll use

    Gentle Intro to sql
    Mysql Cheat Sheet
    Database tutorials
    MySql Magazine
    Mysql 5.0 Reference Manual
    MySQL dba Feed Resource
    My Tech blog
    dba stuff on users/privileges
  • Top MySql blogs

    Planet Mysql
    Jay Pipes
    Performance blog

    My other sites
    My home site
    My Blogger blog
  • Tag cloud

    apache case command line crossjoin data cleansing data dictionary dates dba dbi duplicates excel gis group by indexes innodb linux load data looping mod optimisation optimization oracle perl php project euler promysql rand regexp rownum sql sql cookbook subquery union all variables

powered by free blog software

©2010 by admin | Contact | evoCamp skin | Credits: Blog Design | blog software | UK hosts | monetize