Tags: average
Using Mysql to calculate average (mean, mode or median)
By admin on Sep 28, 2010 | In Daily Commute - Standard hints/tips, Showroom - Examples of Mysql usage | Send feedback »
To calculate the mean (or average) we can utilise the AVG function within Mysql.
Code:
select avg(sal) as mean_sal | |
from emp; |
mean_sal
========
2073.9286
Remember though that the function will not work on a NULL field. If you want to include NULL values (eg assume zero value) use COALESCE to default the NULL to zero.
Code:
select avg(coalesce(sal,0)) as mean_sal | |
from emp; |
We can now produce the mean figure per grouped column, for example by department
Code:
select deptno, avg(coalesce(sal,0)) as mean_sal | |
from emp | |
group by deptno; |
deptno mean_sal
====== ========
10 2916.6667
20 2175.0000
30 1568.3333
The mode is the value that occurs most frequently in a given set of data.
If I want to find out from the PRODUCT table, what is the mode of the FULLPRICE field for a certain productgroup (in this case 'CA'), I can use a subquery to find the mode.
The subquery will calculate the various counts for each Fullprice, while the main part of the query will choose the price that occurs the most.
Code:
SELECT FullPrice FROM product | |
WHERE productgroupcode = 'CA' | |
GROUP BY FullPrice | |
HAVING count(*) >= all (SELECT count(*) | |
FROM product | |
WHERE productgroupcode = 'CA' | |
GROUP BY FullPrice) |
FullPrice
=========
14
There is another common requirement in finding an average, which is called the median.
The median is the value of the middle item in a set of ordered elements.
To calculate the median we can use a self join (join the table to itself), and then use the HAVING clause to filter those rows where the number of
times (P.FULLPRICE = Q.FULLPRICE) is greater than or equal to the number of times (Q.FULLPRICE > P.FULLPRICE) :
Code:
SELECT avg(FullPrice) | |
FROM ( select q.FullPrice | |
FROM product q, product p | |
WHERE p.productgroupcode = q.productgroupcode | |
AND q.productgroupcode = 'CA' | |
GROUP BY q.FullPrice | |
HAVING sum(case when q.FullPrice = p.FullPrice then 1 else 0 end) | |
>= abs(sum(sign(q.FullPrice - p.FullPrice))) | |
) t |
avg(FullPrice)
=========
14.0000
We can show the individual workings with this code that moves the SUM calculations into the SELECT list:
Code:
select q.FullPrice, p.FullPrice, | |
sum(case when q.FullPrice = p.FullPrice then 1 else 0 end) as cnt1, | |
abs(sum(sign(q.FullPrice - p.FullPrice))) as cnt2 | |
FROM product q, product p | |
WHERE p.productgroupcode = q.productgroupcode | |
AND q.productgroupcode = 'CA' | |
group by q.FullPrice |
FullPrice --- cnt1 --- cnt2
10 --- 25 --- 130
11 --- 4 --- 38
12 --- 1 --- 16
14 --- 64 --- 56
15 --- 1 --- 2
20 --- 1 --- 4
24 --- 9 --- 24
26 --- 1 --- 12
29 --- 9 --- 48
39 --- 25 --- 120
630 --- 1 --- 30
Finally, here I will list the records from the PRODUCT table so you can see the whole set we worked upon.
Code:
SELECT PRODUCTID, FULLPRICE | |
FROM product | |
WHERE PRODUCTGROUPCODE = 'CA' | |
ORDER BY FULLPRICE |
PRODUCTID --- FULLPRICE
12175 ---- 10
12014 ---- 10
13297 ---- 10
10002 ---- 10
10003 ---- 10
10838 ---- 11
13109 ---- 11
13107 ---- 12
13299 ---- 14
12177 ---- 14
14030 ---- 14
13621 ---- 14
12016 ---- 14
11003 ---- 14
11057 ---- 14
11056 ---- 14
10001 ---- 15
12949 ---- 20
13360 ---- 24
13470 ---- 24
10955 ---- 24
13359 ---- 26
13298 ---- 29
12015 ---- 29
12176 ---- 29
13024 ---- 39
10863 ---- 39
13023 ---- 39
10862 ---- 39
10993 ---- 39
12141 ---- 630