[ACCEPTED] Calculating percentile rank in MySQL
Here's a different approach that doesn't require a join. In my case (a table with 15,000+ rows), it runs in about 3 seconds. (The JOIN method takes an order of magnitude longer.)
In the sample, assume that measure is the column on which you're calculating the percent rank, and id is just a row identifier (not required):
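SELECT
    id,
    @prev := @curr AS prev,
    @curr := measure AS curr,
    -- when the value drops, advance the rank by the size of the previous tie group
    -- (`rank` is quoted because RANK is a reserved word in MySQL 8)
    @rank := IF(@prev > @curr, @rank + @ties, @rank) AS `rank`,
    -- count consecutive rows that share the same value
    @ties := IF(@prev = @curr, @ties + 1, 1) AS ties,
    (1 - @rank / @total) AS percentrank
FROM
    mytable,
    (SELECT
        @curr := NULL,
        @prev := NULL,
        @rank := 0,
        @ties := 1,
        @total := COUNT(*) FROM mytable WHERE measure IS NOT NULL
    ) b
WHERE
    measure IS NOT NULL
ORDER BY
    measure DESC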
Credit for this method goes to Shlomi Noach. He writes about it in detail here:
http://code.openark.org/blog/mysql/sql-ranking-without-self-join
I've tested this in MySQL and it works great; no idea about Oracle, SQL Server, etc.
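To see what the user variables are doing, here is a minimal Python sketch (not MySQL) of the same single pass. The function name `percent_ranks` and the sample values are made up for illustration; it assumes the rows are processed in descending order of measure, as the ORDER BY in the query arranges.

```python
def percent_ranks(measures):
    """Mirror the @prev/@curr/@rank/@ties pass over rows sorted descending."""
    # Same filter as "WHERE measure IS NOT NULL"
    values = sorted((m for m in measures if m is not None), reverse=True)
    total = len(values)
    prev = curr = None
    rank, ties = 0, 1
    out = []
    for m in values:
        prev, curr = curr, m
        # A drop in value moves the rank past the whole previous tie group
        if prev is not None and prev > curr:
            rank += ties
        # Count how many consecutive rows share the current value
        ties = ties + 1 if prev == curr else 1
        out.append((m, 1 - rank / total))
    return out


if __name__ == "__main__":
    for value, pr in percent_ranks([50, 40, 40, 10, None]):
        print(value, pr)
```

Tied values share one percent rank, and the highest value always gets 1.0.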
SELECT
    c.id, c.score, ROUND(((@rank - c.`rank`) / @rank) * 100, 2) AS percentile_rank
FROM
    (SELECT
        *,
        @prev := @curr,
        @curr := a.score,
        -- dense rank: only advance when the score changes
        -- (`rank` is quoted because RANK is a reserved word in MySQL 8)
        @rank := IF(@prev = @curr, @rank, @rank + 1) AS `rank`
    FROM
        (SELECT id, score FROM mytable) AS a,
        (SELECT @curr := NULL, @prev := NULL, @rank := 0) AS b
    ORDER BY score DESC) AS c;
There is no easy way to do this. See http://rpbouman.blogspot.com/2008/07/calculating-nth-percentile-in-mysql.html
This is a relatively ugly answer, and I feel guilty saying it. That said, it might help you with your issue.
One way to determine the percentage would be to count all of the rows, and count the number of rows that are greater than the number you provided. You can calculate either greater or less than and take the inverse as necessary.
Create an index on your number. total = select count(*); less_equal = select count(*) where value > indexed_number;
The percentage would be something like: less_equal / total or (total - less_equal) / total
Make sure that both of them are using the index that you created. If they are not, tweak them until they are. The EXPLAIN output should have "Using index" in the right-hand column. In the case of the select count(*) it should be using the index for InnoDB and something like const for MyISAM. MyISAM will know this value at any time without having to calculate it.
If you needed to have the percentage stored in the database, you can use the setup from above for performance and then calculate the value for each row by using the second query as an inner select. The first query's value can be set as a constant.
Does this help?
Jacob
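As a rough illustration of the counting idea, here is a small Python sketch (not MySQL). A sorted list stands in for the index, and `share_at_or_below` is a hypothetical helper name; the binary search plays the role of the indexed count.

```python
from bisect import bisect_right


def share_at_or_below(sorted_values, x):
    """Fraction of values <= x; sorted_values must be sorted ascending."""
    total = len(sorted_values)              # like SELECT COUNT(*)
    if total == 0:
        raise ValueError("no values to compare against")
    less_equal = bisect_right(sorted_values, x)  # like COUNT(*) WHERE value <= x
    return less_equal / total


if __name__ == "__main__":
    values = [10, 20, 20, 30, 40]
    print(share_at_or_below(values, 20))
```

The share of rows above x is then `1 - share_at_or_below(values, x)`, which is the "take the inverse" step.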
If you're combining your SQL with a procedural language like PHP, you can do the following. This example breaks down excess block times for flights into an airport by percentile. It uses the LIMIT x,y clause in MySQL in combination with ORDER BY. Not very pretty, but it does the job (sorry, I struggled with the formatting):
// $con is assumed to be an open mysqli connection
$startDt = "2011-01-01";
$endDt = "2011-02-28";
$arrPort = 'JFK';
// Use the same filter in both queries so the offsets match the counted rows
$where = "depdt >= '$startDt' AND depdt <= '$endDt' AND ArrPort = '$arrPort'";

$strSQL = "SELECT COUNT(*) AS TotFlights FROM FIDS WHERE $where";
if (!($queryResult = mysqli_query($con, $strSQL))) {
    echo $strSQL . " FAILED\n";
    echo mysqli_error($con);
    exit(1);
}

$totFlights = 0;
while ($fltRow = mysqli_fetch_array($queryResult)) {
    echo "Total Flights into " . $arrPort . " = " . $fltRow['TotFlights'];
    $totFlights = (int) $fltRow['TotFlights'];
    /* 1906 flights. Percentile 90 = int(0.9 * 1906). */
    for ($x = 1; $x <= 10; $x++) {
        // Offset from the top of the descending list for the ($x * 10)th percentile
        $pctlPosn = $totFlights - intval(($x / 10) * $totFlights);
        echo "PCTL POSN for " . ($x * 10) . " IS " . $pctlPosn . "\t";
        $pctlSQL = "SELECT (ablk - sblk) AS ExcessBlk FROM FIDS WHERE $where ORDER BY ExcessBlk DESC LIMIT " . $pctlPosn . ",1;";
        if (!($query2Result = mysqli_query($con, $pctlSQL))) {
            echo $pctlSQL . " FAILED\n";
            echo mysqli_error($con);
            exit(1);
        }
        while ($pctlRow = mysqli_fetch_array($query2Result)) {
            echo "Excess Block is :" . $pctlRow['ExcessBlk'] . "\n";
        }
    }
}
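The offset arithmetic is easier to check in isolation. Below is a minimal Python sketch (not PHP or MySQL) of the same idea: sort descending, then skip `total - floor(pct * total / 100)` rows, as `ORDER BY ... DESC LIMIT offset, 1` does. The function name `percentile_by_offset` and the sample data are assumptions for illustration, and integer arithmetic is used here instead of `intval()` on a float.

```python
def percentile_by_offset(values, pct):
    """Value found at the pct-th percentile by skipping rows from the top."""
    ordered = sorted(values, reverse=True)   # ORDER BY ... DESC
    total = len(ordered)
    offset = total - (pct * total) // 100    # rows to skip, like LIMIT offset, 1
    if offset >= total:
        return None                          # LIMIT past the end returns no row
    return ordered[offset]


if __name__ == "__main__":
    data = list(range(1, 11))
    for pct in range(10, 101, 10):
        print(pct, percentile_by_offset(data, pct))
```

With very few rows the low percentiles can fall past the end of the list, in which case the query simply returns nothing.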
MySQL 8 finally introduced window functions, and among them, the PERCENT_RANK() function you were looking for. So, just write:
SELECT col, percent_rank() OVER (ORDER BY col)
FROM t
ORDER BY col
Your question mentions "percentiles", which are a slightly different thing. For completeness' sake, there are PERCENTILE_DISC and PERCENTILE_CONT inverse distribution functions in the SQL standard and in some RDBMS (Oracle, PostgreSQL, SQL Server, Teradata), but not in MySQL. With MySQL 8 and window functions, you can emulate PERCENTILE_DISC, however, again using the PERCENT_RANK and FIRST_VALUE window functions.
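For reference, PERCENT_RANK() is defined as (rank - 1) / (rows - 1), where rank is the ordinary RANK() of the row. A small Python sketch (not MySQL) of that formula, with a made-up function name and sample data, can help when checking query output by hand:

```python
from bisect import bisect_left


def percent_rank(values):
    """(rank - 1) / (rows - 1) for each value, in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for v in ordered:
        # RANK(): 1 + number of rows that sort strictly before v
        rank = bisect_left(ordered, v) + 1
        # A single-row partition is defined to have percent rank 0
        pr = 0.0 if n == 1 else (rank - 1) / (n - 1)
        result.append((v, pr))
    return result


if __name__ == "__main__":
    for value, pr in percent_rank([10, 20, 20, 30, 40]):
        print(value, pr)
```

Note that the lowest value gets 0 and the highest gets 1, with ties sharing the lower rank.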
To get the rank, I'd say you need to (left) outer join the table on itself, something like:
-- COUNT skips the NULLs produced by the left join, so the lowest value gets rank 0
select t1.name, t1.value, count(distinct t2.value)
from mytable t1
left join mytable t2
on t1.value > t2.value
group by t1.name, t1.value
For each row, you will count how many (if any) rows of the same table have an inferior value.
Note that I'm more familiar with SQL Server so the syntax might not be right. Also the distinct may not have the right behaviour for what you want to achieve. But that's the general idea.
Then to get the real percentile rank you will need to first get the number of values in a variable (or distinct values depending on the convention you want to take) and compute the percentile rank using the real rank given above.
Suppose we have a sales table like:
user_id, units
then the following query will give the percentile of each user:
select a.user_id, a.units,
(sum(case when a.units >= b.units then 1 else 0 end) * 100) / count(1) percentile
from sales a cross join sales b
group by a.user_id, a.units;
Note that this performs a cross join, which results in O(n²) complexity, so it can be considered an unoptimized solution. It is simple, though, given that we do not have a built-in function for this in our MySQL version.
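The logic of the cross join can be checked on a handful of rows with a short Python sketch (not MySQL). The function name `percentiles` and the sample data are invented for illustration; the nested loop is the same O(n²) comparison of every row against every other row.

```python
def percentiles(sales):
    """sales maps user_id -> units; returns user_id -> percentile (0-100)."""
    n = len(sales)
    result = {}
    for user_id, units in sales.items():
        # Count rows whose units are at or below this user's units
        at_or_below = sum(1 for other in sales.values() if units >= other)
        result[user_id] = at_or_below * 100 / n
    return result


if __name__ == "__main__":
    print(percentiles({"a": 10, "b": 20, "c": 20, "d": 40}))
```

Each row is compared with itself as well, so even the lowest seller gets a percentile above 0.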