MySQL Manual | 6.3.7 Функции, используемые в операторах GROUP BY

Previous / Next / Up / Table of Contents

6.3.7 Функции, используемые в операторах `GROUP BY`

Вызов групповых функций для SQL-команд, не содержащих GROUP BY, эквивалентен выполнению этих функций над всем набором возвращаемых данных.

COUNT(expr)

Возвращает количество величин со значением, не равным NULL, в строках, полученных при помощи команды SELECT:

mysql> SELECT student.student_name,COUNT(*) FROM student,course
        WHERE student.student_id=course.student_id
        GROUP BY student_name;

Функция COUNT(*) несколько отличается от описанной выше: она возвращает количество извлеченных строк, содержащих величины со значением NULL. COUNT(*) оптимизирована для очень быстрого возврата результата при условии, что команда SELECT извлекает данные из одной таблицы, никакие другие столбцы не обрабатываются и функция не содержит выражения WHERE. Например:

mysql> SELECT COUNT(*) FROM student;

COUNT(DISTINCT expr,[expr...])

Возвращает количество различающихся величин со значением, не равным NULL:

mysql> SELECT COUNT(DISTINCT results) FROM student;

В MySQL для того, чтобы получить количество различающихся комбинаций выражений, не содержащих NULL, нужно просто задать список этих выражений. В ANSI SQL необходимо провести конкатенацию всех выражений внутри CODE(DISTINCT ...).

AVG(expr)

Возвращает среднее значение аргумента expr:

mysql> SELECT student_name, AVG(test_score)
        FROM student
        GROUP BY student_name;

MIN(expr)

MAX(expr)

Возвращает минимальную или максимальную величину аргумента expr. Функции MIN() и MAX() могут принимать строковый аргумент; в таких случаях они возвращают минимальную или максимальную строковую величину. See section 5.4.3 Использование индексов в MySQL.

mysql> SELECT student_name, MIN(test_score), MAX(test_score)
        FROM student
        GROUP BY student_name;

SUM(expr)

Возвращает сумму величин в аргументе expr. Обратите внимание: если возвращаемый набор данных не содержит ни одной строки, то функция возвращает NULL!

STD(expr)

STDDEV(expr)

Возвращает среднеквадратичное отклонение значения в аргументе expr. Эта функция является расширением ANSI SQL. Форма STDDEV() обеспечивает совместимость с Oracle.

BIT_OR(expr)

Возвращает побитовое ИЛИ для всех битов в expr. Вычисление производится с 64-битовой (BIGINT) точностью.

BIT_AND(expr)

Возвращает побитовое И для всех битов в expr. Вычисление производится с 64-битовой (BIGINT) точностью.

В MySQL расширены возможности использования оператора GROUP BY. Теперь в выражениях SELECT можно использовать столбцы или вычисления, которые не присутствуют в части GROUP BY. Это справедливо для любой возможной величины для этой группы. Данная возможность позволяет повысить производительность за счет исключения сортировки и группирования ненужных величин. Например, в следующем запросе нет необходимости в группировке customer.name:

mysql> SELECT order.custid,customer.name,MAX(payments)
        FROM order,customer
        WHERE order.custid = customer.custid
        GROUP BY order.custid;

В ANSI SQL к предложению GROUP BY необходимо добавлять customer.name. В MySQL, если работа происходит не в режиме ANSI, это имя избыточно.

Не используйте данное свойство, если столбцы, пропущенные в части GROUP BY, не являются уникальными в данной группе! Возможны непредсказуемые результаты.

В некоторых случаях можно применять функции MIN() и MAX() для получения указанной величины столбца, даже если он не является уникальным. В следующем примере выдается значение столбца column из строки, содержащей наименьшую величину в столбце sort:

SUBSTR(MIN(CONCAT(RPAD(sort,6,' '),column)),7)

See section 3.5.4 Строка, содержащая максимальное значение некоторого столбца.

Следует отметить, что в версии MySQL 3.22 (или более ранней) либо при попытке работы в рамках ANSI SQL применение выражений в предложениях GROUP BY или ORDER BY невозможно. Это ограничение можно обойти, используя для выражения псевдоним:

mysql> SELECT id,FLOOR(value/100) AS val FROM tbl_name
        GROUP BY id,val ORDER BY val;

В версии MySQL 3.23 можно также выполнить следующее:

mysql> SELECT id,FLOOR(value/100) FROM tbl_name ORDER BY RAND();

User Comments

Posted by on Thursday June 20 2002, @6:28pm

[Delete] [Edit]

For medians in almost any SQL:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)
+1)/2

assumes (COUNT(*)+1)/2 returns an int, if not
then use INT((COUNT(*)+1)/2)
There are variations for the varied defs of
median too.

See Rozenshtein's book on Optimizing T-SQL
Advanced Programming Techniques. Search Amazon.
Learn how to use encoded characteristic functions
(delta[]), then you can do all sorts of fun stuff
quickly - for instance you could pivot a table
into summary columns of totals where each column
counted different things in one table pass.
I have utilized this technique with great success
on large tables: a multi million row table using
the existing summary code took minutes while this
SQL trimmed that down to less than 3 seconds!
Once you get used to Delta functions, the code is
very readable too!

Posted by on Friday May 17 2002, @6:24am

[Delete] [Edit]

Will MySQL some time be able to tell what
timezone it is running on? For instance CET,
CEST, GMT etc.? Wouldn't that be a helpful
function at various occasions?

Posted by on Wednesday December 18 2002, @5:27pm

[Delete] [Edit]

Another GROUP functions NEEDED!!! There's no way
to pick some value from grouped column, accordign
to another column. E.g., the newest value, acc.
to 'mydate' column, what is INSERTed as NOW().
Somebody should develop something like
SELECT user, DECICE(val,when = MAX(when)) AS
newest_val GROUP BY user;
DECIDE would be a group-usable fuction, what
would select the value from the column set in the
first arg from the row where the expression in
the 2nd arg is true. Who agrees, make the | :)
|

Posted by on Friday May 17 2002, @6:24am

[Delete] [Edit]

I agree with Ondra, the standard GROUP BY
functions do not allow a selection of one field
based on the value of a second field : SELECT
index_num FROM table GROUP BY (value WHERE
expression_is_true_for_row) WHERE expression.
A more advanced selection of defined rows.

Posted by [name withheld] on Friday May 17 2002, @6:24am

[Delete] [Edit]

Listen up kids, 'cause this doesn't seem to be anywhere in the documentation and it's really
important to know if you're planning on matching plaintext strings against password()
encrypted fields in your tables:
Apparently password() creates a garbled alpha-numeric string that is ALWAYS 16
CHARACTERS LONG. If your field is less than 16 characters long, the string will get
truncated and you'll be out in the wind when the time comes to compare your plaintext
candidates.
I can't tell you how long it took me to figure this out, so I hope this note saves someone
else the agrivation I went through to end up writing it. Additionally, if someone from mySQL
is reading this, you might want to add this fact to the manual officially. It seems to me like a
pretty vital scrap of information to omit.

Posted by on Friday May 17 2002, @6:24am

[Delete] [Edit]

A nice feature would be :
SELECT ARRAY(catid) FROM categories;

This way I could get all catid's from categories
as an array.

Posted by on Friday May 17 2002, @6:24am

[Delete] [Edit]

For what it's worth, the SUM() function does not
work on TIME values directly, although it appears
to work at first glance. To add a bunch of TIME
values (intervals, presumably), you need to do
something like this:

SEC_TO_TIME(SUM(TIME_TO_SEC(events.length)))

This will return another TIME value, suitable for
TIME_FORMAT() or whatever else.

Posted by purge on Friday May 17 2002, @6:24am

[Delete] [Edit]

does calc_found_rows work for anyone? When i
select found_rows() the result i get is always
the count of the last query ran, and the offset
is NOT ignored. So if offset=0 and max=5 the
result i get is 5, where it should be a larger
number?

Posted by on Friday May 17 2002, @6:24am

[Delete] [Edit]

A function for doing Base64 encoding would be a
good thing to have. This can be used with
encryption, but also, binary files can be
converted to a convenient universal character
format and stored neatly as text.

Posted by on Friday May 17 2002, @6:24am

[Delete] [Edit]

During the development on one of our systems, I
stumbled over some weird behavior while using
the FORMAT function (version 3.23.45).

FORMAT('nan',2) makes the server restart/crash,
while FORMAT('dontcrash',2) doesn't.

Yes, i know this function is for numbers, but it
really shouldn't make the server restart/crash
anyway ;)

Posted by Mike McFadden on Friday May 17 2002, @6:24am

[Delete] [Edit]

Is there a way to get the values of a group clause
instead of count() or max() or ave() into a
column? A comma delimited list of values would be
nice.
The best solution I have now is to replace the
"GROUP BY

" with a "ORDER BY " and have
my perl driver script to the grouping on the
non-distinct list.

Or, if you want to be REALLY slow about it, do 1
query for the first list, then do a second query
for every result in the first list (ARGH!)

There has to be a better way. Or maybe I just
need to read more.

Posted by Peter MЬrch on Friday May 17 2002, @6:24am

[Delete] [Edit]

I'm with Mike McFadden. E.g. make CONCAT_WS work
with GROUP BY...

Posted by Christian Sonntag on Thursday July 4 2002, @12:44pm

[Delete] [Edit]

I would like to see a GROUP BY function 'ANY',
which just returns any of the values and optimises
the query instead. This could be quite useful.

Posted by Adam Hooper on Friday May 17 2002, @6:24am

[Delete] [Edit]

I haven't tested this much, but I can say for
sure that in at least one single-table query
using SQL_CALC_FOUND_ROWS/FOUND_ROWS() is MUCH
slower than running the query with a LIMIT and
then running it again with a COUNT(*). This
particular query is run from PHP and returns
16720 rows with a LIMIT of 25. Running it with
two queries (i.e., one with LIMIT and the other
with COUNT(*)) gives a query time of around
450ms, while running it with SQL_CALC_FOUND_ROWS
(LIMIT 25)/FOUND_ROWS() takes slightly over
600ms. Pentium II 400, 512MB, InnoDB....

Posted by on Friday May 17 2002, @6:24am

[Delete] [Edit]

count(1) and count(*) provide equal results, but
1 is easier to evaluate than *, so you should
try to use it instead.

Posted by Aubrey McAuley on Thursday December 19 2002, @3:22pm

[Delete] [Edit]

I agree that the limitations of GROUP BY when
used along with ORDER BY are very frustrating.
They've caused me quite a lot of grief. I think
the standard way to get around this limitation
is with a sub-select, which MySQL doesn't
support. However, there is a way around this
problem that's already documented:

http://www.mysql.com/doc/en/example-Maximum-column-group-row.html

I don't know why it's not mentioned here. Even
though it's not a simple solution, it does work.
If you sort things the way you need them while
populating the temporary table (without GROUP
BY), then subsequent queries that join the
temporary table will provide suitable results.
Thus, the GROUP BY can in a round-about way be
used with ORDER BY.

Posted by Dan on Thursday July 11 2002, @6:13pm

[Delete] [Edit]

i'm rapidly learning a lot of little tips and
tricks for getting information out of mysql. One
of the things I *defintely* think would be good
to include is the ability to extract and average
of a set of "sum()" values. E.g. If I have a
logging system that logs the number of hits to
various parts of my website per day, then:

select date, sum(hits) from stats group by date;

would give me the total number of hits for every
day. what i'd *like* to be able to do then is say:

select avg(sum(hits)) from stats group by date;

to give me the daily average hits to my site.

Posted by Garrett Casey on Friday May 17 2002, @6:24am

[Delete] [Edit]

Your MySQL connection id is 604530 to server
version: 3.22.32

mysql> select COUNT(DISTINCT results) from
student;
ERROR 1064: You have an error in your SQL syntax
near 'DISTINCT results) from student' at line 1

Posted by Matthew Waygood on Friday May 17 2002, @6:24am

[Delete] [Edit]

Disappointed with the GROUP BY taking precedence
over the ORDER BY. Would be much easier to see a
SELECT FIRST, or allowing Order By to be placed
ahead of GROUP BY

Posted by Matthew Waygood on Friday May 17 2002, @6:24am

[Delete] [Edit]

And for all of you unsure why. A table containing
a series of Events each with a unique ID,
Start_Date, Event_Type, Description. To select
the next event for each of the different types of
event, you need to :-
SELECT *
FROM table
ORDER BY start_date
GROUP BY event_type

BUT since GROUP BY is needed before ORDER BY then
the result the query actually gives you a list of
the first occurence of each event in the table,
reguardless of when they start.
The only work around I can see is to order the
table in one query and then group those results
in another query???, or run a query for each
event_type.

Posted by Filip Wolak on Friday May 17 2002, @6:24am

[Delete] [Edit]

nice feature would be something that will work as
SUM(), but on text results. something like join
(',', $array) in PHP. The result will be a joined
set, with and delimitter of text results.
i tried CONCAT and CONCAT_WS, but does not work
the way i want. or maybe i do something wrong

Posted by sylvain burgert on Friday May 17 2002, @6:24am

[Delete] [Edit]

About Matthew Waygood 's question : you can take
in your SELECT clause :
MAX(CONCAT(field1,"-",field2)) AS field3
if you want field2's value for field1's maximum...
Maybee you'll have to format field1 to have the
CONCAT sort in the right order.
Then you "explode" the value of field3 to get
field2's value.

Posted by on Wednesday December 18 2002, @5:28pm

[Delete] [Edit]

NULL handling seems to be a bit iffy in the
HAVING clause, which you might compensate for by
using IFNULL:
Suppose you have a Users table with column
(of unique) Id and a Logins table with Id,
LastUse colums such that Id's in Logins may be
repeated or might not appear at all. Then if you
wanted to know only about users not active in the
last three weeks you might think to write:
SELECT Users.Id, UNIX_TIMESTAMP()-UNIX_TIMESTAMP
(MAX(LastUse)) AS Diff FROM Users LEFT JOIN
Logins USING (Id) GROUP BY Users.Id HAVING
(Diff IS NULL) OR (Diff>504*3600);
which will give you nothing. Solve
the problem by using IFNULL in either the SELECT
or HAVING clause:
SELECT Users.Id, IFNULL(UNIX_TIMESTAMP()-
UNIX_TIMESTAMP(MAX(LastUse)),0) AS Diff FROM
Users LEFT JOIN Logins USING (Id) GROUP BY
Users.Id HAVING (Diff=0 OR Diff>504*3600);

Posted by Alan McKay on Friday June 14 2002, @9:45am

[Delete] [Edit]

I see there are functions for Min, Max, Average
and Standard Deviation, but what about Median? Is
there some way to do that?

Posted by Phytia King on Thursday August 15 2002, @6:04pm

[Delete] [Edit]

is there a command for restricting the number of
results of your select statement?

Posted by Phytia King on Thursday August 15 2002, @6:04pm

[Delete] [Edit]

is there a command for restricting the number of
results of your select statement?

Posted by Rich Brome on Sunday August 18 2002, @10:57am

[Delete] [Edit]

I agree with others that some kind of CONCAT()
functionality with GROUP BY would be extremely
useful. For example, say I have a table of people,
and table of medical conditions they might have. It's
a many-to-many relationship, and so there's a table
in-between that links people to a small set of
medical conditions. If I want a list of people, grouped
by their combinations of conditions, how would I do
that? For example, people who just have bad
hearing and migraines (and no other conditions)
should be grouped together. A CONCAT() function
with GROUP BY would let me group by a result
column containing values like "Bad Hearing /
Migraines". Then my application could easily display a
list of combinations of mdeical conditions, with
people grouped under each combination.

Posted by on Wednesday August 21 2002, @10:00pm

[Delete] [Edit]

Phytia: Use LIMIT 50 to limit the number of rows to
the first 50 returned. If you want to start at some
other position than 1, then use LIMIT X, Y to return
Y rows from the Xth row.

e.g. LIMIT 2,2 gives rows 2 and 3 of the full result
set.

NB LIMIT 50 is the same as LIMIT 1,50

Posted by [name withheld] on Tuesday September 10 2002, @8:04am

[Delete] [Edit]

This Query works fine on the console, but via ODBC
it returns no rows. SELECT UserID, Count(*) AS
Expr1 FROM messageboard where
messageboard.UserID<>'Admin' GROUP BY
messageboard.UserID ORDER BY Expr1 DESC LIMIT
40;

Posted by [name withheld] on Wednesday December 18 2002, @5:27pm

[Delete] [Edit]

i have one table with 4 fields
(nik,name,child_name,child_number)
i wan to query that the result will be like this
nik name child1 child2 child3 child4
-------------------------------------
1 pai mai main mian naim
2 pay mia mau

in oracle i can do this query
select nik,name,
max(decode(child_number,1,child_name))
child1,
max(decode(child_number,2,child_name))
child2,
max(decode(child_number,3,child_name))
child3,
max(decode(child_number,4,child_name))
child4,
from test
group by nik,name

and its work

now i want to try in MYSQL
but not work
i use query like this

SELECT nik, name,
max(case when child_number='1' THEN child_name
end) child1,
max(case when child_number='2' THEN child_name
end) child2,
max(case when child_number='3' THEN child_name
end) child3,
max(case when child_number='4' THEN child_name
end) child4
FROM test group by nik,name

but the result is
+------+-------+--------+--------+--------+--------
+
| nik | name | child1 | child2 | child3 | child4 |
+------+-------+--------+--------+--------+--------
+
| 1 | pai | mai | NULL | NULL | NULL |
| 2 | mai | NULL | NULL | NULL | NULL
|
+------+-------+--------+--------+--------+--------
+

why ? any body know ?

Posted by [name withheld] on Wednesday December 18 2002, @5:27pm

[Delete] [Edit]

Kevin A.
DUPLICATE RECORDS How to
find
them
While DISTINCT works for finding duplicates color='red'> IN ONE COLUMN
color='black'>it does not
allow you to add other information to your query,
try:
CREATE TABLE Distinct_Address AS SELECT
StreetAddress, COUNT(StreetAddress)
AS
HouseHoldCount,First,Middle,Last,StreetAddress,City,
State,Zip,Phone
FROM Voters2002
GROUP BY StreetAddress
HAVING COUNT(Voters2002.StreetAddress)
>=1;
This will create a table with a HOUSE HOLD count
and only ONE address. This can be used as a mailing
list where you only want to send ONE piece of mail
to a house hold.

Posted by on Wednesday December 18 2002, @5:28pm

[Delete] [Edit]

FINDIND DUPLICATED RECORDS
------------------------------------------------
Well, I don't know toomuch about this but it works
for me, and fast enough.
SELECT columnname, COUNT(columnname)
FROM
tablename GROUP BY columnname HAVING COUNT
(columnname)>1
Bye community!

Posted by on Sunday September 29 2002, @3:35pm

[Delete] [Edit]

To select the latest 5 unique topics, using a datetime
field called time, use something like: "select topic,
max(time) as m from nuke_stories group by topic
order by m desc limit 5;"

Posted by on Thursday October 3 2002, @11:53pm

[Delete] [Edit]

Another take on MEDIAN: The SELECT statement
listed in a comment above does not work for all
lists of values (especially when the median value
is not unique in the list, or when the median
should be the average of the two middle-most
values in an even-sized list). Here is a
three-step process that has worked for me:

CREATE TEMPORARY TABLE medians SELECT x.val FROM
data x, data y GROUP BY x.val HAVING
((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val))))) <=
floor((COUNT(*) +1)/2)) and
((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val))))) <=
floor((COUNT(*) +1)/2));

The resulting medians table will either have 1 or
2 rows. If the size of your original data is an
odd number or the two middle-most data values are
the same there will
only be 1 row. If the size of
your original data is an even number and the
middle data values are not equal, there will be 2
rows. Take the average to get the true median:

SELECT AVG(val) AS median FROM medians;

The last step is to drop your temporary table
medians:

DROP TABLE medians;

I couldn't think of another way to compute the
MEDIAN without using a temporary table. If you
know how to optimize my technique, I would love to
hear from you.

Posted by on Sunday October 6 2002, @12:52am

[Delete] [Edit]

Is it just me that finds the (lacking) description
of GROUP BY hard to follow?

> MySQL has extended the use of GROUP BY.
(okay)
> You can use columns or calculations in the
SELECT expressions that don't appear in the GROUP
BY part.
(fine)
> This stands for any possible value for this group.
(what?)

> Don't use this feature if the columns you omit
from the GROUP BY part aren't unique in the group!
You will get unpredictable results.

What does this last bit mean? "aren't unique in
the group" .. should that not read "aren't
IDENTICAL in the group"???

Good feature though, the ANSI GROUP BY is too
redundant, and limits you to 10 GROUP columns,
which while high, is still a limit.

Posted by Raoul Mengis on Saturday February 8 2003, @4:34am

[Delete] [Edit]

this function with mysql not get correctly!
... ORDER BY RIGHT(index, 1) ASC;

Add your own comment.

Top / Previous / Next / Up / Table of Contents