The PARTITION BY clause sets the range of records that will be used for each "GROUP" within the OVER clause.
In your example SQL, DEPT_COUNT will return the number of employees within that department for every employee record. (It is as if you're de-nomalising the emp table; you still return every record in the emp table.)
emp_no dept_no DEPT_COUNT
1 10 3
2 10 3
3 10 3 <- three because there are three "dept_no = 10" records
4 20 2
5 20 2 <- two because there are two "dept_no = 20" records
If there was another column (e.g., state) then you could count how many departments in that State.
It is like getting the results of a GROUP BY (SUM, AVG, etc.) without the aggregating the result set (i.e. removing matching records).
It is useful when you use the LAST OVER or MIN OVER functions to get, for example, the lowest and highest salary in the department and then use that in a calculation against this records salary without a sub select, which is much faster.
Read the linked AskTom article for further details.
Answer from Guy on Stack OverflowVideos
The PARTITION BY clause sets the range of records that will be used for each "GROUP" within the OVER clause.
In your example SQL, DEPT_COUNT will return the number of employees within that department for every employee record. (It is as if you're de-nomalising the emp table; you still return every record in the emp table.)
emp_no dept_no DEPT_COUNT
1 10 3
2 10 3
3 10 3 <- three because there are three "dept_no = 10" records
4 20 2
5 20 2 <- two because there are two "dept_no = 20" records
If there was another column (e.g., state) then you could count how many departments in that State.
It is like getting the results of a GROUP BY (SUM, AVG, etc.) without the aggregating the result set (i.e. removing matching records).
It is useful when you use the LAST OVER or MIN OVER functions to get, for example, the lowest and highest salary in the department and then use that in a calculation against this records salary without a sub select, which is much faster.
Read the linked AskTom article for further details.
The concept is very well explained by the accepted answer, but I find that the more example one sees, the better it sinks in. Here's an incremental example:
- Boss says "get me number of items we have in stock grouped by brand"
You say: "no problem"
SELECT
BRAND
,COUNT(ITEM_ID)
FROM
ITEMS
GROUP BY
BRAND;
Result:
+--------------+---------------+
| Brand | Count |
+--------------+---------------+
| H&M | 50 |
+--------------+---------------+
| Hugo Boss | 100 |
+--------------+---------------+
| No brand | 22 |
+--------------+---------------+
- The boss says "Now get me a list of all items, with their brand AND number of items that the respective brand has"
You may try:
SELECT
ITEM_NR
,BRAND
,COUNT(ITEM_ID)
FROM
ITEMS
GROUP BY
BRAND;
But you get:
ORA-00979: not a GROUP BY expression
This is where the OVER (PARTITION BY BRAND) comes in:
SELECT
ITEM_NR
,BRAND
,COUNT(ITEM_ID) OVER (PARTITION BY BRAND)
FROM
ITEMS;
Which means:
COUNT(ITEM_ID)- get the number of itemsOVER- Over the set of rows(PARTITION BY BRAND)- that have the same brand
And the result is:
+--------------+---------------+----------+
| Items | Brand | Count() |
+--------------+---------------+----------+
| Item 1 | Hugo Boss | 100 |
+--------------+---------------+----------+
| Item 2 | Hugo Boss | 100 |
+--------------+---------------+----------+
| Item 3 | No brand | 22 |
+--------------+---------------+----------+
| Item 4 | No brand | 22 |
+--------------+---------------+----------+
| Item 5 | H&M | 50 |
+--------------+---------------+----------+
etc...