Problem
Table: Stadium
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| visit_date | date |
| people | int |
+---------------+---------+
visit_date is the column with unique values for this table.
Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit.
As the id increases, the date increases as well.Problem Definition
Write a solution to display the records with three or more rows with consecutive id’s, and the number of people is greater than or equal to 100 for each.
Return the result table ordered by visit_date in ascending order.
Example
Image scaled to 60%
Output
Image scaled to 60%
Try It Yourself
Database Exercise
Database Schema:
-- Database schema would be rendered hereExercise Script:
-- Exercise script would be rendered hereAvailable actions: Execute
Solution
To identify and display records from the Stadium table where three or more consecutive id’s have a number of people greater than or equal to 100, we can utilize SQL’s window functions and Common Table Expressions (CTEs). This approach allows us to group consecutive qualifying records efficiently and filter out those groups that meet the specified criteria. The final results will be ordered by visit_date in ascending order.
Approach Overview
-
Filter Records with
people≥ 100:- Select only those records where the number of people is greater than or equal to 100. These are the records of interest for identifying consecutive groups.
-
Assign Group Identifiers for Consecutive
id’s:- Use the difference between the
idand a sequential row number to assign a unique group identifier (grp) for consecutive records. This technique ensures that consecutiveid’s share the samegrpvalue.
- Use the difference between the
-
Identify Groups with Three or More Consecutive Records:
- Group the filtered records by the calculated
grpand count the number of records in each group. Retain only those groups that have three or more records.
- Group the filtered records by the calculated
-
Retrieve the Desired Records:
- Join the qualifying groups back to the filtered records to extract the complete details (
id,visit_date,people) of each record within these groups.
- Join the qualifying groups back to the filtered records to extract the complete details (
-
Order the Results:
- Sort the final output by
visit_datein ascending order to present the data chronologically.
- Sort the final output by
SQL Query
WITH ConsecutiveGroups AS (
-- Step 1: Assign groups where `people` >= 100 and `id`'s are consecutive
SELECT
id,
visit_date,
people,
id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM
Stadium
WHERE
people >= 100
),
GroupedCounts AS (
-- Step 2: Count the number of rows in each group
SELECT
grp,
COUNT(*) AS group_size
FROM
ConsecutiveGroups
GROUP BY
grp
HAVING
COUNT(*) >= 3
)
-- Step 3: Retrieve rows from groups with 3 or more consecutive rows
SELECT
c.id,
c.visit_date,
c.people
FROM
ConsecutiveGroups c
JOIN
GroupedCounts g
ON
c.grp = g.grp
ORDER BY
c.visit_date;Step-by-Step Explanation
Step 1: Assigning Group Identifiers (ConsecutiveGroups CTE)
We begin by filtering the Stadium records to include only those where the people count is greater than or equal to 100. For these filtered records, we assign a group identifier (grp) based on the difference between the id and a sequential row number. This method ensures that consecutive id’s share the same grp value, effectively grouping them together.
WITH ConsecutiveGroups AS (
SELECT
id,
visit_date,
people,
id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM
Stadium
WHERE
people >= 100
)Explanation of Each Line:
-
WITH ConsecutiveGroups AS (- Initiates a Common Table Expression (CTE) named
ConsecutiveGroups.
- Initiates a Common Table Expression (CTE) named
-
SELECT id, visit_date, people,- Selects the
id,visit_date, andpeoplecolumns from theStadiumtable.
- Selects the
-
id - ROW_NUMBER() OVER (ORDER BY id) AS grp- Utilizes the
ROW_NUMBER()window function to assign a sequential number to each row based on the ascending order ofid. - Subtracts this row number from the
idto compute a group identifier (grp). For consecutiveid’s, this difference remains constant, thereby grouping them together.
- Utilizes the
-
FROM Stadium- Specifies the
Stadiumtable as the source of data.
- Specifies the
-
WHERE people >= 100- Filters the records to include only those where the
peoplecount is greater than or equal to 100.
- Filters the records to include only those where the
Intermediate Output After Step 1:
Based on the provided input data:
| id | visit_date | people | grp |
|---|---|---|---|
| 2 | 2017-01-02 | 109 | 1 |
| 3 | 2017-01-03 | 150 | 1 |
| 5 | 2017-01-05 | 145 | 2 |
| 6 | 2017-01-06 | 1455 | 2 |
| 7 | 2017-01-07 | 199 | 2 |
| 8 | 2017-01-08 | 188 | 2 |
Note:
- Record with
id= 4 is excluded becausepeople= 99, which is less than 100.
Step 2: Counting Records in Each Group (GroupedCounts CTE)
Next, we group the filtered records by the grp identifier and count the number of records in each group. We retain only those groups that have three or more records, as they meet the criteria of consecutive id’s with people ≥ 100.
GroupedCounts AS (
SELECT
grp,
COUNT(*) AS group_size
FROM
ConsecutiveGroups
GROUP BY
grp
HAVING
COUNT(*) >= 3
)Explanation of Each Line:
-
GroupedCounts AS (- Initiates another CTE named
GroupedCounts.
- Initiates another CTE named
-
SELECT grp, COUNT(*) AS group_size- Selects the
grpidentifier and counts the number of records in each group, aliasing it asgroup_size.
- Selects the
-
FROM ConsecutiveGroups- Specifies the
ConsecutiveGroupsCTE as the source of data.
- Specifies the
-
GROUP BY grp- Groups the records based on the
grpidentifier.
- Groups the records based on the
-
HAVING COUNT(*) >= 3- Filters the groups to include only those with three or more records.
Intermediate Output After Step 2:
| grp | group_size |
|---|---|
| 2 | 4 |
Step 3: Retrieving Qualified Records
Finally, we join the ConsecutiveGroups CTE with the GroupedCounts CTE on the grp identifier to extract all records that belong to groups with three or more consecutive id’s. The results are then ordered by visit_date in ascending order.
SELECT
c.id,
c.visit_date,
c.people
FROM
ConsecutiveGroups c
JOIN
GroupedCounts g
ON
c.grp = g.grp
ORDER BY
c.visit_date;Explanation of Each Line:
-
SELECT c.id, c.visit_date, c.people- Selects the
id,visit_date, andpeoplecolumns from theConsecutiveGroupsCTE.
- Selects the
-
JOIN GroupedCounts g ON c.grp = g.grp- Performs an inner join with the
GroupedCountsCTE (aliased asg) on thegrpidentifier. This ensures that only records from groups that have three or more consecutiveid’s are included.
- Performs an inner join with the
-
ORDER BY c.visit_date- Orders the final results by
visit_datein ascending order to present the data chronologically.
- Orders the final results by
Final Output:
Based on the provided input data, the final output will include records from groups that have three or more consecutive id’s with people ≥ 100, ordered by visit_date in ascending order.
+----+----------------------------+--------+
| id | visit_date | people |
+----+----------------------------+--------+
| 5 | 2017-01-05T00:00:00.000Z | 145 |
| 6 | 2017-01-06T00:00:00.000Z | 1455 |
| 7 | 2017-01-07T00:00:00.000Z | 199 |
| 8 | 2017-01-08T00:00:00.000Z | 188 |
+----+----------------------------+--------+