Problem
Table: Students
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| student_name | varchar |
| grade | int |
+---------------+---------+
student_id is the primary key for this table.
This table contains the information of students in a high school.Table: Attendance
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| attendance_date | date |
| status | enum |
+---------------+---------+
(student_id, attendance_date) is the primary key for this table.
status is an ENUM of type ('Present', 'Absent').Problem Definition
Write a solution to find the students who were absent for three or more consecutive days.
Return the result table ordered by student_id and start_date, where start_date is the first date of the consecutive absences.
Example
Input:
Students table:
+------------+--------------+-------+
| student_id | student_name | grade |
+------------+--------------+-------+
| 1 | Alice | 10 |
| 2 | Bob | 11 |
| 3 | Charlie | 10 |
+------------+--------------+-------+Attendance table:
+------------+-----------------+--------+
| student_id | attendance_date | status |
+------------+-----------------+--------+
| 1 | 2020-11-01 | Absent |
| 1 | 2020-11-02 | Absent |
| 1 | 2020-11-03 | Absent |
| 2 | 2020-11-01 | Present|
| 2 | 2020-11-02 | Absent |
| 2 | 2020-11-03 | Absent |
| 2 | 2020-11-04 | Absent |
| 3 | 2020-11-01 | Present|
| 3 | 2020-11-02 | Present|
+------------+-----------------+--------+Output:
+------------+-------------+
| student_id | start_date |
+------------+-------------+
| 1 | 2020-11-01 |
| 2 | 2020-11-02 |
+------------+-------------+Try It Yourself
Database Exercise
Database Schema:
-- Database schema would be rendered hereExercise Script:
-- Exercise script would be rendered hereAvailable actions: Execute
Solution
To identify students who were absent for three or more consecutive days, we can utilize SQL’s window functions and Common Table Expressions (CTEs). This approach allows us to efficiently analyze attendance records, detect consecutive absences, and extract the required information for each qualifying student.
- Filter Absent Records: Isolate records where the student’s status is ‘Absent’ to focus solely on absences.
- Assign Sequential Ranks: Use
ROW_NUMBER()to assign a sequential number to each absence per student based on theattendance_date. - Identify Consecutive Absences: Calculate a grouping identifier by subtracting the row number from the sequential rank. This helps in grouping consecutive absences.
- Aggregate Consecutive Absence Groups: Group the records based on the student and the calculated grouping identifier to identify sequences of consecutive absences.
- Filter Groups with Three or More Consecutive Absences: Select only those groups where the count of consecutive absences is three or more.
- Determine the Start Date of Absence Streaks: For each qualifying group, identify the earliest
attendance_dateas thestart_dateof the consecutive absence streak. - Order the Results: Present the final list ordered by
student_idandstart_datefor clarity.
SQL Query
WITH RankedAttendance AS (
SELECT
student_id,
attendance_date,
status,
ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank
FROM Attendance
WHERE status = 'Absent'
),
ConsecutiveAttendance AS (
SELECT
student_id,
MIN(attendance_date) AS start_date,
COUNT(*) AS consecutive_days
FROM (
SELECT
student_id,
attendance_date,
s_rank - ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS grouping_col
FROM RankedAttendance
) AS grouped_absences
GROUP BY student_id, grouping_col
HAVING COUNT(*) >= 3
)
SELECT student_id, start_date
FROM ConsecutiveAttendance
ORDER BY student_id, start_date;Step-by-Step Approach
Step 1: Filter Absent Records and Assign Sequential Ranks
Extract all absence records and assign a sequential rank to each absence per student based on the attendance_date. This ranking facilitates the detection of consecutive absences.
SQL Query:
WITH RankedAttendance AS (
SELECT
student_id,
attendance_date,
status,
ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank
FROM Attendance
WHERE status = 'Absent'
)Explanation:
-
WITH RankedAttendance AS (...):- Defines a Common Table Expression (CTE) named
RankedAttendanceto store intermediate results.
- Defines a Common Table Expression (CTE) named
-
SELECT student_id, attendance_date, status, ROW_NUMBER() OVER (...) AS s_rank:-
student_id&attendance_date&status:- Selects relevant columns from the
Attendancetable.
- Selects relevant columns from the
-
ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank:- Assigns a sequential number (
s_rank) to each absence perstudent_idordered byattendance_date. This helps in identifying the sequence of absences.
- Assigns a sequential number (
-
-
FROM Attendance:- Specifies the
Attendancetable as the data source.
- Specifies the
-
WHERE status = 'Absent':- Filters the records to include only those where the student’s status is ‘Absent’.
Output After Step 1:
Assuming the example input provided, the RankedAttendance CTE would produce:
+------------+-----------------+--------+--------+
| student_id | attendance_date | status | s_rank |
+------------+-----------------+--------+--------+
| 1 | 2020-11-01 | Absent | 1 |
| 1 | 2020-11-02 | Absent | 2 |
| 1 | 2020-11-03 | Absent | 3 |
| 2 | 2020-11-02 | Absent | 1 |
| 2 | 2020-11-03 | Absent | 2 |
| 2 | 2020-11-04 | Absent | 3 |
+------------+-----------------+--------+--------+Step 2: Calculate Grouping Identifier to Detect Consecutive Absences
Compute a grouping identifier (grouping_col) to help group consecutive absences. By subtracting the row number from the sequential rank, consecutive dates will have the same grouping_col.
SQL Query:
ConsecutiveAttendance AS (
SELECT
student_id,
MIN(attendance_date) AS start_date,
COUNT(*) AS consecutive_days
FROM (
SELECT
student_id,
attendance_date,
s_rank - ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS grouping_col
FROM RankedAttendance
) AS grouped_absences
GROUP BY student_id, grouping_col
HAVING COUNT(*) >= 3
)Explanation:
-
SELECT student_id, MIN(attendance_date) AS start_date, COUNT(*) AS consecutive_days:-
student_id:- Identifies the student.
-
MIN(attendance_date) AS start_date:- Determines the earliest date in the consecutive absence streak.
-
COUNT(*) AS consecutive_days:- Counts the number of consecutive absent days in the group.
-
-
FROM (...) AS grouped_absences:- Utilizes a subquery to compute the
grouping_col.
- Utilizes a subquery to compute the
-
SELECT student_id, attendance_date, s_rank - ROW_NUMBER() OVER (...) AS grouping_col:s_rank - ROW_NUMBER() OVER (...) AS grouping_col:- Calculates the
grouping_colby subtracting the row number from the sequential rank. Consecutive dates will result in the samegrouping_col, effectively grouping them together.
- Calculates the
-
FROM RankedAttendance:- References the
RankedAttendanceCTE from Step 1.
- References the
-
GROUP BY student_id, grouping_col:- Groups the records by
student_idandgrouping_colto identify distinct consecutive absence streaks.
- Groups the records by
-
HAVING COUNT(*) >= 3:- Filters the groups to include only those where the count of consecutive absences is three or more.
Output After Step 2:
Based on the example input, the ConsecutiveAttendance CTE would produce:
+------------+-------------+-----------------+
| student_id | start_date | consecutive_days|
+------------+-------------+-----------------+
| 1 | 2020-11-01 | 3 |
| 2 | 2020-11-02 | 3 |
+------------+-------------+-----------------+Step 3: Select and Order the Final Results
Retrieve the student_id and start_date of each qualifying student who was absent for three or more consecutive days. Order the results by student_id and start_date.
SQL Query:
SELECT student_id, start_date
FROM ConsecutiveAttendance
ORDER BY student_id, start_date;Explanation:
-
SELECT student_id, start_date:- Selects the
student_idand the correspondingstart_dateof the consecutive absence streak.
- Selects the
-
FROM ConsecutiveAttendance:- Utilizes the
ConsecutiveAttendanceCTE from Step 2, which contains the grouped consecutive absences.
- Utilizes the
-
ORDER BY student_id, start_date:- Orders the final output first by
student_idin ascending order and then bystart_dateto organize the results clearly.
- Orders the final output first by
Final Output:
+------------+-------------+
| student_id | start_date |
+------------+-------------+
| 1 | 2020-11-01 |
| 2 | 2020-11-02 |
+------------+-------------+