Problem Statement
Table: CollaborationAccepted
Each row in this table records an instance where one employee invites another to collaborate on a project, and the invitation is accepted.
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| inviter_id | int |
| invitee_id | int |
| project_id | int |
| accept_date | date |
+----------------+---------+
The combination of (inviter_id, invitee_id, project_id) is the primary key for this table.
This table includes the ID of the inviter, the ID of the invitee, the project they are collaborating on, and the date the invitation was accepted.Develop a solution to find the employee(s) who have collaborated on the most number of unique projects. In case of a tie, list all such employees.
Example
Input:
CollaborationAccepted table:
+------------+------------+------------+-------------+
| inviter_id | invitee_id | project_id | accept_date |
+------------+------------+------------+-------------+
| 1 | 2 | 100 | 2020-05-01 |
| 1 | 3 | 101 | 2020-05-02 |
| 2 | 3 | 102 | 2020-05-03 |
| 3 | 4 | 103 | 2020-05-04 |
| 2 | 4 | 104 | 2020-05-05 |
+------------+------------+------------+-------------+Output:
+----+--------------+
| id | num_projects |
+----+--------------+
| 2 | 3 |
| 3 | 3 |
+----+--------------+In this example, both employees 2 and 3 have collaborated on 3 unique projects, which is the highest among all employees.
Try It yourself
Database Exercise
Database Schema:
-- Database schema would be rendered hereExercise Script:
-- Exercise script would be rendered hereAvailable actions: Execute
Solution
To identify the employee(s) who have collaborated on the most number of unique projects, we analyze the CollaborationAccepted table. We can accomplishes this by leveraging Common Table Expressions (CTEs) and set operations to aggregate and determine the desired results. Below is a comprehensive breakdown of the approach, step-by-step execution, and detailed explanations for each line of the SQL query.
Approach Overview
- Combine Collaborations: Merge the inviter and invitee collaborations to treat all collaborations uniformly.
- Count Unique Projects per Employee: Calculate the number of distinct projects each employee has collaborated on.
- Identify Maximum Project Count: Determine the highest number of unique projects any employee has collaborated on.
- Select Top Collaborators: Retrieve the employee(s) whose project count matches the maximum identified.
- Present the Final Results: Display the employee IDs alongside their respective number of unique projects.
SQL Query
WITH AllCollaborations AS (
-- Combine inviter and invitee collaborations
SELECT inviter_id AS id, project_id
FROM CollaborationAccepted
UNION
SELECT invitee_id AS id, project_id
FROM CollaborationAccepted
),
ProjectCounts AS (
-- Count unique projects per employee
SELECT id, COUNT(DISTINCT project_id) AS num_projects
FROM AllCollaborations
GROUP BY id
)
-- Select employees with the maximum number of projects
SELECT id, num_projects
FROM ProjectCounts
WHERE num_projects = (
SELECT MAX(num_projects)
FROM ProjectCounts
)
ORDER BY id;Step-by-Step Approach
Step 1: Combine Collaborations (AllCollaborations)
Merge the inviter and invitee collaborations to create a unified view of all collaborations, treating both roles equally.
SQL Snippet:
WITH AllCollaborations AS (
-- Combine inviter and invitee collaborations
SELECT inviter_id AS id, project_id
FROM CollaborationAccepted
UNION
SELECT invitee_id AS id, project_id
FROM CollaborationAccepted
),Explanation:
-
WITH AllCollaborations AS (- Initiates a Common Table Expression (CTE) named
AllCollaborations. CTEs allow for temporary result sets that can be referenced within the main query.
- Initiates a Common Table Expression (CTE) named
-
SELECT inviter_id AS id, project_id- Selects the
inviter_idfrom theCollaborationAcceptedtable and aliases it asid. - Retrieves the associated
project_idfor each invitation.
- Selects the
-
UNION- Combines the results of the first
SELECTwith the secondSELECT. - The
UNIONoperator ensures that duplicate records are eliminated, resulting in a list of unique (id, project_id) pairs.
- Combines the results of the first
-
SELECT invitee_id AS id, project_id- Selects the
invitee_idfrom theCollaborationAcceptedtable and aliases it asid. - Retrieves the associated
project_idfor each accepted invitation.
- Selects the
Intermediate Output After Step 1 (AllCollaborations):
+----+------------+
| id | project_id |
+----+------------+
| 1 | 100 |
| 2 | 100 |
| 1 | 101 |
| 3 | 101 |
| 2 | 102 |
| 3 | 102 |
| 3 | 103 |
| 4 | 103 |
| 2 | 104 |
| 4 | 104 |
+----+------------+Step 2: Count Unique Projects per Employee (ProjectCounts)
Calculate the number of distinct projects each employee has collaborated on.
SQL Snippet:
ProjectCounts AS (
-- Count unique projects per employee
SELECT id, COUNT(DISTINCT project_id) AS num_projects
FROM AllCollaborations
GROUP BY id
)Explanation:
-
ProjectCounts AS (- Initiates a second CTE named
ProjectCounts.
- Initiates a second CTE named
-
SELECT id, COUNT(DISTINCT project_id) AS num_projects- Selects the
id(employee ID) from theAllCollaborationsCTE. - Counts the number of distinct
project_ids each employee has collaborated on. - Aliases the count as
num_projects.
- Selects the
-
GROUP BY id- Groups the data by
idto perform the aggregation for each employee.
- Groups the data by
Intermediate Output After Step 2 (ProjectCounts):
+----+--------------+
| id | num_projects |
+----+--------------+
| 1 | 2 |
| 2 | 3 |
| 3 | 3 |
| 4 | 2 |
+----+--------------+Step 3: Identify Maximum Project Count
Determine the highest number of unique projects any employee has collaborated on.
SQL Snippet:
SELECT id, num_projects
FROM ProjectCounts
WHERE num_projects = (
SELECT MAX(num_projects)
FROM ProjectCounts
)Explanation:
-
SELECT id, num_projects- Selects the
idandnum_projectscolumns from theProjectCountsCTE.
- Selects the
-
WHERE num_projects = (- Introduces a condition to filter employees whose
num_projectsequals the maximum found.
- Introduces a condition to filter employees whose
-
SELECT MAX(num_projects)- Selects the highest value of
num_projectsacross all employees.
- Selects the highest value of
-
ORDER BY id;- Orders the final result set by
idin ascending order for clarity.
- Orders the final result set by
Final Output:
+----+--------------+
| id | num_projects |
+----+--------------+
| 2 | 3 |
| 3 | 3 |
+----+--------------+Explanation of Output:
-
Employee 2:
- Collaborated on 3 unique projects (100, 102, 104)
-
Employee 3:
- Collaborated on 3 unique projects (101, 102, 103)