Problem
Table: Visits
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| visit_date | date |
+---------------+---------+
(user_id, visit_date) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates that user_id has visited the bank in visit_date.Table: Transactions
+------------------+---------+
| Column Name | Type |
+------------------+---------+
| user_id | int |
| transaction_date | date |
| amount | int |
+------------------+---------+
This table may contain duplicate rows.
Each row of this table indicates that user_id has made a transaction of the amount in transaction_date.
It is guaranteed that the user has visited the bank in the transaction_date.(i.e The Visits table contains (user_id, transaction_date) in one row)Problem Definition
A bank wants to draw a chart of the number of transactions bank visitors did in one visit to the bank and the corresponding number of visitors who have done this number of transactions in one visit.
Write a solution to find how many users visited the bank and didn’t do any transactions, how many visited the bank and did one transaction, and so on.
The result table will contain two columns:
transactions_countwhich is the number of transactions done in one visit.visits_countwhich is the corresponding number of users who didtransactions_countin one visit to the bank.
transactions_count should take all values from 0 to max(transactions_count) done by one or more users.
Return the result table ordered by transactions_count.
Example
Output
Image scaled to 50%
Try It Yourself
Database Exercise
Database Schema:
-- Database schema would be rendered hereExercise Script:
-- Exercise script would be rendered hereAvailable actions: Execute
Solution
To analyze the transaction behavior of bank visitors, we aim to determine how many transactions each visitor made during their visits and summarize this data to understand the distribution of transaction counts across all visits. Specifically, we want to generate a report that shows:
- The number of transactions (
transactions_count) a visitor made in a single visit. - The number of visitors (
visits_count) who made that number of transactions in one visit.
This analysis helps the bank understand customer engagement and transaction patterns, which can inform service improvements and marketing strategies.
Approach Overview
-
Generate a Sequence of Possible Transaction Counts (
tCTE):- Create a list of all possible transaction counts from
0up to the total number of transactions to ensure that even if no visits have a certain number of transactions, the count for that transaction number is still represented.
- Create a list of all possible transaction counts from
-
Calculate Transactions per Visit (
t1CTE):- For each visit, count the number of transactions made by the visitor. Include visits with zero transactions by performing a
LEFT JOINbetweenVisitsandTransactions.
- For each visit, count the number of transactions made by the visitor. Include visits with zero transactions by performing a
-
Aggregate Visits by Transaction Count:
- Join the sequence of possible transaction counts with the actual transaction counts per visit and count how many visits correspond to each transaction count.
-
Finalize the Results:
- Present the
transactions_countalongside the correspondingvisits_count, ensuring that all transaction counts from0tomax(transactions_count)are included and ordered accordingly.
- Present the
SQL Query
WITH t AS (
SELECT Row_number() OVER() row_num
FROM Transactions
UNION
SELECT 0
),
t1 AS (
SELECT Count(transaction_date) transaction_count
FROM Visits v
LEFT JOIN Transactions t
ON v.user_id = t.user_id
AND v.visit_date = transaction_date
GROUP BY v.user_id,
v.visit_date
)
SELECT row_num AS Transactions_count,
Count(transaction_count) visits_count
FROM t
LEFT JOIN t1
ON row_num = transaction_count
GROUP BY row_num
HAVING row_num <= (SELECT Max(transaction_count)
FROM t1)
ORDER BY row_num;Step-by-Step Approach
Step 1: Generate the t Common Table Expression (CTE)
Create a sequence of numbers representing possible transaction counts, including 0. This ensures that even if no visits have a certain number of transactions, the count for that transaction number is still represented as 0.
SQL Snippet:
WITH t AS (
SELECT Row_number() OVER() row_num
FROM Transactions
UNION
SELECT 0
)Explanation:
-
Row_number() OVER() row_num:- Assigns a unique sequential number (
row_num) to each row in theTransactionstable. If there areNtransactions,row_numranges from1toN.
- Assigns a unique sequential number (
-
UNION SELECT 0:- Adds
0to the list ofrow_numvalues to account for visits with no transactions.
- Adds
Intermediate Output After Step 1 (t CTE):
+---------+
| row_num |
+---------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 0 |
+---------+Step 2: Calculate Transactions per Visit (t1 CTE)
Determine the number of transactions each visitor made during each visit to the bank.
SQL Snippet:
t1 AS (
SELECT Count(transaction_date) transaction_count
FROM Visits v
LEFT JOIN Transactions t
ON v.user_id = t.user_id
AND v.visit_date = transaction_date
GROUP BY v.user_id,
v.visit_date
)Explanation:
-
LEFT JOIN Transactions t ON v.user_id = t.user_id AND v.visit_date = transaction_date:- Joins each visit with its corresponding transactions based on
user_idandvisit_date. This ensures that visits without any transactions are included withNULLvalues for transactions.
- Joins each visit with its corresponding transactions based on
-
COUNT(transaction_date) AS transaction_count:- Counts the number of transactions for each visit. Since it’s a
LEFT JOIN, visits with no transactions will have atransaction_countof0.
- Counts the number of transactions for each visit. Since it’s a
-
GROUP BY v.user_id, v.visit_date:- Aggregates the data for each unique visit.
Intermediate Output After Step 2 (t1 CTE):
Based on the example data:
| user_id | visit_date | transaction_count |
|---|---|---|
| 1 | 2020-01-01 | 0 |
| 2 | 2020-01-02 | 0 |
| 12 | 2020-01-01 | 0 |
| 19 | 2020-01-03 | 0 |
| 1 | 2020-01-02 | 1 |
| 2 | 2020-01-03 | 1 |
| 1 | 2020-01-04 | 1 |
| 7 | 2020-01-11 | 1 |
| 9 | 2020-01-25 | 3 |
| 8 | 2020-01-28 | 1 |
Step 3: Aggregate Visits by Transaction Count
Determine how many visits correspond to each number of transactions (transactions_count), ensuring that all possible counts from 0 to the maximum observed are included.
SQL Snippet:
SELECT row_num AS Transactions_count,
Count(transaction_count) visits_count
FROM t
LEFT JOIN t1
ON row_num = transaction_count
GROUP BY row_num
HAVING row_num <= (SELECT Max(transaction_count)
FROM t1)
ORDER BY row_num;Explanation:
-
LEFT JOIN t1 ON row_num = transaction_count:- Joins the sequence of possible transaction counts (
t) with the actual transaction counts per visit (t1).
- Joins the sequence of possible transaction counts (
-
COUNT(transaction_count) AS visits_count:- Counts the number of visits that have a specific
transactions_count. If no visits have a certaintransactions_count, the count will be0.
- Counts the number of visits that have a specific
-
GROUP BY row_num:- Aggregates the data based on each transaction count.
-
HAVING row_num <= (SELECT Max(transaction_count) FROM t1):- Ensures that only relevant transaction counts (up to the maximum observed) are included in the final result.
-
ORDER BY row_num:- Sorts the results by
transactions_countin ascending order for clarity.
- Sorts the results by
Final Output:
Based on the example data, the final output will be:
+--------------------+--------------+
| Transactions_count | visits_count |
+--------------------+--------------+
|0 |4 |
|1 |5 |
|2 |0 |
|3 |1 |
+--------------------+--------------+