Revenue per customer
You've been hired at Delivr as a data analyst! A customer just called Delivr's Customer Support team; she wants to double-check whether her receipts add up. Going by her receipts, she calculated that her total bill on Delivr is $271, and she wants to make sure of that. Her user ID is 15.
Help the Customer Support team by calculating her total bill! Sum up everything she's spent on Delivr orders; in other words, calculate the total revenue that Delivr has generated from her.
-- Calculate revenue
SELECT SUM(orders.order_quantity * meals.meal_price) AS revenue
FROM meals
JOIN orders ON meals.meal_id = orders.meal_id
-- Keep only the records of customer ID 15
WHERE orders.user_id = 15;
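If Support also wants to show the customer where that total comes from, a per-order breakdown is a small extension of the same join. A sketch, not part of the original exercise, reusing only the columns already used above:
-- Sketch: itemize user 15's spend per order
SELECT
  orders.order_id,
  orders.order_date,
  SUM(orders.order_quantity * meals.meal_price) AS order_total
FROM meals
JOIN orders ON meals.meal_id = orders.meal_id
WHERE orders.user_id = 15
GROUP BY orders.order_id, orders.order_date
ORDER BY orders.order_date;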
Revenue per week
Delivr's first full month of operations was June 2018. At launch, the Marketing team ran an ad campaign on popular food channels on TV, with the number of ads increasing each week through the end of the month. The Head of Marketing asks you to help her assess that campaign's success.
Get the revenue per week for each week in June and check whether there's any consistent growth in revenue.
Note: Don't be surprised if you get a date in May in the result. DATE_TRUNC('week', '2018-06-02') returns '2018-05-28', since '2018-06-02' is a Saturday and the preceding Monday is '2018-05-28'.
- Write the expression for revenue.
- Keep only the records of June 2018.
SELECT DATE_TRUNC('week', order_date) :: DATE AS delivr_week,
-- Calculate revenue
SUM(meals.meal_price * orders.order_quantity) AS revenue
FROM meals
JOIN orders ON meals.meal_id = orders.meal_id
-- Keep only the records in June 2018
WHERE orders.order_date BETWEEN '2018-06-01' AND '2018-06-30'
GROUP BY delivr_week
ORDER BY delivr_week ASC;
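A quick way to see the Monday-snapping behavior mentioned in the note above (a standalone sketch, independent of the Delivr tables):
-- Sketch: DATE_TRUNC('week', ...) snaps any date back to the Monday of its week
SELECT
  DATE_TRUNC('week', DATE '2018-06-02') :: DATE AS saturday_snaps_back,  -- 2018-05-28
  DATE_TRUNC('week', DATE '2018-06-04') :: DATE AS monday_stays_put;     -- 2018-06-04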
Top meals by cost
Alice from Finance wants to know what Delivr's top 5 meals are by overall cost; in other words, Alice wants to know the 5 meals that Delivr has spent the most on for stocking.
You're provided with an aggregate query; you'll need to fill in the blanks to get the output Alice needs.
Note: Recall that in the meals table, meal_price is what the user pays Delivr for the meal, while meal_cost is what Delivr pays its eateries to stock this meal.
SELECT
-- Calculate cost per meal ID
meals.meal_id,
SUM(meals.meal_cost * stock.stocked_quantity) AS cost
FROM meals
JOIN stock ON meals.meal_id = stock.meal_id
GROUP BY meals.meal_id
ORDER BY cost DESC
-- Only the top 5 meal IDs by purchase cost
LIMIT 5;
Growth Rate and Delta
The delta is the absolute month-over-month (MoM) change in a metric (this month's value minus last month's), while the growth rate is that change divided by last month's value. The query below counts monthly orders, fetches the previous month's count with LAG(), and computes the MoM order growth rate.
WITH orders AS (
SELECT
DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
-- Count the unique order IDs
COUNT(DISTINCT order_id) AS orders
FROM orders
GROUP BY delivr_month),
orders_with_lag AS (
SELECT
delivr_month,
-- Fetch each month's current and previous orders
orders,
COALESCE(
LAG(orders) OVER (ORDER BY delivr_month),
1) AS last_orders
FROM orders)
SELECT
delivr_month,
-- Calculate the MoM order growth rate
ROUND(
(orders - last_orders) :: numeric / last_orders,
2) AS growth
FROM orders_with_lag
ORDER BY delivr_month ASC;
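For the delta itself (the absolute change rather than the rate), the same LAG() pattern applies without the division. A minimal sketch, assuming the same orders table:
-- Sketch: absolute MoM change (delta) in monthly orders
WITH monthly_orders AS (
  SELECT
    DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
    COUNT(DISTINCT order_id) AS orders
  FROM orders
  GROUP BY delivr_month)
SELECT
  delivr_month,
  orders,
  -- This month's orders minus last month's orders (0 assumed for the first month)
  orders - COALESCE(LAG(orders) OVER (ORDER BY delivr_month), 0) AS orders_delta
FROM monthly_orders
ORDER BY delivr_month ASC;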
Retention rate
Bob's requested your help again now that you're done with Carol's MAU monitor. His meeting with potential investors is fast approaching, and he wants to wrap up his pitch deck. You've already helped him with the registrations running total by month and the MAU line charts; the investors, Bob says, would be convinced that Delivr is growing both in new users and in MAUs.
However, Bob wants to show that Delivr not only attracts new users but also retains existing users. Send him a table of MoM retention rates so that he can highlight Delivr's high user loyalty. A month's retention rate is the share of that month's active users who are also active in the following month.
WITH user_monthly_activity AS (
SELECT DISTINCT
DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
user_id
FROM orders)
SELECT
-- Calculate the MoM retention rates
previous.delivr_month,
ROUND(
  COUNT(DISTINCT current.user_id) :: NUMERIC /
    GREATEST(COUNT(DISTINCT previous.user_id), 1),
2) AS retention_rate
FROM user_monthly_activity AS previous
LEFT JOIN user_monthly_activity AS current
-- Fill in the user and month join conditions
ON previous.user_id = current.user_id
AND previous.delivr_month = (current.delivr_month - interval '1 month')
GROUP BY previous.delivr_month
ORDER BY previous.delivr_month ASC;
Average revenue per user
Dave from Finance wants to study Delivr's performance in revenue and orders for each of its users; in other words, he wants to understand its unit economics.
Help Dave kick off his study by calculating the overall average revenue per user (ARPU) using the first way discussed in Lesson 3.1: averaging each user's revenue.
-- Create a CTE named kpi
WITH kpi AS (
SELECT
-- Select the user ID and calculate revenue
user_id,
SUM(m.meal_price * o.order_quantity) AS revenue
FROM meals AS m
JOIN orders AS o ON m.meal_id = o.meal_id
GROUP BY user_id)
-- Calculate ARPU
SELECT ROUND(AVG(revenue) :: NUMERIC, 2) AS arpu
FROM kpi;
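The second way (used for the weekly table in the next exercise) divides total revenue by the count of distinct users instead of averaging per-user revenues. As a cross-check on the overall figure, a minimal sketch, assuming that is what Lesson 3.1's second way refers to:
-- Sketch: overall ARPU as total revenue divided by distinct users
SELECT ROUND(
    SUM(m.meal_price * o.order_quantity) :: NUMERIC /
    GREATEST(COUNT(DISTINCT o.user_id), 1),
  2) AS arpu
FROM meals AS m
JOIN orders AS o ON m.meal_id = o.meal_id;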
ARPU per week
Next, Dave wants to see whether ARPU has increased over time. Even if Delivr's revenue is increasing, it's not scaling well if its ARPU is decreasing, since that means it's generating less revenue from each of its customers.
Send Dave a table of ARPU by week using the second way discussed in Lesson 3.1.
WITH kpi AS (
SELECT
-- Select the week, revenue, and count of users
DATE_TRUNC('week', o.order_date) :: DATE AS delivr_week,
SUM(m.meal_price * o.order_quantity) AS revenue,
COUNT(DISTINCT o.user_id) AS users
FROM meals AS m
JOIN orders AS o ON m.meal_id = o.meal_id
GROUP BY delivr_week)
SELECT
delivr_week,
-- Calculate ARPU
ROUND(
revenue :: numeric / users,
2) AS arpu
FROM kpi
-- Order by week in ascending order
ORDER BY delivr_week ASC;
Average orders per user
Dave wants to add the average orders per user value to his unit economics study, since more orders usually correspond to more revenue.
Calculate the average orders per user for Dave.
Note: The count of distinct orders is different from the sum of ordered meals. One order can have many meals within it. Average orders per user depends on the count of orders, not the sum of ordered meals.
WITH kpi AS (
SELECT
-- Select the count of orders and users
COUNT(DISTINCT order_id) AS orders,
COUNT(DISTINCT user_id) AS users
FROM orders)
SELECT
-- Calculate the average orders per user
ROUND(
  orders :: NUMERIC / users,
2) AS avg_orders_per_user
FROM kpi;
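To see why the note matters, compare the two aggregates side by side; a quick sketch, assuming the orders table has the order_quantity column used earlier:
-- Sketch: distinct orders vs. total meals ordered are different numbers
SELECT
  COUNT(DISTINCT order_id) AS distinct_orders,
  SUM(order_quantity) AS total_meals_ordered
FROM orders;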
Histogram of revenue
After determining that Delivr is doing well at scaling its business model, Dave wants to explore the distribution of revenues. He wants to see whether the distribution is U-shaped or normal, to decide how best to categorize users by the revenue they generate.
Send Dave a frequency table of revenues by user.
WITH user_revenues AS (
SELECT
-- Select the user ID and revenue
o.user_id,
SUM(m.meal_price * o.order_quantity) AS revenue
FROM meals AS m
JOIN orders AS o ON m.meal_id = o.meal_id
GROUP BY user_id)
SELECT
-- Return the frequency table of revenues by user
ROUND(revenue :: NUMERIC, -2) AS revenue_100,
COUNT(DISTINCT user_id) AS users
FROM user_revenues
GROUP BY revenue_100
ORDER BY revenue_100 ASC;
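The -2 in ROUND(revenue, -2) is what produces $100-wide buckets: a negative scale rounds to the left of the decimal point. A standalone sketch:
-- Sketch: ROUND with a negative scale rounds to tens, hundreds, ...
SELECT
  ROUND(1234.56, -1) AS to_nearest_ten,      -- 1230
  ROUND(1234.56, -2) AS to_nearest_hundred,  -- 1200
  ROUND(1254.56, -2) AS rounds_up;           -- 1300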
Histogram of orders
Dave also wants to plot the histogram of orders to see if it matches the shape of the histogram of revenues.
Send Dave a frequency table of orders by user.
WITH user_orders AS (
SELECT
user_id,
COUNT(DISTINCT order_id) AS orders
FROM orders
GROUP BY user_id)
SELECT
-- Return the frequency table of orders by user
orders,
COUNT(DISTINCT user_id) AS users
FROM user_orders
GROUP BY orders
ORDER BY orders ASC;
Bucketing users by revenue
Based on his analysis, Dave identified that $150 is a good cut-off for low-revenue users, and $300 is a good cut-off for mid-revenue users. He wants to find the number of users in each category to tweak Delivr's business model.
Split the users into low, mid, and high-revenue buckets, and return the count of users in each group.
WITH user_revenues AS (
  SELECT
    user_id,
    SUM(m.meal_price * o.order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY o.user_id)
SELECT
  -- CASE conditions are checked top to bottom, so each user lands in the first matching bucket
  CASE
    WHEN revenue < 150 THEN 'Low-revenue users'
    WHEN revenue < 300 THEN 'Mid-revenue users'
    ELSE 'High-revenue users'
  END AS revenue_group,
  COUNT(DISTINCT user_id) AS users
FROM user_revenues
GROUP BY revenue_group;
Bucketing users by orders
Dave is repeating his bucketing analysis on orders to have a more complete profile of each group. He determined that 8 orders is a good cut-off for the low-orders group, and 15 is a good cut-off for the medium-orders group.
Send Dave a table of each order group and how many users are in it.
-- Store each user's count of orders in a CTE named user_orders
WITH user_orders AS (
SELECT
user_id,
COUNT(DISTINCT order_id) AS orders
FROM orders
GROUP BY user_id)
SELECT
-- Write the conditions for the three buckets
CASE
WHEN orders < 8 THEN 'Low-orders users'
WHEN orders < 15 THEN 'Mid-orders users'
ELSE 'High-orders users'
END AS order_group,
-- Count the distinct users in each bucket
COUNT(DISTINCT user_id) AS users
FROM user_orders
GROUP BY order_group;
How to find percentiles/quartiles.
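PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY column) is an ordered-set aggregate: it returns the (interpolated) value below which that fraction of rows fall, while PERCENTILE_DISC returns an actual value from the data. A minimal sketch, assuming the orders table used throughout:
-- Sketch: median order quantity, continuous vs. discrete percentile
SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_quantity) AS median_interpolated,
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY order_quantity) AS median_from_data
FROM orders;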
Revenue quartiles
Dave is wrapping up his study and wants to calculate a few more figures. He wants to find out the first, second, and third revenue quartiles. He also wants to find the average to see in which direction the data is skewed.
Calculate the first, second, and third revenue quartiles, as well as the average.
Note: You can calculate the 30th percentile for a column named column_a by using PERCENTILE_CONT(0.30) WITHIN GROUP (ORDER BY column_a ASC).
WITH user_revenues AS (
-- Select the user IDs and their revenues
SELECT
o.user_id,
SUM(m.meal_price * o.order_quantity) AS revenue
FROM meals AS m
JOIN orders AS o ON m.meal_id = o.meal_id
GROUP BY user_id)
SELECT
-- Calculate the first, second, and third quartile
ROUND(
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue ASC) :: NUMERIC,
2) AS revenue_p25,
ROUND(
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY revenue ASC) :: NUMERIC,
2) AS revenue_p50,
ROUND(
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue ASC) :: NUMERIC,
2) AS revenue_p75,
-- Calculate the average
ROUND(AVG(revenue) :: NUMERIC, 2) AS avg_revenue
FROM user_revenues;
Interpretation: 25 percent of the users have revenue below 120.69. To read a percentile result, pair it with the other column: revenue_p25 is a revenue value, so the statement is about users. The pattern is always percentile + the other column + "below" + result.
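A quick way to sanity-check that reading (a sketch; it rebuilds the user_revenues CTE from above, and 120.69 is the p25 value quoted in the note):
-- Sketch: roughly 25% of users should fall below the reported revenue_p25
WITH user_revenues AS (
  SELECT
    o.user_id,
    SUM(m.meal_price * o.order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY o.user_id)
SELECT
  ROUND(
    COUNT(DISTINCT user_id) FILTER (WHERE revenue < 120.69) :: NUMERIC /
    COUNT(DISTINCT user_id),
  2) AS share_below_p25
FROM user_revenues;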
Interquartile range
The final value that Dave wants is the count of users in the revenue interquartile range (IQR). Users outside the revenue IQR are outliers, and Dave wants to know the number of "typical" users.
Return the count of users in the revenue IQR.
WITH user_revenues AS (
SELECT
-- Select user_id and calculate revenue by user
user_id,
SUM(m.meal_price * o.order_quantity) AS revenue
FROM meals AS m
JOIN orders AS o
ON m.meal_id = o.meal_id
GROUP BY user_id),
quartiles AS (
SELECT
-- Calculate the first and third revenue quartiles
ROUND(
PERCENTILE_CONT(0.25) WITHIN GROUP
(ORDER BY revenue ASC) :: NUMERIC,
2) AS revenue_p25,
ROUND(
PERCENTILE_CONT(0.75) WITHIN GROUP
(ORDER BY revenue ASC) :: NUMERIC,
2) AS revenue_p75
FROM user_revenues)
SELECT
-- Count the number of users in the IQR
COUNT(DISTINCT user_id) AS users
FROM user_revenues
CROSS JOIN quartiles
-- Only keep users with revenues in the IQR range
WHERE revenue :: NUMERIC >= revenue_p25
AND revenue :: NUMERIC <= revenue_p75;
Important window functions
TO_CHAR converts a date into a formatted text description, e.g. a full month name or a quarter label for reports.
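A few format patterns worth keeping at hand (a standalone sketch; these are standard PostgreSQL TO_CHAR templates):
-- Sketch: common TO_CHAR date formats
SELECT
  TO_CHAR(DATE '2018-06-01', 'FMDay DD, FMMonth YYYY') AS full_description,  -- Friday 01, June 2018
  TO_CHAR(DATE '2018-06-01', 'Dy DD Mon YYYY')         AS abbreviated,       -- Fri 01 Jun 2018
  TO_CHAR(DATE '2018-06-01', '"Q"Q YYYY')              AS quarter_label;     -- Q2 2018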
Rank users by their count of orders
Eve tells you that she wants to report which user IDs have the most orders each month. She doesn't want to display long numbers, which will only distract C-level execs, so she wants to display only their ranks. Rank 1 goes to the user with the most orders, rank 2 to the user with the second-most orders, and so on.
Send Eve a list of the top 3 user IDs by orders in August 2018 with their ranks.
-- Set up the user_count_orders CTE
WITH user_count_orders AS (
SELECT
user_id,
COUNT(DISTINCT order_id) AS count_orders
FROM orders
-- Only keep orders in August 2018
WHERE DATE_TRUNC('month', order_date) = '2018-08-01'
GROUP BY user_id)
SELECT
-- Select user ID, and rank user ID by count_orders
user_id,
RANK() OVER (ORDER BY count_orders DESC) AS count_orders_rank
FROM user_count_orders
ORDER BY count_orders_rank ASC
-- Limit the user IDs selected to 3
LIMIT 3;
Pivoting user revenues by month
Next, Eve tells you that the C-level execs prefer wide tables over long ones because they're easier to scan. She prepared a sample report of user revenues by month, detailing the first 5 user IDs' revenues from June to August 2018. The execs told her to pivot the table by month. She's passed that task off to you.
Pivot the user revenues by month query so that the user ID is a row and each month from June to August 2018 is a column.
-- Import tablefunc
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM CROSSTAB($$
SELECT
user_id,
DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
SUM(meal_price * order_quantity) :: FLOAT AS revenue
FROM meals
JOIN orders ON meals.meal_id = orders.meal_id
WHERE user_id IN (0, 1, 2, 3, 4)
AND order_date < '2018-09-01'
GROUP BY user_id, delivr_month
ORDER BY user_id, delivr_month;
$$)
-- Select user ID and the months from June to August 2018
AS ct (user_id INT,
"2018-06-01" FLOAT,
"2018-07-01" FLOAT,
"2018-08-01" FLOAT)
ORDER BY user_id ASC;
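The general CROSSTAB pattern: the source query inside the dollar quotes returns exactly three columns (row identifier, category, value), ordered by the row identifier, and the ct(...) list names the row identifier plus one output column per category, in the order the categories appear. A minimal sketch with inline, hypothetical VALUES so it runs without any Delivr tables:
-- Sketch: minimal CROSSTAB on throwaway data
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM CROSSTAB($$
  SELECT *
  FROM (VALUES ('a', 'feb', 20), ('a', 'jan', 10),
               ('b', 'feb', 5),  ('b', 'jan', 30)) AS t(row_id, month, amount)
  ORDER BY 1, 2;
$$)
AS ct (row_id TEXT,
       "feb" INT,
       "jan" INT);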
Costs
The C-level execs next tell Eve that they want a report on the total costs by eatery in the last two months.
First, write a query to get the total costs by eatery in November and December 2018, then pivot by month.
Note: Recall from Chapter 1 that total cost is the sum of each meal's cost times its stocking quantity.
Note: the question itself hints at the pivot structure. It asks for total costs per eatery over the last two months, so eatery is the row, the month is the pivoted column, and total cost is the measure.
-- Import tablefunc
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM CROSSTAB($$
SELECT
-- Select eatery and calculate total cost
eatery,
DATE_TRUNC('month', stocking_date) :: DATE AS delivr_month,
SUM(meal_cost * stocked_quantity) :: FLOAT AS cost
FROM meals
JOIN stock ON meals.meal_id = stock.meal_id
-- Keep only the records after October 2018
WHERE DATE_TRUNC('month', stocking_date) > '2018-10-01'
GROUP BY eatery, delivr_month
ORDER BY eatery, delivr_month;
$$)
-- Select the eatery and November and December 2018 as columns
AS ct (eatery TEXT,
"2018-11-01" FLOAT,
"2018-12-01" FLOAT)
ORDER BY eatery ASC;
Producing executive reports
Executive summary:
a) aggregation and formatting
b) use of RANK
c) CROSSTAB for pivoting the table
Executive report
Eve wants to produce a final executive report about the rankings of eateries by the number of unique users who order from them by quarter. She said she'll handle the pivoting, so you only need to prepare the source table for her to pivot.
Send Eve a table of unique ordering users by eatery and by quarter.
Visit the link to practice on DataCamp: Executive report | SQL (datacamp.com)
-- Import tablefunc
CREATE EXTENSION IF NOT EXISTS tablefunc;
-- Pivot the previous query by quarter
SELECT * FROM CROSSTAB($$
WITH eatery_users AS (
SELECT
eatery,
-- Format the order date so "2018-06-01" becomes "Q2 2018"
TO_CHAR(order_date, '"Q"Q YYYY') AS delivr_quarter,
-- Count unique users
COUNT(DISTINCT user_id) AS users
FROM meals
JOIN orders ON meals.meal_id = orders.meal_id
GROUP BY eatery, delivr_quarter
ORDER BY delivr_quarter, users)
SELECT
-- Select eatery and quarter
eatery,
delivr_quarter,
-- Rank rows, partition by quarter and order by users
RANK() OVER
(PARTITION BY delivr_quarter
ORDER BY users DESC) :: INT AS users_rank
FROM eatery_users
ORDER BY eatery, delivr_quarter;
$$)
-- Select the columns of the pivoted table
AS ct (eatery TEXT,
"Q2 2018" INT,
"Q3 2018" INT,
"Q4 2018" INT)
ORDER BY "Q4 2018";