1. Learn SQL with sqlbolt.com
2. Download public datasets from tableau.com
3. Use PowerBI to practice visualization skills
4. Build a of portfolio of 2-3 pet projects
5. Take Python course on alison.com
6. Get Python certified on alison.com
7. Look for interview tips on LinkedIn or Youtube
8. Be as original as possible in cover letter
9. Apply for at least 10 openings
10. Get invited
11. Showcase acquired skills in interview
12. Get hired
Monday, 18 July 2022
12 steps to get into data analytics without paying a single penny
Monday, 11 July 2022
SQL for Business Decision
Starts here....
Working with dates. In the EDA course I have already learnt this though. I will try again.
Data format
The format of dates is 'YYYY-MM-DD'.
Select all movies rented on October 9th, 2018. From the renting table.
Columns of renting table:
renting_id | customer_id | movie_id | rating | date_renting |
---|
- Select all records of movie rentals between beginning of April 2018 till end of August 201
- Put the most recent records of movie rentals on top of the resulting table and order them in decreasing order.
Selecting movies
The table movies
contains all movies available on the online platform.
Columns in table Movies:
movie_id | title | genre | runtime | year_of_release | renting_price |
---|
- Select the movies 'Showtime', 'Love Actually' and 'The Fighter'.
- Order the movies by increasing renting price.
- Select from table
renting
all movie rentals from 2018. - Filter only those records which have a movie rating.
Summarizing customer information
In most business decisions customers are analyzed in groups, such as customers per country or customers per age group.
Customers table:
customer_id | name | country | gender | date_of_birth | date_account_start |
---|
Count the number of total customers born in 1980s
Count( distinct country) where movieNow has customers:
number of countries where movie now has customers.
- Select all movie rentals of the movie with
movie_id
25 from the tablerenting
. - For those records, calculate the minimum, maximum and average rating and count the number of ratings for this movie.
- SELECT min(rating) as min_rating,-- Calculate the minimum rating and use alias min_ratingmax(rating) as max_rating,-- Calculate the maximum rating and use alias max_ratingavg(rating) as avg_rating,-- Calculate the average rating and use alias avg_ratingcount(rating) number_ratings-- Count the number of ratings and use alias number_ratings
-- we are specifying the column because we are not counting
-- the null values
FROM rentingwhere movie_id = 15;-- Select all records of the movie with ID 25
Examining annual rentals
You are asked to provide a report about the development of the company. Specifically, your manager is interested in the total number of movie rentals, the total number of ratings and the average rating of all movies since the beginning of 2019.
- First, select all records of movie rentals since January 1st 2019.
- Now, count the number of movie rentals and calculate the average rating since the beginning of 2019.
count how many ratings exist since 2019-01-01.
Group By Applications:
In order to understand the business we can group by country or gender,
We can investigate the popularity of movies by genre or year of release(this year, all movies were popular
that genre is usually popular...)
Average price of movies by genre. (what is the average price of movies in the so and so genre)
movies_selected table:
title, genre, renting_price
Average renting price per genre
Select genre,
avg(renting_price) as avg_price -- because we want to average the whole pricing column...
Count(*) as number_movies -- will add another column to genre.
-- used the count all to count everything but now the 'group by' enables us to count only the
--number of movies in each genre.
From movies_selected
group by genre
Having Count(*) >= 2;
table columns:
genre, avg_price, number_movies
Wait, what is the difference between having and where?
Having filters by placing the condition on the group by.
Where filters by placing the condition at the row level.
Conduct an analysis to see when the first customer accounts were created for each country.
- Create a table with a row for each country and columns for the country name and the date when the first customer account was created.
- Use the alias
first_account
for the column with the dates. - Order by date in ascending order.
Which movie had the highest income?
First we will select movies title.
select
rm.title,
SUM(rm.renting_price)
from
(Select
r.renting_price,
m.title
from renting as r
left join movies as m
on r.movie_id=m.movie_id) as rm
Group by rm.title
Order By income_movies
Age of actors from the USA
Now you will explore the age of American actors and actresses. Report the date of birth of the oldest and youngest US actor and actress.
- Create a subsequent
SELECT
statements in theFROM
clause to get all information about actors from the USA. - Give the subsequent
SELECT
statement the aliasa
. - Report for actors from the USA the year of birth of the oldest and the year of birth of the youngest actor and actress
Who is the favorite actor for a certain group, in our case by Gender Male?
Select
a.name,
count(*)
from
renting as r
left join customers as c
On r.customer_id = c.customer_id
Left Join actsin as ai
on r.movie_id=ai.movie_id
Left join actors as a
On ai.actor_id=a.actor_id
where c.gender = 'male'
Group by a.name;
Who is the favorite actor with the highest average rating for a certain group, in our case by Gender Male?
(Insert the code later)
Identify favorite movies for a group of customers
Which is the favorite movie on MovieNow? Answer this question for a specific group of customers: for all customers born in the 70s.
- Augment the table
renting
with customer information and information about the movies. - For each join use the first letter of the table name as alias.
- Select only those records of customers born in the 70s.
- For each movie, report the number of times it was rented, as well as the average rating. Limit your results to customers born in the 1970s.
- Remove those movies from the table with only one rental.
- Order the result table such that movies with highest rating come first.
Identify favorite actors for Spain
You're now going to explore actor popularity in Spain. Use as alias the first letter of the table, except for the table actsin
use ai
instead.
- Augment the table renting with information about customers and actors.
- Report the number of movie rentals and the average rating for each actor, separately for male and female customers.
- Report only actors with more than 5 movie rentals.
- Now, report the favorite actors only for customers from Spain.
Hint
- In GROUP BY two column names can be listed.
- Always use where clause after the joins.
SELECT a.name, c.gender,COUNT(*) AS number_views,AVG(r.rating) AS avg_ratingFROM renting as rLEFT JOIN customers AS cON r.customer_id = c.customer_idLEFT JOIN actsin as aiON r.movie_id = ai.movie_idLEFT JOIN actors as aON ai.actor_id = a.actor_idWHERE c.country = 'Spain' -- Select only customers from SpainGROUP BY a.name, c.gender -- we are grouping by two columnsHAVING AVG(r.rating) IS NOT NULLAND COUNT(*) > 5ORDER BY avg_rating DESC, number_views DESC;KPIs per country
In chapter 1 you were asked to provide a report about the development of the company. This time you have to prepare a similar report with KPIs for each country separately. Your manager is interested in the total number of movie rentals, the average rating of all movies and the total revenue for each country since the beginning of 2019.
- Augment the table
renting
with information about customers and movies. - Use as alias the first latter of the table name.
- Select only records about rentals since beginning of 2019.
- Calculate the number of movie rentals.
- Calculate the average rating.
- Calculate the revenue from movie rentals.
- Report these KPIs for each country.
Analyzing customer behavior
A new advertising campaign is going to focus on customers who rented fewer than 5 movies. Use a correlated query to extract all customer information for the customers of interest.
Correlated nested subquery
Customers who gave low ratings
Identify customers who were not satisfied with movies they watched on MovieNow. Report a list of customers with minimum rating smaller than 4.
Customers with at least one rating
Having active customers is a key performance indicator for MovieNow. Make a list of customers who gave at least one rating.
Select all customers with at least one rating. Use the first letter of the table as an alias.
Nationality of actors
In order to analyze the diversity of actors in comedies, first, report a list of actors who play in comedies and then, the number of actors for each nationality playing in comedies.
But this was my solution. There is another solution which I provided.
Union and Intersect in SQL
As a #dataanalyst , you might not need to use UNION as often as you might think, but I find UNION and INTERSECTS to be really cool.
In #postgresql , we want to select names, year of birth and also number of icecreams he purchased.
We want to select only those kids who was born in 1996 and also purchased more than 3 ice-creams.
This is how I would solve using UNION or INTERSECT.
SELECT name,
year_of_birth,
icecream_count
FROM icecreamtable
WHERE icecream_count > 3
INTERSECT
-- we will use intersect because we need name of kid who was bron in
1996 and also he has purchase more than 3 icecreams.
SELECT name,
year_of_birth,
icecream_count
FROM actors
WHERE year_of_birth = 1996;
How would you have solved it if you were in my place? #data #SQL
#postgresql
Rollup
we drop a level of detail at each step.
we can have two aggregations in rollup
Generate a table with the total number of customers, the number of customers for each country, and the number of female and male customers for each country.
You are asked to study the preferences of genres across countries. Are there particular genres which are more popular in specific countries? Evaluate the preferences of customers by averaging their ratings and counting the number of movies rented from each genre. |
---|
this means, it wants specific country wise genre aggregation and also overall rating for |
---|
country only. If you flip the oder in rollup, we will still get the insights, but we will not get the |
---|
total counts and numbers for country-only. | ||||||
---|---|---|---|---|---|---|
-- Group by each county and genre with OLAP extension SELECT c.country, m.genre, AVG(r.rating) AS avg_rating, COUNT(*) AS num_rating FROM renting AS r LEFT JOIN movies AS m ON m.movie_id = r.movie_id LEFT JOIN customers AS c ON r.customer_id = c.customer_id GROUP BY rollup (c.country, m.genre) ORDER BY c.country, m.genre; | ||||||
Grouping Sets | ||||||
Now you will investigate the average rating of customers aggregated by country and gender. SELECT c.country, c.gender, AVG(r.rating) FROM renting AS r LEFT JOIN customers AS c ON r.customer_id = c.customer_id -- Report all info from a Pivot table for country and gender GROUP BY GROUPING SETS ((country, gender), (c.country), (c.gender), ()); | ||||||