Saturday 10 December 2022

Intermediate SQL tutorial for you.

If you need the datasets used in this tutorial please reach out to me through email. Datasets are not free. 

 Count() --count the number of values in a field/column

Count(*) -- number of records in a table. 

Distinct() -- for all the unique values in a field. 

Count (distinct Birthdays) -- count the number of unique 


Select count(birthdate) as count_birthdate
from people; 

-- Count the languages and countries represented in the films table
Select count (language) as count_languages,
count (country) as count_countries
from films; 

-- Return the unique countries from the films table
select DISTINCT country from films; 


## Debugging SQL code
> misspelling
> incorrect Capitalization of data
> incorrect or missing punctuation in functions or data fields
##SQL order of execution


***Very Important for you to understand ***
1. From and Joins : since these two forms the basis of the query
2. Where : Filters out the rows
3. Group By : Grouping values based on the column specified in the Group By clause
4. Having : Filters out the grouped rows
5. Select
6. Distinct : Rows with duplicate values in the column marked as Distinct are discarded
7. Order By : Rows are sorted based on Order By clause
8. Limit, Offset : Finally the limit or offset is applied

Filtering with WHERE:
WHERE color = 'green'
Remember, this will be a single quotation.
Unlike python, this matters.
OperatorDescription
<less than / Before
>greater than / After
<=less than or equal to
>=greater than or equal to
=equal
<> or !=not equal
-- Select film_ids and imdb_score with an imdb_score over 7.0
SELECT film_id,
        imdb_score
FROM reviews
WHERE imdb_score > 7.0;
 
-- Count the records with at least 100,000 votes
Select count (*) as films_over_100K_votes
from reviews 
where num_votes >= 100000;

-- Count the Spanish-language films
Select count(language) as count_spanish
from films
where language = 'Spanish'; 

### Multiple criteria operator
AND. OR . Between . . . .
>>>if your query has multiple criteria operators with multiple
clauses containing Criteria Operators,
don't forget to use double parenthesis.
Example:
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
    AND (certificate = 'PG' OR certificate = 'R');
>>> Between 1994 and 2000, is inclusive of the years.
>>> Between is more powerful and behaves like a normal function,
independent of other functions
>>> SELECT title
FROM films
WHERE release_year
BETWEEN 1994 AND 2000 AND country = 'UK';

-- Select the title and release_year 
for all German-language films released before 2000
and then Update the query to see all 
German-language films released after 2000
SELECT title,
    release_year
FROM films
WHERE release_year < 2000 AND language = 'German';

SELECT title, release_year
FROM films
WHERE release_year < 2000
    AND language = 'German';
-- Select all records for German-language films released after 2000
 and before 2010
Select * 
From films
Where (release_year > 2000 and release_year < 2010)
and (language = 'German');

-- Find the title and year of films from the 1990 or 1999
SELECT title, 
    release_year
from films
WHERE release_year = 1990 OR release_year = 1999;

-- Add a filter to see only English or Spanish-language films
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
AND (language = 'English' OR language = 'Spanish');

SELECT title, release_year, gross
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
    AND (language = 'English' OR language = 'Spanish')
-- Filter films with more than $2,000,000 gross income
    AND gross > 2000000;

Difference in Between and Before-After:
-- Select all records for German-language films released after 2000
 and before 2010
Select * 
From films
Where (release_year > 2000 and release_year < 2010)
and (language = 'German');
-- Select the title and release_year for films released between 1990 and 2000
Select title
    release_year
FROM films
WHERE release_year
BETWEEN 1990 AND 2000;

FILTERING TEXTS:
LIKE, NOT LIKE, IN >>Your three bosses.
% zero, one or many characters
_ means single character

WHERE name LIKE 'Ade%'
WHERE name LIKE 'Ev_'
WHERE name LIKE '_ _t%'

WHERE release_year IN(1920, 1930, 1940)
-- read this as release_year column contains any of 1920, 1930, 1940
-- this is similar to writing 3 OR conditions

-- Select the names that start with K
SELECT name
FROM people
WHERE name LIKE 'K%';

SELECT name
FROM people
-- Select the names that have r as the second letter
WHERE name LIKE '_r%'; 

SELECT name
FROM people
-- Select names that don't start with A
WHERE name NOT LIKE 'A%'; 

-- Find the title and release_year for all films over two hours in length released in 1990 and 2000
Select title, 
release_year
FROM films
WHERE duration > 120
AND release_year IN (1990, 2000); 


-- Find the title and language of all films in English, Spanish, and French
SELECT title,
language
FROM films
WHERE language IN ('English', 'Spanish', 'French'); 
-- did you notice that I have put all the
all the language in single quote?

-- Find the title, certification, and language all films certified NC-17 or R that are in English, Italian, or Greek
SELECT title,
certification,
language
FROM films
WHERE certification IN('NC-17', 'R')
AND language IN('English', 'Italian', 'Greek');

SELECT COUNT(DISTINCT title) AS nineties_english_films_for_teens
FROM films
-- Filter to release_years to between 1990 and 1999
WHERE release_year BETWEEN 1990 AND 1999
-- Filter to English-language films
    AND language = 'English'
-- Narrow it down to G, PG, and PG-13 certifications
    AND certification IN('G', 'PG', 'PG-13');

NULL values will make you go CRAZY
if you are new:
IS NULL / IS NOT NULL

-- List all film titles with missing budgets
SELECT title as no_budget_info
FROM films
WHERE budget IS NULL; 

SELECT COUNT(*) AS count_language_known
FROM films
WHERE language IS NOT NULL; 

COUNT() VS IS NOT NULL:

Select count(rookies)
from humans
where rookies IS NOT NULL;


















































































































Count (distinct Birthdays) -- count the number of un

Tuesday 6 December 2022

VBA Codes written for QC Automation for Excel

 

Sub AppendHyphen()
Dim c As Range
For Each c In ActiveSheet.UsedRange
If IsNumeric(c.Value) = True Then c.Value = "'" & c.Value
Next
End Sub


Sub Range_Example4()

Range("A1").End(xlDown).Select

End Sub

Sub Range_Example6()

ActiveSheet.UsedRange.Select

End Sub


Saturday 3 December 2022

Loan Amortization Model DAX Project

In this tutorial, I am listing all the calculations and DAX I wrote for the loan interest rate analysis for the US government. 


Loan Period = GENERATESERIES(1, 360, 1)

We are getting another column in the SAME table to convernt the months into current number of the years.

Years into loan = ROUNDUP('Loan Period'[Loan Period] / 12, 0)

1.01 Loan Amount (PV) = 100000
1.02 Annual interest rate = 0.04

Formatting Dates:
1.03 Date measure = date(2022, 01, 15)
adding a table using date formatting function.
You can also use calendarauto() function.

Dates = CALENDAR(DATE(1991, 01, 01), DATE(2052, 12, 31))

1.04 Average Interest Rate = AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate])

Finding average for a particular date using Calculate function:

1.06 Calculate example =
CALCULATE( AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), 'Mortgage interest rates'[Date] = Date(2022,8,4))

1.06a Calculate with OR condition = CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), 'Mortgage interest rates'[Date] = Date(2022,8,4) || 'Mortgage interest rates'[Date] = DATE(2022,8,11))
// this is simply using OR condition, an exapmle of condition in DAX CALCULATE function

1.06b Calculate with AND condition = CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), 'Mortgage interest rates'[Date] = Date(2022,8,4) && 'Mortgage interest rates'[Date] = DATE(2022,8,11))
// this is simply using And condition, an exapmle of condition in DAX CALCULATE function,
tihs will return nothing because both conditions cannot be true

1.06c Calculate with OR condition = CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), Year('Mortgage interest rates'[Date]) = 2021, MONTH('Mortgage interest rates'[Date]) = 6)
// This will simple calculate based on the year and month.

1.07 firstdate = CALCULATE(FIRSTDATE('Mortgage interest rates'[Date]))
// it will return the first date of the column

calculating firstdate average rate = CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), FIRSTDATE('Mortgage interest rates'[Date]))
// it will return the average of interest rate for the firstdate only.

1.08 Lastdate = CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), LASTDATE('Mortgage interest rates'[Date]))
// it will return the average of interest rate for the lastdate of the column.

1.09 Parallelperiod = CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), PARALLELPERIOD('Mortgage interest rates'[Date], 0, MONTH))

// this will give you an average for the same month based on the values of the month. Averages based
//on the month

1.09 Parallelperiod 3 months earlier= CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), PARALLELPERIOD('Mortgage interest rates'[Date], 0, MONTH))

// this will give you an average for the three months earlier
data... remember the SQL lag function? ovevr here the lag is for
first three months average. We do the average and push down for the first
three months row based on the date column.

Calculating change in crop production volume:

Total crop production for 2015 = CALCULATE(sum(Crops[Volume]), Year(Crops[Year]) = 2015)

Total crop production for 2021 = CALCULATE(sum(Crops[Volume]), Year(Crops[Year]) = 2021)
Change in crop production 2015 - 2021 = [Total crop production for 2021] - [Total crop production for 2015]

1.11 Total periods in loan = CALCULATE(MAX('Loan Period'[Loan Period]), ALL('Loan Period'[Loan Period]))
// this will show the total number of periods in loan in every row and it will not change based on the selection.
// what happens in calculate function is that, filter is applied first, therefore
// we are applying the all filter to take the full table or column into the consideration and take the maximum based on that.
1.12 Period remaining in loan = 'Model 1'[1.11 Total periods in loan] - 'Model 1'[1.05 Period in loan]

1.11a Total selected periods in loan = CALCULATE(MAX('Loan Period'[Loan Period]), ALLSELECTED('Loan Period'[Loan Period]))
// this will show the total number of periods in loan in every row and it will not change based on the selection.
// what happens in calculate function is that, filter is applied first, therefore
// we are applying the ALLSELECTED filter to take the selected table or selected column into the consideration and take the maximum based on that.
// but where are we selecting this? We are selecting through another Slicer in the page.
// in the Slicer, we have first value of the calculate function.

1.13 Monthly interest rate = POWER(1 + [1.02 Annual interest rate], 1 / 12) - 1

1.13 Monthly interest rate = POWER(1 + [1.02 Annual interest rate], 1 / 12) - 1

// monthly interest rate is calculated based on annual interest rate. Power function takes the baes and the exponent to execute the calculation.

1.14 Accrued Value = [1.01 Loan Amount (PV)] * POWER(1 + 'Model 1'[1.13 Monthly interest rate], [1.05 Period in loan] - 1)

// accrued value is the gathered value of the present loan amount after accumulating each month's / period's interest in real amount or usd.

1.15 Payment Amount = DIVIDE([1.13 Monthly interest rate] * [1.01 Loan Amount (PV)], 1 - POWER(1 + [1.13 Monthly interest rate], - [1.11 Total periods in loan]), 0)
// note that for exponent, we have to add a minus sign before speficying the exponent if we have minus in the formula for the power value.
// 0 as the third clause for the devide function, it is the value to indicate undefined.
// undefined is returned when you devide something by zero.

An alternative formula also works just fine:

1.15 Payment Amount sihans try = [1.01 Loan Amount (PV)] * DIVIDE([1.13 Monthly interest rate], 1 - POWER(1 + [1.13 Monthly interest rate], - [1.11 Total periods in loan]), 0)
// note that for exponent, we have to add a minus sign before speficying the exponent if we have minus in the formula for the power value.
// 0 as the third clause for the devide function, it is the value to indicate undefined.
// undefined is returned when you devide something by zero.
// and my approach is universally best


1.16 Accumuated Payments = [1.15 Payment Amount] * DIVIDE(POWER(1 + [1.13 Monthly interest rate], [1.05 Period in loan] -1 ) -1, [1.13 Monthly interest rate], 0)
// period in loan is actually equivalent to the current period and we have to subtract 1 to get the previous period
// similar to any Divide function, 0 clause returns 0 for undefined values.
// undefined comes when denominator is 0

1.17 Running Payment Total = [1.05 Period in loan] * [1.15 Payment Amount]
// does not consider the incremental time value quantified by interest rate.

1.18 Loan Balance =
Var interest_factor = POWER(1 + [1.13 Monthly interest rate], [1.05 Period in loan] - 1)
Return [1.01 Loan Amount (PV)] * interest_factor - [1.15 Payment Amount] * DIVIDE(interest_factor -1, [1.13 Monthly interest rate], 0)


// all we did was we replaced the the codes for the variable interest rate inside the return row.
// Loan Balance = Accrued Value - Accumulated Value
another way for making it shorter :


1.18 Loan Balance =
Var interest_factor = POWER(1 + [1.13 Monthly interest rate], [1.05 Period in loan] - 1)
Var accrued_value = [1.01 Loan Amount (PV)] * interest_factor
Var accumulated_payment = [1.15 Payment Amount] * DIVIDE(interest_factor -1, [1.13 Monthly interest rate], 0)
Return accrued_value - accumulated_payment


// all we did was we replaced the the codes for the variable interest rate inside the return row.
// Loan Balance = Accrued Value - Accumulated Value

Decrease is not equal to the Payment Amount AKA installment we are paying. Because, interest is
recalculated on the remaining loan balance.

This is how the calculation goes

Future loan Balance = Current remaining balance + (current remaining balance * interest rate) - Payment Amount

1.19 Interest Paid for loan  peroid = [1.13 Monthly interest rate] * [1.18 Loan Balance]
// this is exactly like my formula above to extract the interest paid in the month.
// current remaining balance * interest rate

1.20 Principle paid for each loan period = [1.15 Payment Amount] - [1.19 Interest Paid for loan  peroid]


You will see at the end of the period of loan that the loan balance is equal to the principle paid. Because
we already paid all our interests towards the beginning of the loan payment.

Total Cost = SUMX( Data, Data[Units] * Data[Unit Price] * Data[Exchange Rate])


1.22 total principal paid = SUMX(All('Loan Period'[Loan Period]), [1.20 Principal paid for each loan period] )

// instead of giving it the full table, we are giving it the full column to load. Column here acting as a table.
// we are using a measure inside the second expression

1.23 Total interest paid = SUMX(ALL('Loan Period'[Loan Period]), [1.19 Interest paid for loan period])

// We did not do any aggregation in the <expression> of SUMX here. Because we want to return the final SUM value.
// any column in the expression of SUMX or any other aggregation will give you SUM of that particular column
1.24 Principal Paid (running total) = SUMX(FILTER(ALL('Loan Period'[Loan Period]), [1.05 Period in loan] <= MAX('Loan Period'[Loan Period])), [1.20 Principal paid for each loan period])

// this is very similar to this sql code
//SELECT * ,(
// SELECT SUM(T2.[SALARY])  
//  FROM [department] AS T2
//        WHERE T2.[ID] <= T1.[ID]
// ) AS [Running Total]
// FROM [department] AS T1

1.25 Interest Paid (running total) = SUMX(FILTER(ALL('Loan Period'[Loan Period]), [1.05 Period in loan] <= MAX('Loan Period'[Loan Period])), [1.19 Interest paid for loan period])

// this is very similar to this sql code
// we simply changed to the interest paid in a loan period
//SELECT * ,(
// SELECT SUM(T2.[SALARY])  
//  FROM [department] AS T2
//        WHERE T2.[ID] <= T1.[ID]
// ) AS [Running Total]
// FROM [department] AS T1
// in the filter we will take all the periods below or equal to current period and sum them all up

1.26 Principal compared to the loan balance = [1.18 Loan Balance] + [1.24 Principal Paid (running total)] - [1.20 Principal paid for each loan period]

rate of change = POWER(DIVIDE([Total crop production in 2021], [Total crop production in 2015], BLANK()), 1/6) - 1

2.01 Selected date = SELECTEDVALUE('Mortgage interest rates'[Date])
2.02 Loan Duration = If( SELECTEDVALUE('Select Loan Duration'[Duration Length]) = "15 years", 15*12, 30*12)
2.03 Interest rate = IF(SELECTEDVALUE('Select Loan Duration'[Duration Length]) = "15 years", CALCULATE(MAX('Mortgage interest rates'[15-year fixed mortgage rate])), CALCULATE(MAX('Mortgage interest rates'[30-year fixed mortgage rate])))

2.05 Forecast date = EOMONTH([2.01 Selected date],0) + 1
// this will take you to the end of that selected month. plus one day forward.
2.06 Loan period =
VAR period = DATEDIFF([2.05 Forecast date], MAX(Dates[Date]), MONTH) + 1
Return IF( period > 0 && period <= [2.05 Forecast date], period, BLANK())
2.07 Payment Amount =
Var payment = DIVIDE([2.04 Monthly interest rate] * [1.01 Loan Amount (PV)], 1 - POWER(1 + [2.04 Monthly interest rate], -[2.02 Loan duration]), 0)
Return IF(ISBLANK([2.06 Loan period]), BLANK(), payment)

2.10 Interest paid = [2.04 Monthly interest rate] * [2.08 Loan Balance]

2.11 Average interest for the current month =
VAR start_of_month = MAX(Dates[Date])
VAR interest_rate_15 = CALCULATE(AVERAGE('Mortgage interest rates'[15-year fixed mortgage rate]), FILTER(All('Mortgage interest rates'), year('Mortgage interest rates'[Date]) = YEAR(start_of_month) && MONTH('Mortgage interest rates'[Date]) = MONTH(start_of_month)))
VAR interest_rate_30 = CALCULATE(AVERAGE('Mortgage interest rates'[30-year fixed mortgage rate]), FILTER(All('Mortgage interest rates'), year('Mortgage interest rates'[Date]) = YEAR(start_of_month) && MONTH('Mortgage interest rates'[Date]) = MONTH(start_of_month)))
VAR selected_rate = IF(SELECTEDVALUE('Select Loan Duration'[Duration Length]) = "15 years", interest_rate_15, interest_rate_30)
return if(ISBLANK([2.06 Loan period]), BLANK(), selected_rate)


Creating Columns in a table using row
Dates version 2 = ROW("Dates", DATE(1991, 1, 1))

Using Generate and Row function to create formula
Dates version 2 = GENERATE(GENERATESERIES(1,800), ROW("Dates", DATE(1991, [Value], 1)))

Week Over Week = 
VAR todaycost =CALCULATE(SUM(UsageDetails[Cost]),DATEADD('Date'[Date],0,DAY))
   
VAR LastWeek =
    CALCULATE (
        SUM ( UsageDetails[Cost] ),
        DATEADD(  'Date'[Date],-7, DAY)
    )
RETURN
   todaycost-LastWeek

Period into production =
VAR period = DATEDIFF(DATE(2015, 1, 1), LASTDATE(Crops[Year]), YEAR)
RETURN IF(period < 0, BLANK(), period)


Rate of change = POWER(DIVIDE([Total crop production in 2021], [Total crop production in 2015], BLANK()), 1/6) - 1

Rate of change for pumpkins = CALCULATE([Rate of change], Crops[Commodity] = "Pumpkins")
Total crop production for pumpkins = IF(YEAR(MAX(Crops[Year])) >= 2015 && YEAR(MAX(Crops[Year])) <= 2021, CALCULATE(SUM(Crops[Volume]), Crops[Commodity] = "Pumpkins"), BLANK())

Projected volume for pumpkins = [Total crop production in 2015 for pumpkins] * POWER(1 + [Rate of change for pumpkins], [Period into production])