9 SQL
This BP chapter is a bit different from many of the DataCamp-based chapters you’ve done. For Econ 380, you need to understand R programming well, but you likely won’t need to use SQL at all. I’m having you learn about basic SQL because having a basic understanding of SQL might be useful for you in the future (e.g., to get an internship/job and/or to do well there). The purpose of this BP chapter is for you to become familiar with a few fundamentals of SQL. You will not know enough to develop a database for a large corporation, but you should know enough to be able to get data from a SQL database (e.g., an internship that has something like “familiarity with SQL queries” should apply to you).
There are several ways for you to achieve this goal. One is to go through part or all of the following DataCamp courses, along with the main things you should learn in each (note each title is a link to the course)):
- Introduction to SQL
- What SQL is, and knowledge of the existence of different SQL flavors (e.g., SQL Server and PostgreSQL)
- Structure of relational databases (e.g., tables, fields)
SELECT
statement
- Intermediate SQL
DISTINCT
WHERE
AND
,OR
LIKE
,NOT LIKE
, andIN
- NULL values
- Aggregate Functions (
AVG()
,SUM()
,MIN()
,MAX()
,COUNT()
) ORDER BY
(sorting)GROUP BY
ROUND
HAVING
(thereWHERE
that goes withGROUP BY
)
- Joining Data in SQL
INNER JOIN
LEFT JOIN
(orRIGHT JOIN
)
Note that these 3 courses are the first 3 courses in the DataCamp “SQL Fundamentals” skill track. If you are interested in learning more about SQL, this skill track has additional courses, as well as DataCcamp “projects” to practice your skills.
The other way to learn about these aspects of SQL is you can read about them in this tutorial: https://www.sqlitetutorial.net/sqlite-sample-database. I’m giving you this particular website/tutorial because it comes with a useful sample database that also works in R (as shown below). I want you to learn about all the things included in the outline above. You can learn about them through any combination of DataCamp and this online tutorial. For example, to learn about ORDER BY
, you could go through that part of the Intermediate SQL DC course, and/or you could read through (and experiment with) the order by page in the online tutorial. You are also free to use other websites, videos, AI, etc. I don’t care how you learn about SQL, I just want you to be familiar with it.
The website explains how to do some things if you install SQLite on your computer. You do not need to do that. You can (and should) do everything in R via the RSQLite
package, as explained below.
Note that the specifics on this website are for the SQLite version of SQL. DataCamp seems to be specific to PostgreSQL. But you’re just learning about the basics, and the basics work the same in all the different versions/flavors. There are a few exceptions. For example, you might want to use the keyword LIMIT
to only output the top, say, 10 rows. This is done using LIMIT
in SQLite, PostgreSQL, and MySQL. If instead you use Microsoft SQL Server or Access (not unlikely at an internship or job), the keyword is TOP
instead of LIMIT
. That’s the only one I can think of that you might want to use now. Other than that, they are the same. Thus, any website discussing the basics should be the same if you want to look beyond the DataCamp courses and website I’ve given you.
To demonstrate what you have learned, and to serve as a reference for you later, you are going to take notes here on the key features of SQL with which I want you to be familiar. You should use the RSQLite
package with the chinook.db
database. I demonstrate below how to do so. This means that you should have working examples of everything in SQL, just like you do for R. I provide you a working example of SELECT
below.
I will add section headings to serve as a structure for your notes, and to serve as a checklist so you include everything I want you to include. As with all BP chapters, you are free to add more if you want.
9.1 Loading a SQL database in R via RSQLite
Packages used in this chapter:
Load the connection (that we have named con
) to the chinook database (the one from the website tutorial):
As the online documentation for the database shows, this database has the following tables:
## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "ranks" "sqlite_sequence"
## [13] "sqlite_stat1" "tracks"
You can also see the list of fields in a particular table like this:
## [1] "CustomerId" "FirstName" "LastName" "Company" "Address"
## [6] "City" "State" "Country" "PostalCode" "Phone"
## [11] "Fax" "Email" "SupportRepId"
You can then create working examples using SQL code chunks, like this:
FirstName | LastName |
---|---|
Luís | Gonçalves |
Leonie | Köhler |
François | Tremblay |
Bjørn | Hansen |
František | Wichterlová |
Helena | Holý |
Astrid | Gruber |
Daan | Peeters |
Kara | Nielsen |
Eduardo | Martins |
The rest of this file should have your notes, including code chunks with working examples similar to the one above (the code chunk with a working example showing how to use SELECT
). If you want the notes to go in a different order, feel free to rearange the headings. But make sure to include an example of everything with a level 3 (###) or 4 (####) heading below.
9.2 Selecting data
9.2.1 Select
Your notes on what select does…
Select email, phone FROM customers;
--use * to select all feilds
-- comments in SQL code chunks start with --
Phone | |
---|---|
luisg@embraer.com.br | +55 (12) 3923-5555 |
leonekohler@surfeu.de | +49 0711 2842222 |
ftremblay@gmail.com | +1 (514) 721-4711 |
bjorn.hansen@yahoo.no | +47 22 44 22 22 |
frantisekw@jetbrains.com | +420 2 4172 5555 |
hholy@gmail.com | +420 2 4177 0449 |
astrid.gruber@apple.at | +43 01 5134505 |
daan_peeters@apple.be | +32 02 219 03 03 |
kara.nielsen@jubii.dk | +453 3331 9991 |
eduardo@woodstock.com.br | +55 (11) 3033-5446 |
9.2.2 Distinct
Distinct allows us to get all values that occcur, its used with a Select, and it returns all unique values.
PostalCode | FirstName |
---|---|
12227-000 | Luís |
70174 | Leonie |
H2G 1A7 | François |
0171 | Bjørn |
14700 | František |
14300 | Helena |
1010 | Astrid |
1000 | Daan |
1720 | Kara |
01007-010 | Eduardo |
9.2.3 Where
where acts as filter()
would in R. use <> for not equal, and you only need to use a single = sign, unlike R. This comes after the select and from, before limit.
CustomerId |
---|
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
When filtering for a character value, put it in single quotes (’name)
FirstName |
---|
Luis |
9.2.3.1 And, Or
And and Or unction largely the same as they do in R, allowing us to filter for multiple conditions, 2017 and in the Midwest, before 1800 or after 1900. In SQL they are just AND
and OR
. These will often be used with WHERE
. We need to specify the field directly with each condition. In a query has multiple long conditions, we need to use parentheses to ensure the proper order is used.
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
11 | Alexandre | Rocha | Banco do Brasil S.A. | Av. Paulista, 2022 | São Paulo | SP | Brazil | 01310-200 | +55 (11) 3055-3278 | +55 (11) 3055-8131 | alero@uol.com.br | 5 |
12 | Roberto | Almeida | Riotur | Praça Pio X, 119 | Rio de Janeiro | RJ | Brazil | 20040-020 | +55 (21) 2271-7000 | +55 (21) 2271-7070 | roberto.almeida@riotur.gov.br | 3 |
13 | Fernanda | Ramos | NA | Qe 7 Bloco G | Brasília | DF | Brazil | 71020-677 | +55 (61) 3363-5547 | +55 (61) 3363-7855 | fernadaramos4@uol.com.br | 4 |
14 | Mark | Philips | Telus | 8210 111 ST NW | Edmonton | AB | Canada | T6G 2C7 | +1 (780) 434-4554 | +1 (780) 434-5565 | mphilips12@shaw.ca | 5 |
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | Eduardo | Martins | Woodstock Discos | Rua Dr. Falcão Filho, 155 | São Paulo | SP | Brazil | 01007-010 | +55 (11) 3033-5446 | +55 (11) 3033-4564 | eduardo@woodstock.com.br | 4 |
14 | Mark | Philips | Telus | 8210 111 ST NW | Edmonton | AB | Canada | T6G 2C7 | +1 (780) 434-4554 | +1 (780) 434-5565 | mphilips12@shaw.ca | 5 |
9.2.3.2 Like, Not Like, and IN
When combined with a where Like, Not like and In allow us to filter for a pattern, such as names that start with an M, or citys that don’t have a c as a second Letter.
-- Use of LIKE
Select *
FROM customers
WHERE CustomerId LIKE "_9";
-- The Underscore makes space for 1 didgt, a % makes space for multiple
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
19 | Tim | Goyer | Apple Inc. | 1 Infinite Loop | Cupertino | CA | USA | 95014 | +1 (408) 996-1010 | +1 (408) 996-1011 | tgoyer@apple.com | 3 |
29 | Robert | Brown | NA | 796 Dundas Street West | Toronto | ON | Canada | M6J 1V1 | +1 (416) 363-8888 | NA | robbrown@shaw.ca | 3 |
39 | Camille | Bernard | NA | 4, Rue Milton | Paris | NA | France | 75009 | +33 01 49 70 65 65 | NA | camille.bernard@yahoo.fr | 4 |
49 | Stanisław | Wójcik | NA | Ordynacka 10 | Warsaw | NA | Poland | 00-358 | +48 22 828 37 39 | NA | stanisław.wójcik@wp.pl | 4 |
59 | Puja | Srivastava | NA | 3,Raj Bhavan Road | Bangalore | NA | India | 560001 | +91 080 22289999 | NA | puja_srivastava@yahoo.in | 3 |
9.2.4 Is Null and Is Not Null
using is null with a where gives all the entries with a null value in that entry. Is not null does the oppisite
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Leonie | Köhler | NA | Theodor-Heuss-Straße 34 | Stuttgart | NA | Germany | 70174 | +49 0711 2842222 | NA | leonekohler@surfeu.de | 5 |
3 | François | Tremblay | NA | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | NA | ftremblay@gmail.com | 3 |
4 | Bjørn | Hansen | NA | Ullevålsveien 14 | Oslo | NA | Norway | 0171 | +47 22 44 22 22 | NA | bjorn.hansen@yahoo.no | 4 |
6 | Helena | Holý | NA | Rilská 3174/6 | Prague | NA | Czech Republic | 14300 | +420 2 4177 0449 | NA | hholy@gmail.com | 5 |
7 | Astrid | Gruber | NA | Rotenturmstraße 4, 1010 Innere Stadt | Vienne | NA | Austria | 1010 | +43 01 5134505 | NA | astrid.gruber@apple.at | 5 |
8 | Daan | Peeters | NA | Grétrystraat 63 | Brussels | NA | Belgium | 1000 | +32 02 219 03 03 | NA | daan_peeters@apple.be | 4 |
9 | Kara | Nielsen | NA | Sønder Boulevard 51 | Copenhagen | NA | Denmark | 1720 | +453 3331 9991 | NA | kara.nielsen@jubii.dk | 4 |
13 | Fernanda | Ramos | NA | Qe 7 Bloco G | Brasília | DF | Brazil | 71020-677 | +55 (61) 3363-5547 | +55 (61) 3363-7855 | fernadaramos4@uol.com.br | 4 |
18 | Michelle | Brooks | NA | 627 Broadway | New York | NY | USA | 10012-2612 | +1 (212) 221-3546 | +1 (212) 221-4679 | michelleb@aol.com | 3 |
20 | Dan | Miller | NA | 541 Del Medio Avenue | Mountain View | CA | USA | 94040-111 | +1 (650) 644-3358 | NA | dmiller@comcast.com | 4 |
9.2.5 Order By (Sorting)
The ORDER BY keyword is used to sort results of one or more fields. When used on its own, it is written after the FROM statement, We could also add the ASC keyword to our query to clarify that we are sorting in ascending order. We can use the DESC keyword to sort the results in descending order. ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort by the next, etc. To specify multiple fields, we separate the field names with a comma. The second field we sort by can be thought of as a tie-breaker when the first field is not decisive in telling the order.
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
32 | Aaron | Mitchell | NA | 696 Osborne Street | Winnipeg | MB | Canada | R3L 2B9 | +1 (204) 452-6452 | NA | aaronmitchell@yahoo.ca | 4 |
11 | Alexandre | Rocha | Banco do Brasil S.A. | Av. Paulista, 2022 | São Paulo | SP | Brazil | 01310-200 | +55 (11) 3055-3278 | +55 (11) 3055-8131 | alero@uol.com.br | 5 |
7 | Astrid | Gruber | NA | Rotenturmstraße 4, 1010 Innere Stadt | Vienne | NA | Austria | 1010 | +43 01 5134505 | NA | astrid.gruber@apple.at | 5 |
4 | Bjørn | Hansen | NA | Ullevålsveien 14 | Oslo | NA | Norway | 0171 | +47 22 44 22 22 | NA | bjorn.hansen@yahoo.no | 4 |
39 | Camille | Bernard | NA | 4, Rue Milton | Paris | NA | France | 75009 | +33 01 49 70 65 65 | NA | camille.bernard@yahoo.fr | 4 |
8 | Daan | Peeters | NA | Grétrystraat 63 | Brussels | NA | Belgium | 1000 | +32 02 219 03 03 | NA | daan_peeters@apple.be | 4 |
20 | Dan | Miller | NA | 541 Del Medio Avenue | Mountain View | CA | USA | 94040-111 | +1 (650) 644-3358 | NA | dmiller@comcast.com | 4 |
56 | Diego | Gutiérrez | NA | 307 Macacha Güemes | Buenos Aires | NA | Argentina | 1106 | +54 (0)11 4311 4333 | NA | diego.gutierrez@yahoo.ar | 4 |
40 | Dominique | Lefebvre | NA | 8, Rue Hanovre | Paris | NA | France | 75002 | +33 01 47 42 71 71 | NA | dominiquelefebvre@gmail.com | 4 |
10 | Eduardo | Martins | Woodstock Discos | Rua Dr. Falcão Filho, 155 | São Paulo | SP | Brazil | 01007-010 | +55 (11) 3033-5446 | +55 (11) 3033-4564 | eduardo@woodstock.com.br | 4 |
same thing, but ordered from Z-A
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
42 | Wyatt | Girard | NA | 9, Place Louis Barthou | Bordeaux | NA | France | 33000 | +33 05 56 96 96 96 | NA | wyatt.girard@yahoo.fr | 3 |
25 | Victor | Stevens | NA | 319 N. Frances Street | Madison | WI | USA | 53703 | +1 (608) 257-0597 | NA | vstevens@yahoo.com | 5 |
19 | Tim | Goyer | Apple Inc. | 1 Infinite Loop | Cupertino | CA | USA | 95014 | +1 (408) 996-1010 | +1 (408) 996-1011 | tgoyer@apple.com | 3 |
44 | Terhi | Hämäläinen | NA | Porthaninkatu 9 | Helsinki | NA | Finland | 00530 | +358 09 870 2000 | NA | terhi.hamalainen@apple.fi | 3 |
54 | Steve | Murray | NA | 110 Raeburn Pl | Edinburgh | NA | United Kingdom | EH4 1HH | +44 0131 315 3300 | NA | steve.murray@yahoo.uk | 5 |
49 | Stanisław | Wójcik | NA | Ordynacka 10 | Warsaw | NA | Poland | 00-358 | +48 22 828 37 39 | NA | stanisław.wójcik@wp.pl | 4 |
12 | Roberto | Almeida | Riotur | Praça Pio X, 119 | Rio de Janeiro | RJ | Brazil | 20040-020 | +55 (21) 2271-7000 | +55 (21) 2271-7070 | roberto.almeida@riotur.gov.br | 3 |
29 | Robert | Brown | NA | 796 Dundas Street West | Toronto | ON | Canada | M6J 1V1 | +1 (416) 363-8888 | NA | robbrown@shaw.ca | 3 |
26 | Richard | Cunningham | NA | 2211 W Berry Street | Fort Worth | TX | USA | 76110 | +1 (817) 924-7272 | NA | ricunningham@hotmail.com | 4 |
59 | Puja | Srivastava | NA | 3,Raj Bhavan Road | Bangalore | NA | India | 560001 | +91 080 22289999 | NA | puja_srivastava@yahoo.in | 3 |
Ordering 2 variables
-- Order BY customer last name from Z-A, if same last name, use first name
Select *
FROM customers
Order By LastName DESC, FirstName DESC;
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
37 | Fynn | Zimmermann | NA | Berger Straße 10 | Frankfurt | NA | Germany | 60316 | +49 069 40598889 | NA | fzimmermann@yahoo.de | 3 |
49 | Stanisław | Wójcik | NA | Ordynacka 10 | Warsaw | NA | Poland | 00-358 | +48 22 828 37 39 | NA | stanisław.wójcik@wp.pl | 4 |
5 | František | Wichterlová | JetBrains s.r.o. | Klanova 9/506 | Prague | NA | Czech Republic | 14700 | +420 2 4172 5555 | +420 2 4172 5555 | frantisekw@jetbrains.com | 4 |
48 | Johannes | Van der Berg | NA | Lijnbaansgracht 120bg | Amsterdam | VV | Netherlands | 1016 | +31 020 6223130 | NA | johavanderberg@yahoo.nl | 5 |
3 | François | Tremblay | NA | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | NA | ftremblay@gmail.com | 3 |
55 | Mark | Taylor | NA | 421 Bourke Street | Sidney | NSW | Australia | 2010 | +61 (02) 9332 3633 | NA | mark.taylor@yahoo.au | 4 |
33 | Ellie | Sullivan | NA | 5112 48 Street | Yellowknife | NT | Canada | X1A 1N6 | +1 (867) 920-2233 | NA | ellie.sullivan@shaw.ca | 3 |
25 | Victor | Stevens | NA | 319 N. Frances Street | Madison | WI | USA | 53703 | +1 (608) 257-0597 | NA | vstevens@yahoo.com | 5 |
59 | Puja | Srivastava | NA | 3,Raj Bhavan Road | Bangalore | NA | India | 560001 | +91 080 22289999 | NA | puja_srivastava@yahoo.in | 3 |
17 | Jack | Smith | Microsoft Corporation | 1 Microsoft Way | Redmond | WA | USA | 98052-8300 | +1 (425) 882-8080 | +1 (425) 882-8081 | jacksmith@microsoft.com | 5 |
9.3 Grouping and aggregating
9.3.1 Group By
We’ll often need to summarize data for a particular group of results. SQL allows us to group with the GROUP BY clause. SQL will return an error if we try to SELECT a field that is not in our GROUP BY clause. GROUP BY fits into our order after FROM and before all other clauses. T0 group by 2 2 variables, use a comma to seperate
-- Create a summary table grouped by country with there average customer ID
Select Country, Avg(customerId)
FROM customers
GROUP BY Country;
Country | Avg(customerId) |
---|---|
Argentina | 56.000 |
Australia | 55.000 |
Austria | 7.000 |
Belgium | 8.000 |
Brazil | 9.400 |
Canada | 23.375 |
Chile | 57.000 |
Czech Republic | 5.500 |
Denmark | 9.000 |
Finland | 44.000 |
9.3.2 Aggregate Functions
9.3.2.1 Avg()
Avg can only be used with numeric variables. provides an average of all values in the selected column.
-- Use of Avg to find the average of the customer Id's
Select AVG(CustomerId) AS avg_customerId
FROM customers;
avg_customerId |
---|
30 |
9.3.2.2 SUM()
Can only be used with numerical data. Provides a sum of all the values in the selected column.
-- Use of Sum to find the sum of the customer Id's
Select SUM(CustomerId) AS sum_customerId
FROM customers;
sum_customerId |
---|
1770 |
9.3.2.3 MIN() and MAX()
minimum and maximum will give the record that is figuratively the lowest or highest. Lowest can mean the letter A when dealing with strings or the earliest date when dealing with dates. And, of course, with numbers, it is the highest or the lowest number.
-- Use of min to find the lowest customer Id
Select MIN(CustomerId) AS min_customerId
FROM customers;
min_customerId |
---|
1 |
-- Use of Max to find the highest customer Id
Select MAX(CustomerId) AS max_customerId
FROM customers;
max_customerId |
---|
59 |
9.3.2.4 Count()
Using count with a field name count the number of values in a field, count with an * counts the records in a table
-- Count the records of First names and city
SELECT COUNT(FirstName) AS count_Name, Count(city) AS count_city
From customers;
count_Name | count_city |
---|---|
59 | 59 |
same thing but with distinct
-- Count the records of distinct First names and city
SELECT COUNT( Distinct FirstName) AS count_distinct_Name, Count(Distinct city) AS count_distinct_city
From customers;
count_distinct_Name | count_distinct_city |
---|---|
57 | 53 |
9.3.3 Where with Aggregate Functions
We can easily use where with aggregate functions
-- Use find the average customer ID for people whose name starts with M
Select avg(CustomerId) AS avg_customerId_M_names
FROM customers
Where FirstName Like 'M%';
avg_customerId_M_names |
---|
36 |
9.3.4 Round
the round function can be used with our aggregrate functions to round out answers
-- Use of round the result to the tenth decimal place (there will be no decimal, done for demo)
Select ROUND(MAX(customerId), 1) AS max_customerId
FROM customers
max_customerId |
---|
59 |
9.3.5 Having (where for groups)
In SQL, we can’t filter aggregate functions with WHERE clauses. That means that if we want to filter based on the result of an aggregate function, we need another way. Groups have their own special filtering word: HAVING. WHERE filters individual records while HAVING filters grouped records.
-- Create a summary table grouped by country with there average customer ID, and filter for average ID's greater than 10
Select Country, Avg(customerId)
FROM customers
GROUP BY Country
HAVING Avg(customerId) > 10;
Country | Avg(customerId) |
---|---|
Argentina | 56.000 |
Australia | 55.000 |
Canada | 23.375 |
Chile | 57.000 |
Finland | 44.000 |
France | 41.000 |
Germany | 28.250 |
Hungary | 45.000 |
India | 58.500 |
Ireland | 46.000 |
9.4 Joining tables
In relational databases, data is often distributed in many related tables. A table is associated with another table using foreign keys.
9.4.1 Inner join
To query data from multiple tables, you use INNER JOIN clause. The INNER JOIN clause combines columns from correlated tables. The INNER JOIN clause returns rows from the A table that has the corresponding row in B table. This logic is applied if you join more than 2 tables.
-- Join the "tracks" and "albums" tables
SELECT *
FROM tracks
INNER JOIN albums ON albums.albumid = tracks.albumid;
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | AlbumId | Title | ArtistId |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
7 | Let’s Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
10 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
11 | C.O.D. | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 199836 | 6566314 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
12 | Breaking The Rules | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263288 | 8596840 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
13 | Night Of The Long Knives | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205688 | 6706347 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
14 | Spellbound | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 270863 | 8817038 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
Here is an example with 3 tables:
-- Join the 3 tables
SELECT trackid, tracks.name AS track, albums.title AS album, artists.name AS artist
FROM tracks
INNER JOIN albums ON albums.albumid = tracks.albumid
INNER JOIN artists ON artists.artistid = albums.artistid;
TrackId | track | album | artist |
---|---|---|---|
1 | For Those About To Rock (We Salute You) | For Those About To Rock We Salute You | AC/DC |
6 | Put The Finger On You | For Those About To Rock We Salute You | AC/DC |
7 | Let’s Get It Up | For Those About To Rock We Salute You | AC/DC |
8 | Inject The Venom | For Those About To Rock We Salute You | AC/DC |
9 | Snowballed | For Those About To Rock We Salute You | AC/DC |
10 | Evil Walks | For Those About To Rock We Salute You | AC/DC |
11 | C.O.D. | For Those About To Rock We Salute You | AC/DC |
12 | Breaking The Rules | For Those About To Rock We Salute You | AC/DC |
13 | Night Of The Long Knives | For Those About To Rock We Salute You | AC/DC |
14 | Spellbound | For Those About To Rock We Salute You | AC/DC |
-- Join the 3 tables
SELECT *
FROM playlists
INNER JOIN playlist_track ON playlist_track.PlayListId = playlists.PlayListId
PlaylistId | Name | PlaylistId | TrackId |
---|---|---|---|
1 | Music | 1 | 1 |
1 | Music | 1 | 2 |
1 | Music | 1 | 3 |
1 | Music | 1 | 4 |
1 | Music | 1 | 5 |
1 | Music | 1 | 6 |
1 | Music | 1 | 7 |
1 | Music | 1 | 8 |
1 | Music | 1 | 9 |
1 | Music | 1 | 10 |
9.4.2 Left join (or right join)
The LEFT JOIN clause is an optional clause of the SELECT statement. You use the LEFT JOIN clause to query data from multiple related tables. All rows in table A are included in the result set whether there are matching rows in table B or not. In case you have a WHERE clause in the statement, the search_condition in the WHERE clause is applied after the matching of the LEFT JOIN clause completes
Example of a left join where some records are in one table but not the other (i.e., not just what should be an inner join but that is done with left join)
-- Left Join
SELECT
artists.ArtistId
, AlbumId
FROM
artists
LEFT JOIN albums ON
albums.ArtistId = artists.ArtistId
ArtistId | AlbumId |
---|---|
1 | 1 |
1 | 4 |
2 | 2 |
2 | 3 |
3 | 5 |
4 | 6 |
5 | 7 |
6 | 8 |
6 | 34 |
7 | 9 |
As you can see in the table above, some artist Id’s don’t have a matching Album ID, these are still included in the table since they are in the first table. This is functionally the same as a left join in R. Keeps all the observations in the first data set, and all the observations in the second that have a match in the first. The reverse is true for a Right Join, as shown below.
-- Right Join
SELECT
artists.ArtistId
, AlbumId
FROM
artists
RIGHT JOIN albums ON
albums.ArtistId = artists.ArtistId
ArtistId | AlbumId |
---|---|
1 | 1 |
1 | 4 |
2 | 2 |
2 | 3 |
3 | 5 |
4 | 6 |
5 | 7 |
6 | 8 |
6 | 34 |
7 | 9 |
9.4.3 Other types of joins
9.4.3.1 Cross Join
Suppose, the A table has N rows and B table has M rows, the CROSS JOIN of these two tables will produce a result set that contains NxM rows. It is used the same as an Inner join or a Left Join.
-- Cross Join (this table reveals no real informtion, other than how to use a cross join)
SELECT CustomerId, EmployeeId
FROM customers
CROSS JOIN employees;
CustomerId | EmployeeId |
---|---|
1 | 1 |
1 | 2 |
1 | 6 |
1 | 3 |
1 | 4 |
1 | 5 |
1 | 7 |
1 | 8 |
3 | 1 |
3 | 2 |
9.4.3.2 Self Join
The self-join is a special kind of joins that allow you to join a table to itself using either LEFT JOIN or INNER JOIN clause. You use self-join to create a result set that joins the rows with the other rows within the same table. Because you cannot refer to the same table more than one in a query, you need to use a table alias to assign the table a different name when you use self-join. The self-join compares values of the same or different columns in the same table. Only one table is involved in the self-join.
-- use of a self join to join to join rows with other rows in the same table
SELECT m.firstname || ' ' || m.lastname AS 'Manager',
e.firstname || ' ' || e.lastname AS 'Direct report'
FROM employees e
INNER JOIN employees m ON m.employeeid = e.reportsto
ORDER BY manager;
Manager | Direct report |
---|---|
Andrew Adams | Nancy Edwards |
Andrew Adams | Michael Mitchell |
Michael Mitchell | Robert King |
Michael Mitchell | Laura Callahan |
Nancy Edwards | Jane Peacock |
Nancy Edwards | Margaret Park |
Nancy Edwards | Steve Johnson |
9.4.3.3 Full Outer Join
the result of the FULL OUTER JOIN is a combination of a LEFT JOIN and a RIGHT JOIN. The result set of the full outer join has NULL values for every column of the table that does not have a matching row in the other table. For the matching rows, the FULL OUTER JOIN produces a single row with values from columns of the rows in both tables. This is the same as in R
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | AlbumId | Title | ArtistId |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
2 | Balls to the Wall | 2 | 2 | 1 | NA | 342562 | 5510424 | 0.99 | 2 | Balls to the Wall | 2 |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 | 3 | Restless and Wild | 2 |
4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman | 252051 | 4331779 | 0.99 | 3 | Restless and Wild | 2 |
5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 | 3 | Restless and Wild | 2 |
6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
7 | Let’s Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |
10 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 | 1 | For Those About To Rock We Salute You | 1 |