9 SQL

This BP chapter is a bit different from many of the DataCamp-based chapters you’ve done. For Econ 380, you need to understand R programming well, but you likely won’t need to use SQL at all. I’m having you learn about basic SQL because having a basic understanding of SQL might be useful for you in the future (e.g., to get an internship/job and/or to do well there). The purpose of this BP chapter is for you to become familiar with a few fundamentals of SQL. You will not know enough to develop a database for a large corporation, but you should know enough to be able to get data from a SQL database (e.g., an internship that has something like “familiarity with SQL queries” should apply to you).

There are several ways for you to achieve this goal. One is to go through part or all of the following DataCamp courses, along with the main things you should learn in each (note each title is a link to the course)):

  1. Introduction to SQL
    • What SQL is, and knowledge of the existence of different SQL flavors (e.g., SQL Server and PostgreSQL)
    • Structure of relational databases (e.g., tables, fields)
    • SELECT statement
  2. Intermediate SQL
    • DISTINCT
    • WHERE
    • AND, OR
    • LIKE, NOT LIKE, and IN
    • NULL values
    • Aggregate Functions (AVG(), SUM(), MIN(), MAX(), COUNT())
    • ORDER BY (sorting)
    • GROUP BY
    • ROUND
    • HAVING (there WHERE that goes with GROUP BY)
  3. Joining Data in SQL
    • INNER JOIN
    • LEFT JOIN (or RIGHT JOIN)

Note that these 3 courses are the first 3 courses in the DataCamp “SQL Fundamentals” skill track. If you are interested in learning more about SQL, this skill track has additional courses, as well as DataCcamp “projects” to practice your skills.

The other way to learn about these aspects of SQL is you can read about them in this tutorial: https://www.sqlitetutorial.net/sqlite-sample-database. I’m giving you this particular website/tutorial because it comes with a useful sample database that also works in R (as shown below). I want you to learn about all the things included in the outline above. You can learn about them through any combination of DataCamp and this online tutorial. For example, to learn about ORDER BY, you could go through that part of the Intermediate SQL DC course, and/or you could read through (and experiment with) the order by page in the online tutorial. You are also free to use other websites, videos, AI, etc. I don’t care how you learn about SQL, I just want you to be familiar with it.

The website explains how to do some things if you install SQLite on your computer. You do not need to do that. You can (and should) do everything in R via the RSQLite package, as explained below.

Note that the specifics on this website are for the SQLite version of SQL. DataCamp seems to be specific to PostgreSQL. But you’re just learning about the basics, and the basics work the same in all the different versions/flavors. There are a few exceptions. For example, you might want to use the keyword LIMIT to only output the top, say, 10 rows. This is done using LIMIT in SQLite, PostgreSQL, and MySQL. If instead you use Microsoft SQL Server or Access (not unlikely at an internship or job), the keyword is TOP instead of LIMIT. That’s the only one I can think of that you might want to use now. Other than that, they are the same. Thus, any website discussing the basics should be the same if you want to look beyond the DataCamp courses and website I’ve given you.

To demonstrate what you have learned, and to serve as a reference for you later, you are going to take notes here on the key features of SQL with which I want you to be familiar. You should use the RSQLite package with the chinook.db database. I demonstrate below how to do so. This means that you should have working examples of everything in SQL, just like you do for R. I provide you a working example of SELECT below.

I will add section headings to serve as a structure for your notes, and to serve as a checklist so you include everything I want you to include. As with all BP chapters, you are free to add more if you want.

9.1 Loading a SQL database in R via RSQLite

Packages used in this chapter:

## Load all packages used in this chapter
library(RSQLite)

Load the connection (that we have named con) to the chinook database (the one from the website tutorial):

con <- dbConnect(SQLite(), "data/chinook.db")

As the online documentation for the database shows, this database has the following tables:

dbListTables(conn = con)
##  [1] "albums"          "artists"         "customers"       "employees"      
##  [5] "genres"          "invoice_items"   "invoices"        "media_types"    
##  [9] "playlist_track"  "playlists"       "ranks"           "sqlite_sequence"
## [13] "sqlite_stat1"    "tracks"

You can also see the list of fields in a particular table like this:

dbListFields(conn = con, name = "customers")
##  [1] "CustomerId"   "FirstName"    "LastName"     "Company"      "Address"     
##  [6] "City"         "State"        "Country"      "PostalCode"   "Phone"       
## [11] "Fax"          "Email"        "SupportRepId"

You can then create working examples using SQL code chunks, like this:

SELECT FirstName, LastName FROM customers;
Table 9.1: Displaying records 1 - 10
FirstName LastName
Luís Gonçalves
Leonie Köhler
François Tremblay
Bjørn Hansen
František Wichterlová
Helena Holý
Astrid Gruber
Daan Peeters
Kara Nielsen
Eduardo Martins

The rest of this file should have your notes, including code chunks with working examples similar to the one above (the code chunk with a working example showing how to use SELECT). If you want the notes to go in a different order, feel free to rearange the headings. But make sure to include an example of everything with a level 3 (###) or 4 (####) heading below.

9.2 Selecting data

9.2.1 Select

Your notes on what select does…

Select email, phone FROM customers;

--use * to select all feilds 

-- comments in SQL code chunks start with --
Table 9.2: Displaying records 1 - 10
Email Phone
+55 (12) 3923-5555
+49 0711 2842222
+1 (514) 721-4711
+47 22 44 22 22
+420 2 4172 5555
+420 2 4177 0449
+43 01 5134505
+32 02 219 03 03
+453 3331 9991
+55 (11) 3033-5446

9.2.2 Distinct

Distinct allows us to get all values that occcur, its used with a Select, and it returns all unique values.

Select DISTINCT postalcode, FirstName FROM customers;
Table 9.3: Displaying records 1 - 10
PostalCode FirstName
12227-000 Luís
70174 Leonie
H2G 1A7 François
0171 Bjørn
14700 František
14300 Helena
1010 Astrid
1000 Daan
1720 Kara
01007-010 Eduardo

9.2.3 Where

where acts as filter() would in R. use <> for not equal, and you only need to use a single = sign, unlike R. This comes after the select and from, before limit.

Select CustomerId 
FROM customers
WHERE CustomerId >= 15
LIMIT 10;
Table 9.4: Displaying records 1 - 10
CustomerId
15
16
17
18
19
20
21
22
23
24

When filtering for a character value, put it in single quotes (’name)

Select FirstName 
FROM customers
WHERE FirstName = 'Luis';
Table 9.5: 1 records
FirstName
Luis

9.2.3.1 And, Or

And and Or unction largely the same as they do in R, allowing us to filter for multiple conditions, 2017 and in the Midwest, before 1800 or after 1900. In SQL they are just AND and OR. These will often be used with WHERE. We need to specify the field directly with each condition. In a query has multiple long conditions, we need to use parentheses to ensure the proper order is used.

-- Use of and
Select * 
FROM customers
WHERE CustomerId > 10 AND CustomerId < 15;
Table 9.6: 4 records
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
11 Alexandre Rocha Banco do Brasil S.A. Av. Paulista, 2022 São Paulo SP Brazil 01310-200 +55 (11) 3055-3278 +55 (11) 3055-8131 5
12 Roberto Almeida Riotur Praça Pio X, 119 Rio de Janeiro RJ Brazil 20040-020 +55 (21) 2271-7000 +55 (21) 2271-7070 3
13 Fernanda Ramos NA Qe 7 Bloco G Brasília DF Brazil 71020-677 +55 (61) 3363-5547 +55 (61) 3363-7855 4
14 Mark Philips Telus 8210 111 ST NW Edmonton AB Canada T6G 2C7 +1 (780) 434-4554 +1 (780) 434-5565 5
-- Use of or
Select * 
FROM customers
WHERE CustomerId = 10 OR CustomerId =  14;
Table 9.7: 2 records
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
10 Eduardo Martins Woodstock Discos Rua Dr. Falcão Filho, 155 São Paulo SP Brazil 01007-010 +55 (11) 3033-5446 +55 (11) 3033-4564 4
14 Mark Philips Telus 8210 111 ST NW Edmonton AB Canada T6G 2C7 +1 (780) 434-4554 +1 (780) 434-5565 5

9.2.3.2 Like, Not Like, and IN

When combined with a where Like, Not like and In allow us to filter for a pattern, such as names that start with an M, or citys that don’t have a c as a second Letter.

-- Use of LIKE
Select * 
FROM customers
WHERE CustomerId LIKE "_9";

-- The Underscore makes space for 1 didgt, a % makes space for multiple 
Table 9.8: 5 records
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
19 Tim Goyer Apple Inc. 1 Infinite Loop Cupertino CA USA 95014 +1 (408) 996-1010 +1 (408) 996-1011 3
29 Robert Brown NA 796 Dundas Street West Toronto ON Canada M6J 1V1 +1 (416) 363-8888 NA 3
39 Camille Bernard NA 4, Rue Milton Paris NA France 75009 +33 01 49 70 65 65 NA 4
49 Stanisław Wójcik NA Ordynacka 10 Warsaw NA Poland 00-358 +48 22 828 37 39 NA 4
59 Puja Srivastava NA 3,Raj Bhavan Road Bangalore NA India 560001 +91 080 22289999 NA 3

9.2.4 Is Null and Is Not Null

using is null with a where gives all the entries with a null value in that entry. Is not null does the oppisite

-- Use of IS NULL
Select * 
FROM customers
WHERE Company IS NULL;
Table 9.9: Displaying records 1 - 10
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
2 Leonie Köhler NA Theodor-Heuss-Straße 34 Stuttgart NA Germany 70174 +49 0711 2842222 NA 5
3 François Tremblay NA 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 NA 3
4 Bjørn Hansen NA Ullevålsveien 14 Oslo NA Norway 0171 +47 22 44 22 22 NA 4
6 Helena Holý NA Rilská 3174/6 Prague NA Czech Republic 14300 +420 2 4177 0449 NA 5
7 Astrid Gruber NA Rotenturmstraße 4, 1010 Innere Stadt Vienne NA Austria 1010 +43 01 5134505 NA 5
8 Daan Peeters NA Grétrystraat 63 Brussels NA Belgium 1000 +32 02 219 03 03 NA 4
9 Kara Nielsen NA Sønder Boulevard 51 Copenhagen NA Denmark 1720 +453 3331 9991 NA 4
13 Fernanda Ramos NA Qe 7 Bloco G Brasília DF Brazil 71020-677 +55 (61) 3363-5547 +55 (61) 3363-7855 4
18 Michelle Brooks NA 627 Broadway New York NY USA 10012-2612 +1 (212) 221-3546 +1 (212) 221-4679 3
20 Dan Miller NA 541 Del Medio Avenue Mountain View CA USA 94040-111 +1 (650) 644-3358 NA 4

9.2.5 Order By (Sorting)

The ORDER BY keyword is used to sort results of one or more fields. When used on its own, it is written after the FROM statement, We could also add the ASC keyword to our query to clarify that we are sorting in ascending order. We can use the DESC keyword to sort the results in descending order. ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort by the next, etc. To specify multiple fields, we separate the field names with a comma. The second field we sort by can be thought of as a tie-breaker when the first field is not decisive in telling the order.

-- Order BY customer name from A-Z
Select *
FROM customers
Order By FirstName;
Table 9.10: Displaying records 1 - 10
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
32 Aaron Mitchell NA 696 Osborne Street Winnipeg MB Canada R3L 2B9 +1 (204) 452-6452 NA 4
11 Alexandre Rocha Banco do Brasil S.A. Av. Paulista, 2022 São Paulo SP Brazil 01310-200 +55 (11) 3055-3278 +55 (11) 3055-8131 5
7 Astrid Gruber NA Rotenturmstraße 4, 1010 Innere Stadt Vienne NA Austria 1010 +43 01 5134505 NA 5
4 Bjørn Hansen NA Ullevålsveien 14 Oslo NA Norway 0171 +47 22 44 22 22 NA 4
39 Camille Bernard NA 4, Rue Milton Paris NA France 75009 +33 01 49 70 65 65 NA 4
8 Daan Peeters NA Grétrystraat 63 Brussels NA Belgium 1000 +32 02 219 03 03 NA 4
20 Dan Miller NA 541 Del Medio Avenue Mountain View CA USA 94040-111 +1 (650) 644-3358 NA 4
56 Diego Gutiérrez NA 307 Macacha Güemes Buenos Aires NA Argentina 1106 +54 (0)11 4311 4333 NA 4
40 Dominique Lefebvre NA 8, Rue Hanovre Paris NA France 75002 +33 01 47 42 71 71 NA 4
10 Eduardo Martins Woodstock Discos Rua Dr. Falcão Filho, 155 São Paulo SP Brazil 01007-010 +55 (11) 3033-5446 +55 (11) 3033-4564 4

same thing, but ordered from Z-A

-- Order BY customer name from Z-A
Select *
FROM customers
Order By FirstName DESC;
Table 9.11: Displaying records 1 - 10
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
42 Wyatt Girard NA 9, Place Louis Barthou Bordeaux NA France 33000 +33 05 56 96 96 96 NA 3
25 Victor Stevens NA 319 N. Frances Street Madison WI USA 53703 +1 (608) 257-0597 NA 5
19 Tim Goyer Apple Inc. 1 Infinite Loop Cupertino CA USA 95014 +1 (408) 996-1010 +1 (408) 996-1011 3
44 Terhi Hämäläinen NA Porthaninkatu 9 Helsinki NA Finland 00530 +358 09 870 2000 NA 3
54 Steve Murray NA 110 Raeburn Pl Edinburgh NA United Kingdom EH4 1HH +44 0131 315 3300 NA 5
49 Stanisław Wójcik NA Ordynacka 10 Warsaw NA Poland 00-358 +48 22 828 37 39 NA 4
12 Roberto Almeida Riotur Praça Pio X, 119 Rio de Janeiro RJ Brazil 20040-020 +55 (21) 2271-7000 +55 (21) 2271-7070 3
29 Robert Brown NA 796 Dundas Street West Toronto ON Canada M6J 1V1 +1 (416) 363-8888 NA 3
26 Richard Cunningham NA 2211 W Berry Street Fort Worth TX USA 76110 +1 (817) 924-7272 NA 4
59 Puja Srivastava NA 3,Raj Bhavan Road Bangalore NA India 560001 +91 080 22289999 NA 3

Ordering 2 variables

-- Order BY customer last name from Z-A, if same last name, use first name
Select *
FROM customers
Order By LastName DESC, FirstName DESC;
Table 9.12: Displaying records 1 - 10
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
37 Fynn Zimmermann NA Berger Straße 10 Frankfurt NA Germany 60316 +49 069 40598889 NA 3
49 Stanisław Wójcik NA Ordynacka 10 Warsaw NA Poland 00-358 +48 22 828 37 39 NA 4
5 František Wichterlová JetBrains s.r.o. Klanova 9/506 Prague NA Czech Republic 14700 +420 2 4172 5555 +420 2 4172 5555 4
48 Johannes Van der Berg NA Lijnbaansgracht 120bg Amsterdam VV Netherlands 1016 +31 020 6223130 NA 5
3 François Tremblay NA 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 NA 3
55 Mark Taylor NA 421 Bourke Street Sidney NSW Australia 2010 +61 (02) 9332 3633 NA 4
33 Ellie Sullivan NA 5112 48 Street Yellowknife NT Canada X1A 1N6 +1 (867) 920-2233 NA 3
25 Victor Stevens NA 319 N. Frances Street Madison WI USA 53703 +1 (608) 257-0597 NA 5
59 Puja Srivastava NA 3,Raj Bhavan Road Bangalore NA India 560001 +91 080 22289999 NA 3
17 Jack Smith Microsoft Corporation 1 Microsoft Way Redmond WA USA 98052-8300 +1 (425) 882-8080 +1 (425) 882-8081 5

9.3 Grouping and aggregating

9.3.1 Group By

We’ll often need to summarize data for a particular group of results. SQL allows us to group with the GROUP BY clause. SQL will return an error if we try to SELECT a field that is not in our GROUP BY clause. GROUP BY fits into our order after FROM and before all other clauses. T0 group by 2 2 variables, use a comma to seperate

-- Create a summary table grouped by country with there average customer ID
Select Country, Avg(customerId)
FROM customers
GROUP BY Country;
Table 9.13: Displaying records 1 - 10
Country Avg(customerId)
Argentina 56.000
Australia 55.000
Austria 7.000
Belgium 8.000
Brazil 9.400
Canada 23.375
Chile 57.000
Czech Republic 5.500
Denmark 9.000
Finland 44.000

9.3.2 Aggregate Functions

9.3.2.1 Avg()

Avg can only be used with numeric variables. provides an average of all values in the selected column.

-- Use of Avg to find the average of the customer Id's
Select AVG(CustomerId) AS avg_customerId
FROM customers;
Table 9.14: 1 records
avg_customerId
30

9.3.2.2 SUM()

Can only be used with numerical data. Provides a sum of all the values in the selected column.

-- Use of Sum to find the sum of the customer Id's
Select SUM(CustomerId) AS sum_customerId
FROM customers;
Table 9.15: 1 records
sum_customerId
1770

9.3.2.3 MIN() and MAX()

minimum and maximum will give the record that is figuratively the lowest or highest. Lowest can mean the letter A when dealing with strings or the earliest date when dealing with dates. And, of course, with numbers, it is the highest or the lowest number.

-- Use of min to find the lowest customer Id
Select MIN(CustomerId) AS min_customerId
FROM customers;
Table 9.16: 1 records
min_customerId
1
-- Use of Max to find the highest customer Id
Select MAX(CustomerId) AS max_customerId
FROM customers;
Table 9.17: 1 records
max_customerId
59

9.3.2.4 Count()

Using count with a field name count the number of values in a field, count with an * counts the records in a table

-- Count the records of First names and city
SELECT COUNT(FirstName) AS count_Name, Count(city) AS count_city
From customers;
Table 9.18: 1 records
count_Name count_city
59 59

same thing but with distinct

-- Count the records of distinct First names and city
SELECT COUNT( Distinct FirstName) AS count_distinct_Name, Count(Distinct city) AS count_distinct_city
From customers;
Table 9.19: 1 records
count_distinct_Name count_distinct_city
57 53

9.3.3 Where with Aggregate Functions

We can easily use where with aggregate functions

-- Use find the average customer ID for people whose name starts with M
Select avg(CustomerId) AS avg_customerId_M_names
FROM customers
Where FirstName Like 'M%';
Table 9.20: 1 records
avg_customerId_M_names
36

9.3.4 Round

the round function can be used with our aggregrate functions to round out answers

-- Use of round the result to the tenth decimal place (there will be no decimal, done for demo)
Select ROUND(MAX(customerId),  1) AS max_customerId
FROM customers
Table 9.21: 1 records
max_customerId
59

9.3.5 Having (where for groups)

In SQL, we can’t filter aggregate functions with WHERE clauses. That means that if we want to filter based on the result of an aggregate function, we need another way. Groups have their own special filtering word: HAVING. WHERE filters individual records while HAVING filters grouped records.

-- Create a summary table grouped by country with there average customer ID, and filter for average ID's greater than 10
Select Country, Avg(customerId)
FROM customers
GROUP BY Country
HAVING Avg(customerId) > 10;
Table 9.22: Displaying records 1 - 10
Country Avg(customerId)
Argentina 56.000
Australia 55.000
Canada 23.375
Chile 57.000
Finland 44.000
France 41.000
Germany 28.250
Hungary 45.000
India 58.500
Ireland 46.000

9.4 Joining tables

In relational databases, data is often distributed in many related tables. A table is associated with another table using foreign keys.

9.4.1 Inner join

To query data from multiple tables, you use INNER JOIN clause. The INNER JOIN clause combines columns from correlated tables. The INNER JOIN clause returns rows from the A table that has the corresponding row in B table. This logic is applied if you join more than 2 tables.

-- Join the "tracks" and "albums" tables 
SELECT *
FROM tracks
INNER JOIN albums ON albums.albumid = tracks.albumid;
Table 9.23: Displaying records 1 - 10
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice AlbumId Title ArtistId
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99 1 For Those About To Rock We Salute You 1
6 Put The Finger On You 1 1 1 Angus Young, Malcolm Young, Brian Johnson 205662 6713451 0.99 1 For Those About To Rock We Salute You 1
7 Let’s Get It Up 1 1 1 Angus Young, Malcolm Young, Brian Johnson 233926 7636561 0.99 1 For Those About To Rock We Salute You 1
8 Inject The Venom 1 1 1 Angus Young, Malcolm Young, Brian Johnson 210834 6852860 0.99 1 For Those About To Rock We Salute You 1
9 Snowballed 1 1 1 Angus Young, Malcolm Young, Brian Johnson 203102 6599424 0.99 1 For Those About To Rock We Salute You 1
10 Evil Walks 1 1 1 Angus Young, Malcolm Young, Brian Johnson 263497 8611245 0.99 1 For Those About To Rock We Salute You 1
11 C.O.D. 1 1 1 Angus Young, Malcolm Young, Brian Johnson 199836 6566314 0.99 1 For Those About To Rock We Salute You 1
12 Breaking The Rules 1 1 1 Angus Young, Malcolm Young, Brian Johnson 263288 8596840 0.99 1 For Those About To Rock We Salute You 1
13 Night Of The Long Knives 1 1 1 Angus Young, Malcolm Young, Brian Johnson 205688 6706347 0.99 1 For Those About To Rock We Salute You 1
14 Spellbound 1 1 1 Angus Young, Malcolm Young, Brian Johnson 270863 8817038 0.99 1 For Those About To Rock We Salute You 1

Here is an example with 3 tables:

-- Join the 3 tables
SELECT trackid, tracks.name AS track, albums.title AS album, artists.name AS artist
FROM tracks
    INNER JOIN albums ON albums.albumid = tracks.albumid
    INNER JOIN artists ON artists.artistid = albums.artistid;
Table 9.24: Displaying records 1 - 10
TrackId track album artist
1 For Those About To Rock (We Salute You) For Those About To Rock We Salute You AC/DC
6 Put The Finger On You For Those About To Rock We Salute You AC/DC
7 Let’s Get It Up For Those About To Rock We Salute You AC/DC
8 Inject The Venom For Those About To Rock We Salute You AC/DC
9 Snowballed For Those About To Rock We Salute You AC/DC
10 Evil Walks For Those About To Rock We Salute You AC/DC
11 C.O.D. For Those About To Rock We Salute You AC/DC
12 Breaking The Rules For Those About To Rock We Salute You AC/DC
13 Night Of The Long Knives For Those About To Rock We Salute You AC/DC
14 Spellbound For Those About To Rock We Salute You AC/DC
-- Join the 3 tables
SELECT *
FROM playlists
INNER JOIN playlist_track ON playlist_track.PlayListId =  playlists.PlayListId
Table 9.25: Displaying records 1 - 10
PlaylistId Name PlaylistId TrackId
1 Music 1 1
1 Music 1 2
1 Music 1 3
1 Music 1 4
1 Music 1 5
1 Music 1 6
1 Music 1 7
1 Music 1 8
1 Music 1 9
1 Music 1 10

9.4.2 Left join (or right join)

The LEFT JOIN clause is an optional clause of the SELECT statement. You use the LEFT JOIN clause to query data from multiple related tables. All rows in table A are included in the result set whether there are matching rows in table B or not. In case you have a WHERE clause in the statement, the search_condition in the WHERE clause is applied after the matching of the LEFT JOIN clause completes

Example of a left join where some records are in one table but not the other (i.e., not just what should be an inner join but that is done with left join)

-- Left Join
SELECT
   artists.ArtistId
   , AlbumId
FROM
   artists
LEFT JOIN albums ON
   albums.ArtistId = artists.ArtistId
Table 9.26: Displaying records 1 - 10
ArtistId AlbumId
1 1
1 4
2 2
2 3
3 5
4 6
5 7
6 8
6 34
7 9

As you can see in the table above, some artist Id’s don’t have a matching Album ID, these are still included in the table since they are in the first table. This is functionally the same as a left join in R. Keeps all the observations in the first data set, and all the observations in the second that have a match in the first. The reverse is true for a Right Join, as shown below.

-- Right Join
SELECT
   artists.ArtistId
   , AlbumId
FROM
   artists
RIGHT JOIN albums ON
   albums.ArtistId = artists.ArtistId
Table 9.27: Displaying records 1 - 10
ArtistId AlbumId
1 1
1 4
2 2
2 3
3 5
4 6
5 7
6 8
6 34
7 9

9.4.3 Other types of joins

9.4.3.1 Cross Join

Suppose, the A table has N rows and B table has M rows, the CROSS JOIN of these two tables will produce a result set that contains NxM rows. It is used the same as an Inner join or a Left Join.

-- Cross Join (this table reveals no real informtion, other than how to use a cross join)
SELECT CustomerId, EmployeeId
FROM customers
CROSS JOIN employees;
Table 9.28: Displaying records 1 - 10
CustomerId EmployeeId
1 1
1 2
1 6
1 3
1 4
1 5
1 7
1 8
3 1
3 2

9.4.3.2 Self Join

The self-join is a special kind of joins that allow you to join a table to itself using either LEFT JOIN or INNER JOIN clause. You use self-join to create a result set that joins the rows with the other rows within the same table. Because you cannot refer to the same table more than one in a query, you need to use a table alias to assign the table a different name when you use self-join. The self-join compares values of the same or different columns in the same table. Only one table is involved in the self-join.

-- use of a self join to join to join rows with other rows in the same table 
SELECT m.firstname || ' ' || m.lastname AS 'Manager',
       e.firstname || ' ' || e.lastname AS 'Direct report' 
FROM employees e
INNER JOIN employees m ON m.employeeid = e.reportsto
ORDER BY manager;
Table 9.29: 7 records
Manager Direct report
Andrew Adams Nancy Edwards
Andrew Adams Michael Mitchell
Michael Mitchell Robert King
Michael Mitchell Laura Callahan
Nancy Edwards Jane Peacock
Nancy Edwards Margaret Park
Nancy Edwards Steve Johnson

9.4.3.3 Full Outer Join

the result of the FULL OUTER JOIN is a combination of a LEFT JOIN and a RIGHT JOIN. The result set of the full outer join has NULL values for every column of the table that does not have a matching row in the other table. For the matching rows, the FULL OUTER JOIN produces a single row with values from columns of the rows in both tables. This is the same as in R

-- Right Join
SELECT *
FROM tracks
FULL OUTER JOIN albums ON albums.AlbumId = tracks.AlbumId;
Table 9.30: Displaying records 1 - 10
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice AlbumId Title ArtistId
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99 1 For Those About To Rock We Salute You 1
2 Balls to the Wall 2 2 1 NA 342562 5510424 0.99 2 Balls to the Wall 2
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99 3 Restless and Wild 2
4 Restless and Wild 3 2 1 F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman 252051 4331779 0.99 3 Restless and Wild 2
5 Princess of the Dawn 3 2 1 Deaffy & R.A. Smith-Diesel 375418 6290521 0.99 3 Restless and Wild 2
6 Put The Finger On You 1 1 1 Angus Young, Malcolm Young, Brian Johnson 205662 6713451 0.99 1 For Those About To Rock We Salute You 1
7 Let’s Get It Up 1 1 1 Angus Young, Malcolm Young, Brian Johnson 233926 7636561 0.99 1 For Those About To Rock We Salute You 1
8 Inject The Venom 1 1 1 Angus Young, Malcolm Young, Brian Johnson 210834 6852860 0.99 1 For Those About To Rock We Salute You 1
9 Snowballed 1 1 1 Angus Young, Malcolm Young, Brian Johnson 203102 6599424 0.99 1 For Those About To Rock We Salute You 1
10 Evil Walks 1 1 1 Angus Young, Malcolm Young, Brian Johnson 263497 8611245 0.99 1 For Those About To Rock We Salute You 1