One-to-Many and Many-to-Many Joins

Introduction

Previously, you learned about the typical case where one joins on a primary or foreign key. In this section, you'll explore other types of joins using one-to-many and many-to-many relationships!

Objectives

You will be able to:

Explain one-to-many and many-to-many joins as well as implications for the size of query results
Query data using one-to-many and many-to-many joins

One-to-Many and Many-to-Many relationships

So far, you've seen a couple of different kinds of join statements: LEFT JOIN and INNER JOIN (also written simply as JOIN). Both describe how to combine tables based on the information they share, and in particular which rows to keep when a row in one table has no match in the other. Another way to look at joins is by the number of matches between the tables, based on the links you define with the ON or USING keywords.
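Here is a small sketch of both join types. It runs on hypothetical toy tables in a separate in-memory database, not the lesson's data.sqlite. INNER JOIN drops an office that has no employees and LEFT JOIN keeps it, but both repeat an office once for every employee it matches.

```python
import sqlite3

# Hypothetical toy tables in a separate in-memory database (not data.sqlite).
demo = sqlite3.connect(':memory:')
demo.executescript("""
CREATE TABLE offices (officeCode INTEGER, city TEXT);
CREATE TABLE employees (employeeNumber INTEGER, officeCode INTEGER);
INSERT INTO offices VALUES (1, 'Boston'), (2, 'Paris'), (3, 'Tokyo');
-- Tokyo (officeCode 3) has no employees in this toy data.
INSERT INTO employees VALUES (101, 1), (102, 1), (103, 2);
""")

inner = demo.execute(
    'SELECT * FROM offices JOIN employees USING(officeCode);').fetchall()
left = demo.execute(
    'SELECT * FROM offices LEFT JOIN employees USING(officeCode);').fetchall()

print('INNER JOIN rows:', len(inner))  # 3: Boston twice, Paris once
print('LEFT JOIN rows:', len(left))    # 4: the same 3 rows plus Tokyo padded with NULL
```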

You have also seen the typical case where one joins on a primary or foreign key. For example, when you join on customerNumber or employeeNumber, that value is unique within its own table (customers or employees), where it serves as the primary key. As such, your joins have been very similar to using a dictionary to look up additional information associated with a record. When the field (column) you are joining on has multiple entries in either table, you will instead get multiple rows in your resulting view, one for each matching combination of rows.
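To make the dictionary analogy concrete, here is a sketch on hypothetical toy data (the employee numbers, the office mapping, and the in-memory tables are illustrative, not read from data.sqlite). Because officeCode is unique in the offices table, a dictionary lookup and a join on that key return exactly the same pairs.

```python
import sqlite3

# Hypothetical toy data: officeCode is unique in offices (its primary key).
offices = {1: 'San Francisco', 2: 'Boston', 4: 'Paris'}
employees = [(1002, 1), (1056, 1), (1102, 4), (1188, 2)]  # (employeeNumber, officeCode)

# Dictionary version: look up each employee's office city by key.
via_dict = sorted((emp, offices[code]) for emp, code in employees)

# SQL version: the same lookup expressed as a join on the key.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE offices (officeCode INTEGER PRIMARY KEY, city TEXT)')
demo.execute('CREATE TABLE employees (employeeNumber INTEGER, officeCode INTEGER)')
demo.executemany('INSERT INTO offices VALUES (?, ?)', offices.items())
demo.executemany('INSERT INTO employees VALUES (?, ?)', employees)
via_join = sorted(demo.execute(
    'SELECT employeeNumber, city FROM employees JOIN offices USING(officeCode);'
).fetchall())

print(via_dict == via_join)  # True: one row per employee either way
```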

For example, say you have another table, 'restaurants', with many columns including name, city, and rating. If you were to join this 'restaurants' table with the offices table on the shared city column, you might get some unexpected behavior. In the offices table, there is only one office per city. However, because there will likely be more than one restaurant in each of these cities, your join will return every combination of an office and a restaurant in the same city. If there are 513 restaurants for Boston in your restaurants table and 1 office for Boston, your joined table will have 513 rows for Boston, one for each restaurant paired with the one office.

If you had 2 offices in Boston and 513 restaurants, your join would have 1,026 rows for Boston: 513 pairing each restaurant with the first office and 513 pairing each restaurant with the second office. Three offices in Boston would similarly produce 1,539 rows, one for each unique combination of restaurant and office. This is why you should be particularly careful with many-to-many joins: the size of the result can grow drastically, potentially consuming vast amounts of memory and other resources.
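The arithmetic above can be checked with a throwaway in-memory database. This sketch assumes a hypothetical restaurants table with the name, city, and rating columns described above and fills it with 513 placeholder Boston restaurants; only the row counts matter.

```python
import sqlite3

def boston_rows(n_offices, n_restaurants=513):
    """Count joined rows for Boston given hypothetical office/restaurant counts."""
    demo = sqlite3.connect(':memory:')
    demo.execute('CREATE TABLE offices (officeCode INTEGER, city TEXT)')
    # 'restaurants' is the hypothetical table described in the text.
    demo.execute('CREATE TABLE restaurants (name TEXT, city TEXT, rating REAL)')
    demo.executemany('INSERT INTO offices VALUES (?, ?)',
                     [(i, 'Boston') for i in range(n_offices)])
    demo.executemany('INSERT INTO restaurants VALUES (?, ?, ?)',
                     [(f'restaurant_{i}', 'Boston', 4.0) for i in range(n_restaurants)])
    (count,) = demo.execute(
        'SELECT COUNT(*) FROM offices JOIN restaurants USING(city);').fetchone()
    demo.close()
    return count

for n in (1, 2, 3):
    print(n, 'office(s):', boston_rows(n), 'rows')
# 1 office(s): 513 rows
# 2 office(s): 1026 rows
# 3 office(s): 1539 rows
```

The row count is always offices × restaurants for the shared city, which is why adding rows on both sides makes the result grow so quickly.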

Connecting to the Database

import sqlite3
import pandas as pd

# Connect to the SQLite database file and create a cursor for running queries
conn = sqlite3.connect('data.sqlite')
cur = conn.cursor()

Checking Sizes of Resulting Joins

The original tables:

cur.execute('SELECT * FROM offices;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()

Number of results: 8

	officeCode	city	phone	addressLine1	addressLine2	state	country	postalCode	territory
0	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA
1	2	Boston	+1 215 837 0825	1550 Court Place	Suite 102	MA	USA	02107	NA
2	3	NYC	+1 212 555 3000	523 East 53rd Street	apt. 5A	NY	USA	10022	NA
3	4	Paris	+33 14 723 4404	43 Rue Jouffroy D'abbans			France	75017	EMEA
4	5	Tokyo	+81 33 224 5000	4-1 Kioicho		Chiyoda-Ku	Japan	102-8578	Japan
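Every query cell in this lesson repeats the same steps: execute, fetchall, then copy the column names from cur.description. As an optional shortcut that the rest of the lesson does not use, pandas' read_sql_query does all three in one call. The helper name run_query and the demo table below are hypothetical.

```python
import sqlite3
import pandas as pd

def run_query(conn, sql):
    """Return a query's results as a DataFrame and report the row count.

    pd.read_sql_query fetches the rows and takes the column names from the
    cursor, replacing the execute / fetchall / cur.description steps.
    """
    df = pd.read_sql_query(sql, conn)
    print('Number of results:', len(df))
    return df

# Self-contained demo on a hypothetical in-memory table; in the lesson you
# would pass the existing conn connected to data.sqlite instead.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE offices (officeCode TEXT, city TEXT)')
demo.executemany('INSERT INTO offices VALUES (?, ?)',
                 [('1', 'San Francisco'), ('2', 'Boston')])
offices = run_query(demo, 'SELECT * FROM offices;')
```

With the lesson's connection, run_query(conn, 'SELECT * FROM employees;').head() would stand in for the next cell.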

cur.execute('SELECT * FROM employees;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()

Number of results: 23

	employeeNumber	lastName	firstName	extension	email	officeCode	reportsTo	jobTitle
0	1002	Murphy	Diane	x5800	[email protected]	1		President
1	1056	Patterson	Mary	x4611	[email protected]	1	1002	VP Sales
2	1076	Firrelli	Jeff	x9273	[email protected]	1	1002	VP Marketing
3	1088	Patterson	William	x4871	[email protected]	6	1056	Sales Manager (APAC)
4	1102	Bondur	Gerard	x5408	[email protected]	4	1056	Sale Manager (EMEA)

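A One-to-One Join...

Each employee belongs to exactly one office, so joining offices to employees on officeCode returns one row per employee: 23 rows, the same as the employees table. Seen from the offices side, though, the relationship is one-to-many, which is why the San Francisco office details repeat for every employee based there.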

cur.execute('SELECT * FROM offices JOIN employees USING(officeCode);')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()

Number of results: 23

	officeCode	city	phone	addressLine1	addressLine2	state	country	postalCode	territory	employeeNumber	lastName	firstName	extension	email	reportsTo	jobTitle
0	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	1002	Murphy	Diane	x5800	[email protected]		President
1	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	1056	Patterson	Mary	x4611	[email protected]	1002	VP Sales
2	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	1076	Firrelli	Jeff	x9273	[email protected]	1002	VP Marketing
3	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	1143	Bow	Anthony	x5428	[email protected]	1056	Sales Manager (NA)
4	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	1165	Jennings	Leslie	x3291	[email protected]	1143	Sales Rep
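Before joining unfamiliar tables, you can anticipate which kind of join you are about to run by checking whether the join key repeats within each table. The sketch below is one possible approach. The function name join_relationship and the toy tables are hypothetical; the second MA office, 'Cambridge', is invented for illustration. The function only checks whether any key value repeats, not whether those repeated values actually find matches in the other table.

```python
import sqlite3

def join_relationship(conn, left, right, key):
    """Classify the relationship between two tables on a shared key column.

    Sketch only: table and column names are pasted into the SQL with an
    f-string, so they must be trusted identifiers, never user input.
    """
    def repeats(table):
        # Largest number of rows sharing one non-NULL key value (NULLs never join).
        sql = (f'SELECT MAX(n) FROM (SELECT COUNT(*) AS n FROM {table} '
               f'WHERE {key} IS NOT NULL GROUP BY {key})')
        return (conn.execute(sql).fetchone()[0] or 0) > 1

    kinds = {(False, False): 'one-to-one', (False, True): 'one-to-many',
             (True, False): 'many-to-one', (True, True): 'many-to-many'}
    return kinds[(repeats(left), repeats(right))]

# Hypothetical toy tables shaped like the lesson's offices/employees/customers.
demo = sqlite3.connect(':memory:')
demo.executescript("""
CREATE TABLE offices (officeCode INTEGER, city TEXT, state TEXT);
CREATE TABLE employees (employeeNumber INTEGER, officeCode INTEGER);
CREATE TABLE customers (customerNumber INTEGER, state TEXT);
INSERT INTO offices VALUES (1, 'San Francisco', 'CA'), (2, 'Boston', 'MA'),
                           (3, 'Cambridge', 'MA');
INSERT INTO employees VALUES (1002, 1), (1056, 1), (1188, 2);
INSERT INTO customers VALUES (101, 'MA'), (102, 'MA'), (103, 'CA');
""")

print(join_relationship(demo, 'offices', 'employees', 'officeCode'))  # one-to-many
print(join_relationship(demo, 'employees', 'offices', 'officeCode'))  # many-to-one
print(join_relationship(demo, 'offices', 'customers', 'state'))       # many-to-many
```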

A One-to-Many Join

Here, we'll join the products table with the productlines table. There are only 7 product lines, and each product belongs to exactly one of them, so every product line is matched to many products. As a result, each product line's description will be repeated in your resulting view, once for every product in that line.

Let's take a look at the individual products and productlines tables first.

cur.execute('SELECT * FROM products;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()

Number of results: 110

	productCode	productName	productLine	productScale	productVendor	productDescription	quantityInStock	buyPrice	MSRP
0	S10_1678	1969 Harley Davidson Ultimate Chopper	Motorcycles	1:10	Min Lin Diecast	This replica features working kickstand, front...	7933	48.81	95.70
1	S10_1949	1952 Alpine Renault 1300	Classic Cars	1:10	Classic Metal Creations	Turnable front wheels; steering function; deta...	7305	98.58	214.30
2	S10_2016	1996 Moto Guzzi 1100i	Motorcycles	1:10	Highway 66 Mini Classics	Official Moto Guzzi logos and insignias, saddl...	6625	68.99	118.94
3	S10_4698	2003 Harley-Davidson Eagle Drag Bike	Motorcycles	1:10	Red Start Diecast	Model features, official Harley Davidson logos...	5582	91.02	193.66
4	S10_4757	1972 Alfa Romeo GTA	Classic Cars	1:10	Motor City Art Classics	Features include: Turnable front wheels; steer...	3252	85.68	136.00

cur.execute('SELECT * FROM productlines;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()

Number of results: 7

	productLine	textDescription
0	Classic Cars	Attention car enthusiasts: Make your wildest c...
1	Motorcycles	Our motorcycles are state of the art replicas ...
2	Planes	Unique, diecast airplane and helicopter replic...
3	Ships	The perfect holiday or anniversary gift for ex...
4	Trains	Model trains are a rewarding hobby for enthusi...

Here is the One-to-Many Join:

cur.execute("""SELECT * 
               FROM products
               JOIN productlines
               USING(productLine);""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()

Number of results: 110

	productCode	productName	productLine	productScale	productVendor	productDescription	quantityInStock	buyPrice	MSRP	textDescription
0	S10_1678	1969 Harley Davidson Ultimate Chopper	Motorcycles	1:10	Min Lin Diecast	This replica features working kickstand, front...	7933	48.81	95.70	Our motorcycles are state of the art replicas ...
1	S10_1949	1952 Alpine Renault 1300	Classic Cars	1:10	Classic Metal Creations	Turnable front wheels; steering function; deta...	7305	98.58	214.30	Attention car enthusiasts: Make your wildest c...
2	S10_2016	1996 Moto Guzzi 1100i	Motorcycles	1:10	Highway 66 Mini Classics	Official Moto Guzzi logos and insignias, saddl...	6625	68.99	118.94	Our motorcycles are state of the art replicas ...
3	S10_4698	2003 Harley-Davidson Eagle Drag Bike	Motorcycles	1:10	Red Start Diecast	Model features, official Harley Davidson logos...	5582	91.02	193.66	Our motorcycles are state of the art replicas ...
4	S10_4757	1972 Alfa Romeo GTA	Classic Cars	1:10	Motor City Art Classics	Features include: Turnable front wheels; steer...	3252	85.68	136.00	Attention car enthusiasts: Make your wildest c...
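The joined result has 110 rows, the same as the products table, since every product matched one product line. To see how often each description repeats, you can count joined rows per product line. This sketch uses an in-memory copy of only the five products shown above, with placeholder description text instead of data.sqlite, so its counts are illustrative only.

```python
import sqlite3

# In-memory copy of only the five products shown above (placeholder descriptions).
demo = sqlite3.connect(':memory:')
demo.executescript("""
CREATE TABLE productlines (productLine TEXT PRIMARY KEY, textDescription TEXT);
CREATE TABLE products (productCode TEXT PRIMARY KEY, productLine TEXT);
INSERT INTO productlines VALUES
    ('Classic Cars', 'placeholder description'),
    ('Motorcycles', 'placeholder description');
INSERT INTO products VALUES
    ('S10_1678', 'Motorcycles'), ('S10_1949', 'Classic Cars'),
    ('S10_2016', 'Motorcycles'), ('S10_4698', 'Motorcycles'),
    ('S10_4757', 'Classic Cars');
""")

# Number of joined rows (i.e. repeats of each description) per product line.
repeats = demo.execute("""
    SELECT products.productLine, COUNT(*) AS n_products
    FROM products
    JOIN productlines USING(productLine)
    GROUP BY products.productLine
    ORDER BY n_products DESC;""").fetchall()
print(repeats)  # [('Motorcycles', 3), ('Classic Cars', 2)]

# One joined row per product, so the two counts agree.
(joined,) = demo.execute(
    'SELECT COUNT(*) FROM products JOIN productlines USING(productLine);').fetchone()
(n_products,) = demo.execute('SELECT COUNT(*) FROM products;').fetchone()
print(joined, n_products)  # 5 5
```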

A Many-to-Many Join

A many-to-many join is just what it sounds like: the shared field has multiple entries in both tables. While somewhat contrived, you can see this in the example below, which joins the offices and customers tables on the state field. For example, there are 2 offices in MA and 9 customers in MA, so joining the two tables by state will produce 18 rows associated with MA: one for each customer combined with the first office, and another for each customer combined with the second office.

This join is not particularly useful without some additional aggregation or pivoting, but it does demonstrate how a poorly written query can go wrong. If a join value occurs tens of thousands of times in both tables, a many-to-many join can produce hundreds of millions or even billions of rows (for example, 50,000 × 50,000 = 2.5 billion). Poorly conceived joins can put a severe load on the database, slowing execution and potentially tying up resources for other analysts using the same system.

cur.execute("""SELECT * FROM offices
                        JOIN customers
                        USING(state);""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()

Number of results: 254

	officeCode	city	phone	addressLine1	addressLine2	state	country	postalCode	territory	customerNumber	...	contactLastName	contactFirstName	phone	addressLine1	city	postalCode	country	salesRepEmployeeNumber	creditLimit
0	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	124	...	Nelson	Susan	4155551450	5677 Strong St.	San Rafael	97562	USA	1165	210500.00
1	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	129	...	Murphy	Julie	6505555787	5557 North Pendale Street	San Francisco	94217	USA	1165	64600.00
2	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	161	...	Hashimoto	Juri	6505556809	9408 Furth Circle	Burlingame	94217	USA	1165	84600.00
3	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	205	...	Young	Julie	6265557265	78934 Hillside Dr.	Pasadena	90003	USA	1166	90700.00
4	1	San Francisco	+1 650 219 4782	100 Market Street	Suite 300	CA	USA	94080	NA	219	...	Young	Mary	3105552373	4097 Douglas Av.	Glendale	92561	USA	1166	11000.00

5 rows × 21 columns

len(df[df.state=='MA'])
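Because a many-to-many join multiplies the matches for each key, you can predict its size before running it: for each state, multiply the number of offices by the number of customers, then add up the products. The sketch below does this on hypothetical in-memory tables that reproduce the MA counts quoted above (2 offices, 9 customers) plus a small CA group; the office codes and customer numbers are placeholders. Running SELECT COUNT(*) on the join, rather than fetching every row, is another inexpensive way to check the size first.

```python
import sqlite3

# Hypothetical in-memory tables: 2 offices and 9 customers in MA (the counts
# quoted above) plus 1 office and 2 customers in CA. IDs are placeholders.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE offices (officeCode TEXT, state TEXT)')
demo.execute('CREATE TABLE customers (customerNumber INTEGER, state TEXT)')
demo.executemany('INSERT INTO offices VALUES (?, ?)',
                 [('MA-1', 'MA'), ('MA-2', 'MA'), ('CA-1', 'CA')])
demo.executemany('INSERT INTO customers VALUES (?, ?)',
                 [(100 + i, 'MA') for i in range(9)] + [(200, 'CA'), (201, 'CA')])

# Predict the joined size per state from group counts, without building the join.
estimate = demo.execute("""
    SELECT o.state, o.n * c.n AS joined_rows
    FROM (SELECT state, COUNT(*) AS n FROM offices GROUP BY state) AS o
    JOIN (SELECT state, COUNT(*) AS n FROM customers GROUP BY state) AS c
      USING(state)
    ORDER BY o.state;""").fetchall()
print(estimate)  # [('CA', 2), ('MA', 18)]

# Confirm with COUNT(*), which returns a single number instead of every row.
(actual,) = demo.execute(
    'SELECT COUNT(*) FROM offices JOIN customers USING(state);').fetchone()
print(actual)  # 20
```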

Summary

In this section, you expanded your join knowledge to one-to-many and many-to-many joins!

learn-co-curriculum / dsc-enterprise-chevron-one-to-many-and-many-to-many-joins