Coder Social home page Coder Social logo

anthonydb / practical-sql-2 Goto Github PK

View Code? Open in Web Editor NEW
409.0 18.0 386.0 279.96 MB

Code and Data for the Second Edition of "Practical SQL" by Anthony DeBarros, published by No Starch Press (2022).

License: Other

PLpgSQL 100.00%
sql postgresql data-analysis

practical-sql-2's Introduction

Practical SQL, 2nd Edition

A Beginner's Guide to Storytelling with Data

Practical SQL, 2nd Edition is a beginner-friendly guide to the database programming language SQL. Author Anthony DeBarros starts with beginner SQL concepts such as queries, data types, and basic math and aggregation, and then works through intermediate and advanced topics including statistics, cleaning data, GIS, and automating tasks. Along the way, you'll use real-world data from the U.S. Census and other government agencies and learn the fundamentals of good database design. This is a book not only about SQL but about best practices for using it for thorough, accurate data analysis.

A Note About Editions

If you're reading the first edition of the book, published in 2018, please use the code and data in the first edition repository. If the cover of your copy does not say, "2nd Edition," then you're using the first edition.

Who Is This Book For?

Practical SQL is ideal for beginners as well as those who know some SQL and want to go deeper.

Which Database Does The Book Use?

We use PostgreSQL, which is free and open source. PostgreSQL is used by some of the world's largest companies. Its SQL syntax adheres closely to the ANSI SQL standard, and the concepts you learn will apply to most database management systems, including MySQL, Oracle, SQLite, and others. Note that Microsoft SQL Server employs a variant of SQL called T-SQL, which is not covered by Practical SQL.

What's In This Repository?

Code: All the SQL statements and command-line listings used in each chapter, organized by chapter folders.

Data: CSV and JSON files plus GIS shapefiles for you to import, also organized by chapter. NOTE! See the warning below about opening CSV files with Excel or text editors in the section on Getting the Code and Data.

Exercises: The "Try It Yourself" questions and answers for each chapter, listed separately. Try working through the questions before peeking at the answers.

Software Installation Updates: Over time, the instructions for installing PostgreSQL and additional components may change. You'll find updates noted at software-installation-updates.md.

FAQ, Updates, and Errata: Answers to frequently asked questions, updates, and corrections are noted at faq-updates-errata.md.

Resources: Updates to the book's Appendix on Additional PostgreSQL Resources at resources.md.

What's Covered in Each Chapter?

  • Chapter 1: Setting Up Your Coding Environment
  • Chapter 2: Creating Your First Database and Table
  • Chapter 3: Beginning Data Exploration with SELECT
  • Chapter 4: Understanding Data Types
  • Chapter 5: Importing and Exporting Data
  • Chapter 6: Basic Math and Stats with SQL
  • Chapter 7: Joining Tables in a Relational Database
  • Chapter 8: Table Design That Works for You
  • Chapter 9: Extracting Information by Grouping and Summarizing
  • Chapter 10: Inspecting and Modifying Data
  • Chapter 11: Statistical Functions In SQL
  • Chapter 12: Working With Dates and Times
  • Chapter 13: Advanced Query Techniques
  • Chapter 14: Mining Text to Find Meaningful Data
  • Chapter 15: Analyzing Spatial Data with PostGIS
  • Chapter 16: Working With JSON Data
  • Chapter 17: Saving Time with Views, Functions, and Triggers
  • Chapter 18: Using PostgreSQL from the Command Line
  • Chapter 19: Maintaining Your Database
  • Chapter 20: Telling Your Data's Story
  • Appendix: Additional PostgreSQL Resources

How Do I Get the Code and Data?

Non-GitHub Users

You can obtain all the code and data at once by downloading this repository as a .zip file. To do that:

  • Click the Code button at top right.
  • Click Download ZIP
  • Unzip the file on your computer. Place it in a directory that's easy to remember so you can reference it during the exercises that include importing data to PostgreSQL.
  • For additional instructions, please read Chapter 1 in the book.

Warning about CSV files!: Opening CSV files with Excel could lead to data loss. Excel will remove leading zeros from numbers that are intended to be stored as text, such as ZIP codes. To view the contents of a CSV file, only do so with a plain-text editor and be careful not to save the file in an encoding other than UTF-8.

GitHub Users

GitHub users may want to clone the repository locally and occasionally perform a git pull to receive updates.

Where Can I Buy the Book?

Practical SQL, 2nd Edition is available in PDF, .mobi, .epub, and classic print formats.

How Can I Get Help?

Questions? Please open an issue in this repository by navigating to Issues and clicking New Issue. Fill out the form, and I will answer usually within 1 to 2 business days. For other types of inquiries, please email [email protected].

practical-sql-2's People

Contributors

anthonydb avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

practical-sql-2's Issues

Chapter 12

The export to CSV in listing 12-9 on my linux machine is doing weird stuff, the query works in pgadmin but the exported CSV file has the values mixed up. Hour 0 starts with the count for hour 8, 1 with hour 9's and so on, anyone else had this issue? Had to use psql utility in pgadmin to actually get the copy to work which worked for earlier chapters.

Difference between `CAST()` and `::` on page 56 need more clarification

If I run:

SELECT timestamp_column, CAST(timestamp_column AS varchar(10))
FROM date_time_types;

I am receiving as output original column and modified column.

If I run:

SELECT timestamp_column::varchar(10)
FROM date_time_types;

I am receiving as output only modifying column. So, it seems that those two examples are not identical. I think text needs to clarify why the output of those two examples are different.

Incorrect # of listing on p.240 and p.241

In section "Tabulating City Temperature Readings" on p.240 (bottom) and p.241 you are referring to "Listing 13-18" as to the example of crostab but the listing 13-18 is for creating table and loading data. I think you meant instead "Listing 13-17".

UNION missing in Index

A search for UNION led me to discover that within a few pages, there are multiple Index entries missing:
EXCEPT 111
INTERSECT 111
UNION 109
Maybe a new index is warranted?
Thanks!

Some errata for 2nd Edition

Hi. Here's some errata I found while translating! ๐Ÿ˜

Chapter 10

180 page
Description for Listing 10-26.
"The second statement renames the copy we made with Listing 10-24 to the original name of the table"
=> "The second statement renames the copy we made with Listing 10-25 to the original name of the table"

Chapter 12

214 page
Link for data dictionary https://www.nyc.gov/html/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf is unavailable.
New location is https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf

Chapter 14

267 page
Above Listing 14-18, It says "It's pipe-delimited and uses an ampersand for quoting." but Listing 14-18 uses @ for quoting.

Charpter 7 Issue on Listing 7-18: Performing math on joined Census population estimates tables

Having trouble with data, code, or exercises in Practical SQL, 2nd Edition? I'm glad to help. Please answer these questions, and I'll typically reply within one to two business days.

Please include the chapter number in your issue title. For example: "Chapter 13: Regular Expression Question"

General

Please answer all of these as they're essential for troubleshooting.

What's your operating system (e.g. Windows 10, macOS Monterey, Ubuntu, etc.)?

Windows

Did you install PostgreSQL, PostGIS and pgAdmin according to the steps in Chapter 1? If not, please describe your installation (e.g., using RDS on Amazon Web Services).

Yes

Which versions of PostgreSQL and pgAdmin 4 are you using? If you aren't sure, for PostgreSQL run the SQL command SELECT version(); and for pgAdmin 4, navigate to "About pgAdmin 4" under the "Help" menu.

PostgreSQL 16, pgAdmin 7.2

Did you download the book's code examples and data from GitHub using the directions in Chapter 1? If not, please describe how you loaded the material on your computer.

Yes

Issue or Question

Chapter, page and code listing number:

Chapter 7, page 112 and 113.

Describe the issue or your question:

in the table us_counties_pop_est_2010 you are creating 2 columns with wrong data type, state_fips and county_fips as text.
When you try to run the SELECT from page, you get this error:


ERROR: operator does not exist: integer = text
LINE 10: ON c2019.state_fips = c2010.state_fips
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

SQL state: 42883
Character: 444


Changing the last part of the SELECTION to CAST to integer, makes it to work:

FROM us_counties_pop_est_2019 AS c2019
JOIN us_counties_pop_est_2010 AS c2010
ON c2019.state_fips = c2010.state_fips::int
AND c2019.county_fips = c2010.county_fips::int
ORDER BY pct_change DESC;

Chapter 5, page 69, listing 5-4: Code download does not match book

In the book, start_date is created as the date data type, but in the .sql file for the chapter, it comes up as text type.

-- Listing 5-4: Creating a table to track supervisor salaries

CREATE TABLE supervisor_salaries (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    town text,
    county text,
    supervisor text,
    start_date text,
    salary numeric(10,2),
    benefits numeric(10,2)
);

Windows File/Folder Permissions for COPY

If you're a Windows user, you may need to set permissions for the folder holding the code and data for the book. If you don't, you may see an error similar to this:

ERROR: could not open file "C:\YourDirectory\us_counties_2010.csv" for reading: Permission denied
HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
SQL state: 42501

To correct this, follow the directions in the note on in Chapter 1 (Practical SQL 2nd edition), in the section "Downloading Code and Data from GitHub":

Windows users will need to provide permission for the PostgreSQL database you will
install to read and write to the contents of the practical-sql-2-master folder. To do
so, right-click the folder, click Properties, and click the Security tab. Click Edit and
then Add. Type the name Everyone into the object names box and click OK. Highlight
Everyone in the user list, select all boxes under Allow, and then click Apply and OK.

Chapter 14: Mining Text to Find Meaningful Data

General

Item Info
OS Windows 11 22H2
pgAdmin4 Version 6.19
pgsql PostgreSQL 15.2, compiled by Visual C++ build 1914, 64-bit

Issue or Question

Chapter, page and code listing number:

Thank you for writing such an informative book and providing a comprehensive set of guidelines. I have two small questions about the code examples.

Question 1

Chapter 14 - Pages 253, etc, Listing 14-12 (also involve other codes that contain \n)

UPDATE crime_reports
SET date_1 = 
(
    (regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}'))[1]
        || ' ' ||
    (regexp_match(original_text, '\/\d{2}\n(\d{4})'))[1] 
        ||' US/Eastern'
)::timestamptz
RETURNING crime_id, date_1, original_text;

Describe the issue or your question:

regexp_match(original_text, '\/\d{2}\n(\d{4})') return null

Reason: The problem is caused by Windows 11, and it can work well when I change it to regexp_match(original_text, '\/\d{2}\r\n(\d{4})'). (adding \r before \n)

Here are the shortcuts.

image

image

Question 2

Chapter 14 - Listing 14-22 (also involve other codes that invoke ts_headline())

SELECT president,
       speech_date,
       ts_headline(speech_text, to_tsquery('english', 'tax'),
                   'StartSel = <,
                    StopSel = >,
                    MinWords=5,
                    MaxWords=7,
                    MaxFragments=1')
FROM president_speeches
WHERE search_speech_text @@ to_tsquery('english', 'tax')
ORDER BY speech_date;

Describe the issue or your question:

If I do not add 'english' as the first parameter, the result of ts_headline() will be abnormal.

Reason: without the first parameter, this function does not work as expected in my environment. When I add the first parameter, it works well, i.e., ts_headline('english', speech_text, to_tsquery('english', 'tax'), ....

Here are the shortcuts.

image

image

I'm not sure if it only happens on my computer.

Encountering Error Exporting All Data Chapter 5: Pg. 72

What's your operating system (e.g. Windows 10, macOS Monterey, Ubuntu, etc.)?

Windows 10
Did you install PostgreSQL, PostGIS and pgAdmin according to the steps in Chapter 1? If not, please describe your installation (e.g., using RDS on Amazon Web Services).

Yes
Which versions of PostgreSQL and pgAdmin 4 are you using? If you aren't sure, for PostgreSQL run the SQL command SELECT version(); and for pgAdmin 4, navigate to "About pgAdmin 4" under the "Help" menu.

PostgreSql 16.2 pg admin version 8.2
Did you download the book's code examples and data from GitHub using the directions in Chapter 1? If not, please describe how you loaded the material on your computer.

Yes

Issue or Question

Issue
Chapter, page and code listing number:
5, 72, listing 5-8

Describe the issue or your question:
Whenever I try to export the table to the .txt file it comes up with this error. I have been stuck on this for quite some time and have changed the permissions to "everyone" on the file too but that also did not work.

Please paste the code and error message here. It's OK to alter identifying info such as a folder name

COPY us_counties_pop_est_2019
TO 'C:\Users\Admin\Desktop\SQL\us_counties_export.txt'
WITH (FORMAT CSV, HEADER, DELIMITER '|');

ERROR: could not open file "C:\Users\Admin\Desktop\SQL\us_counties_export.txt" for writing: Permission denied
HINT: COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.

SQL state: 42501

Language Pack missing from StackBuilder (Windows install)

To my Windows users:

Unfortunately, EDB -- the company that provides the Windows installer for PostgreSQL -- currently is not including the Language Pack in its StackBuilder installer. I have filed this ticket on the issue:

EnterpriseDB/edb-installers#80

If this issue arose while installing PostgreSQL downloaded from EDB, you can safely continue with the installation and skip the Language Pack for now. The Language Pack provides support for the Python programming language, which accounts for one small lesson late in the book, and its absence will not prevent you from running 99% of the code.

I'll post here when/if I get clarity on this from EDB.

Chapter-5, Median Script is not working in Postgres 14

I tried to run that median script in chapter 5 which gives me this error now :

ERROR: function array_append(anyarray, anyelement) does not exist
SQL state: 42883

I tried to search for a solution but people mentioned this script won't work on postgres 14.
So do i have to downgrade to version 13 or is there some alternate method to get this median script working or can i just skip this and go around working with percentile cont. funciton ? Or maybe m i doing something wrong here ?

Chapter 4 - Listing 4.2

Using PostgreSQL 15.3...
Nits which may confuse real beginners...
Listing 4.2 could use a "DROP TABLE IF EXISTS number_data_types;", or a different name to avoid table already exists error.

The INSERT enters the numbers as expected; but:
SELECT * FROM number_data_types;
displays:

 0.70   0.7   0.7
 2.14  2.14  2.14
 2.14  2.14  2.14

instead of the full digits. I see the same output in SQL-workbench/J's Data Explorer Data view...
Found SQL-workbench/J setting: Tools>Options...>Data formatting: Decimal digits 2 caused the above output.

For linear regression there is no need for independent or dependent variables to be normally distributed.

On p. 192 book states:

Statisticians also apply additional tests to data before accepting the results of regression analysis, including whether the variables follow the standard bell curve distribution...

There is no need for dependent or independent variables to be normally distributed. The residuals should be normally distributed.

Here is good explanation: https://www.statsimprove.com/en/linear-regression-should-dependent-and-independent-variables-be-distributed-normally/

More information can also be found with this googling: https://www.google.com/search?q=does+the+distribution+of+variables+matter+with+linear+regression&oq=does+the+distribution+of+variables+matter+with+linear+regression&aqs=chrome..69i57.15208j0j1&sourceid=chrome&ie=UTF-8

page-297 Listing 15-16// ERROR:

ERROR: contains: Operation on mixed SRID geometries (MultiPolygon, 0) != (Point, 4269)
I am getting this error for listing 15-16.
Shapfile was loaded successfully
I am using windows 11 machine.

Chapter 20: Which are the the resources for advanced statistics.

There is a note in Chapter 20 section of Identify Key Indicators and Trends over Time.
It says that See the appendix for PostgreSQL resources for advanced statistics.

I jumped to the appendix but not able to find out which resources are about advanced statistics.
Did I miss anything?

Thank you!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.