Coder Social home page Coder Social logo

anthonydb / practical-sql-2 Goto Github PK

View Code? Open in Web Editor NEW
429.0 429.0 405.0 279.96 MB

Code and Data for the Second Edition of "Practical SQL" by Anthony DeBarros, published by No Starch Press (2022).

License: Other

PLpgSQL 100.00%
data-analysis postgresql sql

practical-sql-2's People

Contributors

anthonydb avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

practical-sql-2's Issues

Incorrect # of listing on p.240 and p.241

In section "Tabulating City Temperature Readings" on p.240 (bottom) and p.241 you are referring to "Listing 13-18" as to the example of crostab but the listing 13-18 is for creating table and loading data. I think you meant instead "Listing 13-17".

Language Pack missing from StackBuilder (Windows install)

To my Windows users:

Unfortunately, EDB -- the company that provides the Windows installer for PostgreSQL -- currently is not including the Language Pack in its StackBuilder installer. I have filed this ticket on the issue:

EnterpriseDB/edb-installers#80

If this issue arose while installing PostgreSQL downloaded from EDB, you can safely continue with the installation and skip the Language Pack for now. The Language Pack provides support for the Python programming language, which accounts for one small lesson late in the book, and its absence will not prevent you from running 99% of the code.

I'll post here when/if I get clarity on this from EDB.

UNION missing in Index

A search for UNION led me to discover that within a few pages, there are multiple Index entries missing:
EXCEPT 111
INTERSECT 111
UNION 109
Maybe a new index is warranted?
Thanks!

Some errata for 2nd Edition

Hi. Here's some errata I found while translating! ๐Ÿ˜

Chapter 10

180 page
Description for Listing 10-26.
"The second statement renames the copy we made with Listing 10-24 to the original name of the table"
=> "The second statement renames the copy we made with Listing 10-25 to the original name of the table"

Chapter 12

214 page
Link for data dictionary https://www.nyc.gov/html/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf is unavailable.
New location is https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf

Chapter 14

267 page
Above Listing 14-18, It says "It's pipe-delimited and uses an ampersand for quoting." but Listing 14-18 uses @ for quoting.

Chapter 12

The export to CSV in listing 12-9 on my linux machine is doing weird stuff, the query works in pgadmin but the exported CSV file has the values mixed up. Hour 0 starts with the count for hour 8, 1 with hour 9's and so on, anyone else had this issue? Had to use psql utility in pgadmin to actually get the copy to work which worked for earlier chapters.

For linear regression there is no need for independent or dependent variables to be normally distributed.

On p. 192 book states:

Statisticians also apply additional tests to data before accepting the results of regression analysis, including whether the variables follow the standard bell curve distribution...

There is no need for dependent or independent variables to be normally distributed. The residuals should be normally distributed.

Here is good explanation: https://www.statsimprove.com/en/linear-regression-should-dependent-and-independent-variables-be-distributed-normally/

More information can also be found with this googling: https://www.google.com/search?q=does+the+distribution+of+variables+matter+with+linear+regression&oq=does+the+distribution+of+variables+matter+with+linear+regression&aqs=chrome..69i57.15208j0j1&sourceid=chrome&ie=UTF-8

Chapter 4 - Listing 4.2

Using PostgreSQL 15.3...
Nits which may confuse real beginners...
Listing 4.2 could use a "DROP TABLE IF EXISTS number_data_types;", or a different name to avoid table already exists error.

The INSERT enters the numbers as expected; but:
SELECT * FROM number_data_types;
displays:

 0.70   0.7   0.7
 2.14  2.14  2.14
 2.14  2.14  2.14

instead of the full digits. I see the same output in SQL-workbench/J's Data Explorer Data view...
Found SQL-workbench/J setting: Tools>Options...>Data formatting: Decimal digits 2 caused the above output.

Chapter-5, Median Script is not working in Postgres 14

I tried to run that median script in chapter 5 which gives me this error now :

ERROR: function array_append(anyarray, anyelement) does not exist
SQL state: 42883

I tried to search for a solution but people mentioned this script won't work on postgres 14.
So do i have to downgrade to version 13 or is there some alternate method to get this median script working or can i just skip this and go around working with percentile cont. funciton ? Or maybe m i doing something wrong here ?

Encountering Error Exporting All Data Chapter 5: Pg. 72

What's your operating system (e.g. Windows 10, macOS Monterey, Ubuntu, etc.)?

Windows 10
Did you install PostgreSQL, PostGIS and pgAdmin according to the steps in Chapter 1? If not, please describe your installation (e.g., using RDS on Amazon Web Services).

Yes
Which versions of PostgreSQL and pgAdmin 4 are you using? If you aren't sure, for PostgreSQL run the SQL command SELECT version(); and for pgAdmin 4, navigate to "About pgAdmin 4" under the "Help" menu.

PostgreSql 16.2 pg admin version 8.2
Did you download the book's code examples and data from GitHub using the directions in Chapter 1? If not, please describe how you loaded the material on your computer.

Yes

Issue or Question

Issue
Chapter, page and code listing number:
5, 72, listing 5-8

Describe the issue or your question:
Whenever I try to export the table to the .txt file it comes up with this error. I have been stuck on this for quite some time and have changed the permissions to "everyone" on the file too but that also did not work.

Please paste the code and error message here. It's OK to alter identifying info such as a folder name

COPY us_counties_pop_est_2019
TO 'C:\Users\Admin\Desktop\SQL\us_counties_export.txt'
WITH (FORMAT CSV, HEADER, DELIMITER '|');

ERROR: could not open file "C:\Users\Admin\Desktop\SQL\us_counties_export.txt" for writing: Permission denied
HINT: COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.

SQL state: 42501

Chapter 5, page 69, listing 5-4: Code download does not match book

In the book, start_date is created as the date data type, but in the .sql file for the chapter, it comes up as text type.

-- Listing 5-4: Creating a table to track supervisor salaries

CREATE TABLE supervisor_salaries (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    town text,
    county text,
    supervisor text,
    start_date text,
    salary numeric(10,2),
    benefits numeric(10,2)
);

Windows File/Folder Permissions for COPY

If you're a Windows user, you may need to set permissions for the folder holding the code and data for the book. If you don't, you may see an error similar to this:

ERROR: could not open file "C:\YourDirectory\us_counties_2010.csv" for reading: Permission denied
HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
SQL state: 42501

To correct this, follow the directions in the note on in Chapter 1 (Practical SQL 2nd edition), in the section "Downloading Code and Data from GitHub":

Windows users will need to provide permission for the PostgreSQL database you will
install to read and write to the contents of the practical-sql-2-master folder. To do
so, right-click the folder, click Properties, and click the Security tab. Click Edit and
then Add. Type the name Everyone into the object names box and click OK. Highlight
Everyone in the user list, select all boxes under Allow, and then click Apply and OK.

Charpter 7 Issue on Listing 7-18: Performing math on joined Census population estimates tables

Having trouble with data, code, or exercises in Practical SQL, 2nd Edition? I'm glad to help. Please answer these questions, and I'll typically reply within one to two business days.

Please include the chapter number in your issue title. For example: "Chapter 13: Regular Expression Question"

General

Please answer all of these as they're essential for troubleshooting.

What's your operating system (e.g. Windows 10, macOS Monterey, Ubuntu, etc.)?

Windows

Did you install PostgreSQL, PostGIS and pgAdmin according to the steps in Chapter 1? If not, please describe your installation (e.g., using RDS on Amazon Web Services).

Yes

Which versions of PostgreSQL and pgAdmin 4 are you using? If you aren't sure, for PostgreSQL run the SQL command SELECT version(); and for pgAdmin 4, navigate to "About pgAdmin 4" under the "Help" menu.

PostgreSQL 16, pgAdmin 7.2

Did you download the book's code examples and data from GitHub using the directions in Chapter 1? If not, please describe how you loaded the material on your computer.

Yes

Issue or Question

Chapter, page and code listing number:

Chapter 7, page 112 and 113.

Describe the issue or your question:

in the table us_counties_pop_est_2010 you are creating 2 columns with wrong data type, state_fips and county_fips as text.
When you try to run the SELECT from page, you get this error:


ERROR: operator does not exist: integer = text
LINE 10: ON c2019.state_fips = c2010.state_fips
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

SQL state: 42883
Character: 444


Changing the last part of the SELECTION to CAST to integer, makes it to work:

FROM us_counties_pop_est_2019 AS c2019
JOIN us_counties_pop_est_2010 AS c2010
ON c2019.state_fips = c2010.state_fips::int
AND c2019.county_fips = c2010.county_fips::int
ORDER BY pct_change DESC;

page-297 Listing 15-16// ERROR:

ERROR: contains: Operation on mixed SRID geometries (MultiPolygon, 0) != (Point, 4269)
I am getting this error for listing 15-16.
Shapfile was loaded successfully
I am using windows 11 machine.

Chapter 20: Which are the the resources for advanced statistics.

There is a note in Chapter 20 section of Identify Key Indicators and Trends over Time.
It says that See the appendix for PostgreSQL resources for advanced statistics.

I jumped to the appendix but not able to find out which resources are about advanced statistics.
Did I miss anything?

Thank you!

Difference between `CAST()` and `::` on page 56 need more clarification

If I run:

SELECT timestamp_column, CAST(timestamp_column AS varchar(10))
FROM date_time_types;

I am receiving as output original column and modified column.

If I run:

SELECT timestamp_column::varchar(10)
FROM date_time_types;

I am receiving as output only modifying column. So, it seems that those two examples are not identical. I think text needs to clarify why the output of those two examples are different.

Chapter 14: Mining Text to Find Meaningful Data

General

Item Info
OS Windows 11 22H2
pgAdmin4 Version 6.19
pgsql PostgreSQL 15.2, compiled by Visual C++ build 1914, 64-bit

Issue or Question

Chapter, page and code listing number:

Thank you for writing such an informative book and providing a comprehensive set of guidelines. I have two small questions about the code examples.

Question 1

Chapter 14 - Pages 253, etc, Listing 14-12 (also involve other codes that contain \n)

UPDATE crime_reports
SET date_1 = 
(
    (regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}'))[1]
        || ' ' ||
    (regexp_match(original_text, '\/\d{2}\n(\d{4})'))[1] 
        ||' US/Eastern'
)::timestamptz
RETURNING crime_id, date_1, original_text;

Describe the issue or your question:

regexp_match(original_text, '\/\d{2}\n(\d{4})') return null

Reason: The problem is caused by Windows 11, and it can work well when I change it to regexp_match(original_text, '\/\d{2}\r\n(\d{4})'). (adding \r before \n)

Here are the shortcuts.

image

image

Question 2

Chapter 14 - Listing 14-22 (also involve other codes that invoke ts_headline())

SELECT president,
       speech_date,
       ts_headline(speech_text, to_tsquery('english', 'tax'),
                   'StartSel = <,
                    StopSel = >,
                    MinWords=5,
                    MaxWords=7,
                    MaxFragments=1')
FROM president_speeches
WHERE search_speech_text @@ to_tsquery('english', 'tax')
ORDER BY speech_date;

Describe the issue or your question:

If I do not add 'english' as the first parameter, the result of ts_headline() will be abnormal.

Reason: without the first parameter, this function does not work as expected in my environment. When I add the first parameter, it works well, i.e., ts_headline('english', speech_text, to_tsquery('english', 'tax'), ....

Here are the shortcuts.

image

image

I'm not sure if it only happens on my computer.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.