Coder Social home page Coder Social logo

shawlu95 / beyond-leetcode-sql Goto Github PK

View Code? Open in Web Editor NEW
1.0K 12.0 423.0 1.36 MB

Analysis of SQL Leetcode and classic interview questions, common pitfalls, anti-patterns and handy tricks. Sample databases.

License: MIT License

Jupyter Notebook 100.00%

beyond-leetcode-sql's Introduction

Beyond LeetCode SQL

This repository covers supplementary analysis of SQL for LeetCode and classic interview questions, tradeoff between performance optimization and developmental efficiency, and how it relates to general database design consideration (e.g. indexing and join). Specific sample databases are provided to illustrate tricky interview questions.


LeetCode Selected Probems

Only high-quality problems are selected. Pathological problems such as Find Median Given Frequency of Numbers are not discussed. Entry-level syntax problems such as Combine Two Tables are not discussed.

# Problems Solutions Level Concept
262 Trips and Users MySQL Hard Three-way join; filtering
185 Department Top Three Salaries MySQL, MS SQL Hard Non-equijoin; aggregation; window functionsample
579 Cumulative Salary of Employee MySQL, MS SQL Hard Self-join; left join; aggregation
601 Human Traffic of Stadium MySQL, MS SQL Hard Self-join; de-duplication; window
615 Average Salary MySQL Hard Case; aggregation, join
618 Students Report By Geography MySQL, MS SQL Hard Full join, pivoting

Other undiscussed problems have solutions lumped here.


Classic Interview Questions

This section covers commonly tested concepts during interviews. Many notebooks are inspired by problems people who violated their confidentiality agreement and posted interview materials on Quora, Glassdoor, and 1point3acres. Data are either synthetic or from my personal challenge project.

# Problems Solutions Concept
1 Facebook Advertiser Status MySQL Transition diagram; conditional update
2 Spotify Listening History MySQL Update cumulative sum
3 Monthly Active User MySQL Functional dependency; aggregation; filtering
4 Page Recommendation MySQL Undirected edge; aggregation; existance
5 Pivoting Numeric Data MySQL Pivoting numeric data with case statement
6 Pivoting Text Data MySQL Pivoting text data with self-join
7 Un-pivoting Tables MySQL Un-pivoting tables using cross-join
8 Group by Bins MySQL Create custom column to group by
9 Consecutive Active Users MySQL Self-join, LAG()
10 Recommend Friends MySQL Self-join, de-duplication, aggregation
11 Spotify Similar Friends MySQL Three-way join, de-duplication, aggregation
12 Invalid Search MySQL NULL handling, rate calculation
13 Text Confirmation MySQL Rate calculation
14 Facebook Common Friends MySQL self join, three-way join
15 Facebook Recommend Friend MySQL Self join, four-way join
16 Instagram Common Follower MySQL Self join, Directed edge

Hacks

This section covers esoteric details of SQL language and use cases that may be completely useless in interview. Nevertheless, they come handy when judgement calls and some are simply fun to explore.

# Concept Notebook
1 Random Sampling from Groups MySQL8
2 NULL Pathological Study MySQL8
3 Full Join MySQL8
4 Dynamic Query (Python) MySQL8
5 Stored Procedure MySQL8
6 Hacking Aggregation MySQL8
6 Multi Column Partition MySQL8

Table Optimization

Avoid full-table scan. Used for primary key (automatic), foreign key, commonly used columns.

  • cardinality: the uniqueness of the data.

Index types:

  • single column index
  • unique index (e.g. SSN)
    • cannot create unique index on column with duplicate or NULL values
  • composite index
    • column that get's most recently queried gets placed first
  • implicit index: primary key (automatically unique)

Create index when column:

  • is frequently referneced in order by or group by
  • contains lots of unique values

Avoid index when column:

  • in small table
  • return high percentage of matching data
  • require frequent batch update (drop before updating)
  • contains lots of NULL
  • gets frequently manipulated
  • extremely long string

best practice: rebuild index frequently to reduce fragmentation


Query Optimization

alt-text

  • Place smaller table first when joining multiple tables
  • Largest table is the base table
    • base table is placed on right hand side of equal sign (where clause)
  • Place most restrictive condition last:
    • The condition in the WHERE clause of a statement that returns the fewest rows of data
    • the most restrictive condition was listed last in the WHERE clause,
  • try to use indexed column
FROM TABLE1,   -- Smallest table
     TABLE2,   -- to
     TABLE3    -- Largest table, also base table
WHERE TABLE1.COLUMN = TABLE3.COLUMN    -- Join condition
  AND TABLE2.COLUMN = TABLE3.COLUMN    -- Join condition
[ AND CONDITION1 ]                     -- Filter condition
[ AND CONDITION2 ]                     -- Filter condition
  • Using the like operator and wildcards (flexible search)
  • Avoiding the or operator, use in operator
    • data retrieval is measurably faster by replac- ing OR conditions with the IN predicate
  • Avoiding the HAVING clause
    • try to frame the restriction earlier (where clause)
    • try to keep HAVING clause simple (use constant, not function)
  • avoiding large sort operations
    • it is best to schedule queries with large sorts as periodic batch processes during off-peak database usage so that the performance of most user processes is not affected.
  • Prefer stored procedure
    • compiled and permanently stored in the database in an executable format.
  • Disabling indexes during batch loads
    • When the batch load is complete, you should rebuild the indexes.
    • reduction of fragmentation that is found in the index
  • cost-based optimization: check database server manual
  • Using view: keep the levels of code in your query as flat as possible and to test and tune the statements that make up your views

Formatting for Readability

  • Always begin a new line with each clause in the statement. For example, place the FROM clause on a separate line from the SELECT clause. Then place the WHERE clause on a separate line from the FROM clause, and so on.

  • Use tabs or spaces for indentation when arguments of a clause in the statement exceed one line.

  • Use tabs and spaces consistently.

  • Use table aliases when multiple tables are used in the statement. The use of the full table name to qualify each column in the statement quickly clutters the statement and makes reading it difficult.

  • Use remarks sparingly in SQL statements if they are available with- in your specific implementation. Remarks are great for documentation, but too many of them clutter a statement.

  • Begin a new line with each column name in the SELECT clause if many columns are being selected.

  • Begin a new line with each table name in the FROM clause if many tables are being used.

  • Begin a new line with each condition of the WHERE clause. You can easily see all conditions of the statement and the order in which they are used.


Anti-patterns

This section WILL discusses common pitfalls such as nested selects, redundant temporary tables, unnecessary cross join, unnecessary hashset using distinct key word.

# Anti-patterns Notebook
1 Ambiguous Group MySQL8
2 Bad Subquery MySQL8
3 Fail to Use Index MySQL8

beyond-leetcode-sql's People

Contributors

saeedesmaili avatar sgsfak avatar shawlu95 avatar sinkasula avatar wctjerry avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

beyond-leetcode-sql's Issues

Logic improvement

On question 3 of Text Confirmation problem, your join predicate ignores the fact that a user can send confirmation on both the first day of sign up and second day of sign up.

-- dialect: MySQL

SELECT
  e.user_id
FROM Email AS e
JOIN Text AS t
ON e.user_id = t.user_id
  AND DATEDIFF(t.ts, e.ts) = 1
WHERE t.action = 'CONFIRMED';

A more accurate answer would be to only get the first date the user sent a confirmation and then perform a join and do a date comparison.

-- dialect: PostgreSQL

with first_confirm as (
  select
        user_id,
        ts,
        row_number() over (partition by user_id order by ts) as rn
  from text
  where action = 'CONFIRMED'
) select e.user_id
from email e
  join first_confirm f on e.user_id = f.user_id
  where f.rn = 1 and cast(f.ts as date) -  cast(e.ts as date) = 1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.