
corp's Introduction

corp

Assets related to the operation of dbt Labs.

corp's People

Contributors

aescay, amy-byrum, andrewlane, arieldbt, atomatize, brian-gillet, bthomson22, carlyfk, christineberger, clrcrl, drewbanin, eogilvy12, epapineau, ericalouie, ernestoongaro, gwenwindflower, heysweet, ian-fahey, itzak23, janessa-lantz, joellabes, jthandy, leighstaub, megcolon, sierrafontaine, vkahmann


corp's Issues

Link to Core Values file in job postings returns 404

The link to the company values is broken for at least a few of the job postings.

I saw this on the following job postings:

Engineering Manager, Cloud Application
https://boards.greenhouse.io/dbtlabsinc/jobs/4350465005

Engineering Manager, dbt Explorer
https://boards.greenhouse.io/dbtlabsinc/jobs/4351839005

For both of those, the bullet point "Aligns with our core values" links to https://github.com/fishtown-analytics/corp/blob/master/values.md.

That link returns a 404.

I did notice that the posting for Engineering Manager, Orchestration (https://boards.greenhouse.io/dbtlabsinc/jobs/4326424005) works fine since it points to your webpage: https://www.getdbt.com/dbt-labs/values

I hope this is an OK channel for reporting this. I'd want someone to tell me, and since the broken link points into this Git repo, it seemed like the most obvious place.

Remove or adjust the `JOIN ON` recommendation

Summary (recommendation)

Remove or adjust the "ON over USING" recommendation in the style guide.

The USING syntax can make queries more readable and easier to understand when used appropriately.

Details

The style guide recommends using ON instead of USING in the SQL style guide section:

- Avoid the `using` clause in joins, preferring instead to explicitly list the CTEs and associated join keys with an `on` clause.

This is definitely appropriate when there is a mix of (left) joins over different sets of columns.

However, there are very good use cases for the USING syntax in the databases that support it -- in particular, inner joins and full joins where the join column(s) are always the same.

Inner join example

Suppose we have 3 tables: prospects, applicants, and customers, which each share a user_id and each have a corresponding date column. A simple inner join between the 3 of them using the ON syntax might look like:

SELECT
    prospects.user_id,
    prospects.prospect_date,
    applicants.application_date,
    customers.onboard_date
FROM prospects
    INNER JOIN applicants
        ON prospects.user_id = applicants.user_id
    INNER JOIN customers
        ON prospects.user_id = customers.user_id
;

This works fine, but:

  1. You need to inspect the joins to understand that user_id is in each of the tables.
  2. Explicit joins are prone to copy-and-paste errors: it's easy to accidentally write prospects.user_id = applicants.user_id in the customers join.

Alternatively, a simple inner join between the 3 of them using the USING syntax might look like:

SELECT
    user_id,
    prospects.prospect_date,
    applicants.application_date,
    customers.onboard_date
FROM prospects
    INNER JOIN applicants
        USING(user_id)
    INNER JOIN customers
        USING(user_id)
;

To people who are new to USING, the unqualified user_id looks ambiguous -- but to me (as someone who uses USING a lot), omitting the table prefix signals precisely that user_id is in all of the tables.

Additionally, the joins are now much cleaner and no longer prone to copy-and-paste errors.
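A minimal sketch, assuming Python's built-in `sqlite3` module and made-up sample rows, that runs the ON and USING versions of the inner join above and checks that they return the same rows. It also runs a hypothetical copy-and-paste mistake in the ON version to show how it silently changes the result. SQLite is used only because it ships with Python and supports USING; other databases may differ in details.

```python
import sqlite3

# In-memory database with made-up sample rows for the three example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prospects (user_id INTEGER, prospect_date TEXT);
    CREATE TABLE applicants (user_id INTEGER, application_date TEXT);
    CREATE TABLE customers (user_id INTEGER, onboard_date TEXT);
    INSERT INTO prospects VALUES (1, '2023-01-01'), (2, '2023-01-02'), (3, '2023-01-03');
    INSERT INTO applicants VALUES (1, '2023-02-01'), (2, '2023-02-02');
    INSERT INTO customers VALUES (1, '2023-03-01');
""")

ON_QUERY = """
    SELECT prospects.user_id, prospects.prospect_date,
           applicants.application_date, customers.onboard_date
    FROM prospects
        INNER JOIN applicants ON prospects.user_id = applicants.user_id
        INNER JOIN customers ON prospects.user_id = customers.user_id
    ORDER BY 1
"""

# With USING, the unqualified user_id resolves to the single merged join column.
USING_QUERY = """
    SELECT user_id, prospects.prospect_date,
           applicants.application_date, customers.onboard_date
    FROM prospects
        INNER JOIN applicants USING (user_id)
        INNER JOIN customers USING (user_id)
    ORDER BY 1
"""

# Hypothetical copy-and-paste mistake: the customers join repeats the
# applicants condition, so customers is no longer matched on user_id at all.
BUGGY_ON_QUERY = ON_QUERY.replace(
    "ON prospects.user_id = customers.user_id",
    "ON prospects.user_id = applicants.user_id",
)

on_rows = conn.execute(ON_QUERY).fetchall()
using_rows = conn.execute(USING_QUERY).fetchall()
buggy_rows = conn.execute(BUGGY_ON_QUERY).fetchall()

print(on_rows == using_rows)  # True
print(len(buggy_rows))  # 2: user 2 is wrongly paired with user 1's onboard_date
```

The buggy query still runs without error, which is exactly why this kind of mistake is easy to miss in review.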

Full join example

Using the same tables as the previous example, a simple full join between the 3 tables using the ON syntax might look like:

SELECT
    COALESCE(prospects.user_id, applicants.user_id, customers.user_id) AS user_id,
    prospects.prospect_date,
    applicants.application_date,
    customers.onboard_date
FROM prospects
    FULL JOIN applicants
        ON prospects.user_id = applicants.user_id
    FULL JOIN customers
        ON COALESCE(prospects.user_id, applicants.user_id) = customers.user_id
;

Since we have full joins, we need the COALESCE function to pick up the non-null user_id from whichever table has it. This does not scale nicely: joining more tables leads to larger and larger COALESCE calls.

Alternatively, a simple full join between the 3 of them using the USING syntax might look like:

SELECT
    user_id,
    prospects.prospect_date,
    applicants.application_date,
    customers.onboard_date
FROM prospects
    FULL JOIN applicants
        USING(user_id)
    FULL JOIN customers
        USING(user_id)

This is much, much simpler and handles all of the COALESCE-ing for us. It also extends to more tables far more easily than the ON syntax, and still has the benefits laid out in the inner join example.
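To make the scaling argument concrete, here is a small Python sketch that generates the FROM clause of the full join example for any number of tables. The helper names `full_join_on` and `full_join_using` are hypothetical, not part of dbt or any library. With ON, the k-th join needs a COALESCE over the keys of every table joined so far; with USING, every join clause is identical.

```python
def full_join_on(tables: list[str], key: str) -> str:
    """Hypothetical helper: FROM clause for chained FULL JOINs written with ON."""
    lines = [f"FROM {tables[0]}"]
    for i, table in enumerate(tables[1:], start=1):
        # Any table joined so far may be the one holding the non-null key.
        prior = [f"{t}.{key}" for t in tables[:i]]
        left = prior[0] if len(prior) == 1 else f"COALESCE({', '.join(prior)})"
        lines.append(f"    FULL JOIN {table}")
        lines.append(f"        ON {left} = {table}.{key}")
    return "\n".join(lines)


def full_join_using(tables: list[str], key: str) -> str:
    """Hypothetical helper: the same FROM clause written with USING."""
    lines = [f"FROM {tables[0]}"]
    for table in tables[1:]:
        lines.append(f"    FULL JOIN {table}")
        lines.append(f"        USING({key})")
    return "\n".join(lines)


print(full_join_on(["prospects", "applicants", "customers"], "user_id"))
print(full_join_using(["prospects", "applicants", "customers"], "user_id"))
```

For three tables the ON output matches the FROM clause of the example above; with five tables the last ON condition already needs a four-argument COALESCE, and the SELECT list needs a matching COALESCE over all five keys, while the USING version only gains more identical lines.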

An anti-USING example

To clarify, I don't think that USING should always be used: I think it should just be used where it's more appropriate to use than ON (whatever "more appropriate" means). In particular, I don't think USING should be used when different columns are being used in each of the joins.

For example, here's a query that uses the ON syntax:

SELECT
    customers.customer_id,
    loans.loan_id,
    repayments.repayment_id,
    repayments.repayment_date,
    repayments.repayment_value
FROM customers
    LEFT JOIN loans
        ON customers.customer_id = loans.customer_id
    LEFT JOIN repayments
        ON loans.loan_id = repayments.loan_id

This is one where I think the ON syntax is clearer than the USING syntax, as the USING syntax now hides which columns come from which tables:

SELECT
    customer_id,
    loan_id,
    repayments.repayment_id,
    repayments.repayment_date,
    repayments.repayment_value
FROM customers
    LEFT JOIN loans
        USING(customer_id)
    LEFT JOIN repayments
        USING(loan_id)

This is a case where I would prefer the ON syntax over the USING syntax.

I don't know how best to rephrase the recommendation to account for this nuance, which is why I think it'd be best to drop the recommendation and leave it to the developers to use the syntax that is more appropriate for their use case. A stab at a rephrased recommendation is:

- Avoid the `using` clause in joins, preferring instead to explicitly list the CTEs and associated join keys with an `on` clause, unless the joins are all over the same column(s).
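The proposed wording boils down to a single check. As a hedged sketch, with a hypothetical function name and assuming each join's keys are already known as a set of column names, the rule could be expressed as:

```python
def using_is_appropriate(join_keys: list[set[str]]) -> bool:
    """Hypothetical check for the rephrased rule: USING is acceptable only
    when every join in the query is over the same column(s)."""
    if not join_keys:
        return False  # no joins, so there is nothing to choose between
    first = join_keys[0]
    return all(keys == first for keys in join_keys[1:])


# prospects/applicants/customers: every join is on user_id
print(using_is_appropriate([{"user_id"}, {"user_id"}]))  # True
# customers/loans/repayments: customer_id, then loan_id
print(using_is_appropriate([{"customer_id"}, {"loan_id"}]))  # False
```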

For reference, this is the PR that added this item in (it has the rationale in the description and across some of the comments):

Modify model naming conventions to account for model versioning

In the 1.5 release we will be introducing model versioning! 🥂 🥳

As a result, a model will have two names:

  • The model name, which is set by the `name` configuration and is used in the {{ ref() }} function
  • The file name, which shows up in the directory and is referenced by the `defined_in` property in the YAML configuration

We will therefore need to update our style guide to show how we think both the model name and the file name should be structured.

Remove the `UNION ALL` recommendation

Summary (recommendation)

Drop the "UNION ALL over UNION" recommendation in the style guide.

Note this is not a request to change the recommendation to "UNION over UNION ALL" -- this is a request to not make a comment favouring either.

Details

The style guide recommends using UNION ALL instead of UNION in the SQL style guide section:

- Prefer `union all` to `union` [*](http://docs.aws.amazon.com/redshift/latest/dg/c_example_unionall_query.html)

Although the linked page does show an example where the UNION ALL syntax is required over just UNION, this is not representative of most pipelines.

Given that tables usually have a well-defined primary key, defaulting to UNION ALL instead of UNION risks propagating duplicates caused by data quality issues throughout the project, especially in cloud warehouses that don't enforce primary key uniqueness. For large data estates with numerous incremental models and aggregates built on them, fixing the issues caused by these duplicates would require a full refresh at significant cost.
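A minimal sketch, assuming Python's built-in `sqlite3` module and made-up tables (`orders_2023` and `orders_2024` are hypothetical), illustrating the risk: if the same record is loaded into two sources, UNION ALL keeps both copies while UNION drops the exact duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2023 (order_id INTEGER, amount INTEGER);
    CREATE TABLE orders_2024 (order_id INTEGER, amount INTEGER);
    INSERT INTO orders_2023 VALUES (1, 10), (2, 20);
    -- order 2 is (hypothetically) loaded into both tables by mistake
    INSERT INTO orders_2024 VALUES (2, 20), (3, 30);
""")

union_all = conn.execute(
    "SELECT order_id, amount FROM orders_2023 "
    "UNION ALL SELECT order_id, amount FROM orders_2024 ORDER BY order_id"
).fetchall()
union = conn.execute(
    "SELECT order_id, amount FROM orders_2023 "
    "UNION SELECT order_id, amount FROM orders_2024 ORDER BY order_id"
).fetchall()

print(union_all)  # [(1, 10), (2, 20), (2, 20), (3, 30)]
print(union)      # [(1, 10), (2, 20), (3, 30)]
```

Note that UNION only removes rows that are identical in every column; two rows sharing a primary key but differing in another column would survive either operator.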

In general, I don't think the style guide should recommend one over the other -- they have different use cases, so it should be up to the developer to choose whichever is appropriate for their situation.

Update the values based on the presentation made at the November 2019 retreat

This presentation was given at the Fishtown Analytics fall retreat in November 2019. Several changes from this presentation haven't yet been made in values.md:

New value: We work hard and go home.
This value is new, and intentionally so. It's not a rewording of something we already believed; it's a commitment to something we have not been committed to in the past. I personally have been guilty of being a workaholic for most of my adult life, and as a result of this, plus the difficulties inherent in bootstrapping, much of the history of Fishtown Analytics has involved working a lot of long hours. We're making explicit efforts to correct this, and adding this value is a public statement of our commitment.

In doing so, we're stealing Slack's formulation. This formulation rings true: we're committed to maintaining our intensity, but also to confining the part of our days that we all dedicate to work. This is not only the right thing to do; it also promotes long-term sustainability, minimizes burnout, and is more inclusive of those with caregiving or other responsibilities.

New value: We are patient, yet urgent.
This value is also new, but it's not new for us. This is something we've lived since the founding of the company. We work every day with a sense of urgency, but we think strategically and optimize for the long term. The balance between urgency and patience is hard to strike, but the creativity involved in striking it has always been central to our success.

Reformulation: Work done well is its own end.
This used to read "Work done well conveys dignity." While this is not a bad statement, it's not exactly what I was trying to say. Rather, it is that the process of creation, and our full engagement in and commitment to that process, inevitably enriches us. It is not just the nail that is created, it is the blacksmith. I think this formulation gets at that more effectively.

Clarification about CTE in staging model

The style guide recommends using CTEs for transformation steps, ending with a CTE called final. It also recommends using staging models to select from sources.

Would you also recommend using a single CTE called final in these staging models?

E.g. this dbt learn course doesn't have that CTE, and IMHO that makes sense: the only purpose of a staging model is to map the sources, so there shouldn't be any transformation logic in there.

cc @coapacetic (maintainer of that course)
