
blog-posts's People

Contributors

matteocourthoud, mksalawa


blog-posts's Issues

Question on Conclusion of this ROI Notebook

This is a great notebook, enjoyed reading it. I do have one question that is really bugging me.

It is established that creating an auxiliary variable of revenue divided by cost and regressing it on the treatment:

df["rho"] = df["revenue"] / df["cost"]
smf.ols("rho ~ new_machine", df).fit().summary().tables[1]

does not represent $\frac{\Delta R}{\Delta C}$

But if that is the case, how does the regression at the end, conducted on an auxiliary variable that is essentially
df["revenue"] - df["cost"] plus a couple of constants, adequately represent $\Delta R - \Delta C$? Isn't this the same thing in concept as the ratio above?

Bayesian bootstrap is not more precise after accounting for oversampling

Hey Matteo -

Thank you for your blog post on the Bayesian bootstrap! I've found it quite helpful in adapting it to my own problems and in gaining a better understanding of the differences between the Bayesian and classic bootstrap.

I was trying to replicate your analysis by rewriting some of the code, and I noticed that in the two-level sampling part of your blog you oversample from the dataframe 10x (cell 19). This is the reason you get a more precise, narrower posterior distribution, not just the use of the Bayesian bootstrap. You can check this yourself by oversampling in your classic bootstrap procedure, which results in this:

[image: classic bootstrap distribution with 10x oversampling]

Within the wider context of the blog post, I think you do need to oversample to account for the rare-event cases you describe later in the post. If you don't oversample, there will be draws in which the rare event never appears. You could check this yourself with a regression model that cannot take weights and therefore requires the two-level sampling procedure; since that procedure actually resamples, some draws may fail to fit the model or produce parameter estimates at extreme values.
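To illustrate the point with a hedged sketch (this is not the notebook's code, and the data here are invented): resampling from a 10x-oversampled dataframe mechanically shrinks the spread of a classic bootstrap distribution by roughly a factor of three (the square root of the oversampling factor), independently of whether the weights are Bayesian.

import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=200)   # stand-in for the original sample
oversampled = np.repeat(data, 10)             # 10x oversampling, as in cell 19

def classic_bootstrap_means(x, n_boot=2_000):
    # Classic bootstrap: resample with replacement and record the mean of each resample.
    return np.array([rng.choice(x, size=len(x), replace=True).mean() for _ in range(n_boot)])

print("sd without oversampling:", classic_bootstrap_means(data).std())
print("sd with 10x oversampling:", classic_bootstrap_means(oversampled).std())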

Seemingly wrong chart in the published versions of the CUPED notebook

Hi,

First of all, I found your post on CUPED and its comparisons to diff-in-diff and a simple regression with covariates very useful. I'm in the process of "upgrading" how we analyse experiments at my workplace and your work has helped a lot to clarify things.

However, there's one thing that bothered me in your post: the chart below and the associated table just beneath it.

[image: published chart]

[image: published table]

According to it, the autoregression has the highest variance among all the methods, which I found very counter-intuitive. Surely it would not perform worse than a simple t-test. The text in the article also suggests otherwise, which made me wonder if there was some strange mistake or issue when the blog post was rendered.

I just cloned your repo and re-ran the notebook, and indeed I get the results I would expect:

[image: re-run chart]

[image: re-run table]

I'm not sure what exactly happened - but it would be great to have those corrected! I'm sure I am not the only one who found your blog posts helpful, and another person may take away the wrong conclusion (that auto-regression is really bad).
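For reference, here is a hedged simulation sketch (not the notebook's code; the data-generating process is invented) of why conditioning on the pre-experiment outcome should reduce, not inflate, the variance of the estimated effect relative to a simple difference in means:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
diff_in_means, with_pre_covariate = [], []

# Repeat a small experiment many times and compare the spread of the two estimators.
for _ in range(500):
    n = 500
    treated = rng.integers(0, 2, n)
    y_pre = rng.normal(0, 1, n)                                   # pre-experiment outcome
    y_post = 0.8 * y_pre + 0.2 * treated + rng.normal(0, 0.5, n)  # true effect = 0.2
    df = pd.DataFrame({"treated": treated, "y_pre": y_pre, "y_post": y_post})

    diff_in_means.append(smf.ols("y_post ~ treated", df).fit().params["treated"])
    with_pre_covariate.append(smf.ols("y_post ~ treated + y_pre", df).fit().params["treated"])

print("sd, difference in means:  ", np.std(diff_in_means))
print("sd, regression with y_pre:", np.std(with_pre_covariate))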

Requirements file

I love this series of blog posts. Thanks for writing them!

I'm trying to get some of these notebooks to run and I'm struggling to get versions of the packages to play well together. Could you push a requirements.txt, a pyproject.toml, or the output of pip freeze? I'm using poetry and would be happy to contribute a working .toml file once I have it.
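In the meantime, a minimal sketch of how the pinned versions could be collected; the package list below is a guess based on the imports visible in the notebooks, not the repository's actual dependency set:

from importlib.metadata import version, PackageNotFoundError

# Guessed package list; adjust to match the notebooks' actual imports.
packages = ["numpy", "pandas", "statsmodels", "matplotlib", "seaborn"]

for pkg in packages:
    try:
        print(f"{pkg}=={version(pkg)}")   # lines in requirements.txt format
    except PackageNotFoundError:
        print(f"# {pkg} is not installed")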

ERROR: cannot import name 'dgp_educ_wages' from 'src.dgp'

The line:
from src.dgp import dgp_educ_wages

throws me this error:
ImportError: cannot import name 'dgp_educ_wages' from 'src.dgp' (/content/Blog-Posts/src/dgp.py)

And indeed, searching through the file dgp.py, I could not find 'dgp_educ_wages'.
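A quick way to check which generators the module actually exposes (a sketch, assuming the notebook runs from the repository root so that src is importable):

# List the dgp_* data-generating functions defined in src/dgp.py.
import src.dgp as dgp

available = [name for name in dir(dgp) if name.startswith("dgp")]
print(available)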

dag_collections and dag class imports

Hi there, I was trying to run some of the notebooks to follow along. However, I couldn't seem to import the functions/classes correctly.

[image: import error]

Is the current dag/folder hierarchy functional for the notebook, or am I missing something?
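In case it is a path issue, a hypothetical workaround (assuming the notebook is run from a subfolder of the repo) is to put the repository root on sys.path so that src and its modules resolve:

import sys
from pathlib import Path

# Adjust this if the notebook lives somewhere else in the repository.
repo_root = Path.cwd().parent
sys.path.append(str(repo_root))

# Imports of the form `from src.dgp import ...` should now resolve.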
