Coder Social home page Coder Social logo

wilson-score-interval's Introduction

Wilson Score Interval

Simple JavaScript implementation of Wilson score interval. Useful wherever you want to make a confident estimate about the actions or preferences of a general population, given a sample of data (e.g. assigning scores for ranking comments by upvotes, products by popularity, and more).

Table of Contents

Installation

$ npm i wilson-score-interval

Usage

const wilson = require('wilson-score-interval');

/*
  wilson(upVotes, total);
  // upVotes === whatever result you want to estimate the confidence interval for
  // total === your total sample size
*/

wilson(430, 474); // { left: 0.8776750858242243, right: 0.9301239839930541 }
wilson(392, 436); // { left: 0.8672311846637769, right: 0.9239627360567735 }
wilson(10, 14);   // { left: 0.4535045882751561, right: 0.882788120898909 }

Explanation

Less technical:

If you know what a sample population thinks, you can use this tool to estimate the preferences of the population at large.

Suppose your site has a population of 10,000 users. One product has ratings from 100 users (your sample size): 40 upvotes, and 60 downvotes. You want to understand how popular the product would be across the whole population. So you run wilson-score-interval(40, 100), which returns the result { left: 0.3093997461136029, right: 0.4979992153815976 }. Now you can estimate with 95% confidence that between 30.9% and 49.7% of total users would upvote this product.

It is common to use the lower bound of this interval (here, 30.9) as the result, as it is the most conservative estimate of the "real" score.

For a beginner-friendly introduction to confidence intervals for population proportions, see this YouTube video.

More technical:

The Wilson score interval, developed by American mathematician Edwin Bidwell Wilson in 1927, is a confidence interval for a proportion in a statistical population. It assumes that the statistical sample used for the estimation has a binomial distribution. A binomial distribution indicates, in general, that:

  1. the experiment is repeated a fixed number of times;
  2. the experiment has two possible outcomes ('success' and 'failure');
  3. the probability of success is equal for each experiment;
  4. the trials are statistically independent.

This package uses a z-score of 1.96 by default, which translates to a confidence level of 95%.

For more, please see the Wikipedia page on the Wilson score interval and this blog post.

Comparison with other scoring methods

Using a simple calculation of score = (positive ratings) - (negative ratings) or score = average rating = (positive ratings) / (total ratings) proves to be problematic when working with smaller sample sizes, or differences in sample sizes across populations. See this blog post comparing scoring methods for details and examples.

The Wilson score interval is known for performing well given small sample sizes/extreme probabilities as compared to the normal approximation interval, because the formula accounts for uncertainties in those scenarios.

This paper offers a more technical comparison of the Wilson interval with other statistical approaches.

Use cases

Apart from sorting by rating, the Wilson score interval has a lot of potential applications! You can use the Wilson score interval anywhere you need a confident estimate for what percentage of people took or would take a specific action.

You can even use it in cases where the data doesn't break cleanly into two specific outcomes (e.g. 1-5 star ratings), as long as you are able to creatively abstract the outcomes into two buckets (e.g. % of users who voted 4 stars and above vs % of users who didn't).

Examples:

  • Most romantic city on Yelp (wilson-score-interval(num_romantic_searches / num_total_searches))
  • Sorting commments by upvotes on Reddit (wilson-score-interval(num_upvotes / num_total_votes))
  • Creating a 'most shared' list (wilson-score-interval(num_shares / num_total_views))
  • Spam/abuse detection (wilson-score-interval(num_marked_spam / num_total_votes))

wilson-score-interval's People

Contributors

chocolateboy avatar msn0 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

wilson-score-interval's Issues

More descriptive readme with use cases

The Wilson score interval might be ambigious to the people who doesn't know this scoring technique. Readme should describe what it does, what for it's used (e.g. to sort comments, reviews) and to what extent it's different/better from other sorting methods.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.