Coder Social home page Coder Social logo

sta's Introduction

sta

Simple statistics from the command line interface (CLI), fast.

Description

This is a lightweight, fast tool for calculating basic descriptive statistics from the command line. Inspired by https://github.com/nferraz/st, this project differs in that it is written in C++, allowing for faster computation of statistics given larger non-trivial data sets.

Additions include the choice of biased vs unbiased estimators and the option to use the compensated variant algorithm.

Given a file of 1,000,000 ascending numbers, a simple test on a 2.5GHz dual-core MacBook using Bash time showed sta takes less than a second to complete, compared to 14 seconds using st.

Installing sta

Run ./autogen.sh, ./configure, make, and make install.

Usage

sta [options] < file

Using sta

Imagine you have this sample file:

$ cat numbers.txt
1
2
3
4
5
6
7
8
9
10

Running sta is simple:

$ sta < numbers.txt
N	max	min	sum	mean	sd	
100	10	1	55	5.5	2.87228	 

To extract individual bits of information:

$ sta --sum --sd --var < numbers.txt
sum	sd	var	
55	2.87228	8.25

sta, by default, assumes you have a population of scores, and thus normalises with N. If in fact you have a sample of scores, and wish to know the expected population standard deviation/variance, i.e. normalise with N-1, then just add the --sample flag. See Standard deviation estimation, and Population variance and sample variance

$ sta --sum --sd --var --sample < numbers.txt
sum	sd	var	
55	3.02765	9.16667	

Worried about precision? You can calculate variance instead using the compensated variant algorithm:

$ sta --var --sample --compensated < numbers.txt

Want to compute quartiles? Run:

$ sta --q < numbers.txt
N	min	Q1	median	Q3	max	
100	1	26	50.5	76	100	

How about percentiles? Run:

$ sta --p 50,61 < numbers.txt 
50th	61th	
51	62

Don't want to see the column names? Run:

$ sta --q --brief < numbers.txt
100	1	100	5050	50.5	29.0115

To transpose the output, run:

$ sta --q --transpose < numbers.txt
N	100
min	1
Q1	26
median	50.5
Q3	76
max	100

To supply your own delimiter, run:

$ sta --delimiter $'\t\t' --sd --sum < numbers.txt 
sum		sd		
55		2.87228

or

$ sta --delimiter XXX --sd --sum < numbers.txt 
sumXXXsd
55XXX2.87228

Formatting

sta works with long doubles, and can process numbers in the following formats:

4.7858757E-39
4.7858757e-39
4.7858757

Options

--brief
--compensated
--mean
--median
--min
--max
--percentiles	
--q
--q1
--q3
--sample
--sd
--sderr
--sum
--transpose	

Testing.

The example directory contains 2 scripts to create some example files:

$./examples/create_example_asc.pl n > some_file_with_n_numbers_asc  

and

$./examples/create_example_rand.pl n > some_file_with_n_numbers_rand  

To see how long st or sta takes to output the various statistics, call:

$./examples/time_sta.sh examples/large_file_1m	

and

$./examples/time_st.sh examples/large_file_1m	

ToDo

--Add online variant

Contributing

I've not written C++ in a long time, so please do send comments, suggestions and bug reports to:

https://github.com/simonccarter/sta/issues

Or fork the code on github:

https://github.com/simonccarter/sta

sta's People

Contributors

duta avatar gwern avatar larsmans avatar simonccarter avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.