markrogoyski / math-php

Powerful modern math library for PHP. Features descriptive statistics and regressions; continuous and discrete probability distributions; linear algebra with matrices and vectors; numerical analysis; special mathematical functions; algebra.

License: MIT License

PHP 99.98% Makefile 0.02%
math mathematics linear-algebra statistics matrix probability distributions regression numerical-analysis information-theory

math-php's People

Contributors

aboks, adamrogoyski, andreybolonin, aweptimum, aymeric451, balping, beakerboy, dantist, eisberg, jakobsandberg, mark-getit, markrogoyski, nessor, oittaa, radarhere, robsheldon, smoren, theriault, tikky, wizard97

math-php's Issues

Numerical Analysis

I've been using Newton's method to do inverse CDF with great accuracy, since CDF functions are nice and smooth. Where would you like this saved? A new namespace?

A "Numerical" namespace could hold linear interpolation, Newton's method, Euler's Method on Differential Equations...etc (not my area of expertise).
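As a rough illustration of the inverse-CDF idea above, here is a minimal sketch of Newton's method applied to a smooth CDF. The function name `inverseCdf`, its parameters, and the choice of the logistic distribution as a test case are assumptions made for this example, not part of the library (arrow functions require PHP 7.4+).

```php
<?php
// Hypothetical helper: solve cdf(x) = p for x with Newton's method.
// The derivative of a CDF is its PDF, so each step is (cdf(x) - p) / pdf(x).
function inverseCdf(callable $cdf, callable $pdf, float $p, float $guess = 0.0, float $tol = 1e-12, int $maxIter = 100): float
{
    $x = $guess;
    for ($i = 0; $i < $maxIter; $i++) {
        $slope = $pdf($x);
        if ($slope == 0.0) {
            throw new RuntimeException('Zero slope; choose a different initial guess.');
        }
        $step = ($cdf($x) - $p) / $slope;
        $x   -= $step;
        if (abs($step) < $tol) {
            return $x;
        }
    }
    throw new RuntimeException('Newton iteration did not converge.');
}

// Logistic distribution, chosen because the exact inverse is known: ln(p / (1 - p)).
$cdf = fn(float $x): float => 1.0 / (1.0 + exp(-$x));
$pdf = fn(float $x): float => exp(-$x) / (1.0 + exp(-$x)) ** 2;

$x = inverseCdf($cdf, $pdf, 0.9);
```

The result can be compared against the closed-form inverse `log(9)` to check the accuracy of the iteration.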

Is coveralls broken, or disabled?

The last few Pull requests have not had a coveralls summary attached. Also, the coveralls website says:

The file "src/Probability/Distribution/Continuous/Continuous.php" isn't available on github. Either it's been removed, or the repo root directory needs to be updated.

Regression through a point

The regression library should also include regression through a point, like RTO (regression through the origin). However, this technique has different calculations for r, and different confidence interval calculations. Would it make sense to have an abstract Regression class which defines what is needed and available for a regression, and the different types (SLR, RTO, log, power) would each extend this abstract class and implement the required functions?

Function Matrix and OOP

Thinking about the new FunctionMatrix... Some of the parent functions will not work on it unless it's been evaluated into a determined form. It's almost as if the standard matrix should be a child of a FunctionMatrix, where each of the functions is only a scalar.

Mull this over and let me know if you have any ideas on how this should be resolved.

Symmetric matrix class?

Would a separate symmetric matrix class make sense? Only half the matrix data would need to be saved to memory. It would be pretty easy to change the matrix multiplication code to account for the absence of half the values. A special function in the Matrix factory (or somewhere) could take a matrix times its transpose and return a symmetric matrix.

Array Functions?

Have you thought about creating array functions for some of the common array_maps that are done? I'm thinking things like array_add, array_subtract, array_square, array_multiply, stuff like that. Take two arrays of the same length, and add, subtract, etc., item-by-item. It could keep everything more readable.

Just an idea. I know it's not "math" per se, but more convenience. PHP already has array_sum, so you could think of it as an extension to that. Go ahead and close this if you think it's a bad idea.
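A sketch of what such helpers might look like. The names `arrayAdd`, `arrayMultiply`, and `arraySquare` are hypothetical stand-ins for the array_add / array_multiply / array_square idea, not existing functions.

```php
<?php
// Hypothetical helper: guard shared by the two-array operations.
function assertSameLength(array $xs, array $ys): void
{
    if (count($xs) !== count($ys)) {
        throw new InvalidArgumentException('Arrays must have the same length');
    }
}

// Item-by-item sum of two equal-length arrays.
function arrayAdd(array $xs, array $ys): array
{
    assertSameLength($xs, $ys);
    return array_map(fn($x, $y) => $x + $y, $xs, $ys);
}

// Item-by-item product of two equal-length arrays.
function arrayMultiply(array $xs, array $ys): array
{
    assertSameLength($xs, $ys);
    return array_map(fn($x, $y) => $x * $y, $xs, $ys);
}

// Square of each element.
function arraySquare(array $xs): array
{
    return array_map(fn($x) => $x ** 2, $xs);
}

$sum     = arrayAdd([1, 2, 3], [4, 5, 6]);
$product = arrayMultiply([1, 2, 3], [4, 5, 6]);
$squares = arraySquare([1, 2, 3]);
```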

Build an abstract class for numerical integration

I'm building out other numerical integration techniques (Simpson's rule, midpoint rule, etc.) and they're all very similar to the Trapezoidal rule.

I'm thinking of making an abstract class NumericalIntegration that can hold some common methods, such as validate() and sort(), as well as common constants (X = 0, Y = 1 for indexing). Then, the classes for the techniques themselves can just extend the NumericalIntegration class and simply provide a solve() method.

Let me know if there are any problems with this approach.
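The layout described above could be sketched roughly as follows. The method bodies, the signatures, and the point format `[[x, y], ...]` are assumptions for illustration only.

```php
<?php
// Sketch of the proposed parent: shared constants, validate() and sort().
abstract class NumericalIntegration
{
    const X = 0;
    const Y = 1;

    // Each technique only has to provide this.
    abstract public static function solve(array $points): float;

    protected static function validate(array $points, int $minimum = 2): void
    {
        if (count($points) < $minimum) {
            throw new InvalidArgumentException("Need at least $minimum points");
        }
        foreach ($points as $point) {
            if (!is_array($point) || count($point) !== 2) {
                throw new InvalidArgumentException('Each point must be an [x, y] pair');
            }
        }
    }

    // Returns the points ordered by ascending x.
    protected static function sort(array $points): array
    {
        usort($points, fn($a, $b) => $a[self::X] <=> $b[self::X]);
        return $points;
    }
}

class TrapezoidalRule extends NumericalIntegration
{
    public static function solve(array $points): float
    {
        self::validate($points);
        $points = self::sort($points);

        $area = 0.0;
        for ($i = 1; $i < count($points); $i++) {
            $width = $points[$i][self::X] - $points[$i - 1][self::X];
            $area += $width * ($points[$i][self::Y] + $points[$i - 1][self::Y]) / 2;
        }
        return $area;
    }
}

// Unsorted samples of y = x² at x = 0, 1, 2.
$area = TrapezoidalRule::solve([[2, 4], [0, 0], [1, 1]]);
```

With this split, a Simpson's rule or midpoint rule class would reuse validate() and sort() and differ only in solve().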

Discussion - Refactor Distributions

What do you think? Have you looked at my code? Each distribution has a PDF, and CDF function. The parent Distribution class has an inverse method which uses Newton's method to do the inverse CDF. For Student-T, I have a separate "two-tails" method which uses the same process to get the tails.

In reality, this is all well and good for continuous distributions. I need to rename my Distribution class to Continuous and create a Discrete class which will operate in discrete terms.

Additional distribution statistics could be added to each child type without cluttering up one big class; each could have its own mean, mode, skew, etc.

Distribution naming and namespaces

It looks like there are both continuous and discrete uniform interval distributions, which looks like a problem that can be solved by either class naming standards or namespacing. Would you prefer Math\Probability\Distribution\Continuous and Discrete namespaces, or just keep everything in Distribution?

In a similar vein, do you think all the classes and files need to have 'Distribution' in their names? Would things be cleaner to just have 'StandardNormal.php'?

Sum of squares on regression through a point

https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf
page 78

These sum of squares calculations are different when regressing through the origin.

I think the regression set-up needs to be re-thought. Simple linear regression, through a point, and multiple linear will all use the exact same matrix math to calculate the parameters. Also, plenty of other functions can be linearized and passed through a linear least squares calculation, with or without a constant.

For example, V(S) = Vm * S / (Km + S) can be linearized to 1/V = (Km / Vm) * (1/S) + 1/Vm

or y = m ** x can be linearized to a regression through the origin: ln(y) = ln(m) * x

The regression through the origin, or through a point, will have some different stats than simple linear regression. I need to see how R handles it, but in Excel, things like the standard error are not transformed back to the original scale when doing a power or log fit, which Reddit and Stack Exchange say is correct, but I don't understand the argument.

When I get the Jacobian done, we can support non-linear least squares regression, where we truly fit to the source function. Linearization can change the leverage of points along the curve, making some more influential than they should be, although using weighted least squares with the linearization can compensate for this.
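To make the y = m ** x case concrete, here is a small sketch that fits ln(y) = ln(m) * x through the origin and transforms the slope back. The helper name `slopeThroughOrigin` and the sample data are made up for the example; the slope formula β = Σxy / Σx² is the standard least-squares result for a line with no intercept.

```php
<?php
// Hypothetical helper: least-squares slope of a line forced through the origin.
function slopeThroughOrigin(array $xs, array $ys): float
{
    $sumXX = array_sum(array_map(fn($x) => $x * $x, $xs));
    if ($sumXX == 0) {
        throw new InvalidArgumentException('Need at least one non-zero x');
    }
    $sumXY = array_sum(array_map(fn($x, $y) => $x * $y, $xs, $ys));
    return $sumXY / $sumXX;
}

$xs = [1, 2, 3];
$ys = [2, 4, 8];   // generated from y = 2 ** x

// Linearize: regress ln(y) on x through the origin, then undo the log.
$slope = slopeThroughOrigin($xs, array_map('log', $ys));
$m     = exp($slope);
```

Because the sample data is exact, the recovered base should equal 2 up to floating-point rounding.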

Child constructor differs from parent

from the PHP manual:

Unlike with other methods, PHP will not generate an E_STRICT level error message when __construct() is overridden with different parameters than the parent __construct() method has.

Does this mean it's OK to do this? The LinearThroughPoint constructor differs from its parent.

Pearson's Coefficient of determination

The equation you use is basically only valid for simple linear regression. I think it would make sense to add two equations, one for SSreg and SStot, and use the child's $this->evaluate() to calculate each when a user begins a regression. r² = SSreg/SStot; The Regression class could have a function getYhat which maps the Xs array through the evaluate() function. I think the Regression should have something like:

public function getYhat()
{
    // Map every x through the child's evaluate() to get the fitted values.
    return array_map(function ($x) { return $this->evaluate($x); }, $this->xs);
}

public function getSSreg()
{
    $ybar = Average::mean($this->ys);
    return array_sum(array_map(function ($yhat) use ($ybar) { return ($yhat - $ybar) ** 2; }, $this->getYhat()));
}

public function getSStot()
{
    $ybar = Average::mean($this->ys);
    return array_sum(array_map(function ($y) use ($ybar) { return ($y - $ybar) ** 2; }, $this->ys));
}

public function r2()
{
    return $this->getSSreg() / $this->getSStot();
}

Floating point comparisons

I'm working on making a small change to the Continuous::inverse() function. Right now it's hard-coded with an initial guess of .5. This is fine for distributions which either hang out between 0 and 1, or have a median around 0, but if it's a normal distribution with a mean far away from 0, the slope at .5 will be nearly 0.

I'm trying to use the new mean, median, and mode functions to populate the initial guess, but when the initial guess = 0 for StudentsT, the slope calculation says the slope is zero. I have to dig into the code more to see where the evaluation is crapping out, but it could be with some of our comparisons using floating point numbers.

Do we want to define an epsilon for precision, and replace $a == $b with abs(($a - $b) / $b) < $epsilon?
Or use bccomp($a, $b, $scale) == 0?
Any thoughts?
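One possible shape for the epsilon comparison, as a sketch only. The function name `almostEqual` and the default tolerances are assumptions. A plain relative test divides by $b and breaks when $b is 0, so this version scales by the larger magnitude and adds a small absolute floor.

```php
<?php
// Hypothetical helper: tolerant float comparison.
// $epsilon is a relative tolerance; $absTol covers comparisons against 0.
function almostEqual(float $a, float $b, float $epsilon = 1e-9, float $absTol = 1e-12): bool
{
    if ($a === $b) {
        return true;
    }
    $scale = max(abs($a), abs($b));
    return abs($a - $b) <= max($epsilon * $scale, $absTol);
}

$sumMatches   = almostEqual(0.1 + 0.2, 0.3);  // differs from 0.3 only by rounding
$clearlyApart = almostEqual(1.0, 1.1);
$nearZero     = almostEqual(0.0, 1e-15);      // relies on the absolute floor
```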

Confidence interval and prediction interval of a regression

I'd like to add these two statistics to your package and I would like your input on how you would like the results presented.

The confidence interval requires a set of $points, as well as an α (or 1-p). I could either return an array of confidence intervals evaluated at each x in the $points array, or make the user provide an $x to evaluate at. Maybe the latter is better, and if the user wants the former, they could use array_map on $points.

Let me know what you think,
Kevin Nowaczyk

Hypergeometric Functions

In case you didn't specifically notice, I have Hypergeometric functions in the code base now, so if either of you are interested in Bessel Functions or other stuff in the family, have at it.

Population/sample variance...extending

You have functions for population variance and sample variance. I was thinking it might be worthwhile combining these into a common function.

public static function sumOfSquares(array $numbers)
{
    if (empty($numbers)) {
        return null;
    }
    $μ         = Average::mean($numbers);
    // Variable names cannot contain spaces, so the sum is written without them.
    $∑⟮xᵢ−μ⟯² = array_sum(array_map(
        function ($xᵢ) use ($μ) {
            return pow(($xᵢ - $μ), 2);
        },
        $numbers
    ));
    return $∑⟮xᵢ−μ⟯²;
}

public static function variance(array $numbers, $degrees_of_freedom)
{
    return self::sumOfSquares($numbers) / $degrees_of_freedom;
}

public static function populationVariance(array $numbers)
{
    return self::variance($numbers, count($numbers));
}

public static function sampleVariance(array $numbers)
{
    return self::variance($numbers, count($numbers) - 1);
}

The reason is, there are many cases where the degrees of freedom is not just $n or $n - 1. Also, the sumOfSquares function itself is used in so many situations.

Expanded use of traits?

The following sentence popped into my head this weekend, "a regression is the combination of a model and a method".

I was thinking we could create "Models" as objects. Models are functions, but more. A model would necessarily have an evaluateModel() method.

// y = mx + b
trait LinearModel
{
    function evaluateModel(...$params)
    {
        $x = $params[0];
        $m = $params[1];
        $b = $params[2];
        return $m * $x + $b;
    }

    // It can also include other "model specific" functions.
    public function getModelEquation(...$params): string
    {
        $x = $params[0];
        $m = $params[1];
        $b = $params[2];
        return sprintf('y = %fx + %f', $m, $b);
    }

    // Include other stuff as desired. Partial derivatives would be handy for Jacobians.
    // $parameter selects the variable: 1 = ∂y/∂x, 2 = ∂y/∂m, 3 = ∂y/∂b.
    function partialDerivatives($x, $m, $b, $parameter)
    {
        switch ($parameter) {
            case 1:
                return $m;
            case 2:
                return $x;
            case 3:
                return 1;
        }
    }
}

class Regression
{
    // An array of our regression parameters. Use this instead of $this->m and $this->b.
    protected $params;

    // We can then move the evaluate and getEquation code up to the parent.
    // The specific details on how to do this are a little fuzzy, arrays, list of parameters...
    public function evaluate($x)
    {
        // Params is an array of parameters, if the chosen method produces parameters.
        $fitted_params = $this->params;
        return $this->evaluateModel($x, ...$fitted_params);
    }

    public function getEquation($x)
    {
        // Params is an array of parameters, if the chosen method produces parameters.
        $fitted_params = $this->params;
        return $this->getModelEquation($x, ...$fitted_params);
    }
}

// The Linear class then combines a Linear Model with a Least Squares method.
// We could alternatively combine a Logarithm model with LeastSquares, or Exponential with Interpolation.
class Linear extends Regression
{
    use LinearModel;
    use LeastSquares;

    function calculate($ys, $xs)
    {
        // Prepare the data for the chosen method.
    }

    // If we have a non-parametric regression, we will put the evaluate code here,
    // overriding the parent. LOESS or interpolation would be two examples.
    // function evaluate($x)
    // {
    // }
}

We have the existing LeastSquares trait as a method, but we could also use weighted least squares, or non-linear, or LOESS, or whatever.

The job of the regression class is to provide common functions which link a regression method and a model, such as to allow us to arbitrarily evaluate the model at defined points. I'm not saying that the job is to find any sort of universal parameters, because non-parametric regression has no universal parameters. The classes which extend the Regression class are where the model and the method are chosen. Data is prepared for the analysis, and, in the case of non-parametric regressions, functions (like evaluate) may have to be overridden.

I guess "Regression" could be extended to ParametricRegression, and NonParametricRegression...

Vandermonde matrix

https://en.wikipedia.org/wiki/Vandermonde_matrix

Providing an array and an int, produce this matrix. I don't know my design patterns, so would extending Matrix and adding some sort of factory work here?

public function __construct(array $R, int $n)
{
    // One row per element of $R: [1, α, α², ..., αⁿ⁻¹]
    $A = [];
    foreach ($R as $α) {
        $row = [];
        for ($i = 0; $i < $n; $i++) {
            $row[] = $α ** $i;
        }
        $A[] = $row;
    }
    $this->A = $A;
}

Covariance

Your Statistics/Descriptive class has variance. Where would you like to put Covariance? Are you planning on keeping single variable statistics in Descriptive, and multi-factor statistics in either another class, or in Regression?

Bigger picture question...Are you planning on making all your methods static, and is there a reason for this choice? Would it be easier for users to code more like:

$data = [[1,2],[2,3],[5,7]];
$regression = new Regression($data); // load the data and calculate regression parameters
$parameters = $regression->getParameters();  // returns[beta, alpha]
$residuals = $regression->getResiduals(); // an array of the residuals
$regression->evaluate(2.5);    // get yhat at x=2.5

Factorial, double factorial, rising factorial

I see double factorial is in Special Function, while factorial is in Combinatorics. Should they both be in the same place? I'm putting in the Pochhammer function (rising factorial) and am curious where you want me to put it.

SquareMatrix Questions

If I do:

$S = new SquareMatrix([[1,2],[3,4]]);
$N = $S->scalarAdd(5);
echo get_class($N);

Would it be a Matrix, or SquareMatrix? The parent scalarAdd() is set to return a Matrix. If we want to maintain a matrix as a square matrix, would all the parent functions have to be redefined in the child?

The reason I'm asking, is we need to make sure that

    $S = new SquareMatrix([[1,2],[3,4]]);
    $N = $S->columnExclude(1);
    echo get_class($N);

does not return an object of type SquareMatrix. I'm worried that type hinting could cause issues. If I do a series of operations, I wouldn't want to have to manually "upconvert" my Matrix to a SquareMatrix in order to do a specific function.

$X->transpose()->multiply($X)->toSquare()->inverse();

It's like in the stereotypical "Car" class, having to make a special LittleRedCorvette class in order to play Prince on the radio.
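One way to avoid manual "upconverting" would be to have every operation build its result through a factory that picks the class from the shape of the data. The sketch below is only an illustration of that idea: the classes are stripped-down stand-ins, and the `create()` factory is hypothetical.

```php
<?php
// Stand-in classes for illustration only; not the library's implementation.
class Matrix
{
    protected $A;

    public function __construct(array $A)
    {
        $this->A = $A;
    }

    // Hypothetical factory: choose the class from the dimensions of the data.
    public static function create(array $A): Matrix
    {
        $rows = count($A);
        $cols = $rows > 0 ? count($A[0]) : 0;
        return $rows === $cols ? new SquareMatrix($A) : new Matrix($A);
    }

    public function scalarAdd($k): Matrix
    {
        return self::create(array_map(
            fn($row) => array_map(fn($x) => $x + $k, $row),
            $this->A
        ));
    }

    public function columnExclude(int $j): Matrix
    {
        return self::create(array_map(
            function ($row) use ($j) {
                unset($row[$j]);
                return array_values($row);
            },
            $this->A
        ));
    }
}

class SquareMatrix extends Matrix
{
}

$S = new SquareMatrix([[1, 2], [3, 4]]);
$afterAdd     = get_class($S->scalarAdd(5));      // stays square
$afterExclude = get_class($S->columnExclude(1));  // 2x1, no longer square
```

The trade-off is that return type hints can only promise Matrix, so callers that need SquareMatrix-only methods still have to check the class at run time.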

Matrix Math

I don't know much about licensing...Can we just pull in the functions from the PEAR Math Matrix Library as a starting point? Regression functions will be much better with the addition of Matrix functions. Simple Linear Regression is just a simplification of Multiple Linear Regression. Polynomial fit will be arbitrarily easy.
I've used this library before, and my only issue with it as-is is it modifies $this when performing operations like invert() and mult() instead of returning a new object. This means you can't chain operations like $X->Transpose()->mult($X);

Testing distribution::rand()

I don't know unit testing enough to mock it up correctly. I was thinking the test directory could have something like this, but it looks fishy the way it is. Any advice?

class TestDistribution extends Continuous
{
    // The value the test wants randomFloat() to return.
    public static $x;

    public static function rand()
    {
        // Have the parent call the method below which returns a float that is specified by the test.
        // This only works if the parent calls static::randomFloat() (late static binding).
        // If you assume that the PHP random number generator works, we will test if the method works
        // given a specific value.  Then again, if we have already tested the inverse function, in this case we
        // pretty much know it works.
        return parent::rand();
    }

    public static function randomFloat()
    {
        return self::$x;
    }
}

Namespace for Root Finding

I was thinking of creating a new namespace within NumericalAnalysis for RootFinding (or RootApproximation). NewtonsMethod would go in there, and if any others got implemented, like Secant method or Muller's method or whatever, they would join Newton.

Does NumericalAnalysis\RootFinding sound appropriate? Ideas for a better name?

Performance expectations for iterative functions

Some of the iterative functions I've been working on can have results from very large to very small. When developing unit tests, what would you like for the tolerance? I think a relative error makes most sense, especially since a value could be 1e-6. Having the unit test check if it's within .0001 is meaningless. Should I set some sort of tolerance as a constant in the unit test class? I was thinking 0.0001% might be a good place to start.

Accuracy of gamma function

Instead of always using one of the estimations, why not use the definition of gamma when appropriate:

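static function Γ($n){
  $π = \M_PI;
  // Positive integers, including integer-valued floats such as 5.0: Γ(n) = (n − 1)!
  if ($n > 0 && floor($n) == $n){
    return self::fact((int) $n - 1);
  }

  // Positive half-integers: Γ(x + ½) = √π (2x)! / (4ˣ x!)
  // Doubling a float is exact, so this comparison is floating point safe.
  if ($n > 0 && floor($n * 2) == $n * 2){
    $x = (int) floor($n);
    return sqrt($π) * self::fact(2 * $x) / (4 ** $x) / self::fact($x);
  }
  return self::ΓStirling($n);
}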

Organizing Regressions

Since a lot of the typical regression types are linear least squares, with some sort of translation, what do you think about using the PHP trait function to make this available? Once Matrix operations are complete, we could define a trait which contains the least squares matrix operations. Then each type of regression could do their translation, send the code to the trait computation, and then translate the results.

This would allow power, log, linear, and multiple linear to all use the same "complex" math to get their parameters, without having to extend a linear base class, or force the least squares function upon the parent Regression class.
Example:

trait LeastSquaresRegression
{
    function leastSquares($ys, $xs)
    {
        $X = new Matrix($xs);
        $y = new Matrix($ys);
        // β = (XᵀX)⁻¹Xᵀy
        $⟮XᵀX⟯⁻¹Xᵀy = $X->transpose()->mult($X)->inverse()->mult($X->transpose())->mult($y);
        return $⟮XᵀX⟯⁻¹Xᵀy;
    }
}

class PowerLaw extends Regression
{
    use LeastSquaresRegression;

    function calculate()
    {
        // PHP's natural logarithm is log(); there is no ln().
        $transformed_y = array_map('log', $this->ys);
        $transformed_x = array_map('log', $this->xs);

        $betas = $this->leastSquares($transformed_y, $transformed_x);

        // y = b * x^m linearizes to ln(y) = ln(b) + m * ln(x), so the intercept is exponentiated.
        // This assumes $betas[0] is the intercept and $betas[1] the slope; these might be backwards.
        $this->b = exp($betas[0]);
        $this->m = $betas[1];
    }
}

Update Composer description

Currently it just says "Math Library for PHP". I think a couple of quick sentences listing the features will draw in potential users. There are several libraries which offer similar features.

Jacobian matrix

I would like to add the Jacobian matrix to this package. Do you have an opinion on how to implement it? It's the partial derivative of a function evaluated at a variety of X values, with respect to each parameter. Should I make a function that returns a Matrix, or should I make it its own object that extends the Matrix base class?

Once Matrix inverse is done I can use the Jacobian to do non-linear least squares.

Discussion - Standardizing Regression function inputs

Let's discuss how we want the input parameters to be designed for ultimate flexibility. I have my library set up to accept two arrays, one a set of independent variables [[v1,w1,x1],[v2,w2,x2],[v3,w3,x3],...] and the other the dependent array [y1,y2,y3,...]. I like this approach because, although in simple linear regression, the points are defined as (x,y), in three-space, where points are (x,y,z) would z be the dependent variable?...and if this is the case, should we change the name in the Regression class so it's not xs and ys? Would we have to explicitly state "dependent variable is always the last element in each data array"? It seems less cumbersome to do "new LinearRegression(array $ys, array $xs)"...but maybe it's all my years working with Excel biasing me.

Alternatively, we could have Regression define a default, and the children could have their own specialty constructors if need be:

class Regression
{
    public function __construct(...$params)
    {
        if (count($params) == 1) {
            // Set it up to use an array of points:
            // pop the last element off of each point into a new array.
            foreach ($params[0] as $point) {
                $this->ys[] = array_pop($point);
                $this->xs[] = $point;
            }
        } elseif (count($params) == 2) {
            // Use an array of ys and an array of xs as given.
            [$this->ys, $this->xs] = $params;
        }
    }
}

Similarly, $xs could be an array [1,2,3,4,5,4,...], or a simple "Matrix" [[1],[2],[3],[4],[5],...] and the constructor could manage it. I was planning on putting this type of logic in my Regression class, and then calling $this->compute(), which would be a method that each child of Regression would implement. The advantage I see in "$this->compute()" is that each child wouldn't have to be set up like:

class Linear extends Regression
{
    function __construct(...$params)
    {
        parent::__construct(...$params);
        // least squares code or whatever
    }
}

if they wanted to handle class inputs in the default manner. Separating the more universal housekeeping/set-up code from the specialized optimization code.

Similar discussion needs to happen on the input parameters for [RegressionTypes]->evaluate();

Discussion - Should we create a RegressionInterface?

Adding an interface to the parent Regression class would force all non-abstract children of Regression to implement certain methods in a particular way. For example, we will want to ensure that they all have an evaluate() and calculate() method. A non-parametric algorithm like LOESS would not have a defined equation, or parameters. If they (getEquation() or getParameters()) were part of the Interface, the parent would have to implement them and return some sort of error.

Uniform Interval

You have the uniformInterval function throw an exception when x is either below a or above b. According to Wikipedia, the distribution is not undefined at these points, it's just zero. Does it need to throw an exception?
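For reference, a version that returns zero outside the interval might look like the sketch below. The function name `uniformPdf` and its argument order are assumptions for this example; an exception is kept only for an invalid interval.

```php
<?php
// Hypothetical sketch: continuous uniform PDF on [a, b].
// Outside the interval the density is defined and equal to zero.
function uniformPdf(float $a, float $b, float $x): float
{
    if ($b <= $a) {
        throw new InvalidArgumentException('Require a < b');
    }
    if ($x < $a || $x > $b) {
        return 0.0;
    }
    return 1.0 / ($b - $a);
}

$inside = uniformPdf(1, 5, 2);   // 1 / (5 - 1)
$below  = uniformPdf(1, 5, 0);
$above  = uniformPdf(1, 5, 9);
```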

Checking parameters...Test $p

Distribution inverses will be given a $p, or $target parameter, which has the range [0,1]. Would it be best to add this to each distribution's CONST, or is there a way that the parent can initialize this, and have the children add their distribution-specific parameters?

Remove Standard Tables?

I'm guessing these are unnecessary if we are able to calculate everything, and to greater precision, correct?

Regularized incomplete beta function and real numbers

The regularized incomplete beta function as is only works for integers and half integers. Is this correct? Is this going to be an issue? Does it need to work for all real numbers? Is there an algorithm for them?

Regexp error?

I changed the least squares trait to use matrix math, and all the checks passed except
Failed asserting that 'y = 1.000000x + -0.000000' matches PCRE pattern "/^y = \d+[.]\d+x [+] \d+[.]\d+$/".

Is the minus sign screwing things up?
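The pattern requires a digit immediately after "+ ", so a leading minus sign on either number does make the match fail. A sketch of a more permissive pattern follows; the relaxed pattern is a suggestion for the test, not the project's actual fix.

```php
<?php
// The string reported in the failing assertion.
$equation = 'y = 1.000000x + -0.000000';

// Original pattern: no sign allowed in front of either number.
$strict = '/^y = \d+[.]\d+x [+] \d+[.]\d+$/';

// Relaxed pattern: an optional minus sign before each number.
$relaxed = '/^y = -?\d+[.]\d+x [+] -?\d+[.]\d+$/';

$strictMatches  = preg_match($strict, $equation);
$relaxedMatches = preg_match($relaxed, $equation);
```

An alternative would be to format the equation so that a negative intercept is printed as "- 0.000000", but that changes the output rather than the test.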

Student T Distribution CDF for Negative t Value

It seems like the StudentT CDF might be returning the upper cumulative rather than the lower cumulative for negative t values.

For example, compare the output of this calculator for x = -1 and v = 2
(http://keisan.casio.com/exec/system/1180573203)
lower cumulative = 0.211324865405187117745
upper cumulative = 0.788675134594812882255

The StudentT CDF seems to match that calculator's upper cumulative for negative values, whereas it matches the lower cumulative for positive values.
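For ν = 2 the Student's t CDF has a simple closed form, F(t) = 1/2 + t / (2√(t² + 2)), which makes this case easy to check independently. The function name `studentT2Cdf` is made up for this sketch.

```php
<?php
// Closed-form lower cumulative probability for Student's t with ν = 2.
function studentT2Cdf(float $t): float
{
    return 0.5 + $t / (2 * sqrt($t * $t + 2));
}

$lower = studentT2Cdf(-1.0);   // lower tail at t = -1
$upper = 1 - $lower;           // upper tail at t = -1
```

Evaluating this at t = -1 agrees with the calculator's lower cumulative value quoted above, so a CDF that returns about 0.7887 there is reporting the upper tail.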

Additional Numerical Methods

It would be good to have numerical integration and differentiation. Integration would be using the Newton–Cotes formulas, and differentiation by the Symmetric derivative.
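As a small illustration of the differentiation half of this request, here is a sketch of the symmetric (central) difference quotient. The function name `symmetricDerivative` and the default step size are assumptions.

```php
<?php
// Symmetric derivative: f'(x) ≈ (f(x + h) − f(x − h)) / 2h.
// The truncation error shrinks with h², but a very small h amplifies rounding error.
function symmetricDerivative(callable $f, float $x, float $h = 1e-5): float
{
    return ($f($x + $h) - $f($x - $h)) / (2 * $h);
}

// d/dx x³ = 3x², which is 12 at x = 2.
$slope = symmetricDerivative(fn(float $x): float => $x ** 3, 2.0);
```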

Standard normal inheritance

I see you have the standard normal inheriting from Continuous instead of Normal. I'm curious why you changed that?
