atecon / pcatools

Package for conducting PCA incl. scree-plot and bi-plot

License: GNU General Public License v3.0

pcaTools

This package augments Gretl's built-in pca command.

The package is a collection of functions for conducting Principal Component Analysis (PCA). It ships two plotting functions: one for creating a so-called scree plot (https://en.wikipedia.org/wiki/Scree_plot) and one for creating a bi-plot (https://en.wikipedia.org/wiki/Biplot).

Furthermore, it supports the computation of sparse PCs, meaning that some loading coefficients may be zero. The loadings are estimated by the forward-stagewise boosting algorithm (Tibshirani; for details see the fsboost Gretl package), which is similar to the Lasso. Currently, however, only the loadings and scores are based on the sparse PC estimates; the estimated variances are not.

Please send bug reports or comments to the gretl mailing list, open an issue on GitHub (https://github.com/atecon/pcaTools/issues) or write to [email protected].

Install from local file

Download the gfn file from https://github.com/atecon/pcaTools/blob/main/src/pcaTools.gfn

To install the package, run the following Gretl command:

pkg install /path/to/pcaTools.gfn --local

Public functions

pcaEst(const list X, bundle opts[null])

This function performs Principal Component Analysis (PCA) on the input data.

Parameters

  • X: list, Variables on which to conduct PCA.
  • opts: bundle, Optional bundle of parameters.

opts can include the following parameters for setting options:

  • do_stdize: bool, Centre variables and divide them by their respective standard deviation (default: TRUE)
  • use_vcv: bool, If TRUE, compute principal components from the variance-covariance matrix; if FALSE, use the correlation matrix (default: TRUE)
  • verbose: bool, Make output more verbose if TRUE (default: FALSE)

Returns

A bundle self containing the results of the PCA.
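
The following is a minimal usage sketch, assuming the package has already been installed. The simulated series x1, x2 and x3 are hypothetical and serve only as illustration; any list of series in the active dataset can be passed instead.

```hansl
# Load the installed function package
include pcaTools.gfn

# Simulated data for illustration only (hypothetical series)
nulldata 200
set seed 1234
series x1 = normal()
series x2 = 0.6 * x1 + normal()
series x3 = -0.4 * x1 + normal()
list X = x1 x2 x3

# Optional settings; keys which are not set keep their defaults
bundle opts = null
opts.verbose = 1

# Run the PCA and print the results to the console
bundle self = pcaEst(X, opts)
pcaPrint(self)
```

As opts is optional, calling pcaEst(X) without a second argument runs the analysis with the default settings.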

pcaPrint(const bundle self)

This function prints the results of a PCA.

Parameters

  • self: bundle, Returned information from the pcaEst() function.

Returns

No return value. This function prints the PCA results to the console.

pcaScreeplot(const bundle self, const string filename[null])

This function generates a scree plot from the results of a PCA. A scree plot is a line plot of the eigenvalues of the factors or principal components in an analysis.

Parameters

  • self: A bundle containing the results of a PCA, typically returned by pcaEst.
  • filename: A string referring to the PATH+FILENAME for storing the plot (optional). If no string is passed, the plot appears on the screen immediately.

One can tweak the plot by passing specific parameters to the bundle self before calling pcaScreeplot(). The following parameters are supported:

  • fontsize: Size of font (default: 10)
  • linedwidth: Width of the line (default: 1.5)

Returns

No return value. This function creates a scree plot.
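
A short sketch of how the scree plot might be customised and stored, assuming the package is installed. The simulated series and the file name screeplot.png are hypothetical choices made for illustration only.

```hansl
include pcaTools.gfn

# Simulated data for illustration only (hypothetical series)
nulldata 200
set seed 1234
series x1 = normal()
series x2 = 0.6 * x1 + normal()
series x3 = -0.4 * x1 + normal()
list X = x1 x2 x3

bundle self = pcaEst(X)

# Tweak the plot through the result bundle before plotting
self.fontsize = 12

# Without a file name the plot appears on the screen
pcaScreeplot(self)

# With a file name the plot is stored; the location in the
# working directory is an arbitrary choice for this example
string fname = sprintf("%s/screeplot.png", $workdir)
pcaScreeplot(self, fname)
```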

pcaBiplot(const bundle self, const string filename[null])

This function generates bi-plots from the results of a PCA, one for each combination of computed principal components, arranged as a matrix of plots. A bi-plot is a plot of two principal components. It shows the scores together with the loadings for each of the two principal components.

Parameters

  • self: A bundle containing the results of a PCA, typically returned by pcaEst.
  • filename: A string referring to the PATH+FILENAME for storing the plot (optional). If no string is passed, the plot appears on the screen immediately.

Returns

No return value. This function generates a bi-plot.

Options passed to the PairPlot() function by means of the 'opts' bundle

One can tweak the plot by passing specific parameters to the bundle self before calling pcaBiplot. The following parameters are supported:

  • centre_biplot: bool, Centre the axes if TRUE, otherwise not (default: TRUE).
  • cols_biplot: int, Number of columns of gridplot (default: NA -> automatically set)
  • color_arrow: string, Color of the arrows depicting the eigenvector (default: "web-blue")
  • color_pattern: string, Color pattern for (factorized) data points of biplot. Either "dark2" or "default" (default: "dark2").
  • factor: series, Distinct values for factorized bi-plot (default: none)
  • fontsize: int, Size of font for the title and variable names (default: 12)
  • fontsize_arrow: int, Size of font of the arrow labels (default: 12)
  • fontsize_key: int, Size of font of the key/legend (default: 8)
  • height_biplot: int, Height of biplot (default: 600)
  • linedwidth: scalar, Width of the line (default: 1.5)
  • linedwidth_arrow: scalar, Width of the lines for the bi-plot arrows (default: 1.0)
  • n_pcs_to_plot: int, Number of first principal components to plot (default: all)
  • offset_label_x: scalar, Offset of labels for arrows along x-axis (default: 0)
  • offset_label_y: scalar, Offset of labels for arrows along y-axis (default: 0)
  • pointtype: int, Point type (default: 4)
  • pointsize: scalar, Size of point (default: 1.0)
  • rows_biplot: int, Number of rows of gridplot (default: NA -> automatically set)
  • sparse_pca: bool, If true, compute sparse PCA, otherwise non-sparse version (default: FALSE)
  • transparency: int, Alpha value in the range [0:255], where 0 means no transparency and 255 full transparency. Currently only applied to the 1st factor.
  • width_biplot: int, Width of biplot (default: 600)
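
As a sketch of how these parameters might be used, the example below restricts the bi-plot to the first two principal components and changes two of the listed settings. It assumes the package is installed; the simulated series and the chosen values are hypothetical and only illustrate the mechanism of adding keys to the bundle before plotting.

```hansl
include pcaTools.gfn

# Simulated data for illustration only (hypothetical series)
nulldata 200
set seed 1234
series x1 = normal()
series x2 = 0.6 * x1 + normal()
series x3 = -0.4 * x1 + normal()
list X = x1 x2 x3

bundle self = pcaEst(X)

# Add plot settings to the result bundle before calling pcaBiplot()
self.n_pcs_to_plot = 2    # plot only the first two components
self.fontsize_key = 10    # larger font for the key/legend
self.pointsize = 0.8      # slightly smaller data points

# Without a file name the plot appears on the screen
pcaBiplot(self)
```

Parameters which are not set keep the defaults listed above.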

Changelog

  • v0.2 (February 2024)

    • Introduce sparse regression-based PCA using the forward-stagewise boosting algorithm for feature selection
    • Improve plotting of the loadings: they are now put on the secondary axis
    • New package dependency: 'fsboost' package
    • Make font size of the key adjustable
  • v0.1 (January 2024)

    • Initial version
