Task
The following assignment concerns the numpy.random package in Python. It required the creation of a Jupyter notebook explaining the use of the package, including detailed explanations of at least five of the distributions provided in the package.
There are four distinct tasks to be carried out in your Jupyter notebook.
- Explain the overall purpose of the package.
- Explain the use of the “Simple random data” and “Permutations” functions.
- Explain the use and purpose of at least five “Distributions” functions.
- Explain the use of seeds in generating pseudorandom numbers.
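As a rough orientation for the four tasks, the sketch below calls the legacy np.random functions listed under "Simple random data", "Permutations" and "Distributions" in the NumPy reference, plus np.random.seed. The five distributions (normal, uniform, binomial, Poisson, exponential) and all parameter values are illustrative assumptions and are not necessarily the ones covered in the notebook.

```python
import numpy as np

# Seed the legacy global generator so the results are reproducible (task 4)
np.random.seed(42)

# Simple random data (task 2)
floats = np.random.rand(3)               # 3 floats drawn uniformly from [0, 1)
ints = np.random.randint(1, 7, size=5)   # 5 integers from 1 to 6, like dice rolls
pick = np.random.choice(["a", "b", "c"], size=2, replace=False)  # 2 distinct items

# Permutations (task 2)
deck = np.arange(10)
shuffled_copy = np.random.permutation(deck)  # returns a new, shuffled array
np.random.shuffle(deck)                      # shuffles deck in place, returns None

# Five distributions (task 3); parameters chosen for illustration only
samples = {
    "normal": np.random.normal(loc=0.0, scale=1.0, size=1000),
    "uniform": np.random.uniform(low=0.0, high=10.0, size=1000),
    "binomial": np.random.binomial(n=10, p=0.5, size=1000),
    "poisson": np.random.poisson(lam=3.0, size=1000),
    "exponential": np.random.exponential(scale=2.0, size=1000),
}
for name, values in samples.items():
    print(f"{name:12s} mean={values.mean():.3f}")

# Seeds (task 4): re-seeding with the same value repeats the same sequence
np.random.seed(0)
first = np.random.rand(5)
np.random.seed(0)
second = np.random.rand(5)
print(np.array_equal(first, second))  # True
```

Newer code can get the same behaviour from a Generator object (np.random.default_rng), which avoids relying on global state.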
Getting started
Download and install Python and Anaconda. All files associated with this project are available at https://github.com/NiamhOL/programming-for-data-analysis-2019
Packages used in this project
The following packages were used to run the statistical analysis and draw graphs for this project.
Python https://www.python.org/downloads/
Anaconda https://www.anaconda.com/distribution/ - is a Python distribution that makes it easy to do Python data science and machine learning on Linux, Windows and Mac OS.
IPython https://ipython.org/ - is an interactive command-line shell for Python.
Numpy http://www.numpy.org/ - is the fundamental package for scientific computing within Python.
Jupyter Notebook https://jupyter.org/ - is an open-source web application for creating and sharing documents that contain live code, equations, visualisations and narrative text.
Importing packages
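NumPy and Matplotlib are imported into a Python or IPython session with import statements, as follows. IPython and Jupyter Notebook are launched as applications (for example, by running jupyter notebook from the command line) rather than imported:
'import numpy as np'
'import matplotlib.pyplot as plt'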
Background
"NumPy's random number routines produce pseudo random numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions." [1] This random number generator was designed with a focus on modelling and simulation. A common task in data analysis is the creation of random samples, and NumPy Random provides the tools to create samples that can then be used for analysis.
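The BitGenerator/Generator split described above can be sketched as follows, assuming NumPy 1.17 or later (the release that introduced Generator, PCG64 and default_rng). PCG64 is one of the BitGenerators NumPy ships; the seed 12345 is arbitrary.

```python
import numpy as np
from numpy.random import Generator, PCG64

# The BitGenerator (PCG64) produces the raw stream of random bits...
bit_gen = PCG64(12345)
# ...and the Generator turns that stream into samples from distributions
rng = Generator(bit_gen)

normal_sample = rng.normal(loc=0.0, scale=1.0, size=5)
int_sample = rng.integers(low=0, high=10, size=5)  # integers in [0, 10)

# default_rng(seed) is a shortcut that wraps PCG64(seed) in a Generator,
# so a fresh one with the same seed reproduces the same normal draws
rng2 = np.random.default_rng(12345)
print(np.array_equal(rng2.normal(loc=0.0, scale=1.0, size=5), normal_sample))  # True
```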
NumPy functions operate on arrays of numbers, which makes them especially useful for data science, statistics and machine learning, fields that often work with very large numeric datasets. Data manipulation is an integral part of machine learning and deep learning, and NumPy provides an excellent toolkit to help "clean up" data before analysis.
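As a small, hypothetical illustration of this kind of clean-up, the sketch below replaces missing values (NaN) in an array with the mean of the observed values using whole-array operations. The data are made up.

```python
import numpy as np

# Made-up measurements with two missing values
data = np.array([2.5, np.nan, 3.1, 4.0, np.nan, 3.4])

mask = np.isnan(data)                       # True where a value is missing
fill_value = np.nanmean(data)               # mean of the non-missing values: 3.25
cleaned = np.where(mask, fill_value, data)  # vectorised, no Python loop

print(mask.sum())  # 2
print(cleaned)
```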
The core of NumPy is its ndarray data structure, which describes a collection of items of the same type. "Every item in an ndarray takes the same size block in the memory" [2]. ndarrays can be indexed and sliced, which supports analysis and data manipulation.
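A short sketch of these ndarray properties: a single dtype, a fixed per-item size, and indexing. The array contents are arbitrary.

```python
import numpy as np

# An ndarray holds items of a single dtype; here 64-bit integers
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(a.dtype, a.itemsize)  # int64 8  (every item uses an 8-byte block)
print(a.nbytes)             # 48  (6 items x 8 bytes)

# Indexing, slicing and boolean selection
print(a[1, 2])   # 6  (row 1, column 2)
print(a[:, 0])   # [1 4]  (first column)
print(a[a > 3])  # [4 5 6]  (items greater than 3)
```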
This assignment focuses on using NumPy to generate random samples from a population, which can be used to check the validity of conclusions drawn about the whole population.
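A minimal sketch of the idea, assuming a synthetic population: the "population" here is generated from a normal distribution purely for illustration, and Generator.choice with replace=False draws a simple random sample in which each individual is selected at most once. The sample mean is usually close to, but not exactly equal to, the population mean.

```python
import numpy as np

rng = np.random.default_rng(2019)  # arbitrary seed, for reproducibility

# Synthetic population of 100,000 heights in cm (illustrative values only)
population = rng.normal(loc=170.0, scale=10.0, size=100_000)

# Simple random sample of 500 individuals, drawn without replacement
sample = rng.choice(population, size=500, replace=False)

print(f"population mean: {population.mean():.2f}")
print(f"sample mean:     {sample.mean():.2f}")
# Estimated standard error of the sample mean: s / sqrt(n)
print(f"standard error:  {sample.std(ddof=1) / np.sqrt(sample.size):.2f}")
```

The standard error gives a rough sense of how far the sample mean is likely to sit from the population mean; with 500 draws and a spread of about 10 cm it is under half a centimetre.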
Jupyter notebook
The Jupyter notebook attached to this project contains the answers to the four tasks.
References
[1] https://numpy.org/doc/1.17/reference/random/index.html
[2] https://www.tutorialspoint.com/numpy/numpy_ndarray_object.htm
Bibliography
Jupyter Documentation https://jupyter.org/documentation
Numpy.random https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.random.html
https://www.python-course.eu/python_numpy_probability.php
https://www.r-craft.org/r-news/how-to-use-numpy-random-choice/