Programming for Data Analysis, Higher Diploma in Data Analytics

Student: Niamh O'Leary

ID: G00376339

Task: The following assignment concerns the numpy.random package in Python. It required the creation of a Jupyter notebook to explain the use of the package, including detailed explanations of at least five of the distributions provided in the package.

There are four distinct tasks to be carried out in your Jupyter notebook.

Explain the overall purpose of the package.
Explain the use of the “Simple random data” and “Permutations” functions.
Explain the use and purpose of at least five “Distributions” functions.
Explain the use of seeds in generating pseudorandom numbers.
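
As a rough illustration of the kind of functions these tasks cover, the sketch below uses a few of the legacy numpy.random functions listed under "Simple random data" and "Permutations" in the NumPy documentation, together with a seed. The seed value 42 and the variable names are arbitrary choices made for this example; they are not taken from the notebook.

```python
import numpy as np

# Fix the seed so that the same pseudorandom values are produced on each run.
np.random.seed(42)

# "Simple random data": floats in [0, 1) and integers in a given range.
uniform_values = np.random.rand(3)              # three floats in [0, 1)
dice_rolls = np.random.randint(1, 7, size=5)    # five integers from 1 to 6

# "Permutations": a shuffled copy of the integers 0 to 9.
shuffled = np.random.permutation(10)

# Resetting the seed replays the same sequence from the start.
np.random.seed(42)
repeat_values = np.random.rand(3)

print(uniform_values)
print(dice_rolls)
print(shuffled)
print(np.array_equal(uniform_values, repeat_values))
```

The last line compares the two runs that followed the same seed, which is the behaviour the fourth task asks to be explained.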

Getting started

Download and install Python and Anaconda. All files associated with this project are available at https://github.com/NiamhOL/programming-for-data-analysis-2019

Packages used in this project

The following packages were used to run statistical analysis and draw graphs for this project.

Python https://www.python.org/downloads/

Anaconda https://www.anaconda.com/distribution/ - is a distribution that makes it easy to perform Python data science and machine learning on Linux, Windows and Mac OS.

iPython https://ipython.org/ - is an interactive command-line terminal for Python.

Numpy http://www.numpy.org/ - is the fundamental package for scientific computing within Python.

Jupyter Notebook https://jupyter.org/ - is an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualisations and narrative text.

Importing packages

The above packages can be imported into Python. Use the import statement in iPython as follows:

'import IPython'
'import numpy as np'
'import matplotlib.pyplot as plt'

Jupyter Notebook is not imported as a module. It is started from the command line with 'jupyter notebook'.

Background

"NumPy's random number routines produce pseudo random numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions." [1] This random number generator was designed with a focus on modelling and simulation. A common task in data analysis is the creation of random samples. NumPy Random provides a way of creating random samples, which can then be used for data analysis.

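The pairing of a BitGenerator with a Generator described above can be sketched as follows. This is a minimal example that assumes NumPy 1.17 or later (the version cited in [1]); the choice of PCG64 as the BitGenerator, the seed 12345 and the distribution parameters are assumptions made for illustration.

```python
import numpy as np
from numpy.random import Generator, PCG64

# The BitGenerator produces the underlying sequence of pseudorandom bits.
bit_generator = PCG64(seed=12345)

# The Generator turns that sequence into samples from statistical distributions.
rng = Generator(bit_generator)

normal_sample = rng.normal(loc=0.0, scale=1.0, size=1000)   # normal distribution
binomial_sample = rng.binomial(n=10, p=0.5, size=1000)      # binomial distribution

# A second Generator built from the same seed reproduces the same values.
rng_again = Generator(PCG64(seed=12345))
normal_again = rng_again.normal(loc=0.0, scale=1.0, size=1000)

print(normal_sample[:5])
print(binomial_sample[:5])
print(np.array_equal(normal_sample, normal_again))
```
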
NumPy functions operate on numbers and are especially useful for data science, statistics and machine learning, which often use very large numeric datasets. An integral part of machine learning and deep learning is data manipulation. NumPy provides an excellent toolkit to help "clean up" data for data manipulation.

The core functionality of NumPy is its "ndarray" data structure, which describes a collection of items of the same type. "Every item in an ndarray takes the same size block in the memory" [2]. Ndarrays can be indexed to allow for analysis and data manipulation.
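
The properties of an ndarray mentioned above can be seen in a short sketch. The array contents and variable names here are invented for the example.

```python
import numpy as np

# Every element of an ndarray shares one data type, so each takes the same
# number of bytes in memory.
values = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(values.dtype)      # data type shared by all items
print(values.itemsize)   # bytes used by each item
print(values.shape)      # (rows, columns)

# Indexing and slicing select parts of the array for analysis.
first_row = values[0]            # first row
second_column = values[:, 1]     # second column of every row
large_values = values[values > 3]  # boolean mask: items greater than 3

print(first_row, second_column, large_values)
```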

This assignment will focus on using NumPy to generate random samples of a population to check the validity of conclusions that are being drawn from the whole population.
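
The idea of checking a conclusion about a population against a random sample can be sketched as below. The population here is simulated (normally distributed values with mean 170 and standard deviation 10), and the seed and sample size are arbitrary; none of these figures come from the notebook. The example assumes NumPy 1.17 or later for default_rng.

```python
import numpy as np

rng = np.random.default_rng(2019)

# Simulated population of 100,000 values (an assumption for this example).
population = rng.normal(loc=170.0, scale=10.0, size=100_000)

# Draw a random sample of 500 values without replacement.
sample = rng.choice(population, size=500, replace=False)

population_mean = population.mean()
sample_mean = sample.mean()

# The sample mean should be close to, but not exactly, the population mean.
print(round(population_mean, 2), round(sample_mean, 2))
```

With a sample of this size the two means typically differ by well under one unit, which is what makes a random sample a practical stand-in for the whole population.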

Jupyter notebook

The Jupyter notebook attached to this project contains the answers to the four tasks.

References

[1] https://numpy.org/doc/1.17/reference/random/index.html

[2] https://www.tutorialspoint.com/numpy/numpy_ndarray_object.htm

Bibliography

Jupyter Documentation https://jupyter.org/documentation

Numpy.random https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.random.html

https://www.python-course.eu/python_numpy_probability.php

https://www.r-craft.org/r-news/how-to-use-numpy-random-choice/
