dsc-central_tendency_warmup's Introduction

Summary Statistics Warmup

For this warmup, we will be parsing data held in objects returned from the Spotify API.

import requests
import json
import numpy as np
import matplotlib.pyplot as plt
import pickle
with open('data/jazz_query_response', 'rb') as read_file:
    jazz_tracks = pickle.load(read_file)

Task 1

The jazz_tracks object loaded above is a list of dictionaries. Each dictionary in the list contains data about one song.

The first task is to parse this list, and gather the song length data from each dictionary.

To do so, loop through jazz_tracks, use the appropriate key to access the song length in each dictionary, and append that data point to a list. The resulting list should contain 1000 song lengths.

# Your code here
track_durations = None
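
The loop described above could take roughly the following shape. This is a minimal sketch on a made-up list, not the real jazz_tracks data: `sample_tracks` and its values are invented for illustration, and `duration_ms` is assumed to be the relevant key because it is the one used for durations later in this warmup.

```python
# Hypothetical stand-in for jazz_tracks: a list of dictionaries, one per song.
sample_tracks = [
    {'name': 'Track A', 'duration_ms': 200000},
    {'name': 'Track B', 'duration_ms': 310000},
    {'name': 'Track C', 'duration_ms': 185000},
]

sample_durations = []
for track in sample_tracks:
    # Look up the song length in each dictionary and collect it.
    sample_durations.append(track['duration_ms'])

print(sample_durations)  # [200000, 310000, 185000]
```

The same pattern applies to the full list; only the variable names change.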

Task 2

Now that you have the track durations of each jazz song stored in the track_durations list, calculate the mean song length in our sample of tracks.

Don't use a built-in function, numpy, or the like. Do it from scratch.

# Your code here
track_mean_length = None
print(f'The average track length is {track_mean_length} ms')
The average track length is 238829.013 ms
# Cross check with numpy
np.mean(track_durations)
238829.013
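
As one way to read "from scratch", the sketch below accumulates the total and the count by hand instead of calling `sum`, `len`, or numpy. The function name `mean_from_scratch` and the toy values are assumptions made for illustration only.

```python
def mean_from_scratch(values):
    """Return the arithmetic mean using only a loop and basic arithmetic."""
    total = 0
    count = 0
    for value in values:
        total += value  # running sum of all values
        count += 1      # running count of values seen
    return total / count


toy_lengths = [200000, 310000, 180000]  # made-up track lengths in ms
print(mean_from_scratch(toy_lengths))  # 230000.0
```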

Task 3

Calculate the variance and standard deviation of the sample of track lengths.

Since this is a sample, use the number of tracks in the sample minus 1 as the denominator.

# Your code here
track_length_variance = None
# Cross check with numpy. ddof stands for delta degrees of freedom
np.var(track_durations, ddof=1)
9000620195.035866
# Your code here
track_standard_deviation = None
# Cross check with numpy
np.std(track_durations, ddof=1)
94871.59846358585
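
A hedged sketch of the sample variance and standard deviation, using the n - 1 denominator described above. The function names and the toy data are hypothetical. The standard library's `statistics.variance` and `statistics.stdev` also use n - 1, so they serve here as a cross check in place of numpy.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return sum(squared_deviations) / (n - 1)


def sample_std(values):
    """The sample standard deviation is the square root of the sample variance."""
    return sample_variance(values) ** 0.5


toy = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
print(sample_variance(toy), statistics.variance(toy))
print(sample_std(toy), statistics.stdev(toy))
```

For this toy list the squared deviations sum to 32, so the sample variance is 32 / 7.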

Task 4: Covariance and correlation

The formula for covariance of a sample is

$$s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k)$$

Here are four list variables taken from our Spotify API request.

popularity = [track['popularity'] for track in jazz_tracks]
duration = [track['duration_ms'] for track in jazz_tracks]
total_tracks = [track['album']['total_tracks'] for track in jazz_tracks]
track_number =  [track['track_number'] for track in jazz_tracks]

Write a function that takes in any two of the 4 lists, and returns the covariance between them.

# Your code here
def covariance():
    pass
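
One possible implementation of the sample covariance formula above, shown on toy data. The name `covariance_sketch` and its two-argument signature are assumptions; the exercise leaves the signature open.

```python
def covariance_sketch(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    if len(xs) != len(ys):
        raise ValueError('both lists must have the same length')
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Multiply paired deviations from each mean, then average over n - 1.
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


print(covariance_sketch([1, 2, 3], [2, 4, 6]))  # 2.0
```

Passing the same list twice returns that list's sample variance, which is a quick sanity check.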

The correlation between two lists is the covariance divided by the product of their standard deviations.

Write a function that calculates the correlation. You can call the covariance function you wrote above from within the correlation function.

# Your code here
def correlation():
    pass
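
A minimal sketch of the correlation calculation as defined above: covariance divided by the product of the two sample standard deviations. The helper `sample_cov` is repeated here so the block runs on its own; both function names and the toy lists are hypothetical.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation_sketch(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    std_x = math.sqrt(sample_cov(xs, xs))  # covariance of a list with itself is its variance
    std_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (std_x * std_y)


print(correlation_sketch([1, 2, 3], [2, 4, 6]))  # 1.0
print(correlation_sketch([1, 2, 3], [6, 4, 2]))  # -1.0
```

The result always falls between -1 and 1; the sign gives the direction of the linear relationship and the magnitude its strength.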

Using your function, which two of the four lists above have the strongest correlation? Is the correlation positive or negative? What does this mean?

  • Your written answer here

Task 5

Let's look at a histogram of the jazz track lengths.

# Don't worry about the matplotlib syntax yet
fig, ax = plt.subplots()
ax.hist(track_durations, bins=20)
ax.set_title('Jazz Song Length')
ax.set_xlabel('Song Length in MS');

[Figure: histogram titled 'Jazz Song Length', x-axis 'Song Length in MS']

Describe the shape of this histogram in the markdown cell below. Is it skewed? Which way? Does the mean you calculated above seem correct? Does it indicate the presence of outliers of song length?

  • your answer here

Task 6

Now, let's write a function that takes in any list of track lengths, then prints and returns the mean, variance, and standard deviation of the list.

Feel free to use numpy or other methods to calculate the statistics.

As a bonus, the function is pre-coded to print out a histogram as well.

def track_length_descriptor(track_duration_list, genre=''):
    
    '''
    Params
    ______
    track_duration_list: a list of track lengths in milliseconds 
    returned from the spotify API
    
    genre: a string to add genre to the histogram title.
    
    Returns
    _______
    a list containing the mean, variance, and standard deviation of the tracks
    
    '''
    
    fig, ax = plt.subplots()
    ax.hist(<fill_in>, bins=20)
    ax.set_title(f'{genre} Song Length')
    ax.set_xlabel('Song Length in MS');
    plt.show()
    
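
The statistics portion of such a function might look like the sketch below; the plotting lines are left out so the block runs without matplotlib. The name `describe_lengths` is hypothetical, and the use of the sample (n - 1) variance is an assumption carried over from Task 3, since the warmup does not say which denominator the original function uses.

```python
import statistics


def describe_lengths(track_duration_list, genre=''):
    """Print and return the mean, sample variance, and sample standard deviation."""
    mean = sum(track_duration_list) / len(track_duration_list)
    variance = statistics.variance(track_duration_list)  # n - 1 denominator (assumed)
    std = statistics.stdev(track_duration_list)          # n - 1 denominator (assumed)

    print(f'The mean track length of {genre} songs is {mean} ms.')
    print(f'The variance of track lengths of {genre} songs is {variance} ms^2.')
    print(f'The standard deviation of track lengths of {genre} songs is {std} ms.')

    return [mean, variance, std]


toy_stats = describe_lengths([2, 4, 4, 4, 5, 5, 7, 9], 'Toy')  # made-up data
```

Note that variance is expressed in squared milliseconds, while the mean and standard deviation share the original units.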
with open('data/track_length_lists', 'rb') as read_file:
    classical_track_durations, rap_track_durations, punk_track_durations = pickle.load(read_file)

Use your function to compare track length statistics between classical, rap, and punk songs.

track_length_descriptor(classical_track_durations, "Classical")

[Figure: histogram titled 'Classical Song Length', x-axis 'Song Length in MS']

    The mean track length of Classical songs is 165906.707 ms.
    The variance of track lengths of Classical songs is 12858950170.405153 ms.
    The standard deviation of track lengths of Classical songs is 113397.31112511069 ms.
[165906.707, 12858950170.405153, 113397.31112511069]
track_length_descriptor(rap_track_durations, 'Rap')

[Figure: histogram titled 'Rap Song Length', x-axis 'Song Length in MS']

    The mean track length of Rap songs is 202687.558 ms.
    The variance of track lengths of Rap songs is 2652897639.9546356 ms.
    The standard deviation of track lengths of Rap songs is 51506.28738275198 ms.
[202687.558, 2652897639.9546356, 51506.28738275198]
track_length_descriptor(punk_track_durations, 'Punk')

[Figure: histogram titled 'Punk Song Length', x-axis 'Song Length in MS']

    The mean track length of Punk songs is 216115.196 ms.
    The variance of track lengths of Punk songs is 3202906042.827584 ms.
    The standard deviation of track lengths of Punk songs is 56594.22269832482 ms.
[216115.196, 3202906042.827584, 56594.22269832482]
