annakthrnlee / cryptocurrencies

Using unsupervised machine learning, I created a report that covers which cryptocurrencies are on the trading market and how they could be grouped to create a classification system for this new investment.

Cryptocurrencies

Overview:

Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio to its customers. The company, however, is lost in the vast universe of cryptocurrencies. My job is to create a report that covers which cryptocurrencies are on the trading market and how they could be grouped to create a classification system for this new investment. The data I was given is not in an ideal format for my algorithms, so it needs to be processed before it can be fed to the machine learning models. Since there is no known output for what the company is looking for, I will use unsupervised learning.

  • Using Pandas, I preprocessed the dataset so that PCA could be performed on it.
  • Using the PCA (Principal Component Analysis) algorithm, I reduced the dimensions of the X DataFrame to three principal components and placed these components into a new DataFrame.
  • Next, I clustered the cryptocurrencies using the K-Means algorithm.
  • Finally, using scatter plots created with Plotly Express and hvPlot, I visualized the resulting groups in the space of the three principal components.
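
The dimensionality-reduction step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the random `X` DataFrame, its column names, and the use of `StandardScaler` before PCA are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the preprocessed X DataFrame:
# 50 coins with 6 numeric features (random values, for illustration only).
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(50, 6)),
    columns=[f"feature_{i}" for i in range(6)],
    index=[f"COIN{i}" for i in range(50)],
)

# Assumption: standardize first so that no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the six features to three principal components.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Keep the original index so each row still maps back to its coin.
pcs_df = pd.DataFrame(X_pca, columns=["PC 1", "PC 2", "PC 3"], index=X.index)

print(pcs_df.head())
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio shows how much of the original variation each component keeps, which is a quick check on how much information the reduction discards.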

Resources:

  • Software: Python 3.9.7 and Jupyter Notebook
  • Data: crypto_data.csv

Definitions:

  • K-Means: The K-means algorithm groups the data into K clusters, where membership in a cluster is based on a similarity or distance measure between each data point and the cluster's centroid.
  • PCA: PCA is a statistical technique for reducing the number of input features (or dimensions), which can speed up machine learning algorithms when that number is too high. It transforms a large set of variables into a smaller one that retains most of the information in the original set.
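
To make the K-means definition concrete, here is a small from-scratch sketch in NumPy. It is only an illustration of the idea (assign each point to its nearest centroid, then move each centroid to the mean of its members); the function name `kmeans`, its parameters, and the toy data are hypothetical, and the project itself relies on a library implementation rather than code like this.

```python
import numpy as np


def kmeans(points, k, n_iter=20, seed=0):
    """Minimal K-means (Lloyd's algorithm) with Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last set of centroids.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1), centroids


# Toy data: two well-separated blobs of 10 points each.
data_rng = np.random.default_rng(42)
points = np.vstack([
    data_rng.normal(loc=0.0, scale=0.1, size=(10, 2)),
    data_rng.normal(loc=5.0, scale=0.1, size=(10, 2)),
])

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```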

Results:

After removing non-tradable currencies, null values, and the "IsTrading" column, I created a new DataFrame that holds all of the crypto names. The new DataFrame consisted of 532 rows, meaning that 532 tradable cryptocurrencies were on the market at that time.
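
A minimal sketch of this cleaning step with Pandas might look like the following. The four-row `raw_df` is made-up sample data standing in for the CSV, and the variable names `raw_df` and `names_df` are hypothetical; only the column names are taken from this README.

```python
import pandas as pd

# Tiny made-up sample standing in for the CSV; values are illustrative only.
raw_df = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Algorithm": ["Scrypt", "SHA-256", "X11", "Scrypt"],
        "IsTrading": [True, False, True, True],
        "ProofType": ["PoW", "PoW", "PoS", "PoW/PoS"],
        "TotalCoinsMined": [4.2e7, 1.0e6, None, 1.8e7],
        "TotalCoinSupply": [4.2e7, 2.1e6, 1.0e6, 2.1e7],
    },
    index=["ALP", "BET", "GAM", "DEL"],
)

# Keep only coins that are trading, then drop the now-constant column.
crypto_df = raw_df[raw_df["IsTrading"]].drop(columns="IsTrading")

# Drop rows that contain any null value.
crypto_df = crypto_df.dropna()

# Hold the coin names in their own DataFrame, keyed by the same index,
# and remove them from the feature table.
names_df = crypto_df[["CoinName"]].copy()
crypto_df = crypto_df.drop(columns="CoinName")

print(crypto_df)
print(names_df)
print("Tradable coins:", len(crypto_df))
```

In this toy sample, "BET" is dropped because it is not trading and "GAM" because of its missing value. Text columns such as Algorithm and ProofType would still need a numeric encoding (for example with `pd.get_dummies`) before PCA can be applied.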

Screen Shot 2022-09-06 at 6 33 25 PM

Then I used the K-means algorithm to cluster the cryptocurrencies using the PCA data. The following steps took place:

  • An elbow curve was created using hvPlot to find the best value for K.
  • Predictions were made on the K clusters of the cryptocurrencies’ data.
  • A new DataFrame was created with the same index as the crypto_df DataFrame and the following columns: Algorithm, ProofType, TotalCoinsMined, TotalCoinSupply, PC 1, PC 2, PC 3, CoinName, and Class.
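
The steps above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the notebook's code: the synthetic `pcs_df` stands in for the real PCA output, the range of K values tried is arbitrary, and `clustered_df` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical PCA output: three loose groups of 20 coins in three dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [6, 6, 0], [0, 6, 6]], dtype=float)
pcs = np.vstack([c + rng.normal(scale=0.5, size=(20, 3)) for c in centers])
pcs_df = pd.DataFrame(
    pcs,
    columns=["PC 1", "PC 2", "PC 3"],
    index=[f"COIN{i}" for i in range(len(pcs))],
)

# Elbow curve data: inertia (within-cluster sum of squares) for each candidate K.
k_values = list(range(1, 8))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(pcs_df)
    inertia.append(model.inertia_)
elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
# After `import hvplot.pandas`, elbow_df.hvplot.line(x="k", y="inertia")
# would draw the curve.
print(elbow_df)

# Fit the chosen K and predict a class for every coin.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
predictions = model.fit_predict(pcs_df)

# Attach the class labels; the index is unchanged, so the original
# columns and coin names can be joined back on it.
clustered_df = pcs_df.copy()
clustered_df["Class"] = predictions
print(clustered_df.head())
```

The "elbow" is the value of K after which inertia stops dropping sharply; adding more clusters beyond that point yields little improvement.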

Screen Shot 2022-09-06 at 6 15 28 PM

Elbow Curve:

Screen Shot 2022-09-06 at 6 16 21 PM

New DataFrame:

Screen Shot 2022-09-06 at 6 16 47 PM

3D-Scatter plot with the PCA data and the clusters:

Screen Shot 2022-09-06 at 6 17 46 PM

Finally, I created a new table with the tradable cryptocurrencies. The total number of tradable cryptocurrencies is 532.

Screen Shot 2022-09-06 at 6 20 25 PM

Screen Shot 2022-09-06 at 6 21 00 PM

Summary:

As you can see from the final 2D scatter plot, most of the clusters overlap and do not quite form the distinct groups we had hoped for. That's why I also created a 3D graph to better visualize each group; it shows three distinct groups that correspond to the three clusters we expect the model to break the data into.

Screen Shot 2022-09-06 at 6 27 58 PM

- The 3D scatter plot can be rotated by clicking and dragging with the mouse and zoomed with the scroll wheel. Try hovering over a point to see information such as the value of each principal component.
