word-embeddings

Package contents:

  • frequency_data folder: CSVs produced by frequency_experiment.py. Each row has the format "[token index], [number of times token was generated]".
    • frequencies_of_random_vectors.csv: Results of using random token embeddings instead of GPT-J's token embeddings (random seed not saved).
    • frequencies_old.csv: My first run of frequency_experiment.py (random seed not saved). WARNING: this file is formatted differently; in particular, tokens that were never generated do not appear in the CSV.
    • frequencies_random_seed_1.csv: Results of frequency_experiment.py with random seed 1.
  • frequency_plots folder: Analysis plots are saved here
  • README.md: The README
  • analysis.py: Calls several methods of analysis on data once it has been generated by frequency_experiment.py
  • vocab.json: GPT-J's vocabulary (the mapping of tokens to indexes), from https://huggingface.co/EleutherAI/gpt-j-6B/tree/main
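
As a minimal sketch of how the data files and vocab.json fit together, the following reads rows in the "[token index], [count]" format and looks up token strings through a token-to-index mapping. The function names `load_frequencies` and `index_to_token` are hypothetical and not part of the package. The sketch assumes the CSV has no header row and that vocab.json maps token strings to integer indexes. The sample data is made up.

```python
import csv
import io
import json


def load_frequencies(csv_file):
    """Read "[token index], [count]" rows into a dict of index -> count."""
    counts = {}
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        counts[int(row[0])] = int(row[1])  # int() tolerates the space after the comma
    return counts


def index_to_token(vocab):
    """Invert a token -> index mapping (the assumed layout of vocab.json)."""
    return {index: token for token, index in vocab.items()}


# Tiny in-memory stand-ins for the real files (made-up values).
sample_csv = io.StringIO("0, 12\n1, 0\n3, 7\n")
sample_vocab = json.loads('{"a": 0, "b": 1, "c": 2, "d": 3}')

counts = load_frequencies(sample_csv)
tokens = index_to_token(sample_vocab)

# Index with the highest count, and its token string.
most_generated = max(counts, key=counts.get)
print(tokens[most_generated], counts[most_generated])
```

With the real files, the same functions could be called on `open("frequency_data/frequencies_random_seed_1.csv", newline="")` and on `json.load(open("vocab.json"))`. Using `counts.get(index, 0)` treats an absent index as never generated, which covers one known difference of frequencies_old.csv, though that file may differ in other ways.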

Generating Frequency Data:

  1. Download the GPT-J parameters. They are in the file 'tf_model.h5' at https://huggingface.co/EleutherAI/gpt-j-6B/tree/main.
  2. Run the main method of frequency_experiment.py. Optionally, change its random seed and the outfile name.
  3. If you wish to generate frequencies from random token embeddings instead of GPT-J's embeddings, use the weights=np.random.normal line in frequency_experiment.py.
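
The random-embedding option in step 3 can be sketched as follows. This is an illustration under stated assumptions, not the code in frequency_experiment.py: the helper `random_token_embeddings` is hypothetical, and the shape is passed in by the caller rather than read from GPT-J's weights. Seeding NumPy before the draw makes a run repeatable, which the runs with unsaved seeds are not.

```python
import numpy as np


def random_token_embeddings(shape, seed=None):
    """Draw a matrix of standard-normal entries to stand in for token embeddings."""
    if seed is not None:
        np.random.seed(seed)  # fix NumPy's global random state so the draw is repeatable
    weights = np.random.normal(size=shape)
    return weights


# Small made-up shape (number of tokens x embedding dimension) for illustration.
weights = random_token_embeddings((8, 4), seed=1)
print(weights.shape)
```

Calling the helper twice with the same seed and shape returns identical matrices, so the seed can be recorded alongside the outfile name.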

Analyzing Frequency Data:

  1. Run analysis.py.
  2. If your data files have different names, you may need to change the file_name lines in analysis.py.
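
When the data files have different names, one option is to pass the file name as a parameter instead of editing it in place. The sketch below is hypothetical and does not reproduce analysis.py: the function `summarize` and its output keys are invented for illustration, and it assumes the "[token index], [count]" format with no header row.

```python
import csv
import os
import tempfile


def summarize(file_name):
    """Return simple summary statistics for one frequency CSV."""
    with open(file_name, newline="") as f:
        counts = [int(row[1]) for row in csv.reader(f) if row]
    return {
        "tokens_listed": len(counts),
        "total_generated": sum(counts),
        "never_generated": sum(1 for c in counts if c == 0),
    }


# Demonstration on a temporary file with made-up counts.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "demo_frequencies.csv")
    with open(demo_file, "w", newline="") as f:
        f.write("0, 5\n1, 0\n2, 3\n")
    summary = summarize(demo_file)

print(summary)
```

The same function could then be applied to each file in turn, for example in a loop over a list of file names.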
