This project forked from hackalog/text_embedding

License: MIT License

text_embedding

A playground for exploring text embeddings

GETTING STARTED

  • Create and switch to the virtual environment:
make create_environment
conda activate text_embedding
  • Fetch the raw data and process it into a usable form:
make data
  • Explore the notebooks in the notebooks directory. The rest of the workflow is typically:
make train
make predict
make analysis
make summarize
make publish

For a complete list of available targets, run make with no arguments:

make
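
The workflow above can also be driven from Python, for example from a notebook. The following is a minimal sketch, assuming make is on the PATH and the targets listed under GETTING STARTED exist; the helper name run_workflow and its dry_run flag are hypothetical and not part of this repo.

```python
import subprocess

# Targets in the order they appear under GETTING STARTED.
WORKFLOW_TARGETS = ["data", "train", "predict", "analysis", "summarize", "publish"]


def run_workflow(targets=WORKFLOW_TARGETS, dry_run=False):
    """Run each make target in order, stopping at the first failure.

    Returns the list of commands that were (or, with dry_run, would be) run.
    """
    commands = [["make", target] for target in targets]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if the target fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Only print the commands; drop dry_run=True to actually invoke make.
    for command in run_workflow(dry_run=True):
        print(" ".join(command))
```

With dry_run=True nothing is executed, which makes it easy to check the order of the steps before running them.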

Project Organization

  • LICENSE
  • Makefile
    • Top-level makefile. Type make for a list of valid commands.
  • README.md
    • This file
  • data
    • Data directory, often symlinked to a filesystem with lots of space
    • data/raw
      • Raw (immutable) hash-verified downloads
    • data/interim
      • Extracted and interim data representations
    • data/processed
      • The final, canonical data sets for modeling.
  • docs
    • A default Sphinx project; see sphinx-doc.org for details
  • models
    • Trained and serialized models, model predictions, or model summaries
    • models/trained
      • Trained models
    • models/predictions
      • Output of data run through trained models
  • notebooks
    • Jupyter notebooks. The naming convention is a number (for ordering), the creator's initials, and a short hyphen-delimited description, e.g. 1.0-jqp-initial-data-exploration.
  • references
    • Data dictionaries, manuals, and all other explanatory materials.
  • reports
    • Generated analysis as HTML, PDF, LaTeX, etc.
    • reports/figures
      • Generated graphics and figures to be used in reporting
  • requirements.txt
    • (if using pip+virtualenv) The requirements file for reproducing the analysis environment, e.g. generated with pip freeze > requirements.txt
  • environment.yml
    • (if using conda) The YAML file for reproducing the analysis environment
  • setup.py
    • Turns the contents of src into a pip-installable Python module (pip install -e .) so it can be imported in Python code
  • src
    • Source code for use in this project.
    • src/__init__.py
      • Makes src a Python module
    • src/data
      • Scripts to fetch or generate data. In particular:
      • src/data/make_dataset.py
        • Run with python -m src.data.make_dataset fetch or python -m src.data.make_dataset process
    • src/features
      • Scripts to turn raw data into features for modeling, notably build_features.py
    • src/models
      • Scripts to train models and then use trained models to make predictions, e.g. predict_model.py, train_model.py
    • src/visualization
      • Scripts to create exploratory and results-oriented visualizations, e.g. visualize.py
  • tox.ini
    • tox file with settings for running tox; see tox.testrun.org
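
The data/raw entry above describes downloads as hash-verified. Here is a minimal sketch of what such a check could look like, using only hashlib from the standard library. The function names and the choice of SHA-256 are assumptions for illustration; this is not the code the project uses.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hash):
    """Return True if the file exists and its digest matches expected_hash."""
    path = Path(path)
    return path.is_file() and file_sha256(path) == expected_hash.lower()
```

A fetch step would typically call verify_download right after writing a file under data/raw and refuse to continue on a mismatch.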

Project derived from the cookiecutter data science project template, for experimenting with ideas to improve the template. #cookiecutterdatascience
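
The notebook naming convention described under Project Organization (a number, the creator's initials, and a hyphen-delimited description) can be checked mechanically. The sketch below is one possible reading of that convention. The pattern, the lowercase-only restriction, and the helper name parse_notebook_name are assumptions, not something this repo defines.

```python
import re

# Assumed pattern: <number>-<initials>-<hyphen-delimited description>
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"
    r"-(?P<initials>[a-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_notebook_name(stem):
    """Split a notebook file stem into its parts, or return None if it does not match."""
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_notebook_name("1.0-jqp-initial-data-exploration"))
```

For the example name from the list above, the helper returns the order "1.0", the initials "jqp", and the description "initial-data-exploration".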

Contributors

hackalog, acwooding, jc-healy
