jhsansom / sycomode

This repository contains code for a class of methods aimed at "consolidating" an LLM's context into its weights.

License: GNU Affero General Public License v3.0

Description

This repository contains code for a class of methods aimed at "consolidating" an LLM's context into its weights. The name "SyCoMode" comes from "Systems Consolidation" in humans, the process by which memories residing in the hippocampus are transferred to the neocortex.

The SyCoMode training objective is loosely based on the one introduced in [1]:

$$ \Pr (w_n | \theta') \approx \Pr (w_n | w_{n-k}, ..., w_{n-1}; \theta) $$

The goal is to learn a set of LLM weights $\theta'$ that, without a prompt, approximates the behavior of the LLM under its original weights $\theta$ and a prompt $w_{n-k}, ..., w_{n-1}$. Thus, the prompt is effectively "consolidated" into the new weights $\theta'$.
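As a rough illustration of what "approximating" the prompted distribution could mean, the sketch below compares a teacher next-token distribution (original weights, prompt in context) with a student distribution (new weights, no prompt) using a KL divergence. This is only an assumption about how the match might be measured: the logits are made-up toy values, the helper names `softmax` and `kl_divergence` are hypothetical, and the actual objectives in objectives.py may differ.

```python
import math


def softmax(logits, temp=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy next-token logits over a 4-word vocabulary (illustrative values only).
teacher_logits = [2.0, 0.5, -1.0, 0.0]  # weights theta, prompt in context
student_logits = [0.1, 0.2, 0.0, -0.1]  # weights theta', no prompt

teacher_probs = softmax(teacher_logits, temp=1.0)
student_probs = softmax(student_logits, temp=1.0)

# A training loop would adjust theta' to drive this value toward zero.
loss = kl_divergence(teacher_probs, student_probs)
```

The loss is zero only when the two distributions agree, which corresponds to the approximate equality in the objective holding exactly for that token position.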

Some possible applications of this technique include:

  • Minimization of Compute Costs: by compressing a system prompt into an LLM's weights, you would no longer need to compute attention and hidden states over the prompt tokens at inference time.
  • Learning Efficiency: by mimicking in-context learning, SyCoMode could potentially be more efficient than causal language modeling.
  • Limiting Hallucinations: once again, by mimicking in-context learning, SyCoMode might perform comparably to RAG [2], which has been shown to reduce hallucinations.

Progress Thus Far

This work is currently unfinished. I have gotten some preliminary experiments to run, but none has been fully effective at consolidating prompts. Thus, I invite you to help me work on this project! Feel free to open a pull request against this repository or use the code in your own project. If you do, please cite this work as follows:

@misc{sycomode,
  title={Systems Consolidation in LLMs: From Context to Weights},
  author={Sansom, Jacob and Glasscock, Creighton and Ma, Ziqiao and Chai, Joyce},
  journal={GitHub Repository},
  url={https://github.com/jhsansom/SyCoMode},
  year={2024}
}

Structure of Code

I have devised three simple experiments, located in experiment1.py, experiment2.py, and experiment3.py. Each file has a description at the top. To run an experiment, use the following command:

python experiment1.py \
    --num-mem-tokens=0  \
    --objective-function=causal_language_model \
    --lr=2e-5 \
    --num-iter=5 \
    --temp=1 \
    --model-name=huggyllama/llama-7b \
    --no-wandb

Each flag shown above is set to its default value. Here is a more detailed description:

  • num-mem-tokens: An integer value specifying how many custom memory tokens to compress the prompt into. If 0, compress the prompt into the weights of the model itself.
  • objective-function: The name of the objective function to train with. See objectives.py for the available options.
  • lr: The learning rate.
  • num-iter: The number of gradient descent steps.
  • temp: The softmax temperature used when converting logits into probability values.
  • model-name: The model name, as stored on HuggingFace.
  • no-wandb: Use this flag if you want to turn OFF W&B. Omit this flag if you DO want to track via W&B.
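For readers who want to see how flags like these are typically wired up, here is a minimal `argparse` sketch. It is not taken from the experiment files: the `build_parser` function and the help strings are hypothetical, and only the flag names and default values come from the command shown above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="Run a SyCoMode experiment.")
    parser.add_argument("--num-mem-tokens", type=int, default=0,
                        help="Number of memory tokens; 0 trains the model weights.")
    parser.add_argument("--objective-function", type=str,
                        default="causal_language_model",
                        help="Objective name, as defined in objectives.py.")
    parser.add_argument("--lr", type=float, default=2e-5,
                        help="Learning rate.")
    parser.add_argument("--num-iter", type=int, default=5,
                        help="Number of gradient descent steps.")
    parser.add_argument("--temp", type=float, default=1.0,
                        help="Softmax temperature.")
    parser.add_argument("--model-name", type=str,
                        default="huggyllama/llama-7b",
                        help="Model name on HuggingFace.")
    # store_true means W&B tracking stays ON unless the flag is passed.
    parser.add_argument("--no-wandb", action="store_true",
                        help="Disable W&B tracking.")
    return parser


# Parsing an empty argument list yields the defaults.
defaults = build_parser().parse_args([])

# Overriding two flags, in the same style as the command above.
custom = build_parser().parse_args(["--temp=0.5", "--no-wandb"])
```

Note that `argparse` converts dashes to underscores, so `--num-mem-tokens` is read back as `defaults.num_mem_tokens`.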

Works Cited

  • [1] Askell, Amanda, et al. "A general language assistant as a laboratory for alignment." arXiv preprint arXiv:2112.00861 (2021).
  • [2] Lewis, Patrick, et al. "Retrieval-augmented generation for knowledge-intensive NLP tasks." Advances in Neural Information Processing Systems 33 (2020): 9459-9474.
