anarchy-ai / llm-vm
irresponsible innovation. Try now at https://chat.dev/
Home Page: https://anarchy.ai/
License: MIT License
Should have a flag that downloads the model from one of the OSS repos.
Currently the config setup doesn't check that settings make sense.
localhost because that easily becomes a remote interface
Currently the library depends on the web server, when there's no good reason to require it.
Hello, I downloaded the project (without installing it) as I wanted to edit it for the bounties.
As soon as I tried to run it, it gave me an error like this:
Code: from llm_vm.utils.keys import *
Error: No module named 'llm_vm'
If I had installed it, there's no doubt it would've worked. However, when doing imports from the same project I think the usual way to navigate through folders is using dots, like:
from ...utils.tools import *
Is there another way to go about this?
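The usual fix is an editable install (`pip install -e .` from the repo root), after which `from llm_vm.utils.keys import *` works from anywhere. As a quick, unofficial workaround without installing, you can also put the directory that contains the `llm_vm/` package on `sys.path` first (the path below is a placeholder, adjust it to your clone):

```python
# Quick workaround (not the project's official guidance) for running from a clone
# without installing: make the directory that contains llm_vm/ importable.
import sys
sys.path.insert(0, "/path/to/LLM-VM/src")  # adjust to wherever the llm_vm/ folder lives in your clone

from llm_vm.utils.keys import *  # should now resolve, same as after a real install
```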
Add linting and auto-formatting to keep code clean and improve efficiency
QLoRA (an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance).
Converting LLMs to 4-bit quantization with QLoRA will reduce the required resources, speed up the fine-tuning and inference steps, and be much cheaper.
Refer to the following link for reference:
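A minimal sketch of what 4-bit loading for QLoRA fine-tuning might look like with Hugging Face transformers + bitsandbytes; the model name and settings below are placeholders, not part of this repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization config so a large base model fits in limited GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "EleutherAI/gpt-neo-2.7B"  # placeholder; any causal LM on the Hub
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

LoRA adapters would then be attached on top of this quantized base model for the actual fine-tuning step.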
Currently data_synthesis.py asks GPT to generate all the datapoints in one call. This leads to many issues: lots of incorrect datapoints and not enough datapoints overall. I fixed this in the experimentation_finetuning branch, and that approach to data synthesis needs to be brought to main and made robust (my current implementation is for research, not deployment).
One thing that still needs to be done is semantic-similarity checking, the way the current data_synthesis.py does it. My implementation in experimentation_finetuning also does not support lists as inputs, and this needs to be added as well.
@itsmeashutosh43 and @VictorOdede I would love your feedback and ideas on this.
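A rough sketch of the one-datapoint-per-call approach (the helper names generate_one and is_valid are hypothetical, not the current API):

```python
# Hypothetical sketch: request datapoints one at a time instead of all in one call,
# validating and deduplicating as we go, until we have enough.
def synthesize(prompt_example, n_points, generate_one, is_valid):
    seen, accepted = set(), []
    attempts = 0
    while len(accepted) < n_points and attempts < 10 * n_points:
        attempts += 1
        candidate = generate_one(prompt_example)   # one LLM call -> one (input, output) pair
        if candidate is None or not is_valid(candidate):
            continue                               # skip malformed datapoints instead of failing the batch
        key = repr(candidate)
        if key in seen:
            continue                               # drop exact duplicates
        seen.add(key)
        accepted.append(candidate)
    return accepted
```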
Prompted by thinking about what we want out of regex/grammar-constrained LLM inference, we've realized that we should just embrace a more generic interface, of which those would be examples.
I'm very much not familiar with Python best practices, but here's my attempt at specifying it from a slightly simplified types perspective.
"this is crudely our current api"
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Set

class LLM(ABC):
    model: "AbstractHF_Model"
    tokenizer: "Abstract_HF_Tokenizer"

    @abstractmethod
    def gen_token(self, prefix: List[int]) -> Dict[int, float]:
        """Map each candidate token id to its log-probability given the prefix."""

    @abstractmethod
    async def generate_simple(self, *args, **kwargs) -> str:
        """Our current generate; everything else is passed through to hf generate."""

    @abstractmethod
    async def generate_with_constraints(self, constraint: "TokenConstraint", *args, **kwargs) -> str:
        """Same as generate_simple, plus a TokenConstraint. So let's talk about TokenConstraint!"""

class TokenConstraint(ABC):
    constraint_type: type  # the type of constraints we want to have
    state_type: type       # ideally frozen/immutable, but described so it kind of works either way

    @abstractmethod
    def is_valid(self, constraint, prefix: List[int]) -> bool: ...

    @abstractmethod
    def allowed_transitions(self, constraint, prefix: List[int], tk) -> Set[int]: ...

    @abstractmethod
    def construct_state(self, constraint, prefix: List[int], tk) -> Optional[object]:
        """Return a state that *may* allow faster enumeration/checking of which tokens are allowed next."""

    @abstractmethod
    def construct_crude_filter_set(self, constraint, tk) -> Set[int]:
        """E.g. for a regex, something like "the set of tokens built only from the character classes
        referenced in the regexp". If constructing the current state is at all expensive, this should
        be *much* faster for filtering tokens."""

    @abstractmethod
    def allowed_transitions_from_state(self, constraint, tk, state) -> Set[int]: ...

    @abstractmethod
    def copy_state(self, state):
        """If the state is immutable/frozen this should be the identity function."""
Something like this could maybe be the generic interface for constrained inference. I'm glossing over a lot of details: we're actually going to be zeroing out the token indices that aren't in these sets in the log-probability vectors, all the mapping back and forth between token ids and strings, and the fact that we can do the token-constraint computation in parallel with running the gen_next_token_inference step.
Also, in principle, things should be such that we can have instances of this interface that logically And, Or, Xor, or Nand together any two constraint languages, I think?
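For instance, an And-combination could just intersect the allowed-token sets of its two sub-constraints. A purely hypothetical sketch against the interface above (class and attribute names are illustrative):

```python
# Hypothetical sketch: a token is allowed only if both sub-constraints allow it.
class AndConstraint:
    def __init__(self, c1, impl1, c2, impl2):
        self.c1, self.impl1 = c1, impl1   # first constraint value and its TokenConstraint impl
        self.c2, self.impl2 = c2, impl2   # second constraint value and its TokenConstraint impl

    def is_valid(self, prefix):
        return self.impl1.is_valid(self.c1, prefix) and self.impl2.is_valid(self.c2, prefix)

    def allowed_transitions(self, prefix, tk):
        return (self.impl1.allowed_transitions(self.c1, prefix, tk)
                & self.impl2.allowed_transitions(self.c2, prefix, tk))
```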
The first version uses JSON, which can often be malformed, and there's no good error recovery in that case. We need to identify and switch to a more "error tolerant", self-aligning format (meaning we can skip a bad pair and still recover useful outputs).
Example:
> r = complete(prompt="how many eyes do cats have, and how many eyes do spiders have?", output_filter_regex="\{ 'cats': [0-9]*, 'spiders': [0-9]* \}")
> print(r)
{ 'cats': 2, 'spiders': 8 }
Excuse my regex though, I'm a bit rusty.
Like https://lmql.ai/ but with easier languages to use
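One possible direction for the error-tolerant format, purely illustrative and not the current implementation: a line-oriented key/value layout where a single malformed pair can be skipped without losing the rest of the output.

```python
# Illustrative sketch of an "error tolerant" alternative to JSON: each pair lives on its
# own line, so one malformed line can be dropped without discarding the whole response.
def parse_pairs(text):
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue                      # skip lines that aren't key/value pairs
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key and value:
            result[key] = value
    return result

# parse_pairs("cats: 2\nspiders 8\nspiders: 8") -> {'cats': '2', 'spiders': '8'}
```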
Currently, fine-tuning doesn't persist models after they've been fine-tuned.
The current onsite LLM class uses full-parameter fine-tuning, which is costly. LoRA fine-tuning will require less memory and help prevent overfitting by freezing the pretrained weights.
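A minimal sketch of attaching LoRA adapters with peft (the model name and hyperparameters are placeholders, not the repo's actual choices):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")  # placeholder model name

# Only the small LoRA adapter matrices are trainable; the pretrained weights stay frozen.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```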
File "quickstart_finetune.py", line 5, in <module>
from llm_vm.config import settings
File "C:\Users\Abhigya Sodani\Anaconda3\lib\site-packages\llm_vm\config.py", line 55, in <module>
if settings.big_model not in MODELS_AVAILABLE:
File "C:\Users\Abhigya Sodani\Anaconda3\lib\site-packages\dynaconf\base.py", line 144, in __getattr__
value = getattr(self._wrapped, name)
File "C:\Users\Abhigya Sodani\Anaconda3\lib\site-packages\dynaconf\base.py", line 309, in __getattribute__
return super().__getattribute__(name)
AttributeError: 'Settings' object has no attribute 'BIG_MODEL'
We need to replicate and fix these issues with dynaconf. I have this issue when I clone the repo and run pip run . .
Open bounty for working demo applications to add to a curated example gallery.
We should use the larger LLM to synthesize data for training the small LLM in the optimizing API.
List of open-source LLMs:
Name | Release date[a] | Developer | Number of parameters[b] | Corpus size | License[c] | Notes |
---|---|---|---|---|---|---|
BERT | 2018 | Google | 340 million[42] | 3.3 billion words[42] | Apache 2.0[43] | An early and influential language model,[2] but encoder-only and thus not built to be prompted or generative[44] |
XLNet | 2019 | Google | ~340 million[45] | 33 billion words | | An alternative to BERT; designed as encoder-only[46][47] |
GPT-2 | 2019 | OpenAI | 1.5 billion[48] | 40GB[49] (~10 billion tokens)[50] | MIT[51] | General-purpose model based on the transformer architecture |
GPT-3 | 2020 | OpenAI | 175 billion[25] | 300 billion tokens[50] | Public web API | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[52] |
GPT-Neo | March 2021 | EleutherAI | 2.7 billion[53] | 825 GiB[54] | MIT[55] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[55] |
GPT-J | June 2021 | EleutherAI | 6 billion[56] | 825 GiB[54] | Apache 2.0 | GPT-3-style language model |
Megatron-Turing NLG | October 2021[57] | Microsoft and Nvidia | 530 billion[58] | 338.6 billion tokens[58] | Restricted web access | Standard architecture but trained on a supercomputing cluster. |
Ernie 3.0 Titan | December 2021 | Baidu | 260 billion[59] | 4 TB | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
Claude[60] | December 2021 | Anthropic | 52 billion[61] | 400 billion tokens[61] | Closed beta | Fine-tuned for desirable behavior in conversations.[62] |
GLaM (Generalist Language Model) | December 2021 | Google | 1.2 trillion[63] | 1.6 trillion tokens[63] | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3. |
Gopher | December 2021 | DeepMind | 280 billion[64] | 300 billion tokens[65] | Proprietary | |
LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137 billion[66] | 1.56T words,[66] 168 billion tokens[65] | Proprietary | Specialized for response generation in conversations. |
GPT-NeoX | February 2022 | EleutherAI | 20 billion[67] | 825 GiB[54] | Apache 2.0 | Based on the Megatron architecture |
Chinchilla | March 2022 | DeepMind | 70 billion[68] | 1.4 trillion tokens[68][65] | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. |
PaLM (Pathways Language Model) | April 2022 | Google | 540 billion[69] | 768 billion tokens[68] | Proprietary | Aimed to reach the practical limits of model scale |
OPT (Open Pretrained Transformer) | May 2022 | Meta | 175 billion[70] | 180 billion tokens[71] | Non-commercial research[d] | GPT-3 architecture with some adaptations from Megatron |
YaLM 100B | June 2022 | Yandex | 100 billion[72] | 1.7TB[72] | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM. |
Minerva | June 2022 | Google | 540 billion[73] | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[73] | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning".[74] Minerva is based on the PaLM model, further trained on mathematical and scientific data. |
BLOOM | July 2022 | Large collaboration led by Hugging Face | 175 billion[75] | 350 billion tokens (1.6TB)[76] | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English excluding programming languages) |
Galactica | November 2022 | Meta | 120 billion | 106 billion tokens[77] | CC-BY-NC-4.0 | Trained on scientific text and modalities. |
AlexaTM (Teacher Models) | November 2022 | Amazon | 20 billion[78] | 1.3 trillion[79] | Public web API[80] | Bidirectional sequence-to-sequence architecture |
LLaMA (Large Language Model Meta AI) | February 2023 | Meta | 65 billion[81] | 1.4 trillion[81] | Non-commercial research[e] | Trained on a large 20-language corpus to aim for better performance with fewer parameters.[81] Researchers from Stanford University trained a fine-tuned model based on LLaMA weights, called Alpaca.[82] |
GPT-4 | March 2023 | OpenAI | Exact number unknown, approximately 1 trillion[f] | Unknown | Public web API | Available for ChatGPT Plus users and used in several products. |
Cerebras-GPT | March 2023 | Cerebras | 13 billion[84] | | Apache 2.0 | Trained with the Chinchilla formula. |
Falcon | March 2023 | Technology Innovation Institute | 40 billion[85] | 1 trillion tokens (1TB)[85] | Apache 2.0[86] | The model is claimed to use only 75% of GPT-3's training compute, 40% of Chinchilla's, and 80% of PaLM-62B's. |
BloombergGPT | March 2023 | Bloomberg L.P. | 50 billion | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets[87] | Proprietary | LLM trained on financial data from proprietary sources, that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks" |
PanGu-Σ | March 2023 | Huawei | 1.085 trillion | 329 billion tokens[88] | Proprietary | |
OpenAssistant[89] | March 2023 | LAION | 17 billion | 1.5 trillion tokens | Apache 2.0 | Trained on crowdsourced open data |
PaLM 2 (Pathways Language Model 2) | May 2023 | Google | 340 billion[90] | 3.6 trillion tokens[90] | Proprietary | Used in the Bard chatbot.[91] |
Separate into a "prompt template" (a term GPT-3 recognizes) and variables. Then ask GPT-3 to change specific parameters for the template.
Prompt template: What is the [variable] of the [object]?
- variable-object pairs = (currency, Myanmar), (price, Bitcoin)
Now, for generation, we might specify differences for either the prompt template or the variable/object pairs, based on the following parameters (not a comprehensive list):
i. Style: Keep the meaning of the question the same but ask in different styles.
e.g. introduce spelling mistakes, code-mixing, cultural tone differences, etc.
ii. Semantic diversity: The task itself should change.
Add the above NLP metrics in code, set thresholds, and keep generating until the thresholds are satisfied.
Reference:
e.g. keep generating similar sentences with a cosine-similarity threshold of 0.5 and a diversity threshold of 0.4 until "n" sentences are reached or "x" attempts are exhausted.
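A sketch of that generate-until-thresholds loop, assuming sentence-transformers for the similarity metric; the model name, thresholds, and the generate_candidate callable are assumptions, not part of the repo:

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def generate_until(seed, generate_candidate, n=10, max_attempts=100, sim_min=0.5, div_max=0.9):
    """Keep generating until n sentences are accepted or max_attempts are exhausted."""
    kept, kept_embs = [], []
    seed_emb = embedder.encode(seed, convert_to_tensor=True)
    for _ in range(max_attempts):
        if len(kept) >= n:
            break
        cand = generate_candidate(seed)                       # one LLM call per candidate
        cand_emb = embedder.encode(cand, convert_to_tensor=True)
        if util.cos_sim(seed_emb, cand_emb).item() < sim_min:
            continue                                          # drifted too far from the seed task
        if any(util.cos_sim(cand_emb, e).item() > div_max for e in kept_embs):
            continue                                          # too similar to something already kept
        kept.append(cand)
        kept_embs.append(cand_emb)
    return kept
```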
Tasks for end of June and through July 2023
Can clean up in-progress fine-tunings when Ctrl+C aborts the local version of LLM-VM (#23)
Can retry data synthesis when given malformed JSON results (#23)
Maybe later: consider a more robust encoding than JSON, where mismatched brackets from unquoted responses currently break parsing (#29)
The current main branch has no issues from mypy's perspective; would that be a helpful lint for contributors?
We need documentation for the entire project, for each sub-folder and part, and for each function in the code itself, plus a standard/programmatic way to display documentation.
quickstart_finetune.py currently demonstrates a successful finetuning of a local model.
While finetune is set to true in each completion call (lines 8, 14, 20), only the third call (line 20) results in fine-tuning and saving of the local model.
It is currently not obvious why we need 2 prior calls to completion before fine-tuning successfully starts.
If this is because we need prior examples, this should be detailed in the documentation and reflected in the feedback from the call itself in some way.
We should probably surface the right hooks for this, though hobbyists don't really need or care about it.
Create a Small_Local_Bert class capable of managing the BERT LLM, similar to the other model classes in the file.
We're starting to add lots of parameters for configuring LLM invocation for data synthesis and the agents; these need to be documented and sanity-checked.
Perhaps https://www.dynaconf.com/, plus XDG base-directory support (https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html).
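dynaconf also supports declarative validators, which could cover the "settings make sense" check mentioned above. A sketch with made-up setting names and allowed values (not the repo's actual config keys):

```python
from dynaconf import Dynaconf, Validator

# Sketch only: the setting names and allowed values here are illustrative.
settings = Dynaconf(
    settings_files=["settings.toml"],
    validators=[
        Validator("BIG_MODEL", must_exist=True, is_in=["chat_gpt", "gpt", "neo"]),
        Validator("PORT", is_type_of=int, gte=1, lte=65535),
    ],
)
settings.validators.validate()  # raises if any validator fails
```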
Check entry points for consistency
Rework the app directory structure into src/llm_vm and rename the __init__.py file.
All Hugging Face arguments need to work on the optimizer.complete endpoint as well. For the complete endpoint, this means that all arguments for the Hugging Face .generate function must be able to be passed to optimizer.complete too. Currently, this is not something I was able to do with the max_new_tokens parameter, for example.
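A rough sketch of the pass-through (method and attribute names are assumptions, not the repo's actual code): forward arbitrary generation kwargs from complete() down to the Hugging Face .generate() call.

```python
# Illustrative sketch: forward arbitrary Hugging Face generation kwargs through complete().
def complete(self, prompt, **generation_kwargs):
    inputs = self.tokenizer(prompt, return_tensors="pt")
    output_ids = self.model.generate(**inputs, **generation_kwargs)  # e.g. max_new_tokens=64
    return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```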
Only Small_Local_Neo successfully fine-tunes. This logic needs to be extended to the other models in the file.
Right now in optimizer.py on line 238
completion = self.call_small(prompt = dynamic_prompt.strip(), model=model, **kwargs)
we call the small model (accessing the generate function) with the model parameter. Right now this does not do anything; we need to be able to provide a .pt file to that parameter and then call load_model (also defined in onsite_llm.py) in order to load that .pt file as the model to use. This needs to be done in the finetune function as well.
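A rough sketch of what resolving that parameter could look like (the helper name and logic are illustrative, not the repo's load_model):

```python
import torch

# Illustrative sketch: if `model` is a path to a fine-tuned .pt checkpoint, load its
# weights into the small local model before generating; otherwise keep the default weights.
def resolve_small_model(default_model, model=None):
    if isinstance(model, str) and model.endswith(".pt"):
        state_dict = torch.load(model, map_location="cpu")
        default_model.load_state_dict(state_dict)
    return default_model
```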
The GPT3 and Chat_GPT model classes do not provide a way to store and load a fine-tuned model into the completion pipeline. Adjust class attributes and methods to allow fine-tuned models with c_ids to be accessed in the OpenAI call.
Each agent should have a research-quality README including test results, individualized usage, and explanations.
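A hypothetical sketch of the idea, using the pre-1.0 openai client style in use at the time (the class structure and model ids are illustrative, not the repo's actual GPT3 class):

```python
import openai  # pre-1.0 openai-python client style

# Illustrative sketch: once a fine-tune job finishes, store the returned model id on
# the class and pass it as the model in subsequent completion calls.
class FineTunableGPT3:
    def __init__(self, fine_tuned_model=None):
        self.fine_tuned_model = fine_tuned_model  # e.g. "davinci:ft-personal-2023-06-01-..."

    def complete(self, prompt, **kwargs):
        model = self.fine_tuned_model or "text-davinci-003"
        return openai.Completion.create(model=model, prompt=prompt, **kwargs)
```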
It's as simple as changing
print(s)
to
print(s, file=sys.stderr)
At least, I think that is a pretty reasonable change and good engineering to do.
Outputs should be documented in the README of each agent.
By setting up python-dotenv, the environment variables can be loaded from a .env file in the project root. I would like to work on this.
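A minimal sketch of what this looks like with python-dotenv (the variable name below is illustrative):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root / current working directory
openai_key = os.getenv("OPENAI_API_KEY")  # variable name is illustrative
```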
Add example tests:
Please brainstorm 2 other possible uses!
We need to control how many examples are generated and how dissimilar/similar we want them to be, confirming that we comply with the desired amount of similarity between generated data points.
System requirements list RAM, but I'd expect an LLM client to be using VRAM. Can it use either? How do I configure the client to use VRAM if I have a GPU for example?