I'm following the example from your README, and it works like this: my …


[Chatllama] what's supposed to be in the Actor checkpoint dir? (nebuly, open)

StrangeTcy commented on June 8, 2024


Comments (3)

StrangeTcy commented on June 8, 2024

Oh. It seems like you mean the single file you'd get from running `llama.download` from pyllama. Let me try it out...

PierpaoloSorbellini commented on June 8, 2024

Hi @StrangeTcy, thanks for reaching out.
The first time that the model is loaded, you probably won't have the checkpoints dir in /models.

The folder is created during training, when checkpoints are saved, and it is then populated with them.
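To illustrate the behaviour described above, here is a minimal sketch (not ChatLLaMA's actual code) of a checkpoint folder that is absent on the first run and only gets created and populated once training saves something. The names `latest_checkpoint`, `save_checkpoint`, and the `actor_step_*.json` file pattern are hypothetical, and JSON stands in for whatever serialization (e.g. `torch.save`) the real trainer uses.

```python
import json
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: Path) -> Optional[Path]:
    """Return the most recent checkpoint file, or None if none exist yet.

    On the very first run the directory may not exist at all, which is
    expected: nothing has been trained or saved so far.
    """
    if not checkpoint_dir.is_dir():
        return None
    # Zero-padded step numbers make lexicographic order match step order.
    checkpoints = sorted(checkpoint_dir.glob("actor_step_*.json"))
    return checkpoints[-1] if checkpoints else None


def save_checkpoint(checkpoint_dir: Path, step: int, state: dict) -> Path:
    """Create the directory on demand and write one checkpoint into it.

    Hypothetical helper: JSON is only a stand-in for real model weights.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / f"actor_step_{step:06d}.json"
    path.write_text(json.dumps(state))
    return path
```

Treating a missing directory as "no checkpoint yet" lets a loader fall back to the base model on the first run instead of failing.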

To use a model from HF, you just need to set the `model` field in `config.yaml` to the model's HF name; that name is passed to `transformers.AutoModel` when the model is instantiated.

Be aware that HF itself had an issue loading the tokenizer for LLaMA; you may need to check whether it is still open.

StrangeTcy commented on June 8, 2024

The first time the model is loaded from ./models, there are indeed no checkpoints there, but they can be downloaded with the python or bash script from pyllama.

As for HF models and LLaMA, HF `transformers` models are indeed handled by

```python
self.model = AutoModelForCausalLM.from_pretrained(
    config.model,
)
```

in `actor.py`, but pure LLaMA models go through `load_model` from `llama_model.py`.

I guess I should try something like `decapoda-research/llama-7b-hf` as an HF model instead of the single-file LLaMA checkpoint.
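The two loading paths described above (HF model names via `AutoModelForCausalLM.from_pretrained`, raw LLaMA checkpoints via `load_model` in `llama_model.py`) can be sketched as a small dispatch. This is a hypothetical helper, not ChatLLaMA's code; it assumes that a path to an existing local file means a raw LLaMA checkpoint and that anything else is an HF model name such as `decapoda-research/llama-7b-hf`.

```python
from pathlib import Path


def choose_loader(model: str) -> str:
    """Decide how an actor model reference should be loaded (hypothetical).

    - An existing local file (assumed to be a raw LLaMA checkpoint) -> "llama"
    - Anything else is treated as a Hugging Face model name -> "hf"
    """
    if Path(model).is_file():
        return "llama"
    return "hf"


def load_actor(model: str):
    """Load the model with the matching backend (sketch only)."""
    if choose_loader(model) == "hf":
        # Imported lazily so choose_loader() works without transformers installed.
        from transformers import AutoModelForCausalLM
        return AutoModelForCausalLM.from_pretrained(model)
    # Placeholder: ChatLLaMA's load_model in llama_model.py would go here;
    # its exact signature is not shown in this thread, so it is not called.
    raise NotImplementedError("plug in llama_model.load_model here")
```

With this split, switching from the single-file checkpoint to `decapoda-research/llama-7b-hf` only changes the `model` string, and the HF path is taken automatically.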
