Comments (4)
Your output folder from translation looks fine: we go from the source language (in oasst1) to the target language (eu in your case), so that's what you see. Do the separate files contain actual (JSON) content? It seems that your create_thread_prompts.py is simply not picking up any data, see:
Generating train split: 0 examples [00:00, ? examples/s]
This could be due to loading from disk in Arrow format. You can try uploading your translated oasst1 dataset to Hugging Face first and then using that to create the threads.
from llama2lang.
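One way to answer "do the separate files contain actual (JSON) content?" more reliably than looking for 0 KB files is to parse every JSON file in the translation output folder and count its records. The sketch below is a hypothetical helper, not part of llama2lang. It assumes each checkpoint file is either a single JSON array of records or JSON Lines (one object per line), which may not match the exact layout the translation script writes.

```python
import json
import sys
from pathlib import Path


def count_records(path: Path) -> int:
    """Return the number of records in a JSON or JSON Lines file (0 if empty)."""
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return 0  # empty or whitespace-only file
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not one JSON document: assume JSON Lines and count non-blank lines.
        return sum(1 for line in text.splitlines() if line.strip())
    if isinstance(data, list):
        return len(data)  # a JSON array of records
    if isinstance(data, dict):
        return 1 if data else 0  # a single object counts as one record
    return 0


def count_records_per_file(folder: str) -> dict[str, int]:
    """Map every *.json file under `folder` (hypothetical output dir) to its record count."""
    counts = {}
    for path in sorted(Path(folder).rglob("*.json")):
        counts[str(path.relative_to(folder))] = count_records(path)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, n in count_records_per_file(sys.argv[1]).items():
        flag = "  <-- no records" if n == 0 else ""
        print(f"{n:6d}  {name}{flag}")
```

Saved as, say, `count_records.py` (a hypothetical name), it would be run as `python count_records.py <output_folder>`. Files reported with 0 records could explain a `0 examples` train split; if every file has records, the problem is more likely in how the data is loaded.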
No, the code was wrong, but I've fixed it now. If you pull main and make sure to update how you call the scripts (check the README), you should be good to go from where you left off :)
Hi, thanks for the quick reply. :)
The JSON files are populated. I don't see any 0 KB files, and from what I can see they contain phrases in Basque. I guess that's a good sign.
I used combine_checkpoints.py to upload the files to my public Hugging Face repository, https://huggingface.co/datasets/Jaimefebe/Eus02, but I am getting the same error.
If the problem is my lack of knowledge, please don't hesitate to tell me.
I only opened this ticket because ChatGPT 3.5 wasn't really helpful with that error and I thought it might be a bug that would interest you, but if it's a newbie error, maybe ChatGPT 4 can help me.
Cool! I'll try it as soon as possible.
Thanks!