bigscience-workshop / bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
License: Other
Can you please provide the files for the bias evaluation on the CrowS-Pairs dataset? The results are given in section 4.9 of the paper, but I do not see the files in the evaluation folder here. Thank you.
The config file lists the sample count of the dataset as 220M and a global batch size of 2048, which equates to ~107K steps per epoch. The main README says the total number of training steps is 95K, which means epoch 1 is never finished. However, the training chronicles suggest more than one epoch of training.
What is the number of epochs for the final training, and what am I missing?
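Not an answer, but here is the back-of-the-envelope arithmetic behind the question, using only the numbers quoted above (220M samples, GBS 2048, 95K total steps); it does not account for the batch-size ramp-up at the start of training:

```python
# Rough arithmetic from the numbers quoted in the question above.
samples_in_dataset = 220_000_000   # sample count listed in the config
global_batch_size = 2_048          # GBS from the config
total_train_steps = 95_000         # total steps from the main README

steps_per_epoch = samples_in_dataset / global_batch_size   # ~107,422 steps
epochs_at_95k_steps = total_train_steps / steps_per_epoch  # ~0.88 epochs

print(f"steps per epoch: {steps_per_epoch:,.0f}")
print(f"epochs covered by {total_train_steps:,} steps: {epochs_at_95k_steps:.2f}")
```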
I was super excited to hear about this project! I was wondering if the model is available anywhere?
In the chronicles of tr1-13B-base it says at the end: "All checkpoints converted to HF format and uploaded to HUB.", which I thought meant that it is available on Huggingface, but I can't seem to find it.
Is it available and I'm just not able to find it, or did I misunderstand and it's not available?
The 1.3B-Pile@300B model is quite strong:
https://docs.google.com/spreadsheets/d/1CI8Q9RCblLRzUOPJ6ViqBmo284-8ojluQ-CmaEuhuv0/edit#gid=1295801165
LAMBADA 0.6088, PIQA 0.7160, HellaSwag 0.5209 --> these are all better than GPT-Neo 1.3B.
Could you share the model? Thank you.
I am reading the content in https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/README.md#global-batch-size, but I don't quite understand some of the numbers. Can someone help explain them?
For example: "So it'll take several days of very inefficient run. We know we get 113 TFLOPs at iteration 512, and since PP=12 and MBS=2, only at 384 (12*2*16) it'll be the first time all pipeline stages will be filled and that's when the performance should be much better, probably around 90 TFLOPs."
I can't understand why GBS needs to reach 384 (12*2*16) before all pipeline stages are filled.
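For what it's worth, the 384 in the quote can be reproduced from the parallelism layout. The sketch below assumes that the 16 in 12*2*16 is the data-parallel size (DP), alongside the PP=12 and MBS=2 given in the quote; the pipeline is only fully occupied once each data-parallel replica has at least PP micro-batches per step.

```python
# Sketch: smallest GBS at which every pipeline stage has work, assuming
# PP=12 and MBS=2 from the quote and DP=16 (my reading of the 12*2*16).
PP, MBS, DP = 12, 2, 16

def micro_batches_per_replica(gbs: int) -> int:
    # each data-parallel replica runs GBS / (MBS * DP) micro-batches per step
    return gbs // (MBS * DP)

def bubble_fraction(m: int, p: int = PP) -> float:
    # fraction of a step spent in the pipeline bubble: (p - 1) / (m + p - 1)
    return (p - 1) / (m + p - 1)

min_gbs_full_pipeline = PP * MBS * DP
print(min_gbs_full_pipeline)  # 384

for gbs in (192, 384, 2048):
    m = micro_batches_per_replica(gbs)
    print(gbs, m, f"{bubble_fraction(m):.0%}")
# 192  ->  6 micro-batches, ~65% bubble (pipeline never full)
# 384  -> 12 micro-batches, ~48% bubble (first GBS where m == PP)
# 2048 -> 64 micro-batches, ~15% bubble (much better utilization)
```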
What kind of machine is required just to run inference on the 176B model? https://huggingface.co/bigscience/bloom
Why is the ZeRO stage set to 0 when DeepSpeed is enabled in the BLOOM training script? And can the BLOOM model be trained with an aligned loss curve when DeepSpeed is disabled? Thanks very much.
DEEPSPEED_ARGS=" \
--deepspeed \
--deepspeed_config ${config_json} \
--zero-stage ${ZERO_STAGE} \
--deepspeed-activation-checkpointing \
"
I noticed you evaluated the OPT-175B model; how did you convert it to a Megatron-DeepSpeed checkpoint? I cannot find a 175B Hugging Face Transformers checkpoint. Also, I cannot successfully convert the OPT-66B checkpoint. @thomasw21 Thanks for any reply!
What are the minimum requirements regarding RAM and GPU memory for performing inference only with the BLOOM model?
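Not an official answer, but the dominant term is easy to estimate: the weights of the 176B-parameter model alone take parameter count × bytes per parameter, before any activations, KV cache, or framework overhead. A rough sketch:

```python
# Back-of-the-envelope memory needed just to hold the BLOOM-176B weights.
# Activations, KV cache and framework overhead come on top of this.
n_params = 176e9

for dtype, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    gib = n_params * bytes_per_param / 2**30
    print(f"{dtype:>9}: ~{gib:,.0f} GiB of weights")
# fp32: ~656 GiB, bf16/fp16: ~328 GiB, int8: ~164 GiB
```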
I am reading the chronicles_prequel, and the last table in the "Trying with ZERO_STAGE=0/1" chapter indicates that the higher TFLOPs is achieved with ZERO_STAGE=1.
ZERO_STAGE=1 reduces the memory cost, but how come it also increases performance, with all other parameters being the same?
| Nodes | Size | ZeRO stage | DP | TP | PP | MBS | GBS | Mem/GPU | Sec/it | TFLOPs | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 181B | 1 | 4 | 8 | 12 | 2 | 2048 | 37GB | 120.29 | 134.02 | 02-21 |
| 48 | 181B | 0 | 4 | 8 | 12 | 2 | 2048 | 72GB | 137.34 | 113.02 | 02-21 |
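A rough sketch of where the memory difference in the table could come from, assuming mixed-precision Adam (about 12 bytes of fp32 optimizer state per parameter) and the layout from the 02-21 rows (TP=8, PP=12, DP=4): ZeRO stage 1 shards only the optimizer states across the DP group, so that term shrinks by roughly 1/DP. This ignores parameters, gradients, activations and buffers, so it will not match the Mem column exactly.

```python
# Rough per-GPU optimizer-state footprint with and without ZeRO stage 1,
# assuming mixed-precision Adam (~12 bytes of fp32 state per parameter).
# Parameters, gradients, activations and buffers are ignored, so these
# numbers will not match the Mem column above exactly.
total_params = 181e9
TP, PP, DP = 8, 12, 4            # layout from the 02-21 rows in the table

params_per_gpu = total_params / (TP * PP)      # ~1.9B parameters per GPU
opt_state_zs0 = params_per_gpu * 12 / 2**30    # full optimizer copy on every rank
opt_state_zs1 = opt_state_zs0 / DP             # sharded across the DP group (ZeRO-1)

print(f"ZS=0: ~{opt_state_zs0:.0f} GiB of optimizer state per GPU")  # ~21 GiB
print(f"ZS=1: ~{opt_state_zs1:.0f} GiB of optimizer state per GPU")  # ~5 GiB
```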
Hi @TevenLeScao,
I think there are some confusing and broken links in the mC4 data preprocessing section. Can you take a look?
Both of the links are broken here,
The original link should be,
In addition to that, the multinomial data processing code to create the different language splits are in this pull request, bigscience-workshop/Megatron-DeepSpeed#9
Here are a few things:
For reference purposes, if you want to keep the code, I'm happy to open a pull request here. If not, I'll close the pull request in the bigscience-workshop/Megatron-DeepSpeed repo.
Let me know what you think.
Hello,
The final model config seems to be pointing to the wrong tokenizer:
@thomasw21 notified me that this one was used for testing purposes only, since there is already an existing dataset tokenized with this tokenizer.
This issue tracks the fact that at a later stage this should be changed to:
--tokenizer-name-or-path bigscience-catalogue-data-dev/byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles \
Hey,
pinging @stas00
I'm a researcher at Tel-Aviv University, and we are thinking about implementing QOS similar to what you have on the Jean Zay cluster.
It would be really helpful to see the slurm.conf you are using for your QOS setting.
Thanks!
Ohad
Back up these folders later today to STORE
/gpfswork/rech/six/commun/bigscience-training/merged-meg-ds_v2
Hello, the evaluation script for bloom-7b1 can be found in the repo at evaluation/results/tr11/scripts/run_trevalharness_7b1.slurm, but I cannot find the training script for bloom-7b1. Can you share the bloom-7b1 training script?
Thank you very much.
CATALOGUE_JSON_PATH=$BIGSCIENCE_REPO/data/catalogue/training_dataset_ratios_merged_nigercongo_v3.json
How can I get the datasets for the 1B3 version? I cannot find a script in https://github.com/bigscience-workshop/bigscience/tree/master/data. Could you give me some suggestions?
How can I get the train-splits.txt and valid-splits.txt files referenced at line 39 of train/tr11-176B-ml/tr11-176B-ml.slurm? Thanks.
TRAIN_DATA_PATH=$MEGATRON_DEEPSPEED_REPO/data/train-splits.txt
VALID_DATA_PATH=$MEGATRON_DEEPSPEED_REPO/data/valid-splits.txt