
colab: preprocessing (RE-DDI) about biogpt (closed, 3 comments)

microsoft commented on July 28, 2024
colab: preprocessing (RE-DDI)

from biogpt.

Comments (3)

karkeranikitha commented on July 28, 2024

@raven44099 I was getting the same error. Please make sure all the environment variables are set properly, and verify each one after you declare it.


ShilpaSangappa commented on July 28, 2024

Yes. Run echo at the shell prompt to check whether the variables are set:

echo $MOSES
echo $FASTBPE
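In a Colab or Jupyter notebook, the same check can be done from Python. This is a minimal sketch, assuming only the two variable names discussed here (MOSES and FASTBPE); `check_env_vars` is a hypothetical helper, not part of BioGPT. Besides checking that each variable is set, it also checks that the value points to an existing directory, which catches typos in the path.

```python
import os

# Hypothetical helper (not part of BioGPT): the Python analogue of
# `echo $MOSES` / `echo $FASTBPE`, with an extra directory-existence check.
REQUIRED_VARS = ("MOSES", "FASTBPE")


def check_env_vars(names=REQUIRED_VARS):
    """Map each problematic variable name to a description of the problem.

    An empty dict means every variable is set to an existing directory.
    """
    problems = {}
    for name in names:
        value = os.environ.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = f"set to {value!r}, which is not a directory"
    return problems


if __name__ == "__main__":
    issues = check_env_vars()
    if issues:
        for name, problem in issues.items():
            print(f"{name}: {problem}")
    else:
        print("MOSES and FASTBPE are set to existing directories")
```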

You could also put the paths in .bashrc, like this:

export MOSES="/home/ubuntu/BioGPT/mosesdecoder"
export FASTBPE="/home/ubuntu/BioGPT/fastBPE"


raven44099 commented on July 28, 2024

Yes, the variables were not set!

Thank you very much! I solved it by adding:

%env MOSES=/content/BioGPT/mosesdecoder
%env FASTBPE=/content/BioGPT/fastBPE
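For reference, IPython's `%env NAME=value` sets `os.environ` in the notebook kernel, so the same assignments can be written in plain Python. The sketch below does that and then confirms that a freshly spawned shell (the kind of process a `!bash preprocess.sh` cell starts) inherits the values. `child_sees` is a hypothetical helper, and the /content/BioGPT paths are simply the ones used in this thread.

```python
import os
import subprocess

# Equivalent to the two %env lines above: set the variables in the kernel's
# environment so that any process started afterwards inherits them.
os.environ["MOSES"] = "/content/BioGPT/mosesdecoder"
os.environ["FASTBPE"] = "/content/BioGPT/fastBPE"


def child_sees(name):
    """Return the value of `name` as seen by a newly spawned POSIX shell."""
    result = subprocess.run(
        ["sh", "-c", f'printf "%s" "${name}"'],  # e.g. printf "%s" "$MOSES"
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(child_sees("MOSES"))  # /content/BioGPT/mosesdecoder
```

If `child_sees` returns an empty string, the variable was not set in the kernel before the shell was started.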

Now the output looks different; I hope it's correct. I'm not sure, because I haven't been able to run BioGPT on the RE-DDI task yet.

New output:

hard_match_evaluation.py  postprocess.py  README.md	   train.sh
infer.sh		  preprocess.sh   rebuild_data.py
Following PMID in ../../data/DDI/raw/train.json has no extracted triples:
DDI-DrugBank.d519 DDI-MedLine.d18 DDI-DrugBank.d491 DDI-MedLine.d4 DDI-DrugBank.d134 DDI-DrugBank.d230 DDI-DrugBank.d259 DDI-DrugBank.d293 DDI-MedLine.d64 DDI-MedLine.d100 DDI-DrugBank.d295 DDI-DrugBank.d402 DDI-MedLine.d101 DDI-DrugBank.d190 DDI-MedLine.d140 DDI-MedLine.d112 DDI-MedLine.d9 DDI-DrugBank.d301 DDI-DrugBank.d128 DDI-DrugBank.d101 DDI-DrugBank.d28 DDI-DrugBank.d376 DDI-MedLine.d28 DDI-DrugBank.d93 DDI-MedLine.d88 DDI-DrugBank.d539 DDI-DrugBank.d525 DDI-DrugBank.d540 DDI-DrugBank.d461 DDI-MedLine.d132 DDI-DrugBank.d360 DDI-MedLine.d43 DDI-MedLine.d121 DDI-DrugBank.d262 DDI-DrugBank.d164 DDI-DrugBank.d534 DDI-DrugBank.d385 DDI-DrugBank.d408 DDI-MedLine.d96 DDI-DrugBank.d285 DDI-DrugBank.d473 DDI-MedLine.d57 DDI-DrugBank.d557 DDI-DrugBank.d161 DDI-DrugBank.d24 DDI-DrugBank.d67 DDI-DrugBank.d490 DDI-DrugBank.d421 DDI-MedLine.d65 DDI-DrugBank.d342 DDI-DrugBank.d264 DDI-MedLine.d10 DDI-DrugBank.d312 DDI-MedLine.d117 DDI-MedLine.d135 DDI-DrugBank.d255 DDI-DrugBank.d390 DDI-DrugBank.d68 DDI-MedLine.d11 DDI-MedLine.d14 DDI-MedLine.d75 DDI-DrugBank.d541 DDI-DrugBank.d118 DDI-MedLine.d50 DDI-DrugBank.d218 DDI-DrugBank.d370 DDI-DrugBank.d201 DDI-DrugBank.d244 DDI-MedLine.d138 DDI-MedLine.d33 DDI-DrugBank.d553 DDI-DrugBank.d125 DDI-DrugBank.d366 DDI-DrugBank.d147 DDI-MedLine.d71 DDI-DrugBank.d363 DDI-MedLine.d32 DDI-MedLine.d76 DDI-DrugBank.d290 DDI-MedLine.d38 DDI-MedLine.d77 DDI-DrugBank.d80 DDI-DrugBank.d27 DDI-MedLine.d120 DDI-DrugBank.d52 DDI-DrugBank.d302 DDI-DrugBank.d486 DDI-DrugBank.d472 DDI-MedLine.d6 DDI-MedLine.d123 DDI-DrugBank.d173 DDI-DrugBank.d570 DDI-DrugBank.d126 DDI-DrugBank.d156 DDI-MedLine.d13 DDI-MedLine.d91 DDI-DrugBank.d349 DDI-DrugBank.d436 DDI-DrugBank.d300 DDI-DrugBank.d432 DDI-MedLine.d52 DDI-DrugBank.d554 DDI-MedLine.d19 DDI-DrugBank.d109 DDI-DrugBank.d63 DDI-DrugBank.d168 DDI-DrugBank.d37 DDI-DrugBank.d50 DDI-DrugBank.d455 DDI-DrugBank.d70 DDI-MedLine.d48 DDI-DrugBank.d515 DDI-DrugBank.d406 DDI-MedLine.d127 DDI-MedLine.d22 
DDI-DrugBank.d418 DDI-MedLine.d78 DDI-MedLine.d80 DDI-MedLine.d129 DDI-DrugBank.d61 DDI-DrugBank.d524 DDI-DrugBank.d189 DDI-MedLine.d92 DDI-DrugBank.d6 DDI-DrugBank.d278 DDI-MedLine.d66 DDI-DrugBank.d383 DDI-MedLine.d15 DDI-MedLine.d60 DDI-MedLine.d31 DDI-MedLine.d58 DDI-MedLine.d137 DDI-DrugBank.d555 DDI-DrugBank.d58 DDI-DrugBank.d433 DDI-DrugBank.d375 DDI-DrugBank.d102 DDI-DrugBank.d268 DDI-DrugBank.d391 DDI-MedLine.d83 DDI-DrugBank.d243 DDI-DrugBank.d119 DDI-DrugBank.d49 DDI-MedLine.d139 DDI-DrugBank.d513 DDI-DrugBank.d451 DDI-DrugBank.d38 DDI-DrugBank.d182 DDI-MedLine.d118 DDI-DrugBank.d319 DDI-MedLine.d141 DDI-MedLine.d70 DDI-MedLine.d109 DDI-MedLine.d98 DDI-DrugBank.d214 DDI-DrugBank.d193 DDI-DrugBank.d152 DDI-MedLine.d40 DDI-DrugBank.d535 DDI-DrugBank.d167 DDI-MedLine.d108 DDI-DrugBank.d445 DDI-DrugBank.d235 DDI-DrugBank.d317 DDI-DrugBank.d251 DDI-DrugBank.d496 DDI-DrugBank.d117 DDI-DrugBank.d203 DDI-DrugBank.d532 DDI-DrugBank.d361 DDI-DrugBank.d294 DDI-MedLine.d37 DDI-MedLine.d72 DDI-MedLine.d95 DDI-DrugBank.d280 DDI-MedLine.d26 DDI-MedLine.d74 DDI-DrugBank.d407 DDI-DrugBank.d343 DDI-DrugBank.d209 DDI-DrugBank.d159 DDI-DrugBank.d239 DDI-DrugBank.d155 DDI-DrugBank.d474 DDI-DrugBank.d271 DDI-DrugBank.d403 DDI-DrugBank.d447 DDI-MedLine.d136 DDI-DrugBank.d90 DDI-DrugBank.d136 DDI-MedLine.d41 DDI-DrugBank.d292 DDI-DrugBank.d1 DDI-DrugBank.d92 DDI-DrugBank.d127 
664 samples in ../../data/DDI/raw/train.json has been processed with 195 samples has no triples extracted.
Following PMID in ../../data/DDI/raw/valid.json has no extracted triples:
DDI-DrugBank.d348 DDI-DrugBank.d520 DDI-DrugBank.d248 DDI-MedLine.d122 DDI-MedLine.d103 DDI-MedLine.d35 DDI-MedLine.d24 DDI-DrugBank.d169 DDI-DrugBank.d221 
50 samples in ../../data/DDI/raw/valid.json has been processed with 9 samples has no triples extracted.
191 samples in ../../data/DDI/raw/test.json has been processed with 0 samples has no triples extracted.
Preprocessing train
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_train.tok.x ...
Read 116252 words (7707 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_train.tok.x ...
Modified 116252 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_train.tok.y ...
Read 34391 words (1364 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_train.tok.y ...
Modified 34391 words from text file.
Preprocessing valid
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_valid.tok.x ...
Read 10902 words (1974 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_valid.tok.x ...
Modified 10902 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_valid.tok.y ...
Read 2976 words (266 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_valid.tok.y ...
Modified 2976 words from text file.
Preprocessing test
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_test.tok.x ...
Read 30412 words (4124 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_test.tok.x ...
Modified 30412 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_test.tok.y ...
Read 9094 words (703 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_test.tok.y ...
Modified 9094 words from text file.
2023-02-17 08:08:05 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-17 08:08:05 | INFO | fairseq_cli.preprocess | Namespace(aim_repo=None, aim_run_hash=None, align_suffix=None, alignfile=None, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='../../data/DDI/relis-bin', dict_only=False, empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=True, log_file=None, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, on_cpu_convert_precision=False, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, source_lang='x', srcdict='../../data/DDI/raw/dict.txt', suppress_crashes=False, target_lang='y', task='translation', tensorboard_logdir=None, testpref='../../data/DDI/raw/relis_test.tok.bpe', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='../../data/DDI/raw/relis_train.tok.bpe', use_plasma_view=False, user_dir=None, validpref='../../data/DDI/raw/relis_valid.tok.bpe', wandb_project=None, workers=8)
2023-02-17 08:08:05 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_train.tok.bpe.x: 469 sents, 139695 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_valid.tok.bpe.x: 41 sents, 12789 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_test.tok.bpe.x: 191 sents, 36514 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_train.tok.bpe.y: 469 sents, 41376 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_valid.tok.bpe.y: 41 sents, 3472 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_test.tok.bpe.y: 191 sents, 11107 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | Wrote preprocessed data to ../../data/DDI/relis-bin
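One quick sanity check on this output: for each split, the number of processed samples minus the samples with no extracted triples equals the number of sentences fairseq binarized (664 minus 195 = 469, 50 minus 9 = 41, 191 minus 0 = 191), which is consistent with samples without triples being left out of the train and valid files. The sketch below automates that comparison. `check_counts` is a hypothetical helper, and the two regular expressions are assumptions written against the exact log lines shown above, so they may need adjusting if the scripts' messages change.

```python
import re

# Matches summary lines such as:
# "664 samples in ../../data/DDI/raw/train.json has been processed with 195 samples ..."
SUMMARY_RE = re.compile(
    r"(\d+) samples in \S+/(\w+)\.json has been processed with (\d+) samples"
)
# Matches fairseq-preprocess lines such as:
# "[x] ../../data/DDI/raw/relis_train.tok.bpe.x: 469 sents, 139695 tokens, ..."
SENTS_RE = re.compile(r"\[x\] \S+relis_(\w+)\.tok\.bpe\.x: (\d+) sents")


def check_counts(log_text):
    """Return {split: (processed minus without-triples, sentences binarized)}.

    Only splits that appear in both kinds of log line are included.
    """
    expected = {
        split: int(total) - int(empty)
        for total, split, empty in SUMMARY_RE.findall(log_text)
    }
    binarized = {split: int(n) for split, n in SENTS_RE.findall(log_text)}
    return {
        split: (expected[split], binarized[split])
        for split in expected
        if split in binarized
    }
```

Fed the log from this comment, the function returns `{'train': (469, 469), 'valid': (41, 41), 'test': (191, 191)}`, so every split matches.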
