Hi there, I am running the DNABERT run_finetune.py as instructed by

Model name 'dna6' was not found in tokenizers model name list about dnabert HOT 7 OPEN

jerryji1993 commented on June 9, 2024 3

Model name 'dna6' was not found in tokenizers model name list

from dnabert.

Comments (7)

ksenia007 commented on June 9, 2024 2

I think I've figured out the issue! For me, the error message was generated here and came from loading the vocab files. By default, the vocab_files entries are links to the files, and the server would not allow me to download files from running code. If you download the vocab files separately and then provide the path to the file instead of dna6, it seems to work!

BioSenior commented on June 9, 2024 1

It works! Thank you very much!

ksenia007 commented on June 9, 2024

I have exactly the same problem: it runs fine on my workstation but not on the server, where it gives the same error!

ryao-mdanderson commented on June 9, 2024

Hello @ksenia007 and @BioSenior 👍

Thank you for sharing.
I get the same error message when I run 'python run_pretrain.py' as described in the README:
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6).

If I search for vocab files in my DNABERT directory, I find the following:
[ ~DNABERT]$ find . -name vocab*
./examples/ft/6/vocab.txt
./examples/ft/6/pre/vocab.txt
./examples/ft/6/pre_2_old/vocab.txt
./examples/ft/6-bk/vocab.txt
./src/transformers/dnabert-config/bert-config-6/vocab.txt
./src/transformers/dnabert-config/bert-config-4/vocab.txt
./src/transformers/dnabert-config/bert-config-5/vocab.txt
./src/transformers/dnabert-config/bert-config-3/vocab.txt

Could you please advise what to change in the commands to get past this error?

ksenia007 commented on June 9, 2024

@ryao-mdanderson I am not sure if you have the same problem. However, I believe that if you specify just dna6 as the tokenizer name, it tries to load vocab.txt from these links rather than from the files in the source folder. In my case, I downloaded the vocab.txt file into my data folder using wget, and then passed path/to/directory/vocab.txt as tokenizer_name.

Sorry if that does not help in your case!
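On a machine without internet access, the same idea can be sketched as a small lookup that swaps a name like dna6 for a vocab.txt already on disk. This is only an illustration: `local_vocab_for` is a hypothetical helper, not part of DNABERT, and it assumes the bundled files under src/transformers/dnabert-config/ (as listed in the find output in this thread) are the right vocabularies for your model.

```python
from pathlib import Path


def local_vocab_for(tokenizer_name: str, repo_root: str) -> str:
    """Return a local vocab.txt path for names like 'dna6'.

    Hypothetical helper (not part of DNABERT). The directory layout is an
    assumption taken from the `find` output quoted in this thread.
    If no matching local file exists, the name is returned unchanged.
    """
    if tokenizer_name.startswith("dna") and tokenizer_name[3:].isdigit():
        k = tokenizer_name[3:]
        candidate = (
            Path(repo_root)
            / "src"
            / "transformers"
            / "dnabert-config"
            / f"bert-config-{k}"
            / "vocab.txt"
        )
        if candidate.is_file():
            return str(candidate)
    return tokenizer_name
```

The returned string would then be passed as tokenizer_name in place of dna6, which is the workaround described above.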

ryao-mdanderson commented on June 9, 2024

@ksenia007 👍

Thank you very much, I got it. Since I am running the code on a compute cluster node that does not have internet access, I followed your suggestion and changed tokenizer_name to path/to/directory/vocab.txt. It works now.

smruti241 commented on June 9, 2024

@ksenia007, @ryao-mdanderson, @jerryji1993 I am getting the same error with the run_pretrain.py script. I tried the same solution, but it did not work. The error is given below:
<class 'transformers.tokenization_dna.DNATokenizer'>
Traceback (most recent call last):
  File "examples/run_pretrain.py", line 885, in <module>
    main()
  File "examples/run_pretrain.py", line 789, in main
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
  File "/home/smrutip/smruti/DNABERT/src/transformers/tokenization_utils.py", line 377, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/smrutip/smruti/DNABERT/src/transformers/tokenization_utils.py", line 479, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6). We assumed 'dna6' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Can you please help me regarding this?
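One thing worth checking: the message above still quotes 'dna6', which suggests the value reaching from_pretrained was the name rather than a file path. A quick pre-flight check along these lines may help confirm what is actually being passed. It is a rough sketch using only the standard library; `describe_tokenizer_path` is a hypothetical name, and the vocab.txt filename is taken from the error message.

```python
import os


def describe_tokenizer_path(path: str, vocab_filename: str = "vocab.txt") -> str:
    """Report what `path` points to on the local filesystem.

    Hypothetical diagnostic helper, not part of DNABERT or transformers.
    """
    if os.path.isfile(path):
        return f"file: {path}"
    if os.path.isdir(path):
        candidate = os.path.join(path, vocab_filename)
        if os.path.isfile(candidate):
            return f"directory containing {vocab_filename}: {candidate}"
        return f"directory without {vocab_filename}: {path}"
    # Names such as 'dna6' end up here unless a file or folder of that
    # name happens to exist in the working directory.
    return f"not a local path: {path}"
```

Printing `describe_tokenizer_path(args.tokenizer_name)` just before the from_pretrained call shows whether the argument is a real local file, a directory, or still a bare model name.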
