
Comments (7)

ksenia007 commented on June 9, 2024

I think I've figured out the issue! For me, the error message was generated here and came from loading the vocab files. By default, the vocab_files entries are in fact links to the files, and the server would not allow me to download files from running code. If you download the vocab files separately and then provide the path to the file instead of dna6, it seems to work!

from dnabert.

BioSenior commented on June 9, 2024

It works! Thank you very much!


ksenia007 commented on June 9, 2024

I have exactly the same problem: it runs fine on my workstation but not on the server, where it gives the same error!


ryao-mdanderson commented on June 9, 2024

Hello @ksenia007 and @BioSenior 👍

Thank you for sharing.
I get the same error message when I run 'python run_pretrain.py' as described in the README:
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6).

If I search for vocab files in my DNABERT directory, I find the following:
[ ~DNABERT]$ find . -name vocab*
./examples/ft/6/vocab.txt
./examples/ft/6/pre/vocab.txt
./examples/ft/6/pre_2_old/vocab.txt
./examples/ft/6-bk/vocab.txt
./src/transformers/dnabert-config/bert-config-6/vocab.txt
./src/transformers/dnabert-config/bert-config-4/vocab.txt
./src/transformers/dnabert-config/bert-config-5/vocab.txt
./src/transformers/dnabert-config/bert-config-3/vocab.txt

Could you please advise what to change in the commands to get past this error?


ksenia007 commented on June 9, 2024

@ryao-mdanderson I am not sure if you have the same problem. However, I believe that if you specify just dna6 as the tokenizer name, it tries to load vocab.txt from these links instead of accessing the files in the source folder. In my case, I downloaded the vocab.txt file into my data folder using wget and then passed path/to/directory/vocab.txt as tokenizer_name.

Sorry if that does not help in your case!


ryao-mdanderson commented on June 9, 2024

@ksenia007 👍

Thank you very much, I got it. Since I am running the code on a compute cluster node that has no internet access, I followed your suggestion and changed tokenizer_name to path/to/directory/vocab.txt. It works now.
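
For anyone else on an offline node, the workaround above could be sketched roughly as follows. This is only a sketch: the helper names are hypothetical, and it assumes that the shortcut names dna3 to dna6 correspond to the bert-config-3 to bert-config-6 directories shown in the find output earlier in the thread, which the thread itself does not confirm.

```python
from pathlib import Path

# Assumption: shortcut names dna3..dna6 map to the bundled directories
# bert-config-3..bert-config-6 listed in the find output earlier in the thread.
BUNDLED_VOCAB_DIR = Path("src/transformers/dnabert-config")


def local_vocab_path(repo_root, shortcut):
    """Return the bundled vocab.txt for a shortcut such as 'dna6', or None."""
    if not (shortcut.startswith("dna") and shortcut[3:].isdigit()):
        return None
    kmer = shortcut[3:]
    candidate = Path(repo_root) / BUNDLED_VOCAB_DIR / f"bert-config-{kmer}" / "vocab.txt"
    return candidate if candidate.is_file() else None


def tokenizer_name_for(repo_root, shortcut):
    """Prefer a local vocab.txt path; fall back to the shortcut name."""
    path = local_vocab_path(repo_root, shortcut)
    return str(path) if path is not None else shortcut


if __name__ == "__main__":
    # Run from the DNABERT checkout; prints the value to pass as tokenizer_name.
    print(tokenizer_name_for(".", "dna6"))
```

The printed value is what would be passed as tokenizer_name in place of dna6. If no local file is found, the shortcut name is returned unchanged, so the download behaviour described above still applies.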


smruti241 commented on June 9, 2024

@ksenia007, @ryao-mdanderson, @jerryji1993 I am getting the same error with the run_pretrain.py script. I tried the same solution, but it didn't work at all. The error is given below:
<class 'transformers.tokenization_dna.DNATokenizer'>
Traceback (most recent call last):
  File "examples/run_pretrain.py", line 885, in <module>
    main()
  File "examples/run_pretrain.py", line 789, in main
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
  File "/home/smrutip/smruti/DNABERT/src/transformers/tokenization_utils.py", line 377, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/smrutip/smruti/DNABERT/src/transformers/tokenization_utils.py", line 479, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6). We assumed 'dna6' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Can you please help me regarding this?
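
The error text above still names 'dna6', which may mean the local path never reached tokenizer_name. One way to check what a given value points to is a small pre-flight helper like the sketch below. describe_tokenizer_name is a hypothetical function, not part of DNABERT or transformers, and it does not reproduce the library's own lookup order; the shortcut names and the vocab.txt file name are copied from the error message.

```python
import os

# Shortcut names and vocabulary file name as listed in the error message.
SHORTCUT_NAMES = ("dna3", "dna4", "dna5", "dna6")
VOCAB_FILE = "vocab.txt"


def describe_tokenizer_name(value):
    """Classify a tokenizer_name value without touching the network."""
    if os.path.isfile(value):
        return "local file"
    if os.path.isdir(value):
        if os.path.isfile(os.path.join(value, VOCAB_FILE)):
            return "local directory with vocab.txt"
        return "local directory without vocab.txt"
    if value in SHORTCUT_NAMES:
        return "shortcut name"
    return "not found locally"


if __name__ == "__main__":
    import sys

    # Usage: python check_tokenizer_name.py <value passed as tokenizer_name>
    for arg in sys.argv[1:]:
        print(f"{arg}: {describe_tokenizer_name(arg)}")
```

If the value that actually reaches the script is reported as "shortcut name" or "not found locally", the path to the downloaded vocab.txt is probably not being picked up.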

