Comments (7)
I think I've figured out the issue! For me, the error message was generated here and came from loading the vocab files. By default, vocab_files are in fact links to the files, and the server would not allow me to download files from running code. If you download the vocab files separately and then provide the path to the file instead of dna6, it seems to work!
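To make the cause concrete, here is a simplified model of how a tokenizer name gets resolved. It is a hypothetical sketch, not the actual transformers code: PRETRAINED_VOCAB_MAP and resolve_vocab_source are made-up names, the URLs are placeholders, and the real loader also consults a download cache. What it illustrates is that a shortcut name such as dna6 always maps to a remote URL, whereas a local vocab file, or a directory containing vocab.txt, is read from disk.

```python
import os
from typing import Tuple

# Hypothetical, simplified stand-in for the tokenizer's shortcut-name table.
# Each shortcut name maps to a remote vocab file; these URLs are placeholders,
# not the real download locations.
PRETRAINED_VOCAB_MAP = {
    f"dna{k}": f"https://example.invalid/dna{k}/vocab.txt" for k in (3, 4, 5, 6)
}


def resolve_vocab_source(name_or_path: str) -> Tuple[str, str]:
    """Return ("url", url) for a shortcut name, ("file", path) for a local
    vocab file, or ("dir", path) for a directory that holds vocab.txt."""
    if name_or_path in PRETRAINED_VOCAB_MAP:
        # Shortcut names always resolve to a download; on a node without
        # internet access that download fails and surfaces as an OSError.
        return ("url", PRETRAINED_VOCAB_MAP[name_or_path])
    if os.path.isfile(name_or_path):
        return ("file", name_or_path)
    candidate = os.path.join(name_or_path, "vocab.txt")
    if os.path.isfile(candidate):
        return ("dir", candidate)
    raise OSError(f"could not find vocab.txt for {name_or_path!r}")
```

In this model, passing a path sidesteps the shortcut table entirely, which is why the workaround needs no network access.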
from dnabert.
It works! Thank you very much!
I have exactly the same problem: it runs fine on my workstation but not on the server, and it gives the same error!
Hello @ksenia007 and @BioSenior 👍
Thank you for sharing.
I get the same error message when I run 'python run_pretrain.py' as described in the README:
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6).
If I search for vocab files in my DNABERT directory, I find the following:
[ ~DNABERT]$ find . -name vocab*
./examples/ft/6/vocab.txt
./examples/ft/6/pre/vocab.txt
./examples/ft/6/pre_2_old/vocab.txt
./examples/ft/6-bk/vocab.txt
./src/transformers/dnabert-config/bert-config-6/vocab.txt
./src/transformers/dnabert-config/bert-config-4/vocab.txt
./src/transformers/dnabert-config/bert-config-5/vocab.txt
./src/transformers/dnabert-config/bert-config-3/vocab.txt
Could you please advise what to change in the commands to get past this error?
@ryao-mdanderson I am not sure if you have the same problem. However, I believe that if you specify just dna6 as the tokenizer name, it tries to load vocab.txt from these links rather than from the files in the source folder. In my case, I downloaded the vocab.txt file into my data folder using wget, and then passed path/to/directory/vocab.txt as tokenizer_name.
Sorry if that does not help in your case!
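The workaround above can be sketched as a small pre-flight check to run before launching run_pretrain.py on an offline node. This is a minimal sketch, not part of DNABERT: the helper name tokenizer_arg_for_offline is hypothetical, and it assumes the script exposes the tokenizer_name setting as a --tokenizer_name command-line option. It fails early with a clear message if the local vocab file is missing, instead of letting the tokenizer fall back to a download.

```python
import os


def tokenizer_arg_for_offline(vocab_path: str) -> str:
    """Hypothetical helper: build a --tokenizer_name argument pointing at a
    local vocab file (or a directory containing vocab.txt)."""
    if os.path.isdir(vocab_path):
        # Accept a directory and look for the conventional file name inside it.
        vocab_path = os.path.join(vocab_path, "vocab.txt")
    if not os.path.isfile(vocab_path):
        raise FileNotFoundError(f"no vocab file at {vocab_path!r}")
    # Use an absolute path so the argument still works if the job changes
    # its working directory. Intended as a single argv element (e.g. for
    # subprocess), so no shell quoting is applied.
    return "--tokenizer_name=" + os.path.abspath(vocab_path)
```

For example, from the DNABERT root one could pass src/transformers/dnabert-config/bert-config-6, one of the directories in the find listing above, assuming its vocab.txt is the same 6-mer vocabulary that dna6 would otherwise download.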
Thank you very much, I got it. Since I am running the code on a compute cluster node without internet access, I followed your suggestion and changed tokenizer_name to path/to/directory/vocab.txt. It works now.
@ksenia007, @ryao-mdanderson, @jerryji1993 I am getting the same error with the run_pretrain.py script. I tried the same solution, but it didn't work. The error is below:
<class 'transformers.tokenization_dna.DNATokenizer'>
Traceback (most recent call last):
  File "examples/run_pretrain.py", line 885, in <module>
    main()
  File "examples/run_pretrain.py", line 789, in main
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
  File "/home/smrutip/smruti/DNABERT/src/transformers/tokenization_utils.py", line 377, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/smrutip/smruti/DNABERT/src/transformers/tokenization_utils.py", line 479, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6). We assumed 'dna6' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Can you please help me with this?