Hi @ryao-mdanderson,
Thanks for reporting this issue. Please make sure that all required package dependencies are properly installed before training the model; the error message suggests that one or more modules are missing.
Let me know if there are additional questions.
Best,
Jerry
from dnabert.
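One way to act on this advice is to check, before launching training, which imports the current Python environment cannot resolve. The snippet below is a minimal sketch that uses the standard library's importlib.util.find_spec. The REQUIRED list is a hypothetical placeholder, not DNABERT's actual dependency list, so replace it with the packages from the project's requirements file or from the error you received. Import names can differ from pip package names (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A dotted name whose parent package is missing raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


# HYPOTHETICAL example list: substitute the import names your project actually needs.
REQUIRED = ["numpy", "torch", "sklearn"]

if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED) or "none")
```

Because find_spec only locates modules without fully importing top-level packages, this check is cheap to run on a login or compute node before submitting a job.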
Hello @jerryji1993,
I installed the missing Python modules. Thank you very much.
However, I now get a new error when testing section 2.2 (Model training), shown below.
FYI, I ran the test on an HPC compute node that does not have internet access. Could you please advise how to fix the error about the 'dna6' model name not being found?
<class 'transformers.tokenization_dna.DNATokenizer'>
Traceback (most recent call last):
  File "run_pretrain.py", line 885, in <module>
    main()
  File "run_pretrain.py", line 789, in main
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
  File "/risapps/noarch/dnabert/20210826/src/transformers/tokenization_utils.py", line 377, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/risapps/noarch/dnabert/20210826/src/transformers/tokenization_utils.py", line 479, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6). We assumed 'dna6' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
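The message says the tokenizer looks up 'dna6' in its built-in name list and otherwise treats the string as a path or URL to a directory containing vocab.txt. Since 'dna6' does appear in that list, the most likely cause on an offline node is that the vocabulary file for the built-in name could not be downloaded, which then surfaces as this generic error. A workaround consistent with the message is to point the tokenizer name at a local directory that contains a vocab.txt. Copying the official DNABERT vocab.txt from a machine with internet access is the safer option. Otherwise, the sketch below writes a k-mer vocabulary under stated assumptions: five BERT-style special tokens followed by all 4^6 = 4,096 6-mers in lexicographic order (4,101 lines in total). The token order is an assumption and may not match the official file, so a generated vocab.txt is only suitable for pre-training from scratch, not for loading released DNABERT checkpoints. The function names are hypothetical, and the approach has not been tested against DNABERT itself.

```python
import itertools
import os

# ASSUMPTION: BERT-style special tokens placed first, in this order. Check
# against an official DNABERT vocab.txt before relying on specific token ids.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def kmer_vocab(k=6):
    """Return the special tokens followed by every k-mer over A, C, G, T."""
    # itertools.product yields tuples in lexicographic order: AAAAAA ... TTTTTT
    kmers = ("".join(p) for p in itertools.product("ACGT", repeat=k))
    return SPECIAL_TOKENS + list(kmers)


def write_vocab_dir(out_dir, k=6):
    """Create out_dir if needed, write out_dir/vocab.txt, and return out_dir.

    The file holds one token per line (the usual BERT vocab.txt layout).
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "vocab.txt"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(kmer_vocab(k)) + "\n")
    return out_dir
```

For example, call write_vocab_dir("/path/to/dna6_vocab") once on the cluster (the path is a placeholder), then pass that directory as --tokenizer_name to run_pretrain.py instead of dna6.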