Comments (16)
I'm copying @yaschenk and @kwrodarmer on this issue. Many users run HISAT2 with the --sra-acc option successfully, but occasionally some seem to encounter the above problem (i.e., VCursorCellDataDirect failed). The HISAT2 binaries were built using ngs-sdk.1.1.1 (I just noticed that a newer version, 1.2.2, was released about a month ago). Has this been fixed, or is it a problem inherent to data transfer over the internet? BTW, I'll modify HISAT2 to output a warning instead of terminating silently on this runtime error.
from hisat2.
There have been no changes to address an error of this sort. We'll need to debug. Could you provide more information, such as the command line that failed?
from hisat2.
I'll run it again tonight, but last time it was just a straight-up --sra-acc run with no other parameters against the standard hg19 genome, piping to samtools to make a BAM. I'll test whether it still happens and whether it's somehow specific to some SRAs. I can tell you that it was not the internet connection on my side (it might have been server side).
On 19.10.2015 19:32, kwrodarmer wrote:
There have been no changes to address an error of this sort. We'll
need to debug. Could you provide more information, such as the command
line that failed?
from hisat2.
It is exactly the specific input accessions that I'd like in order to be able to duplicate the problem on our end. Without it, there is no way we can assist in debugging the problem.
from hisat2.
SRR1203781 was the one I tested with; I did not try others once I ran into problems.
from hisat2.
Thanks - that's exactly what I needed. Okay, we'll start debugging.
from hisat2.
On my end a run completed successfully, so maybe the servers at the SRA end sometimes fail and cause a premature termination. If that happens, it would be nice if the program terminated with an error (not a warning) and, ideally, tried to resume the connection before giving up.
from hisat2.
We can investigate the network issues with our systems group. Please send network and execution time information to [email protected] and we will check our logs for anything specific. Please note, however, that we already checked for errors at or around the time you reported the problem and did not find anything suspicious. Still, with more accurate information we may be able to do more.
Meanwhile, we're trying to duplicate your results.
from hisat2.
I think it might be related to our computing cluster environment: the runs finish from my desktop, but terminate with the error above at a random point when I run them from the cluster. I'm running more tests today and will report later.
from hisat2.
As it (still) fails only on the cluster, I'm wondering if it could be related to a disk space issue for the cache of sra-tools. I think it defaults to the current user's home directory, which in my case is limited (I'm running the script on our /scratch space, where disk space is not an issue). Looking for a way to change it right now, will report if it fixes the problem.
from hisat2.
The hypothesis was right: the default cache of sra-tools was set to $HOME/ncbi, which is space-limited on the cluster. I made a symbolic link to our storage space and now the run completes fully.
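For anyone who wants to check for the same condition before starting a run, here is a minimal sketch in Python. It only reports the free space on the filesystem that holds the cache directory, following symbolic links. The `$HOME/ncbi` location comes from this thread; the function name `cache_free_bytes` is made up for illustration, and this is not part of HISAT2 or sra-tools.

```python
import os
import shutil


def cache_free_bytes(cache_dir):
    """Return (resolved_path, free_bytes) for the filesystem holding cache_dir.

    Symbolic links are followed, so a cache linked to scratch storage
    reports the free space of the link target.
    """
    resolved = os.path.realpath(os.path.expanduser(cache_dir))
    # If the cache has not been created yet, probe its nearest existing parent.
    probe = resolved
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent
    return resolved, shutil.disk_usage(probe).free


if __name__ == "__main__":
    # Default cache location mentioned in this thread.
    path, free = cache_free_bytes("~/ncbi")
    print(f"{path}: {free / 1e9:.1f} GB free")
```

How much free space is enough depends on the size of the accessions being aligned, so the sketch reports the number rather than enforcing a threshold.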
from hisat2.
By default, our configuration places its cache in the user's home directory. This can be changed using the tool vdb-config from the SRA toolkit. But if you are running with a good internet connection (and especially if you are running on a compute cluster), it is probably a better idea to disable user caching altogether.
Caching helps with random access patterns by avoiding repeated retrieval of the same portion of the SRA file, and it helps if you are running multiple passes over the same file. On a cluster, a shared cache can cause a small but important access conflict on some of the reference sequences, and the cache is shared whenever you use the default location in $HOME.
So far, we have not been able to reproduce your results, but they would be consistent with running out of disk space, since quality values take up the bulk of SRA storage.
from hisat2.
Here is a link to our configuration page: https://github.com/ncbi/sra-tools/wiki/Toolkit-Configuration
from hisat2.
This is quite useful, thanks guys! I'll add some additional description to --sra-acc option on the HISAT2 Website (the manual page) so that people know how to disable the cache especially when they use a cluster. I'll also modify HISAT2 to terminate with an error message in case of a connection error or something else. I may modify HISAT2 to retry one more time before giving up and terminating.
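Until that lands, the "retry once before giving up" idea can be approximated from the outside with a small wrapper. This is only a sketch in Python, not how HISAT2 itself will do it: the helper name `run_with_retry` is hypothetical, and it assumes a failed run exits with a non-zero status, which this thread does not confirm for the silent termination.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, retries=1, delay_s=0.0):
    """Run cmd; on a non-zero exit status, retry up to `retries` more times.

    Returns the number of attempts used and raises RuntimeError
    if every attempt fails.
    """
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt}/{attempts} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
        if attempt < attempts:
            time.sleep(delay_s)
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

A call might look like `run_with_retry(["hisat2", "-x", "hg19", "--sra-acc", "SRR1203781", "-S", "out.sam"])`, where `hg19` stands in for the index basename and `out.sam` is a placeholder output name. Writing to a file with `-S` rather than piping to samtools means a retry starts from a clean output instead of appending to a half-written stream.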
from hisat2.
We're going to make some improvements to work harder to continue in the face of this type of error. While we clearly detect the problem when it occurs, it may not be the best decision in this case to throw it back to HISAT2. Instead, we could just invalidate the cache and continue serving directly from the network. So we'll try to do more on our end, too.
from hisat2.
Thanks everyone, consider this fixed on my end, so closing the issue.
from hisat2.