Comments (13)
I am also seeing the ST_KV_DATABASE_EXCEPTION error, running with singularity 2.6.1 and latest version of progressiveCactus on the same test data.
Specifically, I see a lot of network errors, e.g.,
(py27) [tsackton@bioinf02 cactus]$ grep "Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: " *.log
logfile_slurm_3.log:WARNING:toil.leader:W/u/jobMAMhgY Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:W/u/jobMAMhgY Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:W/u/jobMAMhgY Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:W/u/jobMAMhgY Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:W/u/jobMAMhgY Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:W/u/jobMAMhgY Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:Y/4/jobT1BXlw Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/W/jobcYd5p3 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/K/jobT_Nt8l Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/W/jobcYd5p3 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/K/jobT_Nt8l Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/K/jobT_Nt8l Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/K/jobT_Nt8l Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/W/jobcYd5p3 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/W/jobcYd5p3 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/K/jobT_Nt8l Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/W/jobcYd5p3 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:Y/4/jobT1BXlw Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:0/K/jobT_Nt8l Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:Y/4/jobT1BXlw Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:Y/4/jobT1BXlw Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:Y/4/jobT1BXlw Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:Y/4/jobT1BXlw Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:8/b/jobjS_1G2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.184 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:8/b/jobjS_1G2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.184 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:8/b/jobjS_1G2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.184 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:8/b/jobjS_1G2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.184 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:8/b/jobjS_1G2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.184 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:8/b/jobjS_1G2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.184 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:J/O/jobZkDBNH Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:R/Y/jobziuk_f Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:J/O/jobZkDBNH Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:M/U/jobPEGnpz Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:R/Y/jobziuk_f Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:M/U/jobPEGnpz Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:R/Y/jobziuk_f Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:M/U/jobPEGnpz Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:R/Y/jobziuk_f Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:J/O/jobZkDBNH Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:R/Y/jobziuk_f Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:J/O/jobZkDBNH Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:R/Y/jobziuk_f Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:J/O/jobZkDBNH Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:J/O/jobZkDBNH Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:M/U/jobPEGnpz Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:M/U/jobPEGnpz Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:M/U/jobPEGnpz Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.130.249 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:T/k/jobH39D4e Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.189 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:T/k/jobH39D4e Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.189 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:T/k/jobH39D4e Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.189 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:T/k/jobH39D4e Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.189 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:T/k/jobH39D4e Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.189 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:T/k/jobH39D4e Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.128.189 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:X/x/jobuUaQQ2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.133.81 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:X/x/jobuUaQQ2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.133.81 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:X/x/jobuUaQQ2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.133.81 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:X/x/jobuUaQQ2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.133.81 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:X/x/jobuUaQQ2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.133.81 with error: network error
logfile_slurm_3.log:WARNING:toil.leader:X/x/jobuUaQQ2 Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 10.31.133.81 with error: network error
The jobs that fail with this error are not always SavePrimaryDB, and some jobs succeed, so it seems to have to do with whether the host that the ktserver is running on is reachable from the specific compute node a later job lands on.
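To see at a glance which ktserver hosts are unreachable, the grep output above can be tallied per host. This is only a rough sketch: the regular expression is taken from the error text quoted in this comment, and the helper name count_errors_by_host is made up for illustration.

```python
import re
from collections import Counter

# Matches the error text exactly as it appears in the log lines quoted above.
ERROR_RE = re.compile(
    r"ST_KV_DATABASE_EXCEPTION: Opening connection to host: (\S+) with error"
)


def count_errors_by_host(lines):
    """Return a Counter mapping each host address to its number of errors.

    `lines` can be any iterable of strings, e.g. an open log file.
    Hypothetical helper, not part of Cactus or Toil.
    """
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling count_errors_by_host(open("logfile_slurm_3.log")) and printing the result of .most_common() would list the hosts with the most failed connections first, which may help show whether the failures cluster on particular nodes.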
Is there any way to control the node that the ktserver lands on, e.g. force it to run on the high memory machine the job is launched from? In previous versions of progressiveCactus I've managed this by tinkering with the bigBatchSystem options (to force the ktserver job to use the bigBatchSystem), but it is not clear if this is still an option with the new Toil version.
from cactus.
Hmm, unfortunately toil (the descendant of the old jobTree framework we used to use) doesn't have the bigBatchSystem hack that jobTree did. Sadly there isn't really a way that I know of to control which host the ktserver process gets launched on.
It should be somewhat possible to put the same hack into toil. There is a "local" batch system function that basically reintroduced a very similar thing to handle small CWL housekeeping jobs. Currently it's hardcoded to only run those jobs, but it could be made customizable without too much hassle.
Re: the ports possibly being off, that's interesting... maybe Singularity is doing some sort of NAT? As written, the DB code expects the address/port combination that it binds to to be reachable from all workers.
from cactus.
I'm not sure about the ports (thus removed that comment)... digging through so many scattered logs and temp directories is confusing. I'll put more info if/when I can confirm anything.
On my system at least, there should be no issues connecting from one node to another, so I'm a bit confused about the errors. Any ideas as to how we can get some more detail on this problem? Or perhaps some easier way to test/debug the ktserver stuff? I'm thinking something like running a simple ktserver instance and issuing some commands to see what does/doesn't work.
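One low-level check along these lines, independent of ktserver itself, is to test whether a plain TCP connection can be opened from a compute node to the address and port the server is bound to. The sketch below assumes the host and port are read off the logs; can_connect is a hypothetical helper and it says nothing about whether the ktserver protocol works once connected.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a TCP connection to (host, port).

    Returns (True, None) on success, or (False, error message) on failure.
    Hypothetical helper for debugging, not part of Cactus or Toil.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        return False, str(exc)
```

Running this from a job on each compute node, against the host and port reported in the error, would distinguish "connection refused" (nothing listening) from a timeout (often a firewall dropping packets), which the generic "network error" message does not.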
from cactus.
OK, a bit more testing has revealed two separate issues. I am now simply trying to run the example workflow on our interactive node and it gets "stuck" on the SavePrimaryDB step every time. No errors that I can find, but things just stop. See attached logfile: cactus_localrun.log.gz
The other issue occurs when I run using SLURM. That's where I run into ST_KV_DATABASE_EXCEPTION, even when trying to connect on localhost. Nodes on my system should be able to communicate with one another, so I'm not sure what the problem actually is.
from cactus.
I seem to be able to get things to complete (with various workarounds for Singularity 3) now. However, whenever I set --logLevel DEBUG or --logDebug I run into this problem. This leads me to believe that it is not an actual network error, but instead some invalid argument formatting that is causing a non-zero exit code from a process, which is incorrectly interpreted as a network error.
@tsackton Could you try running without turning on --logLevel DEBUG and see if you can get things to complete? Thanks.
from cactus.
Oddly enough, I seem to be able to rerun the pipeline with debug on. I still got errors, but this time it completed successfully (eventually). This appears to be something that happens only occasionally. Is there some non-deterministic part of this workflow? Also, any thoughts as to why I see this only with DEBUG on? Is that because the error is silenced otherwise?
from cactus.
Continued testing shows that I get this error even when the server is on the same host:
Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 127.0.0.1 with error: network error
This occurs with debug off, at least on some runs and at least on some cluster systems. At this point, I am not able to replicate runs even on the same system when using the example data.
@joelarmstrong Is there some way to set a random seed to ensure any non-deterministic parts of the workflow run the same? The inability to get the same errors or results from one run to the next is disconcerting, and makes debugging quite difficult.
from cactus.
On at least one of our clusters, the code used to determine a "public IP" doesn't work and ends up returning 127.0.0.1, which inevitably fails. I'm not sure of the appropriate fix at the moment, except that it seems necessary to be able to choose a method that works for the given grid environment, as the current approach doesn't generalize well.
cactus/src/cactus/pipeline/ktserverControl.py
Lines 239 to 257 in d9039e7
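For comparison, one commonly used way to find the address a machine would use for outbound traffic is to "connect" a UDP socket and read back its local address. This is a sketch of that general technique, not the code referenced above; the function names and the probe address are assumptions chosen for illustration.

```python
import ipaddress
import socket


def guess_outbound_ip(probe_host="10.255.255.255", probe_port=1):
    """Guess the local IP the kernel would route outbound traffic from.

    Connecting a UDP socket sends no packets; it only makes the kernel
    pick a source address for the given destination. The probe address
    is an arbitrary assumption and may need changing per cluster.
    Falls back to 127.0.0.1 if no route is available.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def is_loopback(ip):
    """Return True if `ip` is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(ip).is_loopback
```

Whatever method is used, checking the result with something like is_loopback before advertising it to workers would at least turn the silent 127.0.0.1 case into an explicit failure.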
from cactus.
@lparsons and @tsackton, did you ever get this fixed somehow? I'm having a similar issue on an SGE cluster while running the evolverMammals example: #63.
Things run fine on a single node, but fail with Exception: ST_KV_DATABASE_EXCEPTION: Opening connection to host: 172.16.13.37 with error: network error when running distributed on an SGE queue. The cluster nodes should be able to communicate with each other, so I'm not sure what could cause the connection to fail.
I'm using a local Cactus install, as Docker and Singularity aren't supported by the cluster.
from cactus.
I ended up with a couple of different solutions to this. The first was a simple firewall rule that was preventing communication (despite assurances communication was allowed ;-) ).
The second was resolved by the system admins who kinda changed the cluster configuration to allow the getPublicIP code to work. I'd prefer to get a more robust (or at least configurable) method for the getPublicIP, but I don't have any great options at the moment.
from cactus.
@lparsons Thanks a lot for the reply. Do you remember anything more specific about the firewall rule that was preventing communication? I've gotten in touch with the cluster admin and I suspect that it may be a similar issue, so any further clues you could provide would be much appreciated.
from cactus.
The issue on the cluster where the getPublicIP did work, but I still got the error (which seems more similar to your case), was a problem with iptables. The fix implemented was to turn off iptables completely on the nodes (which are only on the private cluster network).
from cactus.
I believe I have addressed all of the reasons that this error occurred for us. See #60 and #67 for additional info.
from cactus.