
Comments (10)

psathyrella commented on July 17, 2024

I would suspect the "Killed" means it got a signal from stdin equivalent to a ctrl-C or something, although I have no experience running things on windows so I'm not sure what could have caused that. In theory you're entirely inside docker, so the fact that you're on windows shouldn't matter, but in practice maybe mouse copy/pasting sent it a term signal. The "spent much longer" warning is quite informative about why it was slow, although I don't know whether that's related to it being killed. It's saying that the main python proc spawned 12 subprocs to run bcrham, and each of those reported taking only a couple hundred seconds, whereas the whole step of writing their input and processing their output took 1150 seconds. That suggests either your i/o is super slow, or it's completely out of memory and swapping like crazy.
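If you want to check whether it's swapping, here's a minimal sketch (assuming the psutil package is available in the container, which isn't something partis installs for you) that samples memory and swap usage while partis runs:

    import time
    import psutil  # assumption: pip install psutil

    # sample system memory and swap every five seconds while partis runs;
    # climbing swap usage points to paging rather than slow disk i/o
    for _ in range(20):
        mem = psutil.virtual_memory()
        swap = psutil.swap_memory()
        print('mem %5.1f%%   swap %5.1f%%' % (mem.percent, swap.percent))
        time.sleep(5)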

If that isn't it: 24k sequences isn't very many, and while partition time is quite hard to predict, since it depends so much on the repertoire structure, on 15 cores on a typical server I'd expect it to take less than an hour. It writes a "progress" file during the clustering steps that should give you a good idea how it's doing. Another thing you can do is run on a subsample, just to check that it finishes quickly and correctly (see the sketch below).
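For the subsample, something like this standalone sketch (plain python, not partis code; the file names are just examples) pulls a random 5000 sequences out of a fasta:

    import random

    def read_fasta(fname):  # minimal fasta reader, no dependencies
        seqs, name = {}, None
        with open(fname) as ffile:
            for fline in ffile:
                fline = fline.strip()
                if fline.startswith('>'):
                    name = fline[1:]
                    seqs[name] = ''
                elif name is not None:
                    seqs[name] += fline
        return seqs

    seqs = read_fasta('input.fasta')
    with open('subsample.fasta', 'w') as outfile:
        for name, seq in random.sample(list(seqs.items()), 5000):
            outfile.write('>%s\n%s\n' % (name, seq))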

Yeah, the plotting install is unfortunate; hopefully you found this. I've actually just decided to finally make a separate multi-step docker build and switch to quay so I can have a separate docker image for plotting, although that won't be finished for a bit.


lindsly commented on July 17, 2024

Thank you for your reply! I tried running the same command again without ever touching the command prompt, and I got the Killed message again. I also tested a ctrl-C interrupt, which outputs the following (just for your reference):

^CTraceback (most recent call last):
  File "./bin/partis", line 805, in <module>
    args.func(args)
  File "./bin/partis", line 261, in run_partitiondriver
    parter.run(actions)
  File "/partis/python/partitiondriver.py", line 125, in run
    self.action_fcns[tmpaction]()
  File "/partis/python/partitiondriver.py", line 522, in partition
    self.run_waterer(look_for_cachefile=not self.args.write_sw_cachefile, write_cachefile=self.args.write_sw_cachefile, count_parameters=self.args.count_parameters)  # run smith-waterman
  File "/partis/python/partitiondriver.py", line 198, in run_waterer
    self.set_vsearch_info(get_annotations=True)
  File "/partis/python/partitiondriver.py", line 247, in set_vsearch_info
    self.vs_info = utils.run_vsearch('search', seqs, self.args.workdir + '/vsearch', threshold=0.3, glfo=self.glfo, print_time=True, vsearch_binary=self.args.vsearch_binary, get_annotations=get_annotations, no_indels=self.args.no_indels)
  File "/partis/python/utils.py", line 4929, in run_vsearch
    run_cmds(cmdfos)
  File "/partis/python/utils.py", line 3509, in run_cmds
    time.sleep(per_proc_sleep_time)
KeyboardInterrupt

My output was set to a folder on my host system (outside of Docker), so that may be what's causing the slowdown. I am running it again now with the output set within docker, and it appears (so far) to be working. I will also monitor the memory usage as it goes.

Attempting to install the plotting packages led to a lot of dependency issues, so I may just hold off on that for now.


psathyrella commented on July 17, 2024

ok, great, that's the same ctrl-C message I'm used to. Then I'd guess it's a memory issue; when I've gotten similar things, it was the OS's out-of-memory killer killing it. I'd be surprised if that's the problem, since it doesn't usually use much memory on 24k sequences, at least compared to what's on a typical box with 12 cores, but I don't know how much memory is really there. In any case there are a lot of different ways to optimize/approximate for speed and memory.
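One generic way to confirm it was the OOM killer (a standard linux check, nothing partis-specific) is to look for kill messages in the kernel log:

    dmesg | grep -iE 'killed process|out of memory'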

yeah, sorry about the dependency issues, I'll get the new docker image up as soon as I can. Meanwhile someone else kept track of how they got plotting working in docker last week, so the following will likely fix things (the difference from what's in the manual, I think, is just the numpy update and the explicit list of bios2mds deps):

    apt-get install -y xorg libx11-dev libglu1-mesa-dev r-cran-rgl
    conda install -y -cr r-rgl r-essentials
    conda update -y numpy
    R --vanilla --slave -e 'install.packages(c("bios2mds","picante","pspline","deSolve","igraph","TESS","fpc","pvclust","corpcor","phytools","mvMORPH","geiger","mvtnorm","glassoFast","Rmpfr"), repos="http://cran.rstudio.com/")'
    mkdir -p packages/RPANDA/lib
    R CMD INSTALL -l packages/RPANDA/lib packages/RPANDA/


lindsly commented on July 17, 2024

Some updates and another test:

I was unable to run the full fasta even with the modified output folder location (killed again). I was able to successfully partition the fasta using a subsample of 5000 sequences, though, so I'm guessing that memory is the issue. My machine has 32GB of RAM available, so I wouldn't expect this to be a problem. Also, I kept task manager open while running the full file and didn't notice any huge spikes in memory usage.

I also tried another, larger file (47k sequences) to see whether it was the fasta file itself or really a memory problem, but before it had a chance to be killed, I got the following exception:

(base) root@af7131fba3e1:/partis# ./bin/partis partition --infname /host/home/Desktop/partis_fa_rc/lys14_rc.fasta --outfname lys14_partis_out/lys14-partition.yaml --n-procs 12 --species mouse --small-clusters-to-ignore 1-10 --parameter-dir lys14-full-parameter-dir
  non-human species 'mouse', turning on allele clustering
  parameter dir does not exist, so caching a new set of parameters before running action 'partition': lys14-full-parameter-dir
caching parameters
  vsearch: 46479 / 47054 v annotations (575 failed) with 183 v genes in 31.2 sec
    keeping 62 / 261 v genes
smith-waterman  (new-allele clustering)
  vsearch: 46444 / 47054 v annotations (610 failed) with 62 v genes in 62.9 sec
    running 12 procs for 47054 seqs
Traceback (most recent call last):
  File "./bin/partis", line 805, in <module>
    args.func(args)
  File "./bin/partis", line 261, in run_partitiondriver
    parter.run(actions)
  File "/partis/python/partitiondriver.py", line 125, in run
    self.action_fcns[tmpaction]()
  File "/partis/python/partitiondriver.py", line 264, in cache_parameters
    self.run_waterer(dbg_str='new-allele clustering')
  File "/partis/python/partitiondriver.py", line 221, in run_waterer
    waterer.run(cachefname if write_cachefile else None)
  File "/partis/python/waterer.py", line 108, in run
    self.read_output(base_outfname, len(mismatches))
  File "/partis/python/waterer.py", line 490, in read_output
    self.summarize_query(qinfo)  # returns before adding to <self.info> if it thinks we should rerun the query
  File "/partis/python/waterer.py", line 979, in summarize_query
    indelfo = self.combine_indels(qinfo, best)  # the next time through, when we're writing ig-sw input, we look to see if each query is in <self.info['indels']>, and if it is we pass ig-sw the indel-reversed sequence, rather than the <input_info> sequence
  File "/partis/python/waterer.py", line 1559, in combine_indels
    return indelutils.combine_indels(regional_indelfos, full_qrseq, qrbounds, uid=qinfo['name'], debug=debug)
  File "/partis/python/indelutils.py", line 645, in combine_indels
    raise Exception('%sqr_gap_seq non-gap length %d not the same as qrbound length %d in %s region indelfo' % ('%s: ' % uid if uid is not None else '', utils.non_gap_len(rfo['qr_gap_seq']), qrbounds[region][1] - qrbounds[region][0], region))
Exception: a43659fe-3301-40c5-93b2-cda064707bde: qr_gap_seq non-gap length 249 not the same as qrbound length 248 in v region indelfo

I saw that there was another issue in 2018 (link) with this same exception, but it looks like it was successfully addressed. Any ideas what may be going wrong? Here is the read that causes the exception:

>a43659fe-3301-40c5-93b2-cda064707bde
GTGACTGGAGTTCAGACGTGCTCTTCCGATCTGGGGACTTCAGTGAAGATGTCCTGTAAGGCTTCTGGATACACCTTCACTAACTACTGGATAGGTTAGCAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGATATTTACCCTGGAGGTGCTTATATTAACTACAATGAAGTTCAAGGGCAAGGCCACACTGACTGCAGACAAATCCTCCAGCACAGCCTCCATGCAGTTCAGCAGCCTGACATCTGAGGACTCTGCCATCTATTACTGTGCAAGAAAGAATTACTACGGTAATACCTACTTTGACTACCGGGGCCAAGGCACCACTCAGTCTCCTCAGCC


lindsly commented on July 17, 2024

I wasn't sure exactly which log file to look through, but here is the info from my latest run on the first full fasta (~24k sequences).

Command (used --n-procs 6 instead of 12 to see if that would help):
./bin/partis partition --infname /host/home/Desktop/partis_fa_rc/ova14_rc.fasta --outfname _output/ova14_output/ova14-partition.yaml --n-procs 6 --species mouse --small-clusters-to-ignore 1-10 --parameter-dir _output/ova14_output/parameter-dir

(base) root@af7131fba3e1:/tmp/partis-work/hmms/274561# ls
cluster-path-progress  germline-sets  hmm_cached_info.csv  hmm_input.csv  istep-0
(base) root@af7131fba3e1:/tmp/partis-work/hmms/274561# cd istep-0
(base) root@af7131fba3e1:/tmp/partis-work/hmms/274561/istep-0# ls
hmm-0  hmm-1  hmm-2  hmm-3  hmm-4  hmm-5
(base) root@af7131fba3e1:/tmp/partis-work/hmms/274561/istep-0# cd hmm-5
(base) root@af7131fba3e1:/tmp/partis-work/hmms/274561/istep-0/hmm-5# ls
err  hmm_cached_info.csv  hmm_input.csv  hmm_output.csv.progress  out
(base) root@af7131fba3e1:/tmp/partis-work/hmms/274561/istep-0/hmm-5# less hmm_output.csv.progress

hmm_output.csv.progress from the latest (?) hmm folder: Google Drive text file

Is this where I should be checking for any info on why the program was killed? Nothing appears to be wrong in this file from what I can tell. The "err" file was empty.

A side question: is there a good way to extract the CDR3 region from the fasta file generated using "./bin/extract-fasta.py"? I see that the .yaml file includes the keys "codon_positions": {"j": 327, "v": 291} and "cdr3_length": 39 for a particular cluster, but I'm not quite sure how to translate this to the final fasta file. Thanks!


psathyrella commented on July 17, 2024

argggggg, that exception just won't die. I'd convinced myself that it couldn't get triggered any more, but it looks like I'll just have to figure out a way to skip sequences that trigger it instead. I'll try to get to that tomorrow.

Yeah, unfortunately running on that sequence alone doesn't reproduce the error for me, but the sequence is unproductive, so if you don't need unproductive sequences, setting --skip-unproductive may avoid the error for you.

So the log file says that particular bcrham process alone is using 7% of your memory; multiplied by 6 that's close to half, and adding in the memory used by the python process that spawned the bcrham procs, it's likely the OOM killer killing it.
Unfortunately the nature of clustering is that both the time and memory required are highly dependent on the structure of the repertoire (not just its size). For instance a repertoire where everybody's either super similar or very different will be quite quick and easy, but if there are tons of sequences that are similar to each other yet not super close, it has to do a lot more work, since the approximate methods can't do as much. Ignoring small clusters is likely to make the biggest difference in reducing the memory footprint. But wait, it looks like it says it only has access to 2GB, not 32. Maybe your docker image is only getting a small allocation, which could be increased? (see the commands below)
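For reference, docker stats shows each running container's memory usage against its limit, and you can request a bigger cap when starting the container (the image name below is a placeholder); on Docker Desktop for Windows the overall allocation is set under Settings > Resources:

    docker stats                                         # MEM USAGE / LIMIT column shows the cap
    docker run -it -m 16g <your-partis-image> /bin/bash  # example: start with a 16GB limit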

At the moment you'll have to add a line or two to either bin/extract-fasta.py or bin/example-parse-output.py, but another thing I may get to tomorrow is adding a command line arg to them to make it simpler to extract a single column like cdr3 length. Adding print cluster_annotation['cdr3_length'] at this point will print the cdr3 length for the largest cluster; remove the 'break' to get the rest of them. I'm not a big fan of adding meta info to fasta files, since there are so many different formats for doing so, all mutually incompatible, but you could do a similar thing (with more work) in extract-fasta.py. A sketch of reading it straight from the yaml is below.
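To illustrate with the codon_positions numbers from above, here's a minimal sketch that pulls each cluster's cdr3 out of the output yaml. It assumes annotations are stored under an 'events' key with 'seqs' and 'unique_ids' fields (check your file's actual layout), and that the cdr3 runs from the conserved cysteine through the end of the conserved tryptophan/phenylalanine codon:

    import yaml

    with open('lys6-partition.yaml') as yfile:  # file name from the command above
        yamlfo = yaml.safe_load(yfile)

    for line in yamlfo['events']:  # assumed layout: one annotation per cluster
        cpos = line['codon_positions']             # e.g. {'j': 327, 'v': 291}
        seq = line['seqs'][0]                      # first sequence in the cluster
        cdr3_seq = seq[cpos['v'] : cpos['j'] + 3]  # length 327 + 3 - 291 = 39, matching 'cdr3_length'
        print('%s  %s' % (line['unique_ids'][0], cdr3_seq))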


lindsly commented on July 17, 2024

Ah, I didn't realize that Docker imposes a memory cap. I actually installed Docker for the first time for this program so I'm still getting used to this whole process.

Hopefully that change will resolve the issues I'm having. I will run the larger files tonight to make sure. Thank you very much for all your help!


lindsly commented on July 17, 2024

The memory increase appears to have solved the original "killed" issue!

I tried running the second, larger file that had the length exception again, this time using the --skip-unproductive setting, but another sequence (apparently productive) hit the same exception.

Another basic question I have is about the output file from bin/example-parse-output.py. I am able to view the output of the command

./bin/example-parse-output.py --fname _output/lys6_partis_out/lys6-partition.yaml --glfo-dir /partis/data/germlines/mouse/ > _output/lys6_partis_out/parsed_output.txt

nicely using less -R parsed_output.txt, but when I transfer that folder to my working folder on my PC and try to view it, the colored text and other formatting make the file very difficult to read. Example section below:

^[[1;34mN^[[0m^[[1;34mN^[[0m^[[1;34mN^[[0m^[[1;34mN^[[0m^[[1;34mN^[[0m^[[1;34mN^[[0mGAGGTGAAGCTTCTCCAGTCTGGAGGTGGCCTG^[[1;34m*^[[0m^[[1;34m*^[[0mGCAGCCT^[[91mT^[[0mGAGGATCCCTGGAAACTCTCCTGTGCAGCCTCAGGAATCGATTTTAGTAGATACTGGATGAGTT^[[91mA^[[0m^[[91mA^[[0m^[[91mC^[[0mT^[[1;34m*^[[0m^[[1;34m*^[[0mGGCGGGCTCCAGGGAAAGGACTAGAATGGATTGGAGAAATTAATCCAGATAGCAGTACAATAAACTATGCACCATCTCTAAAGGATAAATTCATCATCCTT^[[91mG^[[0mCAG^[[91mT^[[0mGACAACGCCAAAA^[[91mT^[[0m^[[91mA^[[0m^[[91mC^[[0m^[[91mG^[[0m^[[91mC^[[0m^[[91mT^[[0m^[[91mG^[[0m^[[91mT^[[0m^[[91mG^[[0m^[[91mT^[[0m^[[91mA^[[0mC^[[91mC^[[0m^[[91mT^[[0m^[[91mT^[[0m^[[91mC^[[0m^[[91mC^[[0m^[[91mT^[[0m^[[91mG^[[0m^[[91mC^[[0mA^[[91mA^[[0m^[[91mA^[[0m^[[91mT^[[0m^[[91mG^[[0mAGTGA^[[91mA^[[0mAGTGTGAGAT^[[91mC^[[0m^[[91mT^[[0m^[[91mG^[[0mGAGGACACAGCCCTTTATTAC^[[7mT^[[0m^[[7mG^[[0m^[[7mT^[[0mGCAAAAG^[[91mG^[[0mGGGCGGTTACTATGCTATGGACTAC^[[7mT^[[0m^[[7mG^[[0m^[[7mG^[[0mGGTCAAGGAA^[[91mA^[[0mC^[[91mC^[[0m^[[91mT^[[0m^[[91mC^[[0m^[[91mA^[[0m^[[91mG^[[0m^[[91mT^[[0m^[[91mC^[[0m^[[91mA^[[0mC^[[91mT^[[0m^[[91mG^[[0m^[[91mT^[[0m^[[91mC^[[0m^[[91mT^[[0mC^[[91mC^[[0m^[[91mT^[[0m^[[91mC^[[0m^[[91mA^[[0m 20ba4fc1-2906-4dc8-857e-16da8daf4e26 14.6 ^[[91mout of frame cdr3, stop codon^[[0m

Is there a better way to write this file (I have tried .txt and .csv so far) and view it in Windows? Thanks!


psathyrella commented on July 17, 2024

Great, glad the memory fixed it.

I'll try to get to the exception in a bit.

Yeah, so those are ansi color codes. It looks like windows terminals do support them, so maybe just view the file in a windows terminal? There's apparently also a windows version of less, or you can just strip the codes from the log files (sketch below).
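If stripping is easiest, here's a minimal python sketch (the file names are just examples) that removes the standard SGR color escapes:

    import re

    ansi_re = re.compile(r'\x1b\[[0-9;]*m')  # matches codes like ESC[91m and ESC[0m

    with open('parsed_output.txt') as infile, open('parsed_output_plain.txt', 'w') as outfile:
        for line in infile:
            outfile.write(ansi_re.sub('', line))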


psathyrella commented on July 17, 2024

ok, this should fix the length exception.

