matteoschiavinato / utilities

General purpose tools for every-day sequencing bioinformatics. If you use any of these tools, please acknowledge this repository (there are no publications). Let's all help each other ;)

R 6.90% Python 92.79% Shell 0.31%
bam bioinformatics fasta python r tools

utilities's Issues

mummer2vcf - list index out of range

Hi Matteo,

I'm using mummer3.23 to compare two human genomes, and your my-mummer-2-vcf.py script is giving me a "list index out of range" error. I notice that the .snps files I'm getting (using "show-snps -CTlr" or "show-snps -T") only have ten columns, whereas your code suggests you have twelve columns.
Could you show me the header of your .snps file, so I can see where the differences lie?
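
One way to make a parser tolerant of this kind of column-count difference is to read the fixed fields from both ends of each row instead of indexing all twelve positions. The sketch below is not the script's actual code: the function name `parse_snps_row` is hypothetical, and it assumes tab-delimited `show-snps -T` rows in which the first four fields are the reference position, the two bases and the query position, and the last two fields are the reference and query sequence names.

```python
def parse_snps_row(line):
    """Parse one tab-delimited show-snps data row (hypothetical helper).

    Assumption: the first four fields and the last two fields are the same
    whichever optional columns sit in between, so rows with ten or twelve
    fields are both accepted.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(
            "expected at least 8 tab-separated fields, got %d" % len(fields)
        )
    ref_pos, ref_base, qry_base, qry_pos = fields[:4]
    ref_name, qry_name = fields[-2:]
    return {
        "ref_name": ref_name,
        "ref_pos": int(ref_pos),
        "ref_base": ref_base,
        "qry_base": qry_base,
        "qry_name": qry_name,
        "qry_pos": int(qry_pos),
    }


# A ten-field row and a twelve-field row go through the same code path.
ten = "\t".join(["792", "G", "C", "35454", "1", "792", "1", "1", "refA", "qryB"])
twelve = "\t".join(
    ["792", "G", "C", "35454", "1", "792", "2482", "37520", "1", "1", "refA", "qryB"]
)
print(parse_snps_row(ten) == parse_snps_row(twelve))
```

Header lines would still need to be skipped before calling such a function, since they do not follow the row layout.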

my-mummer-2-vcf.py: TypeError: expected str, bytes or os.PathLike object, not NoneType

Hello @MatteoSchiavinato ,

I am experiencing the above error when trying to run my-mummer-2-vcf.py, and was hoping you might be able to help me resolve it. I am running the script using:

`python3 my-mummer-2-vcf.py --input-header -s nucmer.snps`

I have tried this both with and without `--input-header`.

I am running nucmer and show-snps using:
nucmer -c 100 -p xxx output.backbone.fasta xxx.fasta
show-snps -CTlr xxx.delta > nucmer.snps

I have also tried using `show-snps -T xxx.delta > nucmer.snps`, and receive the same error.

The output from head nucmer.snps | awk '{print NF}' is:
2 1 0 12 12 12 12 12 12 12

and the output from head nucmer.snps | awk '{print NF; print $0}' is:

2
/dartfs-hpc/rc/lab/G/output.backbone.fasta /dartfs-hpc/rc/lab/G/output.GCF_000480355.1_Pseu_aeru_CF614_V1.accessory.fasta
1
NUCMER
0

12
[P1] [SUB] [SUB] [P2] [BUFF] [DIST] [LEN R] [LEN Q] [FRM] [TAGS]
12
792 G C 35454 1 792 2482 37520 1 1 backbone_0703_length_2482 GCF_000480355.1_Pseu_aeru_CF614_V1_accessory_1907_length_37520
12
793 G A 35455 1 793 2482 37520 1 1 backbone_0703_length_2482 GCF_000480355.1_Pseu_aeru_CF614_V1_accessory_1907_length_37520
12
795 G C 35457 2 795 2482 37520 1 1 backbone_0703_length_2482 GCF_000480355.1_Pseu_aeru_CF614_V1_accessory_1907_length_37520
12
801 T C 35463 4 801 2482 37520 1 1 backbone_0703_length_2482 GCF_000480355.1_Pseu_aeru_CF614_V1_accessory_1907_length_37520
12
805 . A 35468 4 805 2482 37520 1 1 backbone_0703_length_2482 GCF_000480355.1_Pseu_aeru_CF614_V1_accessory_1907_length_37520
12
809 A . 35471 1 809 2482 37520 1 1 backbone_0703_length_2482 GCF_000480355.1_Pseu_aeru_CF614_V1_accessory_1907_length_37520

Any thoughts you have on why this may be occurring and how I might go about solving it would be much appreciated.

Thanks in advance!!

The error in full is:

  File "/dartfs-hpc/rc/home/d/d41294d/.conda/envs/biopyow/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 42, in __init__
    self.stream = open(source, "r" + mode)
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "my-mummer-2-vcf.py", line 329, in <module>
    Vcf_lines = collapse_variants(args.reference, Vcf_lines)
  File "my-mummer-2-vcf.py", line 211, in collapse_variants
    Sequences = parse_sequences(Reference)
  File "my-mummer-2-vcf.py", line 189, in parse_sequences
    for record in SeqIO.parse(Reference, "fasta"):
  File "/dartfs-hpc/rc/home/d/d41294d/.conda/envs/biopyow/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 627, in parse
    i = iterator_generator(handle)
  File "/dartfs-hpc/rc/home/d/d41294d/.conda/envs/biopyow/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py", line 181, in __init__
    super().__init__(source, alphabet=alphabet, mode="t", fmt="Fasta")
  File "/dartfs-hpc/rc/home/d/d41294d/.conda/envs/biopyow/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 46, in __init__
    if source.read(0) != "":
AttributeError: 'NoneType' object has no attribute 'read'
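
The traceback shows `args.reference` reaching `SeqIO.parse` as `None`, which suggests the command quoted above was run without a reference FASTA. A minimal sketch of how a command-line parser can report this up front follows. It is not the script's actual argument handling: the long option names `--snps` and `--reference` are assumptions (only `-s`, `--input-header` and the attribute name `args.reference` appear in this report), and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser():
    """Hypothetical parser that refuses to run without a reference FASTA."""
    parser = argparse.ArgumentParser(
        description="Sketch of argument checking for a show-snps to VCF converter"
    )
    # -s is taken from the command in the report; --snps is an assumed long name.
    parser.add_argument("-s", "--snps", required=True, help="show-snps output file")
    # Assumed option name, inferred from the args.reference attribute in the traceback.
    parser.add_argument("--reference", required=True, help="reference FASTA file")
    parser.add_argument("--input-header", action="store_true",
                        help="the show-snps file still contains its header lines")
    return parser


args = build_parser().parse_args(
    ["-s", "nucmer.snps", "--reference", "output.backbone.fasta"]
)
print(args.reference)
```

With `required=True`, omitting the option makes argparse exit with a usage message naming the missing argument, instead of passing `None` down to the FASTA reader.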

mummer2vcf outputs a wrong 'alt' when SNPs occur at the same position in two (or more) reference sequences

Hi Matteo,

Thank you for making my-mummer-2-vcf.py available to us.
I think the following fixes are necessary in some cases.

  1. When SNPs exist at the same position in two (or more) reference sequences and the 'alt' bases differ from each other, the code outputs a wrong 'alt' made of concatenated bases.

An example fix is to use itemgetter(0, 1) instead of itemgetter(1) at line 223, in def collapse_variants.

Sorted_snps = [ "\t".join([str(y) for y in x]) for x in sorted(Sorted_snps, key=itemgetter(0,1)) ]

  2. When SNPs exist at the same position in two (or more) query sequences (whether or not the 'alt' bases differ from each other), the code does not output the concatenated 'orig_pos'.

If we want to output this, one option is to add the following lines after line 114, in def collapse_snps.

			orig_pos_lst = str(Collapsed_snps[-1][8]).split(",")
			orig_pos_lst.append(str(orig_pos))
			Collapsed_snps[-1][8] = ",".join(orig_pos_lst)
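
The effect of the sort key in the first fix can be seen on a toy example. The rows below are invented for illustration, with the sequence name at index 0 and the position at index 1 as the `itemgetter(0, 1)` suggestion implies. Sorting by position alone places rows from different reference sequences next to each other, so any step that merges adjacent rows sharing a position could mix them.

```python
from operator import itemgetter

# Invented rows: (reference sequence, position, ref base, alt base).
rows = [
    ("seqB", 10, "A", "G"),
    ("seqA", 10, "A", "T"),
    ("seqA", 5, "C", "T"),
    ("seqB", 5, "C", "G"),
]

# Position-only key: rows from seqA and seqB with equal positions end up adjacent.
by_position = sorted(rows, key=itemgetter(1))

# (sequence, position) key: each reference sequence forms its own contiguous run.
by_sequence_then_position = sorted(rows, key=itemgetter(0, 1))

for row in by_sequence_then_position:
    print("\t".join(str(value) for value in row))
```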

Positions involved in insertions in relation to the reference are not merged

Hi,

I got your script running and it seems to be working well for SNPs and deletions in relation to the reference.
For positions involved in insertions in relation to the reference, there seems to be a problem: these are not merged into one feature. Instead, they appear as 1-base insertions on as many lines as the insertion is long.

ref_seq 16534 . T TC . . INDEL scaffold212:740
ref_seq 16534 . T TA . . INDEL scaffold212:741
ref_seq 16534 . T TA . . INDEL scaffold212:742
ref_seq 16534 . T TC . . INDEL scaffold212:743
ref_seq 16534 . T TC . . INDEL scaffold212:744

/Carl
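
A possible post-processing step for the rows shown above is to fold consecutive single-base insertions that share the same reference sequence, position and reference base into one record. This is only a sketch under stated assumptions, not the script's behaviour: `merge_insertions` is a hypothetical helper, rows are assumed to be already ordered by query position on the forward strand, and it does not check that the query positions are actually consecutive.

```python
def merge_insertions(records):
    """Merge adjacent insertion rows anchored at the same reference base.

    Each record is (chrom, pos, ref, alt, origin). An insertion is taken to be
    a row whose alt starts with ref and is longer than it (assumption).
    The merged record keeps the origin of the first row in the run.
    """
    def is_insertion(ref, alt):
        return len(alt) > len(ref) and alt.startswith(ref)

    merged = []
    for chrom, pos, ref, alt, origin in records:
        if merged and is_insertion(ref, alt):
            p_chrom, p_pos, p_ref, p_alt, p_origin = merged[-1]
            same_anchor = (p_chrom, p_pos, p_ref) == (chrom, pos, ref)
            if same_anchor and is_insertion(p_ref, p_alt):
                # Append only the newly inserted bases to the previous alt.
                merged[-1] = (p_chrom, p_pos, p_ref, p_alt + alt[len(ref):], p_origin)
                continue
        merged.append((chrom, pos, ref, alt, origin))
    return merged


# The five rows from the report above.
rows = [
    ("ref_seq", 16534, "T", "TC", "scaffold212:740"),
    ("ref_seq", 16534, "T", "TA", "scaffold212:741"),
    ("ref_seq", 16534, "T", "TA", "scaffold212:742"),
    ("ref_seq", 16534, "T", "TC", "scaffold212:743"),
    ("ref_seq", 16534, "T", "TC", "scaffold212:744"),
]
for record in merge_insertions(rows):
    print("\t".join(str(value) for value in record))
```

On those five rows the sketch yields a single record with alt `TCAACC`. A stricter version would also compare the scaffold name and query coordinate in the last column before merging.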
