[VCF] Question on usage (fuc issue, closed, 7 comments)

Comments (7)

sbslee commented on June 11, 2024

I'm currently out of town and don't have my computer with me, so please excuse me for not being able to provide the fullest answers. But I will try my best :) I should be back by Sunday (Korean time), so if the suggestions below don't get you where you need to be, please let me know and I will provide additional support as soon as I get back.

For the first question, there are a number of ways to achieve the goal, but I suggest using the pyvcf.VcfFrame.to_variants() method:

from fuc import pyvcf

vf1 = pyvcf.VcfFrame.from_file('in1.vcf')
vf2 = pyvcf.VcfFrame.from_file('in2.vcf')

variants1 = vf1.to_variants()
variants2 = vf2.to_variants()

common_variants = list(set(variants1).intersection(variants2))
unique_variants1 = [x for x in variants1 if x not in variants2]
unique_variants2 = [x for x in variants2 if x not in variants1]
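
For large files, the list comprehensions above rescan the other list for every variant. Here is a minimal sketch of the same comparison that uses sets for the membership tests while keeping the original order. The variant strings are made-up stand-ins for what to_variants() returns (their exact format is an assumption), so the snippet runs without fuc installed:

```python
# Hypothetical variant strings standing in for the output of to_variants().
variants1 = ['chr1-100-G-A', 'chr1-200-T-C', 'chr2-300-A-G']
variants2 = ['chr1-200-T-C', 'chr2-300-A-G', 'chr3-400-C-T']

# Build each set once so every membership test is O(1) on average.
set1, set2 = set(variants1), set(variants2)

# The list comprehensions preserve the order of the original lists.
common_variants = [x for x in variants1 if x in set2]
unique_variants1 = [x for x in variants1 if x not in set2]
unique_variants2 = [x for x in variants2 if x not in set1]

print(common_variants)
print(unique_variants1)
print(unique_variants2)
```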

For the second question, again, there are a number of ways, but I recommend using the pyvcf.merge() method with the collapse=True option:

merged_vf = pyvcf.merge([vf1, vf2], how='outer', collapse=True)
merged_vf.to_file('out.vcf')

This will merge two VcfFrame objects while removing duplicate records.

ironb25 commented on June 11, 2024

The answer to the first question worked, thank you!

Regarding the second question, on merging variants from two callers: the outer join does not work. It fails with the following error:

 File "/home/ubuntu/miniconda3/envs/fuc/lib/python3.9/site-packages/fuc/api/pyvcf.py", line 5277, in <listcomp>
    key=lambda col: [CONTIGS.index(x) if isinstance(x, str)
ValueError: 'MT' is not in list

sbslee commented on June 11, 2024

@ironb25,

Thanks for the patience, I'm back! Glad to hear the first question is resolved.

For the second question, the problem occurs because the pyvcf.VcfFrame.sort method currently expects either M or chrM as the name of the mitochondrial contig (you are using MT). Clearly, this needs to be fixed, because people should be able to use pyvcf.VcfFrame.sort regardless of which contigs their input VCF file has. I will update this for the next release, 0.34.0; it should be an easy fix. In the meantime, can you send me the two VCF files you are trying to merge? That way, I can make sure there are no other surprises for you. But it's totally fine if you can't send them to me.
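
To illustrate the failure mode, here is a rough sketch of a sort key that tolerates unknown contig names instead of raising ValueError from list.index(). This is not fuc's actual implementation or its fix: KNOWN_CONTIGS and contig_sort_key are hypothetical names used only for this example.

```python
# Hypothetical list of known contigs; NOT fuc's actual CONTIGS constant.
KNOWN_CONTIGS = [str(i) for i in range(1, 23)] + ['X', 'Y', 'M']

def contig_sort_key(contig):
    """Return a sort key that never raises for unknown contig names."""
    # Treat 'chr1' and '1' as the same contig for ordering purposes.
    name = contig[3:] if contig.startswith('chr') else contig
    if name in KNOWN_CONTIGS:
        # Known contigs come first, in their canonical order.
        return (0, KNOWN_CONTIGS.index(name), '')
    # Unknown contigs (e.g. 'MT') come after the known ones, alphabetically.
    return (1, 0, name)

contigs = ['MT', 'X', '2', 'chr10', '1']
sorted_contigs = sorted(contigs, key=contig_sort_key)
print(sorted_contigs)
```

The tuple key keeps known and unknown contigs in separate groups, so a custom name such as MT is placed at the end rather than causing an error.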

ironb25 commented on June 11, 2024

sbslee commented on June 11, 2024

@ironb25,

I updated the pyvcf.VcfFrame.sort method to handle custom contigs such as MT. This resolves your second question. I sent you the output VCF after merging.

FYI, I noticed that the output VCF has no genotype data (e.g. 0/1) for the samples NORMAL and TUMOR. When I looked into this more closely, I found that the strelka2.vcf file does not have the GT field at all. Was this intended?
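
If you want to check this yourself, here is a minimal sketch that scans VCF text for records whose FORMAT column lacks GT. It uses only the standard library; records_missing_gt is a hypothetical helper, and the example records are made up rather than taken from the files in this thread.

```python
def records_missing_gt(vcf_lines):
    """Return (CHROM, POS) for each record whose FORMAT column lacks GT."""
    missing = []
    for line in vcf_lines:
        if not line.strip() or line.startswith('#'):
            continue  # Skip blank lines and header lines.
        fields = line.rstrip('\n').split('\t')
        # FORMAT is the ninth column; a sites-only record has no FORMAT at all.
        has_gt = len(fields) >= 9 and 'GT' in fields[8].split(':')
        if not has_gt:
            missing.append((fields[0], fields[1]))
    return missing

# Made-up records for illustration only.
example = [
    '##fileformat=VCFv4.1',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR',
    '1\t100\t.\tG\tA\t.\tPASS\t.\tDP:FDP\t30:0\t42:1',
    '1\t200\t.\tT\tC\t.\tPASS\t.\tGT:DP\t0/0:28\t0/1:35',
]
result = records_missing_gt(example)
print(result)
```

To run it on a real file, pass an open file handle, e.g. records_missing_gt(open('strelka2.vcf')), assuming the file is uncompressed.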

sbslee commented on June 11, 2024

FYI, you can try out the updated fuc yourself with the following:

$ git clone https://github.com/sbslee/fuc
$ cd fuc
$ git checkout 0.34.0-dev
$ pip install -e .

sbslee commented on June 11, 2024

Since the two original issues have been resolved, I will close this issue. Please feel free to re-open it if necessary.
