Comments (7)
I'm currently out of town and don't have my computer with me, so please excuse me for not being able to provide the fullest answers. But I will try my best :) I should be back by Sunday (Korea time), so if the answers below don't get you what you need, please let me know and I will provide additional support as soon as I get back.
For the first question, there are a number of ways to achieve the goal, but I suggest using the pyvcf.VcfFrame.to_variants() method:
from fuc import pyvcf
vf1 = pyvcf.VcfFrame.from_file('in1.vcf')
vf2 = pyvcf.VcfFrame.from_file('in2.vcf')
# List the variants found in each file
variants1 = vf1.to_variants()
variants2 = vf2.to_variants()
# Convert to sets so membership tests stay fast on large files
set1, set2 = set(variants1), set(variants2)
common_variants = list(set1 & set2)
unique_variants1 = [x for x in variants1 if x not in set2]
unique_variants2 = [x for x in variants2 if x not in set1]
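To make the comparison logic easier to check without any input files, here is a minimal sketch in plain Python. The helper name compare_variants and the variant strings are made up for illustration; they are not part of fuc.

```python
def compare_variants(variants1, variants2):
    """Split two variant lists into shared and caller-specific variants."""
    set1, set2 = set(variants1), set(variants2)
    # Keep the order in which variants appear in the input lists.
    common = [v for v in variants1 if v in set2]
    only1 = [v for v in variants1 if v not in set2]
    only2 = [v for v in variants2 if v not in set1]
    return common, only1, only2


# Hypothetical variant strings, made up for illustration.
caller1 = ['chr1-100-A-T', 'chr1-200-G-C', 'MT-50-C-T']
caller2 = ['chr1-200-G-C', 'chr2-300-T-A', 'MT-50-C-T']

common, only1, only2 = compare_variants(caller1, caller2)
print(common)
print(only1)
print(only2)
```

Unlike list(set(...)), the list comprehensions keep the variants in their original order, which can make the output easier to compare against the input files.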
For the second question, again, there are a number of ways, but I recommend using the pyvcf.merge() function with how='outer' and the collapse=True option:
merged_vf = pyvcf.merge([vf1, vf2], how='outer', collapse=True)
merged_vf.to_file('out.vcf')
This will merge the two VcfFrame objects while removing duplicate records.
from fuc.
The answer to the first question worked, thank you!
Regarding the second question on merging variants from two callers, the outer join does not work:
File "/home/ubuntu/miniconda3/envs/fuc/lib/python3.9/site-packages/fuc/api/pyvcf.py", line 5277, in <listcomp>
key=lambda col: [CONTIGS.index(x) if isinstance(x, str)
ValueError: 'MT' is not in list
from fuc.
Thanks for the patience, I'm back! Glad to hear the first question is resolved.
For the second question, the problem occurred because the pyvcf.VcfFrame.sort method currently expects either M or chrM for the mitochondrial contig (you are using MT). Clearly, this needs to be fixed because people should be able to use pyvcf.VcfFrame.sort regardless of which contigs their input VCF file has. I will update this for the next release, 0.34.0; it should be an easy fix. In the meantime, can you send me the two VCF files you are trying to merge? That way, I can make sure there are no other surprises for you. But it's totally fine if you can't send them to me.
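To illustrate why the lookup fails and one way a sort key can tolerate unexpected contigs, here is a rough sketch in plain Python. It is not the actual fix in fuc: KNOWN_CONTIGS and contig_sort_key are hypothetical names, and the real CONTIGS list may look different.

```python
# Hypothetical list of known contigs; the real CONTIGS list in fuc may differ.
KNOWN_CONTIGS = [str(i) for i in range(1, 23)] + ['X', 'Y', 'M']


def contig_sort_key(contig):
    """Return a sort key that never raises for unknown contigs."""
    # Treat 'chr1' and '1' as the same contig.
    name = contig[3:] if contig.startswith('chr') else contig
    if name in KNOWN_CONTIGS:
        return (0, KNOWN_CONTIGS.index(name), '')
    # Unknown contigs (e.g. 'MT') go after the known ones, alphabetically.
    return (1, 0, name)


# A plain list lookup fails for 'MT', as in the traceback above.
try:
    KNOWN_CONTIGS.index('MT')
except ValueError as error:
    print(error)

contigs = ['MT', 'chrX', '2', 'chr10', '1', 'GL000192.1']
print(sorted(contigs, key=contig_sort_key))
```

Placing unknown contigs last is only one possible design choice; the point is that the key falls back to something sortable instead of raising ValueError.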
from fuc.
I updated the pyvcf.VcfFrame.sort method to handle custom contigs such as MT. This resolves your second question. I sent you the output VCF after merging.
FYI, I noticed that the output VCF has no genotype data (e.g. 0/1) for the samples NORMAL and TUMOR. When I looked into this more closely, I found that the strelka2.vcf file does not have the GT field at all. Was this intended?
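If you want to check this yourself, here is a minimal sketch in plain Python that looks for GT in the FORMAT column (the ninth column) of each record. The function name has_gt_field and the two example records are made up for illustration; they are not taken from your files.

```python
def has_gt_field(vcf_lines):
    """Return True if every data line lists GT in its FORMAT column."""
    found_record = False
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9:
            return False  # no FORMAT column, so no genotype data
        found_record = True
        if 'GT' not in fields[8].split(':'):
            return False
    return found_record


# Hypothetical records, made up for illustration.
header = '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR'
with_gt = [header, '1\t100\t.\tA\tT\t.\tPASS\t.\tGT:DP\t0/0:30\t0/1:28']
without_gt = [header, '1\t100\t.\tA\tT\t.\tPASS\t.\tDP:AU\t30:0\t28:9']

print(has_gt_field(with_gt))
print(has_gt_field(without_gt))
```

To run it on a real file, you could pass an open file handle, e.g. has_gt_field(open('strelka2.vcf')), assuming the file is uncompressed.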
from fuc.
FYI, you can try out the updated fuc yourself with the following:
$ git clone https://github.com/sbslee/fuc
$ cd fuc
$ git checkout 0.34.0-dev
$ pip install -e .
from fuc.
Since the two original issues have been resolved, I will close this issue. Please feel free to re-open it if necessary.
from fuc.