everestial / vcf-simplify
A Python parser to simplify and build the VCF (Variant Call Format).
License: MIT License
Hi, I am trying to convert a VCF file (a recent version of NCBI ClinVar) to table format, but I receive this error:
Traceback (most recent call last):
File "VcfSimplify.py", line 58, in <module>
main()
File "VcfSimplify.py", line 53, in main
vcf_solver(task, args)
File "VCF-Simplify/assign_task/perform_operation.py", line 104, in vcf_solver
fnc_vcf_to_table(infile, outfile, preheader, mode, gtbase, header_name, infos, formats, samples)
File "VCF-Simplify/metadata_parser/utils.py", line 26, in wrapper
result = func(*args, **kwargs)
File "VCF-Simplify/records_parser/simplifyvcf/to_table.py", line 123, in fnc_vcf_to_table
sample_names=all_samples, gtbase_is = gtbase
File "VCF-Simplify/records_parser/vcf_records_parser.py", line 176, in read_vcfRecord
format_tags = self.record_dict["FORMAT"]
KeyError: 'FORMAT'
The VCF file does not have a FORMAT column. Using the flag "-formats 0" does not help either.
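ClinVar's VCF is a "sites-only" file: under the VCF specification, the FORMAT column and the sample columns are only present when there is genotype data, so each data line can stop after the eight fixed fields (CHROM through INFO). A parser that builds a record dictionary by zipping the #CHROM header with a data line will then never contain a "FORMAT" key, which matches the KeyError above. The sketch below is a hypothetical illustration (parse_record is not a VCF-Simplify function) of treating FORMAT and samples as optional instead of indexing them directly.

```python
from typing import Dict, List


def parse_record(header_cols: List[str], line: str) -> dict:
    """Hypothetical sketch: map one VCF data line onto the #CHROM header columns.

    FORMAT and the sample columns are treated as optional, because
    sites-only VCFs (such as ClinVar) end after the INFO column.
    """
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(header_cols, fields))

    samples: Dict[str, Dict[str, str]] = {}
    format_str = record.get("FORMAT")  # None for sites-only files
    if format_str is not None:
        format_tags = format_str.split(":")
        # Columns after FORMAT (index 9 onward) are sample columns.
        for name in header_cols[9:]:
            values = record.get(name, "").split(":")
            samples[name] = dict(zip(format_tags, values))
    record["samples"] = samples
    return record


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO".lstrip("#").split("\t")
sites_only = "1\t12345\t.\tA\tG\t.\t.\tCLNSIG=Benign"
print(parse_record(header, sites_only)["samples"])  # {}
```

Returning an empty sample map lets downstream code emit only the fixed and INFO columns for such files, rather than failing on the first record.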
Dear developer,
I am trying to obtain a dosage VCF file converted from .gen format. However, qctools and other converters can only output non-dosage VCF, which does not meet the needs of my downstream analysis. I found that pysam might be able to do this, but I am not familiar with its complex code.
Can VCF-Simplify do this?
Many thanks!
I look forward to your reply!
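For context on what such a conversion involves: in the Oxford .gen layout written by IMPUTE2, each line begins with five columns (SNP ID, rsID, base-pair position, allele A, allele B) followed by three genotype probabilities per sample, for AA, AB and BB. A dosage is the expected count of one allele, e.g. P(AB) + 2·P(BB) for allele B. The sketch below is not a VCF-Simplify feature; gen_line_to_vcf is a hypothetical helper that assumes that five-column layout (some .gen variants add a chromosome column), takes the chromosome as an argument, writes allele A as REF and allele B as ALT (which must be checked against the reference genome), and emits only a data line with a DS FORMAT field, not the VCF header.

```python
from typing import List


def gen_line_to_vcf(line: str, chrom: str) -> str:
    """Hypothetical sketch: convert one .gen line into a VCF data line with DS.

    Assumes columns: SNP ID, rsID, position, allele A, allele B, then
    P(AA), P(AB), P(BB) for each sample. DS = expected number of B alleles.
    """
    cols = line.split()
    _snp_id, rsid, pos, allele_a, allele_b = cols[:5]
    probs = [float(p) for p in cols[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three genotype probabilities per sample")

    dosages: List[str] = []
    for i in range(0, len(probs), 3):
        p_ab, p_bb = probs[i + 1], probs[i + 2]
        dosages.append(f"{p_ab + 2 * p_bb:.3f}")

    # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
    fixed = [chrom, pos, rsid, allele_a, allele_b, ".", ".", "."]
    return "\t".join(fixed + ["DS"] + dosages)


print(gen_line_to_vcf("snp1 rs123 1000 A G 0 1 0 0.1 0.2 0.7", chrom="1"))
```

A full converter would also need the matching ##FORMAT=<ID=DS,...> header line and the #CHROM line with sample names, which .gen files keep in a separate sample file.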
Hi
When I run VcfSimplify.py on my sample with Python 3.5 or 3.6, I get the following errors.
Python 3.5:
Traceback (most recent call last):
File "VcfSimplify.py", line 12, in <module>
from assign_task.perform_operation import vcf_solver
File "/home/user/analysis/ariantcalling/pop_gene/VCF-Simplify/assign_task/perform_operation.py", line 76
print(f"ViewVCF run time = {run_time : .4f} seconds.")
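The Python 3.5 failure is expected: f-strings such as f"ViewVCF run time = {run_time : .4f} seconds." were introduced in Python 3.6 (PEP 498), so Python 3.5 rejects perform_operation.py with a SyntaxError as soon as it is imported. Running the script with Python 3.6 or newer avoids it. For illustration only, the same message can be built in a 3.5-compatible way with str.format; the helper name format_run_time is hypothetical.

```python
def format_run_time(run_time: float) -> str:
    # Equivalent to f"ViewVCF run time = {run_time : .4f} seconds.",
    # but str.format also works on Python < 3.6. The " .4f" spec adds
    # a leading space before non-negative numbers.
    return "ViewVCF run time = {: .4f} seconds.".format(run_time)


print(format_run_time(0.011363744735717773))
# ViewVCF run time =  0.0114 seconds.
```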
With Python 3.6 the run completes, but it reports "0 samples found":
Using the following arguments:
Namespace(GTbase=['GT:numeric'], PG='GT', PI='PI', formats=['all'], inVCF='final.vcf', includeUnphased='yes', infos=['all'], mode='0', outFile='final.hap.txt', outHeaderName='final.header.txt', preHeader=['all'], samples='all', toType=['haplotype'])
Using option "SimplifyVCF"
Simplifying the VCF records ...
Creating Haplotype file from VCF file
0 samples found
Writing the header to a separate output file.
sample genotypes tag 'GT' are written as 'numeric' bases
parsing records ...
elapsed time: 0.011363744735717773
fnc_vcf_to_haplotype: memory before: 7,996, after: 8,080, consumed: 84; exec time: 0.011483907699584961 seconds
Below are the respective commands I used.
python3.5 VCF-Simplify/VcfSimplify.py SimplifyVCF -toType haplotype -inVCF final.vcf -outFile final.hap.txt -outHeaderName final.header.txt -PI PI -PG GT -includeUnphased yes
python3.6 VCF-Simplify/VcfSimplify.py SimplifyVCF -toType haplotype -inVCF final.vcf -outFile final.hap.txt -outHeaderName final.header.txt -PI PI -PG GT -includeUnphased yes
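In a VCF, sample names are the columns after FORMAT on the tab-delimited #CHROM header line (column 10 onward), so one independent check for "0 samples found" is to read that line directly. This is a diagnostic sketch, not part of VCF-Simplify; vcf_sample_names is a hypothetical name and it assumes an uncompressed VCF.

```python
from typing import Iterable, List


def vcf_sample_names(lines: Iterable[str]) -> List[str]:
    """Return sample names from the #CHROM header line of an uncompressed VCF.

    The first nine tab-separated columns are CHROM..FORMAT; every
    further column is one sample.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # data lines reached without a #CHROM header
    return []


# Usage:
# with open("final.vcf", encoding="utf-8") as fh:
#     print(len(vcf_sample_names(fh)), "samples found")
```

If this also returns an empty list, the file may have no sample columns at all, or its header may be space-delimited rather than tab-delimited, which the VCF specification does not allow.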
Dear @everestial,
First things first... I love this script! Thanks a lot! Now to the problem: I converted an initial VCF to a table (and everything worked perfectly), but in the reverse process (table to VCF), using this newly generated table, I got this error: POS field does not exit. Update your file ...
I don't understand why, because this field should be in the #CHROM line, and the instructions specify that this line must not be included.
My header is:
##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
Any idea, please? Thanks in advance!
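Building VCF records from a table requires that the fixed VCF columns, POS among them, can be found by name in the table's column-name row, so this error suggests the converter did not find a column called POS where it expected one (for example, because of a different delimiter, a renamed column, or the column row being stripped). Below is a hypothetical pre-flight check, not VCF-Simplify's own logic: it assumes a tab-separated table whose first non-"##", non-blank line holds the column names (a leading '#' as in '#CHROM' is ignored) and that all eight fixed columns are required; the layout VCF-Simplify actually expects may differ, so adjust REQUIRED accordingly.

```python
from typing import Iterable, List

# The eight fixed VCF columns, in specification order (assumed required).
REQUIRED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def missing_vcf_columns(lines: Iterable[str], sep: str = "\t") -> List[str]:
    """Hypothetical check: which fixed VCF columns a table's header row lacks."""
    for line in lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        # strip() also removes '\r' left by Windows line endings.
        columns = {c.strip().lstrip("#") for c in line.split(sep)}
        return [col for col in REQUIRED if col not in columns]
    return list(REQUIRED)  # no column-name row found at all


# Usage:
# with open("table.txt", encoding="utf-8") as fh:
#     print(missing_vcf_columns(fh))
```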
I followed the steps, including installing Visual Studio, and got the following error:
python setup.py build_ext --inplace
...
...
...
error: Error executing cmd /u /c "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64 && set
System info:
C:\Users\balter\VCF-Simplify>python --version
Python 3.9.7
C:\Users\balter\VCF-Simplify>which cython
/c/Users/balter/miniconda3/Scripts/cython
C:\Users\balter\VCF-Simplify>cython --version
Cython version 0.29.24
Microsoft Visual Studio Community 2022 RC
Version 17.0.0 RC3
VisualStudio.17.Release/17.0.0+31825.309.rc3
Microsoft .NET Framework
Version 4.8.04084
Installed Version: Community
Visual C++ 2022 00482-90000-00000-AA247
Microsoft Visual C++ 2022
ASP.NET and Web Tools 2019 17.0.786.62401
ASP.NET and Web Tools 2019
C# Tools 4.0.0-6.21521.2+68d3c0e77ff8607adca62a883197a5637a596438
C# components used in the IDE. Depending on your project type and settings, a different version of the compiler may be used.
Microsoft JVM Debugger 1.0
Provides support for connecting the Visual Studio debugger to JDWP compatible Java Virtual Machines
Microsoft MI-Based Debugger 1.0
Provides support for connecting Visual Studio to MI compatible debuggers
Microsoft Visual C++ Wizards 1.0
Microsoft Visual C++ Wizards
Microsoft Visual Studio VC Package 1.0
Microsoft Visual Studio VC Package
NuGet Package Manager 6.0.0
NuGet Package Manager in Visual Studio. For more information about NuGet, visit https://docs.nuget.org/
ProjectServicesPackage Extension 1.0
ProjectServicesPackage Visual Studio Extension Detailed Info
Test Adapter for Boost.Test 1.0
Enables Visual Studio's testing tools with unit tests written for Boost.Test. The use terms and Third Party Notices are available in the extension installation directory.
Test Adapter for Google Test 1.0
Enables Visual Studio's testing tools with unit tests written for Google Test. The use terms and Third Party Notices are available in the extension installation directory.
TypeScript Tools 17.0.1001.2002
TypeScript Tools for Microsoft Visual Studio
Visual Basic Tools 4.0.0-6.21521.2+68d3c0e77ff8607adca62a883197a5637a596438
Visual Basic components used in the IDE. Depending on your project type and settings, a different version of the compiler may be used.
Visual Studio Code Debug Adapter Host Package 1.0
Interop layer for hosting Visual Studio Code debug adapters in Visual Studio
Visual Studio IntelliCode 2.2
AI-assisted development for Visual Studio.
Visual Studio Tools for CMake 1.0
Visual Studio Tools for CMake
After cloning the repo and starting VCF-Simplify in Windows I receive the following message:
C:\Users\alexander\VCF-Simplify>python VcfSimplify.py -h
Traceback (most recent call last):
File "VcfSimplify.py", line 12, in <module>
from assign_task.perform_operation import vcf_solver
File "C:\Users\alexander\VCF-Simplify\assign_task\perform_operation.py", line 6, in <module>
from metadata_parser.utils import vcf_records_as_table
File "C:\Users\alexander\VCF-Simplify\metadata_parser\utils.py", line 4, in <module>
import resource
ModuleNotFoundError: No module named 'resource'
The resource module does not exist in CPython on Windows. As far as I know, there is also no equivalent replacement, so disabling this functionality lets the application run fine.
A small patch (#7) can achieve that.
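The resource module is documented as Unix-only, so on Windows the import itself fails. A common way to keep such optional functionality is a guarded import with a fallback, sketched below; this is not necessarily what patch #7 does, and peak_memory is a hypothetical helper name. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.

```python
from typing import Optional

try:
    import resource  # available on Linux/macOS, not on Windows
except ImportError:
    resource = None  # type: ignore[assignment]


def peak_memory() -> Optional[int]:
    """Peak resident set size of this process, or None where unsupported.

    Units follow getrusage(): kilobytes on Linux, bytes on macOS.
    """
    if resource is None:
        return None
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


print(peak_memory())
```

Returning None lets a timing or memory-logging wrapper simply skip the memory part of its report on Windows instead of crashing at import time.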
Hi,
The command:
git clone https://github.com/everestial/VCF-SimplifyDev
does not work. I guess the right command is:
git clone https://github.com/everestial/VCF-Simplify
Hi,
Thank you for your VCF parsing tool. When I use this script to simplify a VEP- or ANNOVAR-annotated VCF, it reports this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2855: ordinal not in range(128)
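0xC3 is the first byte of a two-byte UTF-8 sequence (for example, "é" is encoded as C3 A9), so the annotated file most likely contains UTF-8 text somewhere, such as in a header Description or an annotation string. The 'ascii' codec is used because Python's open() falls back to the locale's preferred encoding when no encoding is given, and that can be ASCII under a C/POSIX locale, particularly on older Python 3 versions. Passing the encoding explicitly, or running under a UTF-8 locale, usually avoids the error. A minimal sketch, assuming the file is UTF-8; read_vcf_lines is a hypothetical name, not VCF-Simplify's code.

```python
import io
from typing import BinaryIO, Iterator


def read_vcf_lines(raw: BinaryIO, encoding: str = "utf-8",
                   errors: str = "replace") -> Iterator[str]:
    """Decode a binary VCF stream line by line with an explicit encoding.

    errors="replace" turns undecodable bytes into U+FFFD instead of
    raising UnicodeDecodeError; use errors="strict" to fail loudly.
    """
    text = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    for line in text:
        yield line.rstrip("\n")


# Usage with a file on disk:
# with open("annotated.vcf", "rb") as fh:
#     for line in read_vcf_lines(fh):
#         ...
```

The simplest form of the same fix is open(path, encoding="utf-8"). Replacing undecodable bytes keeps a run going but can silently alter text, so "strict" may be preferable when the data must round-trip exactly.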
Hi, I was trying to convert my VCF to haplotype format. I used Python 3.6 to run VCF-Simplify and got the error message below:
Traceback (most recent call last):
File "/home/user/apps/VCF-Simplify/VcfSimplify.py", line 58, in <module>
main()
File "/home/user/apps/VCF-Simplify/VcfSimplify.py", line 53, in main
vcf_solver(task, args)
File "/home/user/apps/VCF-Simplify/assign_task/perform_operation.py", line 95, in vcf_solver
fnc_vcf_to_haplotype(infile, outfile, header_name, pi_tag, pg_tag, include_unphased, gtbase)
File "/home/user/apps/VCF-Simplify/metadata_parser/utils.py", line 26, in wrapper
result = func(*args, **kwargs)
File "/home/user/apps/VCF-Simplify/records_parser/simplifyvcf/to_haplotype.py", line 101, in fnc_vcf_to_haplotype
pi_values = [mapped_record[sample][pi_tag] for sample in sample_ids]
File "/home/user/apps/VCF-Simplify/records_parser/simplifyvcf/to_haplotype.py", line 101, in <listcomp>
pi_values = [mapped_record[sample][pi_tag] for sample in sample_ids]
KeyError: 'PI'
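KeyError: 'PI' means that, for at least one sample in some record, the parsed FORMAT data has no entry for the tag passed via -PI. This is typically because that record's FORMAT column does not include the tag (for example, at unphased or missing genotypes), or because the file uses a different tag name. The lookup in to_haplotype.py indexes mapped_record[sample][pi_tag] directly. A defensive variant, assuming mapped_record is a dict of sample to a dict of FORMAT tag to value (as the traceback suggests), falls back to the VCF missing-value marker '.'; collect_tag is a hypothetical helper.

```python
from typing import Dict, List

MISSING = "."  # VCF convention for a missing value


def collect_tag(mapped_record: Dict[str, Dict[str, str]],
                sample_ids: List[str], tag: str) -> List[str]:
    """Hypothetical helper: one value of `tag` per sample, '.' where absent."""
    return [mapped_record.get(sample, {}).get(tag, MISSING) for sample in sample_ids]


record = {"S1": {"GT": "0|1", "PI": "1234"}, "S2": {"GT": "0/1"}}
print(collect_tag(record, ["S1", "S2"], "PI"))  # ['1234', '.']
```

Whether a missing PI should become '.' or cause the site to be skipped is a design choice; substituting silently can also hide a wrong -PI tag name, so counting or logging such cases may be preferable.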