everestial / vcf-simplify

A python parser to simplify and build the VCF (Variant Call Format).

License: MIT License

Python 93.72% Shell 6.28%
vcf python-parser variant-annotation haplotypes phasing converts phase-stitcher phase-extender tableview vcf-files

vcf-simplify's People

Contributors

bhuwanaryal19, everestial

vcf-simplify's Issues

KeyError: 'FORMAT'

Hi, I am trying to convert a vcf file (recent version of NCBI clinvar) to table format, but I receive this error:
Traceback (most recent call last):
  File "VcfSimplify.py", line 58, in <module>
    main()
  File "VcfSimplify.py", line 53, in main
    vcf_solver(task, args)
  File "VCF-Simplify/assign_task/perform_operation.py", line 104, in vcf_solver
    fnc_vcf_to_table(infile, outfile, preheader, mode, gtbase, header_name, infos, formats, samples)
  File "VCF-Simplify/metadata_parser/utils.py", line 26, in wrapper
    result = func(*args, **kwargs)
  File "VCF-Simplify/records_parser/simplifyvcf/to_table.py", line 123, in fnc_vcf_to_table
    sample_names=all_samples, gtbase_is = gtbase
  File "VCF-Simplify/records_parser/vcf_records_parser.py", line 176, in read_vcfRecord
    format_tags = self.record_dict["FORMAT"]
KeyError: 'FORMAT'

The VCF file does not have the FORMAT tag. Using the flag "-formats 0" does not help either.
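
The traceback points at an unconditional lookup of the "FORMAT" key. A sites-only VCF (one that stops after the INFO column) has no FORMAT or sample columns, so one possible guard is to treat that column as optional. The sketch below is not the project's code: `split_record`, `FIXED_COLUMNS`, and the example lines are hypothetical and only illustrate the idea.

```python
# Hypothetical helper, not part of VCF-Simplify: map one VCF data line
# to a dict while tolerating files that have no FORMAT column.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def split_record(line, header_columns):
    values = line.rstrip("\n").split("\t")
    record = dict(zip(header_columns, values))
    # dict.get() returns None instead of raising KeyError when FORMAT is absent.
    format_field = record.get("FORMAT")
    format_tags = format_field.split(":") if format_field else []
    return record, format_tags


# Made-up example lines: one sites-only, one with a FORMAT and sample column.
sites_only = "chr1\t100\trs1\tG\tA\t.\t.\tDP=10"
record, tags = split_record(sites_only, FIXED_COLUMNS)

with_format = sites_only + "\tGT:DP\t0/1:12"
record2, tags2 = split_record(with_format, FIXED_COLUMNS + ["FORMAT", "sample1"])

print(tags)   # empty list for the sites-only line
print(tags2)  # FORMAT tags for the line that has them
```

Whether an empty tag list is the right downstream behaviour depends on how the table writer uses it, which this sketch does not cover.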

Is it possible for VCF-Simplify to convert non-dosage VCF to dosage VCF?

Dear developer,
I am trying to obtain a dosage VCF file converted from .gen format. However, qctools and other conversion tools can only output non-dosage VCF, which does not meet the needs of my downstream analysis. I found that Pysam might be able to do this, but I am not familiar with its rather complex code.
Can VCF-Simplify do this?
Many thanks!
I look forward to your reply!
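
For context on what such a conversion involves: Oxford .gen files store three genotype probabilities per sample (AA, AB, BB), and the expected dosage of the second allele is commonly computed as P(AB) + 2 * P(BB). The sketch below shows only that arithmetic. It is not a VCF-Simplify feature, `dosage_from_gen_line` is a hypothetical name, the input line is made up, and the code assumes the usual five leading columns (SNP ID, rsID, position, allele A, allele B).

```python
def dosage_from_gen_line(line):
    """Return (metadata columns, per-sample dosage of allele B) for one .gen line."""
    fields = line.split()
    meta = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per sample")
    dosages = []
    for i in range(0, len(probs), 3):
        p_aa, p_ab, p_bb = probs[i:i + 3]
        # Expected count of allele B: 0 * P(AA) + 1 * P(AB) + 2 * P(BB)
        dosages.append(p_ab + 2.0 * p_bb)
    return meta, dosages


# Made-up line with three samples.
meta, dosages = dosage_from_gen_line("snp1 rs1 1000 A G 1 0 0 0 1 0 0.25 0.5 0.25")
print(meta, dosages)
```

Writing these values into a VCF (for example as a DS FORMAT field) would still need a proper VCF writer and header definitions.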

invalid syntax

Hi

When I run VcfSimplify.py on my sample with Python 3.5 and 3.6, I get the following errors.

Python 3.5:
Traceback (most recent call last):
  File "VcfSimplify.py", line 12, in <module>
    from assign_task.perform_operation import vcf_solver
  File "/home/user/analysis/ariantcalling/pop_gene/VCF-Simplify/assign_task/perform_operation.py", line 76
    print(f"ViewVCF run time = {run_time : .4f} seconds.")

When I use Python 3.6, I get this error:
Using the following arguments:
Namespace(GTbase=['GT:numeric'], PG='GT', PI='PI', formats=['all'], inVCF='final.vcf', includeUnphased='yes', infos=['all'], mode='0', outFile='final.hap.txt', outHeaderName='final.header.txt', preHeader=['all'], samples='all', toType=['haplotype'])

Using option "SimplifyVCF"
Simplifying the VCF records ...
Creating Haplotype file from VCF file
0 samples found

Writing the header to a separate output file.

sample genotypes tag 'GT' are written as 'numeric' bases
parsing records ...
elapsed time: 0.011363744735717773

fnc_vcf_to_haplotype: memory before: 7,996, after: 8,080, consumed: 84; exec time: 0.011483907699584961 seconds

Below are the respective commands I used.

python3.5 VCF-Simplify/VcfSimplify.py SimplifyVCF -toType haplotype -inVCF final.vcf -outFile final.hap.txt -outHeaderName final.header.txt -PI PI -PG GT -includeUnphased yes

python3.6 VCF-Simplify/VcfSimplify.py SimplifyVCF -toType haplotype -inVCF final.vcf -outFile final.hap.txt -outHeaderName final.header.txt -PI PI -PG GT -includeUnphased yes
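
The Python 3.5 failure at line 76 of perform_operation.py is consistent with the f-string on that line: f-strings were introduced in Python 3.6, so a 3.5 interpreter rejects them as invalid syntax. A minimal sketch of the same formatting written with str.format, which also runs on older interpreters, is shown here. The `run_time` value is made up for illustration.

```python
run_time = 0.0113637  # made-up value for illustration

# Original form (requires Python 3.6 or newer):
#   print(f"ViewVCF run time = {run_time : .4f} seconds.")
# Equivalent form using str.format; the " .4f" spec keeps the leading
# space that the format specification reserves for the sign.
message = "ViewVCF run time = {: .4f} seconds.".format(run_time)
print(message)
```

This only addresses the syntax error; the "0 samples found" output under Python 3.6 is a separate question about the input file.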

POS field does not exit. Update your file

Dear @everestial,

First things first: I love this script, thanks a lot! Now to the problem: I converted an initial VCF to a table (which worked perfectly), but the inverse process (table to VCF) with this newly generated table fails with: POS field does not exit. Update your file. I don't understand why, because this field would be in the #CHROM line, and the instructions specify that this line must not be included.

My header is:

##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">

Any idea, please? Thanks in advance!
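
One way to narrow this down is to inspect the first row of the generated table and confirm that the expected column names are actually found after splitting. The sketch below assumes a tab-separated table whose first line holds the column names. The required-column list and the `missing_columns` helper are illustrative and not taken from VCF-Simplify.

```python
import io

# Assumed list of columns a table-to-VCF step would need; hypothetical.
REQUIRED = ("CHROM", "POS", "REF", "ALT")


def missing_columns(handle, required=REQUIRED, sep="\t"):
    """Return the required column names that are absent from the first line."""
    header = handle.readline().rstrip("\n").lstrip("#").split(sep)
    return [name for name in required if name not in header]


# Made-up tables: one tab-separated, one accidentally space-separated.
good = io.StringIO("CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n")
bad = io.StringIO("CHROM POS ID REF ALT\n1 100 . A G\n")

result_good = missing_columns(good)
result_bad = missing_columns(bad)
print(result_good, result_bad)
```

If every column is reported missing, as in the second table, a delimiter mismatch is one possible cause worth checking before anything else.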

Problem installing on Windows

I followed the steps, including installing Visual Studio, and got the following error:

python setup.py build_ext --inplace
...
...
...
error: Error executing cmd /u /c "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64 && set

System info:

C:\Users\balter\VCF-Simplify>python --version
Python 3.9.7

C:\Users\balter\VCF-Simplify>which cython
/c/Users/balter/miniconda3/Scripts/cython

C:\Users\balter\VCF-Simplify>cython --version
Cython version 0.29.24
Microsoft Visual Studio Community 2022 RC
Version 17.0.0 RC3
VisualStudio.17.Release/17.0.0+31825.309.rc3
Microsoft .NET Framework
Version 4.8.04084

Installed Version: Community

Visual C++ 2022   00482-90000-00000-AA247
Microsoft Visual C++ 2022

ASP.NET and Web Tools 2019   17.0.786.62401
ASP.NET and Web Tools 2019

C# Tools   4.0.0-6.21521.2+68d3c0e77ff8607adca62a883197a5637a596438
C# components used in the IDE. Depending on your project type and settings, a different version of the compiler may be used.

Microsoft JVM Debugger   1.0
Provides support for connecting the Visual Studio debugger to JDWP compatible Java Virtual Machines

Microsoft MI-Based Debugger   1.0
Provides support for connecting Visual Studio to MI compatible debuggers

Microsoft Visual C++ Wizards   1.0
Microsoft Visual C++ Wizards

Microsoft Visual Studio VC Package   1.0
Microsoft Visual Studio VC Package

NuGet Package Manager   6.0.0
NuGet Package Manager in Visual Studio. For more information about NuGet, visit https://docs.nuget.org/

ProjectServicesPackage Extension   1.0
ProjectServicesPackage Visual Studio Extension Detailed Info

Test Adapter for Boost.Test   1.0
Enables Visual Studio's testing tools with unit tests written for Boost.Test.  The use terms and Third Party Notices are available in the extension installation directory.

Test Adapter for Google Test   1.0
Enables Visual Studio's testing tools with unit tests written for Google Test.  The use terms and Third Party Notices are available in the extension installation directory.

TypeScript Tools   17.0.1001.2002
TypeScript Tools for Microsoft Visual Studio

Visual Basic Tools   4.0.0-6.21521.2+68d3c0e77ff8607adca62a883197a5637a596438
Visual Basic components used in the IDE. Depending on your project type and settings, a different version of the compiler may be used.

Visual Studio Code Debug Adapter Host Package   1.0
Interop layer for hosting Visual Studio Code debug adapters in Visual Studio

Visual Studio IntelliCode   2.2
AI-assisted development for Visual Studio.

Visual Studio Tools for CMake   1.0
Visual Studio Tools for CMake

Windows compatibility

After cloning the repo and starting VCF-Simplify on Windows, I receive the following message:

C:\Users\alexander\VCF-Simplify>python VcfSimplify.py -h
Traceback (most recent call last):
  File "VcfSimplify.py", line 12, in <module>
    from assign_task.perform_operation import vcf_solver
  File "C:\Users\alexander\VCF-Simplify\assign_task\perform_operation.py", line 6, in <module>
    from metadata_parser.utils import vcf_records_as_table
  File "C:\Users\alexander\VCF-Simplify\metadata_parser\utils.py", line 4, in <module>
    import resource
ModuleNotFoundError: No module named 'resource'

The resource module does not exist in CPython on Windows. As far as I know, there is no equivalent replacement, so disabling this functionality lets the application run fine.

A small patch (#7) can achieve that.
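
A common way to disable such functionality only where it is unavailable is a guarded import. The sketch below is an assumption about how a timing and memory wrapper could degrade gracefully. It is not the content of patch #7, and `report_usage` is a hypothetical name.

```python
import functools
import time

try:
    import resource  # Unix-only; not available in CPython on Windows
except ImportError:
    resource = None


def report_usage(func):
    """Print execution time, plus peak memory where the platform supports it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss if resource else None
        result = func(*args, **kwargs)
        elapsed = time.time() - start
        if resource:
            after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            print("{}: memory before: {:,}, after: {:,}, consumed: {:,}; exec time: {} seconds".format(
                func.__name__, before, after, after - before, elapsed))
        else:
            # Memory figures are skipped when the resource module is missing.
            print("{}: exec time: {} seconds".format(func.__name__, elapsed))
        return result
    return wrapper


@report_usage
def add(a, b):
    return a + b


total = add(2, 3)
```

The decorated function behaves the same on both platforms; only the printed report differs.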

Possible README mistake

Hi,

The command:
git clone https://github.com/everestial/VCF-SimplifyDev
does not work. I guess the right command is:
git clone https://github.com/everestial/VCF-Simplify

KeyError: PI

Hi, I was trying to convert my VCF to haplotype format. I used Python 3.6 to run VCF-Simplify and got the error message below:

Traceback (most recent call last):
  File "/home/user/apps/VCF-Simplify/VcfSimplify.py", line 58, in <module>
    main()
  File "/home/user/apps/VCF-Simplify/VcfSimplify.py", line 53, in main
    vcf_solver(task, args)
  File "/home/user/apps/VCF-Simplify/assign_task/perform_operation.py", line 95, in vcf_solver
    fnc_vcf_to_haplotype(infile, outfile, header_name, pi_tag, pg_tag, include_unphased, gtbase)
  File "/home/user/apps/VCF-Simplify/metadata_parser/utils.py", line 26, in wrapper
    result = func(*args, **kwargs)
  File "/home/user/apps/VCF-Simplify/records_parser/simplifyvcf/to_haplotype.py", line 101, in fnc_vcf_to_haplotype
    pi_values = [mapped_record[sample][pi_tag] for sample in sample_ids]
  File "/home/user/apps/VCF-Simplify/records_parser/simplifyvcf/to_haplotype.py", line 101, in <listcomp>
    pi_values = [mapped_record[sample][pi_tag] for sample in sample_ids]
KeyError: 'PI'
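
The failing list comprehension indexes each sample's record with the PI tag directly, so the error suggests that at least one sample has no 'PI' field in the input VCF. A sketch of a lookup that falls back to a placeholder instead of raising is given below. The `collect_tag` helper, the "." placeholder, and the example data are assumptions for illustration, not the project's behaviour.

```python
def collect_tag(mapped_record, sample_ids, tag, missing="."):
    """Collect one FORMAT tag per sample, using a placeholder when it is absent."""
    return [mapped_record[sample].get(tag, missing) for sample in sample_ids]


# Made-up record: sample s2 has a genotype but no phase-block index (PI).
mapped_record = {
    "s1": {"GT": "0|1", "PI": "4"},
    "s2": {"GT": "0/1"},
}
pi_values = collect_tag(mapped_record, ["s1", "s2"], "PI")
print(pi_values)
```

If the VCF carries its phase-block index under a different tag name, passing that name through the -PI option shown in the other haplotype command may be the simpler fix.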
