sbslee / fuc

I'm not sure how many people do this, but I generally like to split my BAM into per-sequence BAM files (one per reference sequence) for variant calling, genome polishing, etc.

This would be a really useful command to have in general!

Tutorial to read in 10 vcfs and calculate allele frequency?

Hi, I have 10 VCF files, each with multiple samples. Can I read them all in (in parallel) and build a single pandas DataFrame of all variants x samples?

What do I call to get the allele frequency per variant?

Many thanks!
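
For the allele frequency part of the question, here is a minimal sketch of the arithmetic in plain Python. It is not fuc's API: `alt_allele_frequency` is a hypothetical helper, and it assumes the GT strings for one variant have already been pulled out of the sample columns.

```python
def alt_allele_frequency(genotypes):
    """Return the fraction of called alleles that are non-reference.

    `genotypes` is a list of GT strings such as '0/1', '1|1' or './.'.
    Missing alleles ('.') are ignored; None is returned if nothing was called.
    """
    called = 0
    alt = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        for allele in gt.replace('|', '/').split('/'):
            if allele == '.':
                continue
            called += 1
            if allele != '0':
                alt += 1
    return alt / called if called else None


# One variant genotyped in four samples, one of them missing.
print(alt_allele_frequency(['0/1', '1/1', './.', '0/0']))
```

Applying the helper to every row of the combined table would give one frequency per variant. Multi-allelic sites are lumped together here (any non-zero allele counts as ALT), which may or may not be what you want.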

[MAF/VCF] Add function to convert unannotated VCF to MAF

https://www.biostars.org/p/9478353/

[VCF/VEP] Add function to filter by BIOTYPE

https://www.biostars.org/p/9474428/

[VCF] Update `pyvcf.VcfFrame.filter_sampnum` to be more robust

By more robust, I mean:

support using allele frequency as prevalence (currently, only sample count is used as prevalence)
allow users to select which samples prevalence should be calculated from

https://www.biostars.org/p/9514974/
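
One possible reading of the two requests above, sketched in plain Python: prevalence is expressed as the fraction of carriers rather than a raw sample count, and the caller chooses which samples enter the calculation. The names `has_alt` and `variant_prevalence` are hypothetical and are not part of `pyvcf`.

```python
def has_alt(sample_field):
    """True if the GT part of a sample field contains a non-reference allele."""
    gt = sample_field.split(':')[0].replace('|', '/')
    return any(allele not in ('0', '.') for allele in gt.split('/'))


def variant_prevalence(row, samples=None):
    """Fraction of the selected samples that carry the variant.

    `row` maps sample name -> sample field (e.g. '0/1:35').
    `samples` restricts the calculation; None means use every sample.
    """
    names = list(row) if samples is None else list(samples)
    if not names:
        return 0.0
    carriers = sum(1 for name in names if has_alt(row[name]))
    return carriers / len(names)


row = {'A': '0/1', 'B': '0/0', 'C': './.', 'D': '1/1'}
print(variant_prevalence(row))              # all four samples
print(variant_prevalence(row, ['A', 'D']))  # only the selected samples
```

A filter could then keep rows whose prevalence meets a threshold. Note that missing genotypes count as non-carriers in this sketch.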

[DOC] Add sphinx-issues extension

The sphinx-issues extension provides a simple way to link to a GitHub project's issues, pull requests, user profiles, etc. For more details, visit the extension's website.

[MAF] Variant color coding mismatch

First, thanks for providing us with this awesome tool! 😊

I recently noticed some strange behavior when creating a custom oncoplot. Specifically, the coloring of the variant classification in the gene plot did not match the variant color coding in the waterfall plot when count was set to 1.

data_mf.plot_waterfall(count=1)

I eventually resolved it by setting vmin and vmax to the minimum/maximum index of NONSYN_COLORS, as the anchors were not correctly inferred by seaborn.heatmap in plot_waterfall.

[MAF/VCF] Add function to convert MAF to VCF

VCF to MAF is already supported (e.g. maf_vcf2maf and pymaf.MafFrame.from_vcf). Now add MAF to VCF functionality.

https://bioinformatics.stackexchange.com/questions/3421/tools-to-do-vcf-to-maf-and-maf-to-vcf-conversion

[VCF/BED] Add function to convert VCF to BED

References:

Converting a VCF with SNPs and indels to BED format
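
The main thing to get right in this conversion is the coordinate shift: VCF positions are 1-based, while BED intervals are 0-based and half-open. A minimal sketch follows, with `vcf_record_to_bed` as a hypothetical helper name. It assumes the interval should span the REF allele, which is only one of several conventions for indels.

```python
def vcf_record_to_bed(chrom, pos, ref):
    """Convert a VCF record to a BED interval covering the REF allele.

    VCF POS is 1-based; BED start is 0-based and BED end is exclusive.
    """
    start = pos - 1
    end = start + len(ref)
    return chrom, start, end


# A SNV at chr1:100 and a deletion whose REF allele spans three bases.
print(vcf_record_to_bed('chr1', 100, 'A'))
print(vcf_record_to_bed('chr1', 100, 'ATG'))
```

For deletions the interval includes the anchor base that VCF places before the deleted sequence. Drop it if you only want the deleted bases.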

[VCF] Question on usage

A follow-up question on usage, after resolving #62.

First, I want to find the variants that are common to two VCF files (both from Mutect) and the variants unique to each.
Secondly, I want to concatenate the VCF calls from Strelka and Mutect and remove duplicates.

Could you suggest which methods to use for these?

thanks,
Rohan
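
Both tasks reduce to set operations once each variant is keyed by (CHROM, POS, REF, ALT). The sketch below uses only the standard library and is not fuc's API. `variant_keys` and `compare_variants` are hypothetical names, and the input is assumed to be uncompressed, tab-separated VCF lines.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from the data lines of a VCF."""
    keys = set()
    for line in vcf_lines:
        # Skip meta-information lines, the header line and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip('\n').split('\t')[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_variants(a_lines, b_lines):
    """Return common, file-specific and merged (deduplicated) variant keys."""
    a, b = variant_keys(a_lines), variant_keys(b_lines)
    return {'common': a & b, 'only_a': a - b, 'only_b': b - a, 'union': a | b}


first = ['#CHROM\tPOS\tID\tREF\tALT\n',
         'chr1\t100\t.\tA\tG\n',
         'chr1\t200\t.\tC\tT\n']
second = ['chr1\t200\t.\tC\tT\n',
          'chr2\t50\t.\tG\tA\n']
result = compare_variants(first, second)
print(sorted(result['common']))
print(sorted(result['union']))
```

The `union` entry corresponds to concatenating two call sets and dropping duplicates. A multi-allelic ALT such as `G,T` is treated as a single key here, so callers that split alleles differently will not match.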

[MAF] Update Ensembl VEP consequences mapping

Hi Steven,

I tried running fuc maf-vcf2maf in.vcf > out.maf on a VEP-annotated VCF today but got an error as below:
ValueError: Found unknown Ensembl VEP consequence: splice_donor_region_variant
I'm guessing there are additional consequence terms that need to be added?

Thank you for all your hard work. fuc is fabulous.

p.s. My version of VEP is 109.3

[VCF/VEP] Add function to parse VCF annotated by Ensembl VEP

https://www.biostars.org/p/9474428/#9474883

[VCF] Issue reading vcf from mutect2, strelka2

Hello,

I am trying to read VCF files from Mutect2 and Strelka2 and am getting the error below.

Code:

from fuc import pyvcf
vf = pyvcf.VcfFrame.from_file('mutect2.vcf')
vf.df

Error:

DtypeWarning: Columns (0) have mixed types.Specify dtype option on import or set low_memory=False.
  vf = pyvcf.Vcf

Could you please help?

Thanks
Rohan

[MAF/VCF] `pymaf.MafFrame.from_vcf` assumes CSQ is the first field in the INFO record

This is not necessarily true. Example INFO record I have:

'AC=2;ACGTNacgtnMINUS=0,0,0,0,0,0,0,0,0,0;ACGTNacgtnPLUS=5,0,61,0,0,0,0,0,0,0;AN=4;AS_FilterStatus=SITE;AS_SB_TABLE=9,41|1,5;CALLERS=mutect2;CSQ=A|3_prime_UTR_variant|MODIFIER|MTOR|ENSG00000198793|Transcript|ENST00000361445|protein_coding|58/58||ENST00000361445.9:c.*700C>T||8471/8721|||||||-1|||SNV|HGNC|HGNC:3942|YES|1|P1|CCDS127.1|ENSP00000354558|P42345||UPI000012ABD3||Ensembl|G|G||1|||||chr1:g.11106785G>A|||||||||||||||||||||||||||;ClippingRankSum=-0.79;DKFZBias=damage;DP=85;ECNT=2;EPR=pass;FS=0;GERMQ=93;MBQ=34,33;MFRL=153,154;MMQ=60,60;MPOS=28;MQ=60;MQ0=0;MQRankSum=0;POPAF=7.3;ReadPosRankSum=0.452;TLOD=12.45

As one can see, CSQ is further down the line. If that happens, parsing breaks, because the wrong field is selected:

fields = r.INFO.replace('CSQ=', '').split(',')[0].split('|')

This will get whatever field is at the first comma. The parsing should probably split the VCF info first by ; then look for CSQ.
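
A minimal sketch of that suggestion in plain Python: split INFO on `;`, find the entry whose key is `CSQ`, and only then split its value. `get_csq_fields` is a hypothetical helper name, not the function used in `pymaf`. Like the original line, it keeps only the first comma-separated annotation.

```python
def get_csq_fields(info):
    """Return the '|'-separated fields of the first CSQ annotation, or None.

    `info` is the raw INFO string of one VCF record.
    """
    for entry in info.split(';'):
        key, sep, value = entry.partition('=')
        if sep and key == 'CSQ':
            # CSQ may hold several comma-separated annotations; take the first.
            return value.split(',')[0].split('|')
    return None


info = 'AC=2;AS_SB_TABLE=9,41|1,5;CSQ=A|3_prime_UTR_variant|MODIFIER|MTOR;DP=85'
print(get_csq_fields(info))
```

Because the lookup is by key, it no longer matters whether earlier INFO entries contain commas or pipes of their own.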

[General] Error during installation of fuc via conda

Hi @sbslee again,
I have a problem installing fuc on a new computer running Ubuntu. When I try to install it via conda, I get the following output:

(burak) ➜  fuc git:(main) conda install -c bioconda fuc
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \ 
Found conflicts! Looking for incompatible packages.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.27=0
  - python=3.10 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.27

In the end, the required packages were installed but fuc was not. Any ideas on how to fix this and install fuc? I also tried installing from a git clone, but that gave an error too.

Thanks a lot!

[VCF] Add function to convert missing genotypes (./.) to REF homozygous (0/0)

https://www.biostars.org/p/9481023/
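
At the level of a single sample field the conversion is a small string rewrite, sketched below in plain Python. `missing_to_ref` is a hypothetical helper, not an existing `pyvcf` method. It only touches fully missing diploid genotypes and leaves everything after the GT subfield unchanged.

```python
def missing_to_ref(sample_field):
    """Rewrite a fully missing diploid GT ('./.' or '.|.') as homozygous REF.

    Any other subfields (e.g. ':0,0:0') are preserved as they are.
    """
    gt, sep, rest = sample_field.partition(':')
    if gt == './.':
        gt = '0/0'
    elif gt == '.|.':
        gt = '0|0'
    return gt + sep + rest


for field in ['./.', './.:0,0:0', '0/1:10,5:15']:
    print(field, '->', missing_to_ref(field))
```

Treating a missing call as REF homozygous is a strong assumption, since missing often means there was no coverage, so it is worth applying only where that interpretation is justified.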

[General] Error while importing pyvcf

Hi,

I wanted to follow your tutorial on creating an OncoPlot.
However, when I am trying to import pyvcf I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/93/lz6gxd9906z7wqwc1tx7hgsc0000gn/T/ipykernel_7020/1344064430.py in <module>
      1 from pysam import VariantFile
----> 2 from fuc import pyvcf

~/miniconda3/envs/ICR/lib/python3.7/site-packages/fuc/__init__.py in <module>
----> 1 from .api import *

~/miniconda3/envs/ICR/lib/python3.7/site-packages/fuc/api/pybed.py in <module>
     39 
     40 import pandas as pd
---> 41 import pyranges as pr
     42 from copy import deepcopy
     43 from . import common

~/miniconda3/envs/ICR/lib/python3.7/site-packages/pyranges/__init__.py in <module>
     18 import pkg_resources
     19 
---> 20 from pyranges.pyranges import PyRanges
     21 from pyranges import data
     22 from pyranges.methods.concat import concat

~/miniconda3/envs/ICR/lib/python3.7/site-packages/pyranges/pyranges.py in <module>
      4 import numpy as np
      5 
----> 6 from natsort import natsorted
      7 
      8 import pyranges as pr

~/miniconda3/envs/ICR/lib/python3.7/site-packages/natsort/__init__.py in <module>
      1 # -*- coding: utf-8 -*-
      2 
----> 3 from natsort.natsort import (
      4     NatsortKeyType,
      5     OSSortKeyType,

~/miniconda3/envs/ICR/lib/python3.7/site-packages/natsort/natsort.py in <module>
    239 
    240 # Exposed for simplicity if one needs the default natsort key.
--> 241 natsort_key = natsort_keygen()
    242 natsort_key.__doc__ = """\
    243 natsort_key(val)

~/miniconda3/envs/ICR/lib/python3.7/site-packages/natsort/natsort.py in natsort_keygen(key, alg)
    210             sep = natsort.compat.locale.null_string_locale
    211         else:
--> 212             sep = natsort.compat.locale.null_string
    213         pre_sep = natsort.compat.locale.null_string
    214     regex = utils.regex_chooser(alg)

AttributeError: module 'natsort' has no attribute 'compat'

Do you think this is a conflict between the dependencies of the different libraries? If not, how can I fix this?
Thank you in advance!

[VCF] Error related to `pyvcf.VcfFrame.plot_hist`

Hello,
I am using the following script to plot the AF distribution for a VCF file, but I am getting an error.

The command:

#!/bin/python
from fuc import common, pyvcf
common.load_dataset('pyvcf')
vcf_file = 'GEUVADIS.chr22.genotype.vcf'
vf = pyvcf.VcfFrame.from_file(vcf_file)
vf.plot_hist('AF')

The error:

file.python:5: DtypeWarning: Columns (5) have mixed types.Specify dtype option on import or set low_memory=False.
  vf = pyvcf.VcfFrame.from_file(vcf_file)
Traceback (most recent call last):
  File "file.python", line 6, in <module>
    vf.plot_hist('AF')
  File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/fuc/api/pyvcf.py", line 1991, in plot_hist
    df = self.extract(k, as_nan=True, func=d[k])
  File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/fuc/api/pyvcf.py", line 3912, in extract
    df = self.df.apply(one_row, axis=1)
  File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/frame.py", line 8736, in apply
    return op.apply()
  File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
    return self.apply_standard()
  File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/apply.py", line 805, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/apply.py", line 821, in apply_series_generator
    results[i] = self.f(v)
  File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/fuc/api/pyvcf.py", line 3901, in one_row
    i = r.FORMAT.split(':').index(k)
ValueError: 'AF' is not in list
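
The last frame of the traceback shows the key being looked up with `r.FORMAT.split(':').index(k)`, which raises `ValueError` whenever a record's FORMAT column has no `AF` key. A minimal sketch of a lookup that checks for the key first is shown below. `extract_format_value` is a hypothetical helper, not fuc's implementation.

```python
def extract_format_value(format_field, sample_field, key):
    """Return the value of `key` for one sample, or None if it is unavailable.

    `format_field` is the FORMAT column (e.g. 'GT:AD:DP') and `sample_field`
    is the matching sample column (e.g. '0/1:10,5:15').
    """
    keys = format_field.split(':')
    if key not in keys:
        return None
    values = sample_field.split(':')
    i = keys.index(key)
    # VCF allows trailing subfields to be dropped, so guard the index.
    return values[i] if i < len(values) else None


print(extract_format_value('GT:AD:DP', '0/1:10,5:15', 'DP'))
print(extract_format_value('GT:AD:DP', '0/1:10,5:15', 'AF'))
```

Running a check like this over the FORMAT column before plotting would show whether the file carries a per-sample `AF` key at all. If it does not, there is nothing for the histogram to extract.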

[VCF] How to remove all rows with the same variant in VCF file using `pyvcf`

Hello,
I am trying to remove from my VCF file all records in which every sample has the same genotype call (in the end I need data representing the differences between samples). Which method would be appropriate for that? I've tried filter_sampall(), but the result is not what I need. Here is my code:

from fuc import pyvcf
import pandas as pd

vf = pyvcf.VcfFrame.from_file('P1_short_test.vcf')

just_gt = vf.strip('GT').df

new_vf = pyvcf.VcfFrame([''], just_gt)

no_the_same_variant = new_vf.filter_sampall().df

Thanks a lot!
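
The test being asked for can be stated independently of any library: keep a record only if its samples carry more than one distinct genotype. A minimal sketch in plain Python follows. `normalize_gt` and `samples_differ` are hypothetical names, and phasing and allele order are deliberately ignored.

```python
def normalize_gt(sample_field):
    """Reduce a sample field to an order-independent tuple of alleles."""
    gt = sample_field.split(':')[0].replace('|', '/')
    return tuple(sorted(gt.split('/')))


def samples_differ(sample_fields):
    """True if at least two samples have different genotype calls."""
    return len({normalize_gt(field) for field in sample_fields}) > 1


rows = [
    ('chr1', 100, ['0/1', '1|0', '0/1:12']),  # same call in every sample
    ('chr1', 200, ['0/0', '0/1', '1/1']),     # samples differ
]
informative = [(chrom, pos) for chrom, pos, fields in rows
               if samples_differ(fields)]
print(informative)
```

Missing genotypes (`./.`) count as their own distinct call here, so a record with one missing sample is kept. Filter those out first if that is not the intended behaviour.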

[DOC] Read the Docs automodule not working properly

I noticed that automodule works fine with the make html command, but it fails to render when the documentation is built by Read the Docs on the web. This is most likely related to this issue. When I looked at the build log in Read the Docs, I found this message:

Running Sphinx v3.5.4
loading translations [en]... done
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 5 source files that are out of date
updating environment: [new config] 5 added, 0 changed, 0 removed
reading sources... [ 20%] api
reading sources... [ 40%] changelog
reading sources... [ 60%] cli
reading sources... [ 80%] index
reading sources... [100%] readme

WARNING: autodoc: failed to import module 'api.BedFrame' from module 'fuc'; the following exception was raised:
No module named 'fuc'
WARNING: autodoc: failed to import module 'api.FastqFrame' from module 'fuc'; the following exception was raised:
No module named 'fuc'
WARNING: autodoc: failed to import module 'api.VcfFrame' from module 'fuc'; the following exception was raised:
No module named 'fuc'
WARNING: autodoc: failed to import module 'api.common' from module 'fuc'; the following exception was raised:
No module named 'fuc'
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [ 20%] api
writing output... [ 40%] changelog
writing output... [ 60%] cli
writing output... [ 80%] index
writing output... [100%] readme

generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 4 warnings.

The HTML pages are in _build/html.
Updating searchtools for Read the Docs search...
