sbslee / fuc
Frequently used commands in bioinformatics
Home Page: https://sbslee-fuc.readthedocs.io
License: MIT License
Hi,
I am trying to use the "maf-oncoplt" command with the example data found here: https://github.com/sbslee/fuc/blob/main/data/vcf/1.vcf
After converting the VCF with maf-vcf2maf, I ran maf-oncoplt and it gives the following error:
$ fuc maf-oncoplt out.maf out.pdf
IndexError: index 0 is out of bounds for axis 0 with size 0
Thank you,
Diego
Hi!
I'm not sure how many people do this, but I generally like to split my BAM into one BAM per reference sequence, for variant calling, genome polishing, etc.
This would be a really useful command to have in general!
Hi, I have 10 vcfs, each with multiple samples. Can I read them all in (in parallel) and make a single pandas data frame from all variants x samples?
What do I call to get allele frequency per variant?
Many thanks!
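One way to approach the question above, sketched in plain Python rather than through fuc's own API: parse each file in a worker thread, merge the genotype calls per variant, and compute the non-reference allele fraction from the GT strings. The function names (read_vcf_records, allele_frequency, combine) are hypothetical, and the sketch assumes uncompressed VCFs in which every record has GT in its FORMAT column and sample names are unique across files.

```python
from concurrent.futures import ThreadPoolExecutor


def read_vcf_records(path):
    """Return (samples, records) parsed from one uncompressed VCF file."""
    samples, records = [], []
    with open(path) as handle:
        for line in handle:
            if line.startswith('##'):
                continue  # skip meta-information lines
            fields = line.rstrip('\n').split('\t')
            if line.startswith('#CHROM'):
                samples = fields[9:]  # sample names follow FORMAT
                continue
            key = (fields[0], int(fields[1]), fields[3], fields[4])
            gt_index = fields[8].split(':').index('GT')
            genotypes = {
                sample: value.split(':')[gt_index]
                for sample, value in zip(samples, fields[9:])
            }
            records.append((key, genotypes))
    return samples, records


def allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference (None if no calls)."""
    alleles = []
    for gt in genotypes:
        alleles.extend(a for a in gt.replace('|', '/').split('/') if a != '.')
    if not alleles:
        return None
    return sum(a != '0' for a in alleles) / len(alleles)


def combine(paths, workers=4):
    """Merge genotype calls from several VCFs into {variant: {sample: GT}}."""
    table = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _samples, records in pool.map(read_vcf_records, paths):
            for key, genotypes in records:
                table.setdefault(key, {}).update(genotypes)
    return table
```

With the merged table in hand, allele_frequency(table[key].values()) gives one value per variant. All alternate alleles at a multi-allelic site are counted together here, which may or may not be the desired definition.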
By more robust, I mean:
The sphinx-issues extension provides a simple way to link to a GitHub project's issues, pull requests, user profiles, etc. For more details, visit the extension's website.
First, thanks for providing us with this awesome tool!
I recently noticed some strange behavior when creating a custom oncoplot. Specifically, the coloring of the variant classification in the gene plot did not match the variant color coding in the waterfall plot when count was set to 1.
data_mf.plot_waterfall(count=1)
I eventually resolved it by setting vmin and vmax to the minimum/maximum index of NONSYN_COLORS, as the anchors were not correctly inferred by seaborn.heatmap in plot_waterfall.
VCF to MAF is already supported (e.g. maf_vcf2maf and pymaf.MafFrame.from_vcf). Now add MAF to VCF functionality.
Follow-up question on usage after resolving this (#62).
Could you suggest which methods to use for these?
thanks,
Rohan
Hi Steven,
I tried running fuc maf-vcf2maf in.vcf > out.maf on a VEP-annotated VCF today but got the error below:
ValueError: Found unknown Ensembl VEP consequence: splice_donor_region_variant
I'm guessing there are additional consequence terms that need to be added?
Thank you for all your hard work. fuc is fabulous.
p.s. My version of VEP is 109.3
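The error above suggests a lookup table of known consequence terms that raises on anything unlisted. A more forgiving pattern, sketched here under assumptions, is to collect unrecognised terms and report them together. The table below is a small illustrative subset following the common vcf2maf-style naming, not fuc's actual mapping, and classify is a hypothetical helper.

```python
# Illustrative subset only; the real table in fuc is larger and may differ.
CONSEQUENCE_TO_CLASS = {
    'missense_variant': 'Missense_Mutation',
    'stop_gained': 'Nonsense_Mutation',
    'synonymous_variant': 'Silent',
    '3_prime_UTR_variant': "3'UTR",
}


def classify(consequence, unknown):
    """Map one VEP consequence term to a MAF variant classification.

    Unrecognised terms are recorded in the `unknown` set and None is
    returned, instead of raising ValueError on the first one.
    """
    if consequence in CONSEQUENCE_TO_CLASS:
        return CONSEQUENCE_TO_CLASS[consequence]
    unknown.add(consequence)
    return None


unknown_terms = set()
for term in ['missense_variant', 'splice_donor_region_variant']:
    classify(term, unknown_terms)
# unknown_terms now holds every term missing from the table
```

Reporting the whole set at the end of a run shows in one pass which terms a newer VEP release has introduced.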
Hello,
I am trying to read VCF files from Mutect2 and Strelka2 and am getting the error below.
Code:
from fuc import pyvcf
vf = pyvcf.VcfFrame.from_file('mutect2.vcf')
vf.df
Error:
DtypeWarning: Columns (0) have mixed types.Specify dtype option on import or set low_memory=False.
vf = pyvcf.Vcf
Could you please help?
Thanks
Rohan
Reference: https://www.biostars.org/p/9473978/
This is not necessarily true. Example INFO record I have:
'AC=2;ACGTNacgtnMINUS=0,0,0,0,0,0,0,0,0,0;ACGTNacgtnPLUS=5,0,61,0,0,0,0,0,0,0;AN=4;AS_FilterStatus=SITE;AS_SB_TABLE=9,41|1,5;CALLERS=mutect2;CSQ=A|3_prime_UTR_variant|MODIFIER|MTOR|ENSG00000198793|Transcript|ENST00000361445|protein_coding|58/58||ENST00000361445.9:c.*700C>T||8471/8721|||||||-1|||SNV|HGNC|HGNC:3942|YES|1|P1|CCDS127.1|ENSP00000354558|P42345||UPI000012ABD3||Ensembl|G|G||1|||||chr1:g.11106785G>A|||||||||||||||||||||||||||;ClippingRankSum=-0.79;DKFZBias=damage;DP=85;ECNT=2;EPR=pass;FS=0;GERMQ=93;MBQ=34,33;MFRL=153,154;MMQ=60,60;MPOS=28;MQ=60;MQ0=0;MQRankSum=0;POPAF=7.3;ReadPosRankSum=0.452;TLOD=12.45
As one can see, CSQ is further down the line. If that happens, parsing breaks, because the wrong field is selected:
fields = r.INFO.replace('CSQ=', '').split(',')[0].split('|')
This will get whatever field is at the first comma. The parsing should probably split the VCF INFO first by ; and then look for CSQ.
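A minimal sketch of the suggested fix: split INFO on ; first, pick the entry whose key is CSQ, and only then split on , and |. The helper name extract_csq is hypothetical, and the example INFO string is a shortened version of the record quoted above.

```python
def extract_csq(info):
    """Return the fields of the first CSQ annotation, or None if absent."""
    for entry in info.split(';'):
        key, sep, value = entry.partition('=')
        if key == 'CSQ' and sep:
            # CSQ holds comma-separated annotations of '|'-separated fields
            return value.split(',')[0].split('|')
    return None


info = ('AC=2;ACGTNacgtnMINUS=0,0,0;'
        'CSQ=A|3_prime_UTR_variant|MODIFIER|MTOR;DP=85')

# The original one-liner stops at the first comma anywhere in INFO
broken = info.replace('CSQ=', '').split(',')[0].split('|')

# Splitting on ';' first isolates the CSQ entry before any comma handling
fixed = extract_csq(info)
```

Here broken is ['AC=2;ACGTNacgtnMINUS=0'], while fixed is ['A', '3_prime_UTR_variant', 'MODIFIER', 'MTOR'].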
Hi @sbslee again,
I have a problem installing fuc on a new computer running Ubuntu. When I try to install it via conda, I get the following output:
(burak) ➜ fuc git:(main) conda install -c bioconda fuc
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \
Found conflicts! Looking for incompatible packages. failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
The following specifications were found to be incompatible with your system:
- feature:/linux-64::__glibc==2.27=0
- python=3.10 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.27
In the end, the required packages were installed but fuc was not. Any ideas on how to fix this and install fuc? I've also tried installing it with git clone, but that gave an error too.
Thanks a lot!
Hi,
I wanted to follow your tutorial on creating an OncoPlot.
However, when I am trying to import pyvcf I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/93/lz6gxd9906z7wqwc1tx7hgsc0000gn/T/ipykernel_7020/1344064430.py in <module>
1 from pysam import VariantFile
----> 2 from fuc import pyvcf
~/miniconda3/envs/ICR/lib/python3.7/site-packages/fuc/__init__.py in <module>
----> 1 from .api import *
~/miniconda3/envs/ICR/lib/python3.7/site-packages/fuc/api/pybed.py in <module>
39
40 import pandas as pd
---> 41 import pyranges as pr
42 from copy import deepcopy
43 from . import common
~/miniconda3/envs/ICR/lib/python3.7/site-packages/pyranges/__init__.py in <module>
18 import pkg_resources
19
---> 20 from pyranges.pyranges import PyRanges
21 from pyranges import data
22 from pyranges.methods.concat import concat
~/miniconda3/envs/ICR/lib/python3.7/site-packages/pyranges/pyranges.py in <module>
4 import numpy as np
5
----> 6 from natsort import natsorted
7
8 import pyranges as pr
~/miniconda3/envs/ICR/lib/python3.7/site-packages/natsort/__init__.py in <module>
1 # -*- coding: utf-8 -*-
2
----> 3 from natsort.natsort import (
4 NatsortKeyType,
5 OSSortKeyType,
~/miniconda3/envs/ICR/lib/python3.7/site-packages/natsort/natsort.py in <module>
239
240 # Exposed for simplicity if one needs the default natsort key.
--> 241 natsort_key = natsort_keygen()
242 natsort_key.__doc__ = """\
243 natsort_key(val)
~/miniconda3/envs/ICR/lib/python3.7/site-packages/natsort/natsort.py in natsort_keygen(key, alg)
210 sep = natsort.compat.locale.null_string_locale
211 else:
--> 212 sep = natsort.compat.locale.null_string
213 pre_sep = natsort.compat.locale.null_string
214 regex = utils.regex_chooser(alg)
AttributeError: module 'natsort' has no attribute 'compat'
Do you think this is a conflict between the dependencies of different libraries? If not, how can I fix it?
Thank you in advance!
Hello,
I am using the script below to plot the AF distribution for a VCF file, but I am getting an error.
The command:
#!/bin/python
from fuc import common, pyvcf
common.load_dataset('pyvcf')
vcf_file = 'GEUVADIS.chr22.genotype.vcf'
vf = pyvcf.VcfFrame.from_file(vcf_file)
vf.plot_hist('AF')
the error:
file.python:5: DtypeWarning: Columns (5) have mixed types.Specify dtype option on import or set low_memory=False.
vf = pyvcf.VcfFrame.from_file(vcf_file)
Traceback (most recent call last):
File "file.python", line 6, in <module>
vf.plot_hist('AF')
File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/fuc/api/pyvcf.py", line 1991, in plot_hist
df = self.extract(k, as_nan=True, func=d[k])
File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/fuc/api/pyvcf.py", line 3912, in extract
df = self.df.apply(one_row, axis=1)
File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/frame.py", line 8736, in apply
return op.apply()
File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
return self.apply_standard()
File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/apply.py", line 805, in apply_standard
results, res_index = self.apply_series_generator()
File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/pandas/core/apply.py", line 821, in apply_series_generator
results[i] = self.f(v)
File "/home/kxj190026/anaconda3/envs/var/lib/python3.7/site-packages/fuc/api/pyvcf.py", line 3901, in one_row
i = r.FORMAT.split(':').index(k)
ValueError: 'AF' is not in list
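The last frame of the traceback shows the lookup r.FORMAT.split(':').index(k) failing, which means at least one record has no AF key in its FORMAT column (AF may be present only in INFO, or not at all). A small sketch of how one might check for this before plotting; both helper names are hypothetical and work on plain strings rather than on a VcfFrame.

```python
def extract_format_value(format_field, sample_field, key):
    """Return a sample's value for `key`, or None when the key is absent."""
    keys = format_field.split(':')
    if key not in keys:
        return None
    values = sample_field.split(':')
    i = keys.index(key)
    # Trailing sample fields may be dropped, so guard the index
    return values[i] if i < len(values) else None


def format_keys_missing(format_fields, key):
    """Return the indices of records whose FORMAT column lacks `key`."""
    return [i for i, f in enumerate(format_fields) if key not in f.split(':')]
```

For example, format_keys_missing(['GT:AF', 'GT'], 'AF') flags the second record, which is the kind of row that triggers the ValueError above.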
Hello,
I am trying to remove all records that have the same genotype call in every sample from my VCF file (in the end I need data representing the differences between samples). Which method would be appropriate for that? I've tried filter_sampall(), but the result is not satisfactory. Here is my code:
from fuc import pyvcf
import pandas as pd
vf = pyvcf.VcfFrame.from_file('P1_short_test.vcf')
just_gt = vf.strip('GT').df
new_vf = pyvcf.VcfFrame([''], just_gt)
no_the_same_variant = new_vf.filter_sampall().df
Thanks a lot!
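The filtering criterion described above can be written down independently of fuc: keep a record only if at least two samples carry different genotype calls. The sketch below uses hypothetical helper names, treats phased and unphased calls as equivalent, and ignores allele order. Missing calls such as ./. count as their own genotype, which may need adjusting.

```python
def normalize(gt):
    """Treat phased and unphased calls alike and ignore allele order."""
    return tuple(sorted(gt.replace('|', '/').split('/')))


def differs_between_samples(genotypes):
    """True when at least two samples carry different genotype calls."""
    return len({normalize(gt) for gt in genotypes}) > 1


rows = [
    ['0/1', '1|0', '0/1'],  # same call in every sample: drop
    ['0/0', '0/1', '1/1'],  # calls differ: keep
]
kept = [row for row in rows if differs_between_samples(row)]
```

Assuming the sample columns start at position 9 of the data frame, as in a standard VCF layout, the same predicate could be applied row-wise to the GT-only frame from the code above; that step is left out here because it depends on the frame's exact layout.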
I noticed that automodule is working fine with the make html command, but it fails to render when built by Read The Docs on the web. This issue is most likely related to this issue. When I looked at the build in Read The Docs, I found this message:
Running Sphinx v3.5.4
loading translations [en]... done
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 5 source files that are out of date
updating environment: [new config] 5 added, 0 changed, 0 removed
reading sources... [ 20%] api
reading sources... [ 40%] changelog
reading sources... [ 60%] cli
reading sources... [ 80%] index
reading sources... [100%] readme
WARNING: autodoc: failed to import module 'api.BedFrame' from module 'fuc'; the following exception was raised:
No module named 'fuc'
WARNING: autodoc: failed to import module 'api.FastqFrame' from module 'fuc'; the following exception was raised:
No module named 'fuc'
WARNING: autodoc: failed to import module 'api.VcfFrame' from module 'fuc'; the following exception was raised:
No module named 'fuc'
WARNING: autodoc: failed to import module 'api.common' from module 'fuc'; the following exception was raised:
No module named 'fuc'
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [ 20%] api
writing output... [ 40%] changelog
writing output... [ 60%] cli
writing output... [ 80%] index
writing output... [100%] readme
generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 4 warnings.
The HTML pages are in _build/html.
Updating searchtools for Read the Docs search...
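The "No module named 'fuc'" warnings indicate that the Read The Docs build environment cannot import the package, so autodoc has nothing to document. One common remedy, sketched here under the assumption that conf.py lives in a docs/ directory directly below the repository root, is to put the parent directory on sys.path at the top of conf.py. Installing the package in the Read The Docs build environment is another option.

```python
import os
import sys

# Assumption: this file is docs/conf.py and the fuc package directory
# sits one level up, at the repository root.
sys.path.insert(0, os.path.abspath('..'))
```

If the documentation lives elsewhere in the repository, the relative path needs to be changed to match.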
The sphinx.ext.linkcode extension provides a simple way to add external GitHub links to source code. For more details, visit the extension's website.