luyitian / flames
Full-length transcriptome splicing and mutation analysis
License: GNU General Public License v3.0
Hi, both my genome.fa and gff3 files use contig chr1. Is there support for this format, or are there parameters I can set to resolve this error?
Traceback (most recent call last):
File "PATH/TO/FLAMES/python/sc_long_pipeline.py", line 240, in <module>
sc_long_pipeline(args)
File "PATH/TO/FLAMES/python/sc_long_pipeline.py", line 193, in sc_long_pipeline
raw_gff3=raw_splice_isoform if config_dict["global_parameters"]["generate_raw_isoform"] else None)
File "PATH/TO/FLAMES/python/sc_longread.py", line 1123, in group_bam2isoform
it_region = bamfile.fetch(ch, bl.s, bl.e)
File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 686, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `chr1`
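A ValueError of this kind from pysam's fetch typically means the requested contig name is not present in the BAM header, so one thing worth checking is whether the BAM and the annotation really use the same naming. The following is a minimal sketch of such a check, not part of FLAMES: the function name diagnose_contigs is hypothetical, and the name lists are assumed to have been collected beforehand (for example from pysam's AlignmentFile.references for the BAM and from the first column of the gff3 for the annotation).

```python
def diagnose_contigs(bam_contigs, annotation_contigs):
    """Compare contig names from a BAM header with those used in the annotation."""
    bam = set(bam_contigs)
    anno = set(annotation_contigs)
    # Annotation contigs that a fetch() on this BAM would reject.
    missing = sorted(anno - bam)
    # Subset of the missing names that would match after adding or
    # stripping a "chr" prefix (a common UCSC vs. Ensembl mismatch).
    prefix_only = sorted(
        name for name in missing
        if (name[3:] if name.startswith("chr") else "chr" + name) in bam
    )
    return {"missing_from_bam": missing, "prefix_mismatch": prefix_only}


if __name__ == "__main__":
    # Toy example: BAM uses Ensembl-style names, annotation uses UCSC-style names.
    report = diagnose_contigs(["1", "2", "MT"], ["chr1", "chr2", "chrM"])
    print(report)
```

If prefix_mismatch is non-empty, the BAM was probably produced against a differently named reference than the one passed to the pipeline.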
Hi,
first, thanks a lot for developing FLAMES!
I have one question about the configuration parameters and a problem regarding some missing genes/transcripts in the final FLAMES output and would really appreciate some help.
i) First, I was wondering if there is any further explanation of the different isoform parameters that can be adapted in the config file? I have an idea about some of the parameters (MAX_DIST, MAX_TS_DIST, Min_sup_cnt, strand_specific), but I would really appreciate a bit more detail about how the others impact the isoform identification step.
ii) Moreover, I noticed that some of the chromosomes/regions I provided in the gene annotation reference were not part of the final FLAMES output. I'm using slightly adapted GTF and FASTA files that contain not only human genes but also some pathogens. However, even though reads map to those genes, not a single transcript isoform for them is written to isoform_annotated.gff3 or transcript_assembly.fa. Also, no mitochondrial transcripts are detected.
I checked the number of reads mapping to those regions in the align2genome.bam with samtools idxstats align2genome.bam and at least for the mitochondrial genes, a lot of reads are mapping.
However, only those seqnames are included in the isoform_annotated.gff3:
['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21', '22', '3', '4', '5', '6', '7', '8', '9', 'GL000191.1', 'GL000192.1', 'GL000194.1', 'GL000195.1', 'GL000218.1', 'GL000219.1', 'GL000223.1', 'X', 'Y']
Are they filtered out due to the parameters specified in the configuration or is something else happening here? It would be great to have information about those genes and transcripts as well.
Thanks a lot!
Best,
Kristin
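The comparison described above (contigs with mapped reads in align2genome.bam versus seqnames in isoform_annotated.gff3) can be automated. This is only a sketch under stated assumptions: it takes the text printed by samtools idxstats (tab-separated: name, length, mapped, unmapped) and the text of a GFF3 file, and the helper names mapped_contigs, gff3_seqnames and dropped_contigs are hypothetical, not FLAMES functions. It does not explain why contigs are dropped, only which ones.

```python
def mapped_contigs(idxstats_text):
    """Return {contig: mapped_reads} for contigs with at least one mapped read."""
    counts = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        # "*" is the idxstats row for reads without a reference.
        if name != "*" and int(mapped) > 0:
            counts[name] = int(mapped)
    return counts


def gff3_seqnames(gff3_text):
    """Collect the first column (seqid) of every non-comment GFF3 line."""
    return {
        line.split("\t", 1)[0]
        for line in gff3_text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def dropped_contigs(idxstats_text, gff3_text):
    """Contigs that have mapped reads but no record in the output GFF3."""
    seen = gff3_seqnames(gff3_text)
    return {c: n for c, n in mapped_contigs(idxstats_text).items() if c not in seen}


if __name__ == "__main__":
    idx = "1\t1000\t50\t0\nMT\t16569\t900\t0\n*\t0\t0\t7\n"
    gff = "##gff-version 3\n1\t.\tgene\t1\t100\t.\t+\t.\tID=gene:g1\n"
    print(dropped_contigs(idx, gff))
```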
Hi, I am using match_cell_barcode for ONT single cell data. I obtained a "whitelist.csv" and "putative_bc.csv" file from the output of BLAZE, at the same time I also have short-read sequencing data on the same library.
However, I am confused about which file should be used for the second argument of match_cell_barcode, that is, the "output cell barcode statistics file", or as explained in the README, "a file name/path for the statistics of barcode matching".
Could you please help me understand this file? What should it look like (which headers?) and how can I get it?
Thanks in advance!
Hi there,
Thank you for creating this amazing tool!
I am trying to utilize the DTU analysis script from the FLTseq_data directory, and I am just wondering how I can get the cluster_annotation.csv file?
(Line 80-82)
cluster_barcode_anno <- read.csv(file.path(data_dir,"cluster_annotation.csv"), stringsAsFactors=FALSE)
rownames(cluster_barcode_anno) = cluster_barcode_anno$barcode_seq
comm_cells = intersect(colnames(tr_sce),rownames(cluster_barcode_anno))
Thank you
Hi, I am getting this error in the final count matrix generation step.
Does anyone know how to work around this issue?
b'[bam_sort_core] merging from 9 files and 12 in-memory blocks...\n'
b''
### generate transcript count matrix 2023-12-08 17:21:24
Traceback (most recent call last):
File "/users/sparthib/flames/python/bulk_long_pipeline.py", line 270, in <module>
bulk_long_pipeline(args)
File "/users/sparthib/flames/python/bulk_long_pipeline.py", line 235, in bulk_long_pipeline
bc_tr_count_dict, bc_tr_badcov_count_dict, tr_kept = parse_realigned_bam(
File "/users/sparthib/flames/python/count_tr.py", line 114, in parse_realigned_bam
bc_dict = make_bc_dict(kwargs["bc_file"])
File "/users/sparthib/flames/python/count_tr.py", line 57, in make_bc_dict
with open(bc_anno) as f:
FileNotFoundError: [Errno 2] No such file or directory: ''
thanks!
Sowmya
Hi Luyi,
Thanks again for the nice work.
Could you please explain the difference between FSM (which, following the SQANTI definition, is an isoform that matches the reference at all splice junctions) and the FSM_annotation file in the FLAMES output? This annotation file has a column giving the FSM match to the reference. If the file is already the list of all FSM isoforms, what does this column tell us?
Thanks
Iman
Hi,
I tried running the example in the example folder of the python installation of FLAMES which gives an error:
subprocess.CalledProcessError: Command '['samtools faidx FLAMES_output/transcript_assembly.fa']' returned non-zero exit status 1.
I noticed that the transcript_assembly.fa file is empty. In the get_transcript_seq function in gff3_to_fa.py, it exits the first for loop right away as the following statement is false: if ch not in chr_to_gene:. However, it also does not enter the next for loop (for tr_seq in global_seq_dict:) because the dictionary is empty. I'd really appreciate your help.
Hello,
I was trying to use FLAMES in an isoform characterization benchmarking study with a single sample, but since I am new to the long-read world, it is not yet clear to me which key parameters I need to consider in the configuration file. After running FLAMES, I found my isoform_annotated.filtered.gff3 file almost empty. This is my output data:
SIZE DATE FILE
2444550950 Jul 13 19:28 align2genome.bam
3252184 Jul 13 19:28 align2genome.bam.bai
16 Jul 13 19:43 isoform_annotated.filtered.gff3
15122175 Jul 13 19:32 isoform_annotated.gff3
61 Jul 13 19:43 isoform_FSM_annotation.csv
3534807677 Jul 13 18:11 merged.fastq.gz
59 Jul 13 18:11 pseudo_barcode_annotation.csv
1505577353 Jul 13 19:41 realign2transcript.bam
3221800 Jul 13 19:41 realign2transcript.bam.bai
98666092 Jul 13 19:33 transcript_assembly.fa
2062401 Jul 13 19:33 transcript_assembly.fa.fai
118564 Jul 13 19:42 transcript_count.bad_coverage.csv.gz
186937 Jul 13 19:42 transcript_count.csv.gz
3617886 Jul 13 19:32 tss_tes.bedgraph
My input parameters and data were:
--gff3 gencode.v40.annotation.gtf (human annotations)
--genomefa GRCh38.primary_assembly.genome.fa (human reference genome)
--outdir FLAMES_output/
--fq_dir fastq/ (path to my directory containing my unique fastq file)
I am not using any configuration file, so FLAMES is applying its default parameters, and I guess this is the main problem for me since it is designed for ONT. So my question is: what are the best parameters for running an analysis with PacBio files? What do you recommend?
Below I paste a config file I used for ONT data, so you can indicate whether correcting these parameters for PacBio is all I need, or whether there are extra parameters to consider.
{
    "pipeline_parameters":{
        "do_genome_alignment":true,
        "do_isoform_identification":true,
        "do_read_realignment":true,
        "do_transcript_quantification":true
    },
    "global_parameters":{
        "generate_raw_isoform":false,
        "has_UMI":false
    },
    "isoform_parameters":{
        "MAX_DIST":10,
        "MAX_TS_DIST":120,
        "MAX_SPLICE_MATCH_DIST":10,
        "min_fl_exon_len":40,
        "Max_site_per_splice":3,
        "Min_sup_cnt":10,
        "Min_cnt_pct":0.001,
        "Min_sup_pct":0.2,
        "strand_specific":0,
        "remove_incomp_reads":5
    },
    "alignment_parameters":{
        "use_junctions":true,
        "no_flank":false
    },
    "realign_parameters":{
        "use_annotation":true
    },
    "transcript_counting":{
        "min_tr_coverage":0.3,
        "min_read_coverage":0.3
    }
}
Thank you very much for your help in advance, and my apologies for such a basic question!
Best,
AP
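For experimenting with the isoform parameters in a config like the one pasted above, it can help to derive variants programmatically instead of editing the JSON by hand. The sketch below is an illustration only: override_isoform_parameters is a hypothetical helper, and the value used for Min_sup_cnt is a placeholder for a test run, not a recommendation for PacBio or any other platform. The resulting file would be passed with --config_file, the option used in the sc_long_pipeline.py command quoted elsewhere on this page.

```python
import copy
import json


def override_isoform_parameters(config, **overrides):
    """Return a copy of a FLAMES-style config with selected isoform parameters replaced."""
    updated = copy.deepcopy(config)
    params = updated["isoform_parameters"]
    for key, value in overrides.items():
        # Refuse misspelled keys instead of silently adding new ones.
        if key not in params:
            raise KeyError(f"unknown isoform parameter: {key}")
        params[key] = value
    return updated


if __name__ == "__main__":
    base = {"isoform_parameters": {"Min_sup_cnt": 10, "Min_sup_pct": 0.2, "strand_specific": 0}}
    # Placeholder value, chosen only to show the mechanics.
    relaxed = override_isoform_parameters(base, Min_sup_cnt=3)
    print(json.dumps(relaxed, indent=4))
```

Rejecting unknown keys is a deliberate choice here: the parameter names mix upper and lower case (MAX_DIST, Min_sup_cnt, min_fl_exon_len), so a typo is easy to make and would otherwise go unnoticed.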
Hi,
Thank you for this tool.
I would like to know if we can run only the mutation analysis, without the full-length transcriptome splicing analysis. I have mapped BAM and barcode files.
Thanks
I'm struggling to convert the transcript_count.csv.gz matrix to a Seurat or AnnData object. Any help and advice would be appreciated.
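One possible starting point for the conversion is to read the gzipped CSV into plain lists first. This is a sketch, not an official FLAMES utility, and it rests on an assumption that should be checked against the actual file: that the header is transcript_id, gene_id, followed by one column per cell barcode. The function names load_transcript_counts and transpose are hypothetical.

```python
import csv
import gzip


def load_transcript_counts(path):
    """Read a transcript-by-barcode CSV into plain Python lists.

    Assumes the header is: transcript_id, gene_id, <barcode 1>, <barcode 2>, ...
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        barcodes = header[2:]
        transcripts, genes, rows = [], [], []
        for record in reader:
            transcripts.append(record[0])
            genes.append(record[1])
            # Accept counts written as "3" or "3.0".
            rows.append([int(float(value)) for value in record[2:]])
    return barcodes, transcripts, genes, rows


def transpose(rows):
    """Turn transcripts-by-cells into cells-by-transcripts."""
    return [list(column) for column in zip(*rows)]
```

AnnData expects cells as rows, so the transposed matrix (converted to a NumPy array) could then serve as X, with the barcodes as observation names and the transcript IDs as variable names. Seurat expects features as rows, so there the untransposed orientation applies.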
Hi there,
Thank you so much for this amazing tool!
I am just wondering if it is possible to get a more in-depth explanation of each parameter for the config file e.g. for isoform parameters?
Thank you
Your script refers to filtered_feature_bc_matrix/barcodes.tsv.gz, but I cannot find this file.
Hi ,
I am trying to use the tr_classify analysis script from the FLTseq_data directory, and I am wondering how the fsm_splice_comp.csv file is created. I have run sc_long_pipeline, but this file is not in the output.
(Line 43)
fsm_splice_comp <- read.csv(file.path(data_dir,"fsm_splice_comp.csv"), header=FALSE, stringsAsFactors=FALSE)
Thank you
Hi @LuyiTian, I am currently using FLAMES for single-cell isoform identification and detection. I am now at the barcode assignment step, where I ran the compilation command shown in the README:
g++ -std=c++11 -lz -O2 -o match_cell_barcode ssw/ssw_cpp.cpp ssw/ssw.c match_cell_barcode.cpp kseq.h edit_dist.cpp
but I get the following error:
What is the problem?
hi~
I remember that you used FLAMES to detect mutations and plot them on a UMAP. I was very impressed by this part. However, I noticed that "mutation detection" is not included in the sc_long_pipeline.py pipeline or the config file, although there is a Python script named "bam_mutation.py". I do not know how to use this script. Could you provide a tutorial on how you did this?
thanks
garfield
2021 12 29
When FLAMES matches the flanking sequence CTACACGACGCTCTTCCGATCT, does it allow matching within an edit distance, or does it have to be an exact match?
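To make the distinction in the question concrete, here is what matching a flanking sequence "within an edit distance" could look like. This is a generic approximate-matching sketch (edit distance with a free start position in the read); it is not taken from match_cell_barcode and says nothing about what FLAMES actually does. The function name best_flank_match is hypothetical.

```python
def best_flank_match(read, flank):
    """Return (edit_distance, end_position) of the best approximate occurrence of flank in read."""
    # previous[j]: fewest edits needed to align flank[:j] to some stretch
    # of the read that ends at the position processed so far.
    previous = list(range(len(flank) + 1))  # before any read base: j missing bases
    best = (previous[-1], 0)
    for i, base in enumerate(read, start=1):
        current = [0]  # a match may start anywhere in the read at no cost
        for j, flank_base in enumerate(flank, start=1):
            cost = 0 if base == flank_base else 1
            current.append(min(previous[j - 1] + cost,  # match or substitution
                               previous[j] + 1,         # extra base in the read
                               current[j - 1] + 1))     # flank base missing from the read
        if current[-1] < best[0]:
            best = (current[-1], i)
        previous = current
    return best


if __name__ == "__main__":
    flank = "CTACACGACGCTCTTCCGATCT"
    print(best_flank_match("AAAA" + flank + "GGGG", flank))
```

An exact-match policy corresponds to accepting only a distance of 0, while a tolerant policy accepts any distance up to a chosen threshold.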
Hi @LuyiTian ,
Could you please comment on my error below?
Running code:
for i in test; do /FLAMES/python/sc_long_pipeline.py --gff3 hg38v99.Cellranger.genes.gtf --infq $i.demultiplexed.fq.gz --outdir FLAMES_Output/$i --genomefa hg38v99.Cellranger.genome.fa --config_file /FLAMES/config_sclr_nanopore_default.json --minimap2_dir /Software/anaconda_py2/bin/ >$i.log 2>&1 & done
Error:
Use config file: config_sclr_nanopore_default.json
Parameters in configuration file:
comment : this is the default config for nanopore single cell long read data using 10X RNA-seq kit. use splice annotation in alignment.
global_parameters
has_UMI : True
generate_raw_isoform : False
isoform_parameters
Min_sup_pct : 0.2
MAX_SPLICE_MATCH_DIST : 10
random_seed : 666666
Min_cnt_pct : 0.001
MAX_DIST : 10
Min_sup_cnt : 5
MAX_TS_DIST : 120
Max_site_per_splice : 3
strand_specific : -1
remove_incomp_reads : 4
min_fl_exon_len : 40
pipeline_parameters
do_transcript_quantification : True
do_read_realignment : True
do_genome_alignment : True
do_isoform_identification : True
transcript_counting
min_tr_coverage : 0.4
min_read_coverage : 0.4
realign_parameters
use_annotation : True
alignment_parameters
no_flank : False
use_junctions : True
output directory not exist, create one:
FLAMES_Output/test
Input parameters:
gene annotation: hg38v99.Cellranger.genes.gtf
genome fasta: hg38v99.Cellranger.genome.fa
input fastq: test.demultiplexed.fq.gz
output directory: FLAMES_Output/test
directory contains minimap2: /Software/anaconda_py2/bin/
### align reads to genome using minimap2 2021-01-30 12:48:05
Traceback (most recent call last):
File "/FLAMES/python/sc_long_pipeline.py", line 213, in <module>
sc_long_pipeline(args)
File "/FLAMES/python/sc_long_pipeline.py", line 159, in sc_long_pipeline
minimap2_align(args.minimap2_dir, args.genomefa, args.infq, tmp_bam, no_flank=config_dict["alignment_parameters"]["no_flank"], bed12_junc=tmp_bed if config_dict["alignment_parameters"]["use_junctions"] else None)
File "/FLAMES/python/minimap2_align.py", line 37, in minimap2_align
print subprocess.check_output([align_cmd], shell=True, stderr=subprocess.STDOUT)
File "/Software/anaconda_py2/lib/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['/Software/anaconda_py2/bin/minimap2 -ax splice -t 12 --junc-bed FLAMES_Output/test/tmp.splice_anno.bed12 --junc-bonus 1 -k14 --secondary=no hg38v99.Cellranger.genome.fa test.demultiplexed.fq.gz | samtools view -bS -@ 4 -m 2G -o FLAMES_Output/test/tmp.align.bam - ']' returned non-zero exit status 1
Hello, FLAMES aligns my reads to the reference genome but during realignment I get this error:
### skip aligning reads to genome 2023-12-07 15:57:35
### read gene annotation 2023-12-07 15:57:35
remove similar transcripts in gene annotation: Counter({'duplicated_transcripts': 765})
### find isoforms 2023-12-07 15:59:27
Traceback (most recent call last):
File "/users/sparthib/flames/python/bulk_long_pipeline.py", line 270, in <module>
bulk_long_pipeline(args)
File "/users/sparthib/flames/python/bulk_long_pipeline.py", line 202, in bulk_long_pipeline
group_bam2isoform(genome_bam, isoform_gff3, tss_tes_stat, "", chr_to_blocks, gene_dict, transcript_to_junctions, transcript_dict, args.genomefa,
File "/users/sparthib/flames/python/sc_longread.py", line 1115, in group_bam2isoform
for c in get_fa(fa_f):
File "/users/sparthib/flames/python/sc_longread.py", line 45, in get_fa
for line in open(fn):
File "/users/sparthib/.conda/envs/FLAMES/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I cloned the flames package from github and this is my environment info:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.23.0 hd590300_0 conda-forge
ca-certificates 2023.11.17 hbcca054_0 conda-forge
editdistance 0.6.2 py310hc6cd4ac_2 conda-forge
htslib 1.18 h81da01d_0 bioconda
k8 0.2.5 hdcf5f25_4 bioconda
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libblas 3.9.0 20_linux64_openblas conda-forge
libcblas 3.9.0 20_linux64_openblas conda-forge
libcurl 8.4.0 hca28451_0 conda-forge
libdeflate 1.18 h0b41bf4_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h807b86a_3 conda-forge
libgfortran-ng 13.2.0 h69a702a_3 conda-forge
libgfortran5 13.2.0 ha4646dd_3 conda-forge
libgomp 13.2.0 h807b86a_3 conda-forge
liblapack 3.9.0 20_linux64_openblas conda-forge
libnghttp2 1.58.0 h47da74e_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.25 pthreads_h413a1c8_0 conda-forge
libsqlite 3.44.2 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_3 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
minimap2 2.26 he4a0461_2 bioconda
ncurses 6.4 h59595ed_2 conda-forge
numpy 1.26.2 py310hb13e2d6_0 conda-forge
openssl 3.2.0 hd590300_1 conda-forge
pip 23.3.1 pyhd8ed1ab_0 conda-forge
pysam 0.22.0 py310h41dec4a_0 bioconda
python 3.10.13 hd12c33a_0_cpython conda-forge
python_abi 3.10 4_cp310 conda-forge
readline 8.2 h8228510_1 conda-forge
samtools 1.18 h50ea8bc_1 bioconda
setuptools 68.2.2 pyhd8ed1ab_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
Any pointers would be appreciated, thank you!
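A note on the traceback above: 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the file passed as the genome FASTA is probably gzip-compressed while it is being read as plain text. Decompressing the FASTA before running the pipeline should avoid the error. Alternatively, a reader can detect compression itself; the following is a minimal sketch of that idea with hypothetical helper names (open_maybe_gzip, read_fasta), not the actual get_fa implementation.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_fasta(path):
    """Yield (name, sequence) pairs from a plain or gzipped FASTA file."""
    name, chunks = None, []
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the identifier before the first whitespace.
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```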
After setting the edit distance and providing the barcode list, some of the reads should be removed from consideration. How can I find the number of reads that are assigned to barcodes in the FLAMES output?
Hi,
Thank you for developing this nice tool.
I'm applying BLAZE and FLAMES to my single-cell ONT data.
I've gotten useful output, but I need genomic coordinates for each transcript to compare transcript structures.
However, some of the transcripts in "transcript_count.csv.gz" do not exist in "isoform_annotated.gff3" or "isoform_annotated.filtered.gff3". How can I find the information for these transcripts?
Thank you!
Hi, is there any way to specify the number of cores for the single-cell run, so that it executes faster on a dataset with more than 20 million reads?
Hi, I'm just wondering whether the counts table generated from the pipeline are already UMI deduplicated counts. If not, how would I go about generating these from the FLAMES output?
In addition, I found that among the transcript IDs for my mouse samples (from pipeline_output/transcript_count.csv), quite a few start with ENSMUSG instead of ENSMUST. Am I correct in thinking that these are gene IDs instead of transcript IDs, and why would that be the case?
Hello,
Thanks for developing the tool.
I was wondering if there is a way to get the gene name in the output matrix instead of the transcript_ID or the gene_ID?
It would be more convenient for downstream analysis to have the correspondence gene_ID == gene_name.
Thanks for your help.
Rania
Is there a way to define number of cores, RAM usage, etc. for the pipelines?
Minimap2/Samtools is throwing an error for reads with appended cell barcodes/UMIs (generated by match_cell_barcode).
[E::sam_parse1] query name too long
[W::sam_read1_sam] Parse error at line 8760987
samtools sort: truncated file. Aborting
Here is an example qname:
@CTACGGGAGAGCTTTC_CGATAAGACCCA#ACATCGAGTCAAACGG_GCACATCTTGGC#GTAGAGGAGCGGGTTA_AGGCACCTATGT#AGTACTGAGAGTCAGC_CTCAGCCAGTAA#TGTCCCAGTTACCGTA_ATCGTACCAGTC#AATCGTGTCGACATCA_ACTCAAGGCCAT#CGAGAAGGTTCGGCGT_TACGCCAGTCTG#GCTGCAGCACATGGTT_TGATTATGCCTC#CCGTAGGCAGACTGCC_CTCTCGCATACA#TAAGTCGCAGGAGGTT_TAACTATTTACG#TCGTAGATCACTACGA_AGACGCAAATTT#GTCGAATAGGTTACAA_ACAAATTGTTTC#ACAAGCTCAGGCGTTC_CGTTGCCTATAT#GTGCACGAGGATAATC_CAGGAGTCAGAA#AGGATAAAGGTATCTC_CCAATCGCTTTA#GTCATGAGTCCTCCTA_AGCTCAAACACT#GACTTCCCAAAGTATG_GCCCACTTGCTG#TGTACAGTCAACCGAT_TGAAGCATCCAC#TGAGGTTTCAAGGACG_GGACCAAGTCGG#TTACGCCCAGCCATTA_AATCACCGCTCG#ATATCCTCACAATGAA_AATTATCTCTTT#CCACACTCAATAGGGC_CACCTATTTTTT#TCTCTGGCAAACACGG_GCCCCTGCATAG#ATATCCTGTATTCCGA_AATTATGAACTT#TCCCATGGTTGCGGAA_AAATTACAATCC#AGTAGTCTCGTCTCAC_CCATGATTCACG#CTAACCCGTGGCCTCA_ATTTACAGATGA#32fd44aa-9033-40d6-a233-bf43ece68751
Looks like qname must be equal to or shorter than 254 characters: samtools/samtools#1081
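Since the limit is on the read name itself, offending reads can be located in the FASTQ before alignment. The sketch below assumes a standard four-line-per-record FASTQ (no wrapped sequence lines) and takes any iterable of lines, for example an open file handle; long_read_names is a hypothetical helper, not part of FLAMES or samtools.

```python
MAX_QNAME_LEN = 254  # limit cited in the report above (samtools/samtools#1081)


def long_read_names(fastq_lines, limit=MAX_QNAME_LEN):
    """Yield (record_index, name_length) for FASTQ records whose name exceeds limit."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 0:  # header line of each 4-line FASTQ record
            # The read name is the header text after "@" up to the first whitespace.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if len(name) > limit:
                yield index // 4, len(name)


if __name__ == "__main__":
    records = ["@short\n", "ACGT\n", "+\n", "!!!!\n",
               "@" + "A" * 300 + "\n", "ACGT\n", "+\n", "!!!!\n"]
    print(list(long_read_names(records)))
```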
On line 88, read_dict[r][0] will be assigned (tr, rec.get_tag("AS"), tr_cov, float(rec.query_alignment_length)/rec.infer_read_length(), rec.mapping_quality), contradicting the comment on line 106: # transcript_id, pct_ref, pct_reads.
hit[1] > 0.8 is used on line 119, which would evaluate alignment score > 0.8.
0.8 seems to be a very low threshold for alignment scores. Did you mean to evaluate pct_ref > 0.8 (i.e. hit[2] > 0.8)?
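Index mix-ups of this kind are easier to avoid when the tuple fields are named. The sketch below illustrates the idea with a namedtuple; the field order follows the tuple quoted in the report, but the field names, the Hit type and the passes_coverage helper are illustrative assumptions rather than code from count_tr.py.

```python
from collections import namedtuple

# Field order copied from the tuple described in the report; the names are illustrative.
Hit = namedtuple("Hit", ["transcript_id", "alignment_score", "tr_cov", "read_cov", "mapq"])


def passes_coverage(hit, min_tr_cov=0.8):
    """Filter on transcript coverage by name rather than by position."""
    return hit.tr_cov > min_tr_cov


if __name__ == "__main__":
    hit = Hit("tr1", 1500, 0.35, 0.9, 60)
    print(hit[1] > 0.8)          # True: this compares the alignment score
    print(passes_coverage(hit))  # False: transcript coverage is only 0.35
```

With positional indexing the two checks look almost identical, yet they give opposite answers for this hit.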
Hi Luyi,
The pipeline is great! Thanks for the effort and for sharing it.
I have tried FLAMES on your published data and our own in-house data, and have two questions:
Looking forward to your feedback.
Thanks,
Yan
Can this pipeline also demultiplex reads from cell barcodes?
hi ~
What's the difference between barcode hm match and barcode match?
Hi,
I am currently using FLAMES and a few other assemblers (FLAIR and Bookend) to compare them against each other and find out which would be best for my data and workflow (Drosophila nanopore sequences). I am currently facing the issue that my FLAMES-based transcriptomes are surprisingly small (roughly 2500 isoforms after correction and filtering, against FLAIR's 16000), even with the same references and sequencing files. I think this may be due to the config file, which I honestly just copied from GitHub. What parameters would you recommend, and what should be changed to address this? Would it be appropriate to be less strict, and how would I specify this in the config file?
Best,
Hasan.
Could we run FLAMES directly from a BAM file generated by another demultiplexing tool (e.g. Nanopore/sockeye)? I have tried once, but it failed. It seems that a FASTQ file is required in the realignment step. Could you please give me some advice on using FLAMES for isoform analysis when we have no short-read sequencing data? Thanks so much!
Hi,
Thanks for developing FLAMES.
I have a specific requirement that involves adapting match_cell_barcode function to accommodate different barcode and UMI lengths. Currently, the software assumes a standard barcode length of 16 and a UMI length of 10, based on 10X kit.
I would like to request if it would be possible to modify these parameters according to my needs, since I’m using a custom single-cell ONT library with same flanking sequence (CTACACGACGCTCTTCCGATCT) but barcode length of 11 and UMI of 14 bp respectively.
Regarding the UMI length, it can be specified by command line.
I would like to ask for your feedback:
Are these modifications correct and sufficient for proper barcode and UMI assignment, or do I have to change something else in the source code of match_cell_barcode?
Sorry in advance, but I'm not an expert in C++.
Best
Do I need a specific GFF/GTF format?
I am getting this error:
Traceback (most recent call last):
File "/FLAMES/python/bulk_long_pipeline.py", line 243, in <module>
bulk_long_pipeline(args)
File "/FLAMES/python/bulk_long_pipeline.py", line 171, in bulk_long_pipeline
gff3_to_bed12(args.minimap2_dir, args.gff3, tmp_bed)
File "/FLAMES/python/minimap2_align.py", line 17, in gff3_to_bed12
print subprocess.check_output([cmd], shell=True, stderr=subprocess.STDOUT)
File "/miniconda/lib/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['paftools.js gff2bed /gnet/is6/p04/data/dnaseq/analysis/led13/genomes/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf > /gnet/is6/p04/data/dnaseq/analysis/led13/outputs/R6310_q10_l300_flames/tmp.splice_anno.bed12']' returned non-zero exit status 1
here is the head of my GTF file:
#gtf-version 2.2
#!genome-build GRCh38
#!genome-build-accession NCBI_Assembly:GCA_000001405.15
#!annotation-date 01/25/2019
#!annotation-source NCBI Homo sapiens Updated Annotation Release 109.20190125
chr1 BestRefSeq gene 11874 14409 . + . gene_id "DDX11L1"; db_xref "GeneID:100287102"; db_xref "HGNC:HGNC:37102"; description "DEAD/H-box helicase 11 like 1"; gbkey "Gene"; gene "DDX11L1"; gene_biotype "transcribed_pseudogene"; pseudo "true";
chr1 BestRefSeq exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gbkey "misc_RNA"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1"; exon_number "1";
chr1 BestRefSeq exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gbkey "misc_RNA"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1"; exon_number "2";
chr1 BestRefSeq exon 13221 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; db_xref "GeneID:100287102"; gbkey "misc_RNA"; gene "DDX11L1"; product "DEAD/H-box helicase 11 like 1"; exon_number "3";
chr1 BestRefSeq gene 14362 29370 . - . gene_id "WASH7P"; db_xref "GeneID:653635"; db_xref "HGNC:HGNC:38034"; description "WAS protein family homolog 7, pseudogene"; gbkey "Gene"; gene "WASH7P"; gene_biotype "transcribed_pseudogene
Hi,
Nice work! Congrats!
Two questions:
1- Can I use the config file from the example ("SIRV_config.json") to run my human datasets?
2- Also, I could not activate the environment after installing your software (see the error below). I guess I can still run it without activating the env; is that correct? I am not getting any error if I run your software without activating the env.
I tried to export the env path and also ran "conda.sh" before running the command with no luck.
Here is what I get if I try to activate the env:
conda activate FLAMES
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
I'm evaluating FLAMES and FLAIR for my project. Can you comment on the conceptual or algorithmic differences between the two packages? For example, what aspect of FLAMES leads to its increased accuracy in benchmarking?