lcr-bccrc / lcr-scripts
Collection of curated scripts from the Morin and LCR labs
There is some conflict in the environment for salmon2counts that causes it to take forever to build. I eventually gave up after about an hour of this. Could someone else test it to see if this is in need of fixing or is somehow unique to me?
(base) -bash-4.2$ conda env create -f src/lcr-scripts/salmon2counts/1.0/salmon2counts.yaml
Collecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
Examining conflict for python pip wheel setuptools harfbuzz r-base cairo certifi python_abi pango glib: 55%|█████████████████████████████████▋ | 123/223 [33:31<05:24, 3.24s/\ ]
failed \
Solving environment: \
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
Examining conflict for r-dbplyr r-rvest r-httr r-htmltools r-reshape2 r-futile.logger r-readr r-vctrs bioconductor-biocparallel r-ggplot2 r-isoband r-pkgbuild r-rsqlite r-progress r-forcats r-knitr r-d/failed
Hi
I want to ask about the get_loh_flag function of BattenbergParser in cnv2igv.py (version 1.4). I am quite confused about the LOH type "neutral" and its condition (if int(nMaj1_A) == 2 and int(nMin1_A) == 1: loh_flag = '1'). I thought that in the neutral situation nMin1_A should be 0. Could you please tell me how this flag is defined here?
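For reference while discussing this, here is a minimal sketch of the conventional allele-specific definition of LOH, where a segment counts as LOH only if the minor allele copy number is 0. The function name classify_loh and the returned labels are hypothetical, and this is not necessarily the logic that get_loh_flag implements.

```python
def classify_loh(n_major: int, n_minor: int) -> str:
    """Classify a segment from its major/minor allele copy numbers.

    Assumption: this follows the conventional definition of LOH and is
    not taken from cnv2igv.py.
    """
    if n_minor != 0 or n_major == 0:
        # Both alleles are retained, or the segment is homozygously deleted.
        return "none"
    if n_major == 1:
        return "deletion_loh"   # 1 + 0: one copy lost
    if n_major == 2:
        return "neutral_loh"    # 2 + 0: total copy number is still 2
    return "amplified_loh"      # 3 or more copies of the remaining allele


print(classify_loh(2, 0))
print(classify_loh(2, 1))
```

Under this convention a segment with nMaj1_A = 2 and nMin1_A = 1 is a gain with both alleles retained, so it would not be flagged as LOH.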
When the goal is to run one of the level 3 tools on :all_the_things:, allow the subsetting column to be NA so all of the possible values for that column are used (e.g. no need to specify full list of cohorts, or genome builds etc when you want to use all of the samples for that pathology).
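A minimal sketch of the requested behaviour, assuming the sample table is available as a list of dictionaries. The function name subset_samples, the column names, and the sample records are all hypothetical; None stands in for NA.

```python
from typing import Dict, List, Optional


def subset_samples(samples: List[Dict[str, str]],
                   column: str,
                   value: Optional[str] = None) -> List[Dict[str, str]]:
    """Keep samples whose `column` equals `value`; None (NA) keeps every sample."""
    if value is None:
        # NA means "no restriction": every value of the column is accepted.
        return list(samples)
    return [s for s in samples if s.get(column) == value]


# Hypothetical sample table for illustration only.
samples = [
    {"sample_id": "S1", "cohort": "cohortA", "genome_build": "grch37"},
    {"sample_id": "S2", "cohort": "cohortB", "genome_build": "hg38"},
]

all_cohorts = subset_samples(samples, "cohort")
only_b = subset_samples(samples, "cohort", "cohortB")
```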
We should modify the vcf filtering script (implemented for Strelka's format) to allow the user to filter based on some more criteria, including total depth and non-reference read count.
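As a rough sketch of what the extra criteria could look like for SNVs, assuming Strelka's somatic SNV layout in which DP holds the read depth and AU/CU/GU/TU hold "tier1,tier2" counts per base. The function names and the default thresholds are hypothetical and are not taken from the existing script.

```python
BASES = ("A", "C", "G", "T")


def snv_counts(format_field: str, sample_field: str, alt: str):
    """Return (total_depth, alt_read_count) for one sample column of a VCF record."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    depth = int(values["DP"])
    # Assumption: use the tier 1 count, the first of the two comma-separated values.
    alt_count = int(values[alt + "U"].split(",")[0])
    return depth, alt_count


def passes(format_field: str, sample_field: str, alt: str,
           min_depth: int = 10, min_alt: int = 3) -> bool:
    """Apply the depth and non-reference read count thresholds to one SNV."""
    if alt not in BASES:
        return False  # this sketch handles single-base ALT alleles only
    depth, alt_count = snv_counts(format_field, sample_field, alt)
    return depth >= min_depth and alt_count >= min_alt


fmt = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
tumour = "30:0:0:0:25,25:0,0:0,0:5,6"
print(passes(fmt, tumour, "T"))
```

Indels use different FORMAT fields and would need their own counting function.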
The fill_segments script implemented through bedtools should also support filling the subclones outputs of battenberg. One of the ways to implement it is through adding a new parameter mode that will take either SEG or subclones options. These are the columns in the standard subclones file:
chr startpos endpos BAF pval LogR ntot nMaj1_A nMin1_A frac1_A nMaj2_A nMin2_A frac2_A SDfrac_A SDfrac_A_BS frac1_A_0.025 frac1_A_0.975 nMaj1_B nMin1_B frac1_B nMaj2_B nMin2_B frac2_B SDfrac_B SDfrac_B_BS frac1_B_0.025 frac1_B_0.975 nMaj1_C nMin1_C frac1_C nMaj2_C nMin2_C frac2_C SDfrac_C SDfrac_C_BS frac1_C_0.025 frac1_C_0.975 nMaj1_D nMin1_D frac1_D nMaj2_D nMin2_D frac2_D SDfrac_D SDfrac_D_BS frac1_D_0.025 frac1_D_0.975 nMaj1_E nMin1_E frac1_E nMaj2_E nMin2_E frac2_E SDfrac_E SDfrac_E_BS frac1_E_0.025 frac1_E_0.975 nMaj1_F nMin1_F frac1_F nMaj2_F nMin2_F frac2_F SDfrac_F SDfrac_F_BS frac1_F_0.025 frac1_F_0.975
After the bed-defined columns, the rest should be filled as follows:
# assign values to be used to fill normal CN segments
empty_baf = float(0.5)
empty_pval = int(1)
empty_logr = int(0)
empty_ntot = float(2.0)
empty_nMaj1_A = int(1)
empty_nMin1_A = int(1)
empty_frac1_A = int(1)
empty_nMaj2_A = int(1)
empty_nMin2_A = int(1)
empty_frac2_A = int(1)
All remaining columns must be filled with the NA string to maintain the formatting of the original files.
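The filling rule above can be sketched as follows. The names EMPTY and filler_row are hypothetical, and HEADER is truncated to the columns of subclone A for brevity; the real header continues through the B to F columns, which would all receive NA.

```python
# Truncated header (assumption for illustration): the real file continues
# with the nMaj1_B ... frac1_F_0.975 columns.
HEADER = [
    "chr", "startpos", "endpos", "BAF", "pval", "LogR", "ntot",
    "nMaj1_A", "nMin1_A", "frac1_A", "nMaj2_A", "nMin2_A", "frac2_A",
    "SDfrac_A", "SDfrac_A_BS", "frac1_A_0.025", "frac1_A_0.975",
]

# Values used to fill normal CN segments, as listed above.
EMPTY = {
    "BAF": 0.5, "pval": 1, "LogR": 0, "ntot": 2.0,
    "nMaj1_A": 1, "nMin1_A": 1, "frac1_A": 1,
    "nMaj2_A": 1, "nMin2_A": 1, "frac2_A": 1,
}


def filler_row(chrom, start, end, header=HEADER):
    """Build one filler line: coordinates, then defaults, then NA for the rest."""
    coords = {"chr": chrom, "startpos": start, "endpos": end}
    return [str(coords.get(col, EMPTY.get(col, "NA"))) for col in header]


print("\t".join(filler_row("1", 100, 200)))
```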
Similar to other tools, a section will be added to ensure fishHook is only run on the GRCh37 (hg19) genome build.
cat("Requested mode is HotMAPS, but the supplied file is in the hg38-based coordinates.\n")
cat("Unfortunately, HotMAPS is configured to only work for grch37-based maf files.\n")
stop("Please supply the mutation data in grch37-based version.")
I get an error running the latest version on a Battenberg output:
python src/lcr-scripts/cnv2igv/1.3/cnv2igv.py --mode battenberg --sample FL2007T2 results/gambl/battenberg-1.1/02-battenberg/genome--grch37/FL2007T2--FL2007N/FL2007T2_subclones.txt
running _battenberg_to_igv_seg for FL2007T2--FL2007N on gphost03.bcgsc.ca at Sat Dec 26 15:23:45 PST 2020
Traceback (most recent call last):
File "/projects/rmorin/projects/gambl-repos/gambl-rmorin/src/lcr-scripts/cnv2igv/1.3/cnv2igv.py", line 320, in <module>
main()
File "/projects/rmorin/projects/gambl-repos/gambl-rmorin/src/lcr-scripts/cnv2igv/1.3/cnv2igv.py", line 317, in main
print(seg.to_igv(prepend))
File "/projects/rmorin/projects/gambl-repos/gambl-rmorin/src/lcr-scripts/cnv2igv/1.3/cnv2igv.py", line 251, in to_igv
return('\t'.join([self.sample, self.chrm, self.start, \
TypeError: sequence item 4: expected str instance, int found
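The traceback says that str.join() received an int at position 4 of the list. A minimal sketch of one way to avoid this, assuming it is acceptable to convert every field to a string at output time. The helper join_fields and the example field values are hypothetical; the real field list in to_igv is not shown in full above.

```python
def join_fields(fields, sep="\t"):
    """Join mixed str/int/float fields into one delimited line."""
    # str.join() only accepts strings, so convert each item first.
    return sep.join(str(field) for field in fields)


# Hypothetical field values; item 4 is an int, as in the traceback.
fields = ["FL2007T2", "1", "1000", "2000", 1, "0.58"]

try:
    "\t".join(fields)
except TypeError as err:
    print("plain join fails:", err)

print(join_fields(fields))
```

An alternative is to convert the offending attribute to a string where it is assigned, which keeps to_igv unchanged.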
I get this error with the get_bams.R:
Setting up session...
Retrieving list of merged BAM files...
Warning message:
The following library IDs do not exist:
Warning messages:
1: `tz` argument is ignored by `as_date()`
2: `tz` argument is ignored by `as_date()`
Warning message:
Unknown library strategy encountered: EXC-Seq, NA
Warning message:
These libraries do not have merges: B22792, B22861, B22793, B22862, B22784, B22853, B22795, B22864, B22798, B22867, B22803, B22872, B59260, B59255, B59261, B59238, B59235, B59236, B59251, B59237, B59243, B59256, B59244, B59245, B59240, B59259, B59252, B59254, B59241, B59249, B59247, B59248, B59253, B59264, B59246, B59263, B59262, B59267, B59268, B59269, B59270, B59239, B58144, B58145, B59266, B59265, B59258, B58146, B58129, B58130, B58131, B58132, B58133, B58134, B58135, B58136, B58137, B58138, B58139, B58140, B58141, B58142, B58143, B59242, B59250, B59257, B59267, B59268, B59269, B59270
Retrieving RNA-seq aligned_libcore IDs...
Retrieving aligned_libcore Bio QC status and comments...
Error: Problem with `summarise()` input `lc.comments_warning`.
x 'x' must be atomic
i Input `lc.comments_warning` is `(structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
i The error occured in group 1: bam_id = 8.
Backtrace:
x
1. \-`%>%`(...)
2. +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
4. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
5. \-`_fseq`(`_lhs`)
6. \-magrittr::freduce(value, `_function_list`)
7. +-base::withVisible(function_list[[k]](value))
8. \-function_list[[k]](value)
9. \-dplyr::summarise_at(...)
10. +-dplyr::summarise(.tbl, !!!funs)
11. \-dplyr:::summarise.grouped_df(.tbl, !!!funs)
12. \-dplyr:::summarise_cols(.da
In addition: Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Execution halted
I found out that the bug comes from the command that splits comments by status and collapses them using libcore IDs (~line 364). More specifically, the error is raised by summarise_at(), but the warning messages stem from pivot_wider().
To fix:
Group by status, then add a dummy row-number variable within each "status" group before using pivot_wider(), so that each value is uniquely identified.
New code:
bio_qc_comments <-
  bio_qc_info %>%
  dplyr::filter(!is.na(bio_qc_comments), bio_qc_comments != "") %>%
  separate_rows(bio_qc_comments, sep = "; ?(?=Failed|Warning|Manual)") %>%
  mutate(bio_qc_comments = ifelse(grepl(":", bio_qc_comments),
                                  bio_qc_comments,
                                  paste0("Other:", bio_qc_comments))) %>%
  separate(bio_qc_comments, c("status", "comment"),
           sep = ":", extra = "merge") %>%
  mutate(status = sub(" ", "_", status),
         status = tolower(status),
         status = paste0("lc.comments_", status),
         comment = sub('"', "", comment),
         comment = trimws(comment),
         comment = paste0(aln_libcore_id, "={", comment, "}")) %>%
  group_by(status) %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = status, values_from = comment) %>%
  dplyr::select(-row) %>%
  group_by(bam_id) %>%
  summarise_at(vars(contains("comments_")),
               ~ paste(sort(na.omit(unique(.))), collapse = "|"))
The cat() call that writes the outputs to stdout does not add a newline at the end. When stdout is piped into a log file, for example, this output is written entirely on one line, making it hard to read.
Solution:
Change the printing function to one that automatically adds a newline at the end, or add a newline character to the string in the current cat().