lcr-bccrc / lcr-scripts

Collection of curated scripts from the Morin and LCR labs

Python 56.87% R 35.86% Shell 7.27%

lcr-scripts's Issues

environment for salmon2counts fails to build

There is a conflict in the environment for salmon2counts that makes the build take an extremely long time. I gave up after about an hour. Could someone else test it to see whether this needs fixing or is somehow unique to my setup?

(base) -bash-4.2$ conda env create -f src/lcr-scripts/salmon2counts/1.0/salmon2counts.yaml
Collecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
Examining conflict for python pip wheel setuptools harfbuzz r-base cairo certifi python_abi pango glib:  55%|█████████████████████████████████▋                           | 123/223 [33:31<05:24,  3.24s/\ ]
failed
Solving environment: \
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
Examining conflict for r-dbplyr r-rvest r-httr r-htmltools r-reshape2 r-futile.logger r-readr r-vctrs bioconductor-biocparallel r-ggplot2 r-isoband r-pkgbuild r-rsqlite r-progress r-forcats r-knitr r-d/failed

LOH flags about battenberg file

I have a question about the get_loh_flag function of BattenbergParser in cnv2igv.py (version 1.4). I am quite confused about the LOH type "neutral" and its condition (if int(nMaj1_A) == 2 and int(nMin1_A) == 1: loh_flag = '1'). I thought that in the neutral situation nMin1_A should be 0. Could you please explain how this flag is defined here?
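For reference, the convention I had in mind can be sketched as below. This is only an illustration of the usual allele-specific copy-number definitions, not the logic in cnv2igv.py; the function name classify_loh and its labels are hypothetical.

```python
def classify_loh(n_major, n_minor):
    """Classify a segment by conventional allele-specific copy-number rules.

    Hypothetical helper for illustration only; not taken from cnv2igv.py.
    """
    n_major, n_minor = int(n_major), int(n_minor)
    if n_major == 0 and n_minor == 0:
        return "homozygous_deletion"
    if n_minor != 0:
        return "no_loh"  # both alleles are still present
    total = n_major + n_minor
    if total == 2:
        return "copy_neutral_loh"  # two copies, all from one allele
    if total < 2:
        return "deletion_loh"
    return "amplified_loh"
```

Under this convention a segment with nMaj1_A = 2 and nMin1_A = 1 would be a gain that retains both alleles, while nMaj1_A = 2 and nMin1_A = 0 would be copy-neutral LOH.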

Allow empty columns in smr script

When the goal is to run one of the level 3 tools on all available samples, allow the subsetting column to be NA so that every possible value of that column is used (e.g. no need to specify the full list of cohorts, genome builds, etc. when you want to use all of the samples for that pathology).
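A minimal sketch of the requested behaviour, written in Python for illustration. The function name subset_samples, the column names, and the sample records are hypothetical; None stands in for NA.

```python
def subset_samples(samples, criteria):
    """Keep samples that match every criterion.

    A criterion whose value is None (the NA case) places no restriction
    on that column, so all of its values are accepted.
    """
    selected = []
    for sample in samples:
        if all(allowed is None or sample.get(column) in allowed
               for column, allowed in criteria.items()):
            selected.append(sample)
    return selected


# Hypothetical sample table.
samples = [
    {"sample_id": "S1", "pathology": "FL", "cohort": "cohort_a", "genome_build": "grch37"},
    {"sample_id": "S2", "pathology": "FL", "cohort": "cohort_b", "genome_build": "hg38"},
    {"sample_id": "S3", "pathology": "DLBCL", "cohort": "cohort_a", "genome_build": "grch37"},
]

# Cohort and genome build are left as None, so every FL sample is selected.
all_fl = subset_samples(
    samples, {"pathology": {"FL"}, "cohort": None, "genome_build": None}
)
```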

vcf filtering

We should modify the vcf filtering script (implemented for Strelka's format) to allow the user to filter based on some more criteria, including total depth and non-reference read count.
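A rough sketch of what the extra criteria could look like for Strelka somatic SNV records. It assumes the standard Strelka SNV FORMAT fields (DP plus AU, CU, GU, TU with tier-1 and tier-2 counts) and takes the tier-1 count of the ALT allele as the non-reference read count. The function names and default thresholds are hypothetical and do not come from the existing script.

```python
def strelka_snv_counts(alt, format_field, sample_field):
    """Return (total depth, tier-1 ALT read count) for one Strelka SNV sample column."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    depth = int(values["DP"])
    # Per-base counts are stored as "tier1,tier2" under AU, CU, GU and TU.
    alt_count = int(values[alt + "U"].split(",")[0])
    return depth, alt_count


def passes_filters(alt, format_field, sample_field, min_depth=10, min_alt_count=3):
    """Apply the depth and non-reference read count thresholds (hypothetical defaults)."""
    depth, alt_count = strelka_snv_counts(alt, format_field, sample_field)
    return depth >= min_depth and alt_count >= min_alt_count


# Made-up tumour column for a C>T variant.
FORMAT = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
TUMOUR = "50:0:0:0:0,0:40,41:0,0:10,10"
keep = passes_filters("T", FORMAT, TUMOUR)
```

Indel records would need separate handling, since they do not carry the per-base count fields used here.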

The fill_segments script should support filling battenberg subclones files

The fill_segments script, which is implemented with bedtools, should also support filling the subclones outputs of Battenberg. One way to implement this is to add a new parameter, mode, that accepts either SEG or subclones. These are the columns in the standard subclones file:

chr   startpos    endpos BAF   pval  LogR  ntot  nMaj1_A nMin1_A frac1_A nMaj2_A nMin2_A frac2_A SDfrac_A    SDfrac_A_BS   frac1_A_0.025  frac1_A_0.975  nMaj1_B nMin1_B frac1_B nMaj2_B  nMin2_B frac2_B SDfrac_B    SDfrac_B_BS   frac1_B_0.025  frac1_B_0.975  nMaj1_C nMin1_C frac1_C nMaj2_C nMin2_C frac2_C SDfrac_C    SDfrac_C_BS   frac1_C_0.025  frac1_C_0.975  nMaj1_D  nMin1_D frac1_D nMaj2_D nMin2_D frac2_D SDfrac_D    SDfrac_D_BS   frac1_D_0.025  frac1_D_0.975  nMaj1_E nMin1_E frac1_E nMaj2_E nMin2_E frac2_E SDfrac_E    SDfrac_E_BS   frac1_E_0.025   frac1_E_0.975  nMaj1_F nMin1_F frac1_F nMaj2_F nMin2_F frac2_F SDfrac_F    SDfrac_F_BS   frac1_F_0.025  frac1_F_0.975

After the bed-defined columns, the rest should be filled as follows:

# assign values to be used to fill normal CN segments
empty_baf = float(0.5)
empty_pval = int(1)
empty_logr = int(0)
empty_ntot = float(2.0)
empty_nMaj1_A = int(1)
empty_nMin1_A = int(1)
empty_frac1_A = int(1)
empty_nMaj2_A = int(1)
empty_nMin2_A = int(1)
empty_frac2_A = int(1)

All remaining columns must be filled with NA string to maintain the formatting of the original files.
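As a sketch of how a filler row could be assembled in subclones mode: the header is rebuilt from the column list above, the fill values are the ones proposed in this issue, and every other column becomes NA. The names subclones_header, SUBCLONES_FILL and fill_subclones_row are hypothetical.

```python
def subclones_header():
    """Rebuild the standard subclones header (7 leading columns + 10 per solution A-F)."""
    header = ["chr", "startpos", "endpos", "BAF", "pval", "LogR", "ntot"]
    for s in "ABCDEF":
        header += [
            f"nMaj1_{s}", f"nMin1_{s}", f"frac1_{s}",
            f"nMaj2_{s}", f"nMin2_{s}", f"frac2_{s}",
            f"SDfrac_{s}", f"SDfrac_{s}_BS",
            f"frac1_{s}_0.025", f"frac1_{s}_0.975",
        ]
    return header


# Values used to fill normal copy-number segments, as listed in this issue.
SUBCLONES_FILL = {
    "BAF": 0.5, "pval": 1, "LogR": 0, "ntot": 2.0,
    "nMaj1_A": 1, "nMin1_A": 1, "frac1_A": 1,
    "nMaj2_A": 1, "nMin2_A": 1, "frac2_A": 1,
}


def fill_subclones_row(chrom, start, end):
    """Return one tab-separated filler row for a gap between segments."""
    bed = {"chr": chrom, "startpos": start, "endpos": end}
    row = []
    for column in subclones_header():
        if column in bed:
            row.append(str(bed[column]))
        else:
            # Any column without a defined fill value is written as NA.
            row.append(str(SUBCLONES_FILL.get(column, "NA")))
    return "\t".join(row)
```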

fishHook smg input only made for GRCh37

As with other tools, a check will be added to ensure fishHook is only run on the GRCh37 (hg19) genome build. It can be modelled on this HotMAPS check:

cat("Requested mode is HotMAPS, but the supplied file is in the hg38-based coordinates.\n")
cat("Unfortunately, HotMAPS is configured to only work for grch37-based maf files.\n")
stop("Please supply the mutation data in grch37-based version.")

Possible bug in cnv2igv v1.3

I get an error running the latest version on a Battenberg output:

python src/lcr-scripts/cnv2igv/1.3/cnv2igv.py --mode battenberg --sample FL2007T2 results/gambl/battenberg-1.1/02-battenberg/genome--grch37/FL2007T2--FL2007N/FL2007T2_subclones.txt

running _battenberg_to_igv_seg for FL2007T2--FL2007N on gphost03.bcgsc.ca at Sat Dec 26 15:23:45 PST 2020
Traceback (most recent call last):
  File "/projects/rmorin/projects/gambl-repos/gambl-rmorin/src/lcr-scripts/cnv2igv/1.3/cnv2igv.py", line 320, in <module>
    main()
  File "/projects/rmorin/projects/gambl-repos/gambl-rmorin/src/lcr-scripts/cnv2igv/1.3/cnv2igv.py", line 317, in main
    print(seg.to_igv(prepend))
  File "/projects/rmorin/projects/gambl-repos/gambl-rmorin/src/lcr-scripts/cnv2igv/1.3/cnv2igv.py", line 251, in to_igv
    return('\t'.join([self.sample, self.chrm, self.start, \
TypeError: sequence item 4: expected str instance, int found
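The TypeError arises because str.join() only accepts strings, and the fifth item of the list (index 4) is an int. A minimal sketch of one possible fix is to convert every field to str before joining. The Segment class and its attributes after start are hypothetical stand-ins, since the full field list is not shown in the traceback.

```python
class Segment:
    """Hypothetical stand-in for the segment class in cnv2igv.py."""

    def __init__(self, sample, chrm, start, end, n_probes, log_ratio):
        self.sample = sample
        self.chrm = chrm
        self.start = start
        self.end = end
        self.n_probes = n_probes    # an int here would break a plain join
        self.log_ratio = log_ratio

    def to_igv(self):
        fields = [self.sample, self.chrm, self.start,
                  self.end, self.n_probes, self.log_ratio]
        # Convert each field to str so join() accepts ints and floats as well.
        return "\t".join(str(field) for field in fields)


line = Segment("FL2007T2", "1", "100", "200", 0, 0.5).to_igv()
```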

get_bams.R error

I get this error with get_bams.R:

Setting up session...
Retrieving list of merged BAM files...
Warning message:
The following library IDs do not exist:
Warning messages:
1: `tz` argument is ignored by `as_date()`
2: `tz` argument is ignored by `as_date()`
Warning message:
Unknown library strategy encountered: EXC-Seq, NA
Warning message:
These libraries do not have merges: B22792, B22861, B22793, B22862, B22784, B22853, B22795, B22864, B22798, B22867, B22803, B22872, B59260, B59255, B59261, B59238, B59235, B59236, B59251, B59237, B59243, B59256, B59244, B59245, B59240, B59259, B59252, B59254, B59241, B59249, B59247, B59248, B59253, B59264, B59246, B59263, B59262, B59267, B59268, B59269, B59270, B59239, B58144, B58145, B59266, B59265, B59258, B58146, B58129, B58130, B58131, B58132, B58133, B58134, B58135, B58136, B58137, B58138, B58139, B58140, B58141, B58142, B58143, B59242, B59250, B59257, B59267, B59268, B59269, B59270
Retrieving RNA-seq aligned_libcore IDs...
Retrieving aligned_libcore Bio QC status and comments...
Error: Problem with `summarise()` input `lc.comments_warning`.
x 'x' must be atomic
i Input `lc.comments_warning` is `(structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
i The error occured in group 1: bam_id = 8.
Backtrace:
     x
  1. \-`%>%`(...)
  2.   +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3.   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4.     \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5.       \-`_fseq`(`_lhs`)
  6.         \-magrittr::freduce(value, `_function_list`)
  7.           +-base::withVisible(function_list[[k]](value))
  8.           \-function_list[[k]](value)
  9.             \-dplyr::summarise_at(...)
 10.               +-dplyr::summarise(.tbl, !!!funs)
 11.               \-dplyr:::summarise.grouped_df(.tbl, !!!funs)
 12.                 \-dplyr:::summarise_cols(.da
In addition: Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Execution halted

I found out that the bug comes from the command to split comments by status and collapse using libcore IDs (~line 364). More specifically, the error comes out at summarise_at(), but the warning messages stem from pivot_wider().

To fix:
Group by status, then add a row-number dummy variable within each status group before calling pivot_wider(), so that each value is uniquely identified.

New code:

bio_qc_comments <-
  bio_qc_info %>%
  dplyr::filter(!is.na(bio_qc_comments), bio_qc_comments != "") %>%
  separate_rows(bio_qc_comments, sep = "; ?(?=Failed|Warning|Manual)") %>%
  mutate(bio_qc_comments = ifelse(grepl(":", bio_qc_comments),
                                  bio_qc_comments,
                                  paste0("Other:", bio_qc_comments))) %>%
  separate(bio_qc_comments, c("status", "comment"),
           sep = ":", extra = "merge") %>%
  mutate(status = sub(" ", "_", status),
         status = tolower(status),
         status = paste0("lc.comments_", status),
         comment = sub('"', "", comment),
         comment = trimws(comment),
         comment = paste0(aln_libcore_id, "={", comment, "}")) %>% 
  group_by(status) %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = status, values_from = comment) %>% 
  dplyr::select(-row) %>% 
  group_by(bam_id) %>% 
  summarise_at(vars(contains("comments_")),
               ~ paste(sort(na.omit(unique(.))), collapse = "|"))

generate_smr_inputs stdout needs line breaks

The cat() call that writes the outputs to stdout does not add a newline at the end. When stdout is piped into a log file, for example, this output is written entirely on one line, which makes it hard to read.

Solution:
Change the printing function to one that automatically appends a newline, or add a newline character to the string in the current cat() call.
