Mutect2 edge case error about lcr-modules (closed)

emouts commented on July 26, 2024
Mutect2 edge case error

from lcr-modules.

Comments (2)

lkhilton commented on July 26, 2024

Hi Ilias,

If I understand correctly, after running this one sample through Strelka and LoFreq, it had no variants called? Have you performed any QC on this sample to see what might be causing this?
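A quick first check is to count the variant records in each caller's output for the sample. The sketch below assumes the Strelka and LoFreq outputs are ordinary VCF files, either plain text or gzip/bgzip-compressed (bgzip output is readable with Python's `gzip` module); `count_vcf_records` is a hypothetical helper, not part of lcr-modules.

```python
import gzip
from pathlib import Path


def count_vcf_records(path):
    """Return the number of data (non-header) lines in a VCF file.

    Handles plain-text and gzip/bgzip-compressed files, detected by a
    ".gz" suffix. Header lines start with "#"; blank lines are ignored.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    n_records = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                n_records += 1
    return n_records


if __name__ == "__main__":
    import sys

    # Usage: python count_vcf_records.py strelka.vcf.gz lofreq.vcf.gz
    for vcf in sys.argv[1:]:
        print(f"{vcf}\t{count_vcf_records(vcf)}")
```

A count of zero for both callers would confirm that the sample produced no calls at all, rather than, say, a truncated file.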

A workaround is to run MuTect2 separately from the SLMS-3 pipeline without specifying the candidate VCF file. However, if Strelka and LoFreq returned empty VCF files, it's unlikely that MuTect2 will find anything either. If this one sample is low quality, it should probably be excluded from your analyses anyway.

If you really want to try to fix this, you could use Python error handling to catch the empty file (example) in the checkpoint, and only use it to build the list of chromosomes if it isn't empty. Something like:

checkpoint _mutect2_input_chrs:
    input:
        candidate_positions = CFG["inputs"]["candidate_positions"] if CFG["inputs"]["candidate_positions"] else str(rules._mutect2_dummy_positions.output),
        chrs = reference_files("genomes/{genome_build}/genome_fasta/main_chromosomes.txt"),
        capture_arg = _mutect_get_capspace
    output:
        chrs = CFG["dirs"]["inputs"] + "chroms/{seq_type}--{genome_build}/{tumour_id}--{normal_id}--{pair_status}/mutated_chromosomes.txt"
    run:
        import pandas as pd
        # obtain list of main chromosomes
        main_chrs = pd.read_csv(input.chrs, comment='#', sep='\t', header=None)
        main_chrs = main_chrs.iloc[:, 0].astype(str).unique().tolist()
        # obtain list of chromosomes in candidate positions
        try:
            candidate_chrs = pd.read_csv(input.candidate_positions, comment='#', sep='\t')
            candidate_chrs = candidate_chrs.iloc[:, 0].astype(str).unique().tolist()
        # if the candidate positions file has no data rows, pandas raises
        # EmptyDataError; fall back to an empty list instead of failing
        except pd.errors.EmptyDataError:
            print(f'Note: {input.candidate_positions} was empty. Skipping.')
            candidate_chrs = []
        # obtain list of chromosomes in the capture space
        interval_chrs = pd.read_csv(input.capture_arg, comment='@', sep='\t')
        interval_chrs = interval_chrs.iloc[:, 0].astype(str).unique().tolist()
        # intersect the lists to obtain chromosomes present in all of them;
        # only filter on candidate chromosomes if there are any
        intersect_chrs = set(main_chrs) & set(interval_chrs)
        if candidate_chrs:
            intersect_chrs &= set(candidate_chrs)
        # convert to a sorted single-column df (sorting the list first avoids
        # a KeyError from sort_values(0) on an empty DataFrame)
        intersect_chrs = pd.DataFrame(sorted(intersect_chrs))
        # write out the file with mutated chromosomes
        intersect_chrs.to_csv(output.chrs, index=False, header=False)
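The chromosome-selection logic can be tested outside Snakemake by pulling it into plain functions. This is a minimal sketch, assuming headerless tab-separated inputs with the chromosome in the first column; `read_first_column` and `select_mutated_chromosomes` are hypothetical helper names, not functions from lcr-modules. Following the suggestion to only use the candidate file when it isn't empty, an empty candidate list skips that filter rather than emptying the result.

```python
import pandas as pd


def read_first_column(path, comment="#"):
    """Return the unique values of the first column of a headerless TSV.

    Returns an empty list if the file has no data rows (pandas raises
    EmptyDataError when nothing is left to parse).
    """
    try:
        table = pd.read_csv(path, comment=comment, sep="\t", header=None)
    except pd.errors.EmptyDataError:
        return []
    return table.iloc[:, 0].astype(str).unique().tolist()


def select_mutated_chromosomes(main_chrs, candidate_chrs, interval_chrs):
    """Intersect chromosome lists, skipping the candidate filter if empty."""
    selected = set(main_chrs) & set(interval_chrs)
    if candidate_chrs:
        selected &= set(candidate_chrs)
    # Plain string sort, like sort_values() on a column of strings.
    return sorted(selected)
```

Because the names are compared as strings, the output is sorted lexicographically (for example "10" before "2"), which matches the ordering of the original checkpoint.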

I probably won't implement this as a permanent solution in the pipeline, however, because it would mean that some samples are handled differently in SLMS-3, and in reality any sample with zero variants returned from Strelka and LoFreq probably needs to be dropped from the analysis.
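If such samples are to be dropped, doing it explicitly before the run, with a visible message, avoids both a silent workaround and manual intervention mid-run. A minimal sketch, assuming per-pair variant counts have already been collected (for example, by counting records in the Strelka and LoFreq VCFs); the pair-ID format and the function name `drop_pairs_without_variants` are hypothetical.

```python
import warnings


def drop_pairs_without_variants(variant_counts):
    """Split tumour--normal pairs into those to keep and those to drop.

    `variant_counts` maps a pair ID to the number of variants called for
    that pair. Pairs with no variants are dropped, and a warning lists
    them so the exclusion is never silent.
    """
    keep, drop = [], []
    for pair, n_variants in sorted(variant_counts.items()):
        (keep if n_variants > 0 else drop).append(pair)
    if drop:
        warnings.warn(
            "Dropping pairs with no variants from Strelka/LoFreq: " + ", ".join(drop)
        )
    return keep, drop


counts = {"T1--N1": 152, "T2--N2": 0, "T3--N3": 87}
keep, drop = drop_pairs_without_variants(counts)
print(keep)  # ['T1--N1', 'T3--N3']
print(drop)  # ['T2--N2']
```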

Hope this helps!
Laura

emouts commented on July 26, 2024

Hi Laura,

Thank you very much for your response, I appreciate it! You are absolutely right that this will only happen with a problematic sample and should be quite rare. My thinking was to avoid manual intervention during a run, but adding workarounds that fail silently can be dangerous too, so keeping it as is makes a lot of sense!

Thanks again,
Ilias