
bonsai-prp's Introduction

Pipeline result processor (prp)

A collection of parsers and data models for creating and validating a standardized output from the jasen pipeline, which is then used as input for bonsai.

Warning

Bonsai-PRP is under development in an alpha stage. Expect uneven documentation, breaking changes, and bugs until the official 1.0 release.

Dependencies (latest)

  • biopython
  • pydantic=2.5.3
  • python=3.10

Using prp

Use the --help argument for information on prp's methods:

prp --help

Use --help after a method name for information on the inputs of each of prp's methods (create-bonsai-input, create-cdm-input, create-qc-result, print-schema, validate):

prp <method> --help

Create bonsai input from pipeline data

prp create-bonsai-input -i SAMPLE_ID -u RUN_METADATA_FILE -q QUAST_FILENAME -d PROCESS_METADATA_FILE -k KRAKEN_FILE -a AMRFINDER_FILE -m MLST_FILE -c CGMLST_FILE -v VIRULENCEFINDER_FILE -r RESFINDER_FILE -p POSTALIGNQC_FILE -k MYKROBE_FILE -t TBPROFILER_FILE [--correct_alleles] -o OUTPUT_FILE [-h]

Create CDM input from pipeline data

prp create-cdm-input -q QUAST_FILENAME -c CGMLST_FILE -p POSTALIGNQC_FILE [--correct_alleles] -o OUTPUT_FILE [-h]
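When prp is driven from a Python workflow script rather than a shell, the create-cdm-input argument list can be assembled programmatically. The sketch below uses only the flags from the usage line above; build_cdm_command, run_cdm_input and the file names are hypothetical, and it assumes the prp executable is on PATH.

```python
import subprocess
from pathlib import Path


def build_cdm_command(
    quast: Path,
    cgmlst: Path,
    postalignqc: Path,
    output: Path,
    correct_alleles: bool = False,
) -> list[str]:
    """Assemble the argv for `prp create-cdm-input` (hypothetical helper)."""
    cmd = [
        "prp", "create-cdm-input",
        "-q", str(quast),
        "-c", str(cgmlst),
        "-p", str(postalignqc),
    ]
    if correct_alleles:
        cmd.append("--correct_alleles")  # optional flag from the usage line
    cmd += ["-o", str(output)]
    return cmd


def run_cdm_input(cmd: list[str]) -> None:
    """Run prp; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

Passing a list instead of a single string (and not using shell=True) avoids quoting problems with paths that contain spaces.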

Create QC result from bam file

prp create-qc-result -i SAMPLE_ID -b BAM_FILE [-e BED_FILE] [-a BAITS_FILE] -r REFERENCE_FILE [-c CPUS] -o OUTPUT_FILE [-h]

bonsai-prp's People

Contributors

mhkc, ryanjameskennedy, svartapaerlan


bonsai-prp's Issues

TypeError re coverage_uniformity

The following error is raised:

  Traceback (most recent call last):
    File "/usr/local/bin/prp", line 8, in <module>
      sys.exit(cli())
               ^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/prp/cli.py", line 376, in create_cdm_input
      res: QcMethodIndex = parse_postalignqc_results(quality)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/prp/parse/qc.py", line 278, in parse_postalignqc_results
      coverage_uniformity=float(qc_dict["coverage_uniformity"]),
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  TypeError: float() argument must be a string or a real number, not 'NoneType'
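The traceback shows parse_postalignqc_results passing qc_dict["coverage_uniformity"] straight to float(), which fails when the post-alignment QC step reports no value. One possible guard is sketched below; parse_optional_float is a hypothetical helper, and the sketch assumes the corresponding model field is allowed to be None.

```python
from typing import Any, Optional


def parse_optional_float(value: Any) -> Optional[float]:
    """Convert a QC metric to float, mapping missing values (None or "") to None."""
    if value is None or value == "":
        return None
    return float(value)
```

In the parser this would read, for example, coverage_uniformity=parse_optional_float(qc_dict.get("coverage_uniformity")).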

Pysam error when trying to read the bam file

The following error occurs when opening the BAM file:

  [W::hts_idx_load3] The index file is older than the data file: /fs1/results_dev/jasen/mtuberculosis/bam/11TB0101651.bam.bai
  [E::bgzf_read_block] Invalid BGZF header at offset 564
  [E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
  Traceback (most recent call last):
    File "/usr/local/bin/prp", line 8, in <module>
      sys.exit(cli())
               ^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 101, in augment_usage_errors
      yield
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/prp/cli.py", line 301, in create_bonsai_input
      bam_ref_genome = get_reference_seq_accnr(bam)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/prp/parse/mapping.py", line 17, in get_reference_seq_accnr
      read = next(samfile.fetch())
             ^^^^^^^^^^^^^^^^^^^^^
  StopIteration
  OSError: [Errno 0] Success
  Exception ignored in: 'pysam.libcalignmentfile.AlignmentFile.__dealloc__'
  OSError: [Errno 0] Success
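The StopIteration comes from calling next() on samfile.fetch() when no read can be fetched. The warnings about a stale index and an invalid BGZF header suggest that the BAM file or its index is out of date or damaged, which the parser itself cannot repair. A more defensive lookup is sketched below. It assumes the reference accession can be taken from the first @SQ entry in the BAM header (pysam's AlignmentFile.references) rather than from the first read. That holds for single-reference mappings but is an assumption about prp's intent, and first_reference_name is a hypothetical helper.

```python
from typing import Sequence


def first_reference_name(references: Sequence[str], source: str = "BAM file") -> str:
    """Return the first reference name, or raise a readable error if there is none."""
    if not references:
        raise ValueError(f"{source}: header lists no reference sequences")
    return references[0]


def get_reference_seq_accnr(bam_path: str) -> str:
    """Read the reference accession from the BAM header instead of the first read."""
    import pysam  # imported here so first_reference_name works without pysam installed

    with pysam.AlignmentFile(bam_path, "rb") as samfile:
        return first_reference_name(samfile.references, bam_path)
```

Reading the header does not need the index, so a stale .bai file does not affect this lookup. A damaged BAM may still raise an error, in which case the file and its index need to be regenerated.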

Args for CDM QC output JSON file path and analysis results

CDM needs a QC JSON output, as defined in this issue, written to the path given by --qc-results, with the usual analysis results written to --analysis-results. In short, we want prp to produce a combined QC output that looks like this:

[
    {
        "software": "quast",
        "version": null,
        "result": {
            "total_length": 2761929,
            "reference_length": 2809422,
            "largest_contig": 188911,
            "n_contigs": 84,
            "n50": 107938,
            "assembly_gc": 32.64,
            "reference_gc": 32.82,
            "duplication_ratio": 1.004
        }
    },
    {
        "software": "postalignqc",
        "version": null,
        "result": {
            "ins_size": 270,
            "ins_size_dev": 259,
            "mean_cov": 60,
            "pct_above_x": {
                "30": 87.4161621758634,
                "1000": 0.0,
                "10": 99.2404637975042,
                "500": 0.00347789180809111,
                "250": 0.0127915681755215,
                "100": 6.22849159705971,
                "1": 99.9462400452715
            },
            "mapped_reads": 1007929,
            "tot_reads": 1140921,
            "iqr_median": 0.616666666666667,
            "dup_pct": 0.0,
            "dup_reads": 0
        }
    },
    {
        "software": "chewbbaca",
        "version": null,
        "result": {
            "n_missing": 192
        }
    }
]
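prp's own data models are built with pydantic, so a real implementation would most likely live there. The standard-library sketch below only illustrates the target shape: a list of software/version/result entries serialized as a single JSON document. QcMethodResult and combine_qc_results are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class QcMethodResult:
    """One entry in the combined QC list (hypothetical model)."""

    software: str
    version: Optional[str]  # null in the example above
    result: dict[str, Any]


def combine_qc_results(results: list[QcMethodResult]) -> str:
    """Serialize all QC entries, keeping field order software, version, result."""
    return json.dumps([asdict(r) for r in results], indent=4)
```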

The version could also be bumped (not sure whether this is a major or minor change).

tb-profiler: assigns wrong variant type

The TBProfiler parser seems to assign only the substitution variant type.

For instance, the test file contains a conservative_inframe_deletion in rpoB that is reported as a substitution.
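One way to avoid defaulting to substitution is to derive the type from the annotation term first and fall back on the allele lengths. The sketch assumes each TBProfiler variant exposes a sequence-ontology-style type string (such as conservative_inframe_deletion) plus ref and alt alleles; infer_variant_type and VariantType are hypothetical names, with lowercase values mirroring the variant_type strings in prp's output.

```python
from enum import Enum


class VariantType(str, Enum):
    SUBSTITUTION = "substitution"
    INSERTION = "insertion"
    DELETION = "deletion"


def infer_variant_type(so_term: str, ref: str, alt: str) -> VariantType:
    """Classify a variant from its annotation term, falling back on allele lengths."""
    term = so_term.lower()
    if "deletion" in term:
        return VariantType.DELETION
    if "insertion" in term:
        return VariantType.INSERTION
    # Terms such as frameshift_variant do not say which way the frame shifted,
    # so compare allele lengths instead.
    if len(ref) > len(alt):
        return VariantType.DELETION
    if len(ref) < len(alt):
        return VariantType.INSERTION
    return VariantType.SUBSTITUTION
```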

Docker upload GA workflow error

The following error is being returned:

RUN pip install bonsai-prp &&     pip install biopython:
2.406 ERROR: Ignored the following versions that require a different python version: 0.3.0 Requires-Python >=3.10; 0.3.1 Requires-Python >=3.10
2.406 ERROR: Could not find a version that satisfies the requirement bonsai-prp (from versions: none)
2.406 ERROR: No matching distribution found for bonsai-prp
2.504 
Notice: 2.504 [notice] A new release of pip is available: 23.0.1 -> 23.3.2
Notice: 2.504 [notice] To update, run: pip install --upgrade pip

tb-profiler: include qc_fail_variants

Include the failed variants from the tb-profiler JSON. This enables us to look specifically at minority variants.

Also add a value such as pipeline-qc: fail to tell bonsai that the variant could be unreliable. Bonsai will need an accompanying function to manually flag a qc_fail_variant as OK if manual inspection judges it to be a valid variant.
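A sketch of how the parser could merge passing and failed variants while tagging each with a QC status. It assumes the TBProfiler JSON keeps variants under dr_variants, other_variants and qc_fail_variants; the function name and the pipeline_qc field are hypothetical, standing in for the "pipeline-qc : fail or similar" value described above.

```python
from typing import Any


def collect_variants(tbprofiler_result: dict[str, Any]) -> list[dict[str, Any]]:
    """Merge variant lists, tagging each variant with a pipeline QC status."""
    sources = (
        ("dr_variants", "pass"),
        ("other_variants", "pass"),
        ("qc_fail_variants", "fail"),  # minority / low-quality calls
    )
    variants: list[dict[str, Any]] = []
    for key, status in sources:
        for variant in tbprofiler_result.get(key, []):
            # Copy so the original TBProfiler dict is left unchanged.
            variants.append({**variant, "pipeline_qc": status})
    return variants
```

Keeping the status as a field instead of dropping failed variants lets bonsai later overwrite it after manual review.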

Altered TBProfiler database db_version doesn't contain "commit" key

The following error occurred when running TBProfiler with the altered database:

Traceback (most recent call last):
    File "/usr/local/bin/prp", line 8, in <module>
      sys.exit(cli())
               ^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/bonsai-prp/prp/cli.py", line 232, in create_bonsai_input
      version=pred_res["db_version"]["commit"],
              ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  KeyError: 'commit'
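The crash happens because the altered database's db_version has no commit key. A tolerant lookup could use dict.get, as sketched below; get_db_commit is a hypothetical helper, and the version field it feeds would then have to accept None (or a placeholder string).

```python
from typing import Any, Optional


def get_db_commit(pred_res: dict[str, Any]) -> Optional[str]:
    """Return db_version["commit"] if present, otherwise None."""
    db_version = pred_res.get("db_version") or {}
    return db_version.get("commit")
```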

Resfinder variant parser should populate gene symbol and accession nr

PRP currently outputs:

{
  "ref_database": "PointFinder-4.0.0",
  "ref_id": "gyrA;;1;;CP073768.1_83_l",
  "variant_type": "substitution",
  "genes": [],
  "position": 83,
  "ref_nt": "tcg",
  "alt_nt": "ttg",
  "depth": 187.98,
  "contig_id": null,
  "gene_symbol": null,
  "sequence_name": null,
  "ass_start_pos": null,
  "ass_end_pos": null,
  "strand": null,
  "element_type": null,
  "element_subtype": null,
  "target_length": null,
  "res_class": null,
  "res_subclass": null,
  "method": null,
  "close_seq_name": null,
  "type": null,
  "change": null,
  "nucleotide_change": null,
  "protein_change": null,
  "annotation": null,
  "drugs": null,
  "phenotypes": [
    "ciprofloxacin",
    "nalidixic acid"
  ]
}

The following fields could be populated from this information:

  • gene_symbol
  • sequence_name
  • nucleotide_change
  • protein_change
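The ref_id above appears to encode the gene symbol, the reference accession, the codon position and the alternative amino acid (gyrA, CP073768.1, 83 and l, consistent with tcg to ttg, Ser to Leu). Assuming that layout holds for all PointFinder hits, which has not been checked against the ResFinder documentation, gene_symbol, sequence_name and nucleotide_change could be derived as sketched below. All names are hypothetical. protein_change would also need the reference codon translated (for example with Biopython's Seq.translate) and is left out.

```python
from dataclasses import dataclass


@dataclass
class PointFinderId:
    gene_symbol: str
    accession: str
    codon_position: int
    alt_aa: str


def parse_pointfinder_id(ref_id: str) -> PointFinderId:
    """Split e.g. "gyrA;;1;;CP073768.1_83_l" (assumed layout, see lead-in)."""
    gene_symbol, _, rest = ref_id.split(";;")  # middle field is unused here
    accession, codon_pos, alt_aa = rest.rsplit("_", 2)
    return PointFinderId(gene_symbol, accession, int(codon_pos), alt_aa.upper())


def codon_nucleotide_change(codon_position: int, ref_codon: str, alt_codon: str) -> str:
    """Describe base changes within one codon as c.<pos><ref>><alt>.

    Several changed bases are joined with ";" as a simplification; strict
    HGVS would describe those as a single delins.
    """
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("expected two codons of length 3")
    changes = []
    for offset, (ref_base, alt_base) in enumerate(zip(ref_codon.upper(), alt_codon.upper())):
        if ref_base != alt_base:
            cds_pos = (codon_position - 1) * 3 + offset + 1  # 1-based CDS position
            changes.append(f"c.{cds_pos}{ref_base}>{alt_base}")
    return ";".join(changes)
```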
