mappy-rs's Introduction

About me

Hi! I'm currently (10/11/2022) a post-graduate researcher at the University of Nottingham working on adaptive sampling with nanopore sequencing. I'm interested in all sorts of bioinformatics, visualisations and fun side projects (some of which I even complete).

Current Projects

🐱‍💻 I'm currently working on:

minoTour - real-time monitoring of nanopore sequencers

Swordfish - An inter-communicator for minoTour and readfish

Advent Of Code 🎅🎅 - Just gotta love it

🔭 How to reach me:

Follow me on Twitter for utter silence @rorymatics ◦ Follow me on Mastodon [email protected] ◦ Open issues! ◦ 📧 [email protected]

mappy-rs's People

Contributors

adoni5, alexomics

mappy-rs's Issues

Mans got a memory leak?

Threadbuffer potentially not freeing up memory.

  • Profile with bytehound
  • Implement freeing of threadbuffer as PR to minimap2-rs

Fix mappy interface

Requires adding the __bool__ method and fixing how the scoring is set on mapopts.
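
As a rough illustration of the __bool__ part, here is a minimal pure-Python sketch. AlignerSketch, its index_loaded flag and require_index are hypothetical stand-ins, not the mappy-rs implementation. The assumption is that truthiness should report whether an index was loaded, so callers can write `if not aligner:` the way mappy users do.

```python
class AlignerSketch:
    """Hypothetical stand-in for an aligner; not the real mappy-rs class."""

    def __init__(self, index_loaded: bool) -> None:
        # Assumption: the real object would track whether its index loaded.
        self._index_loaded = index_loaded

    def __bool__(self) -> bool:
        # Truthiness mirrors the index state, so `if not aligner:` detects failure.
        return self._index_loaded


def require_index(aligner: AlignerSketch) -> str:
    """Hypothetical helper showing the calling pattern __bool__ enables."""
    if not aligner:
        raise RuntimeError("failed to load/build index")
    return "index ready"
```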

Send_one functionality

Send a read to be aligned into the work queue.
Needs a companion map function to map all the reads in the queue, yielding the results and the metadata. In this case, metadata should consist of either channel_number, read_number or read_id, and seq.

Should be quite straightforward. 🥹
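
One way to picture the send_one / map pairing is the single-threaded sketch below. QueueSketch, fake_map and the type aliases are hypothetical names for illustration; the real work queue would live on the Rust side, and fake_map only stands in for an actual alignment call.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterator, List, Tuple

# Hypothetical aliases for this sketch.
Read = Dict[str, object]            # metadata plus the "seq" key
MapFn = Callable[[str], List[str]]  # stand-in for a real aligner call


class QueueSketch:
    def __init__(self, map_fn: MapFn) -> None:
        self._map_fn = map_fn
        self._queue: Deque[Read] = deque()

    def send_one(self, read: Read) -> None:
        # Only "seq" is required; other keys travel along as metadata.
        if "seq" not in read:
            raise KeyError("read must contain a 'seq' key")
        self._queue.append(read)

    def map(self) -> Iterator[Tuple[List[str], Read]]:
        # Drain the queue in FIFO order, yielding (results, metadata) pairs.
        while self._queue:
            read = self._queue.popleft()
            yield self._map_fn(str(read["seq"])), read


def fake_map(seq: str) -> List[str]:
    # Placeholder "alignment": reports the sequence length only.
    return [f"len={len(seq)}"]
```

Usage would look like `q = QueueSketch(fake_map)`, then `q.send_one({"read_id": "r1", "channel_number": 7, "seq": "ACGT"})`, then iterating over `q.map()`.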

Allow map batch to accept lists

It would be nice if Aligner.map_batch could accept a list as input. Currently:

import mappy_rs
fasta = [
  {'id': 0, 'seq': 'AGAGTGAAGCCAATATTCCGATAACGATTGCTTTCATGATATCCCTCATTCATCACAAGTTTT'},
  {'id': 1, 'seq': 'AGAGTGAAGCCAATATTCCGATAACGATTGCTTTCATGATATCCCTCATTCATCACAAGTTTT'},
]
al = mappy_rs.Aligner("resources/test/test.mmi")
al.enable_threading(2)
mappings = al.map_batch(fasta)
# TypeError: argument 'seqs': 'list' object cannot be converted to 'Iterator'
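
The error comes down to the difference between an iterable (a list) and an iterator (something with __next__). The pure-Python sketch below reproduces that distinction. strict_map_batch and map_batch_any are hypothetical stand-ins that return sequence lengths instead of mappings. Whether wrapping the input as al.map_batch(iter(fasta)) works with the current mappy_rs build is an assumption, not something verified here.

```python
from collections.abc import Iterable, Iterator
from typing import Any, List


def strict_map_batch(seqs: Any) -> List[int]:
    """Hypothetical stand-in that, like the call above, rejects non-iterators."""
    if not isinstance(seqs, Iterator):
        raise TypeError(
            f"argument 'seqs': '{type(seqs).__name__}' object "
            "cannot be converted to 'Iterator'"
        )
    return [len(item["seq"]) for item in seqs]


def map_batch_any(seqs: Iterable) -> List[int]:
    # iter() turns a list (or any iterable) into an iterator and returns an
    # existing iterator unchanged, so both kinds of input are accepted.
    return strict_map_batch(iter(seqs))
```

Accepting any iterable and calling iter() internally keeps the existing iterator-based callers working while also allowing lists.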

Fix tests

The tests/test_alingment.yml is very out of date; update it to use the new format for system tests.
Don't
We also need to create more Rust-based unit tests in the lib.
For when I'm feeling bored!

Apple Silicon issues

Compiling on Apple Silicon seems impossible.

Error messages are:

error occurred: Command "cc" "-O2" "-ffunction-sections" "-fdata-sections" "-fPIC" "-gdwarf-2" "-fno-omit-frame-pointer" "-arch" "arm64" "-static" "-I" "minimap2" "-Wc++-compat" "-DHAVE_KALLOC" "-lm" "-lpthread" "-o" "/Users/mattloose/GIT/mappy-rs/target/debug/build/minimap2-sys-a801728ffb604384/out/minimap2/ksw2_ll_sse.o" "-c" "minimap2/ksw2_ll_sse.c" with args "cc" did not execute successfully (status code exit status: 1).

warning: build failed, waiting for other jobs to finish...

benchmarking

Benchmarking scripts could be placed in resources/benchmarking

Multi threaded interface

We can extend the base mappy_rs.Aligner, which is currently single-threaded, to use multiple threads on iterables of data.

Proposed minimal interface:

  • mappy_rs.Aligner.send(data: PyDict): send a Python dictionary containing at least one key/value pair, seq -> FASTA (string) (?). This function should return whether the data was queued successfully. It may also take a second parameter naming the key that holds the FASTA data.
  • mappy_rs.Aligner.get_results: retrieve all available aligned data from the output queue and return it.

Extended interface:

  • mappy_rs.Aligner.send_batch(batch: Iter[data, ...]): place an iterable of data dictionaries into the work queue. These would be retrieved by get_results. This should be non-blocking.
  • mappy_rs.Aligner.map_batch(batch: Iter[data, ...]): map a batch of data, yielding results.
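
To make the four proposed methods concrete, here is a minimal sketch using Python's queue and threading modules. ThreadedAlignerSketch and its map_fn argument are hypothetical: map_fn stands in for a real per-sequence alignment call, and the actual implementation would presumably manage its threads in Rust. Results come back in completion order, not submission order.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Data = Dict[str, object]
Result = Tuple[object, Data]  # (mapping result, original dictionary)


class ThreadedAlignerSketch:
    """Hypothetical illustration only; not the mappy-rs implementation."""

    def __init__(self, map_fn: Callable[[str], object], n_threads: int = 2) -> None:
        self._map_fn = map_fn  # stand-in for a real aligner call
        self._work: queue.Queue = queue.Queue()
        self._done: queue.Queue = queue.Queue()
        for _ in range(n_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        while True:
            seq, data = self._work.get()
            try:
                result = self._map_fn(seq)
            except Exception as exc:
                # Keep the worker alive; report the error as the result.
                result = exc
            self._done.put((result, data))
            self._work.task_done()

    def send(self, data: Data, key: str = "seq") -> bool:
        # Returns whether the data was queued; `key` names the sequence field.
        if key not in data:
            return False
        self._work.put((str(data[key]), data))
        return True

    def get_results(self) -> List[Result]:
        # Non-blocking: return whatever has finished so far.
        results: List[Result] = []
        while True:
            try:
                results.append(self._done.get_nowait())
            except queue.Empty:
                return results

    def send_batch(self, batch: Iterable[Data]) -> None:
        # The queue is unbounded, so this does not block on the workers.
        for data in batch:
            self.send(data)

    def map_batch(self, batch: Iterable[Data]) -> Iterator[Result]:
        self.send_batch(batch)
        self._work.join()  # wait until every queued item has been processed
        yield from self.get_results()
```

In this sketch map_batch also returns results for anything queued earlier with send, because both share the same output queue; a real design might want to keep those separate.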

Copy mappy python interface for drop-in compatibility

minimap2 already has a Python interface, mappy. We should aim to re-implement that API so that, in the first instance, this can function as a drop-in replacement.

For this we need:

  • Same Aligner args
    • Take fn_idx_in as (MMI)
    • Take fn_idx_in as (FA)
    • Write to fn_idx_out (FA -> MMI)
  • Same Aligner.map args
  • Aligner.seq_names to work
  • Aligner.seq to have the same behaviour
    • Return None on error rather than raise
  • Same Alignment structure with attributes

Maybes:

  • Misc functions:
    • fastx_read(fn, read_comment=False)
    • revcomp(seq)
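
Of the optional helpers above, revcomp(seq) is simple enough to sketch in pure Python, which could serve as a reference when testing a Rust version. The table below covers A/C/G/T plus the IUPAC ambiguity codes and preserves case. Whether this matches mappy's handling of case and ambiguity codes exactly is an assumption that would need checking against mappy itself.

```python
# Complement table covering A/C/G/T plus IUPAC ambiguity codes, both cases.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def revcomp(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]
```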

Pre-commit

At some point I would like to set up pre-commit with rustfmt and cargo test as hooks.

__repr__ functions

Write __str__ and __repr__ functions for the Alignments.
__str__ should return a PAF-formatted line.
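
A rough Python sketch of what that could look like is below. AlignmentSketch is a hypothetical dataclass whose field names follow mappy's Alignment attributes. It assumes an alignment does not know its query name or length, so __str__ emits the PAF columns from query start onward; a full PAF line would need those two columns prepended by the caller. __repr__ is simply the one the dataclass generates.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSketch:
    """Hypothetical record; not the mappy-rs Alignment type."""

    q_st: int
    q_en: int
    strand: int  # +1 for forward, -1 for reverse
    ctg: str
    ctg_len: int
    r_st: int
    r_en: int
    mlen: int
    blen: int
    mapq: int
    is_primary: bool = True
    cigar_str: str = ""

    def __str__(self) -> str:
        strand = "+" if self.strand > 0 else "-" if self.strand < 0 else "?"
        fields = [
            self.q_st, self.q_en, strand, self.ctg, self.ctg_len,
            self.r_st, self.r_en, self.mlen, self.blen, self.mapq,
            "tp:A:P" if self.is_primary else "tp:A:S",
        ]
        if self.cigar_str:
            fields.append("cg:Z:" + self.cigar_str)
        # PAF is tab-separated.
        return "\t".join(str(f) for f in fields)
```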
