
AT trim defaults parameters (htstream, CLOSED)

s4hts commented on August 20, 2024
AT trim defaults parameters

from htstream.

Comments (9)

msettles commented on August 20, 2024

As a first guess, use the parameters from Lucy.

From: David Streett
Reply-To: ibest/HTStream
Date: Tuesday, September 6, 2016 at 9:49 AM
To: ibest/HTStream
Subject: [ibest/HTStream] AT trim defaults parameters (#26)

Hey, @samhunter

Specific to AT trim: what should the default values be for the minimum trim length and the number of mismatches?

In general, what should the default minimum accepted length be?

All trimming algorithms will also have parameters for stranded, 3' trim, and 5' trim.

Anything I am missing? We can run tests later to get optimal values, but is there a decent first guess?

Thank you!



samhunter commented on August 20, 2024

By "min trim length" do you mean the minimum number of bases to trim, or the minimum size that is kept after trimming?

I think Lucy uses a 10bp sliding window and continues to slide the window until 3 mismatches are encountered? At a quick glance I don't see any rationale for this strategy, and it seems like we could be a little more sensitive to short bits of poly A/T if we used a 5bp sliding window allowing for 2 mismatches, moving from either end of the sequence towards the center. Either way should produce sequences that bwa mem will map/trim, I would guess?

This is the Lucy strategy:

Poly-A/T tail removal
If the raw DNA sequences are obtained from an
EST library, some users want their poly-A/T tags
to be removed before clustering. LUCY does this
quickly after vector trimming by searching for the first
min span (10) or longer poly-T fragment within the
first initial search range (50) bases inside the
vector-free good region, then attempts to extend from
this initial poly-T seed toward the center of the sequence,
allowing no more than max error (3) mismatches
between every min span (10) consecutive T bases in
the scan. This is therefore a linear-time and linear-space
operation. The poly-A tail trimming at the other end of
the sequence is carried out similarly. If users wish to tell
LUCY that they are processing EST sequences but they
also wish to keep the poly-A/T tags for their purposes,
they can issue the keep option in combination with the
poly-A/T trimming option cdna.
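
The seed-and-extend scan described above, and the smaller 5bp/2-mismatch window suggested as an alternative, can be sketched in a few lines of Python. This is only one reading of the Lucy text, not Lucy's or HTStream's actual code: the function names `tail_length` and `trim_poly_at` are made up for illustration, the seed is assumed to lie entirely within the first `search_range` bases, and "no more than max error mismatches between every min span consecutive bases" is interpreted as a sliding window of `min_span` bases that may contain at most `max_error` non-matching bases.

```python
def tail_length(seq, base, min_span=10, max_error=3, search_range=50):
    """Number of leading bases of `seq` covered by a poly-`base` region (0 if none)."""
    seq = seq.upper()
    # Seed: the first exact run of min_span bases lying within the search range.
    start = seq.find(base * min_span, 0, search_range)
    if start == -1:
        return 0
    end = start + min_span
    pos = end
    while pos < len(seq):
        # Look at the min_span bases ending at pos; stop once too many mismatch.
        window = seq[pos - min_span + 1 : pos + 1]
        if window.count(base) < min_span - max_error:
            break
        if seq[pos] == base:
            end = pos + 1  # only ever end the region on a matching base
        pos += 1
    return end


def trim_poly_at(seq, **params):
    """Trim a poly-T region from the 5' end and a poly-A region from the 3' end."""
    left = tail_length(seq, "T", **params)
    # Reversing the read lets the same forward scan handle the 3' poly-A tail.
    right = tail_length(seq[::-1], "A", **params)
    if left + right >= len(seq):
        return ""
    return seq[left : len(seq) - right]


read = "ACGGATCCGATCGAGGCT" + "A" * 7
print(trim_poly_at(read))                           # Lucy-style 10bp/3 defaults
print(trim_poly_at(read, min_span=5, max_error=2))  # the 5bp/2 alternative
```

With the Lucy-style defaults the 7-base tail is too short to seed and the read comes back unchanged, while the 5bp/2-mismatch setting removes it. That is the extra sensitivity being proposed, at the cost of more chance matches.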


msettles commented on August 20, 2024

I'd still say that is a starting place, then modify from there. I have no idea
what to expect for homopolymer A/T in the genome. A sequence of length 5 with
3 A/T (2 mismatches) seems pretty likely to occur in genomic,
non-polyadenylated sequence.
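
As a rough back-of-the-envelope check on that intuition, the chance that a random window passes each threshold can be computed from the binomial distribution. The sketch below assumes independent bases with each of A, C, G, T at probability 0.25, which real genomes do not follow, and it counts one specific base (A or T, not both); `prob_at_least` is a made-up helper name.

```python
from math import comb


def prob_at_least(k_min, n, p=0.25):
    """P(at least k_min of n independent bases equal one given base)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p_window5 = prob_at_least(3, 5)    # 5bp window, up to 2 mismatches
p_window10 = prob_at_least(7, 10)  # 10bp window, up to 3 mismatches
print(f"5bp/2 mismatches:  {p_window5:.4f}")
print(f"10bp/3 mismatches: {p_window10:.4f}")
```

Under these simplified assumptions roughly 10% of random 5-mers pass the 5bp/2-mismatch test, against about 0.35% of random 10-mers for the 10bp/3-mismatch test, which supports the concern about the shorter window matching ordinary sequence.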

Matt

In reply to Sam Hunter's message of Sep 6, 2016, 12:13 PM.

dstreett commented on August 20, 2024

Hello again, @msettles!

So, for sickle reboot and poly AT tail remover, I wasn't planning on doing a sliding window. I was just planning on doing a simple loop starting at both ends for both of these algorithms. Any reason we should keep the sliding window?

Thanks!
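
For comparison with the windowed scan, a simple loop from each end might look like the sketch below. It is an illustration only, not the HTStream implementation: the names `trim_3prime` and `trim_5prime` and the defaults `min_trim=5` and `max_mismatch=2` are placeholders, since the thread has not settled on values.

```python
def trim_3prime(seq, base="A", min_trim=5, max_mismatch=2):
    """Walk inward from the 3' end and remove a poly-`base` tail, if long enough."""
    mismatches = 0
    cut = len(seq)  # index where the kept sequence ends
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == base:
            cut = i  # the tail now reaches back to position i
        else:
            mismatches += 1
            if mismatches > max_mismatch:
                break
    if len(seq) - cut < min_trim:
        return seq  # tail too short to be worth trimming
    return seq[:cut]


def trim_5prime(seq, base="T", min_trim=5, max_mismatch=2):
    """Same loop applied from the 5' end by reversing the read."""
    return trim_3prime(seq[::-1], base, min_trim, max_mismatch)[::-1]


print(trim_3prime("ACGTGCAAAAAAAA"))
print(trim_5prime("TTTTTTGCATGC"))
```

One consequence of a single global mismatch budget is that a stray matching base just inside the tail extends the cut: `trim_3prime("ACGAGCAAAAAAAA")` also removes the `AGC`. A sliding window limits mismatches locally instead, which may be one reason to keep it.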


msettles commented on August 20, 2024

Don’t know, actually! But in talking with @shunter just now, I may have the perfect dataset (SE100) to test with: it is mouse and 5’ biased, meaning there should be a lot of polyA/T tails of differing lengths. I think we can use the mapping results, for example how BWA mem soft-clips the right side of the read, to validate and tune.
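
To make the validation idea concrete, the soft-clip lengths can be read straight from the CIGAR column of the SAM output. The helper below is a minimal sketch (`soft_clips` is a made-up name) that uses only the standard SAM CIGAR syntax; it does not call BWA or any SAM library.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (leading, trailing) soft-clip lengths from a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside the soft clips, so drop them first.
    ops = [(n, op) for n, op in ops if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


print(soft_clips("82M18S"))   # read whose last 18 bases were soft-clipped
print(soft_clips("3S90M7S"))  # clipped at both ends
```

Comparing the trailing clip length per read with the number of bases a trimmer would remove gives a direct tuning signal. Note that for reads aligned to the reverse strand (FLAG 0x10) the SAM record stores the reverse complement, so the original 3' end of the read corresponds to the leading clip.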

Matt

From: David Streett
Reply-To: ibest/HTStream
Date: Wednesday, September 7, 2016 at 10:16 AM
To: ibest/HTStream
Cc: Matt Settles, Mention
Subject: Re: [ibest/HTStream] AT trim defaults parameters (#26)


samhunter commented on August 20, 2024

There must be some information on whether poly-A trimming impacts analysis? I'm not sure I've ever seen it rigorously analyzed, however. Does anyone have a citation? Does bwa mem just happily soft-clip off all of those AAA's and map anyway? Maybe Kallisto/Salmon/etc. aren't impacted much?


msettles commented on August 20, 2024

Well, my thoughts:

  1. Newer algorithms that do global-to-local alignment are less likely to be impacted than older global algorithms. I bet length matters, so clipping SE100 data has less impact than clipping SE50 data?

  2. Stats associated with polyA/T might be important for validating comparability; differences (especially for SE data) may be valuable for explaining problems in the data. So this is more for prep-related stats than for ‘better results’.

  3. Can’t hurt!

But with this dataset we should be able to determine that! And I have no citation.

matt

From: Sam Hunter
Reply-To: ibest/HTStream
Date: Wednesday, September 7, 2016 at 11:20 AM
To: ibest/HTStream
Cc: Matt Settles, Mention
Subject: Re: [ibest/HTStream] AT trim defaults parameters (#26)


dstreett commented on August 20, 2024

I was also wondering, @msettles, if there were any assumptions we could build into this, such as that T's will only appear on the 5' end and A's will only appear on the 3' end?


msettles commented on August 20, 2024

Depends on the library preparation method. The data I have in mind comes with a specific set of assumptions about what polyA/T will occur and where, but generic RNA-seq could have A or T at the beginning or the end of the read.

But we should think about how to specify some of those possibilities as parameters, with the default being to look at all of them. And stats for all, for now.
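
One way to picture "parameters with a default of everything, and stats for all" is a counter keyed by (end, base) combination. The sketch below is purely illustrative: `end_run`, `polyat_stats`, the `"5p"`/`"3p"` labels, and `min_run=5` are all invented here, and it counts only exact homopolymer runs with no mismatch handling.

```python
from itertools import product


def end_run(seq, base, end):
    """Length of the uninterrupted run of `base` at the given end ("5p" or "3p")."""
    s = seq if end == "5p" else seq[::-1]
    return len(s) - len(s.lstrip(base))


def polyat_stats(reads, bases=("A", "T"), ends=("5p", "3p"), min_run=5):
    """Count reads with a run of at least min_run for each (end, base) combination."""
    counts = {combo: 0 for combo in product(ends, bases)}
    for read in reads:
        for end, base in counts:
            if end_run(read, base, end) >= min_run:
                counts[(end, base)] += 1
    return counts


reads = ["TTTTTTGCATGC", "ACGTGCAAAAAAAA", "ACGTACGT"]
print(polyat_stats(reads))                            # default: all four combinations
print(polyat_stats(reads, bases=("A",), ends=("3p",)))  # a prep-specific restriction
```

A library prep with known orientation would pass a narrower `bases`/`ends` selection, while the default reports every combination so unexpected tails still show up in the stats.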

Matt

From: David Streett
Reply-To: ibest/HTStream
Date: Wednesday, September 7, 2016 at 12:06 PM
To: ibest/HTStream
Cc: Matt Settles, Mention
Subject: Re: [ibest/HTStream] AT trim defaults parameters (#26)
