Comments (2)
Hi,
Thank you for reporting this issue.
We were not aware that the suffix number could be any integer, and this case has obviously not been anticipated in the code of LRez : -1
suffixes are just removed from the barcode tag (and any other integer is not recognized).
In the case of different suffix numbers in a single BAM, the expected behaviour of LRez would be to consider as two distinct barcodes two barcodes that share the same nucleotide barcode sequence but have different suffix numbers, is that correct ?
This may not be straightforward to implement in LRez, since LRez assumes all barcodes are purely nucleotide sequences and then encodes them into integers with a 2bit encoding. The suffix numbers could be converted to nucleotide words appended to the barcodes, but this would cost extra space for vast majority of the datasets with only the "-1" suffix, and to optimize the extra space, we would need to know in advance the maximal number of different integer suffixes for the given sample.
Do you have an idea of this maximal number of different integer suffixes in practice ? In your opinion, does this situation (BAMs with multiple 10X libraries) occur frequently in practice ?
Note that a temporary (though not very neat or practical) solution is be to pre-process the BAM by replacing -X suffixes by short nucleotide words specific to each library.
Best,
Claire
from lrez.
Thanks for the quick reply!
In the case of different suffix numbers in a single BAM, the expected behaviour of LRez would be to consider as two distinct barcodes two barcodes that share the same nucleotide barcode sequence but have different suffix numbers, is that correct ?
Yes this is correct as the same nucleotide barcode sequence could have been sampled in multiple library preparations.
This may not be straightforward to implement in LRez, since LRez assumes all barcodes are purely nucleotide sequences and then encodes them into integers with a 2bit encoding. The suffix numbers could be converted to nucleotide words appended to the barcodes, but this would cost extra space for vast majority of the datasets with only the "-1" suffix, and to optimize the extra space, we would need to know in advance the maximal number of different integer suffixes for the given sample.
Do you have an idea of this maximal number of different integer suffixes in practice ? In your opinion, does this situation (BAMs with multiple 10X libraries) occur frequently in practice ?
I expected this might be an issue. As for the maximal number of expected suffixes I don't have a good answer here. Clearly most people only use BAMs with one suffix ("-1"). For me I have merged as much as 6 different libraries, that is 6 different suffixes in one BAM. I am however not sure how common this is for other people. A pretty safe estimate for a maximum numer of integer suffixes would probably be around 10.
Note that a temporary (though not very neat or practical) solution is be to pre-process the BAM by replacing -X suffixes by short nucleotide words specific to each library.
Yes I suppose this would be a solution.
An even simpler solution, and more practical for now, would be to just ignore any barcode suffix for the index. I am thinking that this would be a ok solution for now as I am not sure this is a big problem for other users. Also one could always confirm which suffix is present on an alignment after accessing it.
from lrez.
Related Issues (7)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from lrez.