Collapsing barcodes generated by faulty dropseq beads. Implemented in Python.
deletionsAnalysis
, collapses barcodes according to existence of Ts in the end of UMIs. Assumes the existence of a single tab separated file containing each barcode and its UMI.checkHamming
, collapses barcodes from bulk onto top ones. Only mutations are currently considered. Assumes the existence of two files,topBarcodes.txt
andrestBarcodes.txt
, which are lists of barcodes sorted by number of reads.
- collapse a set of top barcodes among themselves.
- introduce dynamically thresholds for number of reads and/or percentages of Ts.
- introduce check for percentage of Gs in the last positions of barcodes when Ts are found in the UMIs.
- introduce computeKnee function to have an estimate of the number of tops.
- refactor the
deletionsAnalysis
search to incorporate arbitrary number of bases to check. - improve the structure of identifying mutations function in the
checkHamming
. - possibly trear Ns in the
checkHamming
.