vpc-ccg / calib
Calib clusters barcode tagged paired-end reads based on their barcode and sequence similarity.
License: MIT License
Hey @baraaorabi,
I was wondering how the consensus and error correction steps are performed with the conda installed version of calib?
I was able to generate the test.cluster with the following command:
calib --input-forward R1.fastq.gz --input-reverse R2.fastq.gz --barcode-length 4 --output-prefix test. --minimizer-count 7 --kmer-size 8 --error-tolerance 1 --minimizer-threshold 2
But I'm unable to proceed with the consensus and error correction steps because the conda-installed version has no additional calib arguments:
$ calib --help
Combined barcode lengths must be a positive integer and each mate barcode length must be non-negative! Note if both mates have the same barcode length you can use -l/--barcode-length parameter instead.
Calib: Clustering without alignment using LSH and MinHashing of barcoded reads
Usage: calib [--PARAMETER VALUE]
Example: calib -f R1.fastq -r R2.fastq -o my_out. -e 1 -l 8 -m 5 -t 2 -k 4 --silent
Calib's paramters arguments:
-f --input-forward (type: string; REQUIRED paramter)
-r --input-reverse (type: string; REQUIRED paramter)
-o --output-prefix (type: string; REQUIRED paramter)
-s --silent (type: no value; default: unset)
-q --no-sort (type: no value; default: unset)
-g --gzip-input (type: no value; default: unset)
-l --barcode-length (type: int; REQUIRED paramter unless -l1 and -l2 are provided)
-l1 --barcode-length-1 (type: int; REQUIRED paramter unless -l is provided)
-l2 --barcode-length-2 (type: int; REQUIRED paramter unless -l is provided)
-p --ignored-sequence-prefix-length (type: int; default: 0)
-m --minimizer-count (type: int; default: Depends on observed read length;)
-k --kmer-size (type: int; default: Depends on observed read length;)
-e --error-tolerance (type: int; default: Depends on observed read length;)
-t --minimizer-threshold (type: int; default: Depends on observed read length;)
-c --threads (type: int; default: 1)
-h --help
Am I missing something here?
Best,
Chad
Hi,
I've just stepped into NGS data analysis and I'm not really familiar with it yet, but I'm motivated to analyze my data, and my research on the internet took me here...
Sequencing gave me single-end reads, with my 16N UMI tag on the 3' end of the read.
I've cleaned my data so that only reads containing the sequence of interest with the UMI attached should be left in my input file.
I'm a little confused by the readme. How can I define the position of my UMI, or should I move it in front of the sequence?
And since I don't have paired-end reads, I can copy my dataset without UMIs for the r2 input, right?
Thanks in advance
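For the "move it in front of the sequence" option, a minimal Python sketch is below. It assumes Calib reads the barcode from the start of each read (the help text only takes barcode lengths, not positions), and `move_umi_to_front` is a hypothetical helper, not part of Calib. The quality characters are moved together with the bases so both strings stay the same length.

```python
def move_umi_to_front(seq, qual, umi_len):
    """Move the last `umi_len` bases (and their qualities) to the read start.

    Hypothetical helper for illustration; not a Calib function.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if not 0 < umi_len <= len(seq):
        raise ValueError("umi_len must be between 1 and the read length")
    cut = len(seq) - umi_len
    # Rotate both strings by the same amount so they stay in sync.
    return seq[cut:] + seq[:cut], qual[cut:] + qual[:cut]


# Toy read with a 4-base UMI (TTGG) on the 3' end.
new_seq, new_qual = move_umi_to_front("ACGTACGTTTGG", "IIIIIIIIFFFF", umi_len=4)
print(new_seq)   # TTGGACGTACGT
print(new_qual)  # FFFFIIIIIIII
```

For the 16N UMI described above, `umi_len` would be 16.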
Hi, I'm trying to use calib in combination with UMI VarCal, but apparently calib adjusts the fastq quality score beyond what is used by Illumina (see https://gitlab.com/vincent-sater/umi-varcal/-/issues/12) and is thus not compatible with UMI VarCal. Is this intended? Is it possible to add an option to adjust this to max 41?
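Until such an option exists, one possible workaround is to cap the quality characters after consensus calling. The sketch below assumes Phred+33 encoding, where Q41 is `J` and Q42 is `K`; `cap_quality` is a hypothetical helper and not a Calib or UMI VarCal function.

```python
def cap_quality(qual, max_q=41, offset=33):
    """Clamp every quality character to at most `max_q` (Phred+33 assumed)."""
    cap = chr(offset + max_q)  # 'J' for Q41 with offset 33
    # Characters compare by code point, so min() picks the lower quality.
    return "".join(min(c, cap) for c in qual)


print(cap_quality("FFKKI#"))  # FFJJI#
```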
Hi,
I have a FASTQ file in which the UMI is attached to the first 8 bp of Read 2.
How can I use this software?
Can calib deal with reads that do not have a molecular duplicate (-m 1)? Currently, I'm seeing all these reads as having a default adjusted base quality of K (Q42). Is this correct behaviour? Can these reads retain their original quality values, apart from any overlap between mates where error correction could correctly be applied?
Hello,
I used Calib to deduplicate my paired end reads.
calib -f S1_R1.fastq -r S1_R2.fastq -o S1_Calib. -l1 17 -l2 0
I have an 8-base index and a 9-base barcode attached to R1, and nothing attached to R2. After running calib and calib_cons, I noticed that the 17 bases of index+barcode are still attached to my reads. Is there a way to strip them?
Thanks!
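If no built-in option turns up, the prefix can be trimmed in a post-processing step. This is a minimal sketch on in-memory records; `strip_barcode` is a hypothetical helper, and the record layout (name, sequence, quality) is an assumption made for illustration.

```python
def strip_barcode(name, seq, qual, barcode_len):
    """Drop the first `barcode_len` bases and qualities from one record.

    Returns the trimmed record plus the removed barcode, so the barcode
    can still be kept elsewhere (for example in the read name).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if barcode_len < 0 or barcode_len > len(seq):
        raise ValueError("barcode_len is outside the read")
    barcode = seq[:barcode_len]
    return (name, seq[barcode_len:], qual[barcode_len:]), barcode


# Toy R1 record: 8-base index + 9-base barcode + 7 bases of insert.
record, barcode = strip_barcode(
    "@read1", "AAAAAAAACCCCCCCCCGATTACA", "I" * 24, barcode_len=17
)
print(record)   # ('@read1', 'GATTACA', 'IIIIIII')
print(barcode)  # AAAAAAAACCCCCCCCC
```

With `-l1 17 -l2 0` as in the command above, only the R1 records would be trimmed.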
Make a paper branch and freeze it
I have a gzip file as input. However, when I switched on -g, the program finished with just a few reads processed. I compiled the latest master, 721830a.
No error or minimizer parameters passed. Selecting parameters based on barcode and inferred read length
Inferred read length 149 from sample of 10000 reads
Selected paramters for (mean) barcode length 6 are:
error_tolerance 1
kmer_size 8
minimizer_count 7
minimizer_threshold 2
Extracting minimizers and barcodes...
Memory before reading FASTQ:
1MB
Memory right after reading FASTQ:
1MB
Memory after reserving for read_to_node_vector & node_to_minimizers:
1MB
Memory after filling barcode_to_node_map:
1MB
Memory after releasing node_to_read_map:
1MB
Memory after reserving barcode_to_nodes_vector:
1MB
Memory after filling barcodes & barcode_to_nodes_vector:
1MB
Memory after releasing barcode_to_node_map:
1MB
Read count: 4
Node count: 4
Barcode count: 4
Memory after exiting extract_barcodes_and_minimizers():
1MB
Clustering...
Adding edges due to barcode barcode similarity
Number of masks is 12
011111111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
101111111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
110111111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
111011111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
111101111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
111110111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
111111011111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
111111101111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
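The mask lines in the log above look like every way of ignoring `error_tolerance` positions out of the combined barcode length (two mates at mean length 6 give 12 positions, and tolerance 1 gives 12 masks). The sketch below reproduces that pattern in Python. It is inferred from the log output only and is not taken from Calib's source; `barcode_masks` is a hypothetical name.

```python
from itertools import combinations


def barcode_masks(total_len, error_tolerance):
    """List masks with exactly `error_tolerance` ignored ('0') positions."""
    masks = []
    for ignored in combinations(range(total_len), error_tolerance):
        bits = ["1"] * total_len
        for pos in ignored:
            bits[pos] = "0"
        masks.append("".join(bits))
    return masks


masks = barcode_masks(12, 1)
print(len(masks))  # 12
print(masks[0])    # 011111111111
```

Under the same assumption, a combined length of 24 with tolerance 0 yields a single all-ones mask.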
Hi there,
I'm getting a Segmentation fault 11 error code when trying to run Calib. Please see below.
Thanks for your help!
Kartik
Kartiks-MBP-2:dms2-subamp-seq-test kartik$ calib -f ./fastqfiles_A33R/appendedWithUMI_s1.CTGATCGT-GCGCATAT.J1230.AHCFMLAFX2.L1-4.1.fastq -r ./fastqfiles_A33R/s1.CTGATCGT-GCGCATAT.J1230.AHCFMLAFX2.L1-4.2.fastq -l 9 -o foo_
No error or minimizer parameters passed. Selecting parameters based on barcode and inferred read length
Inferred read length 155 from sample of 10000 reads
Selected paramters for (mean) barcode length 9 are:
error_tolerance 2
kmer_size 8
minimizer_count 7
minimizer_threshold 2
Extracting minimizers and barcodes...
Memory before reading FASTQ:
Segmentation fault: 11
The plots link in experiments/parameter_tests/README.md references the repository through rawgit.com, which has been shut down.
rawgit.com:
RawGit is now in a sunset phase and will soon shut down. It's been a fun five years, but all things must end.
GitHub repositories that served content through RawGit within the last month will continue to be served until at least October of 2019. URLs for other repositories are no longer being served.
If you're currently using RawGit, please stop using it as soon as you can.
The README can contain relative links to files in the repository:
https://help.github.com/en/github/creating-cloning-and-archiving-repositories/about-readmes#relative-links-and-image-paths-in-readme-files
spoa: should be resolved by wget
cmake 3.2+ (for spoa)
Python3 packages for simulation
GNU Time v1.9 for benchmarking
SiNVICT
samtools
Can calib call duplex consensus reads? I have data with xGen duplex seq adapters.
Calib depends on GCC v5.2. This is because earlier versions have not implemented the full C++11 file stream functions (something about copy constructors is broken in the earlier versions). But if we want Calib to work with bioconda, we should downgrade our GCC to v4.8.5. This should be a relatively easy fix.
Hi,
I don't have paired-end reads, but as described in previous issues I copied my input FASTQ, removed the UMIs (16 N long), and used the copy as the second input file, as shown in the screenshot. I got an error message (No error or minimizer parameters passed. Selecting parameters based on barcode and inferred read length
Inferred read length 55 from sample of 10000 reads). Then I tried the example command, adjusting only my input file names and the barcode length, and my output (cluster) file was generated. But I'm not sure this is the right parameter selection for my sequences: they are very short, only 55 bases including the 16-base UMI.
I then tried to use the generated cluster file with calib_cons. There was no error message, but the output files were empty. So my question is: does the example command below refer to the same input files as the first calib command for clustering, or to a different FASTQ file?
To run Calib error correction, run:
calib_cons -c <cluster_file> -q <space_separated_FASTQ_list> -o <space_separated_output_prefix_list>
For example:
calib_cons -c R.cluster -q R1.fastq R2.fastq -o R1. R2.
Thanks in advance, and sorry for what are probably dumb questions to experts, but I'm new to this topic (:
Hi, I tried to run the program via conda.
My commands:
calib -f SF_1.fastq.gz -r SF_2.fastq.gz -l 8 -o SF --gzip-input --no-sort --threads 8 # for creating cluster file
The first step runs smoothly.
calib_cons -c SFcluster -q SF_1.fastq.gz SF_2.fastq.gz -o SF_1.out SF_2.out -t 8
This exits after reading the fastq file (because it is gzipped). I used file streaming to solve the problem, but can the -g parameter be implemented here as well?
Thank you
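As an alternative to shell-level streaming, the inputs can be decompressed to plain FASTQ before calling calib_cons. The sketch below uses only the Python standard library; `gunzip_fastq` and the file names in the comment are hypothetical.

```python
import gzip
import shutil


def gunzip_fastq(src_path, dst_path):
    """Decompress a .fastq.gz file to a plain FASTQ file, streaming in chunks."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Example (paths are placeholders):
# gunzip_fastq("SF_1.fastq.gz", "SF_1.fastq")
# gunzip_fastq("SF_2.fastq.gz", "SF_2.fastq")
```

The plain files can then be passed to `calib_cons -q` as in the README example.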
@baraaorabi
After running calib, about 30% of my reads contain a UMI with at least one 'N'. How do you suggest that we deal with these? Currently, we are filtering out all these reads.
Thanks.
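For reference, the filtering described above can be sketched as follows. This is only an illustration of dropping reads whose UMI contains an N, assuming the UMI is the first `umi_len` bases of the read; it is not a recommendation from the Calib authors, and `split_by_umi_n` is a hypothetical helper.

```python
def split_by_umi_n(records, umi_len):
    """Split (name, seq, qual) records by whether the UMI prefix contains N."""
    kept, dropped = [], []
    for record in records:
        umi = record[1][:umi_len].upper()
        (dropped if "N" in umi else kept).append(record)
    return kept, dropped


reads = [
    ("@r1", "ACGTTTTTGGGG", "I" * 12),
    ("@r2", "ANGTTTTTGGGG", "I" * 12),  # N inside the 4-base UMI
    ("@r3", "ACGTTTNTGGGG", "I" * 12),  # N outside the UMI, so it is kept
]
kept, dropped = split_by_umi_n(reads, umi_len=4)
print([r[0] for r in kept])     # ['@r1', '@r3']
print([r[0] for r in dropped])  # ['@r2']
```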
Hi
If using "-p" in clustering (--ignored-sequence-prefix-length) and having a UMI on one read only (e.g. -l1 10, -l2 0), will the -p prefix be ignored on both FASTQ reads or just on the one with > 0 bases specified for the UMI (-l1 in this case)?
Thanks
Goals:
calib_cons with Calib's makefile
master branch
Hi there,
I'm working on an Apple M1 Pro Mac running macOS Monterey 12.2.1. I was able to run calib on my fastq files to generate the cluster file, but calib_cons gives me an "Illegal instruction: 4" error.
I was able to run Calib fine on my previous Intel Mac so I thought this might be due to the change in chip architecture. I tried running in Rosetta mode by prefixing with "arch -x86_64" but that gave me the same error.
Look forward to your input on how to fix this.
Thanks!
Kartik
Hi,
Since FASTQ is the input format I was wondering if I could also use a fake-FASTQ file as my input, where the quality sequence is missing or randomly inserted but the read sequence is real?
So in other words, does Calib sort and cluster the reads also based on quality information or only on sequence similarity?
Thanks for upcoming answers.
Hi Baraa!
I tried to use calib installed through conda, but the second step (calib_cons) produced a core dump error.
When calib is installed from git, everything works without problems. I use Ubuntu 18.04 LTS.
As far as I understand, this problem with the conda version has already been encountered by several users on different systems.
There is no question as such, I just wanted to voice this fact.
Best wishes,
Marsel
Hello,
I have 4 fastq files for each sample - Read_R1.fq Read_R2.fq Read_I1.fq and Read_I2.fq
The I1 and I2 files have the barcodes for my paired end reads. Can I use these as input to Calib?
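One possible approach, assuming Calib expects the barcode at the start of each read, is to prepend each index read to its matching sequence read before clustering. The sketch below works on single in-memory records; `prepend_index` is a hypothetical helper, and matching records by the first token of the read name is an assumption about the file layout.

```python
def prepend_index(read, index):
    """Prepend an index record to a read record; both are (name, seq, qual)."""
    r_name, r_seq, r_qual = read
    i_name, i_seq, i_qual = index
    # Compare only the part of the name before the first space.
    if r_name.split()[0] != i_name.split()[0]:
        raise ValueError("read and index records are out of sync")
    return r_name, i_seq + r_seq, i_qual + r_qual


r1 = ("@spot1 1:N:0", "GATTACA", "IIIIIII")
i1 = ("@spot1 1:N:0", "ACGTACGT", "FFFFFFFF")
print(prepend_index(r1, i1))
# ('@spot1 1:N:0', 'ACGTACGTGATTACA', 'FFFFFFFFIIIIIII')
```

The barcode length passed to calib would then be the length of the prepended index.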
Convert unordered sets to vectors. Resolve removing unmatching nodes.
Hi there,
I'm trying to run a pair of R1 and R2 files with a 12-bp UMI at the 5' end of each read.
calib ran with the following messages but seemed to exit with the above error.
The reads it lists as "fishy" are the very first read pair in the fastq files. I've attached a sample of the R1 and R2 fastq files (each read mate has a 12-bp UMI at the 5' end):
R1_fastq.txt
R2_fastq.txt
I did some text processing to move the UMIs but did not edit the quality string. Would this cause the error?
Thank you for your help,
Kartik
calib -f ./final_fastq_processed/A-S_R1_filtered.fastq -r ./final_fastq_processed/A-S_R2_reformatted_filtered.fastq -l1 12 -l2 12 -e 0 -k 4 -m 7 -t 3 -o out_
Extracting minimizers and barcodes...
Read count: 3258645
Node count: 2027186
Barcode count: 1145285
Clustering...
Adding edges due to barcode barcode similarity
Number of masks is 1
111111111111111111111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
On thread 0 building all LSH took: 0
On thread 0 processing all LSH took: 0
On thread 0 merging local graph with global graph
On thread 0 merging took 0
Building the graph on 1 thread(s) took 1
Adding edges between nodes of identical barcodes with thread 0
Adding edges due to barcodes similarity took: 1807
Extracting clusters
Extracting clusters took: 68
Outputting clusters
ERROR: Something is fishy with read:
name_1 @M01243:273:000000000-K64MD:1:1101:15987:1337 1:N:0:1
sequence_1 TTCCCAGCCGCAACTTTGTGAGTATGGGTAGTAGACTCCTTGAAGAGCTACTACTACAAGTGCTGGGAAGAGCCAACTCAGGGAAATACAGGAAGAGATCACTCGCCATGAGCAGCAGCTTGTCATT
trash +
quality_1 AAAABFFFFFBBGGGFFFF5BGFHFEA2EAAFGBGHCF5BFHFBGHGGHHHHHHHHDGEGBFHHGFFH3EGFGHH3BB@@55c>13BF??BF???DB331AF1GFEHBHHBEEEG3?FGFGFECAC2B??GC20F2FHHC2>FG
name_2 @M01243:273:000000000-K64MD:1:1101:15987:1337 2:N:0:1
sequence_2 TGCGGCTGGGAAAATGNCAAGCTGCTGCTCATGGCGAGTGATCTCTTACTGTATTTCCCAGAGTTGGCTCATCCCAGCACTTGTAGGAGTAGCTCTTCAAGGAGTCTACTAACCATACTCACAAAGT
trash +
quality_2 AA1A#>>1AA1F1BBF11E1BFGE00A0D222D221B122D2AD2F111//1DD11FEFHFHAGBFF0GFAGDD11BG01FFFHHHFD2EA>0GEGDDF2BGF110BBF1GFGBB2BFG?//>/E<C/BG<G0</</B/F<F2<
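Since the UMIs were moved without editing the quality line, one thing worth checking is whether each record's sequence and quality strings still have the same length. The sketch below is a standalone check for 4-line FASTQ records; it is not Calib's own validation, and `find_length_mismatches` is a hypothetical helper.

```python
def find_length_mismatches(lines):
    """Return (record_number, name, problem) for suspicious 4-line records."""
    lines = [ln.rstrip("\n") for ln in lines]
    bad = []
    # Walk complete 4-line records; a trailing partial record is ignored.
    for start in range(0, len(lines) - 3, 4):
        name, seq, plus, qual = lines[start:start + 4]
        number = start // 4 + 1
        if not name.startswith("@") or not plus.startswith("+"):
            bad.append((number, name, "unexpected record layout"))
        elif len(seq) != len(qual):
            bad.append((number, name, f"{len(seq)} bases vs {len(qual)} qualities"))
    return bad


example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "IIIIII"]
print(find_length_mismatches(example))
# [(2, '@r2', '4 bases vs 6 qualities')]
```

To check a file, the list can come from `open(path).readlines()`.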
Add travis-ci testing to Calib
Hi there, wondering if any ideas on the following.
I'm running calib installed with conda on a Linux server. I've successfully run the calib command and generated a cluster file. The problem occurs when I try to run calib_cons. The following is the full command and output:
calib_cons -c B5-testgz.cluster -q B5_3_1_S5_L001_R1_UMI_full.fastq B5_3_1_S5_L001_R2_UMI_full.fastq -t 4 -o B5gz-R1 B5gz-R2
Reading cluster file: B5-testgz.cluster
Reading fastq file: B5_3_1_S5_L001_R1_UMI_full.fastq
Writing output files: B5gz-R1
Illegal instruction (core dumped)
I end up with 6 empty files in my directory (.msa, .msa1, .fastq and .fast0/1/2).
A factor here is that some preprocessing occurred before the data was passed to me and the UMIs had been removed from the front of the read sequences - I have had to copy them back to the sequence start from the read header and put dummy characters (!!!!!!!!!!!) in the quality scores line so that the lengths match the sequences. Not sure if something here could be problematic. An example read:
@A01439:100:HC3VHDRX2:1:2101:2808:1000:CTTGCATCTTA 1:N:0:CGGCATTA+TGACTGAC
CTTGCATCTTACTTAAAAACCTACAAATGAAACCCAGCATGCATACACACACCCCTCCATACCCTCACATAAATTATATATACCCTTATCTATACTAACTATAAAATGTAT
+
!!!!!!!!!!!F:F:FFF,:F,:F::FFFF::FF:F,,FFFFF:FFF,F:FF,:FFFFF,FFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFF
results of head B5-testgz.cluster if it helps:
26551449 34947546 10 @A01439:100:HC3VHDRX2:1:2101:12319:1000:GGAAACTGTCT GGAAACTGTCTAGTCAGTTTTCTAAATCTATAATGGAAAAGAAAATCGAATCTCGTCTTTATTTTTAAAAAGGGAAGGATGTTCAAGATCGGAAGAGCACACGTCTGAA !!!!!!!!!!!FF::FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @A01439:100:HC3VHDRX2:1:2101:12319:1000:GGAAACTGTCT GGAAACTGTCTTGAACATCCTTCCCTTTTTAAAAATAAAGACGAGATTCGATTTTCTTTTCCATTATAGATTTAGAAAACTGACTTAGATCGGAAGAGCGTCGTGTAGGGAA !!!!!!!!!!!FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 11104884 14491263 46 @A01439:100:HC3VHDRX2:1:2101:6777:1016:CGGATTCATCA CGGATTCATCAGTACTGGAAAGTCCCATTTTTCTCTGCACTGAACAGCCAGAAAAAGAAACAACGTTTCTAACTTAATTGGCTAGATCGGAAGAGCACACGTCTGAACTC !!!!!!!!!!!,,FF:FFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @A01439:100:HC3VHDRX2:1:2101:6777:1016:CGGATTCATCA CGGATTCATCAAGCCAATTAAGTTAGAAACGTTGTTTCTTTTTCTGGCTGTTCAGTGCAGAGAAAAATGGGACTTTCCAGTACAAGATCGGAAGAGCGTCGTGTTGGGAAAG !!!!!!!!!!!FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFF,FF,,FF,,:F 15253236 19933987 47 @A01439:100:HC3VHDRX2:1:2101:7229:1016:GGAAATCGGTT GGAAATCGGTTGCATAACACAGCAGAGCCACTATGAAATTCAGCTCTTATAGCAAACATTTAAATGATTTTTGTTGGATATTTTCTCTCAGTTGGCATGTGAACAAATGTG !!!!!!!!!!!F,,,FFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @A01439:100:HC3VHDRX2:1:2101:7229:1016:GGAAATCGGTT GGAAATCGGTTCACATTTGTTCACATGCCAACTGAGAGAAAATATCCAACAAAAATCATTTAAATGTTTGCTATAAGAGCTGAATTTCATAGTGGCTCTGCTGTGTTATG !!!!!!!!!!!FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
any thoughts on what may be leading to this error?
The FASTQ was generated as follows:
One DNA molecule was attached to different adaptors, resulting in two DNA molecules, one from each strand. The two resulting DNA molecules were sequenced and received different barcodes, e.g. mate pairs from the + strand are tagged with barcode ATG-----TTA, while those from the - strand are tagged with barcode TTA------ATG.
We want to call a consensus from mate pairs from both strands.
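One way to think about grouping the two strands is to give both barcode orientations the same key. The sketch below does that by ordering the two barcode halves; it is not a Calib feature, the helper names are hypothetical, and it does not handle the swap of R1 and R2 sequence content between strands.

```python
from collections import defaultdict


def duplex_key(left, right):
    """Return the same key for (left, right) and (right, left)."""
    return min((left, right), (right, left))


def group_by_duplex(pairs):
    """Group (left_barcode, right_barcode, read_name) tuples by duplex key."""
    groups = defaultdict(list)
    for left, right, name in pairs:
        groups[duplex_key(left, right)].append(name)
    return dict(groups)


pairs = [("ATG", "TTA", "plus_1"), ("TTA", "ATG", "minus_1"), ("CCC", "GGG", "other")]
print(group_by_duplex(pairs))
# {('ATG', 'TTA'): ['plus_1', 'minus_1'], ('CCC', 'GGG'): ['other']}
```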
Hi,
I have two 1.2 GB FASTQ (R1, R2) containing less than 3.5 million reads.
Despite the dataset being small, the tool uses more than 128GB. Because of this, I have yet to find a setting where calib runs to completion.
I currently use these settings:
calib --threads 1 --error-tolerance 1 --kmer-size 8 --minimizer-count 7 --minimizer-threshold 2 -f $R1_UNPACKED -r $R2_UNPACKED -o $CALIB_OUT_PREFIX -l $BARCODE_LENGTH
Where BARCODE_LENGTH is 8 in this case. Note that these settings are the defaults for 150 bp reads, but the error tolerance for the barcodes has been set to 1. Since there are only 65536 barcodes, all possible barcodes are used.
I am currently tweaking the settings, but I wonder if there is any advice on how to run this tool on this sort of data. Thanks!
I installed calib via conda and tried two different sets of paired-end FASTQ files with a barcode only in Read 1. On the first set of files, I ran without providing parameters and got a segmentation fault.
% calib -f test_illumina_50000_read1.fastq -r test_illumina_50000_read2.fastq -l1 8 -l2 0 -o my_out. --no-sort
No error or minimizer parameters passed. Selecting parameters based on barcode and inferred read length
Inferred read length 250 from sample of 10000 reads
Selected paramters for (mean) barcode length 4 are:
error_tolerance 1
kmer_size 8
minimizer_count 7
minimizer_threshold 2
Extracting minimizers and barcodes...
Memory before reading FASTQ:
zsh: segmentation fault calib -f test_illumina_50000_read1.fastq -r -l1 8 -l2 0 -o --no-sor
The first few lines of the files tested above looked like this:
% head -4 test_illumina_50000_read1.fastq
@0
CTGTGACGTGAGGAGACGGTGACCGTGGTCCCTTGGCCCCACGCAGATTCCTTTGTATCGGTGTTCCGGTTGGATAAAGGGTACCTCGCTGAACAGTAATACACGGCCGTGTCCTCAGATCTCAGGCTGCTCAGCTCCATGTAGGCTGTGCTTATGGAGGTGTTCCTGGTGATGGTGACTCTGCCCTGGAACTTCTGTGCATAGCCGAAGAACGCATGAGTTGTCTCCCATCCCATCCACTCAAGCCCTT
+
=B>B=8:9B@@B9=:=98@=@<<BB;B=9A<9@;88<8B<BA;B?;9B=<8<@;>9>BA:B>@A<A99=8>@??B9B8A;<=9B=@B9;==9@@;B<:;9;<<@<::?89>=>8:8:99<@<;8?>@B;<A88?B>:B>@??;9A99A88<<?B@A>A?A9A;9A<:<:9<B=B9;A:8<A@89<@A;??8B9:@8=BB>;8?BA<<<<@8>=<8@<@B9=<8?:<<;:898:@;9<=?:BAA8AB><BA
% head -4 test_illumina_50000_read2.fastq
@0
GCTCTCAGCAGGTGCAGCTGGTGCTGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTTACCCCAGATGCATGCTCCTACTCGCGATCAACTGGGTGCGACAGGCTACTGGACAAGGGCTTGAGTGGATGGGATGGGAGACAACTCATGCGTTCTTCGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATCACCAGGAACACCTCCATAAGCACAGCCTACAT
+
AAA@<9@?88?<;;BB8?<?;8:>=8>B8;:?>A=@9=8A;8=:>9:B?:>:=8?>;;A;<8=?=;8;9B=>9@8@=<9<8=@?8@><A;>A=?=?>>;A9<;;:9:=88=9;;>8<;<;@@:<:>><:<:;8AB;>:<;9A9::=><>=B?>9>=<>:9<<:B9?A>B=9?88A@=AB=;?B?A8::9?9B8?<;?>:@=8<==8?B=?B8=8?;8=>;:=<=>8B9::A??9??@9;9>8>9>B;A;9
Next, I tried a new set of paired-end reads (10,010 reads) and the selected parameters were all -1:
% calib -f LM_PCR_sample_reads_R1.fastq -r LM_PCR_sample_reads_R2.fastq -l1 8 -l2 0 -o my_out.
No error or minimizer parameters passed. Selecting parameters based on barcode and inferred read length
Inferred read length 301 from sample of 10000 reads
Selected paramters for (mean) barcode length 4 are:
error_tolerance -1
kmer_size -1
minimizer_count -1
minimizer_threshold -1
Missing clustering error and minimizer parameters!
Calib: Clustering without alignment using LSH and MinHashing of barcoded reads
Usage: calib [--PARAMETER VALUE]
Example: calib -f R1.fastq -r R2.fastq -o my_out. -e 1 -l 8 -m 5 -t 2 -k 4 --silent
Calib's paramters arguments:
-f --input-forward (type: string; REQUIRED paramter)
-r --input-reverse (type: string; REQUIRED paramter)
-o --output-prefix (type: string; REQUIRED paramter)
-s --silent (type: no value; default: unset)
-q --no-sort (type: no value; default: unset)
-g --gzip-input (type: no value; default: unset)
-l --barcode-length (type: int; REQUIRED paramter unless -l1 and -l2 are provided)
-l1 --barcode-length-1 (type: int; REQUIRED paramter unless -l is provided)
-l2 --barcode-length-2 (type: int; REQUIRED paramter unless -l is provided)
-p --ignored-sequence-prefix-length (type: int; default: 0)
-m --minimizer-count (type: int; default: Depends on observed read length;)
-k --kmer-size (type: int; default: Depends on observed read length;)
-e --error-tolerance (type: int; default: Depends on observed read length;)
-t --minimizer-threshold (type: int; default: Depends on observed read length;)
-c --threads (type: int; default: 1)
-h --help
Then, I tried providing my own parameters but got a segmentation fault.
% calib -f LM_PCR_sample_reads_R1.fastq -r LM_PCR_sample_reads_R2.fastq -l1 8 -l2 0 -o my_out. -m 7 -k 8 -e 1 -t 2
Extracting minimizers and barcodes...
Memory before reading FASTQ:
zsh: segmentation fault calib -f LM_PCR_sample_reads_R1.fastq -r LM_PCR_sample_reads_R2.fastq -l1 8
The first few lines of the files tested above looked like this:
% head -4 LM_PCR_sample_reads_R1.fastq
@M03525:380:000000000-CDJ38:1:1101:16781:1441 1:N:0:TTCTGCCT
CGGCTTACAATTCCTGCGACTATTTCCCTTTCCTCCGCTTAAGGGCCTAGGAGTCCGTTGTTGGCATGGTTGCAGTTCCTGGTGGCGTGTTGTGTTGACACGTTCTCTAGAACGCATGCTGCGGAGCAGATGGTTCCGAGGCAGCCACGCTGAGGAAATGCTGTGTGCCTCATGCTAGAGATTTTCCACACTGACTAAAAGGGTCTTATAGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTCTGCCTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAATATCACGCCACTC
+
A-,@@@8E,,CCFFE<@7+8@+CEEEEEFFFFEEFF7CCC7,,,,,8B6,,,,CEE7FG7FE,,C,C8,EF7E9<EFEEFC8C8,C+C:FF8E7EF8,B?F7EEFFEFC9,?E7F@FCC?9BC7B7><F,?,8EEEE7+4=C:7CF@FDEEC+>@8DCF9FF9E8F@FFCF7D9;EGFGCGGGGGGGGGGGGGGGGGFGGEGGGGGGGGFGGGGGGGGGGGCFGGGGGGGFGGGGGGGFGGFFFAFFF=FFFFFCBEAEDFFFFFBF?>B?DB>><AF::4<?0>FFF(4.64),(,((((
% head -4 LM_PCR_sample_reads_R2.fastq
@M03525:380:000000000-CDJ38:1:1101:16781:1441 2:N:0:TTCTGCCT
GCTATAAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCATGAGGCACACAGCATTTCCTCACCCTTCCTTCCTCTCTCCCCTCTCCTCCTCCTCCTCCCTTCTTCATTCCTTTTCCCCCCCCCACCCCCCCCCCCCCTCCCTCCCTCCCCACCACCCCCTCCTCTTCCCTTCCCCCCTCGTCCCCTTCCTATTCCCCCCCTTTGTCACCCCCCTTCCCCACTCCCTCCTCTTTGCCTCCCTTCTCTTTCTCCCTCCTCCCCCTCTCCTTCCCCATCCCCTCCTCCCCCTCCCCCCCCCCC
+
-ACCCGGGGGGGGGGGGGGGGGC9C<,,6,,;CBEF,,<,C,,,,:,;,;,,;,;CC6@;C,,;,6,,;;,;;66,,,,6:,689,96:9,6,,:,4,:,4599,,,,,,:,5,59,,4+8++6+6+64+84+++++33*6*,43*3,61*,1*,****4*64,,,4622,2***3**************2++23*2*))*)00*2***01))))*2***))*).).0)))/******)()))(0))**.)))(,(((((--((((,()))(((((((((((,(((,(,(,((((,(,((-
From a read cluster, we need to find the left consensus and the right consensus. Since the reads are highly similar, we can assume no indels, and those reads with indels will be highly corrected.
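Under the no-indel assumption, a consensus can be sketched as a per-column majority vote over equal-length reads, applied once to the left mates and once to the right mates. This is a simplified illustration of the idea and is not presented as what calib_cons does; `column_consensus` is a hypothetical name.

```python
from collections import Counter


def column_consensus(reads):
    """Majority base per column for equal-length reads (no indels assumed)."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must have equal length when no indels are assumed")
    # zip(*reads) yields one tuple of bases per column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


left_mates = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(column_consensus(left_mates))  # ACGTAC
```

Reads of different lengths raise an error here, which is where an alignment-based method would be needed instead.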
Hello,
I have 2 questions -
Thanks!
Dear Baraa,
A quick question. I understand that the parameters for k-mer size, minimizer count, error tolerance and minimizer threshold are set to defaults that depend on read length if I do not set them myself. Is there a way to find out these default parameters?
Thanks,
Wee
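The logs quoted in other issues on this page show calib printing the values it selected when no parameters are passed (the block starting with "Selected paramters for (mean) barcode length"). Assuming that output has been captured to a string, a small Python sketch can pull the four values out; `parse_selected_parameters` is a hypothetical helper, not part of Calib.

```python
import re

PARAM_LINE = re.compile(
    r"\s*(error_tolerance|kmer_size|minimizer_count|minimizer_threshold)\s+(-?\d+)\s*"
)


def parse_selected_parameters(log_text):
    """Extract the selected clustering parameters from captured calib output."""
    params = {}
    for line in log_text.splitlines():
        match = PARAM_LINE.fullmatch(line)
        if match:
            params[match.group(1)] = int(match.group(2))
    return params


log = """Inferred read length 250 from sample of 10000 reads
Selected paramters for (mean) barcode length 4 are:
error_tolerance 1
kmer_size 8
minimizer_count 7
minimizer_threshold 2
"""
print(parse_selected_parameters(log))
# {'error_tolerance': 1, 'kmer_size': 8, 'minimizer_count': 7, 'minimizer_threshold': 2}
```

Values of -1, as in one of the logs on this page, would indicate that no defaults were selected for that read length.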
It sucks to see that +90% of the code is HTML