Thanks. The table below is from a run with fewer samples that threw the same error. I don't see any red flags, but you may think otherwise.
Might the issue stem from this?
What if my forward and reverse reads aren’t in matching order?
This situation commonly arises when external filtering methods, like the QIIME demultiplexing script, are used and filter the forward and reverse reads independently. This can be remedied by adding the matchIDs=TRUE flag to the filterAndTrim or fastqPairedFilter functions. For example, if no more filtering is required, the following will retain just those reads that match between the forward and reverse fastq files (assumes Illumina fastq headers):
filterAndTrim(..., matchIDs=TRUE)
(see https://benjjneb.github.io/dada2/faq.html)
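Independently of DADA2, one way to see whether a pair of files is out of sync is to compare the read IDs of R1 and R2 line by line. The following is only a minimal sketch, assuming four-line FASTQ records and Illumina-style headers; read_ids and ids_in_sync are hypothetical helper names, not part of DADA2, seqfu, or dadaist2.

```shell
# Print the read IDs of a gzipped FASTQ file, one per line.
# Keeps the first whitespace-delimited field of each header line and
# strips a trailing /1 or /2 (older Illumina naming).
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Succeed (exit 0) only if both files list the same IDs in the same order.
ids_in_sync() {
    tmp_r1=$(mktemp) && tmp_r2=$(mktemp) || return 2
    read_ids "$1" > "$tmp_r1"
    read_ids "$2" > "$tmp_r2"
    cmp -s "$tmp_r1" "$tmp_r2"
    status=$?
    rm -f "$tmp_r1" "$tmp_r2"
    return "$status"
}
```

Usage would be something like ids_in_sync sample_R1.fastq.gz sample_R2.fastq.gz && echo "in sync"; a failure means the IDs differ in content, order, or number.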
┌────────────────────────────────────┬────────┬──────────┬───────┬─────┬─────┬─────┬───────┬─────┬─────┐
│ File │ #Seq │ Total bp │ Avg │ N50 │ N75 │ N90 │ auN │ Min │ Max │
├────────────────────────────────────┼────────┼──────────┼───────┼─────┼─────┼─────┼───────┼─────┼─────┤
│ 100102-S13_L001_R1_001.fastq.gz │ 139407 │ 41916488 │ 300.7 │ 301 │ 301 │ 300 │ 0.070 │ 35 │ 301 │
│ 100202-S14_L001_R1_001.fastq.gz │ 146864 │ 44170584 │ 300.8 │ 301 │ 301 │ 300 │ 0.055 │ 35 │ 301 │
│ 100203-S15_L001_R1_001.fastq.gz │ 139739 │ 42018747 │ 300.7 │ 301 │ 301 │ 301 │ 0.164 │ 35 │ 301 │
│ 100204-S16_L001_R1_001.fastq.gz │ 132438 │ 39803309 │ 300.5 │ 301 │ 301 │ 300 │ 0.122 │ 35 │ 301 │
│ 100205-S17_L001_R1_001.fastq.gz │ 144511 │ 43460952 │ 300.7 │ 301 │ 301 │ 300 │ 0.107 │ 35 │ 301 │
│ 100206-S18_L001_R1_001.fastq.gz │ 135101 │ 40573650 │ 300.3 │ 301 │ 301 │ 301 │ 0.210 │ 35 │ 301 │
│ 100502-S19_L001_R1_001.fastq.gz │ 154180 │ 46279307 │ 300.2 │ 301 │ 300 │ 299 │ 0.169 │ 35 │ 301 │
│ 100503-S20_L001_R1_001.fastq.gz │ 136254 │ 40876059 │ 300.0 │ 301 │ 300 │ 298 │ 0.196 │ 35 │ 301 │
│ 100504-S21_L001_R1_001.fastq.gz │ 149045 │ 44791183 │ 300.5 │ 301 │ 301 │ 299 │ 0.114 │ 35 │ 301 │
│ 100505-S22_L001_R1_001.fastq.gz │ 144673 │ 43404679 │ 300.0 │ 301 │ 300 │ 299 │ 0.188 │ 35 │ 301 │
│ 100506-S23_L001_R1_001.fastq.gz │ 139843 │ 41949009 │ 300.0 │ 301 │ 300 │ 299 │ 0.203 │ 35 │ 301 │
│ 200202-S94_L001_R1_001.fastq.gz │ 149611 │ 44664048 │ 298.5 │ 301 │ 301 │ 300 │ 0.202 │ 35 │ 301 │
│ 200203-S95_L001_R1_001.fastq.gz │ 103600 │ 31036418 │ 299.6 │ 301 │ 301 │ 300 │ 0.283 │ 35 │ 301 │
│ 200502-S7_L001_R1_001.fastq.gz │ 144901 │ 43509686 │ 300.3 │ 301 │ 300 │ 299 │ 0.157 │ 35 │ 301 │
│ 200503-S8_L001_R1_001.fastq.gz │ 137665 │ 41387170 │ 300.6 │ 301 │ 301 │ 300 │ 0.074 │ 35 │ 301 │
│ 200504-S9_L001_R1_001.fastq.gz │ 123302 │ 37034341 │ 300.4 │ 301 │ 301 │ 300 │ 0.222 │ 35 │ 301 │
│ 200505-S10_L001_R1_001.fastq.gz │ 117559 │ 35345816 │ 300.7 │ 301 │ 301 │ 300 │ 0.104 │ 35 │ 301 │
│ 200506-S11_L001_R1_001.fastq.gz │ 136077 │ 40842847 │ 300.1 │ 301 │ 300 │ 299 │ 0.185 │ 35 │ 301 │
│ 200599-S12_L001_R1_001.fastq.gz │ 113326 │ 33910575 │ 299.2 │ 301 │ 301 │ 300 │ 0.264 │ 35 │ 301 │
│ 200602-S134_L001_R1_001.fastq.gz │ 136183 │ 40852757 │ 300.0 │ 301 │ 300 │ 299 │ 0.206 │ 35 │ 301 │
│ 200603-S135_L001_R1_001.fastq.gz │ 97157 │ 29174080 │ 300.3 │ 301 │ 301 │ 299 │ 0.238 │ 35 │ 301 │
│ 200702-S2_L001_R1_001.fastq.gz │ 117177 │ 35234949 │ 300.7 │ 301 │ 301 │ 300 │ 0.093 │ 35 │ 301 │
│ 200703-S3_L001_R1_001.fastq.gz │ 139156 │ 41850711 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 86 │ 301 │
│ 200704-S4_L001_R1_001.fastq.gz │ 115559 │ 34510934 │ 298.6 │ 301 │ 301 │ 300 │ 0.260 │ 35 │ 301 │
│ 200705-S5_L001_R1_001.fastq.gz │ 128350 │ 38462972 │ 299.7 │ 301 │ 301 │ 300 │ 0.228 │ 35 │ 301 │
│ 200706-S6_L001_R1_001.fastq.gz │ 117212 │ 34748566 │ 296.5 │ 301 │ 301 │ 300 │ 0.260 │ 35 │ 301 │
│ 200802-S103_L001_R1_001.fastq.gz │ 101572 │ 30372892 │ 299.0 │ 301 │ 301 │ 301 │ 0.291 │ 35 │ 301 │
│ 200803-S104_L001_R1_001.fastq.gz │ 131863 │ 39583103 │ 300.2 │ 301 │ 301 │ 301 │ 0.214 │ 35 │ 301 │
│ 200899-S105_L001_R1_001.fastq.gz │ 127531 │ 38191634 │ 299.5 │ 301 │ 301 │ 300 │ 0.234 │ 35 │ 301 │
│ 200902-S136_L001_R1_001.fastq.gz │ 119614 │ 35851983 │ 299.7 │ 301 │ 301 │ 299 │ 0.244 │ 35 │ 301 │
│ 200904-S137_L001_R1_001.fastq.gz │ 137159 │ 41195209 │ 300.3 │ 301 │ 300 │ 299 │ 0.105 │ 35 │ 301 │
│ 200905-S138_L001_R1_001.fastq.gz │ 101718 │ 30472597 │ 299.6 │ 301 │ 301 │ 300 │ 0.289 │ 35 │ 301 │
│ 200906-S139_L001_R1_001.fastq.gz │ 116950 │ 35140104 │ 300.5 │ 301 │ 301 │ 299 │ 0.119 │ 35 │ 301 │
│ 201002-S1_L001_R1_001.fastq.gz │ 164983 │ 49509070 │ 300.1 │ 301 │ 299 │ 298 │ 0.123 │ 35 │ 301 │
│ 301102-S72_L001_R1_001.fastq.gz │ 111648 │ 33522225 │ 300.2 │ 301 │ 301 │ 300 │ 0.248 │ 35 │ 301 │
│ 301103-S73_L001_R1_001.fastq.gz │ 134229 │ 40310675 │ 300.3 │ 301 │ 301 │ 300 │ 0.201 │ 35 │ 301 │
│ 301104-S74_L001_R1_001.fastq.gz │ 159954 │ 48021586 │ 300.2 │ 301 │ 300 │ 299 │ 0.162 │ 35 │ 301 │
│ 301105-S75_L001_R1_001.fastq.gz │ 149240 │ 44760267 │ 299.9 │ 301 │ 300 │ 299 │ 0.194 │ 35 │ 301 │
│ 301202-S76_L001_R1_001.fastq.gz │ 129360 │ 38852367 │ 300.3 │ 301 │ 301 │ 299 │ 0.187 │ 35 │ 301 │
│ 301203-S77_L001_R1_001.fastq.gz │ 129759 │ 38979143 │ 300.4 │ 301 │ 301 │ 299 │ 0.144 │ 35 │ 301 │
│ 301204-S78_L001_R1_001.fastq.gz │ 124516 │ 37406603 │ 300.4 │ 301 │ 301 │ 299 │ 0.166 │ 35 │ 301 │
│ 301205-S79_L001_R1_001.fastq.gz │ 110261 │ 33139525 │ 300.6 │ 301 │ 301 │ 300 │ 0.163 │ 35 │ 301 │
│ 301206-S80_L001_R1_001.fastq.gz │ 125896 │ 37834323 │ 300.5 │ 301 │ 301 │ 300 │ 0.162 │ 35 │ 301 │
│ 301299-S81_L001_R1_001.fastq.gz │ 112832 │ 33872487 │ 300.2 │ 301 │ 300 │ 299 │ 0.205 │ 35 │ 301 │
│ 303102-S38_L001_R1_001.fastq.gz │ 126745 │ 38103308 │ 300.6 │ 301 │ 301 │ 300 │ 0.160 │ 35 │ 301 │
│ 303202-S39_L001_R1_001.fastq.gz │ 127925 │ 37997809 │ 297.0 │ 301 │ 301 │ 300 │ 0.238 │ 35 │ 301 │
│ 303203-S40_L001_R1_001.fastq.gz │ 136279 │ 40725166 │ 298.8 │ 301 │ 301 │ 300 │ 0.220 │ 35 │ 301 │
│ 303204-S41_L001_R1_001.fastq.gz │ 111187 │ 33332312 │ 299.8 │ 301 │ 301 │ 299 │ 0.264 │ 35 │ 301 │
│ 303205-S42_L001_R1_001.fastq.gz │ 120717 │ 35669441 │ 295.5 │ 301 │ 301 │ 299 │ 0.255 │ 35 │ 301 │
│ 303206-S43_L001_R1_001.fastq.gz │ 135175 │ 40419603 │ 299.0 │ 301 │ 301 │ 300 │ 0.222 │ 35 │ 301 │
│ 303299-S44_L001_R1_001.fastq.gz │ 116796 │ 34984447 │ 299.5 │ 301 │ 301 │ 299 │ 0.253 │ 35 │ 301 │
│ 303402-S45_L001_R1_001.fastq.gz │ 119797 │ 35858803 │ 299.3 │ 301 │ 299 │ 298 │ 0.240 │ 35 │ 301 │
│ 303403-S46_L001_R1_001.fastq.gz │ 186419 │ 55979250 │ 300.3 │ 301 │ 301 │ 299 │ 0.148 │ 35 │ 301 │
│ 303404-S47_L001_R1_001.fastq.gz │ 118567 │ 35652038 │ 300.7 │ 301 │ 301 │ 300 │ 0.078 │ 35 │ 301 │
│ 303405-S48_L001_R1_001.fastq.gz │ 116310 │ 34969402 │ 300.7 │ 301 │ 301 │ 300 │ 0.088 │ 35 │ 301 │
│ 303406-S49_L001_R1_001.fastq.gz │ 115016 │ 34398140 │ 299.1 │ 301 │ 300 │ 299 │ 0.258 │ 35 │ 301 │
│ 400102-S84_L001_R1_001.fastq.gz │ 134777 │ 37410202 │ 277.6 │ 301 │ 301 │ 257 │ 0.244 │ 35 │ 301 │
│ 400103-S85_L001_R1_001.fastq.gz │ 143919 │ 41743878 │ 290.1 │ 301 │ 301 │ 299 │ 0.218 │ 35 │ 301 │
│ 400104-S86_L001_R1_001.fastq.gz │ 124926 │ 36977294 │ 296.0 │ 301 │ 300 │ 299 │ 0.245 │ 35 │ 301 │
│ 400105-S87_L001_R1_001.fastq.gz │ 114701 │ 33863620 │ 295.2 │ 301 │ 301 │ 300 │ 0.268 │ 35 │ 301 │
│ 400106-S88_L001_R1_001.fastq.gz │ 156059 │ 45877631 │ 294.0 │ 301 │ 301 │ 299 │ 0.198 │ 35 │ 301 │
│ 400302-S89_L001_R1_001.fastq.gz │ 146847 │ 44060983 │ 300.0 │ 301 │ 301 │ 300 │ 0.194 │ 35 │ 301 │
│ 400303-S90_L001_R1_001.fastq.gz │ 152645 │ 45637805 │ 299.0 │ 301 │ 299 │ 298 │ 0.198 │ 35 │ 301 │
│ 400304-S91_L001_R1_001.fastq.gz │ 139780 │ 41925656 │ 299.9 │ 301 │ 301 │ 300 │ 0.210 │ 35 │ 301 │
│ 400306-S92_L001_R1_001.fastq.gz │ 149711 │ 44906057 │ 300.0 │ 301 │ 300 │ 298 │ 0.181 │ 35 │ 301 │
│ 400399-S93_L001_R1_001.fastq.gz │ 131503 │ 39519530 │ 300.5 │ 301 │ 301 │ 301 │ 0.172 │ 35 │ 301 │
│ 400402-S97_L001_R1_001.fastq.gz │ 121356 │ 36378975 │ 299.8 │ 301 │ 301 │ 300 │ 0.244 │ 35 │ 301 │
│ 400403-S98_L001_R1_001.fastq.gz │ 138126 │ 41429581 │ 299.9 │ 301 │ 300 │ 299 │ 0.206 │ 35 │ 301 │
│ 400404-S99_L001_R1_001.fastq.gz │ 145812 │ 43816556 │ 300.5 │ 301 │ 301 │ 300 │ 0.176 │ 35 │ 301 │
│ 400405-S100_L001_R1_001.fastq.gz │ 130846 │ 39087045 │ 298.7 │ 301 │ 301 │ 300 │ 0.231 │ 35 │ 301 │
│ 400406-S96_L001_R1_001.fastq.gz │ 122805 │ 36758431 │ 299.3 │ 301 │ 301 │ 300 │ 0.243 │ 35 │ 301 │
│ 400498-S102_L001_R1_001.fastq.gz │ 117750 │ 35196667 │ 298.9 │ 301 │ 301 │ 300 │ 0.253 │ 35 │ 301 │
│ 400499-S101_L001_R1_001.fastq.gz │ 128036 │ 38396715 │ 299.9 │ 301 │ 301 │ 300 │ 0.229 │ 35 │ 301 │
│ 401502-S125_L001_R1_001.fastq.gz │ 133535 │ 40072003 │ 300.1 │ 301 │ 301 │ 299 │ 0.197 │ 35 │ 301 │
│ 401503-S126_L001_R1_001.fastq.gz │ 123937 │ 37223689 │ 300.3 │ 301 │ 301 │ 299 │ 0.167 │ 35 │ 301 │
│ 401504-S127_L001_R1_001.fastq.gz │ 110292 │ 33108766 │ 300.2 │ 301 │ 300 │ 299 │ 0.202 │ 35 │ 301 │
│ 401505-S128_L001_R1_001.fastq.gz │ 117017 │ 35141339 │ 300.3 │ 301 │ 300 │ 299 │ 0.131 │ 35 │ 301 │
│ 401506-S129_L001_R1_001.fastq.gz │ 95721 │ 28731675 │ 300.2 │ 301 │ 301 │ 299 │ 0.225 │ 35 │ 301 │
│ 401902-S119_L001_R1_001.fastq.gz │ 134605 │ 40408859 │ 300.2 │ 301 │ 301 │ 300 │ 0.212 │ 35 │ 301 │
│ 401903-S120_L001_R1_001.fastq.gz │ 125445 │ 37659002 │ 300.2 │ 301 │ 301 │ 300 │ 0.209 │ 35 │ 301 │
│ 401904-S121_L001_R1_001.fastq.gz │ 131128 │ 39239268 │ 299.2 │ 301 │ 301 │ 299 │ 0.228 │ 35 │ 301 │
│ 401905-S122_L001_R1_001.fastq.gz │ 80284 │ 24079942 │ 299.9 │ 301 │ 301 │ 300 │ 0.356 │ 35 │ 301 │
│ 401906-S123_L001_R1_001.fastq.gz │ 128938 │ 38727296 │ 300.4 │ 301 │ 301 │ 301 │ 0.212 │ 35 │ 301 │
│ 401999-S124_L001_R1_001.fastq.gz │ 132643 │ 39690036 │ 299.2 │ 301 │ 301 │ 300 │ 0.227 │ 35 │ 301 │
│ 402202-S62_L001_R1_001.fastq.gz │ 105353 │ 31660054 │ 300.5 │ 301 │ 301 │ 300 │ 0.245 │ 35 │ 301 │
│ 402203-S63_L001_R1_001.fastq.gz │ 95134 │ 28606171 │ 300.7 │ 301 │ 301 │ 300 │ 0.167 │ 35 │ 301 │
│ 402204-S64_L001_R1_001.fastq.gz │ 107403 │ 32295082 │ 300.7 │ 301 │ 301 │ 300 │ 0.116 │ 35 │ 301 │
│ 402205-S65_L001_R1_001.fastq.gz │ 92203 │ 27729694 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 35 │ 301 │
│ 402206-S66_L001_R1_001.fastq.gz │ 107835 │ 32431189 │ 300.7 │ 301 │ 301 │ 301 │ 0.145 │ 35 │ 301 │
│ 402602-S67_L001_R1_001.fastq.gz │ 119751 │ 35964665 │ 300.3 │ 301 │ 301 │ 301 │ 0.232 │ 35 │ 301 │
│ 402603-S68_L001_R1_001.fastq.gz │ 115001 │ 34480055 │ 299.8 │ 301 │ 301 │ 300 │ 0.255 │ 35 │ 301 │
│ 402604-S69_L001_R1_001.fastq.gz │ 104178 │ 31215971 │ 299.6 │ 301 │ 301 │ 301 │ 0.280 │ 35 │ 301 │
│ 402605-S70_L001_R1_001.fastq.gz │ 143556 │ 43097511 │ 300.2 │ 301 │ 301 │ 300 │ 0.200 │ 35 │ 301 │
│ 402606-S71_L001_R1_001.fastq.gz │ 136853 │ 41150225 │ 300.7 │ 301 │ 301 │ 300 │ 0.055 │ 35 │ 301 │
│ 404302-S130_L001_R1_001.fastq.gz │ 132182 │ 39637404 │ 299.9 │ 301 │ 301 │ 300 │ 0.219 │ 35 │ 301 │
│ 404303-S131_L001_R1_001.fastq.gz │ 124872 │ 37514339 │ 300.4 │ 301 │ 301 │ 300 │ 0.185 │ 35 │ 301 │
│ 404304-S132_L001_R1_001.fastq.gz │ 111924 │ 33600617 │ 300.2 │ 301 │ 301 │ 300 │ 0.247 │ 35 │ 301 │
│ 404305-S133_L001_R1_001.fastq.gz │ 96481 │ 28591819 │ 296.3 │ 301 │ 300 │ 299 │ 0.316 │ 35 │ 301 │
│ 405202-S107_L001_R1_001.fastq.gz │ 117201 │ 35011277 │ 298.7 │ 301 │ 300 │ 299 │ 0.256 │ 35 │ 301 │
│ 405203-S108_L001_R1_001.fastq.gz │ 127049 │ 37630557 │ 296.2 │ 301 │ 301 │ 301 │ 0.241 │ 35 │ 301 │
│ 405204-S109_L001_R1_001.fastq.gz │ 103407 │ 30171591 │ 291.8 │ 301 │ 300 │ 299 │ 0.302 │ 35 │ 301 │
│ 405205-S110_L001_R1_001.fastq.gz │ 94724 │ 27638705 │ 291.8 │ 301 │ 300 │ 293 │ 0.329 │ 35 │ 301 │
│ 405206-S111_L001_R1_001.fastq.gz │ 112195 │ 33399068 │ 297.7 │ 301 │ 300 │ 299 │ 0.266 │ 35 │ 301 │
│ 405302-S112_L001_R1_001.fastq.gz │ 108441 │ 32607145 │ 300.7 │ 301 │ 301 │ 301 │ 0.179 │ 35 │ 301 │
│ 405303-S113_L001_R1_001.fastq.gz │ 120154 │ 36060437 │ 300.1 │ 301 │ 300 │ 298 │ 0.207 │ 35 │ 301 │
│ 405304-S114_L001_R1_001.fastq.gz │ 121694 │ 36538880 │ 300.3 │ 301 │ 300 │ 299 │ 0.180 │ 35 │ 301 │
│ 405305-S115_L001_R1_001.fastq.gz │ 100204 │ 30032416 │ 299.7 │ 301 │ 300 │ 299 │ 0.282 │ 35 │ 301 │
│ 405306-S116_L001_R1_001.fastq.gz │ 103412 │ 30959775 │ 299.4 │ 301 │ 300 │ 298 │ 0.250 │ 35 │ 301 │
│ 405602-S50_L001_R1_001.fastq.gz │ 144151 │ 43324535 │ 300.5 │ 301 │ 301 │ 300 │ 0.117 │ 35 │ 301 │
│ 405603-S51_L001_R1_001.fastq.gz │ 143343 │ 43079313 │ 300.5 │ 301 │ 301 │ 299 │ 0.064 │ 35 │ 301 │
│ 405604-S52_L001_R1_001.fastq.gz │ 131212 │ 39413412 │ 300.4 │ 301 │ 301 │ 299 │ 0.158 │ 35 │ 301 │
│ 405605-S53_L001_R1_001.fastq.gz │ 124667 │ 37467053 │ 300.5 │ 301 │ 301 │ 300 │ 0.122 │ 35 │ 301 │
│ 405606-S54_L001_R1_001.fastq.gz │ 146036 │ 43867444 │ 300.4 │ 301 │ 300 │ 299 │ 0.072 │ 35 │ 301 │
│ 405699-S55_L001_R1_001.fastq.gz │ 128499 │ 38596090 │ 300.4 │ 301 │ 301 │ 300 │ 0.206 │ 35 │ 301 │
│ 406002-S56_L001_R1_001.fastq.gz │ 124471 │ 37393565 │ 300.4 │ 301 │ 301 │ 299 │ 0.105 │ 35 │ 301 │
│ 406003-S57_L001_R1_001.fastq.gz │ 103812 │ 31184093 │ 300.4 │ 301 │ 301 │ 299 │ 0.141 │ 1 │ 301 │
│ 406004-S58_L001_R1_001.fastq.gz │ 138285 │ 41542957 │ 300.4 │ 301 │ 301 │ 300 │ 0.183 │ 35 │ 301 │
│ 406005-S59_L001_R1_001.fastq.gz │ 143149 │ 43041084 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 35 │ 301 │
│ 406006-S60_L001_R1_001.fastq.gz │ 131859 │ 39561171 │ 300.0 │ 301 │ 300 │ 299 │ 0.213 │ 35 │ 301 │
│ 406502-S25_L001_R1_001.fastq.gz │ 126658 │ 38064498 │ 300.5 │ 301 │ 301 │ 300 │ 0.076 │ 35 │ 301 │
│ 406503-S26_L001_R1_001.fastq.gz │ 127714 │ 38200024 │ 299.1 │ 301 │ 301 │ 300 │ 0.236 │ 35 │ 301 │
│ 406504-S27_L001_R1_001.fastq.gz │ 113363 │ 33949930 │ 299.5 │ 301 │ 301 │ 300 │ 0.262 │ 35 │ 301 │
│ 406505-S28_L001_R1_001.fastq.gz │ 111854 │ 33626963 │ 300.6 │ 301 │ 301 │ 300 │ 0.074 │ 35 │ 301 │
│ 406506-S29_L001_R1_001.fastq.gz │ 127794 │ 38409856 │ 300.6 │ 301 │ 301 │ 300 │ 0.090 │ 35 │ 301 │
│ 407002-S30_L001_R1_001.fastq.gz │ 124120 │ 37203784 │ 299.7 │ 301 │ 301 │ 300 │ 0.236 │ 35 │ 301 │
│ 407003-S31_L001_R1_001.fastq.gz │ 115886 │ 34664346 │ 299.1 │ 301 │ 301 │ 300 │ 0.255 │ 35 │ 301 │
│ 407004-S32_L001_R1_001.fastq.gz │ 127180 │ 38136953 │ 299.9 │ 301 │ 301 │ 300 │ 0.229 │ 35 │ 301 │
│ 407005-S33_L001_R1_001.fastq.gz │ 107374 │ 32201894 │ 299.9 │ 301 │ 301 │ 300 │ 0.265 │ 35 │ 301 │
│ 407006-S34_L001_R1_001.fastq.gz │ 100642 │ 30124806 │ 299.3 │ 301 │ 300 │ 300 │ 0.288 │ 35 │ 301 │
│ 407099-S35_L001_R1_001.fastq.gz │ 85509 │ 25620473 │ 299.6 │ 301 │ 301 │ 300 │ 0.337 │ 41 │ 301 │
│ NG23-S24_L001_R1_001.fastq.gz │ 6534 │ 1923506 │ 294.4 │ 301 │ 301 │ 300 │ 0.347 │ 35 │ 301 │
│ NG24-S36_L001_R1_001.fastq.gz │ 5067 │ 1485871 │ 293.2 │ 301 │ 301 │ 301 │ 0.293 │ 35 │ 301 │
│ NG27-S82_L001_R1_001.fastq.gz │ 2522 │ 698269 │ 276.9 │ 301 │ 301 │ 301 │ 0.684 │ 35 │ 301 │
│ NG28-S106_L001_R1_001.fastq.gz │ 2492 │ 571390 │ 229.3 │ 301 │ 301 │ 300 │ 0.931 │ 35 │ 301 │
│ NG30-S61_L001_R1_001.fastq.gz │ 4866 │ 1387137 │ 285.1 │ 301 │ 301 │ 301 │ 0.501 │ 35 │ 301 │
│ NG32-S117_L001_R1_001.fastq.gz │ 3554 │ 733993 │ 206.5 │ 301 │ 301 │ 299 │ 0.818 │ 35 │ 301 │
│ NG38-S140_L001_R1_001.fastq.gz │ 2425 │ 482015 │ 198.8 │ 301 │ 301 │ 299 │ 1.057 │ 35 │ 301 │
│ PCRNEG-1-S37_L001_R1_001.fastq.gz │ 6316 │ 1735958 │ 274.9 │ 301 │ 301 │ 301 │ 0.649 │ 35 │ 301 │
│ PCRNEG-2-S83_L001_R1_001.fastq.gz │ 2122 │ 626119 │ 295.1 │ 301 │ 301 │ 301 │ 0.931 │ 35 │ 301 │
│ PCRNEG-3-S118_L001_R1_001.fastq.gz │ 3309 │ 832711 │ 251.7 │ 301 │ 301 │ 300 │ 0.857 │ 35 │ 301 │
│ PCRNEG-4-S142_L001_R1_001.fastq.gz │ 2639 │ 464063 │ 175.8 │ 301 │ 301 │ 297 │ 1.340 │ 35 │ 301 │
│ POS-1-S141_L001_R1_001.fastq.gz │ 135504 │ 40585861 │ 299.5 │ 301 │ 301 │ 300 │ 0.163 │ 35 │ 301 │
│ QEB-1-S143_L001_R1_001.fastq.gz │ 1535 │ 152569 │ 99.4 │ 301 │ 44 │ 35 │ 3.335 │ 35 │ 301 │
└────────────────────────────────────┴────────┴──────────┴───────┴─────┴─────┴─────┴───────┴─────┴─────┘
from dadaist2.
Please see attached -- thanks!
from dadaist2.
Thanks very much. I will try both solutions.
Before you posted this, as I had suspected, I solved the issue by running dada2 using the matchIDs=TRUE option of the filterAndTrim command. At first glance, the taxonomic classification results make sense.
from dadaist2.
Thank you. Please see the attached log of dadaist2 1.2.2a run on the input_1 files (minus the Sample115* files):
from dadaist2.
Hello, I would be most grateful if you could provide some more details to help check how that happened.
- What OS are you using?
- What version of Dadaist2?
- Can you attach the log?
- Can you paste the total read counts from seqfu stats -n reads/*.fastq.gz to give an overview of the number of samples and their depth?
from dadaist2.
- OS: Ubuntu 14.04 LTS
- dadaist2 version: [1.1.0]
- dadaist.log
- There are 1262 samples, so I'm just pasting the first few (the story is the same for all of them):
┌─────────────────────────────────────────────┬────────┬──────────┬───────┬─────┬─────┬─────┬────────┬─────┬─────┐
│ File │ #Seq │ Total bp │ Avg │ N50 │ N75 │ N90 │ auN │ Min │ Max │
├─────────────────────────────────────────────┼────────┼──────────┼───────┼─────┼─────┼─────┼────────┼─────┼─────┤
│ ./0300903-27022018-S37_L001_R1_001.fastq.gz │ 132736 │ 39908375 │ 300.7 │ 301 │ 301 │ 300 │ 0.136 │ 35 │ 301 │
│ ./0300903-27022018-S37_L001_R2_001.fastq.gz │ 132736 │ 39873083 │ 300.4 │ 301 │ 300 │ 300 │ 0.131 │ 35 │ 301 │
│ ./0300904-06042018-S38_L001_R1_001.fastq.gz │ 125201 │ 37500347 │ 299.5 │ 301 │ 301 │ 300 │ 0.237 │ 35 │ 301 │
│ ./0300904-06042018-S38_L001_R2_001.fastq.gz │ 125201 │ 37473176 │ 299.3 │ 301 │ 300 │ 300 │ 0.236 │ 35 │ 301 │
│ ./0404504-2019-S98_L001_R1_001.fastq.gz │ 146585 │ 43893185 │ 299.4 │ 301 │ 301 │ 299 │ 0.199 │ 35 │ 301 │
│ ./0404504-2019-S98_L001_R2_001.fastq.gz │ 146585 │ 43884601 │ 299.4 │ 301 │ 300 │ 300 │ 0.199 │ 35 │ 301 │
│ ./100102-S13_L001_R1_001.fastq.gz │ 139407 │ 41916488 │ 300.7 │ 301 │ 301 │ 300 │ 0.070 │ 35 │ 301 │
│ ./100102-S13_L001_R2_001.fastq.gz │ 139407 │ 41890087 │ 300.5 │ 301 │ 300 │ 300 │ 0.054 │ 35 │ 301 │
│ ./100102-S48_L001_R1_001.fastq.gz │ 80043 │ 23353810 │ 291.8 │ 301 │ 300 │ 289 │ 0.387 │ 35 │ 301 │
│ ./100102-S48_L001_R2_001.fastq.gz │ 80043 │ 23380877 │ 292.1 │ 301 │ 300 │ 291 │ 0.386 │ 35 │ 301 │
│ ./100102-S68_L001_R1_001.fastq.gz │ 102856 │ 30910462 │ 300.5 │ 301 │ 301 │ 299 │ 0.032 │ 71 │ 301 │
│ ./100102-S68_L001_R2_001.fastq.gz │ 102856 │ 30912377 │ 300.5 │ 301 │ 300 │ 300 │ 0.032 │ 71 │ 301 │
│ ./100202-S14_L001_R1_001.fastq.gz │ 146864 │ 44170584 │ 300.8 │ 301 │ 301 │ 300 │ 0.055 │ 35 │ 301 │
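│ ./100202-S14_L001_R2_001.fastq.gz │ 146864 │ 44137972 │ 300.5 │ 301 │ 300 │ 300 │ 0.049 │ 35 │ 301 │
└─────────────────────────────────────────────┴────────┴──────────┴───────┴─────┴─────┴─────┴────────┴─────┴─────┘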
from dadaist2.
If you still have the temp dir, can you also count the reads from /tmp/dadaist2_ZLjc61/for/*gz and /tmp/dadaist2_ZLjc61/rev/*gz?
Running the pipeline with --debug forces the temporary directories to be kept in any case.
from dadaist2.
Sorry, but a somewhat related question: if the above cannot be solved, using R1-data only may be enough for my purposes. However, I could not find a combination of parameters that would allow me to do so. Is this possible? Thanks very much.
from dadaist2.
Thanks for your help reporting this, hope a fix will be out this week.
Single-end mode was never implemented because the last time I saw a single-end dataset was a long time ago, so dadaist was born quite opinionated :) It is on the radar but will probably come later.
from dadaist2.
I just pushed an update that should fix your problem, which I suppose can be somehow filesystem related (by default the order of files should be the same for FOR and REV).
The new version 1.2.1 is now on github and should be available also via BioConda in ~ 24 hours; I would be most grateful if you could test it as I could not easily reproduce the problem!
Best
Andrea
from dadaist2.
Great! Thank you. Quick question: what installation method (other than miniconda) would you recommend for v.1.2.1? The one described in the "developmental snapshot"?
from dadaist2.
Until it is available through BioConda, the only option is something like the "dev snapshot".
Since you have a working environment, you can clone the repository somewhere and add it to your PATH temporarily, something like
# Start from a directory where you can download the package
git clone git@github.com:quadram-institute-bioscience/dadaist2.git
# Activate your dadaist environment
source activate "dadaist-env-name"
# Add the current directory/dadaist2/bin to PATH
export PATH="$PWD"/dadaist2/bin/:"$PATH"
# Check if it worked
dadaist2 --version
from dadaist2.
Thank you -- the installation worked but I got the same error message(s). I will test if using dada2 outside of dadaist2 results in the same problem and get back to you.
from dadaist2.
Ewww, that's frustrating, sorry about this!
If you have some extra time for me, I pushed an update to the repo with extra checks. Please run git pull inside the repository and try again (the version should now print 1.2.2).
If you can run in debug mode and send me the log, I might finally understand where the files are flipping:
dadaist2 --debug {your parameters} 2>&1 | tee dadaist-debug.log
Thanks!
from dadaist2.
Hi @padbc, many thanks for sharing. I tried creating a dataset using your sample names (each with a different number of reads) but I have had no luck solving the issue.
What I can suggest, if you are so kind as to keep helping me here, is:
- Pull the latest update (1.2.2a), which has no bugfixes but more verbose reporting (please always run it with --debug), and see if this reveals more details about the problem.
- The second suggestion is inspired by an issue in DADA2 with common suffixes: I would try renaming the reads in two ways, replacing dashes and using progressive sample names.
# Run this in the directory where you have your reads:
# in the logs they appear as ./ so I used the same here
INPUT=./
# This will produce two output directories, input_1 and input_2
mkdir -p input_{1,2}
C=0
seqfu metadata "$INPUT" | grep -v sample-id > metadata.tsv
set -euo pipefail
# Each metadata line holds: sample name, forward file, reverse file (tab-separated)
while IFS= read -r LINE;
do
  C=$((C+1))
  sample=$(echo "$LINE" | cut -f 1)
  fwd=$(echo "$LINE" | cut -f 2)
  rev=$(echo "$LINE" | cut -f 3)
  echo -n "Copying $sample... "
  # input_1: progressive sample names
  cp "$fwd" "input_1"/Sample${C}_R1.fastq.gz
  cp "$rev" "input_1"/Sample${C}_R2.fastq.gz
  # input_2: original names with dashes replaced by x
  cp "$fwd" "input_2"/$(echo "$sample" | sed 's/-/x/g')_R1.fastq.gz
  cp "$rev" "input_2"/$(echo "$sample" | sed 's/-/x/g')_R2.fastq.gz
  echo Done
done < metadata.tsv
# Check paired reads match
seqfu count input_1/*.gz >/dev/null && echo "OK: input_1"
seqfu count input_2/*.gz >/dev/null && echo "OK: input_2"
# If the following is not printed some program failed
echo "DONE: OK"
from dadaist2.
Yes, I will implement that as well, but I also wanted to improve logging in the meantime, and I'm really grateful for your patience with this issue!
from dadaist2.
Quick update. The last part of the output of (2) is the following:
ERROR: Counts in R1 and R2 files do not match for input_1/Sample115_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_1/Sample116_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_2/406003xS57_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_2/406002xS56_R1.fastq.gz
We can therefore rule out naming convention, but these are pre-filtered files. Should we expect additional mismatch errors to be introduced after QCing?
from dadaist2.
Since by default QC does not modify the FASTQ files (it only collects data to feed parameters to DADA2), I'm intrigued.
Do the R1 and R2 files for 406003-S57 contain the same number of reads? Maybe a seqfu count ./*.fastq.gz can help spot the problem, but a direct read count of the two files might also be useful.
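A direct read count needs nothing beyond gzip and wc. As a rough sketch, assuming plain four-line FASTQ records (no wrapped sequences); count_reads and same_read_count are hypothetical helper names:

```shell
# Print the number of reads in a gzipped FASTQ file.
# Fails if the line count is not a multiple of 4 (truncated or malformed file).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    [ $((lines % 4)) -eq 0 ] || { echo "incomplete record in $1" >&2; return 1; }
    echo $((lines / 4))
}

# Print the count for each file of a pair; succeed only if the counts agree.
same_read_count() {
    n1=$(count_reads "$1") || return 2
    n2=$(count_reads "$2") || return 2
    printf '%s\t%s\n' "$1" "$n1" "$2" "$n2"
    [ "$n1" -eq "$n2" ]
}
```

For the sample in question this could be called as same_read_count ./406003-S57_L001_R1_001.fastq.gz ./406003-S57_L001_R2_001.fastq.gz || echo "counts differ".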
from dadaist2.
Read counts:
./406003-S57_L001_R1_001.fastq.gz 103812
./406003-S57_L001_R1_001.fastq.gz 111495
from dadaist2.
To add to the above: the difference in read numbers is not what's causing the issue, as removing those samples from the analysis throws the same error message.
from dadaist2.
Just to be sure I'm not losing track of info here:
- In the read counts reported above:
./406003-S57_L001_R1_001.fastq.gz 103812
./406003-S57_L001_R1_001.fastq.gz 111495
The two files have the very same name: is one of the two actually the R2?
- The ERROR: Counts in R1 and R2 files do not match reported by the previous script can be explained either as:
  - a problem in the original files with different counts
  - a problem copying the original files into the new directories
Surely, if the only problem were in 406003-S57, this would not explain why the error raised by DADA2 occurred against multiple samples. In this light, the naming can still be a culprit (because of shared suffixes), and running the latest dadaist2 might help in understanding when the problem arises, using input_1 and input_2 as input directories (but removing Sample115_* and 406003xS57_* respectively from those directories).
from dadaist2.
Could you confirm whether the matchIDs=TRUE option of the filterAndTrim command has been incorporated into the latest version of dadaist2? Thanks very much.
from dadaist2.
Hello @padbc,
sorry for the late reply; last week we were busy with a workshop.
matchIDs is not yet implemented and, judging from the docs (https://rdrr.io/github/benjjneb/dada2/man/fastqPairedFilter.html), it does not look like what we need in this context.
From the log you kindly provided it looks like the vectors are in the correct order, like in this short example:
58,/tmp/dadaist2_2a7XbW/for/Sample22_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample22_R2.fastq.gz
59,/tmp/dadaist2_2a7XbW/for/Sample23_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample23_R2.fastq.gz
60,/tmp/dadaist2_2a7XbW/for/Sample24_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample24_R2.fastq.gz
61,/tmp/dadaist2_2a7XbW/for/Sample25_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample25_R2.fastq.gz
62,/tmp/dadaist2_2a7XbW/for/Sample26_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample26_R2.fastq.gz
63,/tmp/dadaist2_2a7XbW/for/Sample27_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample27_R2.fastq.gz
so I'm still struggling to find the cause of your issue. If you can create a minimal directory with a short selection of samples that cause the problem for you, and wish to send it via mail or a link, I'm happy to inspect further. Meanwhile I'll try to add some extra checks to get closer to the problem.
from dadaist2.
I pushed version 1.2.3 to the repo; it counts the reads in the input folder and in the filtered folder. Note that this extra information is only available with --debug and in the output log dadaist.log, not in the STDERR printed to the screen.
It would be interesting to see if the problem persists also after removing from the input:
- Sample33
- Sample134
- Sample136
- Sample137
- Sample140
- Sample141
- Sample143
from dadaist2.
Great -- thank you. I will try to take a look at this later today.
from dadaist2.
Sorry for the delay here. The error persists after removing those samples; see attached log.
dadaist.log
from dadaist2.
Ah, thanks a million.
So, before running DADA2, the log now prints the number of sequences in the location where the files are temporarily copied, and here there is one sample with a discrepancy before getting into DADA2.
Sample116_R1.fastq.gz 103812
Sample116_R2.fastq.gz 111495
Now, this does not make sense, as you previously checked the input directory with seqfu count $DIR/*gz and I believe no errors were found, but you might try again now and see if there is indeed a problem in the input files.
If not, I cannot figure out why some samples degrade while being copied to the temporary directory. Is it a peculiar filesystem maybe?
If you wish, I can try on a different server; contact me via email at andrea.telatin 🐌 quadram.ac.uk
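To test the "degraded while copying" hypothesis directly, one could compare each copy against its source byte for byte. This is only a sketch under the assumption that both the original and the temporary copy are still on disk (for example after a --debug run); verify_copy is a hypothetical helper name, and the paths in the usage note are placeholders.

```shell
# Succeed only if the copy has the same checksum and size as the source
# and is still a readable gzip archive.
verify_copy() {
    src=$1
    dst=$2
    # cksum on stdin prints "<checksum> <size>" without a file name
    [ "$(cksum < "$src")" = "$(cksum < "$dst")" ] || return 1
    gzip -t "$dst" 2>/dev/null
}
```

For instance, verify_copy reads/sample_R2.fastq.gz /tmp/copy/sample_R2.fastq.gz || echo "copy differs from source". If the copy passes but the read counts still disagree, the discrepancy was already present in the input files.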
from dadaist2.
Thanks so much for the quick reply. So yes, deleting Sample116* appears to have "solved" the problem, i.e., dadaist2 produces the expected output. Like you, I find this perplexing.
from dadaist2.
I can only suggest checking the reads with seqfu count inputdir/*gz before starting, but I will implement some extra (optional) checks.
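Where seqfu is not at hand, a similar pre-flight check can be approximated with standard tools. The sketch below assumes file names containing _R1 and _R2 followed by "_" or "." (as in this thread) and four-line FASTQ records; count_reads and check_pairs are hypothetical names, and this is not how dadaist2 itself performs the check.

```shell
# Count the reads in a gzipped FASTQ file (assumes 4 lines per record).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Scan a directory for *_R1* files, derive each mate by swapping _R1 for _R2,
# and report pairs that are incomplete or differ in read count.
# Returns non-zero if any problem was found.
check_pairs() {
    dir=$1
    problems=0
    for r1 in "$dir"/*_R1*.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        base=$(basename "$r1")
        r2=$dir/$(printf '%s\n' "$base" | sed 's/_R1\([_.]\)/_R2\1/')
        if [ ! -e "$r2" ]; then
            echo "MISSING mate for $r1"
            problems=$((problems + 1))
        elif [ "$(count_reads "$r1")" -ne "$(count_reads "$r2")" ]; then
            echo "MISMATCH $r1 vs $r2"
            problems=$((problems + 1))
        fi
    done
    [ "$problems" -eq 0 ]
}
```

Running check_pairs inputdir before the pipeline would flag a pair like the one above (103812 vs 111495 reads) before any temporary copies are made.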
from dadaist2.