padbc commented on July 25, 2024

Thanks. The table below is from a run with fewer samples that threw the same error. I don't see any red flags, but you may think otherwise.

Might the issue stem from this?

What if my forward and reverse reads aren’t in matching order?

This situation commonly arises when external filtering methods, like the QIIME demultiplexing script, are used and filter the forward and reverse reads independently. This can be remedied by adding the matchIDs=TRUE flag to the filterAndTrim or fastqPairedFilter functions. For example, if no more filtering is required, the following will retain just those reads that match between the forward and reverse fastq files (assumes Illumina fastq headers):
filterAndTrim(..., matchIDs=TRUE)

(see https://benjjneb.github.io/dada2/faq.html)
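Before reaching for matchIDs=TRUE, it can help to confirm whether a pair really is out of order. The sketch below is not part of dadaist2 or DADA2, and the function names are hypothetical. It assumes 4-line FASTQ records and headers whose first whitespace-separated token is the read ID (Illumina style), stripping any old-style /1 or /2 suffix.

```bash
# Sketch (hypothetical helpers): check that an R1/R2 pair lists the same read
# IDs in the same order. Assumes 4-line FASTQ records; the read ID is the first
# whitespace-separated token of each header, minus "@" and any "/1" or "/2".
# gzip -dcf decompresses .gz input and passes plain files through unchanged.

# Print one read ID per line.
read_ids() {
  gzip -dcf -- "$1" |
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }'
}

# Succeed (exit status 0) only if both files contain identical IDs in identical order.
ids_in_same_order() {
  cmp -s <(read_ids "$1") <(read_ids "$2")
}
```

For example, `ids_in_same_order sample_R1.fastq.gz sample_R2.fastq.gz && echo "same order"` (placeholder file names). cmp stops at the first differing line, so a mismatch is usually reported quickly.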

┌────────────────────────────────────┬────────┬──────────┬───────┬─────┬─────┬─────┬───────┬─────┬─────┐
│ File                               │ #Seq   │ Total bp │ Avg   │ N50 │ N75 │ N90 │ auN   │ Min │ Max │
├────────────────────────────────────┼────────┼──────────┼───────┼─────┼─────┼─────┼───────┼─────┼─────┤
│ 100102-S13_L001_R1_001.fastq.gz    │ 139407 │ 41916488 │ 300.7 │ 301 │ 301 │ 300 │ 0.070 │ 35  │ 301 │
│ 100202-S14_L001_R1_001.fastq.gz    │ 146864 │ 44170584 │ 300.8 │ 301 │ 301 │ 300 │ 0.055 │ 35  │ 301 │
│ 100203-S15_L001_R1_001.fastq.gz    │ 139739 │ 42018747 │ 300.7 │ 301 │ 301 │ 301 │ 0.164 │ 35  │ 301 │
│ 100204-S16_L001_R1_001.fastq.gz    │ 132438 │ 39803309 │ 300.5 │ 301 │ 301 │ 300 │ 0.122 │ 35  │ 301 │
│ 100205-S17_L001_R1_001.fastq.gz    │ 144511 │ 43460952 │ 300.7 │ 301 │ 301 │ 300 │ 0.107 │ 35  │ 301 │
│ 100206-S18_L001_R1_001.fastq.gz    │ 135101 │ 40573650 │ 300.3 │ 301 │ 301 │ 301 │ 0.210 │ 35  │ 301 │
│ 100502-S19_L001_R1_001.fastq.gz    │ 154180 │ 46279307 │ 300.2 │ 301 │ 300 │ 299 │ 0.169 │ 35  │ 301 │
│ 100503-S20_L001_R1_001.fastq.gz    │ 136254 │ 40876059 │ 300.0 │ 301 │ 300 │ 298 │ 0.196 │ 35  │ 301 │
│ 100504-S21_L001_R1_001.fastq.gz    │ 149045 │ 44791183 │ 300.5 │ 301 │ 301 │ 299 │ 0.114 │ 35  │ 301 │
│ 100505-S22_L001_R1_001.fastq.gz    │ 144673 │ 43404679 │ 300.0 │ 301 │ 300 │ 299 │ 0.188 │ 35  │ 301 │
│ 100506-S23_L001_R1_001.fastq.gz    │ 139843 │ 41949009 │ 300.0 │ 301 │ 300 │ 299 │ 0.203 │ 35  │ 301 │
│ 200202-S94_L001_R1_001.fastq.gz    │ 149611 │ 44664048 │ 298.5 │ 301 │ 301 │ 300 │ 0.202 │ 35  │ 301 │
│ 200203-S95_L001_R1_001.fastq.gz    │ 103600 │ 31036418 │ 299.6 │ 301 │ 301 │ 300 │ 0.283 │ 35  │ 301 │
│ 200502-S7_L001_R1_001.fastq.gz     │ 144901 │ 43509686 │ 300.3 │ 301 │ 300 │ 299 │ 0.157 │ 35  │ 301 │
│ 200503-S8_L001_R1_001.fastq.gz     │ 137665 │ 41387170 │ 300.6 │ 301 │ 301 │ 300 │ 0.074 │ 35  │ 301 │
│ 200504-S9_L001_R1_001.fastq.gz     │ 123302 │ 37034341 │ 300.4 │ 301 │ 301 │ 300 │ 0.222 │ 35  │ 301 │
│ 200505-S10_L001_R1_001.fastq.gz    │ 117559 │ 35345816 │ 300.7 │ 301 │ 301 │ 300 │ 0.104 │ 35  │ 301 │
│ 200506-S11_L001_R1_001.fastq.gz    │ 136077 │ 40842847 │ 300.1 │ 301 │ 300 │ 299 │ 0.185 │ 35  │ 301 │
│ 200599-S12_L001_R1_001.fastq.gz    │ 113326 │ 33910575 │ 299.2 │ 301 │ 301 │ 300 │ 0.264 │ 35  │ 301 │
│ 200602-S134_L001_R1_001.fastq.gz   │ 136183 │ 40852757 │ 300.0 │ 301 │ 300 │ 299 │ 0.206 │ 35  │ 301 │
│ 200603-S135_L001_R1_001.fastq.gz   │ 97157  │ 29174080 │ 300.3 │ 301 │ 301 │ 299 │ 0.238 │ 35  │ 301 │
│ 200702-S2_L001_R1_001.fastq.gz     │ 117177 │ 35234949 │ 300.7 │ 301 │ 301 │ 300 │ 0.093 │ 35  │ 301 │
│ 200703-S3_L001_R1_001.fastq.gz     │ 139156 │ 41850711 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 86  │ 301 │
│ 200704-S4_L001_R1_001.fastq.gz     │ 115559 │ 34510934 │ 298.6 │ 301 │ 301 │ 300 │ 0.260 │ 35  │ 301 │
│ 200705-S5_L001_R1_001.fastq.gz     │ 128350 │ 38462972 │ 299.7 │ 301 │ 301 │ 300 │ 0.228 │ 35  │ 301 │
│ 200706-S6_L001_R1_001.fastq.gz     │ 117212 │ 34748566 │ 296.5 │ 301 │ 301 │ 300 │ 0.260 │ 35  │ 301 │
│ 200802-S103_L001_R1_001.fastq.gz   │ 101572 │ 30372892 │ 299.0 │ 301 │ 301 │ 301 │ 0.291 │ 35  │ 301 │
│ 200803-S104_L001_R1_001.fastq.gz   │ 131863 │ 39583103 │ 300.2 │ 301 │ 301 │ 301 │ 0.214 │ 35  │ 301 │
│ 200899-S105_L001_R1_001.fastq.gz   │ 127531 │ 38191634 │ 299.5 │ 301 │ 301 │ 300 │ 0.234 │ 35  │ 301 │
│ 200902-S136_L001_R1_001.fastq.gz   │ 119614 │ 35851983 │ 299.7 │ 301 │ 301 │ 299 │ 0.244 │ 35  │ 301 │
│ 200904-S137_L001_R1_001.fastq.gz   │ 137159 │ 41195209 │ 300.3 │ 301 │ 300 │ 299 │ 0.105 │ 35  │ 301 │
│ 200905-S138_L001_R1_001.fastq.gz   │ 101718 │ 30472597 │ 299.6 │ 301 │ 301 │ 300 │ 0.289 │ 35  │ 301 │
│ 200906-S139_L001_R1_001.fastq.gz   │ 116950 │ 35140104 │ 300.5 │ 301 │ 301 │ 299 │ 0.119 │ 35  │ 301 │
│ 201002-S1_L001_R1_001.fastq.gz     │ 164983 │ 49509070 │ 300.1 │ 301 │ 299 │ 298 │ 0.123 │ 35  │ 301 │
│ 301102-S72_L001_R1_001.fastq.gz    │ 111648 │ 33522225 │ 300.2 │ 301 │ 301 │ 300 │ 0.248 │ 35  │ 301 │
│ 301103-S73_L001_R1_001.fastq.gz    │ 134229 │ 40310675 │ 300.3 │ 301 │ 301 │ 300 │ 0.201 │ 35  │ 301 │
│ 301104-S74_L001_R1_001.fastq.gz    │ 159954 │ 48021586 │ 300.2 │ 301 │ 300 │ 299 │ 0.162 │ 35  │ 301 │
│ 301105-S75_L001_R1_001.fastq.gz    │ 149240 │ 44760267 │ 299.9 │ 301 │ 300 │ 299 │ 0.194 │ 35  │ 301 │
│ 301202-S76_L001_R1_001.fastq.gz    │ 129360 │ 38852367 │ 300.3 │ 301 │ 301 │ 299 │ 0.187 │ 35  │ 301 │
│ 301203-S77_L001_R1_001.fastq.gz    │ 129759 │ 38979143 │ 300.4 │ 301 │ 301 │ 299 │ 0.144 │ 35  │ 301 │
│ 301204-S78_L001_R1_001.fastq.gz    │ 124516 │ 37406603 │ 300.4 │ 301 │ 301 │ 299 │ 0.166 │ 35  │ 301 │
│ 301205-S79_L001_R1_001.fastq.gz    │ 110261 │ 33139525 │ 300.6 │ 301 │ 301 │ 300 │ 0.163 │ 35  │ 301 │
│ 301206-S80_L001_R1_001.fastq.gz    │ 125896 │ 37834323 │ 300.5 │ 301 │ 301 │ 300 │ 0.162 │ 35  │ 301 │
│ 301299-S81_L001_R1_001.fastq.gz    │ 112832 │ 33872487 │ 300.2 │ 301 │ 300 │ 299 │ 0.205 │ 35  │ 301 │
│ 303102-S38_L001_R1_001.fastq.gz    │ 126745 │ 38103308 │ 300.6 │ 301 │ 301 │ 300 │ 0.160 │ 35  │ 301 │
│ 303202-S39_L001_R1_001.fastq.gz    │ 127925 │ 37997809 │ 297.0 │ 301 │ 301 │ 300 │ 0.238 │ 35  │ 301 │
│ 303203-S40_L001_R1_001.fastq.gz    │ 136279 │ 40725166 │ 298.8 │ 301 │ 301 │ 300 │ 0.220 │ 35  │ 301 │
│ 303204-S41_L001_R1_001.fastq.gz    │ 111187 │ 33332312 │ 299.8 │ 301 │ 301 │ 299 │ 0.264 │ 35  │ 301 │
│ 303205-S42_L001_R1_001.fastq.gz    │ 120717 │ 35669441 │ 295.5 │ 301 │ 301 │ 299 │ 0.255 │ 35  │ 301 │
│ 303206-S43_L001_R1_001.fastq.gz    │ 135175 │ 40419603 │ 299.0 │ 301 │ 301 │ 300 │ 0.222 │ 35  │ 301 │
│ 303299-S44_L001_R1_001.fastq.gz    │ 116796 │ 34984447 │ 299.5 │ 301 │ 301 │ 299 │ 0.253 │ 35  │ 301 │
│ 303402-S45_L001_R1_001.fastq.gz    │ 119797 │ 35858803 │ 299.3 │ 301 │ 299 │ 298 │ 0.240 │ 35  │ 301 │
│ 303403-S46_L001_R1_001.fastq.gz    │ 186419 │ 55979250 │ 300.3 │ 301 │ 301 │ 299 │ 0.148 │ 35  │ 301 │
│ 303404-S47_L001_R1_001.fastq.gz    │ 118567 │ 35652038 │ 300.7 │ 301 │ 301 │ 300 │ 0.078 │ 35  │ 301 │
│ 303405-S48_L001_R1_001.fastq.gz    │ 116310 │ 34969402 │ 300.7 │ 301 │ 301 │ 300 │ 0.088 │ 35  │ 301 │
│ 303406-S49_L001_R1_001.fastq.gz    │ 115016 │ 34398140 │ 299.1 │ 301 │ 300 │ 299 │ 0.258 │ 35  │ 301 │
│ 400102-S84_L001_R1_001.fastq.gz    │ 134777 │ 37410202 │ 277.6 │ 301 │ 301 │ 257 │ 0.244 │ 35  │ 301 │
│ 400103-S85_L001_R1_001.fastq.gz    │ 143919 │ 41743878 │ 290.1 │ 301 │ 301 │ 299 │ 0.218 │ 35  │ 301 │
│ 400104-S86_L001_R1_001.fastq.gz    │ 124926 │ 36977294 │ 296.0 │ 301 │ 300 │ 299 │ 0.245 │ 35  │ 301 │
│ 400105-S87_L001_R1_001.fastq.gz    │ 114701 │ 33863620 │ 295.2 │ 301 │ 301 │ 300 │ 0.268 │ 35  │ 301 │
│ 400106-S88_L001_R1_001.fastq.gz    │ 156059 │ 45877631 │ 294.0 │ 301 │ 301 │ 299 │ 0.198 │ 35  │ 301 │
│ 400302-S89_L001_R1_001.fastq.gz    │ 146847 │ 44060983 │ 300.0 │ 301 │ 301 │ 300 │ 0.194 │ 35  │ 301 │
│ 400303-S90_L001_R1_001.fastq.gz    │ 152645 │ 45637805 │ 299.0 │ 301 │ 299 │ 298 │ 0.198 │ 35  │ 301 │
│ 400304-S91_L001_R1_001.fastq.gz    │ 139780 │ 41925656 │ 299.9 │ 301 │ 301 │ 300 │ 0.210 │ 35  │ 301 │
│ 400306-S92_L001_R1_001.fastq.gz    │ 149711 │ 44906057 │ 300.0 │ 301 │ 300 │ 298 │ 0.181 │ 35  │ 301 │
│ 400399-S93_L001_R1_001.fastq.gz    │ 131503 │ 39519530 │ 300.5 │ 301 │ 301 │ 301 │ 0.172 │ 35  │ 301 │
│ 400402-S97_L001_R1_001.fastq.gz    │ 121356 │ 36378975 │ 299.8 │ 301 │ 301 │ 300 │ 0.244 │ 35  │ 301 │
│ 400403-S98_L001_R1_001.fastq.gz    │ 138126 │ 41429581 │ 299.9 │ 301 │ 300 │ 299 │ 0.206 │ 35  │ 301 │
│ 400404-S99_L001_R1_001.fastq.gz    │ 145812 │ 43816556 │ 300.5 │ 301 │ 301 │ 300 │ 0.176 │ 35  │ 301 │
│ 400405-S100_L001_R1_001.fastq.gz   │ 130846 │ 39087045 │ 298.7 │ 301 │ 301 │ 300 │ 0.231 │ 35  │ 301 │
│ 400406-S96_L001_R1_001.fastq.gz    │ 122805 │ 36758431 │ 299.3 │ 301 │ 301 │ 300 │ 0.243 │ 35  │ 301 │
│ 400498-S102_L001_R1_001.fastq.gz   │ 117750 │ 35196667 │ 298.9 │ 301 │ 301 │ 300 │ 0.253 │ 35  │ 301 │
│ 400499-S101_L001_R1_001.fastq.gz   │ 128036 │ 38396715 │ 299.9 │ 301 │ 301 │ 300 │ 0.229 │ 35  │ 301 │
│ 401502-S125_L001_R1_001.fastq.gz   │ 133535 │ 40072003 │ 300.1 │ 301 │ 301 │ 299 │ 0.197 │ 35  │ 301 │
│ 401503-S126_L001_R1_001.fastq.gz   │ 123937 │ 37223689 │ 300.3 │ 301 │ 301 │ 299 │ 0.167 │ 35  │ 301 │
│ 401504-S127_L001_R1_001.fastq.gz   │ 110292 │ 33108766 │ 300.2 │ 301 │ 300 │ 299 │ 0.202 │ 35  │ 301 │
│ 401505-S128_L001_R1_001.fastq.gz   │ 117017 │ 35141339 │ 300.3 │ 301 │ 300 │ 299 │ 0.131 │ 35  │ 301 │
│ 401506-S129_L001_R1_001.fastq.gz   │ 95721  │ 28731675 │ 300.2 │ 301 │ 301 │ 299 │ 0.225 │ 35  │ 301 │
│ 401902-S119_L001_R1_001.fastq.gz   │ 134605 │ 40408859 │ 300.2 │ 301 │ 301 │ 300 │ 0.212 │ 35  │ 301 │
│ 401903-S120_L001_R1_001.fastq.gz   │ 125445 │ 37659002 │ 300.2 │ 301 │ 301 │ 300 │ 0.209 │ 35  │ 301 │
│ 401904-S121_L001_R1_001.fastq.gz   │ 131128 │ 39239268 │ 299.2 │ 301 │ 301 │ 299 │ 0.228 │ 35  │ 301 │
│ 401905-S122_L001_R1_001.fastq.gz   │ 80284  │ 24079942 │ 299.9 │ 301 │ 301 │ 300 │ 0.356 │ 35  │ 301 │
│ 401906-S123_L001_R1_001.fastq.gz   │ 128938 │ 38727296 │ 300.4 │ 301 │ 301 │ 301 │ 0.212 │ 35  │ 301 │
│ 401999-S124_L001_R1_001.fastq.gz   │ 132643 │ 39690036 │ 299.2 │ 301 │ 301 │ 300 │ 0.227 │ 35  │ 301 │
│ 402202-S62_L001_R1_001.fastq.gz    │ 105353 │ 31660054 │ 300.5 │ 301 │ 301 │ 300 │ 0.245 │ 35  │ 301 │
│ 402203-S63_L001_R1_001.fastq.gz    │ 95134  │ 28606171 │ 300.7 │ 301 │ 301 │ 300 │ 0.167 │ 35  │ 301 │
│ 402204-S64_L001_R1_001.fastq.gz    │ 107403 │ 32295082 │ 300.7 │ 301 │ 301 │ 300 │ 0.116 │ 35  │ 301 │
│ 402205-S65_L001_R1_001.fastq.gz    │ 92203  │ 27729694 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 35  │ 301 │
│ 402206-S66_L001_R1_001.fastq.gz    │ 107835 │ 32431189 │ 300.7 │ 301 │ 301 │ 301 │ 0.145 │ 35  │ 301 │
│ 402602-S67_L001_R1_001.fastq.gz    │ 119751 │ 35964665 │ 300.3 │ 301 │ 301 │ 301 │ 0.232 │ 35  │ 301 │
│ 402603-S68_L001_R1_001.fastq.gz    │ 115001 │ 34480055 │ 299.8 │ 301 │ 301 │ 300 │ 0.255 │ 35  │ 301 │
│ 402604-S69_L001_R1_001.fastq.gz    │ 104178 │ 31215971 │ 299.6 │ 301 │ 301 │ 301 │ 0.280 │ 35  │ 301 │
│ 402605-S70_L001_R1_001.fastq.gz    │ 143556 │ 43097511 │ 300.2 │ 301 │ 301 │ 300 │ 0.200 │ 35  │ 301 │
│ 402606-S71_L001_R1_001.fastq.gz    │ 136853 │ 41150225 │ 300.7 │ 301 │ 301 │ 300 │ 0.055 │ 35  │ 301 │
│ 404302-S130_L001_R1_001.fastq.gz   │ 132182 │ 39637404 │ 299.9 │ 301 │ 301 │ 300 │ 0.219 │ 35  │ 301 │
│ 404303-S131_L001_R1_001.fastq.gz   │ 124872 │ 37514339 │ 300.4 │ 301 │ 301 │ 300 │ 0.185 │ 35  │ 301 │
│ 404304-S132_L001_R1_001.fastq.gz   │ 111924 │ 33600617 │ 300.2 │ 301 │ 301 │ 300 │ 0.247 │ 35  │ 301 │
│ 404305-S133_L001_R1_001.fastq.gz   │ 96481  │ 28591819 │ 296.3 │ 301 │ 300 │ 299 │ 0.316 │ 35  │ 301 │
│ 405202-S107_L001_R1_001.fastq.gz   │ 117201 │ 35011277 │ 298.7 │ 301 │ 300 │ 299 │ 0.256 │ 35  │ 301 │
│ 405203-S108_L001_R1_001.fastq.gz   │ 127049 │ 37630557 │ 296.2 │ 301 │ 301 │ 301 │ 0.241 │ 35  │ 301 │
│ 405204-S109_L001_R1_001.fastq.gz   │ 103407 │ 30171591 │ 291.8 │ 301 │ 300 │ 299 │ 0.302 │ 35  │ 301 │
│ 405205-S110_L001_R1_001.fastq.gz   │ 94724  │ 27638705 │ 291.8 │ 301 │ 300 │ 293 │ 0.329 │ 35  │ 301 │
│ 405206-S111_L001_R1_001.fastq.gz   │ 112195 │ 33399068 │ 297.7 │ 301 │ 300 │ 299 │ 0.266 │ 35  │ 301 │
│ 405302-S112_L001_R1_001.fastq.gz   │ 108441 │ 32607145 │ 300.7 │ 301 │ 301 │ 301 │ 0.179 │ 35  │ 301 │
│ 405303-S113_L001_R1_001.fastq.gz   │ 120154 │ 36060437 │ 300.1 │ 301 │ 300 │ 298 │ 0.207 │ 35  │ 301 │
│ 405304-S114_L001_R1_001.fastq.gz   │ 121694 │ 36538880 │ 300.3 │ 301 │ 300 │ 299 │ 0.180 │ 35  │ 301 │
│ 405305-S115_L001_R1_001.fastq.gz   │ 100204 │ 30032416 │ 299.7 │ 301 │ 300 │ 299 │ 0.282 │ 35  │ 301 │
│ 405306-S116_L001_R1_001.fastq.gz   │ 103412 │ 30959775 │ 299.4 │ 301 │ 300 │ 298 │ 0.250 │ 35  │ 301 │
│ 405602-S50_L001_R1_001.fastq.gz    │ 144151 │ 43324535 │ 300.5 │ 301 │ 301 │ 300 │ 0.117 │ 35  │ 301 │
│ 405603-S51_L001_R1_001.fastq.gz    │ 143343 │ 43079313 │ 300.5 │ 301 │ 301 │ 299 │ 0.064 │ 35  │ 301 │
│ 405604-S52_L001_R1_001.fastq.gz    │ 131212 │ 39413412 │ 300.4 │ 301 │ 301 │ 299 │ 0.158 │ 35  │ 301 │
│ 405605-S53_L001_R1_001.fastq.gz    │ 124667 │ 37467053 │ 300.5 │ 301 │ 301 │ 300 │ 0.122 │ 35  │ 301 │
│ 405606-S54_L001_R1_001.fastq.gz    │ 146036 │ 43867444 │ 300.4 │ 301 │ 300 │ 299 │ 0.072 │ 35  │ 301 │
│ 405699-S55_L001_R1_001.fastq.gz    │ 128499 │ 38596090 │ 300.4 │ 301 │ 301 │ 300 │ 0.206 │ 35  │ 301 │
│ 406002-S56_L001_R1_001.fastq.gz    │ 124471 │ 37393565 │ 300.4 │ 301 │ 301 │ 299 │ 0.105 │ 35  │ 301 │
│ 406003-S57_L001_R1_001.fastq.gz    │ 103812 │ 31184093 │ 300.4 │ 301 │ 301 │ 299 │ 0.141 │ 1   │ 301 │
│ 406004-S58_L001_R1_001.fastq.gz    │ 138285 │ 41542957 │ 300.4 │ 301 │ 301 │ 300 │ 0.183 │ 35  │ 301 │
│ 406005-S59_L001_R1_001.fastq.gz    │ 143149 │ 43041084 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 35  │ 301 │
│ 406006-S60_L001_R1_001.fastq.gz    │ 131859 │ 39561171 │ 300.0 │ 301 │ 300 │ 299 │ 0.213 │ 35  │ 301 │
│ 406502-S25_L001_R1_001.fastq.gz    │ 126658 │ 38064498 │ 300.5 │ 301 │ 301 │ 300 │ 0.076 │ 35  │ 301 │
│ 406503-S26_L001_R1_001.fastq.gz    │ 127714 │ 38200024 │ 299.1 │ 301 │ 301 │ 300 │ 0.236 │ 35  │ 301 │
│ 406504-S27_L001_R1_001.fastq.gz    │ 113363 │ 33949930 │ 299.5 │ 301 │ 301 │ 300 │ 0.262 │ 35  │ 301 │
│ 406505-S28_L001_R1_001.fastq.gz    │ 111854 │ 33626963 │ 300.6 │ 301 │ 301 │ 300 │ 0.074 │ 35  │ 301 │
│ 406506-S29_L001_R1_001.fastq.gz    │ 127794 │ 38409856 │ 300.6 │ 301 │ 301 │ 300 │ 0.090 │ 35  │ 301 │
│ 407002-S30_L001_R1_001.fastq.gz    │ 124120 │ 37203784 │ 299.7 │ 301 │ 301 │ 300 │ 0.236 │ 35  │ 301 │
│ 407003-S31_L001_R1_001.fastq.gz    │ 115886 │ 34664346 │ 299.1 │ 301 │ 301 │ 300 │ 0.255 │ 35  │ 301 │
│ 407004-S32_L001_R1_001.fastq.gz    │ 127180 │ 38136953 │ 299.9 │ 301 │ 301 │ 300 │ 0.229 │ 35  │ 301 │
│ 407005-S33_L001_R1_001.fastq.gz    │ 107374 │ 32201894 │ 299.9 │ 301 │ 301 │ 300 │ 0.265 │ 35  │ 301 │
│ 407006-S34_L001_R1_001.fastq.gz    │ 100642 │ 30124806 │ 299.3 │ 301 │ 300 │ 300 │ 0.288 │ 35  │ 301 │
│ 407099-S35_L001_R1_001.fastq.gz    │ 85509  │ 25620473 │ 299.6 │ 301 │ 301 │ 300 │ 0.337 │ 41  │ 301 │
│ NG23-S24_L001_R1_001.fastq.gz      │ 6534   │ 1923506  │ 294.4 │ 301 │ 301 │ 300 │ 0.347 │ 35  │ 301 │
│ NG24-S36_L001_R1_001.fastq.gz      │ 5067   │ 1485871  │ 293.2 │ 301 │ 301 │ 301 │ 0.293 │ 35  │ 301 │
│ NG27-S82_L001_R1_001.fastq.gz      │ 2522   │ 698269   │ 276.9 │ 301 │ 301 │ 301 │ 0.684 │ 35  │ 301 │
│ NG28-S106_L001_R1_001.fastq.gz     │ 2492   │ 571390   │ 229.3 │ 301 │ 301 │ 300 │ 0.931 │ 35  │ 301 │
│ NG30-S61_L001_R1_001.fastq.gz      │ 4866   │ 1387137  │ 285.1 │ 301 │ 301 │ 301 │ 0.501 │ 35  │ 301 │
│ NG32-S117_L001_R1_001.fastq.gz     │ 3554   │ 733993   │ 206.5 │ 301 │ 301 │ 299 │ 0.818 │ 35  │ 301 │
│ NG38-S140_L001_R1_001.fastq.gz     │ 2425   │ 482015   │ 198.8 │ 301 │ 301 │ 299 │ 1.057 │ 35  │ 301 │
│ PCRNEG-1-S37_L001_R1_001.fastq.gz  │ 6316   │ 1735958  │ 274.9 │ 301 │ 301 │ 301 │ 0.649 │ 35  │ 301 │
│ PCRNEG-2-S83_L001_R1_001.fastq.gz  │ 2122   │ 626119   │ 295.1 │ 301 │ 301 │ 301 │ 0.931 │ 35  │ 301 │
│ PCRNEG-3-S118_L001_R1_001.fastq.gz │ 3309   │ 832711   │ 251.7 │ 301 │ 301 │ 300 │ 0.857 │ 35  │ 301 │
│ PCRNEG-4-S142_L001_R1_001.fastq.gz │ 2639   │ 464063   │ 175.8 │ 301 │ 301 │ 297 │ 1.340 │ 35  │ 301 │
│ POS-1-S141_L001_R1_001.fastq.gz    │ 135504 │ 40585861 │ 299.5 │ 301 │ 301 │ 300 │ 0.163 │ 35  │ 301 │
│ QEB-1-S143_L001_R1_001.fastq.gz    │ 1535   │ 152569   │ 99.4  │ 301 │ 44  │ 35  │ 3.335 │ 35  │ 301 │
└────────────────────────────────────┴────────┴──────────┴───────┴─────┴─────┴─────┴───────┴─────┴─────┘

from dadaist2.

padbc commented on July 25, 2024

Please see attached -- thanks!

dadaist-debug.log

padbc commented on July 25, 2024

Thanks very much. I will try both solutions.

Before you posted this, as I had suspected, I solved the issue by running dada2 using the matchIDs=TRUE option of the filterAndTrim command. At first glance, the taxonomic classification results make sense.

padbc commented on July 25, 2024

Thank you. Please see the attached log of dadaist2 1.2.2a run on the input_1 files (minus the Sample115* files):

dadaist-debug-input1.log

telatin commented on July 25, 2024

Hello, I would be most grateful if you could provide some more details so I can check how that happened.

  • What OS are you using?
  • Which version of Dadaist2?
  • Can you attach the log?
  • Can you paste the total read counts from seqfu stats -n reads/*.fastq.gz, to get an overview of the number of samples and their depth?
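If seqfu is not at hand, a rough equivalent of the #Seq column can be obtained with standard tools. This is a minimal sketch assuming unwrapped 4-line FASTQ records; count_reads is a hypothetical helper name, not a seqfu or dadaist2 command.

```bash
# Sketch (hypothetical helper): print "<file><TAB><reads>" for each FASTQ given.
# Assumes 4 lines per record (no wrapped sequences); gzip -dcf reads both
# gzipped and plain files.
count_reads() {
  local f lines
  for f in "$@"; do
    lines=$(gzip -dcf -- "$f" | wc -l)
    printf '%s\t%d\n' "$f" $(( lines / 4 ))
  done
}
```

Usage: `count_reads reads/*.fastq.gz`, then compare the R1 and R2 lines for each sample.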

padbc commented on July 25, 2024
  • OS: Ubuntu 14.04 LTS
  • dadaist2 version: [1.1.0]
  • dadaist.log
  • There are 1262 samples, so I'm just pasting the first few (the story is the same for all of them):
    ┌─────────────────────────────────────────────┬────────┬──────────┬───────┬─────┬─────┬─────┬───────┬─────┬─────┐
    │ File                                        │ #Seq   │ Total bp │ Avg   │ N50 │ N75 │ N90 │ auN   │ Min │ Max │
    ├─────────────────────────────────────────────┼────────┼──────────┼───────┼─────┼─────┼─────┼───────┼─────┼─────┤
    │ ./0300903-27022018-S37_L001_R1_001.fastq.gz │ 132736 │ 39908375 │ 300.7 │ 301 │ 301 │ 300 │ 0.136 │ 35  │ 301 │
    │ ./0300903-27022018-S37_L001_R2_001.fastq.gz │ 132736 │ 39873083 │ 300.4 │ 301 │ 300 │ 300 │ 0.131 │ 35  │ 301 │
    │ ./0300904-06042018-S38_L001_R1_001.fastq.gz │ 125201 │ 37500347 │ 299.5 │ 301 │ 301 │ 300 │ 0.237 │ 35  │ 301 │
    │ ./0300904-06042018-S38_L001_R2_001.fastq.gz │ 125201 │ 37473176 │ 299.3 │ 301 │ 300 │ 300 │ 0.236 │ 35  │ 301 │
    │ ./0404504-2019-S98_L001_R1_001.fastq.gz     │ 146585 │ 43893185 │ 299.4 │ 301 │ 301 │ 299 │ 0.199 │ 35  │ 301 │
    │ ./0404504-2019-S98_L001_R2_001.fastq.gz     │ 146585 │ 43884601 │ 299.4 │ 301 │ 300 │ 300 │ 0.199 │ 35  │ 301 │
    │ ./100102-S13_L001_R1_001.fastq.gz           │ 139407 │ 41916488 │ 300.7 │ 301 │ 301 │ 300 │ 0.070 │ 35  │ 301 │
    │ ./100102-S13_L001_R2_001.fastq.gz           │ 139407 │ 41890087 │ 300.5 │ 301 │ 300 │ 300 │ 0.054 │ 35  │ 301 │
    │ ./100102-S48_L001_R1_001.fastq.gz           │ 80043  │ 23353810 │ 291.8 │ 301 │ 300 │ 289 │ 0.387 │ 35  │ 301 │
    │ ./100102-S48_L001_R2_001.fastq.gz           │ 80043  │ 23380877 │ 292.1 │ 301 │ 300 │ 291 │ 0.386 │ 35  │ 301 │
    │ ./100102-S68_L001_R1_001.fastq.gz           │ 102856 │ 30910462 │ 300.5 │ 301 │ 301 │ 299 │ 0.032 │ 71  │ 301 │
    │ ./100102-S68_L001_R2_001.fastq.gz           │ 102856 │ 30912377 │ 300.5 │ 301 │ 300 │ 300 │ 0.032 │ 71  │ 301 │
    │ ./100202-S14_L001_R1_001.fastq.gz           │ 146864 │ 44170584 │ 300.8 │ 301 │ 301 │ 300 │ 0.055 │ 35  │ 301 │
    │ ./100202-S14_L001_R2_001.fastq.gz           │ 146864 │ 44137972 │ 300.5 │ 301 │ 300 │ 300 │ 0.049 │ 35  │ 301 │
    └─────────────────────────────────────────────┴────────┴──────────┴───────┴─────┴─────┴─────┴───────┴─────┴─────┘

telatin commented on July 25, 2024

If you still have the temp dir, can you also count the reads from /tmp/dadaist2_ZLjc61/for/*gz and /tmp/dadaist2_ZLjc61/rev/*gz?
Running the pipeline with --debug forces the temporary directories to be kept in any case.
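The for/ versus rev/ comparison asked for here can be scripted. A sketch, assuming mates differ only by _R1 versus _R2 in the file name (as in the for/SampleN_R1.fastq.gz and rev/SampleN_R2.fastq.gz paths in the debug log quoted later in this thread) and 4-line FASTQ records; compare_pair_counts is a hypothetical name.

```bash
# Sketch (hypothetical helper): compare read counts of every forward file in
# FOR_DIR with its mate in REV_DIR. Assumes the mate's name is obtained by
# replacing the first "_R1" with "_R2", and 4-line FASTQ records.
# Prints MISSING/MISMATCH lines and returns 1 if any problem is found.
compare_pair_counts() {
  local for_dir="$1" rev_dir="$2" status=0 r1 r2 n1 n2
  for r1 in "$for_dir"/*_R1*.fastq.gz; do
    [ -e "$r1" ] || continue   # the glob matched nothing
    r2="$rev_dir/$(basename "$r1" | sed 's/_R1/_R2/')"
    if [ ! -e "$r2" ]; then
      echo "MISSING: $r2"; status=1; continue
    fi
    n1=$(( $(gzip -dc -- "$r1" | wc -l) / 4 ))
    n2=$(( $(gzip -dc -- "$r2" | wc -l) / 4 ))
    if [ "$n1" -ne "$n2" ]; then
      printf 'MISMATCH: %s=%d %s=%d\n' "$r1" "$n1" "$r2" "$n2"; status=1
    fi
  done
  return "$status"
}
```

Usage: `compare_pair_counts /tmp/dadaist2_ZLjc61/for /tmp/dadaist2_ZLjc61/rev`. Passing the same directory twice (e.g. `compare_pair_counts reads reads`) checks an input folder whose R1 and R2 files sit side by side, before starting a run.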

padbc commented on July 25, 2024

Sorry, but a somewhat related question: if the above cannot be solved, using R1-data only may be enough for my purposes. However, I could not find a combination of parameters that would allow me to do so. Is this possible? Thanks very much.

telatin commented on July 25, 2024

Thanks for your help reporting this, hope a fix will be out this week.

Single-end mode was never implemented: it had been a long time since I last saw a single-end dataset, so dadaist was born quite opinionated :) It is on the radar, but will probably come later.

telatin commented on July 25, 2024

I just pushed an update that should fix your problem, which I suspect is somehow filesystem-related (by default, the order of the files should be the same for FOR and REV).
The new version 1.2.1 is now on GitHub and should also be available via BioConda in ~24 hours; I would be most grateful if you could test it, as I could not easily reproduce the problem!
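One way to make pairing independent of the order in which the filesystem lists files is to derive each R2 path from its R1 path and sort with a fixed collation. This is an illustrative sketch, not dadaist2's actual fix (that code is not shown in the thread); it assumes mates differ only by _R1 versus _R2, and pair_by_name is a hypothetical name.

```bash
# Sketch (hypothetical helper): pair R1/R2 files by name instead of relying on
# two separate directory listings staying in the same order. Assumes mates
# differ only by "_R1" vs "_R2"; the sample label is everything before "_R1".
# Prints "<sample><TAB><R1><TAB><R2>" and returns 1 if any mate is missing.
pair_by_name() {
  local dir="$1" r1 r2 base status=0
  while IFS= read -r r1; do
    r2="${r1%_R1*}_R2${r1##*_R1}"   # swap the last "_R1" for "_R2"
    base=$(basename "${r1%_R1*}")
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$base" "$r1" "$r2"
    else
      echo "MISSING mate for $r1" >&2; status=1
    fi
  # Byte-wise sort (LC_ALL=C) keeps the order reproducible across systems.
  done < <(find "$dir" -maxdepth 1 -name '*_R1*.fastq.gz' | LC_ALL=C sort)
  return "$status"
}
```

Because each R2 is computed from its own R1, a listing that comes back in a different order can change the row order but never the pairing.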
Best
Andrea

padbc commented on July 25, 2024

Great! Thank you. Quick question: what installation method (other than miniconda) would you recommend for v.1.2.1? The one described in the "developmental snapshot"?

telatin commented on July 25, 2024

Until it is available through BioConda, the only option would be something like the "dev snapshot".
Since you have a working environment, you can clone the repository somewhere and temporarily add it to your PATH, something like:

# Start from a directory where you can download the package
git clone git@github.com:quadram-institute-bioscience/dadaist2.git

# Activate your dadaist environment
source activate "dadaist-env-name"

# Add the current directory/dadaist2/bin to PATH
export PATH="$PWD"/dadaist2/bin/:"$PATH"

# Check if it worked
dadaist2 --version 

padbc commented on July 25, 2024

Thank you -- the installation worked but I got the same error message(s). I will test if using dada2 outside of dadaist2 results in the same problem and get back to you.

telatin commented on July 25, 2024

Ewww, that's frustrating, sorry about this!
I pushed an update to the repo with extra checks. If you have some extra time for me, please git pull inside the repository and try again (the version should now print 1.2.2).

If you can run in debug mode and send me the log, I might finally understand where the files are flipping:

dadaist2 --debug {your parameters} 2>&1 | tee dadaist-debug.log

Thanks!

telatin commented on July 25, 2024

Hi @padbc, many thanks for sharing. I tried creating a dataset using your sample names (each with a different number of reads), but I have had no luck solving the issue.

What I can suggest, if you are kind enough to keep helping me here, is:

  1. Pull the latest update (1.2.2a), which has no bug fixes but more verbose reporting (please run it with --debug), and see whether it shows more detail about the problem.
  2. The second is inspired by a DADA2 issue with common suffixes: try renaming the reads in two ways, replacing the dashes in the sample names, and using progressive sample names (Sample1, Sample2, ...).
# Run this in the directory containing your reads:
# they appear as ./ in the logs, so the same is used here
INPUT=./

# This will produce two output directories, input_1 and input_2
mkdir -p input_{1,2}
C=0
# Drop the header line (the one containing "sample-id")
seqfu metadata "$INPUT" | grep -v sample-id > metadata.tsv

set -euo pipefail
while IFS= read -r LINE; do
   C=$((C+1))
   # Columns used: sample name, R1 path, R2 path
   sample=$(echo "$LINE" | cut -f 1)
   fwd=$(echo "$LINE" | cut -f 2)
   rev=$(echo "$LINE" | cut -f 3)
   renamed=$(echo "$sample" | sed 's/-/x/g')   # dashes replaced by "x"
   echo -n "Copying $sample... "
   # input_1: progressive names (Sample1, Sample2, ...)
   cp "$fwd" "input_1/Sample${C}_R1.fastq.gz"
   cp "$rev" "input_1/Sample${C}_R2.fastq.gz"
   # input_2: original sample names without dashes
   cp "$fwd" "input_2/${renamed}_R1.fastq.gz"
   cp "$rev" "input_2/${renamed}_R2.fastq.gz"
   echo Done
done < metadata.tsv

# Check that paired reads match (seqfu count reports an ERROR for mismatched pairs)
seqfu count input_1/*.gz >/dev/null && echo "OK: input_1"
seqfu count input_2/*.gz >/dev/null && echo "OK: input_2"

# If the following is not printed, a copy step failed;
# seqfu count problems show up as ERROR messages above instead
echo "DONE: OK"

telatin commented on July 25, 2024

Yes, I will implement that as well, but I also wanted to improve logging in the meantime, and I'm really grateful for your patience with this issue!

padbc commented on July 25, 2024

Quick update. The last part of the output of (2) is the following:

ERROR: Counts in R1 and R2 files do not match for input_1/Sample115_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_1/Sample116_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_2/406003xS57_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_2/406002xS56_R1.fastq.gz

We can therefore rule out the naming convention, but these are pre-filtered files. Should we expect additional mismatch errors to be introduced after QC?

telatin commented on July 25, 2024

Since by default the QC step does not modify the FASTQ files (it only collects data to set the parameters for DADA2), I'm intrigued.
Do the 406003-S57 R1 and R2 files contain the same number of reads? A seqfu count ./*.fastq.gz may help spot the problem, but a direct read count of the two files might also be useful.

padbc commented on July 25, 2024

Read counts:

./406003-S57_L001_R1_001.fastq.gz       103812  
./406003-S57_L001_R1_001.fastq.gz       111495 

padbc commented on July 25, 2024

To add to the above: the difference in read numbers is not what's causing the issue, as removing those samples from the analysis throws the same error message.

telatin commented on July 25, 2024

Just to be sure I'm not losing track of info here:

  1. In the read counts reported above:
./406003-S57_L001_R1_001.fastq.gz       103812  
./406003-S57_L001_R1_001.fastq.gz       111495 

The two files have the very same name: is one of the two actually the R2?

  2. The "ERROR: Counts in R1 and R2 files do not match" messages reported by the previous script can be explained either by:
  • a problem in the original files with different counts
  • a problem copying the original files in the new directories

Surely, if the only problem were in 406003-S57, that would not explain why the error raised by DADA2 involved multiple samples. In this light, the naming could still be a culprit (because of shared suffixes), and running the latest dadaist2 might help us understand when the problem arises, using input_1 and input_2 as input directories (after removing Sample115_* and 406003xS57_*, respectively, from those directories).
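A quick way to see whether shared name fragments could make two samples look alike is to truncate each R1 file name at a delimiter and list duplicates. Which delimiter a given tool uses to derive sample names is an assumption here, so the hypothetical helper below takes it as an argument. For example, in the seqfu stats output earlier in the thread, 100102-S13, 100102-S48 and 100102-S68 all reduce to 100102 when cut at the first dash.

```bash
# Sketch (hypothetical helper): list names that would collide if sample names
# were derived by cutting each file name at the first occurrence of DELIM.
# Usage: colliding_names DELIM FILE...
colliding_names() {
  local delim="$1"; shift
  local f
  for f in "$@"; do
    basename "$f" | cut -d "$delim" -f 1
  done | sort | uniq -d
}
```

Try both `colliding_names - ./*_R1_001.fastq.gz` and `colliding_names _ ./*_R1_001.fastq.gz`; an empty result means no collision for that delimiter.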

padbc commented on July 25, 2024

Could you confirm whether the matchIDs=TRUE option of the filterAndTrim command has been incorporated into the latest version of dadaist2? Thanks very much.

telatin commented on July 25, 2024

Hello @padbc
sorry for the late reply; last week we were busy with a workshop.

matchIDs is not yet implemented, and from the docs (https://rdrr.io/github/benjjneb/dada2/man/fastqPairedFilter.html) it does not look like what we need in this context.

From the log you kindly provided it looks like the vectors are in the correct order, like in this short example:

58,/tmp/dadaist2_2a7XbW/for/Sample22_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample22_R2.fastq.gz 
59,/tmp/dadaist2_2a7XbW/for/Sample23_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample23_R2.fastq.gz 
60,/tmp/dadaist2_2a7XbW/for/Sample24_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample24_R2.fastq.gz 
61,/tmp/dadaist2_2a7XbW/for/Sample25_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample25_R2.fastq.gz 
62,/tmp/dadaist2_2a7XbW/for/Sample26_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample26_R2.fastq.gz 
63,/tmp/dadaist2_2a7XbW/for/Sample27_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample27_R2.fastq.gz 

so I'm still struggling to find the cause of your issue. If you can create a minimal directory with a short selection of samples that cause the problem and wish to send it via email or a link, I'm happy to inspect further. Meanwhile, I'll try to add some extra checks to get closer to the problem.
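Rows like those in the log excerpt above can also be checked mechanically rather than by eye. A sketch, assuming each relevant line has the form index,R1 path,R2 path and that mates differ only by _R1 versus _R2; check_pair_rows is a hypothetical name. Filter the log to those lines first, since any other line with exactly three comma-separated fields would also be checked.

```bash
# Sketch (hypothetical helper): scan "<index>,<R1 path>,<R2 path>" lines (from
# files or stdin) and report rows whose R1 and R2 names are not the same sample.
# Returns 1 if any mismatch is found.
check_pair_rows() {
  awk -F, '
    NF == 3 {
      r1 = $2; r2 = $3
      sub(/.*\//, "", r1); sub(/.*\//, "", r2)   # keep the file name only
      sub(/[ \t\r]+$/, "", r2)                   # drop trailing whitespace
      stem1 = r1; stem2 = r2
      sub(/_R1/, "", stem1); sub(/_R2/, "", stem2)
      if (stem1 != stem2) { print "MISMATCH row " $1 ": " r1 " vs " r2; bad = 1 }
    }
    END { exit bad }' "$@"
}
```

Usage: `grep '/for/' dadaist.log | check_pair_rows`.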

telatin commented on July 25, 2024

I pushed version 1.2.3 to the repo; it counts the reads in the input folder and in the filtered folder. Note that this extra information is only available with --debug, in the output log dadaist.log, and not in the STDERR printed to the screen.

It would be interesting to see if the problem persists also after removing from the input:

  • Sample33
  • Sample134
  • Sample136
  • Sample137
  • Sample140
  • Sample141
  • Sample143

padbc commented on July 25, 2024

Great -- thank you. I will try to take a look at this later today.

padbc commented on July 25, 2024

Sorry for the delay here. The errors persist after removing those samples; see the attached log.
dadaist.log

telatin commented on July 25, 2024

Ah, thanks a million.
So, before running DADA2, the log now prints the number of sequences in the location where they are temporarily copied, and here there is one sample with a discrepancy before it even gets to DADA2:

Sample116_R1.fastq.gz	103812
Sample116_R2.fastq.gz	111495

Now, this does not make sense, as you previously checked the input directory with seqfu count $DIR/*gz and I believe no errors were found; but you might try again now and see if there is indeed a problem in the input files.

If not, I cannot figure out why some samples degrade while being copied to the temporary directory. Is it a peculiar filesystem, maybe?
If you wish, I can try on a different server; contact me via email at andrea.telatin 🐌 quadram.ac.uk
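To tell a damaged source file from a damaged copy, each file can be tested for gzip integrity and compared byte for byte with its copy. This sketch assumes the temporary files are plain copies of the inputs (the thread does not show how dadaist2 stages them) and that you know which input each SampleN file came from; verify_copy is a hypothetical name. If a pipeline recompresses files, compare read counts instead of bytes.

```bash
# Sketch (hypothetical helper): check that SRC and DST both decompress cleanly
# and are byte-identical. Returns 1 and prints the first problem found.
verify_copy() {
  local src="$1" dst="$2"
  gzip -t -- "$src" || { echo "CORRUPT source: $src"; return 1; }
  gzip -t -- "$dst" || { echo "CORRUPT copy: $dst"; return 1; }
  cmp -s "$src" "$dst" || { echo "DIFFERS: $src vs $dst"; return 1; }
}
```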

padbc commented on July 25, 2024

Thanks so much for the quick reply. So yes, deleting Sample116* appears to have "solved" the problem, i.e., dadaist2 produces the expected output. Like you, I find this perplexing.

telatin commented on July 25, 2024

I can only suggest checking the reads with seqfu count inputdir/*gz before starting, but I will implement some extra (optional) checks.
