I'm having some trouble with the "preprocessing" script and not able to generate the plot at the end.
I'm using my own data, loading in .ped and .map files and converting them with snp_plinkQC()
and snp_plinkIBDQC()
.
plink <- download_plink()
NCORES <- nb_cores()
technoQC.bed <- snp_plinkQC(prefix.in = "data/technoTeens", plink.path = plink, file.type = "--file", # ped/map geno = 0.05, mind = 0.05, maf = 0.05, hwe = 1e-10, autosome.only = TRUE )
PLINK v1.90b6.4 64-bit (7 Aug 2018) www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to data/technoTeens_QC.log.
Options in effect:
--autosome
--file data/technoTeens
--geno 0.05
--hwe 1e-10
--maf 0.05
--make-bed
--mind 0.05
--out data/technoTeens_QC
257658 MB RAM detected; reserving 128829 MB for main workspace.
Allocated 96621 MB successfully, after larger attempt(s) failed.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (669934 variants, 423 people).
--file: data/technoTeens_QC-temporary.bed + data/technoTeens_QC-temporary.bim +
data/technoTeens_QC-temporary.fam written.
669934 variants loaded from .bim file.
423 people (0 males, 0 females, 423 ambiguous) loaded from .fam.
Ambiguous sex IDs written to data/technoTeens_QC.nosex .
1 person removed due to missing genotype data (--mind).
ID written to data/technoTeens_QC.irem .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 422 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate in remaining samples is 0.992935.
8604 variants removed due to missing genotype data (--geno).
--hwe: 2294 variants removed due to Hardy-Weinberg exact test.
338458 variants removed due to minor allele threshold(s)
(--maf/--max-maf/--mac/--max-mac).
320578 variants and 422 people pass filters and QC.
Note: No phenotypes present.
--make-bed to data/technoTeens_QC.bed + data/technoTeens_QC.bim +
data/technoTeens_QC.fam ... done.
technoQC2.bed <- snp_plinkIBDQC( bedfile.in = technoQC.bed, plink.path = plink, ncores = NCORES )
PLINK v1.90b6.4 64-bit (7 Aug 2018) www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to /tmp/Rtmpg7a6zA/file6adc6e7d579f.log.
Options in effect:
--bfile data/technoTeens_QC
--indep-pairwise 100 1 0.2
--out /tmp/Rtmpg7a6zA/file6adc6e7d579f
257658 MB RAM detected; reserving 128829 MB for main workspace.
Allocated 96621 MB successfully, after larger attempt(s) failed.
320578 variants loaded from .bim file.
422 people (0 males, 0 females, 422 ambiguous) loaded from .fam.
Ambiguous sex IDs written to /tmp/Rtmpg7a6zA/file6adc6e7d579f.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 422 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.997882.
320578 variants and 422 people pass filters and QC.
Note: No phenotypes present.
Pruned 13777 variants from chromosome 1, leaving 11023.
Pruned 15056 variants from chromosome 2, leaving 10855.
Pruned 12506 variants from chromosome 3, leaving 9649.
Pruned 11249 variants from chromosome 4, leaving 8904.
Pruned 10497 variants from chromosome 5, leaving 8289.
Pruned 16064 variants from chromosome 6, leaving 8472.
Pruned 10190 variants from chromosome 7, leaving 7938.
Pruned 9629 variants from chromosome 8, leaving 7051.
Pruned 7762 variants from chromosome 9, leaving 6492.
Pruned 8934 variants from chromosome 10, leaving 7148.
Pruned 8982 variants from chromosome 11, leaving 6730.
Pruned 8438 variants from chromosome 12, leaving 6958.
Pruned 6310 variants from chromosome 13, leaving 5307.
Pruned 5699 variants from chromosome 14, leaving 4841.
Pruned 5450 variants from chromosome 15, leaving 4709.
Pruned 5894 variants from chromosome 16, leaving 5071.
Pruned 5003 variants from chromosome 17, leaving 4729.
Pruned 5036 variants from chromosome 18, leaving 4680.
Pruned 3750 variants from chromosome 19, leaving 3755.
Pruned 4288 variants from chromosome 20, leaving 3808.
Pruned 2469 variants from chromosome 21, leaving 2258.
Pruned 2496 variants from chromosome 22, leaving 2432.
Pruning complete. 179479 of 320578 variants removed.
Marker lists written to /tmp/Rtmpg7a6zA/file6adc6e7d579f.prune.in and
/tmp/Rtmpg7a6zA/file6adc6e7d579f.prune.out .
PLINK v1.90b6.4 64-bit (7 Aug 2018) www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to /tmp/Rtmpg7a6zA/file6adc6e7d579f.log.
Options in effect:
--bfile data/technoTeens_QC
--extract /tmp/Rtmpg7a6zA/file6adc6e7d579f.prune.in
--genome
--min 0.08
--out /tmp/Rtmpg7a6zA/file6adc6e7d579f
--threads 23
257658 MB RAM detected; reserving 128829 MB for main workspace.
Allocated 96621 MB successfully, after larger attempt(s) failed.
320578 variants loaded from .bim file.
422 people (0 males, 0 females, 422 ambiguous) loaded from .fam.
Ambiguous sex IDs written to /tmp/Rtmpg7a6zA/file6adc6e7d579f.nosex .
--extract: 141099 variants remaining.
Using up to 23 threads (change this with --threads).
Before main variant filters, 422 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.997893.
141099 variants and 422 people pass filters and QC.
Note: No phenotypes present.
IBD calculations complete.
Finished writing /tmp/Rtmpg7a6zA/file6adc6e7d579f.genome .
PLINK v1.90b6.4 64-bit (7 Aug 2018) www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to data/technoTeens_QC_norel.log.
Options in effect:
--bfile data/technoTeens_QC
--make-bed
--out data/technoTeens_QC_norel
--remove /tmp/Rtmpg7a6zA/file6adc4e65cf65
257658 MB RAM detected; reserving 128829 MB for main workspace.
Allocated 96621 MB successfully, after larger attempt(s) failed.
320578 variants loaded from .bim file.
422 people (0 males, 0 females, 422 ambiguous) loaded from .fam.
Ambiguous sex IDs written to data/technoTeens_QC_norel.nosex .
--remove: 195 people remaining.
Warning: At least 17736 duplicate IDs in --remove file.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 195 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate in remaining samples is 0.997835.
320578 variants and 195 people pass filters and QC.
Note: No phenotypes present.
--make-bed to data/technoTeens_QC_norel.bed + data/technoTeens_QC_norel.bim +
data/technoTeens_QC_norel.fam ... done.
system.time( technoQC.rds <- snp_readBed(technoQC2.bed, "technoQC") )
user system elapsed
1.109 0.054 1.161
str(technoTeens, max.level = 2)
List of 3
$ genotypes:Reference class 'FBM.code256' [package "bigstatsr"] with 10 fields
..and 22 methods, of which 8 are possibly relevant:
.. add_columns, as.FBM, copy#envRefClass, initialize,
.. initialize#FBM, save, show#envRefClass, show#FBM
$ fam :'data.frame': 195 obs. of 6 variables:
..$ family.ID : int [1:195] 1 3 4 6 8 11 12 13 17 21 ...
..$ sample.ID : chr [1:195] "202528140036_R01C01" "202528140036_R02C01" "202528140036_R02C02" "202528140036_R03C02" ...
..$ paternal.ID: int [1:195] 0 0 0 0 0 0 0 0 0 0 ...
..$ maternal.ID: int [1:195] 0 0 0 0 0 0 0 0 0 0 ...
..$ sex : int [1:195] 0 0 0 0 0 0 0 0 0 0 ...
..$ affection : int [1:195] -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 ...
$ map :'data.frame': 320578 obs. of 6 variables:
..$ chromosome : int [1:320578] 1 1 1 1 1 1 1 1 1 1 ...
..$ marker.ID : chr [1:320578] "GSA-rs114420996" "rs3131972" "rs12127425" "rs4970383" ...
..$ genetic.dist: num [1:320578] 0 0 0 0 0 0 0 0 0 0 ...
..$ physical.pos: int [1:320578] 58814 752721 794332 838555 840753 846808 853954 854250 861808 866893 ...
..$ allele1 : chr [1:320578] "A" "G" "A" "A" ...
..$ allele2 : chr [1:320578] "G" "A" "G" "C" ...
- attr(*, "class")= chr "bigSNP"
G <- technoTeens$genotypes
big_counts(G, ind.col = 1:10)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
0 0 31 2 23 38 27 26 14 40 8
1 55 91 31 87 99 89 89 87 97 51
2 133 70 162 83 58 79 80 94 58 136
7 3 0 2 0 0 0 0 0 0
system.time( infos <- snp_fastImpute(G, technoTeens$map$chromosome, ncores = NCORES) )
user system elapsed
0.791 1.128 185.524
plot(subset(infos, pNA > 0.001), pch = 19, cex = 0.5)
Error in subset.default(infos, pNA > 0.001) : object 'pNA' not found``