alexkychen / assignpop

Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.

Home Page: http://alexkychen.github.io/assignPOP/

License: GNU General Public License v3.0

R 100.00%

population-assignment population-genomics machine-learning r radseq gbs data-integration cross-validation

assignpop's Issues

Problem with read.Structure for fixed SNP loci

Hi,

I have noticed a problem with the function read.Structure: it doubled the number of rows of the DataMatrix (YOUR_LIST_NAME$DataMatrix) compared to the number of individuals (as in YOUR_LIST_NAME$SampleID). I encountered this problem with a dataset in which some loci were fixed.

The problem was caused by the internal function structure_onehot, which doesn't seem to handle fixed loci (it keeps only one column for each fixed locus and doubles the number of rows).

Now that I have noticed it, I have removed the fixed loci, but adding a warning, or changing the way structure_onehot works, might save other users from problems with this package (which, by the way, I really like!).
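
A check along these lines may help spot fixed loci before import. It is a minimal sketch in base R and not part of assignPOP: the function name find_fixed_loci and the toy genotype table are hypothetical, and missing data are assumed to be coded as NA.

```r
# Return the names of columns (loci) that carry at most one allele value.
# Assumption: `geno` has one column per locus and NA marks missing data.
find_fixed_loci <- function(geno) {
  n_alleles <- vapply(geno, function(x) length(unique(x[!is.na(x)])), integer(1))
  names(n_alleles)[n_alleles <= 1]
}

# Toy data: locus2 is fixed for allele 3.
geno <- data.frame(
  locus1 = c(1, 2, 1, 2),
  locus2 = c(3, 3, 3, 3),
  locus3 = c(1, NA, 2, 2)
)

fixed <- find_fixed_loci(geno)
geno_variable <- geno[, setdiff(names(geno), fixed), drop = FALSE]
```

Dropping the columns named in fixed before writing the STRUCTURE file mirrors the workaround described above.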

Error running subset of individuals/loci (assign.matrix)

Hi and thanks for the great software,

Everything has been working well so far (thanks for the documentation!). However, I'm getting an error when I run assign.matrix with specific proportions of loci/individuals. The error states: Error in if (!train.inds == "all") { : the condition has length > 1.

The code below works well:
assign.matrix( dir="assignPOP/MSI_vs_North/known_assignPOP_Result/")

However, the following code produces the error:
assign.matrix( dir="assignPOP/MSI_vs_North/known_assignPOP_Result/", train.inds=c(0.7, 0.9), train.loci=c(0.5, 1))

This was the code I used to generate the results:
assign.MC(known_2_assnPOP_data_95, train.inds=c(0.5, 0.7, 0.9), train.loci=c(0.1, 0.25, 0.5, 1), loci.sample="fst", iterations=100, model="svm", dir="assignPOP/MSI_vs_North/known_assignPOP_Result/")

I'd appreciate any help you can offer!
Quinn
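
The message suggests that the comparison train.inds == "all" yields one value per element when a vector is passed, and if() needs a single TRUE or FALSE (since R 4.2.0 a longer condition is an error rather than a warning). The sketch below shows the pattern in base R only; wants_all is a hypothetical helper, not the package's actual fix.

```r
# Hypothetical helper: TRUE only when the argument is exactly the string "all".
wants_all <- function(x) identical(x, "all")

train.inds <- c(0.7, 0.9)

# The element-wise comparison returns one logical value per element,
# which `if ()` cannot use as a single condition.
elementwise <- !train.inds == "all"
length(elementwise)

# identical() always returns a single TRUE or FALSE, so this is safe.
if (!wants_all(train.inds)) {
  selected <- train.inds
} else {
  selected <- "all"
}
```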

Error with assign.X

Hi Alex,

I was hoping you would be able to help me with an error I'm getting when using assign.X please?
I am using non-genetic data only to try and assign my unknown individuals with the following code:
assign.X(Baselines, unknown, dir="Wavelet19/",
pca.method=TRUE, scaled = TRUE, model="randomForest")
But I receive the following error:
"Error in is.factor(x[, 1]) : object 'x' not found"
I have tried to look through the code for assign.X but I don't have enough experience in R to be able to understand it all. I have made sure the colnames for Baselines (x1) and unknown (x2) are the same apart from the pop column in x1, and they are both dataframes.
I reran older scripts in which assign.X had worked for me previously, but they no longer work. I have also tried different versions of R, in response to the warning I got about the R version that assignPOP was built under, but this made no difference.

Any help would be really appreciated.

Thanks,

Emma

Installation not working

Hello,

When I try to install using install.packages("assignPOP") I get this error:
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called ‘CVST’
When I try to install the CVST package I get this error:
ERROR: dependency ‘kernlab’ is not available for package ‘CVST’
When I try installing the kernlab package I get this error:
ERROR: compilation failed for package ‘kernlab’

Any idea how to get around this issue?

arguments imply differing number of rows

Hi Alex,

I'm having a similar issue to #13, except with assign.MC. I can run assign.X with the same data frame and it does not error out, but when I try to run assign.MC it gives me this:
Error in { :
task 1 failed - "arguments imply differing number of rows: 27, 69"

I noticed your announcement #15 and am currently using version 1.2.4, with R version 4.3.0. Is there a compatibility issue with the newest version of R?

Thank you.

-Kris

Error in summary.connection(connection) : invalid connection

Hi Alex,

I think I am having an issue with the doParallel portion of the assign.kfold() and assign.MC() functions. While the script was running I ran into a problem, so I killed the run to start over. When I tried again, I got the following error: "Error in summary.connection(connection) : invalid connection"

my script was:

assign.kfold( genin80_3pops_rd, k.fold=c(3, 4, 5), train.loci=c( 1),

         loci.sample="random", model="svm", dir="./Baseline_analysis/assignPOP/Result_kf_3pop_svm_grand_comb/", multiprocess = T)

the output I got was:
7 cores/threads of CPU will be used for analysis...
Error in summary.connection(connection) : invalid connection

After some searching, I think it has something to do with stopCluster(), but I have not been able to fix it or get anything to run in parallel (running with multiprocess = FALSE works just fine).

Thanks for your help!

Peter
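
For reference, a general pattern for making sure a cluster is released even when a run is killed is sketched below. It uses only the parallel package that ships with R; run_on_cluster is a hypothetical wrapper, and it does not change how assign.kfold() or assign.MC() manage their own cluster. Restarting the R session is another way to clear stale connections.

```r
library(parallel)

# Hypothetical wrapper: run FUN over X on a local cluster and always release it.
run_on_cluster <- function(X, FUN, cores = 2) {
  cl <- makeCluster(cores)
  # on.exit() runs when the function exits, including on error or interrupt.
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, X, FUN)
}

squares <- run_on_cluster(1:3, function(i) i^2)
```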

Warning message

Hello!
I'm having a problem installing the package in RStudio. I received the following message: "ERROR: dependencies 'e1071', 'randomForest', 'tree' are not available for package 'assignPOP'". Do I have to install those dependencies myself, and where can I find them?

Thanks a lot!!!

Sandra.
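
One way to see which of the named dependencies are absent from the local library is sketched here. The helper missing_packages is hypothetical, and the install step is commented out because it needs network access to a CRAN mirror.

```r
# Hypothetical helper: report which packages are not installed locally.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

deps <- c("e1071", "randomForest", "tree")
to_install <- missing_packages(deps)

# Uncomment to install whatever is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```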

error with example data

Hey There

I am running the test code outlined in your paper.
I installed the package, etc., and then ran the line
YourGenepop <- assignPOP::read.Genepop( "test.txt", pop.names = c("pop_A","pop_B",
"pop_C"), haploid = FALSE)

....where test.txt is the example file you provide on your GitHub page.

I get this error:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/Library/Frameworks/R.framework/Versions/3.4/Resources/library/lubridate/libs/lubridate.so':
maximal number of DLLs reached...

I am using the latest version of R, see below for the info:

R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.6 ggrepel_0.8.0 marmap_1.0 ggmap_2.6.1
[6] stringr_1.3.1 plyr_1.8.4 qvalue_2.10.0 pcadapt_4.0.3 radiator_0.0.13
[11] adegenet_2.1.1 factoextra_1.0.5 ggplot2_3.0.0 ade4_1.7-11

loaded via a namespace (and not attached):
[1] utf8_1.1.4 proto_1.0.0 tidyselect_0.2.4 robust_0.4-18
[5] RSQLite_2.1.1 htmlwidgets_1.2 grid_3.4.4 munsell_0.5.0
[9] codetools_0.2-15 future_1.8.1 withr_2.1.2 colorspace_1.3-2
[13] fst_0.8.8 pegas_0.11 rstudioapi_0.7 geometry_0.3-6
[17] stats4_3.4.4 robustbase_0.93-1 dimRed_0.1.0 pbmcapply_1.2.5
[21] listenv_0.7.0 labeling_0.3 RgoogleMaps_1.4.2 poppr_2.8.0
[25] mnormt_1.5-5 bit64_0.9-7 coda_0.19-1 LearnBayes_2.15.1
[29] ipred_0.9-6 R6_2.2.2 DRR_0.0.3 assertthat_0.2.0
[33] promises_1.0.1 scales_0.5.0 pinfsc50_1.1.0 nnet_7.3-12
[37] gtable_0.2.0 ddalpha_1.3.4 globals_0.12.1 phangorn_2.4.0
[41] timeDate_3043.102 rlang_0.2.1 CVST_0.2-2 RcppRoll_0.3.0
[45] splines_3.4.4 lazyeval_0.2.1 ModelMetrics_1.1.0 broom_0.4.5
[49] yaml_2.1.19 reshape2_1.4.3 abind_1.4-5 httpuv_1.4.4.2
[53] tools_3.4.4 lava_1.6.2 psych_1.8.4 spData_0.2.9.0
[57] raster_2.6-7 Rcpp_0.12.17 purrr_0.2.5 ggpubr_0.1.7
[61] rpart_4.1-13 deldir_0.1-15 sfsmisc_1.1-2 cluster_2.0.7-1
[65] magrittr_1.5 data.table_1.11.4 RSpectra_0.13-1 gmodels_2.18.1
[69] mvtnorm_1.0-8 amap_0.8-16 mmapcharr_0.1.0 hms_0.4.2
[73] mime_0.5 xtable_1.8-2 jpeg_0.1-8 shape_1.4.4
[77] vcfR_1.8.0 compiler_3.4.4 tibble_1.4.2 maps_3.3.0
[81] ncdf4_1.16 crayon_1.3.4 htmltools_0.3.6 mgcv_1.8-24
[85] pcaPP_1.9-73 later_0.7.3 spdep_0.7-7 tidyr_0.8.1
[89] rrcov_1.4-4 expm_0.999-2 DBI_1.0.0 magic_1.5-8
[93] MASS_7.3-50 boot_1.3-20 Matrix_1.2-14 readr_1.1.1
[97] permute_0.9-4 cli_1.0.0 quadprog_1.5-5 gdata_2.18.0
[101] parallel_3.4.4 bindr_0.1.1 gower_0.1.2 igraph_1.2.1
[105] pkgconfig_2.0.1 fit.models_0.5-14 geosphere_1.5-7 foreign_0.8-70
[109] sp_1.3-1 plotly_4.7.1 foreach_1.4.4 prodlim_2018.04.18
[113] digest_0.6.15 vegan_2.5-2 polysat_1.7-3 fastmatch_1.1-0
[117] kernlab_0.9-26 shiny_1.1.0 gtools_3.8.1 rjson_0.2.20
[121] hierfstat_0.04-22 nlme_3.1-137 jsonlite_1.5 mapproj_1.2.6
[125] seqinr_3.4-5 viridisLite_0.3.0 pillar_1.2.3 lattice_0.20-35
[129] httr_1.3.1 DEoptimR_1.0-8 survival_2.42-4 glue_1.2.0
[133] png_0.1-7 iterators_1.0.9 bit_1.1-14 class_7.3-14
[137] adehabitatMA_0.3.12 stringi_1.2.3 blob_1.1.1 memoise_1.1.0
[141] ape_5.1
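
The session above has well over a hundred namespaces loaded, many of which ship compiled code, so the DLL limit is a plausible cause. The sketch below only inspects the current session. It assumes the R_MAX_NUM_DLLS environment variable (available since R 3.4.0) is the relevant setting; that variable has to be set before R starts, for example in ~/.Renviron, so it cannot be changed from inside a running session.

```r
# Count the shared libraries currently loaded in this session.
n_dlls <- length(getLoadedDLLs())

# The limit is read at start-up from R_MAX_NUM_DLLS; NA here means it was not set.
dll_limit <- Sys.getenv("R_MAX_NUM_DLLS", unset = NA)

cat("Loaded DLLs:", n_dlls, "\n")
cat("R_MAX_NUM_DLLS:", if (is.na(dll_limit)) "(not set)" else dll_limit, "\n")
```

Starting a fresh session and attaching only assignPOP is a simpler way to stay under the limit.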

non-genomic data column selection error

Loci order in baseline and test data

Hello assignPOP team,

First of all, thanks for this amazing tool. I just have a quick question: does the order of the loci need to be the same in the baseline and test data?

Thanks again and I am looking forward to hearing from you.

Best,
Enrique
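
Whether assignPOP requires matching order is not answered here. A defensive approach that works either way is to align the test columns to the baseline by name, as in this base R sketch; align_loci and the toy data frames are hypothetical.

```r
# Hypothetical helper: reorder `test` columns to follow `baseline`,
# stopping if any baseline locus is absent from the test data.
align_loci <- function(baseline, test) {
  missing_loci <- setdiff(names(baseline), names(test))
  if (length(missing_loci) > 0) {
    stop("Loci missing from test data: ", paste(missing_loci, collapse = ", "))
  }
  test[, names(baseline), drop = FALSE]
}

baseline <- data.frame(L1 = c(0, 1), L2 = c(1, 1), L3 = c(0, 0))
test     <- data.frame(L3 = 1, L1 = 0, L2 = 1)
test_aligned <- align_loci(baseline, test)
```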

assign.X error, Error in data.frame

Hi Alex
I have a problem running assign.X.
When I run assign.X(x1=x1, x2=x2, dir="unknown assign/", model="naiveBayes"), it gives the error below, but I don't see any problem in the data frames.

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 218, 0

I have successfully read both the x1 and x2 SNP data (in STRUCTURE format) into R following the assignPOP instructions on GitHub. Even when I use two data sets with the same number of rows, I get this error. The number of rows in x2 is not zero, yet the message reports zero.
Any comments, please?

Could not find function "read.Genpop"

Hi Alex,

I'm using package ‘assignPOP’ version 1.2.2. I can't import my genepop file into R using
genin <- read.genpop( "simGenepop.txt", pop.names=c("pop_A","pop_B","pop_C") )

I just want to ask if anyone here got this issue and please guide me on how to import a genepop file

Many thanks
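
Function names in R are case-sensitive and must be spelled exactly. Elsewhere on this page the function is written read.Genepop, whereas the call above uses read.genpop. The sketch below illustrates the behaviour with read.csv from base R, so it runs without assignPOP installed.

```r
# R treats `read.csv` and `Read.csv` as different names.
exists("read.csv")   # TRUE: utils::read.csv is on the search path
exists("Read.csv")   # FALSE: no object with this capitalisation

# List attached objects whose names match regardless of case.
candidates <- apropos("^read\\.csv$", ignore.case = TRUE)
```

After library(assignPOP), a call such as apropos("genepop", ignore.case = TRUE) should list the exact spelling the package exports.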

Data conversion

Hi there, I have a genind object and tried using genind2genpop, which converted it into a "formal class genpop" object. However, when I try running assign.MC, it gives me the error "this S4 class is not subsettable". Do you have any suggestions about the data conversion? Thanks in advance.

issue with assign MC

Hi,
Thanks for your very interesting program.
When I run assign.MC( myGenepopRd, train.inds=c(0.5, 0.7), train.loci=c(0.1, 0.25, 0.5, 1),
loci.sample="fst", iterations=30, model="LDA", dir="Result-folder/")
the assignment appears to complete successfully ("Monte-Carlo cross-validation done!!
240 assignment tests completed!!"),
but no assignment files are written to the result folder (only the info file, AnalysisInfo.txt).
I would appreciate it if you could let me know what the problem is.

Error in makePSOCKcluster(names = spec, ...) : Cluster setup failed. 1 worker of 1 failed to connect.

Hi,
I am getting the same error message when using assign.MC() and assign.kfold():

assign.MC(GENEPOP391ind_3_realpop_imputed, train.inds=c(0.5), train.loci=c(0.001, 0.005, 0.05, 0.1, 0.25, 0.5, 1),
           loci.sample="fst", iterations=30, model="svm", dir="assignPOP_391_3_realpop_imputed_mc/")

The error message is:

Parallel computing is on. Analyzing data using 3 cores/threads of CPU...
Error in makePSOCKcluster(names = spec, ...) : 
  Cluster setup failed. 3 worker of 3 failed to connect.

I am using macOS 11.2.2 with R 4.0.2 and assignPOP 1.2.2. Is something wrong with my operating system, given that it seems unable to use the 3 cores? I also tried with processors=1 but got the same error message.

Thank you for sharing the tool and thanks in advance for your help!

Sincerely,
Peiwen

GenepopFormat

Hello,
It's not an issue, but I have a comprehension problem.
In a genepop file, -1 corresponds to 'Nocalling' ; 0 for AA ; 1 for AB ; 2 for BB
And this, for each biallelic locus

But in your example, we have "Locus_1_01" and "Locus_1_02". For example, for one allele (1_01) at this locus
-> 0 is 'Nocalling' ; 0.5 is major allele ; 1 is minor allele ?

I hope that I was clear.
Thank you,
Emilio Egal

assign.x error with the smv model

Hi,
I am using your package to assign unknown individuals to their population of origin with genomic markers, however, I’m having trouble using the function assign.X. When I try to run this script :
assign.X(genepop_baseline_2016_snps, genepop_assign_2016_snps, model = "smv",dir = "/Volumes/drobo2/emilie_carrier/all_samples_fev_2019/03-bad_samples_high_relatedness_sex_removed/population_assignment_files/assignment_2016/", mplot = TRUE)
I get this message :
Error in table(outcome_matrix$pred.pop) :
object 'outcome_matrix' not found

I have followed all the steps in your tutorial without any problem and also tried using different models with these files and it seems to work. I have also run this script with different files without any error message.

Thanks in advance!
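
The call above passes model = "smv", while other issues on this page use "svm", so a misspelled model name is one possible cause. A small guard such as the following would surface that early. It is a sketch only: check_model is hypothetical, and known_models lists just the names that appear on this page, which may not be the full set the package accepts.

```r
# Model names that appear in the issues on this page (not necessarily complete).
known_models <- c("svm", "lda", "naiveBayes", "randomForest")

# Hypothetical validator: stop with a readable message on an unknown model name.
check_model <- function(model) {
  if (!model %in% known_models) {
    stop("Unknown model '", model, "'. Expected one of: ",
         paste(known_models, collapse = ", "))
  }
  model
}

ok  <- check_model("svm")
msg <- tryCatch(check_model("smv"), error = function(e) conditionMessage(e))
```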

assign.matrix()

Hi
I have used assignPOP to assign my samples to populations using genetic data only. The package is very easy to use and the tutorials are great. However, I am having trouble with the assign.matrix function. When I try to use it, I get the following error: "Error: could not find function "assign.matrix"". I checked the help page for assignPOP and this function is not listed there. I then searched online for a package containing this function, but couldn't find one.
Is this function missing from the package or do I need to install some other package that I have missed?

R version compatibility

Hi all,

R has recently released version 4.0. Because of changes in that release, assignPOP version 1.1.9 and earlier are not fully compatible with R 4.0. Specifically, the following functions will produce errors because of a data type conversion issue: assign.MC(), assign.kfold(), assign.X(), and accuracy.plot().

If you're using R 4.0, please update assignPOP to version 1.2.0, which should work with both R 3.X and R 4.0.

Thanks,
Alex

On random forest function

Hi Alex,
Thanks for developing this package!

When using the random forest function, the parameter "y" needs to be a factor for classification; otherwise regression is performed. I was wondering whether the input "y" is set to a factor in the R script assign.MC.R. After a quick look at this function, it does not seem to be, but maybe I am wrong; I just wanted to confirm.

Also, is there an easy way to obtain the MSE (mean squared error) and MDA (mean decrease in accuracy) from the random forest simulations? I am asking because they can be used to determine the optimal number of trees (ntree).

Thanks
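
To illustrate the factor point only, this base R sketch converts population labels to factors before they would be handed to a classifier. It does not show what assign.MC does internally, and the label vectors are made up.

```r
# Population labels stored as character strings or as numbers.
pop_chr <- c("pop_A", "pop_B", "pop_A", "pop_C")
pop_num <- c(1, 2, 1, 3)

# Convert to factors so a classifier treats them as classes, not quantities.
y_chr <- factor(pop_chr)
y_num <- factor(pop_num)

is.factor(y_num)   # TRUE
nlevels(y_num)     # 3
```

Numeric population codes are the risky case, since a numeric response is what leads to regression rather than classification.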

Error in Ops.factor(Var1, Var2) : level sets of factors are different

I used version 1.1.5, installed from assignPOP-1.1.5.tar.gz, and encountered the following error:

```
assign.MC( datafile, train.inds=c(0.7), train.loci=c(0.1, 0.25),
       loci.sample="fst", iterations=30, model="svm", dir="Result-folder/")
```
Parallel computing is on. Analyzing data using 7 cores/threads of CPU...
Monte-Carlo cross-validation done!!
60 assignment tests completed!!

accuMC <- accuracy.MC(dir = "Result-folder/") #Use this function for Monte-Carlo cross-validation results
Error in Ops.factor(Var1, Var2) : level sets of factors are different
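
This message comes from comparing two factors whose level sets differ, for example when a population never appears among the predictions. Whether that is what happened here is not known. The base R sketch below reproduces the message with toy data and shows one workaround.

```r
obs  <- factor(c("A", "B", "A"))   # levels: A, B
pred <- factor(c("A", "C", "A"))   # levels: A, C

# Comparing factors with different level sets raises the error.
err <- tryCatch(obs == pred, error = function(e) conditionMessage(e))

# One workaround: give both factors the union of the levels before comparing.
lv <- union(levels(obs), levels(pred))
same <- factor(obs, levels = lv) == factor(pred, levels = lv)
```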

accuracy.MC error

I have successfully analyzed a dataset with 248 individuals and ca. 9700 loci in assignPOP using the kfold method. However, the MC method is giving an error that I'm hoping you can help with. I've tried using the information from the vignette as a starting point:
assign.MC(referenceAlleles, train.inds=c(10, 15, 19), train.loci=c(0.1, 0.25, 0.5, 1), loci.sample="fst",
iterations=30, multiprocess = TRUE, model="lda", dir="MC/")
but when I run accuracy.MC(dir="MC2/") I get the error:
Error in [<-.data.frame(*tmp*, i, , value = c(0.333333333333333, 0.235294117647059, :
replacement has 6 items, need 7.

Do you have any suggestions on how I can fix the input file so that accuracy.MC will run? I'm running R v. 3.4.2 through R studio v. 1.1.383.

error in membership.plot: Faceting variables must have at least one value

Hi I'm getting an error when trying to produce a membership plot. Using the following code:

results1 <- file.path(outdir, paste0(analysis_name,"_Assign_kfold//"))

k.folds <- assign.kfold(assign, k.fold=c(2,3,4), train.loci=c(0.05, 0.1, 0.25, 0.5), 
             loci.sample="fst", model="svm", dir= results1)

accuKF <- accuracy.kfold(dir = results1)

accuracy.plot(accuKF, pop = c("all", "1", "2"))+
  annotate("segment",x=0.4,xend=3.6,y=0.33,yend=0.33,colour="red",size=1)

membership.plot(dir = results1, style = 2)

I get the error:
Error: Faceting variables must have at least one value

Which led me to the facet_grid() argument in ggplot.
Style = 1 or Style=3 work fine
The problem appears with both style =2 and =4, I suspect that ndf$origin.pop might be empty for some reason, but I'm not confident enough with how functions work to try anything else, sorry.

Deane.

R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

assignPOP_1.1.7
