kenhanscombe / ukbtools
An R package to manipulate and explore UK Biobank data
Home Page: https://kenhanscombe.github.io/ukbtools/
Hello! I have the same error. I have about 500,000 observations (eid), but "my_ukb_data" only contains 12152 observations.
Besides that, I get the warning:
my_ukb_data <- ukb_df("ukbxxxxx")
Warning: `data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
Warning in data.table::fread(input = tab_location, sep = "\t", header = TRUE, :
Discarded single-line footer: <<4356529 1 NA NA NA 0 1941 1 NA NA NA 22 NA NA NA 24 NA NA NA 79 NA NA NA 96 NA NA NA 164 NA NA NA 135 NA NA NA 7 2009-06-03 NA NA NA 11010 NA NA NA 6 NA NA NA 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA >>
Originally posted by @FeFortti in #13 (comment)
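A row count far below the expected 500,000, together with a "Discarded single-line footer" warning, can indicate that some line in the .tab file does not split into the expected number of fields. A minimal sketch for locating such lines with base R's `count.fields()`, run here on a toy tab-delimited file (the file contents and the helper name `find_ragged_lines` are illustrative assumptions, not part of ukbtools):

```r
# Hypothetical helper: report lines whose tab-separated field count differs
# from the header's. Quote handling is switched off so stray quotation marks
# in free text are treated as ordinary characters.
find_ragged_lines <- function(tab_file) {
  n_fields <- count.fields(tab_file, sep = "\t", quote = "", comment.char = "")
  which(n_fields != n_fields[1])
}

# Toy file standing in for a UKB .tab file (assumption: 3 columns).
toy_file <- tempfile(fileext = ".tab")
writeLines(c("f.eid\tf.31.0.0\tf.34.0.0",
             "1000001\t1\t1941",
             "1000002\t0",              # truncated line: only 2 fields
             "1000003\t1\t1950"),
           toy_file)

ragged <- find_ragged_lines(toy_file)
ragged
```

The returned indices are line numbers in the file (header included), which can then be inspected with a text tool before re-running ukb_df.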
Hello,
I'm currently using the ukb_context function and I keep getting the following error:
Error: No expression to parse
Call `rlang::last_error()` to see a backtrace.
I've tried using dplyr::select, and it appears to grab all the correct columns for the demographics data; the nonmiss.var column is definitely in the referenced dataset. I even tried setting all the demographic columns manually, but still no luck. Here is the code I ran. Any insight would be appreciated:
ukbxxxx_data <- ukb_df("ukbxxxx_symp", path = "/Users/amandarodrigue/Dropbox/Biobank_symp/Psychotic_experiences/")
ukbyyyy_data <- ukb_df("ukbyyyy_symp", path = "/Users/amandarodrigue/Dropbox/Biobank_symp/Psychotic_experiences/")
full_data <- ukb_df_full_join(ukbxxxx_data, ukbyyyy_data)
ukb_context(full_data, nonmiss.var = "volume_of_brainseg_whole_brain_f26514_2_0",
bar.position = "fill", sex.var = "sex_f31_0_0",
age.var = "age_when_attended_assessment_centre_f21003_0_0",
socioeconomic.var = "townsend_deprivation_index_at_recruitment_f189_0_0",
ethnicity.var = "ethnic_background_f21000_0_0",
employment.var = "current_employment_status_corrected_f20119_0_0",
centre.var = "uk_biobank_assessment_centre_f54_0_0")
I've also tried the ukb_context with a subset of data and get the same error.
Hi Ken,
thank you for this package, I am using it extensively on my project and it's really great.
I have encountered an issue with a data point in my UKB application while importing the basket through ukb_df(), which causes a warning from fread() (with relative nonzero exit code) and a desperate attempt to fix the entry:
Warning message:
In data.table::fread(input = tab_location, sep = "\t", header = TRUE, :
Found and resolved improper quoting out-of-sample. First healed line 459278: <<5893037 [...]>>. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
The culprit is what I suspect to be an entry from the medical records, containing several escaped quotation marks. Removing them with sed -e "s|\\\'||g" -e 's|\\\"||g' data.tab solves the issue.
Not sure if you plan on doing something about it (as the solution is relatively trivial), but I thought it was worth letting you know.
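The fread() warning itself suggests `quote=""` as an alternative to editing the file. A minimal sketch of what disabling quote processing does, using base R's `read.delim()` instead of `data.table::fread()` so that it runs without extra packages (the toy lines are invented for illustration):

```r
# Toy tab-delimited text with an unbalanced quotation mark inside a field.
toy_lines <- c("eid\tnote",
               "1\tpatient said \"fine",
               "2\tno issues")

# With quote = "" the quotation mark is kept as an ordinary character,
# so each physical line stays one record.
toy <- read.delim(text = paste(toy_lines, collapse = "\n"),
                  quote = "", stringsAsFactors = FALSE)
nrow(toy)
toy$note[1]
```

Whether ukb_df() exposes a way to pass this option through to fread() is not stated on this page, so the sed clean-up above remains the confirmed workaround.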
Ken
I installed ukbtools in R and initially it worked with your sample dataset. Then it suddenly stopped working and I get the error
cannot find function ukb_df
How can I fix this?
Steven
Hi there,
Not sure how to report this and/or what kind of information to add, but I am encountering an awful lot of memory issues when using the function ukb_df_full_join() in an HPC environment, from memory leaks (OOM with ~100GB available per node) to segfaults.
Happy to provide all the information needed from my environment
Cheers
I am keen to use this package for my work with UK Biobank data. Following the example to explore the context of my data, I get an error when I run this code:
> subgroup_of_interest <- (my_ukb_data$body_mass_index_bmi_f21001_0_0>=25)
> ukb_context(my_ukb_data, nonmiss.var = NULL, subset.var = subgroup_of_interest,
+ bar.position = "fill", sex.var = "sex_f31_0_0",
+ age.var = "age_at_recruitment_f21022_0_0",
+ socioeconomic.var = "townsend_deprivation_index_at_recruitment_f189_0_0",
+ ethnicity.var = "ethnic_background_f21000_0_0",
+ employment.var = "current_employment_status_f6142_0_0",
+ centre.var = "uk_biobank_assessment_centre_f54_0_0")
Error in unit(x, default.units) : 'x' and 'units' must have length > 0
In addition: Warning messages:
1: Groups with fewer than two data points have been dropped.
2: Groups with fewer than two data points have been dropped.
3: Groups with fewer than two data points have been dropped.
4: Groups with fewer than two data points have been dropped.
5: Groups with fewer than two data points have been dropped.
I can only see the gender bar chart, not the ethnicity, Townsend etc. My variables are:
sex_f31_0_0
age_at_recruitment_f21022_0_0
ethnic_background_f21000_0_0
townsend_deprivation_index_at_recruitment_f189_0_0
uk_biobank_assessment_centre_f54_0_0
current_employment_status_f6142_0_0
Hello, I get this error when I try to follow the example of the vignette:
subgroup_of_interest <- (my_ukb_data$body_mass_index_bmi_f21001_0_0 >= 25)
ukb_context(my_ukb_data, subset.var = subgroup_of_interest)
Error: More than one expression parsed
Call `rlang::last_error()` to see a backtrace
I identified that the problem comes from the variable sex.var in the function ukb_context. The pattern sex.var = "^sex.*0_0" matches several variables:
Browse[3]> sex.var
[1] "sex_f31_0_0"
[2] "sexually_molested_as_a_child_f20490_0_0"
[3] "sexual_interference_by_partner_or_expartner_without_consent_as_an_adult_f20524_0_0"
[4] "sex_chromosome_aneuploidy_f22019_0_0"
[5] "sex_inference_x_probeintensity_f22022_0_0"
[6] "sex_inference_y_probeintensity_f22023_0_0"
[7] "sex_of_baby_f41226_0_0"
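The multiple matches can be reproduced with base R alone. A minimal sketch, using four of the column names from the browser output above; the anchored pattern is an illustrative suggestion, not the package's own default:

```r
# Column names taken from the browser output above.
cols <- c("sex_f31_0_0",
          "sexually_molested_as_a_child_f20490_0_0",
          "sex_chromosome_aneuploidy_f22019_0_0",
          "sex_of_baby_f41226_0_0")

# The loose pattern matches every name that starts with "sex" and
# contains "0_0" somewhere after it.
loose <- grep("^sex.*0_0", cols, value = TRUE)

# Anchoring on the field ID (f31) leaves a single match.
strict <- grep("^sex_f31_0_0$", cols, value = TRUE)

length(loose)
strict
```

Passing the exact column name, for example sex.var = "sex_f31_0_0" as in other calls on this page, is a plausible way to sidestep the ambiguous default.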
I have been trying your example commands
subgroup_of_interest <- (my_ukb_data$body_mass_index_bmi_0_0 >= 25)
ukb_context(my_ukb_data, subset.var = subgroup_of_interest)
but I keep getting following error:
<error/rlang_error>
More than one expression parsed
Backtrace:
x
\-base::lapply(...)
\-ggplot2:::FUN(X[[i]], ...)
\-rlang::parse_expr(x)
What am I doing wrong?
Hello, Thank you for creating the ukbtools package.
The current issue I am facing is
Error in html_table_nodes[[data.pos]] : subscript out of bounds
In addition: Warning message:
XML content does not seem to be XML: './ukb26###.html'
I have moved all the files to the same drive, and re-converted the files using conv.
Any ideas what may be causing this?
Thanks heaps, Mahima
Originally posted by @mkapoor123 in #1 (comment)
Thanks, Ken, for creating this package. I tried to import my data using the ukb_df function, but the resulting data frame has only 16089 observations, whereas my dataset has about 500,000. Do you know what could be the reason for this discrepancy?
Maher,
Hi,
Using the below command,
ukb_icd_freq_by(all_data, reference.var = "sex_f31_0_0", n.groups = 10,icd.code = c("^(F00)","^(F01)","^(F02)"), icd.labels = c("disease1", "disease2","disease3"), plot.title = "", legend.col = 1, legend.pos = "right", icd.version = 10, freq.plot = FALSE, reference.lab = "Reference variable", freq.lab = "UKB disease frequency")
I get the following error
Error in if (!(icd.code == c("^(I2[0-5])", "^(I6[0-9])", "^(J09|J1[0-9]|J2[0-2]|P23|U04)"))) { :
the condition has length > 1
I can't seem to understand the problem.
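The error message shows an `if` condition built from an element-wise `==` between two length-3 vectors. That yields three logical values, and recent R versions refuse a condition of length greater than one. A minimal sketch of the difference between `==` and `identical()`, using the two code vectors from the call and the error above:

```r
default_codes <- c("^(I2[0-5])", "^(I6[0-9])", "^(J09|J1[0-9]|J2[0-2]|P23|U04)")
user_codes    <- c("^(F00)", "^(F01)", "^(F02)")

# Element-wise comparison returns one value per element, which `if` cannot use.
elementwise <- user_codes == default_codes
length(elementwise)

# identical() always returns a single TRUE or FALSE, so it is safe inside `if`.
is_default <- identical(user_codes, default_codes)
if (!is_default) {
  message("custom ICD codes supplied")
}
```

This suggests the problem sits in the function's own check rather than in the arguments passed; a fix would have to be made inside the package.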
Hello,
The ukbtools manual https://kenhanscombe.github.io/ukbtools/ describes the function ukb_gen_samples_to_remove.
However, when I try to use it, I get the following error:
Error in ukb_gen_samples_to_remove(my_relatedness_data, ukb_with_data = pheno$anxiety_self) :
could not find function "ukb_gen_samples_to_remove"
Other functions such as ukb_gen_rel_count work fine, though. I've re-installed ukbtools (just in case it was using an older version) but still not working.
Thanks,
J
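A "could not find function" error for one function while others work is consistent with an installed version that does not export it. A minimal sketch for checking this; the helper name `has_export` is hypothetical, and it is demonstrated on a base package so that it runs anywhere:

```r
# Hypothetical helper: is `fn` exported by the installed version of `pkg`?
has_export <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

# Demonstrated on a base package.
has_export("stats", "median")

# For the case above one would check (not run here):
# has_export("ukbtools", "ukb_gen_samples_to_remove")
# packageVersion("ukbtools")
```

If the check returns FALSE, comparing `packageVersion("ukbtools")` with the version the manual documents would be a reasonable next step.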
I'm getting an error from the ukb_context() function.
library(ukbtools)
load("ukb34514_data.rda", verbose = T)
dim(ukb34514)
[1] 502527 3862
Now I supply a logical vector with subset.var:
subgroup_of_interest <- (ukb34514$body_mass_index_bmi_f21001_0_0 >= 25)
head(subgroup_of_interest)
[1] TRUE FALSE TRUE TRUE TRUE FALSE
length(subgroup_of_interest)
[1] 502527
For some reason though I get the following error:
ukb_context(ukb34514, subset.var = subgroup_of_interest)
Error in .subset2(x, i, exact = exact) : subscript out of bounds
The sessionInfo() is:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ukbtools_0.11.3.9000
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 compiler_3.5.1 pillar_1.4.2 iterators_1.0.10 prettyunits_1.0.2 remotes_2.0.4
[7] tools_3.5.1 zeallot_0.1.0 digest_0.6.20 packrat_0.5.0 pkgbuild_1.0.3 pkgload_1.0.2
[13] memoise_1.1.0.9000 tibble_2.1.3 gtable_0.3.0 pkgconfig_2.0.2 rlang_0.4.0 foreach_1.4.4
[19] cli_1.1.0 rstudioapi_0.10 yaml_2.2.0 parallel_3.5.1 curl_3.3 stringr_1.4.0
[25] withr_2.1.2 dplyr_0.8.3 vctrs_0.2.0 hms_0.5.0 desc_1.2.0 fs_1.2.7
[31] devtools_2.0.2 rprojroot_1.3-2 grid_3.5.1 tidyselect_0.2.5 data.table_1.12.2 glue_1.3.1
[37] R6_2.4.0 processx_3.3.0 XML_3.98-1.20 sessioninfo_1.1.1 tidyr_0.8.3 readr_1.3.1
[43] callr_3.2.0 purrr_0.3.2 ggplot2_3.2.0 magrittr_1.5 codetools_0.2-16 backports_1.1.4
[49] scales_1.0.0 ps_1.3.0 usethis_1.5.0 assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3
[55] doParallel_1.0.14 lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4
I have a big file with lots of fields.
awk -F$'\t' '{if (NR < 2) print NF}' ukb677133.tab
8473
I try to create a key file...
library(ukbtools)
my_ukb_data <- ukb_df("ukb677133")
my_ukb_key <- ukb_df_field(my_ukb_data)
write.table(my_ukb_key, file = "./xukb677133_key.txt", sep = "\t")
But I get this error in R...
Error in lapply(value, as.character) :
R character strings are limited to 2^31-1 bytes
Calls: ukb_df_field -> -> regmatches<- -> lapply
Execution halted
Hi ken, do you know how to download the ukbxxx.enc file mentioned by ukbtools?
ukb_unpack ukbxxxx.enc key
ukb_conv ukbxxxx.enc_ukb r
ukb_conv ukbxxxx.enc_ukb docs
Thanks.
Shicheng
I was following instructions from here:
https://cran.r-project.org/web/packages/ukbtools/vignettes/explore-ukb-data.html
for the step "Retrieving ICD diagnoses". ukb_icd_diagnosis returns
Error: Column 1 must be named.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
Do you know why this is happening?
Thanks
Hi Ken.
(Hope you are well. You may remember me from the SGDP, years back we had a few processing meetings, with Oliver)
I am keen to try your biobank tool out, but when I run the ukb_df command I get the following error:
my_ukb <- ukb_df("ukbXXXX")
Error in source(if (path == ".") { :
/mnt/10tbstore/projects/ukb/ukbXXXX.r:219:18: unexpected symbol
218: lvl.100391 <- c(-3,-1,1,2,3,4)
219: lbl.100391 <- c("Prefer
I looked for changes in the ukbXXXX.r file before and after I ran ukb_df and can see that there is some weird substitution going on at line 219.
Before the command ukbXXXX.r looks like this on line 214:
lbl.100388 <- c("Prefer not to answer","Do not know","Never/rarely use spread","Butter/spreadable butter","Flora Pro-Active/Benecol","Other type of spread/margarine")
After the command, the same line is a bit further down, on line 219, and the path has been added in a weird way:
lbl.100388 <- c("Prefer not to answer","Do not know","Never/rarely use spread.delim('/mnt/10tbstore/projects/ukb/ukbXXXX.tab')
Any ideas what may be causing this?
All the best,
Johan
Is there an option from ukbtools to automatically collapse cohorts into one column? For example, collapsing
"hdl_cholesterol_f30760_0_0" "hdl_cholesterol_f30760_1_0"
to
"hdl_cholesterol_f30760"
Although not described in http://biobank.ctsu.ox.ac.uk/crystal/help.cgi?cd=value_type, there seems to be a Curve type that I believe should be read in as categorical (likely is Compound).
See the below snapshots from an example basket we have and also the type on the website.
Hi
devtools::install_github("kenhanscombe/ukbtools", dependencies = TRUE)
When I use the above command I am getting the following error:
E> * checking for file ‘/nvme/pbs/tmpdir/pbs.202087.flashmgr2/RtmpCkMHkl/remotes615da87bc05/kenhanscombe-ukbtools-3dca23a/DESCRIPTION’ ... OK
E> * preparing ‘ukbtools’:
E> * checking DESCRIPTION meta-information ... OK
E> * installing the package to process help pages
E> * creating vignettes ... ERROR
E> Warning in engine$weave(file, quiet = quiet, encoding = enc) :
E> Pandoc (>= 1.12.3) not available. Falling back to R Markdown v1.
E> Error: processing vignette 'explore-ukb-data.Rmd' failed with diagnostics:
E> The 'markdown' package should be installed and declared as a dependency of the 'ukbtools' package (e.g., in the 'Suggests' field of DESCRIPTION), because the latter contains vignette(s) built with the 'markdown' package. Please see yihui/knitr#1864 for more information.
E> Execution halted
I do not have pandoc in the cluster.
How can I solve this problem?
Regards,
Sathish
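The log points at two things that can be checked from inside R before retrying: whether pandoc is visible to the session, and whether the 'markdown' package named in the error is installed. A minimal sketch using base R only; the suggested follow-up is an assumption drawn from the error text, not a confirmed fix:

```r
# Is a pandoc executable visible on the PATH of this R session?
pandoc_on_path <- unname(nzchar(Sys.which("pandoc")))

# Is the 'markdown' package (named in the error message) installed?
markdown_installed <- requireNamespace("markdown", quietly = TRUE)

if (!markdown_installed) {
  # Assumption: installing it addresses the vignette fallback named in the log.
  message('The log suggests trying install.packages("markdown") first.')
}

c(pandoc = pandoc_on_path, markdown = markdown_installed)
```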
Hi, I used the following code to install the latest development version: devtools::install_github("kenhanscombe/ukbtools", build_vignettes = TRUE, dependencies = TRUE)
and then run the following two lines:
library(ukbtools)
my_ukb_data <- ukb_df("ukb9820", path="/restricted/projectnb/ukbiobank/jiehuang/data/ukb/pheno/raw")
But I still got the following error:
Error in data.table::fread(input = tab_location, sep = "\t", header = TRUE, :
unused argument (nThread = if (n_threads == "max") {
parallel::detectCores()
} else if (n_threads == "dt") {
data.table::getDTthreads()
} else if (is.numeric(n_threads)) {
min(n_threads, parallel::detectCores())
})
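An "unused argument (nThread = ...)" error means the installed fread() has no parameter called nThread, which would be consistent with an older data.table than the one ukbtools expects. A minimal sketch for checking whether a function accepts a given argument; the helper name `accepts_arg` is hypothetical, and it is demonstrated on base R so that it runs without data.table:

```r
# Hypothetical helper: does function `f` accept an argument called `arg`?
accepts_arg <- function(f, arg) arg %in% names(formals(f))

# Demonstrated on a base R function.
accepts_arg(read.table, "sep")

# With data.table installed one would check (not run here):
# accepts_arg(data.table::fread, "nThread")
# packageVersion("data.table")
```

If the check on fread returns FALSE, updating data.table and restarting R before loading ukbtools again would be the obvious thing to try.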
Hello there,
I have read the other post about error in using the ukb_df command but have tried specifying the path to the file without success.
I have done
devtools::install_github("kenhanscombe/ukbtools", build_vignettes = TRUE, dependencies = TRUE)
library(ukbtools)
ukb_dataset <- ukb_df("ukb29xxx-4", path = "/Users/workspace/monday/DH")
But I keep getting the following, and no matter how I've tweaked it, it just doesn't work.
Error in html_table_nodes[[data.pos]] : subscript out of bounds
In addition: Warning message:
XML content does not seem to be XML: './ukb29xxx-4.tab.html'
I have also used
path_to_example_data <- system.file("extdata", package = "ukbtools")
df <- ukb_df("ukbxxxx", path = path_to_example_data)
df_field <- ukb_df_field("ukbxxxx", path = path_to_example_data)
and it does work with the example (just not the actual data)
Please do you have any advice?
Thanks a lot,
K
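The warning names the .html file that could not be read, so a first check is whether the fileset's three files exist under the exact basename passed to ukb_df. A minimal sketch; the helper name `ukb_fileset_present` is hypothetical and the demo directory is a stand-in for the real data path:

```r
# Hypothetical helper: which of the three fileset files are present?
ukb_fileset_present <- function(fileset, path = ".") {
  files <- file.path(path, paste0(fileset, c(".tab", ".r", ".html")))
  stats::setNames(file.exists(files), basename(files))
}

# Toy directory with only two of the three files (assumption for the demo).
demo_dir <- tempfile("ukb_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("ukbxxxx.tab", "ukbxxxx.r")))

present <- ukb_fileset_present("ukbxxxx", path = demo_dir)
present
```

A FALSE entry shows which file is missing or named differently from the basename given to ukb_df.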
Hi Ken
I have used your package ukbtools to label the columns in my ukb_sqc_v2.txt file (really helpful, thank you). However, I still end up with those two columns at the start (x1 and x2) which remained unnamed after I run ukb_gen_sqc_names.
This sqc file doesn't appear to have an IID in it anywhere, However, one of the columns that remains unnamed after I use ukb_gen_sqc_names (x2) looks like it could be IIDs. So, I labelled it as such.
However... if I then try to merge the FID column from the .fam file, into the ukb_sqc file, matching on IID, I get only about 98k matched out of 488k. So presumably this unnamed column in the sqc file actually isn't IID? Or at least it doesn't match my IID column in the fam file?
Have you come across this issue? I've sunk about 3 days trying to sort this now.
Cheers!
Hi All,
Kindly help,
How can I fix "Error in node$parent$priority[, node$name] : subscript out of bounds"? Below I have attached my YAML file and AHP R code.
solar.txt
library(data.tree)
vignette(package = 'data.tree')
library(ahp)
pvAhp <- Load('solar.txt')
Calculate(pvAhp)
Visualize(pvAhp)
Analyze(pvAhp)
AnalyzeTable(pvAhp)
I was trying my hand at using the ukb_icd_prevalence function with a regular expression and I got some inconsistent results when checking it exhaustively against all the codes I was interested in.
Code below:
x <- ukb_icd_prevalence(my_ukb_data, icd.code = "K85.*", icd.version = 10)
y <- (ukb_icd_prevalence(my_ukb_data, icd.code = "K85", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K850", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K851", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K852", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K853", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K854", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K855", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K856", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K857", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K858", icd.version = 10) +
  ukb_icd_prevalence(my_ukb_data, icd.code = "K859", icd.version = 10))
I found the prevalence of x to be smaller in my dataset than the prevalence of y. This is confusing to me as K85.* should cover all the codes I was looking up in y. If anything I expected y to maybe be smaller than x, to account for K85.00, K85.01 etc. and other sub-codes I may have not included (but did not see in the dataset from a cursory overview).
I am not sure which result to trust. x is 0.0045 and y is 0.0066. Thoughts?
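One possible explanation that does not depend on how ukb_icd_prevalence is implemented: a participant with two different K85 sub-codes is counted once by a single pattern but once per code when per-code prevalences are added up. If icd.code is treated as a regular expression, "K85" alone would also match K850, K851 and so on, which would inflate the sum further. A minimal sketch on invented toy data; the `prevalence` function is a stand-in, not the package's code:

```r
# Toy data: one row per participant, one diagnosis column per slot
# (values are invented for illustration).
diag <- data.frame(
  d1 = c("K850", "K851", NA, "I21", NA),
  d2 = c("K851", NA,     NA, NA,    NA),
  stringsAsFactors = FALSE
)

# Proportion of participants with at least one code matching `pattern`.
prevalence <- function(data, pattern) {
  hit <- apply(data, 1, function(row) any(grepl(pattern, row)))
  mean(hit)
}

# One pattern covering all sub-codes counts each participant once.
union_prev <- prevalence(diag, "^K85")

# Adding per-code prevalences counts participant 1 twice.
summed_prev <- prevalence(diag, "^K850$") + prevalence(diag, "^K851$")

c(union = union_prev, summed = summed_prev)
```

Under this reading, the single-pattern result (x) is the one that corresponds to "proportion of participants with any K85 diagnosis".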
A new package, vroom, is out. It is much faster than fread on mixed data (that is, not only numerical) and also supports the wonderful features found in readr. Maybe switch over to vroom instead?
The first icd frequency by bmi line chart does not work for me. The second, bmi by gender does produce a pair of bar charts.
> ukb_icd_freq_by(my_ukb_data, reference.var = "body_mass_index_bmi_f21001_0_0", freq.plot = TRUE)
Error: Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?
> ukb_icd_freq_by(my_ukb_data, reference.var = "sex_f31_0_0", freq.plot = TRUE)
If I set freq.plot = FALSE for the bmi chart, I get a correct data frame
# A tibble: 10 x 6
categorized_var `coronary artery dis~ `cerebrovascular dis~ `lower respiratory tract~ lower upper
<ord> <dbl> <dbl> <dbl> <dbl> <dbl>
1 [12.1,22.1] 0.0392 0.0186 0.0451 12.1 22.1
2 (22.1,23.6] 0.0469 0.0182 0.0368 22.1 23.6
3 (23.6,24.7] 0.0588 0.0192 0.0376 23.6 24.7
4 (24.7,25.7] 0.0706 0.0214 0.0401 24.7 25.7
5 (25.7,26.7] 0.0802 0.0235 0.0423 25.7 26.7
6 (26.7,27.9] 0.0894 0.0244 0.0468 26.7 27.9
7 (27.9,29.1] 0.0983 0.0265 0.0495 27.9 29.1
8 (29.1,30.8] 0.109 0.0285 0.0545 29.1 30.8
9 (30.8,33.6] 0.126 0.0306 0.0629 30.8 33.6
10 (33.6,74.7] 0.140 0.0348 0.0822 33.6 74.7
I'd like to write a covariates file using ukb_gen_write_plink such as ukb.variables = c("variable1", "variable2", "variable3")
However, I am wondering if there is a way to collapse the serial fields. For instance, I have 4 possible recordings taken for average monthly red wine intake:
average_monthly_red_wine_intake_f4407_0_0
average_monthly_red_wine_intake_f4407_1_0
average_monthly_red_wine_intake_f4407_2_0
average_monthly_red_wine_intake_f4407_3_0
Is there a way to just call it once such as "average_monthly_red_wine_intake_f4407" and to get ukbtools to report only the max,min, or most recent value? Or, in general, is the only way to do it to call for all of the 4 fields using the ukbtools, and then write my own script that collapses the fields in fam file so that I have "average_monthly_red_wine_intake" only once with my best value?
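Pending an answer on built-in support, the collapsing can be done in a few lines of base R after loading the data. A minimal sketch on invented values, using the four column names above; it assumes the columns appear in instance order 0, 1, 2, 3:

```r
# Toy data with the four instance columns named as above (values invented).
toy <- data.frame(
  average_monthly_red_wine_intake_f4407_0_0 = c(6, NA, 5),
  average_monthly_red_wine_intake_f4407_1_0 = c(NA, NA, 7),
  average_monthly_red_wine_intake_f4407_2_0 = c(4, NA, NA),
  average_monthly_red_wine_intake_f4407_3_0 = c(NA, 1, NA)
)

# Select every instance of the field by its shared prefix.
inst <- toy[, grep("^average_monthly_red_wine_intake_f4407_", names(toy)),
            drop = FALSE]

# Row-wise maximum, ignoring missing values (NA when all are missing).
row_max <- apply(inst, 1, function(x) {
  if (all(is.na(x))) NA else max(x, na.rm = TRUE)
})

# Most recent value: the last non-missing entry in instance order.
most_recent <- apply(inst, 1, function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA else x[length(x)]
})

data.frame(row_max = unname(row_max), most_recent = unname(most_recent))
```

The collapsed vector can be added to the data frame as a new column and that column name passed in ukb.variables.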
Hi
I am familiarizing myself with ukbtools, since I am going to pull out phenotypes from the data. However, I am running into a problematic permission issue.
my_ukb_data <- ukb_df("ukbxxxx", path = "/shared/ukb/data/path")
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
In addition: Warning messages:
1: `data_frame()` was deprecated in tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
2: In file(file, ifelse(append, "a", "w")) :
cannot open file '/shared/ukb/data/path/ukbxxxx.r': Permission denied
Since I have both reading and execution permissions for the .r-file, this error message can only be caused by writing permissions. The problem is that I share this directory with other people, so I am worried if writing permissions can be an issue. Is it really true that ukb_df() requires writing permissions?
Alternatively is there another way to do this without writing to the files?
Best regards,
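The error shows a file connection being opened on the .r file in write or append mode, so write access to that file does appear to be needed. One workaround that leaves the shared directory untouched is to work from a private copy of the fileset. A minimal sketch; the helper name `copy_fileset` is hypothetical and the toy directory stands in for the shared path:

```r
# Hypothetical helper: copy a fileset's .tab, .r and .html into a directory
# the current user can write to, and return that directory.
copy_fileset <- function(fileset, from, to = tempfile("ukb_copy_")) {
  dir.create(to, recursive = TRUE, showWarnings = FALSE)
  files <- file.path(from, paste0(fileset, c(".tab", ".r", ".html")))
  ok <- file.copy(files, to)
  if (!all(ok)) warning("some files could not be copied")
  to
}

# Toy shared directory standing in for /shared/ukb/data/path.
shared <- tempfile("shared_")
dir.create(shared)
file.create(file.path(shared, paste0("ukbxxxx", c(".tab", ".r", ".html"))))

private <- copy_fileset("ukbxxxx", from = shared)
list.files(private)

# Then, with ukbtools attached (not run here):
# my_ukb_data <- ukb_df("ukbxxxx", path = private)
```

The .tab file can be large, so the target directory needs enough free space for a full copy.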
Running my_ukb_data <- ukb_df("ukbxxxx") with my ukb ID I get:
Warning message in file(file, ifelse(append, "a", "w")):
“cannot open file './ukbxxxx.r': Permission denied”
Error in file(file, ifelse(append, "a", "w")): cannot open the connection
Traceback:
Any thoughts? I thought it would be resolved by running my jupyter notebook with the --allow-root option but that didn't do the trick.
Also I have read/write/execute permissions into the directory in question.
Hello, I used ukb_centre to add the assessment centre as a text string. Looking at the frequencies, this doesn't look right.
> table(my_ukb_data$ukb_centre)
Barts Birmingham Bristol Bury Cardiff
3797 13939 14058 17878 18647
Cheadle (revisit) Croydon Edinburgh Glasgow Hounslow
17198 19433 29411 28321 37002
Leeds Liverpool Manchester Middlesborough Newcastle
44198 43012 12582 33876 30396
Nottingham Oxford Reading Sheffield Stockport (pilot)
32816 21286 28875 27380 25501
Stoke Swansea
2281 649
For example, comparing with the showcase count https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=54, Birmingham has 25,501 participants, which is the count shown for Stockport (pilot) here.
Hello, I am using your ukb_centre function to give better descriptive names to the assessment centres. When I do this I get the following output.
> ukb_centre(my_ukb_data, centre.var = "uk_biobank_assessment_centre_f54_0_0")
Error: cannot allocate vector of size 15.0 Gb
> str(my_ukb_data$uk_biobank_assessment_centre_f54_0_0)
chr [1:502536] "11017" "11007" "11011" "11009" "11011" "11021" "11016" "11018" "11010" "11016" ...
> str(ukbcentre)
'data.frame': 24 obs. of 2 variables:
$ code : int 11012 11021 11011 11008 11003 11024 11020 11005 11004 11018 ...
$ centre: chr "Barts" "Birmingham" "Bristol" "Bury" ...
- attr(*, "spec")=
.. cols(
.. code = col_integer(),
.. centre = col_character()
.. )--
In the past I have found this is because there is a mismatch between the types of the variables that I am matching on (inner or outer joins?). Here I note that in my_ukb_data the assessment centre is a character string whilst in ukbcentre it is an int. I thought that the good work you have done with ukb_df/ukb_context might have also dealt with this, but possibly not so? Thanks.
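A lookup that avoids a join altogether is one way to test the type-mismatch idea. A minimal sketch with `match()`; the four code/name pairs are the first rows shown in str(ukbcentre) above, and the character codes are toy values:

```r
# Lookup table with integer codes (first four rows from the output above).
ukbcentre <- data.frame(
  code = c(11012L, 11021L, 11011L, 11008L),
  centre = c("Barts", "Birmingham", "Bristol", "Bury"),
  stringsAsFactors = FALSE
)

# Centre codes stored as character, as in my_ukb_data (toy values).
centre_chr <- c("11011", "11021", "11011", "11008")

# Convert to integer so both sides have the same type, then look up
# each participant's centre name by position in the table.
centre_name <- ukbcentre$centre[match(as.integer(centre_chr), ukbcentre$code)]
centre_name
```

The result has one entry per participant, so it can be assigned directly to a new column.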
I'm getting an error when trying to use ukb_context on a subgroup of interest.
my_ukb_data <- ukb_df("ukb24898", path = "/share/projects/uk_biobank/pheno_data")
my_ukb_key <- ukb_df_field("ukb24898", path = "/share/projects/uk_biobank/pheno_data")
One thing I noticed is that the ukb_df_field() command is appending uses_datacoding_... to all of the variables, which seems a bit odd (not what I see from the vignette), but perhaps this is because there are multiple UDIs for each Description (e.g. Never eat eggs, dairy, wheat, sugar (pilot) Uses data-coding 100672 has four UDIs: 10855-0.0, 10855-0.1, 10855-0.2, 10855-0.3)?
The error I'm getting is from the ukb_context() function:
heavy_abuse_subgroup <- (my_ukb_data$physically_abused_by_family_as_a_childuses_datacoding_532_f20488_0_0 == "Very often true")
ukb_context(my_ukb_data, nonmiss.var = heavy_abuse_subgroup )
Error: Length of logical index vector for `[` must equal number of columns (or 1):
* `.data` has 3177 columns
* Index vector has length 502543
The phenodata we paid for (41975) apparently does not have body_mass_index or BMI so I cannot try what you have in the vignette. I can however provide you with the data dictionary if we need to troubleshoot using another variable.
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /share/apps/anaconda2/lib/libopenblasp-r0.3.5.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] feather_0.3.3 ukbtools_0.11.2 usethis_1.4.0 devtools_2.0.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 plyr_1.8.4 compiler_3.5.3 pillar_1.3.1 iterators_1.0.10 prettyunits_1.0.2
[7] remotes_2.0.4 tools_3.5.3 testthat_2.1.1 digest_0.6.18 packrat_0.5.0 pkgbuild_1.0.3
[13] pkgload_1.0.2 memoise_1.1.0.9000 tibble_2.1.1 gtable_0.3.0 pkgconfig_2.0.2 rlang_0.3.4
[19] foreach_1.4.4 cli_1.1.0 rstudioapi_0.10 parallel_3.5.3 xfun_0.6 knitr_1.22
[25] stringr_1.4.0 withr_2.1.2 dplyr_0.8.0.1 hms_0.4.2 desc_1.2.0 fs_1.2.7
[31] rprojroot_1.3-2 grid_3.5.3 tidyselect_0.2.5 data.table_1.12.2 glue_1.3.1 R6_2.4.0
[37] processx_3.3.0 XML_3.98-1.19 sessioninfo_1.1.1 tidyr_0.8.3 readr_1.3.1 callr_3.2.0
[43] purrr_0.3.2 ggplot2_3.1.1 magrittr_1.5 codetools_0.2-16 backports_1.1.4 scales_1.0.0
[49] ps_1.3.0 assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3 doParallel_1.0.14 lazyeval_0.2.2
[55] munsell_0.5.0 crayon_1.3.4
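In the other ukb_context calls on this page, nonmiss.var receives a column name in quotes (or NULL) and a logical vector goes to subset.var. The error above, which compares the vector's length with the number of columns, is consistent with a row-length logical being used as a column index. A minimal base R sketch of that mismatch on a toy data frame (this illustrates the mechanism only and is an assumption about the cause):

```r
toy <- data.frame(a = 1:5, b = c(NA, 2, 3, NA, 5), c = letters[1:5])
in_subgroup <- toy$a >= 3   # logical, one value per row

# A column name picks out one column.
by_name <- toy[, "b"]

# A row-length logical used as a column index does not fit the 3 columns,
# so the subsetting fails; the error message is captured as a string.
by_rows <- tryCatch(toy[, in_subgroup], error = function(e) conditionMessage(e))
by_rows
```

If that reading is right, ukb_context(my_ukb_data, subset.var = heavy_abuse_subgroup) may be closer to what was intended.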
Hello,
I tried using your example dataset "ukbxxxx", and when I tried to use ukb_icd_diagnosis, it came up with the error:
ukb_icd_diagnosis(mydata, id = "1", icd.version = 10) ## mydata is same as my_ukb_data just change to make typing easier
Error: Column 1 must be named.
Use .name_repair to specify repair.
Run `rlang::last_error()` to see where the error occurred.
Is there something I did wrong? path = (my pc location)\ukbtools-master\ukbtools-master\inst\extdata
Dear Ken,
I am using your ukbtools package to process the UK Biobank data downloaded. The ukb_df_field() function works fine but I came across the following strange problem with the ukb_df() function:
> library(devtools)
> devtools::install_github("kenhanscombe/ukbtools", dependencies = TRUE)
> library(ukbtools)
> my_ukb_data <- ukb_df("ukb23009")
Error in data.table::fread(input = tab_location, sep = "\t", header = TRUE, :
unused argument (nThread = if (n_threads == "max") {
parallel::detectCores()
} else if (n_threads == "dt") {
data.table::getDTthreads()
} else if (is.numeric(n_threads)) {
min(n_threads, parallel::detectCores())
})
The problem appeared in both Linux and Mac environments. It would be great if you could help!
Thanks in advance
Wenhua
get these errors
uk <- ukb_df("ukb43365")
Error in html_table_nodes[[data.pos]] : subscript out of bounds
In addition: Warning message:
XML content does not seem to be XML: './ukb43365.html'
uk <– ukb_df("ukb43365", path = "/sc/arion/work/lehres01")
Error: unexpected input in "uk <▒"
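The second error comes from the assignment arrow: the call uses an en dash ("<–") where R needs an ASCII hyphen ("<-"), which can happen when code is copied from a formatted web page. A minimal sketch showing that R's parser rejects the first form and accepts the second (the helper name `parses` is hypothetical):

```r
# "\u2013" is an en dash; the assignment operator needs an ASCII hyphen.
bad  <- "uk <\u2013 1"
good <- "uk <- 1"

# Returns TRUE when the code string parses, FALSE when parsing fails.
parses <- function(code) {
  !inherits(tryCatch(parse(text = code), error = function(e) e), "error")
}

parses(bad)
parses(good)
```

Retyping the arrow by hand in the ukb_df call with the path argument removes this particular error.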
I've been trying to use the ukb_gen_samples_to_remove command:
ukb_gen_samples_to_remove(ukb_kinship_data, ukb_with_data = my_list_of_eids)
The documentation says that the my_list_of_eids should be an integer vector (other places say character vector, I have tried both) containing a list of eids. My kinship data file has been working for other commands, but I cannot get any command requiring the 'ukb_with_data' file to work. It always ends up resulting in a blank vector (integer or character depending on how I transform the data).
Is this a known issue or am I missing something? Thanks.
Hi Ken,
Are there any image processing functions on the roadmap to be developed?
Thanks
Shicheng
Hello,
I have created these file sets using ukbconv.
ukb42106.html
ukb42106.tab
ukb42106.r
However, after installing the tools in R, I cannot seem to load them up.
getwd()
[1] "/mnt/BIOINFX/UK-Biobank"
library(ukbtools)
my_ukb_data <- ukb_df("ukb42106")
Error: Can't subset columns that don't exist.
✖ Location 2 doesn't exist.
ℹ There are only 1 column.
Run `rlang::last_error()` to see where the error occurred.
Backtrace:
`[.tbl_df`(data[, "Description"], i)
Run `rlang::last_trace()` to see the full context.
Backtrace:
█
`[.tbl_df`(data[, "Description"], i)
$ head ukb42106.r
bd <- read.table("/mnt/BIOINFX/UK-Biobank/ukb42106.tab", header=TRUE, sep="\t")
lvl.0009 <- c(0,1)
lbl.0009 <- c("Female","Male")
bd$f.31.0.0 <- ordered(bd$f.31.0.0, levels=lvl.0009, labels=lbl.0009)
lvl.0008 <- c(1,2,3,4,5,6,7,8,9,10,11,12)
lbl.0008 <- c("January","February","March","April","May","June","July","August","September","October","November","December")
bd$f.52.0.0 <- ordered(bd$f.52.0.0, levels=lvl.0008, labels=lbl.0008)
bd$f.53.0.0 <- as.Date(bd$f.53.0.0)
head ukb42106.html
<style type="text/css"> body {font-family: Arial, Verdana, sans-serif; } a:link {text-decoration: none; } td {vertical-align: top;} </style> <title>UK Biobank : Application XXXXX</title>
Hi Ken,
After installing your latest version (either the development or the CRAN one) i get the following error when running ukb_df.
my_ukb <- ukb_df(paste0(datasetname,".sub"))
Error in mutate_impl(.data, dots) :
Evaluation error: `as_dictionary()` is defunct as of rlang 0.3.0.
Please use `as_data_pronoun()` instead.
Hi, I have been trying to use your package to write a covariates table for use with BGENIE. Whilst doing so I encountered the following problem:
I loaded my data in using:
biobank_data <- ukb_df("ukb_key")
bgenie_covars <- ukb_gen_write_bgenie(biobank_data, "path_to_sample_file", "path_to_output_file", list_of_biobank_variables)
And received the error message:
Error in UseMethod("left_join") :
no applicable method for 'left_join' applied to an object of class "character"
Any help would be greatly appreciated!