ohdsi / databaseconnector Goto Github PK
View Code? Open in Web Editor NEWAn R package for connecting to databases using JDBC.
Home Page: http://ohdsi.github.io/DatabaseConnector/
An R package for connecting to databases using JDBC.
Home Page: http://ohdsi.github.io/DatabaseConnector/
he dplyr package allows R users to interact with databases without writing SQL. Although it doesn't support all functionality we need in OHDSI, for most users it will be enough and will make it easier for them to build tools.
The DatabaseConnector package could work like an add on to the dplyr package similar to the sparklyr package.
I like the idea of connectionDetails to house common parameters.
I suggest we extend this object with few more attributes.
cdmDatabaseSchema
oracleTempSchema = cdmDatabaseSchema,
cdmVersion = 5
resultSchema = 'results'
this would reduce number of parameters passed back and forth and standardize the naming for future studies.
Dear All,
I received this message when trying to install these two packages :
install.packages("CohortMethod", INSTALL_opts="--no-multiarch", dependencies=TRUE)
install.packages("FeatureExtraction", INSTALL_opts="--no-multiarch", dependencies=TRUE)
"Warning: unable to access index for repository https://OHDSI.github.io/drat/bin/windows/contrib/3.3:
cannot open URL 'https://OHDSI.github.io/drat/bin/windows/contrib/3.3/PACKAGES'
Packages which are only available in source form, and may need compilation of C/C++/Fortran:
‘FeatureExtraction’ ‘CohortMethod’
These will not be installed"
My goal is to install package "KeppraAngioedema", which apparently requires the above two packages as dependencies. I tried 64-bits, 32-bits R with 64-bits JAVA JDK, JRE etc.
Thank you for your help!
Lian
This was brought up in this post in our Developers forum. DatabaseConnector currently expects forward slashes (i.e. domain/username) whereas Windows always uses back slashes (i.e. domain\username). As an alternate solution, DatabaseConnector may use a separate parameter for domain.
Hi,
Does DatabaseConnector support Google BigQuery? If not, which files should I change to add this feature?
Any idea would be really appreciated.
Hi,
I found a small problem when I try to connect to oracle database:
result <- class(try(connection <- RJDBC::dbConnect(driver, paste("jdbc:oracle:thin:@",
host,
":",
port,
":",
sid,
sep = ""), user, password), silent = TRUE))[1]
The separator between port and sid should be '/' instead of ":".
Thanks,
Jingqi
Features section of Readme doesn't mention that "pdw" is a separate database option (which in my case led to selection of MS SQL type, when I used PDW, and therefore gave erroneous script behavior), however it is, which is clearly seen through the code.
Unable to connect to make a connection to Redshift database. Java is installed on the server and also the JDBC driver. DatabaseConnector unable to make the connection.
library(Achilles)
Loading required package: SqlRender
Loading required package: DatabaseConnector
Loading required package: RJDBC
Loading required package: DBI
Loading required package: rJava
Loading required package: rjson
connectionDetails <- createConnectionDetails(dbms="redshift",server="10.999.999.99/omop",port="5439", schema="achilles",user="achilles",password="passwd" )
conn <- connect(connectionDetails)
Connecting using Redshift driver
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.lang.NoClassDefFoundError: org.postgresql.jdbc4.Jdbc4Connection.libPaths()
[1] "/usr/lib64/R/library" "/usr/share/R/library"
$ ls /usr/lib64/R/library/DatabaseConnector/java/
jtds-1.2.7.jar postgresql-8.4-704.jdbc4.jar
mysql-connector-java-5.1.30-bin.jar postgresql-9.3-1101.jdbc4.jar
ojdbc6.jar sqljdbc4.jar
The user/password connection string (using the jtds
driver) fails to test for and include the port.
https://github.com/OHDSI/DatabaseConnector/blob/master/R/DatabaseConnector.R#L148
I've got a branch here: master...tomwhite:impala.
I've tested this locally, but is there a recommended way to add automated tests for new databases?
The standard microsoft jdbc requires the installation of the sqljdbc_auth.dll, which only adds windows authentication to java running on windows clients.
If you simple add logic to allow the connectiondetails method to optionally capture (and pass through) of a 'domain' parameter in addition to the user and password parameters you already have then a user can install the jtds jdbc driver java and connect to sql server via windows authentication from all platforms, not just windows.
@alondhe : this connectionDetails object reference does not appear to be valid. Could you fix it?:
DatabaseConnector/R/InsertTable.R
Line 451 in 0c8d63f
The value range of INT is up to 2 ^ 31 - 1, so if you use numeric to call a larger value (such as BIGINT) as insertTable function, it is set to FLOAT value in DB (currently use SQL Server).
Hi @schuemie -- have you seen this error? I've gotten it fairly inconsistently, but usually after execution of one of the longer running OHDSI packages. It can be pretty annoying, especially since it throws an RStudio error pop-up box for each error.
In this example, I ran SCCS, then was given a series of "the connection is closed" errors:
Many SQL dialects have reserved words that cannot be used as field names or table names (e.g. 'rowid' on Oracle). It would be good if DatabaseConnector would throw a warning if it detects such a name when inserting a table.
in the example to test DatabaseConnector, you used querySql to execute a select query:
querySql(conn,"SELECT COUNT(*) FROM person")
What do you use for insert and other queries?
Thank you
It seems like it would be fairly easy to add support for Netezza and there is an existing use case in PEDSnet. Since Netezza is based on PostgreSQL, I'm thinking it would be very similar to the redshift implementation in connect.default. Thoughts?
I'm accustomed to connecting using the Oracle Thin driver using TNSNAMES connection style. I was able to make this work by adding another check in the DatabaseConnector.R for oracle. Here's my patch:
---
R/DatabaseConnector.R | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/R/DatabaseConnector.R b/R/DatabaseConnector.R
index 3792136..e4d2112 100644
--- a/R/DatabaseConnector.R
+++ b/R/DatabaseConnector.R
@@ -197,7 +197,10 @@ connect.default <- function(dbms = "sql server", user, domain, password, server,
sid = parts[2]
}
result <- class(try(connection <- RJDBC::dbConnect(driver, paste("jdbc:oracle:thin:@",host,":",port,":",sid ,sep=""), user, password),silent=TRUE))[1]
-
+
+ # JNix...try again...
+ if (result =="try-error")
+ result <- class(try(connection <- RJDBC::dbConnect(driver, paste("jdbc:oracle:thin:@",server,sep=""), user, password),silent=TRUE))[1]
# Next try OCI driver:
if (result =="try-error")
--
Hi @schuemie,
It looks like R 3.5 only seems to work with JDK 10, but installing JDK 10 seems to introduce a few new wrinkles, one of which is with connect() and Redshift:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.amazon.jdbc.classloader.JarInJarClassLoader (file:/home/alondhe2/R/x86_64-redhat-linux-gnu-library/3.5/DatabaseConnectorJars/java/RedshiftJDBC4-1.2.10.1009.jar) to field java.net.URL.factory
WARNING: Please consider reporting this to the maintainers of com.amazon.jdbc.classloader.JarInJarClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
This doesn't break anything, so not a huge problem, but the warning message hints that future JDKs could prevent the connection from happening without a fix to java.net.URL.factory.
Thanks,
Ajit
We are testing the DatabaseConnector package (on a Postgres database) to retrieve a table with over 25 million records. It's taking about an hour to retrieve all the records with batch size of 100,000 (in dbGetQueryBatchWise function). Increasing the batch size (to 200,000 and above), while hoping to speed up the process, is leading to out of memory exceptions.
Please let us know if there are any suggestions for configuration settings?
Monetdb is an open-source column store. It provides a jdbc driver
https://www.monetdb.org/Documentation/Manuals/SQLreference/Programming/JDBC
I recently installed to the latest version of the DatabaseConnector
package and I am trying to execute a simple query on Postgres (which was working fine with the older version of the package)
querySql.ffdf(con,' select visit_end_date, visit_start_date from pedsnet.visit_occurrence where visit_end_date is null limit 10')
I get the following error Error: Error executing SQL: java.lang.NullPointerException
The query works fine otherwise (i.e. when run directly on the database) and returns some records with null values in one of the columns.
Please let me know if I am missing something.
I tried running different queries and it looks like the ones that return NULL values in any date field do not execute.
I'm getting an invalid url error when trying to call the createConnectionDetail function from R 3.2.3 running on MAC OS X 10.11.5 to connect to SQL Server 2014 using Windows Authentication. I'm passing the following connection string to the connectionString parameter: jdbc:jtds:sqlserver://serverIP/DB;domain='domain';user='user';password='password'. We successfully connected to this SQL server instance using the mentioned connection string but with the RJDBC package.
Previously I tried using the jdbc driver only without jtds and setting the integratedSecurity parameter to yes but this doesn't work in non-windows platforms since it needs a .dll file. I also try setting the integratedSecurity and authenticationScheme parameters to yes and JavaKerberos respectively, but this approach require configuring Kerberos and that seems like hard task. So to avoid this configuration I tried using the jdbc and jtds drivers that works with RJDBC but not with the DatabaseConnector package.
Hi @schuemie
We were able to successfully connect to Postgres (as mentioned here). However, we ran into the following error with Oracle:
Connecting using Oracle driver
Jan 21, 2015 8:46:22 a.m. oracle.jdbc.driver.OracleDriver registerMBeans
WARNING: Error while registering Oracle JDBC Diagnosability MBean.
java.lang.NoSuchMethodError: method javax.management.StandardMBean. with signature (Ljava.lang.Class;Z)V was not found.
at oracle.jdbc.driver.OracleDiagnosabilityMBean.(OracleDiagnosabilityMBean.java:34)
at oracle.jdbc.driver.OracleDriver.registerMBeans(OracleDriver.java:367)
at oracle.jdbc.driver.OracleDriver$1.run(OracleDriver.java:201)
at java.security.AccessController.doPrivileged(libgcj.so.10)
at oracle.jdbc.driver.OracleDriver.(OracleDriver.java:197)
at java.lang.Class.initializeClass(libgcj.so.10)
at java.lang.Class.forName(libgcj.so.10)
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.lang.NoClassDefFoundError: oracle.jdbc.driver.T2CConnection
Would you have any insight on this? Thanks a lot for your help!
Prestodb is an open source distributed SQL engine. It provides a jdbc driver:
https://prestodb.io/docs/current/installation/jdbc.html
When it gives the option of pathToDriver createConnectionDetails or connect function
An error appears on the console indicating that the driver file can not be found.
Hi,
I'm interested to know if this package support or will support connecting to data cube in SSAS?
Dear All,
I received the following error message while installing DatabaseConnector
"
Error : .onLoad failed in loadNamespace() for 'rJava', details:
call: fun(libname, pkgname)
error: No CurrentVersion entry in Software/JavaSoft registry! Try re-installing Java and make sure R and Java have matching architectures.
Error : package 'rJava' could not be loaded
Error: loading failed
Execution halted
ERROR: loading failed for 'x64'
Could someone advise what I am missing here for not being able to install DatabaseConnector package?
Thanks a lot!
Lian
For Redshift's usage of insertTable(), there's this fragment:
else if (attr(connection, "dbms") == "oracle" | attr(connection, "dbms") == "redshift") {
apply(data[start:end,
drop = FALSE],
statement = statement,
isDate = isDate,
MARGIN = 1,
FUN = insertRowOracle)
It errors out with this message:
Error in `[.data.frame`(data, start:end, drop = FALSE) :
undefined columns selected
In addition: Warning message:
In `[.data.frame`(data, start:end, drop = FALSE) :
'drop' argument will be ignored
I'm wondering if the code fragment for Redshift should be this, where there's a blank 2nd parameter in the call to data[]. This doesn't error out, and instead yields a data frame of the data you're trying to insert.
else if (attr(connection, "dbms") == "oracle" | attr(connection, "dbms") == "redshift") {
apply(data[start:end, ,
drop = FALSE],
statement = statement,
isDate = isDate,
MARGIN = 1,
FUN = insertRowOracle)
Hi @anthonysena, @msuchard, and @schuemie,
DatabaseConnector fails to connect to Redshift in IMEDS using a Windows instance for commit dcc97bd and later. Commit 14332dd is the last commit that connects without error. Here is the code that I ran. Password redacted.
install.packages("devtools")
library(devtools)
install_github("ohdsi/SqlRender")
library(SqlRender)
install_github("ohdsi/DatabaseConnector", ref = "dcc97bd")
library(DatabaseConnector)
pw <-
dbms <- "redshift"
user <- "demosymp"
server <- "omop-datasets.cqlmv7nlakap.us-east-1.redshift.amazonaws.com/truven"
cdmschemamdcr <- "mdcr_cdm4"
port <- "5439"
connectiondetailsmdcr <- createConnectionDetails(dbms=dbms, server=server, user=user, password=pw, schema=cdmschemamdcr,port=port)
connmdcr <- connect(connectiondetailsmdcr)
The error is:
connmdcr <- connect(connectiondetailsmdcr)
Connecting using Redshift driver
Error in .jfindClass(as.character(driverClass)[1]) : class not found
If there is more information I can give you to help solve this, please let me know.
Gratefully,
Trevor
It is defined as a VARCHAR column when creating a table, even though the value is put in DATETIME format.
On a new win7 machine, I am getting this error:
> library(devtools)
> devtools::install_github("ODHSI/DatabaseConnector")
Downloading GitHub repo ODHSI/DatabaseConnector@master
from URL https://api.github.com/repos/ODHSI/DatabaseConnector/zipball/master
Error in stop(github_error(request)) : Not Found (404)
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] drat_0.1.0 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.1.2 tools_3.3.1 withr_1.0.2 curl_0.9.7
[6] memoise_1.0.0 git2r_0.15.0 jsonlite_1.0 digest_0.6.9
per discussion here OHDSI/SqlRender#36
this would be GREAT new addition!
@schuemie
Dear All,
I am trying to load Lee Evans provided CDM format csv files into my database. I encountered one problem. Hope someone advise.
some csv data files (e.g. CDM_death.csv, CDM_visit_occurrence.csv) have dates columns, but the format is yyyymmdd., e.g. 20080413). In oracle.sql, the format for DEATH_DATE column for DEATH table is defined as "date". The sqlldr failed to load the this numerical format "yyyymmdd" into DEATH table.
I thought there must be a reason Lee Evans used this kind of numerical format in the csv file.
Could someone advise how to load this type of format as a date format into oracle database? I am using sqlplus.
Thank you very much in advance!
Lian
We have conflicting conventions.
In other discussion - we discuss CamelCase/SnakeCase for SQL and R, yet I kind of like this radical solution the querySql function does.
See below.
Do we want this to stay or be removed ?
I am not sure I like different names in R and SQL. I would not want to call always in this function the Camel-Snake conversion. (also it drops capital letters in 2+1 as grandTotal (into GRANDTOTAL)
I can live with default toupper(). current behaviour (althoug for typing (having RSI) - I would prefer tolower() :-)
querySql <- function(connection, sql) {
tryCatch({
rJava::.jcall("java/lang/System", , "gc") #Calling garbage collection prevents crashes
result <- lowLevelQuerySql(connection, sql)
colnames(result) <- toupper(colnames(result))
return(result)
}, error = function(err) {
writeLines(paste("Error executing SQL:", err))
the insert data feature is very important.
When my data has NAs, it fails )(throws an errro).
I have to do data[is.na(data)] <- 0
to work around it. But the data type is an issue. Could native support for NA be somehow added?
error:
Error in rJava::.jnew("java/lang/String", as.character(row[i])) :
java.lang.NullPointerException
NAs came to be because the source table where the data.frame came from had NULL there.
so idealy NAs would become NULLs.
@schuemie We have been actively using this package to connect to a variety of DBMSes in PEDSnet (please see the [function here] (https://github.com/PEDSnet/Data-Quality-Analysis/blob/master/Library/PerformDatabaseOperations.R#L37).
I wanted to learn how to enable SSL on a postgres database through this package?
We found this [line of code] (
I appreciate your advice on this.
Hopefully this is not user error:
library(DatabaseConnector)
conn <- connect(dbms="sql server", server="RNDUSRDHIT06.jnj.com",schema="Vocabulary")
dbGetQuery(conn,"SELECT COUNT(*) FROM concept")
dbDisconnect(conn)
I get the following error after the CONNECT statement:
Error in connect.default(dbms = "sql server", server = "RNDUSRDHIT06.jnj.com", :
argument "port" is missing, with no default
I do not get this if I manually authenticate with a SQL Server account.
For transparency: I've been trying to add a connection method for AWS's Athena service, which allows you to run SQL against flat files in AWS S3.
@leeevans -- any chance you have experience here?
DATETIME column is not inserted when calling insertTable.
`install.packages("drat")
Warning in install.packages :
cannot open URL 'https://OHDSI.github.io/drat/bin/windows/contrib/3.3/PACKAGES.gz': HTTP status was '404 Not Found'
Warning in install.packages :
cannot open URL 'https://OHDSI.github.io/drat/bin/windows/contrib/3.3/PACKAGES': HTTP status was '404 Not Found'
Warning in install.packages :
unable to access index for repository https://OHDSI.github.io/drat/bin/windows/contrib/3.3:
cannot open URL 'https://OHDSI.github.io/drat/bin/windows/contrib/3.3/PACKAGES'
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.3/drat_0.1.1.zip'
Content type 'application/zip' length 72297 bytes (70 KB)
downloaded 70 KB
package ‘drat’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Temp\Rtmp4eeek4\downloaded_packages
drat::addRepo("OHDSI")
install.packages("DatabaseConnector")
Warning in install.packages :
cannot open URL 'https://OHDSI.github.io/drat/bin/windows/contrib/3.3/PACKAGES.gz': HTTP status was '404 Not Found'
Warning in install.packages :
cannot open URL 'https://OHDSI.github.io/drat/bin/windows/contrib/3.3/PACKAGES': HTTP status was '404 Not Found'
Warning in install.packages :
unable to access index for repository https://OHDSI.github.io/drat/bin/windows/contrib/3.3:
cannot open URL 'https://OHDSI.github.io/drat/bin/windows/contrib/3.3/PACKAGES'
installing the source package ‘DatabaseConnector’
trying URL 'https://OHDSI.github.io/drat/src/contrib/DatabaseConnector_1.7.1.tar.gz'
Content type 'application/octet-stream' length 6967603 bytes (6.6 MB)
downloaded 6.6 MB
The downloaded source packages are in
‘C:\Temp\Rtmp4eeek4\downloaded_packages’
library(DatabaseConnector)
Loading required package: RJDBC
Loading required package: DBI
Loading required package: rJavaconn <- connect(dbms = "mysql",
server = "localhost",
user = "root",
password = "root",
Connecting using MySQL driverschema = "sakila")
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: Access denied for user ''@'localhost' (using password: NO)
`
Thanks in advance to those who solves the issue...
I studied the documentation here but I am not able to use MSSQL integrated security (or other login)
https://raw.githubusercontent.com/OHDSI/DatabaseConnector/master/extras/DatabaseConnector.pdf
I would like to find out out from other MS SQL users, how they connect to the server using the R DatabaseConnect package
I prefer integrated security (win login) but any working solution will do. (I am worried however that only integrated security will let me connect). I am a user on a domain.
This are options I tried with no luck with any of them.
#windows auth
connD <- createConnectionDetails(dbms="sql server",user=myUser, server=myServer,schema=mySchema)
#windows auth with domain
connD <- createConnectionDetails(dbms="sql server",user=myUser, server=myServer,domain=myDomain,schema=mySchema)
#not using integrated security - plain username and psw
connD <- createConnectionDetails(dbms="sql server",user=myUser, password = myPsw,server=myServer,domain=myDomain,schema=mySchema)
Can you please advise what are working connectionDetails approaches that are working fine at other sites
Errors I am getting.
> conn<-connect(connD)
Connecting using SQL Server driver
Warning: Using JTDS driver because a domain is specified. This may lead to problems. Try using integrated security instead.
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.
Connecting using SQL Server driver
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'vhuser'. ClientConnectionId:a5652c6f-378a-47a4-ae38-c07b3f296566
>
Something that could be an issue: querying records with NULL values against SQL Server on a Windows machine seems to be producing 0s for the NULL values when you run querySql().
This doesn't happen for me using PDW/Windows or SQL Server/RedHat. Need to dig into this more, but @schuemie have you seen anything like this?
Edit: Anthony Molinaro confirms he's getting 0s for NULL values in SQL Server/Windows
I am using DQ tutorial amazon environment (that has redshift demo OMOP data) and get tables names does not work no matter what trick I try
In the redshiftCopy.sql file, we use the BLANKSASNULL option, this should not be set by default, as it can change the intended blank values of a data frame into DB NULLs.
We'll need to add SSL support to this package.
Column names of data for InsertTable should not start with number. I don't know why, but it acts weird in the function and causes an error.
Latest version seems to effectively remove both functions and the only one available is querySql. This effectively breaks the Aphrodite code that depends on database calls.
Did I miss a notice that this change will be made?
https://github.com/OHDSI/DatabaseConnector/blob/master/R/InsertTable.R#L439
This conditional doesn't actually pull the user name and password from a connection object.
I believe the AND condition should be an OR in this part of the code
https://github.com/OHDSI/DatabaseConnector/blob/master/R/DatabaseConnector.R#L309
In order to have the schema specified after connection, the current code requires it in both the schema argument AND in the connectionDetails$schema.
I think the desired behaviour is that if simply just the schema parameter says 'forceit' the connection function with switch to forceit schema.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.