Coder Social home page Coder Social logo

ohdsi / achilles Goto Github PK

View Code? Open in Web Editor NEW
128.0 70.0 116.0 7.68 MB

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) - descriptive statistics about a OMOP CDM database

Home Page: https://ohdsi.github.io/Achilles/

R 99.57% Perl 0.18% Shell 0.25%

achilles's Introduction

Achilles

Build Status codecov.io CRAN_Status_Badge CRAN_Status_Badge

Achilles is part of HADES.

Introduction

Automated Characterization of Health Information at Large-Scale Longitudinal Evidence Systems (ACHILLES) Achilles provides descriptive statistics on an OMOP CDM database. ACHILLES currently supports CDM version 5.3 and 5.4.

Features

  • Performs broad database characterization
  • Export feature for ARES
  • Export feature for AchillesWeb (deprecated)

Technology

Achilles is an R package.

System Requirements

Requires R (version 4.0 or higher).

Installation

  1. See the instructions here for configuring your R environment, including RTools and Java.

  2. In R, use the following commands to download and install Achilles:

install.packages("remotes")
remotes::install_github("OHDSI/Achilles")

User Documentation

Documentation can be found on the package website.

Support

Contributing

Read here how you can contribute to this package.

License

Achilles is licensed under Apache License 2.0

Development

Achilles is being developed in R Studio.

Development status

Achilles is ready for use.

Acknowledgements

  • This project was supported in part through the National Science Foundation grant IIS 1251151.

achilles's People

Contributors

aandryc avatar agackovka avatar aguynamedryan avatar alondhe avatar anthonymolinaro avatar anthonysena avatar c4lm avatar cataphract avatar chrisknoll avatar clairblacketer avatar fdefalco avatar hongwonjun avatar jmbanda avatar lopsided avatar mark-velez avatar maximmoinat avatar msuchard avatar murphyke avatar myounglai avatar paulheider avatar pavgra avatar pbr6cornell avatar rfollett avatar schuemie avatar ssuvorov-fls avatar t-abdul-basser avatar tomwhite avatar vojtechhuser avatar wivern avatar wojszko avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

achilles's Issues

Error when running Achilles against CDM v5

I updated my R install with the latest Achilles and SQLRender packages in order run against a CDM v5 database instance. Everything appears to work fine until I attempt to export the JSON. At this point, the query appears to be referring to vocabularies via numerical ID a la CDM v4 (1) rather than name (SNOMED)

exportToJson(pgConnectionDetails,"public", "achilles_results", "/Users/wstephens/ext/OHDSI/AchillesWeb-master/data/Test")
| |
0%Error executing SQL: Error in .jcall(s, "Ljava/sql/ResultSet;", "executeQuery", as.character(query)[1]): org.postgresql.util.PSQLException: ERROR: operator does not exist: character varying = integer
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Position: 1075

An error report has been created at /Users/wstephens/errorReport.txt
Error in value[3L] : no loop for break/next, jumping to top level

screen shot 2015-01-13 at 3 01 59 pm

ACHILLES HEEL - Flipped DISEASE_CLASS and REVENUE_CODE queries.

At the top you define the following analysis:
--1609-Number of records by disease_class_concept_id
--1610-Number of records by revenue_code_concept_id

DISEASE_CLASS_CONCEPT_ID typically falls in VOCABULARY 40-DRG and REVENUE_CODE_CONCEPT_ID falls in VOCABULARY 43-REVENUE_CODE.

However, you have these VOCABULARY_IDs switched when actually look for the issue in the CDM. So your REVENUE_CODE query is assigned to 1609 when is it should be 1610 and vice ver ca.

small prevalence concept files are not generated

In Achilles Web going to the table view of the condition report and clicking on a condition with only a few people in it will generate a 404 error as the file is not found. This is due to the exportToJSON function not generating all necessary files.

Schema identification suggestion

The documentation for Achilles and Achilles Web has commands that take in arguments for the "names of the schemas holding the CDM data and target results ". This may be the terminology for other DB's, but in SQL Server, these arguments are the DB Name, not necessarily the schema that the tables are in within that DB.

But the more significant issue is that the SQL scripts assume the tables are in the 'dbo' schema within that DB. 'dbo' is either hardcoded into the scripts or no schema is provided, which results in falling back on the users default schema. Within our organization it's specific policy not to use 'dbo', so our implementation uses other schema's for the CDM tables. As a result, I had to modify all those scripts to get our implementation running.

We may be the only ones who face this issue, but would suggest modifying those calls to accept a source and target DB Name AND a source and target schema (perhaps this could default to 'dbo') and adjust the SQL Scripts accordingly. It could make Achilles a bit more flexible.

Error on Analysis 116 in Oracle

I am getting this error when running analysis 116, which seems to be a key analysis for making the individual visualizations work for conditions, procedures, etc. Looks like a connection error but other analyses running fine in the same session with same ConnectionDetails.

screen shot 2014-07-15 at 12 43 53 pm

Compatibility with upcoming OMOP CDMV5

Just wondering if there are plans to bring this tool into compatibility with the upcoming OMOP CDMV5, how long it might take, and if there's anything I can do to help.

Field missing causing Achilles to crash at 59% complete

I am trying to run Achilles on CDM V5. After following the installation instructions in this page, I managed to get until 59% complete and then I get this error (extracted from the error log)

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: column vo1.place_of_service_concept_id does not exist
Position: 114)

SQL:
insert into results_schema.ACHILLES_results (analysis_id, stratum_1, count_value)
select 200 as analysis_id,
vo1.place_of_service_CONCEPT_ID as stratum_1,
COUNT(distinct vo1.PERSON_ID) as count_value
from
visit_occurrence vo1
group by vo1.place_of_service_CONCEPT_ID

After looking at the CDM V5 fileds for the visit_occurrence table, the column place_of_service_CONCEPT_ID does not exist.

Any suggestions? or is there a different way to run the CDMV5 Achilles, other than the instructions found in this page.

Thanks!

Postgres Missing CAST in Join

In the sqlAgeAtFirstDiagnosis.sql file there is a missing (CAST as INT) in a join condition (line 12) which causes the join to fail in Postgres 9.1 due to a type mismatch error. We believe we have fixed this in a fork. Should we move forward with committing this change?

Problem with generating Achilles results

Hi,

I saw an error trying to generate Achilles result. Here's the error:

Error:

execute JDBC update query failed in dbSendUpdate (ERROR: operator does not exist: timestamp without time zone + integer
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Position: 610)

SQL:
CREATE TEMP TABLE temp_dates
AS
SELECT
DISTINCT
EXTRACT(YEAR FROM observation_period_start_date)_100 + EXTRACT(MONTH FROM observation_period_start_date) AS obs_month,
TO_DATE(CAST(EXTRACT(YEAR FROM observation_period_start_date) AS varchar(4)) || RIGHT('0' || CAST(EXTRACT(MONTH FROM OBSERVATION_PERIOD_START_DATE) AS VARCHAR(2)), 2) || '01' , 'yyyymmdd') AS obs_month_start,
((TO_DATE(CAST(EXTRACT(YEAR FROM observation_period_start_date) AS varchar(4)) || RIGHT('0' || CAST(EXTRACT(MONTH FROM OBSERVATION_PERIOD_START_DATE) AS VARCHAR(2)), 2) || '01' , 'yyyymmdd') + 1_INTERVAL'1 month') + -1) AS obs_month_end
FROM
omop_rz.observation_period

What is the possible fix for this error?

Thank you in advance.

Global Temporary Table - Analysis 116

I'm getting an error on Analysis 116 as follows:

Error:
execute JDBC update query failed in dbSendUpdate (ORA-01031: insufficient privileges
)

SQL:
CREATE TEMPORARY TABLE temp_dates
ON COMMIT PRESERVE ROWS
AS
SELECT
distinct
EXTRACT(YEAR FROM observation_period_start_date) as obs_year

FROM

OBSERVATION_PERIOD

I think it is trying to create the temp table on the CDM4 schema rather than on my Results schema. I have don't have permissions to create tables on the CDM4 database, but of course can create on the Results database.

Can you change code for 116 (and any others than use temp tables) to make the table on @results_schema.dbo.[table] so that a user does not require CREATE privileges on their CDM to run the analysis?

STDDEV problem in analysis 105

I'm getting the following errorReport.txt when I try to run the achilles function:

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: function stddev(interval) does not exist
  Hint: No function matches the given name and argument types. You might need to add explicit type casts.
  Position: 343)

SQL:
insert into public.ACHILLES_results_dist (analysis_id, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)^M
select 105 as analysis_id,^M
        COUNT(count_value) as count_value,^M
        min(count_value) as min_value,^M
        max(count_value) as max_value,^M
        avg(1.0*count_value) as avg_value,^M
        STDDEV(count_value) as stdev_value,^M
        max(case when p1<=0.50 then count_value else -9999 end) as median_value,^M
        max(case when p1<=0.10 then count_value else -9999 end) as p10_value,^M
        max(case when p1<=0.25 then count_value else -9999 end) as p25_value,^M
        max(case when p1<=0.75 then count_value else -9999 end) as p75_value,^M
        max(case when p1<=0.90 then count_value else -9999 end) as p90_value^M
from^M
(^M
select ( op1.observation_period_end_date - op1.observation_period_start_date) as count_value,^M
        1.0*(row_number() over (order by ( op1.observation_period_end_date - op1.observation_period_start_date)))/(COUNT(( op1.observation_period_end_date - op1.observation_period_start_date)) over ()+1) as p1^M
from PERSON p1^M
        inner join ^M
        (select person_id, ^M
                OBSERVATION_PERIOD_START_DATE, ^M
                OBSERVATION_PERIOD_END_DATE, ^M
                ROW_NUMBER() over (PARTITION by person_id order by observation_period_start_date asc) as rn1^M
                 from OBSERVATION_PERIOD^M
        ) op1^M
        on p1.PERSON_ID = op1.PERSON_ID^M
        where op1.rn1 = 1^M
) t1^M

sql server connection error - win authentication

I installed the package.

I get this error

connectionDetails <- createConnectionDetails(dbms="sql server", server="myserver",schema="CDM")
conn <- connect(connectionDetails)

Connecting using SQL Server driver using Windows integrated security
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication.

Even if I try domain/user with psw. I am not able to connect due to security settings. Where I can configure the JDBC driver used by R?

Login within R via ODBC, however, works.
I am able to connect with win authontication using ODBC route.
Like so:

require(RODBC)
conn = odbcConnect("myserver")

Achiles probably requires JDBC. I could try to use JDBC ODBC bridge....

require(RJDBC) #tried this

drv<-JDBC("sun.jdbc.odbc.JdbcOdbcDriver") not found, does not work, did not try to install JDBC ODBC bridge solution

SQL server throws 'fatal error' while running Achilles

Hi,

Achilles runs fine for about 53% and then it errors out with the message 'sql server closed connection'. On further investigation, I found that the following statement is causing the error. Has anyone encountered this error and is there a way to fix this?

--{104 IN (@list_of_analysis_ids)}?{
-- 104 Distribution of age at first observation period by gender
insert into @results_database_schema.ACHILLES_results_dist (analysis_id, stratum_1, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)

select 104 as analysis_id,
gender_concept_id,
COUNT_BIG(count_value) as count_value,
min(count_value) as min_value,
max(count_value) as max_value,
avg(1.0*count_value) as avg_value,
stdev(count_value) as stdev_value,
max(case when p1<=0.50 then count_value else -9999 end) as median_value,
max(case when p1<=0.10 then count_value else -9999 end) as p10_value,
max(case when p1<=0.25 then count_value else -9999 end) as p25_value,
max(case when p1<=0.75 then count_value else -9999 end) as p75_value,
max(case when p1<=0.90 then count_value else -9999 end) as p90_value

from
(
select p1.gender_concept_id,
year(op1.index_date) - p1.YEAR_OF_BIRTH as count_value,
1.0_(row_number() over (partition by p1.gender_concept_id order by year(op1.index_date) - p1.YEAR_OF_BIRTH))/(COUNT_BIG(_) over (partition by p1.gender_concept_id)+1) as p1
from
OMOP_CDM4.dbo.PERSON p1
inner join (select person_id, MIN(observation_period_start_date) as index_date from OMOP_CDM4.dbo.OBSERVATION_PERIOD group by PERSON_ID) op1 on p1.PERSON_ID = op1.PERSON_ID) t1
group by gender_concept_id

Thanks!
Paulina.

Issue with prevalence calculations in conditions, condition era and drug exposures

We have been working on setting up Achilles for the first time and there are a few of the treemap graphs that donโ€™t display. Tracing it back I believe it is due to the prevalence for each row (Conditions, Condition Era, Drug Exposures) being calculated at 0%. The other treemaps based on prevalence get calculated and displayed correctly.

Digging into the sql itself (e.g. sqlConditionTreemap.sql) I can run the sql directly and it brings back the correct results and prevalence calculations but the JSON files appear to incorrect prevalence calculations from the results of the SQL statements. Has something like this been seen before?

ExportToJSON by AnalysisID / Table

This may not be possible, but exporting to JSON is now only done in bulk. Can take several hours. When doing incremental additional analyses, would be nice to select export of only certain analyses(or all analyses for a given table) to JSON.

Sample Dataset in CDM v4

Is there a sample dataset that is CDM v4 that I can use Achilles with? The OMOP website provides OSIM and OSIM2 but they both appear to be using legacy CDMs.

docker-compose build fails due to openjdk-6-jdk

$ sudo docker-compose build

---> Running in 5cb42fa9dd4e
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
Get:2 http://security.debian.org jessie/updates/main amd64 Packages [177 kB]
Get:3 http://http.debian.net sid InRelease [253 kB]
Ign http://http.debian.net jessie InRelease
Get:4 http://http.debian.net jessie-updates InRelease [142 kB]
Get:5 http://http.debian.net jessie Release.gpg [2,373 B]
Get:6 http://http.debian.net sid/main amd64 Packages [10.1 MB]
Get:7 http://http.debian.net jessie Release [148 kB]
Get:8 http://http.debian.net jessie/main amd64 Packages [9,035 kB]
Get:9 http://http.debian.net jessie-updates/main amd64 Packages [3,614 B]
Fetched 19.9 MB in 2s (7,089 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Package openjdk-6-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
apt

E: Package 'openjdk-6-jdk' has no installation candidate
Service 'achilles' failed to build: The command '/bin/sh -c apt-get update && apt-get install -y libssl-dev libcurl4-openssl-dev libxml2-dev openjdk-6-jdk && rm -rf /var/lib/apt/lists/* && R CMD javareconf' returned a non-zero code: 100

Have you seen this before during builds?

$ uname -a
Linux ip-54-40-17-252 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Provide support for new features in CDMv5

The current version of Achilles does not break when using CDMv5, but it also doesn't use any of the new features.

Most importantly, it is not outputting statistics on the Measurement table.Other immediately useful information would be the metadata that was introduced in V5. Currently less important (because nobody I know has the information) is reports on devices, specimen, notes, and information on different types of observation periods.

Add test_id to Achilles Heel generation

Currently, there is no way to uniquely identify an error for an analysis. For example, you may receive 2 errors for the same Analysis_Id, and while they have different text messages, you can't tell if they came from different tests. You also can not compare results between 2 different heel reports over time to determine if the same error occurred or was fixed.

We propose the following change to all Achilles heel queries:

First, add a column to Achilles_heel_results

CREATE TABLE ACHILLES_HEEL_results (
  test_id INT,
  analysis_id INT,
  ACHILLES_HEEL_warning VARCHAR(255)
);

Next, for each Achilles heel test:

--check for non-zero counts from checks of improper data (invalid ids, out-of-bound data, inconsistent dates)
INSERT INTO ACHILLES_HEEL_results (
        test_id, -- this is new
    analysis_id,
    ACHILLES_HEEL_warning
    )
SELECT DISTINCT 1 as test_id, -- hard code test ID for this test
    or1.analysis_id,
    'ERROR: ' + cast(or1.analysis_id as VARCHAR) + '-' + oa1.analysis_name + '; count (n=' + cast(or1.count_value as VARCHAR) + ') should not be > 0' AS ACHILLES_HEEL_warning
FROM ACHILLES_results or1
INNER JOIN ACHILLES_analysis oa1
    ON or1.analysis_id = oa1.analysis_id
WHERE or1.analysis_id IN (
        7,
        8,
        9,
        114,
        115,
        207,
        208,
        209,
        210,
        302,
        409,
        410,
        411,
        412,
        413,
        509,
        510,
        609,
        610,
        612,
        613,
        709,
        710,
        711,
        712,
        713,
        809,
        810,
        812,
        813,
        814,
        908,
        909,
        910,
        1008,
        1009,
        1010,
        1415,
        1500,
        1501,
        1600,
        1601,
        1701
        ) --all explicit counts of data anamolies
    AND or1.count_value > 0;

With a test_id, we can then determine if a test was resolved between heel reports.

Problem with Analysis 606

I have run Analysis 606 several times and it does not make it to the Results table. But I do get this ERROR in Achilles HEEL:

ERROR: Distribution of age by procedure_concept_id; min should not be negative

Is this a "fatal" data problem then that prevents completion of the query? I want to sleuth out the cause (assuming something to do with birthdates). But wondering if ACHILLES could still complete output even with this error.

Can I use just one schema for input data and also for results?

There is a key step in the process that looks like this.
achillesResults <- achilles(connectionDetails, "cdm4_inst", "results")

Will the script crash if I have it point to the same special schema
achillesResults <- achilles(connectionDetails, "OMOPsandbox", "OMOPsandbox")?

I think it should not but I want to be sure before I request from my dbadmin the necessary users and schemas.

Postgres CAST Errors

In the TestAchillesCode.R we noticed that there is code written to test Achilles on PostgreSQL, but when we tried to use Achilles with PostgreSQL 9.1 data we ran into CAST errors. The Postgres ROUND function expects ROUND(numeric,int) but there are two instances of ROUND(CAST as FLOAT, int) which causes PG to throw an illegal data type error. When changed to ROUND(CAST as NUMERIC(8,6), int), these statements work correctly. We believe we have fixed all of these in a fork. Should we move forward with committing these changes?

Unable to connect Achilles to SQL server

I've copied both drivers jtds-1.2.7.jar and sqljdbc4.jar to the C:\Program Files\R\R-3.2.0\library\DatabaseConnector\java directory. Can anyone help in figuring out how to resolve this issue?

SQL Server authentication:

connectionDetails <- createConnectionDetails(dbms="sql server", server="localhost", schema="OMOP", user="xyz",pasword="123")
conn <- connect(connectionDetails)

Connecting using SQL Server driver
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: I/O Error: DB server closed connection.

Windows authentication: The error message received is
Connecting using SQL Server driver using Windows integrated security
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication. ClientConnectionId:4434edbb-347f-4373-b285-3c44636caeb9

Thanks!

Set up automated Docker builds on the registry

This would automatically build the docker image of Achilles every time the repo is updated, using the recently added Dockerfile, and store that image on the Docker Hub service (the registry). The advantages are two-fold:

  1. Automatic monitoring of build process and notification of failures.
  2. Removal of the local build step when running Achilles using Docker.

Dashboard - Persons with Continuous Observation By Month - Last Month Missing

In my TRUVEN MDCD data, the raw data ends 12/31/2012 however according to this chart no one has continuous observation time in 12/2012. The last month available on this chart is 11/2012 (see image).

image

The following query pulls back 6,743,855 people in our CDM.
SELECT COUNT(DISTINCT PERSON_ID)
FROM OBSERVATION_PERIOD op
WHERE op.OBSERVATION_PERIOD_START_DATE <= '12/01/2012'
AND op.OBSERVATION_PERIOD_END_DATE >= '12/31/2012'

Achilles shows last dataset but current label if the data folder is incorrect

I noticed this behavior today when I was adding a new dataset to review using Achilles directly in Firefox. Note that I am manually configuring Achilles by editing the datasources.json file. The new dataset was a sub-population of one I had already configured so I expected a much smaller number of patients. However, even though Achilles showed the new datasets name, the data it showed was from the larger dataset that I had viewed earlier in the morning. I found out that I had misspelled the data folder in datasources.json. Once corrected, I refreshed Achilles from the browser and the correct data was shown.

Obviously, this is fundamentally an "operator error" however, I think that Achilles behavior when the data folder name is incorrect should be to report an error or show no data at all so that users don't accidentally review a previously viewed dataset thinking that it's the new one.

thanks for all the great work!
-R

Analysis Id input parameter

Currently, AchillesWeb does not support displaying only a subset of reports for a given Table. But it is difficult to know which Analysis_Ids apply to a given table report. Thus it would be helpful to have a parameter in achilles where you enter the table you want to analyze (e.g., Person) and it runs all those reports.

achilles(connectionDetails, cdmSchema, resultsSchema, sourceName = "Test", TableToAnalyze="Person" createTable = TRUE, smallcellcount = 5)

OR

myTables=c("Person","Condition_Occurrence)

achilles(connectionDetails, cdmSchema, resultsSchema, sourceName = "Test", TablesToAnalyze=myTables)

How to submit new analyses for Achilles (which SQL flavor to code)

In Achilles.sql file thare are many analyses. (most likely in SQL server flavor) (or what flavor exactly)
These get translated to postgres flavor using R package.

I would like to contribute new analyses. Probably some semi-final SQL code that gets incorporated by core developer.

For authoring new analyses, in which sql flavor should I author them?
(I need rounding of real number to integer and then to multiple of 5)
e.g., 21 rounded to 20
7 rounded to 5
6.3 rounded to 5

so it will be like x- (x % 5) (modulo function)

How sophisticated is the flavor translation?

Pre-release testing on R running on Ubuntu, nursing home dataset

Dear Achilles Team,

Please forgive me for taking the liberty to do some "pre-release" testing on your code. The truth is, I am working on an ETL for a nursing home dataset and could not resist the temptation tao test Achilles because I think that its summary reports could help me more rapidly see the influence of some of the decisions I have to make while loading the dataset.

In case its helpful, I have written below a brief account of my experience with installing and running the tool in its current state:

Installation
I installed Achilles and its dependencies on a machine running Ubuntu Precise (12.04). I had to upgrade my R from 2.x to 3.1. The instructions here were helpful (http://stackoverflow.com/questions/10476713/how-to-upgrade-r-in-ubuntu)

The dependencies that I noted for Achilles were:
-- Rcpp (Cran)
-- RJDBC (Cran)
-- rJava (Cran)
-- rjson (Cran)
-- DBI (Cran)
-- SqlRender (OHDSI)
-- DatabaseConnector (OHDSI)

Most of these installed from the command line with no issues. For Cran, I installed in R running as root like so:

# install.packages("Rcpp", lib="/usr/local/lib/R/site-library/")

For OHDSI packages, I installed like so:

sudo R CMD INSTALL SqlRender/ -l /usr/local/lib/R/site-library/

In SqlRender I had to comment out the include for _mingw.h present in SqlTranslate.cpp and SqlSplit.cpp. This seems to be a Windows compatability requirement? Another issue was that there was an include in SqlRender.h that seemed spelled incorrectly:

/*#include "SQLRender.h"*/
#include "SqlRender.h"

Finally, I installed Achilles:

sudo R CMD INSTALL Achilles/ -l /usr/local/lib/R/site-library/

Running Achilles
After installation, I followed the example provided in the help page and figured out how to connect to the DB with my nursing home data

> library("Achilles")
> connectionDetails <- createConnectionDetails(dbms="oracle",user="...",password="...",server="...",schema="DIKB_DEV")
> achillesResults <- achilles(connectionDetails, "DIKB_DEV", "DIKB_DEV", "TestDB")
Connecting using Oracle driver
Executing multiple queries. This could take a while
  |======================================================================| 100%
Analysis took 16.4 secs
Done. Results can now be found in DIKB_DEV

The database had new tables loaded with data and R held a loaded achillesResults dataframe.

I then tried to run 'plot' per the example in the help page but ran into an error:

> plot(achillesResults, "population")
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ

Not sure what happened here - any ideas?

Next, I wanted to play with AchillesWeb. I figured the exportToJson function was important for generating data that the web app could load so gave it a try. It almost worked:

> exportToJson(connectionDetails, "DIKB_DEV", "DIKB_DEV")
Warning: folder /home/PITT/rdb20/GeriOMOP already exists
Connecting using Oracle driver
Generating person reports
  |======================================================================| 100%
Generating observation period reports
  |======================================================================| 100%
Generating condition treemap
  |======================================================================| 100%
Generating condition reports
  |                                                                      |   0%

Error executing SQL: Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve JDBC result... 

The error file held the SQL query that failed (below). Running it against the DB yielded the following error:

ORA-01722: invalid number
01722. 00000 -  "invalid number"
*Cause:    
*Action:

Its no clear to me what happened. One issue I had with OSIM2 that might be relevant here is that the nursing home population does not have certain age groups. This can trip up code that assumes all age groups will have some members. Anything else I can test to help find out whats going on?

SQL Query that failed:

DBMS:
oracle

Error:
Unable to retrieve JDBC result set for select c1.concept_id as condition_concept_id, 
       c1.concept_name as condition_concept_name,
       c2.concept_group_id as concept_id,
       c2.concept_group_name as concept_name, 
       sum(ar1.count_value) as count_value
from ACHILLES_results ar1
       inner join
       DIKB_DEV.concept c1
       on ar1.stratum_1 = c1.concept_id
       inner join
       (
       select concept_id,
             case when concept_name like 'Inpatient%' then 10
                    when concept_name like 'Outpatient%' then 20
                    else concept_id end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 1
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 2
                    else 0 end as concept_group_id,
             case when concept_name like 'Inpatient%' then 'Claim- Inpatient: '
                    when concept_name like 'Outpatient%' then 'Claim- Outpatient: '
                    else concept_name end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 'Primary diagnosis'
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 'Secondary diagnosis'
                    else '' end as concept_group_name
       from DIKB_DEV.concept
       where vocabulary_id = 37

       ) c2
       on ar1.stratum_2 = c2.concept_id
where ar1.analysis_id = 405
group by c1.concept_id, 
       c1.concept_name,
       c2.concept_group_id,
       c2.concept_group_name (ORA-01722: invalid number
)

SQL:
select c1.concept_id as condition_concept_id, 
       c1.concept_name as condition_concept_name,
       c2.concept_group_id as concept_id,
       c2.concept_group_name as concept_name, 
       sum(ar1.count_value) as count_value
from ACHILLES_results ar1
       inner join
       DIKB_DEV.concept c1
       on ar1.stratum_1 = c1.concept_id
       inner join
       (
       select concept_id,
             case when concept_name like 'Inpatient%' then 10
                    when concept_name like 'Outpatient%' then 20
                    else concept_id end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 1
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 2
                    else 0 end as concept_group_id,
             case when concept_name like 'Inpatient%' then 'Claim- Inpatient: '
                    when concept_name like 'Outpatient%' then 'Claim- Outpatient: '
                    else concept_name end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 'Primary diagnosis'
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 'Secondary diagnosis'
                    else '' end as concept_group_name
       from DIKB_DEV.concept
       where vocabulary_id = 37

       ) c2
       on ar1.stratum_2 = c2.concept_id
where ar1.analysis_id = 405
group by c1.concept_id, 
       c1.concept_name,
       c2.concept_group_id,
       c2.concept_group_name

Heel: Analysis-SUB-IDs

When Heel uses previous data, it checks for various variations, but the variation is not assigned a subID.

It is hard to identify if the problem is events prior birth or events after death.
They all get the same analysisID 101

101 <0
101 >100

This is quite key quality indicator and if it has the same ID, it is not good for tracking quality.

See the problem here:

filter(dsets,msgID==101)
Source: local data frame [9 x 2]

msgID msg
1 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=1,869)
2 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=21,924)
3 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=10,836)
4 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=42)
5 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=6)
6 101 Number of persons by age (n=338)
7 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=8,549)
8 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=1,100)
9 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=15)

Achilles Heel Output - More Information on Specific Issues on Issue Prevelance

Right now the Achilles Heel report only gives counts at the CONCEPT_ID level. Example:

ERROR: 706 - Distribution of age by observation_concept_ID (count = 16); min value should not be negative.

The count 16 is on the CONCEPT_ID, but that doesn't tell you how prevalent the issue is. It would be nice to know how many records and how many people it affects.

This is a feature requested by the Erasmus folks.

feature suggestion: increase number of rows from 5 to 15

Is there a performance reason why only 5 rows are displayed in most reports. (e.g., conditions, drugs, procedures)?

Can this be increased in future v5based version?

(Thanks to Achilles install in IMEDS Lab - I can now provide much better feedback)

error during install

Installing the package package below fails. I am on Win7 in RStudio.
The other 2 packages install fine.

Maybe other users see the same error....

Here is the error

install_github("ohdsi/SqlRender")

Installing SqlRender
"C:/PROGRA1/R/R-311.0/bin/x64/R" --vanilla CMD INSTALL
"C:\Users\huserv\AppData\Local\Temp\1\Rtmp4OBuft\devtools38a45b0c7062\SqlRender-master"
--library="C:/Users/huserv/Documents/R/win-library/3.1" --install-tests

  • installing source package 'SqlRender' ...
    ** libs

*** arch - i386
Warning: running command 'make -f "Makevars.win" -f "C:/PROGRA1/R/R-311.0/etc/i386/Makeconf" -f "C:/PROGRA1/R/R-311.0/share/make/winshlib.mk" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="SqlRender.dll" OBJECTS="RcppExports.o RcppWrapper.o SqlRender.o SqlSplit.o SqlTranslate.o stringUtilities.o"' had status 127

ERROR: compilation failed for package 'SqlRender'

  • removing 'C:/Users/huserv/Documents/R/win-library/3.1/SqlRender'
    Error: Command failed (1)

Analysis 406 and 1006 - Same Error Msg, Different Queries

Both Analysis_ID 406 and 1006 have the same error message:

"ERROR: Distribution of age by condition_concept_id; min should not be negative"

However, one is off the CONDITION_ERA and the other is off the CONDITION_OCCURRENCE. Impossible tell which is which from the error message.

This is the same for 906 vs 706 ("Distribution of age by drug_concept_id")

Add export modes

Introduce exportToJSON options for generating only missing files vs generating all files and overwriting existing files.

New analysis suggestion - count of LV5, LV10, LV15 and LV20 patients

I have a new analysis. Is there a list of all SQL queries Achilles runs - I want to assign an ID to that new SQL

Is there a documentation on what output structure I should use for output of new analyses (new table ,existing table)

The analysis I want to add is this SQL query: (in RedShift flavor)

(the query gets first visit and last visit for each patient, then rounds it to span years and arrives at span rounded to nearest lower multiple of 5 (substracts modulo results). It then counts number of patients in each bin.

SET SEARCH_PATH TO ccae_cdm4;
select v5span years_span, count(*) from (
            select person_id
            --, fdt, ldt,(ldt-fdt) ,(ldt-fdt)/365.25 dif, cast((ldt-fdt)/365.25 as integer) ydif
            ,cast((ldt-fdt)/365.25 as integer)- mod(cast((ldt-fdt)/365.25 as integer),5) v5span
            from(
                    select person_id, min(e.visit_start_date) fdt, max(e.visit_end_date) ldt
                    from visit_occurrence e   
                    group by person_id
            )
)
 group by v5span order by v5span;

The output gives numbers like this:

0       99,110,955 
5       12,406,560 
10      676,189 

WHich is for CCAE and it means there are 99M patients with visit span of 0-5 years
12M patients with visit span 5-10 years and no patients with more than 15 years time span.

It is similar to existing analysis, but that is not a set of 4 numbers but a graph.

It is based on this work:
https://github.com/OHDSI/sandbox/tree/master/CDMv4/analytical-SQL-code/vojtech_huser/IRIS

JDBC Query Failed: function left(integer, integer) does not exist

Hi everyone,

I'm trying to run the Achilles R code on a local postgresql database, and I'm finding an error recurring at 81% completion. The error file text is

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: function left(integer, integer) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 112)

SQL:
insert into testschema.ACHILLES_results (analysis_id, stratum_1, count_value)
select 1100 as analysis_id,
left(l1.zip,3) as stratum_1, COUNT(distinct person_id) as count_value
from PERSON p1
inner join LOCATION l1
on p1.location_id = l1.location_id
where p1.location_id is not null
and l1.zip is not null

group by left(l1.zip,3)

Any suggestions for how I can resolve this?

Thank you very much for your help!

Gratefully,
Trevor

Potential exportToJson.R Error

When trying to run the exportToJson.R code, an error message kept occurring. In looking into the code within this file, we noticed that the allReport functions appear to be ordered alphabetically. We think that our error was generated as a result of this ordering, specifically, the generateDashboardReport function occurs before the generatePersonReport, but it seems that the DashboardReport is dependent upon thePersonReport functioning having already been ran. When we moved the generateDashboardReport function to the end of the file we were able to successfully run the code (although we are unsure if the output is correct). Could there be another reason as to why we are experiencing this error?

ExportToJSON - Observation Period and Data Density

I am having problems getting proper JSON output for AchillesWeb for Observation Period and Data Density. I have reviewed my Achilles Results for the relevant analyses and all seem to be present and accounted for. But the JSON file produced is basically blank (ObservationPeriod) or extremely weird (DataDensity).

I am showing a screenshot from my data density page, notice the suspicious x-axis for Concept Type (P r o c e d u r). Something is amiss. Has anyone else got this fully working in Oracle?

datadensity oddity

ANALYSIS_ID 717 - QUANTITY > 600 for DRUG_CONCEPT_ID

When we see this error thrown it is often for drugs with fluid volumes (e.g. IV). I'm just noting that this ERROR should be updated (but probably easier to do in CDM v5).

Credit to Rupa figuring this one out.

Heel: version of Heel used to generate the errors

When Heel messages are generate, it would be nice to have msgID=0, version of Heel used was 1.1

That way, one can know what to expect in the data

See problem here - data set 6 used probably an earlier version of Heel.

filter(dsets,msgID==101)
Source: local data frame [9 x 2]

msgID msg
1 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=1,869)
2 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=21,924)
3 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=10,836)
4 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=42)
5 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=6)
6 101 Number of persons by age (n=338)
7 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=8,549)
8 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=1,100)
9 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=15)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.