ohdsi / achilles Goto Github PK

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) - descriptive statistics about a OMOP CDM database

Home Page: https://ohdsi.github.io/Achilles/

R 99.57% Perl 0.18% Shell 0.25%

achilles's Introduction

Achilles

Achilles is part of HADES.

Introduction

Automated Characterization of Health Information at Large-Scale Longitudinal Evidence Systems (ACHILLES) Achilles provides descriptive statistics on an OMOP CDM database. ACHILLES currently supports CDM version 5.3 and 5.4.

Features

Performs broad database characterization
Export feature for ARES
Export feature for AchillesWeb (deprecated)

Technology

Achilles is an R package.

System Requirements

Requires R (version 4.0 or higher).

Installation

See the instructions here for configuring your R environment, including RTools and Java.
In R, use the following commands to download and install Achilles:

install.packages("remotes")
remotes::install_github("OHDSI/Achilles")

User Documentation

Documentation can be found on the package website.

Support

Developer questions/comments/feedback: OHDSI Forums
We use the GitHub issue tracker for all bugs/issues/enhancements

Contributing

Read here how you can contribute to this package.

License

Achilles is licensed under Apache License 2.0

Development

Achilles is being developed in R Studio.

Development status

Achilles is ready for use.

Acknowledgements

This project was supported in part through the National Science Foundation grant IIS 1251151.

achilles's People

Contributors

Stargazers

Watchers

Forkers

toanong callahantiff rwpark99 aguynamedryan ybittu seven-qi ruilingliu wistephens bmwh yocheved-levitan hestiri aartibagul twankim maruthven rchwang npatibandla0214 pedsnet hojunpark wojszko leeevans agprz sblyman thehyve evalytica rfollett haivietduong mark-velez hajarhomayouni aperotte maximmoinat t-abdul-basser tomwhite ltscomputingllc msuchard c4lm vojtechhuser mi-erasmusmc mdkmongo myounglai schuemie trinetx solventrix jduke99 paulheider odysseusinc accesstoamazing shasthojoy agackovka jamesswiggins twamarc alistairewj shrikantkunte parkdongsu spectrumhealthresearch mustafaascha tropicalberto trberg maoxuejie jatjasonthomas julianflowers rpeys joaorafaelalmeida cl3777 acumarav hongwonjun aparrot89 djb2188 gazzola sena1022 rfherrerac richard1933 ablack3 andreyyiv lopsided obonyojimmy tubbz-alt amikaili alondhe utopiazh lifebellai aandryc scossin mdsung woutv kchaudharigit rjiang9 tmswhite megverschoortbio ncittm mohdahsank001 seungsoo6360 galozins rs21io mhra jungsungjae lcpinheiro formedicode lav-patel hihipch tomwhite-medstar

achilles's Issues

Heel (DQ): add % of missing data for each data domain (Dx, Proc, Meas, Obs)

A paper by Kahn http://repository.academyhealth.org/egems/vol3/iss1/7/
outlines very nice 20 item list.

For item 13 bullet 1 - Achilles should add

% of rows in drug exposure table mapped to concept 0 (not mapped) (perhaps even stratified by year)

and similarly for all other tables like Dx, Proc, Measurement and Observations

Multiline reports by organizaiton, care site, and provider

Multiline reports that allow us to view stats at multiple levels would help identify outliers.

These are the most likely places where EHR customization and local practices will trip.

Error when running Achilles against CDM v5

I updated my R install with the latest Achilles and SQLRender packages in order run against a CDM v5 database instance. Everything appears to work fine until I attempt to export the JSON. At this point, the query appears to be referring to vocabularies via numerical ID a la CDM v4 (1) rather than name (SNOMED)

exportToJson(pgConnectionDetails,"public", "achilles_results", "/Users/wstephens/ext/OHDSI/AchillesWeb-master/data/Test")
| |
0%Error executing SQL: Error in .jcall(s, "Ljava/sql/ResultSet;", "executeQuery", as.character(query)[1]): org.postgresql.util.PSQLException: ERROR: operator does not exist: character varying = integer
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Position: 1075

An error report has been created at /Users/wstephens/errorReport.txt
Error in value[3L] : no loop for break/next, jumping to top level

ACHILLES HEEL - Flipped DISEASE_CLASS and REVENUE_CODE queries.

At the top you define the following analysis:
--1609-Number of records by disease_class_concept_id
--1610-Number of records by revenue_code_concept_id

DISEASE_CLASS_CONCEPT_ID typically falls in VOCABULARY 40-DRG and REVENUE_CODE_CONCEPT_ID falls in VOCABULARY 43-REVENUE_CODE.

However, you have these VOCABULARY_IDs switched when actually look for the issue in the CDM. So your REVENUE_CODE query is assigned to 1609 when is it should be 1610 and vice ver ca.

small prevalence concept files are not generated

In Achilles Web going to the table view of the condition report and clicking on a condition with only a few people in it will generate a 404 error as the file is not found. This is due to the exportToJSON function not generating all necessary files.

Schema identification suggestion

The documentation for Achilles and Achilles Web has commands that take in arguments for the "names of the schemas holding the CDM data and target results ". This may be the terminology for other DB's, but in SQL Server, these arguments are the DB Name, not necessarily the schema that the tables are in within that DB.

But the more significant issue is that the SQL scripts assume the tables are in the 'dbo' schema within that DB. 'dbo' is either hardcoded into the scripts or no schema is provided, which results in falling back on the users default schema. Within our organization it's specific policy not to use 'dbo', so our implementation uses other schema's for the CDM tables. As a result, I had to modify all those scripts to get our implementation running.

We may be the only ones who face this issue, but would suggest modifying those calls to accept a source and target DB Name AND a source and target schema (perhaps this could default to 'dbo') and adjust the SQL Scripts accordingly. It could make Achilles a bit more flexible.

where is the SQL code for Achilles Heel reports?

e.g.,
210-Number of visit records with invalid care_site_id;

Error on Analysis 116 in Oracle

I am getting this error when running analysis 116, which seems to be a key analysis for making the individual visualizations work for conditions, procedures, etc. Looks like a connection error but other analyses running fine in the same session with same ConnectionDetails.

Compatibility with upcoming OMOP CDMV5

Just wondering if there are plans to bring this tool into compatibility with the upcoming OMOP CDMV5, how long it might take, and if there's anything I can do to help.

Field missing causing Achilles to crash at 59% complete

I am trying to run Achilles on CDM V5. After following the installation instructions in this page, I managed to get until 59% complete and then I get this error (extracted from the error log)

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: column vo1.place_of_service_concept_id does not exist
Position: 114)

SQL:
insert into results_schema.ACHILLES_results (analysis_id, stratum_1, count_value)
select 200 as analysis_id,
vo1.place_of_service_CONCEPT_ID as stratum_1,
COUNT(distinct vo1.PERSON_ID) as count_value
from
visit_occurrence vo1
group by vo1.place_of_service_CONCEPT_ID

After looking at the CDM V5 fileds for the visit_occurrence table, the column place_of_service_CONCEPT_ID does not exist.

Any suggestions? or is there a different way to run the CDMV5 Achilles, other than the instructions found in this page.

Thanks!

Implement Device Exposure statistics

Need to identify the Statistics we wish to calculate.
Need to define the report layout to present statistics.

Postgres Missing CAST in Join

In the sqlAgeAtFirstDiagnosis.sql file there is a missing (CAST as INT) in a join condition (line 12) which causes the join to fail in Postgres 9.1 due to a type mismatch error. We believe we have fixed this in a fork. Should we move forward with committing this change?

Problem with generating Achilles results

Hi,

I saw an error trying to generate Achilles result. Here's the error:

Error:

execute JDBC update query failed in dbSendUpdate (ERROR: operator does not exist: timestamp without time zone + integer
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Position: 610)

SQL:
CREATE TEMP TABLE temp_dates
AS
SELECT
DISTINCT
EXTRACT(YEAR FROM observation_period_start_date)_100 + EXTRACT(MONTH FROM observation_period_start_date) AS obs_month,
TO_DATE(CAST(EXTRACT(YEAR FROM observation_period_start_date) AS varchar(4)) || RIGHT('0' || CAST(EXTRACT(MONTH FROM OBSERVATION_PERIOD_START_DATE) AS VARCHAR(2)), 2) || '01' , 'yyyymmdd') AS obs_month_start,
((TO_DATE(CAST(EXTRACT(YEAR FROM observation_period_start_date) AS varchar(4)) || RIGHT('0' || CAST(EXTRACT(MONTH FROM OBSERVATION_PERIOD_START_DATE) AS VARCHAR(2)), 2) || '01' , 'yyyymmdd') + 1_INTERVAL'1 month') + -1) AS obs_month_end
FROM
omop_rz.observation_period

What is the possible fix for this error?

Thank you in advance.

Global Temporary Table - Analysis 116

I'm getting an error on Analysis 116 as follows:

Error:
execute JDBC update query failed in dbSendUpdate (ORA-01031: insufficient privileges
)

SQL:
CREATE TEMPORARY TABLE temp_dates
ON COMMIT PRESERVE ROWS
AS
SELECT
distinct
EXTRACT(YEAR FROM observation_period_start_date) as obs_year

FROM

OBSERVATION_PERIOD

I think it is trying to create the temp table on the CDM4 schema rather than on my Results schema. I have don't have permissions to create tables on the CDM4 database, but of course can create on the Results database.

Can you change code for 116 (and any others than use temp tables) to make the table on @results_schema.dbo.[table] so that a user does not require CREATE privileges on their CDM to run the analysis?

STDDEV problem in analysis 105

I'm getting the following errorReport.txt when I try to run the achilles function:

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: function stddev(interval) does not exist
  Hint: No function matches the given name and argument types. You might need to add explicit type casts.
  Position: 343)

SQL:
insert into public.ACHILLES_results_dist (analysis_id, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)^M
select 105 as analysis_id,^M
        COUNT(count_value) as count_value,^M
        min(count_value) as min_value,^M
        max(count_value) as max_value,^M
        avg(1.0*count_value) as avg_value,^M
        STDDEV(count_value) as stdev_value,^M
        max(case when p1<=0.50 then count_value else -9999 end) as median_value,^M
        max(case when p1<=0.10 then count_value else -9999 end) as p10_value,^M
        max(case when p1<=0.25 then count_value else -9999 end) as p25_value,^M
        max(case when p1<=0.75 then count_value else -9999 end) as p75_value,^M
        max(case when p1<=0.90 then count_value else -9999 end) as p90_value^M
from^M
(^M
select ( op1.observation_period_end_date - op1.observation_period_start_date) as count_value,^M
        1.0*(row_number() over (order by ( op1.observation_period_end_date - op1.observation_period_start_date)))/(COUNT(( op1.observation_period_end_date - op1.observation_period_start_date)) over ()+1) as p1^M
from PERSON p1^M
        inner join ^M
        (select person_id, ^M
                OBSERVATION_PERIOD_START_DATE, ^M
                OBSERVATION_PERIOD_END_DATE, ^M
                ROW_NUMBER() over (PARTITION by person_id order by observation_period_start_date asc) as rn1^M
                 from OBSERVATION_PERIOD^M
        ) op1^M
        on p1.PERSON_ID = op1.PERSON_ID^M
        where op1.rn1 = 1^M
) t1^M

sql server connection error - win authentication

I installed the package.

I get this error

connectionDetails <- createConnectionDetails(dbms="sql server", server="myserver",schema="CDM")
conn <- connect(connectionDetails)

Connecting using SQL Server driver using Windows integrated security
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication.

Even if I try domain/user with psw. I am not able to connect due to security settings. Where I can configure the JDBC driver used by R?

require(RODBC)
conn = odbcConnect("myserver")

Achiles probably requires JDBC. I could try to use JDBC ODBC bridge....

require(RJDBC) #tried this

drv<-JDBC("sun.jdbc.odbc.JdbcOdbcDriver") not found, does not work, did not try to install JDBC ODBC bridge solution

SQL server throws 'fatal error' while running Achilles

Hi,

Achilles runs fine for about 53% and then it errors out with the message 'sql server closed connection'. On further investigation, I found that the following statement is causing the error. Has anyone encountered this error and is there a way to fix this?

--{104 IN (@list_of_analysis_ids)}?{
-- 104 Distribution of age at first observation period by gender
insert into @results_database_schema.ACHILLES_results_dist (analysis_id, stratum_1, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)

select 104 as analysis_id,
gender_concept_id,
COUNT_BIG(count_value) as count_value,
min(count_value) as min_value,
max(count_value) as max_value,
avg(1.0*count_value) as avg_value,
stdev(count_value) as stdev_value,
max(case when p1<=0.50 then count_value else -9999 end) as median_value,
max(case when p1<=0.10 then count_value else -9999 end) as p10_value,
max(case when p1<=0.25 then count_value else -9999 end) as p25_value,
max(case when p1<=0.75 then count_value else -9999 end) as p75_value,
max(case when p1<=0.90 then count_value else -9999 end) as p90_value

from
(
select p1.gender_concept_id,
year(op1.index_date) - p1.YEAR_OF_BIRTH as count_value,
1.0_(row_number() over (partition by p1.gender_concept_id order by year(op1.index_date) - p1.YEAR_OF_BIRTH))/(COUNT_BIG(_) over (partition by p1.gender_concept_id)+1) as p1
from
OMOP_CDM4.dbo.PERSON p1
inner join (select person_id, MIN(observation_period_start_date) as index_date from OMOP_CDM4.dbo.OBSERVATION_PERIOD group by PERSON_ID) op1 on p1.PERSON_ID = op1.PERSON_ID) t1
group by gender_concept_id

Thanks!
Paulina.

Achilles IDs have 2 contradictory sources (no single source for IDs and analysis name)

J&J provided list of analysisIDs is not the same as the one in the webAPI
here
http://api.ohdsi.org/WebAPI/cohortanalysis

Having a single source of IDs would be best. Especially now with Hercules Heel in the mix as well.

Issue with prevalence calculations in conditions, condition era and drug exposures

We have been working on setting up Achilles for the first time and there are a few of the treemap graphs that don’t display. Tracing it back I believe it is due to the prevalence for each row (Conditions, Condition Era, Drug Exposures) being calculated at 0%. The other treemaps based on prevalence get calculated and displayed correctly.

Digging into the sql itself (e.g. sqlConditionTreemap.sql) I can run the sql directly and it brings back the correct results and prevalence calculations but the JSON files appear to incorrect prevalence calculations from the results of the SQL statements. Has something like this been seen before?

For CDMv5, make vocabulary warnings associated with DOMAIN

currently ACHILLES HEEL warnings are looking for specific vocabularies. Instead, for CDMv5, they should check to see if a record has concept from a valid domain.

ExportToJSON by AnalysisID / Table

This may not be possible, but exporting to JSON is now only done in bulk. Can take several hours. When doing incremental additional analyses, would be nice to select export of only certain analyses(or all analyses for a given table) to JSON.

Sample Dataset in CDM v4

Is there a sample dataset that is CDM v4 that I can use Achilles with? The OMOP website provides OSIM and OSIM2 but they both appear to be using legacy CDMs.

docker-compose build fails due to openjdk-6-jdk

$ sudo docker-compose build

---> Running in 5cb42fa9dd4e
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
Get:2 http://security.debian.org jessie/updates/main amd64 Packages [177 kB]
Get:3 http://http.debian.net sid InRelease [253 kB]
Ign http://http.debian.net jessie InRelease
Get:4 http://http.debian.net jessie-updates InRelease [142 kB]
Get:5 http://http.debian.net jessie Release.gpg [2,373 B]
Get:6 http://http.debian.net sid/main amd64 Packages [10.1 MB]
Get:7 http://http.debian.net jessie Release [148 kB]
Get:8 http://http.debian.net jessie/main amd64 Packages [9,035 kB]
Get:9 http://http.debian.net jessie-updates/main amd64 Packages [3,614 B]
Fetched 19.9 MB in 2s (7,089 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Package openjdk-6-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
apt

E: Package 'openjdk-6-jdk' has no installation candidate
Service 'achilles' failed to build: The command '/bin/sh -c apt-get update && apt-get install -y libssl-dev libcurl4-openssl-dev libxml2-dev openjdk-6-jdk && rm -rf /var/lib/apt/lists/* && R CMD javareconf' returned a non-zero code: 100

Have you seen this before during builds?

$ uname -a
Linux ip-54-40-17-252 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Provide support for new features in CDMv5

The current version of Achilles does not break when using CDMv5, but it also doesn't use any of the new features.

Most importantly, it is not outputting statistics on the Measurement table.Other immediately useful information would be the metadata that was introduced in V5. Currently less important (because nobody I know has the information) is reports on devices, specimen, notes, and information on different types of observation periods.

Add test_id to Achilles Heel generation

Currently, there is no way to uniquely identify an error for an analysis. For example, you may receive 2 errors for the same Analysis_Id, and while they have different text messages, you can't tell if they came from different tests. You also can not compare results between 2 different heel reports over time to determine if the same error occurred or was fixed.

We propose the following change to all Achilles heel queries:

First, add a column to Achilles_heel_results

CREATE TABLE ACHILLES_HEEL_results (
  test_id INT,
  analysis_id INT,
  ACHILLES_HEEL_warning VARCHAR(255)
);

Next, for each Achilles heel test:

--check for non-zero counts from checks of improper data (invalid ids, out-of-bound data, inconsistent dates)
INSERT INTO ACHILLES_HEEL_results (
        test_id, -- this is new
    analysis_id,
    ACHILLES_HEEL_warning
    )
SELECT DISTINCT 1 as test_id, -- hard code test ID for this test
    or1.analysis_id,
    'ERROR: ' + cast(or1.analysis_id as VARCHAR) + '-' + oa1.analysis_name + '; count (n=' + cast(or1.count_value as VARCHAR) + ') should not be > 0' AS ACHILLES_HEEL_warning
FROM ACHILLES_results or1
INNER JOIN ACHILLES_analysis oa1
    ON or1.analysis_id = oa1.analysis_id
WHERE or1.analysis_id IN (
        7,
        8,
        9,
        114,
        115,
        207,
        208,
        209,
        210,
        302,
        409,
        410,
        411,
        412,
        413,
        509,
        510,
        609,
        610,
        612,
        613,
        709,
        710,
        711,
        712,
        713,
        809,
        810,
        812,
        813,
        814,
        908,
        909,
        910,
        1008,
        1009,
        1010,
        1415,
        1500,
        1501,
        1600,
        1601,
        1701
        ) --all explicit counts of data anamolies
    AND or1.count_value > 0;

With a test_id, we can then determine if a test was resolved between heel reports.

Problem with Analysis 606

I have run Analysis 606 several times and it does not make it to the Results table. But I do get this ERROR in Achilles HEEL:

ERROR: Distribution of age by procedure_concept_id; min should not be negative

Is this a "fatal" data problem then that prevents completion of the query? I want to sleuth out the cause (assuming something to do with birthdates). But wondering if ACHILLES could still complete output even with this error.

Can I use just one schema for input data and also for results?

There is a key step in the process that looks like this.
achillesResults <- achilles(connectionDetails, "cdm4_inst", "results")

Will the script crash if I have it point to the same special schema
achillesResults <- achilles(connectionDetails, "OMOPsandbox", "OMOPsandbox")?

I think it should not but I want to be sure before I request from my dbadmin the necessary users and schemas.

Postgres CAST Errors

In the TestAchillesCode.R we noticed that there is code written to test Achilles on PostgreSQL, but when we tried to use Achilles with PostgreSQL 9.1 data we ran into CAST errors. The Postgres ROUND function expects ROUND(numeric,int) but there are two instances of ROUND(CAST as FLOAT, int) which causes PG to throw an illegal data type error. When changed to ROUND(CAST as NUMERIC(8,6), int), these statements work correctly. We believe we have fixed all of these in a fork. Should we move forward with committing these changes?

Unable to connect Achilles to SQL server

I've copied both drivers jtds-1.2.7.jar and sqljdbc4.jar to the C:\Program Files\R\R-3.2.0\library\DatabaseConnector\java directory. Can anyone help in figuring out how to resolve this issue?

SQL Server authentication:

connectionDetails <- createConnectionDetails(dbms="sql server", server="localhost", schema="OMOP", user="xyz",pasword="123")
conn <- connect(connectionDetails)

Connecting using SQL Server driver
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: I/O Error: DB server closed connection.

Windows authentication: The error message received is
Connecting using SQL Server driver using Windows integrated security
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication. ClientConnectionId:4434edbb-347f-4373-b285-3c44636caeb9

Thanks!

Set up automated Docker builds on the registry

This would automatically build the docker image of Achilles every time the repo is updated, using the recently added Dockerfile, and store that image on the Docker Hub service (the registry). The advantages are two-fold:

Automatic monitoring of build process and notification of failures.
Removal of the local build step when running Achilles using Docker.

Dashboard - Persons with Continuous Observation By Month - Last Month Missing

In my TRUVEN MDCD data, the raw data ends 12/31/2012 however according to this chart no one has continuous observation time in 12/2012. The last month available on this chart is 11/2012 (see image).

The following query pulls back 6,743,855 people in our CDM.
SELECT COUNT(DISTINCT PERSON_ID)
FROM OBSERVATION_PERIOD op
WHERE op.OBSERVATION_PERIOD_START_DATE <= '12/01/2012'
AND op.OBSERVATION_PERIOD_END_DATE >= '12/31/2012'

Document process for proposing new statistics/reports to Achilles

There is interest in the community to add new queries/statistics/reports to Achilles, but not a lot of information about how this should be done.

Propose new wiki pages to guide through this process, possibly with videos.

Achilles shows last dataset but current label if the data folder is incorrect

I noticed this behavior today when I was adding a new dataset to review using Achilles directly in Firefox. Note that I am manually configuring Achilles by editing the datasources.json file. The new dataset was a sub-population of one I had already configured so I expected a much smaller number of patients. However, even though Achilles showed the new datasets name, the data it showed was from the larger dataset that I had viewed earlier in the morning. I found out that I had misspelled the data folder in datasources.json. Once corrected, I refreshed Achilles from the browser and the correct data was shown.

Obviously, this is fundamentally an "operator error" however, I think that Achilles behavior when the data folder name is incorrect should be to report an error or show no data at all so that users don't accidentally review a previously viewed dataset thinking that it's the new one.

thanks for all the great work!
-R

Analysis Id input parameter

Currently, AchillesWeb does not support displaying only a subset of reports for a given Table. But it is difficult to know which Analysis_Ids apply to a given table report. Thus it would be helpful to have a parameter in achilles where you enter the table you want to analyze (e.g., Person) and it runs all those reports.

achilles(connectionDetails, cdmSchema, resultsSchema, sourceName = "Test", TableToAnalyze="Person" createTable = TRUE, smallcellcount = 5)

myTables=c("Person","Condition_Occurrence)

achilles(connectionDetails, cdmSchema, resultsSchema, sourceName = "Test", TablesToAnalyze=myTables)

How to submit new analyses for Achilles (which SQL flavor to code)

In Achilles.sql file thare are many analyses. (most likely in SQL server flavor) (or what flavor exactly)
These get translated to postgres flavor using R package.

I would like to contribute new analyses. Probably some semi-final SQL code that gets incorporated by core developer.

For authoring new analyses, in which sql flavor should I author them?
(I need rounding of real number to integer and then to multiple of 5)
e.g., 21 rounded to 20
7 rounded to 5
6.3 rounded to 5

so it will be like x- (x % 5) (modulo function)

How sophisticated is the flavor translation?

Pre-release testing on R running on Ubuntu, nursing home dataset

Dear Achilles Team,

Please forgive me for taking the liberty to do some "pre-release" testing on your code. The truth is, I am working on an ETL for a nursing home dataset and could not resist the temptation tao test Achilles because I think that its summary reports could help me more rapidly see the influence of some of the decisions I have to make while loading the dataset.

In case its helpful, I have written below a brief account of my experience with installing and running the tool in its current state:

Installation
I installed Achilles and its dependencies on a machine running Ubuntu Precise (12.04). I had to upgrade my R from 2.x to 3.1. The instructions here were helpful (http://stackoverflow.com/questions/10476713/how-to-upgrade-r-in-ubuntu)

The dependencies that I noted for Achilles were:
-- Rcpp (Cran)
-- RJDBC (Cran)
-- rJava (Cran)
-- rjson (Cran)
-- DBI (Cran)
-- SqlRender (OHDSI)
-- DatabaseConnector (OHDSI)

Most of these installed from the command line with no issues. For Cran, I installed in R running as root like so:

# install.packages("Rcpp", lib="/usr/local/lib/R/site-library/")

For OHDSI packages, I installed like so:

sudo R CMD INSTALL SqlRender/ -l /usr/local/lib/R/site-library/

In SqlRender I had to comment out the include for _mingw.h present in SqlTranslate.cpp and SqlSplit.cpp. This seems to be a Windows compatability requirement? Another issue was that there was an include in SqlRender.h that seemed spelled incorrectly:

/*#include "SQLRender.h"*/
#include "SqlRender.h"

Finally, I installed Achilles:

sudo R CMD INSTALL Achilles/ -l /usr/local/lib/R/site-library/

Running Achilles
After installation, I followed the example provided in the help page and figured out how to connect to the DB with my nursing home data

> library("Achilles")
> connectionDetails <- createConnectionDetails(dbms="oracle",user="...",password="...",server="...",schema="DIKB_DEV")
> achillesResults <- achilles(connectionDetails, "DIKB_DEV", "DIKB_DEV", "TestDB")
Connecting using Oracle driver
Executing multiple queries. This could take a while
  |======================================================================| 100%
Analysis took 16.4 secs
Done. Results can now be found in DIKB_DEV

The database had new tables loaded with data and R held a loaded achillesResults dataframe.

I then tried to run 'plot' per the example in the help page but ran into an error:

> plot(achillesResults, "population")
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ

Not sure what happened here - any ideas?

Next, I wanted to play with AchillesWeb. I figured the exportToJson function was important for generating data that the web app could load so gave it a try. It almost worked:

> exportToJson(connectionDetails, "DIKB_DEV", "DIKB_DEV")
Warning: folder /home/PITT/rdb20/GeriOMOP already exists
Connecting using Oracle driver
Generating person reports
  |======================================================================| 100%
Generating observation period reports
  |======================================================================| 100%
Generating condition treemap
  |======================================================================| 100%
Generating condition reports
  |                                                                      |   0%

Error executing SQL: Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve JDBC result...

The error file held the SQL query that failed (below). Running it against the DB yielded the following error:

ORA-01722: invalid number
01722. 00000 -  "invalid number"
*Cause:    
*Action:

Its no clear to me what happened. One issue I had with OSIM2 that might be relevant here is that the nursing home population does not have certain age groups. This can trip up code that assumes all age groups will have some members. Anything else I can test to help find out whats going on?

SQL Query that failed:

DBMS:
oracle

Error:
Unable to retrieve JDBC result set for select c1.concept_id as condition_concept_id, 
       c1.concept_name as condition_concept_name,
       c2.concept_group_id as concept_id,
       c2.concept_group_name as concept_name, 
       sum(ar1.count_value) as count_value
from ACHILLES_results ar1
       inner join
       DIKB_DEV.concept c1
       on ar1.stratum_1 = c1.concept_id
       inner join
       (
       select concept_id,
             case when concept_name like 'Inpatient%' then 10
                    when concept_name like 'Outpatient%' then 20
                    else concept_id end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 1
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 2
                    else 0 end as concept_group_id,
             case when concept_name like 'Inpatient%' then 'Claim- Inpatient: '
                    when concept_name like 'Outpatient%' then 'Claim- Outpatient: '
                    else concept_name end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 'Primary diagnosis'
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 'Secondary diagnosis'
                    else '' end as concept_group_name
       from DIKB_DEV.concept
       where vocabulary_id = 37

       ) c2
       on ar1.stratum_2 = c2.concept_id
where ar1.analysis_id = 405
group by c1.concept_id, 
       c1.concept_name,
       c2.concept_group_id,
       c2.concept_group_name (ORA-01722: invalid number
)

SQL:
select c1.concept_id as condition_concept_id, 
       c1.concept_name as condition_concept_name,
       c2.concept_group_id as concept_id,
       c2.concept_group_name as concept_name, 
       sum(ar1.count_value) as count_value
from ACHILLES_results ar1
       inner join
       DIKB_DEV.concept c1
       on ar1.stratum_1 = c1.concept_id
       inner join
       (
       select concept_id,
             case when concept_name like 'Inpatient%' then 10
                    when concept_name like 'Outpatient%' then 20
                    else concept_id end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 1
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 2
                    else 0 end as concept_group_id,
             case when concept_name like 'Inpatient%' then 'Claim- Inpatient: '
                    when concept_name like 'Outpatient%' then 'Claim- Outpatient: '
                    else concept_name end  
                    +
                    case when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name like '%primary%' or concept_name like '%1st position%') then 'Primary diagnosis'
                    when (concept_name like 'Inpatient%' or concept_name like 'Outpatient%' ) and (concept_name not like '%primary%' and concept_name not like '%1st position%') then 'Secondary diagnosis'
                    else '' end as concept_group_name
       from DIKB_DEV.concept
       where vocabulary_id = 37

       ) c2
       on ar1.stratum_2 = c2.concept_id
where ar1.analysis_id = 405
group by c1.concept_id, 
       c1.concept_name,
       c2.concept_group_id,
       c2.concept_group_name

Heel: Analysis-SUB-IDs

When Heel uses previous data, it checks for various variations, but the variation is not assigned a subID.

It is hard to identify if the problem is events prior birth or events after death.
They all get the same analysisID 101

101 <0
101 >100

This is quite key quality indicator and if it has the same ID, it is not good for tracking quality.

See the problem here:

filter(dsets,msgID==101)
Source: local data frame [9 x 2]

msgID msg
1 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=1,869)
2 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=21,924)
3 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=10,836)
4 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=42)
5 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=6)
6 101 Number of persons by age (n=338)
7 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=8,549)
8 101 Number of persons by age, with age at first observation period; should not have age < 0, (n=1,100)
9 101 Number of persons by age, with age at first observation period; should not have age > 100, (n=15)

Achilles Heel Output - More Information on Specific Issues on Issue Prevelance

Right now the Achilles Heel report only gives counts at the CONCEPT_ID level. Example:

ERROR: 706 - Distribution of age by observation_concept_ID (count = 16); min value should not be negative.

The count 16 is on the CONCEPT_ID, but that doesn't tell you how prevalent the issue is. It would be nice to know how many records and how many people it affects.

This is a feature requested by the Erasmus folks.

feature suggestion: increase number of rows from 5 to 15

Is there a performance reason why only 5 rows are displayed in most reports. (e.g., conditions, drugs, procedures)?

Can this be increased in future v5based version?

(Thanks to Achilles install in IMEDS Lab - I can now provide much better feedback)

error during install

Installing the package package below fails. I am on Win7 in RStudio.
The other 2 packages install fine.

Maybe other users see the same error....

Here is the error

install_github("ohdsi/SqlRender")

Installing SqlRender
"C:/PROGRA~~1/R/R-31~~1.0/bin/x64/R" --vanilla CMD INSTALL
"C:\Users\huserv\AppData\Local\Temp\1\Rtmp4OBuft\devtools38a45b0c7062\SqlRender-master"
--library="C:/Users/huserv/Documents/R/win-library/3.1" --install-tests

installing source package 'SqlRender' ...
** libs

*** arch - i386
Warning: running command 'make -f "Makevars.win" -f "C:/PROGRA~~1/R/R-31~~1.0/etc/i386/Makeconf" -f "C:/PROGRA~~1/R/R-31~~1.0/share/make/winshlib.mk" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="SqlRender.dll" OBJECTS="RcppExports.o RcppWrapper.o SqlRender.o SqlSplit.o SqlTranslate.o stringUtilities.o"' had status 127

ERROR: compilation failed for package 'SqlRender'

removing 'C:/Users/huserv/Documents/R/win-library/3.1/SqlRender'
Error: Command failed (1)

Analysis 406 and 1006 - Same Error Msg, Different Queries

Both Analysis_ID 406 and 1006 have the same error message:

"ERROR: Distribution of age by condition_concept_id; min should not be negative"

However, one is off the CONDITION_ERA and the other is off the CONDITION_OCCURRENCE. Impossible tell which is which from the error message.

This is the same for 906 vs 706 ("Distribution of age by drug_concept_id")

Add export modes

Introduce exportToJSON options for generating only missing files vs generating all files and overwriting existing files.

New analysis suggestion - count of LV5, LV10, LV15 and LV20 patients

I have a new analysis. Is there a list of all SQL queries Achilles runs - I want to assign an ID to that new SQL

Is there a documentation on what output structure I should use for output of new analyses (new table ,existing table)

The analysis I want to add is this SQL query: (in RedShift flavor)

(the query gets first visit and last visit for each patient, then rounds it to span years and arrives at span rounded to nearest lower multiple of 5 (substracts modulo results). It then counts number of patients in each bin.

SET SEARCH_PATH TO ccae_cdm4;
select v5span years_span, count(*) from (
            select person_id
            --, fdt, ldt,(ldt-fdt) ,(ldt-fdt)/365.25 dif, cast((ldt-fdt)/365.25 as integer) ydif
            ,cast((ldt-fdt)/365.25 as integer)- mod(cast((ldt-fdt)/365.25 as integer),5) v5span
            from(
                    select person_id, min(e.visit_start_date) fdt, max(e.visit_end_date) ldt
                    from visit_occurrence e   
                    group by person_id
            )
)
 group by v5span order by v5span;

The output gives numbers like this:

0       99,110,955 
5       12,406,560 
10      676,189

WHich is for CCAE and it means there are 99M patients with visit span of 0-5 years
12M patients with visit span 5-10 years and no patients with more than 15 years time span.

It is similar to existing analysis, but that is not a set of 4 numbers but a graph.

It is based on this work:
https://github.com/OHDSI/sandbox/tree/master/CDMv4/analytical-SQL-code/vojtech_huser/IRIS

For CDMv5 MEASUREMENT and OBSERVATION table, count persons with each VALUE_AS_CONCEPT_ID

for lab tests, it can be useful to summarize what concept values are observed for a given test.

for observations, like family history, the list of value_as_concept_id could be fairly large, so we need to determine the most effective visual representation (maybe treemap/table view)

JDBC Query Failed: function left(integer, integer) does not exist

Hi everyone,

I'm trying to run the Achilles R code on a local postgresql database, and I'm finding an error recurring at 81% completion. The error file text is

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: function left(integer, integer) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 112)

SQL:
insert into testschema.ACHILLES_results (analysis_id, stratum_1, count_value)
select 1100 as analysis_id,
left(l1.zip,3) as stratum_1, COUNT(distinct person_id) as count_value
from PERSON p1
inner join LOCATION l1
on p1.location_id = l1.location_id
where p1.location_id is not null
and l1.zip is not null

group by left(l1.zip,3)

Any suggestions for how I can resolve this?

Thank you very much for your help!

Gratefully,
Trevor

Potential exportToJson.R Error

When trying to run the exportToJson.R code, an error message kept occurring. In looking into the code within this file, we noticed that the allReport functions appear to be ordered alphabetically. We think that our error was generated as a result of this ordering, specifically, the generateDashboardReport function occurs before the generatePersonReport, but it seems that the DashboardReport is dependent upon thePersonReport functioning having already been ran. When we moved the generateDashboardReport function to the end of the file we were able to successfully run the code (although we are unsure if the output is correct). Could there be another reason as to why we are experiencing this error?

ExportToJSON - Observation Period and Data Density

I am having problems getting proper JSON output for AchillesWeb for Observation Period and Data Density. I have reviewed my Achilles Results for the relevant analyses and all seem to be present and accounted for. But the JSON file produced is basically blank (ObservationPeriod) or extremely weird (DataDensity).

I am showing a screenshot from my data density page, notice the suspicious x-axis for Concept Type (P r o c e d u r). Something is amiss. Has anyone else got this fully working in Oracle?

See problem here - data set 6 used probably an earlier version of Heel.