genepi / imputationserver-docker
Docker Image for Michigan Imputation Server
Hi, I am getting the error below in the logs:
job.txt:
22/11/17 19:07:56 Executing Job installation....
22/11/17 19:07:57 Preparing Application 'Genotype Imputation (Minimac4)'...
22/11/17 19:07:57 Installing Application [email protected]...
22/11/17 19:08:00 Installation finished.
22/11/17 19:08:00 Preparing Application 'HapMap 2'...
22/11/17 19:08:01 Installing Application [email protected]...
22/11/17 19:08:16 Installation finished.
22/11/17 19:08:16 Executing Job setups....
22/11/17 19:08:17 Planner: WDL evaluated.
22/11/17 19:08:17 Planner: DAG created.
22/11/17 19:08:17 Nodes: 3
22/11/17 19:08:17 Input Validation
22/11/17 19:08:17 Inputs:
22/11/17 19:08:17 Outputs:
22/11/17 19:08:17 Quality Control
22/11/17 19:08:17 Inputs:
22/11/17 19:08:17 Outputs: mafFile chunkFileDir statisticDir
22/11/17 19:08:17 Quality Control (Report)
22/11/17 19:08:17 Inputs: mafFile myseparator
22/11/17 19:08:17 Outputs: qcreport
22/11/17 19:08:17 Dependencies: 2
22/11/17 19:08:17 Input Validation->Quality Control
22/11/17 19:08:17 Quality Control->Quality Control (Report)
22/11/17 19:08:17 Executor: execute DAG...
22/11/17 19:08:17 ------------------------------------------------------
22/11/17 19:08:17 Input Validation
22/11/17 19:08:17 ------------------------------------------------------
22/11/17 19:08:17 Versions:
22/11/17 19:08:17 Pipeline: michigan-imputationserver-1.4.1
22/11/17 19:08:17 Imputation-Engine: minimac4-1.0.2
22/11/17 19:08:17 Phasing-Engine: eagle-2.4
22/11/17 19:08:17 Configuration file '/data/apps/imputationserver/1.4.1/job.config' not available. Use default values.
22/11/17 19:08:28 Input Validation [10 sec]
22/11/17 19:08:28 ------------------------------------------------------
22/11/17 19:08:28 Quality Control
22/11/17 19:08:28 ------------------------------------------------------
22/11/17 19:08:28 Configuration file '/data/apps/imputationserver/1.4.1/job.config' not available. Use default values.
22/11/17 19:08:28 Reference Panel Ranges: genome-wide
22/11/17 19:09:14 Quality Control [45 sec]
22/11/17 19:09:14 Exporting parameter statisticDir...
22/11/17 19:09:14 Added 3 downloads.
22/11/17 19:09:14 Added 0 custom downloads.
22/11/17 19:09:14 ------------------------------------------------------
22/11/17 19:09:14 Quality Control (Report)
22/11/17 19:09:14 ------------------------------------------------------
22/11/17 19:09:14 Running script /data/apps/imputationserver/1.4.1/qc-report.Rmd...
22/11/17 19:09:14 Working Directory: /data/apps/imputationserver/1.4.1
22/11/17 19:09:14 Output: /data/jobs/job-20221117-190738-049/qcreport/qcreport.html
22/11/17 19:09:14 Parameters:
22/11/17 19:09:14 /data/jobs/job-20221117-190738-049/temp/mafFile/mafFile
22/11/17 19:09:14 Creating RMarkdown report from /data/apps/imputationserver/1.4.1/qc-report.Rmd...
22/11/17 19:09:15 Quality Control (Report) [ERROR]
22/11/17 19:09:15 Exporting parameter qcreport...
22/11/17 19:09:15 Added 0 downloads.
22/11/17 19:09:15 Added 0 custom downloads.
22/11/17 19:09:15 Executing onFailure...
22/11/17 19:09:15 ------------------------------------------------------
22/11/17 19:09:15 Send Notification on Failure
22/11/17 19:09:15 ------------------------------------------------------
22/11/17 19:09:15 Configuration file '/data/apps/imputationserver/1.4.1/job.config' not available. Use default values.
22/11/17 19:09:15 Send Notification on Failure [ERROR]
22/11/17 19:09:15 onFailure execution failed.
22/11/17 19:09:15 Job execution failed: Job Execution failed.
22/11/17 19:09:15 Cleaning up...
22/11/17 19:09:15 Cleaning up uploaded local files...
22/11/17 19:09:15 Cleaning up temporary local files...
22/11/17 19:09:15 Cleaning up temporary hdfs files...
22/11/17 19:09:15 Cleaning up hdfs files...
22/11/17 19:09:15 Cleanup successful.
std.out:
Export parameter 'statisticDir'...
Error: package or namespace load failed for 'graphics' in registerS3methods(nsInfo$S3methods, package, env):
read failed on /usr/lib/R/library/graphics/R/graphics.rdb
Export parameter 'qcreport'...
No action required. Email notification has been disabled in job.config
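For what it's worth, the `read failed on /usr/lib/R/library/graphics/R/graphics.rdb` error usually means that file is truncated or unreadable, rather than the `graphics` package being absent. A quick sanity check, as a sketch:

```shell
# Inside the container (the name 'imputeserver' is an assumption), one could check:
#   docker exec imputeserver ls -l /usr/lib/R/library/graphics/R/graphics.rdb
#   docker exec imputeserver Rscript -e 'loadNamespace("graphics")'
# Demonstrated here on a local stand-in file: a zero-byte .rdb is a red flag.
touch graphics.rdb      # stand-in for the suspect file
if [ -s graphics.rdb ]; then
  echo "non-empty"
else
  echo "empty -- likely truncated; re-pull or rebuild the image"
fi
```

If the file inside the container really is empty or unreadable, re-pulling the image is the simplest remedy.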
We encountered a problem with running jobs on the Michigan Imputation Server locally.
We followed the steps from the README file and successfully ran the Docker image.
We tried both Hapmap2 and 1000 Genomes Phase 3, but in both cases got the following message:
Configuration file '/data/apps/imputationserver/1.5.7/job.config' not available. Use default values.
We also tried versions 1.4.1 and 1.5.7 of Genotype Imputation (Minimac4), but got the same error.
Note that the input files we used for the locally run imputation job had previously been imputed successfully using the Michigan Imputation Server UI.
Full log:
job.txt:
21/01/29 09:38:28 Executing Job installation....
21/01/29 09:38:28 Preparing Application 'Genotype Imputation (Minimac4)'...
21/01/29 09:38:28 Application 'Genotype Imputation (Minimac4)'is already installed.
21/01/29 09:38:28 Preparing Application 'HapMap 2'...
21/01/29 09:38:28 Application 'HapMap 2'is already installed.
21/01/29 09:38:28 Executing Job setups....
21/01/29 09:38:28 Planner: WDL evaluated.
21/01/29 09:38:28 Planner: DAG created.
21/01/29 09:38:28 Nodes: 3
21/01/29 09:38:28 Input Validation
21/01/29 09:38:28 Inputs:
21/01/29 09:38:28 Outputs:
21/01/29 09:38:28 Quality Control
21/01/29 09:38:28 Inputs:
21/01/29 09:38:28 Outputs: mafFile chunkFileDir statisticDir
21/01/29 09:38:28 Quality Control (Report)
21/01/29 09:38:28 Inputs: mafFile myseparator
21/01/29 09:38:28 Outputs: qcreport
21/01/29 09:38:28 Dependencies: 2
21/01/29 09:38:28 Input Validation->Quality Control
21/01/29 09:38:28 Quality Control->Quality Control (Report)
21/01/29 09:38:28 Executor: execute DAG...
21/01/29 09:38:28 ------------------------------------------------------
21/01/29 09:38:28 Input Validation
21/01/29 09:38:28 ------------------------------------------------------
21/01/29 09:38:28 Versions:
21/01/29 09:38:28 Pipeline: michigan-imputationserver-1.5.7
21/01/29 09:38:28 Imputation-Engine: minimac4-1.0.2
21/01/29 09:38:28 Phasing-Engine: eagle-2.4
21/01/29 09:38:28 Configuration file '/data/apps/imputationserver/1.5.7/job.config' not available. Use default values.
21/01/29 09:38:28 Input Validation [ERROR]
21/01/29 09:38:28 Executing onFailure...
21/01/29 09:38:28 ------------------------------------------------------
21/01/29 09:38:28 Send Notification on Failure
21/01/29 09:38:28 ------------------------------------------------------
21/01/29 09:38:28 Configuration file '/data/apps/imputationserver/1.5.7/job.config' not available. Use default values.
21/01/29 09:38:28 Send Notification on Failure [ERROR]
21/01/29 09:38:28 onFailure execution failed.
21/01/29 09:38:28 Job execution failed: Job Execution failed.
21/01/29 09:38:28 Cleaning up...
21/01/29 09:38:28 Cleaning up uploaded local files...
21/01/29 09:38:28 Cleaning up temporary local files...
21/01/29 09:38:28 Cleaning up temporary hdfs files...
21/01/29 09:38:28 Cleaning up hdfs files...
21/01/29 09:38:28 Cleanup successful.
std.out:
No action required. Email notification has been disabled in job.config
CC: @enazunic
The hosted Michigan Imputation Server provides the endpoint:
https://imputationserver.sph.umich.edu/api/v2/jobs/submit/imputationserver2-pgs
It doesn't seem to be available on https://apps.cloudgene.io/, and I can't find any mention of it on GitHub. Where can I install this app?
I'm trying to limit the resource usage per job so that I can optimize the total run time for two or more jobs. How do I do that? I start the imputation server like so:
docker run -t -p 8080:80 -e DOCKER_CORES="16" -v $(pwd):/data/ --name imputeserver-16cores genepi/imputationserver
And I start each job like so:
docker exec -t -i imputeserver-16cores cloudgene run imputationserver --files /data/input.vcf.gz --refpanel apps@hapmap2 --conf /etc/hadoop/conf
I figured I could perhaps copy the files from /etc/hadoop/conf to a local directory (e.g. /data/conf) and point --conf at those files, but so far changing the values in mapred-site.xml doesn't affect the number of map/reduce tasks created. Should that work?
I also see in the terminal output that "/data/apps/imputationserver/job.config" is unavailable; perhaps that's where the number of map/reduce tasks per job can be defined? I've looked around but haven't found any documentation about it, so I don't know what it does.
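For what it's worth, Hadoop MRv1 caps concurrent tasks per TaskTracker via mapred-site.xml. A minimal sketch of such a local config follows; the property names are standard MRv1, but whether this image actually honors them through --conf is an assumption to verify:

```shell
# Write a local mapred-site.xml capping concurrent map/reduce tasks.
# mapred.tasktracker.{map,reduce}.tasks.maximum are Hadoop MRv1 properties;
# the values 4 and 2 are arbitrary examples.
mkdir -p conf
cat > conf/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
EOF
grep -c '<property>' conf/mapred-site.xml   # prints 2
```

One would then pass --conf /data/conf to cloudgene run, assuming /data is the mounted volume and the rest of the Hadoop config files are copied alongside.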
I downloaded the chrX 1000G reference again, because there is no chrX in the folder downloaded from https://imputationserver.sph.umich.edu/static/downloads/releases/1000genomes-phase3-2.0.0.zip.
Nevertheless, I ran into some issues when imputing chrX using the Docker image of the Michigan Imputation Server.
The server cannot find the reference information for chrX:
[ERROR] Minimac reference panel cloudgene/apps/[email protected]/2.0.0/m3vcfs/X.nonPAR.1000g.Phase3.v5.With.Parameter.Estimates.m3vcf.gz not found.
Job chr_X.nonPAR (null) failed.
Wrong matching between my dataset and the reference for chrX:
I had already checked my data against the 1000 Genomes reference before imputation (https://www.well.ox.ac.uk/~wrayner/tools/) and uploaded QCed input files.
I used the "EAS" population, i.e. the same population as the 1000 Genomes data, for the imputation.
However, the qcreport contained wrong information: most of the mismatched frequencies were for chrX.
I think the server did not use the chrX data of the EAS population; I checked the frequencies of the variants reported in the qcreport, and they were not those of the EAS population.
I have attached my qcreport and log files from the Docker server. Please check them.
This might be more related to the genepi/cdh5-hadoop-mrv1 Dockerfile, but I had to manually symlink default-java -> java-8-openjdk-amd64 in /usr/lib/jvm for Hadoop jobs to complete successfully. In the Docker image, default-java points to java-7-openjdk-amd64.
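The workaround above can be sketched as follows; the paths follow the Debian/Ubuntu openjdk layout, and the runnable part uses a scratch directory so it works anywhere:

```shell
# In the real container the command would be (run as root):
#   ln -sfn /usr/lib/jvm/java-8-openjdk-amd64 /usr/lib/jvm/default-java
# Scratch-directory demonstration of the same relink:
mkdir -p jvm/java-8-openjdk-amd64
ln -sfn java-8-openjdk-amd64 jvm/default-java   # -f replaces any existing link
readlink jvm/default-java                        # prints java-8-openjdk-amd64
```

The -n flag keeps ln from descending into the old symlink target when replacing it.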
Hi, I'm running my own imputation server using the Docker image (thanks for making it possible!!), and I'm seeing that the download speed at the end is rather slow, pegged at around 50Mb/sec. Since the "download" is local, I'm assuming there's some sort of throttling going on, and I was wondering if I could turn it off somehow.
Please advise,
Thanks!
Getting the error below when running the test example in docker. Any suggestions?
2018-03-10 23:53:25,764 [pool-3-thread-2] INFO cloudgene.mapred.jobs.CloudgeneJob - Installation of application hapmap2 finished.
2018-03-10 23:53:25,764 [pool-3-thread-2] INFO cloudgene.mapred.jobs.AbstractJob - Job job-20180310-235325-575: executing setups...
2018-03-10 23:53:25,765 [pool-3-thread-2] ERROR cloudgene.mapred.jobs.AbstractJob - Job job-20180310-235325-575: execution failed. Unable to find resource '/data/apps/imputationserver/minimac4.yaml'
2018-03-10 23:53:25,794 [pool-3-thread-2] INFO cloudgene.mapred.jobs.AbstractJob - Job job-20180310-235325-575: cleanup successful.
2018-03-10 23:53:25,794 [pool-3-thread-2] INFO cloudgene.mapred.jobs.WorkflowEngine - Input Validation for job job-20180310-235325-575 finished. Result: false
I ran the analysis with the following steps:
1. docker run -d -p 4000 genepi/imputationserver:v1.2.7
2. access http://localhost:4000
3. install 1000genomes-phase3 reference panel in Admin Panel
4. download locally and create the m3vcf and bcf files for chrX
5. copy the m3vcf and bcf files into /data/apps/1000g-phase-3-v5/2.0.0 on genepi/imputationserver:v1.2.7
6. docker exec -t -i ImputationServer cloudgene run [email protected] --files /mnt/$PWD/nonPAR.vcf.gz --refpanel apps@[email protected] --conf /etc/hadoop/conf --population eas &> docker.log
In this case, I see the log above.
Please let me know how to fix this.
Thanks
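Since the [ERROR] line above shows the exact panel path the job looks up, one low-tech check is that the manually created file name matches it character for character. A sketch, using a local demo directory rather than the container path:

```shell
# Panel file name copied from the error log above; the lookup appears to be
# verbatim, so spelling, case, and the .m3vcf.gz suffix must match exactly.
f="m3vcfs/X.nonPAR.1000g.Phase3.v5.With.Parameter.Estimates.m3vcf.gz"
mkdir -p "demo/$(dirname "$f")"
touch "demo/$f"         # simulate the manually copied file
[ -f "demo/$f" ] && echo "name matches" || echo "missing -- check spelling/case"
```

A stray rename (e.g. lowercase x, or a missing ".nonPAR") would reproduce the "reference panel ... not found" error even when the file is present.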
Hello,
I would like to know if the 1000G Phase 3 30x (GRCh38/hg38) [BETA] reference panel zip files are or will be available.
If not, could you please provide some guidelines on how the pre-processing steps should be done (filtering, etc.) in order to replicate them?
We created a reference panel following the MIS documentation using the VCFs available here; however, we are not reaching the same performance in a local Docker installation.
Thank you in advance for the information!
Hello,
Thanks for making your popular imputation server available as a Docker image! It's a great service, and it's good to be able to use it in settings that require complete "privacy" for legal reasons.
While trying out the image, I ran it in non-daemon mode (i.e. without -d) and saw that the startup logs contain some errors and warnings. I wanted to be sure that they are OK:
org.h2.jdbc.JdbcSQLSyntaxErrorException: Syntax error in SQL statement "
ALTER TABLE DOWNLOADS MODIFY[*] PARAMETER_ID INTEGER;
CREATE INDEX IDX_DOWNLOADS_PARAMETER_ID ON DOWNLOADS(PARAMETER_ID);
CREATE INDEX IDX_PARAMETER_JOB_ID ON PARAMETER(JOB_ID,INPUT);
CREATE INDEX IDX_STEPS_JOB_ID ON STEPS(JOB_ID);
CREATE INDEX IDX_LOG_MESSAGES_STEP_ID ON LOG_MESSAGES(STEP_ID);
CREATE INDEX IDX_JOB_USER_ID ON JOB(USER_ID,STATE);
"; expected "., ADD, SET, RENAME, DROP, ALTER"; SQL statement:
-- 2.0.0-rc3
alter table downloads modify parameter_id INTEGER;
create index idx_downloads_parameter_id on downloads(parameter_id);
create index idx_parameter_job_id on parameter(job_id,input);
create index idx_steps_job_id on steps(job_id);
create index idx_log_messages_step_id on log_messages(step_id);
create index idx_job_user_id on job(user_id,state);
[42001-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:453)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
at org.h2.message.DbException.getSyntaxError(DbException.java:243)
at org.h2.command.Parser.getSyntaxError(Parser.java:1053)
at org.h2.command.Parser.parseAlterTable(Parser.java:7705)
at org.h2.command.Parser.parseAlter(Parser.java:6983)
at org.h2.command.Parser.parsePrepared(Parser.java:887)
at org.h2.command.Parser.parse(Parser.java:843)
at org.h2.command.Parser.parse(Parser.java:819)
at org.h2.command.Parser.prepareCommand(Parser.java:738)
at org.h2.engine.Session.prepareLocal(Session.java:657)
at org.h2.engine.Session.prepareCommand(Session.java:595)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1235)
at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:76)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:352)
at org.apache.commons.dbcp.DelegatingConnection.prepareStatement(DelegatingConnection.java:281)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:313)
at genepi.db.DatabaseUpdater.executeSQLFile(DatabaseUpdater.java:308)
at genepi.db.DatabaseUpdater.readAndPrepareSqlClasspath(DatabaseUpdater.java:268)
at genepi.db.DatabaseUpdater.update(DatabaseUpdater.java:132)
at genepi.db.DatabaseUpdater.updateDB(DatabaseUpdater.java:98)
at cloudgene.mapred.Main.runCloudgene(Main.java:165)
at cloudgene.mapred.cli.StartServer.run(StartServer.java:51)
at genepi.base.Tool.start(Tool.java:197)
at genepi.base.Toolbox.start(Toolbox.java:44)
at cloudgene.mapred.CommandLineInterface.main(CommandLineInterface.java:65)
I notice that the imputationserver version specified is 1.5.7, but the latest version is 1.7.3. Is the Docker image intentionally pinned at 1.5.7? I.e., does it represent a version recommendation based on stability or some other consideration?
Hello support,
I am attempting to use the Michigan Imputation Server Docker image from an isolated EC2 instance and am running into some issues. I have been following the steps here: https://github.com/genepi/imputationserver-docker and started the server with this command:
$ sudo docker run -d -p 8080:80 -v /mnt/data/tools/michigan_imputation_server:/data/ genepi/imputationserver:v1.4.1
The first issue: when I follow the instructions to add the 1000 Genomes reference, the Admin Applications tab is blank. Is this a problem with running in an isolated environment, or do I need to put the reference in a location the Docker application can see?
The second issue occurred while trying to run with the already available HapMap reference. The job seemed to die after a few hours with no outputs (no logs); it only created the job folder containing the input file, so I'm guessing the program did not have enough resources to execute properly. Could you point me to a resource-requirements description or file? I tried running on a t2.xlarge (4 CPUs with 16 GB of RAM). I would really appreciate a recommended AWS instance type or the resource requirements.
I'm testing the docker image and when I run a test the report says that the geneplotter package is missing. Could you install it?
It seems to make no progress and is stuck at the modal "Install application. Please wait while the application is configured." After a long time, an error pops up saying "zip headers not found. probably not a zip file" (the same as in the Docker logs).
docker logs --timestamps adefde2f84c3 shows the full stack trace.
full logs: https://gist.github.com/liezl200/4fecf9e375c15e26555ebbb42b4dda46
excerpt which looks relevant:
2019-12-24T01:05:35.617644000Z at java.lang.Thread.run(Thread.java:748)
2019-12-24T01:05:35.617687100Z Caused by: net.lingala.zip4j.exception.ZipException: zip headers not found. probably not a zip file
2019-12-24T01:05:35.617709500Z at net.lingala.zip4j.core.HeaderReader.readEndOfCentralDirectoryRecord(HeaderReader.java:122)
2019-12-24T01:05:35.617733600Z at net.lingala.zip4j.core.HeaderReader.readAllHeaders(HeaderReader.java:78)
2019-12-24T01:05:35.617757100Z at net.lingala.zip4j.core.ZipFile.readZipInfo(ZipFile.java:425)
2019-12-24T01:05:35.617780500Z at net.lingala.zip4j.core.ZipFile.extractAll(ZipFile.java:475)
2019-12-24T01:05:35.617852500Z at net.lingala.zip4j.core.ZipFile.extractAll(ZipFile.java:451)
2019-12-24T01:05:35.617893200Z at cloudgene.mapred.apps.ApplicationRepository.installFromZipFile(ApplicationRepository.java:350)
2019-12-24T01:05:35.617920200Z ... 55 more
2019-12-24T01:05:35.620071400Z 19/12/24 01:05:35 INFO LogService: 2019-12-24 01:05:35 172.17.0.1 - 172.17.0.2 80 POST /api/v2/server/apps - 400 115 100 110765 http://localhost:8080 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36 http://localhost:8080/admin.html
2019-12-24T01:07:32.823620800Z /opt/cloudgene/cloudgene: line 24: 1567 Killed java -cp $CLASS_PATH $CLASS "$@"
I tried all the installation methods from http://apps.cloudgene.io/imputationserver.1000genomes-phase3
I also tried installing an older version of the imputation server (v1.2.1). Installing Cloudgene apps from the admin panel, both via link and via apps.cloudgene.io, doesn't work. Trying to update the imputation server from the admin panel ("Update to 1.2.4") also hangs with the same problem (see the comment on the same gist: https://gist.github.com/liezl200/4fecf9e375c15e26555ebbb42b4dda46).
Using cloudgene from command line just hangs:
full logs: https://gist.github.com/liezl200/bb8a4ee2c48af2cb788224aa3952e9cf
(base) liezls-mbp:cloudgene lie$ ./cloudgene install https://imputationserver.sph.umich.edu/static/downloads/releases/1000genomes-phase3-2.0.0.zip
Cloudgene 2.0.5
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by travis on 2019-12-12T16:43:01Z
Installing application https://imputationserver.sph.umich.edu/static/downloads/releases/1000genomes-phase3-2.0.0.zip...
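For anyone hitting the same "zip headers not found" error: downloads that go through a proxy or hit a 404 often save an HTML error page under the .zip name, which produces exactly this message. A cheap pre-install check is the zip magic bytes ("PK"); the bad file here is simulated:

```shell
# Simulate a failed download: an HTML error page saved as app.zip.
printf '<html>Not Found</html>' > app.zip
# A real zip archive begins with the magic bytes "PK".
if [ "$(head -c2 app.zip)" = "PK" ]; then
  echo "looks like a zip"
else
  echo "not a zip -- probably an error page; re-download and retry the install"
fi
```

Checking the first bytes of the downloaded archive before pointing the admin panel at it can save a long hang followed by the same opaque error.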
I need the 1000g reference panel.
After logging in, I tried to find the admin panel, but I found only my profile and the logout panel.
Why can't I find it?
How can I install the 1000g reference panel without the admin panel?
I have been trying to set up the Michigan Imputation Server Docker image following the steps here: https://github.com/genepi/imputationserver-docker
I have successfully installed Docker, started a persistent container, did a little fiddling to get around our local security protocols, and have been able to install some of the applications from Cloudgene (updated imputation server, Hello Cloudgene, haplocheck, FastQC/MultiQC). However, I can't install the HapMap2 and 1000 Genomes reference panels, or the mtDNA-Server. In each case I receive "Error: Operation failed. Application not installed", whether through apps.cloudgene.io or using a URL. Following the URLs for the reference panels directly comes up with "Not Found. The server has not found anything matching the request URI", which makes me think the files have been removed, renamed, or moved.
A few months ago (before we had figured out a way around the security protocols) I downloaded the HapMap2 file from the above URL while it still seemed to be available, so I tried uploading that copy to another cloud location and installing from a URL that way. I either get the same "Application not installed" error as above, or the error "No workflow file found" (trying both the zip file and pointing to the .yaml file in an unzipped version), depending on how I input the URL.
Hi.
I am unable to use Docker on my computing system due to security restrictions. Is there a way to start the server with Singularity?
Thanks!