
akvo / akvo-flow-services

Akvo Flow service applications for reporting, bulk uploads and others

License: Other

Clojure 92.12% Shell 5.68% Dockerfile 0.75% Lua 1.11% Emacs Lisp 0.35%
clojure akvo-flow akvo

akvo-flow-services's Introduction

Akvo FLOW Services

An HTTP layer on top of the existing Akvo FLOW applet functionality for:

  • Generating reports
  • Importing data

Please read the documentation on running locally for development.

To deploy to production, run the ./ci/promote-test-to-prod.sh script and follow the instructions.

License

Copyright © 2013 Stichting Akvo (Akvo Foundation)

Akvo FLOW is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License (AGPL) as published by the Free Software Foundation, either version 3 of the License, or any later version.

For further documentation on licensing, please read LICENSE.md.

akvo-flow-services's People

Contributors

dlebrero, ichinaski, iperdomo, jonase, michaelakvo, muloem, orifito, peeb, rumca, stellanl, tangrammer, valllllll2000, zuhdil


Forkers

nagyist

akvo-flow-services's Issues

Trim the value of `shortMessage` property

When creating a new Message via the Remote API, we need to make sure the value of shortMessage is 500 characters or less; otherwise we end up with exceptions like:

java.lang.IllegalArgumentException: shortMessage: String properties must be 500 characters or less.  Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
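A minimal guard, sketched in Java (the class and method names are hypothetical), would truncate the value before creating the Message entity:

```java
// Hypothetical helper: trim a message to the Datastore limit
// before the entity is created via the Remote API.
public class ShortMessageUtil {
    // Datastore rejects String properties longer than 500 characters.
    static final int MAX_LENGTH = 500;

    public static String trim(String shortMessage) {
        if (shortMessage == null || shortMessage.length() <= MAX_LENGTH) {
            return shortMessage;
        }
        return shortMessage.substring(0, MAX_LENGTH);
    }
}
```

The alternative mentioned in the exception message is to store the value as com.google.appengine.api.datastore.Text, which has no length limit but is not indexable.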

Bulk Upload intermittent failure during zip extraction

During testing I have seen the bulk upload feature occasionally fail due to issues with extracting the zip file. The error looks like:

[QuartziteScheduler_Worker-9] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.F91822BA-496F-4D48-AFFA-DF9DE6E71264 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: Error while expanding /tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip
java.util.zip.ZipException: archive is not a ZIP archive]
at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: Error while expanding /tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip
java.util.zip.ZipException: archive is not a ZIP archive
at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:88)
at org.apache.tools.ant.taskdefs.Expand.execute(Expand.java:132)
at akvo.flow_services.uploader$unzip_file.invoke(uploader.clj:63)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:89)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
... 1 more
Caused by: java.util.zip.ZipException: archive is not a ZIP archive
at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory32(ZipFile.java:717)
at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory(ZipFile.java:672)
at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:406)
at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:206)
at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:62)
... 6 more
[QuartziteScheduler_Worker-2] ERROR org.quartz.core.JobRunShell - Job DEFAULT.F91822BA-496F-4D48-AFFA-DF9DE6E71264 threw an unhandled Exception:
java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip.1' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:265)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1490)
at akvo.flow_services.uploader$combine.invoke(uploader.clj:45)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:86)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
[QuartziteScheduler_Worker-2] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.F91822BA-496F-4D48-AFFA-DF9DE6E71264 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip.1' does not exist]
at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip.1' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:265)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1490)
at akvo.flow_services.uploader$combine.invoke(uploader.clj:45)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:86)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
... 1 more

I'm not sure what exactly is causing this. The above file is 1.9 MB and consistently causes the failure, but after extracting the zip file, removing the videos, re-zipping it and re-attempting the upload, it completed without problem. Test file here: https://www.dropbox.com/s/3pwp7jzygrw87re/Onesixten.zip

Finally, I have successfully used the bulk upload with a larger zip file (3.9 MB) which contained videos, and this completed without problem on the first attempt.

FLOW Services QuartziteScheduler exception

I'm still seeing the below error in the logs for flow-services in production. Not sure if it needs to be addressed or not (I remember the last time we discussed it, a scheduled task for creating a report was responsible, but I can't remember what the proposed solution was):

[QuartziteScheduler_Worker-9] ERROR org.quartz.core.JobRunShell - Job DEFAULT.BB4036FE-EA2C-4949-812D-513AED9AEC51 threw an unhandled Exception:
com.google.appengine.tools.remoteapi.RemoteApiException: remote API call: unexpected HTTP response: 302
    at com.google.appengine.tools.remoteapi.RemoteRpc.makeException(RemoteRpc.java:153)
    at com.google.appengine.tools.remoteapi.RemoteRpc.callImpl(RemoteRpc.java:101)
    at com.google.appengine.tools.remoteapi.RemoteRpc.call(RemoteRpc.java:43)
    at com.google.appengine.tools.remoteapi.RemoteApiDelegate.makeDefaultSyncCall(RemoteApiDelegate.java:57)
    at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate.makeSyncCall(StandaloneRemoteApiDelegate.java:45)
    at com.google.appengine.tools.remoteapi.ThreadLocalDelegate.makeSyncCall(ThreadLocalDelegate.java:41)
    at com.google.apphosting.api.ApiProxy.makeSyncCall(ApiProxy.java:112)
    at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:38)
    at com.google.appengine.tools.remoteapi.HostedClientLogin.executePost(HostedClientLogin.java:42)
    at com.google.appengine.tools.remoteapi.ClientLogin.login(ClientLogin.java:37)
    at com.google.appengine.tools.remoteapi.HostedClientLogin.login(HostedClientLogin.java:28)
    at com.google.appengine.tools.remoteapi.RemoteApiInstaller.loginImpl(RemoteApiInstaller.java:308)
    at com.google.appengine.tools.remoteapi.RemoteApiInstaller.login(RemoteApiInstaller.java:276)
    at com.google.appengine.tools.remoteapi.RemoteApiInstaller.install(RemoteApiInstaller.java:116)
    at akvo.flow_services.gae$get_installer.invoke(gae.clj:33)
    at akvo.flow_services.gae$put_BANG_.invoke(gae.clj:80)
    at akvo.flow_services.uploader$add_message.invoke(uploader.clj:106)
    at akvo.flow_services.uploader$bulk_survey.invoke(uploader.clj:153)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:165)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

If we should just ignore this, then please disregard this issue.

Fix logging

Implement a proper logging strategy

  • Support for java.util.logging.* (the applet code will use that)
  • Change logging level at runtime without stopping the server

Bulk Upload failures when zip file created on Mac OS X

I think this is somewhat of an edge case; however, the following error is thrown when a user attempts to Bulk Upload data from a zip file created on Mac OS X:

uploading /tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/__MACOSX/surveyal/7/3/6/1/9/._wfpPhoto30853005873619.jpg file 9 of 14
java.lang.NullPointerException
at com.gallatinsystems.common.util.ImageUtil.scaleImage(ImageUtil.java:112)
at com.gallatinsystems.common.util.ImageUtil.resizeImage(ImageUtil.java:62)
at org.waterforpeople.mapping.dataexport.SurveyBulkUploader.executeImport(SurveyBulkUploader.java:156)
at akvo.flow_services.uploader$upload.invoke(uploader.clj:77)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:89)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
java.io.FileNotFoundException: /tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/resized/._wfpPhoto30853005873619.jpg (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at com.gallatinsystems.common.util.FileUtil.readFileBytes(FileUtil.java:118)
at org.waterforpeople.mapping.dataexport.SurveyBulkUploader.executeImport(SurveyBulkUploader.java:158)
at akvo.flow_services.uploader$upload.invoke(uploader.clj:77)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:89)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

As per http://old.floatingsun.net/2007/02/07/whats-with-__macosx-in-zip-files/, it seems that an additional root folder (__MACOSX) is created which the scaleImage function is not informed about, so a NullPointerException is thrown.

scaleImage is looking at:

/tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/resized/._wfpPhoto30853005873619.jpg

whereas I'd guess the image is actually at:

/tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/__MACOSX/resized/._wfpPhoto30853005873619.jpg
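A sketch of a filter that the extraction or upload step could apply to skip these OS X metadata entries (hypothetical class, written in Java since the uploader applet code is Java):

```java
import java.io.File;

public class MacZipFilter {
    // Skip the AppleDouble metadata that OS X adds to zip archives:
    // anything under a __MACOSX folder, and "._" resource-fork files.
    public static boolean isMacMetadata(String entryPath) {
        String normalized = entryPath.replace('\\', '/');
        if (normalized.startsWith("__MACOSX/") || normalized.contains("/__MACOSX/")) {
            return true;
        }
        String name = new File(normalized).getName();
        return name.startsWith("._");
    }
}
```

Filtering on these names would avoid feeding non-image "._*" files to the resize step in the first place.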

Integration with the CI server

This is a 'catchall' issue for any small changes which might be required for integrating Akvo FLOW services with the CI server:

[ ] Test the build trigger based on branch name

Make key transformation when loading the properties, instead of when getting a criteria map

The get-criteria function transforms the keys on each call; it would be nicer if the transformation occurred when building the criteria configuration.

https://github.com/akvo/akvo-flow-services/blob/v0.5.5/src/akvo/flow_services/config.clj#L92

Proposed solution:

  • Define an alias map, e.g. {"uploadBase" "uploadUrl"}
  • Use that alias map to transform the properties in UploadConstants.properties to match the keys required by the applet code

Note: Thanks to @ichinaski for a fresh look into this
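The proposed one-time transformation could look like this (a hypothetical Java sketch; the real code is Clojure, and only the "uploadBase"/"uploadUrl" alias is taken from the issue):

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyAliases {
    // Alias map: key as found in UploadConstants.properties ->
    // key expected by the applet code.
    static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("uploadBase", "uploadUrl");
    }

    // Apply the aliases once, when the properties are loaded,
    // instead of on every call to get-criteria.
    public static Map<String, String> rename(Map<String, String> props) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            result.put(ALIASES.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return result;
    }
}
```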

Include a way of checking the state of cached reports

Currently there is no way of checking the state of the cache map. For troubleshooting, it should be possible to verify the current map of cached reports.

Possible solutions:

  • Embedding a REPL to which a dev can connect and check the scheduler/cache ref, or
  • Include a public route, e.g. /status, that prints a presentation (HTML or JSON) of the map

Bulk upload should include the checksum when sending data to the backend

Present situation:
When a device sends a zip file to the backend, it includes the checksum in the URL call. The bulk upload tool does not include the checksum.

Use case: storing the checksum can potentially be useful to detect identical data files that are uploaded in quick succession by the user. This can be used to avoid duplicates.

Expected situation:
The bulk upload tool includes the checksum of the generated zip file in the URL call to the backend.
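Computing such a checksum could be sketched as follows (hypothetical helper; MD5 is assumed here as the checksum algorithm, since the issue does not name one):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Checksum {
    // Sketch: compute the MD5 checksum of the generated zip bytes so it
    // can be included in the URL call, mirroring what the devices do.
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The backend could then compare checksums of files uploaded in quick succession to detect duplicates.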

Add cache expiration to prevent caching XHR requests

It seems that under some connections the XHR requests to generate a report are getting cached.
The user-facing issue is that they get an old copy of the report.

Solution:
Append Expires and Cache-Control HTTP headers with an expiration date in the past
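The proposed headers could be sketched as (hypothetical helper; the exact header values are a suggestion, not the implemented behaviour):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NoCacheHeaders {
    // Sketch: headers to append to report responses so browsers and
    // intermediaries do not cache the XHR request.
    public static Map<String, String> headers() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Cache-Control", "no-cache, no-store, must-revalidate");
        h.put("Pragma", "no-cache");
        // An Expires date in the past marks the response as already stale.
        h.put("Expires", "Thu, 01 Jan 1970 00:00:00 GMT");
        return h;
    }
}
```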

Make the invalidation process more robust, taking into account alias and instance-id

An invalidation request uses baseURL as part of the key for holding a reference to a cached version of a report.

This baseURL can take at least two forms for the same instance:

  • using the *.appspot.com domain
  • using the *.akvoflow.org domain

The invalidation process should take this scenario into account. The request always uses the alias under akvoflow.org, but another user/developer can use the appspot.com subdomain.

Proposed solution:
Build a mapping {alias, instance-id} based on the server-config repository that will allow the invalidation process to identify possible cached versions of the report using the instance-id.
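The proposed mapping could be sketched as follows (hypothetical Java sketch with made-up host and instance names; the real mapping would be built from the server-config repository):

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class InstanceResolver {
    // Both the akvoflow.org alias and the appspot.com host
    // point to the same instance-id.
    static final Map<String, String> HOST_TO_INSTANCE = new HashMap<>();
    static {
        HOST_TO_INSTANCE.put("instance1.akvoflow.org", "akvoflow-1");
        HOST_TO_INSTANCE.put("akvoflow-1.appspot.com", "akvoflow-1");
    }

    // Resolve the baseURL of an invalidation request to its instance-id,
    // so cached reports can be found regardless of which domain was used.
    public static String instanceId(String baseUrl) {
        String host = URI.create(baseUrl).getHost();
        return HOST_TO_INSTANCE.get(host);
    }
}
```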

Implement "copy survey from other instance" functionality

Currently, copying a survey from one instance to another requires a call such as http://icco.akvoflow.org/webapp/testharness?action=importsinglesurvey&source=http%3A%2F%2Fconnect4change.akvoflow.org&surveyId=201001&apiKey=............

This is cumbersome, as the apiKeys have to be known by the partner team.

Instead, this functionality should go through akvo-flow-services.

Required functionality:

  • a way to copy a survey from one instance to another.

Move configuration settings to a file

The configuration settings are currently passed as command-line parameters, e.g.

java -jar /path/to/flow-services.x.y.z-standalone.jar /path/to/akvo-flow-server-config 3000

Now we want more settings to be configurable for the running service. Instead of just adding more parameters as arguments, we want to point to a configuration file. This config file will be in EDN format, since it is easy to read in Clojure (http://clojure.github.io/clojure/clojure.edn-api.html).

The new way of starting the service will be:

java -jar /path/to/flow-services.x.y.z-standalone.jar /path/to/config.edn

Some of the keys required in that config file:

{:config-folder "/path/to/akvo-flow-server-config" ;; path to the config folder
 :http-port 3000 ;; port for starting the http server
 :kinds ["User" "SurveyInstance"]} ;; entity kinds of interest for statistics

Related to issue #18

Configure Akvo FLOW Services to run on headless mode

This error can be avoided by starting the JVM with the system property java.awt.headless=true (i.e. java -Djava.awt.headless=true -jar ...):

No X11 DISPLAY variable was set, but this program performed an operation which requires it.]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.awt.HeadlessException: 
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
    at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:207)
    at java.awt.Window.<init>(Window.java:535)
    at java.awt.Frame.<init>(Frame.java:420)
    at java.awt.Frame.<init>(Frame.java:385)
    at javax.swing.SwingUtilities$SharedOwnerFrame.<init>(SwingUtilities.java:1756)
    at javax.swing.SwingUtilities.getSharedOwnerFrame(SwingUtilities.java:1831)
    at javax.swing.JDialog.<init>(JDialog.java:270)
    at javax.swing.JDialog.<init>(JDialog.java:204)
    at com.gallatinsystems.framework.dataexport.applet.ProgressDialog.<init>(ProgressDialog.java:93)
    at org.waterforpeople.mapping.dataexport.SurveyBulkUploader.executeImport(SurveyBulkUploader.java:129)
    at akvo.flow_services.uploader$upload.invoke(uploader.clj:84)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:94)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    ... 1 more

Bulk uploader fails when the file is not a ZIP file

If the uploaded file is not a zip file, the bulk uploader fails with the following exception:

java.util.zip.ZipException: archive is not a ZIP archive]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: Error while expanding /tmp/akvo/flow/uploads/143194-wfpPhoto36169931556938jpg/wfpPhoto36169931556938.jpg
java.util.zip.ZipException: archive is not a ZIP archive
    at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:88)
    at org.apache.tools.ant.taskdefs.Expand.execute(Expand.java:132)
    at akvo.flow_services.uploader$unzip_file.invoke(uploader.clj:56)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:81)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:55)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    ... 1 more
Caused by: java.util.zip.ZipException: archive is not a ZIP archive
    at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory32(ZipFile.java:717)
    at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory(ZipFile.java:672)
    at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:406)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:206)
    at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:62)
    ... 6 more

Steps to reproduce:

  • Try uploading a JPG file
  • Although the file gets uploaded to the temporary location, it fails to get uploaded to S3

Proposed solution:

  • Verify the file extension before trying to process it as a zip file
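Beyond the extension check, the file content itself can be sanity-checked cheaply (hypothetical Java sketch: a local-file ZIP archive starts with the signature "PK\x03\x04"):

```java
import java.io.IOException;
import java.io.InputStream;

public class ZipCheck {
    // Sketch: cheap sanity check before handing the file to the unzip task.
    // A ZIP archive with at least one entry starts with the bytes PK\x03\x04.
    public static boolean looksLikeZip(InputStream in) throws IOException {
        byte[] header = new byte[4];
        int read = in.read(header);
        return read == 4
                && header[0] == 'P' && header[1] == 'K'
                && header[2] == 3 && header[3] == 4;
    }
}
```

This would reject a JPG like the one in the report before the ant-compress task throws. Note that an archive with no entries begins with "PK\x05\x06" instead, so this check also screens out empty zips.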

UploadJob should remove the folder at the end of the bulk upload process

The UploadJob takes care of using the applet bulk uploader to upload the file to S3.
The folder name where the extracted files reside is currently derived from the size and name of the file. This could potentially collide with previous runs of the job.

The proposed fix is to remove the folder after the bulk upload finishes.
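The proposed cleanup could be sketched as (hypothetical Java helper, to be invoked once the bulk upload finishes):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class UploadCleanup {
    // Sketch: remove the extraction folder once the bulk upload finishes,
    // so a later run with the same file name/size cannot pick up stale files.
    public static void deleteRecursively(Path folder) throws IOException {
        if (!Files.exists(folder)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(folder)) {
            // Delete children before parents.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```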

The upload process fails when trying to combine non existent parts

When the file size is less than 512KB the client just makes 1 request with the whole file.

The process then tries to combine non-existent parts, leading to an error.

In this example the file was just 19 KB:

[QuartziteScheduler_Worker-2] ERROR org.quartz.core.JobRunShell - Job DEFAULT.80b5e411-fce1-4cbd-aff7-61bcee0944c7 threw an unhandled Exception:
java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/3C95921F-89EE-4A09-95A5-24D4E2D027AA/Rawdata-MalangVerifTahap2CleanDashboard.ipe.xlsx.1' does not exist
        at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:265)
        at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1490)
        at akvo.flow_services.uploader$combine.invoke(uploader.clj:45)
        at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:82)
        at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
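The combine step could first check which numbered parts actually exist (hypothetical Java helper; the ".1", ".2" part-naming is taken from the paths in the logs above). When no numbered parts exist, the combine step should be skipped and the single uploaded file used as-is:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class UploadParts {
    // Sketch: collect the numbered chunk files ("name.1", "name.2", ...)
    // that are present on disk. A small upload arrives in one request and
    // produces no numbered parts, so the returned list is empty.
    public static List<File> existingParts(File dir, String fileName) {
        List<File> parts = new ArrayList<>();
        for (int i = 1; ; i++) {
            File part = new File(dir, fileName + "." + i);
            if (!part.exists()) {
                break;
            }
            parts.add(part);
        }
        return parts;
    }
}
```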

Normalise csv file before parsing it as cascade data

A few users have reported problems with uploading csv files to cascading resources. Two common problems are:

  • Encoding problems
  • Line ending problems.

Proposal: Normalise / clean the csv file after upload and before parsing:

  • normalise line endings to Unix line endings
  • normalise encoding to UTF-8, and abort with an error message if the encoding cannot be normalised
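The two normalisation steps could be sketched as follows (hypothetical Java helper; strict UTF-8 decoding throws on malformed input, which maps to the proposed abort-with-error behaviour):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CsvNormalizer {
    // Sketch: decode strictly as UTF-8 (throwing on malformed input, so the
    // upload can be aborted with an error message) and normalise line endings.
    public static String normalize(byte[] raw) throws CharacterCodingException {
        String text = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
        // Convert Windows (\r\n) and old Mac (\r) endings to Unix (\n).
        return text.replace("\r\n", "\n").replace("\r", "\n");
    }
}
```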

A failed notification to GAE should be retried

Checking the logs in FLOW services, a notification for data ingestion can fail. We should reschedule it and retry with a delay.

clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, :environment {client #<client$wrap_output_coercion$fn__5042 clj_http.client$wrap_output_coercion$fn__5042@442bdedb>, req {:request-method :get, :url "http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"}, map__4926 {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, resp {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, 
:headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, status 500}}
[QuartziteScheduler_Worker-1] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.22749B98-74DF-4F0F-8CDE-FDC18A4C274B threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, :environment {client #<client$wrap_output_coercion$fn__5042 clj_http.client$wrap_output_coercion$fn__5042@442bdedb>, req {:request-method :get, :url "http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"}, map__4926 {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, resp {:orig-content-encoding nil, :trace-redirects 
["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, status 500}}]
Caused by: clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, :environment {client #<client$wrap_output_coercion$fn__5042 clj_http.client$wrap_output_coercion$fn__5042@442bdedb>, req {:request-method :get, :url "http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"}, map__4926 {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, resp {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, 
:status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, status 500}}
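A retry wrapper along the proposed lines could be sketched as (hypothetical Java helper; the attempt count and delay are illustrative):

```java
import java.util.concurrent.Callable;

public class RetryingNotifier {
    // Sketch: retry a failed GAE notification a few times with a delay
    // between attempts, instead of letting the exception kill the job.
    public static <T> T withRetries(Callable<T> call, int maxAttempts, long delayMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // All attempts failed: rethrow the last failure.
        throw last;
    }
}
```

In the scheduler this could also be expressed by re-enqueuing the Quartz job with a delayed trigger rather than sleeping in the worker thread.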

Upload type shouldn't rely on file extension only

The current code relies only on the file name to decide what type of upload it is. We can determine the mime-type by reading the first bytes of the file using Apache Tika, e.g.:

project.clj

  :dependencies [[org.apache.tika/tika-core "1.6"]]

test/tika/core_test.clj

(ns tika.core-test
  (:require [clojure.test :refer :all]
            [clojure.java.io :as io])
  (:import org.apache.tika.Tika))

(deftest test-detect
  (testing "Tika/detect"
    (let [t (Tika.)]
      (is (= "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" (.detect t (io/resource "RAW_DATA-1561005.xlsx"))))
      (is (= "application/vnd.ms-excel" (.detect t (io/resource "SURVEY_FORM-1561005.xls"))))
      (is (= "text/plain" (.detect t (io/resource "RAW_DATA_TEXT-1561005.txt"))))
      (is (= "application/zip" (.detect t (io/resource "test.zip")))))))

$ lein test
lein test tika.core-test

Ran 1 tests containing 4 assertions.
0 failures, 0 errors.

Move reports cache folder away from /tmp

The current cache folder for reports is under /tmp which is wiped out on server restarts.

  • Keeping old reports can be useful for recovering data on broken uploads, and for analysis of data over time.
  • The statistics of all instances use the same cache folder, and we'd like to keep a history of CSV files.

Bulk upload fails if one of the zip files has no content

When uploading a set of zip files (the surveyal folder from the devices), if one of the zip files is empty, the bulk uploader fails with the following exception.

2014-12-09 09:54:12,603 services.akvoflow.org INFO [akvo.flow-services.uploader] - Bulk upload - path: /var/tmp/akvo/flow/uploads/63A3DF00-EE81-47F9-8ABA-8C1C83E0BE06/zip-content - bucket: waterforpeople - file: 
[QuartziteScheduler_Worker-4] ERROR org.quartz.core.JobRunShell - Job DEFAULT.63A3DF00-EE81-47F9-8ABA-8C1C83E0BE06 threw an unhandled Exception: 
java.util.zip.ZipException: zip file is empty
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:215)
    at java.util.zip.ZipFile.<init>(ZipFile.java:145)
    at java.util.zip.ZipFile.<init>(ZipFile.java:159)
    at sun.reflect.GeneratedConstructorAccessor563.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at clojure.lang.Reflector.invokeConstructor(Reflector.java:180)
    at akvo.flow_services.uploader$get_data.invoke(uploader.clj:121)
    at clojure.core$map$fn__4245.invoke(core.clj:2559)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core$concat$cat__3957$fn__3958.invoke(core.clj:701)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.lang.RT.nthFrom(RT.java:848)
    at clojure.lang.RT.nth(RT.java:807)
    at clojure.core$distinct$step__4716$fn__4717$fn__4719.invoke(core.clj:4618)
    at clojure.core$distinct$step__4716$fn__4717.invoke(core.clj:4618)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$filter$fn__4264.invoke(core.clj:2595)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core.protocols$fn__6086.invoke(protocols.clj:146)
    at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54)
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6289)
    at clojure.core$group_by.invoke(core.clj:6602)
    at akvo.flow_services.uploader$bulk_survey.invoke(uploader.clj:148)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:170)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
[QuartziteScheduler_Worker-4] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.63A3DF00-EE81-47F9-8ABA-8C1C83E0BE06 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.util.zip.ZipException: zip file is empty]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.util.zip.ZipException: zip file is empty
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:215)
    at java.util.zip.ZipFile.<init>(ZipFile.java:145)
    at java.util.zip.ZipFile.<init>(ZipFile.java:159)
    at sun.reflect.GeneratedConstructorAccessor563.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at clojure.lang.Reflector.invokeConstructor(Reflector.java:180)
    at akvo.flow_services.uploader$get_data.invoke(uploader.clj:121)
    at clojure.core$map$fn__4245.invoke(core.clj:2559)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core$concat$cat__3957$fn__3958.invoke(core.clj:701)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.lang.RT.nthFrom(RT.java:848)
    at clojure.lang.RT.nth(RT.java:807)
    at clojure.core$distinct$step__4716$fn__4717$fn__4719.invoke(core.clj:4618)
    at clojure.core$distinct$step__4716$fn__4717.invoke(core.clj:4618)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$filter$fn__4264.invoke(core.clj:2595)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core.protocols$fn__6086.invoke(protocols.clj:146)
    at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54)
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6289)
    at clojure.core$group_by.invoke(core.clj:6602)
    at akvo.flow_services.uploader$bulk_survey.invoke(uploader.clj:148)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:170)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    ... 1 more
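A defensive check before opening each archive would avoid this unhandled exception. Below is a minimal sketch in Java (the `java.util.zip.ZipFile` API the Clojure code calls into), assuming the fix is simply to skip zero-length or unreadable archives rather than let the job crash; `openOrSkip` is a hypothetical helper name, not part of the current code:

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

public class SafeZipOpen {
    /**
     * Returns an open ZipFile, or null when the file is empty or not a
     * valid zip archive, instead of letting ZipException propagate and
     * kill the whole bulk-upload job.
     */
    public static ZipFile openOrSkip(File f) {
        if (f.length() == 0) {
            // new ZipFile(f) on an empty file throws "zip file is empty"
            return null;
        }
        try {
            return new ZipFile(f);
        } catch (ZipException | IOException e) {
            // corrupt or truncated archive: skip it and let the caller log
            return null;
        }
    }
}
```

The caller can then filter out the `null` results and continue processing the remaining archives, so one bad file no longer aborts the entire upload.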

Refactor bulk upload

The current bulk upload relies on the applet code. The process is straightforward and can be re-implemented in Clojure.

  • Combine all the chunks sent by the client
  • Search for all zip files contained in the original surveyal.zip
  • Unzip each file and loop through the content of data.txt; skip the zip files named wfpGenerated*.zip
  • For each survey instance (identified by UUID) in data.txt, check if that survey instance exists in the datastore by making a Remote API call (NOTE: this is a new step, not present in the old process)
  • If the survey instance doesn't exist in the datastore, make a zip with just that one survey instance, upload it to S3 and notify the backend for ingestion
  • Keep track of the survey instances uploaded during this particular bulk-upload execution, to avoid reprocessing a survey instance (we can't rely on the Remote API lookup alone)
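The de-duplication described in the last two steps can be sketched as follows. This is an illustrative Java outline (the names `processedUuids`, `existsInDatastore` and `shouldUpload` are hypothetical, not part of the current code), assuming data.txt yields one survey instance UUID at a time:

```java
import java.util.HashSet;
import java.util.Set;

public class BulkUploadDedup {
    // UUIDs already uploaded during this bulk-upload execution. We cannot
    // rely on the Remote API lookup alone, because ingestion of an instance
    // uploaded moments ago may not yet be visible in the datastore.
    private final Set<String> processedUuids = new HashSet<>();

    /** Stand-in for the Remote API existence check described above. */
    boolean existsInDatastore(String uuid) {
        return false; // hypothetical: real code would call the GAE Remote API
    }

    /** Returns true only when the survey instance still needs uploading. */
    public boolean shouldUpload(String uuid) {
        if (processedUuids.contains(uuid)) {
            return false;              // already handled in this run
        }
        if (existsInDatastore(uuid)) {
            return false;              // already present in the datastore
        }
        processedUuids.add(uuid);      // remember it for the rest of this run
        return true;
    }
}
```

With this shape, each qualifying instance is zipped individually, pushed to S3 and reported to the backend exactly once per execution.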

Implement missing parameter validation

Validation of the request parameters for report generation is still pending, e.g.:

  • criteria is required and can't be null; it needs to be a valid JSON object
[qtp1625050241-31] WARN org.eclipse.jetty.server.AbstractHttpConnection - /generate?callback=FLOW.ReportLoader.handleResponse&criteria=null&_=1386827806483
java.lang.IllegalArgumentException: No implementation of method: :to-job-data of protocol: #'clojurewerkz.quartzite.conversion/JobDataMapConversion found for class: clojure.lang.PersistentList
        at clojure.core$_cache_protocol_fn.invoke(core_deftype.clj:541)
        at clojurewerkz.quartzite.conversion$fn__1633$G__1628__1638.invoke(conversion.clj:15)
        at clojurewerkz.quartzite.jobs$using_job_data.invoke(jobs.clj:57)
        at akvo.flow_services.scheduler$get_job.invoke(scheduler.clj:73)
        at akvo.flow_services.scheduler$schedule_job.invoke(scheduler.clj:82)
        at akvo.flow_services.scheduler$generate_report.invoke(scheduler.clj:118)
        at akvo.flow_services.core$generate_report.invoke(core.clj:32)
        at akvo.flow_services.core$fn__1992.invoke(core.clj:52)
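A guard on the criteria parameter before the Quartz job is built would turn this stack trace into a client error response. Below is a minimal, hand-rolled sketch (a real implementation would parse the value with a JSON library; `isValidCriteria` is a hypothetical helper) that rejects a missing value, the literal string "null" seen in the failing request above, and anything that is not brace-delimited:

```java
public class CriteriaValidator {
    /**
     * Returns true only when criteria looks like a JSON object: present,
     * not the literal "null", and delimited by braces. This is only a
     * cheap pre-check; full validation requires actually parsing it.
     */
    public static boolean isValidCriteria(String criteria) {
        if (criteria == null) {
            return false;
        }
        String trimmed = criteria.trim();
        if (trimmed.isEmpty() || trimmed.equals("null")) {
            return false; // the request above sent criteria=null
        }
        return trimmed.startsWith("{") && trimmed.endsWith("}");
    }
}
```

When the check fails, the handler can return an error to the JSONP callback instead of scheduling a job with unusable job data.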
