akvo / akvo-flow-services Goto Github PK

View Code? Open in Web Editor NEW

4.0 18.0 1.0 749 KB

Akvo Flow service applications for reporting, bulk uploads and others

License: Other

Clojure 92.12% Shell 5.68% Dockerfile 0.75% Lua 1.11% Emacs Lisp 0.35%

clojure akvo-flow akvo

akvo-flow-services's Introduction

Akvo FLOW Services

An HTTP layer on top of the existing Akvo FLOW applet functionality for:

Generating reports
Importing data

Please read the running locally for development.

To deploy to production, run the ./ci/promote-test-to-prod.sh script AND follow the instructions.

License

Akvo FLOW is free software: you can redistribute it and modify it under the terms of the GNU Affero General Public License (AGPL) as published by the Free Software Foundation, either version 3 of the License or any later version.

Further documentation on licensing please read LICENSE.md

akvo-flow-services's People

Contributors

Stargazers

Watchers

Forkers

nagyist

akvo-flow-services's Issues

Include apiKey as part of the config criteria

The apiKey parameter is required for sending secure requests to the backend.

See issue: akvo/akvo-flow#256

Support for importing cascade resource definition via CSV file

Add support for importing Cascade Resource definition via CSV file

Enable RAW_DATA_TEXT export option

There is a built-in functionality to export data in tsv format.
See: akvo/akvo-flow#312

Trim the value of `shortMessage` property

When creating a new Message via the Remote API, we need to make sure the value of shortMessage is 500 characters or less, otherwise we end up with some exceptions like:

java.lang.IllegalArgumentException: shortMessage: String properties must be 500 characters or less.  Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.

Improve logging to set DEBUG|INFO only for interesting packages

The current set-debug, set-info sets the level INFO and DEBUG for the root logger. This leads to almost all classes spitting out lots of logging.

We're only interested in the logging of packages org.waterforpeople.* and com.gallatinsystems.*

Bulk Upload intermittent failure during zip extraction

During testing I have seen the bulk upload feature occasionally fail due to issues with extracting the zip file. The error looks like:

[QuartziteScheduler_Worker-9] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.F91822BA-496F-4D48-AFFA-DF9DE6E71264 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: Error while expanding /tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip
java.util.zip.ZipException: archive is not a ZIP archive]
at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: Error while expanding /tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip
java.util.zip.ZipException: archive is not a ZIP archive
at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:88)
at org.apache.tools.ant.taskdefs.Expand.execute(Expand.java:132)
at akvo.flow_services.uploader$unzip_file.invoke(uploader.clj:63)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:89)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
... 1 more
Caused by: java.util.zip.ZipException: archive is not a ZIP archive
at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory32(ZipFile.java:717)
at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory(ZipFile.java:672)
at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:406)
at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:206)
at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:62)
... 6 more
[QuartziteScheduler_Worker-2] ERROR org.quartz.core.JobRunShell - Job DEFAULT.F91822BA-496F-4D48-AFFA-DF9DE6E71264 threw an unhandled Exception:
java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip.1' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:265)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1490)
at akvo.flow_services.uploader$combine.invoke(uploader.clj:45)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:86)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
[QuartziteScheduler_Worker-2] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.F91822BA-496F-4D48-AFFA-DF9DE6E71264 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip.1' does not exist]
at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/F91822BA-496F-4D48-AFFA-DF9DE6E71264/Onesixten.zip.1' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:265)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1490)
at akvo.flow_services.uploader$combine.invoke(uploader.clj:45)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:86)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
... 1 more

I'm not sure what exactly is causing this. The above file is 1.9mb and consistently causes the failure, but after extracting the zip file, removing the videos, re-zipping it and then re-attempting the upload - it completed without problem. Test file here: https://www.dropbox.com/s/3pwp7jzygrw87re/Onesixten.zip

Finally I have successfully used the bulk upload with a larger zip file (3.9mb) which contained videos and this completed without problem on first attempt.

FLOW Services QuartziteScheduler exception

I'm still seeing the below error on the logs for flow-services in production. Not sure if it needs addressed or not (I remember the last time we discussed it, that a scheduled task for creating a report was responsible but I cant remember what the proposed solution was):

[QuartziteScheduler_Worker-9] ERROR org.quartz.core.JobRunShell - Job DEFAULT.BB4036FE-EA2C-4949-812D-513AED9AEC51 threw an unhandled Exception:
com.google.appengine.tools.remoteapi.RemoteApiException: remote API call: unexpected HTTP response: 302
    at com.google.appengine.tools.remoteapi.RemoteRpc.makeException(RemoteRpc.java:153)
    at com.google.appengine.tools.remoteapi.RemoteRpc.callImpl(RemoteRpc.java:101)
    at com.google.appengine.tools.remoteapi.RemoteRpc.call(RemoteRpc.java:43)
    at com.google.appengine.tools.remoteapi.RemoteApiDelegate.makeDefaultSyncCall(RemoteApiDelegate.java:57)
    at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate.makeSyncCall(StandaloneRemoteApiDelegate.java:45)
    at com.google.appengine.tools.remoteapi.ThreadLocalDelegate.makeSyncCall(ThreadLocalDelegate.java:41)
    at com.google.apphosting.api.ApiProxy.makeSyncCall(ApiProxy.java:112)
    at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:38)
    at com.google.appengine.tools.remoteapi.HostedClientLogin.executePost(HostedClientLogin.java:42)
    at com.google.appengine.tools.remoteapi.ClientLogin.login(ClientLogin.java:37)
    at com.google.appengine.tools.remoteapi.HostedClientLogin.login(HostedClientLogin.java:28)
    at com.google.appengine.tools.remoteapi.RemoteApiInstaller.loginImpl(RemoteApiInstaller.java:308)
    at com.google.appengine.tools.remoteapi.RemoteApiInstaller.login(RemoteApiInstaller.java:276)
    at com.google.appengine.tools.remoteapi.RemoteApiInstaller.install(RemoteApiInstaller.java:116)
    at akvo.flow_services.gae$get_installer.invoke(gae.clj:33)
    at akvo.flow_services.gae$put_BANG_.invoke(gae.clj:80)
    at akvo.flow_services.uploader$add_message.invoke(uploader.clj:106)
    at akvo.flow_services.uploader$bulk_survey.invoke(uploader.clj:153)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:165)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

If we should just ignore this, then please disregard this issue.

Configuration changes broke the statistics code

The changes introduced in the refactor #34 broke the daily statistic generation. Right now we're getting a blank file.

Remove direct dependency on Ant for handling zip files

We're using Apache Ant for decompressing a Zip file, and Apache FileUtils for file handling. We can easily change those calls, and use fs a library for file-system operations.

Fix logging

Implement a proper logging strategy

Support for java.util.logging.* (the applet code will use that)
Change logging level at runtime without stopping the server

Bulk Upload failures when zip file created on Mac OS X

I think this is some what of an edge case, however the follow error is thrown when a user attempts to Bulk Upload data from a zip file created on Mac OS X:

uploading /tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/__MACOSX/surveyal/7/3/6/1/9/._wfpPhoto30853005873619.jpg file 9 of 14
java.lang.NullPointerException
at com.gallatinsystems.common.util.ImageUtil.scaleImage(ImageUtil.java:112)
at com.gallatinsystems.common.util.ImageUtil.resizeImage(ImageUtil.java:62)
at org.waterforpeople.mapping.dataexport.SurveyBulkUploader.executeImport(SurveyBulkUploader.java:156)
at akvo.flow_services.uploader$upload.invoke(uploader.clj:77)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:89)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
java.io.FileNotFoundException: /tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/resized/._wfpPhoto30853005873619.jpg (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at com.gallatinsystems.common.util.FileUtil.readFileBytes(FileUtil.java:118)
at org.waterforpeople.mapping.dataexport.SurveyBulkUploader.executeImport(SurveyBulkUploader.java:158)
at akvo.flow_services.uploader$upload.invoke(uploader.clj:77)
at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:89)
at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

As per: http://old.floatingsun.net/2007/02/07/whats-with-__macosx-in-zip-files/ it seems that an additional root folder is created which the scale Image function is not informed about, so a null pointer is thrown.

Scale image is looking at:

/tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/resized/._wfpPhoto30853005873619.jpg

whereas I'd guess the image is actually at:

/tmp/akvo/flow/uploads/9B82AA6A-C372-4456-AB77-5F9557702CC0/zip-content/__MACOSX/resized/._wfpPhoto30853005873619.jpg

Integration with the CI server

This is a 'catchall' issue for any small changes which might be required for integrating Akvo FLOW services with the CI server:

[] Testing build trigger based on branch name

Fix generation of report when no questions are selected for display name

Release Notes v0.6.1

Write the enhancements

Use GAE tools to parse the appengine-web.xml

Currently we're parting the appengine-web.xml manually, by traversing some nodes. There is a built-in parser in GAE SDK tools, that is more reliable.

New flow services release

Include functionality to generate the display name for data points in the RAW DATA report.

Ensure services are run with an unprivileged user account

Make key transformation when loading the properties, instead of when getting a criteria map

The get-criteria function transform the keys on each call, it would be nicer if the transformation occurs when building the criteria configuration.

https://github.com/akvo/akvo-flow-services/blob/v0.5.5/src/akvo/flow_services/config.clj#L92

Proposed solution:

Define a set of alias {"uploadBase" "uploadUrl"}
Use that alias map to transform the properties in UploadConstants.properties to match the keys required by the applet code

Note: Thanks to @ichinaski for a fresh look into this

Include a way of checking the state of cached reports

Currently there is no way of checking the current state of the cache map. For troubleshooting it should possible to verify the current map of cached reports.

Possible solutions:

Embedding a REPL to which a dev can connect and check the scheduler/cache ref, or
Include a public route e.g. /status in which a presentation (html or json) of the map is printed

Output different format for header columns when generating RAW report

Incorporate the exporterapplet.jar that solves issue akvo/akvo-flow#847.

The fix generates a different format for the column header in a raw report, depending on whether or not a question id is being used in the question.

Failure when trying to upload a file (e.g. xlsx) using a 1.7.x dashboard

A regression was introduced when uploading a file. The final message sent by the dashboard required to have a complete parameter, otherwise we return a HTTP status 400 (bad request). All 1.7.x dashboard are not aware of this new parameter.

Bulk upload should include the checksum when sending data to the backend

Present situation:
When a device sends a zip file to the backend, it includes the checksum in the URL call. The bulk upload tool does not include the checksum.

Use case: storing the checksum can potentially be useful to detect identical data files that are uploaded in quick succession by the user. This can be used to avoid duplicates.

Expected situation:
The bulk upload tool includes the checksum of the generated zip file in the URL call to the backend.

Remove useless :only from namespace declaration

On a namespace declaration, the :only sequence is discarded when used with :require, :only was useful together with :use

Add cache expiration to prevent caching XHR requests

It seems that under some connections the XHR requests to generate a report are getting cached.
The used facing issue is that he is getting an old copy of the report.

Solution:
Append expires and Cache-control HTTP headers with expiration in the past

report generation does not work

Use the client generated ID for identifying the scheduled job

Instead of using a server side generated UUID, use the unique identifier per file, to avoid scheduling 2 uploads for the same file.

Make the invalidation process more robust, taking into account alias and instance-id

An invalidation request uses baseURL as part of the key for holding a reference to a cached version of a report.

This baseURL can be at least 2 for the same instance:

using the *.appspot.com domain
using the *.akvoflow.org domain

The invalidation process should take into account this scenario. The request always uses the alias under akvoflow.org, but another user/developer can use the appspot.com subdomain.

Proposed solution:
Build a mapping {alias, instance-id} based on the server-config repository that will allow the invalidation process, to identify possible cached versions of the report using the instance-id

Implement "copy survey from other instance" functionality

Currently, copying a survey from one instance to another requires a call such as http://icco.akvoflow.org/webapp/testharness?action=importsinglesurvey&source=http%3A%2F%2Fconnect4change.akvoflow.org&surveyId=201001&apiKey=............

The is cumbersome as the apiKeys have to be known by the partner team.

Instead, this functionality should go through akvo-flow-services.

Required functionality:

a way to copy a survey from one instance to another.

Implement instance statistics as part of the FLOW services

The current implementation of getting consolidated stats is a bit hacky. A python script that crawls the GAE dashboard. See: https://github.com/akvo/akvo-flow-data-config/tree/master/stats

Implement a way of getting the statistics based on Remote API and Datastore Statistics
Schedule the report generation every day early morning

Provide a consolidated page for Dashboard and APK releases

Provide a single page with the list of recent releases of akvo-flow and akvo-flow-mobile

Move configuration settings to a file

The configuration settings are now pass as command line parameter, e.g.

java -jar /path/to/flow-services.x.y.z-standalone.jar /path/to/akvo-flow-server-config 3000

Now we want more settings to be configured for the running service. Instead of just adding more parameters as arguments, we want to point to a configuration file. This config file will be in EDN format, since is easy to read in Clojure (http://clojure.github.io/clojure/clojure.edn-api.html)

The new way of starting the service will be:

java -jar /path/to/flow-services.x.y.z-standalone.jar /path/to/config.edn

Some of the keys required in that config file:

{
:config-folder "/path/to/akvo-flow-server-config" ;; path to config folder
:http-port 3000 ;; port for starting the http server
:kinds ["User", "SurveyInstance"] ;; list of kinds interested for statistics
}

Related to issue #18

Configure Akvo FLOW Services to run on headless mode

This error can be avoided using java.awt.headless=true

No X11 DISPLAY variable was set, but this program performed an operation which requires it.]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.awt.HeadlessException: 
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
    at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:207)
    at java.awt.Window.<init>(Window.java:535)
    at java.awt.Frame.<init>(Frame.java:420)
    at java.awt.Frame.<init>(Frame.java:385)
    at javax.swing.SwingUtilities$SharedOwnerFrame.<init>(SwingUtilities.java:1756)
    at javax.swing.SwingUtilities.getSharedOwnerFrame(SwingUtilities.java:1831)
    at javax.swing.JDialog.<init>(JDialog.java:270)
    at javax.swing.JDialog.<init>(JDialog.java:204)
    at com.gallatinsystems.framework.dataexport.applet.ProgressDialog.<init>(ProgressDialog.java:93)
    at org.waterforpeople.mapping.dataexport.SurveyBulkUploader.executeImport(SurveyBulkUploader.java:129)
    at akvo.flow_services.uploader$upload.invoke(uploader.clj:84)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:94)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    ... 1 more

Implement robots.txt

The access log from flowdev1 server, shows several requests from web crawlers, trying to access several paths.

Since this is not a public facing service, we should disallow all the routes to bots.

User-agent: *
Disallow: /

From: http://www.robotstxt.org/robotstxt.html

Bulk uploader fails when the file is not a ZIP file

If the uploaded file is not a zip file, the bulk uploader fails with the following exception:

java.util.zip.ZipException: archive is not a ZIP archive]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: Error while expanding /tmp/akvo/flow/uploads/143194-wfpPhoto36169931556938jpg/wfpPhoto36169931556938.jpg
java.util.zip.ZipException: archive is not a ZIP archive
    at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:88)
    at org.apache.tools.ant.taskdefs.Expand.execute(Expand.java:132)
    at akvo.flow_services.uploader$unzip_file.invoke(uploader.clj:56)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:81)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:55)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    ... 1 more
Caused by: java.util.zip.ZipException: archive is not a ZIP archive
    at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory32(ZipFile.java:717)
    at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory(ZipFile.java:672)
    at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:406)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:206)
    at org.apache.ant.compress.taskdefs.Unzip.expandFile(Unzip.java:62)
    ... 6 more

Steps to reproduce:

Try uploading a JPG file
Although the file gets uploaded to the temporary location, it fails to get uploaded to S3

Proposed solution:

Verify the file extension before trying to process as zip file

UploadJob should remove the folder at the end of the bulk upload process

The UploadJob takes care of the using the applet bulk uploader to upload the file to S3.
The current way of defining the folder name where the extracted files will reside, is based on the size and name of the file. This could potentially collide with previous runs of the job.

The proposed fix is to remove the folder after the bulk upload finishes.

Fix jar invocation documentation

The upload process fails when trying to combine non existent parts

When the file size is less than 512KB the client just makes 1 request with the whole file.

The process than tries to combine non existent parts, leading to an error.

In this example the file was just 19KB

[QuartziteScheduler_Worker-2] ERROR org.quartz.core.JobRunShell - Job DEFAULT.80b5e411-fce1-4cbd-aff7-61bcee0944c7 threw an unhandled Exception:
java.io.FileNotFoundException: File '/tmp/akvo/flow/uploads/3C95921F-89EE-4A09-95A5-24D4E2D027AA/Rawdata-MalangVerifTahap2CleanDashboard.ipe.xlsx.1' does not exist
        at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:265)
        at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1490)
        at akvo.flow_services.uploader$combine.invoke(uploader.clj:45)
        at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:82)
        at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

Statistics process fail on missing directory

Statistics process fails when the stats-path directory doesn't exist.

Solution:

Verify if the folder exists

Implement a way of reloading the configuration files with a HTTP call

The current situation forces a developer to stop the service and manually pull the changes from the akvo-flow-server-config repo and restart the service each time (new|changed) data comes into the repo.

The proposed feature will allow to configure a GitHub web hook that will reload the configuration files with an HTTP call.

More info at:

https://help.github.com/articles/post-receive-hooks

New flow services release

Normalise csv file before parsing it as cascade data

A few users have reported problems with uploading csv files to cascading resources. Two common problems are:

Encoding problems
Line ending problems.

Proposal: Normalise / clean the csv file after upload and before parsing:

normalise line endings to Unix line endings
normalise encoding to UTF-8, and abort with error message if encoding is not normalizable

A failed notification to GAE should be retried

Checking the logs in FLOW services, a notification for data ingestion could fail. We should reschedule it and retry with a delay.

clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, :environment {client #<client$wrap_output_coercion$fn__5042 clj_http.client$wrap_output_coercion$fn__5042@442bdedb>, req {:request-method :get, :url "http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"}, map__4926 {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, resp {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, status 500}}
[QuartziteScheduler_Worker-1] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.22749B98-74DF-4F0F-8CDE-FDC18A4C274B threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, :environment {client #<client$wrap_output_coercion$fn__5042 clj_http.client$wrap_output_coercion$fn__5042@442bdedb>, req {:request-method :get, :url "http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"}, map__4926 {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, resp {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, status 500}}]
Caused by: clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, :environment {client #<client$wrap_output_coercion$fn__5042 clj_http.client$wrap_output_coercion$fn__5042@442bdedb>, req {:request-method :get, :url "http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"}, map__4926 {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, resp {:orig-content-encoding nil, :trace-redirects ["http://some-host.appspot.com/processor?action=submit&fileName=fe7d545f-3cc5-4d67-8127-cde0b97677b5.zip"], :request-time 4706, :status 500, :headers {"Connection" "close", "Alternate-Protocol" "80:quic,p=0.01", "Content-Length" "323", "Server" "Google Frontend", "Content-Type" "text/html; charset=UTF-8", "Date" "Mon, 10 Nov 2014 13:06:05 GMT"}, :body "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}, status 500}}

Upload type shouldn't rely on file extension only

The current code relies only on file name to decide what type of upload it is. We can know the mime-type by reading the first bytes of the file using Apache Tika, e.g.

project.clj

:dependencies [org.apache.tika/tika-core "1.6"]

(ns tika.core-test
  (:require [clojure.test :refer :all]
            [tika.core :refer :all]
            [clojure.java.io :as io])
  (:import org.apache.tika.Tika))

(deftest test-detect
  (testing "Tika/detect"
    (let [t (Tika.)]
      (is (= "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" (.detect t (io/resource "RAW_DATA-1561005.xlsx"))))
      (is (= "application/vnd.ms-excel" (.detect t (io/resource "SURVEY_FORM-1561005.xls"))))
      (is (= "text/plain" (.detect t (io/resource "RAW_DATA_TEXT-1561005.txt"))))
      (is (= "application/zip" (.detect t (io/resource "test.zip")))))))

$ lein test
lein test tika.core-test
Ran 1 tests containing 4 assertions.
0 failures, 0 errors.

Add support for cascading questions

Implements cascading question functionality.
Implements the flow services part of:
akvo/akvo-flow#189
akvo/akvo-flow-mobile#204
akvo/akvo-product-design#33

Move reports cache folder away from /tmp

The current cache folder for reports is under /tmp which is wiped out on server restarts.

Keeping old reports can be useful for recovering data on broken uploads, and for analysis of data over time.
The statistics of all instances use the same cache folder, and we'll like to keep a history of CSV files.

Bulk upload fails if one of the zip files has no content

When uploading a set of zip files (surveyal) folder from the devices, if one of the zip files is empty the bulk uploader fails with the following exception.

2014-12-09 09:54:12,603 services.akvoflow.org INFO [akvo.flow-services.uploader] - Bulk upload - path: /var/tmp/akvo/flow/uploads/63A3DF00-EE81-47F9-8ABA-8C1C83E0BE06/zip-content - bucket: waterforpeople - file: 
[QuartziteScheduler_Worker-4] ERROR org.quartz.core.JobRunShell - Job DEFAULT.63A3DF00-EE81-47F9-8ABA-8C1C83E0BE06 threw an unhandled Exception: 
java.util.zip.ZipException: zip file is empty
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:215)
    at java.util.zip.ZipFile.<init>(ZipFile.java:145)
    at java.util.zip.ZipFile.<init>(ZipFile.java:159)
    at sun.reflect.GeneratedConstructorAccessor563.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at clojure.lang.Reflector.invokeConstructor(Reflector.java:180)
    at akvo.flow_services.uploader$get_data.invoke(uploader.clj:121)
    at clojure.core$map$fn__4245.invoke(core.clj:2559)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core$concat$cat__3957$fn__3958.invoke(core.clj:701)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.lang.RT.nthFrom(RT.java:848)
    at clojure.lang.RT.nth(RT.java:807)
    at clojure.core$distinct$step__4716$fn__4717$fn__4719.invoke(core.clj:4618)
    at clojure.core$distinct$step__4716$fn__4717.invoke(core.clj:4618)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$filter$fn__4264.invoke(core.clj:2595)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core.protocols$fn__6086.invoke(protocols.clj:146)
    at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54)
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6289)
    at clojure.core$group_by.invoke(core.clj:6602)
    at akvo.flow_services.uploader$bulk_survey.invoke(uploader.clj:148)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:170)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
[QuartziteScheduler_Worker-4] ERROR org.quartz.core.ErrorLogger - Job (DEFAULT.63A3DF00-EE81-47F9-8ABA-8C1C83E0BE06 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.util.zip.ZipException: zip file is empty]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.util.zip.ZipException: zip file is empty
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:215)
    at java.util.zip.ZipFile.<init>(ZipFile.java:145)
    at java.util.zip.ZipFile.<init>(ZipFile.java:159)
    at sun.reflect.GeneratedConstructorAccessor563.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at clojure.lang.Reflector.invokeConstructor(Reflector.java:180)
    at akvo.flow_services.uploader$get_data.invoke(uploader.clj:121)
    at clojure.core$map$fn__4245.invoke(core.clj:2559)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core$concat$cat__3957$fn__3958.invoke(core.clj:701)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.lang.RT.nthFrom(RT.java:848)
    at clojure.lang.RT.nth(RT.java:807)
    at clojure.core$distinct$step__4716$fn__4717$fn__4719.invoke(core.clj:4618)
    at clojure.core$distinct$step__4716$fn__4717.invoke(core.clj:4618)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$filter$fn__4264.invoke(core.clj:2595)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core.protocols$fn__6086.invoke(protocols.clj:146)
    at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
    at clojure.core.protocols$fn__6078.invoke(protocols.clj:54)
    at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6289)
    at clojure.core$group_by.invoke(core.clj:6602)
    at akvo.flow_services.uploader$bulk_survey.invoke(uploader.clj:148)
    at akvo.flow_services.uploader$bulk_upload.invoke(uploader.clj:170)
    at akvo.flow_services.scheduler.BulkUploadJob.execute(scheduler.clj:54)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    ... 1 more

Refactor bulk upload

The current bulk upload relies on the applet code. The process is straight forward and can be re-implemented in Clojure.

After combining all the chunks sent by the client,
Search for all zip files contained in the original surveyal.zip
Unzip each file and loop through the content of data.txt, we need to skip processing the zip files names wfpGenerated*.zip
For each survey instance (identified by uuid) in data.txt, check if that survey instance exists in the datastore by making a Remote API call (NOTE: this is a new step, not present in the old process)
If the survey instance doesn't exist in the datastore, make a zip with just one survey instance, upload it to S3 and notify the backend for ingestion.
We need to keep track of the uploaded survey instance for this particular bulk-upload execution, to avoid reprocessing a survey instance. (We can't rely on the Remote API lookup)

Implement missing parameter validation

There is pending validation in request parameters for report generation, e.g.

criteria is required and can't be null, it needs to be a valid JSON object

[qtp1625050241-31] WARN org.eclipse.jetty.server.AbstractHttpConnection - /generate?callback=FLOW.ReportLoader.handleResponse&criteria=null&_=1386827806483
java.lang.IllegalArgumentException: No implementation of method: :to-job-data of protocol: #'clojurewerkz.quartzite.conversion/JobDataMapConversion found for class: clojure.lang.PersistentList
        at clojure.core$_cache_protocol_fn.invoke(core_deftype.clj:541)
        at clojurewerkz.quartzite.conversion$fn__1633$G__1628__1638.invoke(conversion.clj:15)
        at clojurewerkz.quartzite.jobs$using_job_data.invoke(jobs.clj:57)
        at akvo.flow_services.scheduler$get_job.invoke(scheduler.clj:73)
        at akvo.flow_services.scheduler$schedule_job.invoke(scheduler.clj:82)
        at akvo.flow_services.scheduler$generate_report.invoke(scheduler.clj:118)
        at akvo.flow_services.core$generate_report.invoke(core.clj:32)
        at akvo.flow_services.core$fn__1992.invoke(core.clj:52)

Update list of dev instances

There is a new dev instance akvoflow-dev2, needs to be excluded from stats.