This sample demonstrates a concurrent approach to scheduling Flux workflows that require high throughput and coordinated executions. The dataset used in this sample comes from the Amazon S3 bucket "1000genomes": http://aws.amazon.com/1000genomes/. It uses Amazon S3 APIs to retrieve objects and schedules them using a Java action in Flux.
This sample follows Flux best practices for Java actions. A Java action is a double-edged sword: it offers more power to advanced users, but it can be tricky to return control to Flux gracefully when the action calls external services that Flux has no control over. This can be handled with the flowContext.isInterrupted() "API": https://support.flux.ly/80/javadoc/flux/FlowContext.html#isInterrupted(). When the workflow is interrupted, the action shuts down its resources gracefully and exits from execution normally. This is also useful in testing: users can interrupt the workflow from the Flux Operations Console to skip the current execution and proceed to the next steps.
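The interrupt check described above can be sketched in plain Java. This is a minimal illustration, not the code shipped with this sample: the class InterruptibleWork and the method processUntilInterrupted are hypothetical names, and a BooleanSupplier stands in for the flowContext.isInterrupted() call so the snippet compiles without flux.jar. In a real Java action you would presumably pass flowContext::isInterrupted as the supplier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Flux or of this sample's sources.
final class InterruptibleWork {

    /**
     * Processes keys one at a time and checks the interrupt flag before
     * each unit of work. The supplier is a stand-in (assumption) for
     * flowContext.isInterrupted() inside a Flux Java action.
     */
    static List<String> processUntilInterrupted(List<String> keys,
                                                BooleanSupplier interrupted) {
        List<String> processed = new ArrayList<>();
        for (String key : keys) {
            if (interrupted.getAsBoolean()) {
                // Stop early; the caller can release resources and
                // return normally so the workflow can move on.
                break;
            }
            // Placeholder for the real work, e.g. a call to an external service.
            processed.add(key);
        }
        return processed;
    }
}
```

Checking the flag between units of work, rather than only once at the start, keeps the delay between an interrupt request and the action's exit bounded by the duration of a single unit.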
- Download and Install Flux from here
- Install flux.jar to your local Maven repository:
mvn install:install-file -DgroupId=flux -DartifactId=flux -Dversion=8.0.11 -Dpackaging=jar -Dfile=flux.jar
- The engine configuration is defined in engine-config.properties and runtime-config.properties; adjust these files as needed.
ConcurrentTest is a standalone test case that runs against an in-memory H2 database by default. Configurations for MySQL, Postgres, and Microsoft SQL Server are also provided.
mvn test -Djava.awt.headless=true
For MySQL:
mvn test -Ddatabase=mysql -DclearEngine=true -Djava.awt.headless=true
For Postgres:
mvn test -Ddatabase=postgres -Djava.awt.headless=true
For Microsoft SQL Server, download the driver from http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774 and install it to your local Maven repository:
mvn install:install-file -Dfile=sqljdbc4.jar -Dpackaging=jar -DgroupId=com.microsoft.sqlserver -DartifactId=sqljdbc4 -Dversion=4.0
mvn test -Ddatabase=sql_server -Djava.awt.headless=true
Two workflows are used: a parent and a child. The child workflow template is stored in the Flux repository. The parent workflow spins off a child instance for each S3 bucket.