DATAVIEW (www.dataview.org) is a big data workflow management system. It uses Dropbox as the data cloud and Amazon EC2 as the compute cloud. Current research focuses on the performance and cost optimization for running workflows in clouds.
DATAVIEW supports two programming interfaces to develop and run workflows:
- Java API: A programmer can develop various workflow tasks and workflows based on the DATAVIEW libraries. /DATAVIEW/src/test.java shows the six steps to create a customized workflow and execute it in Amazon EC2.
- The external dependency libraries must be added to the Eclipse project from /DATAVIEW/WebContent/WEB-INF/lib
- The accessKey and secretKey should be updated in config.properties under /DATAVIEW/WebContent/workflowLibDir/
- After finishing the workflow, please terminate all the EC2 instances from your AWS account manually.
- Visual Programming: DATAVIEW is deployed as a Web site in Tomcat and a user can drag and drop tasks and link them into a workflow in a visual workflow design and execution environment called Webbench.
- A Dropbox account is necessary to store all the input data, the workflow tasks, and the final output files produced by the workflow execution. The user needs to create three default folders: Dropbox/DATAVIEW/Tasks, which stores the task files (class files or jar files); Dropbox/DATAVIEW/Workflows, which stores the mxgraph file for the generated workflow; and Dropbox/DATAVIEW-INPUT, which stores the input files for a workflow. Four relational algebra tasks (jar files) and input files are already stored under the DATAVIEW/WebContent/workflowTaskDir folder.
- A local account needs to be registered to show a visualized workflow.
- A Dropbox token must be provided in the main interface when you log in. It can be generated by following this tutorial: https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/
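
Before running a workflow in Amazon EC2 through the Java API, it can help to confirm that the credentials in config.properties are actually filled in. The following is a minimal sketch, not part of DATAVIEW itself: the class `ConfigCheck` is hypothetical, and it assumes the property keys are spelled exactly `accessKey` and `secretKey` and that the file is read relative to the directory containing the DATAVIEW project.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConfigCheck {
    // Key names taken from the text above; that they appear verbatim
    // in config.properties is an assumption.
    static final String[] REQUIRED_KEYS = {"accessKey", "secretKey"};

    /** Returns the required keys that are absent or blank in the given properties. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default path is an assumption about where the program is started from;
        // pass a different path as the first argument if needed.
        Path config = Paths.get(args.length > 0 ? args[0]
                : "DATAVIEW/WebContent/workflowLibDir/config.properties");
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(config)) {
            props.load(reader);
        }
        List<String> missing = missingKeys(props);
        if (missing.isEmpty()) {
            System.out.println("Both keys are set in " + config);
        } else {
            System.out.println("Missing or empty in " + config + ": " + missing);
        }
    }
}
```

The check only verifies that both values are non-empty. It does not contact AWS, so a wrong key still passes.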
- Download the DATAVIEW package from https://github.com/shiyonglu/DATAVIEW by clicking the "Clone or Download" button.
- Unzip the DATAVIEW-master.zip file and import the DATAVIEW project into Eclipse as "Existing Projects into Workspace" by selecting "Projects from Folder or Archive".
- The external dependency libraries must be added to the Eclipse project from /DATAVIEW/WebContent/WEB-INF/lib
- /DATAVIEW/src/test.java shows the six steps to create a new workflow and execute it with the local executor.
- Follow the first three steps from
- In your Dropbox, create three default folders: Dropbox/DATAVIEW/Tasks, which stores the task files (class files or jar files); Dropbox/DATAVIEW/Workflows, which stores the mxgraph file for the generated workflow; and Dropbox/DATAVIEW-INPUT, which stores the input files for a workflow.
- Get a Dropbox token.
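
The three default folders can also be created from Java if the Dropbox desktop client syncs your account to a local directory. This is a sketch under that assumption: the class `DropboxLayout` is hypothetical, and the default root `~/Dropbox` is a guess about your local setup. Creating the folders in the Dropbox web interface works equally well.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DropboxLayout {
    // Folder names from the text above, relative to the Dropbox root.
    static final String[] FOLDERS = {"DATAVIEW/Tasks", "DATAVIEW/Workflows", "DATAVIEW-INPUT"};

    /** Resolves the three default folders against the given Dropbox root (no file I/O). */
    static List<Path> folderPaths(Path dropboxRoot) {
        List<Path> paths = new ArrayList<>();
        for (String folder : FOLDERS) {
            paths.add(dropboxRoot.resolve(folder));
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the Dropbox client syncs to ~/Dropbox unless a root is passed in.
        Path root = Paths.get(args.length > 0
                ? args[0]
                : System.getProperty("user.home") + "/Dropbox");
        for (Path folder : folderPaths(root)) {
            // createDirectories also creates missing parents and is a no-op
            // if the directory already exists.
            Files.createDirectories(folder);
            System.out.println("Ready: " + folder);
        }
    }
}
```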
- Chapter 1: A gentle introduction to DATAVIEW (https://youtu.be/7S4iGKXpaAc)
- How to download, import DATAVIEW into Eclipse as Java API and run a workflow with local executor (https://youtu.be/xJikeWptYSw)