
linkedin-gradle-plugin-for-apache-hadoop's Introduction


LinkedIn Gradle Plugin for Apache Hadoop

The LinkedIn Gradle Plugin for Apache Hadoop (which we shall refer to as simply the "Hadoop Plugin" for brevity) will help you more effectively build, test and deploy Hadoop applications.

In particular, the Plugin will help you easily work with Hadoop applications like Apache Pig and build workflows for Hadoop workflow schedulers such as Azkaban and Apache Oozie.

The Plugin includes the LinkedIn Gradle DSL for Apache Hadoop (which we shall refer to as simply the "Hadoop DSL" for brevity), a language for specifying jobs and workflows for Azkaban.

Hadoop Plugin User Guide

The Hadoop Plugin User Guide is available on the project's GitHub wiki.

Hadoop DSL Language Reference

The Hadoop DSL Language Reference is available on the project's GitHub wiki.

Getting the Hadoop Plugin

The Hadoop Plugin is now published at plugins.gradle.org, which provides a short snippet to add to your build.gradle file to start using the Hadoop Plugin.
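For reference, applying the plugin via the plugins block looks roughly like the following (the version shown is simply the newest one mentioned in this document, so check plugins.gradle.org for the latest release):

plugins {
    id 'com.linkedin.gradle.hadoop.HadoopPlugin' version '0.13.3'
}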

Can I Benefit from the Hadoop Plugin and Hadoop DSL?

You must use Gradle as your build system to use the Hadoop Plugin. If you are using Azkaban, you should start using the Hadoop Plugin immediately and you should use the Hadoop DSL to develop all of your Azkaban workflows.

If you are using Apache Pig, the Plugin includes features that will statically validate your Pig scripts, saving you time by finding errors at build time instead of when you run your Pig script.

If you run Apache Pig or Apache Spark on a Hadoop cluster through a gateway node, the Plugin includes tasks that will automate the process of launching your Pig or Spark jobs on the gateway without you having to manually download your code and dependencies there first.

If you are using Gradle and you feel that you might benefit from any of the above features, consider using the Hadoop Plugin and the Hadoop DSL.

Example Project

We have added an Example Project that uses the Hadoop Plugin and DSL to build an example Azkaban workflow consisting of Apache Pig, Apache Hive and Java Map-Reduce jobs.

Apache Oozie Status

The Hadoop Plugin includes Gradle tasks for Apache Oozie, including the ability to upload versioned directories to HDFS, as well as Gradle tasks for issuing Oozie commands. If you are using Gradle as your build system and Apache Oozie as your Hadoop workflow scheduler, you might find the Hadoop Plugin useful. However, since we are no longer actively using Oozie at LinkedIn, the Oozie tasks may fall into a non-working state.

Although we started on a Hadoop DSL compiler for Oozie, we did not complete it, and it is currently not in a usable form. We are not currently working on it and it is unlikely to be completed.

Recent News

  • May 2017 We have added an Example Project that uses the Hadoop Plugin and DSL
  • April 2016 We have refreshed the User Guide and Hadoop DSL Language Reference Wiki pages
  • January 2016 The Hadoop Plugin is now published on plugins.gradle.org
  • November 2015 Gradle version bumped to 2.7 and the Gradle daemon enabled - tests run much, much faster
  • August 2015 Initial pull requests for Oozie versioned deployments and the Oozie Hadoop DSL compiler have been merged
  • August 2015 The Hadoop Plugin and Hadoop DSL were released on GitHub! See the LinkedIn Engineering Blog post for the announcement!
  • July 2015 See our talk at the Gradle Summit

Project Structure

The project structure is set up as follows:

  • azkaban-client: Code to work with Azkaban via the Azkaban REST API
  • example-project: Example project that uses the Hadoop Plugin and DSL to build an example Azkaban workflow
  • hadoop-jobs: Code for re-usable Hadoop jobs and implementations of Hadoop DSL job types
  • hadoop-plugin: Code and tests for the various plugins that comprise the Hadoop Plugin
  • li-hadoop-plugin: LinkedIn-specific extensions to the Hadoop Plugin, and tests

Although the li-hadoop-plugin code is generally specific to LinkedIn, it is included in the project to show you how to use subclassing to extend the core functionality of the Hadoop Plugin for your organization (and to make sure our open-source contributions don't break the LinkedIn customizations).

Building and Running Test Cases

To build the Plugin and run the test cases, run ./gradlew build from the top-level project directory.

Unit tests

Unit tests are invoked by running ./gradlew :hadoop-plugin:test. Individual tests may be executed using ./gradlew :hadoop-plugin:test --tests="com.linkedin.gradle.zip.HadoopZipTest".

Integration tests

Integration tests are invoked by running ./gradlew :hadoop-plugin:integTest, or by running the check task.

linkedin-gradle-plugin-for-apache-hadoop's People

Contributors

akshayrai, arjun4084346, arpang, burgerkingeater, convexquad, convexshiba, djaiswal83, dpukyle, edwinalu, erwa, hungj, jamiesjc, jenniferdai, jianlingzhong, jinhyukchang, loganrosen, mtrna, ncowanli, nntnag17, oliverhu, pranayhasan, pranayyerra, rajagopr, rakeshmalladi, rride, saikadam, surennihalani, tzuhanjan, ypadron-in, zxcware


linkedin-gradle-plugin-for-apache-hadoop's Issues

Refactor Hadoop Validator into its own project

We've got the great new Hadoop / Pig validator enhancements in the 0.10.x series thanks to @kurtgodel95 and @nntnag17. I would like to make some minor refactorings as we prepare this code for wide use at LinkedIn (and in the open-source community).

I would like to pull https://github.com/linkedin/linkedin-gradle-plugin-for-apache-hadoop/tree/master/hadoop-plugin/src/main/groovy/com/linkedin/gradle/hadoopValidator into its own top-level subproject called "hadoop-validator". This will make it easier to test this code, and possibly to re-use it as an API dependency in other projects. Additionally, we'll be able to isolate its dependencies in its own build.gradle file, which will help us increase the reliability of the Hadoop Plugin, as the LinkedIn Gradle plugins classpath is already getting filled with all kinds of stuff.

We might additionally need to pull some other things in the hadoop-plugin subproject out (I think the hadoopValidator code uses the HDFS utility classes in the Hadoop Plugin). If so, we can additionally pull out another subproject called "hadoop-base" on which all other subprojects can depend.

Ability to upload to multiple Azkaban instances

Update the .azkabanPlugin.json file so that it supports a list of named Azkaban instances. One of them will be the "default" instance. Currently, the file looks like:

{
    "azkabanUrl": "https://ltx1-holdemaz01.grid.linkedin.com:8443",
    "azkabanProjName": "abain-xgboost-demo",
    "azkabanZipTask": "azkabanHadoopZip",
    "azkabanValidatorAutoFix": "true",
    "azkabanUserName": "abain"
}

It should look something like:

[
  {
    // The instance that has no name is the default one
    "azkabanUrl": "https://myDefaultAzkabanInstance.linkedin.com:8443",
    "azkabanProjName": "myAzkabanProject",
    "azkabanZipTask": "azkabanHadoopZip",
    "azkabanValidatorAutoFix": "true",
    "azkabanUserName": "abain"
  },
  {
    "azkabanName": "secondInstance",
    "azkabanUrl": "https://secondInstance.linkedin.com:8443",
    "azkabanProjName": "myOtherProject",
    "azkabanZipTask": "azkabanHadoopZip",
    "azkabanValidatorAutoFix": "true",
    "azkabanUserName": "abain"
  },
  ...
]

This enhancement needs to be done in such a way that it is backwards compatible with the existing format of the .azkabanPlugin.json file, or perhaps it notices when the file is in the old format and automatically updates it.

In the azkabanUpload task, you would pass -PazkabanName=secondInstance on the command line to use the second config. In interactive mode, if the user makes any edits to the values, we would ask them if they want to save the changes to the same instance, or to create a new named instance and save the changes to that instance.
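For example, uploading to the hypothetical second instance above would look something like:

./gradlew azkabanUpload -PazkabanName=secondInstance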

Automate the creation of the .azkabanPlugin.json file

Currently, users have to create the .azkabanPlugin.json file with "./gradlew writeAzkabanPluginJson" and then update it manually before they can use the azkabanUpload task.

We should automate the creation of this file as follows. When users run the azkabanUpload task, the task should check if this file exists, and then apply the following logic:

  1. If the file doesn't exist, the azkabanUpload task should ask the user for this information
  2. If the file exists, the task should use this information, but give the user the chance to confirm or change the Azkaban URL / project / user. If the user changes this information, ask them if they want to save the changes (to the .azkabanPlugin.json file).
  3. Additionally, the task should be able to take a -PskipInteractive=true command line parameter to skip asking for confirmation and ONLY read from the .azkabanPlugin.json file. When specifying this command line parameter, the task should fail if the file does not exist or is not completely filled out. A sketch of this logic follows the list.
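A minimal Groovy sketch of this logic, using hypothetical helper method names rather than the plugin's actual implementation:

File pluginJson = new File(project.projectDir, '.azkabanPlugin.json')
boolean skipInteractive = project.hasProperty('skipInteractive')

if (!pluginJson.exists()) {
    if (skipInteractive) {
        // Rule 3: cannot skip interactive mode without a complete file
        throw new GradleException('.azkabanPlugin.json does not exist; cannot run with -PskipInteractive=true')
    }
    promptForAzkabanSettings(pluginJson)   // hypothetical helper: ask for the URL / project / user and write the file
} else if (skipInteractive) {
    failIfIncomplete(pluginJson)           // hypothetical helper: fail if any required field is blank
} else {
    confirmOrEditSettings(pluginJson)      // hypothetical helper: show the current values and offer to save any edits
}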

Build fails when run in LinkedIn multiproduct

When I try to build this repo inside a LinkedIn multiproduct, it fails due to

Caused by: java.lang.NoClassDefFoundError: groovy/text/SimpleTemplateEngine
        at org.codenarc.report.AbstractReportWriter.<init>(AbstractReportWriter.groovy:47)
        at org.codenarc.report.XmlReportWriter.<init>(XmlReportWriter.groovy)
        at org.codenarc.report.ReportWriterFactory.getReportWriter(ReportWriterFactory.groovy:36)
        at org.codenarc.report.ReportWriterFactory.getReportWriter(ReportWriterFactory.groovy:49)
        at org.codenarc.report.ReportWriterFactory$getReportWriter.call(Unknown Source)
        at org.codenarc.ant.CodeNarcTask.addConfiguredReport(CodeNarcTask.groovy:120)
        at org.apache.tools.ant.IntrospectionHelper$AddNestedCreator.istore(IntrospectionHelper.java:1477)
        at org.apache.tools.ant.IntrospectionHelper$AddNestedCreator.store(IntrospectionHelper.java:1471)
        at org.apache.tools.ant.IntrospectionHelper$Creator.store(IntrospectionHelper.java:1374)
        ... 61 more
Caused by: java.lang.ClassNotFoundException: groovy.text.SimpleTemplateEngine
        ... 70 more

Hadoop Validator for projects not using Hadoop DSL

Currently the Hadoop Validator can only validate the Pig scripts of projects using the Hadoop DSL. We should support validation of any general Pig script that can be run on the gateway, by taking appropriate parameters.

Also, it would be good to include validation as part of the build for all Hadoop projects with Pig scripts. We might want an optional parameter to skip validation during the build, e.g. by running ligradle build -x pigValidate

I can't find any examples

I'd love to use this plugin. I assume it can help me bundle a Hadoop application and actually run it. I can't quite figure out how to do that. How would I use this with the Hadoop WordCount2 example?

The azkabanUpload task should automatically go into edit mode if there are missing required fields

Currently the azkabanUpload task will ask you if you want to edit fields, even if some of them are blank:

Entering interactive mode. You can use the -PskipInteractive command line parameter to skip interactive mode and ONLY read from the .azkabanPlugin.json file.

Azkaban Project Name: 
Azkaban URL: https://ltx1-holdemaz01.grid.linkedin.com:8443
Azkaban User Name: abain
Azkaban Zip Task: 
> Building 93% > :xgboost-demo:azkabanUpload > Want to change any of the above? [y/N]: 

If the user is missing any of the required fields, we should just go directly into edit mode.
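A rough sketch of the proposed behavior, with hypothetical helper names and the field names taken from the console output above:

List<String> requiredFields = [azkabanProjName, azkabanUrl, azkabanUserName, azkabanZipTask]
if (requiredFields.any { it == null || it.trim().isEmpty() }) {
    enterEditMode()        // hypothetical helper: prompt for each missing value immediately
} else {
    askToConfirmOrEdit()   // current behavior: "Want to change any of the above? [y/N]"
}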

Leverage the Dependency Plugin in the li-hadoop-plugin subproject

@nntnag17 reminded me that the Dependency Plugin in the hadoop-plugin subproject can emit warnings when you include certain artifacts in your Hadoop Runtime configuration, and that this is just a config change.

We should use this to warn / block users from including Hadoop artifacts in their zips. I think we should block com.linkedin.hadoop and org.apache.hadoop artifacts. We should probably just warn if they include com.linkedin.pig and com.linkedin.hive artifacts (I'm not sure if we should block or warn if they include org.apache.pig and org.apache.hive artifacts, since they should use the com.linkedin versions of these).

I'm also not sure if we should block or warn on org.apache.spark artifacts - probably we want to block them.

Since this enhancement could block users if we get it wrong, we should be careful with it. We should include Jack Dintruff in the discussion and also discuss with Hadoop Dev team leads.

Hadoop DSL SparkJob should support namedAppParams construct

Currently you declare the arguments to a Hadoop DSL SparkJob using "appParams" where you just list the job arguments directly.

Another way to do this would be to support "namedAppParams", which takes a list of keys for job properties that you have already set on the job and looks up their corresponding values to use as the job arguments. This could throw an error if you haven't declared the given key for the job. See the sketch below.
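An illustrative sketch of the two styles (the exact Hadoop DSL SparkJob syntax may differ slightly):

sparkJob('mySparkJob') {
    set properties: [
        'input.path' : '/data/input',
        'output.path': '/data/output'
    ]
    // Today: list the job arguments directly
    appParams ['/data/input', '/data/output']
    // Proposed: look up already-declared job properties by key and use their values as the arguments;
    // an error would be thrown if a key has not been declared on the job
    // namedAppParams ['input.path', 'output.path']
}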

Simpler Hadoop DSL multi-grid builds

I have observed that users struggle to use the Hadoop DSL hadoopClosure and namespace language features to create multi-grid Hadoop DSL workflows.

I have an idea to improve this situation. Essentially, I will provide a new Hadoop DSL mechanism in which you specify multiple definition sets against which to evaluate the Hadoop DSL. The buildAzkabanFlows task will then evaluate the Hadoop DSL for the first definition set, then clear its state, re-evaluate the Hadoop DSL for the second definition set, then clear its state, etc. After each re-evaluation of the Hadoop DSL, the compiled output will be written to a unique output location.

Basically, we'll evaluate the Hadoop DSL for each definitionSet, clearing it in between, and writing the compiled output to a different location each time.

The advantage of this is that it will be far simpler for users in terms of the mental model they need. They will just use lookupDef to look up any values that differ between grids. Users will not have to use the hadoopClosure or namespace language constructs for multi-grid builds (although you can still use these features).

The disadvantage of this is that it will be slower as it will re-evaluate all of your Hadoop DSL for each definitionSet. In addition, you will get WARNING messages from the Hadoop DSL static checker for each re-evaluation. You will also have a lot more compiled output files (a very minor concern).

// It will look something like this. In your build.gradle you will have:
apply from: 'src/main/gradle/definitionSets.gradle'  // First, declare your definition sets

// Now customize your Hadoop DSL build
hadoopDslBuild {
  buildPath "azkaban"

  apply files: [
    'src/main/gradle/workflows1.gradle',
    'src/main/gradle/workflows2.gradle',
    'src/main/gradle/common.gradle'
  ]

  definitionSets ['holdem', 'war']
}

// Now declare that you want to build the Hadoop DSL when you run your build
build.dependsOn buildAzkabanFlows

// When you build, you will have the following output:
//   ${projectDir}/azkaban/holdem
//   ${projectDir}/azkaban/war

// Now easily declare your Hadoop zips for each grid
hadoopZip {
  zip("azkabanHoldem") {
    from "${projectDir}/azkaban/holdem"
  }
  zip("azkabanWar") {
    from "${projectDir}/azkaban/war"
  }
}

Published plugin misses its oozie-jaxb dependency

Applying the plugin using the buildscript block as shown on https://plugins.gradle.org/plugin/com.linkedin.gradle.hadoop.HadoopPlugin fails with a message like:

FAILURE: Build failed with an exception.

* What went wrong:
A problem occurred configuring root project 'mapred-samples'.
> Could not resolve all dependencies for configuration ':classpath'.
   > Could not find linkedin-gradle-plugin-for-apache-hadoop:oozie-jaxb:0.7.9.
     Searched in the following locations:
         https://plugins.gradle.org/m2/linkedin-gradle-plugin-for-apache-hadoop/oozie-jaxb/0.7.9/oozie-jaxb-0.7.9.pom
         https://plugins.gradle.org/m2/linkedin-gradle-plugin-for-apache-hadoop/oozie-jaxb/0.7.9/oozie-jaxb-0.7.9.jar
     Required by:
         :mapred-samples:unspecified > gradle.plugin.linkedin-gradle-plugin-for-apache-hadoop:hadoop-plugin:0.7.9

Hadoop DSL -> Jar compiler

This is a visionary idea that will require more technical depth and work, but it would be a pretty awesome enhancement!

Internal to LinkedIn, the Photon Plugin extends the Hadoop DSL with syntax to declare re-usable Hadoop DSL workflows. I intend to extend the Hadoop DSL language directly to encompass these features sometime in the near future.

However, that's only one side of the issue. The second issue (which the Photon Plugin does not itself solve) is how to package re-usable Hadoop DSL workflows so that other teams can instantiate them.

The way to do this is to set up a new Hadoop DSL compiler. Instead of compiling to Azkaban or Oozie, the compiler will build a jar that encodes the structure of the declared Hadoop DSL! Then users can add the jar to their buildscript classpath, and there needs to be a special Hadoop DSL method that is able to read back the encoded Hadoop DSL structure from the jar.

This feature would enable various teams (like the ML-Algorithms Team or even like UMP) to declare re-usable Hadoop DSL workflows, distribute them as multiproduct artifacts, and other teams to invoke them / reuse them. This could be a giant win for LinkedIn and would be a highly-visible technical accomplishment.

This is probably a 1-2 month effort (perhaps even a quarter-long effort) for a single developer. @nntnag17 @akshayrai @pranayhasan @rajagopr keep this in mind as a potential larger and more technical task. I would agree to providing design ideas and technical feedback for this enhancement.

Implement Hadoop DSL Canonical Compiler and Azkaban Reverse Job Compiler

The idea is to implement a Hadoop DSL compiler that actually writes out formatted Hadoop DSL. This would be necessary for an Azkaban job files -> Hadoop DSL reverse compiler.

In people's Hadoop DSL files they will have all kinds of Groovy that is actually evaluated at Gradle build time. They end up with a Hadoop DSL structure in memory. The canonical compiler would spit out the Hadoop DSL structure as Hadoop DSL itself, but obviously with no Groovy logic in it.

Once the canonical compiler is in place, the task of implementing an Azkaban reverse compiler would simply consist of a processor for the Azkaban .job and .properties files that builds up the corresponding Hadoop DSL structure in memory and then just writes it out using the canonical compiler.

runSparkJob will fail if running python Spark job.

  1. In the source code, it checks for the same required fields as the old logic. If the Hadoop DSL checker already checks this, is it necessary to have the SparkPlugin check it again?
  2. Also, the method that builds the Spark command will not produce the right command: the --class option should be omitted when appClass is null (as it is for a Python Spark job). A sketch of the fix follows.
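A hedged sketch of the kind of fix, using illustrative variable names rather than the plugin's actual ones:

List<String> sparkCommand = ['spark-submit']
if (appClass != null && !appClass.isEmpty()) {
    // Only Java/Scala Spark jobs declare an application class; Python Spark jobs do not
    sparkCommand += ['--class', appClass]
}
sparkCommand << executionJarOrPyFile   // the .jar or .py file to run (illustrative name)
sparkCommand += appParams              // the declared job arguments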

Enable azkabanUpload task to validate session before uploading

For an invalid session, the azkabanUpload task uploads the zip twice: the first POST request fails and the subsequent request succeeds. Thus it takes twice the time necessary to upload the zip if the session is invalid. A quick fix is to check each time whether the session is valid and then proceed with the upload (sketched below). @convexquad @nntnag17 @akshayrai Please let me know if there is a better way, if any.
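A rough sketch of the proposed flow; the helper names below are hypothetical, not the plugin's actual methods:

String sessionId = readCachedSessionId()                 // hypothetical: read the stored Azkaban session id
if (sessionId == null || !isSessionValid(sessionId)) {   // hypothetical: lightweight check that the session is still valid
    sessionId = loginAndFetchSession()                   // hypothetical: log in again to get a fresh session id
}
uploadZip(sessionId, zipFile)                            // single upload attempt with a known-good session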

Generate Hadoop DSL visualizations for workflow graphs

Internally at LinkedIn Luke Duncan created the JIRA LIHADOOP-20359 with the following message:

Hey Hadoop Team,

I find myself often taking screenshots of Azkaban DAGs to include them in
RBs. Reading Hadoop DSL refactorings in an RB can sometimes be difficult, and
the visuals help. I'm pretty sure the open source version of Azkaban
includes the tool for visualizing Azkaban DAGs. I wonder if we could use it
in git-review, and automatically create a new image and include it with
every RB that affects a hadoop multi-product?

    Luke

My response was:

Luke, a really cool way to do this would be to add an implementation of the Hadoop DSL Compiler class (https://github.com/linkedin/linkedin-gradle-plugin-for-apache-hadoop/blob/master/hadoop-plugin/src/main/groovy/com/linkedin/gradle/hadoopdsl/HadoopDslCompiler.groovy) that generates a GraphViz DOT file, and then add a Gradle task to the Hadoop Plugin that runs the compiler and directly generates a PNG based on your DOT file.

I still think this would be an interesting and useful Hadoop DSL enhancement.

-Alex

Use GitHub Issues More

I'm opening a GitHub issue to use GitHub issues more. I was super glad to have met with @nntnag17 and @akshayrai during their visit to Mountain View, and I let them know that since I have changed teams, I haven't been watching (LinkedIn internal) LIHADOOP JIRAs as much these days.

Thus, I'll post all my ideas and feedback as GitHub issues. @nntnag17 @akshayrai @pranayhasan @rajagopr feel free to continue to reference LIHADOOP JIRAs in your Pull Requests / internal work as you see fit, and on my side I'll try to get all my ideas posted as GitHub issues.

Make the plugin work for Gradle 5

When using Gradle 5.4.1 with a build.gradle file like:

plugins {
    id 'com.linkedin.gradle.hadoop.HadoopPlugin' version '0.13.3'
}
...

I get the following exception:

org.gradle.api.ProjectConfigurationException: A problem occurred configuring project ':workflow'.
	at org.gradle.configuration.project.LifecycleProjectEvaluator.wrapException(LifecycleProjectEvaluator.java:79)
	at org.gradle.configuration.project.LifecycleProjectEvaluator.addConfigurationFailure(LifecycleProjectEvaluator.java:72)
	at org.gradle.configuration.project.LifecycleProjectEvaluator.access$600(LifecycleProjectEvaluator.java:53)
	at org.gradle.configuration.project.LifecycleProjectEvaluator$EvaluateProject$1.run(LifecycleProjectEvaluator.java:108)
	at org.gradle.internal.Factories$1.create(Factories.java:25)
	at org.gradle.internal.work.DefaultWorkerLeaseService.withLocks(DefaultWorkerLeaseService.java:183)
	at org.gradle.internal.work.StopShieldingWorkerLeaseService.withLocks(StopShieldingWorkerLeaseService.java:40)
	at org.gradle.api.internal.project.DefaultProjectStateRegistry$ProjectStateImpl.withProjectLock(DefaultProjectStateRegistry.java:226)
	at org.gradle.api.internal.project.DefaultProjectStateRegistry$ProjectStateImpl.withMutableState(DefaultProjectStateRegistry.java:220)
	at org.gradle.api.internal.project.DefaultProjectStateRegistry$ProjectStateImpl.withMutableState(DefaultProjectStateRegistry.java:186)
	at org.gradle.configuration.project.LifecycleProjectEvaluator$EvaluateProject.run(LifecycleProjectEvaluator.java:95)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:402)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:394)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:92)
	at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
	at org.gradle.configuration.project.LifecycleProjectEvaluator.evaluate(LifecycleProjectEvaluator.java:67)
	at org.gradle.api.internal.project.DefaultProject.evaluate(DefaultProject.java:695)
	at org.gradle.api.internal.project.DefaultProject.evaluate(DefaultProject.java:143)
	at org.gradle.execution.TaskPathProjectEvaluator.configure(TaskPathProjectEvaluator.java:35)
	at org.gradle.execution.TaskPathProjectEvaluator.configureHierarchy(TaskPathProjectEvaluator.java:62)
	at org.gradle.configuration.DefaultBuildConfigurer.configure(DefaultBuildConfigurer.java:41)
	at org.gradle.initialization.DefaultGradleLauncher$ConfigureBuild.run(DefaultGradleLauncher.java:302)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:402)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:394)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:92)
	at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
	at org.gradle.initialization.DefaultGradleLauncher.configureBuild(DefaultGradleLauncher.java:210)
	at org.gradle.initialization.DefaultGradleLauncher.doBuildStages(DefaultGradleLauncher.java:151)
	at org.gradle.initialization.DefaultGradleLauncher.getConfiguredBuild(DefaultGradleLauncher.java:129)
	at org.gradle.internal.invocation.GradleBuildController$2.execute(GradleBuildController.java:67)
	at org.gradle.internal.invocation.GradleBuildController$2.execute(GradleBuildController.java:64)
	at org.gradle.internal.invocation.GradleBuildController$3.create(GradleBuildController.java:82)
	at org.gradle.internal.invocation.GradleBuildController$3.create(GradleBuildController.java:75)
	at org.gradle.internal.work.DefaultWorkerLeaseService.withLocks(DefaultWorkerLeaseService.java:183)
	at org.gradle.internal.work.StopShieldingWorkerLeaseService.withLocks(StopShieldingWorkerLeaseService.java:40)
	at org.gradle.internal.invocation.GradleBuildController.doBuild(GradleBuildController.java:75)
	at org.gradle.internal.invocation.GradleBuildController.configure(GradleBuildController.java:64)
	at org.gradle.tooling.internal.provider.runner.ClientProvidedBuildActionRunner.run(ClientProvidedBuildActionRunner.java:57)
	at org.gradle.launcher.exec.ChainingBuildActionRunner.run(ChainingBuildActionRunner.java:35)
	at org.gradle.launcher.exec.ChainingBuildActionRunner.run(ChainingBuildActionRunner.java:35)
	at org.gradle.launcher.exec.BuildOutcomeReportingBuildActionRunner.run(BuildOutcomeReportingBuildActionRunner.java:58)
	at org.gradle.tooling.internal.provider.ValidatingBuildActionRunner.run(ValidatingBuildActionRunner.java:32)
	at org.gradle.launcher.exec.BuildCompletionNotifyingBuildActionRunner.run(BuildCompletionNotifyingBuildActionRunner.java:39)
	at org.gradle.launcher.exec.RunAsBuildOperationBuildActionRunner$3.call(RunAsBuildOperationBuildActionRunner.java:51)
	at org.gradle.launcher.exec.RunAsBuildOperationBuildActionRunner$3.call(RunAsBuildOperationBuildActionRunner.java:45)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:406)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:102)
	at org.gradle.internal.operations.DelegatingBuildOperationExecutor.call(DelegatingBuildOperationExecutor.java:36)
	at org.gradle.launcher.exec.RunAsBuildOperationBuildActionRunner.run(RunAsBuildOperationBuildActionRunner.java:45)
	at org.gradle.launcher.exec.InProcessBuildActionExecuter$1.transform(InProcessBuildActionExecuter.java:49)
	at org.gradle.launcher.exec.InProcessBuildActionExecuter$1.transform(InProcessBuildActionExecuter.java:46)
	at org.gradle.composite.internal.DefaultRootBuildState.run(DefaultRootBuildState.java:78)
	at org.gradle.launcher.exec.InProcessBuildActionExecuter.execute(InProcessBuildActionExecuter.java:46)
	at org.gradle.launcher.exec.InProcessBuildActionExecuter.execute(InProcessBuildActionExecuter.java:31)
	at org.gradle.launcher.exec.BuildTreeScopeBuildActionExecuter.execute(BuildTreeScopeBuildActionExecuter.java:42)
	at org.gradle.launcher.exec.BuildTreeScopeBuildActionExecuter.execute(BuildTreeScopeBuildActionExecuter.java:28)
	at org.gradle.tooling.internal.provider.ContinuousBuildActionExecuter.execute(ContinuousBuildActionExecuter.java:78)
	at org.gradle.tooling.internal.provider.ContinuousBuildActionExecuter.execute(ContinuousBuildActionExecuter.java:52)
	at org.gradle.tooling.internal.provider.SubscribableBuildActionExecuter.execute(SubscribableBuildActionExecuter.java:59)
	at org.gradle.tooling.internal.provider.SubscribableBuildActionExecuter.execute(SubscribableBuildActionExecuter.java:36)
	at org.gradle.tooling.internal.provider.SessionScopeBuildActionExecuter.execute(SessionScopeBuildActionExecuter.java:68)
	at org.gradle.tooling.internal.provider.SessionScopeBuildActionExecuter.execute(SessionScopeBuildActionExecuter.java:38)
	at org.gradle.tooling.internal.provider.GradleThreadBuildActionExecuter.execute(GradleThreadBuildActionExecuter.java:37)
	at org.gradle.tooling.internal.provider.GradleThreadBuildActionExecuter.execute(GradleThreadBuildActionExecuter.java:26)
	at org.gradle.tooling.internal.provider.ParallelismConfigurationBuildActionExecuter.execute(ParallelismConfigurationBuildActionExecuter.java:43)
	at org.gradle.tooling.internal.provider.ParallelismConfigurationBuildActionExecuter.execute(ParallelismConfigurationBuildActionExecuter.java:29)
	at org.gradle.tooling.internal.provider.StartParamsValidatingActionExecuter.execute(StartParamsValidatingActionExecuter.java:60)
	at org.gradle.tooling.internal.provider.StartParamsValidatingActionExecuter.execute(StartParamsValidatingActionExecuter.java:32)
	at org.gradle.tooling.internal.provider.SessionFailureReportingActionExecuter.execute(SessionFailureReportingActionExecuter.java:55)
	at org.gradle.tooling.internal.provider.SessionFailureReportingActionExecuter.execute(SessionFailureReportingActionExecuter.java:41)
	at org.gradle.tooling.internal.provider.SetupLoggingActionExecuter.execute(SetupLoggingActionExecuter.java:48)
	at org.gradle.tooling.internal.provider.SetupLoggingActionExecuter.execute(SetupLoggingActionExecuter.java:32)
	at org.gradle.launcher.daemon.server.exec.ExecuteBuild.doBuild(ExecuteBuild.java:67)
	at org.gradle.launcher.daemon.server.exec.BuildCommandOnly.execute(BuildCommandOnly.java:36)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.WatchForDisconnection.execute(WatchForDisconnection.java:37)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.ResetDeprecationLogger.execute(ResetDeprecationLogger.java:26)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.RequestStopIfSingleUsedDaemon.execute(RequestStopIfSingleUsedDaemon.java:34)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.ForwardClientInput$2.call(ForwardClientInput.java:74)
	at org.gradle.launcher.daemon.server.exec.ForwardClientInput$2.call(ForwardClientInput.java:72)
	at org.gradle.util.Swapper.swap(Swapper.java:38)
	at org.gradle.launcher.daemon.server.exec.ForwardClientInput.execute(ForwardClientInput.java:72)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.LogAndCheckHealth.execute(LogAndCheckHealth.java:55)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.LogToClient.doBuild(LogToClient.java:62)
	at org.gradle.launcher.daemon.server.exec.BuildCommandOnly.execute(BuildCommandOnly.java:36)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.EstablishBuildEnvironment.doBuild(EstablishBuildEnvironment.java:81)
	at org.gradle.launcher.daemon.server.exec.BuildCommandOnly.execute(BuildCommandOnly.java:36)
	at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:104)
	at org.gradle.launcher.daemon.server.exec.StartBuildOrRespondWithBusy$1.run(StartBuildOrRespondWithBusy.java:50)
	at org.gradle.launcher.daemon.server.DaemonStateCoordinator$1.run(DaemonStateCoordinator.java:295)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.gradle.internal.exceptions.LocationAwareException: Build file '/home/vbmacher/projects/analytics-data-platform/workflow/build.gradle' line: 2
An exception occurred applying plugin request [id: 'com.linkedin.gradle.hadoop.HadoopPlugin', version: '0.13.3']
	at org.gradle.plugin.use.internal.DefaultPluginRequestApplicator.applyPlugin(DefaultPluginRequestApplicator.java:232)
	at org.gradle.plugin.use.internal.DefaultPluginRequestApplicator.applyPlugins(DefaultPluginRequestApplicator.java:148)
	at org.gradle.configuration.DefaultScriptPluginFactory$ScriptPluginImpl.apply(DefaultScriptPluginFactory.java:201)
	at org.gradle.configuration.BuildOperationScriptPlugin$1$1.run(BuildOperationScriptPlugin.java:69)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:402)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:394)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:92)
	at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
	at org.gradle.configuration.BuildOperationScriptPlugin$1.execute(BuildOperationScriptPlugin.java:66)
	at org.gradle.configuration.BuildOperationScriptPlugin$1.execute(BuildOperationScriptPlugin.java:63)
	at org.gradle.configuration.internal.DefaultUserCodeApplicationContext.apply(DefaultUserCodeApplicationContext.java:48)
	at org.gradle.configuration.BuildOperationScriptPlugin.apply(BuildOperationScriptPlugin.java:63)
	at org.gradle.configuration.project.BuildScriptProcessor$1.run(BuildScriptProcessor.java:44)
	at org.gradle.internal.Factories$1.create(Factories.java:25)
	at org.gradle.api.internal.project.DefaultProjectStateRegistry$ProjectStateImpl.withMutableState(DefaultProjectStateRegistry.java:200)
	at org.gradle.api.internal.project.DefaultProjectStateRegistry$ProjectStateImpl.withMutableState(DefaultProjectStateRegistry.java:186)
	at org.gradle.configuration.project.BuildScriptProcessor.execute(BuildScriptProcessor.java:41)
	at org.gradle.configuration.project.BuildScriptProcessor.execute(BuildScriptProcessor.java:26)
	at org.gradle.configuration.project.ConfigureActionsProjectEvaluator.evaluate(ConfigureActionsProjectEvaluator.java:34)
	at org.gradle.configuration.project.LifecycleProjectEvaluator$EvaluateProject$1.run(LifecycleProjectEvaluator.java:106)
	... 108 more
Caused by: org.gradle.api.plugins.InvalidPluginException: An exception occurred applying plugin request [id: 'com.linkedin.gradle.hadoop.HadoopPlugin', version: '0.13.3']
	at org.gradle.plugin.use.internal.DefaultPluginRequestApplicator.exceptionOccurred(DefaultPluginRequestApplicator.java:247)
	at org.gradle.plugin.use.internal.DefaultPluginRequestApplicator.applyPlugin(DefaultPluginRequestApplicator.java:229)
	... 130 more
Caused by: org.gradle.api.internal.plugins.PluginApplicationException: Failed to apply plugin [class 'com.linkedin.gradle.validator.hadoop.HadoopValidatorPlugin']
	at org.gradle.api.internal.plugins.DefaultPluginManager.doApply(DefaultPluginManager.java:163)
	at org.gradle.api.internal.plugins.DefaultPluginManager.addImperativePlugin(DefaultPluginManager.java:88)
	at org.gradle.api.internal.plugins.DefaultPluginManager.addImperativePlugin(DefaultPluginManager.java:94)
	at org.gradle.api.internal.plugins.DefaultPluginContainer.apply(DefaultPluginContainer.java:92)
	at org.gradle.api.plugins.PluginContainer$apply.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
	at com.linkedin.gradle.hadoop.HadoopPlugin.apply(HadoopPlugin.groovy:54)
	at com.linkedin.gradle.hadoop.HadoopPlugin.apply(HadoopPlugin.groovy)
	at org.gradle.api.internal.plugins.ImperativeOnlyPluginTarget.applyImperative(ImperativeOnlyPluginTarget.java:42)
	at org.gradle.api.internal.plugins.RuleBasedPluginTarget.applyImperative(RuleBasedPluginTarget.java:50)
	at org.gradle.api.internal.plugins.DefaultPluginManager.addPlugin(DefaultPluginManager.java:177)
	at org.gradle.api.internal.plugins.DefaultPluginManager.access$300(DefaultPluginManager.java:51)
	at org.gradle.api.internal.plugins.DefaultPluginManager$AddPluginBuildOperation.run(DefaultPluginManager.java:267)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:402)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:394)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:92)
	at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
	at org.gradle.api.internal.plugins.DefaultPluginManager$2.execute(DefaultPluginManager.java:155)
	at org.gradle.api.internal.plugins.DefaultPluginManager$2.execute(DefaultPluginManager.java:152)
	at org.gradle.configuration.internal.DefaultUserCodeApplicationContext.apply(DefaultUserCodeApplicationContext.java:48)
	at org.gradle.api.internal.plugins.DefaultPluginManager.doApply(DefaultPluginManager.java:152)
	at org.gradle.api.internal.plugins.DefaultPluginManager.apply(DefaultPluginManager.java:133)
	at org.gradle.plugin.use.internal.DefaultPluginRequestApplicator$3.run(DefaultPluginRequestApplicator.java:151)
	at org.gradle.plugin.use.internal.DefaultPluginRequestApplicator.applyPlugin(DefaultPluginRequestApplicator.java:225)
	... 130 more
Caused by: org.gradle.internal.metaobject.AbstractDynamicObject$CustomMessageMissingMethodException: Could not find method leftShift() for arguments [com.linkedin.gradle.validator.pig.PigValidatorPlugin$_createDataValidator_closure2@57d7a3b1] on task ':workflow:pigDataExists' of type com.linkedin.gradle.validator.pig.PigDataValidator.
	at org.gradle.internal.metaobject.AbstractDynamicObject.methodMissingException(AbstractDynamicObject.java:179)
	at org.gradle.internal.metaobject.AbstractDynamicObject.invokeMethod(AbstractDynamicObject.java:164)
	at com.linkedin.gradle.validator.pig.PigDataValidator_Decorated.invokeMethod(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
	at com.linkedin.gradle.validator.pig.PigValidatorPlugin.createDataValidator(PigValidatorPlugin.groovy:88)
	at com.linkedin.gradle.validator.pig.PigValidatorPlugin.apply(PigValidatorPlugin.groovy:52)
	at com.linkedin.gradle.validator.pig.PigValidatorPlugin$apply.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
	at com.linkedin.gradle.validator.hadoop.HadoopValidatorPlugin.apply(HadoopValidatorPlugin.groovy:47)
	at com.linkedin.gradle.validator.hadoop.HadoopValidatorPlugin.apply(HadoopValidatorPlugin.groovy)
	at org.gradle.api.internal.plugins.ImperativeOnlyPluginTarget.applyImperative(ImperativeOnlyPluginTarget.java:42)
	at org.gradle.api.internal.plugins.RuleBasedPluginTarget.applyImperative(RuleBasedPluginTarget.java:50)
	at org.gradle.api.internal.plugins.DefaultPluginManager.addPlugin(DefaultPluginManager.java:177)
	at org.gradle.api.internal.plugins.DefaultPluginManager.access$300(DefaultPluginManager.java:51)
	at org.gradle.api.internal.plugins.DefaultPluginManager$AddPluginBuildOperation.run(DefaultPluginManager.java:267)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:402)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:394)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
	at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:92)
	at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
	at org.gradle.api.internal.plugins.DefaultPluginManager$2.execute(DefaultPluginManager.java:155)
	at org.gradle.api.internal.plugins.DefaultPluginManager$2.execute(DefaultPluginManager.java:152)
	at org.gradle.configuration.internal.DefaultUserCodeApplicationContext.apply(DefaultUserCodeApplicationContext.java:48)
	at org.gradle.api.internal.plugins.DefaultPluginManager.doApply(DefaultPluginManager.java:152)
	... 158 more
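The root cause at the bottom of the trace is the Task leftShift (<<) operator, which was removed in Gradle 5.0. Illustrative only (not the plugin's actual code), a task action registered with << needs to move to doLast:

task validateExample {
    // Old style, removed in Gradle 5: validateExample << { println 'validating...' }
    doLast {
        println 'validating...'   // register the action with doLast instead
    }
}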

Convert flows to .yml along with .job for Azkaban Flow 2.0

We plan on adapting the HadoopDSL to output two YAML files - .flow and .project - instead of .job/.properties files. Azkaban will read a YAML (.flow) file for each flow as Flow 2.0 is designed and released. The .project file will be used to define Project-level properties, some of which are currently only configurable through the UI (like project permissions).

This makes it easier to define flow-level properties (such as schedules and data-availability based triggers), and it makes the generated .zip easier to understand (job files themselves are hard to read as a cohesive flow unit, especially in large projects).

Note that the Hadoop DSL will always be able to be configured to output .job files to allow for backward compatibility with old Azkaban versions.

@jamiesjc is in charge of Flow 2.0 and should feel free to add more info 😃

Additional enhancements for azkabanUpload task

We have been working on improving the azkabanUpload task. Thanks to @pranayhasan for his great work so far!

In the commit 9a27f6f I noted that I would love to get the following further enhancements:

  1. Perfecting console printing of the project values and query questions
  2. Displaying the current values within the query questions
  3. Fixing the double slash (e.g. ":8443//manager") in the Azkaban project URL that prevents you from going directly to the project page
  4. Showing the Azkaban zips you have configured in your hadoopZip block

These would all serve to further improve the azkabanUpload task. Please go to the conversation for that commit for further details about these enhancements.

Conditional workflow feature not working while using this plugin

We are using version 0.13.3 of com.linkedin.gradle.hadoop.HadoopPlugin.

We are trying to specify conditions in a job inside workflow as follows:

addJob('javaprocess', 'jobA') {
    baseProperties 'basePropertiesName'
    set properties: [
            'type'     : 'javaprocess',
            'Name'      : 'jobA',
    ]
    depends 'dependentJobId'
    conditions 'all_done'
}

We are getting below error during build:

A problem occurred evaluating script.

Could not find method conditions() for arguments [all_done] on (Job: name = jobA) of type com.linkedin.gradle.hadoopdsl.job.Job.

Following are my queries:

  1. Does the LinkedIn plugin support conditional workflow parameters?
  2. Is this the correct way to specify conditions in a job, and if yes, any idea why we are getting this error?

Please give me pointers if anybody has an idea about this.

Azkaban schedules and data-availability based triggers defined in HadoopDSL

The Azkaban team is working on data-availability based triggers and will be launching that feature with HadoopDSL integration.

These dependencies will be written in .flow files that were introduced in Flow 2.0 (#193). This feature will not be available for users outputting .job/.properties files.

As a side-product of data-availability based triggers, this change will also allow HadoopDSL-defined schedules to be created.

In the future, we're aiming for all jobs and their associated schedules/triggers to be defined in .flow files. This will allow versioning of schedules, which hasn't been possible in the past.

Old .job/.properties files will still be able to be generated in the future for backward compatibility with older Azkaban versions, but won't have these new features.

@chengren311 is leading the data-availability based trigger project on the Azkaban team, and he should feel free to add more if he so chooses 😃

AzkabanHelper: improve GRADLE_OPTS warning for Gradle 5 compatibility

The GRADLE_OPTS warning in AzkabanHelper is not quite accurate for Gradle 5 compatibility.

For projects that set an org.gradle.jvmargs value in their gradle.properties file, GRADLE_OPTS must match it verbatim. Otherwise Gradle spawns more than one JVM process, and the System.console interaction (which is unsupported by Gradle) will fail.

I suggest adding this verbiage:

CRITICAL: If you additionally set a custom org.gradle.jvmargs in your gradle.properties file, the GRADLE_OPTS must contain the exact same string.
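For example (hypothetical values), if gradle.properties contains:

org.gradle.jvmargs=-Xmx2g -XX:MaxMetaspaceSize=512m

then the environment must carry the exact same string before running the upload:

export GRADLE_OPTS="-Xmx2g -XX:MaxMetaspaceSize=512m"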

Display warning when the buildSourceZip zip is too large

One common issue is that people's Hadoop zips get large because their embedded sources zip (which is produced by the buildSourceZip task) includes unnecessary binary resources. One of my teammates was accidentally producing a 1 GB sources zip file because he had a binary machine learning model stored in the zip.

Users can easily fix this problem by using the writeScmPluginJson task and adding an exclude. However, they don't realize it (even though this is documented on go/HadoopPlugin and at https://github.com/linkedin/linkedin-gradle-plugin-for-apache-hadoop/wiki).

What we could do is add a check at the end of the buildSourceZip task on the final size of the zip; if it is bigger than some arbitrary threshold (perhaps 20 MB is a good size), display a logger.lifecycle message telling users they can use the writeScmPluginJson task to trim their sources zip. A sketch follows.
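A minimal sketch of the proposed check, assuming buildSourceZip is a standard Zip task (the actual wiring in the plugin may differ):

project.tasks.getByName('buildSourceZip').doLast { task ->
    File sourceZip = task.outputs.files.singleFile
    long thresholdBytes = 20L * 1024 * 1024   // the arbitrary 20 MB threshold suggested above
    if (sourceZip.length() > thresholdBytes) {
        task.logger.lifecycle("The sources zip ${sourceZip.name} is ${sourceZip.length() >> 20} MB. " +
            "Consider running the writeScmPluginJson task and adding excludes to trim it.")
    }
}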
