yahooarchive / oozie
Oozie - workflow engine for Hadoop
Home Page: http://yahoo.github.com/oozie/
License: Apache License 2.0
This is in response to an Oracle Bug 9577583: False ORA-942 or other errors when multiple schemas have identical object names.
http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
Currently a workflow XML is the 'workflow.xml' file under the HDFS directory specified in the job property 'oozie.wf.application.path'.
This means that a given HDFS directory can have only one workflow app (the workflow.xml file).
In many cases it is desirable to share configurations and binaries among multiple workflow apps.
Today this is not possible.
Proposal:
1* If 'oozie.wf.application.path' points to an HDFS directory, the workflow app is 'workflow.xml' (today's behavior).
2* If 'oozie.wf.application.path' points to an XML file in HDFS, the workflow app is the specified file path and the workflow app directory (for all resources and binaries) is the parent directory.
This proposal preserves backwards compatibility.
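Under this proposal, the two forms of the property would look like this in job.properties (host names and paths are illustrative):

```
# today's behavior: points to a directory, the app is <dir>/workflow.xml
oozie.wf.application.path=hdfs://namenode:8020/user/joe/apps/wf1

# proposed: points to an XML file; the parent dir holds shared resources/binaries
oozie.wf.application.path=hdfs://namenode:8020/user/joe/apps/wf2.xml
```

This lets several workflow XML files share a single directory of configuration and binaries.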
It is using "users"; it should use the method getTestGroup() instead.
According to Alejandro, these have been deprecated.
When I run it, I get:
The assembly file should take care of that
The following variables need to be redefined: VC_REV, VC_URL
Currently Oozie requires the JT and NN kerberos principals to be in the WF job properties when submitting a job.
Hadoop has built-in rules to create these principals (i.e. mapred/_HOST@${local.realm}).
Oozie should leverage those rules when the WF job properties do not include the JT/NN kerberos principals, thus not requiring them as mandatory WF job properties on WF submission.
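In Hadoop this resolution logic lives in SecurityUtil.getServerPrincipal; the self-contained sketch below only illustrates the _HOST substitution convention Oozie would rely on (the class and method names here are made up for the example, not Oozie or Hadoop API):

```java
// Illustrative sketch of the Hadoop principal-resolution rule: replace the
// _HOST placeholder in a principal such as "mapred/_HOST@EXAMPLE.COM" with
// the concrete service hostname. Not the actual Hadoop implementation.
public class PrincipalResolver {
    static final String HOST_PLACEHOLDER = "_HOST";

    public static String resolve(String principalConfig, String hostname) {
        // a principal has the shape name/host@REALM
        String[] parts = principalConfig.split("[/@]");
        if (parts.length != 3 || !parts[1].equals(HOST_PLACEHOLDER)) {
            return principalConfig; // no placeholder, use the value as-is
        }
        return parts[0] + "/" + hostname.toLowerCase() + "@" + parts[2];
    }

    public static void main(String[] args) {
        // prints mapred/jt01.example.com@EXAMPLE.COM
        System.out.println(resolve("mapred/_HOST@EXAMPLE.COM", "jt01.example.com"));
    }
}
```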
In workflow.xml, when the following line is used:
/tmp/a.jar#a.jar
the jar is added to the distributed cache ("mapred.cache.files") only.
But this jar also needs to be on the Java classpath ("java.class.path").
The correct way of doing this would be using an SPNEGO filter on the server side.
Ideally authentication should be pluggable, allowing support for cookie-based auth, certs, etc.
Script defines a function and therefore fails on systems where /bin/sh is not bash.
Suggest:
--- setup-maven.sh.orig 2010-09-08 16:10:20.000000000 -0700
+++ setup-maven.sh 2010-09-07 16:14:39.000000000 -0700
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
Provide client support for Oozie bundles, provide corresponding support in servlets,
and further provide a bundle engine similar to DagEngine and CoordinatorEngine.
The parent issue of bundles is GH-49 [http://github.com/yahoo/oozie/issues/#issue/49].
There is a 'git info' type tool here: http://justamemo.com/2009/02/09/git-info-almost-like-svn-info/ which may help.
There is some deprecated code that can be removed, such as OozieSchema.java and Schema.java.
Oozie should use fully qualified (schema-qualified) names for the database objects.
Add support for Hive actions in workflows.
This would be via a new action executor and an extension schema.
The generated Oozie WAR does not include the Hadoop JARs.
Add a build option that would force the inclusion of the Hadoop JARs used for building Oozie.
Default behavior should be the current one (no Hadoop JARs in the WAR).
The invocation of the Oozie CLI uses an invalid $EXECCLASS variable.
It should be removed.
Current examples don't work against a cluster running Kerberos.
Also, the setup of the examples (what is done by the prepare script) is convoluted and confuses users.
Currently Oozie will materialize a coordinator job right after job submission, even if the job will only run in the far future.
We need to modify CoordJobMatLookupCommand so that it also checks the materialization start time (set to the job's start time for a newly submitted job) against the current time. Only if it falls into a valid range (say, within one hour in the future) do we proceed with materialization.
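The proposed check can be sketched as follows (class and method names are illustrative, not the actual CoordJobMatLookupCommand API; the one-hour window is the value suggested above):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the lookup-window check: only materialize a newly submitted
// coordinator job if its start time falls within a configured lookahead
// window of the current time.
public class MatLookupCheck {
    // one hour of lookahead, as suggested in the issue
    static final Duration LOOKAHEAD = Duration.ofHours(1);

    public static boolean shouldMaterialize(Instant jobStartTime, Instant now) {
        // materialize only if the start time is not more than LOOKAHEAD ahead
        return !jobStartTime.isAfter(now.plus(LOOKAHEAD));
    }
}
```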
Current test users are '${user.name}, test, test2, test3' and the current test group is 'testg'.
Many test cases (166) fail if the test user used for Oozie is not ${user.name}.
For example, default values for the test users should be testuser1, testuser2, testuser3, testuser4 and the test group 'testgroup1', with users 2 & 3 belonging to it.
Methods in XTestCase should be renamed to be aligned with the default values.
Currently the main POM contains the internal repo reference (a local dir) for plugins only; it should be there for artifacts also.
It would be good to have the ability to supply a comma-separated list of jars in an 'archive' tag instead of putting each jar on a new line. The Hadoop distributed cache allows listing a comma-separated list of files.
This can be done with a new MapReduceMain class.
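The parsing step is straightforward; a minimal sketch (the class name is made up for the example, not part of Oozie) that splits such a value the way Hadoop accepts comma-separated file lists:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: split a comma-separated archive/file list into
// individual entries, trimming whitespace and dropping empty items.
public class ArchiveListParser {
    public static List<String> split(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```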
Oozie currently has two levels of abstraction: workflow and coordinator.
This issue proposes another abstraction called 'bundle' that will batch a set of coordinator applications. The user will be able to start/stop/suspend/resume/rerun in the bundle level.
The proposed high-level requirements to support bundle are enumerated below:
<bundle-app name="MY_BUNDLE" xmlns="uri:oozie:bundle:0.1">
  <controls>
    <kick-off-time>2009-02-02T00:00Z</kick-off-time>
  </controls>
  <coordinator>
    <configuration>
      <property>
        <name>START_TIME</name>
        <value>2009-02-01T00:00Z</value>
      </property>
      ...
    </configuration>
    <app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
  </coordinator>
  <coordinator>
    <configuration>
      <property>
        <name>END_TIME</name>
        <value>2010-02-01T00:00Z</value>
      </property>
      ...
    </configuration>
    <app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
  </coordinator>
</bundle-app>
The servlets receiving a job submission/rerun should resolve values with variables to their concrete values before proceeding with the submission.
For example, with b=A defined, the value
a=${b}
should be resolved to
a=A
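A minimal sketch of such a resolution step (illustrative only, not Oozie's actual implementation; the class name is made up): substitute ${name} references with their concrete values from the submitted properties.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Resolve ${name} variable references in a property value against a map of
// known properties; unknown references are left untouched.
public class VarResolver {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String resolve(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // fall back to the literal ${name} text if the property is unknown
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```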
I ran a coordinator job under the current mode, then checked its status:
[chaow@pressglass examples]$ oozie job -info 0000007-100727102157647-oozie-chao-C
ID Created Nominal Time
...
0000007-......@2 2010-07-27 19:41 2010-07-27 19:40
...
We see that creation time is after nominal time, which is not correct.
Note here the cluster is not stressed at all - so actions should be created a bit earlier than the nominal time.
Create commands for the bundles logic. The parent issue of bundles is GH-49 [http://github.com/yahoo/oozie/issues/#issue/49].
The core/pom.xml has ${maven.compile.encoding} as the encoding, which produces a build warning.
Using UTF-8 instead removes the warning.
Currently every workflow that uses a pig action must bundle the Pig JAR in the workflow lib/ directory.
This is also true for commonly used JAR files across different workflow apps.
By adding a share lib job property, which is added as a secondary lib/ directory, all common JARs (Pig, Hive) can be placed in a /usr/share/lib directory in HDFS and used by multiple workflow applications without having a private copy per workflow app.
The location of HDFS share lib would be specified as job property (a later rev of the workflow XML schema may add support for it too)
According to the quickstart (http://yahoo.github.com/oozie/releases/2.2.0/DG_QuickStart.html), the distribution tar.gz contains an oozie.war file.
But http://github.com/yahoo/oozie/tarball/oozie-2.2.0 contains no such artifact in the tar.gz.
Worse, when I try to build with "mvn clean package" or "mvn clean package assembly:single", maven fails with:
[INFO] Building Oozie Core
[INFO] task-segment: [clean, package]
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] 'add-resource' was specified in an execution, but not found in the plugin
Has anyone else been able to use and/or build the 2.2.0 distribution?
The index.html file that launches oozie console references the RowExpander.js file from "ext-2.2/RowExpander.js". The correct location of this file however is "ext-2.2/examples/grid/RowExpander.js".
This causes the console to not show up correctly.
The main POM file contains references to snapshot repositories which are not needed to build Oozie.
The core and example POM have references to commons-cli 2.0 that are not needed.
The POMs of hadoop artifacts used by Oozie have references to commons-cli 2.0 which are not needed and they should be excluded.
Like for the Oozie version occurrences, add a comment next to the groupId, i.e.:
<groupId>com.yahoo.oozie</groupId> <!-- OOZIE_GROUP_ID -->
This annotation enables easy replacement of the value via scripting, as is already done with the version value.
To make Git ignore Maven, Eclipse, IntelliJ, Structure101 and other build files/dirs.
POMs have groupId and version parameterized with a property from the main POM.
The same should be done for the artifactIds to enable use of JARs available under alternate artifact names.
As with groupId and version, the default values should remain as today.
The following code in the PriorityDelayQueue.java class:

public int compareTo(Delayed o) {
    return (int) (getDelay(TimeUnit.MILLISECONDS) - o.getDelay(TimeUnit.MILLISECONDS));
}

should be replaced by something like this:

public int compareTo(Delayed o) {
    long diff = getDelay(TimeUnit.MILLISECONDS) - o.getDelay(TimeUnit.MILLISECONDS);
    if (diff > 0) {
        return 1;
    } else if (diff < 0) {
        return -1;
    } else {
        return 0;
    }
}
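The cast-based version is wrong because truncating a long difference to int can flip or zero out the sign. This self-contained snippet demonstrates the overflow (the demo class is made up for illustration; Long.compare behaves like the expanded if/else fix):

```java
public class CompareOverflowDemo {
    // the buggy pattern: truncate a long difference to int
    public static int buggyCompare(long a, long b) {
        return (int) (a - b);
    }

    // overflow-safe, equivalent to the corrected if/else version
    public static int fixedCompare(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long a = 0L;
        long b = -(1L << 32); // difference is 2^32, which truncates to 0
        System.out.println(buggyCompare(a, b)); // prints 0: claims equal, wrong
        System.out.println(fixedCompare(a, b)); // prints 1: a > b, correct
    }
}
```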
One workaround would be to run the tests as test user and have the environment setup correctly for that. The other solution would be to by default use the current user for test purposes, and overwrite that to another user where necessary.
Currently, the way the minicluster is set up requires test users to exist in the UNIX environment. This is quite inconvenient, since it requires those who would like to run unit tests to play the role of system administrator on the system: adding test users, adding a test group, and mapping the test users to the test group.
According to the Hadoop development team, there's a better way of achieving the same based on UserGroupInformation.createUserForTesting.
All we have to do is set up the test users in XTestCase the same way we're setting up other aspects of the minicluster.
If the user selects cleanup but there is no "output-events" element in the coordinator XML, the code will throw a NullPointerException (NPE):

private void cleanupOutputEvents(Element eAction, String user, String group) {
    Element outputList = eAction.getChild("output-events", eAction.getNamespace());
    for (Element data : (List) outputList.getChildren("data-out", eAction.getNamespace())) {

Line 3 will throw the NPE.
Solution: guard the loop with a null check:

if (outputList != null) {
    for (Element data : (List) outputList.getChildren("data-out", eAction.getNamespace())) {
        ......
    }
}
mvn install from http://svn.apache.org/repos/asf/commons/sandbox/cli2/trunk creates an artifact with this info:
groupId: org.apache.commons
artifactId: commons-cli2
version: 2.0-SNAPSHOT
this page: http://yahoo.github.com/oozie/releases/2.2.0/DG_QuickStart.html
links to both http://yahoo.github.com/oozie/downloads (as displayed in the HTML) and http://yahoo.github.com/oozie/releases/2.2.0/Http://yahoo.github.com/oozie/downloads.html as the actual link target. Neither is valid.
Currently it requires the user to manually download and expand the extjs ZIP file.
addtowar.sh should also handle the case where the ZIP file is given directly.