
liquibase-impala's Introduction

Table of contents

  1. About liquibase-impala
  2. Notes on compatibility
  3. How to install
  4. How to use
  5. How to test locally

About liquibase-impala

Liquibase-impala is a Liquibase extension that adds support for Impala and Hive.

Notes on compatibility

As of version 1.1.x the plugin was tested and should work with the following versions of external dependencies:

Dependency | Versions
Liquibase | 3.5.2, 3.5.3
Impala JDBC driver | Cloudera Impala JDBC 2.5.32
Hive JDBC driver | Cloudera Hive JDBC 2.5.18

Version 1.2.x

Dependency | Versions
Liquibase | 3.5.2
Impala JDBC driver | Cloudera Impala JDBC 2.6.4
Hive JDBC driver | Cloudera Hive JDBC 2.6.2

Other configurations are likely to work too so you are encouraged to test with your versions. Let us know when you do!

How to install

version 1.1.x

As of version 1.1.x liquibase-impala depends on proprietary Cloudera connectors for Impala and Hive. These are not present in any public Maven repositories. Therefore, to build and install the plugin, you must do the following:

  1. Download Impala JDBC driver and its dependencies from http://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-32.html
  2. Download Hive JDBC driver from http://www.cloudera.com/downloads/connectors/hive/jdbc/2-5-18.html
  3. Unpack and install the following dependencies in your local Maven repository, using standard Maven command: mvn install:install-file -Dfile=${file} -DgroupId=${groupId} -DartifactId=${artifactId} -Dversion=${version} -Dpackaging=jar
file | groupId | artifactId | version
ql.jar | com.cloudera.impala.jdbc | ql | 2.5.32
hive_metastore.jar | com.cloudera.impala.jdbc | hive_metastore | 2.5.32
hive_service.jar | com.cloudera.impala.jdbc | hive_service | 2.5.32
ImpalaJDBC41.jar | com.cloudera.impala.jdbc | ImpalaJDBC41.jar | 2.5.32
TCLIServiceClient.jar | com.cloudera.impala.jdbc | TCLIServiceClient.jar | 2.5.32
HiveJDBC41.jar | com.cloudera.hive.jdbc | HiveJDBC41.jar | 2.5.18
  4. (optional, but recommended) Deploy the above artifacts to an internal, private Maven repository such as Nexus or Artifactory, for subsequent use.
  5. Build liquibase-impala by executing mvn clean install. This will install liquibase-impala in your local Maven repo and create a liquibase-impala.jar fat-jar in the target/ directory.
  6. (optional, but recommended) Deploy liquibase-impala to your internal, private Maven repository.

version 1.2.x

  1. Download Impala JDBC driver and its dependencies from https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-4.html
  2. Download Hive JDBC driver from https://www.cloudera.com/downloads/connectors/hive/jdbc/2-6-2.html
  3. Unpack and install the following dependencies in your local Maven repository, using standard Maven command: mvn install:install-file -Dfile=${file} -DgroupId=${groupId} -DartifactId=${artifactId} -Dversion=${version} -Dpackaging=jar
file | groupId | artifactId | version
ImpalaJDBC41.jar | com.cloudera.impala.jdbc | ImpalaJDBC41 | 2.6.4
HiveJDBC41.jar | com.cloudera.hive.jdbc | HiveJDBC41 | 2.6.2
  4. (optional, but recommended) Deploy the above artifacts to an internal, private Maven repository such as Nexus or Artifactory, for subsequent use.
  5. Build liquibase-impala by executing mvn clean install. This will install liquibase-impala in your local Maven repo and create a liquibase-impala.jar fat-jar in the target/ directory.
  6. (optional, but recommended) Deploy liquibase-impala to your internal, private Maven repository.
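
The install step for version 1.2.x can be scripted. The following is a minimal sketch, assuming the two jars from the table have already been unpacked into one directory. DRIVER_DIR and DRY_RUN are hypothetical variable names introduced for this example; by default the script only prints the Maven commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: install the unpacked Cloudera driver jars for version 1.2.x
# into the local Maven repository.
# DRIVER_DIR and DRY_RUN are hypothetical names used only in this example.
set -eu

DRIVER_DIR="${DRIVER_DIR:-.}"   # directory holding the unpacked jars
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = run them

# Print (or run) one "mvn install:install-file" call.
# Arguments: $1 = file, $2 = groupId, $3 = artifactId, $4 = version
install_jar() {
  set -- mvn install:install-file \
    "-Dfile=${DRIVER_DIR}/$1" "-DgroupId=$2" "-DartifactId=$3" \
    "-Dversion=$4" -Dpackaging=jar
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One call per row of the 1.2.x table.
install_all() {
  install_jar ImpalaJDBC41.jar com.cloudera.impala.jdbc ImpalaJDBC41 2.6.4
  install_jar HiveJDBC41.jar com.cloudera.hive.jdbc HiveJDBC41 2.6.2
}

install_all
```

Running it with DRY_RUN=0 executes the commands instead of printing them. The same pattern should carry over to the 1.1.x table by swapping in its rows.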

How to use

There are two distinct ways liquibase-impala can be used to manage your Impala or Hive database.

with a Maven plugin

To use liquibase-impala in concert with liquibase-maven-plugin:

  1. Make sure liquibase-impala is present in your local or remote (internal) Maven repo.
  2. Add the following to your pom.xml file:
<build>
  <plugins>
    <!-- (...) -->
    <plugin>
      <groupId>org.liquibase</groupId>
      <artifactId>liquibase-maven-plugin</artifactId>
      <version>${liquibase.version}</version>
      <dependencies>
        <!-- (...) -->
        <dependency>
          <groupId>org.liquibase.ext.impala</groupId>
          <artifactId>liquibase-impala</artifactId>
          <version>${liquibase.impala.version}</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
  3. Run Liquibase as you normally would using the Maven plugin, for example:
mvn liquibase:update \
  -Dliquibase.changeLogFile=changelog/changelog.xml \
  -Dliquibase.driver=com.cloudera.hive.jdbc41.HS2Driver \
  -Dliquibase.username=<user> \
  -Dliquibase.password=<password> \
  -Dliquibase.url='jdbc:hive2://<host>:<port>/<database>;UID=<user>;UseNativeQuery=1'

with a standalone liquibase binary

  1. Make sure that liquibase is on your $PATH.
  2. Modify liquibase.properties according to your Impala/Hive endpoint.
  3. Put the liquibase-impala fat-jar on your classpath, e.g. under ${LIQUIBASE_HOME}/lib.
  4. Start the migration, e.g.: liquibase update
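
As an illustration of step 2, here is a minimal sketch that writes a liquibase.properties file for a Hive endpoint. The host, port, and database values are placeholders, and HIVE_HOST, HIVE_PORT, HIVE_DB, and PROPS_FILE are hypothetical variable names introduced for this example. The driver class and URL shape are taken from the Maven example above.

```shell
#!/bin/sh
# Sketch: generate a minimal liquibase.properties for a Hive endpoint.
# All variable names and default values below are assumptions for
# illustration; replace them with your own endpoint details.
set -eu

HIVE_HOST="${HIVE_HOST:-hive.example.com}"
HIVE_PORT="${HIVE_PORT:-10000}"
HIVE_DB="${HIVE_DB:-default}"
PROPS_FILE="${PROPS_FILE:-liquibase.properties}"

# Write the three core settings: changelog location, driver class, JDBC URL.
write_properties() {
  cat > "$PROPS_FILE" <<EOF
changeLogFile: changelog/changelog.xml
driver: com.cloudera.hive.jdbc41.HS2Driver
url: jdbc:hive2://${HIVE_HOST}:${HIVE_PORT}/${HIVE_DB};UseNativeQuery=1
EOF
}

write_properties
cat "$PROPS_FILE"
```

For Impala, the driver class and URL would need to be changed to match the Impala JDBC driver.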

Liquibase-impala specific configuration

Liquibase-impala provides a number of additional configuration parameters that can be used to influence its behaviour:

parameter | values | description
liquibase.lock | true (default), false | enables/disables locking facility for a given job
liquibase.syncDDL | true (default), false | wraps every statement with SYNC_DDL
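
The text above does not say how these parameters are supplied. The sketch below assumes they can be passed as -D system properties, in the same way as the other liquibase.* settings in the Maven example; treat that as an assumption to verify against your setup. The helper function name is hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a Maven invocation that sets the two extension parameters.
# Assumption (not confirmed by the README): liquibase.lock and
# liquibase.syncDDL are read as -D system properties.
set -eu

# Succeeds only for the two documented values.
is_bool() {
  [ "$1" = "true" ] || [ "$1" = "false" ]
}

# Arguments: $1 = value for liquibase.lock, $2 = value for liquibase.syncDDL
build_update_command() {
  is_bool "$1" && is_bool "$2" || {
    echo "values must be true or false" >&2
    return 1
  }
  echo "mvn liquibase:update -Dliquibase.lock=$1 -Dliquibase.syncDDL=$2"
}

# Example: disable both locking and SYNC_DDL wrapping.
build_update_command false false
```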

How to test locally

The examples/run.sh script performs basic integration testing against Impala and Hive, which includes:

  • update execution
  • tag execution
  • rollback execution

The script can be executed with the command ./run.sh <both|hive|impala> PATH_TO_LIQUIBASE_HOME

liquibase-impala's People

Contributors

dependabot[bot], eselyavka, turu


liquibase-impala's Issues

Update error - Impala

Hey,

I'm trying to run the update command with Liquibase 4.3.2, using the Impala driver built with the Maven command, but I keep getting the same error:

Unexpected error running Liquibase: [Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Syntax error in line 1:
... VARCHAR(255), CONSTRAINT PK_DATABASECHANGELOGLOCK PRI...

Is there any way to alter the databasechangeloglock configuration?

Tag command doesn't work properly

I've built the latest liquibase-impala 1.2-SNAPSHOT and tried commands from run.sh on Hive.
Update and rollback work fine, but the tag command has a problem: there are no errors during execution, but it doesn't add a new row to the databasechangelog table.
Could you test the tag command with the Hive driver?

COMPILATION ERROR : error using latest Java and maven

Getting COMPILATION ERROR : error using latest Java and maven

$ mvn clean install
[INFO] Scanning for projects...
[INFO]
[INFO] -------------< org.liquibase.ext.impala:liquibase-impala >--------------
[INFO] Building liquibase-impala 1.3.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ liquibase-impala ---
[INFO] Deleting C:\Users*\Downloads\liquibase-impala-fix-support_for_apache_hive_jdbc_driver\liquibase-impala-fix-support_for_apache_hive_jdbc_driver\target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ liquibase-impala ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Users*\Downloads\liquibase-impala-fix-support_for_apache_hive_jdbc_driver\liquibase-impala-fix-support_for_apache_hive_jdbc_driver\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.0:compile (default-compile) @ liquibase-impala ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 51 source files to C:\Users*
*\Downloads\liquibase-impala-fix-support_for_apache_hive_jdbc_driver\liquibase-impala-fix-support_for_apache_hive_jdbc_driver\target\classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] Source option 6 is no longer supported. Use 7 or later.
[ERROR] Target option 6 is no longer supported. Use 7 or later.
[INFO] 2 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.033 s
[INFO] Finished at: 2021-06-16T13:40:33-04:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project liquibase-impala: Compilation failure: Compilation failure:
[ERROR] Source option 6 is no longer supported. Use 7 or later.
[ERROR] Target option 6 is no longer supported. Use 7 or later.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

3 MINGW64 ~/Downloads/liquibase-impala-fix-support_for_apache_hive_jdbc_driver/liquibase-impala-fix-support_for_apache_hive_jdbc_driver

Make non-public Cloudera connector dependencies optional

When using liquibase on a Cloudera cluster via a binary release, it's convenient to have a liquibase-impala fat-jar with all necessary (non-public) Hive and Impala dependencies bundled inside. However, by declaring a strict compile dependency on those libraries, we are preventing ourselves from being able to:

  1. Release liquibase-impala to a public Maven repository, so that people running liquibase via Maven plugin can start using liquibase-impala with no extra-installation steps at all (provided that they already have Cloudera connectors or are planning to use other drivers instead: see #5 ).
  2. Make a binary release on Github, so that people running liquibase via a binary can start using liquibase-impala with no installation steps at all (provided that they already have Cloudera connectors or are planning to use other drivers instead: see #5 ) - other than downloading a jar from Github that is.
  3. Allow people to use liquibase-impala with different versions of Hive/Impala drivers than those declared in liquibase-impala pom.

To alleviate this situation, we could make non-public (or all non-essential) Cloudera dependencies optional and place them in a Maven profile (see https://cwiki.apache.org/confluence/display/MAVENOLD/Profiles+for+optional+dependencies). That way, all of the below scenarios would be easily achievable:

  1. Creating a fat-jar with all present Cloudera dependencies bundled inside - just like now.
  2. Releasing liquibase-impala to a public Maven repo, without hardcoded dependencies on any particular drivers, so that people using liquibase via Maven plugin could use any driver in any version they like.
  3. Creating a binary release on Github, without any driver dependencies bundled inside, so that people using a binary release of Liquibase can use any driver in any version they like.

Add support for Apache Hive(2) Driver

Currently, both HiveDatabase and ImpalaDatabase are coupled to Cloudera-specific JDBC connectors. However, there is a large number of use-cases and people who use the Apache Hive2 JDBC driver. We should make it possible for them to use liquibase-impala. There should be nothing preventing it from working with the Apache driver.

As a user I would like to be able to set org.apache.hive.jdbc.HiveDriver as my liquibase driver. Also, this driver should not be bundled in liquibase-impala, since it's a publicly available library present in all major Maven repos.

Updates / Deletes not supported in HIVE

Hi,

I want to give this plugin a try, but the problem is that updates and deletes are not supported in Hive.

As I understand it, it is not possible to enable ACID transactions through the JDBC driver.

Even though this could perhaps be managed at the instance level, the Liquibase technical tables must be pre-created with buckets, stored as ORC, and with transactional=true.

Is there a way to overcome the error below?

DEBUG 9/24/19 4:56 PM: liquibase: Executing EXECUTE database command: DELETE FROM DATABASECHANGELOGLOCK
Unexpected error running Liquibase: [Cloudera]HiveJDBCDriver ERROR processing query/statement. Error Code: 10294, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while compiling statement: FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.:17:16, org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:400, org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:187, org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:271, org.apache.hive.service.cli.operation.Operation:run:Operation.java:337, org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:439, org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:416, org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:282, org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:501, org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313, org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298, org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39, org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39, org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:747, org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286, java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149, java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624, java.lang.Thread:run:Thread.java:748, *org.apache.hadoop.hive.ql.parse.SemanticException:Attempt to do update or delete using transaction manager that does not support these 
operations.:21:5, org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer:analyzeInternal:UpdateDeleteSemanticAnalyzer.java:65, org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer:analyze:BaseSemanticAnalyzer.java:223, org.apache.hadoop.hive.ql.Driver:compile:Driver.java:558, org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1356, org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1343, org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:185], sqlState:42000, errorCode:10294, errorMessage:Error while compiling statement: FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.), Query: DELETE FROM lrp.databasechangeloglock. [Failed SQL: DELETE FROM LRP.DATABASECHANGELOGLOCK]

Usage with a Kerberized Hive

Hi,

I got this plugin working in a non-Kerberized environment, but I'm getting login problems with a Kerberized one, in particular: "Unexpected error running Liquibase: java.sql.SQLException: [Cloudera]HiveJDBCDriver Error initialized or created transport for authentication: Peer indicated failure: Unsupported mechanism type PLAIN." This happens after a kinit too.

My liquibase.properties is as follows:
changeLogFile: /path/db.changelog.xml
driver: com.cloudera.hive.jdbc41.HS2Driver
url: jdbc:hive2://HOSTNAME:10000/DATABASE;principal=hive/HOSTNAME@DOMAIN

What am I doing wrong? Is Kerberos supported?

Thank you in advance for the answer.
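
One thing that may be worth checking, though it is not confirmed anywhere in this thread: the principal=... URL suffix is the convention of the Apache Hive JDBC driver, whereas the Cloudera Hive JDBC driver documents Kerberos settings as AuthMech=1 together with KrbRealm, KrbHostFQDN, and KrbServiceName. The sketch below builds such a URL; the function name is hypothetical, and the argument values are the placeholders from the post above.

```shell
#!/bin/sh
# Sketch: compose a Kerberos-enabled JDBC URL in the style documented for
# the Cloudera Hive JDBC driver (AuthMech=1 selects Kerberos).
# This is a possible configuration to try, not a confirmed fix.
set -eu

# Arguments: $1 = HiveServer2 host FQDN, $2 = port,
#            $3 = database, $4 = Kerberos realm
build_kerberos_url() {
  echo "jdbc:hive2://$1:$2/$3;AuthMech=1;KrbRealm=$4;KrbHostFQDN=$1;KrbServiceName=hive"
}

# Placeholders mirror the liquibase.properties shown in the post.
build_kerberos_url HOSTNAME 10000 DATABASE DOMAIN
```

The KrbServiceName value here assumes the service principal is hive/HOSTNAME@DOMAIN, as in the original URL.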

Error truncating DATABASECHANGELOGLOCK table in Hive, when running as a maven plugin

Hi,

First of all I would like to say well done to you for creating this extension - it's a great step forward in automating db management in Hadoop warehousing world.

I've been experimenting a bit with liquibase-impala on our testing cluster and ran into the following issue:

When running liquibase for Hive as a Maven plugin, via mvn liquibase:do_sth (which may or may not have something to do with my issue), liquibase fails to execute the liquibase:update goal with locking turned on, because it issues a malformed TRUNCATE SQL statement on the databasechangeloglock table. The TABLE keyword is missing after TRUNCATE. The following message appears:
[Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 40000, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while compiling statement: FAILED: ParseException line 1:9 missing TABLE at 'DATABASECHANGELOGLOCK' near '<EOF>':17:16,

This is due to the fact that TruncateGenerator is selected instead of HiveTruncateGenerator, despite both of them supporting that particular call.

Closer inspection reveals that indeed the TABLE keyword is missing from TruncateGenerator. See

String sql = "TRUNCATE " + database.escapeTableName(statement.getCatalogName(), statement.getSchemaName(), statement.getTableName());

A quick fix could be to add the missing keyword to TruncateGenerator. It shouldn't break anything, since the extended 'TRUNCATE TABLE' syntax is supported by both Hive and Impala. However, perhaps the real fix would be to adjust the priorities of these generators, or even remove HiveTruncateGenerator altogether after applying that SQL syntax fix? I can submit a PR once we agree on the right way to resolve this issue.

What do you think @eselyavka ?

Make a public, non-snapshot release of liquibase-impala

As a user, I would like to be able to:

  1. Declare dependency in pom of my project, on a non-snapshot version of liquibase-impala, accessible in a public Maven repo without any installation steps required - assuming that I also separately declare dependency on any Hive/Impala driver I like.
  2. Download a binary release of liquibase-impala from Github and start using it without any additional steps required - assuming that I also separately put any Hive/Impala driver I like on the classpath.

Depends on: #2, #3, #5, #6

Truncate table DATABASECHANGELOGLOCK not working

Hey Guys,

I tried to use this plugin to connect to an AWS Glue metastore and apply change sets. When I run the liquibase update command, it creates the DATABASECHANGELOGLOCK table successfully, but when it tries to truncate the table, it fails.

Can anyone please help me figure out what the issue might be?

Logs:
(Screenshot of the logs, 2022-05-26 at 3:32:39 PM; image not available.)
