This repository contains the source code for the Beeswax JDBC driver, a tool for connecting to Impala through the JDBC interface.
Impala is an open source project by Cloudera that uses a SQL dialect (HiveQL) to query data from a Hadoop cluster.
Impala is also available on GitHub.
To use the driver, you need a properly installed Hadoop cluster environment with Impala running on it. The DNS settings in particular have to be consistent, because Hadoop relies on hostnames frequently. We assume that your Impala service is started and configured correctly. If you can use Impala through the command line interface, you are all set.
We built this project using our JDBC driver generation stack. To build the driver yourself, you first need to install these tools in your local Maven repository (mvn install). They are open source software too, and you can check them out on GitHub:
JDBC-Annotations - only needed at build time
JDBC-Utils - runtime dependency
Build them by following the instructions on their project sites.
This project is built with Maven, so Maven must be installed on your system. Run 'mvn package' in the root directory of the project (where the project's POM file is located). A successful build generates a zip file in the "target" folder, containing the driver jar and its dependencies in a "lib" subdirectory.
Note that the driver's tests require some data in Impala. Unless that data is available, you should skip the tests (for example with Maven's standard '-DskipTests' option).
After building, extract the generated zip file into a directory of your choice. You can then connect with an arbitrary tool (SQuirreL SQL, Execute Query, etc.) or from the Java programming language. The fully qualified driver class name is:
de.tiq.beeswax.jdbc.BeeswaxDriver
The connection URL for the driver looks like this:
jdbc:beeswax://host:port/
Example:
jdbc:beeswax://slave1.impala.cloudera:21000/default
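Connecting from Java could look roughly like the following minimal sketch. It uses only the standard java.sql API together with the driver class name and example URL given above. The class BeeswaxExample and the helper buildUrl are hypothetical names introduced for illustration, "SELECT 1" is a placeholder query, and the sketch assumes that loading the driver class registers it with DriverManager and that no credentials are required. Adjust these to your setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class BeeswaxExample {

    // Fully qualified driver class name, as stated in this README.
    static final String DRIVER_CLASS = "de.tiq.beeswax.jdbc.BeeswaxDriver";

    // Hypothetical helper: assembles a URL of the form jdbc:beeswax://host:port/database.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:beeswax://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Host, port and database are taken from the example URL above.
        String url = buildUrl("slave1.impala.cloudera", 21000, "default");
        try {
            // Assumption: loading the class registers the driver with DriverManager.
            Class.forName(DRIVER_CLASS);
            // Assumption: the connection needs no user name or password.
            try (Connection con = DriverManager.getConnection(url);
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) { // placeholder query
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        } catch (ClassNotFoundException e) {
            System.err.println("Driver jar is not on the classpath: " + e.getMessage());
        } catch (SQLException e) {
            System.err.println("Connection or query failed: " + e.getMessage());
        }
    }
}
```

To run it, the driver jar and the jars from the "lib" directory of the extracted zip file need to be on the classpath.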
Note that this driver is in beta state, so errors may occur while running queries.
Please report them to us using the issue tracker on the GitHub site.
This project is developed by TIQ Solutions GmbH, a German enterprise for data quality management. You can contact us: [email protected]
The project is dual-licensed. For non-commercial projects, the source code is provided under the terms of the GPL. If you wish to include this project in a proprietary context, you must obtain a special vendor license.