exasol / athena-virtual-schema Goto Github PK
View Code? Open in Web Editor NEWVirtual Schema for connecting Athena as a data source to Exasol
License: MIT License
Virtual Schema for connecting Athena as a data source to Exasol
License: MIT License
We have added a few new functions to the common part recently.
We need to check if some dialects could support them. The list of the new function capabilities to check:
Virtual-schema-shared-integration-tests provide generic integration tests.
Refactor this repository to use that library.
It seems that meanwhile it is mandatory to define a WorkGroup. In my case, queries wouldn't run without this parameter.
Update dependencies to use enhanced Datatype Detection For Result Sets from virtual-schemas-common-jdbc
Hi, I can create a virtual schema, but I cannot query it.
CREATE VIRTUAL SCHEMA VS_ufos USING ADAPTER.ATHENA_JDBC_ADAPTER WITH SQL_DIALECT = 'ATHENA' CONNECTION_NAME = 'FS_UFOS' SCHEMA_NAME = 'fs-eu-west-1';
Error:
An error occurred while performing the operation: VM error: F-UDF-CL-LIB-1126: F-UDF-CL-SL-JAVA-1006: F-UDF-CL-SL-JAVA-1026: com.exasol.ExaUDFException: F-UDF-CL-SL-JAVA-1068: Exception during singleCall adapterCall java.lang.AssertionError: E-ID-2: Unable to create identifier "fs-eu-west-1" because it contains illegal characters. For information about valid identifiers, please refer to https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html com.exasol.adapter.dialects.athena.AthenaIdentifier.of(AthenaIdentifier.java:49) com.exasol.adapter.dialects.athena.AthenaSqlDialect.applyQuote(AthenaSqlDialect.java:112) com.exasol.adapter.dialects.SqlGenerationVisitor.visit(SqlGenerationVisitor.java:184) com.exasol.adapter.dialects.SqlGenerationVisitor.visit(SqlGenerationVisitor.java:30) com.exasol.adapter.sql.SqlTable.accept(SqlTable.java:46) com.exasol.adapter.dialects.SqlGenerationVisitor.visit(SqlGenerationVisitor.java:78) com.exasol.adapter.dialects.SqlGenerationVisitor.visit(SqlGenerationVisitor.java:30) com.exasol.adapter.sql.SqlStatementSelect.accept(SqlStatementSelect.java:104) com.exasol.adapter.dialects.AbstractQueryRewriter.createPushdownQuery(AbstractQueryRewriter.java:62) com.exasol.adapter.dialects.AbstractQueryRewriter.rewrite(AbstractQueryRewriter.java:48) com.exasol.adapter.dialects.AbstractSqlDialect.rewriteQuery(AbstractSqlDialect.java:134) com.exasol.adapter.jdbc.JdbcAdapter.pushdown(JdbcAdapter.java:210) com.exasol.adapter.RequestDispatcher.dispatchPushDownRequestToAdapter(RequestDispatcher.java:142) com.exasol.adapter.RequestDispatcher.processRequest(RequestDispatcher.java:80) com.exasol.adapter.RequestDispatcher.executeAdapterCall(RequestDispatcher.java:52) com.exasol.adapter.RequestDispatcher.adapterCall(RequestDispatcher.java:41) com.exasol.ExaWrapper.runSingleCall(ExaWrapper.java:100) (Session: 1703537717580726272)
It is probably due to the minus characters in the schema_name. I think the adapter would have to put it in backticks as described here:
https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html
Thank you for investigating.
The latest version of the JDBC driver requires the outbound traffic for port 444 to be enabled in Exasol.
Way to test it:
The users have problems because of the default length of the String data type reported by the JDBC driver (255).
We map the columns according to the information we get from the JDBC driver.
So in the case of Athena String data type, the driver reports the length 255 by default:
The driver also provides a way to change this value. You need to specify the argument in the connection string. For example, I want to set my String column's length to 1000. This is my connection:
CREATE OR REPLACE CONNECTION ATHENA_CONNECTION
TO 'jdbc:awsathena://AwsRegion=eu-west-1;S3OutputLocation=s3://virtual-schemas-test-bucket-2/test/sampledb;StringColumnLength=1000'
USER 'user'
IDENTIFIED BY 'pass';
When I use this connection, all String columns in Virtual Schema have size 1000 instead of the default 255.
I found out that the current Athena JDBC driver does not have the version number in the name (as suggested in the user guide):
https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html
Also, maybe you could add a section that describes how to connect if you need a VPC endpoint for Athena AND Glue (these are two different endpoints!). The current example only works when no endpoint is needed.
Or just link this article:
https://aws.amazon.com/premiumsupport/knowledge-center/athena-connection-timeout-jdbc-odbc-driver/
Of course it is all fully documented here, also:
https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC-2.0.23.1000/docs/Simba+Athena+JDBC+Connector+Install+and+Configuration+Guide.pdf
Here an example that I used to make it work for my case:
CREATE OR REPLACE CONNECTION FS_UFOS
TO 'jdbc:awsathena://vpce-06aabbxx-iook4aaao-eu-west-1b.athena.eu-west-1.vpce.amazonaws.com:443;S3OutputLocation=s3://fstestbla1/athena_temp;WorkGroup=primary'
USER 'AKIAKAAKI'
IDENTIFIED BY 'passYpassXpassZ';
For tables contained columns starting with underscore VSATHENA added backticks around the column name.
This leads to an error on Athena side.
However, the query runs fine when adding double quotes instead.
I assume AthenaIdentifier.java, line 24 is the corresponding line in the code.
To my understanding it should always be doublequotes, backticks are only used for CREATE TABLE statements in Athena.
SQL_DIALECT = 'ATHENA' is given in the example.
This throws an error. Please adapt the example and delete this property definition, then it works ;-)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.