
exasol / virtual-schema-common-document-files


This Virtual Schema allows you to access documents stored in files like any regular Exasol table.

License: MIT License

Java 99.97% Shell 0.03%
exasol virtual-schema json bucketfs exasol-integration


Virtual Schema for Files


Virtual Schema Common Document Files allows you to query data stored in document files as if it were stored in a regular Exasol database table.

This module is part of a larger project called Virtual Schemas, which covers document-based as well as JDBC-based dialects; see the complete list of dialects.

Document-based virtual schemas are characterized by

  • a storage, that is, a container that hosts the document files and defines the access control and the type of account needed to access them, and
  • a document type defining the format of the document containing the data.

Storage Variants

You cannot use this adapter directly. Instead, use one of the dialects for the specific storage variants below.

If this list does not contain your file source, you can implement your own file source.

Document Types

Each storage variant can contain documents using any of the following supported document types:

You can also add support for other document types.

Integration Tests

VSDF builds and publishes a test-jar with common integration tests for document-oriented virtual schemas that can be used by any derived virtual schema. The derived virtual schema only needs to extend class com.exasol.adapter.document.files.AbstractDocumentFilesAdapterIT to inherit all common integration tests.

Performance Regression Tests

AbstractDocumentFilesAdapterIT also contains performance regression tests tagged with regression.

Changes in Performance Regression Tests

The following changes to the performance regression tests might influence comparability of test results:

  • Version 7.3.1
    • CSV tests now use all six data types (string, boolean, integer, double, date and timestamp) instead of only string. The column count is unchanged.
    • Test names in the test report changed. They now use suffix () instead of (TestInfo).

Additional Information

virtual-schema-common-document-files's People

Contributors

ckunki, jakobbraun, kaklakariada, morazow, pj-spoelders


Forkers

rohankumardubey

virtual-schema-common-document-files's Issues

Hidden columns can cause crash

Currently, using a Virtual Schema column in ORDER BY without also including it in the select list can cause a database crash.

Example:

SELECT ID FROM BOOKS ORDER BY PRICE;

Workaround:

Avoid having such columns only in ORDER BY by adding them to the select list in a nested statement.

SELECT ID FROM (SELECT ID, PRICE FROM BOOKS ORDER BY PRICE);

Root cause:

This is caused by SPOT-11018 (internal). Once this is fixed, activate the relevant test marked with SPOT-11018.

Dependency check fails

Error:  Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (default-cli) on project virtual-schema-common-document-files: Detected 1 vulnerable components:
Error:    ch.qos.reload4j:reload4j:jar:1.2.18.3:compile; https://ossindex.sonatype.org/component/pkg:maven/ch.qos.reload4j/reload4j@1.2.18.3?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
Error:      * 1 vulnerability found (8.6); https://ossindex.sonatype.org/vulnerability/sonatype-2022-5401

Upgrade dependencies

Error:  Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.1.0:audit (default-cli) on project virtual-schema-common-document-files: Detected 1 vulnerable components:
Error:    org.postgresql:postgresql:jar:42.2.23.jre7:test; https://ossindex.sonatype.org/component/pkg:maven/org.postgresql/postgresql@42.2.23.jre7?utm_source=ossindex-client&utm_medium=integration&utm_content=1.1.1
Error:      * [CVE-2022-21724] pgjdbc is the offical PostgreSQL JDBC Driver. A security hole was found in the j... (9.8); https://ossindex.sonatype.org/vulnerability/0f319d1b-e964-4471-bded-db3aeb3c3a29?component-type=maven&component-name=org.postgresql.postgresql&utm_source=ossindex-client&utm_medium=integration&utm_content=1.1.1

Fix vulnerabilities in dependencies

[ERROR] Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (default-cli) on project virtual-schema-common-document-files: Detected 10 vulnerable components:
[ERROR]   com.squareup.okhttp:okhttp:jar:2.7.5:compile; https://ossindex.sonatype.org/component/pkg:maven/com.squareup.okhttp/okhttp@2.7.5?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [CVE-2021-0341] CWE-295: Improper Certificate Validation (7.5); https://ossindex.sonatype.org/vulnerability/CVE-2021-0341?component-type=maven&component-name=com.squareup.okhttp%2Fokhttp&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [sonatype-2018-0035] CWE-20: Improper Input Validation (5.9); https://ossindex.sonatype.org/vulnerability/sonatype-2018-0035?component-type=maven&component-name=com.squareup.okhttp%2Fokhttp&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   io.netty:netty-common:jar:4.1.72.Final:test; https://ossindex.sonatype.org/component/pkg:maven/io.netty/netty-common@4.1.72.Final?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [CVE-2022-24823] CWE-668: Exposure of Resource to Wrong Sphere (5.5); https://ossindex.sonatype.org/vulnerability/CVE-2022-24823?component-type=maven&component-name=io.netty%2Fnetty-common&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   org.apache.hadoop:hadoop-common:jar:3.3.1:compile; https://ossindex.sonatype.org/component/pkg:maven/org.apache.hadoop/hadoop-common@3.3.1?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [CVE-2022-26612] CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') (9.8); https://ossindex.sonatype.org/vulnerability/CVE-2022-26612?component-type=maven&component-name=org.apache.hadoop%2Fhadoop-common&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   com.google.guava:guava:jar:31.0.1-jre:compile; https://ossindex.sonatype.org/component/pkg:maven/com.google.guava/guava@31.0.1-jre?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [sonatype-2020-0926] CWE-379: Creation of Temporary File in Directory with Incorrect Permissions (6.2); https://ossindex.sonatype.org/vulnerability/sonatype-2020-0926?component-type=maven&component-name=com.google.guava%2Fguava&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   io.netty:netty-handler:jar:4.1.72.Final:test; https://ossindex.sonatype.org/component/pkg:maven/io.netty/netty-handler@4.1.72.Final?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [sonatype-2020-0026] CWE-300: Channel Accessible by Non-Endpoint ('Man-in-the-Middle') (6.5); https://ossindex.sonatype.org/vulnerability/sonatype-2020-0026?component-type=maven&component-name=io.netty%2Fnetty-handler&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   org.apache.xmlrpc:xmlrpc-common:jar:3.1.3:test; https://ossindex.sonatype.org/component/pkg:maven/org.apache.xmlrpc/xmlrpc-common@3.1.3?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [CVE-2016-5003] CWE-502: Deserialization of Untrusted Data (9.8); https://ossindex.sonatype.org/vulnerability/CVE-2016-5003?component-type=maven&component-name=org.apache.xmlrpc%2Fxmlrpc-common&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [CVE-2016-5002] CWE-611: Improper Restriction of XML External Entity Reference ('XXE') (7.8); https://ossindex.sonatype.org/vulnerability/CVE-2016-5002?component-type=maven&component-name=org.apache.xmlrpc%2Fxmlrpc-common&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   com.google.protobuf:protobuf-java:jar:2.5.0:compile; https://ossindex.sonatype.org/component/pkg:maven/com.google.protobuf/protobuf-java@2.5.0?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [CVE-2021-22569] CWE-400: Uncontrolled Resource Consumption ('Resource Exhaustion') (5.5); https://ossindex.sonatype.org/vulnerability/CVE-2021-22569?component-type=maven&component-name=com.google.protobuf%2Fprotobuf-java&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   commons-codec:commons-codec:jar:1.11:compile; https://ossindex.sonatype.org/component/pkg:maven/commons-codec/commons-codec@1.11?utm_source=ossindex-client&utm_medium=integration&utm_content=1.1.1
[ERROR]     * [sonatype-2012-0050] CWE-20: Improper Input Validation (5.3); https://ossindex.sonatype.org/vulnerability/sonatype-2012-0050?component-type=maven&component-name=commons-codec%2Fcommons-codec&utm_source=ossindex-client&utm_medium=integration&utm_content=1.1.1
[ERROR]   org.apache.xmlrpc:xmlrpc-client:jar:3.1.3:test; https://ossindex.sonatype.org/component/pkg:maven/org.apache.xmlrpc/xmlrpc-client@3.1.3?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [CVE-2016-5004] CWE-400: Uncontrolled Resource Consumption ('Resource Exhaustion') (6.5); https://ossindex.sonatype.org/vulnerability/CVE-2016-5004?component-type=maven&component-name=org.apache.xmlrpc%2Fxmlrpc-client&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]   com.google.code.gson:gson:jar:2.2.4:compile; https://ossindex.sonatype.org/component/pkg:maven/com.google.code.gson/gson@2.2.4?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]     * [sonatype-2021-1694] CWE-502: Deserialization of Untrusted Data (7.5); https://ossindex.sonatype.org/vulnerability/sonatype-2021-1694?component-type=maven&component-name=com.google.code.gson%2Fgson&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1

Virtual Schema uses too many workers for small amount of files

If only a few files (for example, 2) are loaded, the virtual schema still uses all available workers. Starting them adds significant overhead and leads to poor performance.

Workaround

Set the property MAX_PARALLEL_UDFS = 1.
This will force the virtual schema to use only one worker.

Example:

CREATE VIRTUAL SCHEMA FILES_VS_TEST USING ADAPTER.S3_FILES_ADAPTER WITH
    CONNECTION_NAME = 'S3_CONNECTION'
    MAPPING         = '/bfsdefault/default/path/to/mappings/in/bucketfs'
    MAX_PARALLEL_UDFS = 1;
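
To make the problem concrete, here is a minimal sketch of how a worker count could be capped by the number of files. The class `WorkerCountSketch` and the method `workersFor` are hypothetical names used only for illustration. They are not taken from the adapter's code, and the real scheduling logic may differ.

```java
/** Hypothetical sketch; not the adapter's actual scheduling logic. */
final class WorkerCountSketch {
    private WorkerCountSketch() {
        // utility class, no instances
    }

    /**
     * Caps the number of parallel workers at the number of files, the number
     * of available workers, and the configured maximum (MAX_PARALLEL_UDFS).
     * At least one worker is always used.
     */
    static int workersFor(final int fileCount, final int availableWorkers, final int maxParallelUdfs) {
        final int capped = Math.min(Math.min(fileCount, availableWorkers), maxParallelUdfs);
        return Math.max(1, capped);
    }
}
```

With this rule, two files on a cluster with 16 available workers would start two workers instead of 16, and setting the maximum to 1 would always yield a single worker.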

Tests interfere

Currently, the tests influence each other because the files in BucketFS are not cleaned up before each test run.

Solution:

Add a timestamp prefix to the test files.
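
One possible shape for such a prefix is sketched below. The class `TestFilePrefix`, its methods, and the timestamp pattern are assumptions made for illustration. The project may implement this differently.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper; not part of the project's code base. */
final class TestFilePrefix {
    // UTC timestamp with millisecond resolution, e.g. 20220501-120000-000
    private static final DateTimeFormatter FORMAT = DateTimeFormatter
            .ofPattern("yyyyMMdd-HHmmss-SSS").withZone(ZoneOffset.UTC);

    private TestFilePrefix() {
        // utility class, no instances
    }

    /** Builds the prefix for one test run from the given instant. */
    static String forInstant(final Instant instant) {
        return FORMAT.format(instant) + "-";
    }

    /** Prepends the run prefix to a file name before the upload. */
    static String prefixed(final String runPrefix, final String fileName) {
        return runPrefix + fileName;
    }
}
```

A test run would call `TestFilePrefix.forInstant(Instant.now())` once and apply the result to every file it uploads, so files left over from earlier runs no longer match the names the current run expects.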
