jlmorton / tableau

Fast, multi-threaded CSV to Tableau Data Extract (TDE) conversion. Thin wrapper around the Tableau SDK.

License: Apache License 2.0

Tableau SDK Wrapper

This utility wraps the Tableau SDK to parse CSV files and convert them into Tableau Data Extracts.

The Tableau Extract API is not thread-safe when inserting a row into an extract, but the work of parsing the CSV and generating a Tableau Row can be multi-threaded. This utility lets you specify the number of threads to use when generating an extract. Row insertion is synchronized, so higher thread counts yield diminishing returns.
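
The threading model described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the utility's actual code: RowSink is a hypothetical stand-in for the non-thread-safe extract, the comma split is a placeholder for real CSV parsing, and none of the names come from the Tableau SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelParseSketch {

    /** Hypothetical stand-in for the extract: NOT thread-safe, like the SDK's row insert. */
    static final class RowSink {
        private final List<String[]> rows = new ArrayList<>();
        void insert(String[] row) { rows.add(row); }
        int size() { return rows.size(); }
    }

    /** Parses lines on a pool of worker threads; only the insert is serialized. */
    public static int convert(List<String> csvLines, int threads) {
        RowSink sink = new RowSink();
        Object insertLock = new Object();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : csvLines) {
            pool.submit(() -> {
                // Parallel part: a naive comma split (no quoting support) stands in for CSV parsing.
                String[] row = line.split(",", -1);
                // Serialized part: only one thread at a time may touch the sink.
                synchronized (insertLock) {
                    sink.insert(row);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        synchronized (insertLock) {
            return sink.size();
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            lines.add("foo" + i + "," + i + ",true");
        }
        System.out.println(convert(lines, 4) + " rows inserted");
    }
}
```

In this sketch rows reach the sink in whatever order the workers finish, so input order is not preserved. The synchronized block is the serial bottleneck that limits the benefit of adding threads.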

Included in the utility is a thin wrapper to publish an extract to Tableau.

The latest version is 1.2.1, which is available here.

Performance

On my small dual-core MacBook Pro, I see the following performance:

 1 Thread: 28,286 rows/second
 2 Threads: 40,072 rows/second
 3 Threads: 44,624 rows/second

On my 4-core 2015 iMac, I see the following performance:

 1 Thread: 37,034 rows/second
 2 Threads: 53,494 rows/second
 3 Threads: 60,181 rows/second
 4 Threads: 66,868 rows/second
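
To put these figures in perspective, the short sketch below computes speedup relative to the single-thread run and the resulting per-thread efficiency. The class and method names are made up for illustration. By this arithmetic the iMac reaches about 1.8x at four threads and the MacBook Pro about 1.6x at three, which matches the diminishing returns described earlier.

```java
public class ThroughputSpeedup {

    /** Speedup of a multi-thread run relative to the single-thread run. */
    public static double speedup(double singleThreadRowsPerSec, double multiThreadRowsPerSec) {
        return multiThreadRowsPerSec / singleThreadRowsPerSec;
    }

    /** Parallel efficiency: speedup divided by thread count (1.0 would be perfect scaling). */
    public static double efficiency(double speedup, int threads) {
        return speedup / threads;
    }

    public static void main(String[] args) {
        // rows/second figures quoted above for the 4-core iMac; index = thread count - 1
        double[] imac = {37034, 53494, 60181, 66868};
        for (int i = 0; i < imac.length; i++) {
            double s = speedup(imac[0], imac[i]);
            System.out.printf("%d thread(s): speedup %.2fx, efficiency %.0f%%%n",
                    i + 1, s, 100 * efficiency(s, i + 1));
        }
    }
}
```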

Platforms

I have tested this on CentOS 7 and macOS High Sierra. On Linux, the Tableau SDK supports Fedora 18 and later, CentOS 7 and later, and Ubuntu 12.04 and later. Support for CentOS 6 was removed in version 10.2 of the SDK.

If you encounter a problem such as java.lang.UnsatisfiedLinkError: Unable to load library TableauCommon: /lib64/libc.so.6: version 'GLIBC_2.14' not found, your OS is unfortunately not supported. Consider running in a Docker container (see below).

I have not tested this on Windows. Certainly the various shell scripts will not work, but you should be able to install the SDK for Windows and invoke Java directly. Any pull requests to add better support for Windows would be appreciated.

Docker

There are public images for this project on Docker Hub. You can simply mount a Docker volume and invoke this utility within the container. Note that the host path of a Docker volume must be absolute. Assuming you've cloned this project to a folder called "tableau" in your home directory, the command below will create a Tableau Extract using a sample CSV and schema:

PATH_TO_CLONED_PROJECT="$HOME/tableau"
docker run --rm -it -v "$PATH_TO_CLONED_PROJECT":/build jlmorton/tableau-sdk-wrapper:latest /opt/tableau-sdk-wrapper/bin/extract.sh \
  -o /build/sample.tde \
  -s /build/samples/sample-schema.json \
  -f /build/samples/sample-extract.csv -t 4

This will download the latest image from Docker Hub, run a container, and attempt to build a TDE extract named sample.tde in your ~/tableau folder, using samples/sample-extract.csv and samples/sample-schema.json from that folder.

Dependencies

This library uses the Tableau SDK to create and publish Tableau extracts. The SDK is not available in the Maven Central repository. The SDK license allows distribution, but I've chosen to exclude it from this repository.

Instead, this repository includes a small shell script, bin/install_tableau_sdk.sh. The script downloads the SDK, extracts it to the lib folder in the top-level repository directory, and then runs mvn install to add the Java dependencies to your local Maven repository.

The utility requires Java 8.

Installation

Download the current release distribution and unzip it. On Linux, run bin/install_tableau_sdk.sh to install the Tableau SDK. On other platforms, install the Tableau SDK for your platform manually.

Building

After installing the Tableau SDK, simply run mvn install.

Schema

This utility expects a schema file that describes the data types in the CSV file. The schema is in JSON format. The utility handles strings (CHAR_STRING), booleans (BOOLEAN), dates (DATE), dates with times (DATETIME), integers (INTEGER), and doubles (DOUBLE).

Here is a sample schema file:

{
  "schemaName": "Sample",
  "schema": {
    "foo": "CHAR_STRING",
    "bar": "INTEGER",
    "baz": "BOOLEAN",
    "bax": "DOUBLE",
    "test": "DATE",
    "test_time": "DATETIME"
  }
}
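
As a rough illustration of what the six type names imply for each CSV field, here is a minimal Java sketch that maps them to parsers. It is not the utility's own code. The class and method names are hypothetical, the date formats are assumed to be ISO-8601, the boolean parsing is simply Java's default, and it assumes CSV fields arrive in the same order as the schema's columns.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class SchemaTypesSketch {

    /** The six type names the schema file accepts, each paired with an illustrative parser. */
    public enum ColumnType {
        CHAR_STRING(s -> s),
        BOOLEAN(Boolean::parseBoolean),
        INTEGER(Integer::parseInt),
        DOUBLE(Double::parseDouble),
        // Assumption: ISO-8601 text such as 2018-01-31 and 2018-01-31T12:30:00.
        DATE(LocalDate::parse),
        DATETIME(LocalDateTime::parse);

        private final Function<String, Object> parser;

        ColumnType(Function<String, Object> parser) {
            this.parser = parser;
        }

        public Object parse(String raw) {
            return parser.apply(raw);
        }
    }

    /** Converts one CSV record to typed values, walking the schema in column order. */
    public static Object[] parseRow(Map<String, ColumnType> schema, String[] fields) {
        if (fields.length != schema.size()) {
            throw new IllegalArgumentException(
                    "expected " + schema.size() + " fields, got " + fields.length);
        }
        Object[] row = new Object[fields.length];
        int i = 0;
        for (ColumnType type : schema.values()) {
            row[i] = type.parse(fields[i]);
            i++;
        }
        return row;
    }

    public static void main(String[] args) {
        // Mirrors the sample schema above; LinkedHashMap keeps the declared column order.
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        schema.put("foo", ColumnType.CHAR_STRING);
        schema.put("bar", ColumnType.INTEGER);
        schema.put("baz", ColumnType.BOOLEAN);
        schema.put("bax", ColumnType.DOUBLE);
        schema.put("test", ColumnType.DATE);
        schema.put("test_time", ColumnType.DATETIME);

        String[] fields = "hello,42,true,3.5,2018-01-31,2018-01-31T12:30:00".split(",");
        for (Object value : parseRow(schema, fields)) {
            System.out.println(value.getClass().getSimpleName() + ": " + value);
        }
    }
}
```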

Usage

  usage: java -jar tableau.jar
   -a,--append             Append to existing extract
   -c,--project <arg>      Project name to publish to
   -d,--datasource <arg>   Name of datasource to publish
   -e,--extract <arg>      Filename of extract to publish
   -f,--file <arg>         CSV file to import
   -h,--help
   -n,--username <arg>     Tableau Server username for publishing
   -o,--output <arg>       Output file name, or name of existing extract in
                         append mode
   -p,--publish            Publish an extract to Tableau (requires
                         --extract, --site, --project, --datasource,
                         --username --password, and --url,
   -s,--site <arg>         Tableau site name to publish
   -t,--threads <arg>      Number of threads (default: 1)
   -u,--url <arg>          Tableau Server URL for publishing
   -x,--password <arg>     Tableau Server password for publishing

Creating an Extract

./bin/extract.sh -o MyExtract.tde -s samples/test.schema -f samples/test.csv -t 2

Publishing an Extract

./bin/publish.sh -e MyExtract.tde -u https://my-tableau-server -n username -x password -s tableau-site-name -p project-name -d datasource-name

Note: If you need a proxy server to publish extracts, the Tableau SDK respects the standard http_proxy and https_proxy environment variables. The SDK also exposes hooks to set the proxy username and password, but this wrapper does not currently implement them.

The Tableau Server user that publishes the extract must have permission to publish a datasource.
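
If you launch the publish script from another Java program, one way to apply the proxy note above is to set the two environment variables on the child process. The sketch below is only an illustration: the class and method names are hypothetical, the proxy address and all credentials are placeholders, and it assumes publish.sh accepts the flags shown in the command above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class PublishWithProxySketch {

    /** Builds the publish.sh argument list using the flags from the command above. */
    public static List<String> publishCommand(String extract, String url, String user,
                                              String password, String site,
                                              String project, String datasource) {
        return Arrays.asList("./bin/publish.sh",
                "-e", extract, "-u", url, "-n", user, "-x", password,
                "-s", site, "-p", project, "-d", datasource);
    }

    /** Prepares a process whose environment carries both proxy variables. */
    public static ProcessBuilder withProxy(List<String> command, String proxyUrl) {
        ProcessBuilder builder = new ProcessBuilder(command);
        Map<String, String> env = builder.environment();
        env.put("http_proxy", proxyUrl);
        env.put("https_proxy", proxyUrl);
        // Forward the script's stdout and stderr to this process's console.
        return builder.inheritIO();
    }

    public static void main(String[] args) throws Exception {
        // Every value here is a placeholder, including the proxy address.
        List<String> command = publishCommand("MyExtract.tde", "https://my-tableau-server",
                "username", "password", "tableau-site-name", "project-name", "datasource-name");
        int exitCode = withProxy(command, "http://proxy.example.com:3128").start().waitFor();
        System.out.println("publish.sh exited with " + exitCode);
    }
}
```

Run it from the top-level repository directory so that the relative ./bin/publish.sh path resolves.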

License

This software is licensed under the Apache 2.0 license.

tableau's Issues

publish error

I ran the following command with my own TDE file, Tableau Server address, and account:
./bin/publish.sh -e MyExtract.tde -u https://my-tableau-server -n username -x password -s tableau-site-name -p project-name -d datasource-name
It failed with this message:

/opt/tableau-sdk-wrapper/bin/publish.sh: illegal option -- u
Usage: /opt/tableau-sdk-wrapper/bin/publish.sh -e <extract path> -s <site name> -d <datasource name>

Test failures and build fail

Hi,

I can't seem to build the project. Log:

[tableau-jlmorton] mvn install
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------< net.jlmorton:tableau >------------------------
[INFO] Building tableau 1.2.1
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ tableau ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ tableau ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ tableau ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ tableau ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ tableau ---
[INFO]
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running net.jlmorton.tableau.CsvInputSourceTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.053 s - in net.jlmorton.tableau.CsvInputSourceTest
[INFO] Running net.jlmorton.tableau.MultiThreadedExtractWriterTest
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.674 s <<< FAILURE! - in net.jlmorton.tableau.MultiThreadedExtractWriterTest
[ERROR] testCreateExtract(net.jlmorton.tableau.MultiThreadedExtractWriterTest)  Time elapsed: 0.674 s  <<< ERROR!
java.lang.UnsatisfiedLinkError: Unable to load library 'TableauCommon': dlopen(libTableauCommon.dylib, 9): image not found
	at net.jlmorton.tableau.MultiThreadedExtractWriterTest.testCreateExtract(MultiThreadedExtractWriterTest.java:16)

[INFO] Running net.jlmorton.tableau.RowWriterTest
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.001 s <<< FAILURE! - in net.jlmorton.tableau.RowWriterTest
[ERROR] testCreateRow(net.jlmorton.tableau.RowWriterTest)  Time elapsed: 0 s  <<< ERROR!
java.lang.UnsatisfiedLinkError: Unable to load library '/Library/Frameworks/TableauExtract.framework/TableauExtract': dlopen(/Library/Frameworks/TableauExtract.framework/TableauExtract, 9): image not found
	at net.jlmorton.tableau.RowWriterTest.testCreateRow(RowWriterTest.java:13)

[INFO] Running net.jlmorton.tableau.SchemaTest
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0 s <<< FAILURE! - in net.jlmorton.tableau.SchemaTest
[ERROR] testColumnOrdering(net.jlmorton.tableau.SchemaTest)  Time elapsed: 0 s  <<< ERROR!
java.lang.UnsatisfiedLinkError: Unable to load library '/Library/Frameworks/TableauExtract.framework/TableauExtract': dlopen(/Library/Frameworks/TableauExtract.framework/TableauExtract, 9): image not found
	at net.jlmorton.tableau.SchemaTest.setUp(SchemaTest.java:16)

[ERROR] testReadFromJson(net.jlmorton.tableau.SchemaTest)  Time elapsed: 0 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class com.tableausoftware.extract.Extract
	at net.jlmorton.tableau.SchemaTest.setUp(SchemaTest.java:16)

[INFO] Running net.jlmorton.tableau.utilities.BooleanParserTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in net.jlmorton.tableau.utilities.BooleanParserTest
[INFO] Running net.jlmorton.tableau.utilities.DateParserTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in net.jlmorton.tableau.utilities.DateParserTest
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   MultiThreadedExtractWriterTest.testCreateExtract:16 » UnsatisfiedLink Unable t...
[ERROR]   RowWriterTest.testCreateRow:13 » UnsatisfiedLink Unable to load library '/Libr...
[ERROR]   SchemaTest.setUp:16 » UnsatisfiedLink Unable to load library '/Library/Framewo...
[ERROR]   SchemaTest.setUp:16 NoClassDefFound Could not initialize class com.tableausoft...
[INFO]
[ERROR] Tests run: 10, Failures: 0, Errors: 4, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  2.916 s
[INFO] Finished at: 2019-05-13T10:55:25+07:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on project tableau: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/dansmith/Downloads/tableau-jlmorton/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
