embulk-output-parquet's People

Contributors

choplin, cosmo0920, fs-wu, yuokada

embulk-output-parquet's Issues

Access error when AWS_SECRET_ACCESS_KEY includes "/"

Output fails when the path uses the URL form s3a://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@bucket/keys
and the AWS_SECRET_ACCESS_KEY contains the character "/".
Would it be possible to add "access_key_id" and "secret_access_key" parameters, like the ones the S3 file output plugin has?
Or should we wait for the release of Hadoop 2.8?
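
To illustrate why a "/" in the secret breaks a URL of the form s3a://ACCESS_KEY:SECRET_KEY@bucket/keys, here is a minimal sketch using Python's standard urllib.parse as a stand-in for generic URI parsing. It is not Hadoop's own parser, and the key, secret, and bucket names are made-up placeholders. The first "/" after s3a:// ends the authority component, so the rest of the secret and the bucket name are treated as part of the path.

```python
from urllib.parse import urlsplit

# Hypothetical placeholder credentials; the secret contains "/".
access_key = "AKIAEXAMPLE"
secret_key = "abc/def"

url = f"s3a://{access_key}:{secret_key}@bucket/keys"
parts = urlsplit(url)

# The authority stops at the first "/", so the secret is cut short
# and the bucket name ends up inside the path.
print(parts.netloc)  # AKIAEXAMPLE:abc
print(parts.path)    # /def@bucket/keys
```

Passing the credentials as separate properties, such as the fs.s3a.access.key and fs.s3a.secret.key entries under extra_configurations shown in the "Encoded Data as Values" issue on this page, keeps the secret out of the URL entirely.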

extra_configurations for 'hadoop.security.authentication' does not work

I am trying to import data and export it to HDFS using the embulk-output-parquet plugin.
But when I run Embulk with the settings below, this exception occurs: "Error: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] "

in:
  type: mysql
  host: 192.168.65.2
  port: 3306
  user: user
  password: "pwd"
  database: testdb
  table: testtable
  options: { useLegacyDatetimeCode: true }
out:
  type: parquet
  path_prefix: hdfs://{url}/tmp/embulk
  extra_configurations:
    hadoop.security.authentication: 'KERBEROS'
  config_files:
    - {HADOOP_CONFIG_HOME}/hdfs-site.xml
    - {HADOOP_CONFIG_HOME}/core-site.xml

The Kerberos settings are all correct. I inserted code to check whether 'hadoop.security.authentication' is really set to 'SIMPLE', but at that point the 'KERBEROS' value had been applied correctly. My guess is that whatever string is set, it is overwritten with 'SIMPLE' somewhere.

os : hadoop
embulk : v0.97
hadoop : hadoop-2.6.0-cdh5.14.2

Related issue I found: civitaspo/embulk-output-hdfs#17
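
One way to narrow this down is to check what the files listed under config_files actually set for the property. The following is a minimal diagnostic sketch and not part of the plugin: the read_property helper and the inline sample XML are hypothetical, and it assumes the standard Hadoop layout of property elements (each with a name and a value) inside a configuration element.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of a Hadoop property from *-site.xml text, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return prop.findtext("value")
    return None


# Inline sample standing in for a real core-site.xml (hypothetical content).
sample = """
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
"""

print(read_property(sample, "hadoop.security.authentication"))  # kerberos
```

For a real check, read the text of the hdfs-site.xml and core-site.xml files referenced in the config and pass it to read_property in place of the sample.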

Is GCS not supported?

Looking at the docs, I cannot find any mention of GCS.
Is the plugin not compatible with GCS?

--log-level is not respected

If I run:

embulk run --log-level error my_config.yml

I still see info and warning messages printed. I expected to see only error messages.

Thanks.

Encoded Data as Values

Hi, I'm trying to write Parquet into S3 via your plugin. It looks like the values in the data are being encoded in a way that seems to make the Parquet file unusable with AWS Glue / Athena (Presto). When I use the stdout plugin, the data looks correct.

When I try to open the Parquet file with Sublime (which renders it like JSON), I see data that looks like the following:

{"event_id":"MjI0Yzg4ODc0ZjEzYWJjM2Q4OGI3M2NiYWE5NTcwODQ=","event_timestamp":"MjAxOC0wOS0wNCAxNTozMjoxOC4wMDEwMDAgKzAwMDA="}
{"event_id":"ZjQzNmQxMmNkNmFlNGM5ZmJkMTc3OTExOTJmZGY2MmY=","event_timestamp":"MjAxOC0wOS0wNCAxNTozMjoxNi4xNzIwMDAgKzAwMDA="}
…

Here are the relevant parts of the Embulk file:

in:
  type: command
  command: lib/splunk export …
      
  parser:
    type: jsonl
    columns:
      - {name: "event_id", type: string}
      - {name: "event_timestamp", type: timestamp, format: "%Q"}
      
out:
  type: parquet
  path_prefix: s3a://…
  extra_configurations:
    fs.s3a.access.key: 
    fs.s3a.secret.key: 
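
The sample values quoted in this issue look like Base64. As a quick check, the sketch below decodes the first sample row with Python's standard base64 module. It only shows that the strings decode to readable text; whether this rendering is related to the Glue / Athena problem is not established here.

```python
import base64

# Values copied from the first sample row quoted in this issue.
row = {
    "event_id": "MjI0Yzg4ODc0ZjEzYWJjM2Q4OGI3M2NiYWE5NTcwODQ=",
    "event_timestamp": "MjAxOC0wOS0wNCAxNTozMjoxOC4wMDEwMDAgKzAwMDA=",
}

# Decode each Base64 string back to UTF-8 text.
decoded = {k: base64.b64decode(v).decode("utf-8") for k, v in row.items()}
for key, value in decoded.items():
    print(key, "=", value)
# event_id = 224c88874f13abc3d88b73cbaa957084
# event_timestamp = 2018-09-04 15:32:18.001000 +0000
```

The decoded text matches the shape of the input columns (a hex-like id and a timestamp string), so the viewer may simply be showing raw binary column values as Base64 rather than the data itself being corrupted.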
