
yu-iskw / bigquery-to-datastore


Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow

Makefile 2.95% Shell 3.77% Java 92.81% Dockerfile 0.48%
google-datastore bigquery google-cloud google-dataflow apache-beam beam

bigquery-to-datastore's People

Contributors

jontradesy, tadeegan, yu-iskw


bigquery-to-datastore's Issues

How do I auth against my Google Cloud project?

I have a Google Cloud login and am also logged in via the CLI. How do I tell either the JAR or the Docker image to pick up my credentials?
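One standard approach (not specific to this project) is Application Default Credentials, which the Google Cloud Java client libraries pick up automatically. A sketch, assuming a service-account key file at a hypothetical path:

```shell
# Point Application Default Credentials at a service-account key.
# The path is hypothetical -- substitute your own key file.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/my-service-account.json"

# The JAR then picks the key up via ADC:
# java -cp ...bigquery-to-datastore.jar ...

# For Docker, mount the key into the container and set the variable there:
# docker run -v "$HOME/keys:/keys" \
#   -e GOOGLE_APPLICATION_CREDENTIALS=/keys/my-service-account.json \
#   yuiskw/bigquery-to-datastore ...
```

Alternatively, credentials already established with `gcloud auth application-default login` are found by ADC without setting the variable at all.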

Install it with brew tap

It would be nice to be able to install it like this:

brew tap yu-iskw/bigquery-to-datastore
brew install bigquery-to-datastore

Add flag for indexing

What is the reason for setting setExcludedFromIndexes to true? Ideally this would be an additional flag when running the main shell script.

Can specify indexed columns

Overview

As of version 0.2, no values are indexed at all. Some users will likely want to index specific columns.

Command Line Options Spec

java -cp ...bigquery-to-datastore.jar
  ...
  --indexedColumns="age,name"
  ...

Import failing

Hey Yu,

Really great that you put this together. I am finally getting successful builds; however, I am not seeing any data appear in my Datastore. Is there something I am doing wrong?

Output is:

[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building bigquery-to-datastore 0.2
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ bigquery-to-datastore ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /Users/cwilliams/Dropbox/Development/DevOps/Google/interview/bestbuy/bigquery-to-datastore/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:compile (default-compile) @ bigquery-to-datastore ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ bigquery-to-datastore ---
Nov 12, 2017 5:08:37 PM org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory create
INFO: No stagingLocation provided, falling back to gcpTempLocation
Nov 12, 2017 5:08:37 PM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 106 files. Enable logging at DEBUG level to see which files will be staged.
Nov 12, 2017 5:08:37 PM org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Read validate
INFO: Project of TableReference not set. The value of BigQueryOptions.getProject() at execution time will be used.
Nov 12, 2017 5:08:37 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
Nov 12, 2017 5:08:37 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Uploading 106 files from PipelineOptions.filesToStage to staging location to prepare for execution.
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Staging files complete: 106 files cached, 0 files newly uploaded
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding BigQueryIO.Read/Read(BigQueryTableSource) as step s1
Nov 12, 2017 5:08:40 PM org.apache.beam.sdk.io.gcp.bigquery.BigQueryTableSource setDefaultProjectIfAbsent
INFO: Project ID not set in TableReference. Using default project from BigQueryOptions.
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding BigQueryIO.Read/PassThroughThenCleanup/ParMultiDo(Identity) as step s2
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding BigQueryIO.Read/PassThroughThenCleanup/View.AsIterable/View.CreatePCollectionView/ParDo(ToIsmRecordForGlobalWindow) as step s3
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding BigQueryIO.Read/PassThroughThenCleanup/View.AsIterable/View.CreatePCollectionView/CreateDataflowView as step s4
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding BigQueryIO.Read/PassThroughThenCleanup/Create(CleanupOperation)/Read(CreateSource) as step s5
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding BigQueryIO.Read/PassThroughThenCleanup/Cleanup as step s6
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding ParDo(TableRow2Entity) as step s7
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding DatastoreV1.Write/Convert to Mutation/Map as step s8
Nov 12, 2017 5:08:40 PM org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding DatastoreV1.Write/Write Mutation to Datastore as step s9
Dataflow SDK version: 2.1.0
Nov 12, 2017 5:08:42 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To access the Dataflow monitoring console, please navigate to https://console.developers.google.com/project/bestbuy-185314/dataflow/job/2017-11-12_08_08_41-5441556467331747849
Submitted job: 2017-11-12_08_08_41-5441556467331747849
Nov 12, 2017 5:08:42 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To cancel the job using the 'gcloud' tool, run:

gcloud beta dataflow jobs --project=bestbuy-185314 cancel 2017-11-12_08_08_41-5441556467331747849
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.604 s
[INFO] Finished at: 2017-11-12T17:08:42+01:00
[INFO] Final Memory: 34M/113M
[INFO] ------------------------------------------------------------------------

Any ideas?

Best
Chris
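For context, BUILD SUCCESS in the log above only means the pipeline was submitted; the Dataflow job itself runs asynchronously on the service. Its state can be inspected with the job ID printed on the "Submitted job:" line, as sketched here (the `gcloud dataflow jobs describe` call requires the gcloud SDK and is shown commented out):

```shell
# Job ID copied from the "Submitted job:" line in the log above.
JOB_ID="2017-11-12_08_08_41-5441556467331747849"

# With the gcloud SDK installed, the job's current state can be checked:
# gcloud dataflow jobs describe "$JOB_ID" --project=bestbuy-185314
echo "$JOB_ID"
```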

Timestamp Issue

I'm having an issue importing a timestamp back into Datastore.

: com.google.datastore.v1.client.DatastoreException: Invalid PROTO payload received. Timestamp seconds exceeds limit for field: timestampValue, code=INVALID_ARGUMENT
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126)
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:169)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:89)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:1288)
at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.finishBundle(DatastoreV1.java:1260)
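For reference, Datastore's timestampValue is a protobuf Timestamp, which must fall between 0001-01-01 and 9999-12-31 UTC; a BigQuery TIMESTAMP outside that window (e.g. a far-future sentinel date) produces exactly this INVALID_ARGUMENT. A minimal sketch of the bound check, using a hypothetical out-of-range value:

```shell
# Upper bound of a protobuf Timestamp: 9999-12-31T23:59:59Z in epoch seconds.
MAX_SECONDS=253402300799

# A hypothetical sentinel value one second past the limit:
VALUE_SECONDS=253402300800

if [ "$VALUE_SECONDS" -gt "$MAX_SECONDS" ]; then
  echo "timestamp exceeds the Datastore limit"
fi
```

Rows with such values would need to be clamped or filtered before the write step.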

Make a docker image

It would be nice to offer this tool as a Docker image.

docker run yuiskw/bigquery-to-datastore \
  --project=your-gcp-project \
  --runner=DataflowRunner \
  --inputBigQueryDataset=test_dataset \
  --inputBigQueryTable=test_table \
  --outputDatastoreNamespace=test_namespace \
  --outputDatastoreKind=TestKind \
  --parentPaths=Parent1:p1,Parent2:p2 \
  --keyColumn=id \
  --indexedColumns=col1,col2,col3 \
  --tempLocation=gs://test_bucket/test-log/ \
  --gcpTempLocation=gs://test_bucket/test-log/

Attempting to run this

Maybe I am missing something, but when I try to run the job I'm getting this error:

(dfb1d562509e1bce): java.lang.NullPointerException
at com.github.yuiskw.beam.TableRow2EntityFn.convertTableRowToEntity(TableRow2EntityFn.java:149)
at com.github.yuiskw.beam.TableRow2EntityFn.processElement(TableRow2EntityFn.java:55)
