Coder Social home page Coder Social logo

pygbq's Introduction

PyGBQ

Easily integrate data in BigQuery

Example

from pygbq import Client
import requests

client = Client()
token = client.get_secret('secret_name')
headers = {'Authorization': f'Bearer {token}'}
url = ...
data = requests.get(url, headers=headers).json()
response = client.update_table_using_temp(data=data, table_id='mydataset.mytable', how=['id'])

This snippets gets some data from an url and (merges)[https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement] ((upserts)[https://en.wikipedia.org/wiki/Merge_(SQL)]) on id column it to the table mytable in the dataset mydataset.

Install and set up

pip install pygbq

Set up the authentication.

How it works

how=['column1', 'column2', ...]

PyGBQ generates one or many temporary tables that are merged into the target table. During the merge all the columns of the target table are updated. Here's how it looks like:

  1. Split data into batches.
  2. For every batch create mydataset.mytable_tmp_SOMERANDOMPOSTFIX, put it inside and run
MERGE myproject.mydataset.mytable T
USING myproject.mydataset.mytable_tmp_SOMERANDOMPOSTFIX S
ON T.column1 = S.column1 AND T.column2 = S.column2
WHEN NOT MATCHED THEN
	INSERT ROW
WHEN MATCHED THEN
	UPDATE SET
column1 = S.column1,
column2 = S.column2,
column3 = S.column3,
column4 = S.column4
...

how='replace'

  1. Creates a table mydataset.mytable with schema automatically generated by bigquery-schema-generator.
  2. Splits data into batches and inserts it to mydataset.mytable.

how='fail'

Identical to how='replace' except that it fails if mydataset.mytable exists.

how='insert'

Splits data into batches and inserts (appends) it to mydataset.mytable.

For more details look at Documentation section.

Documentation

Here's the documentation with default parameters.

Client

init

from pygbq import Client
client = Client(default_dataset=None, path_to_key=None)

Initalizes a client. You can specify:

  • default_dataset - (str) default dataset that the client will be using to reference tables
  • path_to_key - (str) By default PyGQB uses from google.auth import default to get credentials, but you can specify this parameter if you wish to use from google.auth import load_credentials_from_file instead.

update_table_using_temp

client.update_table_using_temp(data, table_id, how, schema: Union[str, List[dict]] = None, expiration=1, max_insert_num_rows=4000)

Updates table.

  • data - list of dict
  • table_id - (str) Table id, could have one of the following forms:
    • table_name if default_dataset is set
    • dataset_name.table_name
    • project_id.dataset_name.table_name
  • how - (str or List[dict]) Look at How it works section
  • expiration - (float) temporary tables expiration time in hours
  • max_insert_num_rows - (int) how many rows per temporary table is inserted

get_secret

client.get_secret(self, secret_id, version="latest")

Get a secret stored in Secret Manager.

  • secret_id - (str) Secret name
  • version - Secret version

add_secret

client.get_secret(self, secret_id, version="latest")

Adds a new secret version in Secret Manager.

  • secret_id - (str) Secret name
  • data - (str) Secret value

read_jsonl

from pygbq import read_jsonl
read_jsonl(name: str = "data.jsonl")

Reads a new line delimited json.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.