gcp-bigquery-client's Issues

Project model (numeric_id) is incorrect.

I ran the following code.

client
    .project()
    .list(GetOptions::default().max_results(10))
    .await?
    .projects

However, the following error occurred:

Error: RequestError(reqwest::Error { kind: Decode, source: Error("invalid type: string \"88888\", expected u64", line: 8, column: 33) })

Perhaps the type of numeric_id is incorrect: the API returns it as a JSON string, while the model expects a u64.

Therefore, it is necessary to change the field's type from u64 to String.
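
A minimal sketch of the fix; the derive and field attributes are assumptions based on the error message, which shows the REST API serializing numericId as a JSON string:

#[derive(Debug, serde::Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Project {
    /// The API returns e.g. "88888" as a string, so deserializing into u64 fails.
    pub numeric_id: Option<String>,
    // ...remaining fields elided
}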

job: get_job: builder error: unsupported pair

Stacktrace: Oct 28 17:51:18.337 ERROR solana_skip_indexer::helpers::bq_utils: FATAL! builder error: unsupported pair

All we got from the client was "builder error: unsupported pair".

The access token was a vector of u8 with a size of 990.
The rest of the data required to execute get_job is shown below.

(Screenshot of the get_job inputs, taken 2021-10-28 at 5:53 PM, not reproduced here.)

Rationale

The purpose of getting this fixed is not just to keep it working, but to allow users to re-check a job's status should it still be in the 'RUNNING' or 'PENDING' state after awaiting the initial insert, delete, etc. call.
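
For reference, the call pattern this unblocks looks roughly like the following. The get_job signature and the Job/JobStatus field names are assumptions based on the REST jobs.get endpoint, not confirmed crate APIs:

// Poll the job until BigQuery reports it as DONE.
loop {
    let job = client.job().get_job(project_id, job_id, None).await?;
    let state = job
        .status
        .as_ref()
        .and_then(|s| s.state.as_deref())
        .unwrap_or("UNKNOWN");
    if state == "DONE" {
        break;
    }
    // Still PENDING or RUNNING: wait and check again.
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;
}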

struct ServiceAccount is used but not available

Hi, this might be an issue with my understanding of the whole situation, but I ran into a problem using service-account keys.

Authenticating a client using

    /// Constructs a new BigQuery client from a [`ServiceAccountKey`].
    /// # Argument
    /// * `sa_key` - A GCP Service Account Key `yup-oauth2` object.
    /// * `readonly` - A boolean setting whether the acquired token scope should be readonly.
    ///
    /// [`ServiceAccountKey`]: https://docs.rs/yup-oauth2/*/yup_oauth2/struct.ServiceAccountKey.html
    pub async fn from_service_account_key(sa_key: ServiceAccountKey, readonly: bool) -> Result<Self, BQError> {
        ClientBuilder::new()
            .build_from_service_account_key(sa_key, readonly)
            .await
    }

references the struct ServiceAccountKey (the struct originates in the yup-oauth2 crate).

While using the default features of this crate, I cannot find ServiceAccountKey, so I cannot create the struct that is needed as a parameter.

I am not sure how to handle this, or whether it is even related to this crate, since I don't know who is responsible for including what.
Any hint is highly appreciated.

Workaround

A workaround for me is to read the key-file of the service account and use

from_service_account_key_file(sa_key_file: &str) -> Result<Self, BQError>
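
Another option is to depend on yup-oauth2 directly and construct the key yourself. A sketch, assuming the yup-oauth2 version in your Cargo.toml matches the one gcp-bigquery-client uses (otherwise the ServiceAccountKey types will not line up):

use yup_oauth2::ServiceAccountKey;

async fn make_client(json_key: &str) -> Result<gcp_bigquery_client::Client, gcp_bigquery_client::error::BQError> {
    // parse_service_account_key takes the JSON contents of a key file.
    let sa_key: ServiceAccountKey =
        yup_oauth2::parse_service_account_key(json_key).expect("invalid service account key JSON");
    gcp_bigquery_client::Client::from_service_account_key(sa_key, false).await
}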

Nice lib btw!

README example is vulnerable to SQL injection

The example in the README uses the following code to build a query:

    // Query
    let mut rs = client
        .job()
        .query(
            project_id,
            QueryRequest::new(format!(
                "SELECT COUNT(*) AS c FROM `{}.{}.{}`",
                project_id, dataset_id, table_id
            )),
        )
        .await?;

This appears to be vulnerable to SQL injection: if any of the project_id, dataset_id, or table_id values come from an untrusted source, they may contain additional SQL, e.g. DROP TABLE, which will be injected into the query and passed on to the BigQuery API.

If this is indeed the case, an example should be provided that avoids the issue. If BigQuery does not provide an API that's immune to SQL injection, the inputs should be sanitized of SQL statements recognized by BigQuery.
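
BigQuery query parameters cannot be used for identifiers (project, dataset, or table names), so those have to be validated; values, on the other hand, can go through query parameters (see the example under the next issue). A minimal identifier allow-list sketch; this is an illustration, not an API of the crate, and it is deliberately stricter than what BigQuery permits:

/// Accept only letters, digits and underscores, which rules out the backticks,
/// dots and whitespace an attacker would need to break out of the quoted name.
fn safe_ident(s: &str) -> Option<&str> {
    if !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        Some(s)
    } else {
        None
    }
}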

Question about using the QueryParameter

Hey!

Originally posted this in the discussion forum but realized it might not have been the right place for it, so I'm creating an issue instead:

Thanks for creating this library, I'm really happy to be able to use rust for working with bigquery data!

I have a question about using the query parameter option:

Looking through the docs I've found that there is the QueryParameter option; however, it requires you to declare a QueryParameterType, which I'm not quite clear on how to do.

Looking at the code snippet for QueryParameterType, I read this as a recursive type, since there is no Option around array_type. I know recursion is normally fine because of the Box, but I don't see how to terminate it without an Option or some other kind of enum or break point.

pub struct QueryParameterType {
    pub array_type: Box<QueryParameterType>,
    pub struct_types: Option<Vec<QueryParameterTypeStructTypes>>,
    pub r#type: String,
}

Maybe there is some trick I don't know with Box to keep it from being infinitely recursive? I would love a short code snippet showing how to define a simple parameter type like a string or an int, if that is possible.
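
As quoted, the type indeed cannot be finitely constructed, which suggests a bug in that version of the model. In crate versions where array_type is wrapped in Option, a simple STRING parameter can be built along these lines; the module paths and Option-wrapped fields are assumptions based on the REST API models:

use gcp_bigquery_client::model::{
    query_parameter::QueryParameter,
    query_parameter_type::QueryParameterType,
    query_parameter_value::QueryParameterValue,
};

let param = QueryParameter {
    name: Some("name".to_string()),
    parameter_type: Some(QueryParameterType {
        array_type: None,   // only set for ARRAY parameters
        struct_types: None, // only set for STRUCT parameters
        r#type: "STRING".to_string(),
    }),
    parameter_value: Some(QueryParameterValue {
        value: Some("some value".to_string()),
        ..Default::default()
    }),
};

The parameter is then referenced in the SQL as @name, with the vector of parameters attached to the request via QueryRequest's query_parameters field.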

Tips for reading TableRows

Hi, thanks for writing and releasing this repository!

I'm using the query_all API to get a bunch of rows, and I'm trying to transform the results from TableRow to some struct.

As #31 mentioned, it seems like the results are all returned as strings. It's also pretty cumbersome to do this transformation. Consider the parse function in the example below:

use futures_util::StreamExt;
use gcp_bigquery_client::{
    error::BQError,
    model::{job_configuration_query::JobConfigurationQuery, table_row::TableRow},
    Client,
};
use tokio::runtime::Runtime;

// Placeholders for the real service account key path and project id.
const BQ_SA_KEY: &str = "service_account_key.json";
const GCP_PROJECT: &str = "my-project";

struct Example {
    letter: String,
    number: i64,
}

fn parse(row: &TableRow) -> Example {
    Example {
        letter: row
            .columns
            .as_ref()
            .unwrap()
            .get(0)
            .unwrap()
            .value
            .as_ref()
            .unwrap()
            .as_str()
            .unwrap()
            .to_string(),
        number: row
            .columns
            .as_ref()
            .unwrap()
            .get(1)
            .unwrap()
            .value
            .as_ref()
            .unwrap()
            .as_str()
            .unwrap()
            .parse::<i64>()
            .unwrap(),
    }
}

async fn load_examples() -> Result<Vec<Example>, BQError> {
    let client = Client::from_service_account_key_file(BQ_SA_KEY).await?;
    let response = client.job().query_all(
        GCP_PROJECT,
        JobConfigurationQuery {
            query: "SELECT x AS letter, 1 AS number FROM UNNEST(['a', 'b', 'c']) x".to_string(),
            use_legacy_sql: Some(false),
            ..Default::default()
        },
        Some(2),
    );

    tokio::pin!(response);

    let mut examples: Vec<Example> = vec![];
    while let Some(page) = response.next().await {
        match page {
            Ok(rows) => {
                examples.extend(rows.iter().map(parse));
            }
            Err(e) => {
                return Err(e);
            }
        }
    }
    Ok(examples)
}

fn main() {
    let rt = Runtime::new().unwrap();
    let examples = rt.block_on(load_examples()).expect("bigquery error");
    for e in examples {
        println!("letter: {}\tnumber: {}", e.letter, e.number);
    }
}

It's pretty awkward! There are two issues at play:

  1. I need to take the number result as a string and then parse an i64 out of it.
  2. It's pretty cumbersome to get the actual result values from a TableRow.

I'm sure there exists a good way to, given a TableRow and a TableSchema, construct a struct, but I'm not sure how to do it. And maybe the first issue is not a bug, and just an issue with how I'm reading the data, but I can't find a way to get it to work.

Would it be possible to create an example of how to use this API to generate a clean data structure out of a TableRow?
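
One way to cut the boilerplate down, reusing the Example struct above; this is an illustration, not an API offered by the crate. A fallible variant of parse built on a small column-extraction helper:

/// Returns column `i` of `row` as a string slice, if present.
fn col_str(row: &TableRow, i: usize) -> Option<&str> {
    row.columns.as_ref()?.get(i)?.value.as_ref()?.as_str()
}

fn parse(row: &TableRow) -> Option<Example> {
    Some(Example {
        letter: col_str(row, 0)?.to_string(),
        // BigQuery returns INT64 cells as JSON strings, so parse after extracting.
        number: col_str(row, 1)?.parse().ok()?,
    })
}

The collection step then becomes examples.extend(rows.iter().filter_map(parse));, dropping malformed rows instead of panicking.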

Allow to change the base url

Hi! I would like to test this crate locally by using bigquery-emulator.
However, this crate doesn't allow changing the base url (it always assumes https://bigquery.googleapis.com/bigquery/v2/):

let req_url = "https://bigquery.googleapis.com/bigquery/v2/projects";

Instead, google_bigquery2 allows customizing both the base_url and the root_url.

Do you plan to add support for this?
If not, would you accept a PR?
Thanks 🙏

"Rows are not present" panics for query_all

I've been using version 0.16.6 since its release, paging through big results successfully, apart from occasional panics. Since a few days ago, my ingestion panics on every request, making it unusable. I'm wondering if there's been a change on Google's end?

I'm now running 0.16.7, and the only way I could get some successful queries through was to cut my query size way down. But this multiplies my query cost and still panics quite often.

Default Return Type

Is the default return type of TableCell a string? If not, where can I change that? Am I missing some setting in BQ?

Thank you for your help!

[request] release a new crate version

Hi @lquerel, first of all, thanks a lot for developing this crate! I'm using it and it works quite nicely!

And the reason for this issue: could you release a new crate version? I'm having issues running a project that relies on this crate due to conflicts on the hyper-rustls version. Releasing a new version with hyper-rustls 0.24 should fix it.

   --> /Users/andrehahn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gcp-bigquery-client-0.16.6/src/auth.rs:66:28
    |
66  |                 auth: Some(auth),
    |                       ---- ^^^^ expected `HttpsConnector<HttpConnector>`, found `hyper_rustls::connector::HttpsConnector<HttpConnector>`
    |                       |
    |                       arguments to this enum variant are incorrect
    |
    = note: `hyper_rustls::connector::HttpsConnector<HttpConnector>` and `HttpsConnector<HttpConnector>` have similar names, but are actually distinct types
note: `hyper_rustls::connector::HttpsConnector<HttpConnector>` is defined in crate `hyper_rustls`
   --> /Users/andrehahn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-rustls-0.24.0/src/connector.rs:19:1
    |
19  | pub struct HttpsConnector<T> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: `HttpsConnector<HttpConnector>` is defined in crate `hyper_rustls`
   --> /Users/andrehahn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-rustls-0.23.2/src/connector.rs:20:1
    |
20  | pub struct HttpsConnector<T> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: perhaps two different versions of crate `hyper_rustls` are being used?

Thanks!

Support profiling queries via tracing

Adding support for tracing via https://docs.rs/tracing/latest/tracing/ would be useful. Given that most of the requests go through reqwest, https://github.com/TrueLayer/reqwest-middleware seems like a good candidate.

Let me know if this is something you think would be generally useful, and I'm happy to submit a PR for it. It could sit behind a cargo feature, maybe profile, so that it is enabled only when profiling.
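
In the meantime, individual calls can already be wrapped in a span from the caller's side. A minimal sketch using tracing's Instrument; this is plain tracing usage, not an API of this crate:

use tracing::Instrument;

// Attach a span to the query future so the request is recorded under it.
let span = tracing::info_span!("bq_query", project_id);
let result_set = client
    .job()
    .query(project_id, query_request)
    .instrument(span)
    .await?;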

Thanks for the awesome crate.

Replace chrono dependency with time?

result of cargo audit when trying to use gcp-bigquery-client:

Crate:         chrono
Version:       0.4.19
Title:         Potential segfault in `localtime_r` invocations
Date:          2020-11-10
ID:            RUSTSEC-2020-0159
URL:           https://rustsec.org/advisories/RUSTSEC-2020-0159
Solution:      No safe upgrade is available!
Dependency tree: 
chrono 0.4.19
├── yup-oauth2 6.3.1
│   └── gcp-bigquery-client 0.11.0
│       └── cjms 1.0.0
└── gcp-bigquery-client 0.11.0

chrono does not appear to be actively maintained.

Could the time crate meet the needs of this repo?

I recently had a PR merged into yup-oauth2 that moves that crate from chrono to time (dermesser/yup-oauth2#172). I'd be happy to do a PR here too.

This would have helped me with integrating gcp-bigquery-client into my project. In the end, I needed such a small fraction of the power of your repo that I've just taken the few pieces that I need.

Support for GZIP compression

Hey @lquerel!
I've noticed that, for some reason, GZIP is not enabled for outgoing request bodies. Based on my data, enabling GZIP compression for the request body results in faster transfer speeds. I want to contribute a small pull request to implement it.

I see two ways to implement this:

  1. Adding a custom Cargo feature, GZIP.
  2. Adding an enum parameter to TableDataApi::insert_all that indicates the compression algorithm.
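
Either way, the core change is small. A sketch of the body compression itself, assuming the implementation would use the flate2 crate and send the result with a Content-Encoding: gzip header (both are implementation choices, not current crate behavior):

use std::io::Write;

use flate2::{write::GzEncoder, Compression};

/// Gzip-compress a serialized request body. The caller would then send it via
/// reqwest with a `Content-Encoding: gzip` header and the compressed bytes as body.
fn gzip_body(json: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(json)?;
    encoder.finish()
}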

The same data is sent three times with a max batch size of 50_000.

Without GZIP:

Inserting 52511 rows to ***, geographic location europe-west2
Inserted 52511 rows to *** in 13.08 seconds

Inserting 52511 rows to ***, geographic location europe-west2
Inserted 52511 rows to *** in 12.93 seconds

Inserting 52511 rows to ***, geographic location europe-west2
Inserted 52511 rows to *** in 12.56 seconds

With GZIP:

Inserting 52511 rows to ***, geographic location europe-west2
Inserted 52511 rows to *** in 7.49 seconds

Inserting 52511 rows to ***, geographic location europe-west2
Inserted 52511 rows to *** in 7.57 seconds

Inserting 52511 rows to ***, geographic location europe-west2
Inserted 52511 rows to *** in 7.26 seconds

How to propagate location setting to Job configuration

Hello there,

It may be a bit of a silly question, but how can one propagate the location setting when creating a job against a BigQuery table?
Something like your pagination.rs example, but with the location set?

Looking through the source code/docs, it is not really obvious where it needs to be set. I'd assume it to be part of JobConfigurationQuery, but that struct doesn't appear to have a location field.

I tried to use ConnectionProperty (which is a field of JobConfigurationQuery), but it doesn't look like the correct option.
And as far as I can see, JobConfigurationQuery is the only thing that can be passed down to the actual call that creates the job.

Would definitely appreciate some help on this one :)

UPD:
Found this LoC, so one can specify the location of the job. However, it seems not to be exposed when calling client.job().query_all, so this feels like a small bug (unless I am missing something else here).
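
For the single-shot query path, the REST jobs.query request carries a top-level location field. Assuming the crate's QueryRequest mirrors it (the field name here is an assumption), something like this should work until query_all exposes the setting:

use gcp_bigquery_client::model::query_request::QueryRequest;

let mut request = QueryRequest::new("SELECT 1");
// The REST jobs.query request accepts the job location directly.
request.location = Some("europe-west2".to_string());
let result_set = client.job().query(project_id, request).await?;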
