tap-google-sheets's People

Contributors

asaf-erlich, bhuvana-talend, cosimon, dscoleman, dsprayberry, hpatel41, jeffhuth-bytecode, kallan357, krispersonal, kspeer825, leslievandemark, luandy64, namrata270998, prijendev, rdeshmukh15, shantanu73, yusaku2, zachharris1

tap-google-sheets's Issues

'str' object has no attribute 'get'

I'm getting this error from Google:

{  "error": "invalid_grant",  "error_description": "Token has been expired or revoked."}

This response triggers the following exception, because the error.code property is missing:

...
tap_google_sheets/client.py", line 124, in raise_for_error
    error_code = response.get('error', {}).get('code')
AttributeError: 'str' object has no attribute 'get'

The error property cannot be both a string and an object at the same time. In this response it is the string "invalid_grant", so the chained .get('code') call fails.
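A defensive parse would accept both payload shapes. The sketch below is a hypothetical helper (extract_error_code is not part of the tap) that returns a (code, message) pair whether "error" is a nested object, as in regular Google API errors, or a plain string, as in the OAuth token error above.

```python
def extract_error_code(response):
    """Return (code, message) from a Google error payload of either shape.

    Hypothetical helper. Handles both
      {"error": {"code": 404, "message": "..."}}               (API errors)
      {"error": "invalid_grant", "error_description": "..."}   (OAuth errors)
    """
    error = response.get("error", {})
    if isinstance(error, dict):
        # Regular API error: code and message live inside the nested object
        return error.get("code"), error.get("message")
    # OAuth token error: "error" is a string and the detail sits alongside it
    return error, response.get("error_description")
```

With the payload from this report, extract_error_code returns ("invalid_grant", "Token has been expired or revoked.") instead of raising AttributeError.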

Google Sheet Integration failing with file not found

The same sheet was being pulled successfully earlier, but now it gives this error. I tried creating a new sheet and pulling it, and got the same error.

INFO HTTP request to "file_metadata" endpoint took 0.203s, returned status code 404
2020-03-03 08:45:46,070Z tap - CRITICAL {'code': 404, 'message': 'File not found: xxxxxxxxxxxx.', 'errors': [{'location': 'fileId', 'message': 'File not found: xxxxxxxxxxx.', 'locationType': 'parameter', 'reason': 'notFound', 'domain': 'global'}]}: Unknown Error
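The payload in the log already carries enough detail to replace "Unknown Error" with something actionable. The Google Drive API returns 404 notFound both when a file does not exist and when the authorized account has no read access to it, so a common cause is a wrong spreadsheet_id or a file not shared with the account behind the refresh token. The helper below is a hypothetical sketch, not the tap's actual error handling.

```python
def describe_google_error(payload):
    """Turn a Google API error body into a readable message (hypothetical helper)."""
    code = payload.get("code")
    reasons = {e.get("reason") for e in payload.get("errors", []) if isinstance(e, dict)}
    if code == 404 and "notFound" in reasons:
        # Drive reports notFound for missing files and for files the
        # authorized account cannot read
        return (f"HTTP 404 notFound: {payload.get('message', '')} "
                "Check the spreadsheet_id and that the file is shared with the authorized account.")
    return f"HTTP {code}: {payload.get('message', 'Unknown Error')}"
```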

Reloads Entire Sheet Each Load

After setting up the integration with Stitch, I have found that every time the data is loaded into the warehouse, every single row is re-imported during the process. This is going to quickly increase the monthly row count for the integration. Is there a way to add only the new rows during each import?
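One way to load only new rows, assuming a sheet is strictly append-only (existing rows are never edited, deleted, or reordered), is to remember how many rows were synced last time and emit only the rows past that index. The sketch below is hypothetical and not a feature of tap-google-sheets; take_new_rows and the state layout are illustrative names. It would miss edits to existing rows, which is the usual reason a sheet is re-synced in full.

```python
def take_new_rows(rows, state, sheet_name):
    """Return only rows appended since the last run, and record the new row count.

    Hypothetical append-only strategy: any row index below the saved count is
    assumed to have been synced already and to be unchanged.
    """
    last_count = state.get(sheet_name, 0)
    if len(rows) < last_count:
        # Rows were deleted, so the saved position is no longer trustworthy
        last_count = 0
    fresh = rows[last_count:]
    state[sheet_name] = len(rows)
    return fresh
```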

Exponential backoff factor is not large enough

The current exponential backoff factor is 3.

Context

  • I have an airflow pipeline that extracts data from 12 spreadsheets with 20k rows each.
  • The steps ran sequentially.
  • In one of the steps, the pipeline intermittently failed several times because the exponential backoff exceeded its maximum number of retries.

Proposal

By changing the backoff factor from 3 to 4 in a fork of this repository, I was able to fix this issue in my pipeline.

    @backoff.on_exception(backoff.expo,
                          (Server5xxError, ConnectionError, Server429Error),
                          max_tries=7,
                          factor=3,
                          jitter=None)
    @utils.ratelimit(100, 100)
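To put numbers on the proposal: assuming backoff.expo waits factor * 2**n seconds before the n-th retry (n starting at 0) and that max_tries=7 allows six waits, jitter=None makes the schedule deterministic. The stdlib-only sketch below reproduces that schedule rather than importing the backoff library, so expo_waits is an illustrative name.

```python
def expo_waits(factor, max_tries, base=2):
    """Seconds slept before each retry: factor * base**n, no jitter, max_tries - 1 waits."""
    return [factor * base ** n for n in range(max_tries - 1)]

for factor in (3, 4):
    waits = expo_waits(factor, max_tries=7)
    print(factor, waits, sum(waits))
# 3 [3, 6, 12, 24, 48, 96] 189
# 4 [4, 8, 16, 32, 64, 128] 252
```

Under these assumptions, going from factor 3 to 4 stretches the total retry window from about 189 to about 252 seconds (excluding request time), which may be what lets a throttled request outlast a quota window. Raising max_tries instead would lengthen the window faster, since each extra try doubles the last wait.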

Can we implement this change or are there any problems that I'm not aware of?

I really would like to use the official version of this tap.

Best Regards,

Option to slugify column names

Currently, the column names for a single Google Sheets data stream are inferred from the first row of the sheet. Because of poor practices by sheet owners, column names often end up containing non-ASCII characters or problematic characters such as newlines. Depending on the Singer target, this can cause problems.

Would it be hard to implement an option that uses python-slugify, or another string-normalization approach, to standardize column names?
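As a rough illustration of what such an option could do, the sketch below uses only the standard library (unicodedata and re) as a stand-in for python-slugify: it strips accents, collapses any run of spaces, newlines, or punctuation into "_", lowercases the result, and suffixes duplicates so two headers never collapse into the same column. slugify_column and slugify_headers are hypothetical names, not existing tap options.

```python
import re
import unicodedata


def slugify_column(name):
    """Normalize a header to lowercase snake_case ASCII (stdlib stand-in for python-slugify)."""
    # Decompose accented characters, then drop anything that is not ASCII
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics (spaces, newlines, punctuation) into "_"
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", ascii_name).strip("_").lower()
    return slug or "column"


def slugify_headers(headers):
    """Slugify a header row, suffixing duplicates so column names stay unique."""
    seen = {}
    result = []
    for header in headers:
        slug = slugify_column(header)
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        result.append(slug if count == 0 else f"{slug}_{count + 1}")
    return result
```

For example, slugify_column("Prénom\nClient") gives "prenom_client". Handling collisions matters because slugifying is lossy: "Total ($)" and "total" would otherwise map to the same column name.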

How to select the stream from which data will be retrieved

Running

tap-google-sheets --config config.json --discover > catalog.json

Gives the catalog.json file, as expected (with schema, stream name, metadata, etc).

But now, I want to actually get data from one sheet (say, "Sheet 1"). How am I supposed to do that? Because running:

tap-google-sheets --config config.json --catalog catalog.json

gives me nothing.

So I tried to understand what was happening and found this: https://github.com/singer-io/singer-python/blob/6c6c773d8b6dc6223551e598574eb0df41f0c415/singer/catalog.py#L47, which checks whether a stream is selected. It turns out this flag is not generated automatically in the catalog file, so I had to go to the specific stream ("Sheet 1") and add "selected": true inside the schema property.

Am I missing something here? I think there should be a way to automatically select which stream ("Sheet 1", "file_metadata", etc.) you want to get data from.
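Until selection is automated, the edit can be scripted. In the standard Singer catalog layout, a stream is usually selected by setting "selected": true in its top-level metadata entry (the one whose breadcrumb is []). The sketch below assumes that layout and matches streams by tap_stream_id; check catalog.json for the exact ids tap-google-sheets generates. select_streams and select_streams_in_file are hypothetical helpers.

```python
import json


def select_streams(catalog, stream_names):
    """Mark the named streams as selected via their top-level (breadcrumb []) metadata."""
    for stream in catalog.get("streams", []):
        if stream.get("tap_stream_id") not in stream_names:
            continue
        mdata = stream.setdefault("metadata", [])
        top = next((m for m in mdata if m.get("breadcrumb") == []), None)
        if top is None:
            # No stream-level entry yet, so create one
            top = {"breadcrumb": [], "metadata": {}}
            mdata.append(top)
        top.setdefault("metadata", {})["selected"] = True
    return catalog


def select_streams_in_file(path, stream_names):
    """Load catalog.json, select the given streams, and write it back in place."""
    with open(path, encoding="utf-8") as f:
        catalog = json.load(f)
    select_streams(catalog, set(stream_names))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(catalog, f, indent=2)
```

Usage: after discovery, run select_streams_in_file("catalog.json", ["Sheet 1"]) and then rerun tap-google-sheets --config config.json --catalog catalog.json.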

AttributeError: module 'singer.metadata' has no attribute 'get_standard_metadata'

Machine - Ubuntu 20.04 LTS
Python version - 3.8

I ran the following commands:
pip install tap-google-sheets
tap-google-sheets --config config.json --discover > catalog.json

The config.json contains the following:

{
"refresh_token":"---------",
"client_id":"----------",
"client_secret":"-------------",
"user_agent": "tap-google-sheets (via singer.io)",
"start_date": "2019-01-01T00:00:00Z",
"spreadsheet_id": "--------------",
"range": "A2:E2"
}


INFO Authorized, token expires = 2020-08-20 14:57:22.412650
INFO Starting discover
CRITICAL module 'singer.metadata' has no attribute 'get_standard_metadata'
Traceback (most recent call last):
  File "/home/owais/anaconda3/envs/singer5/bin/tap-google-sheets", line 8, in <module>
    sys.exit(main())
  File "/home/owais/anaconda3/envs/singer5/lib/python3.8/site-packages/singer/utils.py", line 225, in wrapped
    return fnc(*args, **kwargs)
  File "/home/owais/anaconda3/envs/singer5/lib/python3.8/site-packages/tap_google_sheets/__init__.py", line 49, in main
    do_discover(client, spreadsheet_id)
  File "/home/owais/anaconda3/envs/singer5/lib/python3.8/site-packages/tap_google_sheets/__init__.py", line 26, in do_discover
    catalog = discover(client, spreadsheet_id)
  File "/home/owais/anaconda3/envs/singer5/lib/python3.8/site-packages/tap_google_sheets/discover.py", line 6, in discover
    schemas, field_metadata = get_schemas(client, spreadsheet_id)
  File "/home/owais/anaconda3/envs/singer5/lib/python3.8/site-packages/tap_google_sheets/schema.py", line 269, in get_schemas
    mdata = metadata.get_standard_metadata(
AttributeError: module 'singer.metadata' has no attribute 'get_standard_metadata'
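One plausible cause is that the singer-python package being imported is older than the version the tap expects, or that a different copy of singer shadows it on sys.path. The stdlib sketch below (check_singer_metadata is a hypothetical helper) reports the installed distribution version, the file the module was actually loaded from, and whether the attribute exists.

```python
import importlib
from importlib import metadata as importlib_metadata


def check_singer_metadata(module_name="singer.metadata", attr="get_standard_metadata",
                          dist_name="singer-python"):
    """Return (installed version or None, loaded module file or None, attribute present?)."""
    try:
        version = importlib_metadata.version(dist_name)
    except importlib_metadata.PackageNotFoundError:
        version = None
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module not importable at all in this environment
        return version, None, False
    return version, getattr(module, "__file__", None), hasattr(module, attr)


print(check_singer_metadata())
```

If the last value is False, reinstalling tap-google-sheets into a fresh virtual environment, so that pip resolves the singer-python version the tap declares, is a reasonable first step.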
