Coder Social home page Coder Social logo

ethanthompson / airscraper Goto Github PK

View Code? Open in Web Editor NEW

This project forked from banditelol/airscraper

0.0 0.0 0.0 21 KB

Airtable backup script package

Home Page: https://pypi.org/project/airscraper/

License: MIT License

Python 18.80% Jupyter Notebook 81.20%

airscraper's Introduction

Airscraper

Open In Colab PyPI version

A simple scraper to download csv from any airtable shared view programatically, think of it as a programatic way of downloading csv from airtable shared view. Use it if:

  • You want to download a shared view periodically
  • You don't mind the shared view to be accessed basically without authorization

Requirements

Because its a simple scraper, basically only beautifulsoup is needed

  • BeautifulSoup4

Installation

Using pip (Recommended)

pip install airscraper

Build From Source

  • Install build dependencies:
pip install --upgrade pip setuptools wheel
pip install tqdm
pip install --user --upgrade twine
  • Build the Package
    • python setup.py bdist_wheel
  • Install the built Package
    • pip install --upgrade dist/airscraper-0.1-py3-none-any.whl
  • Use it without adding python in front of it
    • airscraper [url]

Direct Execution (Testing Purpose)

  • Clone this project
  • Install the requirements
    • pip install -r requirements.txt
  • run the code
    • python airscraper/airscraper.py [url]

Usage

Create a shared view link and use that link to download the shared view into csv. All [url] mentioned in the examples are referring to the shared view link you get from this step.

As CLI

# Print Result to Terminal
python airscraper/airscraper.py [url]

# Pipe the result to csv file
python airscraper/airscraper.py [url] > [filename].csv

As Python Package

from airscraper import AirScraper

client = AirScraper([url])
data = client.get_table().text

# print the result
print(data)

# save as file
with open('data.csv','w') as f:
  f.write(data)

# use it with pandas
from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO(data), sep=',')
df.head()

Help

usage: airscraper [-h] [-l LOCALE] [-tz TIMEZONE] view_url

Download CSV from Airtable Shared View Link, You can pass the result to file using
'> name.csv'

positional arguments:
  view_url              url generated from sharing view using link in airtable

optional arguments:
  -h, --help            show this help message and exit
  -l LOCALE, --locale LOCALE
                        Your locale, default to 'en'
  -tz TIMEZONE, --timezone TIMEZONE
                        Your timezone, use URL encoded string, default to
                        'Asia/Jakarta'

What's next

Currently I'm thinking of several things in mind:

  • ✅ Making this installed package
  • Adds accessibility to use it in FaaS Platform (most use case I could thought of are related to this)
  • ✅ Create a proper package that can be imported (so I could use it in my ETL script)
  • ✅ Fill in LICENSE and setup.py, (to be honest I have no idea yet what to put into it)
    • It turns out there are a lot of resources out there if you know what to look for :)

Contributing

If you have similar problem oor have any idea to improve this package please let me know in the issues or just hit me up on twitter @BanditelolRP

airscraper's People

Contributors

banditelol avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.