Coder Social home page Coder Social logo

flaco's Introduction

flaco

Code Style CI PyPI PyPI - Wheel Downloads


Install:

pip install flaco


The easiest and perhaps most memory efficient way to get PostgreSQL data (more flavors to come?) into pyarrow.Table, pandas.DataFrame or Arrow (IPC/Feather) and Parquet files.

Since Arrow supports efficient and even larger-than-memory processing, as with dask, duckdb, or others. Just getting data onto disk is sometimes the hardest part; this aims to make that easier.

API: flaco.read_sql_to_file: Read SQL query into Feather or Parquet file. flaco.read_sql_to_pyarrow: Read SQL query into a pyarrow table.

NOTE: This is still a WIP. I intend to generalize it more to be useful towards a wider audience. Issues and pull requests welcome!


Example

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   122    147.9 MiB    147.9 MiB           1   @profile
   123                                         def memory_profile():
   124    147.9 MiB      0.0 MiB           1       stmt = "select * from test_table"
   125
   126                                             # Read SQL to file
   127    150.3 MiB      2.4 MiB           1       flaco.read_sql_to_file(DB_URI, stmt, 'result.feather', flaco.FileFormat.Feather)
   128    150.3 MiB      0.0 MiB           1       with pa.memory_map('result.feather', 'rb') as source:
   129    150.3 MiB      0.0 MiB           1           table1 = pa.ipc.open_file(source).read_all()
   130    408.1 MiB    257.8 MiB           1           table1_df1 = table1.to_pandas()
   131
   132                                             # Read SQL to pyarrow.Table
   133    504.3 MiB     96.2 MiB           1       table2 = flaco.read_sql_to_pyarrow(DB_URI, stmt)
   134    644.1 MiB    139.8 MiB           1       table2_df = table2.to_pandas()
   135
   136                                             # Pandas
   137    648.8 MiB      4.7 MiB           1       engine = create_engine(DB_URI)
   138   1335.4 MiB    686.6 MiB           1       _pandas_df = pd.read_sql(stmt, engine)

License

Why did you choose such lax licensing? Could you change to a copy left license, please?

...just kidding, no one would ask that. This is dual licensed under Unlicense or MIT, at your discretion.

flaco's People

Contributors

dependabot[bot] avatar milesgranger avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

flaco's Issues

Separate crates

  • flaco core implementation as pure Rust. Build as lib and/or CLI executable
  • flaco python, wrapper to flaco core

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.