
crate-workbench / cratedb-toolkit

CrateDB Toolkit.

Home Page: https://cratedb-toolkit.readthedocs.io/

License: GNU Affero General Public License v3.0

Python 99.44% Dockerfile 0.50% Shell 0.06%
data-retention olap olap-database expiration data-expiration retention retention-policies retention-policy toolkit cratedb cratedb-client cratedb-driver data-processing database-adapter sqlalchemy materialized-view materialized-views

cratedb-toolkit's Introduction

CrateDB Toolkit

» Documentation | Changelog | Community Forum | PyPI | Issues | Source code | License | CrateDB

About

This software package includes a range of modules and subsystems to work with CrateDB and CrateDB Cloud efficiently.

You can use CrateDB Toolkit to run data I/O procedures and automation tasks of different kinds around CrateDB and CrateDB Cloud. It can be used both as a standalone program and as a library.

It aims for DWIM-like usefulness and UX, and provides CLI and HTTP interfaces, among others.

Status

Please note that the cratedb-toolkit package contains alpha-, beta- and incubation-quality code, and as such, is considered to be a work in progress. Contributions of all kinds are very welcome, to make it more solid and to add features.

Breaking changes should be expected until a 1.0 release, so version pinning is strongly recommended, especially when using it as a library.

Install

Install package.

pip install --upgrade cratedb-toolkit

Verify installation.

ctk --version

Run with Docker.

alias ctk="docker run --rm ghcr.io/crate-workbench/cratedb-toolkit ctk"
ctk --version

Development

Contributions are very much welcome. Please visit the documentation to learn about how to spin up a sandbox environment on your workstation, or create a ticket to report a bug or share an idea about a possible feature.

cratedb-toolkit's People

Contributors

amotl, dependabot[bot], hammerhead, pilosus, surister

cratedb-toolkit's Issues

Apply PyMongo-like amalgamation to AstraPy, to emulate DataStax Astra DB

Introduction

In the spirit of the PyMongo driver amalgamation, the same approach could apply to AstraPy, the Python client SDK for DataStax Astra and Stargate, which is based on the DataStax python-driver: its interface looks very similar to PyMongo's.

Features

According to the data sheet of DataStax Astra DB, some or all of the following features would need to be unlocked to achieve reasonable feature parity.

Supported APIs

  • REST
  • Document (JSON)
  • GraphQL
  • gRPC API with equivalent performance as drivers
  • CQL API

Supported Languages

  • Java
  • Node.js
  • C#
  • Python
  • Go

Supported Data formats

  • Tabular (Column-family)
  • Document (JSON)
  • Key-Value

[LIB] Improve UX for ad hoc applications

About

For certain ad hoc applications, like presenting functionality in Jupyter Notebooks, or accessing and exploring data in CrateDB from Python, querying should be no more difficult than what EasyDB, TinyDB, dataset, and Datasette demonstrate, with or without SQLite.

EasyDB

from easydb import EasyDB

db = EasyDB("filename.db")
for record in db.query("SELECT * FROM mytable"):
  print(record)

TinyDB

from tinydb import TinyDB, Query

db = TinyDB("/path/to/db.json")
db.insert({'int': 1, 'char': 'a'})
db.insert({'int': 1, 'char': 'b'})

# Build queries against document fields using a Query object.
User = Query()
db.search((User.name == 'John') & (User.age <= 30))

dataset

import dataset

db = dataset.connect('sqlite:///:memory:')

table = db['sometable']
table.insert(dict(name='John Doe', age=37))
table.insert(dict(name='Jane Doe', age=34, gender='female'))

john = table.find_one(name='John Doe')

Datasette

datasette serve path/to/database.db
open http://localhost:8001/

Prevent multiple strategies operating on the same table

About

The idea behind the composite primary key PRIMARY KEY ("strategy", "table_schema", "table_name") was to prevent duplicate strategies on the same table. Unfortunately, CrateDB does not support UNIQUE constraints.

Regression?

Is there a check elsewhere in the code that prevents duplicates, e.g. one entry with DELETE and a 3-day retention period, and another with DELETE and a 5-day retention period, for the same table?

Originally posted by @hammerhead in #20 (comment)
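Since CrateDB has no UNIQUE constraints, one option is to enforce the rule in application code before a new policy is stored. The following is a minimal sketch under that assumption: RetentionPolicy, retention_days, find_conflicts() and assert_no_conflict() are hypothetical names, and policies are held in memory rather than read from the toolkit's actual policy table.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

TableKey = Tuple[str, str]


@dataclass(frozen=True)
class RetentionPolicy:
    # Hypothetical record: fields follow the primary key columns above,
    # plus an assumed retention period in days.
    strategy: str
    table_schema: str
    table_name: str
    retention_days: int


def find_conflicts(policies: Iterable[RetentionPolicy]) -> Dict[TableKey, List[RetentionPolicy]]:
    """Return every table that is targeted by more than one policy."""
    by_table: Dict[TableKey, List[RetentionPolicy]] = defaultdict(list)
    for policy in policies:
        by_table[(policy.table_schema, policy.table_name)].append(policy)
    return {table: group for table, group in by_table.items() if len(group) > 1}


def assert_no_conflict(existing: Iterable[RetentionPolicy], candidate: RetentionPolicy) -> None:
    """Reject a candidate policy if its table already has one, regardless of strategy."""
    target = (candidate.table_schema, candidate.table_name)
    for policy in existing:
        if (policy.table_schema, policy.table_name) == target:
            raise ValueError(
                f"Table {target[0]}.{target[1]} already has a {policy.strategy} policy"
            )
```

Note that a read-then-insert check like this is not atomic: two clients adding policies concurrently could both pass it, which a database-level constraint would have prevented.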

Testing: Improve "Testcontainers for Python" implementation

Introduction

We are aiming to provide canonical "Testcontainers" implementations for Java and Python, per testcontainers-java and testcontainers-python.

About

At the spots enumerated below, we added the first version of a corresponding Python implementation, originally conceived at daq-tools/lorrystream#47.

Backlog

  • Add documentation
  • GH-53
  • GH-58
  • Currently, the adapter and test layer is being exercised using an SQLAlchemy connection and corresponding test case. It makes sense to also exercise and demonstrate a pure DBAPI-based variant of the same thing.
  • It would be nice to have a modern test layer that forms a cluster, for both Java and Python. I think cr8 already has one?
  • Cherry-pick CrateDB invocation options from cr8: '-Cdiscovery.initial_state_timeout=0', '-Cnetwork.host=127.0.0.1', '-Cudc.enabled=false', '-Ccluster.name=cr8-tests'
  • Revisit downstream issues crate/cratedb-examples#72 and crate/cratedb-examples#282.
  • Upstream to testcontainers-python.
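The cr8 options from the backlog follow a simple -C<setting>=<value> pattern, so they could be kept as a mapping and rendered into command line arguments. A minimal sketch; crate_settings_args() and CR8_TEST_SETTINGS are hypothetical names, not part of cratedb-toolkit or testcontainers-python.

```python
from typing import List, Mapping


def crate_settings_args(settings: Mapping[str, object]) -> List[str]:
    """Render CrateDB settings as -Ckey=value command line options."""
    args = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # Render booleans in lowercase, e.g. udc.enabled=false.
            value = str(value).lower()
        args.append(f"-C{key}={value}")
    return args


# The options listed in the backlog item above, as used by cr8.
CR8_TEST_SETTINGS = {
    "discovery.initial_state_timeout": 0,
    "network.host": "127.0.0.1",
    "udc.enabled": False,
    "cluster.name": "cr8-tests",
}
```

The resulting list could be appended to the CrateDB start command of a container. One caveat: network.host=127.0.0.1 suits cr8's local, non-containerized runs; inside a container, binding to the loopback interface would probably make CrateDB unreachable through published ports, so that option may need to be left out.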

Share and use datasets via Python code

ValueError: max() arg is an empty sequence

@hammerhead reported this problem, which occurs immediately when invoking cratedb-toolkit without any command-line options.

~/ cratedb-toolkit          
Traceback (most recent call last):
  File "/usr/local/bin/cratedb-toolkit", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1054, in main
    with self.make_context(prog_name, args, **extra) as ctx:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 920, in make_context
    self.parse_args(ctx, args)
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1610, in parse_args
    echo(ctx.get_help(), color=ctx.color)
         ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 699, in get_help
    return self.command.get_help(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1298, in get_help
    self.format_help(ctx, formatter)
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1331, in format_help
    self.format_options(ctx, formatter)
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1533, in format_options
    self.format_commands(ctx, formatter)
  File "/usr/local/lib/python3.11/site-packages/click_aliases/__init__.py", line 65, in format_commands
    max_len = max(len(cmd) for cmd in sub_commands)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: max() arg is an empty sequence
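The traceback ends in click_aliases' format_commands(), where max() is called on the lengths of an empty collection of subcommands; max() raises ValueError for an empty iterable unless it is given a default. A minimal sketch of a guard, assuming the help formatter only needs the width of the widest command name (commands_column_width() is a hypothetical helper, not the click_aliases code itself):

```python
from typing import Sequence


def commands_column_width(sub_commands: Sequence[str]) -> int:
    """Width of the widest subcommand name, or 0 when there are none."""
    # Without default=0, max() raises ValueError on an empty sequence,
    # which is the failure shown in the traceback above.
    return max((len(cmd) for cmd in sub_commands), default=0)
```

Whether the fix belongs upstream in click_aliases, or in making sure the command group always has its subcommands registered, cannot be determined from the traceback alone.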

Apply database schema already when connecting

In the projects referenced in footnotes 1 and 2, we have been using SQLAlchemy's ability to specify the database schema directly in the connection string, using ?schema=foobar. This way, table names do not need to be written in fully qualified notation by hand; because the schema is already selected at connection time, they can be addressed by their base name alone.

Let's also do it in the same spirit here.

Footnotes

  1. https://github.com/crate-workbench/mlflow-cratedb

  2. https://github.com/crate-workbench/langchain
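Applying the schema at connection time mostly means appending ?schema=<name> to the SQLAlchemy URL. A minimal sketch using only the standard library; with_schema() is a hypothetical helper, and the example URL crate://localhost:4200/ is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_schema(url: str, schema: str) -> str:
    """Return url with its "schema" query parameter set to the given schema."""
    parts = urlsplit(url)
    # Keep all other query parameters, replacing any existing "schema".
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "schema"
    ]
    query.append(("schema", schema))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The resulting URL would then be handed to sqlalchemy.create_engine(); that the CrateDB dialect honours the schema parameter is taken from the usage described above, not verified here.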

AnyIO

About

High-level asynchronous concurrency and networking framework that works on top of either trio or asyncio.

AnyIO is an asynchronous networking and concurrency library that works on top of either asyncio or trio. It implements trio-like structured concurrency (SC) on top of asyncio and works in harmony with the native SC of trio itself.

Testing: Adapt "Testcontainers" implementation to `unittest`

Introduction

In a previous issue, we reported on the state of the "Testcontainers for Python" implementation for supporting application testing with CrateDB.

About

To make the test layer usable for applications and libraries that use Python's unittest module for testing, we will need to resolve the corresponding backlog item in the issue referenced above.

While a pytest-based wrapper adapter around the "Testcontainers" implementation is nice, the crate-python and crash projects use Python's built-in unittest module. Can we also grow a unittest-based wrapper adapter that both downstream projects can reuse?

Task

Use testing infrastructure from cratedb_toolkit.testing.testcontainers.cratedb and maybe cratedb_toolkit.tests.conftest.CrateDBFixture, and adapt that to unittest instead of using the pytest-specific details.

First Candidate

As a first candidate to apply this adapter, we identified the crash terminal program. A separate ticket in that project outlines how and where to use the unittest-based adapter instead of the previous one.

CFR: Problem with `sys.jobs_log` table on `sys-export` operation

Problem

On a CrateDB database instance that had been up for about two days, I received this error when running ctk cfr --debug sys-export.

polars.exceptions.ComputeError: could not append value: "line 1:25: mismatched input '-' expecting {<EOF>, ';'}" of type: str to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`

Details

14:05:17        [cratedb_toolkit.util.cli            ] ERROR   : could not append value: "line 1:25: mismatched input '-' expecting {<EOF>, ';'}" of type: str to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`

it might also be that a value overflows the data-type's capacity
Traceback (most recent call last):
  File "/path/to/cratedb-toolkit/cratedb_toolkit/cfr/cli.py", line 50, in sys_export
    path = stc.save()
           ^^^^^^^^^^
  File "/path/to/cratedb-toolkit/cratedb_toolkit/cfr/systable.py", line 149, in save
    df = self.read_table(tablename=tablename)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/cratedb-toolkit/cratedb_toolkit/cfr/systable.py", line 107, in read_table
    return pl.read_database(
           ^^^^^^^^^^^^^^^^^
  File "/path/to/polars/io/database/functions.py", line 267, in read_database
    ).to_polars(
      ^^^^^^^^^^
  File "/path/to/polars/io/database/_executor.py", line 462, in to_polars
    frame = frame_init(
            ^^^^^^^^^^^
  File "/path/to/polars/io/database/_executor.py", line 274, in _from_rows
    return frames if iter_batches else next(frames)  # type: ignore[arg-type]
                                       ^^^^^^^^^^^^
  File "/path/to/polars/io/database/_executor.py", line 261, in <genexpr>
    DataFrame(
  File "/path/to/polars/dataframe/frame.py", line 376, in __init__
    self._df = sequence_to_pydf(
               ^^^^^^^^^^^^^^^^^
  File "/path/to/polars/_utils/construction/dataframe.py", line 433, in sequence_to_pydf
    return _sequence_to_pydf_dispatcher(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/polars/_utils/construction/dataframe.py", line 644, in _sequence_of_tuple_to_pydf
    return _sequence_of_sequence_to_pydf(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/polars/_utils/construction/dataframe.py", line 561, in _sequence_of_sequence_to_pydf
    pydf = PyDataFrame.from_rows(
           ^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: could not append value: "line 1:25: mismatched input '-' expecting {<EOF>, ';'}" of type: str to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`
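The error message suggests that polars settled on a column's type from the rows it inspected first, then hit a string value in that column while building the DataFrame inside pl.read_database(). Two mitigations come to mind: raising infer_schema_length, as the message itself proposes, or normalizing mixed-type columns before the DataFrame is built. A minimal sketch of the latter, as a hypothetical pre-processing step on the raw result rows (stringify_mixed_columns() is not part of cratedb-toolkit or polars):

```python
from typing import Any, List, Sequence, Set, Tuple


def stringify_mixed_columns(rows: Sequence[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Cast values to str in every column that holds more than one non-null type."""
    if not rows:
        return []
    mixed: Set[int] = set()
    for col in range(len(rows[0])):
        # Collect the distinct Python types seen in this column, ignoring NULLs.
        kinds = {type(row[col]) for row in rows if row[col] is not None}
        if len(kinds) > 1:
            mixed.add(col)
    return [
        tuple(
            str(value) if col in mixed and value is not None else value
            for col, value in enumerate(row)
        )
        for row in rows
    ]
```

Casting to strings loses the original types of the affected columns, which may be acceptable for a diagnostic export like sys-export but would need more care elsewhere.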
