Hopsworks SDK

Disclaimer: This project is not an official client for the Hopsworks Platform and its Feature Store, and is not maintained by the Hopsworks team. It is rather a proof of concept and a personal project to explore the capabilities of Rust in supporting end-to-end machine learning pipelines.

The Hopsworks Feature Store has production-ready Python and Java SDKs that support a wide variety of use cases. This project aims to provide a Rust SDK, along with corresponding Python bindings, to interact with the Hopsworks platform and its Feature Store. As of now, the SDK is in early development and supports only a subset of Hopsworks capabilities. The public API should not be considered stable, as it is still unclear whether it will evolve to be more idiomatic Rust or stay closer to the Python SDK for simplicity.

Currently, this project is driven by a single person, and there is a lot more to do than I have time for. Contributions are therefore welcome: please reach out or open an issue. This repository also contains some CLI tools to interact with the Hopsworks platform. Again, these are hobby projects and receive a corresponding amount of attention and energy.

Quickstart

Step 1: Register for Hopsworks Serverless Platform

If you have your own Hopsworks cluster, check out this section.

To get started with minimal setup, you can register for a free account on the Hopsworks Serverless Platform. Once you have registered, create your project and follow the instructions to create an API key. Save it for later! From there, you can head to the examples directory, which has a few tutorials, or follow the quickstart below to get a feel for the Hopsworks SDK.

Step 2: Install dependencies, build and compile the SDK

As of now, there are no released binaries for this library, so you need to compile it yourself.

Get the rye package manager, then clone the repository and sync the environment:

curl -sSf https://rye.astral.sh/get | bash # Check out the website for a more secure way to install
git clone https://github.com/vatj/hopsworks-sdk
cd hopsworks-sdk
rye sync
source .venv/bin/activate

Step 3: Start playing around

Copy the example config file, then add your API key and set the correct project name in it:

cp configs/config-example.toml configs/config.toml

Or simply set the environment variables:

export HOPSWORKS_API_KEY=your_api_key
export HOPSWORKS_PROJECT_NAME=your_project_name
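
Before logging in, it can help to confirm that both variables are actually visible to the Python process. The check below is a minimal sketch using only the standard library; `missing_hopsworks_vars` is a hypothetical helper written for this illustration and is not part of the SDK. It only assumes the two variable names shown in the export commands above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("HOPSWORKS_API_KEY", "HOPSWORKS_PROJECT_NAME")


def missing_hopsworks_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration only; not provided by the SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_hopsworks_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Hopsworks environment variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise without touching your shell environment.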

You can then use the SDK in your Python files:

from hopsworks_sdk import login
import polars as pl

project = login()
fs = project.get_feature_store()

# Create a feature group locally; it is not yet registered in the feature store
trans_fg = fs.get_or_create_feature_group(
    "local_fg",
    version=1,
    primary_key=["tid"],
    event_time="datetime",
    online_enabled=True,
)

# Read data from a csv file
trans_df = pl.read_csv(
    "https://repo.hops.works/master/hopsworks-tutorials/data/card_fraud_data/transactions.csv",
    try_parse_dates=True,
)
print(trans_df.head(5))

# Use the polars schema to register Feature Group metadata, in particular feature names and types
trans_fg.save(trans_df)

# Insert data into the feature store
trans_fg.insert(trans_df.head(10))

# Read Arrow data from the online store via SQL
online_rb = trans_fg.read_from_online_store(return_type="pyarrow")
print("pyarrow Record Batch: \n", online_rb)

# Read polars data from the online store
online_df = trans_fg.read_from_online_store(return_type="polars")
print("Polars DataFrame: \n", online_df)

# Once the materialization job is done writing your data to the Feature Store,
# read Arrow data from the offline store via Arrow Flight
offline_rb = trans_fg.read_from_offline_store(return_type="pyarrow")
print("pyarrow Record Batch: \n", offline_rb)

# Read polars data from the offline store
offline_df = trans_fg.read_from_offline_store(return_type="polars")
print("Polars DataFrame: \n", offline_df)

Look at the examples in the examples/python directory to get started.
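
The offline reads above only succeed once the materialization job has finished, so a script that runs end to end may need to wait. The helper below is a rough sketch of one way to do that; `retry_until_ready` is a hypothetical name introduced here, not an SDK function. It assumes that an unfinished materialization shows up either as an exception or as an empty result, which you should verify against the SDK's actual behavior.

```python
import time


def retry_until_ready(read_fn, attempts=10, delay_s=30.0, sleep=time.sleep):
    """Call read_fn until it returns a non-empty result or attempts run out.

    Hypothetical helper for illustration only; not provided by the SDK.
    Assumption: data that is not yet materialized appears as an exception
    or as an empty result.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = read_fn()
            if result is not None and len(result) > 0:
                return result
        except Exception as err:  # broad on purpose: the error type is unknown
            last_error = err
        if attempt < attempts:
            sleep(delay_s)
    raise TimeoutError(f"no data after {attempts} attempts") from last_error


# Possible usage with the feature group from the quickstart:
# offline_df = retry_until_ready(
#     lambda: trans_fg.read_from_offline_store(return_type="polars")
# )
```

The `sleep` parameter is injectable so the waiting logic can be tested without real delays. The default delay and attempt count are arbitrary placeholders, not values recommended by the project.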

Benefits of using Rust

There are several benefits to building mixed Rust/Python modules:

  • Performance-critical parts can be written in Rust and called from Python
  • Dependencies on some Python packages can be avoided by hiding them in the Rust binaries
  • New compute engines are being written in Rust, so interfacing with them directly from Rust keeps performance-critical code paths efficient
  • The Rust Arrow implementation is very complete and actively developed

Contributing

Contributions are welcome, although the contributing.md is still a work in progress. If you have any questions or suggestions, or want to contribute, feel free to open a GitHub issue for now.

Open-Source

Hopsworks-rs is available under the AGPL-V3 license. In plain English, this means that you are free to use it and even build paid services on it, but if you modify the source code, you should also release your changes and any systems built around it under AGPL-V3.
