exasol / sagemaker-extension

An Exasol extension to interact with AWS SageMaker from inside the database

License: MIT License

Python 57.06% Shell 12.12% Lua 30.04% Dockerfile 0.28% Java 0.50%
exasol-integration exasol sagemaker machine-learning data-science

sagemaker-extension's Issues

Update dependencies

  • Pin typeguard = "^2.11.1", since the latest versions lead to the following error:
    TypeError: typechecked() got an unexpected keyword argument 'always'

Add Python3 UDF to orchestrate the SageMaker Autopilot training from Exasol

Background

Tips

Acceptance Criteria

  • Python3 UDF which can start a SageMaker training, waits for its completion and returns the training ID

Add probability score associated with the prediction

  • For the classification problem type, the probability of the predicted result can be presented as a second target column.
  • We provide problem_type as an optional parameter in this extension, like AutoML.
  • We need to get the problem_type from the model. While deploying the endpoint, the problem_type should be determined and the prediction UDF should be prepared accordingly.

/bin/sh: amalg.lua: command not found

Dear team,
while trying to install the SageMaker extension, I got the attached issue.

Please let me know if you require any additional details.
databaseProductVersion 7.1.3

Add Lua Function for a Lua Script to export a query to S3 as CSV files

Background

Tips

Acceptance Criteria

  • We have a Lua function which exports a query in parallel as CSV files
  • We have unit tests with a Mock for pquery
  • We have integration tests
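
The parallel export can be illustrated with a short Python sketch that builds the SQL text. It assumes Exasol's EXPORT ... INTO CSV AT <connection> form with one FILE clause per output file; the function name, the connection name and the file naming scheme are hypothetical and are not taken from the extension, which implements this in Lua.

```python
def build_export_statement(query: str, connection_name: str,
                           file_prefix: str, n_files: int) -> str:
    """Build an EXPORT statement that writes a query to several CSV files.

    Assumption: listing several FILE clauses lets the database write the
    files in parallel. All names here are illustrative only.
    """
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    # One FILE clause per output file: <prefix>_1.csv, <prefix>_2.csv, ...
    file_clauses = "\n".join(
        f"FILE '{file_prefix}_{i}.csv'" for i in range(1, n_files + 1)
    )
    return (
        f"EXPORT ({query})\n"
        f"INTO CSV AT {connection_name}\n"
        f"{file_clauses}"
    )
```

The sketch does no escaping of its inputs, so it is only suitable for trusted, programmatically generated values.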

Remove unnecessary release droid files

Background

  • Release droid only needs release_droid_upload_github_release_assets.yml
  • the following configs can be removed release_droid_prepare_original_checksum.yml, release_droid_print_quick_checksum.yml

Acceptance Criteria

  • The two yaml files are removed

Create Python CLI Tool to deploy the extension

Background

  • We need a way to deploy the CREATE SCRIPT statements.
  • All scripts need to be created in a schema

Acceptance Criteria

  • User can provide host of the DB
  • User can provide credentials for the DB
  • User can provide a schema where the scripts get installed to
  • The CLI installs all necessary CREATE SCRIPT statements to run the sagemaker extension.
  • The CLI can print out all CREATE SCRIPT statements, so that the user can copy them and run them via another SQL client
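
The acceptance criteria above map naturally onto a small argument parser. This is a minimal sketch using argparse from the standard library; the --print flag, the default port and the function name are assumptions made for illustration, and the real deploy CLI may be built differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser covering the options listed in the acceptance criteria."""
    parser = argparse.ArgumentParser(
        description="Deploy the CREATE SCRIPT statements of the extension")
    parser.add_argument("--host", help="host of the database")
    # 8563 as the default port is an assumption of this sketch
    parser.add_argument("--port", type=int, default=8563)
    parser.add_argument("--user", help="database user")
    # "pass" is a Python keyword, so the value is stored as "password"
    parser.add_argument("--pass", dest="password", help="database password")
    parser.add_argument("--schema", required=True,
                        help="schema the scripts get installed to")
    # Hypothetical flag: only print the statements instead of executing them
    parser.add_argument("--print", dest="print_only", action="store_true")
    return parser
```

For example, build_parser().parse_args(["--schema", "RETAIL", "--print"]) yields a namespace with print_only set to True and no credentials.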

Support temporary AWS session credentials

Background

  • It is common for users to secure their AWS accounts with MFA, or to assume a role in CI. In both cases, you get temporary session credentials instead of a user with permanent credentials
  • The extension should support the usage of those temporary session credentials

Acceptance Criteria

  • The extension should support permanent credentials and temporary session credentials
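
One way to support both kinds of credentials is to treat the session token as optional. The sketch below only builds the keyword arguments; the function name is hypothetical, while aws_access_key_id, aws_secret_access_key and aws_session_token are the parameter names accepted by boto3.Session.

```python
from typing import Dict, Optional


def build_session_kwargs(access_key_id: str, secret_access_key: str,
                         session_token: Optional[str] = None) -> Dict[str, str]:
    """Return keyword arguments for boto3.Session.

    Permanent credentials have no session token; temporary credentials
    (MFA, assumed role) additionally carry one.
    """
    kwargs = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if session_token:
        # Only temporary session credentials include this entry
        kwargs["aws_session_token"] = session_token
    return kwargs
```

The result could then be passed on as boto3.Session(**build_session_kwargs(...)), so the calling code does not need to distinguish between the two cases.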

Setup ci-isolation for integration test

An integration test has been performed using the AWS emulator Localstack. However, it is also necessary to run CI against real AWS.

  • Setup ci-isolation environment (https://github.com/exasol/ci-isolation-aws)
  • Pass an external configuration to pytest so that it is possible to switch between the AWS and Localstack integrations.
  • In the Localstack integration, Localstack is on the same network as the Exasol container and uses a specified IP address. Update this configuration, taking into account the risk of overlapping IP addresses.

Get details of the polled Autopilot job status

  • The current polling implementation only provides general job statuses such as FeatureEngineering, Training, ...
  • Get details of the job status, e.g. the reason in case of a failure, by following the steps below:
  1. call DescribeTrainingJobResponse
  2. check FailureReason field
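
The two steps can be sketched as a function that turns a describe response into a readable status line. It assumes a response dictionary shaped like the one returned by boto3's describe_auto_ml_job, with the fields AutoMLJobStatus, AutoMLJobSecondaryStatus and FailureReason; the function name and the output format are hypothetical.

```python
def summarize_job_status(response: dict) -> str:
    """Summarize the job status, adding the failure reason for failed jobs.

    Assumption: the response looks like boto3's describe_auto_ml_job output.
    """
    status = response.get("AutoMLJobStatus", "Unknown")
    secondary = response.get("AutoMLJobSecondaryStatus")
    summary = status if secondary is None else f"{status} ({secondary})"
    if status == "Failed":
        # FailureReason is only expected to be present for failed jobs
        reason = response.get("FailureReason", "no failure reason reported")
        summary = f"{summary}: {reason}"
    return summary
```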

Update Developer Guide

In the Developer Guide:

  • explain how to add an AWS role for CI

Other updates:

  • update the change log for recent changes
  • put the credentials of the CI user into the keeper

Add Python3 UDF to poll Autopilot training status

Autopilot consists of three main steps: Analyzing Data, Feature Engineering, Model Tuning. In addition, it has five different job statuses: Completed, InProgress, Failed, Stopped, Stopping.

We need to observe these steps and statuses of training models. For this, the following steps can be implemented as a UDF script:

  1. Create a table including metadata of training models like model_name
  2. Retrieve status of a given model which trains on Autopilot and insert it into a table

Remove setup.py

Background

  • As of poetry 1.4.0, poetry doesn't create the setup.py anymore: https://github.com/python-poetry/poetry/releases/tag/1.4.0
  • We currently use poetry build to generate the setup.py
  • However, the setup.py isn't needed with newer pip versions and if there are releases to PyPI or as wheels
  • For that reason, we can remove setup.py and the githook that generates it from this repo

Acceptance Criteria

  • Update workflows to poetry 1.4.0
  • Remove setup.py Github Workflow
  • Remove setup.py githook
  • Remove setup.py file

Add Conda environment for Lua

Background

  • This project needs Lua for testing and development
  • Currently, the project relies on the system Lua which could have the wrong Lua version

Implement PredictionUDF using api-endpoint

Background

  • The extension only becomes useful once predictions can be run from Exasol
  • Question: which kind of UDF do we create?
    1. One UDF per Model, TrainingJob (preferred)
      • possible to hard-code model-specific things, like the model connection object name
        • the connection object can be changed without changing the UDF
      • proper typing is possible -> proper types for input and output variables
    2. A generic UDF with variadic parameters
  • Option 1 requires a Lua script which creates the UDF, the connection object and the API endpoint
  • How to shut down and resume the endpoint?

Add Autopilot endpoint information into Metadata table

  • Keep the SageMaker Autopilot endpoint information in the metadata row of the relevant model, so that users can keep track of which model is deployed and of endpoint details such as the endpoint name.
  • Discuss how to update a row when an endpoint is created/deleted.

Cannot connect to Exasol V8 over TLS

The command

python3 -m exasol_sagemaker_extension.deployment.deploy_cli --host=w.x.y.y --port=8563 --user=xxx --pass=yyy --schema=RETAIL

fails with an error saying that non-TLS connections are not allowed. V8 only allows TLS connections:

pyexasol.exceptions.ExaRequestError:
(
message => Connection exception - Only TLS connections are allowed.
dsn => www.xxx.yyy.zzz:8563
user => xxxxxxxxxxx
schema =>
session_id =>
code => 08004
)

Needs to be fixed for V8
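
A possible direction is to request an encrypted connection when connecting with pyexasol. The sketch below only assembles the connection arguments; the function name is hypothetical, and it assumes that passing encryption=True to pyexasol.connect is what enables TLS for the connection.

```python
def build_connection_params(host: str, port: int, user: str, password: str,
                            schema: str, use_tls: bool = True) -> dict:
    """Assemble keyword arguments for pyexasol.connect.

    Assumption: encryption=True makes pyexasol use an encrypted (TLS)
    connection, which Exasol V8 requires.
    """
    return {
        "dsn": f"{host}:{port}",
        "user": user,
        "password": password,
        "schema": schema,
        "encryption": use_tls,
    }
```

The result could be used as pyexasol.connect(**build_connection_params(...)). Certificate verification settings are left out of this sketch.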

Save polled Autopilot training status into a log table

Add an option which saves the polled statuses into a log table at a given interval

  • Training might take long, so it might produce too many status logs. It might be better to restrict the polling interval.
  • The log table should be deleted after the training is finished.

Add Lua Script which combines export and training

Background

  • In this feature, we want to call the export and the training from one Lua script, to give the user a single point of usage

Acceptance Criteria

  • Lua Script which first exports the data and then starts the training of SageMaker Autopilot
  • Write a User Guide including deployment and running it

Fix aws tests

Currently, the AWS tests fail with:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateAutoMLJob operation: Could not assume role ***. Please ensure that the role exists and allows principal 'sagemaker.amazonaws.com' to assume the role.

Evaluate in-database prediction for trained Auto-ML models

Background

  • SageMaker Autopilot stores the models in S3, and it looks like we might be able to run them in the UDFs
  • We need to test whether we can run some of them in Python, both outside and inside the DB
  • Check whether different models and preprocessing steps have a somewhat consistent interface
  • How do we find the best candidate?

Implement GetCandidateListUDF script

  • This script returns a list of the names of the trained models.
  • If users want to run a model other than the best model, they can take the name of the model from this list.
  • Update the endpoint deployment script to take the model name as an input argument
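
Extracting the names could look like the following sketch. It assumes a response shaped like the one from boto3's list_candidates_for_auto_ml_job, where the Candidates list holds entries with a CandidateName field; the function name is hypothetical and pagination via NextToken is not handled.

```python
from typing import List


def candidate_names(response: dict) -> List[str]:
    """Return the names of all candidates contained in one response page.

    Assumption: the response has the shape of boto3's
    list_candidates_for_auto_ml_job output.
    """
    return [candidate["CandidateName"]
            for candidate in response.get("Candidates", [])]
```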

Run real tests sequentially

  • Real tests, which communicate with AWS services, depend on each other's results.
  • In real tests, the flow is as follows: train -> (poll) -> deploy -> predict -> delete
  • Note that the training part runs asynchronously. In a sequential run, its completion should be checked by polling.
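
Waiting for the asynchronous training before the deploy step could be done with a small polling helper. This is a sketch under assumptions: the helper name, its parameters and the choice of terminal statuses (Completed, Failed, Stopped) are illustrative, and the status lookup is passed in as a function so the helper stays independent of AWS.

```python
import time
from typing import Callable

# Statuses after which the job will not change anymore (assumed set)
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(get_status: Callable[[], str],
                 poll_interval_s: float = 30.0,
                 max_polls: int = 1000,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status until a terminal status is reached and return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_interval_s)
    raise TimeoutError(f"job did not finish within {max_polls} polls")
```

A test can then call wait_for_job after the train step and only continue with deploy if the returned status is Completed.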

Fix CI setup

aws iam create-role --role-name sagemaker-role --assume-role-policy-document '{
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Effect": "Allow",
                        "Principal": {
                            "Service": "sagemaker.amazonaws.com"
                        },
                        "Action": "sts:AssumeRole"
                    }
                ]
            }'

aws iam create-policy --policy-name "sagemaker-s3-access" --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreatePolicy",
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:DeleteRole",
        "iam:DeletePolicy",
        "iam:DetachRolePolicy"
      ],
      "Resource": "*"
    }
  ]
}'

aws iam attach-role-policy --role-name sagemaker-role --policy-arn arn:aws:iam::166283903643:policy/sagemaker-s3-access
aws iam attach-role-policy --role-name sagemaker-role --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

Improve error handling by validating inputs and outputs of functions

Background

  • We got at some point the following error message

    [Code: 0, SQL State: 43000] "attempt to index a nil value (field 'integer index')" caught in script "IDA"."SME_DEPLOY_SAGEMAKER_AUTOPILOT_ENDPOINT" at line 402 (Session: 1775016774761775104)

  • This error is caused by something which is nil but shouldn't be. We should check for this before accessing the value and throw a proper error message.

  • It is likely that the following line caused the error

    • and that the metadata_row was nil, because the db_metadata_reader didn't find the metadata row for this particular job.
    • To get a better error message, we should check in the db_metadata_reader whether the row is nil

Name the return column as predictions

  • The prediction return column is named "predictions"
  • But the Scania trucks classification returns predictions in the target column, rather than in the "predictions" column

Implement UDF scripts for endpoint operations

Implement Create/Delete endpoint UDF scripts

  • CreateEndpoint Lua Script
    • Create the Endpoint for model
    • Create or Update Model Connection (with the endpoint information)
    • Create UDF Function (if not exists)
  • DeleteEndpoint Lua Script
    • Delete the Endpoint
    • Update the connection object with "Not running"

Prepare release 0.1.0

  • Check consistency of version numbers (poetry, changelog)
  • Complete release letter
  • Configure release droid for the project

Handle prediction of uncompleted Autopilot job

  • If a job is interrupted before completion because one of the "max_runtime" stopping criteria takes effect, Autopilot might have several candidates but no best candidate.
  • We can throw an exception saying that there is no best candidate.
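
Such a check could look like the sketch below. It assumes a response shaped like boto3's describe_auto_ml_job output, where BestCandidate is simply absent when no best candidate exists; the exception class, the function name and the message text are hypothetical.

```python
class NoBestCandidateError(Exception):
    """Raised when an Autopilot job has no best candidate (hypothetical class)."""


def best_candidate_name(describe_response: dict) -> str:
    """Return the name of the best candidate or raise a descriptive error.

    Assumption: the response has the shape of boto3's
    describe_auto_ml_job output.
    """
    best = describe_response.get("BestCandidate")
    if not best or "CandidateName" not in best:
        job_name = describe_response.get("AutoMLJobName", "<unknown>")
        raise NoBestCandidateError(
            f"Autopilot job '{job_name}' has no best candidate; "
            "it may have been stopped before completion.")
    return best["CandidateName"]
```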

Add CREATE SCRIPT statements for deployment of the training UDF

Background

def udf_wrapper():
   from exasol_udf_mock_python.udf_context import UDFContext
   from exasol_sagemaker_extension.autopilot_training_udf import AutopilotTrainingUDF

   def mocked_training_method(**kwargs):
       return "test_job_name"

   udf = AutopilotTrainingUDF(exa, training_method=mocked_training_method)

   def run(ctx: UDFContext):
       udf.run(ctx)

The CREATE SCRIPT statement would look like:

CREATE PYTHON3 SET SCRIPT AutopilotTrainingUDF(model_name VARCHAR(23), ....)
EMITS (model_name VARCHAR(32)) AS
    from exasol_sagemaker_extension.autopilot_training_udf import AutopilotTrainingUDF

    udf = AutopilotTrainingUDF(exa)

    def run(ctx):
        udf.run(ctx)
/

Acceptance Criteria

  • Create CREATE SCRIPT statement for AutopilotTrainingUDF and AutopilotTrainingStatusUDF

Update to Python 3.8

Background

  • Python 3.6 is EOL, and for Python 3.7 we don't have an officially supported language container
  • We are going to switch all other projects to Python 3.8 as well

Acceptance Criteria

  • The supported python version was changed to 3.8
  • The packages are updated accordingly
  • The language container is also moved to python 3.8 minimal
