fsiddiqi / dxc-industrialized-ai-starter

This project forked from dxc-technology/dxc-industrialized-ai-starter

Industrialized AI Starter

License: Apache License 2.0

Python 4.29% Jupyter Notebook 95.71%

DXC Industrialized AI Starter

DXC Industrialized AI Starter makes it easier to build and deploy Industrialized AI. The library lets you:

  • Access, clean, and explore raw data
  • Build data pipelines
  • Run AI experiments
  • Publish microservices

Installation

Install the library with pip, then import it in Python:

1. pip install DXC-Industrialized-AI-Starter
2. from dxc import ai
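
Before running the examples, it can help to confirm that the install succeeded by checking that the top-level `dxc` package can be located. The sketch below uses only the standard library. The helper name `is_importable` is hypothetical and is not part of the library.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


# "dxc" is the package that `from dxc import ai` imports from.
if is_importable("dxc"):
    print("dxc is installed")
else:
    print("dxc not found; run: pip install DXC-Industrialized-AI-Starter")
```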

Getting Started

Access, Clean, and Explore Raw Data

Here's a quick example of using the library to access, clean, and explore raw data.

# Access raw data (use the reader that matches your source)
# json_url and csv_url are URLs that you supply
df = ai.read_data_frame_from_remote_json(json_url)
df = ai.read_data_frame_from_remote_csv(csv_url)
df = ai.read_data_frame_from_local_json()
df = ai.read_data_frame_from_local_csv()
df = ai.read_data_frame_from_local_excel_file()

# Clean data
raw_data = ai.clean_dataframe(df)

# Explore raw data
ai.visualize_missing_data(raw_data)
ai.explore_features(raw_data)
ai.plot_distributions(raw_data)
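
To make concrete what the exploration step looks for, here is a minimal standard-library sketch that computes the fraction of missing values per column. It is not how `ai.visualize_missing_data` is implemented. The function `missing_fraction` and the sample rows are hypothetical, and treating `None`, an empty string, or an absent key as "missing" is an assumption made for illustration.

```python
from typing import Any, Dict, List


def missing_fraction(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return the fraction of rows in which each column is missing.

    A value counts as missing when the key is absent, or the value is
    None or an empty string (an assumption for this sketch).
    """
    if not rows:
        return {}
    # Collect every column name seen in any row, keeping first-seen order.
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    result = {}
    for name in columns:
        missing = sum(1 for row in rows if row.get(name) in (None, ""))
        result[name] = missing / len(rows)
    return result


# Hypothetical sample records, for illustration only.
sample_rows = [
    {"department_name": "Parks", "est_unit_cost": 100.0},
    {"department_name": "", "est_unit_cost": 250.0},
    {"department_name": "Fire", "est_unit_cost": None},
    {"department_name": "Police"},
]

print(missing_fraction(sample_rows))
```

Columns with a high missing fraction are the ones worth inspecting before moving on to the pipeline step.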

For more info click here

Build Data Pipelines

The example below shows how to build a data pipeline. To get started, you first need a MongoDB account, which you can sign up for free. Create a database, obtain its connection string, and specify those details in the data_layer below.

#Insert data into MongoDB:
data_layer = {
    "connection_string": "<your connection_string>",
    "collection_name": "<your collection_name>",
    "database_name": "<your database_name>"
}
wrt_raw_data = ai.write_raw_data(data_layer, raw_data, date_fields = [])

The pipeline below instructs the data store on how to refine raw_data into something that can be used to train a machine-learning model. Update data_pipeline() with an aggregation pipeline that fits your project. The refined data is stored in a pandas dataframe. Make sure the output is what you want before continuing. Here is an example of creating a pipeline:

pipeline = [
        {
            '$group':{
                '_id': {
                    "funding_source":"$funding_source",
                    "request_type":"$request_type",
                    "department_name":"$department_name",
                    "replacement_body_style":"$replacement_body_style",
                    "equipment_class":"$equipment_class",
                    "replacement_make":"$replacement_make",
                    "replacement_model":"$replacement_model",
                    "procurement_plan":"$procurement_plan"
                    },
                "avg_est_unit_cost":{"$avg":"$est_unit_cost"},
                "avg_est_unit_cost_error":{"$avg":{ "$subtract": [ "$est_unit_cost", "$actual_unit_cost" ] }}
            }
        }
]

df = ai.access_data_from_pipeline(wrt_raw_data, pipeline)
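
To see what the `$group` stage above computes, the following pure-Python sketch reproduces its arithmetic on a few made-up rows: it groups by a key, then averages `est_unit_cost` and the difference `est_unit_cost - actual_unit_cost`. This only illustrates the aggregation; MongoDB does the real work. The function `group_and_average`, the two-field key, and the sample values are assumptions for illustration, and the sketch assumes every row has numeric values for both cost fields.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_and_average(rows: List[dict], key_fields: Tuple[str, ...]) -> List[dict]:
    """Mimic the two $avg accumulators of the $group stage in plain Python."""
    groups: Dict[tuple, List[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in key_fields)].append(row)

    results = []
    for key, members in groups.items():
        n = len(members)
        results.append({
            "_id": dict(zip(key_fields, key)),
            # Mean of the estimated unit cost within the group.
            "avg_est_unit_cost": sum(m["est_unit_cost"] for m in members) / n,
            # Mean of (estimate - actual), like $avg applied to $subtract.
            "avg_est_unit_cost_error": sum(
                m["est_unit_cost"] - m["actual_unit_cost"] for m in members
            ) / n,
        })
    return results


# Made-up rows; the real pipeline groups on more fields than these two.
rows = [
    {"department_name": "Parks", "request_type": "New",
     "est_unit_cost": 100, "actual_unit_cost": 90},
    {"department_name": "Parks", "request_type": "New",
     "est_unit_cost": 200, "actual_unit_cost": 230},
    {"department_name": "Fire", "request_type": "Replace",
     "est_unit_cost": 500, "actual_unit_cost": 450},
]

for doc in group_and_average(rows, ("department_name", "request_type")):
    print(doc)
```

A positive `avg_est_unit_cost_error` means the estimates in that group ran above the actual cost on average; a negative value means they ran below it.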

For a more detailed explanation click here

Run AI Experiments

The following snippet runs an AI experiment by calling run_experiment() on a model. Update experiment_design with parameters that fit your project. The data parameter should remain the refined training data. The model parameter must be a model subclass. The labels parameter indicates the column of the data dataframe to be predicted. For the prediction model, meta_data must identify the column to be predicted and give the types of any non-numeric columns.

experiment_design = {
    # model options include ['regression()', 'classification()']
    "model": ai.regression(),
    "labels": df.avg_est_unit_cost_error,
    "data": df,
    # Tell the model which column is 'output'
    # Also note columns that aren't purely numerical
    # Examples include ['nlp', 'date', 'categorical', 'ignore']
    "meta_data": {
        "avg_est_unit_cost_error": "output",
        "_id.funding_source": "categorical",
        "_id.department_name": "categorical",
        "_id.replacement_body_style": "categorical",
        "_id.replacement_make": "categorical",
        "_id.replacement_model": "categorical",
        "_id.procurement_plan": "categorical"
    }
}

trained_model = ai.run_experiment(experiment_design)
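
Because the `meta_data` keys refer to columns of the refined data, a small pre-flight check can catch typos before training. The sketch below is one guess at what is worth checking and is not part of the library. `check_meta_data` is a hypothetical helper, the allowed types are the ones named in the comments above, and requiring exactly one `output` column is an assumption.

```python
from typing import Dict, Iterable, List

# Types named in the experiment_design comments, plus the 'output' marker.
ALLOWED_TYPES = {"output", "nlp", "date", "categorical", "ignore"}


def check_meta_data(meta_data: Dict[str, str], columns: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    known = set(columns)
    for name, kind in meta_data.items():
        if name not in known:
            problems.append(f"unknown column: {name}")
        if kind not in ALLOWED_TYPES:
            problems.append(f"unknown type for {name}: {kind}")
    outputs = [name for name, kind in meta_data.items() if kind == "output"]
    if len(outputs) != 1:
        problems.append(f"expected exactly one output column, found {len(outputs)}")
    return problems


# Hypothetical subset of the refined data's column names.
columns = [
    "_id.funding_source",
    "_id.department_name",
    "avg_est_unit_cost",
    "avg_est_unit_cost_error",
]
meta_data = {
    "avg_est_unit_cost_error": "output",
    "_id.funding_source": "categorical",
    "_id.department_nam": "categorical",  # deliberate typo to show the check
}

for problem in check_meta_data(meta_data, columns):
    print(problem)
```

With a pandas dataframe, the column names could be passed as `list(df.columns)`.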

For more info click here

Publish Microservice

The example below publishes a microservice. To design the microservice, you must first create an Algorithmia account. The code defines the parameters needed to build and deploy a microservice based on the trained model. Update microservice_design with parameters appropriate for your project.

# trained_model is the output of the run_experiment() function
microservice_design = {
    "microservice_name": "<Name of your microservice>",
    "microservice_description": "<Brief description about your microservice>",
    "execution_environment_username": "<Algorithmia username>",
    "api_key": "<your api_key>",
    "api_namespace": "<your api namespace>",   
    "model_path":"<your model_path>"
}

# Publish the microservice and display the URL of the API
api_url = ai.publish_microservice(microservice_design, trained_model)
print("api url: " + api_url)
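
Since `microservice_design` carries an API key, one option is to read the credentials from environment variables rather than hard-coding them in a notebook. The sketch below shows that pattern under stated assumptions: the environment variable names `ALGORITHMIA_USERNAME` and `ALGORITHMIA_API_KEY` and the helper `build_microservice_design` are hypothetical, and the returned dictionary uses only the keys shown above.

```python
import os
from typing import Dict, Mapping


def build_microservice_design(
    name: str,
    description: str,
    api_namespace: str,
    model_path: str,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Assemble microservice_design, taking credentials from the environment."""
    required = ("ALGORITHMIA_USERNAME", "ALGORITHMIA_API_KEY")  # hypothetical names
    missing = [var for var in required if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "microservice_name": name,
        "microservice_description": description,
        "execution_environment_username": env["ALGORITHMIA_USERNAME"],
        "api_key": env["ALGORITHMIA_API_KEY"],
        "api_namespace": api_namespace,
        "model_path": model_path,
    }


# Demonstration with a stand-in environment so no real secret is needed.
demo_env = {"ALGORITHMIA_USERNAME": "demo_user", "ALGORITHMIA_API_KEY": "not-a-real-key"}
design = build_microservice_design(
    "<Name of your microservice>",
    "<Brief description about your microservice>",
    "<your api namespace>",
    "<your model_path>",
    env=demo_env,
)
print(sorted(design))
```

The resulting dictionary can be passed to `ai.publish_microservice` in place of a hand-written `microservice_design`.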

For more info click here

Docs

For detailed and complete documentation, please click here

Example of colab notebook

Here is a detailed, in-depth example of using the DXC Industrialized AI Starter library.

Contributing Guide

To learn more about contributing and the contribution guidelines, please click here

Reporting Issues

If you find any issues, feel free to report them here with a clear description of the issue.
