
dxc-industrialized-ai-starter's People

Contributors

amarify, dependabot[bot], giuseppecozza, itsmemarty, jdamascoty, karthikreddykuna, kishorpulagam92, kkrdy, likhil, madhu407, madhubandru, mbandru2, soujanya8977, vamsi7behara, vivekbachala

dxc-industrialized-ai-starter's Issues

Metrics or statistics of differences between raw data and clean data.

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Display metrics or statistics that show the difference between raw data and clean data.
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:
Show the stats of raw and clean data as separate columns.
We should have metrics for both categorical and numerical data, and should also think about how to provide usable metrics for data sets with many features.
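A minimal sketch of showing raw and clean statistics as separate columns, assuming both DataFrames share column names. Numeric columns get count, missing, mean, std, min, and max; categorical columns get count, missing, unique, and the most frequent value. compare_raw_and_clean is a hypothetical helper, not an existing function in the package.

```python
import pandas as pd


def _column_stats(s: pd.Series) -> dict:
    """Summary statistics for one column, chosen by its dtype."""
    stats = {"count": int(s.count()), "missing": int(s.isna().sum())}
    if pd.api.types.is_numeric_dtype(s):
        stats.update(mean=s.mean(), std=s.std(), min=s.min(), max=s.max())
    else:
        # Categorical/text column: distinct values and the most frequent one.
        stats.update(unique=int(s.nunique()), top=s.mode().iloc[0] if s.count() else None)
    return stats


def compare_raw_and_clean(raw: pd.DataFrame, clean: pd.DataFrame) -> pd.DataFrame:
    """Long-format table indexed by (column, statistic) with 'raw' and 'clean' columns."""
    records = []
    for col in raw.columns:
        raw_stats = _column_stats(raw[col])
        # A column dropped during cleaning gets None for every clean statistic.
        clean_stats = _column_stats(clean[col]) if col in clean.columns else {}
        for name, raw_value in raw_stats.items():
            records.append({"column": col, "statistic": name,
                            "raw": raw_value, "clean": clean_stats.get(name)})
    return pd.DataFrame(records).set_index(["column", "statistic"])
```

For data sets with many features, the long format makes it easy to look at one statistic across all columns, for example result.xs("missing", level="statistic").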

Provide Auto-ML documentation link in user guide for running models

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Provide Auto-ML documentation link in the user guide for running models
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:
If possible, we should provide an easy way to expose deep links to the specific algorithms, as part of our effort to support data scientists in making their work explainable.

distplot is deprecated

Describe the bug
distplot is a deprecated function and will be removed in a future version; we have to find an alternative to replace it.

To Reproduce
Steps to reproduce the behavior:

  1. In DXC-Industrialized-AI-Starter.ipynb in Colab, execute ai.plot_distributions(data1)
  2. You will get this warning:
    distplot is a deprecated function and will be removed in a future version. Please adapt your code to use either displot (a figure-level function with similar flexibility) or histplot (an axes-level function for histograms). Pass the following variable as a keyword arg: x. From version 0.12, the only valid positional argument will be data, and passing other arguments without an explicit keyword will result in an error or misinterpretation.

Expected behavior
We have to find a replacement for this function.

Additional context
N/A

Research: Add encryption to the published microservice

Issue template: Transparency Request (label: Research)

Describe the area of code that needs more transparency:
Add encryption to the published microservice
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:
Research levels of best-practice security for different types of data. We could offer a parameter mapped to predefined configurations for low, medium, high, and/or extra-high levels of security.
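A minimal sketch of the parameter idea: a named security level maps to a predefined configuration. Every field and value here is a placeholder assumption, not a recommendation; deciding the real settings per data type is what this research issue is for. SecurityLevel, SecurityConfig, resolve_security, and a security= argument on ai.publish_microservice() are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SecurityLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    EXTRA_HIGH = "extra-high"


@dataclass(frozen=True)
class SecurityConfig:
    min_tls_version: str    # minimum TLS version accepted by the endpoint
    require_api_key: bool   # reject unauthenticated calls
    encrypt_payload: bool   # encrypt request/response bodies on top of TLS
    audit_logging: bool     # record every call for later review


# Placeholder presets for illustration only; the research should set the real values.
SECURITY_PRESETS = {
    SecurityLevel.LOW: SecurityConfig("1.2", False, False, False),
    SecurityLevel.MEDIUM: SecurityConfig("1.2", True, False, False),
    SecurityLevel.HIGH: SecurityConfig("1.3", True, True, False),
    SecurityLevel.EXTRA_HIGH: SecurityConfig("1.3", True, True, True),
}


def resolve_security(level: str = "medium") -> SecurityConfig:
    """Map a user-facing level name to its preset; unknown names raise ValueError."""
    return SECURITY_PRESETS[SecurityLevel(level)]
```

Keeping the presets in one frozen table means a user only picks a level name, while the concrete settings can be revised centrally as the research concludes.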

Read data from a local Excel file in Colab

Describe the bug
Trying to upload an Excel file (in Colab), but it fails with the error below:

read_data_frame_from_local_excel_file()
29 uploaded = files.upload()
30 excel_file_name = list(uploaded.keys())[0]
---> 31 df = pd.read_excel(io.BytesIO(uploaded[excel_file_name]))
32 return(df)
33

NameError: name 'io' is not defined

To Reproduce
Steps to reproduce the behavior:

  1. dataframe = ai.read_data_frame_from_local_excel_file()
  2. Browse to the Excel file (in Colab)
  3. See error

Expected behavior
A DataFrame containing the Excel data
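The NameError means the module calls io.BytesIO without importing io. A minimal sketch of the fix, adding the missing import; the helper read_data_frame_from_uploaded_excel is hypothetical and split out only so the parsing can be exercised outside Colab. Reading .xlsx files also assumes openpyxl is installed.

```python
import io

import pandas as pd


def read_data_frame_from_uploaded_excel(uploaded: dict) -> pd.DataFrame:
    """Parse the first file of a {file_name: bytes} mapping as an Excel workbook."""
    excel_file_name = list(uploaded.keys())[0]
    # io.BytesIO wraps the raw bytes in a file-like object pd.read_excel can read;
    # without `import io` at the top of the module this line raises the NameError.
    return pd.read_excel(io.BytesIO(uploaded[excel_file_name]))


def read_data_frame_from_local_excel_file() -> pd.DataFrame:
    """Prompt for a local Excel file in Colab and load it into a DataFrame."""
    from google.colab import files  # only importable inside Google Colab
    uploaded = files.upload()  # returns {file_name: file_bytes}
    return read_data_frame_from_uploaded_excel(uploaded)
```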

Verbose and succinct mode for running an experiment

By default, ai.run_experiment() produces a lot of output. Can we add a parameter to the function that selects between a verbose and a succinct mode? In verbose mode, the function produces the full output; in succinct mode, only a summary is output. I recommend making succinct mode the default.
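A minimal sketch, assuming the noisy output is ordinary print() output: a hypothetical wrapper run_with_verbosity (not part of the dxc package) captures stdout with contextlib.redirect_stdout and prints a one-line summary in succinct mode. The same pattern would apply to ai.publish_microservice(). Output written directly by C extensions, subprocesses, or logging handlers holding the original stream is not captured.

```python
import contextlib
import io
from typing import Any, Callable


def run_with_verbosity(func: Callable[..., Any], *args: Any,
                       verbose: bool = False, **kwargs: Any) -> Any:
    """Call func; in succinct mode, swallow its printed output and show a summary."""
    if verbose:
        return func(*args, **kwargs)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    suppressed = buffer.getvalue().count("\n")
    print(f"{func.__name__} finished ({suppressed} lines of output suppressed; "
          "pass verbose=True to see them).")
    return result
```

Usage would look like run_with_verbosity(ai.run_experiment, <its usual arguments>). Building the flag into the function itself would be cleaner, but a wrapper shows the behavior without touching existing code.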

Cannot upload a local file into Colab

Problem:
In the DXC_Industrialized_AI_Starter.ipynb notebook, there are a couple of places that suggest you can upload a local file to Colab. This doesn't work. The error message I get is:

MessageError: TypeError: Cannot read property '_uploadFiles' of undefined

ai.clean_dataframe() unable to parse dates formatted as MM/D/YYYY

MM/D/YYYY and MM/DD/YYYY are two popular US date formats, and it appears the ai.clean_dataframe() function cannot recognize them.

ParserError: Could not match input '10/1/2020' to any of the following formats: YYYY-MM-DD, YYYY-M-DD, YYYY-M-D, YYYY/MM/DD, YYYY/M/DD, YYYY/M/D, YYYY.MM.DD, YYYY.M.DD, YYYY.M.D, YYYYMMDD, YYYY-DDDD, YYYYDDDD, YYYY-MM, YYYY/MM, YYYY.MM, YYYY, W
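The ParserError lists only year-first formats, so month-first US dates never match. A minimal sketch of a workaround with pandas, assuming the column is known to hold month-first dates; parse_us_dates is a hypothetical helper, and how it would plug into ai.clean_dataframe() is not shown in the issue.

```python
import pandas as pd


def parse_us_dates(series: pd.Series) -> pd.Series:
    """Parse month/day/year strings such as '10/1/2020' or '10/01/2020'."""
    # %m and %d accept one- or two-digit values, so one format covers both
    # MM/D/YYYY and MM/DD/YYYY; unparseable values become NaT instead of raising.
    return pd.to_datetime(series, format="%m/%d/%Y", errors="coerce")
```

Day-first strings such as 1/10/2020 would be read as January 10, so this should only be applied to columns known to be month-first.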

Add test case links to the contributing guide

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Add test case links to the contributing guide, revisit the contributing guide document, and make the necessary changes.
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:

Unclear "Set up the development environment" instructions

Most of those attending the DXC Industrialized AI course have not used Colab before.
Some of the instructions for getting started in the "Set up the development environment" section could be clearer.
e.g. "This code installs all the packages you'll need. Run it first."
Most people who have not used Colab before would not know how to do this.

Indicate the completeness or correctness of the data and show the outliers

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Indicate the completeness or correctness of the data and show the outliers
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:
Logan's comments:
Not just visualization: completeness, correctness, outliers, and other metrics should be saved as statistics, too. We can't force others to make their work explainable to an end user, but we can ensure that their work is capable of being explained if they use our package.
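A minimal sketch of saving completeness and outlier statistics rather than only plotting them. It assumes completeness means the share of non-missing values per column and uses the 1.5 × IQR rule, one common outlier heuristic, for numeric columns; data_quality_report is a hypothetical name. Correctness checks depend on domain rules and are not covered.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """One row per column: share of non-missing values and count of IQR outliers."""
    report = pd.DataFrame(index=df.columns)
    # Completeness: fraction of rows where the value is present.
    report["completeness"] = df.notna().mean()
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers (NaNs do not).
    outside = (numeric < q1 - iqr_factor * iqr) | (numeric > q3 + iqr_factor * iqr)
    # Aligns on column name; non-numeric columns get NaN.
    report["outliers"] = outside.sum()
    return report
```

The result is an ordinary DataFrame, so it can be persisted with report.to_csv(...) next to the model or data it describes.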

Publishing a microservice of a custom model

In the documentation, can you show an example of publishing a microservice from a custom model instead of a model generated by the run_experiment() function? The custom model could be something as simple as a function that adds two numbers or appends text to an input.
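A minimal sketch of such a custom model, assuming the microservice is hosted on Algorithmia (mentioned elsewhere in these issues), whose Python algorithms expose an apply(input) entry point. The add-two-numbers and append-text behaviors come from the issue; add_numbers and the payload shape are hypothetical, and how ai.publish_microservice() would accept this code is not documented here.

```python
from typing import Any


def add_numbers(a: float, b: float) -> float:
    """The 'custom model': any plain Python function can play this role."""
    return a + b


def apply(input_data: Any) -> Any:
    """Entry point in the style of an Algorithmia Python algorithm."""
    if isinstance(input_data, dict):
        # e.g. {"a": 2, "b": 3} -> 5
        return add_numbers(input_data["a"], input_data["b"])
    if isinstance(input_data, str):
        # Text input: append a fixed suffix, the issue's other example.
        return input_data + " (processed)"
    raise TypeError("expected a dict with 'a' and 'b', or a string")
```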

[BUG] Error when importing ai package

Describe the bug
An error occurred when trying to import ai from dxc

To Reproduce
Steps to reproduce the behavior:

  1. Used pip to install the AI Starter package
  2. Ran from dxc import ai
  3. See error

Expected behavior
ai should load successfully, as it did previously

[BUG] Unable to run AI experiments for Time-Series problems

Describe the bug
I am unable to run AI experiments for Time-Series problems

To Reproduce
When I passed timeseries as a parameter, I got an error.

I tried to pick up an example from https://nbviewer.jupyter.org/github/dxc-technology/DXC-Industrialized-AI-Starter/blob/c58754247060262ac0949396e48f71861cb79d4e/Examples/Time_series_Model.ipynb

After setting "model": 'timeseries', the time-series predictions are not displayed as expected. Instead, the same value is shown for all predictions.

Please let me know a way to handle time-series problems.

Expected behavior
Please create a function for time-series problems, or restore the existing functionality, since it appears to have been implemented already.

Custom dataset creation for images

  1. Resize the pictures.
  2. Convert all images into the same file format.
  3. Merge the images into a single file.
  4. Convert the images into a CSV file.
  5. Make a few changes to the CSV file.
  6. Load the CSV file.
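A minimal sketch of these steps, assuming images are stored in one subfolder per label (for example images/cats/*.png) and that Pillow and NumPy are available. Instead of re-saving files in a common format, it converts every image to 8-bit grayscale in memory and resizes it, then writes one flattened row of pixel values per image into a single CSV. images_to_csv and the folder layout are hypothetical.

```python
import csv
from pathlib import Path

import numpy as np
from PIL import Image

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def images_to_csv(image_dir: str, csv_path: str, size=(28, 28)) -> int:
    """Write label + flattened pixels for every image under image_dir/<label>/."""
    rows_written = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label"] + [f"pixel{i}" for i in range(size[0] * size[1])])
        for path in sorted(Path(image_dir).glob("*/*")):
            if path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            with Image.open(path) as img:
                # Same mode and resolution for every image: grayscale, fixed size.
                pixels = np.asarray(img.convert("L").resize(size)).flatten()
            # The parent folder name is used as the class label.
            writer.writerow([path.parent.name] + pixels.tolist())
            rows_written += 1
    return rows_written
```

The resulting file can then be loaded with pandas.read_csv(csv_path); keeping the label in the first column makes it easy to split features from targets.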

Code fails when trying to import dxc ai (bug level: Blocker)

Describe the bug
The code fails when I run the part responsible for importing ai from DXC (from dxc import ai)

To Reproduce
Steps to reproduce the behavior:

  1. pip install the DXC library (first block of code)
  2. Run the second block of code (from dxc import ai)

Expected behavior
The code should import the ai from DXC library without any issue

Additional context
None

Handle column names in data pipeline

Users face an issue with column names in the "access_data_from_pipeline" function in the following scenarios:

  1. When column names differ in case.
  2. When column names contain spaces.

So the fix needs to:

  1. Handle column names case-insensitively.
  2. Replace spaces in column names with "_".
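A minimal sketch of the requested fix, assuming it is applied to the DataFrame before access_data_from_pipeline looks up columns: trim surrounding whitespace, lower-case the names, and replace spaces with "_". normalize_column_names is a hypothetical helper; it raises an error when two columns collapse to the same name (for example Name and name) rather than silently keeping both under one label.

```python
import pandas as pd


def normalize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-case column names and spaces replaced by '_'."""
    normalized = (df.columns.str.strip()
                  .str.lower()
                  .str.replace(" ", "_", regex=False))
    if normalized.duplicated().any():
        clashes = sorted(set(normalized[normalized.duplicated()]))
        raise ValueError(f"columns collide after normalization: {clashes}")
    out = df.copy()
    out.columns = normalized
    return out
```

Column names supplied by the user should go through the same normalization so both sides match. The .str accessor assumes all column names are strings.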

[BUG] pandas.io.json.json_normalize is deprecated (Build Data Pipeline)

Describe the bug
pandas.io.json.json_normalize is deprecated; the recommendation is to use pandas.json_normalize instead.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Build Data Pipeline in Colab
  2. Click on the cell "#TODO: Define the code needed to refine the raw data"
  3. Run cell
  4. See error

Expected behavior
No error from the pandas library
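Since pandas 1.0, json_normalize is available at the top level as pandas.json_normalize, so the fix is an import change. A small example with made-up records showing the replacement:

```python
import pandas as pd

# Deprecated:  from pandas.io.json import json_normalize
# Replacement: call json_normalize from the top-level pandas namespace.
records = [
    {"id": 1, "user": {"name": "Ada", "city": "London"}},
    {"id": 2, "user": {"name": "Alan", "city": "Wilmslow"}},
]
flat = pd.json_normalize(records)  # nested keys become dotted column names
print(flat.columns.tolist())  # ['id', 'user.name', 'user.city']
```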

[BUG] Loose Dependency Resolver Constraints

Describe the bug
Running pip install DXC_Industrialized_AI_Starter-2.3.9-py3-none-any.whl takes several minutes and times out in Google Colab because multiple versions of libraries are downloaded.

To Reproduce
Steps to reproduce the behavior:
Run pip install

Expected behavior
The starter should install in at most 10 minutes
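One way to tighten the constraints, assuming a Colab session where the install did succeed, is to record the exact versions that were resolved and pin them in the package's requirements or in a pip constraints file (pip install -c constraints.txt ...), which should spare pip from trying multiple candidate versions. pinned_requirements is a hypothetical helper built on the standard library's importlib.metadata; which packages to pin is left to the maintainers.

```python
from importlib import metadata
from typing import Iterable, List


def pinned_requirements(package_names: Iterable[str]) -> List[str]:
    """Return 'name==version' lines for the packages installed in this environment."""
    pins = []
    for name in package_names:
        try:
            pins.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here, so there is no resolved version to pin.
            continue
    return pins
```

For example, print("\n".join(pinned_requirements(["pandas", "numpy"]))) in a working session produces lines that can be pasted into a requirements or constraints file.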

Rename the dataset variables in the notebook

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Rename the variables representing the dataset before and after cleaning the data (e.g., Raw_data, Clean_data)
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:

Can't resubmit after rework on a badge

After a badge receives a "rework" status, making the required changes and resubmitting returns an error response saying that the assertion already exists:

<Response [400]> { "errorMessage": { "statusCode": 400, "exception": "BadRequest", "message": "Assertion already exists", "payload": { "evidence": "https://colab.research.google.com/drive/1oMUEVLS5x1netqWaL0kAt8FVR-ABeJ4U", "lastUpdated": "2021-02-10T15:37:33Z", "badge": "Create a Data Story", "status": "rework", "created": "2021-02-09T16:06:25Z", "comments": [ { "date": "2021-02-10T15:37:33Z", "comment": "Hello, nice work, but for the sample data set area, please upload your data set, not the iris file.", "email": "[email protected]" } ], "email": "[email protected]", "d1": "user:6366a530-391b-41be-bbff-6f372658afef", "d2": "badge:dd05bbdf-ad5b-469d-ab2c-4dd218fd68fe", "salt": "ecaa9028a8feb321be864bf98ac1ebe6", "reviewer": "[email protected]", "sk": "assertion", "pk": "assertion:93372602-81eb-47dc-bf05-5bc475c276b6" } } }

Add the drift function after the pipeline

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Add the drift function after the pipeline
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:
Calculate the drift between two given data sets over a period of time, or between training sets
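The issue does not specify a drift measure. A minimal sketch using the Population Stability Index (PSI), one common choice for comparing a numeric feature between a reference sample (for example, training data) and a newer sample. population_stability_index is a hypothetical name, and the bin count and epsilon are assumptions.

```python
import numpy as np


def population_stability_index(expected, actual, bins: int = 10) -> float:
    """PSI = sum((a - e) * ln(a / e)) over bins defined by the reference sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Quantile bin edges from the reference sample; open-ended outer bins
    # so values outside the reference range are still counted.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    exp_counts, _ = np.histogram(expected, bins=edges)
    act_counts, _ = np.histogram(actual, bins=edges)
    eps = 1e-6  # avoids log(0) and division by zero for empty bins
    exp_pct = np.clip(exp_counts / exp_counts.sum(), eps, None)
    act_pct = np.clip(act_counts / act_counts.sum(), eps, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))
```

A commonly quoted rule of thumb treats PSI below 0.1 as little change and above 0.25 as significant drift; these thresholds are heuristics, not project requirements. For categorical columns, the same formula can be applied to category frequencies instead of bins.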

Creation of a new algorithm through the API

In ai.publish_microservice(), we need to create a new algorithm if it does not already exist in Algorithmia. This was not working in our previous code, so we made changes to make it run.

Verbose and succinct mode for publishing a microservice

By default, ai.publish_microservice() produces a lot of output. Can we add a parameter to the function that selects between a verbose and a succinct mode? In verbose mode, the function produces the full output; in succinct mode, only a summary is output. I recommend making succinct mode the default.

Include metadata details

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Include metadata details while writing data into MongoDB
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:
Include it in the code as a minimum, but preferably in MongoDB:
Where did we get the data from? Did we run the cleaner on the data?
What version of the data is it?
Where did we store the data?
This is a manual entry.
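A minimal sketch of capturing the metadata fields listed above (origin, cleaning status, version, storage location) and embedding them in each record before writing to MongoDB. build_metadata, attach_metadata, and the _metadata field name are hypothetical; the values remain manual entries, as the issue notes.

```python
from __future__ import annotations

from datetime import datetime, timezone


def build_metadata(source: str, cleaned: bool, version: str, storage: str) -> dict:
    """Collect the manually entered provenance details plus a UTC timestamp."""
    return {
        "source": source,        # where the data came from
        "cleaned": cleaned,      # whether the cleaner was run on it
        "version": version,      # version label of this data set
        "storage": storage,      # where the data is stored
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def attach_metadata(records: list[dict], metadata: dict) -> list[dict]:
    """Return copies of the records with the metadata embedded under '_metadata'."""
    return [{**record, "_metadata": metadata} for record in records]
```

With pymongo this could be written as collection.insert_many(attach_metadata(records, meta)). Storing the metadata once in a separate collection and referencing it by ID would avoid repeating it in every document; which layout fits depends on how the data is queried.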

Add logs to the crucial steps

Issue template: Transparency Request

Describe the area of code that needs more transparency:
Add logs to the crucial steps so that users can revert changes
Describe the solution you'd like:
Describe the alternatives you've considered:
Additional context or comments:
Add logs to the crucial steps (such as aggregation) so that users can revert changes if something goes wrong.
Log storage options: recommend configuration options (at least) for both MongoDB Atlas and local storage.
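A minimal sketch of logging an aggregation step to local storage, assuming "reverting" means restoring the step's input: the input is saved to a snapshot file before aggregating, and one JSON line per step records what ran and where the snapshot lives. get_step_logger, logged_aggregation, and the file paths are hypothetical; a MongoDB Atlas option could insert the same JSON document into a collection instead of a file.

```python
import json
import logging

import pandas as pd


def get_step_logger(log_path: str) -> logging.Logger:
    """Logger that appends one JSON line per pipeline step to a local file."""
    logger = logging.getLogger(f"pipeline_steps:{log_path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep step records out of the root logger
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def logged_aggregation(df: pd.DataFrame, group_by, agg: dict,
                       logger: logging.Logger, snapshot_path: str) -> pd.DataFrame:
    """Run a groupby aggregation, keeping a snapshot of the input for reverting."""
    df.to_csv(snapshot_path, index=False)  # pre-step copy to restore from
    result = df.groupby(group_by, as_index=False).agg(agg)
    logger.info(json.dumps({
        "step": "aggregation",
        "group_by": group_by,
        "agg": agg,
        "rows_before": len(df),
        "rows_after": len(result),
        "snapshot": snapshot_path,
    }))
    return result
```

To revert, the snapshot named in the last log entry can be reloaded with pd.read_csv.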
