automlwhitebox's Introduction

Sberbank version of AutoWoE

This is the repository for the AutoWoE library, developed by the Sber AI Lab AutoML group. The library automatically builds interpretable ML models based on feature binning, WoE feature transformation, feature selection, and logistic regression.

Authors: Vakhrushev Anton, Grigorii Penkin, Alexander Kirilin
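
As a rough illustration of the WoE (weight of evidence) feature transformation mentioned above, here is a minimal sketch in plain Python. It is not taken from the AutoWoE source: the function name woe_by_bin, the smoothing constant eps, and the sign convention (log of non-event share over event share) are all assumptions made for this example, and the library's own convention may differ.

```python
import math
from collections import Counter


def woe_by_bin(bins, target, eps=0.5):
    """Return {bin: WoE} for a binary target (1 = event, 0 = non-event).

    WoE here is ln(share of non-events in the bin / share of events in the bin).
    `eps` is added to every count so that empty cells do not produce log(0).
    The sign convention is an assumption; some tools use the inverse ratio.
    """
    events = Counter(b for b, y in zip(bins, target) if y == 1)
    non_events = Counter(b for b, y in zip(bins, target) if y == 0)
    labels = sorted(set(bins))
    total_events = sum(events.values()) + eps * len(labels)
    total_non_events = sum(non_events.values()) + eps * len(labels)
    return {
        b: math.log(
            ((non_events[b] + eps) / total_non_events)
            / ((events[b] + eps) / total_events)
        )
        for b in labels
    }


# Toy data: the "high" bin contains mostly events, the "low" bin mostly non-events.
bins = ["low", "low", "low", "high", "high", "high", "high", "low"]
target = [0, 0, 0, 1, 1, 1, 0, 1]
woe = woe_by_bin(bins, target)
```

Each bin label is then replaced by its WoE value, which gives a numeric feature that a logistic regression can use directly.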

The library can be installed in one of the following ways:

  1. Installation from PyPI:
pip install autowoe
  2. Installation from source code:

First, install git and poetry.

# Clone the AutoWoE source code
git clone https://github.com/sberbank-ai-lab/AutoMLWhitebox.git

cd AutoMLWhitebox/

# Choose only ONE of the two options below

# 1. Recommended: Create virtual environment inside your project directory
poetry config virtualenvs.in-project true

# 2. Global installation: Don't create virtual environment
poetry config virtualenvs.create false --local

# For more information read poetry docs

# Install WhiteBox
poetry install

Usage tutorials are available as Jupyter notebooks in the repository root. For parameter descriptions, see parameters_info.md.

Bugs / Questions / Suggestions:

automlwhitebox's People

Contributors

alexmryzhkov, btbpanda, burlakovsber, cybsloth, vabun

automlwhitebox's Issues

Ability to add custom metrics to the report Summary

Hi everyone!
Thank you very much for your work.

Please consider adding support for custom metrics in the report Summary. For example, the main metric may be ROC-AUC, but with an imbalanced target it is sometimes important to also look at F1, PR AUC, and so on when evaluating the model and tracking how its quality changes.
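
Until such an option exists, the extra metrics can be computed outside the report from the true labels and the predicted probabilities. The sketch below is one possible way to do it with scikit-learn; the helper summary_metrics and its extra_metrics argument are hypothetical names for this example and are not part of the AutoWoE API.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score


def summary_metrics(y_true, y_prob, threshold=0.5, extra_metrics=None):
    """Compute ROC AUC, PR AUC, F1 and any user-supplied metrics.

    `extra_metrics` maps a metric name to a callable(y_true, y_prob) -> float.
    This helper is hypothetical and independent of AutoWoE.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # F1 needs hard labels, so the probabilities are thresholded first.
    y_pred = (y_prob >= threshold).astype(int)
    result = {
        "roc_auc": roc_auc_score(y_true, y_prob),
        "pr_auc": average_precision_score(y_true, y_prob),
        "f1": f1_score(y_true, y_pred),
    }
    for name, fn in (extra_metrics or {}).items():
        result[name] = fn(y_true, y_prob)
    return result


# Imbalanced toy target: 2 positives out of 8 observations.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.15, 0.3, 0.6, 0.05, 0.7, 0.4]
metrics = summary_metrics(
    y_true,
    y_prob,
    extra_metrics={"positive_rate": lambda y, p: float(np.mean(y))},
)
```

On this toy data ROC AUC is noticeably higher than F1 at the 0.5 threshold, which is the kind of gap that motivates looking at more than one metric on an imbalanced target.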

ValueError for calculating roc_auc_score for multiclass target

As far as I can see, the library reports:

Supported target types are: ('binary', 'multiclass').

However, fitting on a multiclass target raises a ValueError when roc_auc_score is computed:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_12526/2714231335.py in <module>
----> 1 oof_preds_woe, real_test_preds_woe = get_oof_and_test_pred(train[[*features, TARGET2]], test[[*features]])

/tmp/ipykernel_12526/1487485061.py in get_oof_and_test_pred(tr, real_te)
     27                            verbose=0)
     28 
---> 29         auto_woe.fit(X_tr, target_name=TARGET2)
     30 
     31         val_pred = auto_woe.predict_proba(X_val)

~/raif/venv/lib/python3.8/site-packages/autowoe/lib/autowoe.py in fit(self, train, target_name, features_type, group_kf, max_bin_count, features_monotone_constraints, validation)
    481 
    482         logger.info("Feature selection...")
--> 483         selector = Selector(interpreted_model=self.params['interpreted_model'],
    484                             train=self.train_df,
    485                             target=self.target,

~/raif/venv/lib/python3.8/site-packages/autowoe/lib/selectors/selector_last.py in __init__(self, interpreted_model, train, target, features_type, n_jobs, cv_split)
     33         """
     34         self.__features_fit = list(features_type.keys())
---> 35         self.__pearson_selector = ComposedSelector(train, target)
     36         self.__main_selector = L1(train=train,
     37                                   target=target,

~/raif/venv/lib/python3.8/site-packages/autowoe/lib/selectors/composed_selector.py in __init__(self, train, target)
     38         cc = np.abs(sp.corrcoef(train.values, rowvar=False))
     39         self.precomp_corr = pd.DataFrame(cc, index=train.columns, columns=train.columns)
---> 40         self.precomp_aucs = pd.Series([1 - roc_auc_score(target, train[x]) for x in train.columns],
     41                                       index=train.columns)
     42 

~/raif/venv/lib/python3.8/site-packages/autowoe/lib/selectors/composed_selector.py in <listcomp>(.0)
     38         cc = np.abs(sp.corrcoef(train.values, rowvar=False))
     39         self.precomp_corr = pd.DataFrame(cc, index=train.columns, columns=train.columns)
---> 40         self.precomp_aucs = pd.Series([1 - roc_auc_score(target, train[x]) for x in train.columns],
     41                                       index=train.columns)
     42 

~/raif/venv/lib/python3.8/site-packages/sklearn/metrics/_ranking.py in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
    557             )
    558         if multi_class == "raise":
--> 559             raise ValueError("multi_class must be in ('ovo', 'ovr')")
    560         return _multiclass_roc_auc_score(
    561             y_true, y_score, labels, multi_class, average, sample_weight

ValueError: multi_class must be in ('ovo', 'ovr')
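
The traceback shows roc_auc_score being called with the multiclass target and a single feature column, which scikit-learn rejects. One possible workaround, sketched below, is to score the feature against each class one-vs-rest and average the results. The helper feature_auc_ovr is a hypothetical name, and flipping AUCs below 0.5 is a design choice for this sketch, not the behaviour of the library's ComposedSelector.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def feature_auc_ovr(target, feature):
    """Average one-vs-rest ROC AUC of a single numeric feature.

    For each class c the target is binarized as (target == c) and scored
    against the raw feature values. AUCs below 0.5 are flipped so that the
    direction of the relationship does not matter (an assumption of this sketch).
    """
    target = np.asarray(target)
    feature = np.asarray(feature, dtype=float)
    aucs = []
    for cls in np.unique(target):
        auc = roc_auc_score((target == cls).astype(int), feature)
        aucs.append(max(auc, 1.0 - auc))
    return float(np.mean(aucs))


target = np.array([0, 0, 1, 1, 2, 2])
feature = np.array([0.1, 0.2, 0.5, 0.6, 0.9, 1.0])

# A multiclass target with a 1-D score reproduces the ValueError from the traceback.
try:
    roc_auc_score(target, feature)
    raised = False
except ValueError:
    raised = True

score = feature_auc_ovr(target, feature)
```

In this toy example the feature separates classes 0 and 2 from the rest perfectly but carries no ranking information for the middle class, so the averaged score is below 1.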

Update (Example_1) BasicUsageAndParams.ipynb error

The last code block, which generates the report, fails with:
TypeError: unsupported operand type(s) for -: 'str' and 'float'

The error is raised in:
module utilities_images.py
function plot_modules_weights
ax.bar(features.index, features.values, color='g')

The same error occurs in example_2 and example_3.

Choice of correlation measure

Hello! Thank you so much for your work, it's great!
I wanted to ask: why do you use Pearson correlation rather than Cramér's V? The latter seems better suited to measuring the strength of association between categorical variables (WoE).
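
For reference, Cramér's V for two binned features can be computed from their contingency table as sqrt(chi2 / (n * (min(r, c) - 1))), where r and c are the numbers of distinct values. The NumPy sketch below uses the plain formula without bias correction; the function name cramers_v and the toy WoE values are illustrative and are not part of AutoWoE.

```python
import numpy as np


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (no bias correction)."""
    _, x_codes = np.unique(np.asarray(x), return_inverse=True)
    _, y_codes = np.unique(np.asarray(y), return_inverse=True)
    n_rows, n_cols = int(x_codes.max()) + 1, int(y_codes.max()) + 1
    if min(n_rows, n_cols) < 2:
        return 0.0  # association is undefined when one variable is constant

    # Contingency table of observed counts.
    observed = np.zeros((n_rows, n_cols))
    np.add.at(observed, (x_codes, y_codes), 1)

    # Expected counts under independence, then the chi-squared statistic.
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1))))


# WoE-encoded features take one value per bin, so they can be treated as categories.
woe_a = [-0.8, -0.8, 0.4, 0.4]
woe_b = [1.2, 1.2, -0.3, -0.3]   # same bin boundaries as woe_a
woe_c = [0.5, -0.5, 0.5, -0.5]   # unrelated to woe_a

v_dependent = cramers_v(woe_a, woe_b)
v_independent = cramers_v(woe_a, woe_c)
```

Unlike Pearson correlation, this measure ignores the numeric WoE values and the direction of the relationship; it only asks whether knowing one feature's bin predicts the other's.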

ReportDeco.generate_report fails when predict_proba returns 1.0 on train data

     68                          "target_descr": "___ОПИСАНИЕ ЦЕЛЕВОГО СОБЫТИЯ___",
     69                          "non_target_descr": "___ОПИСАНИЕ НЕЦЕЛЕВОГО СОБЫТИЯ___"}
---> 70         autowoe_model.generate_report(report_params)
     71 
     72         accuracy_on_valid = accuracy_score(test_data[target_name], (test_preds > 0.5).astype(int))

~/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/autowoe/lib/report/report.py in generate_report(self, report_params, groupby)
    377 
    378             # Split score into 10 bins for train and test
--> 379             train_binned, test_binned = self.__get_binned_data(10)
    380             names = ['binned_train_total.png', 'binned_train_posneg.png']
    381             plot_binned(

~/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/autowoe/lib/report/report.py in __get_binned_data(self, bin_count)
    527                 df['ScoreBin'] = pd.cut(df['Score'], bins, retbins=False)
    528             else:
--> 529                 df['ScoreBin'], bins = pd.cut(df['Score'], bin_count, retbins=True)
    530 
    531         return train_binned, test_binned

~/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
    241         if np.isinf(mn) or np.isinf(mx):
    242             # GH 24314
--> 243             raise ValueError(
    244                 "cannot specify integer `bins` when input data contains infinity"
    245             )

ValueError: cannot specify integer `bins` when input data contains infinity
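
A plausible cause, assuming the report's Score column is the log-odds of the predicted probability, is that a probability of exactly 1.0 maps to +inf, which pd.cut cannot bin. The sketch below shows that effect and one possible workaround: clipping probabilities before the transform. The helper safe_log_odds and the eps value are hypothetical and are not taken from the AutoWoE source.

```python
import numpy as np
import pandas as pd


def safe_log_odds(prob, eps=1e-6):
    """Convert probabilities to log-odds, clipping away exact 0 and 1."""
    p = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


prob = np.array([0.05, 0.4, 0.8, 1.0])

# Without clipping, a probability of exactly 1.0 maps to +inf.
with np.errstate(divide="ignore"):
    raw_score = np.log(prob / (1.0 - prob))

# With clipping, every score is finite and pd.cut can build equal-width bins.
score = safe_log_odds(prob)
binned = pd.cut(pd.Series(score), 3)
```

Clipping changes the extreme scores slightly, so eps should be small relative to the probabilities the model normally produces.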
