I would love support for Cost Sensitive Classification about pecos HOT 2 CLOSED

amzn commented on June 14, 2024

I would love support for Cost Sensitive Classification

Comments (2)

OctoberChang commented on June 14, 2024

@simonhughes22, thanks for your interest in PECOS and in cost-sensitive learning!
As of v0.1.0, the high-level Python API HierarchicalMLModel.train does not support cost-sensitive learning, for three reasons:
(1) Designing a proper relevance matrix R is very task-specific; a poorly designed R may hurt performance.
(2) The scale of the relevance matrix R affects the balance between the loss function and the regularization term, so Cp in PECOS needs to be tuned carefully.
(3) The values of the relevance matrix R need to be greater than zero; otherwise the loss function is no longer convex.
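
As a rough sketch of notes (1) to (3), here is one way a relevance matrix could be built and sanity-checked before training. It assumes that cost is expressed as one positive weight per label; the helper name make_relevance and the toy weights are hypothetical and not part of PECOS.

```python
import numpy as np
import scipy.sparse as smat


def make_relevance(Y, label_weights):
    """Hypothetical helper: scale each positive label by a per-label weight."""
    w = np.asarray(label_weights, dtype=np.float32)
    if w.shape != (Y.shape[1],):
        raise ValueError("need exactly one weight per label (column of Y)")
    if not np.all(w > 0):
        # note (3): non-positive relevance values break convexity of the loss
        raise ValueError("relevance values must be strictly positive")
    # multiply column j of Y by w[j]; the sparsity pattern of Y is preserved
    R = smat.csr_matrix(Y, dtype=np.float32).multiply(w.reshape(1, -1))
    return smat.csr_matrix(R)


# toy label matrix: 2 instances, 3 labels
Y_toy = smat.csr_matrix(np.array([[1, 0, 1], [0, 1, 0]], dtype=np.float32))
R_toy = make_relevance(Y_toy, [1.0, 2.0, 0.5])

# note (2): the overall scale of R differs from Y, which is why Cp needs re-tuning
scale_ratio = R_toy.sum() / Y_toy.sum()
print(R_toy.toarray(), scale_ratio)
```

How the weights should be chosen is the task-specific part from note (1); the sketch only enforces positivity and keeps the non-zero pattern of Y.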

With those notes in mind, here is an example code snippet of cost-sensitive learning on the wiki10-31k XMC dataset.

import copy
import numpy as np
from sklearn.preprocessing import normalize
from pecos.core import clib
from pecos.utils import smat_util
from pecos.xmc import Indexer, LabelEmbeddingFactory
from pecos.xmc import HierarchicalMLModel, MLProblem, MLModel

# load your data
X_trn = smat_util.load_matrix("./wiki10-31k/attnxml/X.trn.npz")
X_tst = smat_util.load_matrix("./wiki10-31k/attnxml/X.tst.npz")
Y_trn = smat_util.load_matrix("./wiki10-31k/Y.trn.npz")
Y_tst = smat_util.load_matrix("./wiki10-31k/Y.tst.npz")
R_trn = copy.deepcopy(Y_trn)  # placeholder that reuses Y as R (fine for XMC benchmarks only); define your own relevance matrix!

# get cluster chain
label_feat = LabelEmbeddingFactory.create(Y_trn, X_trn, method="pifa")
C_chain = Indexer.gen(label_feat, "hierarchicalkmeans", nr_splits=32, max_leaf_size=100)

# one way to get relevance matrix chain for the hierarchical label tree of XR-Linear models
Y_chain, R_chain = [Y_trn], [R_trn]
for C in reversed(C_chain[1:]):
    Y_t = clib.sparse_matmul(Y_chain[-1], C, threads=32)
    Y_chain.append(Y_t.tocsc())
    # this is a heuristic we found useful for XMC benchmarks
    R_t = clib.sparse_matmul(R_chain[-1], C, threads=32)
    R_chain.append(R_t)
Y_chain.reverse()
R_chain = [normalize(R_t, norm="l1", axis=1) for R_t in R_chain[::-1]]

DO_COST_SENSITIVE = True
# cost-sensitive learning needs the Cp hyper-parameter tuned per layer (one value per layer of the chain)
Cp_list = [1.0, 8.0, 4.0] if DO_COST_SENSITIVE else [1.0, 1.0, 1.0]
model_chain = []
for t, (Y_t, C_t, R_t) in enumerate(zip(Y_chain, C_chain, R_chain)):
    train_params = MLModel.TrainParams.from_dict({"Cp": Cp_list[t]})
    pred_params = MLModel.PredParams()
    R_t = R_t if DO_COST_SENSITIVE else None
    if t == 0:
        cur_prob = MLProblem(X_trn, Y_t, R=R_t)
    else:
        M_t = smat_util.binarized(Y_chain[t - 1].tocsc())
        cur_prob = MLProblem(X_trn, Y_t, R=R_t, C=C_t, M=M_t)
    cur_model = MLModel.train(cur_prob, train_params, pred_params)
    model_chain.append(cur_model)

hlm = HierarchicalMLModel(model_chain)
Y_tst_pred = hlm.predict(X_tst)
print(smat_util.Metrics.generate(Y_tst, Y_tst_pred))
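
To see what the relevance-chain loop in the snippet above computes, here is a toy sketch. It swaps clib.sparse_matmul for SciPy's @ operator and sklearn's normalize for a hand-written row normalization, and the matrices are made up for illustration, so treat it as an approximation of the idea rather than the PECOS code path.

```python
import numpy as np
import scipy.sparse as smat

# Toy setup: 2 instances, 4 labels, 2 clusters.
# Labels 0 and 1 belong to cluster 0; labels 2 and 3 belong to cluster 1.
R_leaf = smat.csr_matrix(np.array([[2.0, 1.0, 0.0, 1.0],
                                   [0.0, 0.0, 3.0, 0.0]]))
C = smat.csc_matrix(np.array([[1.0, 0.0],
                              [1.0, 0.0],
                              [0.0, 1.0],
                              [0.0, 1.0]]))


def l1_normalize_rows(R):
    """Stand-in for normalize(R, norm="l1", axis=1): each non-empty row sums to 1."""
    R = smat.csr_matrix(R, dtype=np.float64)
    row_sums = np.asarray(abs(R).sum(axis=1)).ravel()
    row_sums[row_sums == 0] = 1.0  # leave all-zero rows untouched
    return smat.diags(1.0 / row_sums) @ R


# Propagate relevance from labels to clusters by summing over each cluster's labels.
R_parent = R_leaf @ C

# Same ordering as the snippet above: coarsest layer first, label layer last.
R_chain_toy = [l1_normalize_rows(R_parent), l1_normalize_rows(R_leaf)]
for R_t in R_chain_toy:
    print(R_t.toarray())
```

After normalization every layer has rows on the same scale, which keeps the per-layer Cp values comparable across layers.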

Here are the evaluation results on wiki10-31k:

# For cost-sensitive learning, we got
# prec   = 85.72 80.86 75.01 70.03 65.24 61.45 57.90 54.67 51.83 49.27
# recall = 5.08 9.52 13.10 16.18 18.69 20.99 22.95 24.66 26.18 27.56

# For the original XR-Linear, the numbers are
# prec   = 84.54 78.98 73.13 68.27 64.10 60.53 57.22 54.26 51.65 49.21
# recall = 5.00 9.28 12.75 15.73 18.35 20.66 22.65 24.44 26.05 27.48
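
To make the comparison easier to read, the per-position differences can be computed directly from the two result rows above. This is plain arithmetic on the reported numbers; the "position" index is simply the column order in the printed output.

```python
# Numbers copied verbatim from the reported results.
cs_prec = [85.72, 80.86, 75.01, 70.03, 65.24, 61.45, 57.90, 54.67, 51.83, 49.27]
base_prec = [84.54, 78.98, 73.13, 68.27, 64.10, 60.53, 57.22, 54.26, 51.65, 49.21]
cs_recall = [5.08, 9.52, 13.10, 16.18, 18.69, 20.99, 22.95, 24.66, 26.18, 27.56]
base_recall = [5.00, 9.28, 12.75, 15.73, 18.35, 20.66, 22.65, 24.44, 26.05, 27.48]


def gains(new, old):
    """Absolute difference (in points) between two metric rows, rounded to 2 decimals."""
    return [round(n - o, 2) for n, o in zip(new, old)]


prec_gain = gains(cs_prec, base_prec)
recall_gain = gains(cs_recall, base_recall)
for k, (p, r) in enumerate(zip(prec_gain, recall_gain), start=1):
    print(f"position {k}: prec +{p:.2f}, recall +{r:.2f}")
```

On these numbers the cost-sensitive run is ahead at every position, with the largest precision gap at positions 2 to 4 and a shrinking gap toward position 10.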

We are still investigating whether there is a stable, robust set of default hyper-parameters for cost-sensitive learning.
Once that investigation is done, we may consider implementing this functionality in the higher-level Python API, such as HierarchicalMLModel.train.

If you have a successful case of using cost-sensitive PECOS in your applications, please let us know.
We would be very happy to hear about your experience!

simonhughes22 commented on June 14, 2024

Awesome thanks @OctoberChang.
