I call LightGBMLSS.train like this

IndexError: LightGBMLSS.train(valid_sets=[dataset_val]) calls set_valid_margin, which seems to expect both train+val (lightgbmlss issue, closed)

statmixedml commented on August 25, 2024

Comments (7)

StatMixedML commented on August 25, 2024

> Sure, thanks. I assume there is no issue with calling set_init_score twice on the training set if it is also included in the list of validation sets (once at the start of train() and again when the validation sets are processed)?

It shouldn't, since it only overwrites the existing values.
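A minimal sketch of why the second call is harmless, assuming (as in the removed set_valid_margin code in the diff) that the init score depends only on the dataset's row count and the fixed start values, and that the dataset's set_init_score replaces rather than accumulates. StubDataset and fill_init_score are hypothetical stand-ins, not LightGBM or LightGBMLSS APIs.

```python
class StubDataset:
    """Hypothetical stand-in for lightgbm.Dataset: stores labels and an init score."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.init_score = None

    def get_label(self):
        return self.labels

    def set_init_score(self, init_score):
        # Replaces any previously stored score; nothing is accumulated.
        self.init_score = list(init_score)


def fill_init_score(dataset, start_values):
    """Hypothetical helper: repeat each start value once per row, parameter by
    parameter (the pure-Python equivalent of
    (np.ones((n_rows, 1)) * start_values).ravel(order="F"))."""
    n_rows = len(dataset.get_label())
    dataset.set_init_score([v for v in start_values for _ in range(n_rows)])


train = StubDataset([1.0, 2.0, 3.0])
start_values = [0.5, -1.0]

fill_init_score(train, start_values)  # simulates the call at the start of train()
first = train.init_score
fill_init_score(train, start_values)  # simulates train also appearing in valid_sets
print(train.init_score == first)
```

Because the inputs are identical on both calls, the second call writes exactly the same values over the first.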

ninist commented on August 25, 2024

set_valid_margin may also behave unexpectedly when passed several validation sets (for example, 10), since it only processes the first two in the list.
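The two failure modes can be reproduced with a small sketch that mirrors only the indexing of the removed set_valid_margin (it reads valid_sets[0] and valid_sets[1] and returns a two-element list). StubDataset and set_valid_margin_original are hypothetical stand-ins written for illustration; they are not the library code itself.

```python
class StubDataset:
    """Hypothetical stand-in for lightgbm.Dataset (labels + init score only)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.init_score = None

    def get_label(self):
        return self.labels

    def set_init_score(self, init_score):
        self.init_score = list(init_score)


def set_valid_margin_original(valid_sets, start_values):
    """Mimics the removed method's hard-coded access to exactly two datasets."""
    out = []
    for idx in (0, 1):  # assumes the list always holds exactly two datasets
        ds = valid_sets[idx]  # raises IndexError when only one dataset is passed
        n_rows = len(ds.get_label())
        ds.set_init_score([v for v in start_values for _ in range(n_rows)])
        out.append(ds)
    return out


start_values = [0.0, 1.0]

# One validation set: valid_sets[1] does not exist.
try:
    set_valid_margin_original([StubDataset([1.0, 2.0])], start_values)
    raised = False
except IndexError:
    raised = True

# Ten validation sets: only the first two get an init score, and the
# returned list silently drops the other eight.
many = [StubDataset([float(i)]) for i in range(10)]
returned = set_valid_margin_original(many, start_values)
untouched = sum(ds.init_score is None for ds in many)
print(raised, len(returned), untouched)
```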

ninist commented on August 25, 2024

Do these changes make sense?

```diff
--- model.py	2023-10-11 12:39:06.000991201 +0200
+++ model.py	2023-10-13 13:57:18.200670834 +0200
@@ -165,8 +165,16 @@
         self.set_params(params)
         self.set_init_score(train_set)
 
+        # Set the base margin for the validation sets.
+        # self.start_values should already be initialized after the call above
+        # to self.set_init_score(train_set).
         if valid_sets is not None:
-            valid_sets = self.set_valid_margin(valid_sets, self.start_values)
+            # Avoid set_init_score calculating start_values from valid_set.
+            if self.start_values is None:
+                raise ValueError("start_values were not set/calculated from train_set yet.")
+
+            for valid_set in valid_sets:
+                self.set_init_score(valid_set)
 
         self.booster = lgb.train(params,
                                  train_set,
@@ -547,39 +555,6 @@
         elif plot_type == "Feature_Importance":
             shap.plots.bar(shap_values[:, :, expect_pos], max_display=15 if X.shape[1] > 15 else X.shape[1])
 
-    def set_valid_margin(self,
-                         valid_sets: list,
-                         start_values: np.ndarray
-                         ) -> list:
-        """
-        Function that sets the base margin for the validation set.
-
-        Arguments
-        ---------
-        valid_sets : list
-            List of tuples containing the train and evaluation set.
-        valid_names: list
-            List of tuples containing the name of train and evaluation set.
-        start_values : np.ndarray
-            Array containing the start values for the distributional parameters.
-
-        Returns
-        -------
-        valid_sets : list
-            List of tuples containing the train and evaluation set.
-        """
-        valid_sets1 = valid_sets[0]
-        init_score_val1 = (np.ones(shape=(valid_sets1.get_label().shape[0], 1))) * start_values
-        valid_sets1.set_init_score(init_score_val1.ravel(order="F"))
-
-        valid_sets2 = valid_sets[1]
-        init_score_val2 = (np.ones(shape=(valid_sets2.get_label().shape[0], 1))) * start_values
-        valid_sets2.set_init_score(init_score_val2.ravel(order="F"))
-
-        valid_sets = [valid_sets1, valid_sets2]
-
-        return valid_sets
-
     def save_model(self,
                    model_path: str
                    ) -> None:
```

From what I can see, self.set_valid_margin performs the same operations as self.set_init_score, except that it uses the start_values passed in as an argument. Those start values are self.start_values, which self.set_init_score initializes the first time it is called.
https://github.com/StatMixedML/LightGBMLSS/blob/master/lightgbmlss/model.py#L89-L105

For now, it looks possible to reuse self.set_init_score for the validation sets as well, with an extra check that the start values already exist when self.set_init_score(valid_set) is called.
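A minimal sketch of the proposed pattern: start values are computed once from the training set, and the same code path is then reused for every validation set, however many there are. StubDataset, TinyModel and estimate_start_values are hypothetical; in particular, the mean/zero start values below are placeholders, not what LightGBMLSS computes for any given distribution.

```python
from statistics import mean


class StubDataset:
    """Hypothetical stand-in for lightgbm.Dataset (labels + init score only)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.init_score = None

    def get_label(self):
        return self.labels

    def set_init_score(self, init_score):
        self.init_score = list(init_score)


def estimate_start_values(labels):
    # Placeholder estimate: a location from the mean and a second parameter of 0.
    return [mean(labels), 0.0]


class TinyModel:
    """Hypothetical model mimicking the patched train() flow."""

    def __init__(self):
        self.start_values = None

    def set_init_score(self, dataset):
        # Only the first call (on train_set) estimates start values; later
        # calls reuse them, so validation labels never influence them.
        if self.start_values is None:
            self.start_values = estimate_start_values(dataset.get_label())
        n_rows = len(dataset.get_label())
        # Parameter-major flattening, like ravel(order="F") on (n_rows, n_params).
        dataset.set_init_score([v for v in self.start_values for _ in range(n_rows)])

    def prepare(self, train_set, valid_sets=None):
        self.set_init_score(train_set)
        if valid_sets is not None:
            # Defensive guard mirroring the diff; the call above always sets it.
            if self.start_values is None:
                raise ValueError("start_values were not set/calculated from train_set yet.")
            for valid_set in valid_sets:  # any number of sets, including one
                self.set_init_score(valid_set)


model = TinyModel()
train = StubDataset([1.0, 2.0, 3.0])
val = StubDataset([100.0])
model.prepare(train, valid_sets=[val])
print(model.start_values, val.init_score)
```

The single validation set no longer triggers an IndexError, and its large label (100.0) does not leak into the start values.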

StatMixedML commented on August 25, 2024

Thanks for your interest in the project. I am on vacation until the end of October, so please expect some delay in my reply.

StatMixedML commented on August 25, 2024

@ninist Sorry for the very late response.

The initial code assumed that there are always two datasets in valid_sets. However, as you pointed out, it makes more sense to loop over the list so that it can also contain only one dataset. I pushed the changes, using your suggested code:

```python
for valid_set in valid_sets:
    self.set_init_score(valid_set)
```

Many thanks!

StatMixedML commented on August 25, 2024

@ninist Can I close this?

ninist commented on August 25, 2024

Sure, thanks. I assume there is no issue with calling set_init_score twice on the training set if it is also included in the list of validation sets (once at the start of train() and again when the validation sets are processed)?