
IndexError: LightGBMLSS.train(valid_sets=[dataset_val]) calls set_valid_margin, which seems to expect both train+val (lightgbmlss issue, closed)

statmixedml commented on August 25, 2024

Comments (7)

StatMixedML commented on August 25, 2024

> Sure, thanks - I assume there is no issue with calling set_init_score twice on the training set if it happens to be included in the list of validation sets too? (once at the start of train() and then again during set_valid_margin)

It shouldn't, since the second call only overwrites the existing init_score with the same values.
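
To illustrate why a second call is harmless, here is a minimal pure-Python sketch. It assumes, as described further down in this thread, that set_init_score computes start_values only on its first call and reuses them afterwards. FakeDataset, Model and the mean-based start values are hypothetical stand-ins, not LightGBMLSS or LightGBM code; FakeDataset only mimics the get_label() and set_init_score() methods of lgb.Dataset.

```python
class FakeDataset:
    """Hypothetical stand-in for lgb.Dataset exposing get_label() and set_init_score()."""

    def __init__(self, labels):
        self._labels = list(labels)
        self.init_score = None

    def get_label(self):
        return self._labels

    def set_init_score(self, init_score):
        self.init_score = list(init_score)


class Model:
    """Hypothetical model that caches start_values, loosely mirroring LightGBMLSS."""

    def __init__(self):
        self.start_values = None

    def set_init_score(self, dataset):
        # Start values are computed only once, from the first dataset passed in.
        if self.start_values is None:
            labels = dataset.get_label()
            # Hypothetical start values: label mean plus a fixed 0.0 for a second parameter.
            self.start_values = [sum(labels) / len(labels), 0.0]
        n_rows = len(dataset.get_label())
        # Same layout as (np.ones((n_rows, 1)) * start_values).ravel(order="F"):
        # all rows for parameter 0, then all rows for parameter 1.
        dataset.set_init_score([v for v in self.start_values for _ in range(n_rows)])


train = FakeDataset([1.0, 2.0, 3.0])
model = Model()
model.set_init_score(train)
first_scores = list(train.init_score)
model.set_init_score(train)  # second call, e.g. train_set also listed in valid_sets
print(first_scores == train.init_score)  # True
print(train.init_score)                  # [2.0, 2.0, 2.0, 0.0, 0.0, 0.0]
```

Because start_values is cached after the first call, the second call recomputes exactly the same flattened init_score.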

ninist commented on August 25, 2024

set_valid_margin may also behave unexpectedly when it is passed more than two validation sets (for example, 10), because it only processes the first two entries in the list and returns only those two.
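
Both failure modes (the IndexError from the issue title with a single validation set, and the silently dropped sets when there are more than two) can be reproduced with a small pure-Python sketch that mirrors the hard-coded indexing of the removed set_valid_margin from the diff in this thread. FakeDataset and column_major are hypothetical helpers standing in for lgb.Dataset and the NumPy expression; the start values are made up for illustration.

```python
class FakeDataset:
    """Hypothetical stand-in for lgb.Dataset exposing get_label() and set_init_score()."""

    def __init__(self, labels):
        self._labels = list(labels)
        self.init_score = None

    def get_label(self):
        return self._labels

    def set_init_score(self, init_score):
        self.init_score = list(init_score)


def column_major(start_values, n_rows):
    """Pure-Python equivalent of (np.ones((n_rows, 1)) * start_values).ravel(order="F")."""
    return [v for v in start_values for _ in range(n_rows)]


def set_valid_margin_old(valid_sets, start_values):
    """Mirrors the removed set_valid_margin: indices 0 and 1 are hard-coded."""
    first = valid_sets[0]
    first.set_init_score(column_major(start_values, len(first.get_label())))
    second = valid_sets[1]  # IndexError if only one validation set was passed
    second.set_init_score(column_major(start_values, len(second.get_label())))
    return [first, second]  # sets after index 1 are silently dropped


START_VALUES = [0.5, 0.0]  # hypothetical start values for two distributional parameters

try:
    set_valid_margin_old([FakeDataset([1.0, 2.0])], START_VALUES)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

ten_sets = [FakeDataset([float(i)]) for i in range(10)]
returned = set_valid_margin_old(ten_sets, START_VALUES)
print(len(returned))                                  # 2
print(sum(ds.init_score is None for ds in ten_sets))  # 8
```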

ninist commented on August 25, 2024

Do these changes make sense?

--- model.py	2023-10-11 12:39:06.000991201 +0200
+++ model.py	2023-10-13 13:57:18.200670834 +0200
@@ -165,8 +165,16 @@
         self.set_params(params)
         self.set_init_score(train_set)
 
+        # Set the base margin for the validation sets.
+        # self.start_values should already be initialized after the call above
+        # to self.set_init_score(train_set).
         if valid_sets is not None:
-            valid_sets = self.set_valid_margin(valid_sets, self.start_values)
+            # Avoid set_init_score calculating start_values from valid_set.
+            if self.start_values is None:
+                raise ValueError("start_values were not set/calculated from train_set yet.")
+
+            for valid_set in valid_sets:
+                self.set_init_score(valid_set)
 
         self.booster = lgb.train(params,
                                  train_set,
@@ -547,39 +555,6 @@
         elif plot_type == "Feature_Importance":
             shap.plots.bar(shap_values[:, :, expect_pos], max_display=15 if X.shape[1] > 15 else X.shape[1])
 
-    def set_valid_margin(self,
-                         valid_sets: list,
-                         start_values: np.ndarray
-                         ) -> list:
-        """
-        Function that sets the base margin for the validation set.
-
-        Arguments
-        ---------
-        valid_sets : list
-            List of tuples containing the train and evaluation set.
-        valid_names: list
-            List of tuples containing the name of train and evaluation set.
-        start_values : np.ndarray
-            Array containing the start values for the distributional parameters.
-
-        Returns
-        -------
-        valid_sets : list
-            List of tuples containing the train and evaluation set.
-        """
-        valid_sets1 = valid_sets[0]
-        init_score_val1 = (np.ones(shape=(valid_sets1.get_label().shape[0], 1))) * start_values
-        valid_sets1.set_init_score(init_score_val1.ravel(order="F"))
-
-        valid_sets2 = valid_sets[1]
-        init_score_val2 = (np.ones(shape=(valid_sets2.get_label().shape[0], 1))) * start_values
-        valid_sets2.set_init_score(init_score_val2.ravel(order="F"))
-
-        valid_sets = [valid_sets1, valid_sets2]
-
-        return valid_sets
-
     def save_model(self,
                    model_path: str
                    ) -> None:

As far as I can tell, self.set_valid_margin performs the same operations as self.set_init_score, except that it uses the start_values passed in as an argument. Those are always self.start_values, which self.set_init_score initializes the first time it is called.
https://github.com/StatMixedML/LightGBMLSS/blob/master/lightgbmlss/model.py#L89-L105

So it looks possible to reuse self.set_init_score for the validation sets as well, with an extra check to make sure the start values already exist when self.set_init_score(valid_set) is called; otherwise they would be calculated from the validation labels instead of the training labels.
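
A minimal sketch of the proposed flow, assuming set_init_score behaves as described above (start values computed once, then reused). prepare() is a hypothetical helper covering only the init-score part of train() touched by the diff; FakeDataset and the mean-based start values are likewise illustrative, not LightGBMLSS code. The guard can only fire if set_init_score(train_set) failed to set start_values, so it acts as a defensive check.

```python
class FakeDataset:
    """Hypothetical stand-in for lgb.Dataset exposing get_label() and set_init_score()."""

    def __init__(self, labels):
        self._labels = list(labels)
        self.init_score = None

    def get_label(self):
        return self._labels

    def set_init_score(self, init_score):
        self.init_score = list(init_score)


class Model:
    """Hypothetical model loosely mirroring the relevant part of LightGBMLSS."""

    def __init__(self):
        self.start_values = None

    def set_init_score(self, dataset):
        # Compute start values only on the first call (intended to be the training set).
        if self.start_values is None:
            labels = dataset.get_label()
            self.start_values = [sum(labels) / len(labels), 0.0]  # hypothetical values
        n_rows = len(dataset.get_label())
        # Column-major layout: all rows for parameter 0, then all rows for parameter 1.
        dataset.set_init_score([v for v in self.start_values for _ in range(n_rows)])

    def prepare(self, train_set, valid_sets=None):
        """Hypothetical helper: the init-score part of train() after the proposed change."""
        self.set_init_score(train_set)
        if valid_sets is not None:
            # Avoid set_init_score calculating start_values from a validation set.
            if self.start_values is None:
                raise ValueError("start_values were not set/calculated from train_set yet.")
            for valid_set in valid_sets:
                self.set_init_score(valid_set)


train = FakeDataset([1.0, 2.0, 3.0])
val = FakeDataset([10.0, 20.0])
model = Model()
model.prepare(train, valid_sets=[val])  # a single validation set no longer fails
print(model.start_values)  # [2.0, 0.0], taken from the training labels
print(val.init_score)      # [2.0, 2.0, 0.0, 0.0]
```

Because the loop visits every entry, one, two, or ten validation sets are all handled, and including train_set in valid_sets simply repeats the first call.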

StatMixedML commented on August 25, 2024

Thanks for your interest in the project. I am on vacation until the end of October, so please expect some delay in my reply.

StatMixedML commented on August 25, 2024

@ninist Sorry for the very late response.

The initial code assumed that valid_sets always contains exactly two datasets. However, as you pointed out, it makes more sense to loop over the list, so that it also works with only one dataset in the list. I pushed the changes using your suggested code:

    for valid_set in valid_sets:
        self.set_init_score(valid_set)

Many thanks!

StatMixedML commented on August 25, 2024

@ninist Can I close this?

ninist commented on August 25, 2024

Sure, thanks - I assume there is no issue with calling set_init_score twice on the training set if it happens to be included in the list of validation sets too? (once at the start of train() and then again during set_valid_margin)