Hi, I'd like to create a custom Bernoulli bandit where each context/state has a differ…

Help with creating a custom bandit. Error message: cannot add bindings to a locked environment (nth-iteration-labs/contextual)

Comments (3)

g0ulash commented on May 28, 2024

Hi,

I suspect this is because you have to declare the fields you want to use in the class definition, like so:

  public = list(
    # declare every field that initialize() assigns via self$
    weights = NULL,
    prob = NULL,
    p = NULL,
    class_name = "ContextualBernoulliBandit2",
    initialize = function(weights, prob) {

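This is R6-specific: by default R6 locks each object's environment (lock_objects = TRUE), so assigning self$prob inside initialize() fails unless prob is declared as a field. I adapted your code that way and no longer got the error when initializing the bandit. (I've run into this problem many times, and R's error message is not very clear about it :-))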
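To see the R6 behaviour in isolation, here is a minimal sketch that uses only the R6 package; the class names Undeclared, Declared and Unlocked are made up for illustration. It shows the failing pattern, the fix of declaring the field, and, for a plain R6 class, the lock_objects = FALSE option as an alternative. Declaring the fields is usually the cleaner choice, since it also documents the object's state.

```r
library(R6)

# Field NOT declared: R6 locks the object (lock_objects = TRUE by default)
# before initialize() runs, so self$prob <- prob tries to create a new
# binding in a locked environment and fails.
Undeclared <- R6Class("Undeclared",
  public = list(
    initialize = function(prob) {
      self$prob <- prob
    }
  )
)

# Field declared up front: the binding already exists,
# so initialize() only updates its value.
Declared <- R6Class("Declared",
  public = list(
    prob = NULL,
    initialize = function(prob) {
      self$prob <- prob
    }
  )
)

# Alternative: turn object locking off for this class.
Unlocked <- R6Class("Unlocked",
  lock_objects = FALSE,
  public = list(
    initialize = function(prob) {
      self$prob <- prob
    }
  )
)

# Capture the error message instead of stopping.
err <- tryCatch(Undeclared$new(c(0.7, 0.3)),
                error = function(e) conditionMessage(e))
print(err)
print(Declared$new(c(0.7, 0.3))$prob)
print(Unlocked$new(c(0.7, 0.3))$prob)
```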

Here is your full bandit class, which runs without errors for me:

ContextualBernoulliBandit2 <- R6::R6Class(
  inherit = ContextualBernoulliBandit,
  class = FALSE,
  public = list(
    weights = NULL,
    prob = NULL,
    p = NULL,
    class_name = "ContextualBernoulliBandit2",
    initialize = function(weights, prob) {
      self$weights     <- weights
      self$prob        <- prob
      if (is.vector(weights)) {
        self$weights <- matrix(weights, nrow = 1L)
      } else {
        self$weights <- weights               # d x k weight matrix
      }
      self$d           <- nrow(self$weights)  # d features
      self$k           <- ncol(self$weights)  # k arms
      self$p           <- length(self$prob)
    },
    get_context = function(t) {
      # generate d dimensional feature vector, one random feature active at a time
      Xa <- sample(c(1,rep(0,self$d-1)), prob = self$p)
      context <- list(
        X = Xa,
        k = self$k,
        d = self$d,
        p = self$p
      )
    },
    get_reward = function(t, context, action) {
      # which arm was selected?
      arm            <- action$choice
      # d dimensional feature vector for chosen arm
      Xa             <- context$X
      # weights of active context
      weight         <- Xa %*% self$weights
      # assign rewards for active context with weighted probs
      rewards        <- as.double(weight > runif(self$k))
      optimal_arm    <- which_max_tied(weight)
      reward  <- list(
        reward                   = rewards[arm],
        optimal_arm              = optimal_arm,
        optimal_reward           = rewards[optimal_arm]
      )
    }
  )
)
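The reward step relies on Xa being one-hot: multiplying it by the d x k weight matrix picks out the row of arm probabilities for the active context, and each arm then pays out 1 if its probability beats a uniform draw. A base-R sketch of that mechanism outside of contextual, with a made-up 3 x 2 weight matrix (the values are illustrative only):

```r
# Hypothetical example: 3 contexts (d = 3) x 2 arms (k = 2).
# Row i holds the Bernoulli success probability of each arm in context i.
weights <- matrix(c(0.9, 0.1,
                    0.2, 0.8,
                    0.5, 0.5),
                  nrow = 3, byrow = TRUE)

# One-hot context vector: feature 2 is active.
Xa <- c(0, 1, 0)

# A one-hot vector times the d x k matrix selects row 2,
# i.e. the arm probabilities for the active context (a 1 x 2 matrix).
weight <- Xa %*% weights
print(weight)

# One Bernoulli reward per arm, mirroring get_reward():
# as.double() drops the matrix dimensions and turns TRUE/FALSE into 1/0.
set.seed(1)
rewards <- as.double(weight > runif(ncol(weights)))
print(rewards)
```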

b-rodrigues commented on May 28, 2024

Thanks!

This indeed solves the issue, but I stumbled upon another mistake on my part, which I've now corrected: we don't need to define self$p <- length(self$prob) at all (it was wrong anyway), and we can simply write Xa <- sample(c(1,rep(0,self$d-1)), prob = self$prob) directly.
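One caveat with that sample() line, worth checking against what you intend: without a size argument, sample(x, prob = p) returns a weighted permutation of x, and the weights attach to the elements of x, not to output positions. The single 1 carries weight prob[1] and the zeros share the rest, so for d > 2 the active feature does not in general follow prob. If the goal is for feature j to be active with probability prob[j], a sketch like the following draws the index directly with sample.int(); draw_one_hot is a hypothetical helper name, and the probabilities are made up for illustration.

```r
# Hypothetical helper: a one-hot context vector of length d
# in which feature j is active with probability prob[j].
draw_one_hot <- function(prob) {
  d <- length(prob)
  x <- numeric(d)                             # all features off
  x[sample.int(d, size = 1, prob = prob)] <- 1
  x
}

set.seed(42)
prob <- c(0.5, 0.3, 0.2)

# 10000 draws, one per row; column means estimate how often each feature is active.
draws <- t(replicate(10000, draw_one_hot(prob)))
print(colMeans(draws))   # close to prob

# For comparison, the permutation approach: the first column tracks prob[1],
# but the later columns need not match prob[2:3].
perm <- t(replicate(10000, sample(c(1, rep(0, length(prob) - 1)), prob = prob)))
print(colMeans(perm))
```

Inside the bandit this would become, for example, Xa <- numeric(self$d); Xa[sample.int(self$d, 1, prob = self$prob)] <- 1 in get_context().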

g0ulash commented on May 28, 2024

Great! Good luck with the rest :-)

If you have any other questions, don't hesitate to ask.