facebookresearch / lamcts Goto Github PK

View Code? Open in Web Editor NEW

433.0 18.0 70.0 38.03 MB

The release codes of LA-MCTS with its application to Neural Architecture Search.

License: Other

Python 99.42% Shell 0.58%

lamcts's People

Contributors

Stargazers

Watchers

Forkers

linnanwang rfonseca github30 shercklo xrosliang hubayirp bingchuanwang phoenixera qifengweng yard1 huang-jinyun huyiming2016 fage452 zjamy-hust muximuxi zhihaoliu-git chensonvan aoyuqc olegjakushkin cosmicbhejafry fredysu weichenliu jinbest17 kdpan bigcolin mldl sxxtyz triper1022 lengjia sidphbot longjohncoder ai-natural-language-processing-lab deepandy manikant92 memazouni snapbuy mariusanje xychen2022 liguoyu1 yangziyi1990 qinxuliang1997 ain-soph songlei00 frankinwi wangfeng012316 sunnylyz jacobrgardner zclllyybb ganfoubudalian licq328 zhaisilong yunyuewei germayneng animeshbchowdhury wheatjo momo1578 amifalk navidpanchi victorcallejas juhyeon12345 agier9 evolutionary-intelligence marina-du jskim91998 juju-ren qianyiliu309 elotech47 xitiandi

lamcts's Issues

TuRBO experiment reproduction

Hi, I want to reproduce the experiment using TuRBO. It seems that following the steps in https://github.com/facebookresearch/LaMCTS/tree/main/LA-MCTS can not successfully run the commented code. I have some questions about current code.

Line 362 in Classifier.py. It seems that we need to input initial points into TuRBO agent. But the original TuRBO implementation does not support this customization.
In the paper, you mentioned that "we restrict the TuRBO to uniformly sample from the intersection of the bounding box and $\Omega_{selected}$ ". But I didn't find the implementation in current code.

Could you give me some suggestions about TuRBO reproduction?

Clustering features

In the line

LaMCTS/LA-MCTS/Classifier.py

Line 418 in fa76cb5

tmp = np.concatenate( (self.X, self.fX.reshape([-1, 1]) ), axis = 1 )

the clustering in nodes is done based on both x and f(x) though in the paper it is said that clustering is performed based on f(x). Could you please clarify, is f(x) considered as just another variable of x for clustering?

Cannot reproduce the results of Ackley-20d in the paper

Hi, thank you for your good work.

Now I am trying to reproduce the results of Ackley-20d in the LAMCTS paper (Fig. 4). I use the same hyper-parameters as (cp=1, leaf size=20) and run:
cd LA-MCTS
python run.py --func ackley --dims 20 --iterations 1000

I have repeated the process for 20 times, however I cannot obtain the the results in the paper. Is there anything I need to change to reproduce the results?

关于search中函数值计算的一些问题

我注意到在collect_samples function中对函数值是做了*-1，我的理解是想取到函数的最小值，但是为何在输出 best f(x)时使用np.absolute(curt_best_value)？

Hi can you reproduce the acc in cifar10 now

Hi
I am now reproducing the acc in cifar10 (99%),since the process is very slow and i get limited gpus ,also i noticed that the previous issue about lanet ,so i want to know what configs should i use.
tks a lot
adamas

I wonder if I can get all the test functions about the mujoco task？

How long did it take the author to train the LaNet with 1000 EPOCH.

The training speed on the V100 without using any tricks is about 0.23h/epoch, which is too slow. If autoaug and ema are used, the speed will become more slowly.
So I would like to ask: how long did it take the author to train the LaNet with 1000 EPOCH.

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

When I use the scripts:
python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 1500
to train the cifar-10 dataset, I met this runtime error
How could I fix this?
I just use one P100 GPU
my pytorch version is 1.11.0+cu113

Python package

Are there any plans to add setuptools scripts (or other), so that LA-MCTS and LaNAS can be packaged and installed through pip? It would greatly simplify development of projects that would like to use them as dependencies.

I would be happy to help out with this.

Discrete Search Space

Hello,
I'd like to apply LaMCTS to a discrete but highly dimensional search space (e.g., 200 dimensions).
Each dimension, however, has only four options.
Would LaMCTS be suitable for this? How hard is it to adapt the code for this?

Can i use LaMCTS to optimize black box functions with discrete features ?

Questions about one-shot LaNAS

Hi, when I try to replicate the one-shot LaNAS with the source code, I cannot get the curve as in the paper.

Then I check the code and find something strange, could you please help me work it out?
(1) In one-shot LaNAS/LaNAS/Classifier.py, it seems that the learning rate is too small and the learned linear model is far from the optimal. I think the learning rate should be bigger (such as 0.01), is that true?

(2) When (1) is solved, I find that in function search_samples_under_constraints in one-shot LaNAS/LaNAS/MCTS.py, why just a pair of W and b is considered? And when I change the code into considering all Ws and bs, it seems to be hard to retrieve a sample under all constraints. How to solve this issue?

(3) In the search function in one-shot LaNAS/LaNAS/MCTS.py, the tree is updated after each sample is evaluated, but in other scripts (like Distributed_LanAS and LaNAS_NASBench101), the tree is updated after evaluating 20/50 samples. How often should the "learning phase" be recondutced (or in other words, how many samples should be retrieved in each "searching phase")?
(4) Which Cp should be chosen to replicate the result of one-shot LaNAS? I find it very big (Cp=10) in the code and it seems a little bit unreasonable.

Question regarding the Mujoco environments

Dear authors,

Thank you for open-sourcing the code! I have a few questions regarding the Mujoco environments. First, I wonder how the mean and std arrays of `

class Hopper:

def __init__(self):
    self.mean    = np.array([1.41599384, -0.05478602, -0.25522216, -0.25404721, 
                             0.27525085, 2.60889529,  -0.0085352, 0.0068375, 
                             -0.07123674, -0.05044839, -0.45569644])
    self.std     = np.array([0.19805723, 0.07824488,  0.17120271, 0.32000514, 
                             0.62401884, 0.82814161,  1.51915814, 1.17378372, 
                             1.87761249, 3.63482761, 5.7164752 ])`

are computed?

I am also hoping to figure out how to construct the equivalent function evaluation class for 2dWalker, HalfCheetah, Ant, and Humanoid as reported in the paper. As these are not provided in the repo, could you let me know how to do so? Do they also require some hard-coded mean and std values to normalize the policy matrix M?

Thank you very much!

latnet cutmix import not found

Hello I have problem to run your code on CiFar10
i cannot find the function cutmix
from cutmix.cutmix import CutMix from cutmix.utils import CutMixCrossEntropyLoss

Assertion error in Node.py update_kids() after increasing batch_size TuRBO-1

Hello,

I am testing LA-MCTS together with TuRBO-1 and found it works well. I tried to increase the batch_size in TuRBO-1 to reduce the time spent on GP fitting. From what I got, the TuRBO paper indicates this is a legit choice for increasing overall computational efficiency without affecting the quality of the results.

To this end, I only modified collect_samples() in MCTS.py to handle the multiple values returned by each TuRBO-1 iteration. Nothing particularly ambiguous here.

And now on rare occasion I get an AssertionError thrown at line 47 in update_kids() in Node.py:

assert self.kids[0].classifier.get_mean() > self.kids[1].classifier.get_mean()

I'm not sure why just yet. And I don't understand why returning more samples from TuRBO-1 could trigger this.

Could you please indicate possible reasons for this?

I believe increasing batch_size in TuRBO-1 is something desirable and could benefit many.

Or am I misunderstanding something in the interplay between LA-MCTS and TuRBO-1?

Thanks for the great algo!

In high-dimensional tasks, does the root node still represent the entire search space?

Hi,

For example, the search space size of NASNet is 3.5x10^21. it is [3.5x10^21, 160] after encoding all arch to masks. this array is quite huge, so it can not work on my device. so how do you deal with it?

Thanks in advance!

Does Cp support negative value?

The objective goal is to maximize my function f(range from 100 to 300). To use LaMCTS, I minimize the -f returning to LaMCTS. Just wanna double-check in this case Cp should be -10, correct? Thank you.

The prediction results of kmean are all the same & cause the svm error

Hi,

Recently I am trying to use LA-MCTS to optimize my own task, but some errors occurred during the execution.

My task has 9 dimensions, and the HP setting is:

Cp = 10 ( follow the suggestion from the paper, ~= 10% of max f(x) )
leaf_size = 10
ninits = 40
kernel_type = "rbf"

Here is the error log:

Traceback (most recent call last):
  File "mcts-exec.py", line 30, in <module>
    agent.search(iterations = args.iterations)
  File "/root/workspace/lamcts/MCTS.py", line 239, in search
    self.dynamic_treeify()
  File "/root/workspace/lamcts/MCTS.py", line 116, in dynamic_treeify
    bad_kid.update_bag(  bad_kid_data  )
  File "/root/workspace/lamcts/Node.py", line 83, in update_bag
    self.is_svm_splittable = self.classifier.is_splittable_svm()
  File "/root/workspace/lamcts/Classifier.py", line 55, in is_splittable_svm
    self.learn_boundary(plabel)
  File "/root/workspace/lamcts/Classifier.py", line 410, in learn_boundary
    self.svm.fit(self.X, plabel)
  File "/opt/app-root/lib/python3.6/site-packages/sklearn/svm/_base.py", line 173, in fit
    y = self._validate_targets(y)
  File "/opt/app-root/lib/python3.6/site-packages/sklearn/svm/_base.py", line 560, in _validate_targets
    " class" % len(cls))
ValueError: The number of classes has to be greater than one; got 1 class

Since it occurs at Classifier.learn_clusters(), I print the result of self.kmean.predict(tmp) and find some clues:

The error will occur if plabel (the results of self.kmean.predict(tmp)) are all the same

# normal
plabel: [0 1 1 0 0 0 1 0 0 1 1]
# exception
plabel: [0 0 0 0 0]

I temporarily avoid this exception by making is_splittable_svm return False when plabel only contains a single class.

However, I would like to know that is it possible to happen in the general case? Or it may be caused by my own function?

Thank you for the work & sharing your code!

Why propose_samples_turbo is commented?

Why does not provide TurBO code in Classifier.py ?
I notice the method of def propose_samples_turbo(self, num_samples, path, func) in https://github.com/facebookresearch/LaMCTS/blob/master/LA-MCTS/Classifier.py#L343 is commented.

Reproducing CIFAR10 experiment

Hi,

firstly, thank you for great work and for sharing your code!

I'm trying to run train the best architecture on CIFAR-10, yet I cannot seem to get the reported numbers, I guess I must be missing something. I tried both 1500 epochs as well as 600 as you suggested in another ticket, but still the accuracy seems lower than 99%. Is there perhaps another hyperparameter setting I need to adjust ?

python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 600

`2021-01-28 03:44:52,831 gpu device = 0
2021-01-28 03:44:52,831 args = Namespace(arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]', auto_augment=True, auxiliary=True, auxiliary_weight=0.4, batch_size=32, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.2, epochs=600, exp_path='exp/cifar10', gpu=0, grad_clip=5, init_ch=128, layers=24, lr=0.025, model_ema=True, model_ema_decay=0.9999, model_ema_force_cpu=False, model_path='saved_models', momentum=0.9, report_freq=50, save='checkpoints/auto_aug-600-[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]-128-True-0.9999-0.2-24-16', seed=0, track_ema=False, wd=0.0003)
2021-01-28 03:44:56,852 param size = 44.591498MB
2021-01-28 03:44:59,182 epoch 0 lr 2.499966e-02
...
2021-02-03 00:52:40,405 valid_acc: 98.410000
2021-02-03 00:52:40,405 best_acc: 98.490000
2021-02-03 00:52:42,316 epoch 599 lr 0.000000e+00

or for 1500 epochs respectivelly: ```

2021-01-11 10:40:08,600 gpu device = 0
2021-01-11 10:40:08,600 args = Namespace(arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]', auto_augment=True, auxiliary=True, auxiliary_weight=0.4, batch_size=32, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.2, epochs=1500, exp_path='exp/cifar10', gpu=0, grad_clip=5, init_ch=128, layers=24, lr=0.025, model_ema=True, model_ema_decay=0.9999, model_ema_force_cpu=False, model_path='saved_models', momentum=0.9, report_freq=50, save='checkpoints/auto_aug-1500-[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]-128-True-0.9999-0.2-24-16', seed=0, track_ema=False, wd=0.0003)
2021-01-11 10:40:11,371 param size = 44.591498MB
2021-01-11 10:40:12,744 epoch 0 lr 2.499995e-02


2021-01-27 07:02:55,357 valid_acc: 98.250000
2021-01-27 07:02:55,358 best_acc: 98.310000
2021-01-27 07:02:57,400 epoch 1499 lr 5.483112e-08

Thank you!
Lukas

Use the K-Means labels or the SVM prediction labels to separate the samples into kid nodes?

Hello, I am now re-implementing the LA-MCTS according to the paper. My question is:
I use the labels predicted by SVM to decide which nodes of the parent should go to the good kid or bad kid. However, the SVM predicted labels are NOT always the same as those of K-Means. In some extreme cases, the mean of the bad kid was bigger than that of the good kid, due to the SVM classification error.
I checked the code of yours (as below), I found K-Means labels were used, instead of the SVM labels. I don't understand why should use the clustering labels, they don't represent the splitting, right?

def split_data(self):
        good_samples = []
        bad_samples = []
        train_good_samples = []
        train_bad_samples = []
        if len(self.samples) == 0:
            return good_samples, bad_samples

        plabel = self.learn_clusters()
        self.learn_boundary(plabel)

        for idx in range(0, len(plabel)):
            if plabel[idx] == 0:
                # ensure the consistency
                assert self.samples[idx][-1] - self.fX[idx] <= 1
                good_samples.append(self.samples[idx])
                train_good_samples.append(self.X[idx])
            else:
                bad_samples.append(self.samples[idx])
                train_bad_samples.append(self.X[idx])

        train_good_samples = np.array(train_good_samples)
        train_bad_samples = np.array(train_bad_samples)

        assert len(good_samples) + len(bad_samples) == len(self.samples)

        return good_samples, bad_samples

Cannot reproduce the results of Ant-v2 in the paper LAMCTS

Hi, recently we try to reproduce the result of mujoco task Ant-v2 in your paper. However we found all algorithms include LAMCTS cannot find reward more than -1000 after 10000 iterations. (In the paper LAMCTS should get a reward more than 1000.)

According to your paper, We use mean and std for linear policy from https://github.com/modestyachts/ARS and use 20 rollouts to get an average reward. (We use turbo as the local optimizer)

Actually, we got similar results of the paper in simple tasks like Swim, Hopper and HalfCheetah. But for Ant-888d task, we cannot reproduce the result in your paper. Can you reproduce the Ant-v2 result using the current code? Is there anything we need to change to reproduce the results?

multi gpu run

Hi, is there a multi-gpu running option?
when I try to set --gpu=0,1 .. I get an error.

I'm asking because I saw that the checkpoint that got 99% acc had 96 batch size and on one GPU I can't run a train with 96 batch size.

Thanks

Genetic algorithm

I would like to test it out for genetic algorithm. But I'm not sure where the fitness function, or the parent selection function would go. Would you have a demo that perharps applies this?

Are there any other ways to get the CIFAR10 pre-trained checkpoint?

Hi,
When I tried to download the pre-trained checkpoint lanas_128_99.03.zip from the url https://drive.google.com/file/d/1bZsEoG-sroVyYR4F_2ozGLA5W50CT84P/view?usp=sharing you provided, I failed. Therefore, are there any other ways to get the pre-trained checkpoint?

A question on the readme.md

I found in the readme that the command python train.py --auxiliary --batch_size=32 --init_ch=128 --**layer**=24 sets the parameter **layer** to 24. However, in the actual code, it should be specified as **layers** using parser.add_argument('--**layers**', type=int, default=20, help='total number of layers'). I am not sure if using 20 for layers can achieve an accuracy of 99.03%.