human-analysis / neural-architecture-transfer
Neural Architecture Transfer (arXiv'20), PyTorch Implementation
Home Page: http://hal.cse.msu.edu/papers/neural-architecture-transfer/
Hello, could you provide the label file for Flowers102? It can no longer be downloaded from the web. Thank you~
Hi,
I'm wondering whether you will release the search code (i.e., the evolutionary search) soon.
Hi!
Your paper is fantastic. I noticed a small mismatch between the paper and the code. In Section "3.2 Search Space and Encoding", the paper says "Each stage in turn comprises multiple layers, and each layer itself is an inverted residual bottleneck structure [54]", and the same can be seen in Figure 2a. However, your inverted residual differs from the inverted residual block proposed in MobileNetV2.
Specifically, line 43 (res += self.shortcut(x)). The MobileNetV2 inverted residual block does not apply a learned shortcut.
Is this a bug in the released code, or a deliberate change? If it is the latter, I think it should be stated in Section 3.2 and Figure 2a of the paper.
The MobileInvertedResidualBlock code:
class MobileInvertedResidualBlock(MyModule):
    def __init__(self, mobile_inverted_conv, shortcut, drop_connect_rate=0.0):
        super(MobileInvertedResidualBlock, self).__init__()
        self.mobile_inverted_conv = mobile_inverted_conv
        self.shortcut = shortcut
        self.drop_connect_rate = drop_connect_rate

    def forward(self, x):
        if self.mobile_inverted_conv is None or isinstance(self.mobile_inverted_conv, ZeroLayer):
            res = x
        elif self.shortcut is None or isinstance(self.shortcut, ZeroLayer):
            res = self.mobile_inverted_conv(x)
        else:
            # res = self.mobile_inverted_conv(x) + self.shortcut(x)
            res = self.mobile_inverted_conv(x)
            if self.drop_connect_rate > 0.:
                res = drop_connect(res, self.training, self.drop_connect_rate)
            res += self.shortcut(x)  # <- here is the difference. Standard residual block is res += x
        return res
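For reference, the two formulations can be contrasted in a minimal sketch (the class names here are illustrative stand-ins, not code from the repo):

```python
import torch
import torch.nn as nn

class IdentityResidual(nn.Module):
    """MobileNetV2-style block: the skip path is the identity, and is
    only used when stride == 1 and input/output channels match."""
    def __init__(self, body):
        super().__init__()
        self.body = body

    def forward(self, x):
        return self.body(x) + x  # res += x

class ProjectedResidual(nn.Module):
    """Released-code-style block: the skip path is a learned shortcut
    (e.g. a 1x1 conv), so it can also bridge mismatched shapes."""
    def __init__(self, body, shortcut):
        super().__init__()
        self.body = body
        self.shortcut = shortcut

    def forward(self, x):
        return self.body(x) + self.shortcut(x)  # res += self.shortcut(x)

x = torch.randn(2, 8, 16, 16)
a = IdentityResidual(nn.Conv2d(8, 8, 3, padding=1))(x)
b = ProjectedResidual(nn.Conv2d(8, 16, 3, padding=1), nn.Conv2d(8, 16, 1))(x)
print(a.shape, b.shape)  # torch.Size([2, 8, 16, 16]) torch.Size([2, 16, 16, 16])
```

Note that the projected variant changes the channel count (8 to 16 above), something an identity skip cannot do, which may be one reason for the design.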
sh run.sh M1
train val
No horovod in environment
Running validation on imagenet
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.875
Model created, param count: 6067408
Test: [ 0/782] Time: 0.926s (0.926s, 69.13/s) Loss: 6.8924 (6.8924) Acc@1: 4.688 ( 4.688) Acc@5: 12.500 ( 12.500)
Test: [ 50/782] Time: 0.259s (0.106s, 601.80/s) Loss: 6.8926 (6.8912) Acc@1: 1.562 ( 5.453) Acc@5: 10.938 ( 14.614)
Test: [ 100/782] Time: 0.040s (0.096s, 669.39/s) Loss: 6.8925 (6.8907) Acc@1: 7.812 ( 5.662) Acc@5: 15.625 ( 13.800)
Test: [ 150/782] Time: 0.040s (0.092s, 698.14/s) Loss: 6.8940 (6.8904) Acc@1: 12.500 ( 5.702) Acc@5: 14.062 ( 13.742)
Test: [ 200/782] Time: 0.152s (0.089s, 722.01/s) Loss: 6.8926 (6.8904) Acc@1: 4.688 ( 5.729) Acc@5: 14.062 ( 13.658)
Test: [ 250/782] Time: 0.041s (0.089s, 717.89/s) Loss: 6.8923 (6.8905) Acc@1: 6.250 ( 5.671) Acc@5: 14.062 ( 13.515)
Test: [ 300/782] Time: 0.043s (0.089s, 717.04/s) Loss: 6.8924 (6.8906) Acc@1: 4.688 ( 5.565) Acc@5: 9.375 ( 13.424)
Test: [ 350/782] Time: 0.038s (0.089s, 716.55/s) Loss: 6.8927 (6.8905) Acc@1: 1.562 ( 5.235) Acc@5: 6.250 ( 12.985)
Test: [ 400/782] Time: 0.052s (0.088s, 723.56/s) Loss: 6.8926 (6.8903) Acc@1: 1.562 ( 4.949) Acc@5: 3.125 ( 12.496)
Test: [ 450/782] Time: 0.035s (0.089s, 719.58/s) Loss: 6.8925 (6.8902) Acc@1: 0.000 ( 4.691) Acc@5: 4.688 ( 12.088)
Test: [ 500/782] Time: 0.337s (0.089s, 721.14/s) Loss: 6.8925 (6.8901) Acc@1: 1.562 ( 4.491) Acc@5: 6.250 ( 11.711)
Test: [ 550/782] Time: 0.277s (0.088s, 723.97/s) Loss: 6.8925 (6.8900) Acc@1: 6.250 ( 4.356) Acc@5: 10.938 ( 11.496)
Test: [ 600/782] Time: 0.041s (0.088s, 730.85/s) Loss: 6.8927 (6.8899) Acc@1: 1.562 ( 4.282) Acc@5: 6.250 ( 11.359)
Test: [ 650/782] Time: 0.040s (0.088s, 730.81/s) Loss: 6.8924 (6.8899) Acc@1: 3.125 ( 4.179) Acc@5: 9.375 ( 11.245)
Test: [ 700/782] Time: 0.041s (0.087s, 732.60/s) Loss: 6.8927 (6.8898) Acc@1: 3.125 ( 4.054) Acc@5: 4.688 ( 11.016)
Test: [ 750/782] Time: 0.041s (0.087s, 731.90/s) Loss: 6.8929 (6.8898) Acc@1: 4.688 ( 4.026) Acc@5: 6.250 ( 10.952)
Sorry to interrupt; I want to ask a question. I tested M1, M2, and M3 with the models and scripts you provided, but the top-1 accuracy is very low. I hope you can help me when you have time. Thank you.
root@workspace-job-5fadedfbbf46a9bf616072f7-qnp4z:# sh run.sh M2
train val
No horovod in environment
Running validation on imagenet
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.875
Model created, param count: 7688312
Traceback (most recent call last):
  File "evaluator.py", line 109, in <module>
    main(cfgs)
  File "evaluator.py", line 83, in main
    validate(model, test_loader, criterion)
  File "evaluator.py", line 29, in validate
    loss = criterion(output, target)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1317, in log_softmax
    ret = input.log_softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
Hello, when I evaluated your model on ImageNet I got this error. Do you have any advice? Thanks.
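For what it's worth, the failing call in the trace is log_softmax being asked for dim=1 on a tensor that only has one dimension, i.e. the model's output reached the loss as a 1-D tensor rather than an (N, C) batch. One possible culprit (a guess, not a confirmed diagnosis) is the torch.squeeze in the network's forward pass removing the batch dimension. A minimal reproduction of the error itself:

```python
import torch

logits = torch.randn(1, 1000)          # (N, C) logits for one image
ok = torch.log_softmax(logits, dim=1)  # fine: dim=1 exists

squeezed = torch.squeeze(logits)       # shape (1000,): batch dim removed
try:
    torch.log_softmax(squeezed, dim=1) # dim=1 no longer exists on a 1-D tensor
except IndexError as e:
    print(type(e).__name__)            # IndexError
```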
How do you obtain the mean and std used for normalizing the images?
Are they calculated over the entire dataset or only on the validation set?
many thanks
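For context, the values printed in the logs above (mean 0.485/0.456/0.406, std 0.229/0.224/0.225) are the conventional per-channel statistics of the ImageNet training set. As a sketch, per-channel statistics for any image dataset can be accumulated in a single pass over a loader (the toy dataset below is a placeholder):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_stats(loader):
    """One-pass per-channel mean/std over batches of (N, C, H, W) images,
    computed from running sums of x and x**2."""
    n = 0
    s = s2 = None
    for images, _ in loader:
        b, c, h, w = images.shape
        if s is None:
            s, s2 = torch.zeros(c), torch.zeros(c)
        n += b * h * w
        s += images.sum(dim=(0, 2, 3))
        s2 += (images ** 2).sum(dim=(0, 2, 3))
    mean = s / n
    std = (s2 / n - mean ** 2).sqrt()  # population (biased) std
    return mean, std

# Toy stand-in for an image dataset: 100 random 3x8x8 "images".
data = TensorDataset(torch.rand(100, 3, 8, 8), torch.zeros(100))
mean, std = channel_stats(DataLoader(data, batch_size=32))
print(mean.shape, std.shape)  # torch.Size([3]) torch.Size([3])
```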
Nice work and amazing performance.
Would you mind sharing the hyperparameters and scripts for re-training NAT-M1/NAT-M2?
Great work!
But I can't understand how the number of channels in each layer is calculated from the encoding.
I did not find this process in the paper.
Could you share the details?
Hello,
I was wondering whether the weights of the supernetwork are continuously trained during the search?
I noticed that in the code of your previous paper (NSGANetV2) that you reference in another issue, the supernet is not actually trained during the search; instead, the supernet weights are used for initializing subnet weights, which are trained for 5 epochs, used for evaluation, and are then discarded; the next subnet is initialized again with the original supernet weights.
Which is why I'd like to know whether NAT does that too?
An additional question: if NAT doesn't discard the trained weights, how do you deal with the fact that the performances in the archive were reported based on older weights? Doesn't this negatively impact the predictor's accuracy?
Thanks in advance!
In codebase/networks/natnet.py, instead of creating an nn.ModuleList out of the blocks, wouldn't it be better to create an nn.Sequential out of them, i.e.
class NATNet(MyNetwork):
    """ variants of MobileNet-v3 """
    def __init__(self, first_conv, blocks, final_expand_layer, feature_mix_layer, classifier, **kwargs):
        super(NATNet, self).__init__()
        self.first_conv = first_conv
        self.blocks = nn.ModuleList(blocks)
        self.final_expand_layer = final_expand_layer
        self.feature_mix_layer = feature_mix_layer
        self.classifier = classifier

    def forward(self, x):
        x = self.first_conv(x)
        for block in self.blocks:
            x = block(x)
        x = self.final_expand_layer(x)
        x = x.mean(3, keepdim=True).mean(2, keepdim=True)  # global average pooling
        x = self.feature_mix_layer(x)
        x = torch.squeeze(x)
        x = self.classifier(x)
        return x
changes to,
class NATNet(MyNetwork):
    """ variants of MobileNet-v3 """
    def __init__(self, first_conv, blocks, final_expand_layer, feature_mix_layer, classifier, **kwargs):
        super(NATNet, self).__init__()
        self.first_conv = first_conv
        self.blocks = nn.Sequential(*blocks)
        self.final_expand_layer = final_expand_layer
        self.feature_mix_layer = feature_mix_layer
        self.classifier = classifier

    def forward(self, x):
        x = self.first_conv(x)
        x = self.blocks(x)
        x = self.final_expand_layer(x)
        x = x.mean(3, keepdim=True).mean(2, keepdim=True)  # global average pooling
        x = self.feature_mix_layer(x)
        x = torch.squeeze(x)
        x = self.classifier(x)
        return x
This is useful because functions like IntermediateLayerGetter from torchvision.models._utils cannot handle a list but can consume an nn.Sequential.
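The underlying difference is easy to verify without torchvision: nn.Sequential defines a forward() that chains its children, while nn.ModuleList is only a container with no forward(), which is why utilities that call each named child as a module (such as IntermediateLayerGetter) choke on it:

```python
import torch
import torch.nn as nn

blocks = [nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1)]
x = torch.randn(1, 3, 16, 16)

seq = nn.Sequential(*blocks)
y = seq(x)      # works: Sequential's forward() chains the blocks
print(y.shape)  # torch.Size([1, 8, 16, 16])

mlist = nn.ModuleList(blocks)
try:
    mlist(x)    # ModuleList defines no forward()
except NotImplementedError:
    print("nn.ModuleList cannot be called directly")
```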