neural-architecture-transfer's People

Contributors

mikelzc1990, vboddeti


neural-architecture-transfer's Issues

Mismatch between paper and code

Hi!
Your paper is fantastic. I noticed a small mismatch between the paper and the code. In Section "3.2 Search Space and Encoding", the paper says "Each stage in turn comprises multiple layers, and each layer itself is an inverted residual bottleneck structure [54]", and the same can be seen in Figure 2a. However, your inverted residual block differs from the inverted residual block proposed in MobileNetV2.

Specifically, line 43 (res += self.shortcut(x)): the MobileNetV2 inverted residual block does not apply self.shortcut to the input; it adds x directly.

Is this a bug in the released code, or is it a deliberate change? If it's the latter, I think it should be stated in Section 3.2 and Figure 2a of the paper.

MobileInvertedResidualBlock code:

class MobileInvertedResidualBlock(MyModule):

    def __init__(self, mobile_inverted_conv, shortcut, drop_connect_rate=0.0):
        super(MobileInvertedResidualBlock, self).__init__()

        self.mobile_inverted_conv = mobile_inverted_conv
        self.shortcut = shortcut
        self.drop_connect_rate = drop_connect_rate

    def forward(self, x):
        if self.mobile_inverted_conv is None or isinstance(self.mobile_inverted_conv, ZeroLayer):
            res = x
        elif self.shortcut is None or isinstance(self.shortcut, ZeroLayer):
            res = self.mobile_inverted_conv(x)
        else:
            # res = self.mobile_inverted_conv(x) + self.shortcut(x)
            res = self.mobile_inverted_conv(x)

            if self.drop_connect_rate > 0.:
                res = drop_connect(res, self.training, self.drop_connect_rate)

            res += self.shortcut(x)  # <- here is the difference. Standard residual block is res += x

        return res
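For reference, a standard MobileNetV2-style inverted residual adds the identity input back when the stride is 1 and the channel counts match (a minimal sketch for comparison, not the repository's code; the layer names are illustrative):

```python
import torch
import torch.nn as nn

class StandardInvertedResidual(nn.Module):
    """Minimal MobileNetV2-style inverted residual: expand -> depthwise -> project."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        # Identity skip is only used when the shapes line up.
        self.use_res = stride == 1 and in_ch == out_ch
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.conv(x)
        if self.use_res:
            out = out + x  # identity shortcut: res += x, no learned shortcut module
        return out
```

Here the skip connection is res = out + x; there is no self.shortcut module on the identity path, which is the difference the comment on line 51 above points out.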

* Acc@1 4.0 (95.982) Acc@5 10.9 (89.052)

sh run.sh M1
train val
No horovod in environment
Running validation on imagenet
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.875
Model created, param count: 6067408
Test: [ 0/782] Time: 0.926s (0.926s, 69.13/s) Loss: 6.8924 (6.8924) Acc@1: 4.688 ( 4.688) Acc@5: 12.500 ( 12.500)
Test: [ 50/782] Time: 0.259s (0.106s, 601.80/s) Loss: 6.8926 (6.8912) Acc@1: 1.562 ( 5.453) Acc@5: 10.938 ( 14.614)
Test: [ 100/782] Time: 0.040s (0.096s, 669.39/s) Loss: 6.8925 (6.8907) Acc@1: 7.812 ( 5.662) Acc@5: 15.625 ( 13.800)
Test: [ 150/782] Time: 0.040s (0.092s, 698.14/s) Loss: 6.8940 (6.8904) Acc@1: 12.500 ( 5.702) Acc@5: 14.062 ( 13.742)
Test: [ 200/782] Time: 0.152s (0.089s, 722.01/s) Loss: 6.8926 (6.8904) Acc@1: 4.688 ( 5.729) Acc@5: 14.062 ( 13.658)
Test: [ 250/782] Time: 0.041s (0.089s, 717.89/s) Loss: 6.8923 (6.8905) Acc@1: 6.250 ( 5.671) Acc@5: 14.062 ( 13.515)
Test: [ 300/782] Time: 0.043s (0.089s, 717.04/s) Loss: 6.8924 (6.8906) Acc@1: 4.688 ( 5.565) Acc@5: 9.375 ( 13.424)
Test: [ 350/782] Time: 0.038s (0.089s, 716.55/s) Loss: 6.8927 (6.8905) Acc@1: 1.562 ( 5.235) Acc@5: 6.250 ( 12.985)
Test: [ 400/782] Time: 0.052s (0.088s, 723.56/s) Loss: 6.8926 (6.8903) Acc@1: 1.562 ( 4.949) Acc@5: 3.125 ( 12.496)
Test: [ 450/782] Time: 0.035s (0.089s, 719.58/s) Loss: 6.8925 (6.8902) Acc@1: 0.000 ( 4.691) Acc@5: 4.688 ( 12.088)
Test: [ 500/782] Time: 0.337s (0.089s, 721.14/s) Loss: 6.8925 (6.8901) Acc@1: 1.562 ( 4.491) Acc@5: 6.250 ( 11.711)
Test: [ 550/782] Time: 0.277s (0.088s, 723.97/s) Loss: 6.8925 (6.8900) Acc@1: 6.250 ( 4.356) Acc@5: 10.938 ( 11.496)
Test: [ 600/782] Time: 0.041s (0.088s, 730.85/s) Loss: 6.8927 (6.8899) Acc@1: 1.562 ( 4.282) Acc@5: 6.250 ( 11.359)
Test: [ 650/782] Time: 0.040s (0.088s, 730.81/s) Loss: 6.8924 (6.8899) Acc@1: 3.125 ( 4.179) Acc@5: 9.375 ( 11.245)
Test: [ 700/782] Time: 0.041s (0.087s, 732.60/s) Loss: 6.8927 (6.8898) Acc@1: 3.125 ( 4.054) Acc@5: 4.688 ( 11.016)
Test: [ 750/782] Time: 0.041s (0.087s, 731.90/s) Loss: 6.8929 (6.8898) Acc@1: 4.688 ( 4.026) Acc@5: 6.250 ( 10.952)

  • Acc@1 4.0 (95.982) Acc@5 10.9 (89.052)

Sorry to interrupt, I want to ask a question. I tested M1, M2, and M3 using the models and scripts you provided, but the top-1 accuracy is very low. I hope you can help me when you have time. Thank you.

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

root@workspace-job-5fadedfbbf46a9bf616072f7-qnp4z:# sh run.sh M2
train val
No horovod in environment
Running validation on imagenet
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.875
Model created, param count: 7688312
Traceback (most recent call last):
File "evaluator.py", line 109, in <module>
main(cfgs)
File "evaluator.py", line 83, in main
validate(model, test_loader, criterion)
File "evaluator.py", line 29, in validate
loss = criterion(output, target)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 916, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2021, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1317, in log_softmax
ret = input.log_softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Hello, when I evaluated your model on ImageNet I got this error. Do you have any advice? Thanks.
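A guess at a possible cause, not a confirmed diagnosis: the traceback shows log_softmax(input, 1) failing, which happens when the tensor reaching the loss is 1-D. NATNet.forward calls torch.squeeze(x), which also drops the batch dimension when the batch size is 1, and this is one common way to trigger this exact error:

```python
import torch

logits = torch.randn(1, 1000)      # a batch of 1 image, 1000 classes
squeezed = torch.squeeze(logits)   # squeeze also drops the batch dim: shape (1000,)

torch.log_softmax(logits, dim=1)   # fine: input is (N, C)
try:
    torch.log_softmax(squeezed, dim=1)  # dim 1 does not exist on a 1-D tensor
except IndexError as exc:
    print(exc)  # Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

Replacing torch.squeeze(x) with x.squeeze(3).squeeze(2), which names the spatial dimensions explicitly, avoids that particular failure mode.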

mean and std for normalizing the image

How do you obtain the mean and std used for normalizing the images?
Are they calculated over the entire dataset or only on the validation set?
Many thanks.
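For context (not the authors' confirmed answer): the values in the logs above, mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225), are the de-facto standard ImageNet statistics, conventionally computed over the ImageNet training set. A sketch of how such per-channel statistics can be computed over any dataset (channel_stats and the loader it takes are hypothetical, not from this repository):

```python
import torch

def channel_stats(loader):
    """Per-channel mean/std over a loader yielding (images, labels) batches.

    Accumulates sums and squared sums in one pass; images are expected to be
    (N, 3, H, W) tensors scaled to [0, 1].
    """
    n_pixels = 0
    s = torch.zeros(3)
    s2 = torch.zeros(3)
    for images, _ in loader:
        n_pixels += images.numel() // images.shape[1]  # pixels per channel
        s += images.sum(dim=(0, 2, 3))
        s2 += (images ** 2).sum(dim=(0, 2, 3))
    mean = s / n_pixels
    std = (s2 / n_pixels - mean ** 2).clamp_min(0).sqrt()  # Var = E[x^2] - E[x]^2
    return mean, std
```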

Are the supernet weights trained during the search?

Hello,

I was wondering whether the weights of the supernetwork are continuously trained during the search?

I noticed that in the code of your previous paper (NSGANetV2) that you reference in another issue, the supernet is not actually trained during the search; instead, the supernet weights are used for initializing subnet weights, which are trained for 5 epochs, used for evaluation, and are then discarded; the next subnet is initialized again with the original supernet weights.

That's why I'd like to know whether NAT does this too.

An additional question: if NAT doesn't discard the trained weights, how do you deal with the fact that the performances in the archive were reported based on older weights? Doesn't this negatively impact the predictor's accuracy?

Thanks in advance!

nn.Sequential instead of nn.ModuleList

In codebase/networks/natnet.py, instead of creating an nn.ModuleList out of the blocks, wouldn't it be better to create an nn.Sequential out of them? That is, change

class NATNet(MyNetwork):
	""" variants of MobileNet-v3 """
	def __init__(self, first_conv, blocks, final_expand_layer, feature_mix_layer, classifier, **kwargs):
		super(NATNet, self).__init__()

		self.first_conv = first_conv
		self.blocks = nn.ModuleList(blocks)
		self.final_expand_layer = final_expand_layer
		self.feature_mix_layer = feature_mix_layer
		self.classifier = classifier

	def forward(self, x):
		x = self.first_conv(x)
		for block in self.blocks:
			x = block(x)
		x = self.final_expand_layer(x)
		x = x.mean(3, keepdim=True).mean(2, keepdim=True)  # global average pooling
		x = self.feature_mix_layer(x)
		x = torch.squeeze(x)
		x = self.classifier(x)
		return x

changes to,

class NATNet(MyNetwork):
	""" variants of MobileNet-v3 """
	def __init__(self, first_conv, blocks, final_expand_layer, feature_mix_layer, classifier, **kwargs):
		super(NATNet, self).__init__()

		self.first_conv = first_conv
		self.blocks = nn.Sequential(*blocks)
		self.final_expand_layer = final_expand_layer
		self.feature_mix_layer = feature_mix_layer
		self.classifier = classifier

	def forward(self, x):
		x = self.first_conv(x)
		x = self.blocks(x)
		x = self.final_expand_layer(x)
		x = x.mean(3, keepdim=True).mean(2, keepdim=True)  # global average pooling
		x = self.feature_mix_layer(x)
		x = torch.squeeze(x)
		x = self.classifier(x)
		return x

This is useful because helpers like IntermediateLayerGetter from torchvision.models._utils cannot handle a list of modules but can consume an nn.Sequential.
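As a small torch-only illustration of why this matters (the layers here are illustrative, not the repository's): an nn.Sequential is itself a callable module and supports slicing, so feature-extraction helpers can treat the whole block stack as a single unit, while an nn.ModuleList is only a container and has no forward:

```python
import torch
import torch.nn as nn

layers = [nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 16, 3, padding=1)]

seq = nn.Sequential(*layers)   # callable as one module
x = torch.randn(1, 3, 8, 8)

out = seq(x)    # runs all layers in order; an nn.ModuleList cannot be called like this
sub = seq[:2]   # slicing returns another nn.Sequential, handy for feature taps
feat = sub(x)
```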
