human-analysis / neural-architecture-transfer
Neural Architecture Transfer (arXiv'20), PyTorch Implementation
Home Page: http://hal.cse.msu.edu/papers/neural-architecture-transfer/
Hello, could you provide the label file for Flowers102? It can no longer be downloaded from the web. Thank you~
Hi,
I'm wondering whether you will release the search code (i.e., the evolutionary search) soon.
Hi!
Your paper is fantastic. I noticed a small mismatch between the paper and the code. In Section "3.2 Search Space and Encoding", the paper says "Each stage in turn comprises multiple layers, and each layer itself is an inverted residual bottleneck structure [54]", and the same can be seen in Figure 2a. However, your inverted residual differs from the inverted residual block proposed in MobileNetV2.
Specifically, line 43 (res += self.shortcut(x)). The MobileNetV2 inverted residual block does not apply a learned shortcut.
Is this a bug in the released code, or a deliberate change? If it is the latter, I think it should be stated in Section 3.2 and Figure 2a of the paper.
The MobileInvertedResidualBlock code:
class MobileInvertedResidualBlock(MyModule):
    def __init__(self, mobile_inverted_conv, shortcut, drop_connect_rate=0.0):
        super(MobileInvertedResidualBlock, self).__init__()
        self.mobile_inverted_conv = mobile_inverted_conv
        self.shortcut = shortcut
        self.drop_connect_rate = drop_connect_rate

    def forward(self, x):
        if self.mobile_inverted_conv is None or isinstance(self.mobile_inverted_conv, ZeroLayer):
            res = x
        elif self.shortcut is None or isinstance(self.shortcut, ZeroLayer):
            res = self.mobile_inverted_conv(x)
        else:
            # res = self.mobile_inverted_conv(x) + self.shortcut(x)
            res = self.mobile_inverted_conv(x)
            if self.drop_connect_rate > 0.:
                res = drop_connect(res, self.training, self.drop_connect_rate)
            res += self.shortcut(x)  # <- here is the difference. Standard residual block is res += x
        return res
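For reference, the two formulations can be contrasted in a minimal sketch (the class names here are illustrative stand-ins, not code from the repo):

```python
import torch
import torch.nn as nn

class IdentityResidual(nn.Module):
    """MobileNetV2-style block: the skip path is the identity, and is
    only used when stride == 1 and input/output channels match."""
    def __init__(self, body):
        super().__init__()
        self.body = body

    def forward(self, x):
        return self.body(x) + x  # res += x

class ProjectedResidual(nn.Module):
    """Released-code-style block: the skip path is a learned shortcut
    (e.g. a 1x1 conv), so it can also bridge mismatched shapes."""
    def __init__(self, body, shortcut):
        super().__init__()
        self.body = body
        self.shortcut = shortcut

    def forward(self, x):
        return self.body(x) + self.shortcut(x)  # res += self.shortcut(x)

x = torch.randn(2, 8, 16, 16)
a = IdentityResidual(nn.Conv2d(8, 8, 3, padding=1))(x)
b = ProjectedResidual(nn.Conv2d(8, 16, 3, padding=1), nn.Conv2d(8, 16, 1))(x)
print(a.shape, b.shape)  # torch.Size([2, 8, 16, 16]) torch.Size([2, 16, 16, 16])
```

Note that the projected variant changes the channel count (8 to 16 above), something an identity skip cannot do, which may be one reason for the design.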
sh run.sh M1
train val
No horovod in environment
Running validation on imagenet
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.875
Model created, param count: 6067408
Test: [ 0/782] Time: 0.926s (0.926s, 69.13/s) Loss: 6.8924 (6.8924) Acc@1: 4.688 ( 4.688) Acc@5: 12.500 ( 12.500)
Test: [ 50/782] Time: 0.259s (0.106s, 601.80/s) Loss: 6.8926 (6.8912) Acc@1: 1.562 ( 5.453) Acc@5: 10.938 ( 14.614)
Test: [ 100/782] Time: 0.040s (0.096s, 669.39/s) Loss: 6.8925 (6.8907) Acc@1: 7.812 ( 5.662) Acc@5: 15.625 ( 13.800)
Test: [ 150/782] Time: 0.040s (0.092s, 698.14/s) Loss: 6.8940 (6.8904) Acc@1: 12.500 ( 5.702) Acc@5: 14.062 ( 13.742)
Test: [ 200/782] Time: 0.152s (0.089s, 722.01/s) Loss: 6.8926 (6.8904) Acc@1: 4.688 ( 5.729) Acc@5: 14.062 ( 13.658)
Test: [ 250/782] Time: 0.041s (0.089s, 717.89/s) Loss: 6.8923 (6.8905) Acc@1: 6.250 ( 5.671) Acc@5: 14.062 ( 13.515)
Test: [ 300/782] Time: 0.043s (0.089s, 717.04/s) Loss: 6.8924 (6.8906) Acc@1: 4.688 ( 5.565) Acc@5: 9.375 ( 13.424)
Test: [ 350/782] Time: 0.038s (0.089s, 716.55/s) Loss: 6.8927 (6.8905) Acc@1: 1.562 ( 5.235) Acc@5: 6.250 ( 12.985)
Test: [ 400/782] Time: 0.052s (0.088s, 723.56/s) Loss: 6.8926 (6.8903) Acc@1: 1.562 ( 4.949) Acc@5: 3.125 ( 12.496)
Test: [ 450/782] Time: 0.035s (0.089s, 719.58/s) Loss: 6.8925 (6.8902) Acc@1: 0.000 ( 4.691) Acc@5: 4.688 ( 12.088)
Test: [ 500/782] Time: 0.337s (0.089s, 721.14/s) Loss: 6.8925 (6.8901) Acc@1: 1.562 ( 4.491) Acc@5: 6.250 ( 11.711)
Test: [ 550/782] Time: 0.277s (0.088s, 723.97/s) Loss: 6.8925 (6.8900) Acc@1: 6.250 ( 4.356) Acc@5: 10.938 ( 11.496)
Test: [ 600/782] Time: 0.041s (0.088s, 730.85/s) Loss: 6.8927 (6.8899) Acc@1: 1.562 ( 4.282) Acc@5: 6.250 ( 11.359)
Test: [ 650/782] Time: 0.040s (0.088s, 730.81/s) Loss: 6.8924 (6.8899) Acc@1: 3.125 ( 4.179) Acc@5: 9.375 ( 11.245)
Test: [ 700/782] Time: 0.041s (0.087s, 732.60/s) Loss: 6.8927 (6.8898) Acc@1: 3.125 ( 4.054) Acc@5: 4.688 ( 11.016)
Test: [ 750/782] Time: 0.041s (0.087s, 731.90/s) Loss: 6.8929 (6.8898) Acc@1: 4.688 ( 4.026) Acc@5: 6.250 ( 10.952)
Sorry to interrupt; I want to ask a question. I tested M1, M2, and M3 with the models and scripts you provided, but the top-1 accuracy is very low. I hope you can help me when you have time. Thank you.
root@workspace-job-5fadedfbbf46a9bf616072f7-qnp4z:# sh run.sh M2
train val
No horovod in environment
Running validation on imagenet
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.875
Model created, param count: 7688312
Traceback (most recent call last):
  File "evaluator.py", line 109, in <module>
    main(cfgs)
  File "evaluator.py", line 83, in main
    validate(model, test_loader, criterion)
  File "evaluator.py", line 29, in validate
    loss = criterion(output, target)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1317, in log_softmax
    ret = input.log_softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
Hello, when I evaluated your model on ImageNet I got this error. Do you have any advice? Thanks.
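For what it's worth, the failing call in the trace is log_softmax being asked for dim=1 on a tensor that only has one dimension, i.e. the model's output reached the loss as a 1-D tensor rather than an (N, C) batch. One possible culprit (a guess, not a confirmed diagnosis) is the torch.squeeze in the network's forward pass removing the batch dimension. A minimal reproduction of the error itself:

```python
import torch

logits = torch.randn(1, 1000)          # (N, C) logits for one image
ok = torch.log_softmax(logits, dim=1)  # fine: dim=1 exists

squeezed = torch.squeeze(logits)       # shape (1000,): batch dim removed
try:
    torch.log_softmax(squeezed, dim=1) # dim=1 no longer exists on a 1-D tensor
except IndexError as e:
    print(type(e).__name__)            # IndexError
```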
How do you obtain the mean and std used for normalizing the images?
Are they calculated over the entire dataset or only on the validation set?
many thanks
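For context, the values printed in the logs above (mean 0.485/0.456/0.406, std 0.229/0.224/0.225) are the conventional per-channel statistics of the ImageNet training set. As a sketch, per-channel statistics for any image dataset can be accumulated in a single pass over a loader (the toy dataset below is a placeholder):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_stats(loader):
    """One-pass per-channel mean/std over batches of (N, C, H, W) images,
    computed from running sums of x and x**2."""
    n = 0
    s = s2 = None
    for images, _ in loader:
        b, c, h, w = images.shape
        if s is None:
            s, s2 = torch.zeros(c), torch.zeros(c)
        n += b * h * w
        s += images.sum(dim=(0, 2, 3))
        s2 += (images ** 2).sum(dim=(0, 2, 3))
    mean = s / n
    std = (s2 / n - mean ** 2).sqrt()  # population (biased) std
    return mean, std

# Toy stand-in for an image dataset: 100 random 3x8x8 "images".
data = TensorDataset(torch.rand(100, 3, 8, 8), torch.zeros(100))
mean, std = channel_stats(DataLoader(data, batch_size=32))
print(mean.shape, std.shape)  # torch.Size([3]) torch.Size([3])
```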
Nice work and amazing performance.
Would you mind sharing the hyperparameters and scripts for re-training NAT-M1/NAT-M2?
Great work!
But I can't understand how the number of channels in each layer is calculated from the encoding.
I did not find this process in the paper.
Could you share the details?
Hello,
I was wondering whether the weights of the supernetwork are continuously trained during the search?
I noticed that in the code of your previous paper (NSGANetV2) that you reference in another issue, the supernet is not actually trained during the search; instead, the supernet weights are used for initializing subnet weights, which are trained for 5 epochs, used for evaluation, and are then discarded; the next subnet is initialized again with the original supernet weights.
Which is why I'd like to know whether NAT does that too?
An additional question: if NAT doesn't discard the trained weights, how do you deal with the fact that the performances in the archive were reported based on older weights? Doesn't this negatively impact the predictor's accuracy?
Thanks in advance!
In codebase/networks/natnet.py, instead of creating an nn.ModuleList out of the blocks, wouldn't it be better to create an nn.Sequential out of them, i.e.
class NATNet(MyNetwork):
    """ variants of MobileNet-v3 """
    def __init__(self, first_conv, blocks, final_expand_layer, feature_mix_layer, classifier, **kwargs):
        super(NATNet, self).__init__()
        self.first_conv = first_conv
        self.blocks = nn.ModuleList(blocks)
        self.final_expand_layer = final_expand_layer
        self.feature_mix_layer = feature_mix_layer
        self.classifier = classifier

    def forward(self, x):
        x = self.first_conv(x)
        for block in self.blocks:
            x = block(x)
        x = self.final_expand_layer(x)
        x = x.mean(3, keepdim=True).mean(2, keepdim=True)  # global average pooling
        x = self.feature_mix_layer(x)
        x = torch.squeeze(x)
        x = self.classifier(x)
        return x
changes to,
class NATNet(MyNetwork):
    """ variants of MobileNet-v3 """
    def __init__(self, first_conv, blocks, final_expand_layer, feature_mix_layer, classifier, **kwargs):
        super(NATNet, self).__init__()
        self.first_conv = first_conv
        self.blocks = nn.Sequential(*blocks)
        self.final_expand_layer = final_expand_layer
        self.feature_mix_layer = feature_mix_layer
        self.classifier = classifier

    def forward(self, x):
        x = self.first_conv(x)
        x = self.blocks(x)
        x = self.final_expand_layer(x)
        x = x.mean(3, keepdim=True).mean(2, keepdim=True)  # global average pooling
        x = self.feature_mix_layer(x)
        x = torch.squeeze(x)
        x = self.classifier(x)
        return x
This is useful because functions like IntermediateLayerGetter from torchvision.models._utils cannot handle a list but can consume an nn.Sequential.
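The underlying difference is easy to verify without torchvision: nn.Sequential defines a forward() that chains its children, while nn.ModuleList is only a container with no forward(), which is why utilities that call each named child as a module (such as IntermediateLayerGetter) choke on it:

```python
import torch
import torch.nn as nn

blocks = [nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1)]
x = torch.randn(1, 3, 16, 16)

seq = nn.Sequential(*blocks)
y = seq(x)      # works: Sequential's forward() chains the blocks
print(y.shape)  # torch.Size([1, 8, 16, 16])

mlist = nn.ModuleList(blocks)
try:
    mlist(x)    # ModuleList defines no forward()
except NotImplementedError:
    print("nn.ModuleList cannot be called directly")
```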