Comments (13)

delzac commented on August 29, 2024

Hmm, I just looked through the scale and bias in LayerNormalization and I don't see anything wrong with them. What's your reservation?

*I think it's better to have the conversation here, since there might be other people with the same issue :)

from cntkx.

haixpham commented on August 29, 2024

scale = Parameter(_INFERRED, init=initial_scale, name='scale')
at https://cntk.ai/pythondocs/_modules/cntk/layers/layers.html#LayerNormalization.

It's predetermined that 'scale' is rank-1, so if the input is rank-2 or higher, scale will only have the length of the input's last dimension.

For example, if
x = C.input((3, 100, 100))
then
scale.shape == (100,)
while it should be (3, 100, 100).
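To make the shape difference concrete, here is a small NumPy sketch (NumPy stands in for CNTK here, so this only illustrates the broadcasting, not how CNTK infers parameter shapes). The names `scale_rank1` and `scale_full` are made up for the illustration.

```python
import numpy as np

# One sample with static shape (channels, height, width), as in the example above.
x = np.random.default_rng(0).standard_normal((3, 100, 100))

# A rank-1 scale of length 100 broadcasts along the last axis only,
# so the same 100 gains are reused for every channel and every row.
scale_rank1 = np.ones((100,))

# A full-shape scale has one gain per input element.
scale_full = np.ones((3, 100, 100))

out_rank1 = x * scale_rank1
out_full = x * scale_full

print(out_rank1.shape, scale_rank1.size)  # (3, 100, 100) 100
print(out_full.shape, scale_full.size)    # (3, 100, 100) 30000
```

Both products have the input's shape; what differs is the number of learnable gains (100 versus 30000).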

delzac commented on August 29, 2024

That's because LayerNormalization was created/invented for regularization of RNNs and not CNNs.

My experience with using LayerNormalization instead of BatchNormalization is that it hurts performance of CNNs.

haixpham commented on August 29, 2024

I haven't tried LayerNorm on a CNN yet, but using BatchNorm to train a CNN to learn features leads to noisy features, I think because it relies on population batch statistics at runtime. Without normalization, the training optimization is too "stochastic" for me.

Btw, is there a way to infer tensor rank when defining a custom layer? Currently I define one layer for each tensor rank.

delzac commented on August 29, 2024

I have had good experiences with batchnorm on CNNs even with small batch sizes of around 16, sometimes even with a batch size of 4.

If you want to extend LayerNormalization to all three static axes, I think you can do something like this:

scale = Parameter((_INFERRED, _INFERRED, _INFERRED), init=initial_scale, name='scale')

haixpham commented on August 29, 2024

That's exactly what I did, so I have LayerNormalization1D, 2D, 3D etc...
Would be nice if it's possible to define one layer functional to handle all cases.

delzac commented on August 29, 2024

I gave it some thought, and I don't think it is doable because there's no way of knowing the shape of the input tensor at the point of defining the parameter.

haixpham commented on August 29, 2024

It's not possible using only the Python API, unless (IMO) we can define a C API similar to BatchNorm, and do resize() & reshape() in C.

delzac commented on August 29, 2024

Well, if you are okay with not using the @C.BlockFunction decorator on the inner function of your layer function, then yes, you can do it: use Python to check the input's shape and dynamically route it to the correct function.
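A rough sketch of that idea in plain Python with NumPy (not CNTK code; `make_layer_norm` and its `state` attribute are hypothetical names): the layer waits until it is applied, reads the input's shape, and only then creates parameters of the matching shape. The placement of `epsilon` inside the square root is an assumption of this sketch.

```python
import numpy as np

def make_layer_norm(epsilon=1e-5):
    """Return a callable that sizes its parameters from the first input it sees."""
    state = {}  # filled on first call, once the input shape is known

    def apply(x):
        x = np.asarray(x, dtype=float)
        if "scale" not in state:
            # The shape is only known here, at call time, not at definition time.
            state["scale"] = np.ones(x.shape)
            state["bias"] = np.zeros(x.shape)
        if state["scale"].shape != x.shape:
            raise ValueError(
                f"expected input of shape {state['scale'].shape}, got {x.shape}"
            )
        centered = x - x.mean()
        normed = centered / np.sqrt((centered ** 2).mean() + epsilon)
        return normed * state["scale"] + state["bias"]

    apply.state = state  # exposed so the parameters can be inspected
    return apply

layer_norm = make_layer_norm()
y = layer_norm(np.arange(24.0).reshape(2, 3, 4))
print(layer_norm.state["scale"].shape)  # (2, 3, 4)
```

The same callable works for rank-1, rank-2, or rank-3 inputs, at the cost of deferring parameter creation to the first call.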

Alternatively, why not create your layernorm like the Dense API, where it explicitly asks for either input_rank or map_rank to decide what rank its weight parameter is?
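As an illustration of the explicit-rank approach, here is a NumPy sketch (again not CNTK code) in which a hypothetical `input_rank` argument says how many trailing axes are normalized and therefore what rank the scale and bias must have. In CNTK, the parameter shape would presumably be built as `(_INFERRED,) * input_rank`, but that is an untested assumption.

```python
import numpy as np

def layer_norm(x, scale, bias, input_rank, epsilon=1e-5):
    """Normalize each sample over its trailing `input_rank` axes."""
    if scale.ndim != input_rank or bias.ndim != input_rank:
        raise ValueError("scale and bias must have rank equal to input_rank")
    axes = tuple(range(x.ndim - input_rank, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon) * scale + bias

# A batch of 4 samples, each with static shape (3, 8, 8).
batch = np.random.default_rng(0).standard_normal((4, 3, 8, 8))

# input_rank=3: statistics and parameters cover all three static axes.
y3 = layer_norm(batch, np.ones((3, 8, 8)), np.zeros((3, 8, 8)), input_rank=3)

# input_rank=1: statistics and parameters cover the last axis only.
y1 = layer_norm(batch, np.ones(8), np.zeros(8), input_rank=1)
```

One function then covers the 1D, 2D, and 3D cases, with the caller stating the rank up front instead of the layer trying to infer it.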

haixpham commented on August 29, 2024

On that note, can you please tell me what's the purpose of C.BlockFunction?

delzac commented on August 29, 2024

There are two uses for C.BlockFunction. First, it wraps all the 'inner' ops and makes them appear as a single op when you view the model in Netron or save it as a graph, i.e. it looks like a single block.

Second, a C.BlockFunction is also a C.Function, which is used to create a CNTK function. It is particularly important to first wrap certain blocks of code as a C.Function when you want to use an encoder-decoder attention mechanism. You can see this for an example. If you remove the decorator, the whole thing will throw an error.

haixpham commented on August 29, 2024

Thanks mate, very insightful! Look forward to future discussion.

delzac commented on August 29, 2024

Happy to help. If you have any issues, feel free to reopen or open a new issue!
