amarshah / complex_rnn

unitary matrix for hidden to hidden layer

License: BSD 3-Clause "New" or "Revised" License
Hi there, thanks for making your code public! I have been translating it to TensorFlow as an exercise and have therefore been reading it quite carefully.
I'm a little confused by some of the mathematics in the reflection function times_reflection (https://github.com/amarshah/complex_RNN/blob/master/models.py#L49). Basically, either I'm making a mistake in my own calculations, I've misunderstood something, or the code is not quite right. So, apologies in advance for how long this is; I'm trying to be comprehensive.
As I understand it, this function applies the typical (Householder) reflection operator about a complex vector v to an input vector h:

R(v) h = (I - 2 v v† / ‖v‖²) h
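As a side note, a quick NumPy sketch (my own, not from the repo) confirms the basic properties this operator should have, namely that it is unitary and its own inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random complex reflection vector v and input h (illustrative values).
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# The typical (Householder) reflection: R = I - 2 v v† / (v† v).
# np.vdot conjugates its first argument, so np.vdot(v, v) = ‖v‖².
R = np.eye(n) - 2.0 * np.outer(v, v.conj()) / np.vdot(v, v).real

# R should be unitary (R† R = I) and an involution (R R = I).
assert np.allclose(R.conj().T @ R, np.eye(n))
assert np.allclose(R @ R, np.eye(n))

# Reflecting twice recovers h.
assert np.allclose(R @ (R @ h), h)
```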
At the end of the above-mentioned function, there is:
output = T.inc_subtensor(output[:, :n_hidden], - 2. / vstarv * (a + b))
output = T.inc_subtensor(output[:, n_hidden:], - 2. / vstarv * (d - c))
(where vstarv is the squared norm of v). So I conclude that

output_re = input_re - (2 / ‖v‖²) (a + b)
output_im = input_im - (2 / ‖v‖²) (d - c)

(let's call this equation 1)
Doing a bit of expanding (sorry for ugly LaTeX): with h = h_re + i h_im and v = v_re + i v_im,

v† h = (v_re · h_re + v_im · h_im) + i (v_re · h_im - v_im · h_re)

Simplifying the appearance of this expression by rewriting the inner product terms (removing the dot and the vector symbol) to remind us these are just real scalars at this point, and gathering real and imaginary components of v (v† h):

Re(v (v† h)) = (h_re v_re + h_im v_im) v_re - (h_im v_re - h_re v_im) v_im
Im(v (v† h)) = (h_re v_re + h_im v_im) v_im + (h_im v_re - h_re v_im) v_re

So now we have:

Re(R h) = h_re - (2 / ‖v‖²) [ (h_re v_re + h_im v_im) v_re - (h_im v_re - h_re v_im) v_im ]
Im(R h) = h_im - (2 / ‖v‖²) [ (h_re v_re + h_im v_im) v_im + (h_im v_re - h_re v_im) v_re ]

(let's call this equation 2)
Looking back at the code, we have
a = T.outer(input_re_reflect_re - input_im_reflect_im, reflect_re)
b = T.outer(input_re_reflect_im + input_im_reflect_re, reflect_im)
c = T.outer(input_re_reflect_re - input_im_reflect_im, reflect_im)
d = T.outer(input_re_reflect_im + input_im_reflect_re, reflect_re)
LaTeXing these up for easier comparison (the outer, I think, should just be accounting for the fact that the first dimension of these input_... variables is the batch size, so this is all batched, which shouldn't matter for these calculations), and remembering that input is h and reflect is v in my notation:

a = (h_re v_re - h_im v_im) v_re
b = (h_re v_im + h_im v_re) v_im
c = (h_re v_re - h_im v_im) v_im
d = (h_re v_im + h_im v_re) v_re

Putting it together to get expressions for a + b and d - c:

a + b = (h_re v_re - h_im v_im) v_re + (h_re v_im + h_im v_re) v_im
d - c = (h_re v_im + h_im v_re) v_re - (h_re v_re - h_im v_im) v_im
According to equation 1 the right hand side here should equal the right hand side of equation 2, but... it doesn't seem to. What is going on? Some of the signs are wrong. Did I make a mistake?
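To double-check my algebra, here is a small NumPy sketch (mine, with the batch dimension dropped so T.dot becomes a scalar and T.outer a scalar-times-vector) comparing the code's expressions against the typical reflection:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Random real/imaginary parts for v and h (illustrative values).
v_re, v_im = rng.standard_normal(n), rng.standard_normal(n)
h_re, h_im = rng.standard_normal(n), rng.standard_normal(n)
v = v_re + 1j * v_im
h = h_re + 1j * h_im
vstarv = (v_re ** 2 + v_im ** 2).sum()

# The four expressions from the code (batch dimension dropped).
a = (h_re @ v_re - h_im @ v_im) * v_re
b = (h_re @ v_im + h_im @ v_re) * v_im
c = (h_re @ v_re - h_im @ v_im) * v_im
d = (h_re @ v_im + h_im @ v_re) * v_re

out_code = (h_re - 2.0 / vstarv * (a + b)) \
    + 1j * (h_im - 2.0 / vstarv * (d - c))

# The typical reflection (I - 2 v v† / ‖v‖²) h.
out_true = h - 2.0 / vstarv * v * np.vdot(v, h)
print(np.allclose(out_code, out_true))  # False for generic v, h

# The code instead seems to match h - (2 / ‖v‖²) conj(v) (vᵀ h).
out_alt = h - 2.0 / vstarv * v.conj() * (v @ h)
print(np.allclose(out_code, out_alt))   # True in this check
```

For what it's worth, in this check the code coincides with h - (2 / ‖v‖²) conj(v) (vᵀ h) rather than with the typical reflection; I may of course be misreading the batching.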
I thought that perhaps the operator in question is not the typical reflection operator but some variant involving a conjugate transpose (the dagger there denoting the conjugate transpose)... but this also seems not to reproduce the expressions in the code [proof left to the reader]. I might also have made an error checking that, but my question would then be: if that is the intended operator, why the conjugate transpose? Where does this operator come from?
What are the parameters you use for memory_problem? Using the defaults, and changing the type of x and y to float32, I'm getting
TypeError: Cannot convert Type TensorType(float32, matrix) (of Variable Subtensor{::, int32:int32:}.0) into Type TensorType(float32, 3D). You can try to manually convert Subtensor{::, int32:int32:}.0 into a TensorType(float32, 3D).
at
train = theano.function([index], costs[0], givens=givens, updates=updates)
I like the simple, flat Theano code you've got here, but I am scared to write my own code with any reference to it because you haven't included a license (and that could mean you retain full copyright). Could you please put a license in the repository so I know to what extent I or anyone else can use this code?
At the line h_0_batch = T.tile(h_0, [x.shape[1], 1]), x.shape[1] is Subtensor{int64}.0 and x is <TensorType(int32, matrix)>. Executing T.tile gives the following error:
ValueError: reps argument to tile must be a constant (e.g. tuple, list of integers)
I'm using all the default parameters when running memory_problem.py. What am I doing wrong?
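For context (my reading of the code, with made-up sizes): the failing line only broadcasts the initial hidden state across the batch, and the ValueError is telling us that Theano's T.tile wants a compile-time constant for reps, whereas x.shape[1] is symbolic. In NumPy terms, the intended result is:

```python
import numpy as np

n_hidden, batch = 3, 5            # made-up sizes for illustration
h_0 = np.zeros((1, n_hidden))     # initial hidden state, a single row

# What T.tile(h_0, [x.shape[1], 1]) is meant to produce: one copy of
# h_0 per example in the batch (x.shape[1] plays the role of batch).
h_0_batch = np.tile(h_0, (batch, 1))
print(h_0_batch.shape)  # (5, 3)
```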
Hi Amar and Martin
I've read your uRNN paper and the following Reddit discussion with great interest. On Reddit you mention that you have code that is 4x faster than the one on GitHub:
https://www.reddit.com/r/MachineLearning/comments/3uk2q5/151106464_unitary_evolution_recurrent_neural/cxfwsqw
Is the current GitHub version the faster code, or is that not released yet?
BR Casper