Comments (7)
many of us are realizing that minimizing parameters is not the way to go - I'd encourage you to read up on scaling laws
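As an aside on the scaling-laws pointer above (not from this thread): the Chinchilla paper (Hoffmann et al., 2022) fits pretraining loss as L(N, D) = E + A/N^α + B/D^β in parameters N and training tokens D. A minimal sketch using that paper's fitted constants, purely for intuition about why shrinking N hurts:

```python
def loss_estimate(n_params: float, n_tokens: float) -> float:
    """Chinchilla-style loss fit; constants are the ones fitted in Hoffmann et al. (2022)."""
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# at the same token budget, a 10x smaller model pays a visible loss penalty
print(loss_estimate(1e9, 2e10))  # ~2.58
print(loss_estimate(1e8, 2e10))  # ~3.00
```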
@Vbansal21 PKMs are kind of a niche subject - I've only seen 2-3 papers on them. If you want to greatly expand parameters while keeping compute constant, I suggest looking at mixture of experts: https://github.com/lucidrains/mixture-of-experts
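For context, a minimal top-1 mixture-of-experts sketch in PyTorch, showing why total parameters grow with the number of experts while per-token compute stays roughly constant. This is just the shape of the technique, not the actual API of lucidrains/mixture-of-experts; all names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Each token is routed to exactly one expert FFN."""
    def __init__(self, dim, num_experts=8, hidden=2048):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # learned router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                         # x: (batch, seq, dim)
        flat = x.reshape(-1, x.shape[-1])         # (tokens, dim)
        probs = F.softmax(self.gate(flat), dim=-1)
        idx = probs.argmax(dim=-1)                # one expert per token
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # weight by the gate prob so the router receives gradient
                out[mask] = expert(flat[mask]) * probs[mask, e].unsqueeze(-1)
        return out.reshape_as(x)
```

Adding experts multiplies the FFN parameter count, but each token still pays for only one expert's matmuls.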
Thanks for the pro-tip.
And thanks for answering my queries, will be closing the issue.
And on the parameters front, I am actually focusing on embedded inference for general-purpose use on low-compute devices (Perceiver IO for small devices), e.g. Qualcomm Snapdragon 800-series, Intel Pentium/i3, etc.
@Vbansal21 oh ok, that makes sense then!
@Vbansal21 Hi Vaibhav! A group of us have dissected that paper, but it really didn't add any meaningful improvement on top of multi-head attention (recurrent refinement between queries and keys, as well as the learned scaling).
I think it is only useful as a theoretical perspective; it hasn't added anything beneficial to the actual practice and construction of transformers.
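Since the paper under discussion isn't named in the thread, the following is only a guess at what "recurrent refinement between queries and keys" with a "learned scaling" might look like: iterate attention a few times, feeding the output back in as the next query, with a learned scalar in place of the usual fixed 1/sqrt(d_head). Purely illustrative.

```python
import torch
import torch.nn.functional as F

def refined_attention(q, k, v, scale, n_iters=3):
    """q, k, v: (batch, heads, seq, dim_head); scale: learned scalar tensor."""
    for _ in range(n_iters):
        attn = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
        q = attn @ v  # the output becomes the next iteration's query
    return q
```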
Thanks for the reply.
Now I understand why my training didn't show any special improvement.
One more question though: what about PKMs?
Oh, so PKM is essentially a way to increase parameters without increasing compute by a lot.
Well then, that won't be useful to me, because I am trying to minimise both parameters and compute.
Thanks for replying.
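For anyone reading along, a minimal sketch of the product-key memory idea (Lample et al., 2019) that explains the trade-off above: the value table holds n_keys² entries, but a query only scores 2·n_keys sub-keys and touches the top-k values, so parameters scale as n_keys² while compute scales as n_keys + k. This is illustrative, not the actual product-key-memory package API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PKM(nn.Module):
    def __init__(self, dim, n_keys=128, topk=8):
        super().__init__()
        half = dim // 2
        self.keys1 = nn.Parameter(torch.randn(n_keys, half))
        self.keys2 = nn.Parameter(torch.randn(n_keys, half))
        self.values = nn.Embedding(n_keys * n_keys, dim)  # the big table
        self.topk, self.n_keys = topk, n_keys

    def forward(self, q):                                  # q: (batch, dim)
        q1, q2 = q.chunk(2, dim=-1)
        s1, i1 = (q1 @ self.keys1.t()).topk(self.topk, dim=-1)
        s2, i2 = (q2 @ self.keys2.t()).topk(self.topk, dim=-1)
        # combine the two top-k lists into topk*topk candidate slots
        scores = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).flatten(1)
        idx = (i1.unsqueeze(-1) * self.n_keys + i2.unsqueeze(-2)).flatten(1)
        scores, best = scores.topk(self.topk, dim=-1)
        idx = idx.gather(-1, best)
        w = F.softmax(scores, dim=-1).unsqueeze(-1)
        return (self.values(idx) * w).sum(dim=1)           # sparse weighted read
```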
Related Issues (20)
- Feature request: different normalization layers at different depths
- Question: attn_head_scale with use_scalenorm HOT 1
- Transformer Goat HOT 1
- pre_norm_has_final_norm kwarg not used HOT 1
- Using Rotary Positional Encoding with Continuous Wrapper HOT 1
- Question: return_mems and rotary_pos_emb HOT 5
- Do you consider adding rwkv HOT 1
- Question: masking in token shifting HOT 1
- [Bug] ContinuousTransformerWrapper - return_mems doesn't work HOT 1
- rotary embedding issues when training in mixed precision HOT 2
- Masking for prepend_embeds HOT 7
- ONNX export failed HOT 14
- Bert token type embedding HOT 2
- Simplifying Transformer Blocks (https://arxiv.org/abs/2311.01906) HOT 9
- Support for NormSoftmax HOT 16
- Question: num_memory_tokens > 0 and return_mems = True HOT 3
- "Stabilizing Transformer Training by Preventing Attention Entropy Collapse" improvement to ViT HOT 1
- how to set inputs to the right shape HOT 1
- kv cache breaks generation HOT 5
- Question: How to load model trained on earlier version of x-transformers HOT 3