Jarvis GPT uses Gemma as a base model, aiming to improve reasoning in math and coding through algorithmic design.
Architectures
- Gemma
- Griffin
- Mamba
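The three architectures above differ mainly in how they mix information across the sequence: Gemma uses attention, while Griffin and Mamba lean on gated linear recurrences. A toy sketch of that recurrence pattern, heavily simplified (scalar state and a fixed gate `a` are assumptions for illustration; real blocks use input-dependent, vector-valued gates):

```python
def linear_recurrence(xs, a=0.9):
    """Toy scan h_t = a*h_{t-1} + (1-a)*x_t — the core state-update
    pattern behind Griffin/Mamba-style recurrent blocks. State is a
    single float here purely for clarity."""
    h, hs = 0.0, []
    for x in xs:
        h = a * h + (1 - a) * x  # decay old state, blend in new input
        hs.append(h)
    return hs
```

Because the state is fixed-size, inference cost per token is constant regardless of context length, which is the main draw of these architectures over attention.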
Benchmarks
- HellaSwag
- PIQA
- SIQA
- BoolQ
- Winogrande
- CQA
- OBQA
- ARC-e
- ARC-c
- TriviaQA
- NQ
- HumanEval
- MBPP
- GSM8K
- MATH
- AGIEval
- BBH
Potential Improvements
- Rotary Positional Embedding
- GLU Feed Forward Network
- Soft Mixture-of-Experts
- Top K Mixture-of-Experts
- Mixture-of-Depths for Inference
- Chain-of-Thought Fine Tuning
- Tree-of-Thought Fine Tuning
- Graph-of-Thought Fine Tuning
- Retrieval Augmented Generation
- Speculative Decoding
- LoRA / QLoRA
- Direct Preference Optimization
- Self Improving LLM Math with MCTS
- Megatron Scaling Laws
- Model Sharding / Data Sharding
- Recurrent Transformer Variants
- Gecko Text Embedding Fine Tuning
- Multimodal (Text, Image, Video, Audio)
and many more...
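Of the improvements listed, Rotary Positional Embedding is simple enough to sketch concretely. A minimal pure-Python version for a single token vector (the per-pair frequency schedule `base**(-i/d)` follows the common RoPE convention; a real implementation would be batched and vectorized):

```python
import math

def rotary_embed(x, pos, base=10000.0):
    """Apply rotary positional embedding (RoPE) to one token
    vector x (even length) at sequence position pos.

    Each adjacent pair (x[2i], x[2i+1]) is rotated by an angle
    pos * base**(-2i/d), so the dot product between rotated
    query and key vectors depends only on their relative offset."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even embedding dimension"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)  # rotated first coordinate
        out.append(x[i] * s + x[i + 1] * c)  # rotated second coordinate
    return out
```

At position 0 the rotation is the identity, and rotation never changes a vector's norm, so RoPE can be dropped into query/key projections without rescaling anything.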