llamafia.github

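LLaMafia is a Chinese-language open discussion space for frontier AI / LLM topics. "LLaMa" refers to the LLaMA model and "Mafia" to the geek community; together they make LLaMafia.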

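LLaMafia focuses on the most solid engineering and the most cutting-edge science. All discussion is grounded in first principles of science and first-hand engineering experience. We encourage critical thinking and promote insightful work.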

Today people study AI for many reasons: product value, investment opportunities, academic resources, or social influence.

LLaMafia studies AI out of pure passion.

Tech Log

20231203

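  • Verifying digit-by-digit splitting of numbers, and numeric computation
  • Instruction-following ability
  • Paper analysis: 1. "I Found Traces of Transformer-VQ in Performer" (《我在Performer中发现了Transformer-VQ的踪影》) 2. Multimodal understanding benchmark!
  • Discussion: 1. Inconsistent prediction results from Llama 2 2. The fundamental difference between the LLaMA tokenizer and tiktoken 3. How well self-instruct works for a specific domain 4. Exploring training instability in the SFT stage of large models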

20231125

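  • Views on AI safety, open-source AI, and large-scale AI deployment
  • Lightweight methods for dynamically compressing sequences
  • Paper sharing: 1. "The Road to Upgrading Transformer, Part 15: Key Normalization Helps Length Extrapolation" (Transformer升级之路:15、Key归一化助力长度外推) 2. Component-Wise Gradient Norm Clipping 3. Superalignment 4. Detecting Pretraining Data from Large Language Models
  • Discussion: 1. Retrieval vectors & RAG 2. Claude 2.1's ability to extract information from its context 3. The Medusa framework & lookahead decoding 4. Local information in LLMs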

20231119

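  • The relationship between learning rate and batch size
  • Multi-node, multi-GPU parallelism schemes
  • The Hungarian exam dataset in Grok-1
  • An agent that recommends papers
  • RNN-style models
  • The principles behind emergent abilities: can small models have them too?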

20231112

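  • Why large models generally choose wider rather than deeper architectures
  • How to update a model's knowledge
  • Causes of and fixes for the repetition ("parroting") problem
  • On LLM extrapolation
  • The more digits involved, the lower GPT's addition accuracy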

20231022 Compression Theory. Discussion recording

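  • The arithmetic coding algorithm
  • Language models are lossless compressors
  • The better the compression, the more likely the model is to recover the data-generating process
  • Why intelligence is a byproduct: the over-optimization problem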
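
The link between arithmetic coding and predictive models in the list above can be illustrated with a toy sketch. This is not code from the discussion: the two-letter alphabet, the add-one-smoothed `predict` function, and all other names here are hypothetical stand-ins, and exact fractions are used instead of the fixed-precision integers a practical coder would need. In principle, `predict` could be swapped for a language model's next-token distribution.

```python
import math
from collections import Counter
from fractions import Fraction

ALPHABET = "ab"  # hypothetical toy alphabet, chosen only for illustration


def predict(history, alphabet=ALPHABET):
    """Toy adaptive model: P(next symbol | history) with add-one smoothing."""
    counts = Counter(history)
    total = len(history) + len(alphabet)
    return {s: Fraction(counts[s] + 1, total) for s in alphabet}


def encode(text, alphabet=ALPHABET):
    """Narrow the interval [low, low + width) by each symbol's probability."""
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(text):
        probs = predict(text[:i], alphabet)
        for s in alphabet:
            if s == sym:
                break
            low += width * probs[s]  # skip the sub-intervals of earlier symbols
        width *= probs[sym]
    return low, width


def decode(point, length, alphabet=ALPHABET):
    """Replay the same model and pick the sub-interval containing `point`."""
    out = ""
    low, width = Fraction(0), Fraction(1)
    for _ in range(length):
        probs = predict(out, alphabet)
        for s in alphabet:
            span = width * probs[s]
            if point < low + span:
                out += s
                width = span
                break
            low += span
    return out


def ideal_bits(width):
    """-log2(width), i.e. the sum of -log2 p over all encoded symbols."""
    return math.log2(width.denominator) - math.log2(width.numerator)


if __name__ == "__main__":
    message = "a" * 15 + "b"
    low, width = encode(message)
    assert decode(low, len(message)) == message  # the round trip is lossless
    print(f"ideal code length: {ideal_bits(width):.2f} bits "
          f"vs {len(message)} bits at 1 bit per symbol")
```

The final interval width equals the product of the predicted probabilities, so the ideal code length is the model's total log-loss in bits. A real coder adds a small constant overhead on top of this, which the sketch ignores. A model that predicts the data better therefore compresses it into fewer bits.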

Contact

[email protected]

Contributors

byxshr, franxyao, hu-xiaobai, huybery, zubingou
