zigzagcai,Season,github

alpa

Training and serving large-scale neural networks with auto parallelization.

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

colossalai

Making large AI models cheaper, faster and more accessible

cream

This is a collection of our NAS and Vision Transformer work.

cutlass

CUDA Templates for Linear Algebra Subroutines

deepspeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.

easylm

Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.

fast-hadamard-transform

Fast Hadamard transform in CUDA, with a PyTorch interface

flash-attention

Fast and memory-efficient exact attention

flax

Flax is a neural network library for JAX that is designed for flexibility.

gloo

Collective communications library with various primitives for multi-machine training.

horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

internlm

InternLM has open-sourced a 7-billion-parameter base model, a chat model tailored for practical scenarios, and the training system.

internvl

[CVPR 2024] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (An Open-Source Alternative to ViT-22B)

jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

llava

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

An efficient PyTorch implementation of selective scan in one file. It works on both CPU and GPU and comes with the corresponding mathematical derivation. It is probably the code closest to selective_scan_cuda in Mamba.
