Topic: distributed-training Goto Github
Something interesting about distributed-training
distributed-training,Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Organization: alibaba
distributed-training,TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
Organization: alibaba
distributed-training,Training and serving large-scale neural networks with auto parallelization.
Organization: alpa-projects
Home Page: https://alpa.ai
distributed-training,This is the Docker container based on the open-source framework XGBoost (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Organization: aws
distributed-training,Distributed Deep Learning on AWS Using CloudFormation (CFN), MXNet and TensorFlow
Organization: awslabs
distributed-training,Dynamic training with Apache MXNet reduces cost and time for training deep neural networks by leveraging AWS cloud elasticity and scale. The system reduces training cost and time by dynamically updating the training cluster size during training, with minimal impact on model training accuracy.
Organization: awslabs
distributed-training,A memory balanced and communication efficient FullyConnected layer with CrossEntropyLoss model parallel implementation in PyTorch
User: bindog
distributed-training,A Comprehensive Tutorial on Video Modeling
User: bryanyzhu
Home Page: https://cvpr20-video.mxnet.io
distributed-training,A high performance and generic framework for distributed DNN training
Organization: bytedance
distributed-training,IDDM (industrial, landscape, animated...): supports DDPM, DDIM, PLMS, a web UI, and multi-GPU distributed training. A PyTorch implementation of generative diffusion models with distributed training.
User: chairc
distributed-training,A full pipeline AutoML tool for tabular data
Organization: datacanvasio
Home Page: https://hypergbm.readthedocs.io/
distributed-training,universal visual model trained on LAION-400M
Organization: deepglint
Home Page: https://arxiv.org/pdf/2304.05884.pdf
distributed-training,DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted as an incubation project in the LF AI & Data Foundation.
Organization: deeprec-ai
distributed-training,HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
Organization: dena
distributed-training,Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
Organization: determined-ai
Home Page: https://determined.ai
distributed-training,How to use Cross-Replica / Synchronized BatchNorm in PyTorch
User: dougsouza
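The repository above explains cross-replica (synchronized) BatchNorm; recent PyTorch ships the same conversion as `nn.SyncBatchNorm`. A minimal sketch, assuming PyTorch is installed (the toy model is illustrative; the synchronized statistics only take effect when the model actually runs under `torch.distributed`):

```python
import torch.nn as nn

# Hypothetical model containing ordinary BatchNorm layers.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Replace every BatchNorm layer with its synchronized equivalent.
# Under torch.distributed, batch statistics are then aggregated
# across all processes instead of being computed per replica.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

print(type(sync_model[1]).__name__)  # SyncBatchNorm
```

The conversion is recursive, so it also covers BatchNorm layers nested inside submodules.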
distributed-training,FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://fedml.ai) is your generative AI platform at scale.
Organization: fedml-ai
Home Page: https://fedml.ai
distributed-training,Demonstrate throughput of PyTorch FSDP
Organization: foundation-model-stack
Home Page: https://pytorch.org/docs/stable/fsdp.html
distributed-training,A PyTorch tutorial on Class-Incremental Learning | a distributed training template for CIL with fewer than 100 lines of core code.
User: g-u-n
distributed-training,[MLSys 2022] "BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling" by Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, Yingyan Lin
Organization: gatech-eic
distributed-training,Learn how to design, develop, deploy and iterate on production-grade ML applications.
User: gokumohandas
Home Page: https://madewithml.com
distributed-training,Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
User: guitaricet
Home Page: https://arxiv.org/abs/2307.05695
distributed-training,A Jax-based library for designing and training transformer models from scratch.
User: hmunachi
distributed-training,
User: hongxinxiang
distributed-training,Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Organization: huggingface
distributed-training,PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
Organization: huggingface
Home Page: https://huggingface.co/docs/timm
distributed-training,Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.
Organization: idea-ccnl
distributed-training,DLRover: An Automatic Distributed Deep Learning System
Organization: intelligent-machine-learning
distributed-training,Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
Organization: learning-at-home
distributed-training,Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.
Organization: lsds
distributed-training,YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud (ECCV 2018)
User: maudzung
Home Page: https://arxiv.org/pdf/1808.02350v1.pdf
distributed-training,Efficient Deep Learning Systems course materials (HSE, YSDA)
User: mryab
distributed-training,This repo covers Kubeflow Environment with LABs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: Hyperparameter Tuning), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc.
User: omerbsezer
distributed-training,LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Organization: oneflow-inc
Home Page: https://libai.readthedocs.io
distributed-training,PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle (飞桨): high-performance single-machine and distributed training for deep learning and machine learning, with cross-platform deployment)
Organization: paddlepaddle
Home Page: http://www.paddlepaddle.org/
distributed-training,👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
Organization: paddlepaddle
Home Page: https://paddlenlp.readthedocs.io
distributed-training,Paddle Large Scale Classification Tools; supports ArcFace, CosFace, PartialFC, and Data Parallel + Model Parallel. Models include ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE, and CAE.
Organization: paddlepaddle
distributed-training,A PyTorch distributed training framework
User: panjinquan
distributed-training,Resource-adaptive cluster scheduler for deep learning training.
Organization: petuum
Home Page: https://adaptdl.readthedocs.io/
distributed-training,Pinpoint Node.js agent
Organization: pinpoint-apm
distributed-training,TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
Organization: pytorch
Home Page: https://pytorch.org/torchx
distributed-training,Distributed, mixed-precision training with PyTorch
User: richardkxu
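As a companion to the entry above, here is a minimal single-process sketch of distributed, mixed-precision training with stock PyTorch: DDP over the gloo backend plus a bfloat16 autocast region. The model, data, address, and port are all illustrative; a real run would launch one process per GPU with `torchrun` and use the NCCL backend with CUDA autocast.

```python
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process bootstrap for illustration only; the address and port
# are assumptions, and torchrun would normally set these variables.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

# DDP all-reduces gradients across ranks during backward().
model = DDP(torch.nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 16), torch.randn(8, 1)
# Mixed precision: ops inside the autocast region run in bfloat16
# where safe (on GPU one would use device_type="cuda" and a GradScaler
# for float16).
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = F.mse_loss(model(x), y)
loss.backward()
opt.step()

dist.destroy_process_group()
```

With `world_size=1` the all-reduce is a no-op, so the script runs on a laptop CPU while keeping the same structure as a multi-node job.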
distributed-training,SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
Organization: skypilot-org
Home Page: https://skypilot.readthedocs.io
distributed-training,[ICLR 2018] Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
User: synxlin
Home Page: https://arxiv.org/pdf/1712.01887.pdf
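The Deep Gradient Compression paper above reduces communication by sending only the largest gradient entries. A simplified sketch of that top-k sparsification step in plain PyTorch, omitting the paper's momentum correction and local gradient accumulation (the ratio and tensor are illustrative):

```python
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.25):
    """Keep only the largest-magnitude `ratio` fraction of entries,
    returning (indices, values) as the compressed message."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, idx = flat.abs().topk(k)
    return idx, flat[idx]

def topk_decompress(idx, values, shape):
    """Scatter the kept entries back into a dense zero tensor."""
    flat = torch.zeros(shape).flatten()
    flat[idx] = values
    return flat.view(shape)

grad = torch.tensor([[0.1, -2.0], [0.05, 3.0]])
idx, values = topk_compress(grad, ratio=0.5)          # keep 2 of 4 entries
restored = topk_decompress(idx, values, grad.shape)
print(restored)  # only -2.0 and 3.0 survive; the rest are zeroed
```

In the full method, the dropped entries are not discarded but accumulated locally and sent once they grow large enough, which is what lets the paper push compression ratios into the hundreds without hurting accuracy.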
distributed-training,PyTorch distributed training
User: taishan1994
distributed-training,Fast and flexible AutoML with learning guarantees.
Organization: tensorflow
Home Page: https://adanet.readthedocs.io
distributed-training,Library for Fast and Flexible Human Pose Estimation
Organization: tensorlayer
Home Page: https://hyperpose.readthedocs.io
distributed-training,Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)
User: wenwei202
distributed-training,PyTorch implementation of over 30 real-time semantic segmentation models, e.g. BiSeNetv1, BiSeNetv2, CGNet, ContextNet, DABNet, DDRNet, EDANet, ENet, ERFNet, ESPNet, ESPNetv2, FastSCNN, ICNet, LEDNet, LinkNet, PP-LiteSeg, SegNet, ShelfNet, STDC, SwiftNet, with support for knowledge distillation, distributed training, etc.
User: zh320
distributed-training,OpenKS - A domain-generalizable knowledge learning and computation engine
Organization: zju-openks