
snowflake-arctic's Introduction

License: Apache 2.0

❄️ Snowflake AI Research ❄️

Latest News

Overview

The Snowflake AI Research team conducts open, foundational research to advance the field of AI while making enterprise AI easy, efficient, and trusted. This repo contains several artifacts to help you efficiently train and run inference on popular LLMs in practice. We released Arctic in April 2024, and we are proud to announce the release of our massive-scale LLM inference and fine-tuning stacks, tailored specifically to Llama 3.1 405B.

Llama 3.1 405B

In collaboration with DeepSpeed, Hugging Face, vLLM, and the broader AI community, we are excited to open-source our inference and fine-tuning stacks optimized for Llama 3.1 405B. For inference we support a massive 128K context window from day one, while enabling real-time inference with up to 3x lower end-to-end latency and 1.4x higher throughput than existing open-source solutions. Please see our blog, Achieve Low-Latency and High-Throughput Inference with Meta's Llama 3.1 405B using Snowflake's Optimized AI Stack, for a deep dive into all of these innovations. For fine-tuning we support both single-node and multi-node training environments, using the latest memory-efficient training techniques such as parameter-efficient fine-tuning, FP8 quantization, ZeRO-3-inspired sharding, and targeted parameter offloading (when necessary). Please see our blog, Fine-Tune Llama 3.1 405B on a Single Node using Snowflake's Memory-Optimized AI Stack, for a deep dive into how we did this.
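To make these ingredients concrete, below is a minimal sketch of how parameter-efficient fine-tuning, ZeRO-3 sharding, and parameter offloading are commonly combined using Hugging Face peft and DeepSpeed. The model ID, LoRA hyperparameters, and offload settings are illustrative assumptions, not the exact stack described in the blog.

# Sketch: LoRA fine-tuning with ZeRO-3 sharding and CPU parameter offload.
# All hyperparameters are illustrative, not Snowflake's exact recipe.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 3,                          # ZeRO-3: shard params, grads, optimizer states
        "offload_param": {"device": "cpu"},  # targeted parameter offloading when memory is tight
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-405B")  # gated; requires access
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)  # only the small adapter weights are trained

training_args = TrainingArguments(
    output_dir="out", deepspeed=ds_config, bf16=True, per_device_train_batch_size=1)
# pass training_args (plus a dataset) to transformers.Trainer to launch training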

Getting started

Arctic

At Snowflake, we see a consistent pattern in AI needs and use cases from our enterprise customers. Enterprises want to use LLMs to build conversational SQL data copilots, code copilots, and RAG chatbots. From a metrics perspective, this translates to LLMs that excel at SQL, code, complex instruction following, and producing grounded answers. We capture these abilities in a single metric we call enterprise intelligence, computed by averaging Coding (HumanEval+ and MBPP+), SQL Generation (Spider), and Instruction Following (IFEval).
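As a sketch of how such a composite score might be computed (placeholder values, not Arctic's published numbers, and assuming the two coding benchmarks are averaged before the category-level mean):

# Sketch: enterprise intelligence as the mean of three category scores.
# Placeholder values; assumes the coding benchmarks are averaged first.
scores = {"HumanEval+": 0.0, "MBPP+": 0.0, "Spider": 0.0, "IFEval": 0.0}

coding = (scores["HumanEval+"] + scores["MBPP+"]) / 2
enterprise_intelligence = (coding + scores["Spider"] + scores["IFEval"]) / 3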

Arctic is on par with or better than both Llama 3 8B and Llama 2 70B on enterprise metrics, while using less than half of their training compute budget. Similarly, despite using a 17x smaller compute budget, Arctic is on par with Llama 3 70B on enterprise metrics like Coding (HumanEval+ & MBPP+), SQL (Spider), and Instruction Following (IFEval). It does so while remaining competitive on overall performance: for example, despite using 7x less compute than DBRX, it remains competitive on Language Understanding and Reasoning (a collection of 11 metrics) while being better at Math (GSM8K).

Arctic uses a unique Dense-MoE Hybrid transformer architecture. It combines a 10B dense transformer with a residual 128×3.66B MoE MLP, resulting in 480B total parameters and 17B active parameters chosen via top-2 gating. To learn more about this architecture, please read our blog post.
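The headline counts follow from quick arithmetic on the rounded figures above:

# Sketch: sanity-check the total and active parameter counts.
dense = 10e9          # dense transformer backbone
expert = 3.66e9       # one MoE MLP expert
num_experts = 128
top_k = 2             # top-2 gating activates 2 experts per token

total = dense + num_experts * expert   # ~478B, quoted as 480B total
active = dense + top_k * expert        # ~17.3B, quoted as 17B active
print(f"total = {total/1e9:.0f}B, active = {active/1e9:.1f}B")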

The Snowflake AI Research Team is thrilled to introduce Snowflake Arctic, a top-tier, enterprise-focused LLM that pushes the frontiers of cost-effective training and openness. Arctic is efficiently intelligent and truly open.

  • Efficiently Intelligent: Arctic excels at enterprise tasks such as SQL generation, coding, and instruction-following benchmarks, even when compared to open-source models trained with significantly higher compute budgets. It sets a new baseline for cost-effective training, enabling Snowflake customers to create high-quality custom models for their enterprise needs at low cost.

  • Truly Open: The Apache 2.0 license provides ungated access to weights and code. In addition, we are open-sourcing all of our data recipes and research insights.

Getting Started

Inference API Providers: Access Arctic via your model garden or catalog of choice, including AWS, NVIDIA AI Catalog, Replicate, Lamini, Perplexity, and Together AI, over the coming days.

Model Weights: The easiest way to get up and running with Arctic is through Hugging Face. We have uploaded both the Base and Instruct model variants to the Hugging Face Hub:
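For example, a minimal loading sketch (Arctic ships custom modeling code, so trust_remote_code=True is required; running the full model needs a large multi-GPU node, as in the tutorials below):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load the Instruct variant from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    device_map="auto",        # spread the ~480B parameters across available devices
    low_cpu_mem_usage=True)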

Inference

We provide two different tutorials on standing up Arctic for inference:

Cookbooks/Tutorials

We believe in a thriving research community, and we are committed to sharing our insights as we build the Arctic family of models to advance research and reduce the cost of LLM training and inference for everyone. Please check out our ongoing cookbook releases, where we dive deeper into several areas crucial for training models like Arctic.

snowflake-arctic's People

Contributors

jeffra, aurickq, sfc-gh-jrasley, sfc-gh-rsamdani, andrewgcodes, sfc-gh-aqiao, eltociear, iskhare, sfc-gh-yozuysal


snowflake-arctic's Issues

Trouble replicating the training procedure -- Batch size

Hi there!

I am currently trying to implement the training recipe from your Snowflake report. I have access to the same hardware (8x H100s); however, I am struggling to match the reported batch sizes.

The report never mentions gradient checkpointing, but it seems like it would be needed to make such large batch sizes possible. Could you confirm whether gradient checkpointing was used?

Thank you.

Detailed Accuracy Metrics for Individual Benchmarks

Hi,

First of all, thank you so much for releasing this awesome model.

We want to include Snowflake-Arctic in our study, and we need detailed accuracy metrics for the individual benchmarks. The official article says that the "Common Sense" metric is an average of 11 metrics, and the "Coding" category also appears to be an average of HumanEval and MBPP. Could you provide the detailed scores for these individual benchmarks?

Thanks!

Optimal Multi-node Inference Parallel Settings

Great work!
I noticed in your blog that multi-node inference is implemented via TP and PP:

While challenging, this can be achieved with two-node inference using a combination of system optimizations such as FP8 weights, split-fuse and continuous batching, tensor parallelism within a node and pipeline parallelism across nodes.

I was wondering whether you have tried DP + TP + EP as described in the DeepSpeed-MoE paper. And what is the best practice for scaling such a giant model to a multi-node environment for the best inference efficiency?
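(For context, a combination of tensor parallelism within a node and pipeline parallelism across nodes can be expressed in recent vLLM versions roughly as in the sketch below; the sizes are illustrative, not necessarily the configuration used in the blog.)

from vllm import LLM

# Sketch: TP within a node, PP across nodes (illustrative sizes).
llm = LLM(
    model="Snowflake/snowflake-arctic-instruct",
    tensor_parallel_size=8,    # shard each layer across a node's 8 GPUs
    pipeline_parallel_size=2,  # split the layer stack across 2 nodes
    trust_remote_code=True,
)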

Error when serving with the Hugging Face inference tutorial

Hi Arctic team, great work! I followed the Hugging Face inference tutorial to run inference, but I hit the following error:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 195/195 [24:34<00:00,  7.56s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:31999 for open-end generation.
Traceback (most recent call last):
  File "/mnt/afs/jfduan/LLMInfer/snowflake-arctic/inference/hf_infer.py", line 28, in <module>
    outputs = model.generate(input_ids=input_ids, max_new_tokens=20)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/generation/utils.py", line 1572, in generate
    result = self._greedy_search(
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/generation/utils.py", line 2477, in _greedy_search
    outputs = self(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 1708, in forward
    outputs = self.model(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 1397, in forward
    layer_outputs = decoder_layer(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 1087, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/afs/jfduan/LLMInfer/transformers-arctic/src/transformers/models/arctic/modeling_arctic.py", line 808, in forward
    query_states = self.q_proj(hidden_states)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 161, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in pre_forward
    set_module_tensor_to_device(
  File "/mnt/afs/jfduan/env/miniconda3/envs/arctic/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 358, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([7168, 7168]) in "weight" (which has shape torch.Size([100352, 516])), this look incorrect.

Can you help me resolve this? Thanks a lot!

Snowflake Cortex Inference

Hi Snowflake team,

Congrats on the release. As mentioned in the release blog, Snowflake customers with a payment method on file will be able to access Snowflake Arctic for free until June 3. What is the model name for Arctic when using Snowflake Cortex for inference? Will the documentation be updated any time soon?

Thanks.

git clone error git+https://github.com/Snowflake-Labs/vllm@artic

Hello,
I am trying to play around with Snowflake's Arctic LLM in Google Colab using the Hugging Face version. I tried following the instructions on the snowflake-arctic GitHub and got the errors mentioned below:

  1. !pip install git+git://github.com/Snowflake-Labs/vllm@artic threw an error saying [git clone --filter=blob:none --quiet git://github.com/Snowflake-Labs/vllm.git /tmp/pip-req-build-07so961n did not run successfully.]
  2. Changed it to !pip install git+https://github.com/Snowflake-Labs/vllm@artic, which threw an error saying [error: pathspec 'artic' did not match any file(s) known to git]
  3. Finally I removed the branch option, !pip install git+https://github.com/Snowflake-Labs/vllm, which starts to clone and towards the end throws the error:
    Building wheels for collected packages: vllm
    error: subprocess-exited-with-error

× Building wheel for vllm (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for vllm (pyproject.toml) … error
ERROR: Failed building wheel for vllm
Failed to build vllm
ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects

Am I referring to an incorrect branch? Could someone help me here, please?

ImportError: cannot import name 'LlamaTokenizer' from 'transformers.models.llama'

I tried the minimal example from https://huggingface.co/Snowflake/snowflake-arctic-instruct and it did not work. Can you help me fix it?

[screenshot of the ImportError]

I'm using the latest transformers release commit.

[screenshot]

snowflake-arctic-instruct.py

import os
# enable hf_transfer for faster ckpt download
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from deepspeed.linear.config import QuantizationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True
)
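# quantize weights to 8 bits via DeepSpeed so the checkpoint fits in GPU memory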
quant_config = QuantizationConfig(q_bits=8)

model = AutoModelForCausalLM.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="auto",
    ds_quantization_config=quant_config,
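    # limit per-GPU memory so device_map="auto" spreads weights across all 8 GPUs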
    max_memory={i: "150GiB" for i in range(8)},
    torch_dtype=torch.bfloat16)


content = "5x + 35 = 7x - 60 + 10. Solve for x"
messages = [{"role": "user", "content": content}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))

requirements.txt

annotated-types==0.6.0
certifi==2024.2.2
charset-normalizer==3.3.2
deepspeed==0.14.2
filelock==3.13.4
fsspec==2024.3.1
hf_transfer==0.1.6
hjson==3.1.0
huggingface-hub==0.22.2
idna==3.7
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
packaging==24.0
psutil==5.9.8
py-cpuinfo==9.0.0
pydantic==2.7.1
pydantic_core==2.18.2
pynvml==11.5.0
PyYAML==6.0.1
regex==2024.4.28
requests==2.31.0
safetensors==0.4.3
sympy==1.12
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.2
transformers @ git+https://github.com/huggingface/transformers@9fe3f585bb4ea29f209dc705d269fbe292e1128f
typing_extensions==4.11.0
urllib3==2.2.1
