flowritecom / flow-merge Goto Github PK

10.0 5.0 0.0 93 KB

flow-merge is a powerful Python library that enables seamless merging of multiple transformer-based language models using the most popular merge methods such as model soups, SLERP, ties-MERGING or DARE.

License: Apache License 2.0

Python 100.00%

llms model-merging

flow-merge's Introduction

Merge Language Models with Ease
Getting Started - Contributing - Issues - Website - flow-merge-UI

👋 Welcome

Model merging is an innovative technique that allows you to combine pre-trained and fine-tuned language models (LMs) into new models with unique capabilities.

By merging existing LMs, you can potentially create a new model that inherits the strengths and capabilities of its constituent models. This way, you can explore new model variations and experiment with different combinations without the need for expensive GPU resources or extensive training from scratch.

flow-merge is a fully open-source library written in Python that implements some of the most popular merge methods such as model soups, SLERP, ties-MERGING or DARE. The library is built on top of the Hugging Face transformers library and the deep learning framework Pytorch, and provides a simple and easy-to-use interface to merge models and upload them to the Hugging Face Hub.

⭐️ Features

flow-merge has been designed to serve both beginners and experts in merging transformer-based language models (LMs). You don't need prior experience with merge methods or advanced knowledge of LMs; a basic understanding of LMs and the command-line interface (CLI) is sufficient.

The library walks you through the merging process, so you can focus on finding the best possible merges without getting bogged down in details of the complex merge methods. Our ultimate goal is to make language model merging simple, flexible, and customizable to your specific needs.

The key features of the library consists of:

Default parameter settings: Sane default values for the most important parameters based on the experiments in the papers.
Input validations: flow-merge validates all the user inputs before starting the merge and provides helpful error messages if something is wrong.
CLI and Library: A command-line interface (CLI) for easy merging and uploading of models to the Hugging Face Hub. Also a library that you can use in your own projects.
Memory efficient: flow-merge is designed to be memory efficient, so you can merge large models without running out of memory or without a GPU.

🎉 Getting started

💻 Installation

Clone the repository and navigate to the root directory:

# via ssh
git clone [email protected]:flowritecom/flow-merge.git

cd flow-merge

Create a new python environment and activate it. For example, with conda:

Note flow-merge requires python>=3.10

conda create -n flow-merge python>=3.10 && conda activate flow-merge

flow-merge can be installed with running pip inside the project directory (-e for editable install):

pip install -e .

🏎️💨 Quick start

Write a `flow-merge` config

A merge config is a YAML file that defines the models you want to merge and how you want to merge them.

Below is an example of a merge config that merges three models using the addition-task-arithmetic method and saves the merged model to the ./merged_model directory:

method: addition-task-arithmetic
method_global_parameters:
  scaling_coefficient: 0.7
  normalize: False
base_model: Qwen/Qwen1.5-0.5B
models:
  - model: Qwen/Qwen1.5-0.5B
  - model: Qwen/Qwen1.5-0.5B-chat
  - model: minghaowu/Qwen1.5-0.5B-OpenHermes-2.5
tokenizer:
  mode: base
  interpolation_method: linear
directory_settings:
  cache_dir: null
  local_dir: ./models
  output_dir: ./merged_model
hf_token:
  token: null
  trust_remote_code: False
device: cpu

The only required fields are method, and models. The method field specifies the merge method you want to use, and models is a list of models you want to merge. The rest of the fields are optionally and flow-merge will use the default values if they are not provided. For a complete list of the default values, see the config file documentation.

Save the config to a file, for example my_first_merge.yaml.

Run a merge

Merging models with flow-merge is as simple as choosing a YAML template from the examples folder, modifying the paths to the models you want to merge, and running the following command:

flow-merge run --config my_first_merge.yaml --model_name qwen_merge

Upload the merged model to the Hugging Face Hub

After the merge is complete, you can easily upload the merged model to the Hugging Face Hub by running the following command:

flow-merge upload --model_dir ./merged_model --username <hf_user_id> --model_name qwen_merge --token <hf_token> --private <True/False>

Usage

CLI

You can check the available commands and options by running:

flow-merge --help

You can display the config yaml schema and the default values by running:

flow-merge schema
# extra tip: pipe to highlighted json with 'flow-merge schema | jq' or 'flow-merge schema | fx'
# where you require either 'jq' or 'fx' installed beforehand

You can optionally validate your config file before running the merge:

flow-merge validate --config my_first_merge.yaml

🛠️ Supported Merge methods

Currently flow-merge supports most of the popular and proven merge methods.

Method	Identifier	Paper
Linear or Model Soups	`model-soup`	Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
SLERP	`slerp`	-
Addition Task Arithmetic	`addition-task-arithmetic`	Editing Models with Task Arithmetic
Ties-MERGING	`ties-merging`	TIES-Merging: Resolving Interference When Merging Models
DARE Ties-MERGING	`dare-ties`	Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

📢 We are working hard on adding more methods to the library.

Properties of the methods

Method	Description	Uses a Base Model	Can Merge Multiple Models	Supports Weighted Merge
Linear or Model Soups	Averages the weights of the models	No	Yes	Yes
SLERP	Smoothly interpolates between the weights of two models using spherical linear interpolation	No	No	No
Addition Task Arithmetic	Obtains task vectors or deltas and applies them to the base model	Yes	Yes	Yes
Ties-MERGING	It addresses the problem of interference between parameters from different models before merging with addition task arithmetic	Yes	Yes	Yes
DARE Ties-MERGING	Similar to Ties-MERGING but it uses a different approach that prunes the task vectors and rescale them.	Yes	Yes	Yes

Supported LLM Architectures

flow-merge currently supports merging models that are based on the following architectures:

Model type	Architecture
`qwen`	`QwenForCausalLM`
`mistral`	`MistralForCausalLM`
`llama`	`LlamaForCausalLM`

📢 We plan to support many models and architectures more, including encoder models such as BERT-Family models too.

Tokenizers

When merging language models, it's crucial to consider the tokenizers involved, as they convert text into tokens that the models can process.

flow-merge currently supports two modes for constructing the tokenizer that is used by the resulting merged model:

base: Default mode. The merged model utilizes the tokenizer of the base model. If no base model is specified in the merged configuration, the first model in the models list is used as the base model.
merged: If the tokenizers of the models use different vocabularies, a common vocabulary is created, and a new tokenizer is constructed based on this vocabulary.

Interpolation of embedding and language modeling layers

If the tokenizers of the models use different vocabularies, flow-merge creates input_ids mappings for the models and linearly interpolates the embedding and language modeling layers.

Currently, only linear interpolation is supported.

Special tokens

Conflicts can arise from special tokens used by different models' tokenizers, such as differing eos_token tokens. In such cases, flow-merge uses the special token of the last model in the list.

🚧 WIP 🚧 📚 Additional resources

Here we have prepared some additional resources to help developers understand the supported merge methods better.

🗺️ `flow-merge` Roadmap

Coming soon..

✨ Project showcase

Coming soon..

🤝 Contributing

Wanna pitch in? We're totally open to contributions for the core flow-merge library as well as any cool integrations built on top of it! Check out our Contribution Guide for all the details on how to get started.

💻 Development setup

Install conda (refer here for instructions) and make sure it's initialized for your shell. Git clone the repository and spawn the environment from environment.yml.

git clone [email protected]:flowritecom/flow-merge.git; cd flow-merge
conda env create # creates conda env with name flow-merge, python ~3.10 and installs the listed dependencies
conda activate flow-merge
pip install -e . # install flow-merge in editable mode
code . # open your editor, for example vscode

To easily jump into PRs you can use for example the (Github CLI)[https://cli.github.com/] client gh pr checkout <insert_pr_number>.

🙏 Acknowledgments

Special thanks to these amazing projects that helped us build flow-merge:

Also, a big shoutout to the authors of the papers of the merge methods implemented in flow-merge, and to Charles O. Goddard, creator of mergekit, who inspired us to create our own merging toolkit.

Finally, thanks to Derrick Schultz for the pytorch-tensor-slerp.py gist that helped us implement the SLERP method.

✍️ Citation

@misc{flowrite_2024_flow_merge,
  author = {The Flowrite Team},
  title = {flow-merge},
  howpublished = {\url{https://https://github.com/flowritecom/flow-merge}},
  year = {2024}
}

flow-merge's People

Contributors

Stargazers

Watchers

flow-merge's Issues

Feature request: Implement interpolated gradients for parameter values

🚀 Feature Request

Summary

Merge models based on a specific gradient between them.

Motivation

Enable more granular merging.

Implementation

Adapt the implementation from https://github.com/Gryphe/BlockMerge_Gradient

Additional context

Bug: Generated model card shouldn't include ValidatedInputData object

Description

The ValidatedInputData object seems to be written to the model card:

---
library_name: transformers
tags:
- flow-merge
- merge

---
# neural_story

This model is the result of merge of the following models made with flow-merge:

- Base model:
	- mistralai/Mistral-7B-Instruct-v0.2
- Models:
	- NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story


## flow-merge config

The following configuration was used to merge the models:

```yaml
!!python/object:flow_merge.lib.merge_config.ValidatedInputData
__dict__:
  base_model: mistralai/Mistral-7B-Instruct-v0.2
  models:
  - !!python/object:flow_merge.lib.merge_settings.RawModelDict
    __dict__:
      path_or_id: mistralai/Mistral-7B-Instruct-v0.2
      weight: null
    __pydantic_extra__: null
    __pydantic_fields_set__: !!set
      path_or_id: null
    __pydantic_private__: null
  - !!python/object:flow_merge.lib.merge_settings.RawModelDict
    __dict__:
      path_or_id: NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story
      weight: 0.75
    __pydantic_extra__: null
    __pydantic_fields_set__: !!set
      weight: null
      path_or_id: null
    __pydantic_private__: null
  method: !!python/object/apply:flow_merge.lib.constants.MergeMethodIdentifier
  - addition-task-arithmetic
  device: !!python/object/apply:flow_merge.lib.constants.DeviceIdentifier
  - cpu
  method_global_parameters: !!python/object:flow_merge.lib.merge_settings.MethodGlobalParameters
    __dict__:
      scaling_coefficient: 1.0
      normalize: false
      p: null
      top_k: null
      t: null
    __pydantic_extra__: null
    __pydantic_fields_set__: !!set
      scaling_coefficient: null
      normalize: null
    __pydantic_private__: null
  directory_settings: !!python/object:flow_merge.lib.merge_settings.DirectorySettings
    __dict__:
      cache_dir: null
      local_dir: ./models
      output_dir: ./neural_story
    __pydantic_extra__: null
    __pydantic_fields_set__: !!set
      output_dir: null
      cache_dir: null
      local_dir: null
    __pydantic_private__: null
  hf_hub_settings: !!python/object:flow_merge.lib.merge_settings.HfHubSettings
    __dict__:
      token: null
      trust_remote_code: false
    __pydantic_extra__: null
    __pydantic_fields_set__: !!set
      token: null
      trust_remote_code: null
    __pydantic_private__: null
  tokenizer_settings: !!python/object:flow_merge.lib.merge_settings.TokenizerSettings
    __dict__:
      mode: base
      interpolation_method: linear
    __pydantic_extra__: null
    __pydantic_fields_set__: !!set {}
    __pydantic_private__: null
__pydantic_extra__: null
__pydantic_fields_set__: !!set
  hf_hub_settings: null
  method_global_parameters: null
  base_model: null
  device: null
  models: null
  directory_settings: null
  method: null
__pydantic_private__: null

Add an explanation of model-specific weights in the config file to README

Explain how using weights for each model impacts the merge.

Bug: device python object written in the config of the model card and keys sorted

🐛 Bug report

Summary

Python object is written in the model card's config file

Actual Behavior

device python object is written in the model card.

Also, keys are being sorted instead of keeping the original order.

See https://huggingface.co/flow-ai-llm/biomistral_slerp_7b

Expected behavior

python object not written in config of model card.
Keys ordered as in the original config.

To Reproduce

Merge configuration file (if relevant):

method: slerp
method_global_parameters:
  t: 0.5
base_model: BioMistral/BioMistral-Safetensors
models:
  - model: BioMistral/BioMistral-Safetensors
  - model: mistralai/Mistral-7B-Instruct-v0.2
tokenizer:
  mode: base
  interpolation_method: linear
directory_settings:
  output_dir: ./biomistral/biomistral_slerp_7b/
hf_token:
  token: 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
  trust_remote_code: True
device: cpu

Environment

OS: Ubuntu 22.04.4 LTS
Python version:Python 3.10.13
Library version: 0.1.0
Other relevant dependencies: NA

Improving log message when task vectors are close to zero

Description

If there is almost no difference between the tensor of the base model and the tensor of a model, task vector values are close to zero. If all the task vectors are close to zero, then the merge method of the merge method class just returns the base tensor. The current warning message "No task vectors. Returning the base model tensor." is not very helpful:

class TaskArithmetic(MergeMethod):
    def merge(
        self,
        weight: ModelWeight,
        base_model_tensor: torch.Tensor,
        models_tensors: Dict[Model, torch.Tensor],
        merge_method_settings: Union[TaskArithmeticSettings, TiesMergingSettings],
        base_model: Model,
    ) -> torch.Tensor:
        base_tensor_dtype = base_model_tensor.dtype

        task_vectors: Dict[Model, torch.Tensor] = self._get_task_vectors(
            base_model_tensor, models_tensors
        )

        if not task_vectors:
            logger.warning("No task vectors. Returning the base model tensor.")
            return base_model_tensor
            
            ...

It should provide a better explanation.

Feature request: dtype conversions and optimizations

🚀 Feature Request

Summary

Optimize data type (dtype) for certain operations to improve performance and memory efficiency. For example, use int for tensor operations with masks.

Also, allow for dtype settings in config files.

Motivation

Improve performance
Give user more control over the dtype of the resulting model

[Optional] Implementation

Additional context

Feature: Allow for .bin models

Description

Allow for .bin model weight file format. Currently, it tries to find .safetensors files in the hf repo and it errors our if not found:

flow_merge.lib.merge_runner - ERROR - Merge error: EntryNotFoundError - 404 Client Error. (Request ID: Root=1-662b6d7f-44142b6d056f9513353e0c40;0454e034-eaa9-438b-bda0-33147d8e6f4b)

Entry Not Found for url: https://huggingface.co/Doctor-Shotgun/cat-v1.0-13b/resolve/main/model.safetensors.

Improve validation error messages for better user experience

Description

The current validation error messages in the flow-merge validate --config ... tool could be improved to provide a better user experience. When a user encounters a validation error, the message should clearly explain what went wrong and provide actionable steps to resolve the issue.

For example, the current error message for a missing required field in the configuration file looks like this:

Configuration file is invalid: 1 validation error for ValidatedInputData
models.0.model
  Field required [type=missing, input_value={'models': 'TheBloke/Llama-2-13B-fp16'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing

An improved error message could start with:

Error: Missing required field '<name_of_field>' in the configuration file.

...

Example of a correct model entry:
models:
  - model: TheBloke/Llama-2-13B-fp16
    weight:
    
...

New architecture Request: Phi-2 and phi-3 models

New architecture

phi-2 and phi-3 -> PhiForCausalLM

Motivation

Small, powerful and versatile family of LMs

Attribution

Bug: Merged model is uploaded to hf as private even if it would be set to public

🐛 Bug report

Summary

running the model upload command with --Private False -> still uploads the model to hf as private

Expected behavior

If --private False model should be uploaded as public to hf

To Reproduce

run model upload command for example
flow-merge upload --model_dir ./merged_model --username <hf_user_id> --model_name qwen_merge --token <hf_token> --private False

Bug: remove deps like `hvac`, `tqdm` and `hf-transfer` that aren't currently used

🐛 Bug report

Summary

Check both setup.py and environment.yml

Bug: No supported operating systems mentioned & no testing details

🐛 Bug report

Summary

The repository documentation lacks explicit information regarding the supported operating systems.

Actual Behavior

The project's documentation or README file does not specify which operating systems are officially supported.

Expected behavior

Ideally, the repository should clearly state the compatible operating systems, whether it's Windows, macOS, Linux, or all of them, to guide users before they start using the project.

Additional context

The absence of operating system compatibility information might cause confusion for developers on unsupported platforms.

Possible Solution or Workaround

Suggested action:

The maintainer should update the repository's documentation to include a section on supported operating systems, ensuring users can quickly determine if the project is compatible with their development environment.

Feature request: Enable passing HF_TOKEN from environment

🚀 Feature Request

Summary

Right now we are providing HF_TOKEN as a variable to merge_config.yaml file. I suggest we also enable passing it via env variable and catch that automatically.

[Optional] Implementation

TBA, assigned to myself

Bug: Hugging Face hub authentication

🐛 Bug report

Summary

Currently, the library does not log the user into HF hub even if the token is passed in the config.yaml.

Actual Behavior

If a gated or privated model is passed in the models list, the load config function errors out:

flow-merge run --config ./biomistral/slerp_config.yaml --model_name biomistral_slerp_7b
flow_merge.lib.merge_runner - INFO - Starting merge...
flow_merge.lib.merge_runner - ERROR - Merge error: RuntimeError - Failed to load config for model mistralai/Mistral-7B-Instruct-v0.2

The error message could be improve too..

Tried to load the model manually and this was the actual error from transformers library:

...
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/envs/flow-merge/lib/python3.10/site-packages/transformers/configuration_utils.py", line 688, in _get_config_dict
    resolved_config_file = cached_file(
  File "/opt/conda/envs/flow-merge/lib/python3.10/site-packages/transformers/utils/hub.py", line 416, in cached_file
    raise EnvironmentError(
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2.
401 Client Error. (Request ID: Root=1-662e549f-06bec9942e4a5164323cb0d1;498fe681-a1ee-4564-94c3-04a3098285df)

Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json.
Access to model mistralai/Mistral-7B-Instruct-v0.2 is restricted. You must be authenticated to access it.

Expected behavior

If hf token is passed, the user should be logged in.

To Reproduce

Merge configuration file (if relevant):

method: slerp
method_global_parameters:
  t: 0.5
base_model: BioMistral/BioMistral-7B
models:
  - model: BioMistral/BioMistral-7B
  - model: mistralai/Mistral-7B-Instruct-v0.2
tokenizer:
  mode: base
  interpolation_method: linear
directory_settings:
  output_dir: ./biomistral/biomistral_slerp_7b/
hf_token:
  token: <your_token>
  trust_remote_code: True
device: cpu

Steps to reproduce the behavior:

Create the config file above
Run flow-merge run --config <path_to_file> --model_name model

Environment

OS: Ubuntu 22.04.4 LTS
Python version:Python 3.10.13
Library version: 0.1.0
Other relevant dependencies: NA

Feature request: Create a unique subdirectory for the merged model

🚀 Feature Request

Summary

Currently the model output directory is specified in the merge configuration yaml file. While I can specify here a subfolder for the model, I believe it to be more intuitive and efficient to automatically create a unique subfolder per merge.

Motivation

This change helps to avoid accidental over-writes of the previously merged model.

[Optional] Implementation

add unique model name/identifier and create subfolder where the merged model is stored.

Fix: `flow-merge inputs` command output is not complete and needs a revamp

🐛 Bug report

Summary

It's at first draft stage. To be completed.

s@zappacosta ~/repos/april-oss/flow-merge
 (fix/10-bug-generated-model-card-shouldnt-include-validatedinputdata-object +*)$ flow-merge inputs                     31.3s  Fri 26 Apr 2024 07:49:27 PM EEST

# Required parameters
- 'base_model': 			 the base model to be used for merging
- 'models': 				 list of dictionaries, each representing a model to be merged
 	- 'model': 				 each model dictionary should have a 'model' property specifying the model path or identifier
 	- 'weight': 				 the 'weight' property in a model dictionary is optional and specifies the weight of the model during merging
- 'method': 				 the merge method to be used, one of ['addition-task-arithmetic','ties-merging','slerp','dare-ties-merging','model-soup','passthrough']

# Optional parameters
- 'device': 				 the device to be used for merging one of ['cpu','cuda']
- 'method_global_parameters': 		 global parameters for the merge method
 	- 'normalize': bool				 lorem ipsum
 	- 'p': float					 lorem ipsum
 	- 'scaling_coefficient': float		 lorem ipsum
 	- 't': float					 lorem ipsum
 	- 'top_k': float				 lorem ipsum
- 'directory_settings': 		 directories for caching, loading, and saving models
 	- 'cache_dir': str				 lorem ipsum
 	- 'local_dir': str					 lorem ipsum
 	- 'output_dir': str		 lorem ipsum
- 'hf_hub_settings': 			 settings for interacting with the Hugging Face Hub
 	- 'token': str				 lorem ipsum
 	- 'trust_remote_code': bool					 lorem ipsum
- 'tokenizer_settings': 		 settings for the tokenizer used with the merged model
 	- 'interpolation_method': str		 lorem ipsum
 	- 'mode': str				 lorem ipsum

Bug: Scaling coefficient is `None` if not passed in config.

🐛 Bug report

Summary

The scaling coefficient is None if not passed in the config file.

Actual Behavior

The scaling coefficient takes None if it's not passed in the config file, resulting in a merge error.

Expected behavior

The scaling coefficient should use the default value if it's not passed in the config file.

To Reproduce

Merge configuration file (if relevant):

method: dare-ties-merging
method_global_parameters:
  p: 0.3
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: mistralai/Mistral-7B-Instruct-v0.1
    weight: 0.5
  - model: BioMistral/BioMistral-Safetensors
    weight: 0.5
directory_settings:
  output_dir: ./biomistral/biomistral_dare_ties_7b/
hf_token:
  token: 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
  trust_remote_code: True
device: cuda

Environment

OS: Ubuntu 22.04.4 LTS
Python version:Python 3.10.13
Library version: 0.1.0
Other relevant dependencies: NA

Feature request: Enable frankenmerging

🚀 Feature Request

Summary

Enable frakenmerging technique.

Motivation

Some frankenmerges have surprised the community with their quality of outputs.

Additional context

Documentation: Add explanation of tokenizer behaviour

Description

Add an explanation of how the tokenizer for the merged model is created by flow-merge.

flowritecom / flow-merge Goto Github PK

flow-merge's Introduction

👋 Welcome

⭐️ Features

🎉 Getting started

💻 Installation

🏎️💨 Quick start

Write a flow-merge config

Run a merge

Upload the merged model to the Hugging Face Hub

Usage

CLI

🛠️ Supported Merge methods

Properties of the methods

Supported LLM Architectures

Tokenizers

Interpolation of embedding and language modeling layers

Special tokens

🚧 WIP 🚧 📚 Additional resources

🗺️ flow-merge Roadmap

✨ Project showcase

🤝 Contributing

💻 Development setup

🙏 Acknowledgments

✍️ Citation

flow-merge's People

Contributors

Stargazers

Watchers

flow-merge's Issues

🚀 Feature Request

Summary

Motivation

Implementation

Additional context

Description

🐛 Bug report

Summary

Actual Behavior

Expected behavior

To Reproduce

Environment

Description

🚀 Feature Request

Summary

Motivation

[Optional] Implementation

Additional context

Description

Description

New architecture

Motivation

Attribution

🐛 Bug report

Summary

Expected behavior

To Reproduce

🐛 Bug report

Summary

🐛 Bug report

Summary

Actual Behavior

Expected behavior

Additional context

Possible Solution or Workaround

🚀 Feature Request

Summary

[Optional] Implementation

🐛 Bug report

Summary

Actual Behavior

Expected behavior

To Reproduce

Environment

🚀 Feature Request

Summary

Motivation

[Optional] Implementation

🐛 Bug report

Summary

Write a `flow-merge` config

🗺️ `flow-merge` Roadmap