
ChordMixer: A Scalable Neural Attention Model For Sequences With Different Lengths [Accepted to ICLR'23]

OpenReview

ChordMixer Architecture

ChordMixer Network is a stack of ChordMixer blocks, each of which applies two simple tensor operations to the input sequence.

  1. Rotate step. A parameter-free module that circularly rotates the sequence channels (tracks).
  2. Mix step. Applies a shared MLP independently at each sequence position. A minimal sketch of both steps follows the list.
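
For intuition, here is a minimal PyTorch sketch of one ChordMixer block. It assumes the channel dimension is split into equal tracks and that track k is rolled by 2**(k-1) positions, following the Chord-protocol offsets described in the paper; the class and argument names are illustrative, not this repository's API.

import torch
import torch.nn as nn

class ChordMixerBlockSketch(nn.Module):
    # Illustrative only: Rotate (parameter-free) followed by Mix (per-position MLP).
    def __init__(self, n_tracks: int, track_size: int, hidden_size: int):
        super().__init__()
        dim = n_tracks * track_size
        self.track_size = track_size
        self.mlp = nn.Sequential(          # one MLP shared across all positions
            nn.Linear(dim, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, dim),
        )

    def forward(self, x):
        # x: (seq_len, dim). Rotate: track 0 stays put; track k rolls by 2**(k-1).
        tracks = list(x.split(self.track_size, dim=1))
        for k in range(1, len(tracks)):
            tracks[k] = torch.roll(tracks[k], shifts=2 ** (k - 1), dims=0)
        x = torch.cat(tracks, dim=1)
        return self.mlp(x)                 # Mix: the same MLP applied at every position

Stacking such blocks lets information from any position reach any other in roughly log2(n) layers, which is what allows the model to scale to very long sequences.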

Experiments

Long Range Arena

We achieve competitive results on the public Long Range Arena (LRA) benchmark, including state-of-the-art accuracy on Pathfinder and PathfinderX. In the table below, a dash marks models with no reported PathfinderX score.

Model           ListOps   Text    Image   Retrieval   Pathfinder   PathfinderX
Transformer      36.37    64.27   42.44     57.46       71.40          —
Longformer       35.63    62.58   42.22     56.89       69.71          —
Linformer        37.70    53.94   38.56     52.27       76.34          —
Reformer         37.27    56.10   38.07     53.40       68.50          —
Performer        18.01    65.40   42.77     53.82       77.05          —
Nyströmformer    37.15    65.52   41.58     79.56       70.94          —
S4               59.60    86.82   88.65     90.90       94.20        96.35
Mega             63.14    90.43   90.44     91.25       96.01        97.98
ChordMixer       59.89    88.87   89.95     90.38       96.67        98.63

Insanely long sequences

ChordMixer performs strongly on extremely long sequences with high length variability. We designed experiments on sequences from several domains, including arithmetic operations, text, and DNA, with lengths up to 1.5M.

Updates

  1. [May 2023] Added DDP support
  2. [May 2023] Added a module to calculate and log performance across length percentiles
  3. [July 2023] Tested and released other models
  4. [July 2023] Released the pre-training pipeline

How to use

You can use the ChordMixer backbone directly from this repository. The module does not require any custom CUDA kernels; all ChordMixer operations are implemented with built-in PyTorch modules.

ChordMixer can work in two modes:

  • equal lengths (all sequences have the same length, or padding is applied)
  • variable lengths (sequence lengths vary widely and no padding is applied)
import torch
from chordmixer import ChordMixer  # hypothetical import path; adjust to this repo's layout

# Equal lengths mode
net = ChordMixer(
    input_size=100,            # Size of the token dict (or size of real-valued input)
    output_size=10,            # Target dim (10 classes)
    embedding_type='sparse',   # 'linear' for real-valued input
    decoder='linear',          # Global average pooling + linear layer
    max_seq_len=2000,          # Maximum sequence length observed in the whole dataset.
    track_size=16,             # Size of tracks to be rotated.
    hidden_size=128,           # Hidden layer size for MLPs.
    mlp_dropout=0.,            # Dropout probability for MLPs.
    layer_dropout=0.,          # Probability for layer dropout.
    prenorm='LN',              # Pre-normalization. One of 'BN', 'LN', 'GN', or 'None' when not applied. 
    norm='LN',                 # Post-normalization. One of 'BN', 'LN', 'GN', or 'None' when not applied. 
    var_len=False              # All sequences are equal in length.
)

x = torch.randint(low=1, high=99, size=(4, 2000))
out = net(x)
print('input size', x.size())
print('output size', out.size())
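
With this configuration, x is a batch of 4 token sequences of length 2000, so the printed input size should be (4, 2000) and the output size (4, 10): the 'linear' decoder pools over positions and maps to the 10 target classes.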

# Variable lengths mode
net = ChordMixer(
    input_size=100,
    output_size=10,
    embedding_type='sparse',
    decoder='linear',
    max_seq_len=2000,
    track_size=16,
    hidden_size=128,
    mlp_dropout=0.,
    layer_dropout=0.,
    prenorm='None',
    norm='None',
    var_len=True                # Use variable length mode
)

lengths = torch.randint(low=1025, high=2048, size=(4, 1)).squeeze()   # one length per sequence
x = torch.randint(low=1, high=99, size=(torch.sum(lengths), ))        # all sequences concatenated, no padding
out = net(x, lengths)
print('input size', x.size())
print('output size', out.size())
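
For real data, here is a hedged sketch of how a batch of variable-length sequences might be packed into the (x, lengths) pair this mode expects; the toy tensors below stand in for actual tokenized inputs.

seqs = [torch.randint(low=1, high=99, size=(n,)) for n in (1200, 1750, 1025, 2000)]  # toy sequences
lengths = torch.tensor([len(s) for s in seqs])  # one length per sequence
x = torch.cat(seqs)                             # concatenate into one flat tensor, no padding
out = net(x, lengths)                           # net configured in var_len mode above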

How to run experiments

Please follow the steps on the experiments page, which provides more examples and running scripts.

Acknowledgments

This research is funded by the Research Council of Norway. We thank the IDUN group for providing the resources to complete the experiments.

Kudos to the HazyResearch team for publicly sharing their well-structured code. The PyTorch Lightning training pipelines and the LRA dataloaders in this repo were heavily inspired by their work.

Citation

If you use this codebase, the datasets, or the paper, please cite us as:

@inproceedings{khalitov2023chordmixer,
  title={ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length},
  author={Ruslan Khalitov and Tong Yu and Lei Cheng and Zhirong Yang},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=E8mzu3JbdR}
}

Issues

Q: GPU memory for long-sequence experiments

Hello, I'd like to ask how much GPU memory would be required if I intend to use your model for experiments on long sequences.

File not found on genbank data example

Hi Ruslan,

When running the scripts step by step as described here, I run into the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'Other vertebrate_df_classes.pkl'

It comes from this code line; I cannot find the 'df_classes' pickle files anywhere, and they do not seem to be generated by the previous scripts. Did you maybe forget to upload them?

ValueError: You selected an invalid strategy name: `strategy='dp'`

Traceback (most recent call last):
  File "/content/ChordMixer/trainer.py", line 394, in cli_main
    trainer = Trainer(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/argparse.py", line 69, in insert_env_defaults
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 393, in __init__
    self._accelerator_connector = _AcceleratorConnector(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 140, in __init__
    self._check_config_and_set_final_flags(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 209, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid strategy name: `strategy='dp'`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai/
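
For anyone hitting this: the 'dp' strategy was removed in PyTorch Lightning 2.x. Below is a hedged sketch of a replacement Trainer construction; the accelerator and device values are illustrative and should be adapted to trainer.py.

import pytorch_lightning as pl

# 'dp' (DataParallel) no longer exists in PyTorch Lightning 2.x.
# 'auto' works for a single GPU; use 'ddp' for multi-GPU training.
trainer = pl.Trainer(accelerator='gpu', devices=1, strategy='auto')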

CUDA OOM errors, even on an A100 with batch size 1

Hello. I am getting CUDA OOM errors around 40% of the way through the first epoch when training on the Sus_Bos problem. Is there any way to scale the model and/or the problem down further so I can train on Colab?

CIFAR-10 and IMDB share a dataset split for validation and testing

I discovered that both the CIFAR-10 and IMDB datasets use an identical split for validation and testing. The relevant source code is shown below:

# CIFAR-10
data_val = torch.load('./cifar10_test_vocab.pt').to(torch.int64)
labels_val = torch.load('./cifar10_test_targets_vocab.pt').to(torch.int64)

data_test = torch.load('./cifar10_test_vocab.pt').to(torch.int64)
labels_test = torch.load('./cifar10_test_targets_vocab.pt').to(torch.int64)

# IMDB
data_val = torch.load('./data/IMDB_test.pt').to(torch.int64)
labels_val = torch.load('./data/IMDB_test_targets.pt').to(torch.int64)

data_test = torch.load('./data/IMDB_test.pt').to(torch.int64)
labels_test = torch.load('./data/IMDB_test_targets.pt').to(torch.int64)
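
One illustrative way to remove the overlap, assuming only these single held-out files exist (a sketch, not a committed fix; CIFAR-10 paths shown): carve disjoint validation and test halves out of the held-out split.

import torch

# Sketch only: split the single held-out set into disjoint val/test halves.
data = torch.load('./cifar10_test_vocab.pt').to(torch.int64)
labels = torch.load('./cifar10_test_targets_vocab.pt').to(torch.int64)
mid = data.size(0) // 2
data_val, labels_val = data[:mid], labels[:mid]
data_test, labels_test = data[mid:], labels[mid:]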

Wrong package in requirements.txt

Hi Ruslan,
I got this error when creating the environment with pip from requirements.txt; it looks like a wrong package is listed:

pip install -r requirements.txt

ERROR: Could not find a version that satisfies the requirement screen-resolution-extra==0.0.0 (from versions: none)
ERROR: No matching distribution found for screen-resolution-extra==0.0.0

By the way, which Python version did you use?

AttributeError: 'NoneType' object has no attribute 'train_dataloader'

Traceback (most recent call last):
  File "/content/ChordMixer/trainer.py", line 411, in cli_main
    trainer.fit(model, datamodule=dm)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 92, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 911, in _run
    self.strategy.setup(self)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/ddp.py", line 166, in setup
    self.setup_optimizers(trainer)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 138, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs = _init_optimizers_and_lr_schedulers(self.lightning_module)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 171, in _init_optimizers_and_lr_schedulers
    optim_conf = call._call_lightning_module_hook(model.trainer, "configure_optimizers", pl_module=model)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 142, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/content/ChordMixer/trainer.py", line 125, in configure_optimizers
    num_training_steps=self.total_steps()
  File "/content/ChordMixer/trainer.py", line 106, in total_steps
    l = len(self.trainer.datamodule.train_dataloader())
AttributeError: 'NoneType' object has no attribute 'train_dataloader'
