nandss1 / flash_attention_inference
This project is forked from shayebuhui01/flash_attention_inference.
Benchmarks the performance of the C++ interfaces of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
License: MIT
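
As a rough illustration of the kind of measurement such a benchmark performs, the sketch below times a GPU operation with CUDA events and reports the average latency per call. The `time_gpu_op` helper and the `cudaMemsetAsync` placeholder workload are assumptions for illustration, not the repository's actual harness; in practice the callable would wrap a call into the repo's flash attention C++ interface. A warm-up call is issued first so one-time setup cost does not skew the measurement.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <functional>

// Times a GPU operation with CUDA events; returns average milliseconds per call.
// `op` would wrap a call into the flash attention C++ interface under test
// (the exact entry point and its arguments are assumptions here).
float time_gpu_op(const std::function<void()>& op, int iters) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    op();                        // warm-up: exclude one-time setup cost
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) op();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed work has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / iters;
}

int main() {
    float* buf = nullptr;
    cudaMalloc(&buf, 1 << 20);
    // Dummy workload standing in for a flash attention forward call.
    float avg = time_gpu_op([&] { cudaMemsetAsync(buf, 0, 1 << 20); }, 100);
    printf("avg: %.3f ms per call\n", avg);
    cudaFree(buf);
    return 0;
}
```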