ddele / neural-speed
This project is forked from intel/neural-speed
A library for efficient LLM inference based on SOTA low-bit quantization and sparsity
Home Page: https://github.com/intel/neural-speed