- The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
  Paper • 2408.01050 • Published • 9
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache
  Paper • 2407.18121 • Published • 17
- LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
  Paper • 2407.14057 • Published • 46
- Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
  Paper • 2407.10969 • Published • 23
Ingyu Seong
ingyu
AI & ML interests: None yet
Recent Activity
- authored a paper about 4 hours ago
  LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
- upvoted a paper about 9 hours ago
  LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
- submitted a paper about 13 hours ago
  LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Organizations: None yet