💬
All In AI
  • Alibaba
  • Shanghai Jiao Tong University

chfeng-cs/README.md

Ethan Feng

Infrastructure engineer focused on LLM inference systems.

  • M.S. Computer Science — Shanghai Jiao Tong University
  • B.S. Computer Science — Harbin Institute of Technology
  • 2 yrs at Alibaba

Focus Areas: LLM Inference / GPU Performance

Open Source

Currently contributing to vLLM: KV cache transfer, scheduler optimization, and hybrid KV cache management (HMA).

See details in my vllm contributions.

Contact

📫 ethan.fengch [at] gmail [dot] com

🔗 LinkedIn

Pinned

  1. sglang (Public)

    Forked from sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    Python

  2. vllm (Public)

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python

  3. flashinfer (Public)

    Forked from flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    Python

  4. vllm-contributions (Public)

    Python

  5. TensorRT-LLM (Public)

    Forked from NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

    Python