dzhengAP
  • California
  • UTC -07:00

Pinned

  1. vllm-project/vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 85.2k stars · 18.9k forks

  2. vllm-project/llm-compressor Public

    A Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python · 3.5k stars · 562 forks

  3. On-Device-Agent-for-adaptive-display-optimization Public

    We present a novel on-device hybrid agent combining LLMs with retrieval-augmented generation for real-time display optimization. The system achieves 92% accuracy with CoreML acceleration delivering…

    Swift 1

  4. ARS-Adaptive-Reasoning-Suppression-for-Efficient-Large-Reasoning-Language-Models Public

    Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models

  5. distributed-inference-engine-nano-vLLM Public

    Python

  6. distributed-training-infra-demo-megatron Public

    Python