waynehacking8

Pinned

  1. content-radar Public

    Collect trending AI/dev signal from Hacker News, arXiv, GitHub Trending, Reddit, and X, then synthesize review-ready post drafts with Claude.

    Python 2

  2. gh-radar Public

    Daily email digest of trending GitHub tools — GitHub Trending + Hacker News + new-repo search, no X API. Runs on GitHub Actions.

    Python

  3. tensor-core-from-scratch Public

    From naive matmul to tensor cores on NVIDIA Blackwell — step by step. 8 self-contained CUDA kernels, each benchmarked against cuBLAS.

    Cuda 1 1

  4. blackwell-tensorcore-kernels Public

    Hand-written CUDA Tensor Core GEMM kernels on Blackwell (sm_120) and Hopper (sm_90) — raw mma.sync reaching 106% of the cuBLAS-TC kernel on sm_120, CUTLASS 3.x wgmma at 85.5% of nvjet on H100, and …

    Cuda

  5. trtllm-triton-serving Public

    TensorRT-LLM vs vLLM controlled head-to-head on H100 — 12 studies including a knob-by-knob waterfall reproducing NVIDIA's published 27.7k tok/s (100.3%) and attributing the gap to real serving, plu…

    Python