Dev-next-gen/README.md

Leo Camus — @Dev-next-gen

AI Infrastructure · Multi-GPU ROCm · Independent Research · Offensive Security

Full Stack Development

Paris, France · Self-taught · No degree · Full stack from silicon to inference.


I build systems that run at the edge of what's technically possible — locally, at scale, without compromise. No cloud dependency, no abstraction layers hiding the truth. Every component understood, every parameter owned.

From founding a SaaS startup at 26, to operating a 300+ GPU farm on-site in Ukraine, to deploying 80B LLMs on self-hosted ROCm infrastructure and publishing independent AI research — every step was built from scratch, under real constraints.


Projects

flux-amd-rocm — FLUX.1-dev at parity with NVIDIA on AMD RDNA3

4-GPU Megatron-style tensor parallelism · 51 s/image @ 1024² · 11 GB/GPU. Int8 quantization + async group offloading on a single RX 7800 XT · 80 s · 12.5 GB VRAM.
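To make "Megatron-style tensor parallelism" concrete, here is a minimal pure-Python sketch of the usual MLP split: the first weight is cut by columns, the second by rows, and the per-device partial outputs are summed (the role an all-reduce plays on real hardware). The function names, matrix sizes, and values are illustrative assumptions, not code from flux-amd-rocm.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product for small nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gelu(m):
    """Elementwise GELU (erf form)."""
    return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row] for row in m]

def split_columns(m, parts):
    """Cut a matrix into `parts` equal column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    """Cut a matrix into `parts` equal row blocks."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def mlp_reference(x, w1, w2):
    """Unsharded MLP: gelu(x @ w1) @ w2."""
    return matmul(gelu(matmul(x, w1)), w2)

def mlp_tensor_parallel(x, w1, w2, world_size):
    """Each simulated device holds one column block of w1 and one row block of w2."""
    w1_shards = split_columns(w1, world_size)
    w2_shards = split_rows(w2, world_size)
    # GELU is elementwise, so it can run on each column block independently.
    partials = [matmul(gelu(matmul(x, a)), b) for a, b in zip(w1_shards, w2_shards)]
    rows, cols = len(partials[0]), len(partials[0][0])
    # Summing the partial outputs stands in for the all-reduce across devices.
    return [[sum(p[r][c] for p in partials) for c in range(cols)] for r in range(rows)]

# Toy shapes: batch 2, model dim 3, hidden dim 4, four simulated devices.
x = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
w1 = [[0.1, -0.2, 0.3, 0.4], [0.5, 0.6, -0.7, 0.8], [-0.9, 1.0, 0.2, -0.3]]
w2 = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6], [0.7, 0.8]]

reference = mlp_reference(x, w1, w2)
sharded = mlp_tensor_parallel(x, w1, w2, world_size=4)
```

Because each device stores only a slice of both weights, per-device weight memory shrinks roughly with the device count, at the cost of one reduction per MLP block.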

diffusers-rocm-parallel — Multi-GPU inference stack for AMD

Tensor parallel FLUX on 5× RX 7800 XT (gfx1101) · ring attention LSE shape fix · Ulysses context parallel.
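The README does not describe the LSE shape fix itself, so the sketch below only illustrates why ring attention carries a log-sum-exp (LSE) value next to each partial output: with it, attention computed over separate key/value blocks can be merged into exactly the full-softmax result. The function names and toy vectors are assumptions, and the usual 1/sqrt(d) score scaling is omitted for brevity.

```python
import math

def attend(query, keys, values):
    """Softmax attention for one query; returns (output, logsumexp of the scores)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return output, peak + math.log(total)

def merge_partials(out_a, lse_a, out_b, lse_b):
    """Combine two block-local attention results into the result over both blocks."""
    peak = max(lse_a, lse_b)
    lse = peak + math.log(math.exp(lse_a - peak) + math.exp(lse_b - peak))
    weight_a = math.exp(lse_a - lse)  # share of softmax mass held by block a
    weight_b = math.exp(lse_b - lse)
    merged = [weight_a * a + weight_b * b for a, b in zip(out_a, out_b)]
    return merged, lse

# Toy data: one query, four key/value pairs split into two blocks of two.
query = [0.3, -0.2]
keys = [[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0], [0.0, -1.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full_out, full_lse = attend(query, keys, values)
block_a = attend(query, keys[:2], values[:2])
block_b = attend(query, keys[2:], values[2:])
merged_out, merged_lse = merge_partials(*block_a, *block_b)
```

In a real implementation the outputs and LSE values are tensors with batch, head, and sequence axes, and the merge only broadcasts correctly when those axes line up.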

openclaw — Autonomous bug bounty pipeline

Multi-agent orchestration · Qwen3 80B + 14B · fully local · recon → scan → CVSS → report.
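As a rough illustration of the recon → scan → CVSS → report flow, the sketch below chains four small stages. Everything here is hypothetical: the stage functions are stubs that return placeholder data, and none of the names come from openclaw. The severity bands follow the CVSS v3.x qualitative rating scale.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    target: str
    title: str
    cvss_score: float

def severity(score):
    """Map a CVSS v3.x base score to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

def recon(scope):
    """Stub: a real stage would enumerate hosts; these names are placeholders."""
    return [f"app.{scope}", f"api.{scope}"]

def scan(hosts):
    """Stub: a real stage would run scanners; scores here are made-up placeholders."""
    placeholder_scores = [5.3, 9.8]
    return [
        Finding(host, f"placeholder finding {index + 1}", score)
        for index, (host, score) in enumerate(zip(hosts, placeholder_scores))
    ]

def report(findings):
    """Render findings as text lines, most severe first."""
    ordered = sorted(findings, key=lambda f: f.cvss_score, reverse=True)
    return [
        f"[{severity(f.cvss_score)} {f.cvss_score:.1f}] {f.target}: {f.title}"
        for f in ordered
    ]

def run_pipeline(scope):
    """recon -> scan -> CVSS rating -> report, as plain function composition."""
    return report(scan(recon(scope)))

lines = run_pipeline("example.com")
```

Keeping each stage a plain function with list inputs and outputs makes it easy to swap a stub for an agent-driven step later.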

CAMUS Theory — Independent AI Research

Graft-based temporal cognition in frozen LLMs. TemporalAdapter (<0.6% params) grafted at mid-depth via forward pre-hook. R² ≈ 0.9 for log-time decoding from 1B parameters. ~5-dimensional subspace invariant across model sizes. Validated on TinyLlama-1.1B and Qwen2.5-14B in under 30 minutes for $0.83.
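For readers unfamiliar with the metric, the sketch below shows how an R² score for log-time decoding can be computed. It assumes natural-log targets and uses toy durations; the predictions are synthetic stand-ins, not outputs of the TemporalAdapter or results from the paper.

```python
import math

def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot

# Toy durations in seconds: 1 s, 1 min, 1 h, 1 day, 1 week (illustrative only).
durations_s = [1.0, 60.0, 3600.0, 86400.0, 604800.0]
log_targets = [math.log(d) for d in durations_s]

# Two synthetic decoders that bracket the score's range:
perfect_predictions = list(log_targets)  # reproduces every target
mean_value = sum(log_targets) / len(log_targets)
mean_predictions = [mean_value] * len(log_targets)  # always predicts the mean

perfect_score = r_squared(log_targets, perfect_predictions)
baseline_score = r_squared(log_targets, mean_predictions)
```

A perfect decoder scores 1 and a constant mean predictor scores 0, so a value near 0.9 means the decoder explains about 90% of the variance in log-time.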


Background

  • 2020–2022 — On-site GPU infrastructure engineer, 300+ GPU production facility, Kyiv, Ukraine. End-to-end hardware deployment, network architecture, 24/7 uptime under real production constraints.
  • 2019 — Founded and shipped a full SaaS repair management platform solo (350+ pages, logistics, billing, payments). Shut down by Covid.
  • 2022–now — Freelance AI infra, security research, independent publications.

Stack

Compute       5× AMD RX 7800 XT (gfx1101) · 80 GB VRAM · ROCm 7.1
              Custom builds: rocWMMA · FA_ALL_QUANTS · HIP_GRAPHS
Inference     PyTorch · diffusers · torchao · llama.cpp · vLLM · 38 tok/s on an 80B model at 262K context
ML            Tensor parallelism · group offloading · int8/int4 · Triton kernels
Security      nuclei · subfinder · katana · httpx · Burp Suite Pro · responsible disclosure
Systems       Python · Rust · Node.js · Next.js · FastAPI · PostgreSQL · Supabase · Docker
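
As a small illustration of the int8 entry above, here is a generic symmetric (absmax) per-tensor quantization sketch in plain Python. It is not the torchao API and not necessarily the scheme used in these projects; the function names and sample weights are assumptions.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate floats from int8 codes and the stored scale."""
    return [q * scale for q in quantized]

# Hypothetical weight values for illustration.
weights = [0.5, -1.27, 0.0, 1.0, 0.333]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight then needs one byte instead of two (fp16/bf16) or four (fp32), plus a single shared scale, which is where the VRAM saving comes from.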

Products

SaaS platforms, mobile apps, full-stack web. Recent deliveries:

  • Email marketing platform — self-hosted, SPF/DKIM/DMARC, warm-up automation, 10/10 deliverability on first test
  • Yoga studio app — React/Vite, Supabase auth, booking system, deployed in production
  • Hyperlocal marketplace — mobile app, real-time geolocation, neighbor-to-neighbor listings
  • OSINT platform (osint-platform) — open-source Palantir alternative, 6-tier data ingestion, entity graph, real-time analysis

Stack: Python · Node.js · Rust · Next.js · React · FastAPI · PostgreSQL · Supabase · Docker · Stripe · REST APIs

Infrastructure

CPU     2× Intel Xeon E5-2698 v4 — 80 threads
RAM     512 GB ECC
GPU     5× AMD RX 7800 XT (gfx1101) — 80 GB VRAM total
NVMe    Multi-drive storage array
OS      Ubuntu · ROCm 7.1
Net     10 GbE local · self-hosted services

Open to research collabs, freelance infra missions, or projects that shouldn't exist yet.

Pinned

  1. gpu-cluster-lab (Public)

    AMD/NVIDIA GPU cluster infrastructure — ~300 GPU deployment, ROCm, kernel tuning, multi-node benchmarking

    3

  2. local-llm-stack (Public)

    Production-grade local LLM deployment stack — llama.cpp, Ollama, GGUF/GGML, ROCm AMD, 14B to 80B models

    2

  3. osint-platform (Public)

    Open-source intelligence platform — Palantir Gotham alternative. 6-level source integration, ontology graph, real-time threat analysis.

    9 1

  4. camus-theory (Public)

    The CAMUS Theory: Emergent Temporal Cognition in Language Models — DOI: 10.5281/zenodo.19509846

    TeX 2

  5. diffusers-rocm-parallel (Public)

    Multi-GPU tensor/context parallel diffusion on AMD ROCm — with the patch that makes it actually work.

    Python 2

  6. flux-amd-rocm (Public)

    FLUX.1-dev on AMD Radeon consumer GPUs — fast, low-VRAM, and shippable. Backport patches + benchmarks for torchao + diffusers group_offload on ROCm.

    Python 2