
feat(cli): Add nexus validate benchmarks command #136

Merged: christian-pinto merged 25 commits into main from cp_benchmark_validation on Jun 18, 2026

@christian-pinto

Member

PR Summary: Add nexus validate benchmarks Command

Overview

This PR adds a new nexus validate benchmarks command for validating benchmark instances, and adds package existence validation. It also restructures the existing validation command to use subcommands.

Key Changes

1. New Command: nexus validate benchmarks

Added three validation modes for benchmark instances:

  • PR-based validation: Validates only benchmark instances modified in a specific GitHub PR
  • Package-specific validation: Validates all instances from a single package
  • Full validation: Validates all benchmark instances across all packages

Usage:

# Validate PR changes
nexus validate benchmarks --pr https://github.com/IBM/algorithm-nexus/pull/123

# Validate specific package
nexus validate benchmarks --package terratorch

# Validate all instances
nexus validate benchmarks
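
As a rough illustration of how the three modes above could be told apart, here is a minimal sketch. The names PullRequestRef, parse_pr_url and select_mode are hypothetical and do not come from this PR, and treating --pr and --package as mutually exclusive is an assumption.

```python
import re
from typing import NamedTuple, Optional


class PullRequestRef(NamedTuple):
    """Owner, repository and number extracted from a GitHub PR URL."""
    owner: str
    repo: str
    number: int


# Matches URLs of the form https://github.com/<owner>/<repo>/pull/<number>
_PR_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)/?$")


def parse_pr_url(url: str) -> PullRequestRef:
    """Split a pull request URL into its parts, or raise ValueError."""
    match = _PR_URL.match(url)
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {url!r}")
    owner, repo, number = match.groups()
    return PullRequestRef(owner, repo, int(number))


def select_mode(pr: Optional[str], package: Optional[str]) -> str:
    """Pick the validation mode from the two optional flags."""
    if pr and package:
        # Assumption: the two filters are not meant to be combined.
        raise ValueError("--pr and --package cannot be used together")
    if pr:
        return "pr"
    if package:
        return "package"
    return "all"
```

With no flags the sketch falls through to full validation, matching the third usage line above.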

2. Validation Process

Each benchmark instance is validated in its own isolated environment:

One Virtual Environment Per Instance:

  • Creates a separate temporary virtual environment for each benchmark instance using uv
  • Ensures complete isolation between validations to prevent dependency conflicts
  • Automatically cleans up environments after validation (success or failure)
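
The isolation described above can be sketched roughly as follows. This is not the code from venv_manager.py: the function name install_in_fresh_venv and the injectable run parameter are hypothetical, and the only uv commands assumed are uv venv and uv pip install --python.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(cmd), check=True)


def install_in_fresh_venv(requirements: Sequence[str], run: Runner = _run) -> None:
    """Create a throwaway venv with uv and install the requirements into it.

    The temporary directory is deleted when the with-block exits, whether
    the installation succeeded or raised.
    """
    with tempfile.TemporaryDirectory(prefix="nexus-validate-") as tmp:
        venv_dir = Path(tmp) / "venv"
        # Interpreter location inside the venv differs between Windows and POSIX.
        python = venv_dir / ("Scripts/python.exe" if os.name == "nt" else "bin/python")
        run(["uv", "venv", str(venv_dir)])
        run(["uv", "pip", "install", "--python", str(python), *requirements])
```

Passing the runner in as a parameter keeps the sketch testable without uv installed. The context manager is what provides cleanup on both success and failure.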

Multi-Stage Validation:

  1. Syntax validation: Validates space.yaml structure and required fields
  2. Dependency resolution: Resolves the benchmark package required by the benchmark instance from nexus.yaml
  3. Installation testing: Attempts to install all dependencies in the isolated venv
  4. ADO dry-run: Validates that the instance can be created with ADO (if applicable)
  5. Cleanup: Removes temporary environment regardless of outcome
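
One way to picture the stages above is a small runner that executes checks in order and always calls cleanup. This is a sketch only: ValidationResult and run_stages are hypothetical names, and stopping at the first failing stage is an assumption, not something the PR description states.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A stage is a name plus a check that raises an exception on failure.
Stage = Tuple[str, Callable[[], None]]


@dataclass
class ValidationResult:
    passed: bool
    completed: List[str] = field(default_factory=list)
    failed_stage: str = ""
    error: str = ""


def run_stages(stages: List[Stage], cleanup: Callable[[], None]) -> ValidationResult:
    """Run stages in order; cleanup runs regardless of the outcome."""
    result = ValidationResult(passed=True)
    try:
        for name, check in stages:
            try:
                check()
            except Exception as exc:
                # Assumption: later stages are skipped after the first failure.
                result.passed = False
                result.failed_stage = name
                result.error = str(exc)
                break
            result.completed.append(name)
    finally:
        cleanup()
    return result
```

The finally block mirrors stage 5: the temporary environment is removed even when an earlier stage fails.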

3. Breaking Change: Command Restructure

IMPORTANT: The existing nexus validate command has been restructured into subcommands:

Before:

nexus validate <package_path>

After:

nexus validate package <package_path>

All existing package validation functionality remains unchanged, but now requires the package subcommand.
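
To make the restructure concrete, here is a minimal sketch of the nested subcommand layout using argparse from the standard library. The CLI framework actually used by nexus is not shown in this conversation, so build_parser and the argument names below are assumptions for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where validate requires a package or benchmarks subcommand."""
    parser = argparse.ArgumentParser(prog="nexus")
    commands = parser.add_subparsers(dest="command", required=True)

    validate = commands.add_parser("validate")
    targets = validate.add_subparsers(dest="target", required=True)

    # Existing behaviour, now reached through the package subcommand.
    package = targets.add_parser("package")
    package.add_argument("package_path")

    # New subcommand with two optional filters.
    benchmarks = targets.add_parser("benchmarks")
    benchmarks.add_argument("--pr")
    benchmarks.add_argument("--package")
    return parser
```

In this sketch the old invocation, a path passed directly after validate, is rejected because the path is not a known subcommand. That is the breaking change callers need to account for.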

…s on PR

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
…the nexus package

… command

@christian-pinto christian-pinto added the ci Enable CI integration label Jun 12, 2026
@christian-pinto christian-pinto changed the title Cp benchmark validation Jun 12, 2026
@christian-pinto

christian-pinto commented Jun 16, 2026

Member Author

@AlessandroPomponio and/or @michael-johnston have a look when you get the chance

Comment thread docs/getting-started/cli-reference.md
Comment thread src/algorithm_nexus/commands/ado_validator.py
Comment thread src/algorithm_nexus/commands/ado_validator.py Outdated
Comment thread src/algorithm_nexus/commands/ado_validator.py Outdated
Comment thread src/algorithm_nexus/commands/ado_validator.py
Comment thread src/algorithm_nexus/commands/validate.py Outdated
Comment thread src/algorithm_nexus/commands/venv_manager.py Outdated
Comment thread src/algorithm_nexus/models.py Outdated
@DRL-NextGen

DRL-NextGen commented Jun 17, 2026

Member

Checks Summary

Last run: 2026-06-18T07:52:46.996Z

Mend Unified Agent vulnerability scan found 20 vulnerabilities:

Severity Identifier Package Details Fix
❗ Critical CVE-2025-69872 diskcache-5.6.3-py3-none-any.whl
DiskCache (python-diskcache) through 5.6.3 uses Python pickle for serialization by default. An attacker with write access to the cache directory can achieve arbitrary code execution when a victim application reads from the cache.
Not Available
❗ Critical CVE-2026-48746 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
Summary: A vulnerability in ASGI web servers and starlette's trust on those web servers enables an authentication bypass of the OpenAI API "AuthenticationMiddleware", which was discovered during @x41sec's source code audit. It allows use of the API without providing the configured "VLLM_API_KEY" or "--api-key". Details: In https://github.com/vllm-project/vllm/blob/v0.14.0/vllm/entrypoints/openai/api_server.py#L689-L692 the "url_path" is taken from the "URL", which is reconstructed by starlette based on the request "scope". from starlette.datastructures import URL, Headers, MutableHeaders, State ... url_path = URL(scope=scope).path.removeprefix(root_path) headers = Headers(scope=scope) if url_path.startswith("/v1") and not self.verify_token(headers): response = JSONResponse(content={"error": "Unauthorized"}, status_code=401) return response(scope, receive, send) return self.app(scope, receive, send) The request "scope" includes the request's "Host:" header and reconstructs the URL as shown below: f"{scheme}://{host_header}{path}" Neither starlette nor "any of the ASGI servers" (https://asgi.readthedocs.io/en/latest/implementations.html#servers) (including uvicorn, which vllm uses) properly filter the "Host:" header for invalid characters. This allows an attacker to include special URL characters such as "/" or "?" in the "Host:" header and thereby control the reconstructed URL and its ".path" attribute. FastAPI/starlette's routing uses the HTTP path and does not depend on the parsed url.path attribute, allowing attackers to reach an endpoint via a certain path while providing a different value in the ".path". Impact: - Instances of vllm that use an API Key for the OpenAI API and expose the API to attackers. - Instances behind an RFC-conforming web server (such as nginx) are not affected.
vllm - 0.22.0
🔺 High CVE-2026-41523 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
Summary: An "assert"-based security check in vLLM's activation function loading allows any unauthenticated attacker to achieve arbitrary code execution on the server by publishing a malicious HuggingFace model, when vLLM runs in Python optimized mode ("python -O" or "PYTHONOPTIMIZE=1"). Details: vLLM uses an "assert" statement at "vllm/model_executor/layers/pooler/activations.py:48" (https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/pooler/activations.py#L48) as its sole security control to restrict which activation functions can be loaded from a HuggingFace model's "config.json": vllm/model_executor/layers/pooler/activations.py:35-53 function_name: str | None = None if ( hasattr(config, "sentence_transformers") and "activation_fn" in config.sentence_transformers ): function_name = config.sentence_transformers["activation_fn"] elif ( hasattr(config, "sbert_ce_default_activation_function") and config.sbert_ce_default_activation_function is not None ): function_name = config.sbert_ce_default_activation_function if function_name is not None: assert function_name.startswith("torch.nn.modules."), ( "Loading of activation functions is restricted to " "torch.nn.modules for security reasons" ) fn = resolve_obj_by_qualname(function_name)() Python's "assert" statements are stripped at compile time when running in optimized mode ("python -O" or "PYTHONOPTIMIZE=1"). When the assert is absent, the attacker-controlled "function_name" from the model's "config.json" is passed directly to "resolve_obj_by_qualname()" (https://github.com/vllm-project/vllm/blob/main/vllm/utils/import_utils.py#L106), an unrestricted import gadget: def resolve_obj_by_qualname(qualname: str) -> Any: module_name, obj_name = qualname.rsplit(".", 1) module = importlib.import_module(module_name) return getattr(module, obj_name) This is the same vulnerability class as CVE-2017-1000433 (pysaml2 assert-based auth bypass), flagged by Bandit B101 and Ruff S101, and the reason Django proactively replaced all assert-based security checks (ticket #32508). Attacker-controlled input sources: - "config.sentence_transformers["activation_fn"]" (line 40) - "config.sbert_ce_default_activation_function" (line 45) Affected call sites: "get_act_fn()" is called via "resolve_classifier_act_fn()" from: - "vllm/model_executor/layers/pooler/seqwise/poolers.py:122" (SequencePooler) - "vllm/model_executor/layers/pooler/tokwise/poolers.py:130" (TokenPooler) Broader systemic risk: "resolve_obj_by_qualname" is called from ~20 locations across the codebase with no validation of its own.
Any future caller feeding user-controlled input to it without validation creates the same vulnerability class. Suggested fix: Replace the "assert" with an explicit conditional raise: if not function_name.startswith("torch.nn.modules."): raise ValueError( "Loading of activation functions is restricted to " "torch.nn.modules for security reasons" ) Impact Arbitrary code execution. A malicious model author publishes a HuggingFace model with a crafted "config.json". When a victim loads this model with vLLM running under "python -O" or "PYTHONOPTIMIZE=1", arbitrary code executes during model initialization with the privileges of the vLLM process. The attack requires: 1. Victim loads a malicious model from HuggingFace (user interaction) 2. vLLM runs under "python -O" or "PYTHONOPTIMIZE=1" (documented in production use) 3. Model uses a cross-encoder architecture (e.g. BERT or RoBERTa with sequence classification) Coordinated disclosure note: This vulnerability was also reported via huntr.com on April 2, 2026 (https://huntr.com/bounties/dcb05b04-e625-41e7-adbc-bbae0cc2d64c). A GitHub Security Advisory was also filed because it is vLLM's stated preferred disclosure channel per SECURITY.md. Fix A fix for this was introduced in this commit: vllm-project/vllm@b3c7ffc
🔺 High CVE-2026-4372 transformers-4.57.6-py3-none-any.whl
A critical remote code execution vulnerability exists in all versions of the HuggingFace transformers library prior to version 5.3.0. The vulnerability allows an attacker to craft a malicious "config.json" file containing the "_attn_implementation_internal" field set to an attacker-controlled HuggingFace Hub repository ID. When a victim loads this model using the standard "AutoModelForCausalLM.from_pretrained()" API, the library downloads and executes arbitrary Python code from the attacker's repository with the victim's full OS privileges. This issue arises due to unfiltered deserialization of configuration attributes, insufficient sanitization of internal fields, and unsandboxed execution of downloaded kernels. The vulnerability bypasses the "trust_remote_code" security mechanism, is invisible to the victim, and exploits the standard documented usage pattern, making it particularly severe. Users are advised to upgrade to version 5.3.0 or later to mitigate this issue.
Upgrade to version transformers - 5.3.0,https://github.com/huggingface/transformers.git - v5.3.0,transformers - 5.3.0
🔺 High CVE-2026-5241 transformers-4.57.6-py3-none-any.whl
A vulnerability in the LightGlue model loading path of huggingface/transformers version 5.2.0 allows an attacker-controlled model repository to execute arbitrary code during model initialization. The issue arises because the "trust_remote_code" parameter, intended to prevent remote code execution, is overridden by untrusted serialized configuration data in a nested code path. Specifically, when loading a LightGlue model using "AutoModel.from_pretrained()" with "trust_remote_code=False", the "LightGlueConfig" reads the "trust_remote_code" value from the untrusted "config.json" file and propagates it into nested "AutoConfig.from_pretrained()" calls. This results in the execution of attacker-provided Python modules, even when the victim explicitly disables remote code execution. The vulnerability poses a high risk for environments such as API inference servers, research notebooks, CI/CD pipelines, and model evaluation workers, potentially leading to credential theft, lateral movement, or persistence/backdoor deployment.
Upgrade to version transformers - 5.5.0,transformers - 5.5.0,https://github.com/huggingface/transformers.git - v5.5.0
🔺 High CVE-2025-14920 transformers-4.57.6-py3-none-any.whl
Hugging Face Transformers Perceiver Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.
The specific flaw exists within the parsing of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25423.
Not Available
🔷 Medium CVE-2026-47155 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
Summary: vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a model. A deployment that supplies "--revision" or "--code-revision" can still load dynamic code, GGUF files, image processors, retrieval side weights, or same-repository subfolder weights/config from an unpinned/default revision. This is a supply-chain integrity issue for pinned vLLM deployments. Operators can believe they are serving a reviewed model revision while vLLM resolves behavior-affecting nested or sibling artifacts outside that reviewed revision. Details: The expected invariant is: «When a vLLM operator supplies a model or code revision pin, every code, config, processor, weight file, side weight, and same-repository subfolder artifact loaded as part of that model should resolve under that pin unless vLLM exposes and enforces a separate explicit pin for that artifact.» Current "main" was verified affected at commit "3795d7acf431980e62e738493f437ae2a51549da". Affected source boundaries: - "vllm/model_executor/models/registry.py:1045-1051" and ":1058-1064" - "_try_resolve_transformers()" passes "revision=model_config.revision" and "trust_remote_code=model_config.trust_remote_code", but omits "code_revision=model_config.code_revision" for external "auto_map" dynamic module imports. - "vllm/model_executor/model_loader/gguf_loader.py:58-60" - The direct-file GGUF form "repo/file.gguf" calls "hf_hub_download(repo_id=repo_id, filename=filename)" without passing "revision". - "vllm/model_executor/models/roberta.py:203-209" - BGE-M3 secondary sparse and ColBERT side weights are declared with "revision=None". - "vllm/model_executor/models/kimi_k25.py:111-114" - Kimi-K2.5 calls "cached_get_image_processor()" without passing "model_config.revision".
- "vllm/model_executor/models/kimi_audio.py:92-95" - Kimi-Audio loads Whisper config from the "whisper-large-v3" subfolder without a "revision" argument. - "vllm/model_executor/models/kimi_audio.py:425-430" - Kimi-Audio declares same-repository "whisper-large-v3" secondary weights with "revision=None". - "vllm/model_executor/model_loader/default_loader.py:287-301" - The default loader preserves "model_config.revision" for the primary source, then consumes model-supplied secondary sources as declared. The strongest example is Kimi-Audio: the primary "moonshotai/Kimi-Audio-7B-Instruct" weights preserve the configured model revision, but the same-repository "whisper-large-v3" audio tower config/weights do not. A pinned Kimi-Audio deployment can therefore load the Whisper subfolder outside the audited revision. This report does not claim a "trust_remote_code=False" bypass, unauthenticated RCE, or real artifact compromise. The issue is improper propagation of explicit artifact pins across supported loader paths. Impact Affected users are operators who pin vLLM model deployments to a reviewed Hugging Face revision for safety review, provenance, rollback, or reproducibility. The impact is that the pin does not reliably describe the full set of artifacts vLLM serves. Even when the operator selects an audited revision, vLLM can resolve behavior-affecting secondary artifacts from the repository default branch or another mutable ref. Depending on the model path, the unpinned artifact can be dynamic model code, a GGUF file, an image processor, retrieval side weights, or the same-repository Kimi-Audio Whisper subfolder weights/config. This breaks the operational guarantee of a pinned deployment: "serve the exact artifact set I reviewed." A later change to an unpinned secondary artifact can alter model behavior without changing the operator's configured revision, making review, rollback, incident response, and audit records unreliable. 
Occurrences - "vllm/model_executor/models/kimi_k25.py" L111-L114 — Kimi-K2.5 loads its image processor with "cached_get_image_processor()" but does not pass "self.ctx.model_config.revision". The processor can therefore resolve from the default repository revision even when the model deployment is pinned. - "vllm/model_executor/models/kimi_audio.py" L425-L430 — Kimi-Audio declares same-repository "whisper-large-v3" secondary weights with "revision=None". A pinned Kimi-Audio deployment can therefore load the Whisper audio tower weights from an unpinned/default revision. - "vllm/model_executor/models/kimi_audio.py" L92-L95 — Kimi-Audio loads Whisper config from the same repository's "whisper-large-v3" subfolder without passing the top-level model revision. The config for this behavior-affecting subcomponent can be resolved outside the audited model revision. - "vllm/model_executor/models/registry.py" L1058-L1064 — The later dynamic model-class resolution repeats the same pin-decay pattern: it forwards "revision" and "trust_remote_code", but omits "code_revision". This means an operator-provided code pin is not enforced at the dynamic module loader boundary. - "vllm/model_executor/model_loader/gguf_loader.py" L58-L60 — The direct GGUF form "repo/file.gguf" calls "hf_hub_download(repo_id=repo_id, filename=filename)" without passing "model_config.revision". A deployment that pins the model revision can therefore resolve this GGUF file from the repository default revision. - "vllm/model_executor/models/registry.py" L1045-L1051 — "try_get_class_from_dynamic_module()" is called for external "auto_map" config/model classes with "revision=model_config.revision", but without forwarding "model_config.code_revision". When "--code-revision" is set, this dynamic module resolution can still fall back to the default code revision instead of the audited code revision. 
- "vllm/model_executor/models/roberta.py" L203-L209 — "BgeM3EmbeddingModel" creates same-repository secondary sparse/ColBERT weight sources with "revision=None". The primary model revision is not propagated to these side weights, so they can be downloaded outside the operator-selected model revision. Fixes This was fixed in: vllm-project/vllm#42616 *** Originally filed via huntr: https://huntr.com/bounties/3f1e24c0-87d2-4f6c-a705-820f380879ac. The vLLM maintainer (Russell Bryant) redirected the report to the private GHSA channel. Offline proof bundle ("vllm_artifact_pin_decay_bundle_verify.py" + "bundle-verification-20260430T143506Z.json") is available upon request.
vllm - 0.22.0
🔷 Medium CVE-2026-53923 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
Summary: Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels ("csrc/quantization/gguf/gguf_kernel.cu") causes partial tensor processing. The output tensor is allocated at full size via "torch::empty" (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. Root Cause: The "to_cuda_ggml_t" function pointer type at "ggml-common.h:1067" declares its element count parameter as "int" (32-bit): using to_cuda_ggml_t = void (*)(const void * __restrict__ x, dst_t * __restrict__ y, int k /* 32-bit */, cudaStream_t stream); All dequantize kernel functions ("dequantize_block_cuda", "dequantize_row_q2_K_cuda", etc. in "dequantize.cuh") inherit this "int k" parameter and use it as the kernel launch grid size: static void dequantize_block_cuda(..., const int k, cudaStream_t stream) { const int num_blocks = (k + 2 * CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / (2 * CUDA_DEQUANTIZE_BLOCK_SIZE); dequantize_block<<<num_blocks, CUDA_DEQUANTIZE_BLOCK_SIZE, 0, stream>>>(vx, y, k); } In "ggml_dequantize()" at "gguf_kernel.cu:85", the caller passes "m * n" (an "int64_t" product) to this "int k" parameter: at::Tensor DW = torch::empty({m, n}, options); /* line 80: full-size, UNINITIALIZED */ ... to_cuda((void*)W.data_ptr(), (scalar_t*)DW.data_ptr(), m * n, stream); /* line 85: m * n truncated to int */ When "m * n > INT_MAX", the truncated "k" is smaller than the actual tensor size. The kernel processes "k" elements. The remaining "(m * n) - k" elements in "DW" are never written and contain stale GPU memory.
This is a single root cause -- the "int" type on the "k" parameter in "to_cuda_ggml_t" -- with a single fix: change "int k" to "int64_t k". All dequantize functions inherit this type through the same typedef. Affected Functions All in "csrc/quantization/gguf/gguf_kernel.cu":
Function
🔷 Medium CVE-2026-54233 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
Summary: vLLM's "/v1/audio/transcriptions" endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0. Details: "SpeechToTextProcessor" rejects uploads over "VLLM_MAX_AUDIO_CLIP_FILESIZE_MB" (default 25MB) based on compressed byte length, but the audio decoder in "audio.py" accumulates all decoded frames into memory with no size limit before returning: speech_to_text.py L184-189 if len(audio_data) / 1024 ** 2 > self.max_audio_filesize_mb: raise VLLMValidationError(...) y, sr = load_audio(buf, sr=self.asr_config.sample_rate) # decoded size unchecked audio.py L77-107 chunks: list[npt.NDArray] = [] for frame in container.decode(stream): chunks.append(frame.to_ndarray()) audio = np.concatenate(chunks, axis=-1).astype(np.float32) # single contiguous allocation A 25MB OPUS file at 6kbps encodes ~8.7 hours of audio. Decoding produces ~5.7GB of float32 PCM (232x amplification), and "np.concatenate" then allocates a second contiguous array, bringing peak RSS to ~14.9GB from a single request. "SpeechToTextConfig.max_audio_clip_s" (default 30s) applies only after the full decode and does not prevent the allocation. Impact: An unauthenticated attacker can exhaust server memory with a small number of concurrent requests, each a valid upload within the documented size limit. Severity was assessed with reference to prior OOM vulnerability reports in vLLM. Fix: A fix for this vulnerability was merged here: vllm-project/vllm#44970
Not Available
🔷 Medium CVE-2026-44222 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. This vulnerability is fixed in 0.20.0.
Upgrade to version vllm - 0.20.0
🔷 Medium CVE-2026-54235 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
Summary: All temperature validation gates use comparison operators ("<", ">"), which silently evaluate to "False" for "NaN" and for positive "Infinity" in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. Note: "-Infinity" is correctly caught. Root Cause: "sampling_params.py:384": if 0 < self.temperature < _MAX_TEMP: # NaN → False; +Inf → False "sampling_params.py:462": if self.temperature < 0.0: # NaN → False; +Inf → False raise VLLMValidationError(...) No "math.isnan()" or "math.isinf()" check exists anywhere in "sampling_params.py". Python semantics (verified): "float('nan') < 0.0" → "False", "float('inf') < 0.0" → "False". Impact: Crash of inference worker on GPU kernel execution with NaN/Inf softmax input, degrading service for all concurrent users. Remediation: Add "math.isfinite(self.temperature)" check in "_verify_args()". Reject non-finite float values with a 400 error. Fix: A fix for this vulnerability was merged here: vllm-project/vllm#45116
Not Available
🔷 Medium CVE-2026-44223 vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl
vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
Upgrade to version vllm - 0.20.0
🔷 Medium CVE-2026-1839 transformers-4.57.6-py3-none-any.whl
A vulnerability in the HuggingFace Transformers library, specifically in the "Trainer" class, allows for arbitrary code execution. The "_load_rng_state()" method in "src/transformers/trainer.py" at line 3059 calls "torch.load()" without the "weights_only=True" parameter. This issue affects all versions of the library supporting "torch>=2.2" when used with PyTorch versions below 2.6, as the "safe_globals()" context manager provides no protection in these versions. An attacker can exploit this vulnerability by supplying a malicious checkpoint file, such as "rng_state.pth", which can execute arbitrary code when loaded. The issue is resolved in version v5.0.0rc3.
Upgrade to version transformers - 5.0.0rc3,https://github.com/huggingface/transformers.git - v5.0.0rc3
🔷 Medium CVE-2026-34755 vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl
vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM. This vulnerability is fixed in 0.19.0.
Upgrade to version vllm - 0.19.0,https://github.com/vllm-project/vllm.git - v0.19.0
🔷 Medium CVE-2026-34756 vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl
vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0.
Upgrade to version vllm - 0.19.0,https://github.com/vllm-project/vllm.git - v0.19.0,vllm - 0.19.0
🔷 Medium CVE-2026-34753 vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl
vLLM is an inference and serving engine for large language models (LLMs). From 0.16.0 to before 0.19.0, a server-side request forgery (SSRF) vulnerability in download_bytes_from_url allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server, without any URL validation or domain restrictions.
This can be used to target internal services (e.g. cloud metadata endpoints or internal HTTP APIs) reachable from the vLLM host. This vulnerability is fixed in 0.19.0.
Upgrade to version vllm - 0.19.0,https://github.com/vllm-project/vllm.git - v0.18.1
🔷 Medium CVE-2026-7141 vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl
A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack remotely. The attack is considered to have high complexity. The exploitability is described as difficult. The exploit has been made public and could be used. The patch is named 1ad67864c0c20f167929e64c875f5c28e1aad9fd. To fix this issue, it is recommended to deploy a patch.
Upgrade to version vllm - 0.19.1,vllm - 0.19.1,https://github.com/vllm-project/vllm.git - v0.19.1
🔷 Medium CVE-2025-3000 torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl
A vulnerability classified as critical has been found in PyTorch 2.6.0. This affects the function torch.jit.script. The manipulation leads to memory corruption. It is possible to launch the attack on the local host. The exploit has been disclosed to the public and may be used.
Not Available
🔷 Medium CVE-2026-4538 torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl
A vulnerability was identified in PyTorch 2.10.0. The affected element is an unknown function of the component pt2 Loading Handler. The manipulation leads to deserialization. The attack can only be performed from a local environment. The exploit is publicly available and might be used. The project was informed of the problem early through a pull request but has not reacted yet.
Not Available
🔸 Low CVE-2025-63396 torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl
An issue was discovered in PyTorch v2.5 and v2.7.1. Omission of profiler.stop() can cause torch.profiler.profile (PythonTracer) to crash or hang during finalization, leading to a Denial of Service (DoS).
Not Available
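
For readers skimming the scan, the temperature finding (CVE-2026-54235) is easy to reproduce in plain Python. The sketch below illustrates the remediation named in that entry, a math.isfinite check before any range comparison. It is not vLLM's code, and validate_temperature is a hypothetical helper.

```python
import math


def validate_temperature(temperature: float) -> float:
    """Reject NaN, infinities and negative values; return the value otherwise."""
    # Comparisons with NaN are always False, so a plain "< 0.0" guard
    # lets NaN through. Check finiteness explicitly first.
    if not math.isfinite(temperature):
        raise ValueError(f"temperature must be finite, got {temperature!r}")
    if temperature < 0.0:
        raise ValueError(f"temperature must be non-negative, got {temperature!r}")
    return temperature


# Why the comparison-only guard is insufficient:
nan_passes_guard = not (float("nan") < 0.0)
inf_passes_guard = not (float("inf") < 0.0)
```

Both flags evaluate to True, which is the behaviour the advisory describes.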
Comment thread src/algorithm_nexus/commands/ado_validator.py Outdated
Comment thread src/algorithm_nexus/commands/benchmark_manager.py Outdated
Comment thread src/algorithm_nexus/commands/benchmark_manager.py Outdated
Comment thread src/algorithm_nexus/commands/validate.py
Comment thread src/algorithm_nexus/commands/benchmark_manager.py Outdated
Comment thread src/algorithm_nexus/commands/benchmark_manager.py Outdated
Comment thread src/algorithm_nexus/commands/validate.py Outdated
Comment thread tests/test_validate_benchmarks.py
Comment thread src/algorithm_nexus/commands/benchmark_manager.py Outdated
Comment thread src/algorithm_nexus/commands/benchmark_manager.py Outdated
@christian-pinto christian-pinto changed the title cli(feat): Add nexus validate benchmarks command Jun 17, 2026

@AlessandroPomponio AlessandroPomponio left a comment

Collaborator

LGTM thanks

@christian-pinto christian-pinto merged commit 51c247c into main Jun 18, 2026
11 checks passed
@christian-pinto christian-pinto deleted the cp_benchmark_validation branch June 18, 2026 08:20

Labels

ci Enable CI integration

3 participants