feat(cli): Add nexus validate benchmarks command #136

Merged

christian-pinto merged 25 commits into main from cp_benchmark_validation

Jun 18, 2026

christian-pinto commented Jun 12, 2026

Member

PR Summary: Add `nexus validate benchmarks` Command

Overview

This PR adds a `nexus validate benchmarks` command for validating benchmark instances and adds package existence validation. It also restructures the existing validation command to use subcommands.

Key Changes

1. New Command: `nexus validate benchmarks`

Added three validation modes for benchmark instances:

PR-based validation: Validates only benchmark instances modified in a specific GitHub PR
Package-specific validation: Validates all instances from a single package
Full validation: Validates all benchmark instances across all packages

Usage:

# Validate PR changes
nexus validate benchmarks --pr https://github.com/IBM/algorithm-nexus/pull/123

# Validate specific package
nexus validate benchmarks --package terratorch

# Validate all instances
nexus validate benchmarks
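The three modes amount to a choice between two optional flags. A minimal sketch with `argparse`, assuming `--pr` and `--package` cannot be combined (the description does not say) and using a hypothetical `validation_mode` helper; the actual CLI may be built on a different framework:

```python
import argparse


def build_benchmarks_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `nexus validate benchmarks` (illustration only)."""
    parser = argparse.ArgumentParser(prog="nexus validate benchmarks")
    # Assumption: the two scoping options are mutually exclusive.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--pr", help="URL of the GitHub PR whose instances to validate")
    scope.add_argument("--package", help="name of a single package to validate")
    return parser


def validation_mode(args: argparse.Namespace) -> str:
    """Map parsed options to one of the three modes described above."""
    if args.pr:
        return "pr"
    if args.package:
        return "package"
    return "full"  # no option given: validate every instance
```

With no options the sketch falls through to full validation, matching the third usage line above.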

2. Validation Process

The validation performs comprehensive checks for each benchmark instance using isolated environments:

One Virtual Environment Per Instance:

Creates a separate temporary virtual environment for each benchmark instance using uv
Ensures complete isolation between validations to prevent dependency conflicts
Automatically cleans up environments after validation (success or failure)

Multi-Stage Validation:

Syntax validation: Validates space.yaml structure and required fields
Dependency resolution: Resolves the benchmark package required by the instance from nexus.yaml
Installation testing: Attempts to install all dependencies in the isolated venv
ADO dry-run: Validates the instance can be created with ADO (if applicable)
Cleanup: Removes temporary environment regardless of outcome
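The create, install, and always-clean-up pattern can be sketched as follows. This is not the PR's `venv_manager.py`; the function name, the injectable `run` parameter, and the POSIX `bin/python` layout are assumptions made for illustration. Only `uv venv` and `uv pip install --python` are real uv commands.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise on a non-zero exit."""
    subprocess.run(cmd, check=True)


def install_in_isolated_venv(
    requirements: Sequence[str],
    run: Callable[[Sequence[str]], None] = _run,
) -> bool:
    """Install `requirements` into a throwaway venv; return True on success."""
    workdir = Path(tempfile.mkdtemp(prefix="nexus-validate-"))
    venv = workdir / ".venv"
    try:
        run(["uv", "venv", str(venv)])
        # Assumption: POSIX layout, interpreter lives at <venv>/bin/python.
        python = venv / "bin" / "python"
        run(["uv", "pip", "install", "--python", str(python), *requirements])
        return True
    except (subprocess.CalledProcessError, OSError):
        return False
    finally:
        # Cleanup runs on both success and failure.
        shutil.rmtree(workdir, ignore_errors=True)
```

Passing `run` as a parameter keeps the sketch testable without uv installed; a caller would normally rely on the default.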

3. Breaking Change: Command Restructure

IMPORTANT: The existing nexus validate command has been restructured into subcommands:

Before:

nexus validate <package_path>

After:

nexus validate package <package_path>

All existing package validation functionality remains unchanged, but now requires the package subcommand.
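To illustrate the effect of the restructure, here is a small `argparse` sketch of a nested subcommand layout. The parser below is hypothetical and only mirrors the command shapes shown above; it is not the project's actual CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical `nexus` parser with `validate` split into subcommands."""
    parser = argparse.ArgumentParser(prog="nexus")
    commands = parser.add_subparsers(dest="command", required=True)
    validate = commands.add_parser("validate")
    targets = validate.add_subparsers(dest="target", required=True)
    package = targets.add_parser("package")
    package.add_argument("package_path")
    targets.add_parser("benchmarks")
    return parser


def parses(argv: list[str]) -> bool:
    """Return True if argv is accepted, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:  # argparse exits on invalid input
        return False
```

In this layout `parses(["validate", "package", "pkgs/foo"])` succeeds, while the old form `parses(["validate", "pkgs/foo"])` is rejected, which is why existing scripts need updating.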

christian-pinto added 17 commits

June 9, 2026 11:05


          feat(cli): added run benchmarks command to execute benchmark instances on PR

eb6d19b

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): use base as model name when benchmark instances belong to the nexus package

dd3db38

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): Changes after review

eef5a80

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): Changes after review

83f4977

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): Added ado-core as dependency for the cli

d940955

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): Leftover from last commit

353932f

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): renamed --list-only to --dry-run

8369ba6

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): standardised use of pydantic models for the run benchmarks command

4afab27

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): changes after review

ffb527b

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): changes after review

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): changes after review

13bcede

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): updated lockfile

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          Merge branch 'main' of github.com:IBM/algorithm-nexus into cp_benchmark_validation

a508901


          feat(cli): Added nexus validate benchmarks command

b3c9d9e

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): A few fixes to the new benchmarks validate command

fc14fce

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): Another fix

93af595

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          feat(cli): Another fix

32fca81

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto added the ci label

christian-pinto requested review from AlessandroPomponio and michael-johnston

June 12, 2026 12:29

christian-pinto changed the title ~~Cp benchmark validation~~


          feat(cli): First review round

af7477d

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto commented Jun 16, 2026

Member Author

@AlessandroPomponio and/or @michael-johnston have a look when you get the chance

AlessandroPomponio reviewed

docs/getting-started/cli-reference.md

src/algorithm_nexus/commands/ado_validator.py

src/algorithm_nexus/commands/ado_validator.py Outdated

src/algorithm_nexus/commands/ado_validator.py Outdated

src/algorithm_nexus/commands/ado_validator.py

src/algorithm_nexus/commands/validate.py Outdated

src/algorithm_nexus/commands/venv_manager.py Outdated

src/algorithm_nexus/models.py Outdated


          feat(cli): First review round

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto requested review from AlessandroPomponio

June 16, 2026 15:28

DRL-NextGen commented Jun 17, 2026

Member

Checks Summary

Last run: 2026-06-18T07:52:46.996Z

Mend Unified Agent vulnerability scan found 20 vulnerabilities:

Severity	Identifier	Package	Details	Fix
❗ Critical	CVE-2025-69872	diskcache-5.6.3-py3-none-any.whl	DiskCache (python-diskcache) through 5.6.3 uses Python pickle for serialization by default. An attacker with write access to the cache directory can achieve arbitrary code execution when a victim application reads from the cache.	Not Available
❗ Critical	CVE-2026-48746	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	Summary A vulnerability in ASGI web servers and starlette's trust on those web servers enables an authentication bypass of the OpenAI API "AuthenticationMiddleware", which was discovered during @x41sec's source code audit. It allows the API to be used without providing the configured "VLLM_API_KEY" or "--api-key". Details In https://github.com/vllm-project/vllm/blob/v0.14.0/vllm/entrypoints/openai/api_server.py#L689-L692 the "url_path" is taken from the "URL", which is reconstructed by starlette based on the request "scope". from starlette.datastructures import URL, Headers, MutableHeaders, State ... url_path = URL(scope=scope).path.removeprefix(root_path) headers = Headers(scope=scope) if url_path.startswith("/v1") and not self.verify_token(headers): response = JSONResponse(content={"error": "Unauthorized"}, status_code=401) return response(scope, receive, send) return self.app(scope, receive, send) The request "scope" includes the request's "Host:" header and reconstructs the URL as shown below: f"{scheme}://{host_header}{path}" Neither starlette nor "any of the ASGI servers" (https://asgi.readthedocs.io/en/latest/implementations.html#servers) (including uvicorn, which vllm uses) properly filter the "Host:" header for invalid characters. This allows an attacker to include special URL characters such as "/" or "?" in the "Host:" header and thereby control the reconstructed URL and its ".path" attribute. FastAPI/starlette's routing uses the HTTP path and does not depend on the parsed url.path attribute, allowing attackers to reach an endpoint via a certain path while providing a different value in the ".path". Impact - Instances of vllm that use an API Key for the OpenAI API and expose the API to attackers. - Instances behind an RFC-conforming web server (such as nginx) are not affected.	vllm - 0.22.0
🔺 High	CVE-2026-41523	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	Summary An "assert"-based security check in vLLM's activation function loading allows any unauthenticated attacker to achieve arbitrary code execution on the server by publishing a malicious HuggingFace model, when vLLM runs in Python optimized mode ("python -O" or "PYTHONOPTIMIZE=1"). Details vLLM uses an "assert" statement at ""vllm/model_executor/layers/pooler/activations.py:48"" (https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/pooler/activations.py#L48) as its sole security control to restrict which activation functions can be loaded from a HuggingFace model's "config.json": vllm/model_executor/layers/pooler/activations.py:35-53 function_name: str | None = None if ( hasattr(config, "sentence_transformers") and "activation_fn" in config.sentence_transformers ): function_name = config.sentence_transformers["activation_fn"] elif ( hasattr(config, "sbert_ce_default_activation_function") and config.sbert_ce_default_activation_function is not None ): function_name = config.sbert_ce_default_activation_function if function_name is not None: assert function_name.startswith("torch.nn.modules."), ( "Loading of activation functions is restricted to " "torch.nn.modules for security reasons" ) fn = resolve_obj_by_qualname(function_name)() Python's "assert" statements are stripped at compile time when running in optimized mode ("python -O" or "PYTHONOPTIMIZE=1"). 
When the assert is absent, the attacker-controlled "function_name" from the model's "config.json" is passed directly to ""resolve_obj_by_qualname()"" (https://github.com/vllm-project/vllm/blob/main/vllm/utils/import_utils.py#L106) — an unrestricted import gadget: def resolve_obj_by_qualname(qualname: str) -> Any: module_name, obj_name = qualname.rsplit(".", 1) module = importlib.import_module(module_name) return getattr(module, obj_name) This is the same vulnerability class as CVE-2017-1000433 (pysaml2 assert-based auth bypass), flagged by Bandit B101 and Ruff S101, and the reason Django proactively replaced all assert-based security checks (ticket #32508). Attacker-controlled input sources: - "config.sentence_transformers["activation_fn"]" (line 40) - "config.sbert_ce_default_activation_function" (line 45) Affected call sites — "get_act_fn()" is called via "resolve_classifier_act_fn()" from: - "vllm/model_executor/layers/pooler/seqwise/poolers.py:122" — SequencePooler - "vllm/model_executor/layers/pooler/tokwise/poolers.py:130" — TokenPooler Broader systemic risk: "resolve_obj_by_qualname" is called from ~20 locations across the codebase with no validation of its own. Any future caller feeding user-controlled input to it without validation creates the same vulnerability class. Suggested fix: Replace the "assert" with an explicit conditional raise: if not function_name.startswith("torch.nn.modules."): raise ValueError( "Loading of activation functions is restricted to " "torch.nn.modules for security reasons" ) Impact Arbitrary code execution. A malicious model author publishes a HuggingFace model with a crafted "config.json". When a victim loads this model with vLLM running under "python -O" or "PYTHONOPTIMIZE=1", arbitrary code executes during model initialization with the privileges of the vLLM process. The attack requires: 1. Victim loads a malicious model from HuggingFace (user interaction) 2. 
vLLM runs under "python -O" or "PYTHONOPTIMIZE=1" (documented in production use) 3. Model uses a cross-encoder architecture (e.g. BERT or RoBERTa with sequence classification) Coordinated disclosure note: This vulnerability was also reported via huntr.com on April 2, 2026 (https://huntr.com/bounties/dcb05b04-e625-41e7-adbc-bbae0cc2d64c). A GitHub Security Advisory was also filed because it is vLLM's stated preferred disclosure channel per SECURITY.md. Fix A fix for this was introduced in this commit: vllm-project/vllm@`b3c7ffc`
🔺 High	CVE-2026-4372	transformers-4.57.6-py3-none-any.whl	A critical remote code execution vulnerability exists in all versions of the HuggingFace transformers library prior to version 5.3.0. The vulnerability allows an attacker to craft a malicious "config.json" file containing the "_attn_implementation_internal" field set to an attacker-controlled HuggingFace Hub repository ID. When a victim loads this model using the standard "AutoModelForCausalLM.from_pretrained()" API, the library downloads and executes arbitrary Python code from the attacker's repository with the victim's full OS privileges. This issue arises due to unfiltered deserialization of configuration attributes, insufficient sanitization of internal fields, and unsandboxed execution of downloaded kernels. The vulnerability bypasses the "trust_remote_code" security mechanism, is invisible to the victim, and exploits the standard documented usage pattern, making it particularly severe. Users are advised to upgrade to version 5.3.0 or later to mitigate this issue.	Upgrade to version transformers - 5.3.0,https://github.com/huggingface/transformers.git - v5.3.0,transformers - 5.3.0
🔺 High	CVE-2026-5241	transformers-4.57.6-py3-none-any.whl	A vulnerability in the LightGlue model loading path of huggingface/transformers version 5.2.0 allows an attacker-controlled model repository to execute arbitrary code during model initialization. The issue arises because the "trust_remote_code" parameter, intended to prevent remote code execution, is overridden by untrusted serialized configuration data in a nested code path. Specifically, when loading a LightGlue model using "AutoModel.from_pretrained()" with "trust_remote_code=False", the "LightGlueConfig" reads the "trust_remote_code" value from the untrusted "config.json" file and propagates it into nested "AutoConfig.from_pretrained()" calls. This results in the execution of attacker-provided Python modules, even when the victim explicitly disables remote code execution. The vulnerability poses a high risk for environments such as API inference servers, research notebooks, CI/CD pipelines, and model evaluation workers, potentially leading to credential theft, lateral movement, or persistence/backdoor deployment.	Upgrade to version transformers - 5.5.0,transformers - 5.5.0,https://github.com/huggingface/transformers.git - v5.5.0
🔺 High	CVE-2025-14920	transformers-4.57.6-py3-none-any.whl	Hugging Face Transformers Perceiver Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file. The specific flaw exists within the parsing of model files. The issue results from the lack of proper validation of user-supplied data, which can result in deserialization of untrusted data. An attacker can leverage this vulnerability to execute code in the context of the current user. Was ZDI-CAN-25423.	Not Available
🔷 Medium	CVE-2026-47155	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	Summary vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a mod... Summary vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a model. A deployment that supplies "--revision" or "--code-revision" can still load dynamic code, GGUF files, image processors, retrieval side weights, or same-repository subfolder weights/config from an unpinned/default revision. This is a supply-chain integrity issue for pinned vLLM deployments. Operators can believe they are serving a reviewed model revision while vLLM resolves behavior-affecting nested or sibling artifacts outside that reviewed revision. Details The expected invariant is: «When a vLLM operator supplies a model or code revision pin, every code, config, processor, weight file, side weight, and same-repository subfolder artifact loaded as part of that model should resolve under that pin unless vLLM exposes and enforces a separate explicit pin for that artifact.» Current "main" was verified affected at commit "3795d7acf431980e62e738493f437ae2a51549da". Affected source boundaries: - "vllm/model_executor/models/registry.py:1045-1051" and ":1058-1064" - "_try_resolve_transformers()" passes "revision=model_config.revision" and "trust_remote_code=model_config.trust_remote_code", but omits "code_revision=model_config.code_revision" for external "auto_map" dynamic module imports. - "vllm/model_executor/model_loader/gguf_loader.py:58-60" - The direct-file GGUF form "repo/file.gguf" calls "hf_hub_download(repo_id=repo_id, filename=filename)" without passing "revision". - "vllm/model_executor/models/roberta.py:203-209" - BGE-M3 secondary sparse and ColBERT side weights are declared with "revision=None". - "vllm/model_executor/models/kimi_k25.py:111-114" - Kimi-K2.5 calls "cached_get_image_processor()" without passing "model_config.revision". 
- "vllm/model_executor/models/kimi_audio.py:92-95" - Kimi-Audio loads Whisper config from the "whisper-large-v3" subfolder without a "revision" argument. - "vllm/model_executor/models/kimi_audio.py:425-430" - Kimi-Audio declares same-repository "whisper-large-v3" secondary weights with "revision=None". - "vllm/model_executor/model_loader/default_loader.py:287-301" - The default loader preserves "model_config.revision" for the primary source, then consumes model-supplied secondary sources as declared. The strongest example is Kimi-Audio: the primary "moonshotai/Kimi-Audio-7B-Instruct" weights preserve the configured model revision, but the same-repository "whisper-large-v3" audio tower config/weights do not. A pinned Kimi-Audio deployment can therefore load the Whisper subfolder outside the audited revision. This report does not claim a "trust_remote_code=False" bypass, unauthenticated RCE, or real artifact compromise. The issue is improper propagation of explicit artifact pins across supported loader paths. Impact Affected users are operators who pin vLLM model deployments to a reviewed Hugging Face revision for safety review, provenance, rollback, or reproducibility. The impact is that the pin does not reliably describe the full set of artifacts vLLM serves. Even when the operator selects an audited revision, vLLM can resolve behavior-affecting secondary artifacts from the repository default branch or another mutable ref. Depending on the model path, the unpinned artifact can be dynamic model code, a GGUF file, an image processor, retrieval side weights, or the same-repository Kimi-Audio Whisper subfolder weights/config. This breaks the operational guarantee of a pinned deployment: "serve the exact artifact set I reviewed." A later change to an unpinned secondary artifact can alter model behavior without changing the operator's configured revision, making review, rollback, incident response, and audit records unreliable. 
Occurrences - "vllm/model_executor/models/kimi_k25.py" L111-L114 — Kimi-K2.5 loads its image processor with "cached_get_image_processor()" but does not pass "self.ctx.model_config.revision". The processor can therefore resolve from the default repository revision even when the model deployment is pinned. - "vllm/model_executor/models/kimi_audio.py" L425-L430 — Kimi-Audio declares same-repository "whisper-large-v3" secondary weights with "revision=None". A pinned Kimi-Audio deployment can therefore load the Whisper audio tower weights from an unpinned/default revision. - "vllm/model_executor/models/kimi_audio.py" L92-L95 — Kimi-Audio loads Whisper config from the same repository's "whisper-large-v3" subfolder without passing the top-level model revision. The config for this behavior-affecting subcomponent can be resolved outside the audited model revision. - "vllm/model_executor/models/registry.py" L1058-L1064 — The later dynamic model-class resolution repeats the same pin-decay pattern: it forwards "revision" and "trust_remote_code", but omits "code_revision". This means an operator-provided code pin is not enforced at the dynamic module loader boundary. - "vllm/model_executor/model_loader/gguf_loader.py" L58-L60 — The direct GGUF form "repo/file.gguf" calls "hf_hub_download(repo_id=repo_id, filename=filename)" without passing "model_config.revision". A deployment that pins the model revision can therefore resolve this GGUF file from the repository default revision. - "vllm/model_executor/models/registry.py" L1045-L1051 — "try_get_class_from_dynamic_module()" is called for external "auto_map" config/model classes with "revision=model_config.revision", but without forwarding "model_config.code_revision". When "--code-revision" is set, this dynamic module resolution can still fall back to the default code revision instead of the audited code revision. 
- "vllm/model_executor/models/roberta.py" L203-L209 — "BgeM3EmbeddingModel" creates same-repository secondary sparse/ColBERT weight sources with "revision=None". The primary model revision is not propagated to these side weights, so they can be downloaded outside the operator-selected model revision. Fixes This was fixed in: vllm-project/vllm#42616 *** Originally filed via huntr: https://huntr.com/bounties/3f1e24c0-87d2-4f6c-a705-820f380879ac. The vLLM maintainer (Russell Bryant) redirected the report to the private GHSA channel. Offline proof bundle ("vllm_artifact_pin_decay_bundle_verify.py" + "bundle-verification-20260430T143506Z.json") is available upon request.	vllm - 0.22.0
🔷 Medium	CVE-2026-53923	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	Summary Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels ("csrc/quantization/gguf/gguf_kernel.cu") causes partial tensor processing. The output tensor is allocated at full size via "torch::empty" (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. Root Cause The "to_cuda_ggml_t" function pointer type at "ggml-common.h:1067" declares its element count parameter as "int" (32-bit): using to_cuda_ggml_t = void (*)(const void * __restrict__ x, dst_t * __restrict__ y, int k, // 32-bit cudaStream_t stream); All dequantize kernel functions ("dequantize_block_cuda", "dequantize_row_q2_K_cuda", etc. in "dequantize.cuh") inherit this "int k" parameter and use it as the kernel launch grid size: static void dequantize_block_cuda(..., const int k, cudaStream_t stream) { const int num_blocks = (k + 2 * CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / (2 * CUDA_DEQUANTIZE_BLOCK_SIZE); dequantize_block<<<num_blocks, CUDA_DEQUANTIZE_BLOCK_SIZE, 0, stream>>>(vx, y, k); } In "ggml_dequantize()" at "gguf_kernel.cu:85", the caller passes "m * n" (an "int64_t" product) to this "int k" parameter: at::Tensor DW = torch::empty({m, n}, options); // line 80: full-size, UNINITIALIZED // ... to_cuda((void*)W.data_ptr(), (scalar_t*)DW.data_ptr(), m * n, stream); // line 85: m * n truncated to int When "m * n > INT_MAX", the truncated "k" is smaller than the actual tensor size. The kernel processes "k" elements. The remaining "(m * n) - k" elements in "DW" are never written and contain stale GPU memory. 
This is a single root cause -- the "int" type on the "k" parameter in "to_cuda_ggml_t" -- with a single fix: change "int k" to "int64_t k". All dequantize functions inherit this type through the same typedef. Affected Functions All in "csrc/quantization/gguf/gguf_kernel.cu":	Function
🔷 Medium	CVE-2026-54233	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	Summary vLLM's "/v1/audio/transcriptions" endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0. Details "SpeechToTextProcessor" rejects uploads over "VLLM_MAX_AUDIO_CLIP_FILESIZE_MB" (default 25MB) based on compressed byte length, but the audio decoder in "audio.py" accumulates all decoded frames into memory with no size limit before returning: speech_to_text.py L184-189 if len(audio_data) / 1024 ** 2 > self.max_audio_filesize_mb: raise VLLMValidationError(...) y, sr = load_audio(buf, sr=self.asr_config.sample_rate) # decoded size unchecked audio.py L77-107 chunks: list[npt.NDArray] = [] for frame in container.decode(stream): chunks.append(frame.to_ndarray()) audio = np.concatenate(chunks, axis=-1).astype(np.float32) # single contiguous allocation A 25MB OPUS file at 6kbps encodes ~8.7 hours of audio. Decoding produces ~5.7GB of float32 PCM (232x amplification), and "np.concatenate" then allocates a second contiguous array, bringing peak RSS to ~14.9GB from a single request. "SpeechToTextConfig.max_audio_clip_s" (default 30s) applies only after the full decode and does not prevent the allocation. Impact An unauthenticated attacker can exhaust server memory with a small number of concurrent requests, each a valid upload within the documented size limit. Severity was assessed with reference to prior OOM vulnerability reports in vLLM. Fix A fix for this vulnerability was merged here: vllm-project/vllm#44970	Not Available
🔷 Medium	CVE-2026-44222	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. This vulnerability is fixed in 0.20.0.	Upgrade to version vllm - 0.20.0
🔷 Medium	CVE-2026-54235	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	Summary All temperature validation gates use comparison operators ("<", ">"), which silently evaluate to "False" for "NaN" and for positive "Infinity" in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. Note: "-Infinity" is correctly caught. Root Cause "sampling_params.py:384": if 0 < self.temperature < _MAX_TEMP: # NaN → False; +Inf → False "sampling_params.py:462": if self.temperature < 0.0: # NaN → False; +Inf → False raise VLLMValidationError(...) No "math.isnan()" or "math.isinf()" check exists anywhere in "sampling_params.py". Python semantics (verified): "float('nan') < 0.0" → "False", "float('inf') < 0.0" → "False". Impact Crash of inference worker on GPU kernel execution with NaN/Inf softmax input, degrading service for all concurrent users. Remediation Add "math.isfinite(self.temperature)" check in "_verify_args()". Reject non-finite float values with a 400 error. Fix A fix for this vulnerability was merged here: vllm-project/vllm#45116	Not Available
🔷 Medium	CVE-2026-44223	vllm-0.19.1-cp38-abi3-manylinux_2_31_x86_64.whl	vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.	Upgrade to version vllm - 0.20.0
🔷 Medium	CVE-2026-1839	transformers-4.57.6-py3-none-any.whl	A vulnerability in the HuggingFace Transformers library, specifically in the "Trainer" class, allows for arbitrary code execution. The "_load_rng_state()" method in "src/transformers/trainer.py" at line 3059 calls "torch.load()" without the "weights_only=True" parameter. This issue affects all versions of the library supporting "torch>=2.2" when used with PyTorch versions below 2.6, as the "safe_globals()" context manager provides no protection in these versions. An attacker can exploit this vulnerability by supplying a malicious checkpoint file, such as "rng_state.pth", which can execute arbitrary code when loaded. The issue is resolved in version v5.0.0rc3.	Upgrade to version transformers - 5.0.0rc3,https://github.com/huggingface/transformers.git - v5.0.0rc3
🔷 Medium	CVE-2026-34755	vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl	vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM. This vulnerability is fixed in 0.19.0.	Upgrade to version vllm - 0.19.0,https://github.com/vllm-project/vllm.git - v0.19.0
🔷 Medium	CVE-2026-34756	vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl	vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0.	Upgrade to version vllm - 0.19.0,https://github.com/vllm-project/vllm.git - v0.19.0,vllm - 0.19.0
🔷 Medium	CVE-2026-34753	vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl	vLLM is an inference and serving engine for large language models (LLMs). From 0.16.0 to before 0.19.0, a server-side request forgery (SSRF) vulnerability in download_bytes_from_url allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server, without any URL validation or domain restrictions. This can be used to target internal services (e.g. cloud metadata endpoints or internal HTTP APIs) reachable from the vLLM host. This vulnerability is fixed in 0.19.0.	Upgrade to version vllm - 0.19.0,https://github.com/vllm-project/vllm.git - v0.18.1
🔷 Medium	CVE-2026-7141	vllm-0.18.0-cp38-abi3-manylinux_2_31_x86_64.whl	A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layer... A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack remotely. The attack is considered to have high complexity. The exploitability is described as difficult. The exploit has been made public and could be used. The patch is named 1ad67864c0c20f167929e64c875f5c28e1aad9fd. To fix this issue, it is recommended to deploy a patch.	Upgrade to version vllm - 0.19.1,vllm - 0.19.1,https://github.com/vllm-project/vllm.git - v0.19.1
🔷 Medium	CVE-2025-3000	torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl	A vulnerability classified as critical has been found in PyTorch 2.6.0. This affects the function to... A vulnerability classified as critical has been found in PyTorch 2.6.0. This affects the function torch.jit.script. The manipulation leads to memory corruption. It is possible to launch the attack on the local host. The exploit has been disclosed to the public and may be used.	Not Available
🔷 Medium	CVE-2026-4538	torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl	A vulnerability was identified in PyTorch 2.10.0. The affected element is an unknown function of the... A vulnerability was identified in PyTorch 2.10.0. The affected element is an unknown function of the component pt2 Loading Handler. The manipulation leads to deserialization. The attack can only be performed from a local environment. The exploit is publicly available and might be used. The project was informed of the problem early through a pull request but has not reacted yet.	Not Available
🔸 Low	CVE-2025-63396	torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl	An issue was discovered in PyTorch v2.5 and v2.7.1. Omission of profiler.stop() can cause torch.prof... An issue was discovered in PyTorch v2.5 and v2.7.1. Omission of profiler.stop() can cause torch.profiler.profile (PythonTracer) to crash or hang during finalization, leading to a Denial of Service (DoS).	Not Available
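
As a rough illustration of how the vllm rows above could be acted on, the sketch below works out the lowest vllm release that clears every listed vllm finding and which findings remain open for a given installed version. It is only a sketch under assumptions: the helper names (`parse_version`, `minimum_clean_version`, `open_findings`) are hypothetical and not part of the scanner or of `nexus`, the fixed-in versions are copied by hand from the rows above, and version strings are assumed to be plain dotted integers. The torch rows are left out because no fix version is listed for them.

```python
# Fixed-in versions copied by hand from the vllm rows above (assumption:
# plain dotted integers, no pre-release or local version suffixes).
VLLM_FIXED_IN = {
    "CVE-2026-34755": "0.19.0",
    "CVE-2026-34756": "0.19.0",
    "CVE-2026-34753": "0.19.0",
    "CVE-2026-7141": "0.19.1",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted release string such as '0.19.1' into (0, 19, 1)."""
    return tuple(int(part) for part in text.split("."))


def minimum_clean_version(fixed_in: dict[str, str]) -> str:
    """Return the lowest version that is at or above every listed fix."""
    return max(fixed_in.values(), key=parse_version)


def open_findings(installed: str, fixed_in: dict[str, str]) -> list[str]:
    """List the CVE ids whose fix version is newer than the installed one."""
    current = parse_version(installed)
    return sorted(
        cve for cve, fix in fixed_in.items() if parse_version(fix) > current
    )


if __name__ == "__main__":
    # vllm 0.18.0 is the wheel named in the rows above.
    print(minimum_clean_version(VLLM_FIXED_IN))
    print(open_findings("0.18.0", VLLM_FIXED_IN))
```

For real dependency metadata, a dedicated version parser would be a safer choice than splitting on dots, since release strings can carry suffixes this sketch does not handle.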


          Merge branch 'main' into cp_benchmark_validation

18b1625

AlessandroPomponio requested changes

src/algorithm_nexus/commands/ado_validator.py Outdated

src/algorithm_nexus/commands/benchmark_manager.py Outdated

src/algorithm_nexus/commands/benchmark_manager.py Outdated

src/algorithm_nexus/commands/validate.py

src/algorithm_nexus/commands/benchmark_manager.py Outdated

src/algorithm_nexus/commands/benchmark_manager.py Outdated

src/algorithm_nexus/commands/validate.py Outdated

tests/test_validate_benchmarks.py

src/algorithm_nexus/commands/benchmark_manager.py Outdated

src/algorithm_nexus/commands/benchmark_manager.py Outdated


          feat(cli): Third review round

f71ac1e

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto changed the title ~~cli(feat): Add nexus validate benchmarks command~~ feat(cli): Add nexus validate benchmarks command

christian-pinto added 3 commits

June 17, 2026 14:15


          feat(cli): Fourth review round

6cb9317

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>


          Merge branch 'main' of github.com:IBM/algorithm-nexus into cp_benchmark_validation

d088ae3


          feat(cli): Updated docs after merge with master

e836d97

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

AlessandroPomponio previously approved these changes

AlessandroPomponio left a comment

Collaborator

LGTM thanks


          feat(cli): Updated lockfile

6f855e6

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto dismissed AlessandroPomponio’s stale review via

6f855e6

June 18, 2026 07:05

AlessandroPomponio approved these changes

christian-pinto merged commit 51c247c into main

11 checks passed

christian-pinto deleted the cp_benchmark_validation branch

June 18, 2026 08:20
