🐛 Bug Description
Two related defects in qlib.workflow, both rooted in latent assumptions about what the caller's current working directory looks like.
(1) MLflowExpManager.get_or_create_exp builds its FileLock path relative to CWD (qlib/workflow/expm.py:236). The construction is os.path.join(pr.netloc, pr.path.lstrip("/"), "filelock") after urlparse-ing the experiment URI. For any absolute file:// URI — including the default URI that qlib.config.QSettings constructs as "file:" + str(Path(os.getcwd()).resolve() / "mlruns") — urlparse returns netloc="" and path="/absolute/...", the lstrip("/") then turns the absolute path into a relative one, and os.path.join("", "absolute/...", "filelock") yields a CWD-relative path. FileLock.__enter__ mkdirs the lock's parent under whatever directory the process happens to be in, not under the URI's actual location. The intended file://... location does get populated by mlflow itself, but the qlib-side FileLock is now pointing at the wrong path, so the lock no longer serializes runs that share the URI from different working directories.
(2) MLflowRecorder._log_uncommitted_code misbehaves when CWD is not a git work tree (qlib/workflow/recorder.py:362). The natural workaround for (1) — wrapping qlib.init in tempfile.TemporaryDirectory() + contextlib.chdir(tmp) so the misplaced scaffolding lands in a throwaway tree — immediately surfaces this second weakness, which is also reachable on its own from any non-git CWD (containers, CI sandboxes, processes started from /tmp).
The git-controlled CWD that _log_uncommitted_code reaches for is not load-bearing: nothing about the experiment's correctness depends on a git repo being present. The feature is opportunistic — capture uncommitted diff as an mlflow side-artifact so a future reader can reconstruct what code produced the run. The author already anticipated git might not be available (hence the except subprocess.CalledProcessError), but the missing-repo case is handled badly along two independent axes:
shell=True with no stderr= capture leaks git's diagnostics to the caller's terminal. subprocess.check_output(..., shell=True) only captures stdout; stderr inherits the parent's fd 2. In a non-repo, git diff emits a multi-line usage: git diff --no-index ... banner straight to the caller's stderr, bypassing qlib's logger entirely. Any downstream consumer with a structured-stderr contract (JSONL logger, CI log parser, anything piping qlib's output through jq) gets corrupted output it cannot suppress via qlib's log-level controls.
- No pre-check that CWD is a git repo means three failing subprocess calls and three INFO records every
R.start(). Each command's CalledProcessError is caught and re-logged as INFO - "Fail to log the uncommitted code of $CWD(...) when run git ...". The wording suggests the user did something wrong, when in many legitimate cases the absence of a git repo is intentional.
Fixing (1) eliminates the most common path into (2) (downstream users no longer need the chdir(tempdir) workaround), but (2) is reachable on its own, so it's worth addressing in the same area.
To Reproduce
Primary bug — FileLock at CWD-relative path
-
From any directory other than your intended mlruns parent (e.g. cd /tmp/somewhere-unrelated), run:
import qlib
qlib.init(
provider_uri="~/.qlib/qlib_data/cn_data",
exp_manager={
"class": "MLflowExpManager",
"module_path": "qlib.workflow.expm",
"kwargs": {
"uri": "file:///Users/<you>/some/abs/path/mlruns",
"default_exp_name": "Experiment",
},
},
)
from qlib.workflow import R
with R.start(experiment_name="repro"):
pass
-
ls the CWD afterwards: a new directory tree Users/<you>/some/abs/path/mlruns/ has been created next to where you were, containing filelock (and, depending on the run, mlflow scaffolding around it).
-
The path logic can also be verified without invoking qlib at all:
>>> from urllib.parse import urlparse
>>> import os
>>> pr = urlparse("file:///Users/me/proj/mlruns")
>>> pr.netloc, pr.path
('', '/Users/me/proj/mlruns')
>>> os.path.join(pr.netloc, pr.path.lstrip("/"), "filelock")
'Users/me/proj/mlruns/filelock' # relative!
>>> os.path.isabs(_)
False
The same happens for the single-slash file:/Users/me/proj/mlruns shape that QSettings.mlflow.uri produces by default.
Secondary bug — _log_uncommitted_code in a non-git CWD
-
From any non-git directory (e.g. a fresh tempfile.TemporaryDirectory(), or simply cd /tmp), run any qlib.init + R.start() pair, e.g.:
import qlib
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")
from qlib.workflow import R
with R.start(experiment_name="repro"):
pass
-
Observe two symptoms on every R.start(...):
- A multi-line
usage: git diff --no-index ... banner printed to the process's stderr, bypassing qlib's logger.
- Three
INFO - "Fail to log the uncommitted code of $CWD(...) when run git ..." records emitted via qlib's logger.
Expected Behavior
Primary bug. The FileLock path should resolve to the absolute location encoded in the file:// URI, regardless of the caller's current working directory.
Secondary bug. When CWD is not a git work tree, _log_uncommitted_code should skip silently — no stderr output, no per-R.start INFO log records. The feature is an optional reproducibility hook, not a precondition for running experiments, so it should degrade quietly when its environment isn't available.
Screenshot
Stderr banner emitted by the secondary bug on every R.start(...) when CWD is not a git repo:
warning: Not a git repository. Use --no-index to compare two paths outside a working tree
usage: git diff --no-index [<options>] <path> <path> [<pathspec>...]
Diff output format options
-p, --patch generate patch
-s, --no-patch suppress diff output
-u generate patch
-U, --unified[=<n>] generate diffs with <n> lines context
...
(Plus the corresponding three INFO records Fail to log the uncommitted code of $CWD(...) when run git diff|git status|git diff --cached written via qlib's logger.)
Environment
- Qlib version: 0.9.7 (both defects also present on
main HEAD at the time of writing — expm.py:236 and recorder.py:362 are unchanged in recent history)
- Python version: 3.12
- OS: macOS (Darwin) verified; the same logic fails identically on Linux (POSIX
urlparse behavior is the same; the secondary git-stderr leak is platform-independent)
- Commit number (optional, please provide it if you are using the dev version): d5379c5 (
main HEAD at time of report)
Additional Notes
Suggested fix for the primary bug
In qlib/workflow/expm.py:234–236, replace:
pr = urlparse(self.uri)
if pr.scheme == "file":
with FileLock(Path(os.path.join(pr.netloc, pr.path.lstrip("/"), "filelock"))):
with:
from urllib.request import url2pathname
pr = urlparse(self.uri)
if pr.scheme == "file":
lock_path = Path(url2pathname(pr.path)) / "filelock"
with FileLock(lock_path):
url2pathname is preferable to a plain Path(pr.path) because it also handles Windows drive-letter file:///C:/... URIs correctly, where the current lstrip("/") would mangle the drive separator.
Suggested fixes for the secondary bug, ordered least → most invasive
For reference, the offending code (qlib/workflow/recorder.py:362):
def _log_uncommitted_code(self):
# TODO: the sub-directories maybe git repos.
# So it will be better if we can walk the sub-directories and log the uncommitted changes.
for cmd, fname in [
("git diff", "code_diff.txt"),
("git status", "code_status.txt"),
("git diff --cached", "code_cached.txt"),
]:
try:
out = subprocess.check_output(cmd, shell=True)
self.client.log_text(self.id, out.decode(), fname)
except subprocess.CalledProcessError:
logger.info(f"Fail to log the uncommitted code of $CWD({os.getcwd()}) when run {cmd}.")
- Capture stderr.
subprocess.run(cmd, stdout=PIPE, stderr=PIPE, check=True) (or at minimum subprocess.check_output(..., stderr=subprocess.DEVNULL)). Fixes the worst symptom — the leak that bypasses the logger.
- Drop
shell=True. Pass ["git", "diff"] as a list. The commands here use no shell features, and removing shell=True eliminates a needless injection footgun.
- Pre-probe with
git rev-parse --is-inside-work-tree (or check (Path.cwd() / ".git").exists()) and skip silently when CWD isn't a git work tree. Avoids three failing invocations and three INFO records when the answer is obviously no.
- Demote the fallback log from INFO to DEBUG. INFO is the wrong level for "we tried an optional reproducibility hook and the env doesn't support it" — the user can't act on it.
- Expose a
QSettings.mlflow.log_uncommitted_code toggle. Users who deliberately run qlib in non-git environments can disable the feature cleanly, independent of the heuristics above.
Happy to send a PR for either or both fixes if a maintainer agrees on the direction.
🐛 Bug Description
Two related defects in
qlib.workflow, both rooted in latent assumptions about what the caller's current working directory looks like.(1)
MLflowExpManager.get_or_create_expbuilds itsFileLockpath relative to CWD (qlib/workflow/expm.py:236). The construction isos.path.join(pr.netloc, pr.path.lstrip("/"), "filelock")afterurlparse-ing the experiment URI. For any absolutefile://URI — including the default URI thatqlib.config.QSettingsconstructs as"file:" + str(Path(os.getcwd()).resolve() / "mlruns")—urlparsereturnsnetloc=""andpath="/absolute/...", thelstrip("/")then turns the absolute path into a relative one, andos.path.join("", "absolute/...", "filelock")yields a CWD-relative path.FileLock.__enter__mkdirs the lock's parent under whatever directory the process happens to be in, not under the URI's actual location. The intendedfile://...location does get populated by mlflow itself, but the qlib-sideFileLockis now pointing at the wrong path, so the lock no longer serializes runs that share the URI from different working directories.(2)
MLflowRecorder._log_uncommitted_codemisbehaves when CWD is not a git work tree (qlib/workflow/recorder.py:362). The natural workaround for (1) — wrappingqlib.initintempfile.TemporaryDirectory()+contextlib.chdir(tmp)so the misplaced scaffolding lands in a throwaway tree — immediately surfaces this second weakness, which is also reachable on its own from any non-git CWD (containers, CI sandboxes, processes started from/tmp).The git-controlled CWD that
_log_uncommitted_codereaches for is not load-bearing: nothing about the experiment's correctness depends on a git repo being present. The feature is opportunistic — capture uncommitted diff as an mlflow side-artifact so a future reader can reconstruct what code produced the run. The author already anticipated git might not be available (hence theexcept subprocess.CalledProcessError), but the missing-repo case is handled badly along two independent axes:shell=Truewith nostderr=capture leaks git's diagnostics to the caller's terminal.subprocess.check_output(..., shell=True)only captures stdout; stderr inherits the parent's fd 2. In a non-repo,git diffemits a multi-lineusage: git diff --no-index ...banner straight to the caller's stderr, bypassing qlib's logger entirely. Any downstream consumer with a structured-stderr contract (JSONL logger, CI log parser, anything piping qlib's output throughjq) gets corrupted output it cannot suppress via qlib's log-level controls.R.start(). Each command'sCalledProcessErroris caught and re-logged asINFO - "Fail to log the uncommitted code of $CWD(...) when run git ...". The wording suggests the user did something wrong, when in many legitimate cases the absence of a git repo is intentional.Fixing (1) eliminates the most common path into (2) (downstream users no longer need the
chdir(tempdir)workaround), but (2) is reachable on its own, so it's worth addressing in the same area.To Reproduce
Primary bug — FileLock at CWD-relative path
From any directory other than your intended
mlrunsparent (e.g.cd /tmp/somewhere-unrelated), run:lsthe CWD afterwards: a new directory treeUsers/<you>/some/abs/path/mlruns/has been created next to where you were, containingfilelock(and, depending on the run, mlflow scaffolding around it).The path logic can also be verified without invoking qlib at all:
The same happens for the single-slash
file:/Users/me/proj/mlrunsshape thatQSettings.mlflow.uriproduces by default.Secondary bug —
_log_uncommitted_codein a non-git CWDFrom any non-git directory (e.g. a fresh
tempfile.TemporaryDirectory(), or simplycd /tmp), run anyqlib.init+R.start()pair, e.g.:Observe two symptoms on every
R.start(...):usage: git diff --no-index ...banner printed to the process's stderr, bypassing qlib's logger.INFO - "Fail to log the uncommitted code of $CWD(...) when run git ..."records emitted via qlib's logger.Expected Behavior
Primary bug. The
FileLockpath should resolve to the absolute location encoded in thefile://URI, regardless of the caller's current working directory.Secondary bug. When CWD is not a git work tree,
_log_uncommitted_codeshould skip silently — no stderr output, no per-R.startINFO log records. The feature is an optional reproducibility hook, not a precondition for running experiments, so it should degrade quietly when its environment isn't available.Screenshot
Stderr banner emitted by the secondary bug on every
R.start(...)when CWD is not a git repo:(Plus the corresponding three
INFOrecordsFail to log the uncommitted code of $CWD(...) when run git diff|git status|git diff --cachedwritten via qlib's logger.)Environment
mainHEAD at the time of writing —expm.py:236andrecorder.py:362are unchanged in recent history)urlparsebehavior is the same; the secondary git-stderr leak is platform-independent)mainHEAD at time of report)Additional Notes
Suggested fix for the primary bug
In
qlib/workflow/expm.py:234–236, replace:with:
url2pathnameis preferable to a plainPath(pr.path)because it also handles Windows drive-letterfile:///C:/...URIs correctly, where the currentlstrip("/")would mangle the drive separator.Suggested fixes for the secondary bug, ordered least → most invasive
For reference, the offending code (
qlib/workflow/recorder.py:362):subprocess.run(cmd, stdout=PIPE, stderr=PIPE, check=True)(or at minimumsubprocess.check_output(..., stderr=subprocess.DEVNULL)). Fixes the worst symptom — the leak that bypasses the logger.shell=True. Pass["git", "diff"]as a list. The commands here use no shell features, and removingshell=Trueeliminates a needless injection footgun.git rev-parse --is-inside-work-tree(or check(Path.cwd() / ".git").exists()) and skip silently when CWD isn't a git work tree. Avoids three failing invocations and three INFO records when the answer is obviously no.QSettings.mlflow.log_uncommitted_codetoggle. Users who deliberately run qlib in non-git environments can disable the feature cleanly, independent of the heuristics above.Happy to send a PR for either or both fixes if a maintainer agrees on the direction.