
Fix Base58 dropping leading zero bytes #44

Open
gaoflow wants to merge 1 commit into dhondta:main from gaoflow:fix-base58-leading-zeros


gaoflow commented Jun 15, 2026


Problem

Base58 (and the other big-integer base codecs) silently drop leading null bytes:

import codext
codext.encode(b"\x00abc", "base58")  # 'ZiCa'  -> should be '1ZiCa'
codext.encode(b"\x00",    "base58")  # ''      -> should be '1'

base_encode/base_decode in src/codext/base/_base.py convert the whole input to a single integer (s2i) and back via divmod, so leading 0x00 bytes (high-order zeros) vanish. Per the Base58 specification the codec cites, and in reference implementations such as the base58 PyPI library and Bitcoin Core, each leading 0x00 byte must map to a leading charset[0] character ('1' for the bitcoin alphabet). The bug also breaks round-tripping for any value that begins with a null byte.
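
To make the failure mode concrete, here is a minimal standalone sketch of the big-integer approach. It is not the codext source: `naive_base58_encode` and `BITCOIN_ALPHABET` are hypothetical names used only for illustration, and the sketch assumes the standard bitcoin alphabet.

```python
# Standard bitcoin Base58 alphabet (no 0, O, I, l).
BITCOIN_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def naive_base58_encode(data: bytes) -> str:
    """Hypothetical helper: encode via a single big integer and divmod.

    Leading 0x00 bytes contribute nothing to the integer, so they are lost.
    """
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = BITCOIN_ALPHABET[rem] + out
    return out


# The leading null byte does not change the integer value...
assert int.from_bytes(b"\x00abc", "big") == int.from_bytes(b"abc", "big")

# ...so both inputs encode identically, and a lone null byte encodes to nothing.
print(repr(naive_base58_encode(b"\x00abc")))  # 'ZiCa'
print(repr(naive_base58_encode(b"\x00")))     # ''
```

Because the integer carries no record of how many zero bytes preceded it, the count has to be tracked separately from the divmod loop.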

Fix

Preserve the leading-zero count on encode (prepend one charset[0] per leading \x00) and restore it on decode (prepend one \x00 per leading charset[0]). Both changes apply only to the byte-input path, so the integer recode used internally is untouched.

codext.encode(b"\x00abc", "base58")        # '1ZiCa'
codext.decode("1ZiCa", "base58")           # b'\x00abc'
codext.encode(b"\x01\x00", "base58")       # '5R'  (internal/trailing zeros unaffected)
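
The approach described above can be sketched in standalone form as follows. This is an illustration under stated assumptions, not the patched codext code: `b58encode`, `b58decode`, and `ALPHABET` are hypothetical names, and the sketch handles only the bitcoin alphabet on bytes input.

```python
# Standard bitcoin Base58 alphabet; ALPHABET[0] is '1'.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Hypothetical helper: Base58-encode, keeping leading zero bytes."""
    stripped = data.lstrip(b"\x00")
    n_zeros = len(data) - len(stripped)  # number of leading 0x00 bytes
    n = int.from_bytes(stripped, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # One ALPHABET[0] character per leading zero byte, then the integer digits.
    return ALPHABET[0] * n_zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    """Hypothetical helper: Base58-decode, restoring leading zero bytes."""
    stripped = text.lstrip(ALPHABET[0])
    n_zeros = len(text) - len(stripped)  # number of leading '1' characters
    n = 0
    for ch in stripped:
        n = n * 58 + ALPHABET.index(ch)  # raises ValueError on invalid chars
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # One 0x00 byte per leading ALPHABET[0] character, then the integer bytes.
    return b"\x00" * n_zeros + body


for sample in (b"\x00abc", b"\x00", b"\x00\x00\x01", b"\x01\x00", b""):
    assert b58decode(b58encode(sample)) == sample
```

Only zeros at the start of the input are counted; zeros elsewhere are already represented in the integer, which is why b"\x01\x00" still encodes to '5R'.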

Verified against the base58 reference library: 0 mismatches and 0 round-trip failures across random inputs (every leading-zero input failed before).

Test

Extended test_codec_base58 in tests/test_base.py with leading-null-byte encode/decode/round-trip assertions (str and bytes paths). Verified red→green: the test fails without the source change (AssertionError) and passes with it; the full test suite stays green (103 passed).


Disclosure: I use AI assistance (under my direction) for my contributions; I review and verify every change before submitting.

The generic base_encode/base_decode convert the whole input to a single
integer, so leading null bytes (high-order zeros) were silently lost: e.g.
Base58 encoded b'\x00abc' to 'ZiCa' instead of '1ZiCa', and b'\x00' to an
empty string. Per the Base58 spec each leading 0x00 byte maps to a leading
charset[0] character. Preserve the leading-zero count on encode and restore
it on decode, so values round-trip and match reference implementations.
