fix: Route TOC-without-page-numbers documents to the correct strategy #285

Open
Me3sP wants to merge 1 commit into VectifyAI:main from Me3sP:fix/toc-no-page-numbers-routing

@Me3sP Me3sP commented May 20, 2026

Problem

tree_parser had only two dispatch branches: a TOC with page numbers, or everything else. A document with a printed table of contents that lists no page numbers fell into the else branch and was handled by process_no_toc, which regenerated the structure from scratch and ignored the existing TOC entirely.

As a result, process_toc_no_page_numbers was unreachable as a primary strategy. It only ever ran as a fallback from process_toc_with_page_numbers inside meta_processor.
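
To make the gap concrete, here is a minimal sketch of the two-branch dispatch described above. It is not the actual tree_parser code: the function name choose_mode_before_fix and its two parameters are hypothetical, and only the strategy names come from this PR.

```python
def choose_mode_before_fix(toc_content, page_numbers_given):
    """Two-branch dispatch as described in the Problem section.

    Hypothetical signature: the real tree_parser takes different inputs.
    """
    has_toc = bool(toc_content and toc_content.strip())
    if has_toc and page_numbers_given:
        return "process_toc_with_page_numbers"
    # Everything else lands here, including a TOC without page numbers.
    return "process_no_toc"


# Enumerate every combination of inputs and collect the strategies reached.
reachable = {
    choose_mode_before_fix(toc, pages)
    for toc in ("", "1. Intro\n2. Methods")
    for pages in (False, True)
}
```

Under these assumptions, no combination of inputs selects process_toc_no_page_numbers, which matches the observation that it could only run as a fallback.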

Fix

  • Add the missing tree_parser branch so a TOC with no page numbers is dispatched to process_toc_no_page_numbers directly, using the TOC instead of discarding it.
  • Forward start_index from meta_processor into process_toc_no_page_numbers. It previously relied on the default (1), which would index incorrectly when invoked for non-top-level nodes.

The existing fallback chain is preserved: process_toc_no_page_numbers still degrades to process_no_toc on low verification accuracy.
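
The routing and fallback behavior described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: choose_mode, run_with_fallback, the stub strategies, the verify callback, and the 0.6 threshold are all hypothetical. Only the three strategy names, the fallback order, and the forwarding of start_index are taken from the description.

```python
# Fallback order described in this PR (strategy names only).
FALLBACK = {
    "process_toc_with_page_numbers": "process_toc_no_page_numbers",
    "process_toc_no_page_numbers": "process_no_toc",
}


def choose_mode(toc_content, page_numbers_given):
    """Three-branch dispatch (hypothetical signature)."""
    has_toc = bool(toc_content and toc_content.strip())
    if has_toc and page_numbers_given:
        return "process_toc_with_page_numbers"
    if has_toc:
        # The branch this PR adds: use the TOC instead of discarding it.
        return "process_toc_no_page_numbers"
    return "process_no_toc"


def run_with_fallback(mode, strategies, verify, start_index=1, threshold=0.6):
    """Run `mode`, degrading along FALLBACK while accuracy is too low.

    start_index is forwarded to every strategy rather than left at its
    default. The threshold value is an assumption for illustration.
    """
    while True:
        result = strategies[mode](start_index=start_index)
        if verify(result) >= threshold or mode not in FALLBACK:
            return mode, result
        mode = FALLBACK[mode]


# Stub strategies that only record how they were called.
calls = []


def make_stub(name):
    def stub(start_index=1):
        calls.append((name, start_index))
        return name
    return stub


strategies = {
    name: make_stub(name)
    for name in (
        "process_toc_with_page_numbers",
        "process_toc_no_page_numbers",
        "process_no_toc",
    )
}

mode = choose_mode("1. Intro\n2. Methods", page_numbers_given=False)
# Pretend verification fails for the TOC-based result only.
final_mode, _ = run_with_fallback(
    mode,
    strategies,
    verify=lambda r: 0.0 if r == "process_toc_no_page_numbers" else 1.0,
    start_index=5,
)
```

In this run the TOC-based strategy is tried first, fails the simulated verification, and degrades to process_no_toc, with start_index=5 reaching both calls.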

Impact

Additive only: the two existing branches behave as before for the documents they still receive. Documents that previously hit process_no_toc despite having a usable TOC now keep their authored structure.

tree_parser only had two branches: a TOC with page numbers, or
everything else. A document with a printed TOC that lists no page
numbers fell into the else branch and was processed with
process_no_toc, regenerating the structure from scratch and ignoring
the existing TOC entirely.

process_toc_no_page_numbers was therefore unreachable as a primary
strategy and only ran as a fallback from process_toc_with_page_numbers.

Add the missing branch so a TOC with no page numbers is dispatched to
process_toc_no_page_numbers directly. Also forward start_index from
meta_processor into process_toc_no_page_numbers, which previously
relied on the default and would index incorrectly for non-top-level
nodes.
@Me3sP Me3sP changed the title Route TOC-without-page-numbers documents to the correct strategy May 30, 2026