PsychQuant/FastOCR

FastOCR

FastOCR converts a PDF to math-aware Markdown via GLM-OCR (MLX or Ollama) and ships the Python/R harness that drives the article's factorial ANOVA. The CLI is a thin Swift wrapper over the OCRCore (ocr-swift) backends and PDFToLaTeXCore (PageRenderer).

Build

The MLX backends need a compiled Metal shader library (mlx.metallib) next to the binary — swift build alone does NOT produce it. Use the Makefile:

make release   # swift build -c release + build-metallib.sh → .build/release/mlx.metallib

make release requires the Metal toolchain, which can be installed with xcodebuild -downloadComponent MetalToolchain. A bare swift build -c release still builds the binary, but the MLX backends then fail at runtime with "Failed to load the default metallib".
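
Because a missing mlx.metallib only shows up at runtime, a harness script could check for it before starting a run. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the binary is named fastocr and lives in .build/release next to mlx.metallib.

```python
from pathlib import Path


def metallib_present(binary: Path) -> bool:
    """Return True if mlx.metallib sits in the same directory as the binary."""
    return (binary.parent / "mlx.metallib").is_file()


def check_release_build(build_dir: Path = Path(".build/release")) -> str:
    """Classify the state of a release build directory.

    Assumption: the executable is called "fastocr"; adjust if the
    Swift package names its product differently.
    """
    binary = build_dir / "fastocr"
    if not binary.is_file():
        return "missing-binary"      # neither swift build nor make release has run
    if not metallib_present(binary):
        return "missing-metallib"    # typical result of a bare swift build
    return "ok"


if __name__ == "__main__":
    print(check_release_build())
```

A result of "missing-metallib" suggests the binary came from a bare swift build and that make release should be run instead.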

Usage

fastocr <pdf> <out_dir> --backend mlx|ollama --quant 4|8 --dpi 150 --pages 1-3
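
To call the CLI from Python, the usage line above can be turned into an argument list. This is a hedged sketch: build_fastocr_args is a hypothetical helper, it uses only the flags shown in the usage line, and it assumes the optional flags may be left out. How the CLI treats --quant with the ollama backend is not stated here, so the helper passes it through unchanged.

```python
from typing import List, Optional


def build_fastocr_args(
    pdf: str,
    out_dir: str,
    backend: str,
    quant: Optional[int] = None,
    dpi: Optional[int] = None,
    pages: Optional[str] = None,
    exe: str = "fastocr",  # assumed executable name or path
) -> List[str]:
    """Assemble an argv list that mirrors the documented usage line."""
    if backend not in ("mlx", "ollama"):
        raise ValueError(f"unknown backend: {backend!r}")
    args = [exe, pdf, out_dir, "--backend", backend]
    if quant is not None:
        if quant not in (4, 8):
            raise ValueError(f"quant must be 4 or 8, got {quant!r}")
        args += ["--quant", str(quant)]
    if dpi is not None:
        args += ["--dpi", str(dpi)]
    if pages is not None:
        args += ["--pages", pages]  # e.g. "1-3", as in the usage line
    return args


if __name__ == "__main__":
    print(build_fastocr_args("paper.pdf", "out", "ollama", dpi=150, pages="1-3"))
```

The resulting list can be handed to subprocess.run(args, check=True), which avoids shell quoting problems with PDF paths that contain spaces.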

Backends map to the article's factor A: mlx --quant 4 = A1, mlx --quant 8 = A2, ollama = A3.
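
The mapping above is small enough to encode directly, which keeps analysis labels consistent with CLI arguments. A minimal sketch follows; factor_level is a hypothetical name, not a function from the harness.

```python
from typing import Optional


def factor_level(backend: str, quant: Optional[int] = None) -> str:
    """Map a backend/quantization pair to its level of factor A."""
    if backend == "ollama":
        return "A3"  # the mapping gives ollama a single level
    if backend == "mlx":
        if quant == 4:
            return "A1"
        if quant == 8:
            return "A2"
        raise ValueError(f"mlx requires quant 4 or 8, got {quant!r}")
    raise ValueError(f"unknown backend: {backend!r}")


if __name__ == "__main__":
    for backend, quant in [("mlx", 4), ("mlx", 8), ("ollama", None)]:
        print(backend, quant, factor_level(backend, quant))
```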

Status: --backend ollama (A3) is the only path that currently runs end-to-end. The MLX backends (A1/A2) are blocked by an upstream mlx-swift-lm regression — see KNOWN-ISSUES.md #4.

Harness

  • scripts/run_experiment.py — 3-way sweep (Backend × DPI × corpus) → data/raw/results.tsv
  • scripts/autotune.py — phase-1 pilot runner
  • scripts/analyze.R — mixed-effects ANOVA + Pareto frontier
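
As an illustration of what a full Backend × DPI × corpus sweep involves, the cells of a crossed design can be enumerated with itertools.product. This is a sketch only and does not reproduce scripts/run_experiment.py: the DPI values and corpus names below are placeholders, and the dictionary keys are hypothetical rather than the actual columns of results.tsv.

```python
from itertools import product
from typing import Dict, List, Sequence


def design_cells(
    backends: Sequence[str],
    dpis: Sequence[int],
    corpora: Sequence[str],
) -> List[Dict[str, object]]:
    """Enumerate every cell of a fully crossed three-factor design."""
    return [
        {"backend": b, "dpi": d, "corpus": c}
        for b, d, c in product(backends, dpis, corpora)
    ]


if __name__ == "__main__":
    cells = design_cells(
        backends=["A1", "A2", "A3"],         # factor A levels from the mapping above
        dpis=[150, 300],                     # placeholder levels
        corpora=["corpus_a", "corpus_b"],    # placeholder names
    )
    print(len(cells))  # 3 * 2 * 2 = 12 cells
```

Enumerating the cells up front makes it easy to confirm that a sweep is balanced before any OCR run starts.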
