Local LLM Bench

Evaluate how accurately local LLMs extract structured data. The benchmark tests each model's ability to extract JSON from unstructured text, then scores the output against ground truth using F1 scoring and fuzzy matching. Supports MLX and Ollama backends.
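The scoring code itself is not described here. The following is a minimal Python sketch of one common way to combine F1 scoring with fuzzy matching, under stated assumptions. It is not necessarily how this benchmark does it. The sketch assumes field-level scoring: nested JSON is flattened into leaf paths such as `tags[0]`, so list comparison is order-sensitive. String values count as a match when `difflib.SequenceMatcher` similarity reaches an assumed threshold of 0.85, and any other value must match exactly. The function names `flatten`, `values_match`, `field_f1`, and `score_output` are hypothetical.

```python
import json
from difflib import SequenceMatcher
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0]": value} leaf pairs (hypothetical helper)."""
    items: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(value, path))
    elif isinstance(obj, list):
        # List items are keyed by index, so this comparison is order-sensitive.
        for i, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{i}]"))
    else:
        items[prefix] = obj
    return items


def values_match(pred: Any, gold: Any, threshold: float = 0.85) -> bool:
    """Fuzzy-match strings (case/whitespace-insensitive); compare other types exactly."""
    if isinstance(pred, str) and isinstance(gold, str):
        a, b = pred.strip().lower(), gold.strip().lower()
        return SequenceMatcher(None, a, b).ratio() >= threshold
    return pred == gold


def field_f1(pred: dict, gold: dict, threshold: float = 0.85) -> dict[str, float]:
    """Precision/recall/F1 over flattened leaf fields (assumed scoring scheme)."""
    p, g = flatten(pred), flatten(gold)
    # A true positive is a predicted path that exists in the ground truth with a matching value.
    tp = sum(1 for path, value in p.items()
             if path in g and values_match(value, g[path], threshold))
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def score_output(raw: str, gold: dict, threshold: float = 0.85) -> dict[str, float]:
    """Parse raw model output as JSON; unparseable or non-object output scores zero."""
    try:
        pred = json.loads(raw)
    except json.JSONDecodeError:
        pred = {}
    if not isinstance(pred, dict):
        pred = {}
    return field_f1(pred, gold, threshold)


if __name__ == "__main__":
    gold = {"name": "Acme Corp", "year": 2021, "tags": ["ai", "ml"]}
    raw = '{"name": "ACME Corp.", "year": 2021, "tags": ["ai"], "city": "Paris"}'
    print(score_output(raw, gold))  # {'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

In this example, `"ACME Corp."` still matches `"Acme Corp"` through fuzzy matching. The model output misses `tags[1]` and adds an extra `city` field, so precision and recall both drop to 3/4. Field-level F1 gives partial credit for partly correct extractions, where an exact whole-document comparison would simply score them as failures.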