Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...
Abstract: In this paper, we present CAST-Eval, a novel, comprehensive and domain-specific benchmark designed to assess the knowledge and reasoning capabilities of large language models (LLMs) in the ...