Eval JavaScript - Search News

How to choose the best LLM using R and vitals

Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...

IEEE

CAST-Eval: A Domain-Specific Benchmark for Large Language Models in Civil Aviation Safety

Abstract: In this paper, we present CAST-Eval, a novel, comprehensive and domain-specific benchmark designed to assess the knowledge and reasoning capabilities of large language models (LLMs) in the ...
