Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...
The moment an AI system can read internal systems, trigger workflows, move money, send emails, update records or approve actions, the risk profile changes.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results