Evaluate, iterate faster, and select your best LLM app with TruLens.
Create credible and powerful LLM apps, faster. TruLens is a software tool that helps you objectively measure the quality and effectiveness of your LLM-based applications using feedback functions. Feedback functions programmatically evaluate the quality of inputs, outputs, and intermediate results, so you can expedite and scale up experiment evaluation. Use it for a wide variety of use cases, including question answering, summarization, retrieval-augmented generation, and agent-based applications.
Evaluate how your choices are performing across multiple feedback functions, such as context relevance, groundedness, and answer relevance.
Leverage and add to an extensible library of built-in feedback functions. Observe where apps have weaknesses to inform iteration on prompts, hyperparameters, and more.
Compare different LLM apps on a metrics leaderboard to pick the best performing one.
The fastest, easiest way to validate your LLM app.
TruLens fits easily into your LLM app development process. Simply pip install from PyPI, and add a couple of lines to your LLM app. Track any application, and evaluate with the model of your choice.
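The "couple of lines" integration pattern can be sketched as below. The names here (`wrap_app`, the record dictionary) are illustrative placeholders, not the real TruLens API; consult the TruLens documentation for the actual imports. The shape is the same, though: wrap your app, attach feedback functions, and every call gets recorded and scored.

```python
from typing import Callable

def wrap_app(app: Callable[[str], str],
             feedbacks: dict[str, Callable[[str, str], float]]):
    """Hypothetical stand-in for TruLens instrumentation: records
    every call to the app and scores it with each feedback function."""
    records = []

    def wrapped(prompt: str) -> str:
        output = app(prompt)
        scores = {name: fn(prompt, output) for name, fn in feedbacks.items()}
        records.append({"input": prompt, "output": output, "scores": scores})
        return output

    return wrapped, records

# A trivial "LLM app" and a trivial feedback function, for illustration.
def my_app(prompt: str) -> str:
    return prompt.upper()

wrapped, records = wrap_app(
    my_app, {"not_empty": lambda prompt, output: 1.0 if output else 0.0}
)
wrapped("hello")
```

Once calls are recorded this way, the per-call scores can be aggregated into the kind of metrics leaderboard described above.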
Human feedback is the most common way of evaluating LLM apps today; it is important, but slow and limited. TruLens provides the higher-volume, programmatic feedback that helps you identify trouble spots and iterate rapidly.
TruLens can evaluate your LLM app with many kinds of feedback functions to increase performance and minimize risk.
Use TruLens for any LLM-based app that you're building with Python.
You are critical to the ongoing success of TruLens. We encourage you to get started and provide ample feedback, so that TruLens improves over time.
A feedback function scores the output of an LLM application by analyzing generated text from an LLM (or a downstream model or application built on it) and metadata.
This approach is similar to labeling functions: a human-in-the-loop can be used to discover a relationship between the feedback and the input text, and by modeling this relationship we can then programmatically apply it to scale up model evaluation. You can read more in the blog post "What's Missing to Evaluate Foundation Models at Scale".
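As a minimal illustration of the idea (not the TruLens API), a feedback function is just a callable that maps generated text, plus optional inputs or metadata, to a score. The toy relevance metric below, based on token overlap between question and answer, is a hypothetical stand-in for the model-based scorers a real evaluation would use:

```python
def token_overlap_relevance(question: str, answer: str) -> float:
    """Toy feedback function: fraction of question tokens that also
    appear in the answer. Real feedback functions typically call an
    LLM or a fine-tuned scoring model instead of string matching."""
    q_tokens = set(question.lower().split())
    a_tokens = set(answer.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & a_tokens) / len(q_tokens)

# An on-topic answer should score higher than an off-topic one.
on_topic = token_overlap_relevance(
    "what is the capital of France",
    "The capital of France is Paris",
)
off_topic = token_overlap_relevance(
    "what is the capital of France",
    "I enjoy long walks on the beach",
)
```

Because the scorer is just a function, it can be applied to thousands of app outputs automatically, which is what makes this kind of evaluation scale where manual labeling cannot.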
Originally created by TruEra, TruLens is a community-driven open source project used by thousands of developers to build credible LLM apps faster. Following TruEra's acquisition by Snowflake, Snowflake actively oversees and supports the open source development of TruLens. Read more about Snowflake's commitment to growing TruLens in open source.
Why a colossal squid?
The colossal squid’s eyeball is about the size of a soccer ball, making it the largest eyeball of any living creature. In addition, did you know that its eyeball contains light organs? That means that colossal squids have automatic headlights when looking around. We're hoping to bring similar guidance to model developers when creating, introspecting, and debugging neural networks. Read more about the amazing eyes of the colossal squid.