How I Set Up DeepEval for Fast, Easy, and Powerful LLM Evaluations
A Simple Tutorial for LLM Evals Using DeepEval
Today, I want to walk you through one of the most crucial (and often overlooked) aspects of deploying LLMs in production: evaluations. I'll show you how to set up DeepEval, a popular open-source evaluation framework for large language models.
Why DeepEval?
DeepEval is my go-to framework for LLM evaluations because:
It's open-source and highly customizable
It comes with a variety of built-in evaluation methods
You can easily write your own evaluation metrics (see the sketch just after this list)
It offers a web UI for in-depth analysis (optional, but super useful)
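On that custom-metrics point: a custom metric is essentially just a class with a measure method. Here's a rough sketch of one that caps answer length; the base-class interface follows DeepEval's custom-metric docs, but it may differ slightly between versions, so treat this as a starting point rather than gospel:
from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase

class MaxLengthMetric(BaseMetric):
    # Hypothetical metric: passes when the output stays under a character cap.
    def __init__(self, max_chars: int = 300, threshold: float = 0.5):
        self.threshold = threshold
        self.max_chars = max_chars

    def measure(self, test_case: LLMTestCase) -> float:
        self.score = 1.0 if len(test_case.actual_output) <= self.max_chars else 0.0
        self.success = self.score >= self.threshold
        return self.score

    async def a_measure(self, test_case: LLMTestCase) -> float:
        # DeepEval calls this in async mode; we just reuse the sync logic.
        return self.measure(test_case)

    def is_successful(self) -> bool:
        return self.success

    @property
    def __name__(self):
        return "MaxLength"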
Let me walk you through how I set it up and ran my first evaluation.
Setting Up the Environment
Here's how I got started:
Created a new folder for the project:
mkdir DeepEvalTest && cd DeepEvalTest
Set up a Python virtual environment:
python -m venv venv
source venv/bin/activate
Installed DeepEval:
pip install deepeval
(Optional) Logged into the DeepEval web UI:
deepeval login
Creating Our First Evaluation
I created a simple Python script (test_example.py) to run our first eval:
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund on all shoe purchases at no extra cost."
    )
    metric = AnswerRelevancyMetric(threshold=0.5)
    # assert_test takes the test case first, then a list of metrics
    assert_test(test_case, [metric])
This script uses the AnswerRelevancyMetric to check whether the LLM's response is actually relevant to the user's question. Note the test_ prefix on the function name: deepeval test run collects tests pytest-style, so it only picks up functions named that way. Pretty neat, right?
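You can also run a metric outside the test runner to poke at the score directly, which is handy when you're iterating on a prompt and just want quick feedback. A minimal sketch, assuming the same API as above:
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund on all shoe purchases at no extra cost."
)

metric = AnswerRelevancyMetric(threshold=0.5)
metric.measure(test_case)      # calls the judge model under the hood
print(metric.score)            # relevancy score between 0 and 1
print(metric.reason)           # the judge's explanation for the score
print(metric.is_successful())  # True if score >= threshold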
The Crucial Step You Might Miss
Here's a pro tip: before running your eval, make sure to set your OpenAI API key as an environment variable. DeepEval's built-in metrics like AnswerRelevancyMetric use an OpenAI model as the judge by default, so without the key the eval will simply fail. I did this by running:
export OPENAI_API_KEY=your_api_key_here
Trust me, this step will save you from a headache later!
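If you'd rather keep it in code (say, in a notebook), you can set the same variable from Python before running anything. The placeholder key below is obviously not real:
import os
# Hypothetical placeholder; load your real key from a .env file or secret store
os.environ["OPENAI_API_KEY"] = "your_api_key_here"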
Running the Evaluation
With everything set up, I ran the evaluation using:
deepeval test run test_example.py
And voila! The eval ran successfully, giving me a score and even telling me how many tokens it used.
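By the way, the CLI isn't the only way to trigger a run. Here's a minimal sketch of the same check run programmatically, assuming DeepEval's evaluate() helper:
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund on all shoe purchases at no extra cost."
)

# Prints per-test-case results (scores, pass/fail) to the console
evaluate(test_cases=[test_case], metrics=[AnswerRelevancyMetric(threshold=0.5)])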
Diving Deeper with the Web UI
Remember that optional login step? Here's where it pays off. By logging into the Confident AI platform (confident-ai.com), I got access to a dashboard that shows:
Detailed test results
Cost per test
Input and output for each test case
Time taken for each evaluation
This feature is especially handy when you're running bulk tests or need to do more in-depth analysis.
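For those bulk runs, DeepEval also has a dataset abstraction so you don't have to wire up test cases one by one. A rough sketch, assuming EvaluationDataset from deepeval.dataset (double-check the docs for your version):
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Hypothetical test cases; in practice you'd load these from a file or database
dataset = EvaluationDataset(test_cases=[
    LLMTestCase(input="What if these shoes don't fit?",
                actual_output="We offer a 30-day full refund on all shoe purchases."),
    LLMTestCase(input="How long does shipping take?",
                actual_output="Standard shipping takes 3-5 business days."),
])

evaluate(test_cases=dataset.test_cases, metrics=[AnswerRelevancyMetric(threshold=0.5)])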
Why This Matters
Setting up a robust evaluation framework like DeepEval is crucial when you're deploying LLMs at scale. It allows you to:
Consistently measure the performance of your models
Quickly identify areas for improvement in your prompts
Ensure the quality of your LLM outputs in production
Wrapping Up
Getting started with DeepEval might seem a bit tricky at first, but trust me, it's worth it. The insights you gain from these evaluations can be game-changing for your LLM applications.
I hope this walkthrough helps you get up and running with DeepEval. Remember, the key steps are:
Set up your environment
Install DeepEval
Create your evaluation script
Set your OpenAI API key
Run the evaluation
Analyze the results (bonus points for using the web UI!)
If you want to dive deeper into LLM evaluations or have any questions about setting up DeepEval, feel free to reach out. You can find me on Twitter or check out my YouTube channel for more tutorials like this one.
Happy evaluating!