2️⃣ Viewing and Evaluating the Run

The next step after running a prompt is to view and evaluate it through the Prompt Runs dashboard. You can either create a custom scorer via the Custom Scorer page before evaluating, or evaluate immediately using the scorers that are already provided.

For this example, we will not be creating a custom scorer; the steps for doing so can be found on the Custom Scorer page.
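
As background, a custom scorer in most LLM-evaluation tools boils down to a function that maps a run's output (and, optionally, an expected output) to a numeric score. The minimal Python sketch below illustrates the idea; the function name and the (output, expected) signature are assumptions for illustration, not this platform's actual custom-scorer interface.

```python
# A minimal sketch of a custom scorer. The name and the
# (output, expected) -> float signature are illustrative assumptions,
# not this platform's actual custom-scorer interface.

def keyword_coverage(output: str, expected: str) -> float:
    """Score an output by the fraction of expected keywords it mentions."""
    keywords = {word.lower() for word in expected.split()}
    if not keywords:
        return 0.0
    hits = sum(1 for word in keywords if word in output.lower())
    return hits / len(keywords)

print(keyword_coverage("Paris is the capital of France", "Paris France"))  # 1.0
```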

This section is split into two parts: viewing the run results, and initiating an evaluation on a run.

Viewing the Run Results

  1. Following the example in this walkthrough, we will focus on the four runs outlined by the red box in the Prompt Runs dashboard.

  2. From this pane, we can add more tags to a run by clicking the "+" sign on the run's row (or the "+ Add tags" button for a run that doesn't already have tags). Adding tags to responses is not mandatory, but it is useful for grouping responses so they can be referenced later when creating evaluation charts and datasets.

  3. We can also view more details about a particular run. To do so, either double-click the row, or click the three-dot button on the row you wish to review; in the menu that opens, click "Details."

  4. This opens a drawer on the right. Details that can be viewed include Input, Output, Context (if provided during the run), Expected Output (if provided during the run), Model version, Tags, Template name and ID, and Response ID; these fields are summarized in the sketch below.
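
For reference, the drawer's contents map naturally onto a simple record. The dataclass below is purely illustrative: the field names are assumptions derived from the labels above, not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RunDetails:
    """Illustrative record of the details drawer; field names are
    assumptions based on the labels shown in the drawer."""
    input: str
    output: str
    model_version: str
    template_name: str
    template_id: str
    response_id: str
    tags: list[str] = field(default_factory=list)
    context: Optional[str] = None           # only present if provided during the run
    expected_output: Optional[str] = None   # only present if provided during the run
```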

Initiating an Evaluation

  1. Now, let's run an evaluation. Select the run(s) you wish to evaluate using the checkbox to the left of each run. Then go to the top-right corner of the page and click the "Evaluate run(s)" button (outlined in red below).

  2. Clicking this opens a popup asking which scorers to evaluate against. For this example, we will evaluate against Factuality. After selecting the desired scorers, click "Evaluate."

See LLM-based Scorers for detailed information on each LLM-based scorer. Similarly, see Heuristic Scorers for detailed information on each algorithmic/heuristic scorer.
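
To make that distinction concrete, here is a hedged sketch of the two scorer families: an LLM-based scorer calls a judge model, while a heuristic scorer is deterministic code. The judge below uses the OpenAI Python client and a simplified YES/NO prompt purely as assumptions for illustration; it is a sketch of the general technique, not this platform's Factuality implementation.

```python
# Sketch only: contrasts an LLM-based scorer with a heuristic scorer.
# The OpenAI client, model name, and prompt are assumed dependencies,
# not this platform's Factuality implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def factuality_judge(output: str, expected: str) -> float:
    """LLM-based: ask a judge model whether the output agrees factually
    with the expected answer; map YES/NO to 1.0/0.0."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{
            "role": "user",
            "content": (
                "Does the answer agree factually with the reference? "
                "Reply YES or NO.\n"
                f"Answer: {output}\nReference: {expected}"
            ),
        }],
    )
    verdict = (resp.choices[0].message.content or "").upper()
    return 1.0 if "YES" in verdict else 0.0

def exact_match(output: str, expected: str) -> float:
    """Heuristic: a deterministic string comparison, no model call."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0
```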

After a few moments, the evaluation completes and takes you to the Scores page. The next section walks through reading the scores and interpreting the metrics derived from them.
