🔸Reviewing Evaluation Analytics

After running evaluations, you can visualize the scores as charts on the Evaluation Analytics page. Use the controls at the top of the page to specify a time period, select specific scorers, and filter by the tags defined earlier so that only the dataset of interest is shown. Together, the charts show how a model is performing on that dataset.

Currently, the charts show how responses are distributed across categories. Two chart types are used for this: a bar chart and a dodge plot.

  • The bar chart displays a cumulative count of responses in each category, based on the dataset defined by tags. In contrast, the dodge plot represents each response individually as a dot, color-coded according to the category it belongs to.

  • The dodge plot also lets you inspect an individual response: you can view the inference that was made, the corresponding response, and the reason the response was placed in its category.
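The difference between the two chart types can be sketched in a few lines of Python. This is only an illustration of the data each chart works from, not how the page is implemented; the record fields (`id`, `category`, `reason`) and the category labels are hypothetical.

```python
from collections import Counter

# Hypothetical scored responses; field names and category labels are
# illustrative assumptions, not the product's actual schema.
scored = [
    {"id": "r1", "category": "A", "reason": "Matches the reference answer."},
    {"id": "r2", "category": "B", "reason": "Contradicts the reference answer."},
    {"id": "r3", "category": "A", "reason": "Matches the reference answer."},
]

# Bar chart view: one aggregated count per category.
bar_counts = Counter(r["category"] for r in scored)

# Dodge plot view: one point per response, keeping its category (the dot's
# color) and the reason it was placed in that category.
dodge_points = [(r["id"], r["category"], r["reason"]) for r in scored]

print(dict(bar_counts))
print(len(dodge_points))
```

The bar chart discards the individual responses and keeps only the totals, while the dodge plot keeps every response available for inspection.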

  1. To view the evaluations that were run for this example, apply the following filters:

    • Start Date

    • End Date

    • Scorer: Factuality

    • Tags: CEOs, Example

    The page should look similar to the following screenshot:


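As a rough mental model, the filters in step 1 can be thought of as conditions that are combined. The sketch below assumes that a record must fall inside the date range, use the selected scorer, and carry all of the selected tags; whether the page requires all tags or any one of them is an assumption here. The record fields, the dates, and the `Relevance` scorer are hypothetical and used only for illustration.

```python
from datetime import date

# Hypothetical evaluation records; the fields, dates, and the "Relevance"
# scorer are made up for this sketch.
evaluations = [
    {"run_date": date(2024, 1, 10), "scorer": "Factuality", "tags": {"CEOs", "Example"}},
    {"run_date": date(2024, 1, 12), "scorer": "Factuality", "tags": {"CEOs"}},
    {"run_date": date(2024, 2, 1), "scorer": "Relevance", "tags": {"CEOs", "Example"}},
]


def matches(record, start, end, scorer, tags):
    """Return True if the record passes every filter.

    Assumption: a record must carry ALL of the selected tags.
    """
    return (
        start <= record["run_date"] <= end
        and record["scorer"] == scorer
        and set(tags) <= record["tags"]
    )


selected = [
    e
    for e in evaluations
    if matches(e, date(2024, 1, 1), date(2024, 1, 31), "Factuality", ["CEOs", "Example"])
]

print(len(selected))
```

Under these assumptions only the first record is selected: the second lacks the `Example` tag, and the third falls outside the date range and uses a different scorer.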