Overview
This topic explains how to create and manage custom judges for online evaluations in AI Configs. Online evaluations score AI Config outputs in production by evaluating model responses with a judge. A judge is a specialized AI Config that uses a large language model (LLM) to evaluate responses and return numeric scores that represent quality signals such as accuracy, relevance, or toxicity. Custom judges let organizations define what quality means for their own products and domains. With custom judges, teams can:
- Standardize and reuse approved evaluation logic across AI Configs and environments
- Measure domain-specific quality signals such as accuracy, relevance, or toxicity
- Apply consistent quality criteria in monitoring, guarded rollouts, and experiments to detect regressions in production
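For a concrete sense of what a judge contains: a judge variation is essentially an evaluation prompt. The sketch below shows what such a rubric might look like for an accuracy signal; the wording and the {{response}} variable are hypothetical, not a required format:

```python
# Hypothetical evaluation prompt for a custom judge variation. The
# judge's LLM scores the original model response against this rubric;
# LaunchDarkly enforces the structured score output separately.
JUDGE_PROMPT = """\
You are evaluating the accuracy of an AI assistant's response.

Response to evaluate:
{{response}}

Score the response from 0.0 (completely inaccurate) to 1.0 (fully
accurate), and briefly explain your reasoning.
"""
```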
To record evaluation results from custom judges, you must use a LaunchDarkly AI SDK version that includes online evaluation support. If your SDK does not support online evaluations, judges can be attached but no evaluation metrics will appear.
Access control and prerequisites
Custom judges use the same access control model as other AI Configs. To create or edit a judge, you must have permission to create and update AI Configs in the project. Online evaluations must be enabled for the project before evaluation metrics can be recorded or displayed. If online evaluations are not enabled, judges can be attached but no evaluation results will appear. You can attach custom judges to completion-mode AI Config variations in the LaunchDarkly UI. For other variations, invoke a custom judge programmatically using the AI SDK. Automatic recording of evaluation metrics is supported starting in:
- Python AI SDK version 0.14.0
- Node.js AI SDK version 0.16.1
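If you invoke a judge programmatically, the starting point is the AI SDK's config call. Here is a minimal Python sketch, assuming the Python AI SDK's LDAIClient (package launchdarkly-server-sdk-ai, version 0.14.0 or later); the SDK key, AI Config key, and context are placeholders, and the exact judge-invocation surface may differ by SDK version:

```python
# pip install launchdarkly-server-sdk-ai>=0.14.0
import ldclient
from ldclient import Context
from ldclient.config import Config
from ldai.client import AIConfig, LDAIClient, ModelConfig

ldclient.set_config(Config("sdk-key"))         # placeholder SDK key
ai_client = LDAIClient(ldclient.get())

context = Context.builder("user-123").build()  # placeholder context
fallback = AIConfig(enabled=False, model=ModelConfig(name="gpt-4o"), messages=[])

# Fetch the AI Config ("support-bot" is a placeholder key). The returned
# tracker records metrics for the model call; on the SDK versions listed
# above, evaluation metrics from attached judges are recorded automatically.
config, tracker = ai_client.config("support-bot", context, fallback)
```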
Create and manage custom judges
Create custom judges from the Create AI Config dialog. When you create a judge, LaunchDarkly provides a default evaluation configuration that you can customize for your use case.
Create a custom judge
To create a custom judge:
- Navigate to AI Configs.
- Click Create AI Config.
- Click Judge as the AI Config type. Judge mode is a specialized configuration used only for evaluation.
- Enter a name and key for the judge.
- (Optional) Select a maintainer.
- Click Create.
Manage custom judges
After you create a custom judge, you can update its configuration, manage its variations, and attach it to AI Config variations. Judges use the same editing interface as other AI Configs, with judge-specific settings and restrictions.
Configure judge settings
Judge-specific settings are defined at the AI Config level and apply to all variations of the judge. To update judge settings:
- Navigate to AI Configs and select the judge.
- Open the judge details page.
- Change the evaluation metric key. If the key does not already exist, LaunchDarkly creates a new metric and displays a warning before you save.
- Select the evaluation metric key to open the metric details page.
- Configure score inversion to indicate whether a score of 0.0 represents good or bad quality. Use inversion for metrics such as toxicity, where lower scores indicate better outcomes.
Evaluation metrics created by judges use the $ld:ai:judge: event prefix to identify how evaluation scores are stored and aggregated. Metric keys do not need to be unique across judges; teams can reuse a metric key across multiple judges to intentionally aggregate evaluation results.
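Inversion changes how a score is interpreted, not the value the judge returns. Here is a minimal sketch of the semantics; the helper is illustrative, not a LaunchDarkly API:

```python
def quality_signal(score: float, inverted: bool) -> float:
    """Map a raw judge score onto a "higher is better" quality signal.

    For an inverted metric such as toxicity, a raw score of 0.0 is the
    best outcome, so the signal is flipped. Illustrative only.
    """
    return 1.0 - score if inverted else score


quality_signal(0.1, inverted=True)  # toxicity 0.1 -> quality 0.9
```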
Edit judge variations
Each judge includes one or more variations. You can edit judge variations similarly to other AI Configs, with the following restrictions:
- You cannot view or edit model parameters or custom parameters.
- You cannot attach tools.
- You cannot attach judges to a judge.
- The “Judges” section is hidden for judge variations.
Attach and manage judges
After creating a judge, attach it to one or more AI Config variations to evaluate model responses. To attach a judge:
- Navigate to the AI Config you want to evaluate.
- Select the Variations tab.
- Expand a variation.
- In the “Judges” section, click Attach judges.
- Select a judge.
- Adjust the sampling percentage as needed. Sampling controls what fraction of responses the judge evaluates; see the sketch after these steps.
- Click Review and save.
- Select the judge name to open the judge details page.
- Select the evaluation metric key to open the metric details page.
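Because sampling determines how many responses are evaluated, it also drives judge cost. A back-of-the-envelope estimate, using an illustrative helper rather than a LaunchDarkly API:

```python
def judge_calls_per_day(daily_responses: int, sampling_pct: float) -> float:
    """Estimate how many judge evaluations a sampling percentage produces."""
    return daily_responses * sampling_pct / 100.0


judge_calls_per_day(50_000, 10)  # 10% sampling of 50k responses -> ~5,000 calls
```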
Use evaluation results
Evaluation results from custom judges appear throughout LaunchDarkly as standard AI metrics. Judges return structured results with a numeric score and brief reasoning. You do not need to define output formatting. LaunchDarkly enforces structured output so evaluation results can be reliably recorded and displayed as metrics. Each evaluation metric produces a single score between 0.0 and 1.0.
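Conceptually, a recorded evaluation looks like the sketch below. The field names and the metric key are illustrative; only the score range and the score-plus-reasoning shape are documented behavior:

```python
# Illustrative shape of a single judge evaluation; the field names are
# hypothetical, but scores are always between 0.0 and 1.0.
evaluation_result = {
    "metric_key": "$ld:ai:judge:accuracy",  # hypothetical metric key
    "score": 0.85,
    "reasoning": "Response cites the correct policy section.",
}

assert 0.0 <= evaluation_result["score"] <= 1.0
```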
View results in Monitoring and Metrics
To view evaluation results in Monitoring:
- Navigate to the AI Config with the attached judge.
- Select the Monitoring tab.
- Use the metric dropdown to select the evaluation metric key.
To view evaluation results in Metrics:
- Navigate to Metrics.
- Select the Judge metrics tab to filter metrics with the $ld:ai:judge: prefix.
- Select a metric to view its details and trends.
Use evaluation metrics in guardrails and experiments
Evaluation metrics produced by custom judges behave like other AI metrics. You can:
- Use evaluation metrics as guardrails in guarded rollouts to pause or revert releases when quality degrades.
- Select evaluation metrics as goals in experiments to compare AI Config variations.
- Use judge scores in your application’s execution logic to enforce custom guardrails at runtime. This requires setting the evaluation sampling rate to 100 percent so that every model response is evaluated; see the sketch after this list.
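Here is a minimal sketch of that runtime guardrail pattern, assuming a toxicity judge where higher scores are worse. The evaluate_with_judge helper and the threshold are hypothetical placeholders, not LaunchDarkly APIs; consult your AI SDK version for the actual way to read a judge's score:

```python
TOXICITY_THRESHOLD = 0.3  # example threshold; tune for your product


def evaluate_with_judge(judge_key: str, response_text: str) -> float:
    """Hypothetical stand-in: return the judge's score for response_text.

    Replace this with your AI SDK version's actual judge invocation.
    """
    raise NotImplementedError("wire this to your AI SDK")


def guarded_reply(response_text: str) -> str:
    """Runtime guardrail: suppress responses the judge scores as too toxic."""
    score = evaluate_with_judge("toxicity-judge", response_text)
    if score > TOXICITY_THRESHOLD:  # toxicity is inverted: higher is worse
        return "Sorry, I can't share that response."
    return response_text
```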
Reference: judge configuration and evaluation formats