Create your Docent account

Enter your email and password to get started

A quick look at how Docent works
1. Create a rubric

Ask any question about your agent, such as "where is it reward hacking?" or "why did it fail?"

Docent first converts your question into a precise behavior rubric by reading through your data, asking about ambiguities, and suggesting concrete rewrites based on your feedback.

2. Spot-check results

Docent then searches for agent behaviors that match your rubric. Click any result to see where it appears in the transcript, along with an explanation of why it matched.

3. Quantify and visualize

Finally, visualize quantitative patterns by aggregating, slicing, and filtering using Docent's charts. For example, you can plot the number of reward hacks across training steps, or compare the prevalence of reward hacking between different models.
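
To make the aggregation step concrete, here is a minimal sketch of the kind of counting that sits behind such a chart. It does not use Docent's API: the record layout (`model`, `training_step`, `matched`), the sample values, and both helper functions are hypothetical, chosen only to illustrate grouping rubric matches by a metadata field.

```python
from collections import Counter

# Hypothetical rubric results: one record per transcript judged against the rubric.
# Field names and values are illustrative only, not Docent's data model.
results = [
    {"model": "model-a", "training_step": 100, "matched": True},
    {"model": "model-a", "training_step": 100, "matched": False},
    {"model": "model-a", "training_step": 200, "matched": True},
    {"model": "model-b", "training_step": 100, "matched": False},
    {"model": "model-b", "training_step": 200, "matched": True},
    {"model": "model-b", "training_step": 200, "matched": False},
]


def count_matches_by(records, key):
    """Count the records that matched the rubric, grouped by `key`."""
    return Counter(r[key] for r in records if r["matched"])


def match_rate_by(records, key):
    """Fraction of records that matched the rubric, grouped by `key`."""
    totals = Counter(r[key] for r in records)
    matches = count_matches_by(records, key)
    return {k: matches[k] / totals[k] for k in totals}


# Number of matches at each training step (the "plot across training steps" case).
matches_per_step = count_matches_by(results, "training_step")

# Prevalence per model (the "compare between models" case).
rate_per_model = match_rate_by(results, "model")
```

Counting raw matches suits a trend over training steps, while a rate is the fairer comparison between models that were run on different numbers of transcripts.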