Welcome to Arrakis Viewer

A comprehensive platform for AI model evaluation, analysis, and quality assessment

Documentation

System Information

  • Version: 0f4ac6d
  • Environment: prod

Configuration Status

  • Cosmos DB: Configured
  • Local Filesystem: Disabled

Available Features

Evaluation Management

Manage, import, and run evaluation test suites.

  • Import test definitions from YAML files
  • Run evaluations against any server
  • Track progress and view results

Access Evaluation Management

Traces

View and explore trace data from evaluation runs.

  • Browse all available traces
  • Filter by device ID
  • View detailed trace information

Access Traces

Evaluation Results

View evaluation results from all storage sources.

  • Browse results from local files and Cosmos DB
  • View unified evaluation run details
  • Explore detailed test traces and LLM requests

Access Evaluation Results

Spice Registry

Manage spice definitions and tool configurations.

  • Register and manage spice definitions
  • Configure tool parameters and metadata
  • View live spice information

Access Spice Registry

Spice Ensembles

Configure ensemble compositions and inheritance.

  • Create ensemble configurations
  • Set up inheritance hierarchies
  • Override spice versions per environment

Access Spice Ensembles

Getting Started

For First-Time Users

Arrakis Viewer helps you test and evaluate AI model performance. To get started:

  1. Check the Configuration Status above to see which features are available.
  2. For developers: run evaluations with the command uv run python -m peanut_eval.bin.run_evals, which writes test results to the eval_results/ directory.
  3. Access these results through the Local Results section if Local Filesystem access is enabled.
  4. For stakeholders: use the Evaluations and Runs sections in the hosted environment (requires Cosmos DB).
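
For developers who prefer to script step 2, the following is a minimal sketch, not part of the platform itself. Only the command and the eval_results/ directory name come from the steps above. The helper names run_evals and list_result_files are hypothetical, and the sketch makes no assumption about the file names or formats the evaluation run produces.

```python
import subprocess
from pathlib import Path

# The command and directory are taken from step 2 above.
# Everything else in this sketch is illustrative.
EVAL_COMMAND = ["uv", "run", "python", "-m", "peanut_eval.bin.run_evals"]
RESULTS_DIR = Path("eval_results")


def run_evals(command=EVAL_COMMAND):
    """Run the evaluation command and return its exit code."""
    # check=False: report the exit code instead of raising on failure.
    completed = subprocess.run(command, check=False)
    return completed.returncode


def list_result_files(results_dir=RESULTS_DIR):
    """Return files under results_dir, oldest first; empty if it is missing."""
    if not results_dir.is_dir():
        return []
    files = [p for p in results_dir.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

Calling run_evals() from the project root and then list_result_files() shows which files the run left behind, which can be a quick check before opening the viewer.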

View the complete evaluation framework documentation

Configuration & Workflow

Local development workflow:

  • Define test cases in YAML configuration files
  • Run evaluations against a local server
  • Review results in the Local Results section
  • Verify assertions through detailed test reports

Hosted environment:

  • Cosmos DB enables evaluation management and trace features
  • Import evaluation definitions from YAML files
  • Run evaluations against any server
  • Track progress and view detailed results

Contact your administrator if you need help configuring Cosmos DB or local filesystem access.