# Benchmark Model Performance

Measure and analyze model performance on Axelera Metis AI hardware

## Important Context

**This command benchmarks AI model performance on Axelera Metis AI Processing Units (APUs).**
All benchmarks measure performance on Metis hardware using the Voyager SDK runtime.

## Instructions

Benchmark the specified model/pipeline: **$ARGUMENTS**

### Step 0: Data Source & Environment Selection

**FIRST**, use the `AskUserQuestion` tool to determine the setup:

```
Question: "Which Voyager SDK setup would you like to use for benchmarking?"
Options:
1. "ChipOS Knowledge Base Only" - Use RAG for research and benchmarking guidance only (no local execution)
2. "ChipOS Knowledge Base + Local SDK" - Use RAG for benchmarking guidance, run on local SDK
3. "Local Voyager SDK Repository" - Use an existing local installation only
4. "Clone Fresh Repository" - Clone latest Voyager SDK for benchmarking
```

**Based on selection:**

**If ChipOS Knowledge Base Only:**
- Search for benchmarking information and documentation:
  ```python
  mcp__chipos__rag_search_knowledge_base(
      query="benchmark performance FPS throughput latency",
      source_id="github.com/axelera-ai-hub/voyager-sdk",
      match_count=5
  )
  mcp__chipos__rag_search_code_examples(
      query="performance tracing metrics",
      source_id="github.com/axelera-ai-hub/voyager-sdk",
      match_count=5
  )
  ```
- Provide research-based guidance and documentation
- No local SDK execution required - purely informational

**If ChipOS Knowledge Base + Local SDK:**
- Search for benchmarking examples:
  ```python
  mcp__chipos__rag_search_knowledge_base(
      query="benchmark performance FPS throughput",
      source_id="github.com/axelera-ai-hub/voyager-sdk",
      match_count=5
  )
  ```
- Ask for local SDK path for execution
- Set environment variables accordingly

**If Local Repository:**
- Ask for the repository path: "What is the path to your Voyager SDK installation?"
- Set environment:
  ```bash
  export AXELERA_FRAMEWORK=/path/to/voyager-sdk
  export AXELERA_RUNTIME_DIR=$AXELERA_FRAMEWORK/runtime
  source $AXELERA_FRAMEWORK/venv/bin/activate
  ```

**If Clone Fresh:**
- Clone and install:
  ```bash
  git clone https://github.com/axelera-ai-hub/voyager-sdk.git
  cd voyager-sdk
  ./install.sh --all --media
  source venv/bin/activate
  export AXELERA_FRAMEWORK=$(pwd)
  ```

### Step 0.5: ChipOS Project & Task Integration

**Check if project context is already set (from previous commands or `/init-chipos`).**

If NO project context exists, ask:
```
Question: "Would you like to track this benchmark in ChipOS?"
Options:
1. "Yes, use existing project" - Select from available ChipOS projects
2. "Yes, create new project" - Create a new project for this work
3. "No, skip task tracking" - Proceed without ChipOS task management
```

**If using existing project:**
```python
mcp__chipos__list_projects()
# User selects project
```

**If creating new project:**
```python
mcp__chipos__create_project(
    title="Voyager SDK Benchmark - [Model Name]",
    description="Performance benchmarking: $ARGUMENTS",
    github_repo="https://github.com/axelera-ai-hub/voyager-sdk"
)
```

**Create benchmark task:**
```python
mcp__chipos__create_task(
    project_id="[PROJECT_ID]",
    title="Benchmark: [Model from $ARGUMENTS]",
    description="Measure performance metrics: $ARGUMENTS",
    assignee="AI IDE Agent",
    feature="benchmarking",
    task_order=10
)
mcp__chipos__update_task(task_id="[TASK_ID]", status="doing")
```

### Step 1: Environment Setup
   ```bash
   # Ensure environment is activated
   source venv/bin/activate

   # Verify hardware
   axdevice list
   ```

2. **Basic Benchmarking**
   Run benchmark on deployed model:
   ```bash
   # Benchmark with video file
   ./inference.py <model> media/traffic1_1080p.mp4 --benchmark

   # Benchmark with test pattern (no I/O overhead)
   ./inference.py <model> test:pattern --benchmark

   # Benchmark with specific iterations
   ./inference.py <model> <source> --benchmark --iterations 1000
   ```

3. **Performance Metrics**
   Key metrics to measure:
   - **System Throughput (FPS)**: End-to-end pipeline performance
   - **Device Throughput (FPS)**: Metis AIPU-only performance
   - **Latency (ms)**: Time per frame
   - **CPU Utilization (%)**: Host CPU usage

4. **Detailed Performance Tracing**
   ```bash
   # Enable statistics display
   ./inference.py <model> <source> --show-stats

   # Save tracer data to CSV
   ./inference.py <model> <source> --save-tracers performance.csv

   # Append to existing file (for comparison)
   ./inference.py <model> <source> --save-tracers +performance.csv
   ```

5. **Multi-Stream Benchmarking**
   ```bash
   # Benchmark multiple streams
   ./inference.py <model> video1.mp4,video2.mp4 --benchmark

   # Maximum stream capacity test
   ./inference.py <model> usb:0,usb:1,video.mp4,video2.mp4 --benchmark
   ```

6. **Component-Level Profiling**
   ```bash
   # GStreamer latency tracing
   GST_TRACERS="latency" ./inference.py <model> <source>

   # GStreamer statistics
   GST_TRACERS="stats" ./inference.py <model> <source>

   # Detailed debug output
   GST_DEBUG=3 ./inference.py <model> <source>
   ```

7. **Accuracy vs Performance Analysis**
   ```bash
   # Measure accuracy
   ./inference.py <model> dataset --no-display

   # Record both accuracy and performance
   ./inference.py <model> dataset --no-display --show-stats --save-tracers acc_perf.csv
   ```

8. **Compare Model Variants**
   Benchmark different model sizes:
   ```bash
   # Small model
   ./inference.py yolov8n-coco video.mp4 --benchmark > yolov8n_bench.txt

   # Medium model
   ./inference.py yolov8s-coco video.mp4 --benchmark > yolov8s_bench.txt

   # Large model
   ./inference.py yolov8m-coco video.mp4 --benchmark > yolov8m_bench.txt
   ```

9. **Resolution Impact**
   Test different input resolutions:
   ```bash
   # 640x640 (default YOLO)
   ./inference.py yolov8n-coco video.mp4 --benchmark

   # Different resolution models
   ./inference.py yolov8n-coco-320 video.mp4 --benchmark
   ./inference.py yolov8n-coco-1280 video.mp4 --benchmark
   ```

10. **Core Configuration Testing**
    ```bash
    # Single core
    ./deploy.py <model> --aipu-cores 1
    ./inference.py <model> video.mp4 --benchmark

    # Dual core
    ./deploy.py <model> --aipu-cores 2
    ./inference.py <model> video.mp4 --benchmark

    # Quad core
    ./deploy.py <model> --aipu-cores 4
    ./inference.py <model> video.mp4 --benchmark
    ```

11. **Benchmark Report Generation**
    Create comprehensive report:
    ```python
    import pandas as pd

    # Load tracer data
    df = pd.read_csv('performance.csv')

    # Calculate statistics
    print("Performance Summary")
    print("=" * 40)
    print(f"Average FPS: {df['system_fps'].mean():.1f}")
    print(f"P95 Latency: {df['latency_ms'].quantile(0.95):.1f} ms")
    print(f"Min FPS: {df['system_fps'].min():.1f}")
    print(f"Max FPS: {df['system_fps'].max():.1f}")
    print(f"CPU Usage: {df['cpu_percent'].mean():.1f}%")
    ```

12. **Performance Bottleneck Analysis**
    Identify bottlenecks:

    **CPU Bottleneck:**
    - High CPU utilization (>80%)
    - System FPS << Device FPS
    - Solution: Optimize preprocessing/postprocessing

    **Memory Bottleneck:**
    - High memory usage
    - Frame drops
    - Solution: Reduce batch size, optimize buffers

    **I/O Bottleneck:**
    - Low device utilization
    - Stream stalls
    - Solution: Use faster input source, enable buffering

    **Device Bottleneck:**
    - Device FPS is limiting factor
    - CPU utilization low
    - Solution: Use smaller model or more cores

13. **Optimization Recommendations**
    Based on benchmark results:
    - If CPU bound: Use OpenCL preprocessing
    - If memory bound: Reduce resolution or batch
    - If I/O bound: Use hardware decoder
    - If device bound: Try model pruning/quantization

14. **Benchmark Comparison Table**
    Create comparison table:
    ```
    | Model      | Resolution | FPS   | Latency | CPU % | mAP   |
    |------------|------------|-------|---------|-------|-------|
    | yolov8n    | 640x640    | 120   | 8.3ms   | 45%   | 37.3  |
    | yolov8s    | 640x640    | 85    | 11.8ms  | 52%   | 44.9  |
    | yolov8m    | 640x640    | 55    | 18.2ms  | 58%   | 50.2  |
    ```

15. **Continuous Benchmarking**
    Set up automated benchmarks:
    ```bash
    #!/bin/bash
    # benchmark_suite.sh

    MODELS="yolov8n-coco yolov8s-coco yolov5s-v7-coco"
    SOURCE="media/traffic1_1080p.mp4"
    OUTPUT_DIR="benchmark_results/$(date +%Y%m%d)"

    mkdir -p $OUTPUT_DIR

    for model in $MODELS; do
        echo "Benchmarking $model..."
        ./inference.py $model $SOURCE --benchmark \
            --save-tracers "$OUTPUT_DIR/${model}.csv" \
            --no-display
    done

    echo "Results saved to $OUTPUT_DIR"
    ```
