asiai web
Launch the web dashboard for visual monitoring and benchmarking.
Usage
asiai web
asiai web --port 9000
asiai web --host 0.0.0.0
asiai web --no-open
Options
| Option | Default | Description |
|---|---|---|
--port |
8899 |
HTTP port to listen on |
--host |
127.0.0.1 |
Host to bind to |
--no-open |
Don't open the browser automatically | |
--db |
~/.local/share/asiai/asiai.db |
Path to the SQLite database |
Requirements
The web dashboard requires additional dependencies:
pip install asiai[web]
# or install everything:
pip install asiai[all]
Pages
Dashboard (/)
System overview with engine status, loaded models, memory usage, and last benchmark results.
Benchmark (/bench)
Run cross-engine benchmarks directly from the browser:
- Quick Bench button — 1 prompt, 1 run, ~15 seconds
- Advanced options: engines, prompts, runs, context-size (4K/16K/32K/64K), power
- Live progress via SSE
- Results table with winner highlighting
- Throughput and TTFT charts
- Shareable card — auto-generated after benchmark (PNG via API, SVG fallback)
- Share section — copy link, download PNG/SVG, share on X/Reddit, export JSON
History (/history)
Visualize benchmark and system metrics over time:
- System charts: CPU load, Memory %, GPU utilization (with renderer/tiler breakdown)
- Engine activity: TCP connections, requests processing, KV cache usage %
- Benchmark charts: throughput (tok/s) and TTFT per engine
- Process metrics: engine CPU % and RSS memory during benchmark runs
- Filter by time range (1h / 24h / 7d / 30d / 90d) or custom date range
- Data table with context-size indication (e.g., "code (64K ctx)")
Monitor (/monitor)
Real-time system monitoring with 5-second refresh:
- CPU load sparkline
- Memory gauge
- Thermal state
- Loaded models list
Doctor (/doctor)
Interactive health check for system, engines, and database. Same checks as asiai doctor with a visual interface.
API Endpoints
The web dashboard exposes REST API endpoints for programmatic access.
GET /api/status
Lightweight health check. Cached 10s, responds in < 500ms.
{
"status": "ok",
"ts": 1709700000,
"uptime": 86400,
"engines": {"ollama": true, "lmstudio": false},
"memory_pressure": "normal",
"thermal_level": "nominal"
}
Status values: ok (all engines reachable), degraded (some down), error (all down).
GET /api/snapshot
Full system + engine snapshot. Cached 5s. Includes CPU load, memory, thermal state, and per-engine status with loaded models.
GET /api/benchmarks
Benchmark results with filters. Returns per-run data including tok/s, TTFT, power, context_size, engine_version.
| Parameter | Default | Description |
|---|---|---|
hours |
168 |
Time range in hours (0 = all) |
model |
Filter by model name | |
engine |
Filter by engine name | |
since / until |
Unix timestamp range (overrides hours) |
GET /api/engine-history
Engine status history (reachability, TCP connections, KV cache, tokens predicted).
| Parameter | Default | Description |
|---|---|---|
hours |
168 |
Time range in hours |
engine |
Filter by engine name |
GET /api/benchmark-process
Process-level CPU and memory metrics from benchmark runs (7-day retention).
| Parameter | Default | Description |
|---|---|---|
hours |
168 |
Time range in hours |
engine |
Filter by engine name |
GET /api/metrics
Prometheus exposition format. Gauges covering system, engine, model, and benchmark metrics.
# prometheus.yml
scrape_configs:
- job_name: 'asiai'
static_configs:
- targets: ['localhost:8899']
metrics_path: '/api/metrics'
scrape_interval: 30s
Metrics include:
| Metric | Type | Description |
|---|---|---|
asiai_cpu_load_1m |
gauge | CPU load average (1 min) |
asiai_memory_used_bytes |
gauge | Memory used |
asiai_thermal_speed_limit_pct |
gauge | CPU speed limit % |
asiai_engine_reachable{engine} |
gauge | Engine reachability (0/1) |
asiai_engine_models_loaded{engine} |
gauge | Models loaded count |
asiai_engine_tcp_connections{engine} |
gauge | Established TCP connections |
asiai_engine_requests_processing{engine} |
gauge | Requests currently processing |
asiai_engine_kv_cache_usage_ratio{engine} |
gauge | KV cache fill ratio (0-1) |
asiai_engine_tokens_predicted_total{engine} |
counter | Cumulative tokens predicted |
asiai_model_vram_bytes{engine,model} |
gauge | VRAM per model |
asiai_bench_tok_per_sec{engine,model} |
gauge | Last benchmark tok/s |
Notes
- The dashboard binds to
127.0.0.1by default (localhost only) - Use
--host 0.0.0.0to expose on the network (e.g., for remote monitoring) - Port
8899is chosen to avoid conflicts with inference engine ports