Interactive Dashboard Tour
Learn by doing: take a guided tour through AgentHub’s Grafana dashboards while the system is running, and learn to interpret metrics, identify issues, and understand system behavior in real time.
Prerequisites
- Observability stack running (from the Observability Demo)
- Observable agents running (broker, publisher, subscriber)
- Grafana open at http://localhost:3333
- 10-15 minutes for the complete tour
Quick Setup Reminder
If you haven’t completed the observability demo yet:
```bash
# Start observability stack
cd agenthub/observability
docker-compose up -d

# Run observable agents (3 terminals)
go run broker/main.go
go run agents/subscriber/main.go
go run agents/publisher/main.go
```
Dashboard Navigation
Accessing the Main Dashboard
- Open Grafana: http://localhost:3333
- Login: admin / admin (skip password change for demo)
- Navigate: Dashboards → Browse → AgentHub → “AgentHub EDA System Observatory”
- Bookmark: Save this URL for quick access: http://localhost:3333/d/agenthub-eda-dashboard
Dashboard Layout Overview
The dashboard is organized in 4 main rows:
🎯 Row 1: Event Processing Overview
├── Event Processing Rate (events/sec)
└── Event Processing Error Rate (%)
📊 Row 2: Event Analysis
├── Event Types Distribution (pie chart)
└── Event Processing Latency (p50, p95, p99)
🔍 Row 3: Distributed Tracing
└── Jaeger Integration Panel
💻 Row 4: System Health
├── Service CPU Usage (%)
├── Service Memory Usage (MB)
├── Go Goroutines Count
└── Service Health Status
Interactive Tour
Tour 1: Understanding Event Flow (3 minutes)
Step 1: Watch the Event Processing Rate
Location: Top-left panel
What to observe: Real-time lines showing events per second
- Identify the services:
  - Green line: agenthub-broker (should be highest - processes all events)
  - Blue line: agenthub-publisher (events being created)
  - Orange line: agenthub-subscriber (events being processed)
- Watch the pattern:
  - Publisher creates bursts of events
  - Broker immediately processes them (routing)
  - Subscriber processes them shortly after
- Understand the flow:
  - Publisher (creates) → Broker (routes) → Subscriber (processes)
  - 50/sec → 150/sec → 145/sec
💡 Tour Insight: The broker rate is higher because it processes both incoming tasks AND outgoing results.
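Behind this panel is a counter that each service increments for every event it handles; the broker counts both the incoming task and the outgoing result, which is why its line sits highest. As a hedged sketch (instrument and attribute names are assumptions, not the actual AgentHub instrumentation), the recording side could look like this with OpenTelemetry in Go:

```go
package agent

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordProcessedEvent bumps the counter that the "Event Processing Rate"
// panel graphs as a per-second rate. The instrument name is illustrative only.
func recordProcessedEvent(ctx context.Context, service, eventType string) {
	meter := otel.Meter(service)

	counter, err := meter.Int64Counter("events_processed_total",
		metric.WithDescription("Total number of events processed"))
	if err != nil {
		return
	}

	counter.Add(ctx, 1, metric.WithAttributes(
		attribute.String("event_type", eventType),
	))
}
```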
Step 2: Monitor Error Rates
Location: Top-right panel (gauge)
What to observe: Error percentage gauge
- Healthy system: Should show 0-2% (green zone)
- If you see higher errors:
  - Check if all services are running
  - Look for red traces in Jaeger (we’ll do this next)
- Error rate calculation:
  - Error Rate = (Failed Events / Total Events) × 100
🎯 Action: Note your current error rate - we’ll compare it later.
Tour 2: Event Analysis Deep Dive (3 minutes)
Step 3: Explore Event Types
Location: Middle-left panel (pie chart)
What to observe: Distribution of different event types
- Identify event types:
  - greeting: most common (usually 40-50%)
  - math_calculation: compute-heavy tasks (30-40%)
  - random_number: quick tasks (15-25%)
  - unknown_task: error-generating tasks (2-5%)
- Business insights:
  - Larger slices = more frequent tasks
  - Small red slice = intentional error tasks for testing
💡 Tour Insight: The publisher randomly generates different task types to simulate real-world workload diversity.
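If you are curious how such a distribution arises, a weighted random pick on the publisher side is all it takes. The sketch below is illustrative only: the helper and the weights are assumptions chosen to roughly match the percentages above, not the actual publisher code.

```go
package publisher

import "math/rand"

// Task types and rough weights matching the distribution described above.
// Both are assumptions for illustration, not the real publisher's values.
var taskTypes = []string{"greeting", "math_calculation", "random_number", "unknown_task"}
var taskWeights = []int{45, 35, 15, 5}

// pickTaskType returns a task type with probability proportional to its weight.
func pickTaskType() string {
	total := 0
	for _, w := range taskWeights {
		total += w
	}
	n := rand.Intn(total)
	for i, w := range taskWeights {
		if n < w {
			return taskTypes[i]
		}
		n -= w
	}
	return taskTypes[len(taskTypes)-1]
}
```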
Step 4: Analyze Processing Latency
Location: Middle-right panel
What to observe: Three latency lines (p50, p95, p99)
- Understand percentiles:
  - p50 (blue): 50% of events process faster than this
  - p95 (green): 95% of events process faster than this
  - p99 (red): 99% of events process faster than this
- Healthy ranges:
  - p50: < 50ms (very responsive)
  - p95: < 200ms (good performance)
  - p99: < 500ms (acceptable outliers)
- Pattern recognition:
  - Spiky p99 = occasional slow tasks (normal)
  - Rising p50 = systemic slowdown (investigate)
  - Flat lines = no activity or measurement issues
🎯 Action: Hover over the lines to see exact values at different times.
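The percentile lines are derived from a latency histogram that the services record for each event. As a hedged sketch (the instrument name, unit, and attribute key are assumptions), the recording side could look like this:

```go
package agent

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordProcessingLatency records how long one event took to process.
// The p50/p95/p99 lines in the panel are computed from histograms like this.
func recordProcessingLatency(ctx context.Context, eventType string, start time.Time) {
	meter := otel.Meter("agenthub-subscriber")

	hist, err := meter.Float64Histogram("event_processing_duration_ms",
		metric.WithDescription("Event processing latency"),
		metric.WithUnit("ms"))
	if err != nil {
		return
	}

	hist.Record(ctx, float64(time.Since(start).Milliseconds()),
		metric.WithAttributes(attribute.String("event_type", eventType)))
}
```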
Tour 3: Distributed Tracing Exploration (4 minutes)
Step 5: Jump into Jaeger
Location: Middle section - “Distributed Traces” panel
Action: Click the “Explore” button
This opens Jaeger in a new tab. Let’s explore:
- In the Jaeger UI:
  - Service dropdown: select “agenthub-broker”
  - Operation: leave as “All”
  - Click “Find Traces”
- Pick a trace to examine:
  - Look for traces that show multiple spans
  - Click on any trace line to open details
- Understand the trace structure (Timeline View):
  agenthub-publisher: publish_event [2ms]
  └── agenthub-broker: process_event [1ms]
      └── agenthub-subscriber: consume_event [3ms]
          └── agenthub-subscriber: process_task [15ms]
              └── agenthub-subscriber: publish_result [2ms]
- Explore span details - click individual spans to see:
  - Tags: event_type, event_id, agent names
  - Process: which service handled the span
  - Duration: exact timing information
💡 Tour Insight: Each event creates a complete “trace” showing its journey from creation to completion.
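To make the trace structure concrete, here is a minimal sketch of how one of the child spans above could be created with OpenTelemetry in Go. Span names and attribute keys are assumptions based on what the tour shows, not the actual AgentHub implementation.

```go
package agent

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// processTask starts a child span under whatever trace context arrived with
// the event, so it shows up nested inside the publisher's trace in Jaeger.
func processTask(ctx context.Context, eventID, eventType string) {
	tracer := otel.Tracer("agenthub-subscriber")

	ctx, span := tracer.Start(ctx, "process_task")
	defer span.End()

	// These become the Tags you see when you click the span in Jaeger.
	span.SetAttributes(
		attribute.String("event_id", eventID),
		attribute.String("event_type", eventType),
	)

	_ = ctx // actual task handling would go here
}
```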
Step 6: Find and Analyze an Error
- Search for error traces:
  - In Jaeger, add the tag filter: error=true
  - Or look for traces with red spans
- Examine the error trace:
  - Red spans indicate errors
  - Error tags show the error type and message
  - Stack traces help with debugging
- Follow the error propagation:
  - See how errors affect child spans
  - Notice error context in span attributes
🎯 Action: Find a trace with “unknown_task” event type - these are designed to fail for demonstration.
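The red spans and error tags you find this way come from the handler marking its span as failed. A hedged sketch of what that might look like (the function and error text are assumptions, not the actual subscriber code):

```go
package agent

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/codes"
)

// handleTask marks its span as failed when it hits a task type it doesn't
// know, which is what makes the trace show red with error tags in Jaeger.
func handleTask(ctx context.Context, eventType string) error {
	tracer := otel.Tracer("agenthub-subscriber")
	_, span := tracer.Start(ctx, "process_task")
	defer span.End()

	if eventType == "unknown_task" {
		err := errors.New("unknown task type: " + eventType)
		span.RecordError(err)                    // attaches the error as a span event
		span.SetStatus(codes.Error, err.Error()) // turns the span red in Jaeger
		return err
	}

	return nil
}
```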
Tour 4: System Health Monitoring (3 minutes)
Step 7: Monitor Resource Usage
Location: Bottom row panels
What to observe: System resource consumption
- CPU Usage Panel (bottom-left):
  - Normal range: 10-50% for the demo workload
  - Watch for: sustained high CPU (>70%)
  - Services comparison: see which service uses the most CPU
- Memory Usage Panel (bottom-center-left):
  - Normal range: 30-80 MB per service for the demo
  - Watch for: continuously growing memory (memory leaks)
  - Pattern: sawtooth = normal GC, steady growth = potential leak
- Goroutines Panel (bottom-center-right):
  - Normal range: 10-50 goroutines per service
  - Watch for: continuously growing count (goroutine leaks)
  - Pattern: stable baseline with activity spikes
Step 8: Verify Service Health
Location: Bottom-right panel
What to observe: Service up/down status
- Health indicators:
  - Green: service healthy and responding
  - Red: service down or health check failing
  - Yellow: service degraded but operational
- Health check details:
  - Each service exposes a /health endpoint
  - Prometheus monitors these endpoints
  - The dashboard shows aggregated status
🎯 Action: Open http://localhost:8080/health in a new tab to see raw health data.
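The raw health data you just opened comes from a plain HTTP endpoint on each service. Here is a minimal sketch of such a handler, assuming a simple JSON status payload (the actual AgentHub response shape and service name may differ):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	// Each service exposes a /health endpoint like this one;
	// Prometheus probes it and the dashboard aggregates the results.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{
			"status":  "healthy",
			"service": "agenthub-broker", // illustrative service name
		})
	})

	// Port 8080 matches the URL used in this step.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```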
Tour 5: Time-based Analysis (2 minutes)
Step 9: Change Time Ranges
Location: Top-right of dashboard (time picker)
Current: Likely showing “Last 5 minutes”
- Try different ranges:
  - Last 15 minutes: see longer trends
  - Last 1 hour: see the full demo session
  - Custom range: pick a specific time period
- Observe pattern changes:
  - Longer ranges show trends and patterns
  - Shorter ranges show real-time detail
  - Custom ranges zoom into specific incidents
Step 10: Use Dashboard Filters
Location: Top of dashboard - variable dropdowns
- Service Filter:
  - Select “All” to see everything
  - Pick a specific service to focus the analysis
  - Useful for isolating service-specific issues
- Event Type Filter:
  - Filter to specific event types
  - Compare performance across task types
  - Identify problematic event categories
💡 Tour Insight: Filtering helps you drill down from system-wide view to specific components or workloads.
Hands-on Experiments
Experiment 1: Create a Service Outage
Goal: See how the dashboard shows service failures
- Stop the subscriber:
  # In the subscriber terminal, press Ctrl+C
- Watch the dashboard changes:
  - Error rate increases (top-right gauge turns red)
  - Subscriber metrics disappear from the bottom panels
  - Service health shows the subscriber as down
- Check Jaeger for failed traces:
  - Look for traces that don’t complete
  - See where the chain breaks
- Restart the subscriber:
  go run agents/subscriber/main.go
🎯 Learning: Dashboard immediately shows impact of service failures.
Experiment 2: Generate High Load
Goal: See system behavior under stress
- Modify the publisher to generate more events (a sketch of the relevant loop follows this experiment):
  # Edit agents/publisher/main.go
  # Change: time.Sleep(5 * time.Second)
  # To:     time.Sleep(1 * time.Second)
- Watch the dashboard changes:
  - Processing rate increases
  - Latency may increase
  - CPU/memory usage grows
- Observe the scaling behavior:
  - How does the system handle the increased load?
  - Do error rates increase?
  - Where are the bottlenecks?
🎯 Learning: Dashboard shows system performance characteristics under load.
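For reference, the loop you edit in this experiment is essentially a timer-driven publish cycle. The sketch below is an assumption of its shape (names and structure are illustrative, not the actual agents/publisher/main.go), showing why shortening the interval directly raises the event rate on the dashboard.

```go
package publisher

import (
	"context"
	"time"
)

// publishInterval controls how often the publisher emits an event.
// Lowering it (e.g. from 5s to 1s) is what drives the load experiment above.
const publishInterval = 5 * time.Second

func runPublisher(ctx context.Context, publish func(context.Context) error) {
	ticker := time.NewTicker(publishInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Publish failures would surface on the dashboard as a rising error rate.
			_ = publish(ctx)
		}
	}
}
```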
Dashboard Interpretation Guide
What Good Looks Like
✅ Event Processing Rate: Steady activity matching workload
✅ Error Rate: < 5% (green zone)
✅ Event Types: Expected distribution
✅ Latency: p95 < 200ms, p99 < 500ms
✅ CPU Usage: < 50% sustained
✅ Memory: Stable or slow growth with GC cycles
✅ Goroutines: Stable baseline with activity spikes
✅ Service Health: All services green/up
Warning Signs
⚠️ Error Rate: 5-10% (yellow zone)
⚠️ Latency: p95 > 200ms or rising trend
⚠️ CPU: Sustained > 70%
⚠️ Memory: Continuous growth without GC
⚠️ Missing data: Gaps in metrics (service issues)
Critical Issues
🚨 Error Rate: > 10% (red zone)
🚨 Latency: p95 > 500ms
🚨 CPU: Sustained > 90%
🚨 Memory: Rapid growth or OOM
🚨 Service Health: Any service showing red/down
🚨 Traces: Missing or broken trace chains
Next Steps After the Tour
For Daily Operations:
- Bookmark: Save dashboard URL for quick access
- Set up alerts: Configure notifications for critical metrics
- Create views: Use filters to create focused views for your team
For Development:
- Add Observability to Your Agent - Instrument your own agents
- Debug with Distributed Tracing - Use Jaeger for troubleshooting
For Deep Understanding:
- Distributed Tracing Explained - Learn the concepts
- Observability Metrics Reference - Complete metrics catalog
Troubleshooting Tour Issues
| Issue | Solution | 
|---|---|
| Dashboard shows no data | Verify observability environment variables are set | 
| Grafana won’t load | Check docker-compose ps in observability/ | 
| Metrics missing | Verify Prometheus targets at http://localhost:9090/targets | 
| Jaeger empty | Ensure trace context propagation is working | 
🎉 Congratulations! You’ve completed the interactive dashboard tour and learned to read AgentHub’s observability signals like a pro!
🎯 Ready for More?
Master the Tools: Use Grafana Dashboards - Advanced dashboard usage
Troubleshoot Issues: Debug with Distributed Tracing - Use Jaeger effectively