1 - Interactive Dashboard Tour
Take a guided tour through AgentHub’s Grafana dashboards while the system is running, learning to interpret metrics, identify issues, and understand system behavior in real-time.
Interactive Dashboard Tour
Learn by doing: Take a guided tour through AgentHub’s Grafana dashboards while the system is running, learning to interpret metrics, identify issues, and understand system behavior in real-time.
Prerequisites
- Observability stack running (from the Observability Demo)
- Observable agents running (broker, publisher, subscriber)
- Grafana open at http://localhost:3333
- 10-15 minutes for the complete tour
Quick Setup Reminder
If you haven’t completed the observability demo yet:
# Start observability stack
cd agenthub/observability
docker-compose up -d
# Run observable agents (3 terminals)
go run broker/main.go
go run agents/subscriber/main.go
go run agents/publisher/main.go
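Before opening Grafana, you can optionally confirm the agents came up by hitting their health endpoints (in this demo the broker, publisher, and subscriber expose health servers on ports 8080, 8081, and 8082; adjust if your setup differs):
# Optional sanity check of the three agents
curl -s http://localhost:8080/health   # broker
curl -s http://localhost:8081/health   # publisher
curl -s http://localhost:8082/health   # subscriber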
Dashboard Navigation
Accessing the Main Dashboard
- Open Grafana: http://localhost:3333
- Login: admin / admin (skip password change for demo)
- Navigate: Dashboards → Browse → AgentHub → "AgentHub EDA System Observatory"
- Bookmark: Save this URL for quick access: http://localhost:3333/d/agenthub-eda-dashboard
Dashboard Layout Overview
The dashboard is organized in 4 main rows:
🎯 Row 1: Event Processing Overview
├── Event Processing Rate (events/sec)
└── Event Processing Error Rate (%)
📊 Row 2: Event Analysis
├── Event Types Distribution (pie chart)
└── Event Processing Latency (p50, p95, p99)
🔍 Row 3: Distributed Tracing
└── Jaeger Integration Panel
💻 Row 4: System Health
├── Service CPU Usage (%)
├── Service Memory Usage (MB)
├── Go Goroutines Count
└── Service Health Status
Interactive Tour
Tour 1: Understanding Event Flow (3 minutes)
Step 1: Watch the Event Processing Rate
Location: Top-left panel
What to observe: Real-time lines showing events per second
- Identify the services:
  - Green line: agenthub-broker (should be highest - processes all events)
  - Blue line: agenthub-publisher (events being created)
  - Orange line: agenthub-subscriber (events being processed)
- Watch the pattern:
  - Publisher creates bursts of events
  - Broker immediately processes them (routing)
  - Subscriber processes them shortly after
- Understand the flow:
  - Publisher (creates) → Broker (routes) → Subscriber (processes)
         50/sec        →      150/sec     →      145/sec
💡 Tour Insight: The broker rate is higher because it processes both incoming tasks AND outgoing results.
Step 2: Monitor Error Rates
Location: Top-right panel (gauge)
What to observe: Error percentage gauge
- Healthy system: Should show 0-2% (green zone)
- If you see higher errors:
  - Check if all services are running
  - Look for red traces in Jaeger (we'll do this next)
- Error rate calculation:
  - Error Rate = (Failed Events / Total Events) × 100
  - Example: 3 failed events out of 150 total = 2% (see the query sketch after this step)
🎯 Action: Note your current error rate - we'll compare it later.
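If you want to verify the gauge against the raw counters, you can query Prometheus directly over its HTTP API. The metric names below are illustrative placeholders rather than the exact names AgentHub exports - check http://localhost:9090/graph for the real ones:
# Illustrative error-rate query via the Prometheus HTTP API (metric names are placeholders)
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(event_errors_total[5m])) / sum(rate(events_processed_total[5m])) * 100'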
Tour 2: Event Analysis Deep Dive (3 minutes)
Step 3: Explore Event Types
Location: Middle-left panel (pie chart)
What to observe: Distribution of different event types
- Identify event types:
  - greeting: Most common (usually 40-50%)
  - math_calculation: Compute-heavy tasks (30-40%)
  - random_number: Quick tasks (15-25%)
  - unknown_task: Error-generating tasks (2-5%)
- Business insights:
  - Larger slices = more frequent tasks
  - Small red slice = intentional error tasks for testing
💡 Tour Insight: The publisher randomly generates different task types to simulate real-world workload diversity.
Step 4: Analyze Processing Latency
Location: Middle-right panel
What to observe: Three latency lines (p50, p95, p99)
- Understand percentiles:
  - p50 (blue): 50% of events process faster than this
  - p95 (green): 95% of events process faster than this
  - p99 (red): 99% of events process faster than this
- Healthy ranges:
  - p50: < 50ms (very responsive)
  - p95: < 200ms (good performance)
  - p99: < 500ms (acceptable outliers)
- Pattern recognition:
  - Spiky p99 = occasional slow tasks (normal)
  - Rising p50 = systemic slowdown (investigate)
  - Flat lines = no activity or measurement issues
🎯 Action: Hover over the lines to see exact values at different times.
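Percentile panels like this are typically built from a histogram metric using histogram_quantile(). A minimal sketch of that kind of query, again with a placeholder metric name (the dashboard's actual query may differ):
# p95 latency from a histogram metric (placeholder metric name)
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(event_processing_duration_seconds_bucket[5m])) by (le))'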
Tour 3: Distributed Tracing Exploration (4 minutes)
Step 5: Jump into Jaeger
Location: Middle section - “Distributed Traces” panel
Action: Click the “Explore” button
This opens Jaeger in a new tab. Let’s explore:
- In Jaeger UI:
  - Service dropdown: Select "agenthub-broker"
  - Operation: Leave as "All"
  - Click "Find Traces"
- Pick a trace to examine:
  - Look for traces that show multiple spans
  - Click on any trace line to open details
- Understand the trace structure (Timeline View):
  agenthub-publisher: publish_event [2ms]
    └── agenthub-broker: process_event [1ms]
        └── agenthub-subscriber: consume_event [3ms]
            └── agenthub-subscriber: process_task [15ms]
                └── agenthub-subscriber: publish_result [2ms]
- Explore span details:
  - Click individual spans to see:
    - Tags: event_type, event_id, agent names
    - Process: Which service handled the span
    - Duration: Exact timing information
💡 Tour Insight: Each event creates a complete "trace" showing its journey from creation to completion.
Step 6: Find and Analyze an Error
- Search for error traces:
  - In Jaeger, add tag filter: error=true
  - Or look for traces with red spans
- Examine the error trace:
  - Red spans indicate errors
  - Error tags show the error type and message
  - Stack traces help with debugging
- Follow the error propagation:
  - See how errors affect child spans
  - Notice error context in span attributes
🎯 Action: Find a trace with "unknown_task" event type - these are designed to fail for demonstration.
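The same error=true filter can also be applied from a shell through Jaeger's query API. This is the API the Jaeger UI itself uses, so treat it as a convenience rather than a stable contract; some Jaeger versions additionally require explicit start/end timestamps:
# List recent agenthub-broker traces tagged error=true (tags is URL-encoded JSON)
curl -s 'http://localhost:16686/api/traces?service=agenthub-broker&limit=20&tags=%7B%22error%22%3A%22true%22%7D'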
Tour 4: System Health Monitoring (3 minutes)
Step 7: Monitor Resource Usage
Location: Bottom row panels
What to observe: System resource consumption
- CPU Usage Panel (Bottom-left):
  - Normal range: 10-50% for demo workload
  - Watch for: Sustained high CPU (>70%)
  - Services comparison: See which service uses most CPU
- Memory Usage Panel (Bottom-center-left):
  - Normal range: 30-80MB per service for demo
  - Watch for: Continuously growing memory (memory leaks)
  - Pattern: Sawtooth = normal GC, steady growth = potential leak
- Goroutines Panel (Bottom-center-right):
  - Normal range: 10-50 goroutines per service
  - Watch for: Continuously growing count (goroutine leaks)
  - Pattern: Stable baseline with activity spikes
Step 8: Verify Service Health
Location: Bottom-right panel
What to observe: Service up/down status
- Health indicators:
  - Green: Service healthy and responding
  - Red: Service down or health check failing
  - Yellow: Service degraded but operational
- Health check details:
  - Each service exposes a /health endpoint
  - Prometheus monitors these endpoints
  - Dashboard shows aggregated status
🎯 Action: Open http://localhost:8080/health in a new tab to see raw health data.
Tour 5: Time-based Analysis (2 minutes)
Step 9: Change Time Ranges
Location: Top-right of dashboard (time picker)
Current: Likely showing “Last 5 minutes”
- Try different ranges:
  - Last 15 minutes: See longer trends
  - Last 1 hour: See full demo session
  - Custom range: Pick specific time period
- Observe pattern changes:
  - Longer ranges: Show trends and patterns
  - Shorter ranges: Show real-time detail
  - Custom ranges: Zoom into specific incidents
Step 10: Use Dashboard Filters
Location: Top of dashboard - variable dropdowns
- Service Filter:
  - Select "All" to see everything
  - Pick specific service to focus analysis
  - Useful for isolating service-specific issues
- Event Type Filter:
  - Filter to specific event types
  - Compare performance across task types
  - Identify problematic event categories
💡 Tour Insight: Filtering helps you drill down from system-wide view to specific components or workloads.
Hands-on Experiments
Experiment 1: Create a Service Outage
Goal: See how the dashboard shows service failures
- Stop the subscriber:
  # In subscriber terminal, press Ctrl+C
- Watch the dashboard changes:
  - Error rate increases (top-right gauge turns red)
  - Subscriber metrics disappear from bottom panels
  - Service health shows subscriber as down
- Check Jaeger for failed traces:
  - Look for traces that don't complete
  - See where the chain breaks
- Restart subscriber:
  go run agents/subscriber/main.go
🎯 Learning: Dashboard immediately shows impact of service failures.
Experiment 2: Generate High Load
Goal: See system behavior under stress
- Modify the publisher to generate more events (see the one-liner sketch after this experiment):
  # Edit agents/publisher/main.go
  # Change: time.Sleep(5 * time.Second)
  # To:     time.Sleep(1 * time.Second)
- Watch dashboard changes:
  - Processing rate increases
  - Latency may increase
  - CPU/memory usage grows
- Observe scaling behavior:
  - How does the system handle increased load?
  - Do error rates increase?
  - Where are the bottlenecks?
🎯 Learning: Dashboard shows system performance characteristics under load.
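If the sleep call appears exactly as written above, the edit can be scripted; this is a sketch, so double-check the file afterwards:
# Shorten the publisher's delay from 5s to 1s (GNU sed; on macOS use: sed -i '' ...)
sed -i 's/time.Sleep(5 \* time.Second)/time.Sleep(1 * time.Second)/' agents/publisher/main.go
# Restart the publisher to pick up the change
go run agents/publisher/main.go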
Dashboard Interpretation Guide
What Good Looks Like
✅ Event Processing Rate: Steady activity matching workload
✅ Error Rate: < 5% (green zone)
✅ Event Types: Expected distribution
✅ Latency: p95 < 200ms, p99 < 500ms
✅ CPU Usage: < 50% sustained
✅ Memory: Stable or slow growth with GC cycles
✅ Goroutines: Stable baseline with activity spikes
✅ Service Health: All services green/up
Warning Signs
⚠️ Error Rate: 5-10% (yellow zone)
⚠️ Latency: p95 > 200ms or rising trend
⚠️ CPU: Sustained > 70%
⚠️ Memory: Continuous growth without GC
⚠️ Missing data: Gaps in metrics (service issues)
Critical Issues
🚨 Error Rate: > 10% (red zone)
🚨 Latency: p95 > 500ms
🚨 CPU: Sustained > 90%
🚨 Memory: Rapid growth or OOM
🚨 Service Health: Any service showing red/down
🚨 Traces: Missing or broken trace chains
Next Steps After the Tour
For Daily Operations:
- Bookmark: Save dashboard URL for quick access
- Set up alerts: Configure notifications for critical metrics
- Create views: Use filters to create focused views for your team
For Development:
For Deep Understanding:
Troubleshooting Tour Issues
| Issue | Solution |
|---|---|
| Dashboard shows no data | Verify observability environment variables are set |
| Grafana won't load | Check docker-compose ps in observability/ |
| Metrics missing | Verify Prometheus targets at http://localhost:9090/targets |
| Jaeger empty | Ensure trace context propagation is working |
🎉 Congratulations! You've completed the interactive dashboard tour and learned to read AgentHub's observability signals like a pro!
🎯 Ready for More?
Master the Tools: Use Grafana Dashboards - Advanced dashboard usage
Troubleshoot Issues: Debug with Distributed Tracing - Use Jaeger effectively
2 - AgentHub Observability Demo Tutorial
Experience the complete observability stack with distributed tracing, real-time metrics, and intelligent alerting in under 10 minutes through hands-on learning.
AgentHub Observability Demo Tutorial
Learn by doing: Experience the complete observability stack with distributed tracing, real-time metrics, and intelligent alerting in under 10 minutes.
What You’ll Learn
By the end of this tutorial, you’ll have:
- ✅ Seen distributed traces flowing across multiple agents
- ✅ Monitored real-time metrics in beautiful Grafana dashboards
- ✅ Understood event correlation through trace IDs
- ✅ Experienced intelligent alerting when things go wrong
- ✅ Explored the complete observability stack components
Prerequisites
- Go 1.24+ installed
- Docker and Docker Compose installed
- Environment variables configured (see Installation and Setup)
- 10 minutes of your time
- Basic terminal knowledge
💡 Environment Note: AgentHub agents automatically enable observability when JAEGER_ENDPOINT is configured. See Environment Variables Reference for all configuration options.
Step 1: Clone and Setup (1 minute)
# Clone the repository
git clone https://github.com/owulveryck/agenthub.git
cd agenthub
# Verify you have the observability files
ls observability/
# You should see: docker-compose.yml, grafana/, prometheus/, etc.
Step 2: Start the Observability Stack (2 minutes)
# Navigate to observability directory
cd observability
# Start all monitoring services
docker-compose up -d
# Verify services are running
docker-compose ps
Expected Output:
NAME                      COMMAND                  SERVICE             STATUS
agenthub-grafana          "/run.sh"                grafana             running
agenthub-jaeger           "/go/bin/all-in-one"     jaeger              running
agenthub-prometheus       "/bin/prometheus --c…"   prometheus          running
agenthub-otel-collector   "/otelcol-contrib --…"   otel-collector      running
🎯 Checkpoint 1: All services should be "running". If not, check Docker logs: docker-compose logs <service-name>
Step 3: Access the Dashboards (1 minute)
Open these URLs in your browser (keep them open in tabs):
| Service | URL | Purpose |
|---|---|---|
| Grafana | http://localhost:3333 | Main observability dashboard |
| Jaeger | http://localhost:16686 | Distributed tracing |
| Prometheus | http://localhost:9090 | Raw metrics and alerts |
Grafana Login: admin / admin (skip password change for demo)
🎯 Checkpoint 2: You should see Grafana's welcome page and Jaeger's empty trace list.
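If one of the tabs looks empty, a quick reachability check from the shell tells you whether the service itself is up (Grafana may answer with a redirect rather than 200):
# Expect 200 or a 3xx redirect from each UI
curl -s -o /dev/null -w "grafana: %{http_code}\n"    http://localhost:3333
curl -s -o /dev/null -w "jaeger: %{http_code}\n"     http://localhost:16686
curl -s -o /dev/null -w "prometheus: %{http_code}\n" http://localhost:9090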
Step 4: Start the Observable Broker (1 minute)
Open a new terminal and navigate back to the project root:
# From agenthub root directory
go run broker/main.go
Expected Output:
time=2025-09-28T21:00:00.000Z level=INFO msg="Starting health server on port 8080"
time=2025-09-28T21:00:00.000Z level=INFO msg="AgentHub broker gRPC server with observability listening" address="[::]:50051" health_endpoint="http://localhost:8080/health" metrics_endpoint="http://localhost:8080/metrics"
🎯 Checkpoint 3:
- Broker is listening on port 50051
- Health endpoint available at http://localhost:8080/health
- Metrics endpoint available at http://localhost:8080/metrics
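You can also peek at the raw metrics the broker exposes; the exact metric names depend on AgentHub's instrumentation:
# First lines of the broker's Prometheus metrics endpoint
curl -s http://localhost:8080/metrics | head -n 20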
Step 5: Start the Observable Subscriber (1 minute)
Open another terminal:
go run agents/subscriber/main.go
Expected Output:
time=2025-09-28T21:00:01.000Z level=INFO msg="Starting health server on port 8082"
time=2025-09-28T21:00:01.000Z level=INFO msg="Starting observable subscriber"
time=2025-09-28T21:00:01.000Z level=INFO msg="Agent started with observability. Listening for events and tasks."
🎯 Checkpoint 4:
- Subscriber is connected and listening
- Health available at http://localhost:8082/health
Step 6: Generate Events with the Publisher (2 minutes)
Open a third terminal:
go run agents/publisher/main.go
Expected Output:
time=2025-09-28T21:00:02.000Z level=INFO msg="Starting health server on port 8081"
time=2025-09-28T21:00:02.000Z level=INFO msg="Starting observable publisher demo"
time=2025-09-28T21:00:02.000Z level=INFO msg="Publishing task" task_id=task_greeting_1727557202 task_type=greeting responder_agent_id=agent_demo_subscriber
time=2025-09-28T21:00:02.000Z level=INFO msg="Task published successfully" task_id=task_greeting_1727557202 task_type=greeting
🎯 Checkpoint 5: You should see:
- Publisher creating and sending tasks
- Subscriber receiving and processing tasks
- Broker routing messages between them
Step 7: Explore Real-time Metrics in Grafana (2 minutes)
- Go to Grafana: http://localhost:3333
- Navigate to Dashboards → Browse → AgentHub → "AgentHub EDA System Observatory"
- Observe the real-time data:
What You’ll See:
Event Processing Rate (Top Left)
- Lines showing events/second for each service
- Should show activity spikes when publisher runs
Error Rate (Top Right)
- Gauge showing error percentage
- Should be green (< 5% errors)
Event Types Distribution (Middle Left)
- Pie chart showing task types: greeting, math_calculation, random_number
- Different colors for each task type
Processing Latency (Middle Right)
- Three lines: p50, p95, p99 latencies
- Should show sub-second processing times
System Health (Bottom)
- CPU usage, memory usage, goroutines
- Service health status (all should be UP)
🎯 Checkpoint 6: Dashboard should show live metrics with recent activity.
Step 8: Explore Distributed Traces in Jaeger (2 minutes)
- Go to Jaeger: http://localhost:16686
- Select Service: Choose “agenthub-broker” from dropdown
- Click “Find Traces”
- Click on any trace to see details
What You’ll See:
Complete Event Journey:
agenthub-publisher: publish_event (2ms)
  └── agenthub-broker: process_event (1ms)
      └── agenthub-subscriber: consume_event (5ms)
          └── agenthub-subscriber: process_task (15ms)
              └── agenthub-subscriber: publish_result (2ms)
Trace Details:
- Span Tags: event_id, event_type, service names
- Timing Information: Exact start/end times and durations
- Log Correlation: Each span linked to structured logs
Error Detection:
- Look for red spans indicating errors
- Trace the “unknown_task” type to see how errors propagate
🎯 Checkpoint 7: You should see complete traces showing the full event lifecycle.
Step 9: Correlate Logs with Traces (1 minute)
- Copy a trace ID from Jaeger (the long hex string)
- Check broker logs for that trace ID:
  # In your broker terminal, look for lines like:
  time=2025-09-28T21:00:02.000Z level=INFO msg="Received task request" task_id=task_greeting_1727557202 trace_id=a1b2c3d4e5f6...
- Check subscriber logs for the same trace ID
🎯 Checkpoint 8: You should find the same trace_id in logs across multiple services.
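If you start the agents with their output redirected to files (an optional variation on the go run commands above), the correlation becomes a one-liner. The trace ID here is just the placeholder from the example log line - substitute the one you copied:
# Example only: assumes you ran e.g. `go run broker/main.go | tee broker.log`
grep "a1b2c3d4e5f6" broker.log subscriber.log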
Step 10: Experience Intelligent Alerting (Optional)
To see alerting in action:
- Simulate errors by stopping the subscriber:
  # In subscriber terminal, press Ctrl+C
- Keep publisher running (it will fail to process tasks)
- Check Prometheus alerts:
  - Go to http://localhost:9090/alerts
  - After ~5 minutes, you should see "HighEventProcessingErrorRate" firing
- Restart subscriber to clear the alert
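You can also watch the alert state from the shell using Prometheus' HTTP API; an empty alerts array means nothing is firing:
# Active alerts as JSON
curl -s http://localhost:9090/api/v1/alerts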
🎉 Congratulations!
You’ve successfully experienced the complete AgentHub observability stack!
Summary: What You Accomplished
✅ Deployed a complete observability stack with Docker Compose
✅ Ran observable agents with automatic instrumentation
✅ Monitored real-time metrics in Grafana dashboards
✅ Traced event flows across multiple services with Jaeger
✅ Correlated logs with traces using trace IDs
✅ Experienced intelligent alerting with Prometheus
✅ Understood the complete event lifecycle from publisher to subscriber
Key Observability Concepts You Learned
Distributed Tracing
- Events get unique trace IDs that follow them everywhere
- Each processing step creates a “span” with timing information
- Complete request flows are visible across service boundaries
Metrics Collection
- 47+ different metrics automatically collected
- Real-time visualization of system health and performance
- Historical data for trend analysis
Structured Logging
- All logs include trace context for correlation
- Consistent format across all services
- Easy debugging and troubleshooting
Intelligent Alerting
- Proactive monitoring for error rates and performance
- Automatic notifications when thresholds are exceeded
- Helps prevent issues before they impact users
Next Steps
For Development:
For Operations:
For Understanding:
Troubleshooting
| Issue | Solution |
|---|---|
| Services won't start | Run docker-compose down && docker-compose up -d |
| No metrics in Grafana | Check Prometheus targets: http://localhost:9090/targets |
| No traces in Jaeger | Verify JAEGER_ENDPOINT environment variable is set correctly |
| Permission errors | Ensure Docker has proper permissions |
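For the "No traces in Jaeger" row, a quick first check is whether the variable is actually set in each terminal that runs an agent. The correct value depends on your collector setup, so consult the Environment Variables Reference rather than guessing:
# Should print a non-empty endpoint in every agent terminal
echo "JAEGER_ENDPOINT=${JAEGER_ENDPOINT}"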
Clean Up
When you’re done exploring:
# Stop the observability stack
cd observability
docker-compose down
# Stop the Go applications
# Press Ctrl+C in each terminal running the agents
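If you also want to discard the data stored by the stack (Prometheus metrics, Grafana state), docker-compose can remove the associated volumes as well:
# Removes the containers AND any named volumes created by the stack
docker-compose down -v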
🎯 Ready for More?
Production Usage: Add Observability to Your Agent
Deep Understanding: Distributed Tracing Explained