AgentHub Documentation
Welcome to the AgentHub documentation! This comprehensive guide will help you understand, install, and use the Agent2Agent (A2A) protocol-compliant communication broker to build sophisticated multi-agent systems with Event-Driven Architecture scalability.
Quick Start
New to AgentHub?
Start with our Installation and Setup tutorial, then follow the Running the Demo guide to see AgentHub in action.
Want to understand the concepts?
Read about The Agent2Agent Principle to understand the philosophy behind AgentHub.
Ready to build agents?
Use our guides to Create a Publisher or Create a Subscriber.
Need technical details?
Check the API Reference and Configuration Reference.
Documentation Types
Our documentation follows the Diátaxis framework with four distinct types:
Tutorials - Learning by doing
Step-by-step guides that teach you how to use AgentHub through practical exercises:
How-to Guides - Solving specific problems
Goal-oriented guides for accomplishing specific tasks:
Reference - Technical specifications
Comprehensive technical documentation and API details:
Explanation - Understanding concepts
In-depth discussions that provide context and background:
Learning Paths
For Beginners
- Installation and Setup
- The Agent2Agent Principle
- Running the Demo
- Understanding Tasks
- Create a Subscriber
For Experienced Developers
- Running the Demo
- API Reference
- Create a Publisher
- Performance and Scaling
For System Architects
- The Agent2Agent Principle
- Broker Architecture
- Performance and Scaling
- Configuration Reference
System Overview
AgentHub enables autonomous agents to collaborate through A2A protocol-compliant task delegation with EDA scalability:
- A2A Protocol Compliance with standardized Message, Task, and Artifact formats
- Event-Driven Architecture for scalable asynchronous communication
- Flexible agent addressing (direct, broadcast, topic-based routing)
- Rich task semantics with A2A lifecycle states and priorities
- Built-in resilience with EDA patterns and graceful failure handling
- Comprehensive observability with distributed tracing and metrics
Key Technologies
- Protocol: gRPC with Protocol Buffers
- Language: Go 1.24+
- Architecture: Event-driven with pub/sub patterns
- Observability: OpenTelemetry integration
- Message Format: Structured protobuf with flexible JSON parameters
1 - Tutorials
Learning-oriented guides that take you through practical exercises to master AgentHub
Tutorials
These hands-on tutorials will guide you through learning AgentHub by doing. Each tutorial is designed to be followed step-by-step and will help you build practical experience with the system.
Tutorial Categories
Learning Path
1. Start Here
Begin with Getting Started tutorials to install and run your first examples
2. Build Systems
Progress to Workflows to create sophisticated agent interactions
3. Monitor & Observe
Master Observability to monitor and troubleshoot your deployments
Prerequisites
Before starting these tutorials, make sure you have:
- Go 1.24 or later installed
- Basic understanding of command-line tools
- Familiarity with distributed systems concepts (helpful but not required)
Tutorial Tips
- Follow tutorials in order for the best learning experience
- Each tutorial builds on concepts from previous ones
- Code examples are tested and should work as written
- Don’t hesitate to experiment beyond the tutorial steps
Note
All tutorials are designed to work on Linux, macOS, and Windows. Platform-specific instructions are provided where needed.
1.1 - Getting Started
Essential tutorials to get you up and running with AgentHub
Getting Started Tutorials
Step-by-step tutorials to help you get AgentHub installed, configured, and running your first examples.
Available Tutorials
1.1.1 - Installation and Setup Tutorial
Guide for installing AgentHub and setting up your development environment from scratch. Get a working A2A-compliant AgentHub installation ready for building agent systems.
Installation and Setup Tutorial
This tutorial will guide you through installing AgentHub and setting up your development environment from scratch. By the end, you’ll have a working A2A-compliant AgentHub installation ready for building Agent2Agent protocol systems.
Prerequisites Check
Before we begin, let’s verify you have the required software installed.
Step 1: Verify Go Installation
Check if Go 1.24+ is installed:
go version
You should see output like:
go version go1.24.0 darwin/amd64
If Go is not installed or the version is older than 1.24:
macOS (using Homebrew):
brew install go
Linux (using package manager):
# Ubuntu/Debian
sudo apt update && sudo apt install golang-go
# CentOS/RHEL
sudo yum install golang
# Arch Linux
sudo pacman -S go
Windows:
Download from https://golang.org/dl/ and run the installer.
Step 2: Verify Protocol Buffers Compiler
Check if protoc is installed:
protoc --version
You should see output like:
libprotoc 3.21.12
If protoc is not installed:
macOS (using Homebrew):
brew install protobuf
Linux:
# Ubuntu/Debian
sudo apt update && sudo apt install protobuf-compiler
# CentOS/RHEL
sudo yum install protobuf-compiler
# Arch Linux
sudo pacman -S protobuf
Windows:
Download from Protocol Buffers releases and add to PATH.
Step 3: Install Go Protocol Buffer Plugins
Install the required Go plugins for Protocol Buffers:
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
Verify the plugins are in your PATH:
which protoc-gen-go
which protoc-gen-go-grpc
Both commands should return paths to the installed plugins.
Installing AgentHub
Step 4: Clone the Repository
Clone the AgentHub repository:
git clone https://github.com/owulveryck/agenthub.git
cd agenthub
Step 5: Verify Project Structure
Let’s explore what we have:
ls -la
You should see:
drwxr-xr-x agents/           # Sample A2A agent implementations
drwxr-xr-x broker/           # A2A-compliant AgentHub broker server
drwxr-xr-x documentation/    # Complete A2A documentation
drwxr-xr-x events/           # Generated A2A protocol code
drwxr-xr-x internal/         # Internal packages and abstractions
-rw-r--r-- go.mod            # Go module definition
-rw-r--r-- Makefile         # Build automation
drwxr-xr-x proto/           # A2A protocol definitions
-rw-r--r-- README.md        # Project overview
Step 6: Initialize Go Module
Ensure Go modules are properly initialized (for example, with go mod tidy):
This downloads all required dependencies. You should see output about downloading packages.
Step 7: Generate Protocol Buffer Code
Generate the Go code from Protocol Buffer definitions:
make proto
You should see:
Generating protobuf code for A2A protocol definitions...
Generating proto/eventbus.proto...
Generating proto/a2a.proto...
Protobuf code generated successfully.
Verify the generated files exist:
ls events/
You should see:
a2a/          # A2A protocol definitions
eventbus/     # AgentHub broker definitions
ls events/a2a/
You should see:
a2a.pb.go
a2a_grpc.pb.go
ls events/eventbus/
You should see:
eventbus.pb.go
eventbus_grpc.pb.go
Step 8: Build All Components
Build the AgentHub components:
make build
You should see:
Building A2A-compliant server binary...
Building A2A publisher binary...
Building A2A subscriber binary...
Build complete. A2A-compliant binaries are in the 'bin/' directory.
Verify the binaries were created:
ls bin/
You should see:
agenthub-server  # A2A-compliant AgentHub broker
publisher        # A2A message publisher
subscriber       # A2A message subscriber
Verification Test
Let’s verify everything works by running a quick test.
Step 9: Test the Installation
Start the A2A-compliant broker server in the background:
./bin/agenthub-server &
You should see:
2025/09/28 10:00:00 A2A-compliant AgentHub broker gRPC server listening on [::]:50051
2025/09/28 10:00:00 AgentHub service ready for A2A protocol communication
Start an A2A subscriber agent:
./bin/subscriber &
You should see:
A2A Agent started. Listening for A2A events and tasks. Press Enter to stop.
2025/09/28 10:00:05 A2A Agent agent_demo_subscriber subscribing to A2A tasks...
2025/09/28 10:00:05 Successfully subscribed to A2A tasks for agent agent_demo_subscriber. Waiting for A2A tasks...
Run the A2A publisher to send test tasks:
./bin/publisher
You should see A2A tasks being published and processed with conversation context and structured artifacts.
Clean up the test processes:
pkill -f agenthub-server
pkill -f subscriber
Development Environment Setup
For VS Code users:
Install the Go extension:
- Open VS Code
- Go to Extensions (Ctrl+Shift+X)
- Search for “Go” and install the official Go extension
- Open the AgentHub project folder
For other editors:
Ensure your editor has Go language support and Protocol Buffer syntax highlighting.
Step 11: Set Up Environment Variables (Recommended)
AgentHub uses environment variables for configuration. Create a .envrc file for local development:
cat > .envrc << EOF
# Core A2A AgentHub Configuration
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
export AGENTHUB_GRPC_PORT=":50051"
# A2A Protocol Configuration
export AGENTHUB_A2A_PROTOCOL_VERSION="1.0"
export AGENTHUB_MESSAGE_BUFFER_SIZE="100"
export AGENTHUB_CONTEXT_TIMEOUT="30s"
export AGENTHUB_ARTIFACT_MAX_SIZE="10MB"
# Health Check Ports
export AGENTHUB_HEALTH_PORT="8080"
export A2A_PUBLISHER_HEALTH_PORT="8081"
export A2A_SUBSCRIBER_HEALTH_PORT="8082"
# Observability (optional for development)
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export SERVICE_NAME="agenthub-dev"
export SERVICE_VERSION="dev"
export ENVIRONMENT="development"
export LOG_LEVEL="DEBUG"
EOF
Install direnv for automatic loading (recommended):
# macOS
brew install direnv
# Ubuntu/Debian
sudo apt install direnv
# After installation, add to your shell
echo 'eval "$(direnv hook bash)"' >> ~/.bashrc  # For bash
echo 'eval "$(direnv hook zsh)"' >> ~/.zshrc    # For zsh
Allow the environment file:
direnv allow
Alternative: Manual loading
source .envrc
For complete environment variable reference, see Environment Variables Reference
Step 12: Verify Make Targets
List the available make targets:
make help
You should see all available commands:
Makefile for gRPC Event Bus
Usage:
  make <target>
Targets:
  all              Builds all binaries (default).
  proto            Generates Go code from .proto files.
  build            Builds the server, publisher, and subscriber binaries.
  run-server       Runs the event bus gRPC server.
  run-publisher    Runs the publisher client.
  run-subscriber   Runs the subscriber client.
  clean            Removes generated Go files and build artifacts.
  help             Displays this help message.
Common Issues and Solutions
Issue: “protoc-gen-go: program not found”
Solution: Ensure Go bin directory is in your PATH:
export PATH=$PATH:$(go env GOPATH)/bin
echo 'export PATH=$PATH:$(go env GOPATH)/bin' >> ~/.bashrc
source ~/.bashrc
Issue: “go.mod not found”
Solution: Ensure you’re in the AgentHub project directory:
pwd  # Should show .../agenthub
ls go.mod  # Should exist
Issue: Port 50051 already in use
Solution: Kill existing processes or change the port:
lsof -ti:50051 | xargs kill -9
Issue: Permission denied on binaries
Solution: Make binaries executable:
chmod +x bin/*
Next Steps
Now that you have AgentHub installed and verified:
- Learn the basics: Follow the Running the Demo tutorial
- Build your first agent: Try Create a Subscriber
- Understand the concepts: Read The Agent2Agent Principle
Getting Help
If you encounter issues:
- Check the troubleshooting section above
- Review the complete documentation
- Open an issue on the GitHub repository
Congratulations! You now have a fully functional AgentHub development environment ready for building autonomous agent systems.
1.1.2 - Running the A2A-Compliant AgentHub Demo
Walk through setting up and running the complete A2A-compliant AgentHub EDA broker system. Learn how agents communicate using Agent2Agent protocol messages through the Event-Driven Architecture broker.
Running the A2A-Compliant AgentHub Demo
This tutorial will walk you through setting up and running the complete Agent2Agent (A2A) protocol-compliant AgentHub Event-Driven Architecture (EDA) broker system. By the end of this tutorial, you’ll have agents communicating using standardized A2A messages through the scalable EDA broker.
Prerequisites
- Go 1.24 or later installed
- Protocol Buffers compiler (protoc) installed
- Basic understanding of gRPC and message brokers
Step 1: Build the A2A-Compliant Components
First, let’s build all the A2A-compliant components using the Makefile:
# Build all A2A-compliant binaries (generates protobuf files first)
make build
This will:
- Generate A2A protocol files from proto/a2a_core.proto and proto/eventbus.proto
- Build the A2A-compliant broker, publisher, and subscriber binaries
- Place all binaries in the bin/ directory
You should see output like:
Building A2A-compliant server binary...
Building A2A-compliant publisher binary...
Building A2A-compliant subscriber binary...
Build complete. A2A-compliant binaries are in the 'bin/' directory.
Step 2: Start the AgentHub Broker Server
Open a terminal and start the AgentHub broker server:
./bin/agenthub-server
You should see output like:
time=2025-09-29T11:51:26.612+02:00 level=INFO msg="Starting health server" port=8080
time=2025-09-29T11:51:26.611+02:00 level=INFO msg="AgentHub gRPC server with observability listening" address=[::]:50051 health_endpoint=http://localhost:8080/health metrics_endpoint=http://localhost:8080/metrics component=broker
Keep this terminal open - the AgentHub broker needs to run continuously.
Step 3: Start an Agent (Subscriber)
Open a second terminal and start an agent that can receive and process tasks:
./bin/subscriber
You should see output indicating the agent has started:
time=2025-09-29T11:52:04.727+02:00 level=INFO msg="AgentHub client started with observability" broker_addr=localhost:50051 component=subscriber
time=2025-09-29T11:52:04.727+02:00 level=INFO msg="Starting health server" port=8082
time=2025-09-29T11:52:04.728+02:00 level=INFO msg="Agent started with observability. Listening for events and tasks."
time=2025-09-29T11:52:04.728+02:00 level=INFO msg="Subscribing to task results" agent_id=agent_demo_subscriber
time=2025-09-29T11:52:04.728+02:00 level=INFO msg="Subscribing to tasks" agent_id=agent_demo_subscriber
This agent can process several types of tasks:
- greeting: Simple greeting messages
- math_calculation: Basic arithmetic operations
- random_number: Random number generation
- Any unknown task type will be rejected
Step 4: Send A2A-Compliant Tasks
Open a third terminal and run the publisher to send A2A protocol-compliant task messages:
./bin/publisher
You’ll see the publisher send various A2A-compliant task messages through the AgentHub EDA broker:
time=2025-09-29T14:41:11.237+02:00 level=INFO msg="Starting publisher demo"
time=2025-09-29T14:41:11.237+02:00 level=INFO msg="Testing Agent2Agent Task Publishing via AgentHub with observability"
time=2025-09-29T14:41:11.237+02:00 level=INFO msg="Publishing A2A task" task_id=task_greeting_1759149671 task_type=greeting responder_agent_id=agent_demo_subscriber context_id=ctx_greeting_1759149671
time=2025-09-29T14:41:11.242+02:00 level=INFO msg="A2A task published successfully" task_id=task_greeting_1759149671 task_type=greeting event_id=evt_msg_greeting_1759149671_1759149671
time=2025-09-29T14:41:11.242+02:00 level=INFO msg="Published greeting task" task_id=task_greeting_1759149671
time=2025-09-29T14:41:14.243+02:00 level=INFO msg="Publishing A2A task" task_id=task_math_calculation_1759149674 task_type=math_calculation responder_agent_id=agent_demo_subscriber context_id=ctx_math_calculation_1759149674
time=2025-09-29T14:41:14.247+02:00 level=INFO msg="A2A task published successfully" task_id=task_math_calculation_1759149674 task_type=math_calculation event_id=evt_msg_math_calculation_1759149674_1759149674
time=2025-09-29T14:41:16.248+02:00 level=INFO msg="Publishing A2A task" task_id=task_random_number_1759149676 task_type=random_number responder_agent_id=agent_demo_subscriber context_id=ctx_random_number_1759149676
time=2025-09-29T14:41:16.249+02:00 level=INFO msg="Published random number task" task_id=task_random_number_1759149676
Notice how the A2A implementation includes:
- Context IDs: Each task is grouped in a conversation context (ctx_greeting_...)
- Event IDs: EDA wrapper events have unique identifiers for tracing
- A2A Task Structure: Tasks use A2A-compliant Message and Part formats
Step 5: Observe A2A Task Processing
Switch back to the subscriber terminal to see the agent processing A2A tasks in real-time:
time=2025-09-29T14:41:11.243+02:00 level=INFO msg="Task processing completed" task_id=task_greeting_1759149671 status=TASK_STATE_COMPLETED has_artifact=true
time=2025-09-29T14:41:14.253+02:00 level=INFO msg="Task processing completed" task_id=task_math_calculation_1759149674 status=TASK_STATE_COMPLETED has_artifact=true
time=2025-09-29T14:41:16.249+02:00 level=INFO msg="Task processing completed" task_id=task_random_number_1759149676 status=TASK_STATE_COMPLETED has_artifact=true
Notice the A2A-compliant processing:
- Task States: Using A2A standard states (TASK_STATE_COMPLETED)
- Artifacts: Each completed task generates A2A artifacts (has_artifact=true)
- Structured Processing: Tasks are processed using A2A Message and Part handlers
Step 6: Check the Broker Logs
In the first terminal (broker server), you’ll see logs showing message routing:
2025/09/27 16:34:33 Received task request: task_greeting_1758983673 (type: greeting) from agent: agent_demo_publisher
2025/09/27 16:34:35 Received task result for task: task_greeting_1758983673 from agent: agent_demo_subscriber
2025/09/27 16:34:35 Received task progress for task: task_greeting_1758983673 (100%) from agent: agent_demo_subscriber
Understanding What Happened
- A2A Message Creation: The publisher created A2A-compliant messages with:
  - Message Structure: Using A2A Message format with Part content
  - Context Grouping: Each task belongs to a conversation context
  - Task Association: Messages are linked to specific A2A tasks
  - Role Definition: Messages specify USER (requester) or AGENT (responder) roles
- EDA Event Routing: The AgentHub EDA broker:
  - Wrapped A2A Messages: A2A messages wrapped in AgentEvent for EDA transport
  - Event-Driven Routing: Used EDA patterns for scalable message delivery
  - Task Storage: Stored A2A tasks with full message history and artifacts
  - Status Tracking: Managed A2A task lifecycle (SUBMITTED → WORKING → COMPLETED)
- A2A Task Processing: The subscriber agent:
  - A2A Task Reception: Received A2A tasks via EDA event streams
  - Message Processing: Processed A2A Message content using Part handlers
  - Artifact Generation: Generated structured A2A artifacts as task output
  - Status Updates: Published A2A-compliant status updates through EDA events
- Hybrid Architecture Benefits:
  - A2A Compliance: Full interoperability with other A2A-compliant systems
  - EDA Scalability: Event-driven patterns for high-throughput scenarios
  - Standards-Based: Using industry-standard Agent2Agent protocol
  - Observable: Built-in tracing and metrics for production deployment
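The task lifecycle mentioned under Status Tracking can be pictured as a small state machine. The Go sketch below models only the happy path named in this tutorial (SUBMITTED, WORKING, COMPLETED). The TaskState type and next function are local to this example; the generated A2A code defines its own enum, which may include further states.

```go
package main

import "fmt"

// TaskState models the three lifecycle states named in the text.
type TaskState int

const (
	Submitted TaskState = iota
	Working
	Completed
)

// String returns the upper-case name used in the tutorial.
func (s TaskState) String() string {
	switch s {
	case Submitted:
		return "SUBMITTED"
	case Working:
		return "WORKING"
	case Completed:
		return "COMPLETED"
	default:
		return "UNKNOWN"
	}
}

// next returns the state that follows s on the happy path,
// or an error when s has no successor.
func next(s TaskState) (TaskState, error) {
	switch s {
	case Submitted:
		return Working, nil
	case Working:
		return Completed, nil
	default:
		return s, fmt.Errorf("no transition from %s", s)
	}
}

func main() {
	s := Submitted
	for {
		fmt.Println(s)
		n, err := next(s)
		if err != nil {
			break // COMPLETED is terminal in this sketch
		}
		s = n
	}
}
```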
 
Next Steps
Now that you have the basic system working, you can:
- Create Multiple Agents: Run multiple subscriber instances with different agent IDs to see task distribution
- Add Custom Task Types: Modify the subscriber to handle new types of tasks
- Build a Request-Response Flow: Create an agent that both requests and processes tasks
- Monitor Task Progress: Build a dashboard that subscribes to task progress updates
Troubleshooting
Port Already in Use: If you see “bind: address already in use”, kill any existing processes:
lsof -ti:50051 | xargs kill -9
Agent Not Receiving Tasks: Ensure the agent ID in the publisher matches the subscriber’s agent ID (agent_demo_subscriber).
Build Errors: Regenerate A2A protocol buffer files and ensure all imports are correct:
# Clean old protobuf files
make clean
# Regenerate A2A protobuf files
make proto
# Rebuild everything
make build
A2A Compliance Issues: Verify A2A protocol structures are correctly generated:
# Check A2A core types
ls events/a2a/
# Should show: a2a_core.pb.go eventbus.pb.go eventbus_grpc.pb.go
You now have a working A2A-compliant AgentHub EDA broker system! The agents can exchange standardized A2A messages, maintain conversation contexts, generate structured artifacts, and track task lifecycles - all through your scalable Event-Driven Architecture broker with full Agent2Agent protocol compliance.
1.2 - Observability
Tutorials for monitoring and observing AgentHub systems
Observability Tutorials
Learn how to monitor, trace, and observe your AgentHub deployments with comprehensive observability features.
Available Tutorials
1.2.1 - Interactive Dashboard Tour
Take a guided tour through AgentHub’s Grafana dashboards while the system is running, learning to interpret metrics, identify issues, and understand system behavior in real-time.
Interactive Dashboard Tour
Learn by doing: Take a guided tour through AgentHub’s Grafana dashboards while the system is running, learning to interpret metrics, identify issues, and understand system behavior in real-time.
Prerequisites
- Observability stack running (from the Observability Demo)
- Observable agents running (broker, publisher, subscriber)
- Grafana open at http://localhost:3333
- 10-15 minutes for the complete tour
Quick Setup Reminder
If you haven’t completed the observability demo yet:
# Start observability stack
cd agenthub/observability
docker-compose up -d
# Run observable agents from the agenthub root directory (one terminal each)
go run broker/main.go
go run agents/subscriber/main.go
go run agents/publisher/main.go
Dashboard Navigation
Accessing the Main Dashboard
- Open Grafana: http://localhost:3333
- Login: admin / admin (skip password change for demo)
- Navigate: Dashboards → Browse → AgentHub → “AgentHub EDA System Observatory”
- Bookmark: Save this URL for quick access: http://localhost:3333/d/agenthub-eda-dashboard
Dashboard Layout Overview
The dashboard is organized in 4 main rows:
Row 1: Event Processing Overview
├── Event Processing Rate (events/sec)
└── Event Processing Error Rate (%)
Row 2: Event Analysis
├── Event Types Distribution (pie chart)
└── Event Processing Latency (p50, p95, p99)
Row 3: Distributed Tracing
└── Jaeger Integration Panel
Row 4: System Health
├── Service CPU Usage (%)
├── Service Memory Usage (MB)
├── Go Goroutines Count
└── Service Health Status
Interactive Tour
Tour 1: Understanding Event Flow (3 minutes)
Step 1: Watch the Event Processing Rate
Location: Top-left panel
What to observe: Real-time lines showing events per second
- Identify the services:
  - Green line: agenthub-broker (should be highest; it processes all events)
  - Blue line: agenthub-publisher (events being created)
  - Orange line: agenthub-subscriber (events being processed)
- Watch the pattern:
  - Publisher creates bursts of events
  - Broker immediately processes them (routing)
  - Subscriber processes them shortly after
- Understand the flow:
  Publisher (creates) → Broker (routes) → Subscriber (processes)
       50/sec         →     150/sec     →      145/sec
Tour Insight: The broker rate is higher because it processes both incoming tasks AND outgoing results.
Step 2: Monitor Error Rates
Location: Top-right panel (gauge)
What to observe: Error percentage gauge
- Healthy system: Should show 0-2% (green zone)
- If you see higher errors:
  - Check if all services are running
  - Look for red traces in Jaeger (we’ll do this next)
- Error rate calculation:
  Error Rate = (Failed Events / Total Events) × 100
Action: Note your current error rate; we’ll compare it later.
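The error rate formula above translates directly into code. The following Go sketch is a plain restatement of that arithmetic; the function name errorRate and the zero-total guard are choices made for this example, and the dashboard itself derives the value from its metrics backend rather than from code like this.

```go
package main

import "fmt"

// errorRate returns failed events as a percentage of total events.
// A total of zero yields 0 so that an idle system does not divide by zero.
func errorRate(failed, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(failed) * 100 / float64(total)
}

func main() {
	// Example: 3 failed events out of 150.
	fmt.Printf("error rate: %.1f%%\n", errorRate(3, 150))
}
```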
Tour 2: Event Analysis Deep Dive (3 minutes)
Step 3: Explore Event Types
Location: Middle-left panel (pie chart)
What to observe: Distribution of different event types
- Identify event types:
  - greeting: Most common (usually 40-50%)
  - math_calculation: Compute-heavy tasks (30-40%)
  - random_number: Quick tasks (15-25%)
  - unknown_task: Error-generating tasks (2-5%)
- Business insights:
  - Larger slices = more frequent tasks
  - Small red slice = intentional error tasks for testing
Tour Insight: The publisher randomly generates different task types to simulate real-world workload diversity.
Step 4: Analyze Processing Latency
Location: Middle-right panel
What to observe: Three latency lines (p50, p95, p99)
- Understand percentiles:
  - p50 (blue): 50% of events process faster than this
  - p95 (green): 95% of events process faster than this
  - p99 (red): 99% of events process faster than this
- Healthy ranges:
  - p50: < 50ms (very responsive)
  - p95: < 200ms (good performance)
  - p99: < 500ms (acceptable outliers)
- Pattern recognition:
  - Spiky p99 = occasional slow tasks (normal)
  - Rising p50 = systemic slowdown (investigate)
  - Flat lines = no activity or measurement issues
Action: Hover over the lines to see exact values at different times.
Tour 3: Distributed Tracing Exploration (4 minutes)
Step 5: Jump into Jaeger
Location: Middle section - “Distributed Traces” panel
Action: Click the “Explore” button
This opens Jaeger in a new tab. Let’s explore:
- In Jaeger UI:
  - Service dropdown: Select “agenthub-broker”
  - Operation: Leave as “All”
  - Click “Find Traces”
- Pick a trace to examine:
  - Look for traces that show multiple spans
  - Click on any trace line to open details
- Understand the trace structure:
  Timeline View:
  agenthub-publisher: publish_event [2ms]
    └── agenthub-broker: process_event [1ms]
        └── agenthub-subscriber: consume_event [3ms]
            └── agenthub-subscriber: process_task [15ms]
                └── agenthub-subscriber: publish_result [2ms]
- Explore span details:
  - Click individual spans to see:
    - Tags: event_type, event_id, agent names
    - Process: Which service handled the span
    - Duration: Exact timing information
Tour Insight: Each event creates a complete “trace” showing its journey from creation to completion.
Step 6: Find and Analyze an Error
- Search for error traces:
  - In Jaeger, add tag filter: error=true
  - Or look for traces with red spans
- Examine the error trace:
  - Red spans indicate errors
  - Error tags show the error type and message
  - Stack traces help with debugging
- Follow the error propagation:
  - See how errors affect child spans
  - Notice error context in span attributes
Action: Find a trace with “unknown_task” event type. These tasks are designed to fail for demonstration.
Tour 4: System Health Monitoring (3 minutes)
Step 7: Monitor Resource Usage
Location: Bottom row panels
What to observe: System resource consumption
- CPU Usage Panel (Bottom-left):
  - Normal range: 10-50% for demo workload
  - Watch for: Sustained high CPU (>70%)
  - Services comparison: See which service uses most CPU
- Memory Usage Panel (Bottom-center-left):
  - Normal range: 30-80MB per service for demo
  - Watch for: Continuously growing memory (memory leaks)
  - Pattern: Sawtooth = normal GC, steady growth = potential leak
- Goroutines Panel (Bottom-center-right):
  - Normal range: 10-50 goroutines per service
  - Watch for: Continuously growing count (goroutine leaks)
  - Pattern: Stable baseline with activity spikes
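The goroutine and memory figures in these panels correspond to values the Go runtime exposes directly. As a rough sketch of where such numbers come from, the following program reads them with the standard runtime package. The snapshot type and takeSnapshot function are hypothetical names for this example; AgentHub's own metrics are exported through its observability integration, not through this code.

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot holds the two runtime figures discussed above.
type snapshot struct {
	Goroutines  int
	HeapAllocMB float64
}

// takeSnapshot reads the current goroutine count and live heap size.
func takeSnapshot() snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{
		Goroutines:  runtime.NumGoroutine(),
		HeapAllocMB: float64(m.HeapAlloc) / (1024 * 1024),
	}
}

func main() {
	s := takeSnapshot()
	fmt.Printf("goroutines=%d heap=%.2fMB\n", s.Goroutines, s.HeapAllocMB)
}
```

Sampling these values periodically in a test program shows the same patterns the panels describe: a heap that rises and falls with garbage collection, and a goroutine count that returns to its baseline after bursts of work.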
 
Step 8: Verify Service Health
Location: Bottom-right panel
What to observe: Service up/down status
- Health indicators:
  - Green: Service healthy and responding
  - Red: Service down or health check failing
  - Yellow: Service degraded but operational
- Health check details:
  - Each service exposes a /health endpoint
  - Prometheus monitors these endpoints
  - Dashboard shows aggregated status
Action: Open http://localhost:8080/health in a new tab to see raw health data.
Tour 5: Time-based Analysis (2 minutes)
Step 9: Change Time Ranges
Location: Top-right of dashboard (time picker)
Current: Likely showing “Last 5 minutes”
- Try different ranges:
  - Last 15 minutes: See longer trends
  - Last 1 hour: See full demo session
  - Custom range: Pick specific time period
- Observe pattern changes:
  - Longer ranges: Show trends and patterns
  - Shorter ranges: Show real-time detail
  - Custom ranges: Zoom into specific incidents
Step 10: Use Dashboard Filters
Location: Top of dashboard - variable dropdowns
- Service Filter:
  - Select “All” to see everything
  - Pick specific service to focus analysis
  - Useful for isolating service-specific issues
- Event Type Filter:
  - Filter to specific event types
  - Compare performance across task types
  - Identify problematic event categories
Tour Insight: Filtering helps you drill down from system-wide view to specific components or workloads.
Hands-on Experiments
Experiment 1: Create a Service Outage
Goal: See how the dashboard shows service failures
- Stop the subscriber:
  # In subscriber terminal, press Ctrl+C
- Watch the dashboard changes:
  - Error rate increases (top-right gauge turns red)
  - Subscriber metrics disappear from bottom panels
  - Service health shows subscriber as down
- Check Jaeger for failed traces:
  - Look for traces that don’t complete
  - See where the chain breaks
- Restart subscriber:
  go run agents/subscriber/main.go
Learning: Dashboard immediately shows impact of service failures.
Experiment 2: Generate High Load
Goal: See system behavior under stress
- Modify publisher to generate more events:
  # Edit agents/publisher/main.go
  # Change: time.Sleep(5 * time.Second)
  # To:     time.Sleep(1 * time.Second)
- Watch dashboard changes:
  - Processing rate increases
  - Latency may increase
  - CPU/memory usage grows
- Observe scaling behavior:
  - How does the system handle increased load?
  - Do error rates increase?
  - Where are the bottlenecks?
Learning: Dashboard shows system performance characteristics under load.
Dashboard Interpretation Guide
What Good Looks Like
- Event Processing Rate: Steady activity matching workload
- Error Rate: < 5% (green zone)
- Event Types: Expected distribution
- Latency: p95 < 200ms, p99 < 500ms
- CPU Usage: < 50% sustained
- Memory: Stable or slow growth with GC cycles
- Goroutines: Stable baseline with activity spikes
- Service Health: All services green/up
Warning Signs
- Error Rate: 5-10% (yellow zone)
- Latency: p95 > 200ms or rising trend
- CPU: Sustained > 70%
- Memory: Continuous growth without GC
- Missing data: Gaps in metrics (service issues)
Critical Issues
- Error Rate: > 10% (red zone)
- Latency: p95 > 500ms
- CPU: Sustained > 90%
- Memory: Rapid growth or OOM
- Service Health: Any service showing red/down
- Traces: Missing or broken trace chains
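The error rate zones in this guide can be expressed as a small classification function. The Go sketch below encodes the thresholds listed above (below 5% healthy, 5-10% warning, above 10% critical). The function name classifyErrorRate and the returned labels are assumptions made for this example; the same pattern would apply to the latency and CPU thresholds.

```go
package main

import "fmt"

// classifyErrorRate maps an error percentage to the zone named in the guide:
// below 5 is healthy, 5 through 10 is a warning, above 10 is critical.
func classifyErrorRate(pct float64) string {
	switch {
	case pct > 10:
		return "critical"
	case pct >= 5:
		return "warning"
	default:
		return "healthy"
	}
}

func main() {
	for _, pct := range []float64{1.5, 5, 10, 12.5} {
		fmt.Printf("%.1f%% -> %s\n", pct, classifyErrorRate(pct))
	}
}
```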
Next Steps After the Tour
For Daily Operations:
- Bookmark: Save dashboard URL for quick access
- Set up alerts: Configure notifications for critical metrics
- Create views: Use filters to create focused views for your team
For Development:
For Deep Understanding:
Troubleshooting Tour Issues
| Issue | Solution |
|---|---|
| Dashboard shows no data | Verify observability environment variables are set |
| Grafana won’t load | Check docker-compose ps in observability/ |
| Metrics missing | Verify Prometheus targets at http://localhost:9090/targets |
| Jaeger empty | Ensure trace context propagation is working |
Congratulations! You’ve completed the interactive dashboard tour and learned to read AgentHub’s observability signals.
Ready for More?
Master the Tools: Use Grafana Dashboards - Advanced dashboard usage
Troubleshoot Issues: Debug with Distributed Tracing - Use Jaeger effectively
1.2.2 - AgentHub Observability Demo Tutorial
Experience the complete observability stack with distributed tracing, real-time metrics, and intelligent alerting in under 10 minutes through hands-on learning.
AgentHub Observability Demo Tutorial
Learn by doing: Experience the complete observability stack with distributed tracing, real-time metrics, and intelligent alerting in under 10 minutes.
What You’ll Learn
By the end of this tutorial, you’ll have:
- Seen distributed traces flowing across multiple agents
- Monitored real-time metrics in Grafana dashboards
- Understood event correlation through trace IDs
- Experienced intelligent alerting when things go wrong
- Explored the complete observability stack components
Prerequisites
- Go 1.24+ installed
- Docker and Docker Compose installed
- Environment variables configured (see Installation and Setup)
- 10 minutes of your time
- Basic terminal knowledge
💡 Environment Note: AgentHub agents automatically enable observability when JAEGER_ENDPOINT is configured. See Environment Variables Reference for all configuration options.
Step 1: Clone and Setup (1 minute)
# Clone the repository
git clone https://github.com/owulveryck/agenthub.git
cd agenthub
# Verify you have the observability files
ls observability/
# You should see: docker-compose.yml, grafana/, prometheus/, etc.
Step 2: Start the Observability Stack (2 minutes)
# Navigate to observability directory
cd observability
# Start all monitoring services
docker-compose up -d
# Verify services are running
docker-compose ps
Expected Output:
NAME                    COMMAND                  SERVICE             STATUS
agenthub-grafana        "/run.sh"                grafana             running
agenthub-jaeger         "/go/bin/all-in-one"     jaeger              running
agenthub-prometheus     "/bin/prometheus --c…"   prometheus          running
agenthub-otel-collector "/otelcol-contrib --…"   otel-collector      running
🎯 Checkpoint 1: All services should be “running”. If not, check Docker logs: docker-compose logs <service-name>
Step 3: Access the Dashboards (1 minute)
Open these URLs in your browser (keep them open in tabs):
| Service | URL | Purpose |
|---|---|---|
| Grafana | http://localhost:3333 | Main observability dashboard |
| Jaeger | http://localhost:16686 | Distributed tracing |
| Prometheus | http://localhost:9090 | Raw metrics and alerts |
Grafana Login: admin / admin (skip password change for demo)
🎯 Checkpoint 2: You should see Grafana’s welcome page and Jaeger’s empty trace list.
Step 4: Start the Observable Broker (1 minute)
Open a new terminal and navigate back to the project root:
# From agenthub root directory
go run broker/main.go
Expected Output:
time=2025-09-28T21:00:00.000Z level=INFO msg="Starting health server on port 8080"
time=2025-09-28T21:00:00.000Z level=INFO msg="AgentHub broker gRPC server with observability listening" address="[::]:50051" health_endpoint="http://localhost:8080/health" metrics_endpoint="http://localhost:8080/metrics"
🎯 Checkpoint 3:
- Broker is listening on port 50051
- Health endpoint available at http://localhost:8080/health
- Metrics endpoint available at http://localhost:8080/metrics
Step 5: Start the Observable Subscriber (1 minute)
Open another terminal:
go run agents/subscriber/main.go
Expected Output:
time=2025-09-28T21:00:01.000Z level=INFO msg="Starting health server on port 8082"
time=2025-09-28T21:00:01.000Z level=INFO msg="Starting observable subscriber"
time=2025-09-28T21:00:01.000Z level=INFO msg="Agent started with observability. Listening for events and tasks."
🎯 Checkpoint 4:
- Subscriber is connected and listening
- Health available at http://localhost:8082/health
Step 6: Generate Events with the Publisher (2 minutes)
Open a third terminal:
go run agents/publisher/main.go
Expected Output:
time=2025-09-28T21:00:02.000Z level=INFO msg="Starting health server on port 8081"
time=2025-09-28T21:00:02.000Z level=INFO msg="Starting observable publisher demo"
time=2025-09-28T21:00:02.000Z level=INFO msg="Publishing task" task_id=task_greeting_1727557202 task_type=greeting responder_agent_id=agent_demo_subscriber
time=2025-09-28T21:00:02.000Z level=INFO msg="Task published successfully" task_id=task_greeting_1727557202 task_type=greeting
🎯 Checkpoint 5: You should see:
- Publisher creating and sending tasks
- Subscriber receiving and processing tasks
- Broker routing messages between them
Step 7: Explore Real-time Metrics in Grafana (2 minutes)
- Go to Grafana: http://localhost:3333
- Navigate to Dashboards → Browse → AgentHub → “AgentHub EDA System Observatory”
- Observe the real-time data:
What You’ll See:
Event Processing Rate (Top Left)
- Lines showing events/second for each service
- Should show activity spikes when publisher runs
Error Rate (Top Right)
- Gauge showing error percentage
- Should be green (< 5% errors)
Event Types Distribution (Middle Left)
- Pie chart showing task types: greeting, math_calculation, random_number
- Different colors for each task type
Processing Latency (Middle Right)
- Three lines: p50, p95, p99 latencies
- Should show sub-second processing times
System Health (Bottom)
- CPU usage, memory usage, goroutines
- Service health status (all should be UP)
🎯 Checkpoint 6: Dashboard should show live metrics with recent activity.
Step 8: Explore Distributed Traces in Jaeger (2 minutes)
- Go to Jaeger: http://localhost:16686
- Select Service: Choose “agenthub-broker” from dropdown
- Click “Find Traces”
- Click on any trace to see details
What You’ll See:
Complete Event Journey:
agenthub-publisher: publish_event (2ms)
  └── agenthub-broker: process_event (1ms)
      └── agenthub-subscriber: consume_event (5ms)
          └── agenthub-subscriber: process_task (15ms)
              └── agenthub-subscriber: publish_result (2ms)
Trace Details:
- Span Tags: event_id, event_type, service names
- Timing Information: Exact start/end times and durations
- Log Correlation: Each span linked to structured logs
Error Detection:
- Look for red spans indicating errors
- Trace the “unknown_task” type to see how errors propagate
🎯 Checkpoint 7: You should see complete traces showing the full event lifecycle.
Step 9: Correlate Logs with Traces (1 minute)
- Copy a trace ID from Jaeger (the long hex string)
- Check broker logs for that trace ID:
  # In your broker terminal, look for lines like:
  time=2025-09-28T21:00:02.000Z level=INFO msg="Received task request" task_id=task_greeting_1727557202 trace_id=a1b2c3d4e5f6...
- Check subscriber logs for the same trace ID
🎯 Checkpoint 8: You should find the same trace_id in logs across multiple services.
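Searching several terminals by eye gets tedious, so the correlation step can be scripted. The sketch below assumes the `key=value` log layout shown in this tutorial; `traceIDOf` and `filterByTrace` are hypothetical helpers, and the log lines and trace IDs in `main` are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// traceIDOf extracts the value of the trace_id=... attribute from a key=value log line.
// It returns "" when the line carries no trace ID.
func traceIDOf(line string) string {
	for _, field := range strings.Fields(line) {
		if v, ok := strings.CutPrefix(field, "trace_id="); ok {
			return strings.Trim(v, `"`)
		}
	}
	return ""
}

// filterByTrace returns the lines of logs that belong to the given trace.
func filterByTrace(logs, traceID string) []string {
	if traceID == "" {
		return nil
	}
	var out []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if traceIDOf(sc.Text()) == traceID {
			out = append(out, sc.Text())
		}
	}
	return out
}

func main() {
	// Hypothetical combined output of broker and subscriber.
	logs := `level=INFO msg="Received task request" task_id=task_1 trace_id=0123456789abcdef0123456789abcdef
level=INFO msg="Starting health server on port 8082"
level=INFO msg="Processing task" task_id=task_1 trace_id=0123456789abcdef0123456789abcdef
level=INFO msg="Received task request" task_id=task_2 trace_id=fedcba9876543210fedcba9876543210`

	for _, line := range filterByTrace(logs, "0123456789abcdef0123456789abcdef") {
		fmt.Println(line)
	}
}
```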
Step 10: Experience Intelligent Alerting (Optional)
To see alerting in action:
- Simulate errors by stopping the subscriber:
  # In subscriber terminal, press Ctrl+C
- Keep publisher running (it will fail to process tasks)
- Check Prometheus alerts:
  - Go to http://localhost:9090/alerts
  - After ~5 minutes, you should see “HighEventProcessingErrorRate” firing
- Restart subscriber to clear the alert
Congratulations!
You’ve successfully experienced the complete AgentHub observability stack!
Summary: What You Accomplished
✅ Deployed a complete observability stack with Docker Compose
✅ Ran observable agents with automatic instrumentation
✅ Monitored real-time metrics in Grafana dashboards
✅ Traced event flows across multiple services with Jaeger
✅ Correlated logs with traces using trace IDs
✅ Experienced intelligent alerting with Prometheus
✅ Understood the complete event lifecycle from publisher to subscriber
Key Observability Concepts You Learned
Distributed Tracing
- Events get unique trace IDs that follow them everywhere
- Each processing step creates a “span” with timing information
- Complete request flows are visible across service boundaries
Metrics Collection
- 47+ different metrics automatically collected
- Real-time visualization of system health and performance
- Historical data for trend analysis
Structured Logging
- All logs include trace context for correlation
- Consistent format across all services
- Easy debugging and troubleshooting
Intelligent Alerting
- Proactive monitoring for error rates and performance
- Automatic notifications when thresholds are exceeded
- Helps prevent issues before they impact users
Next Steps
For Development:
For Operations:
For Understanding:
Troubleshooting
| Issue | Solution |
|---|---|
| Services won’t start | Run docker-compose down && docker-compose up -d |
| No metrics in Grafana | Check Prometheus targets: http://localhost:9090/targets |
| No traces in Jaeger | Verify JAEGER_ENDPOINT environment variable is set correctly |
| Permission errors | Ensure Docker has proper permissions |
Clean Up
When you’re done exploring:
# Stop the observability stack
cd observability
docker-compose down
# Stop the Go applications
# Press Ctrl+C in each terminal running the agents
🎯 Ready for More?
Production Usage: Add Observability to Your Agent
Deep Understanding: Distributed Tracing Explained
1.3 - Workflows
Tutorials for building complex multi-agent workflows
Workflow Tutorials
Learn to design and implement sophisticated multi-agent workflows and orchestration patterns.
Available Tutorials
1.3.1 - Building Multi-Agent Workflows
This advanced tutorial teaches you to create complex workflows involving multiple specialized agents working together to accomplish sophisticated tasks. You’ll build a real document processing pipeline with multiple agents handling different stages.
What You’ll Build
By the end of this tutorial, you’ll have an A2A-compliant multi-agent system that:
- Ingests documents through an A2A Document Intake Agent
- Validates content using an A2A Validation Agent
- Extracts metadata with an A2A Metadata Extraction Agent
- Processes text through an A2A Text Processing Agent
- Generates summaries using an A2A Summary Agent
- Orchestrates the workflow with an A2A Workflow Coordinator Agent
This demonstrates real-world A2A agent collaboration patterns with conversation context, structured message content, and artifact-based results used in production systems.
Prerequisites
Architecture Overview
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  A2A Workflow   │    │   AgentHub      │    │  A2A Specialized│
│  Coordinator    │    │  A2A Broker     │    │    Agents       │
│                 │    │                 │    │                 │
│ • A2A context   │◄──►│ • Routes A2A    │◄──►│ • Document      │
│   management    │    │   messages      │    │   Intake        │
│ • Conversation  │    │ • Tracks A2A    │    │ • Validation    │
│   threading     │    │   conversations │    │ • Metadata      │
│ • Artifact      │    │ • Manages A2A   │    │ • Text Proc     │
│   aggregation   │    │   state         │    │ • Summary       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Step 1: Create the Workflow Coordinator
First, let’s create the main coordinator that manages the document processing pipeline.
Create the coordinator agent:
mkdir -p agents/coordinator
Create agents/coordinator/main.go:
package main
import (
	"context"
	"fmt"
	"log"
	"time"
	"github.com/google/uuid"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/types/known/structpb"
	"google.golang.org/protobuf/types/known/timestamppb"
	a2a "github.com/owulveryck/agenthub/events/a2a"
	pb "github.com/owulveryck/agenthub/events/eventbus"
)
const (
	agentHubAddr = "localhost:50051"
	agentID      = "a2a_workflow_coordinator"
)
type A2ADocumentWorkflow struct {
	DocumentID    string
	ContextID     string                 // A2A conversation context
	Status        string
	CurrentStage  string
	TaskHistory   []*a2a.Task           // Complete A2A task history
	Artifacts     []*a2a.Artifact       // Collected artifacts from stages
	StartTime     time.Time
	client        pb.AgentHubClient      // A2A-compliant client
}
func main() {
	conn, err := grpc.Dial(agentHubAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("Failed to connect: %v", err)
	}
	defer conn.Close()
	client := pb.NewAgentHubClient(conn)
	coordinator := &A2AWorkflowCoordinator{
		client:        client,
		workflows:     make(map[string]*A2ADocumentWorkflow),
	}
	ctx := context.Background()
	// Start listening for A2A task events
	go coordinator.subscribeToA2AEvents(ctx)
	// Start processing documents with A2A workflow
	coordinator.startA2ADocumentProcessing(ctx)
	// Keep running
	select {}
}
type A2AWorkflowCoordinator struct {
	client    pb.AgentHubClient
	workflows map[string]*A2ADocumentWorkflow
}
func (wc *A2AWorkflowCoordinator) startA2ADocumentProcessing(ctx context.Context) {
	// Simulate document arrival with A2A structured content
	documents := []map[string]interface{}{
		{
			"document_id": "doc_001",
			"content":     "This is a sample business document about quarterly results.",
			"filename":    "q3_results.txt",
			"source":      "email_attachment",
			"doc_type":    "business_report",
		},
		{
			"document_id": "doc_002",
			"content":     "Technical specification for the new API endpoints and authentication mechanisms.",
			"filename":    "api_spec.txt",
			"source":      "file_upload",
			"doc_type":    "technical_spec",
		},
	}
	for _, doc := range documents {
		wc.processA2ADocument(ctx, doc)
		time.Sleep(5 * time.Second)
	}
}
func (wc *A2AWorkflowCoordinator) processA2ADocument(ctx context.Context, document map[string]interface{}) {
	documentID := document["document_id"].(string)
	contextID := fmt.Sprintf("doc_workflow_%s_%s", documentID, uuid.New().String())
	workflow := &A2ADocumentWorkflow{
		DocumentID:   documentID,
		ContextID:    contextID,
		Status:       "started",
		CurrentStage: "intake",
		TaskHistory:  make([]*a2a.Task, 0),
		Artifacts:    make([]*a2a.Artifact, 0),
		StartTime:    time.Now(),
		client:       wc.client,
	}
	wc.workflows[documentID] = workflow
	log.Printf("Starting A2A document processing workflow for %s with context %s", documentID, contextID)
	// Stage 1: A2A Document Intake
	wc.publishA2ATask(ctx, "document_intake", document, "document_intake_agent", workflow)
}
func (wc *A2AWorkflowCoordinator) publishA2ATask(ctx context.Context, taskDescription string, params map[string]interface{}, targetAgent string, workflow *A2ADocumentWorkflow) {
	taskID := fmt.Sprintf("task_%s_%s", taskDescription, uuid.New().String())
	messageID := fmt.Sprintf("msg_%d_%s", time.Now().Unix(), uuid.New().String())
	// Create A2A structured content
	paramsData, err := structpb.NewStruct(params)
	if err != nil {
		log.Printf("Error creating parameters: %v", err)
		return
	}
	// Create A2A message with structured parts
	requestMessage := &a2a.Message{
		MessageId: messageID,
		ContextId: workflow.ContextID,
		TaskId:    taskID,
		Role:      a2a.Role_USER,
		Content: []*a2a.Part{
			{
				Part: &a2a.Part_Text{
					Text: fmt.Sprintf("Please process %s for document %s", taskDescription, workflow.DocumentID),
				},
			},
			{
				Part: &a2a.Part_Data{
					Data: &a2a.DataPart{
						Data:        paramsData,
						Description: fmt.Sprintf("%s parameters", taskDescription),
					},
				},
			},
		},
	}
	// Create A2A task
	task := &a2a.Task{
		Id:        taskID,
		ContextId: workflow.ContextID,
		Status: &a2a.TaskStatus{
			State:     a2a.TaskState_TASK_STATE_SUBMITTED,
			Update:    requestMessage,
			Timestamp: timestamppb.Now(),
		},
		History: []*a2a.Message{requestMessage},
		Metadata: paramsData,
	}
	// Store in workflow history
	workflow.TaskHistory = append(workflow.TaskHistory, task)
	// Publish A2A task update
	req := &pb.PublishTaskUpdateRequest{
		Task: task,
		Routing: &pb.AgentEventMetadata{
			FromAgentId: agentID,
			ToAgentId:   targetAgent,
			EventType:   "task.submitted",
			Priority:    pb.Priority_PRIORITY_MEDIUM,
		},
	}
	log.Printf("Publishing A2A %s task for workflow %s in context %s", taskDescription, workflow.DocumentID, workflow.ContextID)
	_, err = wc.client.PublishTaskUpdate(ctx, req)
	if err != nil {
		log.Printf("Error publishing A2A task: %v", err)
	}
}
// subscribeToA2AEvents streams task results from the broker and feeds them to the workflow logic.
func (wc *A2AWorkflowCoordinator) subscribeToA2AEvents(ctx context.Context) {
	req := &pb.SubscribeToTaskResultsRequest{
		RequesterAgentId: agentID,
	}
	stream, err := wc.client.SubscribeToTaskResults(ctx, req)
	if err != nil {
		log.Printf("Error subscribing to results: %v", err)
		return
	}
	for {
		result, err := stream.Recv()
		if err != nil {
			log.Printf("Error receiving result: %v", err)
			return
		}
		wc.handleTaskResult(ctx, result)
	}
}
// handleTaskResult records a stage result and advances or fails the matching workflow.
// Note: wc.workflows is also written by processA2ADocument on the main goroutine;
// guard it with a sync.Mutex before using this beyond a demo.
func (wc *A2AWorkflowCoordinator) handleTaskResult(ctx context.Context, result *pb.TaskResult) {
	params := result.GetResult().AsMap()
	// Use checked type assertions so a missing field cannot panic the receive loop.
	// Workflows are keyed by document ID, so fall back to it when "workflow_id" is absent.
	workflowID, ok := params["workflow_id"].(string)
	if !ok {
		workflowID, _ = params["document_id"].(string)
	}
	stage, _ := params["stage"].(string)
	workflow, exists := wc.workflows[workflowID]
	if !exists {
		log.Printf("Unknown workflow ID: %s", workflowID)
		return
	}
	log.Printf("Received result for workflow %s, stage %s: %s",
		workflowID, stage, result.GetStatus().String())
	if result.GetStatus() == pb.TaskStatus_TASK_STATUS_FAILED {
		workflow.Status = "failed"
		log.Printf("Workflow %s failed at stage %s: %s",
			workflowID, stage, result.GetErrorMessage())
		return
	}
	// Advance to the next stage, handing this stage's output to it as input
	wc.advanceWorkflow(ctx, workflow, stage, params)
}
// advanceWorkflow publishes the task that follows completedStage, using data as its parameters.
func (wc *A2AWorkflowCoordinator) advanceWorkflow(ctx context.Context, workflow *A2ADocumentWorkflow, completedStage string, data map[string]interface{}) {
	switch completedStage {
	case "document_intake":
		// Move to validation
		workflow.CurrentStage = "validation"
		wc.publishA2ATask(ctx, "document_validation", data, "validation_agent", workflow)
	case "document_validation":
		// Move to metadata extraction
		workflow.CurrentStage = "metadata_extraction"
		wc.publishA2ATask(ctx, "metadata_extraction", data, "metadata_agent", workflow)
	case "metadata_extraction":
		// Move to text processing
		workflow.CurrentStage = "text_processing"
		wc.publishA2ATask(ctx, "text_processing", data, "text_processor_agent", workflow)
	case "text_processing":
		// Move to summary generation
		workflow.CurrentStage = "summary_generation"
		wc.publishA2ATask(ctx, "summary_generation", data, "summary_agent", workflow)
	case "summary_generation":
		// Workflow complete
		workflow.Status = "completed"
		workflow.CurrentStage = "finished"
		duration := time.Since(workflow.StartTime)
		log.Printf("Workflow %s completed successfully in %v", workflow.DocumentID, duration)
		wc.printWorkflowSummary(workflow)
	default:
		log.Printf("Workflow %s reported unknown stage %q", workflow.DocumentID, completedStage)
	}
}
// printWorkflowSummary lists the tasks published for a workflow, taken from its task history.
func (wc *A2AWorkflowCoordinator) printWorkflowSummary(workflow *A2ADocumentWorkflow) {
	fmt.Printf("\n=== WORKFLOW SUMMARY ===\n")
	fmt.Printf("Document ID: %s\n", workflow.DocumentID)
	fmt.Printf("Context ID: %s\n", workflow.ContextID)
	fmt.Printf("Status: %s\n", workflow.Status)
	fmt.Printf("Duration: %v\n", time.Since(workflow.StartTime))
	fmt.Printf("Tasks published:\n")
	for _, task := range workflow.TaskHistory {
		fmt.Printf("  - %s\n", task.Id)
	}
	fmt.Printf("=======================\n\n")
}
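The `switch` in `advanceWorkflow` encodes a fixed, linear pipeline. A table-driven variant makes the stage order easier to read and to extend. The sketch below is one possible shape under the assumption that every stage has exactly one successor; the `transition` type and `next` function are hypothetical names, while the stage and agent names are the ones used by the coordinator above.

```go
package main

import "fmt"

// transition names the stage that follows a completed stage and the agent that handles it.
type transition struct {
	NextStage string
	Agent     string
}

// pipeline mirrors the switch statement: completed stage -> next stage and target agent.
var pipeline = map[string]transition{
	"document_intake":     {NextStage: "document_validation", Agent: "validation_agent"},
	"document_validation": {NextStage: "metadata_extraction", Agent: "metadata_agent"},
	"metadata_extraction": {NextStage: "text_processing", Agent: "text_processor_agent"},
	"text_processing":     {NextStage: "summary_generation", Agent: "summary_agent"},
}

// finalStage is the stage after which the workflow is complete.
const finalStage = "summary_generation"

// next returns the follow-up transition. The bool is true when the workflow is finished;
// an error is returned for a stage that is not part of the pipeline.
func next(completed string) (transition, bool, error) {
	if completed == finalStage {
		return transition{}, true, nil
	}
	tr, ok := pipeline[completed]
	if !ok {
		return transition{}, false, fmt.Errorf("unknown stage %q", completed)
	}
	return tr, false, nil
}

func main() {
	stage := "document_intake"
	for {
		tr, done, err := next(stage)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		if done {
			fmt.Println("workflow complete")
			return
		}
		fmt.Printf("%s -> %s (agent %s)\n", stage, tr.NextStage, tr.Agent)
		stage = tr.NextStage
	}
}
```

Adding a stage then means adding one map entry instead of another `case`, and an unknown stage becomes an explicit error rather than a silent no-op.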
Step 2: Create Specialized Agents
Now let’s create each specialized agent that handles specific stages of the pipeline.
Document Intake Agent
Create agents/document_intake/main.go:
package main
import (
	"context"
	"crypto/md5"
	"fmt"
	"io"
	"log"
	"strings"
	"time"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/types/known/structpb"
	"google.golang.org/protobuf/types/known/timestamppb"
	pb "github.com/owulveryck/agenthub/events/a2a"
)
const (
	agentHubAddr = "localhost:50051"
	agentID      = "document_intake_agent"
)
func main() {
	conn, err := grpc.Dial(agentHubAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("Failed to connect: %v", err)
	}
	defer conn.Close()
	client := pb.NewEventBusClient(conn)
	agent := &DocumentIntakeAgent{client: client}
	ctx := context.Background()
	agent.start(ctx)
}
type DocumentIntakeAgent struct {
	client pb.EventBusClient
}
func (dia *DocumentIntakeAgent) start(ctx context.Context) {
	log.Printf("Document Intake Agent %s starting...", agentID)
	req := &pb.SubscribeToTasksRequest{
		AgentId:   agentID,
		TaskTypes: []string{"document_intake"},
	}
	stream, err := dia.client.SubscribeToTasks(ctx, req)
	if err != nil {
		log.Fatalf("Error subscribing: %v", err)
	}
	log.Printf("Subscribed to document intake tasks")
	for {
		task, err := stream.Recv()
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Printf("Error receiving task: %v", err)
			return
		}
		go dia.processTask(ctx, task)
	}
}
func (dia *DocumentIntakeAgent) processTask(ctx context.Context, task *pb.TaskMessage) {
	log.Printf("Processing document intake task: %s", task.GetTaskId())
	params := task.GetParameters().AsMap()
	// Simulate document intake processing
	time.Sleep(2 * time.Second)
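	// Generate document hash (MD5 serves only as a content fingerprint here, not as a security measure)
	content, ok := params["content"].(string)
	if !ok {
		// A checked assertion avoids a panic when the parameter is missing or not a string
		dia.publishResult(ctx, task, map[string]interface{}{
			"document_id": params["document_id"],
			"stage":       "document_intake",
		}, pb.TaskStatus_TASK_STATUS_FAILED, "missing or non-string content parameter")
		return
	}
	hash := fmt.Sprintf("%x", md5.Sum([]byte(content)))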
	// Extract basic metadata
	wordCount := len(strings.Fields(content))
	charCount := len(content)
	result := map[string]interface{}{
		"document_id":   params["document_id"],
		"workflow_id":   params["workflow_id"],
		"stage":         "document_intake",
		"content":       content,
		"filename":      params["filename"],
		"source":        params["source"],
		"document_hash": hash,
		"word_count":    wordCount,
		"char_count":    charCount,
		"intake_timestamp": time.Now().Format(time.RFC3339),
		"status":        "intake_complete",
	}
	dia.publishResult(ctx, task, result, pb.TaskStatus_TASK_STATUS_COMPLETED, "")
}
func (dia *DocumentIntakeAgent) publishResult(ctx context.Context, originalTask *pb.TaskMessage, result map[string]interface{}, status pb.TaskStatus, errorMsg string) {
	resultStruct, err := structpb.NewStruct(result)
	if err != nil {
		log.Printf("Error creating result struct: %v", err)
		return
	}
	taskResult := &pb.TaskResult{
		TaskId:          originalTask.GetTaskId(),
		Status:          status,
		Result:          resultStruct,
		ErrorMessage:    errorMsg,
		ExecutorAgentId: agentID,
		CompletedAt:     timestamppb.Now(),
	}
	req := &pb.PublishTaskResultRequest{Result: taskResult}
	_, err = dia.client.PublishTaskResult(ctx, req)
	if err != nil {
		log.Printf("Error publishing result: %v", err)
	} else {
		log.Printf("Published result for task %s", originalTask.GetTaskId())
	}
}
Validation Agent
Create agents/validation/main.go:
package main
import (
	"context"
	"io"
	"log"
	"strings"
	"time"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/types/known/structpb"
	"google.golang.org/protobuf/types/known/timestamppb"
	pb "github.com/owulveryck/agenthub/events/a2a"
)
const (
	agentHubAddr = "localhost:50051"
	agentID      = "validation_agent"
)
func main() {
	conn, err := grpc.Dial(agentHubAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("Failed to connect: %v", err)
	}
	defer conn.Close()
	client := pb.NewEventBusClient(conn)
	agent := &ValidationAgent{client: client}
	ctx := context.Background()
	agent.start(ctx)
}
type ValidationAgent struct {
	client pb.EventBusClient
}
func (va *ValidationAgent) start(ctx context.Context) {
	log.Printf("Validation Agent %s starting...", agentID)
	req := &pb.SubscribeToTasksRequest{
		AgentId:   agentID,
		TaskTypes: []string{"document_validation"},
	}
	stream, err := va.client.SubscribeToTasks(ctx, req)
	if err != nil {
		log.Fatalf("Error subscribing: %v", err)
	}
	log.Printf("Subscribed to document validation tasks")
	for {
		task, err := stream.Recv()
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Printf("Error receiving task: %v", err)
			return
		}
		go va.processTask(ctx, task)
	}
}
func (va *ValidationAgent) processTask(ctx context.Context, task *pb.TaskMessage) {
	log.Printf("Processing validation task: %s", task.GetTaskId())
	params := task.GetParameters().AsMap()
	// Simulate validation processing
	time.Sleep(1500 * time.Millisecond)
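	content, ok := params["content"].(string)
	if !ok {
		// A checked assertion avoids a panic when the parameter is missing or not a string
		va.publishResult(ctx, task, map[string]interface{}{
			"document_id": params["document_id"],
			"stage":       "document_validation",
		}, pb.TaskStatus_TASK_STATUS_FAILED, "missing or non-string content parameter")
		return
	}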
	// Perform validation checks
	validationResults := va.validateDocument(content)
	result := map[string]interface{}{
		"document_id":       params["document_id"],
		"workflow_id":       params["workflow_id"],
		"stage":             "document_validation",
		"content":           content,
		"filename":          params["filename"],
		"source":            params["source"],
		"document_hash":     params["document_hash"],
		"word_count":        params["word_count"],
		"char_count":        params["char_count"],
		"intake_timestamp":  params["intake_timestamp"],
		"validation_results": validationResults,
		"validation_timestamp": time.Now().Format(time.RFC3339),
		"status":            "validation_complete",
	}
	var status pb.TaskStatus
	var errorMsg string
	if validationResults["is_valid"].(bool) {
		status = pb.TaskStatus_TASK_STATUS_COMPLETED
	} else {
		status = pb.TaskStatus_TASK_STATUS_FAILED
		errorMsg = "Document validation failed: " + validationResults["errors"].(string)
	}
	va.publishResult(ctx, task, result, status, errorMsg)
}
func (va *ValidationAgent) validateDocument(content string) map[string]interface{} {
	// Simple validation rules
	var errors []string
	// Check minimum length
	lengthOK := len(content) >= 10
	if !lengthOK {
		errors = append(errors, "content too short")
	}
	// Check for suspicious content; safeContent reflects every term, not just the first
	safeContent := true
	lowerContent := strings.ToLower(content)
	suspiciousTerms := []string{"malware", "virus", "hack"}
	for _, term := range suspiciousTerms {
		if strings.Contains(lowerContent, term) {
			safeContent = false
			errors = append(errors, "suspicious content detected")
			break
		}
	}
	// Check language (simple heuristic)
	isEnglish := va.isEnglishContent(content)
	return map[string]interface{}{
		"is_valid":     lengthOK && safeContent,
		"is_english":   isEnglish,
		"errors":       strings.Join(errors, "; "),
		"length_ok":    lengthOK,
		"safe_content": safeContent,
	}
}
func (va *ValidationAgent) isEnglishContent(content string) bool {
	// Simple heuristic: check for common English words
	commonWords := []string{"the", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"}
	lowerContent := strings.ToLower(content)
	matches := 0
	for _, word := range commonWords {
		if strings.Contains(lowerContent, " "+word+" ") {
			matches++
		}
	}
	return matches >= 2
}
func (va *ValidationAgent) publishResult(ctx context.Context, originalTask *pb.TaskMessage, result map[string]interface{}, status pb.TaskStatus, errorMsg string) {
	resultStruct, err := structpb.NewStruct(result)
	if err != nil {
		log.Printf("Error creating result struct: %v", err)
		return
	}
	taskResult := &pb.TaskResult{
		TaskId:          originalTask.GetTaskId(),
		Status:          status,
		Result:          resultStruct,
		ErrorMessage:    errorMsg,
		ExecutorAgentId: agentID,
		CompletedAt:     timestamppb.Now(),
	}
	req := &pb.PublishTaskResultRequest{Result: taskResult}
	_, err = va.client.PublishTaskResult(ctx, req)
	if err != nil {
		log.Printf("Error publishing result: %v", err)
	} else {
		log.Printf("Published result for task %s", originalTask.GetTaskId())
	}
}
Step 3: Build and Test the Multi-Agent System
Update the Makefile to include the new agents:
# Add to Makefile build target
build: proto
	@echo "Building server binary..."
	go build $(GO_BUILD_FLAGS) -o bin/$(SERVER_BINARY) broker/main.go
	@echo "Building coordinator binary..."
	go build $(GO_BUILD_FLAGS) -o bin/coordinator agents/coordinator/main.go
	@echo "Building document intake agent..."
	go build $(GO_BUILD_FLAGS) -o bin/document-intake agents/document_intake/main.go
	@echo "Building validation agent..."
	go build $(GO_BUILD_FLAGS) -o bin/validation agents/validation/main.go
	@echo "Building publisher binary..."
	go build $(GO_BUILD_FLAGS) -o bin/$(PUBLISHER_BINARY) agents/publisher/main.go
	@echo "Building subscriber binary..."
	go build $(GO_BUILD_FLAGS) -o bin/$(SUBSCRIBER_BINARY) agents/subscriber/main.go
	@echo "Build complete. Binaries are in the 'bin/' directory."
Build all components:
Step 4: Run the Multi-Agent Workflow
Now let’s run the complete multi-agent system:
Terminal 1 - Start the broker:
Terminal 2 - Start the document intake agent:
Terminal 3 - Start the validation agent:
Terminal 4 - Start the workflow coordinator:
Step 5: Observe the Workflow
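You’ll see the workflow coordinator move each document through the stages below. With only the intake and validation agents from Step 2 running, a document stops after validation, because no agent is subscribed to the later task types yet: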
- Document Intake: Receives and processes raw documents
- Validation: Checks content for safety and validity
- Metadata Extraction: Extracts structured metadata
- Text Processing: Processes and analyzes text content
- Summary Generation: Creates document summaries
Each agent processes its stage and passes results to the next stage via the AgentHub broker.
Understanding the Multi-Agent Pattern
This tutorial demonstrates several key patterns:
1. Workflow Orchestration
The coordinator agent manages the overall workflow, determining which stage comes next and handling failures.
2. Specialized Agents
Each agent has a specific responsibility and can be developed, deployed, and scaled independently.
3. Asynchronous Processing
Agents work asynchronously, allowing for better resource utilization and scalability.
4. Error Handling
When a stage reports a failure, the coordinator marks that workflow as failed and stops advancing it, so one bad document does not block the others.
5. Data Flow
Structured data flows between agents, with each stage adding value to the processing pipeline.
Next Steps
Now that you understand multi-agent workflows:
- Add more agents: Create metadata extraction, text processing, and summary agents
- Implement error recovery: Add retry mechanisms and failure handling
- Add monitoring: Create a dashboard agent that tracks workflow progress
- Scale the system: Run multiple instances of each agent type
- Add persistence: Store workflow state in a database for recovery
This pattern scales to handle complex business processes, data pipelines, and automated workflows in production systems.
Common Patterns and Best Practices
Workflow State Management
- Store workflow state persistently for recovery
- Use unique workflow IDs for tracking
- Implement timeouts for stuck workflows
Agent Communication
- Use structured messages with clear schemas
- Include metadata for routing and tracking
- Implement progress reporting for long-running tasks
Error Handling
- Design for partial failures
- Implement retry mechanisms with backoff
- Provide clear error messages and recovery paths
Monitoring and Observability
- Log all state transitions
- Track workflow performance metrics
- Implement health checks for agents
You now have the foundation for building sophisticated multi-agent systems that can handle complex, real-world workflows!
2 - How-To Guides
Goal-oriented guides that solve specific problems and accomplish particular tasks
How-To Guides
These practical guides will help you solve specific problems and accomplish particular tasks with AgentHub. Each guide focuses on a specific goal and assumes you have some familiarity with the basic concepts.
How-to Categories
Before You Start
These guides assume you have:
- Completed the Installation and Setup tutorial
- Basic understanding of AgentHub concepts
- A working development environment
💡 How to Use These Guides
- Each guide is self-contained and focuses on one specific task
- Guides can be followed in any order based on your needs
- Code examples are production-ready where possible
- Links to related concepts and references are provided
Tip
If you’re new to AgentHub, start with the tutorials before diving into these how-to guides.
2.1 - Agent Development
Practical guides for creating and managing agents
Agent Development How-to Guides
Step-by-step guides for creating, configuring, and managing different types of agents in AgentHub.
Available Guides
2.1.1 - How to Create an A2A Task Publisher
This guide shows you how to create an agent that publishes Agent2Agent (A2A) protocol-compliant tasks to other agents through the AgentHub Event-Driven Architecture (EDA) broker.
Basic Setup
Using AgentHub’s unified abstractions, creating a publisher is straightforward:
package main
import (
    "context"
    "fmt"
    "time"
    "github.com/owulveryck/agenthub/internal/agenthub"
    pb "github.com/owulveryck/agenthub/events/a2a"
    "google.golang.org/protobuf/types/known/structpb" // needed by the structured data part examples
)
const (
    myAgentID = "my_publisher_agent"
)
func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
    defer cancel()
    // Create configuration with automatic observability
    config := agenthub.NewGRPCConfig("publisher")
    config.HealthPort = "8081" // Unique port for this publisher
    // Create AgentHub client with built-in observability
    client, err := agenthub.NewAgentHubClient(config)
    if err != nil {
        panic("Failed to create AgentHub client: " + err.Error())
    }
    // Automatic graceful shutdown
    defer func() {
        shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer shutdownCancel()
        if err := client.Shutdown(shutdownCtx); err != nil {
            client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
        }
    }()
    // Start the client (enables observability)
    if err := client.Start(ctx); err != nil {
        client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
        panic(err)
    }
    // Create A2A task publisher with automatic tracing and metrics
    taskPublisher := &agenthub.A2ATaskPublisher{
        Client:         client.Client,
        TraceManager:   client.TraceManager,
        MetricsManager: client.MetricsManager,
        Logger:         client.Logger,
        ComponentName:  "publisher",
        AgentID:        myAgentID,
    }
    // Your A2A task publishing code goes here
}
Publishing a Simple A2A Task
Here’s how to publish a basic A2A task using the A2ATaskPublisher abstraction:
func publishSimpleTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) error {
    // Create A2A-compliant content parts
    content := []*pb.Part{
        {
            Part: &pb.Part_Text{
                Text: "Hello! Please provide a greeting for Claude.",
            },
        },
    }
    // Publish A2A task using the unified abstraction
    task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
        TaskType:         "greeting",
        Content:          content,
        RequesterAgentID: myAgentID,
        ResponderAgentID: "agent_demo_subscriber", // Target agent
        Priority:         pb.Priority_PRIORITY_HIGH,
        ContextID:        "ctx_greeting_demo", // Optional: conversation context
    })
    if err != nil {
        return fmt.Errorf("failed to publish greeting task: %w", err)
    }
    taskPublisher.Logger.InfoContext(ctx, "Published A2A greeting task",
        "task_id", task.GetId(),
        "context_id", task.GetContextId())
    return nil
}
Publishing Different Task Types
Math Calculation Task with A2A Data Parts
func publishMathTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) error {
    // Create A2A-compliant content with structured data
    content := []*pb.Part{
        {
            Part: &pb.Part_Text{
                Text: "Please perform the following mathematical calculation:",
            },
        },
        {
            Part: &pb.Part_Data{
                Data: &pb.DataPart{
                    Data: &structpb.Struct{
                        Fields: map[string]*structpb.Value{
                            "operation": structpb.NewStringValue("multiply"),
                            "a":         structpb.NewNumberValue(15.0),
                            "b":         structpb.NewNumberValue(7.0),
                        },
                    },
                },
            },
        },
    }
    // Publish A2A math task
    task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
        TaskType:         "math_calculation",
        Content:          content,
        RequesterAgentID: myAgentID,
        ResponderAgentID: "agent_demo_subscriber",
        Priority:         pb.Priority_PRIORITY_MEDIUM,
        ContextID:        "ctx_math_demo",
    })
    if err != nil {
        return fmt.Errorf("failed to publish math task: %w", err)
    }
    taskPublisher.Logger.InfoContext(ctx, "Published A2A math task",
        "task_id", task.GetId(),
        "operation", "multiply")
    return nil
}
Data Processing Task
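func publishDataProcessingTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) {
    // Describe the job as structured data so the responder can parse it reliably
    params, err := structpb.NewStruct(map[string]interface{}{
        "dataset_path":  "/data/customer_data.csv",
        "analysis_type": "summary_statistics",
        "output_format": "json",
        "filters": map[string]interface{}{
            "date_range": "last_30_days",
            "status":     "active",
        },
        // Tracking fields travel with the task payload
        "workflow_id": "workflow_123",
        "user_id":     "user_456",
    })
    if err != nil {
        taskPublisher.Logger.ErrorContext(ctx, "Invalid data processing parameters", "error", err)
        return
    }
    // Publish using the same A2A request type as the examples above
    task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
        TaskType: "data_processing",
        Content: []*pb.Part{
            {Part: &pb.Part_Data{Data: &pb.DataPart{Data: params}}},
        },
        RequesterAgentID: myAgentID,
        ResponderAgentID: "data_agent",
        Priority:         pb.Priority_PRIORITY_HIGH,
    })
    if err != nil {
        taskPublisher.Logger.ErrorContext(ctx, "Failed to publish data processing task", "error", err)
        return
    }
    taskPublisher.Logger.InfoContext(ctx, "Published A2A data processing task", "task_id", task.GetId())
}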
Broadcasting Tasks (No Specific Responder)
To broadcast a task to all available agents, leave ResponderAgentID empty:
func broadcastTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) {
    task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
        TaskType: "announcement",
        Content: []*pb.Part{
            {Part: &pb.Part_Text{Text: "Server maintenance in 30 minutes. No action required."}},
        },
        RequesterAgentID: myAgentID,
        // ResponderAgentID is left empty, so the task is broadcast to all agents
        Priority: pb.Priority_PRIORITY_LOW,
    })
    if err != nil {
        taskPublisher.Logger.ErrorContext(ctx, "Failed to publish announcement", "error", err)
        return
    }
    taskPublisher.Logger.InfoContext(ctx, "Broadcast A2A announcement", "task_id", task.GetId())
}
Subscribing to Task Results
As a publisher, you’ll want to receive results from tasks you’ve requested. You can use the AgentHub client directly:
func subscribeToResults(ctx context.Context, client *agenthub.AgentHubClient) {
    req := &pb.SubscribeToTaskResultsRequest{
        RequesterAgentId: myAgentID,
        // TaskIds: []string{"specific_task_id"}, // Optional: filter specific tasks
    }
    stream, err := client.Client.SubscribeToTaskResults(ctx, req)
    if err != nil {
        client.Logger.ErrorContext(ctx, "Error subscribing to results", "error", err)
        return
    }
    client.Logger.InfoContext(ctx, "Subscribed to task results", "agent_id", myAgentID)
    for {
        result, err := stream.Recv()
        if err != nil {
            client.Logger.ErrorContext(ctx, "Error receiving result", "error", err)
            return
        }
        handleTaskResult(ctx, client, result)
    }
}
func handleTaskResult(ctx context.Context, client *agenthub.AgentHubClient, result *pb.TaskResult) {
    client.Logger.InfoContext(ctx, "Received task result",
        "task_id", result.GetTaskId(),
        "status", result.GetStatus().String())
    switch result.GetStatus() {
    case pb.TaskStatus_TASK_STATUS_COMPLETED:
        client.Logger.InfoContext(ctx, "Task completed successfully",
            "task_id", result.GetTaskId(),
            "result", result.GetResult().AsMap())
    case pb.TaskStatus_TASK_STATUS_FAILED:
        client.Logger.ErrorContext(ctx, "Task failed",
            "task_id", result.GetTaskId(),
            "error", result.GetErrorMessage())
    case pb.TaskStatus_TASK_STATUS_CANCELLED:
        client.Logger.InfoContext(ctx, "Task was cancelled",
            "task_id", result.GetTaskId())
    }
}
Monitoring Task Progress
Subscribe to progress updates to track long-running tasks:
func subscribeToProgress(ctx context.Context, client *agenthub.AgentHubClient) {
    req := &pb.SubscribeToTaskResultsRequest{
        RequesterAgentId: myAgentID,
    }
    stream, err := client.Client.SubscribeToTaskProgress(ctx, req)
    if err != nil {
        client.Logger.ErrorContext(ctx, "Error subscribing to progress", "error", err)
        return
    }
    client.Logger.InfoContext(ctx, "Subscribed to task progress", "agent_id", myAgentID)
    for {
        progress, err := stream.Recv()
        if err != nil {
            client.Logger.ErrorContext(ctx, "Error receiving progress", "error", err)
            return
        }
        client.Logger.InfoContext(ctx, "Task progress update",
            "task_id", progress.GetTaskId(),
            "progress_percentage", progress.GetProgressPercentage(),
            "progress_message", progress.GetProgressMessage())
    }
}
Complete Publisher Example
func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
    defer cancel()
    // Create configuration with automatic observability
    config := agenthub.NewGRPCConfig("publisher")
    config.HealthPort = "8081"
    // Create AgentHub client with built-in observability
    client, err := agenthub.NewAgentHubClient(config)
    if err != nil {
        panic("Failed to create AgentHub client: " + err.Error())
    }
    defer func() {
        shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer shutdownCancel()
        if err := client.Shutdown(shutdownCtx); err != nil {
            client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
        }
    }()
    // Start the client (enables observability)
    if err := client.Start(ctx); err != nil {
        client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
        panic(err)
    }
    // Create A2A task publisher with automatic tracing and metrics
    taskPublisher := &agenthub.A2ATaskPublisher{
        Client:         client.Client,
        TraceManager:   client.TraceManager,
        MetricsManager: client.MetricsManager,
        Logger:         client.Logger,
        ComponentName:  "publisher",
        AgentID:        myAgentID,
    }
    client.Logger.InfoContext(ctx, "Starting publisher demo")
    // Publish various tasks with automatic observability
    if err := publishMathTask(ctx, taskPublisher); err != nil {
        client.Logger.ErrorContext(ctx, "Math task was not published", "error", err)
    }
    time.Sleep(2 * time.Second)
    publishDataProcessingTask(ctx, taskPublisher)
    time.Sleep(2 * time.Second)
    broadcastTask(ctx, taskPublisher)
    client.Logger.InfoContext(ctx, "All tasks published! Check subscriber logs for results")
}
Best Practices
- Always set a unique task ID: Use timestamps, UUIDs, or sequential IDs to ensure uniqueness. 
- Use appropriate priorities: Reserve PRIORITY_CRITICAL for urgent tasks that must be processed immediately.
- Set realistic deadlines: Include deadlines for time-sensitive tasks to help agents prioritize. 
- Handle results gracefully: Always subscribe to task results and handle failures appropriately. 
- Include helpful metadata: Add context information that might be useful for debugging or auditing. 
- Validate parameters: Ensure task parameters are properly structured before publishing. 
- Use specific responder IDs when possible: This ensures tasks go to the most appropriate agent. 
Your publisher is now ready to send tasks to agents and receive results!
2.1.2 - How to Create an A2A Task Subscriber (Agent)
Learn how to create an agent that can receive, process, and respond to Agent2Agent (A2A) protocol tasks through the AgentHub EDA broker using A2A-compliant abstractions.
How to Create an A2A Task Subscriber (Agent)
This guide shows you how to create an agent that can receive, process, and respond to Agent2Agent (A2A) protocol tasks through the AgentHub Event-Driven Architecture (EDA) broker using AgentHub’s A2A-compliant abstractions.
Basic Agent Setup
Start by creating the basic structure for your agent using the unified abstraction:
package main
import (
    "context"
    "os"
    "os/signal"
    "syscall"
    "time"
    "github.com/owulveryck/agenthub/internal/agenthub"
    pb "github.com/owulveryck/agenthub/events/a2a"
    "google.golang.org/protobuf/types/known/structpb"
)
const (
    agentID = "my_agent_processor"
)
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    // Create configuration with automatic observability
    config := agenthub.NewGRPCConfig("subscriber")
    config.HealthPort = "8082" // Unique port for this agent
    // Create AgentHub client with built-in observability
    client, err := agenthub.NewAgentHubClient(config)
    if err != nil {
        panic("Failed to create AgentHub client: " + err.Error())
    }
    // Automatic graceful shutdown
    defer func() {
        shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer shutdownCancel()
        if err := client.Shutdown(shutdownCtx); err != nil {
            client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
        }
    }()
    // Start the client (enables observability)
    if err := client.Start(ctx); err != nil {
        client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
        panic(err)
    }
    // Create A2A task subscriber with automatic observability
    taskSubscriber := agenthub.NewA2ATaskSubscriber(client, agentID)
    // Register A2A task handlers (see below for examples)
    taskSubscriber.RegisterDefaultHandlers()
    // Handle graceful shutdown
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sigChan
        client.Logger.Info("Received shutdown signal")
        cancel()
    }()
    client.Logger.InfoContext(ctx, "Starting subscriber agent")
    // Start task subscription (with automatic observability)
    go func() {
        if err := taskSubscriber.SubscribeToTasks(ctx); err != nil {
            client.Logger.ErrorContext(ctx, "Task subscription failed", "error", err)
        }
    }()
    // Optional: Subscribe to task results if this agent also publishes tasks
    go func() {
        if err := taskSubscriber.SubscribeToTaskResults(ctx); err != nil {
            client.Logger.ErrorContext(ctx, "Task result subscription failed", "error", err)
        }
    }()
    client.Logger.InfoContext(ctx, "Agent started with observability. Listening for tasks.")
    // Wait for context cancellation
    <-ctx.Done()
    client.Logger.Info("Agent shutdown complete")
}
Default Task Handlers
The RegisterDefaultHandlers() method provides built-in handlers for common task types:
- greeting: Simple greeting with name parameter
- math_calculation: Basic arithmetic operations (add, subtract, multiply, divide)
- random_number: Random number generation with seed
Custom Task Handlers
Simple Custom Handler
Add your own task handlers using RegisterTaskHandler():
func setupCustomHandlers(taskSubscriber *agenthub.TaskSubscriber) {
    // Register a custom data processing handler
    taskSubscriber.RegisterTaskHandler("data_processing", handleDataProcessing)
    // Register a file conversion handler
    taskSubscriber.RegisterTaskHandler("file_conversion", handleFileConversion)
    // Register a status check handler
    taskSubscriber.RegisterTaskHandler("status_check", handleStatusCheck)
}
func handleDataProcessing(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
    // GetFields is nil-safe, so a task without parameters yields empty strings instead of a panic
    fields := task.GetParameters().GetFields()
    datasetPath := fields["dataset_path"].GetStringValue()
    analysisType := fields["analysis_type"].GetStringValue()
    if datasetPath == "" {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, "dataset_path parameter is required"
    }
    // Simulate data processing
    time.Sleep(2 * time.Second)
    result, err := structpb.NewStruct(map[string]interface{}{
        "dataset_path":    datasetPath,
        "analysis_type":   analysisType,
        "records_processed": 1500,
        "processing_time": "2.1s",
        "summary": map[string]interface{}{
            "mean":   42.7,
            "median": 41.2,
            "stddev": 8.3,
        },
        "processed_at": time.Now().Format(time.RFC3339),
    })
    if err != nil {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, "Failed to create result structure"
    }
    return result, pb.TaskStatus_TASK_STATUS_COMPLETED, ""
}
Advanced Handler with Validation
func handleFileConversion(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
    // Extract and validate parameters (GetFields is nil-safe when no parameters are set)
    fields := task.GetParameters().GetFields()
    inputPath := fields["input_path"].GetStringValue()
    outputFormat := fields["output_format"].GetStringValue()
    if inputPath == "" {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, "input_path parameter is required"
    }
    if outputFormat == "" {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, "output_format parameter is required"
    }
    // Validate output format
    validFormats := []string{"pdf", "docx", "txt", "html"}
    isValidFormat := false
    for _, format := range validFormats {
        if outputFormat == format {
            isValidFormat = true
            break
        }
    }
    if !isValidFormat {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, fmt.Sprintf("unsupported output format: %s", outputFormat)
    }
    // Simulate file conversion process
    time.Sleep(1 * time.Second)
    // Swap only the trailing extension; this also works when the input has no extension
    outputPath := strings.TrimSuffix(inputPath, filepath.Ext(inputPath)) + "." + outputFormat
    result, err := structpb.NewStruct(map[string]interface{}{
        "input_path":      inputPath,
        "output_path":     outputPath,
        "output_format":   outputFormat,
        "file_size":       "2.5MB",
        "conversion_time": "1.2s",
        "status":          "success",
        "converted_at":    time.Now().Format(time.RFC3339),
    })
    if err != nil {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, "Failed to create result structure"
    }
    return result, pb.TaskStatus_TASK_STATUS_COMPLETED, ""
}
Handler with External Service Integration
func handleStatusCheck(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
    // GetFields is nil-safe when the task carries no parameters
    serviceURL := task.GetParameters().GetFields()["service_url"].GetStringValue()
    if serviceURL == "" {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, "service_url parameter is required"
    }
    // Create HTTP client with timeout
    client := &http.Client{
        Timeout: 10 * time.Second,
    }
    // Bind the request to ctx so the check stops if the task is cancelled
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, serviceURL+"/health", nil)
    if err != nil {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, fmt.Sprintf("Invalid service URL: %v", err)
    }
    // Perform health check and measure how long it takes
    start := time.Now()
    resp, err := client.Do(req)
    if err != nil {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, fmt.Sprintf("Failed to reach service: %v", err)
    }
    defer resp.Body.Close()
    elapsed := time.Since(start)
    // Determine status
    isHealthy := resp.StatusCode >= 200 && resp.StatusCode < 300
    status := "unhealthy"
    if isHealthy {
        status = "healthy"
    }
    result, err := structpb.NewStruct(map[string]interface{}{
        "service_url":   serviceURL,
        "status":        status,
        "status_code":   resp.StatusCode,
        "response_time": elapsed.Round(time.Millisecond).String(),
        "checked_at":    time.Now().Format(time.RFC3339),
    })
    if err != nil {
        return nil, pb.TaskStatus_TASK_STATUS_FAILED, "Failed to create result structure"
    }
    return result, pb.TaskStatus_TASK_STATUS_COMPLETED, ""
}
Complete Agent Example
Here’s a complete agent that handles multiple task types:
package main
import (
    "context"
    "fmt"
    "net/http"
    "os"
    "os/signal"
    "path/filepath"
    "strings"
    "syscall"
    "time"
    "github.com/owulveryck/agenthub/internal/agenthub"
    pb "github.com/owulveryck/agenthub/events/a2a"
    "google.golang.org/protobuf/types/known/structpb"
)
const agentID = "multi_task_agent"
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    // Create AgentHub client with observability
    config := agenthub.NewGRPCConfig("subscriber")
    config.HealthPort = "8082"
    client, err := agenthub.NewAgentHubClient(config)
    if err != nil {
        panic("Failed to create AgentHub client: " + err.Error())
    }
    defer func() {
        shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer shutdownCancel()
        if err := client.Shutdown(shutdownCtx); err != nil {
            client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
        }
    }()
    if err := client.Start(ctx); err != nil {
        panic(err)
    }
    // Create and configure task subscriber
    taskSubscriber := agenthub.NewTaskSubscriber(client, agentID)
    // Register both default and custom handlers
    taskSubscriber.RegisterDefaultHandlers()
    setupCustomHandlers(taskSubscriber)
    // Graceful shutdown handling
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sigChan
        client.Logger.Info("Received shutdown signal")
        cancel()
    }()
    client.Logger.InfoContext(ctx, "Starting multi-task agent")
    // Start subscriptions
    go func() {
        if err := taskSubscriber.SubscribeToTasks(ctx); err != nil {
            client.Logger.ErrorContext(ctx, "Task subscription failed", "error", err)
        }
    }()
    go func() {
        if err := taskSubscriber.SubscribeToTaskResults(ctx); err != nil {
            client.Logger.ErrorContext(ctx, "Task result subscription failed", "error", err)
        }
    }()
    client.Logger.InfoContext(ctx, "Agent ready to process tasks",
        "supported_tasks", []string{"greeting", "math_calculation", "random_number", "data_processing", "file_conversion", "status_check"})
    <-ctx.Done()
    client.Logger.Info("Agent shutdown complete")
}
func setupCustomHandlers(taskSubscriber *agenthub.TaskSubscriber) {
    taskSubscriber.RegisterTaskHandler("data_processing", handleDataProcessing)
    taskSubscriber.RegisterTaskHandler("file_conversion", handleFileConversion)
    taskSubscriber.RegisterTaskHandler("status_check", handleStatusCheck)
}
// ... (include the handler functions from above)
Automatic Features
The unified abstraction provides automatic features:
Observability
- Distributed tracing for each task processing
- Metrics collection for processing times and success rates
- Structured logging with correlation IDs
Task Management
- Automatic result publishing back to the broker
- Error handling and status reporting
- Progress tracking capabilities
Resource Management
- Graceful shutdown handling
- Connection management to the broker
- Health endpoints for monitoring
Best Practices
- Parameter Validation: Always validate task parameters before processing:
if requiredParam == "" {
    return nil, pb.TaskStatus_TASK_STATUS_FAILED, "required_param is missing"
}
- Error Handling: Provide meaningful error messages:
if err != nil {
    return nil, pb.TaskStatus_TASK_STATUS_FAILED, fmt.Sprintf("Processing failed: %v", err)
}
- Timeouts: Use context with timeouts for external operations:
client := &http.Client{Timeout: 10 * time.Second}
- Resource Cleanup: Always clean up resources in handlers:
defer file.Close()
defer resp.Body.Close()
- Structured Results: Return well-structured result data:
result, _ := structpb.NewStruct(map[string]interface{}{
    "status": "completed",
    "timestamp": time.Now().Format(time.RFC3339),
    "data": processedData,
})
Handler Function Signature
All task handlers must match the TaskHandler function type:
type TaskHandler func(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string)
Return values:
- *structpb.Struct: The result data (can be nil on failure)
- pb.TaskStatus: One of:
  - pb.TaskStatus_TASK_STATUS_COMPLETED
  - pb.TaskStatus_TASK_STATUS_FAILED
  - pb.TaskStatus_TASK_STATUS_CANCELLED
- string: Error message (empty string on success)
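The three return values are easiest to see in a small dispatcher. The sketch below mirrors the handler shape with plain Go types so it runs without AgentHub; `handler`, `registry`, `status`, and `greet` are simplified stand-ins invented for this illustration and say nothing about how the real subscriber dispatches tasks.

```go
package main

import "fmt"

// status is a simplified stand-in for pb.TaskStatus.
type status int

const (
	statusCompleted status = iota
	statusFailed
)

// handler mirrors the TaskHandler shape: result data, status, error message.
type handler func(params map[string]any) (map[string]any, status, string)

// registry maps a task type to the handler registered for it.
type registry map[string]handler

// dispatch runs the handler for taskType, or fails with a clear message
// when no handler is registered.
func (r registry) dispatch(taskType string, params map[string]any) (map[string]any, status, string) {
	h, ok := r[taskType]
	if !ok {
		return nil, statusFailed, "no handler registered for task type: " + taskType
	}
	return h(params)
}

// greet validates its input first and returns an empty message on success.
func greet(params map[string]any) (map[string]any, status, string) {
	name, _ := params["name"].(string)
	if name == "" {
		return nil, statusFailed, "name parameter is required"
	}
	return map[string]any{"greeting": "Hello, " + name + "!"}, statusCompleted, ""
}

func main() {
	r := registry{"greeting": greet}
	fmt.Println(r.dispatch("greeting", map[string]any{"name": "Ada"}))
	fmt.Println(r.dispatch("unknown", nil))
}
```

Keeping the error message empty on success and the result nil on failure gives callers a single, predictable way to read the outcome.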
Your agent is now ready to receive and process tasks from other agents in the system with full observability and automatic result publishing!
2.2 - Observability
Practical guides for monitoring and observability setup
Observability How-to Guides
Practical step-by-step guides for setting up monitoring, metrics, and observability in your AgentHub deployments.
Available Guides
2.2.1 - How to Add Observability to Your Agent
Use AgentHub’s unified abstractions to automatically get distributed tracing, metrics, and structured logging in your agents.
How to Add Observability to Your Agent
Goal-oriented guide: Use AgentHub’s unified abstractions to automatically get distributed tracing, metrics, and structured logging in your agents with minimal configuration.
Prerequisites
- Go 1.24+ installed
- Basic understanding of AgentHub concepts
- 10-15 minutes
Overview: What You Get Automatically
With AgentHub’s unified abstractions, you automatically get:
- Distributed Tracing - OpenTelemetry traces with correlation IDs
- Comprehensive Metrics - Performance and health monitoring
- Structured Logging - JSON logs with trace correlation
- Health Endpoints - HTTP health checks and metrics endpoints
- Graceful Shutdown - Clean resource management
Quick Start: Observable Agent in 5 Minutes
Step 1: Create Your Agent Using Abstractions
package main
import (
	"context"
	"time"
	"github.com/owulveryck/agenthub/internal/agenthub"
)
func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	// Create configuration (observability included automatically)
	config := agenthub.NewGRPCConfig("my-agent")
	config.HealthPort = "8083" // Unique port for your agent
	// Create AgentHub client (observability built-in)
	client, err := agenthub.NewAgentHubClient(config)
	if err != nil {
		panic("Failed to create AgentHub client: " + err.Error())
	}
	// Automatic graceful shutdown
	defer func() {
		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer shutdownCancel()
		if err := client.Shutdown(shutdownCtx); err != nil {
			client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
		}
	}()
	// Start the client (enables observability)
	if err := client.Start(ctx); err != nil {
		client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
		panic(err)
	}
	// Your agent logic here...
	client.Logger.Info("My observable agent is running!")
	// Keep running until the context ends, so the deferred shutdown can run
	<-ctx.Done()
}
That’s it! Your agent now has full observability.
Step 2: Configure Environment Variables
Set observability configuration via environment variables:
# Tracing configuration
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export OTEL_SERVICE_NAME="my-agent"
export OTEL_SERVICE_VERSION="1.0.0"
# Health server port
export BROKER_HEALTH_PORT="8083"
# Broker connection
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
Step 3: Run Your Observable Agent
Expected Output:
time=2025-09-29T10:00:00.000Z level=INFO msg="Starting health server" port=8083
time=2025-09-29T10:00:00.000Z level=INFO msg="AgentHub client connected" broker_addr=localhost:50051
time=2025-09-29T10:00:00.000Z level=INFO msg="My observable agent is running!"
Available Observability Features
Automatic Health Endpoints
Your agent automatically exposes:
- Health Check: http://localhost:8083/health
- Metrics: http://localhost:8083/metrics (Prometheus format)
- Readiness: http://localhost:8083/ready
Structured Logging
All logs are automatically structured with trace correlation:
{
  "time": "2025-09-29T10:00:00.000Z",
  "level": "INFO",
  "msg": "Task published",
  "trace_id": "abc123...",
  "span_id": "def456...",
  "task_type": "process_document",
  "correlation_id": "req_789"
}
Distributed Tracing
Traces are automatically created for:
- gRPC calls to broker
- Task publishing and subscribing
- Custom operations (when you use the TraceManager)
Metrics Collection
Automatic metrics include:
- Task processing duration
- Success/failure rates
- gRPC call metrics
- Health check status
Advanced Usage
Adding Custom Tracing
Use the built-in TraceManager for custom operations:
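// Requires the OpenTelemetry packages go.opentelemetry.io/otel/attribute
// and go.opentelemetry.io/otel/codes
// Custom operation with tracing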
ctx, span := client.TraceManager.StartPublishSpan(ctx, "my_operation", "document")
defer span.End()
// Add custom attributes
client.TraceManager.AddComponentAttribute(span, "my-component")
span.SetAttributes(attribute.String("document.id", "doc-123"))
// Your operation logic
result, err := doCustomOperation(ctx)
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
}
Adding Custom Metrics
Use the MetricsManager for custom metrics:
// Start timing an operation
timer := client.MetricsManager.StartTimer()
defer timer(ctx, "my_operation", "my-component")
// Your operation
processDocument()
Custom Log Fields
Use the structured logger with context:
client.Logger.InfoContext(ctx, "Processing document",
    "document_id", "doc-123",
    "user_id", "user-456",
    "processing_type", "ocr",
)
Publisher Example with Observability
package main
import (
	"context"
	"time"
	"github.com/owulveryck/agenthub/internal/agenthub"
	pb "github.com/owulveryck/agenthub/events/a2a"
	"google.golang.org/protobuf/types/known/structpb"
)
func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	// Observable client setup
	config := agenthub.NewGRPCConfig("publisher")
	config.HealthPort = "8081"
	client, err := agenthub.NewAgentHubClient(config)
	if err != nil {
		panic(err)
	}
	defer client.Shutdown(context.Background())
	if err := client.Start(ctx); err != nil {
		panic(err)
	}
	// Create observable A2A task publisher
	publisher := &agenthub.A2ATaskPublisher{
		Client:         client.Client,
		TraceManager:   client.TraceManager,
		MetricsManager: client.MetricsManager,
		Logger:         client.Logger,
		ComponentName:  "publisher",
		AgentID:        "my_publisher_agent",
	}
	// Build the task payload as an A2A data part
	data, err := structpb.NewStruct(map[string]interface{}{
		"message": "Hello, observable world!",
	})
	if err != nil {
		panic(err)
	}
	// Automatically traced and metered
	task, err := publisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
		TaskType: "greeting",
		Content: []*pb.Part{
			{Part: &pb.Part_Data{Data: &pb.DataPart{Data: data}}},
		},
		RequesterAgentID: "my_publisher_agent",
		Priority:         pb.Priority_PRIORITY_MEDIUM,
	})
	if err != nil {
		client.Logger.ErrorContext(ctx, "Failed to publish task", "error", err)
	} else {
		client.Logger.InfoContext(ctx, "Task published successfully", "task_id", task.GetId())
	}
}
Subscriber Example with Observability
package main
import (
	"context"
	"os"
	"os/signal"
	"syscall"
	"github.com/owulveryck/agenthub/internal/agenthub"
	pb "github.com/owulveryck/agenthub/events/a2a"
	"google.golang.org/protobuf/types/known/structpb"
)
func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	// Observable client setup
	config := agenthub.NewGRPCConfig("subscriber")
	config.HealthPort = "8082"
	client, err := agenthub.NewAgentHubClient(config)
	if err != nil {
		panic(err)
	}
	defer client.Shutdown(context.Background())
	if err := client.Start(ctx); err != nil {
		panic(err)
	}
	// Create observable task subscriber
	subscriber := agenthub.NewTaskSubscriber(client, "my-subscriber")
	// Register handler with automatic tracing
	subscriber.RegisterTaskHandler("greeting", func(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
		// This is automatically traced and logged
		client.Logger.InfoContext(ctx, "Processing greeting task", "task_id", task.GetTaskId())
		// Your processing logic
		result, err := structpb.NewStruct(map[string]interface{}{
			"response": "Hello back!",
		})
		if err != nil {
			return nil, pb.TaskStatus_TASK_STATUS_FAILED, "Failed to create result structure"
		}
		return result, pb.TaskStatus_TASK_STATUS_COMPLETED, ""
	})
	// Start processing with automatic observability
	go func() {
		if err := subscriber.SubscribeToTasks(ctx); err != nil {
			client.Logger.ErrorContext(ctx, "Task subscription failed", "error", err)
		}
	}()
	// Graceful shutdown
	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
	<-sigChan
}
Configuration Reference
Complete Reference: For all environment variables and configuration options, see Environment Variables Reference
Key Environment Variables
| Variable | Description | Default |
|---|---|---|
| JAEGER_ENDPOINT | Jaeger tracing endpoint | "" (tracing disabled) |
| SERVICE_NAME | Service name for tracing | "agenthub-service" |
| SERVICE_VERSION | Service version | "1.0.0" |
| BROKER_HEALTH_PORT | Health endpoint port | "8080" |
| AGENTHUB_BROKER_ADDR | Broker address | "localhost" |
| AGENTHUB_BROKER_PORT | Broker port | "50051" |
Health Endpoints
Each agent exposes these endpoints:
| Endpoint | Purpose | Response |
|---|---|---|
| /health | Overall health status | JSON status |
| /metrics | Prometheus metrics | Metrics format |
| /ready | Readiness check | 200 OK or 503 |
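To check the three endpoints from code rather than with curl, a small probe is enough. This is a sketch under assumptions: the base URL uses the default health port 8080 from the table above, and the helper names (probe, describeReady) are made up for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// describeReady maps the /ready status code to a label, following the
// "200 OK or 503" contract listed in the endpoint table.
func describeReady(status int) string {
	switch status {
	case http.StatusOK:
		return "ready"
	case http.StatusServiceUnavailable:
		return "not ready"
	default:
		return fmt.Sprintf("unexpected status %d", status)
	}
}

// probe issues a GET request and returns the HTTP status code.
func probe(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	base := "http://localhost:8080" // assumption: default health port
	client := &http.Client{Timeout: 2 * time.Second}
	for _, path := range []string{"/health", "/ready", "/metrics"} {
		status, err := probe(client, base+path)
		if err != nil {
			fmt.Printf("%s: unreachable: %v\n", path, err)
			continue
		}
		if path == "/ready" {
			fmt.Printf("%s: %s\n", path, describeReady(status))
			continue
		}
		fmt.Printf("%s: HTTP %d\n", path, status)
	}
}
```

The short client timeout keeps the probe from hanging when an agent is down, which is the case you most want to detect quickly.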
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| No traces in Jaeger | Set the JAEGER_ENDPOINT environment variable |
| Health endpoint not accessible | Check that BROKER_HEALTH_PORT is unique for each agent |
| Logs not structured | Use client.Logger, not the standard log package |
| Missing correlation IDs | Pass context.Context through all operations |
Verification Steps
- Check health endpoint:
  curl http://localhost:8083/health
- Verify metrics:
  curl http://localhost:8083/metrics
- Check traces in Jaeger:
  - Open http://localhost:16686
  - Search for your service name
Migration from Manual Setup
If you have existing agents using manual observability setup:
Old Approach (Manual)
// 50+ lines of OpenTelemetry setup
obs, err := observability.NewObservability(config)
traceManager := observability.NewTraceManager(serviceName)
// Manual gRPC client setup
// Manual health server setup
New Approach (Unified)
// 3 lines - everything automatic
config := agenthub.NewGRPCConfig("my-agent")
client, err := agenthub.NewAgentHubClient(config)
client.Start(ctx)
The unified abstractions provide the same observability features with 90% less code and no manual setup required.
With AgentHub’s unified abstractions, observability is no longer an add-on feature but a built-in capability that comes automatically with every agent. Focus on your business logic while the platform handles monitoring, tracing, and health checks for you.
2.2.2 - How to Use Grafana Dashboards
Master the AgentHub observability dashboards to monitor, analyze, and troubleshoot your event-driven system effectively.
Prerequisites
- AgentHub observability stack running (docker-compose up -d)
- AgentHub agents running with observability enabled
- Basic understanding of metrics concepts
- 10-15 minutes
Quick Access
- Grafana Dashboard: http://localhost:3333 (admin/admin)
- Direct Dashboard: http://localhost:3333/d/agenthub-eda-dashboard
Dashboard Overview
The AgentHub EDA System Observatory provides comprehensive monitoring across three main areas:
- Event Metrics (Top Row) - Event processing performance
- Distributed Tracing (Middle) - Request flow visualization
- System Health (Bottom Row) - Infrastructure monitoring
Panel-by-Panel Guide
Event Processing Rate (Top Left)
What it shows: Events processed per second by each service
How to use:
- Monitor throughput: See how many events your system processes
- Identify bottlenecks: Low rates may indicate performance issues
- Compare services: See which agents are busiest
Reading the chart:
Green line: agenthub-broker (150 events/sec)
Blue line:  agenthub-publisher (50 events/sec)
Red line:   agenthub-subscriber (145 events/sec)
Troubleshooting:
- Flat lines: No activity - check if agents are running
- Dropping rates: Performance degradation - check CPU/memory
- Spiky patterns: Bursty workloads - consider load balancing
Event Processing Error Rate (Top Right)
What it shows: Percentage of events that failed processing
How to use:
- Monitor reliability: Should stay below 5% (green zone)
- Alert threshold: Yellow above 5%, red above 10%
- Quick health check: Single glance system reliability
Color coding:
- Green (0-5%): Healthy system
- Yellow (5-10%): Moderate issues
- Red (>10%): Critical problems
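The thresholds above can be expressed as a small classifier, which is handy when you want the same zones in a health check or test. This is an illustrative sketch: the function names are hypothetical, and the boundaries follow the text (yellow above 5%, red above 10%).

```go
package main

import "fmt"

// errorRatePercent returns failed/processed as a percentage.
// It returns 0 when nothing was processed, to avoid dividing by zero.
func errorRatePercent(failed, processed float64) float64 {
	if processed <= 0 {
		return 0
	}
	return failed / processed * 100
}

// zone maps an error rate percentage to the dashboard colour zones:
// green up to 5%, yellow above 5%, red above 10%.
func zone(ratePercent float64) string {
	switch {
	case ratePercent > 10:
		return "red"
	case ratePercent > 5:
		return "yellow"
	default:
		return "green"
	}
}

func main() {
	rate := errorRatePercent(12, 150)
	fmt.Printf("error rate %.1f%% is in the %s zone\n", rate, zone(rate))
}
```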
Troubleshooting:
- High error rates: Check Jaeger for failing traces
- Sudden spikes: Look for recent deployments or config changes
- Persistent errors: Check logs for recurring issues
Event Types Distribution (Middle Left)
What it shows: Breakdown of event types by volume
How to use:
- Understand workload: See what types of tasks dominate
- Capacity planning: Identify which task types need scaling
- Anomaly detection: Unusual distributions may indicate issues
Example interpretation:
greeting: 40% (blue) - Most common task type
math_calculation: 35% (green) - Heavy computational tasks
random_number: 20% (yellow) - Quick tasks
unknown_task: 5% (red) - Error-generating tasks
Troubleshooting:
- Missing task types: Check if specific agents are down
- Unexpected distributions: May indicate upstream issues
- Dominant error types: Focus optimization efforts
Event Processing Latency (Middle Right)
What it shows: Processing time percentiles (p50, p95, p99)
How to use:
- Performance monitoring: Track how fast events are processed
- SLA compliance: Ensure latencies meet requirements
- Outlier detection: p99 shows worst-case scenarios
Understanding percentiles:
- p50 (median): 50% of events process faster than this
- p95: 95% of events process faster than this
- p99: 99% of events process faster than this
Healthy ranges:
- p50: < 50ms (very responsive)
- p95: < 200ms (good performance)
- p99: < 500ms (acceptable outliers)
Troubleshooting:
- Rising latencies: Check CPU/memory usage
- High p99: Look for resource contention or long-running tasks
- Flatlined metrics: May indicate measurement issues
Distributed Traces (Middle Section)
What it shows: Integration with Jaeger for trace visualization
How to use:
- Click “Explore” to open Jaeger
- Select service from dropdown
- Find specific traces to debug issues
- Analyze request flows across services
When to use:
- Debugging errors: Find root cause of failures
- Performance analysis: Identify slow operations
- Understanding flows: See complete request journeys
Service CPU Usage (Bottom Left)
What it shows: CPU utilization by service
How to use:
- Capacity monitoring: Ensure services aren’t overloaded
- Resource planning: Identify when to scale
- Performance correlation: High CPU often explains high latency
Healthy ranges:
- < 50%: Comfortable utilization
- 50-70%: Moderate load
- > 70%: Consider scaling
Service Memory Usage (Bottom Center)
What it shows: Memory consumption by service
How to use:
- Memory leak detection: Watch for continuously growing usage
- Capacity planning: Ensure sufficient memory allocation
- Garbage collection: High usage may impact performance
Monitoring tips:
- Steady growth: May indicate memory leaks
- Sawtooth pattern: Normal GC behavior
- Sudden spikes: Check for large event batches
Go Goroutines (Bottom Right)
What it shows: Number of concurrent goroutines per service
How to use:
- Concurrency monitoring: Track parallel processing
- Resource leak detection: Continuously growing numbers indicate leaks
- Performance tuning: Optimize concurrency levels
Normal patterns:
- Stable baseline: Normal operation
- Activity spikes: During high load
- Continuous growth: Potential goroutine leaks
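The difference between an activity spike and a leak is whether the count returns to its baseline. A rough sketch of that distinction follows; the function name steadilyGrowing and the sample values are hypothetical, and a real check would look at a much longer window than four samples.

```go
package main

import (
	"fmt"
	"runtime"
)

// steadilyGrowing reports whether every sample is strictly larger than the
// one before it, which is the "continuous growth" pattern described above.
func steadilyGrowing(samples []int) bool {
	if len(samples) < 2 {
		return false
	}
	for i := 1; i < len(samples); i++ {
		if samples[i] <= samples[i-1] {
			return false
		}
	}
	return true
}

func main() {
	// runtime.NumGoroutine is the value a Go process can sample periodically.
	fmt.Println("goroutines right now:", runtime.NumGoroutine())

	spike := []int{12, 40, 15, 13} // load spike that returns to baseline
	leak := []int{12, 18, 25, 31}  // count that never comes back down
	fmt.Println("spike looks like a leak:", steadilyGrowing(spike))
	fmt.Println("leak looks like a leak:", steadilyGrowing(leak))
}
```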
Service Health Status (Bottom Far Right)
What it shows: Up/down status of each service
How to use:
- Quick status check: See if all services are running
- Outage detection: Immediately identify down services
- Health monitoring: Green = UP, Red = DOWN
Dashboard Variables and Filters
Service Filter
Location: Top of dashboard
Purpose: Filter metrics by specific services
Usage:
- Select “All” to see everything
- Choose specific services to focus analysis
- Useful for isolating problems to specific components
Event Type Filter
Location: Top of dashboard
Purpose: Filter by event/task types
Usage:
- Analyze specific workflow types
- Debug particular task categories
- Compare performance across task types
Time Range Selector
Location: Top right of dashboard
Purpose: Control time window for analysis
Common ranges:
- 5 minutes: Real-time monitoring
- 1 hour: Recent trend analysis
- 24 hours: Daily pattern analysis
- 7 days: Weekly trend and capacity planning
Advanced Usage Patterns
- Start with Overview:
  - Check error rates (should be < 5%)
  - Verify processing rates look normal
  - Scan for any red/yellow indicators
- Drill Down on Issues:
  - If high error rates → check distributed traces
  - If high latency → examine CPU/memory usage
  - If low throughput → check service health
- Root Cause Analysis:
  - Use time range selector to find when problems started
  - Filter by specific services to isolate issues
  - Correlate metrics across different panels
Capacity Planning Workflow
- Analyze Peak Patterns:
  - Set time range to 7 days
  - Identify peak usage periods
  - Note maximum throughput achieved
- Resource Utilization:
  - Check CPU usage during peaks
  - Monitor memory consumption trends
  - Verify goroutine scaling behavior
- Plan Scaling:
  - If CPU > 70% during peaks, scale up
  - If memory continuously growing, investigate leaks
  - If error rates spike during load, optimize before scaling
Troubleshooting Workflow
- Identify Symptoms:
  - High error rates: Focus on traces and logs
  - High latency: Check resource utilization
  - Low throughput: Verify service health
- Time Correlation:
  - Use time range to find when issues started
  - Look for correlated changes across metrics
  - Check for deployment or configuration changes
- Service Isolation:
  - Use service filter to identify problematic components
  - Compare healthy vs unhealthy services
  - Check inter-service dependencies
Dashboard Customization
Adding New Panels
- Click “+ Add panel” in top menu
- Choose visualization type:
  - Time series for trends
  - Stat for current values
  - Gauge for thresholds
- Configure query:
  # Example: Custom error rate
  rate(my_custom_errors_total[5m]) / rate(my_custom_requests_total[5m]) * 100
Creating Alerts
- Edit existing panel or create new one
- Click “Alert” tab
- Configure conditions:
  Query: rate(event_errors_total[5m]) / rate(events_processed_total[5m]) * 100
  Condition: IS ABOVE 5
  Evaluation: Every 1m for 2m
- Set notification channels
Custom Time Ranges
- Click time picker (top right)
- Select “Custom range”
- Set specific dates/times for historical analysis
- Use “Refresh” settings for auto-updating
Troubleshooting Dashboard Issues
Dashboard Not Loading
# Check Grafana status
docker-compose ps grafana
# Check Grafana logs
docker-compose logs grafana
# Restart if needed
docker-compose restart grafana
No Data in Panels
# Check Prometheus connection
curl http://localhost:9090/api/v1/targets
# Verify agents are exposing metrics
curl http://localhost:8080/metrics
curl http://localhost:8081/metrics
curl http://localhost:8082/metrics
# Check Prometheus configuration
docker-compose logs prometheus
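The /api/v1/targets response is JSON, so the unhealthy scrape targets can be picked out programmatically instead of by eye. The following sketch parses only the fields it needs (scrapeUrl, health, lastError). The sample payload and its host names are made up for illustration; in practice you would pass in the body returned by the curl command above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// targetsResponse models the subset of the Prometheus targets API used here.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			ScrapeURL string `json:"scrapeUrl"`
			Health    string `json:"health"`
			LastError string `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

// sampleTargets is an illustrative payload, not real output.
const sampleTargets = `{
  "status": "success",
  "data": {
    "activeTargets": [
      {"scrapeUrl": "http://broker:8080/metrics", "health": "up", "lastError": ""},
      {"scrapeUrl": "http://subscriber:8082/metrics", "health": "down", "lastError": "connection refused"}
    ]
  }
}`

// unhealthyTargets returns "scrapeUrl: lastError" for every target whose
// health is not "up".
func unhealthyTargets(body []byte) ([]string, error) {
	var resp targetsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("prometheus returned status %q", resp.Status)
	}
	var down []string
	for _, t := range resp.Data.ActiveTargets {
		if t.Health != "up" {
			down = append(down, t.ScrapeURL+": "+t.LastError)
		}
	}
	return down, nil
}

func main() {
	down, err := unhealthyTargets([]byte(sampleTargets))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range down {
		fmt.Println("unhealthy target:", d)
	}
}
```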
Slow Dashboard Performance
- Reduce time range: Use shorter windows for better performance
- Limit service selection: Filter to specific services
- Optimize queries: Use appropriate rate intervals
- Check resource usage: Ensure Prometheus has enough memory
Authentication Issues
- Default credentials: admin/admin
- Reset password: Through Grafana UI after first login
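- Lost access: Recreate the Grafana container to return to the default credentials (this only works if Grafana's data is not stored in a persistent volume)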
Best Practices
Regular Monitoring
- Check dashboard daily: Quick health overview
- Weekly reviews: Trend analysis and capacity planning
- Set up alerts: Proactive monitoring for critical metrics
- Use appropriate time ranges: Don’t query more data than needed
- Filter effectively: Use service and event type filters
- Refresh intervals: Balance real-time needs with performance
Team Usage
- Share dashboard URLs: Bookmark specific views
- Create annotations: Mark deployments and incidents
- Export snapshots: Share findings with team members
Jaeger Integration
- Click Explore in traces panel
- Auto-links to Jaeger with service context
- Correlate traces with metrics timeframes
Prometheus Integration
- Click Explore on any panel
- Edit queries in Prometheus query language
- Access raw metrics for custom analysis
Log Correlation
- Use trace IDs from Jaeger
- Search logs for matching trace IDs
- Correlate log events with metric spikes
Next Steps:
- Deep Debugging: Debug with Distributed Tracing
- Production Setup: Configure Alerts
- Understanding: Observability Architecture Explained
2.3 - Agent2Agent Protocol
Learn how to work with Agent2Agent (A2A) protocol components including messages, conversation contexts, artifacts, and task lifecycle management.
Agent2Agent Protocol How-To Guides
This section provides practical guides for working with the Agent2Agent (A2A) protocol in AgentHub. These guides show you how to implement A2A-compliant communication patterns for building robust agent systems.
Available Guides
- A2A Messages: Learn how to create, structure, and process A2A messages with text, data, and file content parts. This is the foundation for all A2A communication.
- Conversation Context: Understand how to manage conversation contexts for multi-turn interactions, workflow coordination, and state preservation across agent communications.
- Artifacts: Master the creation and handling of A2A artifacts, the structured outputs that deliver rich results from completed tasks.
- Task Lifecycle: Learn how to manage the complete task lifecycle from creation through completion, including state transitions, progress updates, and error handling.
A2A Protocol Benefits
The Agent2Agent protocol provides:
- Structured Communication: Standardized message formats with rich content types
- Conversation Threading: Context-aware message grouping for complex workflows
- Rich Artifacts: Structured outputs with multiple content types
- Lifecycle Management: Complete task state tracking from submission to completion
- Interoperability: Standards-based communication for multi-vendor agent systems
Prerequisites
Before following these guides:
- Complete the Installation and Setup tutorial
- Run the AgentHub Demo to see A2A in action
- Understand the Agent2Agent Principle
Implementation Approach
These guides use AgentHub’s unified abstractions from internal/agenthub which provide:
- A2ATaskPublisher: Simplified A2A task creation and publishing
- A2ATaskSubscriber: Streamlined A2A task processing and response generation
- Automatic Observability: Built-in tracing, metrics, and logging
- Environment Configuration: Zero-config setup with environment variables
Start with the A2A Messages guide to learn the fundamentals, then progress through the other guides to build complete A2A-compliant agent systems.
2.3.1 - How to Work with A2A Messages
Learn how to create, structure, and work with Agent2Agent protocol messages including text, data, and file parts.
This guide shows you how to create and work with Agent2Agent (A2A) protocol messages using AgentHub’s unified abstractions. A2A messages are the foundation of all agent communication.
Understanding A2A Message Structure
A2A messages consist of several key components:
- Message ID: Unique identifier for the message
- Context ID: Groups related messages in a conversation
- Task ID: Links the message to a specific task
- Role: Indicates if the message is from USER (requester) or AGENT (responder)
- Content Parts: The actual message content (text, data, or files)
- Metadata: Additional context for routing and processing
Creating Basic A2A Messages
Text Messages
Create a simple text message:
package main
import (
    "context"
    "fmt"
    "log"
    "github.com/google/uuid"
    pb "github.com/owulveryck/agenthub/events/a2a"
    "google.golang.org/protobuf/types/known/timestamppb"
)
func createTextMessage() *pb.Message {
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: "conversation_greeting",
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Hello! Please process this greeting request.",
                },
            },
        },
        Metadata: nil, // Optional
    }
}
Data Messages
Include structured data in your message:
import (
    "google.golang.org/protobuf/types/known/structpb"
)
func createDataMessage() *pb.Message {
    // Create structured data
    data, err := structpb.NewStruct(map[string]interface{}{
        "operation": "calculate",
        // structpb.NewStruct rejects typed slices such as []float64; use []interface{}
        "numbers":   []interface{}{10, 20, 30},
        "formula":   "sum",
        "precision": 2,
    })
    if err != nil {
        log.Fatal(err)
    }
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: "conversation_math",
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Please perform the calculation described in the data.",
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        data,
                        Description: "Calculation parameters",
                    },
                },
            },
        },
    }
}
File Reference Messages
Reference files in your messages:
func createFileMessage() *pb.Message {
    // Create file metadata
    fileMetadata, _ := structpb.NewStruct(map[string]interface{}{
        "source":      "user_upload",
        "category":    "image",
        "permissions": "read-only",
    })
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: "conversation_image_analysis",
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Please analyze the uploaded image.",
                },
            },
            {
                Part: &pb.Part_File{
                    File: &pb.FilePart{
                        FileId:   "file_abc123",
                        Filename: "analysis_target.jpg",
                        MimeType: "image/jpeg",
                        SizeBytes: 2048576, // 2MB
                        Metadata:  fileMetadata,
                    },
                },
            },
        },
    }
}
Working with Mixed Content
Combine multiple part types in a single message:
func createMixedContentMessage() *pb.Message {
    // Configuration data
    config, _ := structpb.NewStruct(map[string]interface{}{
        "format":     "json",
        "output_dir": "/results",
        "compress":   true,
    })
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: "conversation_data_processing",
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Process the dataset with the following configuration and source file.",
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        config,
                        Description: "Processing configuration",
                    },
                },
            },
            {
                Part: &pb.Part_File{
                    File: &pb.FilePart{
                        FileId:   "dataset_xyz789",
                        Filename: "raw_data.csv",
                        MimeType: "text/csv",
                        SizeBytes: 5242880, // 5MB
                    },
                },
            },
        },
    }
}
Publishing A2A Messages
Use AgentHub’s unified abstractions to publish messages:
package main
import (
    "context"
    "log"
    "github.com/owulveryck/agenthub/internal/agenthub"
    pb "github.com/owulveryck/agenthub/events/eventbus"
)
func publishA2AMessage(ctx context.Context) error {
    // Create AgentHub client
    config := agenthub.NewGRPCConfig("message_publisher")
    client, err := agenthub.NewAgentHubClient(config)
    if err != nil {
        return err
    }
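    defer client.Shutdown(context.Background())
    // Start the client before publishing, as in the publisher and subscriber examples
    if err := client.Start(ctx); err != nil {
        return err
    }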
    // Create A2A message
    message := createTextMessage()
    // Publish using AgentHub client
    response, err := client.Client.PublishMessage(ctx, &pb.PublishMessageRequest{
        Message: message,
        Routing: &pb.AgentEventMetadata{
            FromAgentId: "message_publisher",
            ToAgentId:   "message_processor",
            EventType:   "a2a.message",
            Priority:    pb.Priority_PRIORITY_MEDIUM,
        },
    })
    if err != nil {
        return err
    }
    log.Printf("A2A message published: %s", response.GetEventId())
    return nil
}
Processing Received A2A Messages
Handle incoming A2A messages in your agent:
func processA2AMessage(ctx context.Context, message *pb.Message) (string, error) {
    var response string
    // Process each content part
    for i, part := range message.GetContent() {
        switch content := part.GetPart().(type) {
        case *pb.Part_Text:
            log.Printf("Text part %d: %s", i, content.Text)
            response += fmt.Sprintf("Processed text: %s\n", content.Text)
        case *pb.Part_Data:
            log.Printf("Data part %d: %s", i, content.Data.GetDescription())
            // Process structured data
            data := content.Data.GetData()
            response += fmt.Sprintf("Processed data: %s\n", content.Data.GetDescription())
            // Access specific fields
            if operation, ok := data.GetFields()["operation"]; ok {
                log.Printf("Operation: %s", operation.GetStringValue())
            }
        case *pb.Part_File:
            log.Printf("File part %d: %s (%s)", i, content.File.GetFilename(), content.File.GetMimeType())
            response += fmt.Sprintf("Processed file: %s\n", content.File.GetFilename())
            // Handle file processing based on MIME type
            switch content.File.GetMimeType() {
            case "image/jpeg", "image/png":
                // Process image
                response += "Image analysis completed\n"
            case "text/csv":
                // Process CSV data
                response += "CSV data parsed\n"
            }
        }
    }
    return response, nil
}
Message Role Management
Properly set message roles for A2A compliance:
// User message (requesting work)
func createUserMessage(content string) *pb.Message {
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{Text: content},
            },
        },
    }
}
// Agent response message
func createAgentResponse(contextId, taskId, response string) *pb.Message {
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: contextId,
        TaskId:    taskId,
        Role:      pb.Role_AGENT,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{Text: response},
            },
        },
    }
}
Message Validation
Validate A2A messages before publishing:
func validateA2AMessage(message *pb.Message) error {
    if message.GetMessageId() == "" {
        return fmt.Errorf("message_id is required")
    }
    if message.GetRole() == pb.Role_ROLE_UNSPECIFIED {
        return fmt.Errorf("role must be specified (USER or AGENT)")
    }
    if len(message.GetContent()) == 0 {
        return fmt.Errorf("message must have at least one content part")
    }
    // Validate each part
    for i, part := range message.GetContent() {
        if part.GetPart() == nil {
            return fmt.Errorf("content part %d is empty", i)
        }
    }
    return nil
}
Best Practices
1. Always Use Unique Message IDs
messageID := fmt.Sprintf("msg_%d_%s", time.Now().Unix(), uuid.New().String())
2. Group Related Messages with Context IDs
contextID := fmt.Sprintf("ctx_%s_%s", workflowType, uuid.New().String())
3. Add Descriptions to Data Parts
dataPart := &pb.DataPart{
    Data:        structData,
    Description: "User preferences for recommendation engine",
}
4. Validate Messages Before Publishing
if err := validateA2AMessage(message); err != nil {
    return fmt.Errorf("invalid A2A message: %w", err)
}
5. Handle All Part Types in Message Processors
switch content := part.GetPart().(type) {
case *pb.Part_Text:
    // Handle text
case *pb.Part_Data:
    // Handle structured data
case *pb.Part_File:
    // Handle file references
default:
    log.Printf("Unknown part type: %T", content)
}
This guide covered the fundamentals of working with A2A messages. Next, learn about A2A Conversation Context to group related messages and maintain conversation state across multiple interactions.
2.3.2 - How to Work with A2A Conversation Context
Learn how to manage conversation contexts in Agent2Agent protocol for multi-turn interactions and workflow coordination.
This guide shows you how to use A2A conversation contexts to group related messages, maintain state across interactions, and coordinate multi-agent workflows.
Understanding A2A Conversation Context
A2A conversation context is identified by a context_id that groups related messages and tasks. This enables:
- Multi-turn conversations between agents
- Workflow coordination across multiple tasks
- State preservation throughout long-running processes
- Message threading for audit trails
- Context-aware routing based on conversation history
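Because every message carries a context_id, threading a conversation is a matter of grouping by that field. The sketch below shows the idea with a small stand-in struct instead of pb.Message so that it runs without the AgentHub module. The struct, its field names, and the sample IDs are hypothetical.

```go
package main

import "fmt"

// message is a stand-in for the identifying fields of an A2A message.
// It is not the generated pb.Message type.
type message struct {
	MessageID string
	ContextID string
	Role      string
}

// threadByContext groups messages by context ID and keeps the arrival
// order within each conversation, which is what an audit trail needs.
func threadByContext(msgs []message) map[string][]message {
	threads := make(map[string][]message)
	for _, m := range msgs {
		threads[m.ContextID] = append(threads[m.ContextID], m)
	}
	return threads
}

func main() {
	msgs := []message{
		{MessageID: "msg_1", ContextID: "ctx_onboarding", Role: "USER"},
		{MessageID: "msg_2", ContextID: "ctx_analysis", Role: "USER"},
		{MessageID: "msg_3", ContextID: "ctx_onboarding", Role: "AGENT"},
	}
	threads := threadByContext(msgs)
	for _, m := range threads["ctx_onboarding"] {
		fmt.Printf("%s (%s)\n", m.MessageID, m.Role)
	}
}
```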
Creating Conversation Contexts
Simple Conversation Context
Start a basic conversation context:
package main
import (
    "fmt"
    "github.com/google/uuid"
    pb "github.com/owulveryck/agenthub/events/a2a"
)
func createConversationContext(workflowType string) string {
    return fmt.Sprintf("ctx_%s_%s", workflowType, uuid.New().String())
}
func startConversation() *pb.Message {
    contextID := createConversationContext("user_onboarding")
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: contextID,
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Please start the user onboarding process for new user.",
                },
            },
        },
    }
}
Workflow-Specific Contexts
Create contexts for different workflow types:
func createWorkflowContexts() map[string]string {
    return map[string]string{
        "data_analysis":    createConversationContext("data_analysis"),
        "image_processing": createConversationContext("image_processing"),
        "user_support":     createConversationContext("user_support"),
        "integration_test": createConversationContext("integration_test"),
    }
}
Multi-Turn Conversations
Conversation Initiation
Start a conversation with initial context:
import (
    "google.golang.org/protobuf/types/known/structpb"
)
func initiateDataAnalysisConversation() *pb.Message {
    contextID := createConversationContext("data_analysis")
    // Initial conversation metadata
    contextMetadata, _ := structpb.NewStruct(map[string]interface{}{
        "workflow_type":    "data_analysis",
        "initiated_by":     "user_12345",
        "priority":         "high",
        // structpb.NewStruct rejects typed slices such as []string; use []interface{}
        "expected_steps":   []interface{}{"validation", "processing", "analysis", "report"},
        "timeout_minutes":  30,
    })
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: contextID,
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Please analyze the uploaded dataset and provide insights.",
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        contextMetadata,
                        Description: "Conversation context and workflow parameters",
                    },
                },
            },
        },
        Metadata: contextMetadata,
    }
}
Continuing the Conversation
Add follow-up messages to the same context:
func continueConversation(contextID, previousMessageID string) *pb.Message {
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: contextID, // Same context as initial message
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Can you also include trend analysis in the report?",
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "follows_message": structpb.NewStringValue(previousMessageID),
                "conversation_turn": structpb.NewNumberValue(2),
            },
        },
    }
}
Agent Responses in Context
Agents respond within the same conversation context:
func createAgentResponse(contextID, requestMessageID, response string) *pb.Message {
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: contextID, // Same context as request
        Role:      pb.Role_AGENT,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: response,
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "responding_to": structpb.NewStringValue(requestMessageID),
                "agent_id":      structpb.NewStringValue("data_analysis_agent"),
            },
        },
    }
}
Context-Aware Task Management
Creating Tasks with Context
Link tasks to conversation contexts:
import (
    "google.golang.org/protobuf/types/known/timestamppb"
)
func createContextAwareTask(contextID string) *pb.Task {
    taskID := fmt.Sprintf("task_%s_%s", "analysis", uuid.New().String())
    return &pb.Task{
        Id:        taskID,
        ContextId: contextID, // Link to conversation
        Status: &pb.TaskStatus{
            State: pb.TaskState_TASK_STATE_SUBMITTED,
            Update: &pb.Message{
                MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
                ContextId: contextID,
                TaskId:    taskID,
                Role:      pb.Role_USER,
                Content: []*pb.Part{
                    {
                        Part: &pb.Part_Text{
                            Text: "Task submitted for data analysis workflow",
                        },
                    },
                },
            },
            Timestamp: timestamppb.Now(),
        },
        History: []*pb.Message{}, // Will be populated during processing
        Artifacts: []*pb.Artifact{}, // Will be populated on completion
    }
}
Context-Based Task Querying
Retrieve all tasks for a conversation context:
func getTasksForContext(ctx context.Context, client pb.AgentHubClient, contextID string) ([]*pb.Task, error) {
    response, err := client.ListTasks(ctx, &pb.ListTasksRequest{
        ContextId: contextID,
        Limit:     100,
    })
    if err != nil {
        return nil, err
    }
    return response.GetTasks(), nil
}
Workflow Coordination
Multi-Agent Workflow with Shared Context
Coordinate multiple agents within a single conversation:
type WorkflowCoordinator struct {
    client    pb.AgentHubClient
    contextID string
    logger    *log.Logger
}
func (wc *WorkflowCoordinator) ExecuteDataPipeline(ctx context.Context) error {
    // Step 1: Data Validation
    validationTask := &pb.Task{
        Id:        fmt.Sprintf("task_validation_%s", uuid.New().String()),
        ContextId: wc.contextID,
        Status: &pb.TaskStatus{
            State: pb.TaskState_TASK_STATE_SUBMITTED,
            Update: &pb.Message{
                MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
                ContextId: wc.contextID,
                Role:      pb.Role_USER,
                Content: []*pb.Part{
                    {
                        Part: &pb.Part_Text{
                            Text: "Validate uploaded dataset for quality and completeness",
                        },
                    },
                },
            },
            Timestamp: timestamppb.Now(),
        },
    }
    // Publish validation task
    _, err := wc.client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
        Task: validationTask,
        Routing: &pb.AgentEventMetadata{
            FromAgentId: "workflow_coordinator",
            ToAgentId:   "data_validator",
            EventType:   "task.validation",
            Priority:    pb.Priority_PRIORITY_HIGH,
        },
    })
    if err != nil {
        return err
    }
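    // Step 2: Data Processing. The task is published immediately; the depends_on
    // metadata below records that it should run only after validation completes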
    processingTask := &pb.Task{
        Id:        fmt.Sprintf("task_processing_%s", uuid.New().String()),
        ContextId: wc.contextID, // Same context
        Status: &pb.TaskStatus{
            State: pb.TaskState_TASK_STATE_SUBMITTED,
            Update: &pb.Message{
                MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
                ContextId: wc.contextID,
                Role:      pb.Role_USER,
                Content: []*pb.Part{
                    {
                        Part: &pb.Part_Text{
                            Text: "Process validated dataset and extract features",
                        },
                    },
                },
                Metadata: &structpb.Struct{
                    Fields: map[string]*structpb.Value{
                        "depends_on": structpb.NewStringValue(validationTask.GetId()),
                        "workflow_step": structpb.NewNumberValue(2),
                    },
                },
            },
            Timestamp: timestamppb.Now(),
        },
    }
    // Publish processing task
    _, err = wc.client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
        Task: processingTask,
        Routing: &pb.AgentEventMetadata{
            FromAgentId: "workflow_coordinator",
            ToAgentId:   "data_processor",
            EventType:   "task.processing",
            Priority:    pb.Priority_PRIORITY_MEDIUM,
        },
    })
    return err
}
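The depends_on metadata above only records the dependency; something still has to decide when a dependent task may start. The following is a minimal sketch of that check, assuming a hypothetical in-memory map of task dependencies and a set of completed task IDs. Neither structure is part of the AgentHub API, and the function name readyTasks is invented for illustration.

```go
package main

import "sort"

// readyTasks returns the IDs of unfinished tasks whose dependencies have all
// completed. dependsOn maps a task ID to the IDs it depends on (hypothetical
// bookkeeping that mirrors the "depends_on" metadata field); completed is the
// set of task IDs that have already finished.
func readyTasks(dependsOn map[string][]string, completed map[string]bool) []string {
	var ready []string
	for taskID, deps := range dependsOn {
		if completed[taskID] {
			continue // already done, nothing to schedule
		}
		allDone := true
		for _, dep := range deps {
			if !completed[dep] {
				allDone = false
				break
			}
		}
		if allDone {
			ready = append(ready, taskID)
		}
	}
	sort.Strings(ready) // map iteration order is random; sort for stable output
	return ready
}
```

A coordinator could call readyTasks each time a task reaches a completed state and publish only the tasks it returns.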
Context State Management
Tracking Conversation State
Maintain state throughout the conversation:
type ConversationState struct {
    ContextID      string                 `json:"context_id"`
    WorkflowType   string                 `json:"workflow_type"`
    CurrentStep    int                    `json:"current_step"`
    TotalSteps     int                    `json:"total_steps"`
    CompletedTasks []string               `json:"completed_tasks"`
    PendingTasks   []string               `json:"pending_tasks"`
    Variables      map[string]interface{} `json:"variables"`
    CreatedAt      time.Time              `json:"created_at"`
    UpdatedAt      time.Time              `json:"updated_at"`
}
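func (cs *ConversationState) ToMetadata() (*structpb.Struct, error) {
    data := map[string]interface{}{
        "context_id":    cs.ContextID,
        "workflow_type": cs.WorkflowType,
        "current_step":  cs.CurrentStep,
        "total_steps":   cs.TotalSteps,
        // structpb.NewStruct rejects []string, so convert to []interface{} first
        "completed_tasks": toInterfaceSlice(cs.CompletedTasks),
        "pending_tasks":   toInterfaceSlice(cs.PendingTasks),
        "variables":       cs.Variables,
        "updated_at":      cs.UpdatedAt.Format(time.RFC3339),
    }
    return structpb.NewStruct(data)
}
// toInterfaceSlice converts a string slice into the []interface{} form that structpb accepts.
func toInterfaceSlice(values []string) []interface{} {
    out := make([]interface{}, len(values))
    for i, v := range values {
        out[i] = v
    }
    return out
}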
func (cs *ConversationState) UpdateFromMessage(message *pb.Message) {
    cs.UpdatedAt = time.Now()
    // Extract state updates from message metadata
    if metadata := message.GetMetadata(); metadata != nil {
        if step, ok := metadata.GetFields()["current_step"]; ok {
            cs.CurrentStep = int(step.GetNumberValue())
        }
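        if vars, ok := metadata.GetFields()["variables"]; ok {
            if varsStruct := vars.GetStructValue(); varsStruct != nil {
                if cs.Variables == nil {
                    cs.Variables = make(map[string]interface{}) // avoid writing to a nil map
                }
                for key, value := range varsStruct.GetFields() {
                    // Store plain Go values so the map can be serialized again later
                    cs.Variables[key] = value.AsInterface()
                }
            }
        }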
    }
}
State-Aware Message Creation
Include conversation state in messages:
func createStateAwareMessage(contextID string, state *ConversationState, content string) *pb.Message {
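    stateMetadata, err := state.ToMetadata()
    if err != nil {
        // Fall back to an empty struct so the message can still be sent
        log.Printf("Error serializing conversation state: %v", err)
        stateMetadata = &structpb.Struct{}
    }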
    return &pb.Message{
        MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
        ContextId: contextID,
        Role:      pb.Role_USER,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: content,
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        stateMetadata,
                        Description: "Current conversation state",
                    },
                },
            },
        },
        Metadata: stateMetadata,
    }
}
Context-Based Routing
Route Messages Based on Context
Use conversation context for intelligent routing:
func routeByContext(contextID string) *pb.AgentEventMetadata {
    // Determine routing based on context type
    var targetAgent string
    var eventType string
    if strings.Contains(contextID, "data_analysis") {
        targetAgent = "data_analysis_agent"
        eventType = "data.analysis"
    } else if strings.Contains(contextID, "image_processing") {
        targetAgent = "image_processor"
        eventType = "image.processing"
    } else if strings.Contains(contextID, "user_support") {
        targetAgent = "support_agent"
        eventType = "support.request"
    } else {
        targetAgent = "" // Broadcast to all agents
        eventType = "general.message"
    }
    return &pb.AgentEventMetadata{
        FromAgentId:   "context_router",
        ToAgentId:     targetAgent,
        EventType:     eventType,
        Subscriptions: []string{eventType},
        Priority:      pb.Priority_PRIORITY_MEDIUM,
    }
}
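As the number of context types grows, the if/else chain above can be expressed as an ordered table instead. The sketch below is one possible variant using only the standard library; the route type and resolveRoute function are hypothetical names, and the markers, agents, and event types are copied from routeByContext.

```go
package main

import "strings"

// route pairs a context ID substring with the agent and event type it selects.
type route struct {
	Marker      string
	TargetAgent string
	EventType   string
}

// routes is checked in order; the first marker found in the context ID wins.
var routes = []route{
	{"data_analysis", "data_analysis_agent", "data.analysis"},
	{"image_processing", "image_processor", "image.processing"},
	{"user_support", "support_agent", "support.request"},
}

// resolveRoute returns the target agent and event type for a context ID.
// An empty target agent means broadcast, matching the fallback in routeByContext.
func resolveRoute(contextID string) (targetAgent, eventType string) {
	for _, r := range routes {
		if strings.Contains(contextID, r.Marker) {
			return r.TargetAgent, r.EventType
		}
	}
	return "", "general.message"
}
```

Keeping the rules in a slice rather than a map preserves a deterministic match order when a context ID contains more than one marker.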
Subscribe to Context-Specific Events
Agents can subscribe to specific conversation contexts:
func subscribeToContextEvents(ctx context.Context, client pb.AgentHubClient, agentID, contextPattern string) error {
    stream, err := client.SubscribeToMessages(ctx, &pb.SubscribeToMessagesRequest{
        AgentId: agentID,
        ContextPattern: contextPattern, // e.g., "ctx_data_analysis_*"
    })
    if err != nil {
        return err
    }
    for {
        event, err := stream.Recv()
        if err != nil {
            return err
        }
        if message := event.GetMessage(); message != nil {
            log.Printf("Received context message: %s in context: %s",
                message.GetMessageId(), message.GetContextId())
            // Process message within context
            processContextMessage(ctx, message)
        }
    }
}
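How the broker evaluates ContextPattern is not shown in this guide. Assuming simple glob semantics for a pattern such as "ctx_data_analysis_*", an agent could apply the same filter locally with the standard library's path.Match. The function name matchesContext is hypothetical.

```go
package main

import "path"

// matchesContext reports whether contextID matches a glob-style pattern such as
// "ctx_data_analysis_*". Glob semantics are an assumption for illustration; the
// broker's actual ContextPattern matching rules may differ. Note that "*" in
// path.Match does not match the "/" character.
func matchesContext(pattern, contextID string) bool {
	ok, err := path.Match(pattern, contextID)
	if err != nil {
		return false // malformed pattern: treat as no match
	}
	return ok
}
```

A client-side check like this can act as a second line of defense when several agents share a subscription.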
Best Practices
1. Use Descriptive Context IDs
contextID := fmt.Sprintf("ctx_%s_%s_%s", workflowType, userID, uuid.New().String())
2. Preserve Context Across All Related Messages
// All messages in the same workflow should use the same context_id
message.ContextId = existingContextID
3. Include Context Metadata for State Tracking
contextMetadata := map[string]interface{}{
    "workflow_type":   "data_pipeline",
    "initiated_by":    userID,
    "current_step":    stepNumber,
    "total_steps":     totalSteps,
}
4. Use Context for Task Dependencies
taskMetadata := map[string]interface{}{
    "context_id":     contextID,
    "depends_on":     previousTaskID,
    "workflow_step":  stepNumber,
}
5. Handle Context Cleanup
// Set context expiration for long-running workflows
contextMetadata["expires_at"] = time.Now().Add(24 * time.Hour).Format(time.RFC3339)
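Writing expires_at is only half of cleanup; some component also has to read it back and act on it. A minimal sketch of the check, assuming the value is stored as an RFC 3339 string as in the snippet above (the function names are hypothetical):

```go
package main

import "time"

// contextExpired reports whether the RFC 3339 timestamp stored under the
// "expires_at" metadata key lies at or before now. A value that fails to
// parse is returned as an error rather than silently treated as expired.
func contextExpired(expiresAt string, now time.Time) (bool, error) {
	deadline, err := time.Parse(time.RFC3339, expiresAt)
	if err != nil {
		return false, err
	}
	return !now.Before(deadline), nil
}

// contextExpiredNow is a convenience wrapper that compares against the current time.
func contextExpiredNow(expiresAt string) (bool, error) {
	return contextExpired(expiresAt, time.Now())
}
```

Passing now explicitly keeps the core function deterministic and easy to test; the wrapper covers the common case.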
This guide covered conversation context management in the A2A protocol. Next, learn about Working with A2A Artifacts to understand how to create and manage structured outputs from completed tasks.
2.3.3 - How to Work with A2A Artifacts
Learn how to create, structure, and deliver Agent2Agent protocol artifacts as structured outputs from completed tasks.
How to Work with A2A Artifacts
This guide shows you how to create and work with Agent2Agent (A2A) protocol artifacts, which are structured outputs delivered when tasks are completed. Artifacts provide rich, typed results that can include text reports, data files, structured data, and more.
Understanding A2A Artifacts
A2A artifacts are structured containers for task outputs that include:
- Artifact ID: Unique identifier for the artifact
- Name: Human-readable name for the artifact
- Description: Explanation of what the artifact contains
- Parts: The actual content (text, data, files)
- Metadata: Additional context about the artifact
Artifacts are typically generated when tasks reach TASK_STATE_COMPLETED status.
Creating Basic Artifacts
Text Report Artifacts
Create simple text-based results:
package main
import (
    "fmt"
    "time"
    "github.com/google/uuid"
    pb "github.com/owulveryck/agenthub/events/a2a"
    "google.golang.org/protobuf/types/known/structpb"
)
func createTextReportArtifact(taskID, reportContent string) *pb.Artifact {
    return &pb.Artifact{
        ArtifactId:  fmt.Sprintf("artifact_%s_%s", taskID, uuid.New().String()),
        Name:        "Analysis Report",
        Description: "Detailed analysis results and recommendations",
        Parts: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: reportContent,
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "artifact_type": structpb.NewStringValue("report"),
                "format":        structpb.NewStringValue("text"),
                "task_id":       structpb.NewStringValue(taskID),
                "generated_at":  structpb.NewStringValue(time.Now().Format(time.RFC3339)),
            },
        },
    }
}
Data Analysis Artifacts
Create artifacts with structured analysis results:
func createDataAnalysisArtifact(taskID string, results map[string]interface{}) *pb.Artifact {
    // Convert results to structured data
    resultsData, err := structpb.NewStruct(results)
    if err != nil {
        log.Printf("Error creating results data: %v", err)
        resultsData = &structpb.Struct{}
    }
    // Create summary statistics
    summary := map[string]interface{}{
        "total_records":    results["record_count"],
        "processing_time":  results["duration_ms"],
        "success_rate":     results["success_percentage"],
        "anomalies_found":  results["anomaly_count"],
    }
    summaryData, _ := structpb.NewStruct(summary)
    return &pb.Artifact{
        ArtifactId:  fmt.Sprintf("artifact_analysis_%s", uuid.New().String()),
        Name:        "Data Analysis Results",
        Description: "Complete analysis results with statistics and insights",
        Parts: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: "Data analysis completed successfully. See attached results for detailed findings.",
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        resultsData,
                        Description: "Complete analysis results",
                    },
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        summaryData,
                        Description: "Summary statistics",
                    },
                },
            },
        },
        Metadata: &structpb.Struct{
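            Fields: map[string]*structpb.Value{
                "artifact_type":   structpb.NewStringValue("analysis"),
                "analysis_type":   structpb.NewStringValue("statistical"),
                // Read from resultsData: protobuf getters are nil-safe, so a missing or
                // mistyped key yields a zero value instead of a type-assertion panic
                "data_source":     structpb.NewStringValue(resultsData.GetFields()["source"].GetStringValue()),
                "record_count":    structpb.NewNumberValue(resultsData.GetFields()["record_count"].GetNumberValue()),
                "processing_time": structpb.NewNumberValue(resultsData.GetFields()["duration_ms"].GetNumberValue()),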
            },
        },
    }
}
File-Based Artifacts
Create artifacts that reference generated files:
func createFileArtifact(taskID, fileID, filename, mimeType string, sizeBytes int64) *pb.Artifact {
    // File metadata
    fileMetadata, _ := structpb.NewStruct(map[string]interface{}{
        "generated_by":   "data_processor_v1.2",
        "file_version":   "1.0",
        "encoding":       "utf-8",
        "compression":    "gzip",
        "checksum_sha256": "abc123...", // Calculate actual checksum
    })
    return &pb.Artifact{
        ArtifactId:  fmt.Sprintf("artifact_file_%s", uuid.New().String()),
        Name:        "Processed Dataset",
        Description: "Cleaned and processed dataset ready for analysis",
        Parts: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: fmt.Sprintf("Dataset processing completed. Generated file: %s", filename),
                },
            },
            {
                Part: &pb.Part_File{
                    File: &pb.FilePart{
                        FileId:   fileID,
                        Filename: filename,
                        MimeType: mimeType,
                        SizeBytes: sizeBytes,
                        Metadata:  fileMetadata,
                    },
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "artifact_type":    structpb.NewStringValue("file"),
                "file_type":        structpb.NewStringValue("dataset"),
                "original_task":    structpb.NewStringValue(taskID),
                "processing_stage": structpb.NewStringValue("cleaned"),
            },
        },
    }
}
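The checksum_sha256 value above is a placeholder. A minimal sketch of computing a real digest with the standard library follows; the helper names checksumReader and checksumBytes are hypothetical, and how the file bytes are obtained depends on your storage layer.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"io"
)

// checksumReader streams r through SHA-256 and returns the lowercase hex
// digest, so large files never need to be held in memory at once.
func checksumReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// checksumBytes is a convenience wrapper for data that is already in memory.
func checksumBytes(data []byte) string {
	sum, _ := checksumReader(bytes.NewReader(data)) // reading from memory cannot fail
	return sum
}
```

The resulting string can be stored in the file metadata so that consumers can verify the file after download.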
Complex Multi-Part Artifacts
Complete Analysis Package
Create comprehensive artifacts with multiple content types:
func createCompleteAnalysisArtifact(taskID string, analysisResults map[string]interface{}) *pb.Artifact {
    // Executive summary
    summary := fmt.Sprintf(`
Analysis Complete: %s
Key Findings:
- Processed %v records
- Found %v anomalies
- Success rate: %v%%
- Processing time: %v ms
Recommendations:
%s
`,
        analysisResults["dataset_name"],
        analysisResults["record_count"],
        analysisResults["anomaly_count"],
        analysisResults["success_percentage"],
        analysisResults["duration_ms"],
        analysisResults["recommendations"],
    )
    // Detailed results data
    detailedResults, _ := structpb.NewStruct(analysisResults)
    // Configuration used
    configData, _ := structpb.NewStruct(map[string]interface{}{
        "algorithm":         "statistical_analysis_v2",
        "confidence_level":  0.95,
        "outlier_threshold": 2.5,
        "normalization":     "z-score",
    })
    return &pb.Artifact{
        ArtifactId:  fmt.Sprintf("artifact_complete_%s", uuid.New().String()),
        Name:        "Complete Analysis Package",
        Description: "Full analysis results including summary, data, configuration, and generated files",
        Parts: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: summary,
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        detailedResults,
                        Description: "Detailed analysis results and metrics",
                    },
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        configData,
                        Description: "Analysis configuration parameters",
                    },
                },
            },
            {
                Part: &pb.Part_File{
                    File: &pb.FilePart{
                        FileId:   "results_visualization_123",
                        Filename: "analysis_charts.png",
                        MimeType: "image/png",
                        SizeBytes: 1024000,
                    },
                },
            },
            {
                Part: &pb.Part_File{
                    File: &pb.FilePart{
                        FileId:   "results_dataset_456",
                        Filename: "processed_data.csv",
                        MimeType: "text/csv",
                        SizeBytes: 5120000,
                    },
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "artifact_type":     structpb.NewStringValue("complete_package"),
                "analysis_type":     structpb.NewStringValue("comprehensive"),
                "includes_files":    structpb.NewBoolValue(true),
                "includes_data":     structpb.NewBoolValue(true),
                "includes_summary":  structpb.NewBoolValue(true),
                "file_count":        structpb.NewNumberValue(2),
                "total_size_bytes":  structpb.NewNumberValue(6144000),
            },
        },
    }
}
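The file_count and total_size_bytes values above are hard-coded and can drift from the parts they describe. One way to avoid that, sketched here with a local stand-in type rather than the generated pb.FilePart, is to derive both numbers from the file list itself. The fileRef type and summarizeFiles function are hypothetical.

```go
package main

// fileRef is a stand-in for the size information carried by a file part.
// It is a local type for this sketch, not the generated pb.FilePart.
type fileRef struct {
	Filename  string
	SizeBytes int64
}

// summarizeFiles returns the number of files and their combined size in bytes.
func summarizeFiles(files []fileRef) (count int, totalBytes int64) {
	for _, f := range files {
		count++
		totalBytes += f.SizeBytes
	}
	return count, totalBytes
}
```

For the two files in the example above (1024000 and 5120000 bytes), this yields a count of 2 and a total of 6144000 bytes, matching the metadata.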
Publishing Artifacts
Using A2A Task Completion
Publish artifacts when completing tasks:
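import (
    "context"
    "fmt"
    "github.com/google/uuid"
    pb "github.com/owulveryck/agenthub/events/a2a"
    eventbus "github.com/owulveryck/agenthub/events/eventbus"
    "github.com/owulveryck/agenthub/internal/agenthub"
    "google.golang.org/protobuf/types/known/timestamppb"
)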
func completeTaskWithArtifact(ctx context.Context, client eventbus.AgentHubClient, task *pb.Task, artifact *pb.Artifact) error {
    // Update task status to completed
    task.Status = &pb.TaskStatus{
        State: pb.TaskState_TASK_STATE_COMPLETED,
        Update: &pb.Message{
            MessageId: fmt.Sprintf("msg_completion_%s", uuid.New().String()),
            ContextId: task.GetContextId(),
            TaskId:    task.GetId(),
            Role:      pb.Role_AGENT,
            Content: []*pb.Part{
                {
                    Part: &pb.Part_Text{
                        Text: "Task completed successfully. Artifact has been generated.",
                    },
                },
            },
        },
        Timestamp: timestamppb.Now(),
    }
    // Add artifact to task
    task.Artifacts = append(task.Artifacts, artifact)
    // Publish task completion
    _, err := client.PublishTaskUpdate(ctx, &eventbus.PublishTaskUpdateRequest{
        Task: task,
        Routing: &eventbus.AgentEventMetadata{
            FromAgentId: "processing_agent",
            ToAgentId:   "", // Broadcast completion
            EventType:   "task.completed",
            Priority:    eventbus.Priority_PRIORITY_MEDIUM,
        },
    })
    if err != nil {
        return fmt.Errorf("failed to publish task completion: %w", err)
    }
    // Separately publish artifact update
    return publishArtifactUpdate(ctx, client, task.GetId(), artifact)
}
func publishArtifactUpdate(ctx context.Context, client eventbus.AgentHubClient, taskID string, artifact *pb.Artifact) error {
    _, err := client.PublishTaskArtifact(ctx, &eventbus.PublishTaskArtifactRequest{
        TaskId:   taskID,
        Artifact: artifact,
        Routing: &eventbus.AgentEventMetadata{
            FromAgentId: "processing_agent",
            ToAgentId:   "", // Broadcast to interested parties
            EventType:   "artifact.created",
            Priority:    eventbus.Priority_PRIORITY_LOW,
        },
    })
    return err
}
Using A2A Abstractions
Use AgentHub’s simplified artifact publishing:
func completeTaskWithA2AArtifact(ctx context.Context, subscriber *agenthub.A2ATaskSubscriber, task *pb.Task, artifact *pb.Artifact) error {
    return subscriber.CompleteA2ATaskWithArtifact(ctx, task, artifact)
}
Processing Received Artifacts
Artifact Event Handling
Handle incoming artifact notifications:
func handleArtifactEvents(ctx context.Context, client eventbus.AgentHubClient, agentID string) error {
    stream, err := client.SubscribeToAgentEvents(ctx, &eventbus.SubscribeToAgentEventsRequest{
        AgentId: agentID,
        EventTypes: []string{"artifact.created", "task.completed"},
    })
    if err != nil {
        return err
    }
    for {
        event, err := stream.Recv()
        if err != nil {
            return err
        }
        switch payload := event.GetPayload().(type) {
        case *eventbus.AgentEvent_ArtifactUpdate:
            artifactEvent := payload.ArtifactUpdate
            log.Printf("Received artifact: %s for task: %s",
                artifactEvent.GetArtifact().GetArtifactId(),
                artifactEvent.GetTaskId())
            // Process the artifact
            err := processArtifact(ctx, artifactEvent.GetArtifact())
            if err != nil {
                log.Printf("Error processing artifact: %v", err)
            }
        case *eventbus.AgentEvent_Task:
            task := payload.Task
            if task.GetStatus().GetState() == pb.TaskState_TASK_STATE_COMPLETED {
                // Process completed task artifacts
                for _, artifact := range task.GetArtifacts() {
                    err := processArtifact(ctx, artifact)
                    if err != nil {
                        log.Printf("Error processing task artifact: %v", err)
                    }
                }
            }
        }
    }
}
Artifact Content Processing
Process different types of artifact content:
func processArtifact(ctx context.Context, artifact *pb.Artifact) error {
    log.Printf("Processing artifact: %s - %s", artifact.GetName(), artifact.GetDescription())
    for i, part := range artifact.GetParts() {
        switch content := part.GetPart().(type) {
        case *pb.Part_Text:
            log.Printf("Text part %d: Processing text content (%d chars)", i, len(content.Text))
            // Process text content
            err := processTextArtifact(content.Text)
            if err != nil {
                return fmt.Errorf("failed to process text part: %w", err)
            }
        case *pb.Part_Data:
            log.Printf("Data part %d: Processing structured data (%s)", i, content.Data.GetDescription())
            // Process structured data
            err := processDataArtifact(content.Data.GetData())
            if err != nil {
                return fmt.Errorf("failed to process data part: %w", err)
            }
        case *pb.Part_File:
            log.Printf("File part %d: Processing file %s (%s, %d bytes)",
                i, content.File.GetFilename(), content.File.GetMimeType(), content.File.GetSizeBytes())
            // Process file reference
            err := processFileArtifact(ctx, content.File)
            if err != nil {
                return fmt.Errorf("failed to process file part: %w", err)
            }
        }
    }
    return nil
}
func processTextArtifact(text string) error {
    // Extract insights, save to database, etc.
    log.Printf("Extracting insights from text artifact...")
    return nil
}
func processDataArtifact(data *structpb.Struct) error {
    // Parse structured data, update metrics, etc.
    log.Printf("Processing structured data artifact...")
    // Access specific fields
    if recordCount, ok := data.GetFields()["record_count"]; ok {
        log.Printf("Records processed: %v", recordCount.GetNumberValue())
    }
    return nil
}
func processFileArtifact(ctx context.Context, file *pb.FilePart) error {
    // Download file, process content, etc.
    log.Printf("Processing file artifact: %s", file.GetFileId())
    // Handle different file types
    switch file.GetMimeType() {
    case "text/csv":
        return processCSVFile(ctx, file.GetFileId())
    case "image/png", "image/jpeg":
        return processImageFile(ctx, file.GetFileId())
    case "application/json":
        return processJSONFile(ctx, file.GetFileId())
    default:
        log.Printf("Unknown file type: %s", file.GetMimeType())
    }
    return nil
}
Artifact Chaining
Use artifacts from one task as inputs to another:
func chainArtifactProcessing(ctx context.Context, client eventbus.AgentHubClient, inputArtifact *pb.Artifact) error {
    // Create a new task using the artifact as input
    contextID := fmt.Sprintf("ctx_chained_%s", uuid.New().String())
    chainedTask := &pb.Task{
        Id:        fmt.Sprintf("task_chained_%s", uuid.New().String()),
        ContextId: contextID,
        Status: &pb.TaskStatus{
            State: pb.TaskState_TASK_STATE_SUBMITTED,
            Update: &pb.Message{
                MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
                ContextId: contextID,
                Role:      pb.Role_USER,
                Content: []*pb.Part{
                    {
                        Part: &pb.Part_Text{
                            Text: "Please process the results from the previous analysis task.",
                        },
                    },
                    {
                        Part: &pb.Part_Data{
                            Data: &pb.DataPart{
                                Data: &structpb.Struct{
                                    Fields: map[string]*structpb.Value{
                                        "input_artifact_id": structpb.NewStringValue(inputArtifact.GetArtifactId()),
                                        "processing_type":   structpb.NewStringValue("enhancement"),
                                    },
                                },
                                Description: "Processing parameters with input artifact reference",
                            },
                        },
                    },
                },
            },
            Timestamp: timestamppb.Now(),
        },
    }
    // Publish the chained task
    _, err := client.PublishTaskUpdate(ctx, &eventbus.PublishTaskUpdateRequest{
        Task: chainedTask,
        Routing: &eventbus.AgentEventMetadata{
            FromAgentId: "workflow_coordinator",
            ToAgentId:   "enhancement_processor",
            EventType:   "task.chained",
            Priority:    eventbus.Priority_PRIORITY_MEDIUM,
        },
    })
    return err
}
Best Practices
1. Use Descriptive Artifact Names and Descriptions
artifact := &pb.Artifact{
    Name:        "Customer Segmentation Analysis Results",
    Description: "Complete customer segmentation with demographics, behavior patterns, and actionable insights",
    // ...
}
2. Include Comprehensive Metadata
metadata := map[string]interface{}{
    "artifact_type":    "analysis",
    "domain":          "customer_analytics",
    "data_source":     "customer_transactions_2024",
    "algorithm":       "k_means_clustering",
    "confidence":      0.94,
    "generated_by":    "analytics_engine_v2.1",
    "valid_until":     time.Now().Add(30*24*time.Hour).Format(time.RFC3339),
}
3. Structure Multi-Part Artifacts Logically
// Order parts from most important to least important
parts := []*pb.Part{
    textSummaryPart,      // Human-readable summary first
    structuredDataPart,   // Machine-readable data second
    configurationPart,    // Configuration details third
    fileReferencePart,    // File references last
}
4. Validate Artifacts Before Publishing
func validateArtifact(artifact *pb.Artifact) error {
    if artifact.GetArtifactId() == "" {
        return fmt.Errorf("artifact_id is required")
    }
    if len(artifact.GetParts()) == 0 {
        return fmt.Errorf("artifact must have at least one part")
    }
    return nil
}
5. Handle Large Artifacts Appropriately
// For large data, use file references instead of inline data
if len(dataBytes) > 1024*1024 { // 1MB threshold
    // Save to file storage and reference
    fileID := saveToFileStorage(dataBytes)
    part = createFileReferencePart(fileID, filename, mimeType)
} else {
    // Include data inline
    part = createInlineDataPart(data)
}
This guide covered creating and working with A2A artifacts. Next, learn about A2A Task Lifecycle Management to understand how to properly manage task states and coordinate complex workflows.
2.3.4 - How to Work with A2A Task Lifecycle
Learn how to manage Agent2Agent protocol task states, handle lifecycle transitions, and coordinate complex task workflows.
How to Work with A2A Task Lifecycle
This guide shows you how to manage the complete lifecycle of Agent2Agent (A2A) protocol tasks, from creation through completion. Understanding task states and transitions is essential for building reliable agent workflows.
Understanding A2A Task States
A2A tasks progress through the following states:
- TASK_STATE_SUBMITTED: Task created and submitted for processing
- TASK_STATE_WORKING: Task accepted and currently being processed
- TASK_STATE_COMPLETED: Task finished successfully with results
- TASK_STATE_FAILED: Task failed with error information
- TASK_STATE_CANCELLED: Task cancelled before completion
Each state transition is recorded with a timestamp and status message.
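Agents that update task status benefit from rejecting impossible transitions early. The sketch below uses a local TaskState type as a stand-in for the generated pb.TaskState enum, and the transition table is an assumption inferred from the state descriptions above, not something taken from the A2A specification.

```go
package main

import "fmt"

// TaskState is a local stand-in for the generated pb.TaskState enum.
type TaskState string

const (
	Submitted TaskState = "TASK_STATE_SUBMITTED"
	Working   TaskState = "TASK_STATE_WORKING"
	Completed TaskState = "TASK_STATE_COMPLETED"
	Failed    TaskState = "TASK_STATE_FAILED"
	Cancelled TaskState = "TASK_STATE_CANCELLED"
)

// allowedTransitions is an assumed transition table. Completed, Failed, and
// Cancelled are treated as terminal, so they have no outgoing entries.
var allowedTransitions = map[TaskState][]TaskState{
	Submitted: {Working, Cancelled},
	Working:   {Completed, Failed, Cancelled},
}

// validateTransition returns an error when moving from one state to another
// is not permitted by allowedTransitions.
func validateTransition(from, to TaskState) error {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("invalid task state transition: %s -> %s", from, to)
}
```

Calling validateTransition before publishing a status update keeps a completed or cancelled task from being moved back into a working state by mistake.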
Creating A2A Tasks
Basic Task Creation
Create a new task with initial state:
package main
import (
    "fmt"
    "time"
    "github.com/google/uuid"
    pb "github.com/owulveryck/agenthub/events/a2a"
    "google.golang.org/protobuf/types/known/timestamppb"
    "google.golang.org/protobuf/types/known/structpb"
)
func createA2ATask(contextID, taskType string, content []*pb.Part) *pb.Task {
    taskID := fmt.Sprintf("task_%s_%s", taskType, uuid.New().String())
    messageID := fmt.Sprintf("msg_%s", uuid.New().String())
    return &pb.Task{
        Id:        taskID,
        ContextId: contextID,
        Status: &pb.TaskStatus{
            State: pb.TaskState_TASK_STATE_SUBMITTED,
            Update: &pb.Message{
                MessageId: messageID,
                ContextId: contextID,
                TaskId:    taskID,
                Role:      pb.Role_USER,
                Content:   content,
                Metadata: &structpb.Struct{
                    Fields: map[string]*structpb.Value{
                        "task_type":      structpb.NewStringValue(taskType),
                        "submitted_by":   structpb.NewStringValue("user_agent"),
                        "priority":       structpb.NewStringValue("medium"),
                    },
                },
            },
            Timestamp: timestamppb.Now(),
        },
        History:   []*pb.Message{},
        Artifacts: []*pb.Artifact{},
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "task_type":    structpb.NewStringValue(taskType),
                "created_at":   structpb.NewStringValue(time.Now().Format(time.RFC3339)),
                "expected_duration": structpb.NewStringValue("5m"),
            },
        },
    }
}
Task with Complex Requirements
Create tasks with detailed specifications:
func createComplexAnalysisTask(contextID string) *pb.Task {
    // Task configuration
    taskConfig, _ := structpb.NewStruct(map[string]interface{}{
        "algorithm":         "advanced_ml_analysis",
        "confidence_level":  0.95,
        "max_processing_time": "30m",
        // structpb.NewStruct accepts []interface{} but not []string
        "output_formats":    []interface{}{"json", "csv", "visualization"},
        "quality_threshold": 0.9,
    })
    // Input data specification
    inputSpec, _ := structpb.NewStruct(map[string]interface{}{
        "dataset_id":       "customer_data_2024",
        // Nested values must be []interface{} or map[string]interface{} for structpb.NewStruct
        "required_fields":  []interface{}{"customer_id", "transaction_amount", "timestamp"},
        "date_range":       map[string]interface{}{"start": "2024-01-01", "end": "2024-12-31"},
        "preprocessing":    true,
    })
    content := []*pb.Part{
        {
            Part: &pb.Part_Text{
                Text: "Perform comprehensive customer behavior analysis on the specified dataset with advanced ML algorithms.",
            },
        },
        {
            Part: &pb.Part_Data{
                Data: &pb.DataPart{
                    Data:        taskConfig,
                    Description: "Analysis configuration parameters",
                },
            },
        },
        {
            Part: &pb.Part_Data{
                Data: &pb.DataPart{
                    Data:        inputSpec,
                    Description: "Input dataset specification",
                },
            },
        },
    }
    task := createA2ATask(contextID, "customer_analysis", content)
    // Add complex task metadata
    task.Metadata = &structpb.Struct{
        Fields: map[string]*structpb.Value{
            "task_type":           structpb.NewStringValue("customer_analysis"),
            "complexity":          structpb.NewStringValue("high"),
            "estimated_duration":  structpb.NewStringValue("30m"),
            "required_resources":  structpb.NewListValue(&structpb.ListValue{
                Values: []*structpb.Value{
                    structpb.NewStringValue("gpu_compute"),
                    structpb.NewStringValue("large_memory"),
                },
            }),
            "deliverables":        structpb.NewListValue(&structpb.ListValue{
                Values: []*structpb.Value{
                    structpb.NewStringValue("analysis_report"),
                    structpb.NewStringValue("customer_segments"),
                    structpb.NewStringValue("predictions"),
                },
            }),
        },
    }
    return task
}
Task State Transitions
Accepting a Task (SUBMITTED → WORKING)
When an agent accepts a task:
func acceptTask(task *pb.Task, agentID string) *pb.Task {
    // Create acceptance message
    acceptanceMessage := &pb.Message{
        MessageId: fmt.Sprintf("msg_accept_%s", uuid.New().String()),
        ContextId: task.GetContextId(),
        TaskId:    task.GetId(),
        Role:      pb.Role_AGENT,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: fmt.Sprintf("Task accepted by agent %s. Beginning processing.", agentID),
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "accepting_agent": structpb.NewStringValue(agentID),
                "estimated_completion": structpb.NewStringValue(
                    time.Now().Add(15*time.Minute).Format(time.RFC3339),
                ),
            },
        },
    }
    // Update task status
    task.Status = &pb.TaskStatus{
        State:     pb.TaskState_TASK_STATE_WORKING,
        Update:    acceptanceMessage,
        Timestamp: timestamppb.Now(),
    }
    // Add to history
    task.History = append(task.History, acceptanceMessage)
    return task
}
Progress Updates (WORKING → WORKING)
Send progress updates during processing:
func sendProgressUpdate(task *pb.Task, progressPercentage int, currentPhase, details string) *pb.Task {
    // Create progress data
    progressData, _ := structpb.NewStruct(map[string]interface{}{
        "progress_percentage": progressPercentage,
        "current_phase":       currentPhase,
        "details":            details,
        "estimated_remaining": calculateRemainingTime(progressPercentage),
        "memory_usage_mb":     getCurrentMemoryUsage(),
        "cpu_usage_percent":   getCurrentCPUUsage(),
    })
    progressMessage := &pb.Message{
        MessageId: fmt.Sprintf("msg_progress_%s_%d", uuid.New().String(), progressPercentage),
        ContextId: task.GetContextId(),
        TaskId:    task.GetId(),
        Role:      pb.Role_AGENT,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: fmt.Sprintf("Progress update: %d%% complete. Current phase: %s",
                        progressPercentage, currentPhase),
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        progressData,
                        Description: "Detailed progress information",
                    },
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "update_type":         structpb.NewStringValue("progress"),
                "progress_percentage": structpb.NewNumberValue(float64(progressPercentage)),
                "phase":              structpb.NewStringValue(currentPhase),
            },
        },
    }
    // Update task status (still WORKING, but with new message)
    task.Status = &pb.TaskStatus{
        State:     pb.TaskState_TASK_STATE_WORKING,
        Update:    progressMessage,
        Timestamp: timestamppb.Now(),
    }
    // Add to history
    task.History = append(task.History, progressMessage)
    return task
}
func calculateRemainingTime(progressPercentage int) string {
    if progressPercentage <= 0 {
        return "unknown"
    }
    // Simplified estimation logic
    remainingMinutes := (100 - progressPercentage) * 15 / 100
    return fmt.Sprintf("%dm", remainingMinutes)
}
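The `calculateRemainingTime` helper above assumes a fixed 15-minute total. As a minimal sketch of an alternative, assuming the agent records when processing started, the remaining time can be extrapolated from the time already spent: remaining = elapsed × (100 − p) / p. The name `estimateRemaining` is hypothetical and not part of AgentHub.

```go
package main

import (
	"fmt"
	"time"
)

// estimateRemaining extrapolates the remaining duration from the time already
// spent, assuming progress is roughly linear. It returns false when no
// estimate is possible (progress outside 1..100).
func estimateRemaining(elapsed time.Duration, progressPercentage int) (time.Duration, bool) {
	if progressPercentage <= 0 || progressPercentage > 100 {
		return 0, false
	}
	remaining := elapsed * time.Duration(100-progressPercentage) / time.Duration(progressPercentage)
	return remaining, true
}

func main() {
	// One minute spent to reach 25% suggests three more minutes of work.
	if d, ok := estimateRemaining(time.Minute, 25); ok {
		fmt.Println(d) // 3m0s
	}
}
```

The boolean result plays the role of the "unknown" string in the original helper, so callers can decide how to report a missing estimate.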
Completing a Task (WORKING → COMPLETED)
Complete a task with results:
func completeTask(task *pb.Task, results string, artifacts []*pb.Artifact) *pb.Task {
    // Create completion message
    completionMessage := &pb.Message{
        MessageId: fmt.Sprintf("msg_complete_%s", uuid.New().String()),
        ContextId: task.GetContextId(),
        TaskId:    task.GetId(),
        Role:      pb.Role_AGENT,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: fmt.Sprintf("Task completed successfully. %s", results),
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "completion_status": structpb.NewStringValue("success"),
                "processing_time":   structpb.NewStringValue(
                    time.Since(getTaskStartTime(task)).String(),
                ),
                "artifact_count":    structpb.NewNumberValue(float64(len(artifacts))),
            },
        },
    }
    // Update task status
    task.Status = &pb.TaskStatus{
        State:     pb.TaskState_TASK_STATE_COMPLETED,
        Update:    completionMessage,
        Timestamp: timestamppb.Now(),
    }
    // Add completion message to history
    task.History = append(task.History, completionMessage)
    // Add artifacts
    task.Artifacts = append(task.Artifacts, artifacts...)
    return task
}
Handling Task Failures (WORKING → FAILED)
Handle task failures with detailed error information:
func failTask(task *pb.Task, errorMessage, errorCode string, errorDetails map[string]interface{}) *pb.Task {
    // Create error data
    errorData, _ := structpb.NewStruct(map[string]interface{}{
        "error_code":    errorCode,
        "error_message": errorMessage,
        "error_details": errorDetails,
        "failure_phase": getCurrentProcessingPhase(task),
        "retry_possible": determineRetryPossibility(errorCode),
        "diagnostic_info": map[string]interface{}{
            "memory_at_failure": getCurrentMemoryUsage(),
            "cpu_at_failure":   getCurrentCPUUsage(),
            "logs_reference":   getLogReference(),
        },
    })
    failureMessage := &pb.Message{
        MessageId: fmt.Sprintf("msg_failure_%s", uuid.New().String()),
        ContextId: task.GetContextId(),
        TaskId:    task.GetId(),
        Role:      pb.Role_AGENT,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: fmt.Sprintf("Task failed: %s (Code: %s)", errorMessage, errorCode),
                },
            },
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data:        errorData,
                        Description: "Detailed error information and diagnostics",
                    },
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "failure_type":  structpb.NewStringValue("processing_error"),
                "error_code":    structpb.NewStringValue(errorCode),
                "retry_possible": structpb.NewBoolValue(determineRetryPossibility(errorCode)),
            },
        },
    }
    // Update task status
    task.Status = &pb.TaskStatus{
        State:     pb.TaskState_TASK_STATE_FAILED,
        Update:    failureMessage,
        Timestamp: timestamppb.Now(),
    }
    // Add failure message to history
    task.History = append(task.History, failureMessage)
    return task
}
func determineRetryPossibility(errorCode string) bool {
    // Determine if the error is retryable
    retryableErrors := []string{
        "TEMPORARY_RESOURCE_UNAVAILABLE",
        "NETWORK_TIMEOUT",
        "RATE_LIMIT_EXCEEDED",
    }
    for _, retryable := range retryableErrors {
        if errorCode == retryable {
            return true
        }
    }
    return false
}
Cancelling Tasks (ANY → CANCELLED)
Handle task cancellation:
func cancelTask(task *pb.Task, reason, cancelledBy string) *pb.Task {
    cancellationMessage := &pb.Message{
        MessageId: fmt.Sprintf("msg_cancel_%s", uuid.New().String()),
        ContextId: task.GetContextId(),
        TaskId:    task.GetId(),
        Role:      pb.Role_AGENT,
        Content: []*pb.Part{
            {
                Part: &pb.Part_Text{
                    Text: fmt.Sprintf("Task cancelled: %s", reason),
                },
            },
        },
        Metadata: &structpb.Struct{
            Fields: map[string]*structpb.Value{
                "cancellation_reason": structpb.NewStringValue(reason),
                "cancelled_by":        structpb.NewStringValue(cancelledBy),
                "previous_state":      structpb.NewStringValue(task.GetStatus().GetState().String()),
            },
        },
    }
    // Update task status
    task.Status = &pb.TaskStatus{
        State:     pb.TaskState_TASK_STATE_CANCELLED,
        Update:    cancellationMessage,
        Timestamp: timestamppb.Now(),
    }
    // Add cancellation message to history
    task.History = append(task.History, cancellationMessage)
    return task
}
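The transitions in this section can be checked before a task is mutated. The following is a minimal sketch, not AgentHub's implementation: `State` is a hypothetical stand-in for `pb.TaskState` so the example runs on its own, and treating COMPLETED, FAILED, and CANCELLED as terminal is an assumption.

```go
package main

import "fmt"

// State is a hypothetical stand-in for pb.TaskState.
type State string

const (
	Submitted State = "SUBMITTED"
	Working   State = "WORKING"
	Completed State = "COMPLETED"
	Failed    State = "FAILED"
	Cancelled State = "CANCELLED"
)

// allowed lists the transitions described in this guide. States without an
// entry are treated as terminal (an assumption of this sketch).
var allowed = map[State][]State{
	Submitted: {Working, Cancelled},
	Working:   {Working, Completed, Failed, Cancelled},
}

// CanTransition reports whether moving from one state to another is permitted.
func CanTransition(from, to State) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Submitted, Working)) // true
	fmt.Println(CanTransition(Completed, Working)) // false
}
```

A helper such as `acceptTask` could call a check like this first and refuse to overwrite the status of a task that has already finished.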
Publishing Task Updates
Using AgentHub Client
Publish task updates through the AgentHub broker:
import (
    "context"
    eventbus "github.com/owulveryck/agenthub/events/eventbus"
)
func publishTaskUpdate(ctx context.Context, client eventbus.AgentHubClient, task *pb.Task, fromAgent, toAgent string) error {
    _, err := client.PublishTaskUpdate(ctx, &eventbus.PublishTaskUpdateRequest{
        Task: task,
        Routing: &eventbus.AgentEventMetadata{
            FromAgentId: fromAgent,
            ToAgentId:   toAgent,
            EventType:   fmt.Sprintf("task.%s", task.GetStatus().GetState().String()),
            Priority:    getPriorityFromTaskState(task.GetStatus().GetState()),
        },
    })
    return err
}
func getPriorityFromTaskState(state pb.TaskState) eventbus.Priority {
    switch state {
    case pb.TaskState_TASK_STATE_FAILED:
        return eventbus.Priority_PRIORITY_HIGH
    case pb.TaskState_TASK_STATE_COMPLETED:
        return eventbus.Priority_PRIORITY_MEDIUM
    case pb.TaskState_TASK_STATE_WORKING:
        return eventbus.Priority_PRIORITY_LOW
    default:
        return eventbus.Priority_PRIORITY_MEDIUM
    }
}
Using A2A Abstractions
Use simplified A2A task management:
import (
    "github.com/owulveryck/agenthub/internal/agenthub"
)
func manageTaskWithA2A(ctx context.Context, subscriber *agenthub.A2ATaskSubscriber, task *pb.Task) error {
    // Process the task
    artifact, status, errorMsg := processTaskContent(ctx, task)
    switch status {
    case pb.TaskState_TASK_STATE_COMPLETED:
        return subscriber.CompleteA2ATaskWithArtifact(ctx, task, artifact)
    case pb.TaskState_TASK_STATE_FAILED:
        return subscriber.FailA2ATask(ctx, task, errorMsg)
    default:
        return subscriber.UpdateA2ATaskProgress(ctx, task, 50, "Processing data", "Halfway complete")
    }
}
Task Monitoring and Querying
Get Task Status
Query task status and history:
func getTaskStatus(ctx context.Context, client eventbus.AgentHubClient, taskID string) (*pb.Task, error) {
    task, err := client.GetTask(ctx, &eventbus.GetTaskRequest{
        TaskId:        taskID,
        HistoryLength: 10, // Get last 10 messages
    })
    if err != nil {
        return nil, err
    }
    // Log current status
    log.Printf("Task %s status: %s", taskID, task.GetStatus().GetState().String())
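    // Guard against an update without content parts before indexing
    if parts := task.GetStatus().GetUpdate().GetContent(); len(parts) > 0 {
        log.Printf("Last update: %s", parts[0].GetText())
    }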
    log.Printf("History length: %d messages", len(task.GetHistory()))
    log.Printf("Artifacts: %d", len(task.GetArtifacts()))
    return task, nil
}
List Tasks by Context
Get all tasks for a conversation context:
func getTasksInContext(ctx context.Context, client eventbus.AgentHubClient, contextID string) ([]*pb.Task, error) {
    response, err := client.ListTasks(ctx, &eventbus.ListTasksRequest{
        ContextId: contextID,
        States:    []pb.TaskState{}, // All states
        Limit:     100,
    })
    if err != nil {
        return nil, err
    }
    tasks := response.GetTasks()
    log.Printf("Found %d tasks in context %s", len(tasks), contextID)
    // Analyze task distribution
    stateCount := make(map[pb.TaskState]int)
    for _, task := range tasks {
        stateCount[task.GetStatus().GetState()]++
    }
    for state, count := range stateCount {
        log.Printf("  %s: %d tasks", state.String(), count)
    }
    return tasks, nil
}
Workflow Coordination
Sequential Task Workflow
Create dependent tasks that execute in sequence:
type TaskWorkflow struct {
    ContextID string
    Tasks     []*pb.Task
    Current   int
}
func (tw *TaskWorkflow) ExecuteNext(ctx context.Context, client eventbus.AgentHubClient) error {
    if tw.Current >= len(tw.Tasks) {
        return fmt.Errorf("workflow completed")
    }
    currentTask := tw.Tasks[tw.Current]
    // Add dependency metadata if not first task
    if tw.Current > 0 {
        previousTask := tw.Tasks[tw.Current-1]
        dependencyMetadata := map[string]interface{}{
            "depends_on":     previousTask.GetId(),
            "workflow_step":  tw.Current + 1,
            "total_steps":    len(tw.Tasks),
        }
        metadata, _ := structpb.NewStruct(dependencyMetadata)
        currentTask.Metadata = metadata
    }
    // Publish the task
    err := publishTaskUpdate(ctx, client, currentTask, "workflow_coordinator", "")
    if err != nil {
        return err
    }
    tw.Current++
    return nil
}
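`ExecuteNext` signals the end of the workflow with an error, so callers need a way to tell "finished" apart from a real failure. One option, sketched here under assumptions, is a sentinel error checked with `errors.Is`. `ErrWorkflowCompleted`, `Workflow`, and `RunAll` are hypothetical names, and steps are plain strings so the example runs without the AgentHub packages.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrWorkflowCompleted is a hypothetical sentinel that callers can test with errors.Is.
var ErrWorkflowCompleted = errors.New("workflow completed")

// Workflow is a simplified stand-in for TaskWorkflow.
type Workflow struct {
	Steps   []string
	Current int
}

// ExecuteNext publishes the next step and advances the cursor.
func (w *Workflow) ExecuteNext(publish func(step string) error) error {
	if w.Current >= len(w.Steps) {
		return ErrWorkflowCompleted
	}
	if err := publish(w.Steps[w.Current]); err != nil {
		return fmt.Errorf("step %d (%s): %w", w.Current+1, w.Steps[w.Current], err)
	}
	w.Current++
	return nil
}

// RunAll drives the workflow until it completes or a step fails, and returns
// the number of steps published.
func RunAll(w *Workflow, publish func(step string) error) (int, error) {
	published := 0
	for {
		err := w.ExecuteNext(publish)
		if errors.Is(err, ErrWorkflowCompleted) {
			return published, nil
		}
		if err != nil {
			return published, err
		}
		published++
	}
}

func main() {
	w := &Workflow{Steps: []string{"extract", "transform", "load"}}
	n, err := RunAll(w, func(step string) error {
		fmt.Println("publishing", step)
		return nil
	})
	fmt.Println(n, err) // 3 <nil>
}
```

This sketch publishes steps back to back. A real coordinator would wait for the previous task to reach COMPLETED before calling `ExecuteNext` again.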
Parallel Task Execution
Execute multiple tasks concurrently:
func executeParallelTasks(ctx context.Context, client eventbus.AgentHubClient, tasks []*pb.Task) error {
    var wg sync.WaitGroup
    // errCh avoids shadowing the standard errors package
    errCh := make(chan error, len(tasks))
    // One batch ID shared by every task in this batch
    batchID := uuid.New().String()
    for _, task := range tasks {
        wg.Add(1)
        go func(t *pb.Task) {
            defer wg.Done()
            // Add parallel execution metadata
            t.Metadata = &structpb.Struct{
                Fields: map[string]*structpb.Value{
                    "execution_mode": structpb.NewStringValue("parallel"),
                    "batch_id":       structpb.NewStringValue(batchID),
                    "batch_size":     structpb.NewNumberValue(float64(len(tasks))),
                },
            }
            err := publishTaskUpdate(ctx, client, t, "parallel_coordinator", "")
            if err != nil {
                errCh <- err
            }
        }(task)
    }
    wg.Wait()
    close(errCh)
    // Return the first error, if any
    for err := range errCh {
        if err != nil {
            return err
        }
    }
    return nil
}
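The function above returns only the first error it reads from the channel. When every failure matters, the errors can be collected and combined instead. This is a minimal sketch using only the standard library (`errors.Join` requires Go 1.20 or later). `publishAll` and `errDemo` are hypothetical names, and tasks are represented by string IDs.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDemo is a placeholder failure used by the example.
var errDemo = errors.New("broker unavailable")

// publishAll calls publish for every task ID concurrently and returns all
// failures joined into one error (nil when every call succeeds).
func publishAll(taskIDs []string, publish func(id string) error) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, id := range taskIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			if err := publish(id); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("task %s: %w", id, err))
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return errors.Join(errs...)
}

func main() {
	err := publishAll([]string{"t1", "t2", "t3"}, func(id string) error {
		if id == "t2" {
			return errDemo
		}
		return nil
	})
	fmt.Println(err) // task t2: broker unavailable
}
```

Because each wrapped error keeps its cause, callers can still use `errors.Is(err, errDemo)` on the joined result.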
Best Practices
1. Always Update Task Status
// Update status for every significant state change
task = acceptTask(task, agentID)
if err := publishTaskUpdate(ctx, client, task, agentID, ""); err != nil {
    log.Printf("Failed to publish task update: %v", err)
}
2. Provide Meaningful Progress Updates
// Send regular progress updates during long-running tasks
for progress := 10; progress <= 90; progress += 10 {
    task = sendProgressUpdate(task, progress, currentPhase, details)
    publishTaskUpdate(ctx, client, task, agentID, "")
    time.Sleep(processingInterval)
}
3. Include Detailed Error Information
errorDetails := map[string]interface{}{
    "input_validation_errors": validationErrors,
    "system_resources":        resourceSnapshot,
    "retry_strategy":         "exponential_backoff",
}
task = failTask(task, "Data validation failed", "INVALID_INPUT", errorDetails)
4. Maintain Complete Message History
// Always append to history, never replace
task.History = append(task.History, statusMessage)
5. Include Context in Metadata
// Include context for debugging and monitoring
metadata := map[string]interface{}{
    "processing_node":  hostname,
    "resource_usage":   resourceMetrics,
    "performance_metrics": performanceData,
}
This guide covered the A2A task lifecycle end to end: creating tasks, moving them through state transitions, publishing updates, querying status, and coordinating workflows. You now have the tools to manage complex task workflows with proper state tracking and observability.
2.4 - Debugging
Troubleshooting and debugging guides for AgentHub
Debugging How-to Guides
Practical troubleshooting guides to help you diagnose and resolve issues in your AgentHub deployments.
2.4.1 - How to Debug Agent Issues
Practical steps for troubleshooting common issues when developing and deploying agents with AgentHub.
How to Debug Agent Issues
This guide provides practical steps for troubleshooting common issues when developing and deploying agents with AgentHub.
Common Connection Issues
Problem: Agent Can’t Connect to Broker
Symptoms:
Failed to connect: connection refused
Solutions:
- Check if broker is running:
# Check if broker process is running
ps aux | grep broker
# Check if port 50051 is listening
netstat -tlnp | grep 50051
# or
lsof -i :50051
 
- Verify broker address and configuration:
// Using unified abstraction - configuration via environment or code
config := agenthub.NewGRPCConfig("subscriber")
config.BrokerAddr = "localhost"  // Default
config.BrokerPort = "50051"      // Default
// Or set via environment variables:
// export AGENTHUB_BROKER_ADDR="localhost"
// export AGENTHUB_BROKER_PORT="50051"
 
- Check firewall settings:
# On Linux, check if port is blocked
sudo ufw status
# Allow port if needed
sudo ufw allow 50051
 
Problem: TLS/SSL Errors
Symptoms:
transport: authentication handshake failed
Solution:
The unified abstraction handles TLS configuration automatically:
// TLS and connection management handled automatically
config := agenthub.NewGRPCConfig("subscriber")
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
    panic(err)
}
Task Processing Issues
Problem: Agent Not Receiving Tasks
Debug Steps:
- Check subscription logs:
log.Printf("Agent %s subscribing to tasks...", agentID)
// Should see: "Successfully subscribed to tasks for agent {agentID}"
 
- Verify agent ID matching:
// In publisher
ResponderAgentId: "my_processing_agent"
// In subscriber (must match exactly)
const agentID = "my_processing_agent"
 
- Check task type filtering:
req := &pb.SubscribeToTasksRequest{
    AgentId: agentID,
    TaskTypes: []string{"math_calculation"}, // Remove to receive all types
}
 
- Monitor broker logs:
# Broker should show:
Received task request: task_xyz (type: math) from agent: publisher_agent
# And either:
No subscribers for task from agent 'publisher_agent'  # Bad - no matching agents
# Or task routing to subscribers  # Good - task delivered
 
Problem: Tasks Timing Out
Debug Steps:
- Check task processing time:
func processTask(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
    start := time.Now()
    defer func() {
        log.Printf("Task %s took %v to process", task.GetTaskId(), time.Since(start))
    }()
    // Your processing logic
}
 
- Add timeout handling:
func processTaskWithTimeout(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
    // Create timeout context
    taskCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()
    // Run the work in a goroutine so the deadline can be observed while it runs
    done := make(chan struct{})
    go func() {
        defer close(done)
        processTask(taskCtx, task, client) // processing should stop when taskCtx is cancelled
    }()
    select {
    case <-taskCtx.Done():
        if taskCtx.Err() == context.DeadlineExceeded {
            sendResult(ctx, task, nil, pb.TaskStatus_TASK_STATUS_FAILED, "Task timeout", client)
        }
    case <-done:
        // Finished before the deadline
    }
}
 
- Monitor progress updates:
// Send progress every few seconds until processing finishes
ticker := time.NewTicker(5 * time.Second)
done := make(chan struct{})
defer func() {
    ticker.Stop()
    close(done) // lets the reporting goroutine exit
}()
go func() {
    progress := 0
    for {
        select {
        case <-done:
            return
        case <-ticker.C:
            progress += 10
            if progress > 100 {
                return
            }
            sendProgress(ctx, task, int32(progress), "Still processing...", client)
        }
    }
}()
 
Message Serialization Issues
Problem: Parameter Marshaling Errors
Symptoms:
Error creating parameters struct: proto: invalid value type
Solution:
Ensure all parameter values are compatible with structpb:
// Bad - channels, functions, and typed slices or maps are not supported
badParams := map[string]interface{}{
    "callback": func() {},          // Not supported
    "channel":  make(chan int),     // Not supported
    "tags":     []string{"a", "b"}, // Not supported: use []interface{} instead
}
// Good - basic types and generic containers only
goodParams := map[string]interface{}{
    "name": "value",           // string
    "count": 42,               // number
    "enabled": true,           // boolean
    "items": []interface{}{"a", "b"}, // array
    "config": map[string]interface{}{ // nested object
        "timeout": 30,
    },
}
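`structpb.NewStruct` accepts scalars, `[]byte`, `[]interface{}`, and `map[string]interface{}`, but rejects typed containers such as `[]string` or `map[string]string`. When parameters come from typed Go values, they can be converted to the generic forms first. The helper below is a hypothetical sketch using only the standard library; `normalize` is not part of AgentHub or the protobuf module.

```go
package main

import (
	"fmt"
	"reflect"
)

// normalize converts typed slices and string-keyed maps into []interface{}
// and map[string]interface{}, and rejects kinds that have no JSON-like
// representation (channels, functions, pointers, structs).
func normalize(v interface{}) (interface{}, error) {
	if v == nil {
		return nil, nil
	}
	if b, ok := v.([]byte); ok {
		return b, nil // []byte is one of the types structpb.NewValue accepts as-is
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Bool:
		return rv.Bool(), nil
	case reflect.String:
		return rv.String(), nil
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return rv.Int(), nil
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return rv.Uint(), nil
	case reflect.Float32, reflect.Float64:
		return rv.Float(), nil
	case reflect.Slice, reflect.Array:
		out := make([]interface{}, rv.Len())
		for i := range out {
			item, err := normalize(rv.Index(i).Interface())
			if err != nil {
				return nil, fmt.Errorf("index %d: %w", i, err)
			}
			out[i] = item
		}
		return out, nil
	case reflect.Map:
		if rv.Type().Key().Kind() != reflect.String {
			return nil, fmt.Errorf("unsupported map key type %s", rv.Type().Key())
		}
		out := make(map[string]interface{}, rv.Len())
		for _, key := range rv.MapKeys() {
			item, err := normalize(rv.MapIndex(key).Interface())
			if err != nil {
				return nil, fmt.Errorf("key %q: %w", key.String(), err)
			}
			out[key.String()] = item
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported type %T", v)
	}
}

func main() {
	params, err := normalize(map[string]interface{}{
		"items": []string{"a", "b"},
		"range": map[string]string{"start": "2024-01-01"},
	})
	fmt.Println(params, err)

	_, err = normalize(map[string]interface{}{"channel": make(chan int)})
	fmt.Println(err)
}
```

The normalized map can then be passed to `structpb.NewStruct`, and the error from `normalize` names the offending key instead of failing deep inside the protobuf library.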
Problem: Result Unmarshaling Issues
Debug Steps:
- Check result structure:
func handleTaskResult(result *pb.TaskResult) {
    log.Printf("Raw result: %+v", result.GetResult())
    resultMap := result.GetResult().AsMap()
    log.Printf("Result as map: %+v", resultMap)
    // Type assert carefully
    if value, ok := resultMap["count"].(float64); ok {
        log.Printf("Count: %f", value)
    } else {
        log.Printf("Count field missing or wrong type: %T", resultMap["count"])
    }
}
 
- Handle type conversion safely:
func getStringField(m map[string]interface{}, key string) (string, error) {
    if val, ok := m[key]; ok {
        if str, ok := val.(string); ok {
            return str, nil
        }
        return "", fmt.Errorf("field %s is not a string: %T", key, val)
    }
    return "", fmt.Errorf("field %s not found", key)
}
func getNumberField(m map[string]interface{}, key string) (float64, error) {
    if val, ok := m[key]; ok {
        if num, ok := val.(float64); ok {
            return num, nil
        }
        return 0, fmt.Errorf("field %s is not a number: %T", key, val)
    }
    return 0, fmt.Errorf("field %s not found", key)
}
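Because `AsMap()` returns every number as `float64`, integer fields need one more check before conversion. The helper below is a small sketch that follows the pattern of `getNumberField` above; `getIntField` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// getIntField reads a whole-number field from a map produced by AsMap().
func getIntField(m map[string]interface{}, key string) (int, error) {
	val, ok := m[key]
	if !ok {
		return 0, fmt.Errorf("field %s not found", key)
	}
	num, ok := val.(float64)
	if !ok {
		return 0, fmt.Errorf("field %s is not a number: %T", key, val)
	}
	// Reject NaN, infinities, and fractional values instead of silently truncating
	if math.IsNaN(num) || math.IsInf(num, 0) || num != math.Trunc(num) {
		return 0, fmt.Errorf("field %s is not an integer: %v", key, num)
	}
	return int(num), nil
}

func main() {
	result := map[string]interface{}{"count": float64(42), "ratio": 0.5}
	fmt.Println(getIntField(result, "count")) // 42 <nil>
	fmt.Println(getIntField(result, "ratio"))
}
```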
 
Stream and Connection Issues
Problem: Stream Disconnections
Symptoms:
Error receiving task: rpc error: code = Unavailable desc = connection error
Solutions:
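- Implement retry logic:
func subscribeToTasksWithRetry(ctx context.Context, client pb.EventBusClient) {
    for {
        err := subscribeToTasks(ctx, client)
        if err == nil {
            return
        }
        log.Printf("Subscription error: %v, retrying in 5 seconds...", err)
        // Wait before retrying, but stop immediately if the context is cancelled
        select {
        case <-ctx.Done():
            log.Printf("Stopping retries: %v", ctx.Err())
            return
        case <-time.After(5 * time.Second):
        }
    }
}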
 
- Handle context cancellation:
for {
    task, err := stream.Recv()
    if err == io.EOF {
        log.Printf("Stream closed by server")
        return
    }
    if err != nil {
        if ctx.Err() != nil {
            log.Printf("Context cancelled: %v", ctx.Err())
            return
        }
        log.Printf("Stream error: %v", err)
        return
    }
    // Process task
}
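A fixed 5-second delay can overload a broker that is restarting when many agents reconnect at once. A common alternative is exponential backoff with an upper bound. The sketch below uses only the standard library; `backoffDelay` and `retryWithBackoff` are hypothetical helpers, and the `subscribe` callback stands in for a function like `subscribeToTasks`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// backoffDelay returns base doubled once per attempt (attempt starts at 0),
// capped at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	delay := base
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= max {
			return max
		}
	}
	if delay > max {
		return max
	}
	return delay
}

// retryWithBackoff calls subscribe until it succeeds, the context ends, or
// maxAttempts calls have failed.
func retryWithBackoff(ctx context.Context, maxAttempts int, base, max time.Duration, subscribe func(context.Context) error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if lastErr = subscribe(ctx); lastErr == nil {
			return nil
		}
		if attempt == maxAttempts-1 {
			break // no point in waiting after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoffDelay(attempt, base, max)):
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	err := retryWithBackoff(context.Background(), 5, 10*time.Millisecond, 80*time.Millisecond,
		func(context.Context) error {
			calls++
			if calls < 3 {
				return errors.New("connection error")
			}
			return nil
		})
	fmt.Println(calls, err) // 3 <nil>
}
```

Adding random jitter to each delay would spread reconnects further; it is left out here to keep the delays predictable.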
 
Problem: Memory Leaks in Long-Running Agents
Debug Steps:
- Monitor memory usage:
# Check memory usage
ps -o pid,ppid,cmd,%mem,%cpu -p $(pgrep -f "your-agent")
# Continuous monitoring
watch -n 5 'ps -o pid,ppid,cmd,%mem,%cpu -p $(pgrep -f "your-agent")'
 
- Profile memory usage:
import _ "net/http/pprof"
import "net/http"
func main() {
    // Start pprof server
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // Your agent code
}
 
- Access profiles at http://localhost:6060/debug/pprof/
 
- Check for goroutine leaks:
import "runtime"
func logGoroutines() {
    ticker := time.NewTicker(30 * time.Second)
    go func() {
        for range ticker.C {
            log.Printf("Goroutines: %d", runtime.NumGoroutine())
        }
    }()
}
 
Problem: Slow Task Processing
Debug Steps:
- Add timing measurements:
func processTask(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
    timings := make(map[string]time.Duration)
    start := time.Now()
    // Phase 1: Parameter validation
    timings["validation"] = time.Since(start)
    last := time.Now()
    // Phase 2: Business logic
    // ... your logic here ...
    timings["processing"] = time.Since(last)
    last = time.Now()
    // Phase 3: Result formatting
    // ... result creation ...
    timings["formatting"] = time.Since(last)
    log.Printf("Task %s timings: %+v", task.GetTaskId(), timings)
}
 
- Profile CPU usage:
import "runtime/pprof"
import "os"
func startCPUProfile() func() {
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    // StartCPUProfile fails if profiling is already enabled
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    return func() {
        pprof.StopCPUProfile()
        f.Close()
    }
}
func main() {
    stop := startCPUProfile()
    defer stop()
    // Your agent code
}
 
- Monitor queue sizes:
type Agent struct {
    taskQueue chan *pb.TaskMessage
}
func (a *Agent) logQueueSize() {
    ticker := time.NewTicker(10 * time.Second)
    go func() {
        for range ticker.C {
            log.Printf("Task queue size: %d/%d", len(a.taskQueue), cap(a.taskQueue))
        }
    }()
}
 
1. Enable Verbose Logging
import "log"
import "os"
func init() {
    // Enable verbose logging
    log.SetFlags(log.LstdFlags | log.Lshortfile)
    // Send logs to stdout when DEBUG=true (the standard logger writes to stderr by default)
    if os.Getenv("DEBUG") == "true" {
        log.SetOutput(os.Stdout)
    }
}
2. Add Structured Logging
import "encoding/json"
import "time"
type LogEntry struct {
    Timestamp string                 `json:"timestamp"`
    Level     string                 `json:"level"`
    AgentID   string                 `json:"agent_id"`
    TaskID    string                 `json:"task_id,omitempty"`
    Message   string                 `json:"message"`
    Data      map[string]interface{} `json:"data,omitempty"`
}
func logInfo(agentID, taskID, message string, data map[string]interface{}) {
    entry := LogEntry{
        Timestamp: time.Now().Format(time.RFC3339),
        Level:     "INFO",
        AgentID:   agentID,
        TaskID:    taskID,
        Message:   message,
        Data:      data,
    }
    if jsonData, err := json.Marshal(entry); err == nil {
        log.Println(string(jsonData))
    }
}
3. Health Check Endpoint
import "net/http"
import "encoding/json"
type HealthStatus struct {
    Status       string    `json:"status"`
    AgentID      string    `json:"agent_id"`
    Uptime       string    `json:"uptime"`
    TasksProcessed int64   `json:"tasks_processed"`
    LastTaskTime  time.Time `json:"last_task_time"`
}
func startHealthServer(agent *Agent) {
    http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        status := HealthStatus{
            Status:         "healthy",
            AgentID:        agent.ID,
            Uptime:         time.Since(agent.StartTime).String(),
            TasksProcessed: agent.TasksProcessed,
            LastTaskTime:   agent.LastTaskTime,
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(status)
    })
    log.Printf("Health server starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", nil))
}
4. Task Tracing
import "context"
type TraceID string
// traceIDKey is an unexported key type, which avoids collisions with
// context values set by other packages
type traceIDKey struct{}
func withTraceID(ctx context.Context) context.Context {
    traceID := TraceID(fmt.Sprintf("trace-%d", time.Now().UnixNano()))
    return context.WithValue(ctx, traceIDKey{}, traceID)
}
func getTraceID(ctx context.Context) TraceID {
    if traceID, ok := ctx.Value(traceIDKey{}).(TraceID); ok {
        return traceID
    }
    return ""
}
func processTaskWithTracing(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
    ctx = withTraceID(ctx)
    traceID := getTraceID(ctx)
    log.Printf("[%s] Starting task %s", traceID, task.GetTaskId())
    defer log.Printf("[%s] Finished task %s", traceID, task.GetTaskId())
    // Your processing logic with trace ID logging
}
Common Error Patterns
1. Resource Exhaustion
Signs:
- Tasks start failing after running for a while
- Memory usage continuously increases
- File descriptor limits reached
Solutions:
- Implement proper resource cleanup
- Add connection pooling
- Set task processing limits
2. Deadlocks
Signs:
- Agent stops processing tasks
- Health checks show agent as “stuck”
Solutions:
- Avoid blocking operations in main goroutines
- Use timeouts for all operations
- Implement deadlock detection
3. Race Conditions
Signs:
- Intermittent task failures
- Inconsistent behavior
- Data corruption
Solutions:
- Use proper synchronization primitives
- Run race detector: go run -race your-agent.go
- Add mutex protection for shared state
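As an illustration of mutex protection for shared state, the sketch below guards a task counter like the `TasksProcessed` value reported by the health endpoint. `Stats` and `processConcurrently` are hypothetical names and the example uses only the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// Stats holds counters shared between task-processing goroutines.
type Stats struct {
	mu             sync.Mutex
	tasksProcessed int64
}

// RecordTask increments the counter under the lock.
func (s *Stats) RecordTask() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tasksProcessed++
}

// TasksProcessed returns a consistent snapshot of the counter.
func (s *Stats) TasksProcessed() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.tasksProcessed
}

// processConcurrently simulates n workers that each record one task.
func processConcurrently(n int) int64 {
	var stats Stats
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			stats.RecordTask()
		}()
	}
	wg.Wait()
	return stats.TasksProcessed()
}

func main() {
	fmt.Println(processConcurrently(1000)) // 1000
}
```

Without the mutex, the unsynchronized increments would be a data race that `go run -race` reports, and the final count could come out below 1000.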
With these debugging techniques, you should be able to identify and resolve most agent-related issues efficiently.
3 - Reference
Comprehensive technical documentation and API specifications
Reference Documentation
This section contains comprehensive technical documentation for all AgentHub components, APIs, and configuration options. Use this as your authoritative source for implementation details.
Documentation Sections
- Configuration - Complete configuration options and settings
- API Reference - gRPC APIs, unified abstractions, and tracing interfaces
- Observability - Metrics, health endpoints, and monitoring
- Tasks - Task message specifications and data structures
How to Use This Reference
- Accuracy: All information is kept up-to-date with the latest version
- Completeness: Every public API and configuration option is documented
- Examples: Code examples illustrate usage where helpful
- Structure: Information is organized by component and function
Note
This reference documentation describes the current stable version. For experimental features, check the explanation section or source code.
3.1 - Configuration
Configuration reference and settings documentation
Configuration Reference
This section provides comprehensive documentation for all AgentHub configuration options, environment variables, and settings.
3.1.1 - Environment Variables Reference
Complete reference for all environment variables used by AgentHub’s unified abstractions for configuration and observability.
Environment Variables Reference
This reference documents all environment variables used by AgentHub’s unified abstraction system. All components automatically load these variables for configuration.
Core Configuration
Broker Connection
| Variable | Default | Description | Used By | 
|---|---|---|---|
| AGENTHUB_BROKER_ADDR | localhost | Broker server hostname or IP address | Agents | 
| AGENTHUB_BROKER_PORT | 50051 | Broker gRPC port number | Agents | 
| AGENTHUB_GRPC_PORT | :50051 | Server listen address (for broker) | Broker | 
Example:
export AGENTHUB_BROKER_ADDR="production-broker.example.com"
export AGENTHUB_BROKER_PORT="50051"
export AGENTHUB_GRPC_PORT=":50051"
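To illustrate how an agent might resolve these variables, the sketch below reads them with the documented defaults and builds a dial target. It is not AgentHub's loader: `envOr` and `brokerTarget` are hypothetical helpers, and treating an empty value like an unset one is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// lookupFunc matches the signature of os.LookupEnv so a fake environment can
// be supplied in tests.
type lookupFunc func(key string) (string, bool)

// envOr returns the variable's value, or def when it is unset or empty.
func envOr(lookup lookupFunc, key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// brokerTarget builds the host:port dial target from the broker connection
// variables, falling back to the defaults listed in the table.
func brokerTarget(lookup lookupFunc) string {
	host := envOr(lookup, "AGENTHUB_BROKER_ADDR", "localhost")
	port := envOr(lookup, "AGENTHUB_BROKER_PORT", "50051")
	return net.JoinHostPort(host, port)
}

func main() {
	fmt.Println(brokerTarget(os.LookupEnv))
}
```

With neither variable set, the program prints `localhost:50051`. `net.JoinHostPort` also brackets IPv6 literals correctly, which plain string concatenation does not.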
Health Monitoring
| Variable | Default | Description | Used By | 
|---|---|---|---|
| BROKER_HEALTH_PORT | 8080 | Broker health check endpoint port | Broker | 
| PUBLISHER_HEALTH_PORT | 8081 | Publisher health check endpoint port | Publishers | 
| SUBSCRIBER_HEALTH_PORT | 8082 | Subscriber health check endpoint port | Subscribers | 
Health Endpoints Available:
- http://localhost:8080/health - Health check
- http://localhost:8080/metrics - Prometheus metrics
- http://localhost:8080/ready - Readiness check
Example:
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
Observability Configuration
Distributed Tracing
| Variable | Default | Description | Used By | 
|---|---|---|---|
| JAEGER_ENDPOINT | 127.0.0.1:4317 | Jaeger OTLP endpoint for traces | All components | 
| SERVICE_NAME | agenthub-service | Service name for tracing | All components | 
| SERVICE_VERSION | 1.0.0 | Service version for telemetry | All components | 
Example:
export JAEGER_ENDPOINT="http://jaeger.example.com:14268/api/traces"
export SERVICE_NAME="my-agenthub-app"
export SERVICE_VERSION="2.1.0"
Jaeger Integration:
- When JAEGER_ENDPOINT is set: Automatic tracing enabled
- When empty or unset: Tracing disabled (minimal overhead)
- Supports both gRPC (4317) and HTTP (14268) endpoints
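The rules in this list can be sketched as a small classifier. This is an illustration only: `tracingMode` is a hypothetical helper, and the rule that an `http://` or `https://` prefix selects the HTTP endpoint while a bare `host:port` selects gRPC is an assumption, not documented AgentHub behavior.

```go
package main

import (
    "fmt"
    "os"
    "strings"
)

// tracingMode classifies a JAEGER_ENDPOINT value.
// Assumption: a URL with an http(s) scheme means the HTTP endpoint and
// a bare host:port means OTLP over gRPC; AgentHub may decide differently.
func tracingMode(endpoint string) string {
    endpoint = strings.TrimSpace(endpoint)
    switch {
    case endpoint == "":
        return "disabled"
    case strings.HasPrefix(endpoint, "http://"), strings.HasPrefix(endpoint, "https://"):
        return "http"
    default:
        return "grpc"
    }
}

func main() {
    fmt.Println("tracing:", tracingMode(os.Getenv("JAEGER_ENDPOINT")))
}
```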
Metrics Collection
| Variable | Default | Description | Used By | 
|---|---|---|---|
| PROMETHEUS_PORT | 9090 | Prometheus server port | Observability stack | 
| GRAFANA_PORT | 3333 | Grafana dashboard port | Observability stack | 
| ALERTMANAGER_PORT | 9093 | AlertManager port | Observability stack | 
Example:
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
export ALERTMANAGER_PORT="9093"
OpenTelemetry Collector
| Variable | Default | Description | Used By | 
|---|---|---|---|
| OTLP_GRPC_PORT | 4320 | OTLP Collector gRPC port | Observability stack | 
| OTLP_HTTP_PORT | 4321 | OTLP Collector HTTP port | Observability stack | 
Example:
export OTLP_GRPC_PORT="4320"
export OTLP_HTTP_PORT="4321"
Service Configuration
General Settings
| Variable | Default | Description | Used By | 
|---|---|---|---|
| ENVIRONMENT | development | Deployment environment | All components | 
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARN, ERROR) | All components | 
Example:
export ENVIRONMENT="production"
export LOG_LEVEL="WARN"
Environment-Specific Configurations
Development Environment
# .envrc for development
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
export AGENTHUB_GRPC_PORT=":50051"
# Health ports
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
# Observability (local stack)
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
# Service metadata
export SERVICE_NAME="agenthub-dev"
export SERVICE_VERSION="dev"
export ENVIRONMENT="development"
export LOG_LEVEL="DEBUG"
Staging Environment
# .envrc for staging
export AGENTHUB_BROKER_ADDR="staging-broker.example.com"
export AGENTHUB_BROKER_PORT="50051"
# Health ports (non-conflicting)
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
# Observability (staging stack)
export JAEGER_ENDPOINT="http://staging-jaeger.example.com:14268/api/traces"
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
# Service metadata
export SERVICE_NAME="agenthub-staging"
export SERVICE_VERSION="1.2.0-rc1"
export ENVIRONMENT="staging"
export LOG_LEVEL="INFO"
Production Environment
# .envrc for production
export AGENTHUB_BROKER_ADDR="prod-broker.example.com"
export AGENTHUB_BROKER_PORT="50051"
# Health ports
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
# Observability (production stack)
export JAEGER_ENDPOINT="http://jaeger.prod.example.com:14268/api/traces"
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
export ALERTMANAGER_PORT="9093"
# Service metadata
export SERVICE_NAME="agenthub-prod"
export SERVICE_VERSION="1.2.0"
export ENVIRONMENT="production"
export LOG_LEVEL="WARN"
Configuration Loading
Automatic Loading by Unified Abstractions
The unified abstractions automatically load environment variables:
// Automatic configuration loading
config := agenthub.NewGRPCConfig("my-component")
// Results in:
// config.BrokerAddr = "localhost:50051" (AGENTHUB_BROKER_ADDR + AGENTHUB_BROKER_PORT)
// config.ServerAddr = ":50051" (AGENTHUB_GRPC_PORT)
// config.HealthPort = "8080" (BROKER_HEALTH_PORT)
// config.ComponentName = "my-component" (from parameter)
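How the two broker variables might be combined into a single address can be sketched as follows. This is a minimal sketch, not AgentHub's actual implementation: `brokerAddrFrom` and its `lookup` parameter are hypothetical names, and only the variable names and defaults are taken from the tables above.

```go
package main

import (
    "fmt"
    "net"
    "os"
)

// brokerAddrFrom builds a "host:port" broker address from the two
// connection variables, falling back to the documented defaults.
// lookup is injected so the logic can be exercised without touching
// the real process environment (hypothetical helper, not AgentHub API).
func brokerAddrFrom(lookup func(string) string) string {
    host := lookup("AGENTHUB_BROKER_ADDR")
    if host == "" {
        host = "localhost"
    }
    port := lookup("AGENTHUB_BROKER_PORT")
    if port == "" {
        port = "50051"
    }
    // JoinHostPort also brackets IPv6 literals, e.g. "[::1]:50051".
    return net.JoinHostPort(host, port)
}

func main() {
    fmt.Println("broker address:", brokerAddrFrom(os.Getenv))
}
```

Passing `os.Getenv` gives the environment-driven behavior; passing a map-backed function makes the defaults easy to check in isolation.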
Using direnv (Recommended)
- Install direnv: https://direnv.net/docs/installation.html 
- Create .envrc file:
# Create .envrc in project root
cat > .envrc << 'EOF'
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export SERVICE_NAME="my-agenthub-app"
EOF
 
- Allow direnv:
direnv allow
- Automatic loading: Variables load automatically when you enter the directory
Manual Loading
# Source variables manually
source .envrc
# Or set individually
export AGENTHUB_BROKER_ADDR="localhost"
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
Configuration Validation
Required Variables
Minimal configuration (all have defaults):
- No variables are strictly required
- Defaults work for local development
Production recommendations:
- Set JAEGER_ENDPOINT for tracing
- Set SERVICE_NAME for identification
- Set ENVIRONMENT to “production”
- Configure unique health ports if running multiple services
Configuration Verification
Check loaded configuration:
config := agenthub.NewGRPCConfig("test")
fmt.Printf("Broker: %s\n", config.BrokerAddr)
fmt.Printf("Health: %s\n", config.HealthPort)
fmt.Printf("Component: %s\n", config.ComponentName)
Verify health endpoints:
# Check if configuration is working
curl http://localhost:8080/health
curl http://localhost:8081/health  # Publisher
curl http://localhost:8082/health  # Subscriber
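The same checks can be run from Go, for example in a smoke test. The sketch below assumes the default health ports from the Health Monitoring table and that all components run on localhost; `healthURL` and `probe` are illustrative helper names, not part of AgentHub.

```go
package main

import (
    "fmt"
    "net/http"
    "time"
)

// healthURL builds the health-check URL for a component's health port.
func healthURL(host, port string) string {
    return fmt.Sprintf("http://%s:%s/health", host, port)
}

// probe reports whether the URL answers with a 2xx status.
func probe(client *http.Client, url string) bool {
    resp, err := client.Get(url)
    if err != nil {
        return false
    }
    defer resp.Body.Close()
    return resp.StatusCode >= 200 && resp.StatusCode < 300
}

func main() {
    client := &http.Client{Timeout: 2 * time.Second}
    // Default health ports from the Health Monitoring table.
    components := []struct{ name, port string }{
        {"broker", "8080"}, {"publisher", "8081"}, {"subscriber", "8082"},
    }
    for _, c := range components {
        url := healthURL("localhost", c.port)
        fmt.Printf("%-10s %s healthy=%v\n", c.name, url, probe(client, url))
    }
}
```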
Verify tracing:
- Open Jaeger UI: http://localhost:16686
- Look for traces from your service name
- Check spans are being created
Common Patterns
Docker Compose
# docker-compose.yml
version: '3.8'
services:
  broker:
    build: .
    command: go run broker/main.go
    environment:
      - AGENTHUB_GRPC_PORT=:50051
      - BROKER_HEALTH_PORT=8080
      - JAEGER_ENDPOINT=http://jaeger:14268/api/traces
      - SERVICE_NAME=agenthub-broker
    ports:
      - "50051:50051"
      - "8080:8080"
  publisher:
    build: .
    command: go run agents/publisher/main.go
    environment:
      - AGENTHUB_BROKER_ADDR=broker
      - AGENTHUB_BROKER_PORT=50051
      - PUBLISHER_HEALTH_PORT=8081
      - JAEGER_ENDPOINT=http://jaeger:14268/api/traces
      - SERVICE_NAME=agenthub-publisher
    ports:
      - "8081:8081"
Kubernetes ConfigMap
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: agenthub-config
data:
  AGENTHUB_BROKER_ADDR: "agenthub-broker.default.svc.cluster.local"
  AGENTHUB_BROKER_PORT: "50051"
  JAEGER_ENDPOINT: "http://jaeger.observability.svc.cluster.local:14268/api/traces"
  SERVICE_NAME: "agenthub-k8s"
  SERVICE_VERSION: "1.0.0"
  ENVIRONMENT: "production"
  LOG_LEVEL: "INFO"
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agenthub-publisher
spec:
  template:
    spec:
      containers:
      - name: publisher
        image: agenthub:latest
        envFrom:
        - configMapRef:
            name: agenthub-config
        env:
        - name: PUBLISHER_HEALTH_PORT
          value: "8080"
Troubleshooting
Common Issues
| Problem | Solution | 
|---|---|
| Agent can’t connect to broker | Check AGENTHUB_BROKER_ADDR and AGENTHUB_BROKER_PORT | 
| Health endpoint not accessible | Verify *_HEALTH_PORT variables and port availability | 
| No traces in Jaeger | Set JAEGER_ENDPOINT and ensure Jaeger is running | 
| Port conflicts | Use different ports for each component’s health endpoints | 
| Configuration not loading | Ensure variables are exported, check with printenv | 
Debug Configuration
Check environment variables:
# List all AgentHub variables
printenv | grep AGENTHUB
# List all observability variables
printenv | grep -E "(JAEGER|SERVICE|PROMETHEUS|GRAFANA)"
# Check specific variable
echo $AGENTHUB_BROKER_ADDR
Test configuration:
# Quick test with temporary override
AGENTHUB_BROKER_ADDR=test-broker go run agents/publisher/main.go
# Verify health endpoint responds
curl -f http://localhost:8080/health || echo "Health check failed"
Configuration Precedence
- Environment variables (highest priority)
- Default values (from code)
Example: If AGENTHUB_BROKER_ADDR is set, it overrides the default “localhost”
This environment variable reference provides comprehensive documentation for configuring AgentHub using the unified abstraction system. For practical usage examples, see the Installation and Setup Tutorial and Configuration Reference.
3.1.2 - Configuration Reference
Comprehensive reference for configuring AgentHub components using the unified abstraction library with environment-based configuration.
Configuration Reference
This document provides a comprehensive reference for configuring AgentHub components using the unified abstraction library with environment-based configuration.
Unified Abstraction Configuration
AgentHub uses environment variables for all configuration with the unified abstraction library providing automatic configuration setup.
Core Environment Variables
gRPC Connection Configuration
| Variable | Default | Description | 
|---|---|---|
| AGENTHUB_BROKER_ADDR | localhost | Broker server hostname or IP address | 
| AGENTHUB_BROKER_PORT | 50051 | Broker gRPC port number | 
| AGENTHUB_GRPC_PORT | :50051 | Server listen address (for broker) | 
Note: The unified abstraction automatically combines AGENTHUB_BROKER_ADDR and AGENTHUB_BROKER_PORT into a complete broker address (e.g., localhost:50051).
Health Monitoring Configuration
| Variable | Default | Description | 
|---|---|---|
| BROKER_HEALTH_PORT | 8080 | Broker health check endpoint port | 
| PUBLISHER_HEALTH_PORT | 8081 | Publisher health check endpoint port | 
| SUBSCRIBER_HEALTH_PORT | 8082 | Subscriber health check endpoint port | 
Observability Configuration
| Variable | Default | Description | 
|---|---|---|
| JAEGER_ENDPOINT | 127.0.0.1:4317 | Jaeger OTLP endpoint for distributed tracing | 
| PROMETHEUS_PORT | 9090 | Prometheus metrics collection port | 
| GRAFANA_PORT | 3333 | Grafana dashboard web interface port | 
| ALERTMANAGER_PORT | 9093 | AlertManager web interface port | 
| OTLP_GRPC_PORT | 4320 | OpenTelemetry Collector gRPC port | 
| OTLP_HTTP_PORT | 4321 | OpenTelemetry Collector HTTP port | 
| Variable | Default | Description | 
|---|---|---|
| SERVICE_VERSION | 1.0.0 | Service version for telemetry and observability | 
| ENVIRONMENT | development | Deployment environment (development, staging, production) | 
A2A Protocol Configuration
| Variable | Default | Description | 
|---|---|---|
| AGENTHUB_MESSAGE_BUFFER_SIZE | 100 | Buffer size for A2A message processing | 
| AGENTHUB_TASK_UPDATE_INTERVAL | 1s | Interval for publishing task status updates | 
| AGENTHUB_ARTIFACT_MAX_SIZE | 10MB | Maximum size for task artifacts | 
| AGENTHUB_CONTEXT_TIMEOUT | 30s | Timeout for A2A message context | 
| AGENTHUB_A2A_PROTOCOL_VERSION | 1.0 | A2A protocol version for compatibility | 
| AGENTHUB_MESSAGE_HISTORY_LIMIT | 50 | Maximum message history per task | 
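Duration values such as `1s` and `30s` can be read with Go's `time.ParseDuration`, but size values such as `10MB` need a small parser. The following is a sketch only: `parseSize` is a hypothetical helper, and the choice of binary multiples (1 MB = 1024 * 1024 bytes) is an assumption, since this reference does not state which convention AgentHub applies.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parseSize converts values such as "10MB" or "512KB" into bytes.
// Assumption: units are binary multiples (1 MB = 1024 * 1024 bytes).
func parseSize(s string) (int64, error) {
    s = strings.ToUpper(strings.TrimSpace(s))
    // Longer suffixes come first because every unit ends in "B".
    units := []struct {
        suffix string
        factor int64
    }{
        {"GB", 1 << 30}, {"MB", 1 << 20}, {"KB", 1 << 10}, {"B", 1},
    }
    for _, u := range units {
        if strings.HasSuffix(s, u.suffix) {
            digits := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
            n, err := strconv.ParseInt(digits, 10, 64)
            if err != nil || n < 0 {
                return 0, fmt.Errorf("invalid size %q", s)
            }
            return n * u.factor, nil
        }
    }
    return 0, fmt.Errorf("size %q has no unit suffix", s)
}

func main() {
    n, err := parseSize("10MB") // default of AGENTHUB_ARTIFACT_MAX_SIZE
    fmt.Println(n, err)
}
```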
Unified Abstraction Usage
Using Configuration with the Unified Abstraction
The unified abstraction library automatically loads configuration from environment variables:
// Create configuration from environment variables
config := agenthub.NewGRPCConfig("my-component")
// Configuration is automatically populated:
// - config.BrokerAddr: "localhost:50051" (combined from AGENTHUB_BROKER_ADDR + AGENTHUB_BROKER_PORT)
// - config.ServerAddr: ":50051" (from AGENTHUB_GRPC_PORT)
// - config.HealthPort: "8080" (from BROKER_HEALTH_PORT)
// - config.ComponentName: "my-component" (from parameter)
Environment Variable Loading
Environment variables can be loaded in any of the following ways:
Option 1: Using direnv (recommended)
# Place variables in .envrc file
direnv allow
Option 2: Source manually
source .envrc
Option 3: Set individual variables
export AGENTHUB_BROKER_ADDR=localhost
export AGENTHUB_BROKER_PORT=50051
export JAEGER_ENDPOINT=127.0.0.1:4317
Configuration Override Examples
You can override defaults by setting environment variables before running:
# Use different broker address (these variables are read by agents)
export AGENTHUB_BROKER_ADDR=remote-broker.example.com
export AGENTHUB_BROKER_PORT=9090
go run agents/publisher/main.go
# Use different health ports to avoid conflicts
export BROKER_HEALTH_PORT=8083
export PUBLISHER_HEALTH_PORT=8084
export SUBSCRIBER_HEALTH_PORT=8085
go run agents/publisher/main.go
# Use custom observability endpoints
export JAEGER_ENDPOINT=jaeger.example.com:4317
export PROMETHEUS_PORT=9091
go run broker/main.go
Configuration Best Practices
- Use .envrc for Development: Keep all environment variables in .envrc for a consistent development experience
- Override Selectively: Only override specific variables when needed, use defaults otherwise
- Environment-Specific Configs: Use different variable values for development, staging, and production
- Health Port Management: Use different health ports for each component to avoid conflicts
- Observability Integration: Always configure observability endpoints for production deployments
Legacy Configuration Migration
If migrating from previous versions of AgentHub:
Old Configuration Pattern:
// Manual server setup (deprecated)
lis, err := net.Listen("tcp", ":50051")
server := grpc.NewServer()
// ... extensive setup code
New Unified Abstraction Pattern:
// Automatic configuration from environment
config := agenthub.NewGRPCConfig("broker")
server, err := agenthub.NewAgentHubServer(config)
if err != nil {
    log.Fatalf("failed to create AgentHub server: %v", err)
}
service := agenthub.NewAgentHubService(server)
pb.RegisterAgentHubServer(server.Server, service)
server.Start(ctx)
Command-Line Usage
Basic Commands
The unified abstraction provides simplified command execution:
agenthub-server [OPTIONS]
Options:
  -port int
        Server port (default 50051)
  -host string
        Server host (default "0.0.0.0")
  -config string
        Configuration file path
  -log-level string
        Log level: debug, info, warn, error (default "info")
  -log-file string
        Log file path (default: stdout)
  -max-connections int
        Maximum concurrent connections (default 1000)
  -channel-buffer-size int
        Channel buffer size (default 10)
  -help
        Show help message
  -version
        Show version information
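For readers wiring similar options into their own tooling, a subset of the flags listed above can be parsed with Go's standard `flag` package. This is a sketch under stated assumptions: `serverOptions` and `parseServerFlags` are hypothetical names, and nothing here is taken from the actual `agenthub-server` source; only the flag names and defaults come from the listing.

```go
package main

import (
    "flag"
    "fmt"
    "os"
)

// serverOptions holds a subset of the documented flags.
type serverOptions struct {
    Port           int
    Host           string
    LogLevel       string
    MaxConnections int
}

// parseServerFlags parses args (without the program name) using the
// documented flag names and defaults.
func parseServerFlags(args []string) (serverOptions, error) {
    var o serverOptions
    fs := flag.NewFlagSet("agenthub-server", flag.ContinueOnError)
    fs.IntVar(&o.Port, "port", 50051, "Server port")
    fs.StringVar(&o.Host, "host", "0.0.0.0", "Server host")
    fs.StringVar(&o.LogLevel, "log-level", "info", "Log level: debug, info, warn, error")
    fs.IntVar(&o.MaxConnections, "max-connections", 1000, "Maximum concurrent connections")
    if err := fs.Parse(args); err != nil {
        return serverOptions{}, err
    }
    return o, nil
}

func main() {
    opts, err := parseServerFlags(os.Args[1:])
    if err != nil {
        os.Exit(2) // the flag package has already printed the problem
    }
    fmt.Printf("listening on %s:%d (log level %s)\n", opts.Host, opts.Port, opts.LogLevel)
}
```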
Configuration File
The broker can also be configured using a YAML configuration file:
# agenthub.yaml
server:
  host: "0.0.0.0"
  port: 50051
  max_connections: 1000
  timeout: "30s"
logging:
  level: "info"
  format: "json"
  file: "/var/log/agenthub/broker.log"
performance:
  channel_buffer_size: 10
  max_message_size: "4MB"
  keepalive_time: "30s"
  keepalive_timeout: "5s"
limits:
  max_agents: 10000
  max_tasks_per_agent: 100
  memory_limit: "1GB"
security:
  tls_enabled: false
  cert_file: ""
  key_file: ""
  ca_file: ""
Loading Configuration:
agenthub-server -config /path/to/agenthub.yaml
Agent Configuration
Environment Variables
Agents can be configured using environment variables:
Connection Configuration
| Variable | Default | Description | 
|---|---|---|
| AGENTHUB_BROKER_ADDRESS | localhost:50051 | Broker server address | 
| AGENTHUB_AGENT_ID | Generated | Unique agent identifier | 
| AGENTHUB_CONNECTION_TIMEOUT | 10s | Connection timeout | 
| AGENTHUB_RETRY_ATTEMPTS | 3 | Connection retry attempts | 
| AGENTHUB_RETRY_DELAY | 1s | Delay between retries | 
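The retry settings in this table could drive a loop like the one below. It is a minimal sketch: `retry` and `errUnavailable` are illustrative names, and it assumes AGENTHUB_RETRY_ATTEMPTS counts total attempts (including the first), which this reference does not specify.

```go
package main

import (
    "errors"
    "fmt"
    "time"
)

// errUnavailable stands in for a real connection error.
var errUnavailable = errors.New("broker unavailable")

// retry calls fn up to attempts times, sleeping delay between failures.
// Assumption: attempts is the total number of tries, not extra retries.
func retry(attempts int, delay time.Duration, fn func() error) error {
    if attempts < 1 {
        attempts = 1
    }
    var err error
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil {
            return nil
        }
        if i < attempts-1 {
            time.Sleep(delay)
        }
    }
    return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
    calls := 0
    // Simulated connection that succeeds on the third try.
    err := retry(3, 10*time.Millisecond, func() error {
        calls++
        if calls < 3 {
            return errUnavailable
        }
        return nil
    })
    fmt.Println("calls:", calls, "err:", err)
}
```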
Task Processing Configuration
| Variable | Default | Description | 
|---|---|---|
| AGENTHUB_MAX_CONCURRENT_TASKS | 5 | Maximum concurrent task processing | 
| AGENTHUB_TASK_TIMEOUT | 300s | Default task timeout | 
| AGENTHUB_PROGRESS_INTERVAL | 5s | Progress reporting interval | 
| AGENTHUB_TASK_TYPES | "" | Comma-separated list of supported task types | 
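One common way to honor a limit like AGENTHUB_MAX_CONCURRENT_TASKS is a buffered channel used as a counting semaphore. The sketch below illustrates that technique only; `runBounded` is a hypothetical helper and does not describe how AgentHub subscribers actually schedule tasks.

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

// runBounded executes every task but never runs more than limit at once.
// It returns the highest concurrency observed, which makes the bound checkable.
func runBounded(limit int, tasks []func()) int32 {
    if limit < 1 {
        limit = 1
    }
    sem := make(chan struct{}, limit) // counting semaphore
    var wg sync.WaitGroup
    var running, peak int32
    for _, task := range tasks {
        sem <- struct{}{} // blocks while limit tasks are in flight
        wg.Add(1)
        go func(task func()) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot after the task finishes
            n := atomic.AddInt32(&running, 1)
            for { // record a new peak if this is the highest count so far
                p := atomic.LoadInt32(&peak)
                if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
                    break
                }
            }
            task()
            atomic.AddInt32(&running, -1)
        }(task)
    }
    wg.Wait()
    return atomic.LoadInt32(&peak)
}

func main() {
    tasks := make([]func(), 8)
    for i := range tasks {
        tasks[i] = func() { time.Sleep(10 * time.Millisecond) }
    }
    // 5 is the documented default for AGENTHUB_MAX_CONCURRENT_TASKS.
    fmt.Println("peak concurrency:", runBounded(5, tasks))
}
```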
Logging Configuration
| Variable | Default | Description | 
|---|---|---|
| AGENTHUB_AGENT_LOG_LEVEL | info | Agent logging level | 
| AGENTHUB_AGENT_LOG_FORMAT | text | Agent log format | 
| AGENTHUB_AGENT_LOG_FILE | "" | Agent log file path | 
Agent Configuration Examples
Publisher Configuration
package main
import (
    "os"
    "strconv"
    "time"
)
type PublisherConfig struct {
    BrokerAddress     string
    AgentID           string
    ConnectionTimeout time.Duration
    RetryAttempts     int
    RetryDelay        time.Duration
    LogLevel          string
}
// LoadPublisherConfig reads the publisher settings from the environment.
// generateAgentID is assumed to be defined elsewhere in the agent.
func LoadPublisherConfig() *PublisherConfig {
    config := &PublisherConfig{
        BrokerAddress:     getEnv("AGENTHUB_BROKER_ADDRESS", "localhost:50051"),
        AgentID:           getEnv("AGENTHUB_AGENT_ID", generateAgentID()),
        ConnectionTimeout: getDuration("AGENTHUB_CONNECTION_TIMEOUT", "10s"),
        RetryAttempts:     getInt("AGENTHUB_RETRY_ATTEMPTS", 3),
        RetryDelay:        getDuration("AGENTHUB_RETRY_DELAY", "1s"),
        LogLevel:          getEnv("AGENTHUB_AGENT_LOG_LEVEL", "info"),
    }
    return config
}
func getEnv(key, defaultValue string) string {
    if value := os.Getenv(key); value != "" {
        return value
    }
    return defaultValue
}
func getInt(key string, defaultValue int) int {
    if value := os.Getenv(key); value != "" {
        if i, err := strconv.Atoi(value); err == nil {
            return i
        }
    }
    return defaultValue
}
func getDuration(key string, defaultValue string) time.Duration {
    if value := os.Getenv(key); value != "" {
        if d, err := time.ParseDuration(value); err == nil {
            return d
        }
    }
    d, _ := time.ParseDuration(defaultValue)
    return d
}
Subscriber Configuration
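// Uses the strings and time packages plus the getEnv, getInt, and getDuration helpers from the publisher example.
type SubscriberConfig struct {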
    BrokerAddress      string
    AgentID            string
    MaxConcurrentTasks int
    TaskTimeout        time.Duration
    ProgressInterval   time.Duration
    SupportedTaskTypes []string
    LogLevel           string
}
func LoadSubscriberConfig() *SubscriberConfig {
    taskTypesStr := getEnv("AGENTHUB_TASK_TYPES", "")
    var taskTypes []string
    if taskTypesStr != "" {
        taskTypes = strings.Split(taskTypesStr, ",")
        for i, taskType := range taskTypes {
            taskTypes[i] = strings.TrimSpace(taskType)
        }
    }
    config := &SubscriberConfig{
        BrokerAddress:      getEnv("AGENTHUB_BROKER_ADDRESS", "localhost:50051"),
        AgentID:            getEnv("AGENTHUB_AGENT_ID", generateAgentID()),
        MaxConcurrentTasks: getInt("AGENTHUB_MAX_CONCURRENT_TASKS", 5),
        TaskTimeout:        getDuration("AGENTHUB_TASK_TIMEOUT", "300s"),
        ProgressInterval:   getDuration("AGENTHUB_PROGRESS_INTERVAL", "5s"),
        SupportedTaskTypes: taskTypes,
        LogLevel:           getEnv("AGENTHUB_AGENT_LOG_LEVEL", "info"),
    }
    return config
}
Agent Configuration File
Agents can also use configuration files:
# agent.yaml
agent:
  id: "data_processor_001"
  broker_address: "broker.example.com:50051"
  connection_timeout: "10s"
  retry_attempts: 3
  retry_delay: "1s"
task_processing:
  max_concurrent_tasks: 5
  task_timeout: "300s"
  progress_interval: "5s"
  supported_task_types:
    - "data_analysis"
    - "data_transformation"
    - "data_validation"
logging:
  level: "info"
  format: "json"
  file: "/var/log/agenthub/agent.log"
health:
  port: 8080
  endpoint: "/health"
  check_interval: "30s"
Security Configuration
TLS Configuration
Broker TLS Setup
# broker configuration
security:
  tls_enabled: true
  cert_file: "/etc/agenthub/certs/server.crt"
  key_file: "/etc/agenthub/certs/server.key"
  ca_file: "/etc/agenthub/certs/ca.crt"
  client_auth: "require_and_verify"
Agent TLS Setup
// Agent TLS connection
func createTLSConnection(address string) (*grpc.ClientConn, error) {
    config := &tls.Config{
        ServerName: "agenthub-broker",
        // Load client certificates if needed
    }
    creds := credentials.NewTLS(config)
    conn, err := grpc.Dial(address, grpc.WithTransportCredentials(creds))
    if err != nil {
        return nil, fmt.Errorf("failed to connect with TLS: %w", err)
    }
    return conn, nil
}
Authentication Configuration
JWT Authentication
# broker configuration
security:
  auth_enabled: true
  auth_method: "jwt"
  jwt_secret: "your-secret-key"
  jwt_issuer: "agenthub-broker"
  jwt_expiry: "24h"
// Agent authentication
type AuthenticatedAgent struct {
    client  pb.AgentHubClient
    token   string
    agentID string
}
// authenticate publishes the given request with the bearer token attached.
func (a *AuthenticatedAgent) authenticate(request *pb.PublishMessageRequest) error {
    // Add authentication token to context
    ctx := metadata.AppendToOutgoingContext(context.Background(),
        "authorization", "Bearer "+a.token)
    // Use authenticated context for A2A requests
    _, err := a.client.PublishMessage(ctx, request)
    return err
}
Production Configuration Examples
# production-broker.yaml
server:
  host: "0.0.0.0"
  port: 50051
  max_connections: 5000
  timeout: "60s"
performance:
  channel_buffer_size: 50
  max_message_size: "16MB"
  keepalive_time: "10s"
  keepalive_timeout: "3s"
limits:
  max_agents: 50000
  max_tasks_per_agent: 500
  memory_limit: "8GB"
logging:
  level: "warn"
  format: "json"
  file: "/var/log/agenthub/broker.log"
security:
  tls_enabled: true
  cert_file: "/etc/ssl/certs/agenthub.crt"
  key_file: "/etc/ssl/private/agenthub.key"
Cluster Agent Configuration
# cluster-agent.yaml
agent:
  id: "${HOSTNAME}_${POD_ID}"
  broker_address: "agenthub-broker.agenthub.svc.cluster.local:50051"
  connection_timeout: "15s"
  retry_attempts: 5
  retry_delay: "2s"
task_processing:
  max_concurrent_tasks: 10
  task_timeout: "1800s"  # 30 minutes
  progress_interval: "10s"
logging:
  level: "info"
  format: "json"
  file: "stdout"
health:
  port: 8080
  endpoint: "/health"
  check_interval: "30s"
metrics:
  enabled: true
  port: 9090
  endpoint: "/metrics"
Environment-Specific Configurations
Development Environment
# .env.development
AGENTHUB_PORT=50051
AGENTHUB_LOG_LEVEL=debug
AGENTHUB_LOG_FORMAT=text
AGENTHUB_MAX_CONNECTIONS=100
AGENTHUB_CHANNEL_BUFFER_SIZE=5
# Agent settings
AGENTHUB_BROKER_ADDRESS=localhost:50051
AGENTHUB_MAX_CONCURRENT_TASKS=2
AGENTHUB_TASK_TIMEOUT=60s
AGENTHUB_AGENT_LOG_LEVEL=debug
Staging Environment
# .env.staging
AGENTHUB_PORT=50051
AGENTHUB_LOG_LEVEL=info
AGENTHUB_LOG_FORMAT=json
AGENTHUB_MAX_CONNECTIONS=1000
AGENTHUB_CHANNEL_BUFFER_SIZE=20
# Security
AGENTHUB_TLS_ENABLED=true
AGENTHUB_CERT_FILE=/etc/certs/staging.crt
AGENTHUB_KEY_FILE=/etc/certs/staging.key
# Agent settings
AGENTHUB_BROKER_ADDRESS=staging-broker.example.com:50051
AGENTHUB_MAX_CONCURRENT_TASKS=5
AGENTHUB_TASK_TIMEOUT=300s
Production Environment
# .env.production
AGENTHUB_PORT=50051
AGENTHUB_LOG_LEVEL=warn
AGENTHUB_LOG_FORMAT=json
AGENTHUB_LOG_FILE=/var/log/agenthub/broker.log
AGENTHUB_MAX_CONNECTIONS=5000
AGENTHUB_CHANNEL_BUFFER_SIZE=50
# Security
AGENTHUB_TLS_ENABLED=true
AGENTHUB_CERT_FILE=/etc/ssl/certs/agenthub.crt
AGENTHUB_KEY_FILE=/etc/ssl/private/agenthub.key
AGENTHUB_CA_FILE=/etc/ssl/certs/ca.crt
# Performance
AGENTHUB_MAX_MESSAGE_SIZE=16MB
AGENTHUB_KEEPALIVE_TIME=10s
AGENTHUB_MEMORY_LIMIT=8GB
# Agent settings
AGENTHUB_BROKER_ADDRESS=agenthub-prod.example.com:50051
AGENTHUB_MAX_CONCURRENT_TASKS=10
AGENTHUB_TASK_TIMEOUT=1800s
AGENTHUB_CONNECTION_TIMEOUT=15s
AGENTHUB_RETRY_ATTEMPTS=5
Configuration Validation
Broker Configuration Validation
type BrokerConfig struct {
    Port              int           `yaml:"port" validate:"min=1,max=65535"`
    Host              string        `yaml:"host" validate:"required"`
    MaxConnections    int           `yaml:"max_connections" validate:"min=1"`
    Timeout           time.Duration `yaml:"timeout" validate:"min=1s"`
    ChannelBufferSize int           `yaml:"channel_buffer_size" validate:"min=1"`
}
func (c *BrokerConfig) Validate() error {
    validate := validator.New()
    return validate.Struct(c)
}
Agent Configuration Validation
type AgentConfig struct {
    BrokerAddress      string        `yaml:"broker_address" validate:"required"`
    AgentID            string        `yaml:"agent_id" validate:"required,min=1,max=64"`
    MaxConcurrentTasks int           `yaml:"max_concurrent_tasks" validate:"min=1,max=100"`
    TaskTimeout        time.Duration `yaml:"task_timeout" validate:"min=1s"`
}
func (c *AgentConfig) Validate() error {
    validate := validator.New()
    if err := validate.Struct(c); err != nil {
        return err
    }
    // Custom validation
    if !strings.Contains(c.BrokerAddress, ":") {
        return errors.New("broker_address must include port")
    }
    return nil
}
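The custom check above only looks for a colon. A stricter variant, sketched here with the standard library, also rejects an empty host and an out-of-range port. `validateBrokerAddress` is a hypothetical helper name and is not part of AgentHub.

```go
package main

import (
    "fmt"
    "net"
    "strconv"
)

// validateBrokerAddress checks that addr is a usable "host:port" pair.
func validateBrokerAddress(addr string) error {
    host, port, err := net.SplitHostPort(addr)
    if err != nil {
        return fmt.Errorf("broker_address %q must be host:port: %w", addr, err)
    }
    if host == "" {
        return fmt.Errorf("broker_address %q is missing a host", addr)
    }
    n, err := strconv.Atoi(port)
    if err != nil || n < 1 || n > 65535 {
        return fmt.Errorf("broker_address %q has an invalid port %q", addr, port)
    }
    return nil
}

func main() {
    for _, addr := range []string{"localhost:50051", "localhost", "broker.example.com:99999"} {
        fmt.Printf("%-28s valid=%v\n", addr, validateBrokerAddress(addr) == nil)
    }
}
```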
This comprehensive configuration reference covers all aspects of configuring AgentHub for different environments and use cases.
3.2 - API Reference
Complete API documentation and specifications
API Reference Documentation
This section contains comprehensive API documentation for all AgentHub interfaces, including gRPC APIs, unified abstractions, and tracing interfaces.
Available Documentation
3.2.1 - A2A-Compliant AgentHub API Reference
Complete technical reference for the A2A-compliant AgentHub API, including all gRPC services, message types, and operational details.
A2A-Compliant AgentHub API Reference
This document provides a complete technical reference for the Agent2Agent (A2A) protocol-compliant AgentHub API, including all gRPC services, message types, and operational details.
gRPC Service Definition
The AgentHub broker implements the AgentHub service as defined in proto/eventbus.proto:
service AgentHub {
  // ===== A2A Message Publishing (EDA style) =====
  // PublishMessage submits an A2A message for delivery through the broker
  rpc PublishMessage(PublishMessageRequest) returns (PublishResponse);
  // PublishTaskUpdate notifies subscribers about A2A task state changes
  rpc PublishTaskUpdate(PublishTaskUpdateRequest) returns (PublishResponse);
  // PublishTaskArtifact delivers A2A task output artifacts to subscribers
  rpc PublishTaskArtifact(PublishTaskArtifactRequest) returns (PublishResponse);
  // ===== A2A Event Subscriptions (EDA style) =====
  // SubscribeToMessages creates a stream of A2A message events for an agent
  rpc SubscribeToMessages(SubscribeToMessagesRequest) returns (stream AgentEvent);
  // SubscribeToTasks creates a stream of A2A task events for an agent
  rpc SubscribeToTasks(SubscribeToTasksRequest) returns (stream AgentEvent);
  // SubscribeToAgentEvents creates a unified stream of all events for an agent
  rpc SubscribeToAgentEvents(SubscribeToAgentEventsRequest) returns (stream AgentEvent);
  // ===== A2A Task Management (compatible with A2A spec) =====
  // GetTask retrieves the current state of an A2A task by ID
  rpc GetTask(GetTaskRequest) returns (a2a.Task);
  // CancelTask cancels an active A2A task and notifies subscribers
  rpc CancelTask(CancelTaskRequest) returns (a2a.Task);
  // ListTasks returns A2A tasks matching the specified criteria
  rpc ListTasks(ListTasksRequest) returns (ListTasksResponse);
  // ===== Agent Discovery (A2A compatible) =====
  // GetAgentCard returns the broker's A2A agent card for discovery
  rpc GetAgentCard(google.protobuf.Empty) returns (a2a.AgentCard);
  // RegisterAgent registers an agent with the broker for event routing
  rpc RegisterAgent(RegisterAgentRequest) returns (RegisterAgentResponse);
}
A2A Message Types
Core A2A Types
A2A Message
Represents an A2A-compliant message for agent communication.
message Message {
  string message_id = 1;       // Required: Unique message identifier
  string context_id = 2;       // Optional: Conversation context
  string task_id = 3;          // Optional: Associated task
  Role role = 4;               // Required: USER or AGENT
  repeated Part content = 5;   // Required: Message content parts
  google.protobuf.Struct metadata = 6; // Optional: Additional metadata
  repeated string extensions = 7;       // Optional: Protocol extensions
}
Field Details:
- message_id: Must be unique across all messages. Generated automatically if not provided
- context_id: Groups related messages in a conversation or workflow
- task_id: Links message to a specific A2A task
- role: Indicates whether message is from USER (requesting agent) or AGENT (responding agent)
- content: Array of A2A Part structures containing the actual message content
- metadata: Additional context for routing, processing, or debugging
- extensions: Protocol extension identifiers for future compatibility
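Clients that prefer to set `message_id` themselves rather than rely on automatic generation need an identifier that is unique in practice. A minimal sketch using random bytes follows; the `msg_` prefix mirrors the examples in this reference, while `newMessageID` and the 128-bit length are assumptions rather than the broker's actual scheme.

```go
package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
)

// newMessageID returns "msg_" followed by 32 hex characters (128 random bits).
func newMessageID() (string, error) {
    buf := make([]byte, 16)
    if _, err := rand.Read(buf); err != nil {
        return "", err
    }
    return "msg_" + hex.EncodeToString(buf), nil
}

func main() {
    id, err := newMessageID()
    if err != nil {
        fmt.Println("could not generate message ID:", err)
        return
    }
    fmt.Println(id)
}
```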
A2A Part
Represents content within an A2A message.
message Part {
  oneof part {
    string text = 1;           // Text content
    DataPart data = 2;         // Structured data
    FilePart file = 3;         // File reference
  }
}
message DataPart {
  google.protobuf.Struct data = 1;    // Structured data content
  string description = 2;             // Optional data description
}
message FilePart {
  string file_id = 1;                 // File identifier or URI
  string filename = 2;                // Original filename
  string mime_type = 3;               // MIME type
  int64 size_bytes = 4;               // File size in bytes
  google.protobuf.Struct metadata = 5; // Additional file metadata
}
A2A Task
Represents an A2A-compliant task with lifecycle management.
message Task {
  string id = 1;                    // Required: Task identifier
  string context_id = 2;            // Optional: Conversation context
  TaskStatus status = 3;            // Required: Current task status
  repeated Message history = 4;     // Message history for this task
  repeated Artifact artifacts = 5;  // Task output artifacts
  google.protobuf.Struct metadata = 6; // Task metadata
}
message TaskStatus {
  TaskState state = 1;              // Current task state
  Message update = 2;               // Status update message
  google.protobuf.Timestamp timestamp = 3; // Status timestamp
}
enum TaskState {
  TASK_STATE_SUBMITTED = 0;    // Task created and submitted
  TASK_STATE_WORKING = 1;      // Task in progress
  TASK_STATE_COMPLETED = 2;    // Task completed successfully
  TASK_STATE_FAILED = 3;       // Task failed with error
  TASK_STATE_CANCELLED = 4;    // Task cancelled
}
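Agents often need to know whether a task can still change state. The sketch below mirrors the enum values locally and treats COMPLETED, FAILED, and CANCELLED as terminal. The allowed transitions are an assumption made for illustration; this reference does not define them, and `isTerminal` and `canTransition` are hypothetical helpers.

```go
package main

import "fmt"

// TaskState mirrors the TaskState enum (local copy for illustration).
type TaskState int

const (
    TaskStateSubmitted TaskState = iota // TASK_STATE_SUBMITTED = 0
    TaskStateWorking                    // TASK_STATE_WORKING = 1
    TaskStateCompleted                  // TASK_STATE_COMPLETED = 2
    TaskStateFailed                     // TASK_STATE_FAILED = 3
    TaskStateCancelled                  // TASK_STATE_CANCELLED = 4
)

// isTerminal reports whether no further state changes are expected.
func isTerminal(s TaskState) bool {
    switch s {
    case TaskStateCompleted, TaskStateFailed, TaskStateCancelled:
        return true
    }
    return false
}

// canTransition encodes an assumed lifecycle:
// SUBMITTED -> WORKING -> COMPLETED, with FAILED and CANCELLED
// reachable from either non-terminal state.
func canTransition(from, to TaskState) bool {
    switch from {
    case TaskStateSubmitted:
        return to == TaskStateWorking || to == TaskStateFailed || to == TaskStateCancelled
    case TaskStateWorking:
        return to == TaskStateCompleted || to == TaskStateFailed || to == TaskStateCancelled
    }
    return false // terminal states accept no further transitions
}

func main() {
    fmt.Println(canTransition(TaskStateSubmitted, TaskStateWorking)) // true
    fmt.Println(canTransition(TaskStateCompleted, TaskStateWorking)) // false
}
```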
A2A Artifact
Represents structured output from completed tasks.
message Artifact {
  string artifact_id = 1;           // Required: Artifact identifier
  string name = 2;                  // Human-readable name
  string description = 3;           // Artifact description
  repeated Part parts = 4;          // Artifact content parts
  google.protobuf.Struct metadata = 5; // Artifact metadata
}
EDA Event Wrapper Types
AgentEvent
Wraps A2A messages for Event-Driven Architecture transport.
message AgentEvent {
  string event_id = 1;                     // Unique event identifier
  google.protobuf.Timestamp timestamp = 2; // Event timestamp
  // A2A-compliant payload
  oneof payload {
    a2a.Message message = 10;              // A2A Message
    a2a.Task task = 11;                    // A2A Task
    TaskStatusUpdateEvent status_update = 12; // Task status change
    TaskArtifactUpdateEvent artifact_update = 13; // Artifact update
  }
  // EDA routing metadata
  AgentEventMetadata routing = 20;
  // Observability context
  string trace_id = 30;
  string span_id = 31;
}
AgentEventMetadata
Provides routing and delivery information for events.
message AgentEventMetadata {
  string from_agent_id = 1;               // Source agent identifier
  string to_agent_id = 2;                 // Target agent ID (empty = broadcast)
  string event_type = 3;                  // Event classification
  repeated string subscriptions = 4;      // Topic-based routing tags
  Priority priority = 5;                  // Delivery priority
}
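The three addressing modes (direct, topic-based, broadcast) can be illustrated with a small recipient-selection function. This is a sketch, not the broker's routing code: `eventMetadata` is a trimmed local copy of the message above, `recipients` is a hypothetical helper, and the precedence (direct first, then topics, then broadcast, never echoing to the sender) is an assumption.

```go
package main

import (
    "fmt"
    "sort"
)

// eventMetadata is a trimmed local copy of AgentEventMetadata.
type eventMetadata struct {
    FromAgentID   string
    ToAgentID     string
    Subscriptions []string
}

// recipients picks target agents. agentTopics maps each registered
// agent ID to the topics it subscribed to.
func recipients(meta eventMetadata, agentTopics map[string][]string) []string {
    if meta.ToAgentID != "" { // direct addressing
        if _, ok := agentTopics[meta.ToAgentID]; ok {
            return []string{meta.ToAgentID}
        }
        return nil
    }
    wanted := make(map[string]bool, len(meta.Subscriptions))
    for _, t := range meta.Subscriptions {
        wanted[t] = true
    }
    var out []string
    for agent, topics := range agentTopics {
        if agent == meta.FromAgentID {
            continue // do not echo events back to the sender
        }
        if len(wanted) == 0 {
            out = append(out, agent) // plain broadcast
            continue
        }
        for _, t := range topics {
            if wanted[t] {
                out = append(out, agent) // topic match
                break
            }
        }
    }
    sort.Strings(out) // map iteration order is random; sort for stable output
    return out
}

func main() {
    agents := map[string][]string{"a": {"billing"}, "b": {"images"}, "c": nil}
    fmt.Println(recipients(eventMetadata{FromAgentID: "a"}, agents)) // [b c]
}
```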
Request/Response Messages
PublishMessageRequest
message PublishMessageRequest {
  a2a.Message message = 1;                // A2A message to publish
  AgentEventMetadata routing = 2;         // EDA routing info
}
SubscribeToTasksRequest
message SubscribeToTasksRequest {
  string agent_id = 1;                    // Agent ID for subscription
  repeated string task_types = 2;         // Optional task type filter
  repeated a2a.TaskState states = 3;      // Optional state filter
}
GetTaskRequest
message GetTaskRequest {
  string task_id = 1;                     // Task identifier
  int32 history_length = 2;               // History limit (optional)
}
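The effect of `history_length` can be pictured as trimming a task's message history to its most recent entries. In this sketch `limitHistory` is a hypothetical helper that works on plain strings, and treating zero (or a negative value) as "no limit" is an assumption; the reference only says the field is optional.

```go
package main

import "fmt"

// limitHistory keeps the n most recent entries of history.
// Assumption: n <= 0 means the caller did not request a limit.
func limitHistory(history []string, n int) []string {
    if n <= 0 || n >= len(history) {
        return history
    }
    return history[len(history)-n:]
}

func main() {
    history := []string{"m1", "m2", "m3", "m4"}
    fmt.Println(limitHistory(history, 2)) // [m3 m4]
}
```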
API Operations
Publishing A2A Messages
PublishMessage
Publishes an A2A message for delivery through the EDA broker.
Go Example:
// Create A2A message content
content := []*pb.Part{
    {
        Part: &pb.Part_Text{
            Text: "Hello! Please process this request.",
        },
    },
    {
        Part: &pb.Part_Data{
            Data: &pb.DataPart{
                Data: &structpb.Struct{
                    Fields: map[string]*structpb.Value{
                        "operation": structpb.NewStringValue("process_data"),
                        "dataset_id": structpb.NewStringValue("dataset_123"),
                    },
                },
            },
        },
    },
}
// Create A2A message
message := &pb.Message{
    MessageId: "msg_12345",
    ContextId: "conversation_abc",
    TaskId:    "task_67890",
    Role:      pb.Role_ROLE_USER,
    Content:   content,
    Metadata: &structpb.Struct{
        Fields: map[string]*structpb.Value{
            "priority": structpb.NewStringValue("high"),
        },
    },
}
// Publish through AgentHub
response, err := client.PublishMessage(ctx, &pb.PublishMessageRequest{
    Message: message,
    Routing: &pb.AgentEventMetadata{
        FromAgentId: "requester_agent",
        ToAgentId:   "processor_agent",
        EventType:   "task_message",
        Priority:    pb.Priority_PRIORITY_HIGH,
    },
})
Subscribing to A2A Events
SubscribeToTasks
Creates a stream of A2A task events for an agent.
Go Example:
req := &pb.SubscribeToTasksRequest{
    AgentId: "processor_agent",
    TaskTypes: []string{"data_processing", "image_analysis"}, // Optional filter
}
stream, err := client.SubscribeToTasks(ctx, req)
if err != nil {
    return err
}
for {
    event, err := stream.Recv()
    if err == io.EOF {
        break
    }
    if err != nil {
        return err
    }
    // Process different event types
    switch payload := event.GetPayload().(type) {
    case *pb.AgentEvent_Task:
        task := payload.Task
        log.Printf("Received A2A task: %s", task.GetId())
        // Process task using A2A handler
        artifact, status, errorMsg := processA2ATask(ctx, task)
        // Publish completion
        publishTaskCompletion(ctx, client, task, artifact, status, errorMsg)
    case *pb.AgentEvent_StatusUpdate:
        update := payload.StatusUpdate
        log.Printf("Task %s status: %s", update.GetTaskId(), update.GetStatus().GetState())
    case *pb.AgentEvent_ArtifactUpdate:
        artifact := payload.ArtifactUpdate
        log.Printf("Received artifact for task %s", artifact.GetTaskId())
    }
}
A2A Task Management
GetTask
Retrieves the current state of an A2A task.
Go Example:
req := &pb.GetTaskRequest{
    TaskId: "task_67890",
    HistoryLength: 10, // Optional: limit message history
}
task, err := client.GetTask(ctx, req)
if err != nil {
    return err
}
log.Printf("Task %s status: %s", task.GetId(), task.GetStatus().GetState())
log.Printf("Message history: %d messages", len(task.GetHistory()))
log.Printf("Artifacts: %d artifacts", len(task.GetArtifacts()))
CancelTask
Cancels an active A2A task.
Go Example:
req := &pb.CancelTaskRequest{
    TaskId: "task_67890",
    Reason: "User requested cancellation",
}
task, err := client.CancelTask(ctx, req)
if err != nil {
    return err
}
log.Printf("Task %s cancelled", task.GetId())
Agent Discovery
GetAgentCard
Returns the broker’s A2A agent card for discovery.
Go Example:
card, err := client.GetAgentCard(ctx, &emptypb.Empty{})
if err != nil {
    return err
}
log.Printf("AgentHub broker: %s v%s", card.GetName(), card.GetVersion())
log.Printf("Protocol version: %s", card.GetProtocolVersion())
log.Printf("Capabilities: streaming=%v", card.GetCapabilities().GetStreaming())
for _, skill := range card.GetSkills() {
    log.Printf("Skill: %s - %s", skill.GetName(), skill.GetDescription())
}
RegisterAgent
Registers an agent with the broker.
Go Example:
agentCard := &pb.AgentCard{
    ProtocolVersion: "0.2.9",
    Name:           "my-processor-agent",
    Description:    "Data processing agent with A2A compliance",
    Version:        "1.0.0",
    Capabilities: &pb.AgentCapabilities{
        Streaming: true,
    },
    Skills: []*pb.AgentSkill{
        {
            Id:          "data_processing",
            Name:        "Data Processing",
            Description: "Process structured datasets",
            Tags:        []string{"data", "analysis"},
        },
    },
}
response, err := client.RegisterAgent(ctx, &pb.RegisterAgentRequest{
    AgentCard: agentCard,
    Subscriptions: []string{"data_processing", "analytics"},
})
if err != nil {
    return err // Transport-level failure; no response to inspect
}
if response.GetSuccess() {
    log.Printf("Agent registered with ID: %s", response.GetAgentId())
} else {
    log.Printf("Registration failed: %s", response.GetError())
}
High-Level A2A Client Abstractions
A2ATaskPublisher
Simplified interface for publishing A2A tasks.
taskPublisher := &agenthub.A2ATaskPublisher{
    Client:         client,
    TraceManager:   traceManager,
    MetricsManager: metricsManager,
    Logger:         logger,
    ComponentName:  "my-publisher",
    AgentID:        "my-agent-id",
}
task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
    TaskType:         "data_analysis",
    Content:          contentParts,
    RequesterAgentID: "my-agent-id",
    ResponderAgentID: "data-processor",
    Priority:         pb.Priority_PRIORITY_MEDIUM,
    ContextID:        "analysis-session-123",
})
A2ATaskSubscriber
Simplified interface for processing A2A tasks.
taskSubscriber := agenthub.NewA2ATaskSubscriber(client, "my-agent-id")
// Register task handlers
taskSubscriber.RegisterTaskHandler("data_analysis", func(ctx context.Context, task *pb.Task, message *pb.Message) (*pb.Artifact, pb.TaskState, string) {
    // Process the A2A task
    result := processDataAnalysis(task, message)
    // Return A2A artifact
    artifact := &pb.Artifact{
        ArtifactId:  fmt.Sprintf("result_%s", task.GetId()),
        Name:        "analysis_result",
        Description: "Data analysis results",
        Parts: []*pb.Part{
            {
                Part: &pb.Part_Data{
                    Data: &pb.DataPart{
                        Data: result,
                    },
                },
            },
        },
    }
    return artifact, pb.TaskState_TASK_STATE_COMPLETED, ""
})
// Start processing A2A tasks
err := taskSubscriber.SubscribeToTasks(ctx)
Error Handling
gRPC Status Codes
AgentHub uses standard gRPC status codes:
InvalidArgument (Code: 3)
- Missing required fields (message_id, role, content)
- Invalid A2A message structure
- Malformed Part content
NotFound (Code: 5)
- Task ID not found in GetTask/CancelTask
- Agent not registered
Internal (Code: 13)
- Server-side processing errors
- Message routing failures
- A2A validation errors
Retry Patterns
func publishWithRetry(ctx context.Context, client pb.AgentHubClient, req *pb.PublishMessageRequest) error {
    const maxAttempts = 3
    var lastErr error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        _, err := client.PublishMessage(ctx, req)
        if err == nil {
            return nil
        }
        // Don't retry validation errors
        if status.Code(err) == codes.InvalidArgument {
            return err
        }
        lastErr = err
        if attempt == maxAttempts-1 {
            break // No backoff after the final attempt
        }
        // Exponential backoff (1s, 2s) that stops early if ctx is cancelled
        select {
        case <-time.After(time.Duration(1<<attempt) * time.Second):
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return fmt.Errorf("max retries exceeded: %w", lastErr)
}
Message Size Limits
- Maximum message size: 4MB (gRPC default)
- Recommended size: <100KB for optimal A2A compliance
- Large content: Use FilePart references for large data
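These limits can be checked before publishing. The following is a minimal sketch, assuming the caller already knows the serialized size of the message in bytes; `sizeAdvice` and both constants are hypothetical names for illustration and are not part of the AgentHub API.

```go
package main

import "fmt"

const (
	maxMessageBytes         = 4 * 1024 * 1024 // gRPC default message size limit
	recommendedMessageBytes = 100 * 1024      // recommended ceiling from the list above
)

// sizeAdvice is a hypothetical pre-publish check on a serialized payload size.
func sizeAdvice(sizeBytes int) string {
	switch {
	case sizeBytes > maxMessageBytes:
		return "reject: exceeds the 4MB limit, send a FilePart reference instead"
	case sizeBytes >= recommendedMessageBytes:
		return "warn: above the recommended 100KB, consider a FilePart reference"
	default:
		return "ok"
	}
}

func main() {
	for _, n := range []int{2_000, 250_000, 5_000_000} {
		fmt.Printf("%d bytes -> %s\n", n, sizeAdvice(n))
	}
}
```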
A2A Best Practices
- Use structured Parts: Prefer DataPart for structured data over text
- Context management: Group related messages with context_id
- Artifact structure: Return well-formed Artifact objects
- Task lifecycle: Properly manage TaskState transitions
- Connection reuse: Maintain persistent gRPC connections
This completes the comprehensive A2A-compliant API reference for AgentHub, covering all message types, operations, and integration patterns with practical examples.
3.2.2 - AgentHub Tracing API Reference
Complete API documentation for AgentHub’s OpenTelemetry tracing integration, span management, context propagation, and instrumentation patterns.
AgentHub Tracing API Reference
Core Components
TraceManager
The TraceManager provides high-level tracing operations for AgentHub events.
Constructor
func NewTraceManager(serviceName string) *TraceManager
Parameters:
- serviceName - Name of the service creating spans
Returns: Configured TraceManager instance
Methods
StartPublishSpan
func (tm *TraceManager) StartPublishSpan(ctx context.Context, responderAgentID, eventType string) (context.Context, trace.Span)
Purpose: Creates a span for event publishing operations
Parameters:
- ctx - Parent context (may contain existing trace)
- responderAgentID - Target agent for the event
- eventType - Type of event being published
Returns:
- context.Context - New context with active span
- trace.Span - The created span
Attributes Set:
- event.type - Event type being published
- responder.agent - Target agent ID
- operation.type - “publish”
Usage:
ctx, span := tm.StartPublishSpan(ctx, "agent_subscriber", "greeting")
defer span.End()
// ... publishing logic
StartEventProcessingSpan
func (tm *TraceManager) StartEventProcessingSpan(ctx context.Context, eventID, eventType, requesterAgentID, responderAgentID string) (context.Context, trace.Span)
Purpose: Creates a span for event processing operations
Parameters:
- ctx - Context with extracted trace information
- eventID - Unique identifier for the event
- eventType - Type of event being processed
- requesterAgentID - Agent that requested processing
- responderAgentID - Agent performing processing
Returns:
- context.Context - Context with processing span
- trace.Span - The processing span
Attributes Set:
- event.id - Event identifier
- event.type - Event type
- requester.agent - Requesting agent ID
- responder.agent - Processing agent ID
- operation.type - “process”
StartBrokerSpan
func (tm *TraceManager) StartBrokerSpan(ctx context.Context, operation, eventType string) (context.Context, trace.Span)
Purpose: Creates spans for broker operations
Parameters:
- ctx - Request context
- operation - Broker operation (route, subscribe, unsubscribe)
- eventType - Event type being handled
Returns:
- context.Context - Context with broker span
- trace.Span - The broker span
Attributes Set:
- operation.type - Broker operation type
- event.type - Event type being handled
- component - “broker”
InjectTraceContext
func (tm *TraceManager) InjectTraceContext(ctx context.Context, headers map[string]string)
Purpose: Injects trace context into headers for propagation
Parameters:
- ctx - Context containing trace information
- headers - Map to inject headers into
Headers Injected:
- traceparent - W3C trace context header
- tracestate - W3C trace state header (if present)
Usage:
headers := make(map[string]string)
tm.InjectTraceContext(ctx, headers)
// headers now contain trace context for propagation
ExtractTraceContext
func (tm *TraceManager) ExtractTraceContext(ctx context.Context, headers map[string]string) context.Context
Purpose: Extracts trace context from headers
Parameters:
- ctx - Base context
- headers - Headers containing trace context
Returns: Context with extracted trace information
Usage:
// Extract from event metadata
if metadata := event.GetMetadata(); metadata != nil {
    if traceHeaders, ok := metadata.Fields["trace_headers"]; ok {
        headers := structFieldsToStringMap(traceHeaders.GetStructValue().Fields)
        ctx = tm.ExtractTraceContext(ctx, headers)
    }
}
RecordError
func (tm *TraceManager) RecordError(span trace.Span, err error)
Purpose: Records an error on a span with proper formatting
Parameters:
- span - Span to record error on
- err - Error to record
Effects:
- Sets span status to error
- Records error as span event
- Adds error type attribute
SetSpanSuccess
func (tm *TraceManager) SetSpanSuccess(span trace.Span)
Purpose: Marks a span as successful
Parameters:
- span - Span to mark as successful
Effects:
- Sets span status to OK
- Ensures span is properly completed
Context Propagation
W3C Trace Context Standards
AgentHub uses the W3C Trace Context specification for interoperability.
traceparent
Format: 00-{trace-id}-{span-id}-{trace-flags}
- 00 - Version (currently always 00)
- trace-id - 32-character hex string
- span-id - 16-character hex string
- trace-flags - 2-character hex flags
Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
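To make the field layout concrete, here is a minimal sketch of a version-00 traceparent parser that uses only the standard library. `parseTraceParent`, `traceParent`, and `isLowerHex` are hypothetical names for illustration; in AgentHub code the OpenTelemetry propagator performs this parsing, so a hand-written parser like this is mainly useful for debugging or tests.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the four fields of a version-00 traceparent header.
type traceParent struct {
	Version string
	TraceID string
	SpanID  string
	Flags   byte
}

// Sampled reports whether the sampled bit (lowest bit of trace-flags) is set.
func (tp traceParent) Sampled() bool { return tp.Flags&0x01 != 0 }

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if (c < '0' || c > '9') && (c < 'a' || c > 'f') {
			return false
		}
	}
	return true
}

// parseTraceParent is a hypothetical helper, shown for illustration only.
func parseTraceParent(header string) (traceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 {
		return traceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}
	if parts[0] != "00" {
		return traceParent{}, fmt.Errorf("traceparent: unsupported version %q", parts[0])
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return traceParent{}, errors.New("traceparent: malformed field")
	}
	// The specification treats all-zero trace and span IDs as invalid.
	if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
		return traceParent{}, errors.New("traceparent: all-zero ID")
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return traceParent{}, err
	}
	return traceParent{Version: parts[0], TraceID: parts[1], SpanID: parts[2], Flags: flags[0]}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(tp.TraceID, tp.SpanID, tp.Sampled())
}
```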
tracestate
Format: Vendor-specific key-value pairs
Example: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
Propagation Implementation
Manual Injection
// Create headers map
headers := make(map[string]string)
// Inject trace context
otel.GetTextMapPropagator().Inject(ctx, propagation.MapCarrier(headers))
// Headers now contain trace context
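// Convert to protobuf metadata if needed.
// structpb.NewStruct rejects map[string]string values, so copy the
// headers into a map[string]interface{} first.
traceHeaders := make(map[string]interface{}, len(headers))
for k, v := range headers {
    traceHeaders[k] = v
}
metadataStruct, err := structpb.NewStruct(map[string]interface{}{
    "trace_headers": traceHeaders,
    "timestamp": time.Now().Format(time.RFC3339),
})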
Manual Extraction
// Extract from protobuf metadata
if metadata := task.GetMetadata(); metadata != nil {
    if traceHeaders, ok := metadata.Fields["trace_headers"]; ok {
        headers := make(map[string]string)
        for k, v := range traceHeaders.GetStructValue().Fields {
            headers[k] = v.GetStringValue()
        }
        ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(headers))
    }
}
Span Lifecycle Management
Creating Spans
Basic Span Creation
tracer := otel.Tracer("my-service")
ctx, span := tracer.Start(ctx, "operation_name")
defer span.End()
Span with Attributes
ctx, span := tracer.Start(ctx, "operation_name", trace.WithAttributes(
    attribute.String("operation.type", "publish"),
    attribute.String("event.type", "greeting"),
    attribute.Int("event.priority", 1),
))
defer span.End()
Child Span Creation
// Parent span
ctx, parentSpan := tracer.Start(ctx, "parent_operation")
defer parentSpan.End()
// Child span (automatically linked)
ctx, childSpan := tracer.Start(ctx, "child_operation")
defer childSpan.End()
Span Attributes
Standard Attributes
AgentHub uses consistent attribute naming:
// Event attributes
attribute.String("event.id", taskID)
attribute.String("event.type", taskType)
attribute.Int("event.priority", priority)
// Agent attributes
attribute.String("agent.id", agentID)
attribute.String("agent.type", agentType)
attribute.String("requester.agent", requesterID)
attribute.String("responder.agent", responderID)
// Operation attributes
attribute.String("operation.type", "publish|process|route")
attribute.String("component", "broker|publisher|subscriber")
// Result attributes
attribute.Bool("success", true)
attribute.String("error.type", "validation|timeout|network")
Custom Attributes
span.SetAttributes(
    attribute.String("business.unit", "sales"),
    attribute.String("user.tenant", "acme-corp"),
    attribute.Int("batch.size", len(items)),
    attribute.Stringer("timeout", 30*time.Second), // time.Duration implements fmt.Stringer
)
Span Events
Adding Events
// Simple event
span.AddEvent("validation.started")
// Event with attributes
span.AddEvent("cache.miss", trace.WithAttributes(
    attribute.String("cache.key", key),
    attribute.String("cache.type", "redis"),
))
// Event with timestamp
span.AddEvent("external.api.call", trace.WithAttributes(
    attribute.String("api.endpoint", "/v1/users"),
    attribute.Int("api.status_code", 200),
), trace.WithTimestamp(time.Now()))
Common Event Patterns
// Processing milestones
span.AddEvent("processing.started")
span.AddEvent("validation.completed")
span.AddEvent("business.logic.completed")
span.AddEvent("result.published")
// Error events
span.AddEvent("error.occurred", trace.WithAttributes(
    attribute.String("error.message", err.Error()),
    attribute.String("error.stack", string(debug.Stack())), // debug.Stack returns []byte
))
Span Status
Setting Status
// Success
span.SetStatus(codes.Ok, "")
// Error with message
span.SetStatus(codes.Error, "validation failed")
// Error without message
span.SetStatus(codes.Error, "")
Status Code Mapping
// Map Go errors to OpenTelemetry status codes.
// OpenTelemetry defines only Unset, Error, and Ok, so the error
// category is carried in the status description.
if err != nil {
    description := err.Error()
    switch {
    case errors.Is(err, context.DeadlineExceeded):
        description = "deadline exceeded"
    case errors.Is(err, context.Canceled):
        description = "cancelled"
    }
    span.SetStatus(codes.Error, description)
} else {
    span.SetStatus(codes.Ok, "")
}
Advanced Instrumentation
Baggage Propagation
Setting Baggage
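// Add baggage to context; NewMember and New validate keys and values
userMember, err1 := baggage.NewMember("user.id", userID)
tenantMember, err2 := baggage.NewMember("tenant.id", tenantID)
requestMember, err3 := baggage.NewMember("request.id", requestID)
if err := errors.Join(err1, err2, err3); err != nil {
    return err
}
bag, err := baggage.New(userMember, tenantMember, requestMember)
if err != nil {
    return err
}
ctx = baggage.ContextWithBaggage(ctx, bag)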
Reading Baggage
// Read baggage anywhere in the trace
if member := baggage.FromContext(ctx).Member("user.id"); member.Value() != "" {
    userID := member.Value()
    // Use user ID for business logic
}
Span Links
Creating Links
// Link to related span
linkedSpanContext := trace.SpanContextFromContext(relatedCtx)
ctx, span := tracer.Start(ctx, "operation", trace.WithLinks(
    trace.Link{
        SpanContext: linkedSpanContext,
        Attributes: []attribute.KeyValue{
            attribute.String("link.type", "related_operation"),
        },
    },
))
Sampling Control
Conditional Sampling
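// Start a new root trace for an important operation.
// WithNewRoot detaches the span from any parent, so the parent's
// sampling decision is not inherited; the configured sampler decides.
ctx, span := tracer.Start(ctx, "critical_operation",
    trace.WithNewRoot(), // Start new trace
    trace.WithSpanKind(trace.SpanKindServer),
)
// Tag the span with a priority hint. The SDK does not act on this
// attribute by itself; a collector-side sampling policy must read it.
span.SetAttributes(
    attribute.String("sampling.priority", "high"),
)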
Integration Patterns
gRPC Integration
Server Interceptor
func TracingUnaryInterceptor(tracer trace.Tracer) grpc.UnaryServerInterceptor {
    return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
        ctx, span := tracer.Start(ctx, info.FullMethod)
        defer span.End()
        resp, err := handler(ctx, req)
        if err != nil {
            span.RecordError(err)
            span.SetStatus(codes.Error, err.Error())
        }
        return resp, err
    }
}
Client Interceptor
func TracingUnaryClientInterceptor(tracer trace.Tracer) grpc.UnaryClientInterceptor {
    return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
        ctx, span := tracer.Start(ctx, method)
        defer span.End()
        err := invoker(ctx, method, req, reply, cc, opts...)
        if err != nil {
            span.RecordError(err)
            span.SetStatus(codes.Error, err.Error())
        }
        return err
    }
}
HTTP Integration
HTTP Handler Wrapper
func TracingHandler(tracer trace.Tracer, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
        ctx, span := tracer.Start(ctx, r.Method+" "+r.URL.Path)
        defer span.End()
        span.SetAttributes(
            attribute.String("http.method", r.Method),
            attribute.String("http.url", r.URL.String()),
            attribute.String("http.user_agent", r.UserAgent()),
        )
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}
Error Handling
Error Recording Best Practices
Complete Error Recording
if err != nil {
    // Record error on span
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
    // Add error context
    span.SetAttributes(
        attribute.String("error.type", classifyError(err)),
        attribute.Bool("error.retryable", isRetryable(err)),
    )
    // Log with context
    logger.ErrorContext(ctx, "Operation failed",
        slog.Any("error", err),
        slog.String("operation", "event_processing"),
    )
    return err
}
Error Classification
func classifyError(err error) string {
    switch {
    case errors.Is(err, context.DeadlineExceeded):
        return "timeout"
    case errors.Is(err, context.Canceled):
        return "cancelled"
    case strings.Contains(err.Error(), "connection"):
        return "network"
    case strings.Contains(err.Error(), "validation"):
        return "validation"
    default:
        return "unknown"
    }
}
Span Creation Overhead
- Span creation: ~1-2μs per span
- Attribute setting: ~100ns per attribute
- Event recording: ~200ns per event
- Context propagation: ~500ns per injection/extraction
Memory Usage
- Active span: ~500 bytes
- Completed span buffer: ~1KB per span
- Context overhead: ~100 bytes per context
Best Practices
- Limit span attributes to essential information
- Use batch exporters to reduce network overhead
- Sample appropriately for high-throughput services
- Pool span contexts where possible
- Avoid deep span nesting (>10 levels)
Troubleshooting
Missing Spans Checklist
- ✅ OpenTelemetry properly initialized
- ✅ Tracer retrieved from global provider
- ✅ Context propagated correctly
- ✅ Spans properly ended
- ✅ Exporter configured and accessible
Common Issues
Broken Trace Chains
// ❌ Wrong - creates new root trace
ctx, span := tracer.Start(context.Background(), "operation")
// ✅ Correct - continues existing trace
ctx, span := tracer.Start(ctx, "operation")
Missing Context Propagation
// ❌ Wrong - context not propagated
go func() {
    ctx, span := tracer.Start(context.Background(), "async_work")
    defer span.End()
    // work...
}()
// ✅ Correct - context properly propagated
go func(ctx context.Context) {
    ctx, span := tracer.Start(ctx, "async_work")
    defer span.End()
    // work...
}(ctx)
🎯 Next Steps:
Implementation: Add Observability to Your Agent
Debugging: Debug with Distributed Tracing
Metrics: Observability Metrics Reference
3.2.3 - Unified Abstraction Library API Reference
The AgentHub unified abstraction library provides simplified APIs for building gRPC-based agent communication systems with built-in observability, automatic configuration, and correlation tracking.
Unified Abstraction Library API Reference
Package: internal/agenthub
The internal/agenthub package contains the core unified abstraction components that dramatically simplify AgentHub development by providing high-level APIs with automatic observability integration.
Overview
The unified abstraction library reduces agent implementation complexity from 380+ lines to ~29 lines by providing:
- Automatic gRPC Setup: One-line server and client creation
- Built-in Observability: Integrated OpenTelemetry tracing and metrics
- Environment-Based Configuration: Automatic configuration from environment variables
- Correlation Tracking: Automatic correlation ID generation and propagation
- Pluggable Architecture: Simple task handler registration
Core Components
GRPCConfig
Configuration structure for gRPC servers and clients with environment-based initialization.
type GRPCConfig struct {
    ServerAddr    string // gRPC server listen address (e.g., ":50051")
    BrokerAddr    string // Broker connection address (e.g., "localhost:50051")
    HealthPort    string // Health check endpoint port
    ComponentName string // Component identifier for observability
}
Constructor
func NewGRPCConfig(componentName string) *GRPCConfig
Creates a new gRPC configuration with environment variable defaults:
| Environment Variable | Default | Description |
|---|---|---|
| AGENTHUB_BROKER_ADDR | localhost | Broker server host |
| AGENTHUB_BROKER_PORT | 50051 | Broker gRPC port |
| AGENTHUB_GRPC_PORT | :50051 | Server listen port |
| BROKER_HEALTH_PORT | 8080 | Health endpoint port |
Example:
config := agenthub.NewGRPCConfig("my-agent")
// Results in BrokerAddr: "localhost:50051" (automatically combined)
AgentHubServer
High-level gRPC server wrapper with integrated observability.
type AgentHubServer struct {
    Server         *grpc.Server                    // Underlying gRPC server
    Listener       net.Listener                    // Network listener
    Observability  *observability.Observability    // OpenTelemetry integration
    TraceManager   *observability.TraceManager     // Distributed tracing
    MetricsManager *observability.MetricsManager   // Metrics collection
    HealthServer   *observability.HealthServer     // Health monitoring
    Logger         *slog.Logger                    // Structured logging
    Config         *GRPCConfig                     // Configuration
}
Constructor
func NewAgentHubServer(config *GRPCConfig) (*AgentHubServer, error)
Creates a complete gRPC server with:
- OpenTelemetry instrumentation
- Health check endpoints
- Metrics collection
- Structured logging with trace correlation
Methods
func (s *AgentHubServer) Start(ctx context.Context) error
Starts the server with automatic:
- Health endpoint setup (/health, /ready, /metrics)
- Metrics collection goroutine
- gRPC server with observability
func (s *AgentHubServer) Shutdown(ctx context.Context) error
Gracefully shuts down all components:
- gRPC server graceful stop
- Health server shutdown
- Observability cleanup
Example:
config := agenthub.NewGRPCConfig("broker")
server, err := agenthub.NewAgentHubServer(config)
if err != nil {
    log.Fatal(err)
}
// Register services
eventBusService := agenthub.NewEventBusService(server)
pb.RegisterEventBusServer(server.Server, eventBusService)
// Start server
if err := server.Start(ctx); err != nil {
    log.Fatal(err)
}
AgentHubClient
High-level gRPC client wrapper with integrated observability.
type AgentHubClient struct {
    Client         pb.EventBusClient               // gRPC client
    Connection     *grpc.ClientConn                // Connection
    Observability  *observability.Observability    // OpenTelemetry integration
    TraceManager   *observability.TraceManager     // Distributed tracing
    MetricsManager *observability.MetricsManager   // Metrics collection
    HealthServer   *observability.HealthServer     // Health monitoring
    Logger         *slog.Logger                    // Structured logging
    Config         *GRPCConfig                     // Configuration
}
Constructor
func NewAgentHubClient(config *GRPCConfig) (*AgentHubClient, error)
Creates a complete gRPC client with:
- OpenTelemetry instrumentation
- Connection health monitoring
- Metrics collection
- Automatic retry and timeout handling
Methods
func (c *AgentHubClient) Start(ctx context.Context) error
Initializes client with health monitoring and metrics collection.
func (c *AgentHubClient) Shutdown(ctx context.Context) error
Gracefully closes connection and cleans up resources.
Example:
config := agenthub.NewGRPCConfig("publisher")
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
    log.Fatal(err)
}
err = client.Start(ctx)
if err != nil {
    log.Fatal(err)
}
// Use client.Client for gRPC calls
Service Abstractions
EventBusService
Broker service implementation with built-in observability and correlation tracking.
type EventBusService struct {
    Server          *AgentHubServer
    subscriptions   map[string][]Subscription
    resultSubs      map[string][]ResultSubscription
    progressSubs    map[string][]ProgressSubscription
    mu              sync.RWMutex
}
Constructor
func NewEventBusService(server *AgentHubServer) *EventBusService
Creates an EventBus service with automatic:
- Subscription management
- Task routing and correlation
- Observability integration
Key Methods
func (s *EventBusService) PublishTask(ctx context.Context, req *pb.PublishTaskRequest) (*pb.PublishResponse, error)
Publishes tasks with automatic:
- Input validation
- Correlation ID generation
- Distributed tracing
- Metrics collection
func (s *EventBusService) SubscribeToTasks(req *pb.SubscribeToTasksRequest, stream pb.EventBus_SubscribeToTasksServer) error
Manages task subscriptions with:
- Automatic subscription lifecycle
- Context cancellation handling
- Error recovery
SubscriberAgent
High-level subscriber implementation with pluggable task handlers.
type SubscriberAgent struct {
    client      *AgentHubClient
    agentID     string
    handlers    map[string]TaskHandler
    ctx         context.Context
    cancel      context.CancelFunc
}
Constructor
func NewSubscriberAgent(client *AgentHubClient, agentID string) *SubscriberAgent
Task Handler Interface
type TaskHandler interface {
    Handle(ctx context.Context, task *pb.TaskMessage) (*pb.TaskResult, error)
}
Methods
func (s *SubscriberAgent) RegisterHandler(taskType string, handler TaskHandler)
Registers handlers for specific task types with automatic:
- Task routing
- Error handling
- Result publishing
func (s *SubscriberAgent) Start(ctx context.Context) error
Starts the subscriber with automatic:
- Task subscription
- Handler dispatch
- Observability integration
Example:
type GreetingHandler struct{}
func (h *GreetingHandler) Handle(ctx context.Context, task *pb.TaskMessage) (*pb.TaskResult, error) {
    // Process greeting task
    return result, nil
}
// Register handler
subscriber.RegisterHandler("greeting", &GreetingHandler{})
Utility Functions
func ExtractCorrelationID(ctx context.Context) string
func InjectCorrelationID(ctx context.Context, correlationID string) context.Context
func GenerateCorrelationID() string
Automatic correlation ID management for distributed tracing.
Metrics Helpers
func NewMetricsTicker(ctx context.Context, manager *observability.MetricsManager) *MetricsTicker
Automatic metrics collection with configurable intervals.
Configuration Reference
Environment Variables
The unified abstraction library uses environment-based configuration:
| Variable | Type | Default | Description |
|---|---|---|---|
| AGENTHUB_BROKER_ADDR | string | localhost | Broker server hostname |
| AGENTHUB_BROKER_PORT | string | 50051 | Broker gRPC port |
| AGENTHUB_GRPC_PORT | string | :50051 | Server listen address |
| BROKER_HEALTH_PORT | string | 8080 | Health endpoint port |
| SERVICE_VERSION | string | 1.0.0 | Service version for observability |
| ENVIRONMENT | string | development | Deployment environment |
Observability Integration
The unified abstraction automatically configures:
- OpenTelemetry Tracing: Automatic span creation and context propagation
- Prometheus Metrics: 47+ built-in metrics for performance monitoring
- Health Checks: Comprehensive health endpoints for service monitoring
- Structured Logging: Correlated logging with trace context
| Metric | Standard gRPC | Unified Abstraction | Overhead |
|---|---|---|---|
| Setup Complexity | 380+ lines | ~29 lines | -92% code |
| Throughput | 10,000+ tasks/sec | 9,500+ tasks/sec | -5% |
| Latency | Baseline | +10ms for tracing | +10ms |
| Memory | Baseline | +50MB per agent | +50MB |
| CPU | Baseline | +5% for observability | +5% |
Migration Guide
From Standard gRPC
Before (Standard gRPC):
// 380+ lines of boilerplate code
lis, err := net.Listen("tcp", ":50051")
server := grpc.NewServer()
// ... extensive setup code
After (Unified Abstraction):
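// ~29 lines in a complete agent; the core calls are shown here
config := agenthub.NewGRPCConfig("my-service")
server, err := agenthub.NewAgentHubServer(config)
if err != nil {
    log.Fatal(err)
}
service := agenthub.NewEventBusService(server)
pb.RegisterEventBusServer(server.Server, service)
if err := server.Start(ctx); err != nil {
    log.Fatal(err)
}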
Observability Benefits
The unified abstraction provides automatic:
- Distributed Tracing: Every request automatically traced
- Metrics Collection: 47+ metrics without configuration
- Health Monitoring: Built-in health and readiness endpoints
- Error Correlation: Automatic error tracking across services
- Performance Monitoring: Latency, throughput, and error rates
Error Handling
The unified abstraction provides comprehensive error handling:
- Automatic Retries: Built-in retry logic for transient failures
- Circuit Breaking: Protection against cascading failures
- Graceful Degradation: Service continues operating during partial failures
- Error Correlation: Distributed error tracking across service boundaries
Best Practices
1. Configuration Management
// Use environment-based configuration
config := agenthub.NewGRPCConfig("my-service")
// Override specific values if needed
config.HealthPort = "8083"
2. Handler Registration
// Register handlers before starting
subscriber.RegisterHandler("task-type", handler)
subscriber.Start(ctx)
3. Graceful Shutdown
// Always implement proper shutdown
defer func() {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    server.Shutdown(ctx)
}()
4. Error Handling
// Use context for cancellation
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
result, err := client.Client.PublishTask(ctx, request)
if err != nil {
    // Error is automatically traced and logged
    return fmt.Errorf("failed to publish task: %w", err)
}
3.3 - Observability
Monitoring, metrics, and observability reference
Observability Reference
This section provides reference documentation for all observability features, including metrics, health endpoints, and monitoring capabilities.
Available Documentation
3.3.1 - AgentHub Health Endpoints Reference
Complete documentation for AgentHub’s health monitoring APIs, endpoint specifications, status codes, and integration patterns.
Overview
Every observable AgentHub service exposes standardized health endpoints for monitoring, load balancing, and operational management.
Standard Endpoints
Health Check Endpoint
/health
Purpose: Comprehensive service health status
Method: GET
Port: Service-specific (8080-8083)
Response Format:
{
  "status": "healthy|degraded|unhealthy",
  "timestamp": "2025-09-28T21:00:00.000Z",
  "service": "agenthub-broker",
  "version": "1.0.0",
  "uptime": "2h34m12s",
  "checks": [
    {
      "name": "self",
      "status": "healthy",
      "message": "Service is running normally",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "1.2ms"
    },
    {
      "name": "database_connection",
      "status": "healthy",
      "message": "Database connection is active",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "15.6ms"
    }
  ]
}
Status Codes:
- 200 OK - All checks healthy
- 503 Service Unavailable - One or more checks unhealthy
- 500 Internal Server Error - Health check system failure
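The mapping from aggregate status to HTTP status code can be sketched as follows. This is a minimal illustration, not AgentHub's implementation: the names `HealthReport`, `CheckResult`, `statusCode`, and `healthHandler` are hypothetical, the struct fields follow the response format shown above, and treating an unknown status as a 500 is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// CheckResult mirrors one entry of the "checks" array (hypothetical type).
type CheckResult struct {
	Name    string `json:"name"`
	Status  string `json:"status"`
	Message string `json:"message"`
}

// HealthReport mirrors a subset of the /health response body (hypothetical type).
type HealthReport struct {
	Status  string        `json:"status"`
	Service string        `json:"service"`
	Checks  []CheckResult `json:"checks"`
}

// statusCode maps the aggregate status to an HTTP status code.
// "degraded" still answers 200; unknown values are treated as a
// health-system failure (an assumption made for this sketch).
func statusCode(status string) int {
	switch status {
	case "healthy", "degraded":
		return http.StatusOK
	case "unhealthy":
		return http.StatusServiceUnavailable
	default:
		return http.StatusInternalServerError
	}
}

// healthHandler serves whatever report the supplied function produces.
func healthHandler(report func() HealthReport) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		rep := report()
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(statusCode(rep.Status))
		_ = json.NewEncoder(w).Encode(rep)
	}
}

func main() {
	_ = healthHandler // registering the handler on an http.ServeMux is omitted
	fmt.Println(statusCode("healthy"), statusCode("unhealthy"))
}
```

Keeping the mapping in a pure function makes it easy to unit-test separately from the HTTP plumbing.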
Readiness Endpoint
/ready
Purpose: Service readiness for traffic acceptance
Method: GET
Response Format:
{
  "ready": true,
  "timestamp": "2025-09-28T21:00:00.000Z",
  "service": "agenthub-broker",
  "dependencies": [
    {
      "name": "grpc_server",
      "ready": true,
      "message": "gRPC server listening on :50051"
    },
    {
      "name": "observability",
      "ready": true,
      "message": "OpenTelemetry initialized"
    }
  ]
}
Status Codes:
- 200 OK - Service ready for traffic
- 503 Service Unavailable - Service not ready
Metrics Endpoint
/metrics
Purpose: Prometheus metrics exposure
Method: GET
Content-Type: text/plain
Response Format:
# HELP events_processed_total Total number of events processed
# TYPE events_processed_total counter
events_processed_total{service="agenthub-broker",event_type="greeting",success="true"} 1234
# HELP system_cpu_usage_percent CPU usage percentage
# TYPE system_cpu_usage_percent gauge
system_cpu_usage_percent{service="agenthub-broker"} 23.4
Status Codes:
- 200 OK - Metrics available
- 500 Internal Server Error - Metrics collection failure
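To make the response format concrete, here is a small sketch that renders sample lines in the Prometheus text exposition format using only the standard library. The helper `formatSample` is hypothetical and handles only simple label values; a real service would normally rely on a Prometheus or OpenTelemetry exporter rather than formatting lines by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample renders one sample line: name{label="value",...} value.
// Label names are sorted so the output is deterministic. %q quoting is
// adequate for plain ASCII label values, which is all this sketch covers.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	if len(pairs) == 0 {
		return fmt.Sprintf("%s %g", name, value)
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println("# TYPE system_cpu_usage_percent gauge")
	fmt.Println(formatSample(
		"system_cpu_usage_percent",
		map[string]string{"service": "agenthub-broker"},
		23.4,
	))
}
```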
Service-Specific Configurations
Broker (Port 8080)
Health Checks:
- self - Basic service health
- grpc_server - gRPC server status
- observability - OpenTelemetry health
Example URLs:
- Health: http://localhost:8080/health
- Ready: http://localhost:8080/ready
- Metrics: http://localhost:8080/metrics
Publisher (Port 8081)
Health Checks:
- self - Basic service health
- broker_connection - Connection to AgentHub broker
- observability - Tracing and metrics health
Example URLs:
- Health: http://localhost:8081/health
- Ready: http://localhost:8081/ready
- Metrics: http://localhost:8081/metrics
Subscriber (Port 8082)
Health Checks:
- self - Basic service health
- broker_connection - Connection to AgentHub broker
- task_processor - Task processing capability
- observability - Observability stack health
Example URLs:
- Health: http://localhost:8082/health
- Ready: http://localhost:8082/ready
- Metrics: http://localhost:8082/metrics
Custom Agents (Port 8083+)
Configurable Health Checks:
- Custom business logic checks
- External dependency checks
- Resource availability checks
Health Check Types
BasicHealthChecker
Purpose: Simple function-based health checks
Implementation:
checker := observability.NewBasicHealthChecker("database", func(ctx context.Context) error {
    // Assumes db is a *sql.DB; PingContext honors the check's deadline.
    return db.PingContext(ctx)
})
healthServer.AddChecker("database", checker)
Use Cases:
- Database connectivity
- File system access
- Configuration validation
- Memory/disk space checks
GRPCHealthChecker
Purpose: gRPC connection health verification
Implementation:
checker := observability.NewGRPCHealthChecker("broker_connection", "localhost:50051")
healthServer.AddChecker("broker_connection", checker)
Use Cases:
- AgentHub broker connectivity
- External gRPC service dependencies
- Service mesh health
HTTPHealthChecker
Purpose: HTTP endpoint health verification
Implementation:
checker := observability.NewHTTPHealthChecker("api_gateway", "http://gateway:8080/health")
healthServer.AddChecker("api_gateway", checker)
Use Cases:
- REST API dependencies
- Web service health
- Load balancer backends
Custom Health Checkers
Interface:
type HealthChecker interface {
    Check(ctx context.Context) error
    Name() string
}
Custom Implementation Example:
type BusinessLogicChecker struct {
    name string
    validator func() error
}
func (c *BusinessLogicChecker) Check(ctx context.Context) error {
    return c.validator()
}
func (c *BusinessLogicChecker) Name() string {
    return c.name
}
// Usage
checker := &BusinessLogicChecker{
    name: "license_validation",
    validator: func() error {
        if time.Now().After(licenseExpiry) {
            return errors.New("license expired")
        }
        return nil
    },
}
Health Check Configuration
Check Intervals
Default Intervals:
- Active checks: Every 30 seconds
- On-demand checks: Per request
- Startup checks: During service initialization
Configurable Timing:
config := observability.HealthConfig{
    CheckInterval: 15 * time.Second,
    Timeout:       5 * time.Second,
    RetryCount:    3,
    RetryDelay:    1 * time.Second,
}
Timeout Configuration
Per-Check Timeouts:
checker := observability.NewBasicHealthChecker("slow_service",
    func(ctx context.Context) error {
        // This check will timeout after 10 seconds
        return slowOperation(ctx)
    }).WithTimeout(10 * time.Second)
Global Timeout:
healthServer := observability.NewHealthServer("8080", "my-service", "1.0.0")
healthServer.SetGlobalTimeout(30 * time.Second)
Integration Patterns
Kubernetes Integration
Liveness Probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
Readiness Probe
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
Startup Probe
startupProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 30
Load Balancer Integration
HAProxy Configuration
backend agenthub_brokers
    balance roundrobin
    option httpchk GET /health
    server broker1 broker1:8080 check
    server broker2 broker2:8080 check
NGINX Configuration
upstream agenthub_backend {
    server broker1:8080;
    server broker2:8080;
}
location /health_check {
    proxy_pass http://agenthub_backend/health;
    proxy_set_header Host $host;
}
Prometheus Integration
Service Discovery
- job_name: 'agenthub-health'
  static_configs:
    - targets:
      - 'broker:8080'
      - 'publisher:8081'
      - 'subscriber:8082'
  metrics_path: '/metrics'
  scrape_interval: 10s
  scrape_timeout: 5s
Health Check Metrics
# Health check status (1=healthy, 0=unhealthy)
health_check_status{service="agenthub-broker",check_name="database"}
# Health check duration
health_check_duration_seconds{service="agenthub-broker",check_name="database"}
# Service uptime
service_uptime_seconds{service="agenthub-broker"}
Status Definitions
Service Status Levels
Healthy
Definition: All health checks passing
HTTP Status: 200 OK
Criteria:
- All registered checks return no error
- Service is fully operational
- All dependencies available
Degraded
Definition: Service operational but with limitations
HTTP Status: 200 OK (with warning indicators)
Criteria:
- Critical checks passing
- Non-critical checks may be failing
- Service can handle requests with reduced functionality
Unhealthy
Definition: Service cannot handle requests properly
HTTP Status: 503 Service Unavailable
Criteria:
- One or more critical checks failing
- Service should not receive new requests
- Requires intervention or automatic recovery
Check-Level Status
Passing
- Check completed successfully
- No errors detected
- Within acceptable parameters
Warning
- Check completed with minor issues
- Service functional but attention needed
- May indicate future problems
Critical
- Check failed
- Service functionality compromised
- Immediate attention required
Monitoring and Alerting
Critical Alerts
# Service down alert
- alert: ServiceHealthCheckFailing
  expr: health_check_status == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Service health check failing"
    description: "{{ $labels.service }} health check {{ $labels.check_name }} is failing"
# Service not ready alert
- alert: ServiceNotReady
  expr: up{job=~"agenthub-.*"} == 0
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "Service not responding"
    description: "{{ $labels.instance }} is not responding to health checks"
Warning Alerts
# Slow health checks
- alert: SlowHealthChecks
  expr: histogram_quantile(0.95, sum by (le, service, check_name) (rate(health_check_duration_seconds_bucket[5m]))) > 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Health checks taking too long"
    description: "{{ $labels.service }} health check {{ $labels.check_name }} taking {{ $value }}s"
# Service degraded
- alert: ServiceDegraded
  expr: service_status == 1  # degraded status
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Service running in degraded mode"
    description: "{{ $labels.service }} is degraded but still operational"
API Response Examples
Healthy Service Response
curl http://localhost:8080/health
{
  "status": "healthy",
  "timestamp": "2025-09-28T21:00:00.000Z",
  "service": "agenthub-broker",
  "version": "1.0.0",
  "uptime": "2h34m12s",
  "checks": [
    {
      "name": "self",
      "status": "healthy",
      "message": "Service is running normally",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "1.2ms"
    },
    {
      "name": "grpc_server",
      "status": "healthy",
      "message": "gRPC server listening on :50051",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "0.8ms"
    },
    {
      "name": "observability",
      "status": "healthy",
      "message": "OpenTelemetry exporter connected",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "12.4ms"
    }
  ]
}
Unhealthy Service Response
curl http://localhost:8080/health
{
  "status": "unhealthy",
  "timestamp": "2025-09-28T21:00:00.000Z",
  "service": "agenthub-broker",
  "version": "1.0.0",
  "uptime": "2h34m12s",
  "checks": [
    {
      "name": "self",
      "status": "healthy",
      "message": "Service is running normally",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "1.2ms"
    },
    {
      "name": "grpc_server",
      "status": "unhealthy",
      "message": "Failed to bind to port :50051: address already in use",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "0.1ms"
    },
    {
      "name": "observability",
      "status": "healthy",
      "message": "OpenTelemetry exporter connected",
      "last_checked": "2025-09-28T21:00:00.000Z",
      "duration": "12.4ms"
    }
  ]
}
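A client can turn a response like the one above into an actionable summary. The sketch below decodes the body and lists the checks that are not healthy; `healthResponse` and `failingChecks` are hypothetical names, and only the fields needed for the summary are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthResponse decodes the subset of the /health body used here.
type healthResponse struct {
	Status string `json:"status"`
	Checks []struct {
		Name    string `json:"name"`
		Status  string `json:"status"`
		Message string `json:"message"`
	} `json:"checks"`
}

// failingChecks returns "name: message" for every check whose status is not "healthy".
func failingChecks(body []byte) ([]string, error) {
	var resp healthResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode health response: %w", err)
	}
	var failing []string
	for _, c := range resp.Checks {
		if c.Status != "healthy" {
			failing = append(failing, c.Name+": "+c.Message)
		}
	}
	return failing, nil
}

func main() {
	body := []byte(`{"status":"unhealthy","checks":[
		{"name":"self","status":"healthy","message":"Service is running normally"},
		{"name":"grpc_server","status":"unhealthy","message":"Failed to bind to port :50051: address already in use"}
	]}`)
	failing, err := failingChecks(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range failing {
		fmt.Println(f)
	}
}
```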
Best Practices
Health Check Design
- Fast Execution: Keep checks under 5 seconds
- Meaningful Tests: Test actual functionality, not just process existence
- Idempotent Operations: Checks should not modify system state
- Appropriate Timeouts: Set reasonable timeouts for external dependencies
- Clear Messages: Provide actionable error messages
Dependency Management
- Critical vs Non-Critical: Distinguish between essential and optional dependencies
- Cascade Prevention: Avoid cascading failures through dependency chains
- Circuit Breakers: Implement circuit breakers for flaky dependencies
- Graceful Degradation: Continue operating when non-critical dependencies fail
Operational Considerations
- Monitoring: Set up alerts for health check failures
- Documentation: Document what each health check validates
- Testing: Test health checks in development and staging
- Versioning: Version health check APIs for compatibility
🎯 Next Steps:
Implementation: Add Observability to Your Agent
Monitoring: Use Grafana Dashboards
Metrics: Observability Metrics Reference
3.3.2 - AgentHub Observability Metrics Reference
Complete catalog of all metrics exposed by AgentHub’s observability system, their meanings, usage patterns, and query examples.
Overview
AgentHub automatically collects 47+ distinct metrics across all observable services, providing comprehensive visibility into event processing, system health, and performance characteristics.
Metric Categories
A2A Message Processing Metrics
a2a_messages_processed_total
Type: Counter
Description: Total number of A2A messages processed by service
Labels:
- service - Service name (agenthub, publisher, subscriber)
- message_type - Type of A2A message (task_update, message, artifact)
- success - Processing success (true/false)
- context_id - A2A conversation context (for workflow tracking)
Usage:
# A2A message processing rate per service
sum by (service) (rate(a2a_messages_processed_total[5m]))
# Success rate (%) by A2A message type
sum by (message_type) (rate(a2a_messages_processed_total{success="true"}[5m])) / sum by (message_type) (rate(a2a_messages_processed_total[5m])) * 100
# Error rate (%) across all A2A services
sum(rate(a2a_messages_processed_total{success="false"}[5m])) / sum(rate(a2a_messages_processed_total[5m])) * 100
# Workflow processing rate by context
sum by (context_id) (rate(a2a_messages_processed_total[5m]))
a2a_messages_published_total
Type: Counter
Description: Total number of A2A messages published by agents
Labels:
- message_type - Type of A2A message published
- from_agent_id - Publishing agent identifier
- to_agent_id - Target agent identifier (empty for broadcast)
Usage:
# A2A publishing rate by message type
sum by (message_type) (rate(a2a_messages_published_total[5m]))
# Most active A2A publishers
topk(5, sum by (from_agent_id) (rate(a2a_messages_published_total[5m])))
# Broadcast share of all published messages
sum(rate(a2a_messages_published_total{to_agent_id=""}[5m])) / sum(rate(a2a_messages_published_total[5m]))
a2a_message_processing_duration_seconds
Type: Histogram
Description: Time taken to process A2A messages
Labels:
- service - Service processing the message
- message_type - Type of A2A message being processed
- task_state - Current A2A task state (for task-related messages)
Buckets: 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10
Usage:
# p95 A2A message processing latency
histogram_quantile(0.95, sum by (le) (rate(a2a_message_processing_duration_seconds_bucket[5m])))
# p99 latency by service
histogram_quantile(0.99, sum by (le, service) (rate(a2a_message_processing_duration_seconds_bucket[5m])))
# Average A2A processing time by task state
sum by (task_state) (rate(a2a_message_processing_duration_seconds_sum[5m])) / sum by (task_state) (rate(a2a_message_processing_duration_seconds_count[5m]))
a2a_message_errors_total
Type: Counter
Description: Total number of A2A message processing errors
Labels:
- service - Service where error occurred
- message_type - Type of A2A message that failed
- error_type - Category of error (grpc_error, validation_error, protocol_error, etc.)
- a2a_version - A2A protocol version for compatibility tracking
Usage:
# A2A error rate by error type
sum by (error_type) (rate(a2a_message_errors_total[5m]))
# Services with highest A2A error rates
topk(3, sum by (service) (rate(a2a_message_errors_total[5m])))
# A2A protocol version compatibility issues
sum by (a2a_version) (rate(a2a_message_errors_total{error_type="protocol_error"}[5m]))
AgentHub Broker Metrics
agenthub_connections_total
Type: Gauge
Description: Number of active agent connections to AgentHub broker
Labels:
- connection_type - Type of connection (a2a_publisher, a2a_subscriber, unified)
- agent_type - Classification of connected agent
Usage:
# Current AgentHub connection count
agenthub_connections_total
# A2A connection growth over the last hour (delta, because this is a gauge)
delta(agenthub_connections_total[1h])
# Connection distribution by type
sum by (connection_type) (agenthub_connections_total)
agenthub_subscriptions_total
Type: Gauge
Description: Number of active A2A message subscriptions
Labels:
- agent_id - Subscriber agent identifier
- subscription_type - Type of A2A subscription (tasks, messages, agent_events)
- filter_criteria - Applied subscription filters (task_types, states, etc.)
Usage:
# Total active A2A subscriptions
sum(agenthub_subscriptions_total)
# A2A subscriptions by agent
sum(agenthub_subscriptions_total) by (agent_id)
# Most popular A2A subscription types
sum(agenthub_subscriptions_total) by (subscription_type)
# Filtered vs unfiltered subscriptions
sum(agenthub_subscriptions_total{filter_criteria!=""}) / sum(agenthub_subscriptions_total)
agenthub_message_routing_duration_seconds
Type: Histogram
Description: Time taken to route A2A messages through AgentHub broker
Labels:
- routing_type - Type of routing (direct, broadcast, filtered)
- message_size_bucket - Message size classification (small, medium, large)
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1
Usage:
# AgentHub A2A routing latency percentiles
histogram_quantile(0.95, sum by (le) (rate(agenthub_message_routing_duration_seconds_bucket[5m])))
# A2A routing performance by type
sum by (routing_type) (rate(agenthub_message_routing_duration_seconds_sum[5m])) / sum by (routing_type) (rate(agenthub_message_routing_duration_seconds_count[5m]))
# Message size impact on routing
histogram_quantile(0.95, sum by (le, message_size_bucket) (rate(agenthub_message_routing_duration_seconds_bucket[5m])))
agenthub_queue_size
Type: Gauge
Description: Number of A2A messages queued awaiting routing
Labels:
- queue_type - Type of queue (incoming, outgoing, dead_letter, retry)
- priority - Message priority level
- context_active - Whether messages belong to active A2A contexts
Usage:
# Current A2A queue sizes
sum by (queue_type) (agenthub_queue_size)
# A2A queue growth rate (deriv, because this is a gauge)
deriv(agenthub_queue_size[5m])
# Priority queue distribution
sum by (priority) (agenthub_queue_size)
# Active context message backlog
agenthub_queue_size{context_active="true"}
System Health Metrics
system_cpu_usage_percent
Type: Gauge
Description: CPU utilization percentage
Labels:
- service - Service name
Usage:
# Current CPU usage
system_cpu_usage_percent
# High CPU services
system_cpu_usage_percent > 80
# Average CPU over time
avg_over_time(system_cpu_usage_percent[1h])
system_memory_usage_bytes
Type: Gauge
Description: Memory usage in bytes
Labels:
- service - Service name
- type - Memory type (heap, stack, total)
Usage:
# Memory usage in MB
system_memory_usage_bytes / 1024 / 1024
# Memory growth rate in bytes per second (deriv, because this is a gauge)
deriv(system_memory_usage_bytes[10m])
# Memory usage by type
sum by (type) (system_memory_usage_bytes)
system_goroutines_total
Type: Gauge
Description: Number of active goroutines
Labels:
Usage:
# Current goroutine count
system_goroutines_total
# Goroutine leak detection (delta, because this is a gauge)
delta(system_goroutines_total[1h]) > 1000
# Goroutine efficiency
system_goroutines_total / system_cpu_usage_percent
system_file_descriptors_used
Type: Gauge
Description: Number of open file descriptors
Labels:
Usage:
# Current FD usage
system_file_descriptors_used
# FD growth rate (deriv, because this is a gauge)
deriv(system_file_descriptors_used[5m])
A2A Task-Specific Metrics
a2a_tasks_created_total
Type: Counter
Description: Total number of A2A tasks created
Labels:
- task_type - Type classification of the task
- context_id - A2A conversation context
- priority - Task priority level
Usage:
# A2A task creation rate
sum(rate(a2a_tasks_created_total[5m]))
# Task creation by type
sum by (task_type) (rate(a2a_tasks_created_total[5m]))
# High priority task rate
sum(rate(a2a_tasks_created_total{priority="PRIORITY_HIGH"}[5m]))
a2a_task_state_transitions_total
Type: Counter
Description: Total number of A2A task state transitions
Labels:
- from_state - Previous task state
- to_state - New task state
- task_type - Type of task transitioning
Usage:
# Task completion rate
sum(rate(a2a_task_state_transitions_total{to_state="TASK_STATE_COMPLETED"}[5m]))
# Task failure rate
sum(rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m]))
# Task state transition patterns
sum by (from_state, to_state) (rate(a2a_task_state_transitions_total[5m]))
a2a_task_duration_seconds
Type: Histogram
Description: Duration of A2A task execution from submission to completion
Labels:
- task_type - Type of task
- final_state - Final task state (COMPLETED, FAILED, CANCELLED)
Buckets: 0.1, 0.5, 1, 5, 10, 30, 60, 300, 600, 1800
Usage:
# A2A task completion time percentiles
histogram_quantile(0.95, sum by (le) (rate(a2a_task_duration_seconds_bucket{final_state="TASK_STATE_COMPLETED"}[5m])))
# Task duration by type
histogram_quantile(0.50, sum by (le, task_type) (rate(a2a_task_duration_seconds_bucket[5m])))
# Failed vs successful task duration comparison
histogram_quantile(0.95, sum by (le, final_state) (rate(a2a_task_duration_seconds_bucket[5m])))
a2a_artifacts_produced_total
Type: Counter
Description: Total number of A2A artifacts produced by completed tasks
Labels:
- artifact_type - Type of artifact (data, file, text)
- task_type - Type of task that produced the artifact
- artifact_size_bucket - Size classification of artifact
Usage:
# Artifact production rate
sum(rate(a2a_artifacts_produced_total[5m]))
# Artifacts by type
sum by (artifact_type) (rate(a2a_artifacts_produced_total[5m]))
# Large artifact production rate
sum(rate(a2a_artifacts_produced_total{artifact_size_bucket="large"}[5m]))
gRPC Metrics
grpc_server_started_total
Type: Counter
Description: Total number of RPCs started on the AgentHub server
Labels:
- grpc_method - gRPC method name (PublishMessage, SubscribeToTasks, etc.)
- grpc_service - gRPC service name (AgentHub)
Usage:
# AgentHub RPC request rate
sum(rate(grpc_server_started_total[5m]))
# Most called A2A methods
topk(5, sum by (grpc_method) (rate(grpc_server_started_total[5m])))
# Request rate of message- and task-related methods
sum(rate(grpc_server_started_total{grpc_method=~".*Message.*|.*Task.*"}[5m]))
grpc_server_handled_total
Type: Counter
Description: Total number of RPCs completed on the AgentHub server
Labels:
- grpc_method - gRPC method name
- grpc_service - gRPC service name (AgentHub)
- grpc_code - gRPC status code
- a2a_operation - A2A operation type (publish, subscribe, get, cancel)
Usage:
# AgentHub RPC success rate (%)
sum(rate(grpc_server_handled_total{grpc_code="OK"}[5m])) / sum(rate(grpc_server_handled_total[5m])) * 100
# A2A operation error rate
sum by (a2a_operation) (rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
# A2A method-specific success rates
sum by (grpc_method) (rate(grpc_server_handled_total{grpc_code="OK"}[5m])) / sum by (grpc_method) (rate(grpc_server_handled_total[5m]))
grpc_server_handling_seconds
Type: Histogram
Description: Histogram of response latency of AgentHub RPCs
Labels:
- grpc_method - gRPC method name
- grpc_service - gRPC service name (AgentHub)
- a2a_operation - A2A operation type
Usage:
# AgentHub gRPC latency percentiles
histogram_quantile(0.95, sum by (le) (rate(grpc_server_handling_seconds_bucket[5m])))
# Slow A2A operations
histogram_quantile(0.95, sum by (le, a2a_operation) (rate(grpc_server_handling_seconds_bucket[5m]))) > 0.1
# A2A method performance comparison
histogram_quantile(0.95, sum by (le, grpc_method) (rate(grpc_server_handling_seconds_bucket[5m])))
Health Check Metrics
health_check_status
Type: Gauge
Description: Health check status (1=healthy, 0=unhealthy)
Labels:
- service - Service name
- check_name - Name of the health check
- endpoint - Health check endpoint
Usage:
# Unhealthy services
health_check_status == 0
# Health check success rate
avg_over_time(health_check_status[5m])
health_check_duration_seconds
Type: Histogram
Description: Time taken to execute health checks
Labels:
- service - Service name
- check_name - Name of the health check
Usage:
# Health check latency
histogram_quantile(0.95, sum by (le) (rate(health_check_duration_seconds_bucket[5m])))
# Slow health checks
histogram_quantile(0.95, sum by (le, check_name) (rate(health_check_duration_seconds_bucket[5m]))) > 0.5
OpenTelemetry Metrics
otelcol_processor_batch_batch_send_size_count
Type: Counter
Description: Number of batches sent by OTEL collector
Labels: None
otelcol_exporter_sent_spans
Type: Counter
Description: Number of spans sent to tracing backend
Labels:
- exporter - Exporter name (jaeger, otlp)
Usage:
# Span export rate
sum(rate(otelcol_exporter_sent_spans[5m]))
# Export rate by backend
sum by (exporter) (rate(otelcol_exporter_sent_spans[5m]))
Common Query Patterns
# Top 5 slowest A2A message types
topk(5,
  histogram_quantile(0.95,
    sum by (le, message_type) (rate(a2a_message_processing_duration_seconds_bucket[5m]))
  )
)
# A2A task completion time analysis
histogram_quantile(0.95,
  sum by (le, task_type) (rate(a2a_task_duration_seconds_bucket{final_state="TASK_STATE_COMPLETED"}[5m]))
)
# Services exceeding A2A latency SLA (>500ms p95)
histogram_quantile(0.95,
  sum by (le, service) (rate(a2a_message_processing_duration_seconds_bucket[5m]))
) > 0.5
# A2A throughput efficiency (messages per CPU percent)
sum by (service) (rate(a2a_messages_processed_total[5m])) / sum by (service) (system_cpu_usage_percent)
# Task success rate by type
sum by (task_type) (rate(a2a_task_state_transitions_total{to_state="TASK_STATE_COMPLETED"}[5m])) /
sum by (task_type) (rate(a2a_tasks_created_total[5m]))
A2A Error Analysis
# A2A message error rate (%) by service over time
sum by (service) (rate(a2a_message_errors_total[5m])) / sum by (service) (rate(a2a_messages_processed_total[5m])) * 100
# A2A task failure rate (%)
sum(rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m])) /
sum(rate(a2a_tasks_created_total[5m])) * 100
# Most common A2A error types
topk(5, sum by (error_type) (rate(a2a_message_errors_total[5m])))
# A2A protocol compatibility issues
sum by (a2a_version) (rate(a2a_message_errors_total{error_type="protocol_error"}[5m]))
# Services with more than 10 A2A errors in the last hour
sum by (service) (increase(a2a_message_errors_total[1h])) > 10
A2A Capacity Planning
# Peak hourly A2A message throughput
max_over_time(
  rate(a2a_messages_processed_total[5m])[1h:]
) * 3600
# Peak A2A task creation rate
max_over_time(
  rate(a2a_tasks_created_total[5m])[1h:]
) * 3600
# Peak CPU utilization during A2A load (percent)
max by (service) (max_over_time(system_cpu_usage_percent[1h]))
# Peak memory utilization during A2A load (GiB)
max by (service) (max_over_time(system_memory_usage_bytes[1h])) / 1024 / 1024 / 1024
# AgentHub connection scaling needs
max_over_time(agenthub_connections_total[24h])
# A2A queue depth trends
max by (queue_type) (max_over_time(agenthub_queue_size[24h]))
A2A System Health
# Overall A2A system health score (0-1)
avg(health_check_status)
# A2A services with degraded performance
(
  system_cpu_usage_percent > 70 or
  system_memory_usage_bytes > 1e9 or
  sum by (service) (rate(a2a_message_errors_total[5m])) / sum by (service) (rate(a2a_messages_processed_total[5m])) > 0.05
)
# A2A task backlog health
agenthub_queue_size{queue_type="incoming"} > 1000
# A2A protocol health indicators
sum(rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m])) /
sum(rate(a2a_tasks_created_total[5m])) > 0.1
# Resource leak detection (delta, because these are gauges)
delta(system_goroutines_total[1h]) > 1000 or
delta(system_file_descriptors_used[1h]) > 100
Alert Rule Examples
Critical A2A Alerts
# High A2A message processing error rate alert
- alert: HighA2AMessageProcessingErrorRate
  expr: |
    (
      sum by (service) (rate(a2a_message_errors_total[5m])) /
      sum by (service) (rate(a2a_messages_processed_total[5m]))
    ) * 100 > 10
  for: 2m
  annotations:
    summary: "High A2A message processing error rate"
    description: "{{ $labels.service }} has {{ $value }}% A2A error rate"
# High A2A task failure rate alert
- alert: HighA2ATaskFailureRate
  expr: |
    (
      sum by (task_type) (rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m])) /
      sum by (task_type) (rate(a2a_tasks_created_total[5m]))
    ) * 100 > 15
  for: 3m
  annotations:
    summary: "High A2A task failure rate"
    description: "{{ $value }}% of A2A tasks are failing for task type {{ $labels.task_type }}"
# AgentHub service down alert
- alert: AgentHubServiceDown
  expr: health_check_status == 0
  for: 1m
  annotations:
    summary: "AgentHub service health check failing"
    description: "{{ $labels.service }} health check {{ $labels.check_name }} is failing"
# A2A queue backlog alert
- alert: A2AQueueBacklog
  expr: agenthub_queue_size{queue_type="incoming"} > 1000
  for: 5m
  annotations:
    summary: "A2A message queue backlog"
    description: "AgentHub has {{ $value }} messages queued"
A2A Warning Alerts
# High A2A message processing latency warning
- alert: HighA2AMessageProcessingLatency
  expr: |
    histogram_quantile(0.95,
      rate(a2a_message_processing_duration_seconds_bucket[5m])
    ) > 0.5    
  for: 5m
  annotations:
    summary: "High A2A message processing latency"
    description: "{{ $labels.service }} A2A p95 latency is {{ $value }}s"
# Slow A2A task completion warning
- alert: SlowA2ATaskCompletion
  expr: |
    histogram_quantile(0.95,
      rate(a2a_task_duration_seconds_bucket{final_state="TASK_STATE_COMPLETED"}[5m])
    ) > 300    
  for: 10m
  annotations:
    summary: "Slow A2A task completion"
    description: "A2A tasks of type {{ $labels.task_type }} taking {{ $value }}s to complete"
# High CPU usage warning
- alert: HighCPUUsage
  expr: system_cpu_usage_percent > 80
  for: 5m
  annotations:
    summary: "High CPU usage"
    description: "{{ $labels.service }} CPU usage is {{ $value }}%"
# A2A protocol version compatibility warning
- alert: A2AProtocolVersionMismatch
  expr: |
    rate(a2a_message_errors_total{error_type="protocol_error"}[5m]) > 0.1    
  for: 3m
  annotations:
    summary: "A2A protocol version compatibility issues"
    description: "A2A protocol errors detected for version {{ $labels.a2a_version }}"
Metric Retention and Storage
Retention Policies
- Raw metrics: 15 days at 15-second resolution
- 5m averages: 60 days
- 1h averages: 1 year
- 1d averages: 5 years
Storage Requirements
- Per service: ~2MB/day for all metrics
- Complete system: ~10MB/day for 5 services
- 1 year retention: ~3.6GB total
- Scrape interval: 10 seconds (configurable)
- Evaluation interval: 15 seconds for alerts
- Query timeout: 30 seconds
- Max samples: 50M per query
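The yearly figure follows from the per-service estimate by simple multiplication, as the sketch below shows. The helper `estimateStorageMB` is hypothetical, and the inputs are the approximate numbers quoted above; actual usage depends on series count and compression.

```go
package main

import "fmt"

// estimateStorageMB projects storage in MB for a number of services over a
// retention period, given an approximate per-service daily volume.
func estimateStorageMB(perServiceMBPerDay float64, services, days int) float64 {
	return perServiceMBPerDay * float64(services) * float64(days)
}

func main() {
	mb := estimateStorageMB(2, 5, 365) // ~2MB/day per service, 5 services, 1 year
	fmt.Printf("%.0f MB (~%.2f GB)\n", mb, mb/1000)
}
```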
Integration Examples
Grafana Dashboard Variables
{
  "service": {
    "query": "label_values(a2a_messages_processed_total, service)",
    "refresh": "on_time_range_changed"
  },
  "message_type": {
    "query": "label_values(a2a_messages_processed_total{service=\"$service\"}, message_type)",
    "refresh": "on_dashboard_load"
  },
  "task_type": {
    "query": "label_values(a2a_tasks_created_total, task_type)",
    "refresh": "on_dashboard_load"
  },
  "context_id": {
    "query": "label_values(a2a_messages_processed_total{service=\"$service\"}, context_id)",
    "refresh": "on_dashboard_load"
  }
}
Custom A2A Application Metrics
// Assumes meter is an OpenTelemetry metric.Meter, that the fmt,
// go.opentelemetry.io/otel/attribute and go.opentelemetry.io/otel/metric
// packages are imported, and that the enclosing function returns an error.

// Register custom A2A counter
a2aCustomCounter, err := meter.Int64Counter(
    "a2a_custom_business_metric_total",
    metric.WithDescription("Custom A2A business metric"),
)
if err != nil {
    return fmt.Errorf("failed to create counter: %w", err)
}
// Increment with A2A context and labels
a2aCustomCounter.Add(ctx, 1, metric.WithAttributes(
    attribute.String("task_type", "custom_analysis"),
    attribute.String("context_id", contextID),
    attribute.String("agent_type", "analytics_agent"),
    attribute.String("a2a_version", "1.0"),
))
// Register A2A task-specific histogram
a2aTaskHistogram, err := meter.Float64Histogram(
    "a2a_custom_task_processing_seconds",
    metric.WithDescription("Custom A2A task processing time"),
    metric.WithUnit("s"),
)
if err != nil {
    return fmt.Errorf("failed to create histogram: %w", err)
}
// Record A2A task timing
start := time.Now()
// ... process A2A task ...
duration := time.Since(start).Seconds()
a2aTaskHistogram.Record(ctx, duration, metric.WithAttributes(
    attribute.String("task_type", taskType),
    attribute.String("task_state", "TASK_STATE_COMPLETED"),
))
Troubleshooting Metrics
Missing Metrics Checklist
- ✅ Service built with -tags observability
- ✅ Prometheus can reach metrics endpoint
- ✅ Correct port in Prometheus config
- ✅ Service is actually processing events
- ✅ OpenTelemetry exporter configured correctly
High Cardinality Warning
Avoid metrics with unbounded label values:
- ❌ User IDs as labels (millions of values)
- ❌ Timestamps as labels
- ❌ Request IDs as labels
- ✅ Event types (limited set)
- ✅ Service names (limited set)
- ✅ Status codes (limited set)
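One way to keep a label bounded is to map every value outside a known set to a single fallback value before recording the metric. The following is a minimal sketch of that idea; `allowedEventTypes` and `boundedLabel` are hypothetical names, and the allowed values are only examples taken from event types used elsewhere in this reference.

```go
package main

import "fmt"

// allowedEventTypes is a hypothetical, fixed set of acceptable label values.
var allowedEventTypes = map[string]struct{}{
	"task.status_update": {},
	"task.artifact":      {},
}

// boundedLabel returns value if it belongs to the allowed set and "other"
// otherwise, so the label can never take more than len(allowed)+1 values.
func boundedLabel(value string, allowed map[string]struct{}) string {
	if _, ok := allowed[value]; ok {
		return value
	}
	return "other"
}

func main() {
	fmt.Println(boundedLabel("task.artifact", allowedEventTypes)) // known value is kept
	fmt.Println(boundedLabel("req-7f3a9c", allowedEventTypes))    // unbounded value is collapsed
}
```

The result of `boundedLabel` can then be passed as the attribute value, which keeps the number of time series per metric predictable.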
🎯 Next Steps:
Implementation: Add Observability to Your Agent
Monitoring: Use Grafana Dashboards
Understanding: Distributed Tracing Explained
3.4 - Tasks
Task message specifications and reference
Tasks Reference
This section provides detailed reference documentation for task messages, specifications, and data structures used throughout AgentHub.
Available Documentation
- Task Reference - Detailed task message specifications and data structures
3.4.1 - A2A Task Reference
Comprehensive reference for all task-related message types and operations in the Agent2Agent protocol implementation.
A2A Task Reference
This document provides a comprehensive reference for all task-related message types and operations in the Agent2Agent (A2A) protocol implementation within AgentHub’s hybrid Event-Driven Architecture.
Core A2A Task Types
A2A Task
The primary message type for managing work requests between agents in the Agent2Agent protocol.
message Task {
  string id = 1;                    // Required: Task identifier
  string context_id = 2;            // Optional: Conversation context
  TaskStatus status = 3;            // Required: Current task status
  repeated Message history = 4;     // Message history for this task
  repeated Artifact artifacts = 5;  // Task output artifacts
  google.protobuf.Struct metadata = 6; // Task metadata
}
Field Reference
| Field | Type | Required | Description | 
|---|---|---|---|
| id | string | Yes | Globally unique identifier for the task | 
| context_id | string | No | Groups related tasks in a workflow or conversation | 
| status | TaskStatus | Yes | Current execution state and last update | 
| history | Message[] | No | Complete message history for this task | 
| artifacts | Artifact[] | No | Output artifacts produced by the task | 
| metadata | Struct | No | Additional context information | 
Task IDs should be globally unique and meaningful for debugging:
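// Recommended formats (pick one):
taskID := fmt.Sprintf("task_%s_%s", taskType, uuid.New().String())    // Unique by construction
// taskID := fmt.Sprintf("task_%s_%d", taskType, time.Now().Unix())   // Collides if two tasks of one type start in the same second
// taskID := fmt.Sprintf("%s_%s_%d", requesterID, taskType, sequence) // Unique while sequence is never reused per requester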
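If pulling in a UUID library is not desirable, a random suffix from the standard library serves the same purpose. This is a sketch under that assumption; `newTaskID` is a hypothetical helper, not an AgentHub function.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newTaskID builds an ID of the form "task_<type>_<32 hex characters>".
// The 16 random bytes make collisions practically impossible, while the
// task type prefix keeps the ID readable in logs.
func newTaskID(taskType string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return fmt.Sprintf("task_%s_%s", taskType, hex.EncodeToString(buf)), nil
}

func main() {
	id, err := newTaskID("analysis")
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```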
A2A TaskStatus
Represents the current state and latest update for a task.
message TaskStatus {
  TaskState state = 1;              // Current task state
  Message update = 2;               // Status update message
  google.protobuf.Timestamp timestamp = 3; // Status timestamp
}
Field Reference
| Field | Type | Required | Description | 
|---|---|---|---|
| state | TaskState | Yes | Current execution state | 
| update | Message | No | Latest status message from the executing agent | 
| timestamp | Timestamp | Yes | When this status was last updated | 
A2A Message
Agent-to-agent communication within task context.
message Message {
  string message_id = 1;       // Required: Unique message identifier
  string context_id = 2;       // Optional: Conversation context
  string task_id = 3;          // Optional: Associated task
  Role role = 4;               // Required: USER or AGENT
  repeated Part content = 5;   // Required: Message content parts
  google.protobuf.Struct metadata = 6; // Optional: Additional metadata
  repeated string extensions = 7;       // Optional: Protocol extensions
}
Message Content Parts
Messages contain structured content using A2A Part definitions:
message Part {
  oneof part {
    string text = 1;           // Text content
    DataPart data = 2;         // Structured data
    FilePart file = 3;         // File reference
  }
}
message DataPart {
  google.protobuf.Struct data = 1;    // Structured data content
  string description = 2;             // Optional data description
}
message FilePart {
  string file_id = 1;                 // File identifier or URI
  string filename = 2;                // Original filename
  string mime_type = 3;               // MIME type
  int64 size_bytes = 4;               // File size in bytes
  google.protobuf.Struct metadata = 5; // Additional file metadata
}
A2A Artifact
Structured output produced by completed tasks.
message Artifact {
  string artifact_id = 1;           // Required: Artifact identifier
  string name = 2;                  // Human-readable name
  string description = 3;           // Artifact description
  repeated Part parts = 4;          // Artifact content parts
  google.protobuf.Struct metadata = 5; // Artifact metadata
}
Field Reference
| Field | Type | Required | Description | 
|---|---|---|---|
| artifact_id | string | Yes | Unique identifier for this artifact | 
| name | string | No | Human-readable artifact name | 
| description | string | No | Description of the artifact contents | 
| parts | Part[] | Yes | Structured content using A2A Part format | 
| metadata | Struct | No | Additional artifact information | 
Enumerations
TaskState
Current state of A2A task execution.
enum TaskState {
  TASK_STATE_SUBMITTED = 0;    // Task created and submitted
  TASK_STATE_WORKING = 1;      // Task in progress
  TASK_STATE_COMPLETED = 2;    // Task completed successfully
  TASK_STATE_FAILED = 3;       // Task failed with error
  TASK_STATE_CANCELLED = 4;    // Task cancelled
}
State Transition Rules
Valid state transitions:
TASK_STATE_SUBMITTED → TASK_STATE_WORKING → TASK_STATE_COMPLETED
TASK_STATE_SUBMITTED → TASK_STATE_WORKING → TASK_STATE_FAILED
TASK_STATE_SUBMITTED → TASK_STATE_WORKING → TASK_STATE_CANCELLED
TASK_STATE_SUBMITTED → TASK_STATE_CANCELLED (before execution starts)
Invalid transitions:
- Any state → TASK_STATE_SUBMITTED
- TASK_STATE_COMPLETED → any other state
- TASK_STATE_FAILED → any other state (except for retry scenarios)
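The transition rules can be encoded as a small lookup table. The sketch below uses a local `TaskState` type rather than the generated protobuf enum so that it runs on its own. It makes two assumptions that are not stated in this reference: retry scenarios are left out, and TASK_STATE_CANCELLED is treated as terminal because no transition out of it is listed.

```go
package main

import "fmt"

// TaskState mirrors the TaskState enum; the Go names are local to this sketch.
type TaskState int

const (
	Submitted TaskState = iota
	Working
	Completed
	Failed
	Cancelled
)

// validTransitions lists the allowed next states for each state.
// States without an entry (Completed, Failed, Cancelled) are terminal here.
var validTransitions = map[TaskState][]TaskState{
	Submitted: {Working, Cancelled},
	Working:   {Completed, Failed, Cancelled},
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to TaskState) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Submitted, Working))  // allowed
	fmt.Println(canTransition(Working, Completed))  // allowed
	fmt.Println(canTransition(Completed, Working))  // rejected: Completed is terminal
	fmt.Println(canTransition(Working, Submitted))  // rejected: nothing returns to Submitted
}
```

An agent could call such a check before publishing a status update, so that an invalid update is rejected locally instead of reaching subscribers.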
Role
Identifies the role of the message sender in A2A communication.
enum Role {
  USER = 0;    // Message from requesting agent
  AGENT = 1;   // Message from responding agent
}
Priority
Task priority levels for scheduling and resource allocation.
enum Priority {
  PRIORITY_UNSPECIFIED = 0;  // Default value, treated as MEDIUM
  PRIORITY_LOW = 1;          // Low priority, can be delayed
  PRIORITY_MEDIUM = 2;       // Normal priority
  PRIORITY_HIGH = 3;         // High priority, expedited processing
  PRIORITY_CRITICAL = 4;     // Critical priority, immediate processing
}
Priority Usage Guidelines
| Priority | Use Cases | SLA Expectations | 
|---|---|---|
| LOW | Background jobs, cleanup tasks, analytics | Hours to days | 
| MEDIUM | Standard user requests, routine processing | Minutes to hours | 
| HIGH | User-visible operations, time-sensitive tasks | Seconds to minutes | 
| CRITICAL | Emergency operations, system health tasks | Immediate | 
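This reference does not specify how priorities are enforced. As one possible approach, an agent could order its local backlog with a priority queue, as sketched below with `container/heap`. The `queuedTask`, `taskQueue`, `effective` and `drainOrder` names are hypothetical, and the only rule taken from the enum above is that PRIORITY_UNSPECIFIED is treated as MEDIUM.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Priority mirrors the Priority enum; the Go names are local to this sketch.
type Priority int

const (
	PriorityUnspecified Priority = iota
	PriorityLow
	PriorityMedium
	PriorityHigh
	PriorityCritical
)

// effective maps UNSPECIFIED to MEDIUM, as documented for the enum.
func effective(p Priority) Priority {
	if p == PriorityUnspecified {
		return PriorityMedium
	}
	return p
}

type queuedTask struct {
	id       string
	priority Priority
}

// taskQueue implements heap.Interface with the highest priority first.
type taskQueue []queuedTask

func (q taskQueue) Len() int { return len(q) }
func (q taskQueue) Less(i, j int) bool {
	return effective(q[i].priority) > effective(q[j].priority)
}
func (q taskQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *taskQueue) Push(x any)   { *q = append(*q, x.(queuedTask)) }
func (q *taskQueue) Pop() any {
	old := *q
	n := len(old)
	item := old[n-1]
	*q = old[:n-1]
	return item
}

// drainOrder queues all tasks and returns their IDs in processing order.
func drainOrder(tasks []queuedTask) []string {
	q := &taskQueue{}
	for _, t := range tasks {
		heap.Push(q, t)
	}
	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(queuedTask).id)
	}
	return order
}

func main() {
	order := drainOrder([]queuedTask{
		{"cleanup", PriorityLow},
		{"health-check", PriorityCritical},
		{"report", PriorityUnspecified},
	})
	fmt.Println(order) // critical task first, unspecified handled as medium, low last
}
```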
AgentHub EDA Request/Response Messages
Task Publishing
PublishTaskUpdateRequest
Request to publish a task status update through the EDA broker.
message PublishTaskUpdateRequest {
  a2a.Task task = 1;                      // Updated A2A task
  AgentEventMetadata routing = 2;         // EDA routing metadata
}
PublishTaskArtifactRequest
Request to publish a task artifact through the EDA broker.
message PublishTaskArtifactRequest {
  string task_id = 1;                     // Associated task ID
  a2a.Artifact artifact = 2;              // A2A artifact
  AgentEventMetadata routing = 3;         // EDA routing metadata
}
Task Subscription
SubscribeToTasksRequest
Request to subscribe to A2A task events through the EDA broker.
message SubscribeToTasksRequest {
  string agent_id = 1;                    // Agent ID for subscription
  repeated string task_types = 2;         // Optional task type filter
  repeated a2a.TaskState states = 3;      // Optional state filter
}
Usage Examples
// Subscribe to all tasks for this agent
req := &pb.SubscribeToTasksRequest{
    AgentId: "data_processor_01",
}
// Subscribe only to working and completed tasks
filteredReq := &pb.SubscribeToTasksRequest{
    AgentId: "workflow_orchestrator",
    States: []a2a.TaskState{
        a2a.TaskState_TASK_STATE_WORKING,
        a2a.TaskState_TASK_STATE_COMPLETED,
    },
}
Task Management
GetTaskRequest
Request to retrieve the current state of an A2A task.
message GetTaskRequest {
  string task_id = 1;                     // Task identifier
  int32 history_length = 2;               // History limit (optional)
}
CancelTaskRequest
Request to cancel an active A2A task.
message CancelTaskRequest {
  string task_id = 1;                     // Task to cancel
  string reason = 2;                      // Cancellation reason
}
ListTasksRequest
Request to list A2A tasks matching criteria.
message ListTasksRequest {
  string agent_id = 1;                    // Filter by agent
  repeated a2a.TaskState states = 2;      // Filter by states
  google.protobuf.Timestamp since = 3;    // Filter by timestamp
  int32 limit = 4;                        // Results limit
}
gRPC Service Methods
Task Publishing Methods
PublishTaskUpdate
Publishes a task status update to the EDA broker.
rpc PublishTaskUpdate (PublishTaskUpdateRequest) returns (PublishResponse);
Example:
// Create updated task status
status := &a2a.TaskStatus{
    State: a2a.TaskState_TASK_STATE_WORKING,
    Update: &a2a.Message{
        MessageId: "msg_" + uuid.New().String(),
        TaskId:    taskID,
        Role:      a2a.Role_AGENT,
        Content: []*a2a.Part{
            {
                Part: &a2a.Part_Text{
                    Text: "Processing data analysis...",
                },
            },
        },
    },
    Timestamp: timestamppb.Now(),
}
task := &a2a.Task{
    Id:     taskID,
    Status: status,
}
req := &pb.PublishTaskUpdateRequest{
    Task: task,
    Routing: &pb.AgentEventMetadata{
        FromAgentId: "processor_01",
        EventType:   "task.status_update",
    },
}
res, err := client.PublishTaskUpdate(ctx, req)
PublishTaskArtifact
Publishes a task artifact to the EDA broker.
rpc PublishTaskArtifact (PublishTaskArtifactRequest) returns (PublishResponse);
Example:
// Create artifact with results
artifact := &a2a.Artifact{
    ArtifactId:  "artifact_" + uuid.New().String(),
    Name:        "Analysis Results",
    Description: "Statistical analysis of sales data",
    Parts: []*a2a.Part{
        {
            Part: &a2a.Part_Data{
                Data: &a2a.DataPart{
                    Data: structData, // Contains analysis results
                    Description: "Sales analysis summary statistics",
                },
            },
        },
        {
            Part: &a2a.Part_File{
                File: &a2a.FilePart{
                    FileId:   "file_123",
                    Filename: "analysis_report.pdf",
                    MimeType: "application/pdf",
                    SizeBytes: 1024576,
                },
            },
        },
    },
}
req := &pb.PublishTaskArtifactRequest{
    TaskId:   taskID,
    Artifact: artifact,
    Routing: &pb.AgentEventMetadata{
        FromAgentId: "processor_01",
        EventType:   "task.artifact",
    },
}
res, err := client.PublishTaskArtifact(ctx, req)
Task Subscription Methods
SubscribeToTasks
Subscribes to receive A2A task events through the EDA broker.
rpc SubscribeToTasks (SubscribeToTasksRequest) returns (stream AgentEvent);
Returns: Stream of AgentEvent objects containing A2A task updates
Example:
req := &pb.SubscribeToTasksRequest{
    AgentId: "processor_01",
    States: []a2a.TaskState{a2a.TaskState_TASK_STATE_SUBMITTED},
}
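stream, err := client.SubscribeToTasks(ctx, req)
if err != nil {
    log.Fatalf("failed to subscribe to tasks: %v", err)
}
for {
    event, err := stream.Recv()
    if err == io.EOF {
        break // Stream closed by the broker
    }
    if err != nil {
        log.Printf("task stream error: %v", err)
        break
    }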
    // Extract A2A task from event
    if task := event.GetTask(); task != nil {
        go processA2ATask(task)
    }
}
Task Management Methods
GetTask
Retrieves the current state of an A2A task by ID.
rpc GetTask (GetTaskRequest) returns (a2a.Task);
CancelTask
Cancels an active A2A task and notifies subscribers.
rpc CancelTask (CancelTaskRequest) returns (a2a.Task);
ListTasks
Returns A2A tasks matching the specified criteria.
rpc ListTasks (ListTasksRequest) returns (ListTasksResponse);
A2A Task Workflow Patterns
Simple Request-Response
// 1. Agent A creates and publishes task request
task := &a2a.Task{
    Id:        "task_analysis_123",
    ContextId: "workflow_456",
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            MessageId: "msg_" + uuid.New().String(),
            TaskId:    "task_analysis_123",
            Role:      a2a.Role_USER,
            Content: []*a2a.Part{
                {
                    Part: &a2a.Part_Text{
                        Text: "Please analyze the Q4 sales data",
                    },
                },
                {
                    Part: &a2a.Part_Data{
                        Data: &a2a.DataPart{
                            Data: dataStruct, // Contains parameters
                        },
                    },
                },
            },
        },
        Timestamp: timestamppb.Now(),
    },
}
// 2. Agent B receives task and updates status to WORKING
// 3. Agent B publishes progress updates during execution
// 4. Agent B publishes final artifacts and COMPLETED status
Multi-Step Workflow
// 1. Orchestrator creates main task
mainTask := &a2a.Task{
    Id:        "workflow_main_789",
    ContextId: "workflow_context_789",
    // ... initial message
}
// 2. Create subtasks with same context_id
subtask1 := &a2a.Task{
    Id:        "subtask_data_prep_790",
    ContextId: "workflow_context_789", // Same context
    // ... data preparation request
}
subtask2 := &a2a.Task{
    Id:        "subtask_analysis_791",
    ContextId: "workflow_context_789", // Same context
    // ... analysis request (depends on subtask1)
}
// 3. Tasks linked by context_id for workflow tracking
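Because related tasks share a context_id, workflow tracking reduces to grouping tasks by that field. The sketch below shows the grouping on a simplified local `task` type (a stand-in for the generated `a2a.Task`); `groupByContext` is a hypothetical helper.

```go
package main

import "fmt"

// task is a simplified stand-in for a2a.Task with only the fields needed here.
type task struct {
	ID        string
	ContextID string
}

// groupByContext returns the task IDs of each workflow, keyed by context ID.
func groupByContext(tasks []task) map[string][]string {
	groups := make(map[string][]string)
	for _, t := range tasks {
		groups[t.ContextID] = append(groups[t.ContextID], t.ID)
	}
	return groups
}

func main() {
	tasks := []task{
		{"workflow_main_789", "workflow_context_789"},
		{"subtask_data_prep_790", "workflow_context_789"},
		{"subtask_analysis_791", "workflow_context_789"},
		{"task_analysis_123", "workflow_456"},
	}
	groups := groupByContext(tasks)
	fmt.Println(len(groups["workflow_context_789"]), len(groups["workflow_456"]))
}
```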
Error Handling Reference
A2A Task Error Patterns
Parameter Validation Errors
// Task fails with validation error
failedTask := &a2a.Task{
    Id: taskID,
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_FAILED,
        Update: &a2a.Message{
            MessageId: "msg_" + uuid.New().String(), // Required field
            TaskId:    taskID,
            Role:      a2a.Role_AGENT,
            Content: []*a2a.Part{
                {
                    Part: &a2a.Part_Text{
                        Text: "Task failed: Required parameter 'dataset_path' is missing",
                    },
                },
                {
                    Part: &a2a.Part_Data{
                        Data: &a2a.DataPart{
                            Data: errorDetails, // Structured error info
                            Description: "Validation error details",
                        },
                    },
                },
            },
        },
        Timestamp: timestamppb.Now(),
    },
}
Resource Errors
// Task fails due to resource unavailability
failedTask := &a2a.Task{
    Id: taskID,
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_FAILED,
        Update: &a2a.Message{
            MessageId: "msg_" + uuid.New().String(), // Required field
            TaskId:    taskID,
            Role:      a2a.Role_AGENT,
            Content: []*a2a.Part{
                {
                    Part: &a2a.Part_Text{
                        Text: "Cannot access dataset file: /data/sales_2024.csv",
                    },
                },
            },
        },
        Timestamp: timestamppb.Now(),
    },
}
Error Handling Best Practices
- Use structured error messages in A2A format for programmatic handling
- Include actionable error descriptions in text parts for human operators
- Add detailed error data in data parts for debugging and retry logic
- Maintain task history to preserve error context
- Consider partial results using artifacts for partially successful operations
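To illustrate how structured error data can drive retry logic, the sketch below defines a hypothetical `errorDetails` shape and a `shouldRetry` helper. None of these field names or error codes come from the A2A specification; they are assumptions made for this example, and in practice the values would travel inside a DataPart.

```go
package main

import "fmt"

// errorDetails is a hypothetical shape for structured error data carried in
// a DataPart. The field names and codes are assumptions for illustration.
type errorDetails struct {
	Code      string // Machine-readable code, e.g. "VALIDATION_ERROR"
	Message   string // Human-readable description for operators
	Retryable bool   // Whether repeating the task could succeed
}

// shouldRetry reports whether a failed task is worth resubmitting, given the
// number of attempts already made and an upper bound on attempts.
func shouldRetry(d errorDetails, attempt, maxAttempts int) bool {
	return d.Retryable && attempt < maxAttempts
}

func main() {
	validation := errorDetails{
		Code:    "VALIDATION_ERROR",
		Message: "Required parameter 'dataset_path' is missing",
	}
	resource := errorDetails{
		Code:      "RESOURCE_UNAVAILABLE",
		Message:   "Cannot access dataset file",
		Retryable: true,
	}
	fmt.Println(shouldRetry(validation, 1, 3)) // validation errors need a corrected request
	fmt.Println(shouldRetry(resource, 1, 3))   // resource errors may clear up
	fmt.Println(shouldRetry(resource, 3, 3))   // attempt budget exhausted
}
```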
Migration from Legacy EventBus
Message Type Mappings
| Legacy EventBus | A2A Equivalent | Notes |
|---|---|---|
| TaskMessage | a2a.Task with initial Message | Task creation with request message |
| TaskResult | a2a.Task with final Artifact | Task completion with result artifacts |
| TaskProgress | a2a.Task with status Message | Progress updates via status messages |
| TaskStatus enum | a2a.TaskState enum | State names updated (e.g., IN_PROGRESS → TASK_STATE_WORKING) |
API Method Mappings
| Legacy EventBus | A2A Equivalent | Notes |
|---|---|---|
| PublishTask | PublishTaskUpdate | Now publishes A2A task objects |
| PublishTaskResult | PublishTaskArtifact | Results published as artifacts |
| PublishTaskProgress | PublishTaskUpdate | Progress via task status updates |
| SubscribeToTasks | SubscribeToTasks | Now returns A2A task events |
| SubscribeToTaskResults | SubscribeToTasks (filtered) | Filter by COMPLETED state |
This reference provides the complete specification for A2A task-related messages and operations in the AgentHub Event-Driven Architecture, enabling robust distributed task coordination with full Agent2Agent protocol compliance.
4 - Explanations
Understanding-oriented discussions that provide context and background
Explanations
This section provides in-depth discussions and explanations to help you understand the concepts, design decisions, and architecture behind AgentHub. These materials are designed to broaden your understanding beyond just how to use the system.
Explanation Categories
- Architecture - System design and architectural principles
- Core Concepts - Fundamental concepts and mental models
- Features - Deep dives into specific features and capabilities
🎯 How to Use These Explanations
These documents are designed to:
- Provide context for why things work the way they do
- Explain trade-offs and design decisions
- Offer multiple perspectives on the same concepts
- Help you make informed decisions about using AgentHub
Reading Path
For Understanding Core Concepts
- Core Concepts - Start with fundamental principles
- Architecture - Understand system design
- Features - Explore specific capabilities
For System Design
- Architecture - System design patterns
- Features - Performance and scaling considerations
For Implementation Details
- Features - Technical implementation deep dives
💡 Discussion Topics
These explanations discuss:
- Design philosophy and principles
- Architectural decisions and their rationale
- Performance considerations and trade-offs
- Future directions and possibilities
- Alternative approaches and their pros/cons
Note
Explanations focus on understanding rather than implementation. For practical guidance, see the tutorials and how-to guides.
4.1 - Architecture
Deep dive into AgentHub’s system architecture and design
Architecture Explanations
Understand the fundamental architecture and design principles behind AgentHub’s distributed agent system.
Available Documentation
4.1.1 - A2A-Compliant EDA Broker Architecture
Deep dive into the internal architecture of the AgentHub EDA broker, how it implements Agent2Agent (A2A) protocol-compliant communication patterns while maintaining Event-Driven Architecture benefits.
AgentHub A2A-Compliant EDA Broker Architecture
This document explains the internal architecture of the AgentHub Event-Driven Architecture (EDA) broker, how it implements Agent2Agent (A2A) protocol-compliant communication patterns, and the design decisions behind its hybrid approach.
Architectural Overview
The AgentHub broker serves as a centralized Event-Driven Architecture hub that transports Agent2Agent (A2A) protocol-compliant messages between distributed agents. It combines the scalability benefits of EDA with the interoperability guarantees of the A2A protocol.
┌───────────────────────────────────────────────────────────────────┐
│                      AgentHub Broker                              │
├───────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐│
│  │   Task Router   │    │   Subscriber    │    │   Progress      ││
│  │                 │    │   Manager       │    │   Tracker       ││
│  │ • Route tasks   │    │                 │    │                 ││
│  │ • Apply filters │    │ • Manage agent  │    │ • Track task    ││
│  │ • Broadcast     │    │   subscriptions │    │   progress      ││
│  │ • Load balance  │    │ • Handle        │    │ • Update        ││
│  │                 │    │   disconnects   │    │   requesters    ││
│  └─────────────────┘    └─────────────────┘    └─────────────────┘│
├───────────────────────────────────────────────────────────────────┤
│                         gRPC Interface                            │
├───────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐│
│  │ PublishTask     │    │SubscribeToTasks │    │SubscribeToTask  ││
│  │ PublishResult   │    │SubscribeToRes   │    │ Progress        ││
│  │ PublishProgress │    │                 │    │                 ││
│  └─────────────────┘    └─────────────────┘    └─────────────────┘│
└───────────────────────────────────────────────────────────────────┘
Core Components
1. Event Bus Server
The main server implementation at broker/main.go:22 provides the central coordination point:
type eventBusServer struct {
    pb.UnimplementedEventBusServer
    // Subscription management
    taskSubscribers         map[string][]chan *pb.TaskMessage
    taskResultSubscribers   map[string][]chan *pb.TaskResult
    taskProgressSubscribers map[string][]chan *pb.TaskProgress
    taskMu                  sync.RWMutex
}
Key characteristics:
- Thread-safe: Uses sync.RWMutex to protect concurrent access to subscriber maps
- Channel-based: Uses Go channels for efficient message passing
- Non-blocking: Implements timeouts to prevent blocking on slow consumers
- No persistence: All state is held in memory and nothing is written to disk
2. Task Routing Engine
The routing logic determines how tasks are delivered to agents:
Direct Routing
When a task specifies a ResponderAgentId, it’s routed directly to that agent:
if responderID := req.GetTask().GetResponderAgentId(); responderID != "" {
    if subs, ok := s.taskSubscribers[responderID]; ok {
        targetChannels = subs
    }
}
Broadcast Routing
When no specific responder is set, tasks are broadcast to all subscribed agents:
} else {
    // Broadcast to all task subscribers
    for _, subs := range s.taskSubscribers {
        targetChannels = append(targetChannels, subs...)
    }
}
Routing Features
- Immediate delivery: Tasks are routed immediately upon receipt
- Multiple subscribers: Single agent can have multiple subscription channels
- Timeout protection: 5-second timeout prevents blocking on unresponsive agents
- Error isolation: Failed delivery to one agent doesn’t affect others
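The two routing modes can be combined into a single target-selection step. The following sketch works on plain string channels instead of the generated protobuf types so that it runs standalone; `selectTargets` is a hypothetical name, and the real broker performs this lookup while holding its read lock.

```go
package main

import "fmt"

// selectTargets returns the channels a task should be delivered to.
// With a responder ID, only that agent's channels are returned (direct
// routing); with an empty ID, every subscriber is targeted (broadcast).
func selectTargets(subscribers map[string][]chan string, responderID string) []chan string {
	if responderID != "" {
		return subscribers[responderID] // nil if the agent has no subscription
	}
	var targets []chan string
	for _, subs := range subscribers {
		targets = append(targets, subs...)
	}
	return targets
}

func main() {
	subscribers := map[string][]chan string{
		"agent_a": {make(chan string, 1)},
		"agent_b": {make(chan string, 1), make(chan string, 1)},
	}
	fmt.Println(len(selectTargets(subscribers, "agent_b"))) // direct: agent_b's two channels
	fmt.Println(len(selectTargets(subscribers, "")))        // broadcast: all three channels
	fmt.Println(len(selectTargets(subscribers, "unknown"))) // direct to an unknown agent: none
}
```

Note that a direct task addressed to an agent without a subscription has no targets in this sketch, which corresponds to the task not being delivered to anyone.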
3. Subscription Management
The broker manages three types of subscriptions:
Task Subscriptions
Agents subscribe to receive tasks assigned to them:
func (s *eventBusServer) SubscribeToTasks(req *pb.SubscribeToTasksRequest, stream pb.EventBus_SubscribeToTasksServer) error
- Agent-specific: Tasks are delivered based on agent ID
- Type filtering: Optional filtering by task types
- Long-lived streams: Connections persist until agent disconnects
- Automatic cleanup: Subscriptions are removed when connections close
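A minimal sketch of the subscription lifecycle is shown below: registering a channel for an agent and removing it again when the connection ends. The `registry` type and its methods are hypothetical and use string channels instead of protobuf messages. In a streaming handler, the returned cleanup function would typically be deferred and the handler would block until the stream's context is done.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a simplified stand-in for the broker's subscriber map.
type registry struct {
	mu   sync.RWMutex
	subs map[string][]chan string
}

func newRegistry() *registry {
	return &registry{subs: make(map[string][]chan string)}
}

// subscribe registers a new buffered channel for agentID and returns it
// together with a cleanup function that removes the subscription again.
func (r *registry) subscribe(agentID string) (chan string, func()) {
	ch := make(chan string, 10)
	r.mu.Lock()
	r.subs[agentID] = append(r.subs[agentID], ch)
	r.mu.Unlock()

	cleanup := func() {
		r.mu.Lock()
		defer r.mu.Unlock()
		// Build a new slice so that snapshots taken by senders stay intact.
		kept := make([]chan string, 0, len(r.subs[agentID]))
		for _, c := range r.subs[agentID] {
			if c != ch {
				kept = append(kept, c)
			}
		}
		if len(kept) == 0 {
			delete(r.subs, agentID)
		} else {
			r.subs[agentID] = kept
		}
		// The channel is not closed here: a concurrent sender could still
		// hold a reference, and sending on a closed channel panics.
	}
	return ch, cleanup
}

// count returns the number of active subscriptions for agentID.
func (r *registry) count(agentID string) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.subs[agentID])
}

func main() {
	r := newRegistry()
	_, cleanupFirst := r.subscribe("agent_a")
	_, cleanupSecond := r.subscribe("agent_a") // one agent, two subscription channels
	fmt.Println(r.count("agent_a"))
	cleanupFirst()
	fmt.Println(r.count("agent_a"))
	cleanupSecond()
	fmt.Println(r.count("agent_a"))
}
```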
Result Subscriptions
Publishers subscribe to receive results of tasks they requested:
func (s *eventBusServer) SubscribeToTaskResults(req *pb.SubscribeToTaskResultsRequest, stream pb.EventBus_SubscribeToTaskResultsServer) error
Progress Subscriptions
Publishers can track progress of long-running tasks:
func (s *eventBusServer) SubscribeToTaskProgress(req *pb.SubscribeToTaskResultsRequest, stream pb.EventBus_SubscribeToTaskProgressServer) error
4. Message Flow Architecture
Task Publication Flow
- Validation: Incoming tasks are validated for required fields
- Routing: Tasks are routed to appropriate subscribers
- Delivery: Messages are sent via Go channels with timeout protection
- Response: Publisher receives acknowledgment of successful publication
Result Flow
- Receipt: Agents publish task results back to the broker
- Broadcasting: Results are broadcast to all result subscribers
- Filtering: Subscribers receive results for their requested tasks
- Delivery: Results are streamed back to requesting agents
Progress Flow
- Updates: Executing agents send periodic progress updates
- Distribution: Progress updates are sent to interested subscribers
- Real-time delivery: Updates are streamed immediately upon receipt
Design Decisions and Trade-offs
In-Memory State Management
Decision: Store all subscription state in memory using Go maps and channels.
Benefits:
- High performance: No database overhead for message routing
- Low latency: Sub-millisecond message routing
- Simplicity: Easier to develop, test, and maintain
- Concurrent efficiency: Go’s garbage collector handles channel cleanup
Trade-offs:
- No persistence: Broker restart loses all subscription state
- Memory usage: Large numbers of agents increase memory requirements
- Single point of failure: No built-in redundancy
When this works well:
- Development and testing environments
- Small to medium-scale deployments
- Scenarios where agents can re-establish subscriptions on broker restart
Asynchronous Message Delivery
Decision: Use Go channels with timeout-based delivery.
Implementation:
go func(ch chan *pb.TaskMessage, task pb.TaskMessage) {
    select {
    case ch <- &task:
        // Message sent successfully
    case <-ctx.Done():
        log.Printf("Context cancelled while sending task %s", task.GetTaskId())
    case <-time.After(5 * time.Second):
        log.Printf("Timeout sending task %s. Dropping message.", task.GetTaskId())
    }
}(subChan, taskToSend)
Benefits:
- Non-blocking: Slow agents don’t block the entire system
- Fault tolerance: Timeouts prevent resource leaks
- Scalability: Concurrent delivery to multiple agents
- Resource protection: Prevents unbounded queue growth
Trade-offs:
- Message loss: Timed-out messages are dropped
- Complexity: Requires careful timeout tuning
- No delivery guarantees: No acknowledgment of successful processing
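The effect of timeout-based delivery can be reproduced in isolation. The sketch below sends to one subscriber with buffer space and one that never reads; the `deliver` and `runDemo` names, the string payload and the 50 ms timeout are assumptions chosen to keep the example fast, whereas the broker code above uses a 5-second timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("delivery timed out")

// deliver tries to send msg on ch and gives up when ctx is cancelled or the
// timeout elapses, whichever happens first.
func deliver(ctx context.Context, ch chan<- string, msg string, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop() // Release the timer as soon as the send completes
	select {
	case ch <- msg:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return errTimeout
	}
}

// runDemo delivers to a responsive and an unresponsive subscriber.
func runDemo() (fastErr, slowErr error) {
	ctx := context.Background()
	fast := make(chan string, 1) // Buffered: the send succeeds immediately
	slow := make(chan string)    // Unbuffered with no reader: the send blocks
	fastErr = deliver(ctx, fast, "task-1", 50*time.Millisecond)
	slowErr = deliver(ctx, slow, "task-2", 50*time.Millisecond)
	return fastErr, slowErr
}

func main() {
	fastErr, slowErr := runDemo()
	fmt.Println("fast subscriber:", fastErr)
	fmt.Println("slow subscriber:", slowErr)
}
```

The sketch uses time.NewTimer with a deferred Stop instead of time.After so that the timer is released as soon as the send succeeds, which matters when many deliveries are in flight.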
gRPC Streaming for Subscriptions
Decision: Use server-streaming gRPC calls for agent subscriptions: the agent sends a single subscribe request and the broker streams events back.
Benefits:
- Real-time delivery: Messages are pushed immediately
- Connection awareness: Broker knows when agents disconnect
- Flow control: gRPC handles backpressure automatically
- Type safety: Protocol Buffer messages ensure data consistency
Trade-offs:
- Connection overhead: Each agent maintains persistent connections
- Resource usage: Streams consume memory and file descriptors
- Network sensitivity: Transient network issues can break connections
Concurrent Access Patterns
Decision: Use read-write mutexes with channel-based message passing.
Implementation:
s.taskMu.RLock()
// Read subscriber information
var targetChannels []chan *pb.TaskMessage
for _, subs := range s.taskSubscribers {
    targetChannels = append(targetChannels, subs...)
}
s.taskMu.RUnlock()
// Send messages without holding locks
for _, subChan := range targetChannels {
    go func(ch chan *pb.TaskMessage, task pb.TaskMessage) {
        // Async delivery
    }(subChan, taskToSend)
}
Benefits:
- High concurrency: Multiple readers can access subscriptions simultaneously
- Lock-free delivery: Message sending doesn’t hold locks
- Deadlock prevention: Clear lock ordering and minimal critical sections
- Performance: Read operations are optimized for the common case
Scalability Characteristics
Throughput
- Task routing: ~10,000+ tasks/second on modern hardware
- Concurrent connections: Bounded by the operating system's file descriptor limit (typically in the thousands)
- Memory usage: ~1KB per active subscription
Latency
- Task routing: <1ms for local network delivery
- End-to-end: <10ms for simple task processing cycles
- Progress updates: Real-time streaming with minimal buffering
Resource Usage
- CPU: Low CPU usage, primarily network I/O bound
- Memory: Linear growth with number of active subscriptions
- Network: Efficient binary Protocol Buffer encoding
Error Handling and Resilience
Connection Failures
- Automatic cleanup: Subscriptions are removed when connections close
- Graceful degradation: Failed agents don’t affect others
- Reconnection support: Agents can re-establish subscriptions
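Since the broker keeps no subscription state across restarts, agents usually need to resubscribe after a broken connection. A common way to do this without overloading the broker is exponential backoff, sketched below. The `backoffDelay` function and the 500 ms and 30 s bounds are assumptions for illustration, not AgentHub defaults.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 500 * time.Millisecond // Delay before the first retry (assumed value)
	maxDelay  = 30 * time.Second       // Upper bound for any retry delay (assumed value)
)

// backoffDelay returns the wait time before reconnect attempt number
// `attempt` (starting at 0). The delay doubles on every attempt and is
// capped at maxDelay.
func backoffDelay(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	for attempt := 0; attempt <= 6; attempt++ {
		fmt.Printf("attempt %d: wait %v\n", attempt, backoffDelay(attempt))
	}
}
```

An agent would sleep for `backoffDelay(attempt)` between failed subscribe calls and reset the attempt counter once a stream has been established. Adding random jitter to the delay is a common refinement when many agents reconnect at the same time.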
Message Delivery Failures
- Timeout handling: Messages that can’t be delivered are dropped
- Logging: All failures are logged for debugging
- Isolation: Per-agent timeouts prevent cascading failures
Resource Protection
- Channel buffering: Limited buffer sizes prevent memory exhaustion
- Timeout mechanisms: Prevent resource leaks from stuck operations
- Graceful shutdown: Proper cleanup during server shutdown
Monitoring and Observability
Built-in Logging
The broker provides comprehensive logging:
- Task routing decisions
- Subscription lifecycle events
- Error conditions and recovery
- Performance metrics
Integration Points
- Health checks: HTTP endpoints for monitoring
- Metrics export: Prometheus/metrics integration points
- Distributed tracing: Context propagation support
Future Enhancements
Persistence Layer
- Database backend: Store subscription state for broker restarts
- Message queuing: Durable task queues for reliability
- Transaction support: Atomic message delivery guarantees
Clustering Support
- Horizontal scaling: Multiple broker instances
- Load balancing: Distribute agents across brokers
- Consensus protocols: Consistent state across brokers
Advanced Routing
- Capability-based routing: Route tasks based on agent capabilities
- Load-aware routing: Consider agent load in routing decisions
- Geographic routing: Route based on agent location
Security Enhancements
- Authentication: Agent identity verification
- Authorization: Task-level access controls
- Encryption: TLS for all communications
The AgentHub broker architecture provides a solid foundation for Agent2Agent communication while maintaining simplicity and performance. Its design supports the immediate needs of most agent systems while providing clear paths for future enhancement as requirements evolve.
4.1.2 - Hexagonal Architecture & A2A Protocol Implementation
Understanding AgentHub’s hexagonal architecture with A2A protocol, gRPC communication, and event-driven design
Hexagonal Architecture & A2A Protocol Implementation
This document explains how AgentHub implements hexagonal architecture principles with the Agent2Agent (A2A) protocol, gRPC communication, and event-driven design patterns.
Overview
AgentHub follows hexagonal architecture (also known as Ports and Adapters) to achieve:
- Domain isolation: Core A2A protocol logic separated from infrastructure
- Testability: Clean interfaces enable comprehensive testing
- Flexibility: Multiple adapters for different communication protocols
- Maintainability: Clear separation of concerns and dependencies
System Architecture
graph TB
    subgraph "AgentHub Ecosystem"
        subgraph "External Agents"
            A["Agent A<br/>(Chat REPL)"]
            B["Agent B<br/>(Chat Responder)"]
            C["Agent C<br/>(Custom Agent)"]
        end
        subgraph "AgentHub Broker"
            subgraph "Adapters (Infrastructure)"
                GRPC["gRPC Server<br/>Adapter"]
                HEALTH["Health Check<br/>Adapter"]
                METRICS["Metrics<br/>Adapter"]
                TRACING["Tracing Adapter<br/>(OTLP/Jaeger)"]
            end
            subgraph "Ports (Interfaces)"
                SP["AgentHub<br/>Service Port"]
                PP["Message<br/>Publisher Port"]
                EP["Event<br/>Subscriber Port"]
                OP["Observability<br/>Port"]
            end
            subgraph "Domain (Core Logic)"
                A2A["A2A Protocol<br/>Engine"]
                ROUTER["Event Router<br/>& Broker"]
                VALIDATOR["Message<br/>Validator"]
                CONTEXT["Context<br/>Manager"]
                TASK["Task<br/>Lifecycle"]
            end
        end
        subgraph "External Systems"
            OTLP["OTLP Collector<br/>& Jaeger"]
            STORE["Event Store<br/>(Memory)"]
        end
    end
    %% External agent connections
    A -->|"gRPC calls<br/>(PublishMessage,<br/>SubscribeToMessages)"| GRPC
    B -->|"gRPC calls"| GRPC
    C -->|"gRPC calls"| GRPC
    %% Adapter to Port connections
    GRPC -->|"implements"| SP
    HEALTH -->|"implements"| OP
    METRICS -->|"implements"| OP
    TRACING -->|"implements"| OP
    %% Port to Domain connections
    SP -->|"delegates to"| A2A
    PP -->|"delegates to"| ROUTER
    EP -->|"delegates to"| ROUTER
    OP -->|"observes"| A2A
    %% Domain internal connections
    A2A -->|"uses"| VALIDATOR
    A2A -->|"uses"| CONTEXT
    A2A -->|"uses"| TASK
    ROUTER -->|"persists events"| STORE
    TRACING -->|"exports traces"| OTLP
    %% Styling
    classDef agents fill:#add8e6
    classDef adapters fill:#ffa500
    classDef ports fill:#e0ffff
    classDef domain fill:#ffb6c1
    classDef external fill:#dda0dd
    class A,B,C agents
    class GRPC,HEALTH,METRICS,TRACING adapters
    class SP,PP,EP,OP ports
    class A2A,ROUTER,VALIDATOR,CONTEXT,TASK domain
    class OTLP,STORE external
Architecture Notes:
- Domain Core: Pure A2A protocol logic with message validation, event routing, context correlation, and task state management
- Ports: Clean, technology-agnostic interfaces providing testable contracts and dependency inversion
- Adapters: Infrastructure concerns including gRPC communication, observability exports, and protocol adaptations
A2A Message Flow
sequenceDiagram
    participant REPL as Chat REPL<br/>Agent
    participant gRPC as gRPC<br/>Adapter
    participant A2A as A2A Protocol<br/>Engine
    participant Router as Event<br/>Router
    participant Responder as Chat Responder<br/>Agent
    rect rgb(240, 248, 255)
        Note over REPL, Router: A2A Message Publishing
        REPL->>+gRPC: PublishMessage(A2AMessage)
        gRPC->>+A2A: validateA2AMessage()
        A2A->>A2A: check MessageId, Role, Content
        A2A-->>-gRPC: validation result
        gRPC->>+Router: routeA2AEvent(messageEvent)
        Router->>Router: identify subscribers<br/>by agent_id/broadcast
        Router->>Router: create tracing span<br/>with A2A attributes
        Router-->>Responder: deliver message event
        Router-->>-gRPC: routing success
        gRPC-->>-REPL: PublishResponse(event_id)
    end
    rect rgb(255, 248, 240)
        Note over Responder, Router: A2A Message Processing
        Responder->>+gRPC: SubscribeToMessages(agent_id)
        gRPC->>Router: register subscriber
        Router-->>gRPC: subscription stream
        gRPC-->>-Responder: message stream
        Note over Responder: Process A2A message<br/>with tracing spans
        Responder->>+gRPC: PublishMessage(A2AResponse)
        gRPC->>A2A: validateA2AMessage()
        A2A->>A2A: check AGENT role,<br/>ContextId correlation
        gRPC->>Router: routeA2AEvent(responseEvent)
        Router-->>REPL: deliver response event
        gRPC-->>-Responder: PublishResponse
    end
    Note over REPL, Responder: A2A Protocol ensures:<br/>• Message structure compliance<br/>• Role semantics (USER/AGENT)<br/>• Context correlation<br/>• Event-driven routing
Core Components
1. A2A Protocol Engine (Domain Core)
The heart of the system implementing A2A protocol specifications:
// Core domain logic - technology agnostic
type A2AProtocolEngine struct {
    messageValidator MessageValidator
    contextManager   ContextManager
    taskLifecycle    TaskLifecycle
}
// A2A message validation
func (e *A2AProtocolEngine) ValidateMessage(msg *Message) error {
    // A2A compliance checks
    if msg.MessageId == "" { return ErrMissingMessageId }
    if msg.Role == ROLE_UNSPECIFIED { return ErrInvalidRole }
    if len(msg.Content) == 0 { return ErrEmptyContent }
    return nil
}
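The validation snippet above refers to error values it does not define. A minimal, self-contained sketch of how they might look follows. The `Role` constants, the simplified `Message` and `Part` types, and the error messages are assumptions made for illustration, not the generated protobuf types.

```go
package main

import (
	"errors"
	"fmt"
)

// Role mirrors the A2A role enum; the zero value is treated as unspecified.
type Role int

const (
	RoleUnspecified Role = iota
	RoleUser
	RoleAgent
)

// Part and Message are simplified stand-ins for the generated protobuf types.
type Part struct{ Text string }

type Message struct {
	MessageId string
	Role      Role
	Content   []*Part
}

// Sentinel errors (hypothetical definitions) that callers can match with errors.Is.
var (
	ErrMissingMessageId = errors.New("a2a: message_id is required")
	ErrInvalidRole      = errors.New("a2a: role must be USER or AGENT")
	ErrEmptyContent     = errors.New("a2a: content must contain at least one part")
)

// ValidateMessage applies the three compliance checks in order.
func ValidateMessage(msg *Message) error {
	if msg == nil || msg.MessageId == "" {
		return ErrMissingMessageId
	}
	if msg.Role == RoleUnspecified {
		return ErrInvalidRole
	}
	if len(msg.Content) == 0 {
		return ErrEmptyContent
	}
	return nil
}

func main() {
	// A message without content fails the third check.
	err := ValidateMessage(&Message{MessageId: "msg_1", Role: RoleUser})
	fmt.Println(errors.Is(err, ErrEmptyContent)) // true
}
```

Sentinel errors let an adapter map each failure to a specific gRPC status code without parsing error strings.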
2. Event Router (Domain Core)
Manages event-driven communication between agents:
type EventRouter struct {
    messageSubscribers map[string][]chan *AgentEvent
    taskSubscribers    map[string][]chan *AgentEvent
    eventSubscribers   map[string][]chan *AgentEvent
}
func (r *EventRouter) RouteEvent(event *AgentEvent) error {
    // Route based on A2A metadata
    routing := event.GetRouting()
    subscribers := r.getSubscribers(routing.ToAgentId, event.PayloadType)
    // Deliver with tracing
    for _, sub := range subscribers {
        go r.deliverWithTracing(sub, event)
    }
    return nil
}
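The router above leaves subscriber lookup and delivery unspecified. The following sketch shows one possible shape under stated assumptions: a single subscriber map guarded by a mutex, buffered channels, and a drop-when-full policy. `Router`, `Subscribe`, and `Route` are hypothetical names, and the real broker may use a different delivery policy.

```go
package main

import (
	"fmt"
	"sync"
)

// AgentEvent is a simplified stand-in for the broker's event envelope.
type AgentEvent struct {
	EventId   string
	ToAgentId string // empty means broadcast
}

// Router is a hypothetical, minimal event router keyed by agent ID.
type Router struct {
	mu   sync.RWMutex
	subs map[string][]chan *AgentEvent
}

func NewRouter() *Router {
	return &Router{subs: make(map[string][]chan *AgentEvent)}
}

// Subscribe registers a buffered channel for the given agent.
func (r *Router) Subscribe(agentID string, buffer int) <-chan *AgentEvent {
	ch := make(chan *AgentEvent, buffer)
	r.mu.Lock()
	r.subs[agentID] = append(r.subs[agentID], ch)
	r.mu.Unlock()
	return ch
}

// Route delivers to the target agent, or to every subscriber when ToAgentId
// is empty. It returns the number of channels that accepted the event; a full
// channel is skipped rather than blocking the router.
func (r *Router) Route(ev *AgentEvent) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	delivered := 0
	for agentID, chans := range r.subs {
		if ev.ToAgentId != "" && ev.ToAgentId != agentID {
			continue
		}
		for _, ch := range chans {
			select {
			case ch <- ev:
				delivered++
			default: // subscriber is slow; drop instead of blocking
			}
		}
	}
	return delivered
}

func main() {
	r := NewRouter()
	a := r.Subscribe("agent_a", 1)
	r.Subscribe("agent_b", 1)
	fmt.Println(r.Route(&AgentEvent{EventId: "e1", ToAgentId: "agent_a"})) // 1
	fmt.Println(r.Route(&AgentEvent{EventId: "e2"}))                       // 1: agent_a's buffer is full
	fmt.Println((<-a).EventId)                                             // e1
}
```

Dropping on a full buffer keeps one slow subscriber from stalling the others. A production broker would more likely count or log the drop, or apply backpressure.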
3. gRPC Adapter (Infrastructure)
Translates between gRPC and domain logic:
type GrpcAdapter struct {
    a2aEngine    A2AProtocolEngine
    eventRouter  EventRouter
    tracer       TracingAdapter
}
func (a *GrpcAdapter) PublishMessage(ctx context.Context, req *PublishMessageRequest) (*PublishResponse, error) {
    // Start tracing span
    ctx, span := a.tracer.StartA2AMessageSpan(ctx, "publish_message", req.Message.MessageId, req.Message.Role)
    defer span.End()
    // Validate using domain logic
    if err := a.a2aEngine.ValidateMessage(req.Message); err != nil {
        a.tracer.RecordError(span, err)
        return nil, err
    }
    // Route using domain logic
    event := a.createA2AEvent(req)
    if err := a.eventRouter.RouteEvent(event); err != nil {
        return nil, err
    }
    return &PublishResponse{Success: true, EventId: event.EventId}, nil
}
Hexagonal Architecture Benefits
1. Domain Isolation
- A2A protocol logic is pure, testable business logic
- No infrastructure dependencies in the core domain
- Technology-agnostic implementation
2. Adapter Pattern
- gRPC Adapter: Handles Protocol Buffer serialization/deserialization
- Tracing Adapter: OTLP/Jaeger integration without domain coupling
- Health Adapter: Service health monitoring
- Metrics Adapter: Prometheus metrics collection
3. Port Interfaces
// Clean, testable interfaces
type MessagePublisher interface {
    PublishMessage(ctx context.Context, msg *Message) (*PublishResponse, error)
}
type EventSubscriber interface {
    SubscribeToMessages(ctx context.Context, agentId string) (MessageStream, error)
}
type ObservabilityPort interface {
    StartSpan(ctx context.Context, operation string) (context.Context, Span)
    RecordMetric(name string, value float64, labels map[string]string)
}
4. Dependency Inversion
- Domain depends on abstractions (ports), not concrete implementations
- Adapters depend on domain through well-defined interfaces
- Easy testing with mock implementations
A2A Protocol Integration
Message Structure Compliance
classDiagram
    class A2AMessage {
        +string MessageId
        +string ContextId
        +Role Role
        +Part Content
        +Metadata Metadata
        +string TaskId
    }
    class Part {
        +string Text
        +bytes Data
        +FileData File
    }
    class EventMetadata {
        +string FromAgentId
        +string ToAgentId
        +string EventType
        +Priority Priority
    }
    class Role {
        <<enumeration>>
        USER
        AGENT
    }
    class Metadata {
        +Fields map
    }
    A2AMessage "1" --> "0..*" Part : contains
    A2AMessage "1" --> "1" EventMetadata : routed_with
    A2AMessage "1" --> "1" Role : has
    A2AMessage "1" --> "0..1" Metadata : includes
Event-Driven Architecture
The system implements a pure event-driven architecture:
- Publishers emit A2A-compliant events
- Broker routes events based on metadata
- Subscribers receive relevant events
- Correlation through ContextId maintains conversation flow
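As a small illustration of ContextId correlation, the sketch below groups a mixed stream of messages into conversations. The `Message` type is a simplified stand-in and `GroupByContext` is a hypothetical helper, not part of the AgentHub API.

```go
package main

import "fmt"

// Message is a simplified stand-in for the A2A message type.
type Message struct {
	MessageId string
	ContextId string
	Role      string // "USER" or "AGENT"
}

// GroupByContext buckets messages by ContextId, preserving arrival order
// within each conversation.
func GroupByContext(msgs []Message) map[string][]Message {
	threads := make(map[string][]Message)
	for _, m := range msgs {
		threads[m.ContextId] = append(threads[m.ContextId], m)
	}
	return threads
}

func main() {
	threads := GroupByContext([]Message{
		{MessageId: "m1", ContextId: "chat_1", Role: "USER"},
		{MessageId: "m2", ContextId: "chat_2", Role: "USER"},
		{MessageId: "m3", ContextId: "chat_1", Role: "AGENT"},
	})
	// Prints the request and its reply for chat_1, in order.
	for _, m := range threads["chat_1"] {
		fmt.Println(m.MessageId, m.Role)
	}
}
```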
Observability Integration
Distributed Tracing
sequenceDiagram
    participant A as Agent A
    participant B as Broker
    participant AB as Agent B
    participant OTLP as OTLP Collector
    participant J as Jaeger
    A->>+B: PublishMessage<br/>[trace_id: 123]
    B->>B: Create A2A spans<br/>with structured attributes
    B->>+AB: RouteEvent<br/>[trace_id: 123]
    AB->>AB: Process with<br/>child spans
    AB->>-B: PublishResponse<br/>[trace_id: 123]
    B->>-A: Success<br/>[trace_id: 123]
    par Observability Export
        B->>OTLP: Export spans<br/>with A2A attributes
        OTLP->>J: Store traces
        J->>J: Build trace timeline<br/>with correlation
    end
    Note over A, J: End-to-end tracing<br/>with A2A protocol visibility
Structured Attributes
Each span includes A2A-specific attributes:
- a2a.message.id
- a2a.message.role
- a2a.context.id
- a2a.event.type
- a2a.routing.from_agent
- a2a.routing.to_agent
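The attribute keys above could be populated from an event with a helper along these lines. This is a sketch that uses a plain map rather than a tracing library's attribute type. The `Event` struct and `SpanAttributes` function are assumptions for illustration.

```go
package main

import "fmt"

// Event holds the fields needed for tracing; a simplified stand-in
// for the broker's event envelope.
type Event struct {
	MessageId, Role, ContextId, EventType, FromAgent, ToAgent string
}

// SpanAttributes maps an event onto the six attribute keys listed above.
func SpanAttributes(ev Event) map[string]string {
	return map[string]string{
		"a2a.message.id":         ev.MessageId,
		"a2a.message.role":       ev.Role,
		"a2a.context.id":         ev.ContextId,
		"a2a.event.type":         ev.EventType,
		"a2a.routing.from_agent": ev.FromAgent,
		"a2a.routing.to_agent":   ev.ToAgent,
	}
}

func main() {
	attrs := SpanAttributes(Event{
		MessageId: "msg_1", Role: "USER", ContextId: "ctx_1",
		EventType: "message", FromAgent: "agent_a", ToAgent: "agent_b",
	})
	fmt.Println(attrs["a2a.message.id"], attrs["a2a.routing.to_agent"]) // msg_1 agent_b
}
```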
Testing Strategy
Unit Testing (Domain Core)
func TestA2AEngine_ValidateMessage(t *testing.T) {
    engine := NewA2AProtocolEngine()
    // Test A2A compliance
    msg := &Message{
        MessageId: "test_msg_123",
        Role: ROLE_USER,
        Content: []*Part{{Text: "hello"}},
    }
    err := engine.ValidateMessage(msg)
    assert.NoError(t, err)
}
Integration Testing (Adapters)
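func TestGrpcAdapter_PublishMessage(t *testing.T) {
    // Mock domain dependencies
    mockEngine := &MockA2AEngine{}
    mockRouter := &MockEventRouter{}
    adapter := NewGrpcAdapter(mockEngine, mockRouter)
    // Build a valid A2A request
    ctx := context.Background()
    validRequest := &PublishMessageRequest{
        Message: &Message{
            MessageId: "test_msg_123",
            Role:      ROLE_USER,
            Content:   []*Part{{Text: "hello"}},
        },
    }
    // Test adapter behavior
    resp, err := adapter.PublishMessage(ctx, validRequest)
    assert.NoError(t, err)
    assert.True(t, resp.Success)
}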
Conclusion
AgentHub’s hexagonal architecture with A2A protocol provides:
- Clean Architecture: Separation of concerns with domain-driven design
- A2A Compliance: Full protocol implementation with validation
- Event-Driven Design: Scalable, loosely-coupled communication
- Rich Observability: Comprehensive tracing and metrics
- Testability: Clean interfaces enable thorough testing
- Flexibility: Easy to extend with new adapters and protocols
This architecture ensures maintainable, scalable, and observable agent communication while maintaining strict A2A protocol compliance.
4.2 - Core Concepts
Fundamental concepts and principles of AgentHub
Core Concepts
Explore the fundamental concepts, principles, and mental models that underpin AgentHub’s agent-to-agent communication system.
Available Documentation
4.2.1 - The Agent2Agent Principle
Deep dive into the philosophy and design principles behind Agent2Agent communication and how AgentHub implements this pattern
The Agent2Agent Protocol and AgentHub Implementation
This document explores the core principles of Google’s Agent2Agent protocol and how AgentHub implements a communication broker based on these concepts. We distinguish between the Agent2Agent protocol specification (task structures and communication patterns) and our custom AgentHub broker implementation.
Agent2Agent vs AgentHub: What’s What
Agent2Agent Protocol (Google)
The Agent2Agent protocol defines:
- A2A Message Structures: Message, Task, Artifact with structured content parts
- Task State Management: TaskState enums (SUBMITTED, WORKING, COMPLETED, FAILED, CANCELLED)
- Communication Patterns: Asynchronous task delegation with context-aware message handling
AgentHub Implementation (This Project)
AgentHub provides:
- Hybrid EDA+A2A Broker: Centralized gRPC service implementing A2A protocol within Event-Driven Architecture
- A2A-Compliant Pub/Sub: Publisher-subscriber pattern using native A2A message structures
- A2A Subscription Mechanisms: SubscribeToTasks, SubscribeToMessages, SubscribeToAgentEvents methods
- A2A Agent Implementations: Sample agents using A2ATaskPublisher and A2ATaskSubscriber abstractions
Philosophy and Core Concepts
Beyond Simple Request-Response
Traditional software architectures rely heavily on synchronous request-response patterns where a client requests a service and waits for an immediate response. While effective for simple operations, this pattern has limitations when dealing with:
- Complex, multi-step processes that require coordination between multiple specialized services
- Long-running operations that may take minutes or hours to complete
- Dynamic workload distribution where the best processor for a task may vary over time
- Autonomous decision-making where agents need to collaborate without central coordination
The Agent2Agent protocol addresses these limitations by defining task structures and communication patterns for autonomous agents. AgentHub implements a broker-based system that enables agents to communicate using Agent2Agent-inspired task structures:
- Delegating work to other agents based on their capabilities
- Accepting and processing tasks according to their specializations
- Reporting progress during long-running operations
- Making collaborative decisions about task distribution and execution
Autonomous Collaboration
In an Agent2Agent system, each agent operates with a degree of autonomy, making decisions about:
- Which tasks to accept based on current capacity and capabilities
- How to prioritize work when multiple tasks are pending
- When to delegate subtasks to other specialized agents
- How to report progress and handle failures
This autonomy enables the system to be more resilient, scalable, and adaptive compared to centrally-controlled architectures.
Key Design Principles
1. Asynchronous Communication
Agent2Agent communication is fundamentally asynchronous. When Agent A requests work from Agent B:
- Agent A doesn’t block waiting for completion
- Agent B can process the task when resources are available
- Progress updates provide visibility into long-running operations
- Results are delivered when the work is complete
This asynchronicity enables:
- Better resource utilization as agents aren’t blocked waiting
- Improved scalability as systems can handle more concurrent operations
- Enhanced resilience as temporary agent unavailability doesn’t block the entire system
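The asynchronous pattern described above can be sketched in Go with a goroutine and a channel of progress updates. `Delegate` and `Update` are hypothetical names, and the sleep stands in for real work; the requester gets a channel back at once and reads updates whenever it chooses.

```go
package main

import (
	"fmt"
	"time"
)

// Update is a simplified progress report for a delegated task.
type Update struct {
	TaskId   string
	Progress int // percent complete
	Done     bool
}

// Delegate starts the work in a goroutine and returns immediately.
// The channel is buffered so the worker never waits on a slow reader.
func Delegate(taskID string, steps int) <-chan Update {
	updates := make(chan Update, steps)
	go func() {
		defer close(updates)
		for i := 1; i <= steps; i++ {
			time.Sleep(time.Millisecond) // simulated work
			updates <- Update{TaskId: taskID, Progress: i * 100 / steps, Done: i == steps}
		}
	}()
	return updates
}

func main() {
	updates := Delegate("task_1", 4)
	fmt.Println("requester continues without blocking")
	for u := range updates {
		fmt.Printf("%s: %d%% done=%v\n", u.TaskId, u.Progress, u.Done)
	}
}
```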
2. Rich A2A Task Semantics
The Agent2Agent protocol defines rich task structures with flexible message content that AgentHub implements:
message Task {
  string id = 1;                         // Unique task identifier
  string context_id = 2;                 // Conversation/workflow context
  TaskStatus status = 3;                 // Current status with latest message
  repeated Message history = 4;          // Complete message history
  repeated Artifact artifacts = 5;       // Task output artifacts
  google.protobuf.Struct metadata = 6;   // Additional context
}
message Message {
  string message_id = 1;                 // Unique message identifier
  string context_id = 2;                 // Conversation context
  string task_id = 3;                    // Associated task
  Role role = 4;                         // USER or AGENT
  repeated Part content = 5;             // Structured content parts
  google.protobuf.Struct metadata = 6;   // Message metadata
}
message TaskStatus {
  TaskState state = 1;                   // SUBMITTED, WORKING, COMPLETED, etc.
  Message update = 2;                    // Latest status message
  google.protobuf.Timestamp timestamp = 3; // Status timestamp
}
This rich A2A structure enables:
- Context-aware routing based on conversation context and message content
- Flexible content handling through structured Part types (text, data, files)
- Workflow coordination via shared context IDs across related tasks
- Complete communication history for debugging and audit trails
- Structured artifact delivery for rich result types
3. A2A Status Updates and Progress Tracking
Long-running tasks benefit from A2A status updates through the message history:
// Progress updates are A2A messages within the task
message TaskStatus {
  TaskState state = 1;                   // Current execution state
  Message update = 2;                    // Latest status message from agent
  google.protobuf.Timestamp timestamp = 3; // When this status was set
}
// Progress information is conveyed through message content
message Message {
  // ... other fields
  repeated Part content = 5;             // Can include progress details
}
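// Example progress message content (Go)
progressData, _ := structpb.NewStruct(map[string]interface{}{
    "progress_percentage": 65,
    "phase":               "data_analysis",
    "estimated_remaining": "2m30s",
})
progressPart := &a2a.Part{
    Part: &a2a.Part_Data{
        Data: &a2a.DataPart{
            Data:        progressData,
            Description: "Processing progress update",
        },
    },
}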
This A2A approach enables:
- Rich progress communication through structured message content
- Complete audit trails via message history preservation
- Context-aware status updates linking progress to specific workflows
- Flexible progress formats supporting text, data, and file-based updates
- Multi-agent coordination through shared context and message threading
4. A2A EDA Routing Flexibility
AgentHub’s A2A implementation supports multiple routing patterns through EDA metadata:
message AgentEventMetadata {
  string from_agent_id = 1;              // Source agent
  string to_agent_id = 2;                // Target agent (empty = broadcast)
  string event_type = 3;                 // Event classification
  repeated string subscriptions = 4;      // Topic-based routing
  Priority priority = 5;                 // Delivery priority
}
- Direct A2A addressing: Tasks sent to specific agents via to_agent_id
- Broadcast A2A addressing: Tasks sent to all subscribed agents (empty to_agent_id)
- Topic-based A2A routing: Tasks routed via subscription filters and event types
- Context-aware routing: Tasks routed based on A2A context and conversation state
This hybrid EDA+A2A approach enables sophisticated routing patterns while maintaining A2A protocol compliance.
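A sketch of how a broker might decide whether a subscriber should receive an event under the direct, broadcast, and topic-based patterns follows. The precedence order (direct first, then topics, otherwise broadcast) and the `Subscriber` type are assumptions made for illustration, not documented AgentHub behavior.

```go
package main

import "fmt"

// Metadata is a simplified stand-in for AgentEventMetadata.
type Metadata struct {
	ToAgentId     string
	Subscriptions []string
}

// Subscriber is a hypothetical view of a connected agent and its topics.
type Subscriber struct {
	AgentId string
	Topics  map[string]bool
}

// Matches reports whether sub should receive an event carrying md.
func Matches(sub Subscriber, md Metadata) bool {
	// Direct addressing: only the named agent receives the event.
	if md.ToAgentId != "" {
		return md.ToAgentId == sub.AgentId
	}
	// Broadcast: no target and no topic filter.
	if len(md.Subscriptions) == 0 {
		return true
	}
	// Topic-based: at least one topic must overlap.
	for _, topic := range md.Subscriptions {
		if sub.Topics[topic] {
			return true
		}
	}
	return false
}

func main() {
	analyst := Subscriber{AgentId: "analyst", Topics: map[string]bool{"sales": true}}
	fmt.Println(Matches(analyst, Metadata{ToAgentId: "analyst"}))           // true (direct)
	fmt.Println(Matches(analyst, Metadata{}))                               // true (broadcast)
	fmt.Println(Matches(analyst, Metadata{Subscriptions: []string{"ops"}})) // false (topic mismatch)
}
```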
Architectural Patterns
Microservices Enhancement
In a microservices architecture, Agent2Agent can enhance service communication by:
- Replacing synchronous HTTP calls with asynchronous task delegation
- Adding progress visibility to long-running service operations
- Enabling service composition through task chaining
- Improving resilience through task retry and timeout mechanisms
Event-Driven Architecture with A2A Protocol
AgentHub integrates A2A protocol within Event-Driven Architecture by:
- Wrapping A2A messages in EDA event envelopes for routing and delivery
- Preserving A2A semantics while leveraging EDA scalability and reliability
- Enabling A2A conversation contexts within event-driven message flows
- Supporting A2A task coordination alongside traditional event broadcasting
- Providing A2A-compliant APIs that internally use EDA for transport
// A2A message wrapped in an EDA event (simplified Go view of the protobuf message)
type AgentEvent struct {
    EventId   string
    Timestamp *timestamppb.Timestamp
    // A2A-compliant payload: exactly one of these is set
    // (a protobuf oneof, fields 10-13)
    Message        *a2a.Message
    Task           *a2a.Task
    StatusUpdate   *TaskStatusUpdateEvent
    ArtifactUpdate *TaskArtifactUpdateEvent
    // EDA routing metadata
    Routing *AgentEventMetadata
}
Workflow Orchestration
Complex business processes can be modeled as Agent2Agent workflows:
- Process Initiation: A workflow agent receives a high-level business request
- Task Decomposition: The request is broken down into specific tasks
- Agent Coordination: Tasks are distributed to specialized agents
- Progress Aggregation: Individual task progress is combined into overall workflow status
- Result Assembly: Task results are combined into a final business outcome
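The progress aggregation step might be sketched as below. Averaging with equal weight per task is an assumption chosen for simplicity; a real workflow agent could weight tasks by expected duration. `OverallProgress` is a hypothetical helper.

```go
package main

import "fmt"

// OverallProgress averages per-task completion percentages,
// treating every task as equally weighted.
func OverallProgress(percentByTask map[string]int) int {
	if len(percentByTask) == 0 {
		return 0
	}
	total := 0
	for _, p := range percentByTask {
		total += p
	}
	return total / len(percentByTask)
}

func main() {
	workflow := map[string]int{
		"ingest":   100,
		"validate": 100,
		"analyze":  40,
		"report":   0,
	}
	fmt.Println(OverallProgress(workflow)) // 60
}
```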
Benefits and Trade-offs
Benefits
Scalability: Asynchronous operation and agent autonomy enable horizontal scaling without central bottlenecks.
Resilience: Agent failures don’t cascade as easily since tasks can be retried or redistributed.
Flexibility: New agent types can be added without modifying existing agents.
Observability: Rich task semantics and progress reporting provide excellent visibility into system operations.
Modularity: Agents can be developed, deployed, and scaled independently.
Trade-offs
Complexity: The system requires more sophisticated error handling and state management compared to simple request-response patterns.
Latency: For simple operations, the overhead of task creation and routing may add latency compared to direct calls.
Debugging: Distributed, asynchronous operations can be more challenging to debug than synchronous call chains.
Consistency: Managing data consistency across asynchronous agent operations requires careful design.
When to Use Agent2Agent
Agent2Agent is particularly well-suited for:
Complex Processing Pipelines
When work involves multiple steps that can be performed by different specialized agents:
- Data ingestion → validation → transformation → analysis → reporting
- Image upload → virus scan → thumbnail generation → metadata extraction
- Order processing → inventory check → payment processing → fulfillment
Long-Running Operations
When operations take significant time and users need progress feedback:
- Large file processing
- Machine learning model training
- Complex data analysis
- Batch job processing
Dynamic Load Distribution
When workload characteristics vary and different agents may be better suited for different tasks:
- Multi-tenant systems with varying customer requirements
- Resource-intensive operations that need specialized hardware
- Geographic distribution where local processing is preferred
System Integration
When connecting heterogeneous systems that need to coordinate:
- Third-party service coordination
- Cross-platform workflows
A2A Protocol Comparison with Other Patterns
vs. Message Queues
Traditional message queues provide asynchronous communication but lack:
- A2A structured message parts (text, data, files)
- A2A conversation context and task threading
- A2A bidirectional artifact delivery
- A2A complete message history preservation
- A2A flexible content types and metadata
vs. RPC/HTTP APIs
RPC and HTTP APIs provide structured communication but are typically:
- Synchronous (blocking) vs A2A asynchronous task delegation
- Lacking A2A-style progress tracking through message history
- Point-to-point rather than A2A context-aware routing
- Without A2A structured content parts and artifact handling
- Missing A2A conversation threading and workflow coordination
vs. Event Sourcing
Event sourcing provides audit trails and state reconstruction but:
- Focuses on state changes rather than A2A work coordination
- Lacks A2A structured task status and message threading
- Doesn’t provide A2A artifact-based result delivery
- Requires more complex patterns vs A2A’s built-in conversation context
- Missing A2A’s multi-modal content handling (text, data, files)
A2A Protocol Future Evolution
The A2A protocol and AgentHub implementation open possibilities for:
Intelligent A2A Agent Networks
Agents that learn from A2A conversation contexts and message patterns to make better delegation decisions based on historical performance and capability matching.
Self-Organizing A2A Systems
Agent networks that automatically reconfigure based on A2A workflow patterns, context relationships, and agent availability, using A2A metadata for intelligent routing decisions.
Cross-Organization A2A Collaboration
Extending A2A protocols across organizational boundaries for B2B workflow automation, leveraging A2A’s structured content parts and artifact handling for secure inter-org communication.
AI Agent A2A Integration
Natural integration points for AI agents that can:
- Parse A2A message content parts for semantic understanding
- Generate appropriate A2A responses with structured artifacts
- Maintain A2A conversation context across complex multi-turn interactions
- Make autonomous decisions about A2A task acceptance based on content analysis
Enhanced A2A Features
- A2A Protocol Extensions: Custom Part types for domain-specific content
- Advanced A2A Routing: ML-based routing decisions using conversation context
- A2A Federation: Cross-cluster A2A communication with context preservation
- A2A Analytics: Deep insights from conversation patterns and artifact flows
The A2A protocol represents a foundational shift toward more intelligent, context-aware, and collaborative software systems that can handle complex distributed workflows while maintaining strong semantics, complete audit trails, and rich inter-agent communication patterns.
4.2.2 - Agent2Agent (A2A) Protocol Migration
Understanding the migration to Agent2Agent protocol compliance while maintaining Event-Driven Architecture benefits.
Agent2Agent (A2A) Protocol Migration
This document explains the migration of AgentHub to full Agent2Agent (A2A) protocol compliance while maintaining the essential Event-Driven Architecture (EDA) patterns that make the system scalable and resilient.
What is the Agent2Agent Protocol?
The Agent2Agent (A2A) protocol is a standardized specification for communication between AI agents. It defines:
- Standardized Message Formats: Using Message, Part, Task, and Artifact structures
- Task Lifecycle Management: Clear states (SUBMITTED, WORKING, COMPLETED, FAILED, CANCELLED)
- Agent Discovery: Using AgentCard for capability advertisement
- Interoperability: Ensuring agents can communicate across different platforms
Why Migrate to A2A?
Benefits of A2A Compliance
- Interoperability: AgentHub can now communicate with any A2A-compliant agent or system
- Standardization: Clear, well-defined message formats reduce integration complexity
- Ecosystem Compatibility: Join the growing ecosystem of A2A-compatible tools
- Future-Proofing: Built on industry standards rather than custom protocols
Maintained EDA Benefits
- Scalability: Event-driven routing scales to thousands of agents
- Resilience: Asynchronous communication handles network partitions gracefully
- Flexibility: Topic-based routing and priority queues enable sophisticated workflows
- Observability: Built-in tracing and metrics for production deployments
Hybrid Architecture
AgentHub implements a hybrid approach that combines the best of both worlds:
┌────────────────────────────────────────────────────────────────┐
│                   A2A Protocol Layer                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐│
│  │ A2A Message │  │  A2A Task   │  │ A2A Artifact│  │A2A Agent││
│  │  (standard) │  │ (standard)  │  │ (standard)  │  │  Card   ││
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘│
├────────────────────────────────────────────────────────────────┤
│                    EDA Transport Layer                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐│
│  │ AgentEvent  │  │Event Router │  │ Subscribers │  │Priority ││
│  │  Wrapper    │  │             │  │  Manager    │  │ Queues  ││
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘│
├────────────────────────────────────────────────────────────────┤
│                      gRPC Infrastructure                       │
└────────────────────────────────────────────────────────────────┘
How It Works
- A2A Messages are created using standard A2A structures (Message, Task, etc.)
- EDA Wrapper wraps A2A messages in AgentEvent for transport
- Event Routing uses EDA patterns (pub/sub, priority, topics) for delivery
- A2A Compliance ensures messages follow A2A protocol semantics
API Changes
Before (Legacy API)
// Legacy TaskMessage (deprecated)
taskPublisher.PublishTask(ctx, &agenthub.PublishTaskRequest{
    TaskType: "greeting",
    Parameters: map[string]interface{}{
        "name": "Claude",
    },
    RequesterAgentID: "my_agent",
    ResponderAgentID: "target_agent",
})
After (A2A-Compliant API)
// A2A-compliant task publishing
content := []*pb.Part{
    {
        Part: &pb.Part_Text{
            Text: "Hello! Please provide a greeting for Claude.",
        },
    },
}
task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
    TaskType:         "greeting",
    Content:          content,
    RequesterAgentID: "my_agent",
    ResponderAgentID: "target_agent",
    Priority:         pb.Priority_PRIORITY_MEDIUM,
    ContextID:        "conversation_123",
})
Message Structure Changes
message Message {
  string message_id = 1;       // Unique message identifier
  string context_id = 2;       // Conversation context
  string task_id = 3;          // Associated task (optional)
  Role role = 4;               // USER or AGENT
  repeated Part content = 5;   // Message content parts
  google.protobuf.Struct metadata = 6; // Additional metadata
}
message Part {
  oneof part {
    string text = 1;           // Text content
    DataPart data = 2;         // Structured data
    FilePart file = 3;         // File reference
  }
}
message Task {
  string id = 1;                    // Task identifier
  string context_id = 2;            // Conversation context
  TaskStatus status = 3;            // Current status
  repeated Message history = 4;     // Message history
  repeated Artifact artifacts = 5;  // Task outputs
  google.protobuf.Struct metadata = 6; // Task metadata
}
enum TaskState {
  TASK_STATE_SUBMITTED = 0;    // Task created
  TASK_STATE_WORKING = 1;      // Task in progress
  TASK_STATE_COMPLETED = 2;    // Task completed successfully
  TASK_STATE_FAILED = 3;       // Task failed
  TASK_STATE_CANCELLED = 4;    // Task cancelled
}
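One way to keep lifecycle states consistent is to check every state change against a transition table. The table below is an assumption (the source lists the states but not the allowed transitions): SUBMITTED may move to WORKING, FAILED, or CANCELLED; WORKING may move to COMPLETED, FAILED, or CANCELLED; the remaining states are treated as terminal.

```go
package main

import "fmt"

// TaskState mirrors the enum values above.
type TaskState int

const (
	Submitted TaskState = iota
	Working
	Completed
	Failed
	Cancelled
)

// allowed is a hypothetical transition table; terminal states have no entry.
var allowed = map[TaskState][]TaskState{
	Submitted: {Working, Failed, Cancelled},
	Working:   {Completed, Failed, Cancelled},
}

// CanTransition reports whether moving from one state to another is permitted.
func CanTransition(from, to TaskState) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Submitted, Working))  // true
	fmt.Println(CanTransition(Working, Completed))  // true
	fmt.Println(CanTransition(Completed, Working))  // false: COMPLETED is terminal
	fmt.Println(CanTransition(Submitted, Completed)) // false: must pass through WORKING
}
```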
Migration Guide
For Publishers
- Replace TaskPublisher with A2ATaskPublisher
- Use A2APublishTaskRequest with A2A Part structures
- Handle returned A2A Task objects
For Subscribers
- Replace TaskSubscriber with A2ATaskSubscriber
- Update handlers to process A2A Task and Message objects
- Return A2A Artifact objects instead of custom results
For Custom Integrations
- Update protobuf imports to use the events/a2a package
- Replace custom message structures with A2A equivalents
- Use the AgentHub service instead of EventBus
Backward Compatibility
The migration maintains wire-level compatibility through:
- Deprecated Types: Legacy message types marked as deprecated but still supported
- Automatic Conversion: EDA broker converts between legacy and A2A formats when needed
- Graceful Migration: Existing agents can migrate incrementally
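To give a feel for what converting a legacy request into A2A content could involve, here is a sketch under stated assumptions. The struct shapes are simplified stand-ins, and `FromLegacy` is a hypothetical helper; the broker's actual conversion logic is not shown in this document.

```go
package main

import "fmt"

// LegacyTask mirrors the fields of the deprecated request format.
type LegacyTask struct {
	TaskType         string
	Parameters       map[string]interface{}
	RequesterAgentID string
	ResponderAgentID string
}

// Part is a simplified stand-in for the A2A Part oneof:
// exactly one of Text or Data is set.
type Part struct {
	Text string
	Data map[string]interface{}
}

// A2ATask is a simplified stand-in for the A2A publish request.
type A2ATask struct {
	TaskType         string
	Content          []Part
	RequesterAgentID string
	ResponderAgentID string
}

// FromLegacy carries the task type over as a text part and the
// parameters, if any, as a structured data part.
func FromLegacy(l LegacyTask) A2ATask {
	content := []Part{{Text: "Legacy task: " + l.TaskType}}
	if len(l.Parameters) > 0 {
		content = append(content, Part{Data: l.Parameters})
	}
	return A2ATask{
		TaskType:         l.TaskType,
		Content:          content,
		RequesterAgentID: l.RequesterAgentID,
		ResponderAgentID: l.ResponderAgentID,
	}
}

func main() {
	task := FromLegacy(LegacyTask{
		TaskType:         "greeting",
		Parameters:       map[string]interface{}{"name": "Claude"},
		RequesterAgentID: "my_agent",
		ResponderAgentID: "target_agent",
	})
	fmt.Println(len(task.Content), task.Content[0].Text, task.Content[1].Data["name"])
	// 2 Legacy task: greeting Claude
}
```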
Testing A2A Compliance
Run the demo to verify A2A compliance:
# Terminal 1: Start A2A broker
make run-server
# Terminal 2: Start A2A subscriber
make run-subscriber
# Terminal 3: Start A2A publisher
make run-publisher
Expected output shows successful A2A task processing:
- Publisher: “Published A2A task”
- Subscriber: “Task processing completed”
- Artifacts generated in A2A format
Best Practices
- Use A2A Types: Always use A2A message structures for new code
- Context Management: Use context_id to group related messages
- Proper Parts: Structure content using appropriate Part types
- Artifact Returns: Return structured Artifact objects from tasks
- Status Updates: Properly manage task lifecycle states
The A2A migration ensures AgentHub remains both standards-compliant and highly scalable through its hybrid EDA+A2A architecture.
4.2.3 - Understanding Tasks in Agent2Agent Communication
Tasks are the fundamental unit of work exchange in the Agent2Agent protocol. Deep dive into task semantics, lifecycle, and design patterns.
Understanding Tasks in Agent2Agent Communication
Tasks are the fundamental unit of work exchange in the Agent2Agent protocol. This document provides a deep dive into task semantics, lifecycle, and design patterns.
Task Anatomy
Core Components
Every task in the Agent2Agent system consists of several key components that define its identity, purpose, and execution context:
A2A Task Identity
string id = 1;                         // Unique task identifier
string context_id = 2;                 // Optional conversation context
The id serves as a unique identifier that allows all participants to track the task throughout its lifecycle. It should be globally unique and meaningful for debugging purposes.
The context_id groups related tasks in a conversation or workflow context, enabling sophisticated multi-task coordination patterns.
Task classification in A2A is handled through the initial Message content rather than a separate task_type field, providing more flexibility for complex task descriptions.
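As one way to produce identifiers that are both unique and easy to recognize in logs, the sketch below combines a readable prefix with random bytes from the standard library. `NewID` is a hypothetical helper, and the prefix convention is an assumption; a UUID library would serve equally well.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// NewID returns prefix + "_" + 32 hex characters drawn from crypto/rand.
func NewID(prefix string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + "_" + hex.EncodeToString(buf), nil
}

func main() {
	taskID, err := NewID("task")
	if err != nil {
		panic(err)
	}
	contextID, err := NewID("ctx")
	if err != nil {
		panic(err)
	}
	// Each ID is its prefix, an underscore, and 32 hex characters.
	fmt.Println(taskID)
	fmt.Println(contextID)
}
```

The prefix makes it clear at a glance whether an ID refers to a task or a context, which helps when reading traces.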
A2A Task Status and History
TaskStatus status = 3;                 // Current task status
repeated Message history = 4;          // Message history for this task
repeated Artifact artifacts = 5;       // Task output artifacts
google.protobuf.Struct metadata = 6;   // Task metadata
In A2A, task data is contained within Message content using the structured Part format:
// A2A task request message
message Message {
  string message_id = 1;
  string context_id = 2;
  string task_id = 3;
  Role role = 4;                    // USER (requester) or AGENT (responder)
  repeated Part content = 5;        // Structured task content
}
message Part {
  oneof part {
    string text = 1;               // Text description
    DataPart data = 2;             // Structured data
    FilePart file = 3;             // File references
  }
}
// Example: A2A data analysis task
taskMessage := &a2a.Message{
    MessageId: "msg_" + uuid.New().String(),
    ContextId: "analysis_workflow_123",
    TaskId:    "task_analysis_456",
    Role:      a2a.Role_USER,
    Content: []*a2a.Part{
        {
            Part: &a2a.Part_Text{
                Text: "Please perform trend analysis on Q4 sales data",
            },
        },
        {
            Part: &a2a.Part_Data{
                Data: &a2a.DataPart{
                    Data: analysisParams, // Structured parameters
                    Description: "Analysis configuration",
                },
            },
        },
    },
}
Metadata in A2A tasks provides additional context for execution, auditing, or debugging:
// A2A task metadata
taskMetadata, _ := structpb.NewStruct(map[string]interface{}{
    "workflow_id":     "workflow_abc123",
    "user_id":         "user_456",
    "request_source":  "web_ui",
    "correlation_id":  "trace_789",
    "priority":        "high",
    "expected_duration": "5m",
})
task := &a2a.Task{
    Id:        "task_analysis_456",
    ContextId: "analysis_workflow_123",
    Metadata:  taskMetadata,
}
A2A Agent Coordination
In A2A, agent coordination is handled through the EDA routing metadata:
message AgentEventMetadata {
  string from_agent_id = 1;           // Source agent identifier
  string to_agent_id = 2;             // Target agent ID (empty = broadcast)
  string event_type = 3;              // Event classification
  repeated string subscriptions = 4;   // Topic-based routing tags
  Priority priority = 5;              // Delivery priority
}
This enables flexible routing patterns:
- from_agent_id identifies the requesting agent
- to_agent_id can specify a target agent or be empty for broadcast
- subscriptions enable topic-based routing for specialized agents
- priority ensures urgent tasks get precedence
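The three addressing modes above can be sketched as a single target-resolution function. This is only an illustration under stated assumptions: Routing mirrors a subset of AgentEventMetadata, Agent is a hypothetical registry entry, and skipping the sender is an assumption rather than documented broker behavior.

```go
package main

import "fmt"

// Routing mirrors the addressing fields of AgentEventMetadata.
type Routing struct {
	FromAgentID   string
	ToAgentID     string
	Subscriptions []string
}

// Agent is a hypothetical registry entry: an ID plus the topics it follows.
type Agent struct {
	ID     string
	Topics []string
}

// sharesTopic reports whether the two topic lists have an element in common.
func sharesTopic(a, b []string) bool {
	for _, x := range a {
		for _, y := range b {
			if x == y {
				return true
			}
		}
	}
	return false
}

// resolveTargets returns the IDs of agents that should receive an event:
// direct (ToAgentID set), topic-based (Subscriptions set), or broadcast.
func resolveTargets(r Routing, agents []Agent) []string {
	var targets []string
	for _, agent := range agents {
		if agent.ID == r.FromAgentID {
			continue // assumption: never echo an event back to its sender
		}
		switch {
		case r.ToAgentID != "":
			if agent.ID == r.ToAgentID {
				targets = append(targets, agent.ID)
			}
		case len(r.Subscriptions) > 0:
			if sharesTopic(r.Subscriptions, agent.Topics) {
				targets = append(targets, agent.ID)
			}
		default:
			targets = append(targets, agent.ID)
		}
	}
	return targets
}

func main() {
	agents := []Agent{
		{ID: "orchestrator"},
		{ID: "processor_1", Topics: []string{"analysis"}},
		{ID: "processor_2", Topics: []string{"images"}},
	}
	fmt.Println(resolveTargets(Routing{FromAgentID: "orchestrator", ToAgentID: "processor_1"}, agents))
	fmt.Println(resolveTargets(Routing{FromAgentID: "orchestrator", Subscriptions: []string{"images"}}, agents))
	fmt.Println(resolveTargets(Routing{FromAgentID: "orchestrator"}, agents))
}
```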
A2A Execution Context
A2A handles execution context through the TaskStatus structure:
message TaskStatus {
  TaskState state = 1;                   // SUBMITTED, WORKING, COMPLETED, FAILED, CANCELLED
  Message update = 2;                    // Latest status message
  google.protobuf.Timestamp timestamp = 3; // Status timestamp
}
enum TaskState {
  TASK_STATE_SUBMITTED = 0;
  TASK_STATE_WORKING = 1;
  TASK_STATE_COMPLETED = 2;
  TASK_STATE_FAILED = 3;
  TASK_STATE_CANCELLED = 4;
}
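This status, together with the routing metadata, helps agents make intelligent scheduling decisions:
- state shows whether a task is still waiting (SUBMITTED) or already in progress (WORKING)
- timestamp enables age-based scheduling policies
- priority, carried in the AgentEventMetadata routing metadata, provides explicit urgency ranking
- deadlines are not part of TaskStatus; a task that needs one can carry it in the task metadata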
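One way to keep lifecycle handling consistent is to check state changes against a transition table. The sketch below uses a local TaskState type that mirrors the enum above; the set of allowed transitions is an assumption inferred from the examples in this document (repeated WORKING updates for progress, FAILED for rejections), not something the protocol definition here spells out.

```go
package main

import "fmt"

// TaskState mirrors the TaskState enum shown above.
type TaskState int

const (
	Submitted TaskState = iota
	Working
	Completed
	Failed
	Cancelled
)

// allowed lists the assumed legal transitions. Terminal states
// (Completed, Failed, Cancelled) have no outgoing transitions.
var allowed = map[TaskState][]TaskState{
	Submitted: {Working, Failed, Cancelled},
	Working:   {Working, Completed, Failed, Cancelled},
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to TaskState) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("submitted -> working:", canTransition(Submitted, Working))
	fmt.Println("completed -> working:", canTransition(Completed, Working))
}
```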
Task Lifecycle
1. A2A Task Creation and Publishing
A2A tasks begin their lifecycle when a requesting agent creates a task with an initial message:
// Create A2A task with initial request message.
// Generate the ID once so the task and its initial message agree on it.
taskID := "task_analysis_" + uuid.New().String()
task := &a2a.Task{
    Id:        taskID,
    ContextId: "workflow_orchestration_123",
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            MessageId: "msg_" + uuid.New().String(),
            TaskId:    taskID,
            Role:      a2a.Role_USER,
            Content: []*a2a.Part{
                {
                    Part: &a2a.Part_Text{
                        Text: "Please analyze the quarterly sales data for trends",
                    },
                },
                {
                    Part: &a2a.Part_Data{
                        Data: &a2a.DataPart{
                            Data: analysisParams,
                            Description: "Analysis configuration",
                        },
                    },
                },
            },
        },
        Timestamp: timestamppb.Now(),
    },
}
// Publish to AgentHub broker
client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
    Task: task,
    Routing: &pb.AgentEventMetadata{
        FromAgentId: "data_orchestrator",
        ToAgentId:   "data_processor_01", // Optional: specific agent
        EventType:   "task.submitted",
        Priority:    pb.Priority_PRIORITY_HIGH,
    },
})
2. A2A Task Discovery and Acceptance
Agents subscribe to A2A task events and evaluate whether to accept them:
// Agent receives A2A task event
func (a *Agent) evaluateA2ATask(event *pb.AgentEvent) bool {
    task := event.GetTask()
    if task == nil || task.GetStatus() == nil || task.Status.State != a2a.TaskState_TASK_STATE_SUBMITTED {
        return false
    }
    // Analyze task content to understand requirements
    requestMessage := task.Status.Update
    taskDescription := a.extractTaskDescription(requestMessage)
    // Check if agent can handle this task type
    if !a.canHandleTaskType(taskDescription) {
        return false
    }
    // Check capacity constraints
    if a.getCurrentLoad() >= a.maxCapacity { // at capacity already means no room for another task
        return false
    }
    // Estimate duration from task content and metadata
    estimatedDuration := a.estimateA2ATaskDuration(task)
    if estimatedDuration > a.maxTaskDuration {
        return false
    }
    return true
}
func (a *Agent) extractTaskDescription(msg *a2a.Message) string {
    for _, part := range msg.GetContent() { // GetContent is safe to call on a nil message
        if textPart := part.GetText(); textPart != "" {
            return textPart
        }
    }
    return ""
}
3. A2A Task Execution with Progress Reporting
Accepted A2A tasks enter the execution phase with regular status updates:
func (a *Agent) executeA2ATask(task *a2a.Task) {
    // Update task to WORKING state
    a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Task started")
    // Phase 1: Preparation
    a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Preparing data analysis")
    prepareResult := a.prepareA2AExecution(task)
    // Phase 2: Main processing
    a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Processing data - 50% complete")
    processResult := a.processA2AData(prepareResult)
    // Phase 3: Finalization
    a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Finalizing results - 75% complete")
    finalResult := a.finalizeA2AResults(processResult)
    // Completion with artifacts
    a.completeTaskWithArtifacts(task, finalResult)
}
func (a *Agent) updateTaskStatus(task *a2a.Task, state a2a.TaskState, message string) {
    statusUpdate := &a2a.Message{
        MessageId: "msg_" + uuid.New().String(),
        TaskId:    task.Id,
        Role:      a2a.Role_AGENT,
        Content: []*a2a.Part{
            {
                Part: &a2a.Part_Text{
                    Text: message,
                },
            },
        },
    }
    task.Status = &a2a.TaskStatus{
        State:     state,
        Update:    statusUpdate,
        Timestamp: timestamppb.Now(),
    }
    // Publish task update
    a.client.PublishTaskUpdate(context.Background(), &pb.PublishTaskUpdateRequest{
        Task: task,
        Routing: &pb.AgentEventMetadata{
            FromAgentId: a.agentId,
            EventType:   "task.status_update",
        },
    })
}
4. A2A Result Delivery
A2A task completion delivers results through structured artifacts:
func (a *Agent) completeTaskWithArtifacts(task *a2a.Task, resultData interface{}) {
    // Create completion message
    completionMessage := &a2a.Message{
        MessageId: "msg_" + uuid.New().String(),
        TaskId:    task.Id,
        Role:      a2a.Role_AGENT,
        Content: []*a2a.Part{
            {
                Part: &a2a.Part_Text{
                    Text: "Analysis completed successfully",
                },
            },
        },
    }
    // Create result artifact
    resultArtifact := &a2a.Artifact{
        ArtifactId:  "artifact_" + uuid.New().String(),
        Name:        "Analysis Results",
        Description: "Quarterly sales trend analysis",
        Parts: []*a2a.Part{
            {
                Part: &a2a.Part_Data{
                    Data: &a2a.DataPart{
                        Data:        resultData.(*structpb.Struct), // DataPart.Data is a pointer; panics if resultData has another type
                        Description: "Analysis results and metrics",
                    },
                },
            },
        },
    }
    // Update task to completed
    task.Status = &a2a.TaskStatus{
        State:     a2a.TaskState_TASK_STATE_COMPLETED,
        Update:    completionMessage,
        Timestamp: timestamppb.Now(),
    }
    task.Artifacts = append(task.Artifacts, resultArtifact)
    // Publish final task update
    a.client.PublishTaskUpdate(context.Background(), &pb.PublishTaskUpdateRequest{
        Task: task,
        Routing: &pb.AgentEventMetadata{
            FromAgentId: a.agentId,
            EventType:   "task.completed",
        },
    })
    // Publish artifact separately
    a.client.PublishTaskArtifact(context.Background(), &pb.PublishTaskArtifactRequest{
        TaskId:   task.Id,
        Artifact: resultArtifact,
        Routing: &pb.AgentEventMetadata{
            FromAgentId: a.agentId,
            EventType:   "task.artifact",
        },
    })
}
A2A Task Design Patterns
1. Simple A2A Request-Response
The most basic pattern where one agent requests work from another using A2A messages:
Agent A ──[A2A Task]──> AgentHub ──[TaskEvent]──> Agent B
Agent A <─[Artifact]─── AgentHub <─[TaskUpdate]── Agent B
A2A Implementation:
// Agent A creates task
task := &a2a.Task{
    Id: "simple_task_123",
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            Role: a2a.Role_USER,
            Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Convert CSV to JSON"}}},
        },
    },
}
// Agent B responds with artifact
artifact := &a2a.Artifact{
    Name: "Converted Data",
    Parts: []*a2a.Part{{Part: &a2a.Part_File{File: &a2a.FilePart{FileId: "converted.json"}}}},
}
Use cases:
- File format conversion
- Simple calculations
- Data validation
- Content generation
2. A2A Broadcast Processing
One agent broadcasts a task to multiple potential processors using A2A context-aware routing:
Agent A ──[A2A Task]──> AgentHub ──[TaskEvent]──> Agent B₁
                                 ├─[TaskEvent]──> Agent B₂
                                 └─[TaskEvent]──> Agent B₃
A2A Implementation:
// Broadcast task with shared context
task := &a2a.Task{
    Id:        "broadcast_task_456",
    ContextId: "parallel_processing_context",
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            Role: a2a.Role_USER,
            Content: []*a2a.Part{
                {Part: &a2a.Part_Text{Text: "Process data chunk"}},
                {Part: &a2a.Part_Data{Data: &a2a.DataPart{Data: chunkData}}},
            },
        },
    },
}
// Publish without specific target (broadcast)
client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
    Task: task,
    Routing: &pb.AgentEventMetadata{
        FromAgentId: "orchestrator",
        // No ToAgentId = broadcast
        EventType: "task.broadcast",
    },
})
Use cases:
- Distributed computation
- Load testing
- Content distribution
- Parallel processing
3. A2A Pipeline Processing
Tasks flow through a series of specialized agents using shared A2A context:
Agent A ──[A2A Task₁]──> Agent B ──[A2A Task₂]──> Agent C ──[A2A Task₃]──> Agent D
       <──[Final Artifact]────────────────────────────────────────────────────
A2A Implementation:
// Shared context for pipeline
pipelineContext := "data_pipeline_" + uuid.New().String()
// Stage 1: Data extraction
task1 := &a2a.Task{
    Id:        "extract_" + uuid.New().String(),
    ContextId: pipelineContext,
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            Role: a2a.Role_USER,
            Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Extract data from source"}}},
        },
    },
}
// Stage 2: Data transformation (triggered by Stage 1 completion)
task2 := &a2a.Task{
    Id:        "transform_" + uuid.New().String(),
    ContextId: pipelineContext, // Same context
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            Role: a2a.Role_USER,
            Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Transform extracted data"}}},
        },
    },
}
// Context linking enables pipeline coordination
Use cases:
- Data processing pipelines
- Image processing workflows
- Document processing chains
- ETL operations
4. A2A Hierarchical Decomposition
Complex tasks are broken down into subtasks using A2A context hierarchy:
Agent A ──[A2A ComplexTask]──> Coordinator
                                  ├──[A2A SubTask₁]──> Specialist₁
                                  ├──[A2A SubTask₂]──> Specialist₂
                                  └──[A2A SubTask₃]──> Specialist₃
A2A Implementation:
// Parent task
parentTask := &a2a.Task{
    Id:        "complex_analysis_789",
    ContextId: "business_workflow_123",
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            Role: a2a.Role_USER,
            Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Perform comprehensive business analysis"}}},
        },
    },
}
// Coordinator creates subtasks with hierarchical context.
// Metadata is a google.protobuf.Struct, so it is built with structpb.NewStruct
// (errors ignored here for brevity).
financialMeta, _ := structpb.NewStruct(map[string]interface{}{
    "parent_task_id": "complex_analysis_789",
    "subtask_type":   "financial",
})
subtask1 := &a2a.Task{
    Id:        "financial_analysis_790",
    ContextId: "business_workflow_123", // Same parent context
    Metadata:  financialMeta,
}
marketMeta, _ := structpb.NewStruct(map[string]interface{}{
    "parent_task_id": "complex_analysis_789",
    "subtask_type":   "market",
})
subtask2 := &a2a.Task{
    Id:        "market_analysis_791",
    ContextId: "business_workflow_123", // Same parent context
    Metadata:  marketMeta,
}
// Context enables coordination and result aggregation
Use cases:
- Complex business workflows
- Multi-step analysis
- Orchestrated services
- Batch job coordination
5. Competitive Processing
Multiple agents compete to handle the same task (first-come-first-served):
Agent A ──[Task]──> Broker ──[Task]──> Agent B₁ (accepts)
                           ├─[Task]──> Agent B₂ (rejects)
                           └─[Task]──> Agent B₃ (rejects)
Use cases:
- Resource-constrained environments
- Load balancing
- Fault tolerance
- Performance optimization
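The first-come-first-served rule can be sketched as an atomic claim on the task ID. The claimRegistry type and tryClaim method below are hypothetical and only illustrate the idea; this document does not describe how AgentHub arbitrates between competing agents.

```go
package main

import (
	"fmt"
	"sync"
)

// claimRegistry is a hypothetical structure that records which agent
// claimed each task first.
type claimRegistry struct {
	mu     sync.Mutex
	owners map[string]string // task ID -> agent ID
}

func newClaimRegistry() *claimRegistry {
	return &claimRegistry{owners: make(map[string]string)}
}

// tryClaim returns true only for the first agent to claim a given task.
// Later claims on the same task are refused.
func (r *claimRegistry) tryClaim(taskID, agentID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, taken := r.owners[taskID]; taken {
		return false
	}
	r.owners[taskID] = agentID
	return true
}

func main() {
	registry := newClaimRegistry()
	results := make([]bool, 3)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine stands in for one competing agent.
			results[i] = registry.tryClaim("task_123", fmt.Sprintf("agent_b%d", i+1))
		}(i)
	}
	wg.Wait()
	fmt.Println("claims:", results) // exactly one entry is true
}
```

In a multi-process deployment the same check would have to live in shared state (for example the broker or a datastore) rather than in one agent's memory.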
A2A Task Content and Semantics
A2A Message-Based Classification
In A2A, task classification is handled through message content rather than rigid type fields, providing more flexibility:
Content-Based Classification
// Data processing task
message := &a2a.Message{
    Content: []*a2a.Part{
        {Part: &a2a.Part_Text{Text: "Analyze quarterly sales data for trends"}},
        {Part: &a2a.Part_Data{Data: &a2a.DataPart{Description: "Analysis parameters"}}},
    },
}
// Image processing task
imageMessage := &a2a.Message{
    Content: []*a2a.Part{
        {Part: &a2a.Part_Text{Text: "Generate product image with specifications"}},
        {Part: &a2a.Part_Data{Data: &a2a.DataPart{Description: "Image requirements"}}},
    },
}
// Notification task
notificationMessage := &a2a.Message{
    Content: []*a2a.Part{
        {Part: &a2a.Part_Text{Text: "Send completion notification to user"}},
        {Part: &a2a.Part_Data{Data: &a2a.DataPart{Description: "Notification details"}}},
    },
}
Operation-Based Classification
create.*        - Creation operations
update.*        - Modification operations
delete.*        - Removal operations
analyze.*       - Analysis operations
transform.*     - Transformation operations
Complexity-Based Classification
simple.*        - Quick, low-resource tasks
standard.*      - Normal processing tasks
complex.*       - Resource-intensive tasks
background.*    - Long-running batch tasks
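If agents adopt dotted labels such as analyze.sales_trends (for instance in event_type or task metadata, which is an assumption since this document does not say where such labels live), matching a pattern like analyze.* reduces to a prefix lookup. The operationOf helper below is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// operations lists the operation prefixes from the classification above.
var operations = map[string]string{
	"create":    "Creation operations",
	"update":    "Modification operations",
	"delete":    "Removal operations",
	"analyze":   "Analysis operations",
	"transform": "Transformation operations",
}

// operationOf returns the known operation prefix of a dotted label.
// It reports false when the label has no dot or the prefix is unknown.
func operationOf(label string) (string, bool) {
	prefix, _, found := strings.Cut(label, ".")
	if !found {
		return "", false
	}
	if _, known := operations[prefix]; !known {
		return "", false
	}
	return prefix, true
}

func main() {
	for _, label := range []string{"analyze.sales_trends", "archive.logs", "transform"} {
		if op, ok := operationOf(label); ok {
			fmt.Printf("%s -> %s\n", label, operations[op])
		} else {
			fmt.Printf("%s -> unclassified\n", label)
		}
	}
}
```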
A2A Content Design Guidelines
Be Explicit: Include all information needed for execution in structured Parts
// Good: Explicit A2A content
// structpb.NewStruct returns (*structpb.Struct, error), so build the
// parameters first (error ignored here for brevity).
conversionParams, _ := structpb.NewStruct(map[string]interface{}{
    "source_format":   "csv",
    "target_format":   "json",
    "include_headers": true,
    "delimiter":       ",",
    "encoding":        "utf-8",
})
content := []*a2a.Part{
    {
        Part: &a2a.Part_Text{
            Text: "Convert CSV file to JSON format with specific options",
        },
    },
    {
        Part: &a2a.Part_Data{
            Data: &a2a.DataPart{
                Data:        conversionParams,
                Description: "Conversion parameters",
            },
        },
    },
    {
        Part: &a2a.Part_File{
            File: &a2a.FilePart{
                FileId:   "source_data.csv",
                Filename: "data.csv",
                MimeType: "text/csv",
            },
        },
    },
}
// Poor: Ambiguous A2A content
content := []*a2a.Part{
    {
        Part: &a2a.Part_Text{
            Text: "Convert file", // Too vague
        },
    },
}
Use Standard Data Types: Leverage common formats for interoperability
// Good: Standard formats
{
  "timestamp": "2024-01-15T10:30:00Z",      // ISO 8601
  "amount": "123.45",                        // String for precision
  "coordinates": {"lat": 40.7128, "lng": -74.0060}
}
Include Validation Information: Help agents validate inputs
{
  "email": "user@example.com",
  "email_format": "rfc5322",
  "max_length": 254,
  "required": true
}
A2A Error Handling and Edge Cases
A2A Task Rejection
Agents should provide meaningful rejection reasons using A2A message format:
func (a *Agent) rejectA2ATask(task *a2a.Task, reason string) {
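    // Build the structured details first: structpb.NewStruct returns
    // (*structpb.Struct, error) and cannot be called inline in a struct literal.
    rejectionDetails, err := structpb.NewStruct(map[string]interface{}{
        "rejection_reason": reason,
        "agent_id":         a.agentId,
        "timestamp":        time.Now().Unix(),
    })
    if err != nil {
        rejectionDetails = &structpb.Struct{} // fall back to empty details
    }
    // Create rejection message
    rejectionMessage := &a2a.Message{
        MessageId: "msg_" + uuid.New().String(),
        TaskId:    task.Id,
        Role:      a2a.Role_AGENT,
        Content: []*a2a.Part{
            {
                Part: &a2a.Part_Text{
                    Text: "Task rejected: " + reason,
                },
            },
            {
                Part: &a2a.Part_Data{
                    Data: &a2a.DataPart{
                        Data:        rejectionDetails,
                        Description: "Rejection details",
                    },
                },
            },
        },
    }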
    // Update task status to failed
    task.Status = &a2a.TaskStatus{
        State:     a2a.TaskState_TASK_STATE_FAILED,
        Update:    rejectionMessage,
        Timestamp: timestamppb.Now(),
    }
    a.publishTaskUpdate(task)
}
Common rejection reasons:
- UNSUPPORTED_TASK_TYPE: Agent doesn’t handle this task type
- CAPACITY_EXCEEDED: Agent is at maximum capacity
- DEADLINE_IMPOSSIBLE: Cannot complete within deadline
- INVALID_PARAMETERS: Task parameters are malformed
- RESOURCE_UNAVAILABLE: Required external resources unavailable
Timeout Handling
Both requesters and processors should handle timeouts gracefully:
// Requester timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
select {
case result := <-resultChannel:
    // Process result
case <-ctx.Done():
    // Handle timeout - possibly retry or fail
}
// Processor timeout
// Note: this example uses pb.TaskMessage and its deadline field rather than a2a.Task.
func (a *Agent) executeWithTimeout(task *pb.TaskMessage) {
    // An unset deadline converts to the Unix epoch, so the context would expire immediately.
    deadline := task.GetDeadline().AsTime()
    ctx, cancel := context.WithDeadline(context.Background(), deadline)
    defer cancel()
    select {
    case result := <-a.processTask(ctx, task):
        a.publishResult(task, result, pb.TaskStatus_TASK_STATUS_COMPLETED, "")
    case <-ctx.Done():
        a.publishResult(task, nil, pb.TaskStatus_TASK_STATUS_FAILED, "Deadline exceeded")
    }
}
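For experimenting with the requester side in isolation, the pattern above can be reduced to a small standard-library program. The awaitResult helper and the string result type are illustrative assumptions, not AgentHub APIs.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// awaitResult waits for a value on results, giving up after timeout.
// On timeout it returns the context error (context.DeadlineExceeded).
func awaitResult(results <-chan string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	select {
	case result := <-results:
		return result, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	slow := make(chan string, 1)
	go func() {
		time.Sleep(200 * time.Millisecond) // simulated slow processor
		slow <- "analysis complete"
	}()
	if _, err := awaitResult(slow, 50*time.Millisecond); err != nil {
		fmt.Println("request failed:", err)
	}
}
```

The buffered channel lets the simulated processor finish its send even after the requester has stopped waiting, which avoids leaking the goroutine.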
Partial Results
For long-running tasks, consider supporting partial results:
type PartialResult struct {
    TaskId          string
    CompletedPortion float64    // 0.0 to 1.0
    IntermediateData interface{}
    CanResume       bool
    ResumeToken     string
}
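A minimal sketch of how such a structure might drive resumable processing follows. It assumes the resume token simply encodes the next item offset and narrows IntermediateData to []int for the example; processChunk is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// PartialResult follows the struct above, with IntermediateData
// narrowed to []int for this illustration.
type PartialResult struct {
	TaskId           string
	CompletedPortion float64
	IntermediateData []int
	CanResume        bool
	ResumeToken      string
}

// processChunk doubles up to chunkSize items, starting at the offset
// encoded in resumeToken (an empty token means "start from the beginning").
func processChunk(taskID string, items []int, resumeToken string, chunkSize int) (PartialResult, error) {
	start := 0
	if resumeToken != "" {
		offset, err := strconv.Atoi(resumeToken)
		if err != nil || offset < 0 || offset > len(items) {
			return PartialResult{}, fmt.Errorf("invalid resume token %q", resumeToken)
		}
		start = offset
	}
	if chunkSize < 1 {
		chunkSize = 1 // always make progress
	}
	end := start + chunkSize
	if end > len(items) {
		end = len(items)
	}
	data := make([]int, 0, end-start)
	for _, value := range items[start:end] {
		data = append(data, value*2)
	}
	portion := 1.0
	if len(items) > 0 {
		portion = float64(end) / float64(len(items))
	}
	return PartialResult{
		TaskId:           taskID,
		CompletedPortion: portion,
		IntermediateData: data,
		CanResume:        end < len(items),
		ResumeToken:      strconv.Itoa(end),
	}, nil
}

func main() {
	items := []int{1, 2, 3, 4, 5}
	token := ""
	for {
		part, err := processChunk("task_123", items, token, 2)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%.0f%% done, data=%v\n", part.CompletedPortion*100, part.IntermediateData)
		if !part.CanResume {
			return
		}
		token = part.ResumeToken
	}
}
```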
Best Practices
Task Design
- Make task types granular but not too fine-grained
- Design for idempotency when possible
- Include retry information in metadata
- Use consistent parameter naming across similar task types
- Version your task schemas to enable evolution
- Batch related tasks when appropriate
- Use appropriate priority levels to avoid starvation
- Set realistic deadlines based on historical performance
- Include resource hints to help with scheduling
- Monitor task completion rates to identify bottlenecks
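The idempotency advice above can be approximated by remembering results per task ID, so that a redelivered task replays the stored result instead of running again. The idempotentRunner type is a hypothetical in-memory sketch; a real deployment would need bounded or persistent storage.

```go
package main

import (
	"fmt"
	"sync"
)

// idempotentRunner runs a handler at most once per task ID and replays
// the stored result for duplicate deliveries.
type idempotentRunner struct {
	mu      sync.Mutex
	results map[string]string
}

func newIdempotentRunner() *idempotentRunner {
	return &idempotentRunner{results: make(map[string]string)}
}

// run executes handler only if taskID has not been seen before.
// The lock is held while the handler runs, which serializes tasks;
// that keeps the sketch simple at the cost of throughput.
func (r *idempotentRunner) run(taskID string, handler func() string) (result string, executed bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if stored, seen := r.results[taskID]; seen {
		return stored, false
	}
	result = handler()
	r.results[taskID] = result
	return result, true
}

func main() {
	runner := newIdempotentRunner()
	handler := func() string { return "converted.json" }
	for attempt := 1; attempt <= 2; attempt++ {
		result, executed := runner.run("task_123", handler)
		fmt.Printf("attempt %d: result=%s executed=%v\n", attempt, result, executed)
	}
}
```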
Security Considerations
- Validate all task parameters before processing
- Sanitize user-provided data in task parameters
- Include authorization context in metadata
- Log task execution for audit trails
- Encrypt sensitive parameters when necessary
A2A tasks form the foundation of Agent2Agent communication, enabling sophisticated distributed processing patterns through structured messages, artifacts, and context-aware coordination. The A2A protocol’s flexible message format and EDA integration provide robust, scalable agent networks with clear semantics and strong observability. Proper A2A task design leverages the protocol’s strengths for building maintainable, interoperable agent systems.
4.3 - Features
Deep explanations of AgentHub’s key features and capabilities
Feature Explanations
Detailed explanations of AgentHub’s advanced features, their design rationale, and implementation details.
4.3.1 - Distributed Tracing & OpenTelemetry
Deep dive into distributed tracing concepts, OpenTelemetry architecture, and how AgentHub implements comprehensive observability for event-driven systems.
Distributed Tracing & OpenTelemetry
Understanding-oriented: Deep dive into distributed tracing concepts, OpenTelemetry architecture, and how AgentHub implements comprehensive observability for event-driven systems.
The Problem: Observing Distributed Systems
Traditional monolithic applications are relatively easy to debug: everything happens in one process, on one machine, with one log file. But modern event-driven architectures like AgentHub present unique challenges:
The Complexity of Event-Driven Systems
Request Flow in AgentHub:
User → Publisher Agent → AgentHub Broker → Subscriber Agent → Result → Publisher Agent
Each step involves:
- Different processes (potentially on different machines)
- Asynchronous communication (events, not direct calls)
- Multiple protocol layers (gRPC, HTTP, network)
- Independent failure modes (network partitions, service crashes)
- Varying performance characteristics (CPU, memory, I/O)
Traditional Debugging Challenges
Without distributed tracing:
Publisher logs:   "Published task task_123 at 10:00:01"
Broker logs:     "Received task from agent_pub at 10:00:01"
                 "Routed task to agent_sub at 10:00:01"
Subscriber logs: "Processing task task_456 at 10:00:02"
                 "Completed task task_789 at 10:00:03"
Questions you can’t answer:
- Which subscriber processed task_123?
- How long did task_123 take end-to-end?
- Where did task_123 fail?
- What was the complete flow for a specific request?
The Solution: Distributed Tracing
Distributed tracing solves these problems by creating a unified view of requests as they flow through multiple services.
Core Concepts
Trace
A trace represents a complete request journey through the system. In AgentHub, a trace might represent:
- Publishing a task
- Processing the task
- Publishing the result
- Receiving the result
Trace ID: a1b2c3d4e5f67890
Duration: 150ms
Services: 3 (publisher, broker, subscriber)
Spans: 5
Status: Success
Span
A span represents a single operation within a trace. Each span has:
- Name: What operation it represents
- Start/End time: When it happened
- Tags: Metadata about the operation (called attributes in OpenTelemetry)
- Logs: Timestamped events that occurred during the operation (called span events in OpenTelemetry)
- Status: Success, error, or timeout
Span: "publish_event"
  Service: agenthub-publisher
  Duration: 25ms
  Tags:
    event.type: "greeting"
    event.id: "task_123"
    responder.agent: "agent_demo_subscriber"
  Status: OK
Span Context
The glue that connects spans across service boundaries. Contains:
- Trace ID: Unique identifier for the entire request
- Span ID: Unique identifier for the current operation
- Trace Flags: Sampling decisions, debug mode, etc.
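On the wire, these three fields are commonly carried in the W3C Trace Context traceparent header, in the form version-traceid-spanid-flags. The sketch below parses version 00 of that format. It assumes the W3C propagator is the one configured, which this document does not state; the SpanContext struct and parseTraceparent function are local illustrations rather than the OpenTelemetry types, and the parser skips some checks the specification requires (such as rejecting all-zero IDs).

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// SpanContext is a local stand-in holding the fields listed above.
type SpanContext struct {
	TraceID string
	SpanID  string
	Sampled bool
}

// parseTraceparent parses a version-00 traceparent header value.
func parseTraceparent(header string) (SpanContext, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return SpanContext{}, errors.New("unsupported traceparent format")
	}
	traceID, spanID, flagsHex := parts[1], parts[2], parts[3]
	if len(traceID) != 32 || len(spanID) != 16 || len(flagsHex) != 2 {
		return SpanContext{}, errors.New("traceparent field has wrong length")
	}
	for _, field := range []string{traceID, spanID} {
		if _, err := hex.DecodeString(field); err != nil {
			return SpanContext{}, fmt.Errorf("traceparent field is not hex: %w", err)
		}
	}
	flags, err := hex.DecodeString(flagsHex)
	if err != nil {
		return SpanContext{}, fmt.Errorf("trace flags are not hex: %w", err)
	}
	// The lowest flag bit carries the sampling decision.
	return SpanContext{TraceID: traceID, SpanID: spanID, Sampled: flags[0]&0x01 == 0x01}, nil
}

func main() {
	sc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("trace=%s span=%s sampled=%v\n", sc.TraceID, sc.SpanID, sc.Sampled)
}
```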
How Tracing Works in AgentHub
1. Trace Initiation
When a publisher creates a task, it starts a new trace:
// Publisher starts a trace
ctx, span := tracer.Start(ctx, "publish_event")
defer span.End()
// Add metadata
span.SetAttributes(
    attribute.String("event.type", "greeting"),
    attribute.String("event.id", taskID),
)
2. Context Propagation
The trace context is injected into the task metadata:
// Inject trace context into task headers
headers := make(map[string]string)
otel.GetTextMapPropagator().Inject(ctx, propagation.MapCarrier(headers))
// Embed headers in task metadata
task.Metadata = &structpb.Struct{
    Fields: map[string]*structpb.Value{
        "trace_headers": structpb.NewStructValue(&structpb.Struct{
            Fields: stringMapToStructFields(headers),
        }),
    },
}
3. Context Extraction
The broker and subscriber extract the trace context:
// Extract trace context from task metadata
if metadata := task.GetMetadata(); metadata != nil {
    if traceHeaders, ok := metadata.Fields["trace_headers"]; ok {
        headers := structFieldsToStringMap(traceHeaders.GetStructValue().Fields)
        ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(headers))
    }
}
// Continue the trace
ctx, span := tracer.Start(ctx, "process_event")
defer span.End()
4. Complete Request Flow
The result is a complete trace showing the entire request journey:
Trace: a1b2c3d4e5f67890
├── publish_event (agenthub-publisher) [25ms]
│   ├── event.type: greeting
│   └── event.id: task_123
├── route_task (agenthub-broker) [2ms]
│   ├── source.agent: agent_demo_publisher
│   └── target.agent: agent_demo_subscriber
├── consume_event (agenthub-subscriber) [5ms]
│   └── messaging.operation: receive
├── process_task (agenthub-subscriber) [98ms]
│   ├── task.type: greeting
│   ├── task.parameter.name: Claude
│   └── processing.status: completed
└── publish_result (agenthub-subscriber) [20ms]
    └── result.status: success
OpenTelemetry Architecture
OpenTelemetry is the observability framework that powers AgentHub’s tracing implementation.
The OpenTelemetry Stack
┌─────────────────────────────────────────────────────────┐
│                    Applications                         │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐        │
│  │  Publisher  │ │   Broker    │ │ Subscriber  │        │
│  └─────────────┘ └─────────────┘ └─────────────┘        │
└─────────────────┬───────────────┬───────────────┬───────┘
                  │               │               │
┌─────────────────▼───────────────▼───────────────▼───────┐
│              OpenTelemetry SDK                          │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐        │
│  │   Tracer    │ │    Meter    │ │   Logger    │        │
│  └─────────────┘ └─────────────┘ └─────────────┘        │
└─────────────────────────────────┬───────────────────────┘
                                  │
┌─────────────────────────────────▼───────────────────────┐
│            OpenTelemetry Collector                      │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐        │
│  │  Receivers  │ │ Processors  │ │  Exporters  │        │
│  └─────────────┘ └─────────────┘ └─────────────┘        │
└─────────────────┬───────────────┬───────────────┬───────┘
                  │               │               │
┌─────────────────▼─────┐ ┌───────▼───────┐ ┌─────▼─────┐
│      Jaeger           │ │  Prometheus   │ │   Logs    │
│   (Tracing)           │ │  (Metrics)    │ │(Logging)  │
└───────────────────────┘ └───────────────┘ └───────────┘
Core Components
Tracer
Creates and manages spans:
tracer := otel.Tracer("agenthub-publisher")
ctx, span := tracer.Start(ctx, "publish_event")
defer span.End()
Meter
Creates and manages metrics:
meter := otel.Meter("agenthub-publisher")
counter, _ := meter.Int64Counter("events_published_total")
counter.Add(ctx, 1)
Propagators
Handle context propagation across service boundaries:
// Inject context
otel.GetTextMapPropagator().Inject(ctx, carrier)
// Extract context
ctx = otel.GetTextMapPropagator().Extract(ctx, carrier)
Exporters
Send telemetry data to backend systems:
- OTLP Exporter: Sends to OpenTelemetry Collector
- Jaeger Exporter: Sends directly to Jaeger
- Prometheus Exporter: Exposes metrics for Prometheus
AgentHub’s OpenTelemetry Implementation
Configuration
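func NewObservability(config Config) (*Observability, error) {
    ctx := context.Background()
    // Create resource (service identification)
    res, err := resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceName(config.ServiceName),
            semconv.ServiceVersion(config.ServiceVersion),
        ),
    )
    if err != nil {
        return nil, fmt.Errorf("create resource: %w", err)
    }
    // Setup tracing
    traceExporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint(config.JaegerEndpoint),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, fmt.Errorf("create trace exporter: %w", err)
    }
    tracerProvider := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(traceExporter),
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.AlwaysSample()),
    )
    otel.SetTracerProvider(tracerProvider)
    // Setup metrics. promExporter is assumed to come from the OpenTelemetry
    // Prometheus exporter package (go.opentelemetry.io/otel/exporters/prometheus).
    promExporter, err := prometheus.New()
    if err != nil {
        return nil, fmt.Errorf("create Prometheus exporter: %w", err)
    }
    meterProvider := sdkmetric.NewMeterProvider(
        sdkmetric.WithResource(res),
        sdkmetric.WithReader(promExporter),
    )
    otel.SetMeterProvider(meterProvider)
    // The fields of Observability are not shown in this excerpt.
    return &Observability{}, nil
}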
Custom slog Handler Integration
AgentHub’s custom logging handler automatically correlates logs with traces:
func (h *ObservabilityHandler) Handle(ctx context.Context, r slog.Record) error {
    // Structured log output
    logData := map[string]interface{}{
        "time":    r.Time.Format(time.RFC3339),
        "level":   r.Level.String(),
        "msg":     r.Message,
        "service": h.serviceName,
    }
    // Add trace correlation only when the context carries a valid span
    if spanCtx := trace.SpanContextFromContext(ctx); spanCtx.IsValid() {
        logData["trace_id"] = spanCtx.TraceID().String()
        logData["span_id"] = spanCtx.SpanID().String()
    }
    // Encoding and writing logData is omitted in this excerpt
    return nil
}
Observability Patterns in Event-Driven Systems
Pattern 1: Event Correlation
Challenge: Correlating events across async boundaries
Solution: Inject trace context into event metadata
// Publisher injects context
headers := make(map[string]string)
otel.GetTextMapPropagator().Inject(ctx, propagation.MapCarrier(headers))
event.Metadata["trace_headers"] = headers
// Consumer extracts context
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(event.Metadata["trace_headers"]))
Pattern 2: Async Operation Tracking
Challenge: Tracking operations that complete asynchronously
Solution: Create child spans that can outlive their parents
// Start async operation
ctx, span := tracer.Start(ctx, "async_operation")
go func() {
    defer span.End()
    // Long-running async work
    processTask()
    span.SetStatus(codes.Ok, "") // Success (codes is go.opentelemetry.io/otel/codes)
}()
// Parent can continue/return immediately
Pattern 3: Error Propagation
Challenge: Understanding how errors flow through the system
Solution: Record errors at each span and propagate error status
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error()) // Error status (codes is go.opentelemetry.io/otel/codes)
    // Optionally add error details
    span.SetAttributes(
        attribute.String("error.type", "validation_error"),
        attribute.String("error.message", err.Error()),
    )
}
Pattern 4: Performance Attribution
Challenge: Understanding where time is spent in complex flows
Solution: Detailed span hierarchy with timing
// High-level operation
ctx, span := tracer.Start(ctx, "process_task")
defer span.End()
// Sub-operations: discard the returned context so each span is a direct
// child of process_task rather than of the previous sub-operation
_, validateSpan := tracer.Start(ctx, "validate_input")
// ... validation logic
validateSpan.End()
_, computeSpan := tracer.Start(ctx, "compute_result")
// ... computation logic
computeSpan.End()
_, persistSpan := tracer.Start(ctx, "persist_result")
// ... persistence logic
persistSpan.End()
Benefits of AgentHub’s Observability Implementation
1. Complete Request Visibility
- See every step of event processing
- Understand inter-service dependencies
- Track request flows across async boundaries
2. Performance Optimization
- Identify bottlenecks in event processing
- Understand where time is spent
- Optimize critical paths
3. Error Diagnosis
- Pinpoint exactly where failures occur
- Understand error propagation patterns
- Correlate errors with system state
4. Capacity Planning
- Understand system throughput characteristics
- Identify scaling bottlenecks
- Plan resource allocation
5. Troubleshooting
- Correlate logs, metrics, and traces
- Understand system behavior under load
- Debug complex distributed issues
Advanced Tracing Concepts
Sampling
Not every request needs to be traced. Sampling reduces overhead:
// Probability sampling (trace 10% of requests)
sdktrace.WithSampler(sdktrace.ParentBased(
    sdktrace.TraceIDRatioBased(0.1),
))
// Rate-limited sampling (for example, max 100 traces/second) is not built into
// the Go SDK's sdktrace package; it needs a custom sdktrace.Sampler
// implementation or a sampling policy in the collector.
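To make the idea of ratio-based sampling concrete, here is a simplified, stdlib-only sketch of how a sampler can decide from the trace ID alone. `shouldSample` is a hypothetical function written for this example and should not be read as the SDK's actual implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample reports whether a trace is kept under a ratio-based policy.
// The decision depends only on the trace ID, so every service that sees the
// same trace ID and uses the same ratio reaches the same answer.
func shouldSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Treat the low 8 bytes as a 63-bit number and compare it with a threshold.
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	threshold := uint64(ratio * (1 << 63))
	return x < threshold
}

func main() {
	const total = 100000
	kept := 0
	for i := 0; i < total; i++ {
		var id [16]byte
		// Pseudo trace IDs for the demo: scramble i across the low 8 bytes.
		binary.BigEndian.PutUint64(id[8:], uint64(i)*0x9E3779B97F4A7C15)
		if shouldSample(id, 0.1) {
			kept++
		}
	}
	fmt.Printf("kept %d of %d traces\n", kept, total)
}
```

Because the decision is a pure function of the trace ID, a trace is either kept or dropped as a whole, which is what keeps sampled traces complete across services.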
Custom Attributes
Add business context to spans:
span.SetAttributes(
    attribute.String("user.id", userID),
    attribute.String("tenant.id", tenantID),
    attribute.Int("batch.size", len(items)),
    attribute.String("workflow.type", workflowType),
)
Span Events
Add timestamped events within spans:
span.AddEvent("validation.started")
// ... validation logic
span.AddEvent("validation.completed", trace.WithAttributes(
    attribute.Int("validation.rules.evaluated", ruleCount),
))
Baggage
Propagate key-value pairs across the entire trace:
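// Set baggage (error handling omitted for brevity)
tier, _ := baggage.NewMember("user.tier", "premium")
flag, _ := baggage.NewMember("feature.flag", "new_algorithm")
bag, _ := baggage.New(tier, flag)
ctx = baggage.ContextWithBaggage(ctx, bag)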
// Read baggage in any service
if member := baggage.FromContext(ctx).Member("user.tier"); member.Value() == "premium" {
    // Use premium algorithm
}
Overhead Analysis
AgentHub’s observability adds:
- CPU: ~5% overhead for tracing
- Memory: ~50MB per service for buffers and metadata
- Network: Minimal (async batched export)
- Latency: ~10ms additional end-to-end latency
Optimization Strategies
- Sampling: Reduce trace volume for high-throughput systems
- Batching: Export spans in batches to reduce network overhead
- Async Processing: Never block business logic for observability
- Resource Limits: Use memory limiters in the collector
Production Recommendations
- Enable sampling for high-volume systems
- Monitor collector performance and scale horizontally if needed
- Set retention policies for traces and metrics
- Use dedicated infrastructure for observability stack
Troubleshooting Common Issues
Missing Traces
Symptoms: No traces appear in Jaeger
Causes:
- Context not propagated correctly
- Exporter configuration issues
- Collector connectivity problems
Debugging:
# Check if spans are being created
curl http://localhost:8080/metrics | grep trace
# Check collector logs
docker-compose logs otel-collector
# Verify Jaeger connectivity (lists services that have reported spans)
curl http://localhost:16686/api/services
Broken Trace Chains
Symptoms: Spans appear disconnected
Causes:
- Context not extracted properly
- New context created instead of continuing existing
Debugging:
// Always check if context contains active span
if span := trace.SpanFromContext(ctx); span.SpanContext().IsValid() {
    fmt.Printf("Active trace: %s\n", span.SpanContext().TraceID())
} else {
    fmt.Println("No active trace context")
}
High Memory Usage
Symptoms: Observability causing OOM errors
Causes:
- Too many spans in memory
- Large span attributes
- Export failures causing backlog
Solutions:
// Configure memory limits: cap attributes, events, and links per span
tp := sdktrace.NewTracerProvider(
    sdktrace.WithSpanLimits(sdktrace.SpanLimits{
        AttributeCountLimit: 128,
        EventCountLimit:     128,
        LinkCountLimit:      128,
    }),
)
The Future of Observability
Emerging Trends
- eBPF-based Observability: Automatic instrumentation without code changes
- AI-Powered Analysis: Automatic anomaly detection and root cause analysis
- Unified Observability: Single pane of glass for metrics, traces, logs, and profiles
- Real-time Alerting: Faster detection and response to issues
OpenTelemetry Roadmap
- Profiling: Continuous profiling integration
- Client-side Observability: Browser and mobile app tracing
- Database Instrumentation: Automatic query tracing
- Infrastructure Correlation: Link application traces to infrastructure metrics
Conclusion
Distributed tracing transforms debugging from guesswork into precise investigation. AgentHub’s implementation with OpenTelemetry provides:
- Complete visibility into event-driven workflows
- Performance insights for optimization
- Error correlation for faster resolution
- Business context through custom attributes
The investment in observability pays dividends in:
- Reduced MTTR (Mean Time To Resolution)
- Improved performance through data-driven optimization
- Better user experience through proactive monitoring
- Team productivity through better tooling
🎯 Ready to Implement?
Hands-on: Observability Demo Tutorial
Production: Add Observability to Your Agent
Deep Dive: Observability Architecture
4.3.2 - Architecture Evolution: From Build Tags to Unified Abstractions
Understanding AgentHub’s evolution from build tag-based conditional compilation to unified abstractions with built-in observability.
Understanding-oriented: Learn how AgentHub evolved from build tag-based conditional compilation to a unified abstraction approach that dramatically simplifies development while providing comprehensive observability.
AgentHub originally used Go build tags to handle different deployment scenarios:
- Development: Fast builds with minimal features (go build)
- Production: Full observability builds (go build -tags observability)
- Testing: Lightweight versions for testing environments
Problems with Build Tags:
- Maintenance overhead: Separate code paths for different builds
- Testing complexity: Hard to ensure feature parity across variants
- Developer experience: Multiple build commands and configurations
- Binary complexity: Different feature sets in different binaries
Modern Solution: Unified Abstractions
AgentHub now uses a unified abstraction layer (internal/agenthub/) that provides:
- Single codebase: No more separate files for different builds
- Built-in observability: Always available, configured via environment
- Simplified development: One build command, one binary
- Runtime configuration: Features controlled by environment variables
The New Architecture
Core Components
The unified abstraction provides these key components:
1. AgentHubServer
// Single server implementation with built-in observability
server, err := agenthub.NewAgentHubServer(config)
if err != nil {
    return err
}
// Automatic OpenTelemetry, metrics, health checks
err = server.Start(ctx)
2. AgentHubClient
// Single client implementation with built-in observability
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
    return err
}
// Automatic tracing, metrics, structured logging
err = client.Start(ctx)
3. TaskPublisher & TaskSubscriber
// High-level abstractions with automatic correlation
publisher := &agenthub.TaskPublisher{
    Client: client.Client,
    TraceManager: client.TraceManager,
    // Built-in observability
}
subscriber := agenthub.NewTaskSubscriber(client, agentID)
// Automatic task processing with tracing
Before vs After Comparison
Old Build Tag Approach
File Structure (Legacy):
agents/publisher/
├── main.go                 # Basic version (~200 lines)
├── main_observability.go   # Observable version (~380 lines)
├── shared.go               # Common code
└── config.go               # Configuration
broker/
├── main.go                 # Basic broker (~150 lines)
├── main_observability.go   # Observable broker (~300 lines)
└── server.go               # Core logic
Build Commands (Legacy):
# Basic build
go build -o bin/publisher ./agents/publisher/
# Observable build
go build -tags observability -o bin/publisher-obs ./agents/publisher/
# Testing observable features
go test -tags observability ./...
New Unified Approach
File Structure (Current):
agents/publisher/
└── main.go                 # Single implementation (~50 lines)
agents/subscriber/
└── main.go                 # Single implementation (~60 lines)
broker/
└── main.go                 # Single implementation (~30 lines)
internal/agenthub/          # Unified abstraction layer
├── grpc.go                 # Client/server with observability
├── subscriber.go           # Task processing abstractions
├── broker.go               # Event bus implementation
└── metadata.go             # Correlation and metadata
Build Commands (Current):
# Single build for all use cases
go build -o bin/publisher ./agents/publisher/
go build -o bin/subscriber ./agents/subscriber/
go build -o bin/broker ./broker/
# Testing (no special tags needed)
go test ./...
Configuration Evolution
Environment-Based Configuration
Instead of build tags, features are now controlled via environment variables:
# Observability configuration
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export OTEL_SERVICE_NAME="agenthub"
export OTEL_SERVICE_VERSION="1.0.0"
# Health and metrics ports
export BROKER_HEALTH_PORT="8080"
# Broker connection
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
Automatic Feature Detection
The unified abstractions automatically configure features based on environment:
// Observability is automatically configured
config := agenthub.NewGRPCConfig("publisher")
client, err := agenthub.NewAgentHubClient(config)
// If JAEGER_ENDPOINT is set → tracing enabled
// If BROKER_HEALTH_PORT is set → health server enabled
// Always includes structured logging and basic metrics
Benefits of the New Architecture
1. Developer Experience
- Single build command: No more tag confusion
- Consistent behavior: Same binary for all environments
- Easier testing: No need for multiple test runs
- Simplified CI/CD: One build pipeline
2. Maintenance Reduction
- 90% less code: From 380+ lines to 29 lines for broker
- Single code path: No more duplicate implementations
- Unified testing: Test once, works everywhere
- Automatic features: Observability included by default
3. Operational Benefits
- Runtime configuration: Change behavior without rebuilding
- Consistent deployment: Same binary across environments
- Better observability: Always available when needed
- Easier debugging: Full context always present
Migration Guide
For users migrating from the old build tag approach:
Old Commands → New Commands
# OLD: Basic builds
go build -o bin/publisher ./agents/publisher/
# NEW: Same command (unchanged)
go build -o bin/publisher ./agents/publisher/
# OLD: Observable builds
go build -tags observability -o bin/publisher-obs ./agents/publisher/
# NEW: Same binary; observability is configured at runtime via environment
go build -o bin/publisher ./agents/publisher/
JAEGER_ENDPOINT="http://localhost:14268/api/traces" ./bin/publisher
# OLD: Testing with tags
go test -tags observability ./...
# NEW: Standard testing
go test ./...
Configuration Migration
# OLD: Feature controlled by build tags
go build -tags observability
# NEW: Feature controlled by environment
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export OTEL_SERVICE_NAME="my-service"
Architecture Philosophy
From Compile-Time to Runtime
The move from build tags to unified abstractions represents a fundamental shift:
Build Tags Philosophy (Old):
- “Choose features at compile time”
- “Different binaries for different needs”
- “Minimize what’s included”
Unified Abstractions Philosophy (New):
- “Include everything, configure at runtime”
- “One binary, many configurations”
- “Maximize developer experience”
Why This Change?
- Cloud-Native Reality: Modern deployments use containers with environment-based config
- Developer Productivity: Unified approach eliminates confusion and errors
- Testing Simplicity: One code path means reliable testing
- Operational Excellence: Runtime configuration enables better operations
Resource Impact
The unified approach has minimal overhead:
Binary Size:
- Old basic: ~8MB
- Old observable: ~15MB
- New unified: ~12MB
Memory Usage:
- Baseline: ~10MB
- With observability: ~15MB (when enabled)
- Without observability: ~10MB (minimal overhead)
Startup Time:
- With observability enabled: ~150ms
- With observability disabled: ~50ms
Optimization Strategy
The abstractions use lazy initialization:
// Observability components only initialize if configured
if config.JaegerEndpoint != "" {
    // Initialize tracing
}
if config.HealthPort != "" {
    // Start health server
}
// Always minimal logging and basic metrics
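The lazy-initialization idea above can be sketched without any AgentHub code. The example below assumes the two variables named earlier (`JAEGER_ENDPOINT` and `BROKER_HEALTH_PORT`); the `Features` type and `detectFeatures` function are hypothetical names for illustration and are not part of `internal/agenthub`.

```go
package main

import (
	"fmt"
	"os"
)

// Features records which optional components would be started.
type Features struct {
	Tracing      bool
	HealthServer bool
	JaegerURL    string
	HealthPort   string
}

// detectFeatures derives feature toggles from a lookup function so the logic
// can be exercised without touching the real process environment.
func detectFeatures(lookup func(string) (string, bool)) Features {
	var f Features
	if v, ok := lookup("JAEGER_ENDPOINT"); ok && v != "" {
		f.Tracing, f.JaegerURL = true, v
	}
	if v, ok := lookup("BROKER_HEALTH_PORT"); ok && v != "" {
		f.HealthServer, f.HealthPort = true, v
	}
	return f
}

func main() {
	// os.LookupEnv has exactly the signature detectFeatures expects.
	f := detectFeatures(os.LookupEnv)
	fmt.Printf("tracing=%t health=%t\n", f.Tracing, f.HealthServer)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, is a design choice that keeps the detection logic easy to unit test.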
Future Evolution
Planned Enhancements
- Plugin Architecture: Dynamic feature loading
- Configuration Profiles: Predefined environment sets
- Feature Flags: Runtime feature toggling
- Auto-Configuration: Intelligent environment detection
Compatibility Promise
The unified abstractions maintain backward compatibility:
- Old environment variables still work
- Gradual migration path available
- No breaking changes in core APIs
This architectural evolution demonstrates how AgentHub prioritizes developer experience and operational simplicity while maintaining full observability capabilities. The move from build tags to unified abstractions represents a maturation of the platform toward cloud-native best practices.
4.3.3 - Performance and Scaling Considerations
Explore the performance characteristics of AgentHub, scaling patterns, and optimization strategies for different deployment scenarios.
Test Environment:
- 4-core Intel i7 processor
- 16GB RAM
- Local network (localhost)
- Go 1.24
Measured Performance:
- Task throughput: 8,000-12,000 tasks/second
- Task routing latency: 0.1-0.5ms average
- End-to-end latency: 2-10ms (including processing)
- Memory per agent: ~1KB active subscription state (map entry plus channel buffer; gRPC stream state adds about 2KB more)
- Concurrent agents: 1,000+ agents per broker instance
1. Task Routing Performance
Task routing is the core performance bottleneck in AgentHub:
// Fast path: Direct agent routing
if responderID := req.GetTask().GetResponderAgentId(); responderID != "" {
    if subs, ok := s.taskSubscribers[responderID]; ok {
        targetChannels = subs  // O(1) lookup
    }
}
Optimization factors:
- Direct routing: O(1) lookup time for targeted tasks
- Broadcast routing: O(n) where n = number of subscribed agents
- Channel delivery: Concurrent delivery via goroutines
- Lock contention: Read locks allow concurrent routing
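The routing factors listed above can be illustrated with a small, self-contained router. This is a sketch under assumptions: `Task` and `Router` are hypothetical stand-ins for the broker's real types, and delivery uses non-blocking sends instead of the per-delivery goroutines the broker is described as using.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a stand-in for the broker's task message (hypothetical type).
type Task struct {
	ID          string
	ResponderID string // empty means broadcast
}

// Router keeps per-agent subscriber channels behind a read/write lock.
type Router struct {
	mu   sync.RWMutex
	subs map[string][]chan Task
}

func NewRouter() *Router { return &Router{subs: make(map[string][]chan Task)} }

// Subscribe registers a buffered channel for agentID and returns it.
func (r *Router) Subscribe(agentID string, buf int) <-chan Task {
	ch := make(chan Task, buf)
	r.mu.Lock()
	r.subs[agentID] = append(r.subs[agentID], ch)
	r.mu.Unlock()
	return ch
}

// Route delivers t to one agent (map lookup) or to all agents (full scan).
// It returns how many channels accepted the task; full channels are skipped
// rather than blocking the router.
func (r *Router) Route(t Task) int {
	r.mu.RLock() // read lock: concurrent Route calls do not block each other
	defer r.mu.RUnlock()
	var targets []chan Task
	if t.ResponderID != "" {
		targets = r.subs[t.ResponderID] // O(1) direct routing
	} else {
		for _, chans := range r.subs { // O(n) broadcast
			targets = append(targets, chans...)
		}
	}
	delivered := 0
	for _, ch := range targets {
		select {
		case ch <- t:
			delivered++
		default: // subscriber buffer full; dropped in this sketch
		}
	}
	return delivered
}

func main() {
	r := NewRouter()
	a := r.Subscribe("agent_a", 1)
	r.Subscribe("agent_b", 1)
	fmt.Println("direct:", r.Route(Task{ID: "t1", ResponderID: "agent_a"}))
	<-a // drain so the broadcast below can reach agent_a again
	fmt.Println("broadcast:", r.Route(Task{ID: "t2"}))
}
```

Dropping on a full buffer is only one possible policy; a real broker might block with a timeout or report the failure back to the publisher.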
2. Message Serialization
Protocol Buffers provide efficient serialization:
- Binary encoding: ~60% smaller than JSON
- Zero-copy operations: Direct memory mapping where possible
- Schema evolution: Backward/forward compatibility
- Type safety: Compile-time validation
3. Memory Usage Patterns
// Memory usage breakdown per agent:
type agentMemoryFootprint struct {
    SubscriptionState    int // ~200 bytes (map entry + channel)
    ChannelBuffer       int // ~800 bytes (10 message buffer * 80 bytes avg)
    ConnectionOverhead  int // ~2KB (gRPC stream state)
    // Total: ~3KB per active agent
}
Memory optimization strategies:
- Bounded channels: Prevent unbounded growth
- Connection pooling: Reuse gRPC connections
- Garbage collection: Go’s GC handles cleanup automatically
Scaling Patterns
Vertical Scaling (Scale Up)
Increasing resources on a single broker instance:
CPU Scaling
- Multi-core utilization: Go’s runtime leverages multiple cores
- Goroutine efficiency: Lightweight concurrency (goroutines start with a stack of about 2KB)
- CPU-bound operations: Message serialization, routing logic
# Configure for CPU optimization
export GOMAXPROCS=8  # Match available CPU cores
Memory Scaling
- Linear growth: Memory usage scales with number of agents
- Buffer tuning: Adjust channel buffer sizes based on throughput
// Memory-optimized configuration: smaller buffers for memory-constrained environments
lowMemChan := make(chan *pb.TaskMessage, 5)
// vs
// Throughput-optimized configuration: larger buffers for high-throughput environments
highThroughputChan := make(chan *pb.TaskMessage, 50)
Network Scaling
- Connection limits: OS file descriptor limits (ulimit -n)
- Bandwidth utilization: Protocol Buffers minimize bandwidth usage
- Connection keepalive: Efficient connection reuse
Horizontal Scaling (Scale Out)
Distributing load across multiple broker instances:
1. Agent Partitioning
Static Partitioning:
Agent Groups:
├── Broker 1: agents_1-1000
├── Broker 2: agents_1001-2000
└── Broker 3: agents_2001-3000
Hash-based Partitioning:
func selectBroker(agentID string) string {
    hash := fnv.New32a()
    hash.Write([]byte(agentID))
    brokerIndex := hash.Sum32() % uint32(len(brokers))
    return brokers[brokerIndex]
}
2. Task Type Partitioning
Specialized Brokers:
Task Routing:
├── Broker 1: data_processing, analytics
├── Broker 2: image_processing, ml_inference
└── Broker 3: notifications, logging
3. Geographic Partitioning
Regional Distribution:
Geographic Deployment:
├── US-East: Broker cluster for East Coast agents
├── US-West: Broker cluster for West Coast agents
└── EU: Broker cluster for European agents
Load Balancing Strategies
1. Round-Robin Agent Distribution
type LoadBalancer struct {
    brokers []string
    current int
    mu      sync.Mutex
}
func (lb *LoadBalancer) NextBroker() string {
    lb.mu.Lock()
    defer lb.mu.Unlock()
    broker := lb.brokers[lb.current]
    lb.current = (lb.current + 1) % len(lb.brokers)
    return broker
}
2. Capacity-Based Routing
type BrokerMetrics struct {
    ActiveAgents int
    TasksPerSec  float64
    CPUUsage     float64
    MemoryUsage  float64
}
func selectBestBroker(brokers []BrokerMetrics) int {
    // Select broker with lowest load score
    bestIndex := 0
    bestScore := calculateLoadScore(brokers[0])
    for i, broker := range brokers[1:] {
        score := calculateLoadScore(broker)
        if score < bestScore {
            bestScore = score
            bestIndex = i + 1
        }
    }
    return bestIndex
}
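`selectBestBroker` relies on a `calculateLoadScore` helper that is not shown. One possible shape is a weighted sum of normalized metrics, sketched below. The weights, the capacity constants, and the assumption that `CPUUsage` and `MemoryUsage` are fractions between 0 and 1 are all illustrative choices, not values taken from AgentHub.

```go
package main

import "fmt"

// BrokerMetrics mirrors the struct shown above.
type BrokerMetrics struct {
	ActiveAgents int
	TasksPerSec  float64
	CPUUsage     float64 // fraction of CPU in use, 0.0-1.0 (assumed unit)
	MemoryUsage  float64 // fraction of memory in use, 0.0-1.0 (assumed unit)
}

// Assumed per-broker capacity figures, used only to normalize the inputs.
const (
	maxAgents      = 1000.0
	maxTasksPerSec = 10000.0
)

// calculateLoadScore returns a value of roughly 0 to 1; lower means less loaded.
// The weights are illustrative and should be tuned against real measurements.
func calculateLoadScore(m BrokerMetrics) float64 {
	agents := float64(m.ActiveAgents) / maxAgents
	tasks := m.TasksPerSec / maxTasksPerSec
	return 0.4*m.CPUUsage + 0.3*m.MemoryUsage + 0.2*tasks + 0.1*agents
}

func main() {
	brokers := []BrokerMetrics{
		{ActiveAgents: 800, TasksPerSec: 9000, CPUUsage: 0.85, MemoryUsage: 0.70},
		{ActiveAgents: 200, TasksPerSec: 1500, CPUUsage: 0.20, MemoryUsage: 0.30},
	}
	for i, b := range brokers {
		fmt.Printf("broker %d score %.2f\n", i, calculateLoadScore(b))
	}
}
```

Normalizing each input before weighting keeps one metric with a large numeric range, such as tasks per second, from dominating the score.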
Performance Optimization Techniques
1. Message Batching
For high-throughput scenarios, implement message batching:
type BatchProcessor struct {
    tasks     []*pb.TaskMessage
    batchSize int
    timeout   time.Duration
    ticker    *time.Ticker
}
func (bp *BatchProcessor) processBatch() {
    batch := make([]*pb.TaskMessage, len(bp.tasks))
    copy(batch, bp.tasks)
    bp.tasks = bp.tasks[:0] // Clear slice
    // Process entire batch
    go bp.routeBatch(batch)
}
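The `BatchProcessor` above shows the flush step but not the loop that decides when to flush. A minimal sketch of that loop follows, flushing when the batch is full or when a ticker fires. `runBatcher` is a hypothetical function, and plain strings stand in for `*pb.TaskMessage` so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"time"
)

// runBatcher collects items from in and calls flush when batchSize items have
// accumulated or when the timeout ticker fires with a non-empty batch.
// It flushes any remainder and returns when in is closed.
func runBatcher(in <-chan string, batchSize int, timeout time.Duration, flush func([]string)) {
	ticker := time.NewTicker(timeout)
	defer ticker.Stop()
	batch := make([]string, 0, batchSize)
	emit := func() {
		if len(batch) == 0 {
			return
		}
		out := make([]string, len(batch))
		copy(out, batch) // hand off a copy so the buffer can be reused
		batch = batch[:0]
		flush(out)
	}
	for {
		select {
		case item, ok := <-in:
			if !ok {
				emit()
				return
			}
			batch = append(batch, item)
			if len(batch) >= batchSize {
				emit()
				ticker.Reset(timeout) // restart the wait after a size-based flush
			}
		case <-ticker.C:
			emit()
		}
	}
}

func main() {
	in := make(chan string)
	done := make(chan struct{})
	go func() {
		runBatcher(in, 3, 50*time.Millisecond, func(b []string) {
			fmt.Println("flush:", b)
		})
		close(done)
	}()
	for i := 0; i < 7; i++ {
		in <- fmt.Sprintf("task_%d", i)
	}
	close(in)
	<-done
}
```

Keeping all batch state inside one goroutine avoids the need for a mutex around the slice, which the struct-based version would otherwise require.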
2. Connection Pooling
Optimize gRPC connections for better resource utilization:
type ConnectionPool struct {
    connections map[string]*grpc.ClientConn
    maxConns    int
    mu          sync.RWMutex
}
func (cp *ConnectionPool) GetConnection(addr string) (*grpc.ClientConn, error) {
    cp.mu.RLock()
    if conn, exists := cp.connections[addr]; exists {
        cp.mu.RUnlock()
        return conn, nil
    }
    cp.mu.RUnlock()
    // Create new connection
    return cp.createConnection(addr)
}
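`GetConnection` releases the read lock before creating a connection, so two goroutines that miss the cache at the same time could both dial. One common remedy is to re-check the map under the write lock. The sketch below shows that pattern with an injected dial function and `io.Closer` in place of `*grpc.ClientConn`; `Pool`, `fakeConn`, and `newCountingPool` are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// Pool caches one connection per address. Dial is injected so the sketch
// stays independent of gRPC; a real pool would wrap the gRPC dial call.
type Pool struct {
	mu    sync.RWMutex
	conns map[string]io.Closer
	Dial  func(addr string) (io.Closer, error)
}

// Get returns the cached connection for addr, dialing at most once per address.
func (p *Pool) Get(addr string) (io.Closer, error) {
	p.mu.RLock()
	conn, ok := p.conns[addr]
	p.mu.RUnlock()
	if ok {
		return conn, nil
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	// Re-check: another goroutine may have dialed between the two locks.
	if conn, ok := p.conns[addr]; ok {
		return conn, nil
	}
	conn, err := p.Dial(addr)
	if err != nil {
		return nil, err
	}
	if p.conns == nil {
		p.conns = make(map[string]io.Closer)
	}
	p.conns[addr] = conn
	return conn, nil
}

// fakeConn is a stand-in connection used only for the demonstration.
type fakeConn struct{ addr string }

func (fakeConn) Close() error { return nil }

// newCountingPool builds a Pool whose dialer increments *dials on each call.
func newCountingPool(dials *int) *Pool {
	return &Pool{Dial: func(addr string) (io.Closer, error) {
		*dials++
		return fakeConn{addr: addr}, nil
	}}
}

func main() {
	dials := 0
	p := newCountingPool(&dials)
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Get("localhost:50051")
		}()
	}
	wg.Wait()
	fmt.Println("dials:", dials)
}
```

The trade-off is that dialing happens while the write lock is held, so a slow dial briefly blocks lookups for other addresses. Per-address locking would avoid that at the cost of more code.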
3. Adaptive Channel Sizing
Dynamically adjust channel buffer sizes based on load:
func calculateOptimalBufferSize(avgTaskRate float64, processingTime time.Duration) int {
    // Buffer size = rate * processing time + safety margin
    bufferSize := int(avgTaskRate * processingTime.Seconds()) + 10
    // Clamp to reasonable bounds
    if bufferSize < 5 {
        return 5
    }
    if bufferSize > 100 {
        return 100
    }
    return bufferSize
}
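A few worked inputs make the sizing formula easier to reason about. The block below repeats `calculateOptimalBufferSize` unchanged so that it runs on its own; the sample rates and processing times are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// calculateOptimalBufferSize is repeated from the text above.
func calculateOptimalBufferSize(avgTaskRate float64, processingTime time.Duration) int {
	// Buffer size = rate * processing time + safety margin
	bufferSize := int(avgTaskRate*processingTime.Seconds()) + 10
	// Clamp to reasonable bounds
	if bufferSize < 5 {
		return 5
	}
	if bufferSize > 100 {
		return 100
	}
	return bufferSize
}

func main() {
	cases := []struct {
		rate float64       // average tasks per second
		proc time.Duration // average processing time per task
	}{
		{2, 500 * time.Millisecond},
		{100, 250 * time.Millisecond},
		{1000, 500 * time.Millisecond},
	}
	for _, c := range cases {
		fmt.Printf("%.0f tasks/s x %v -> buffer %d\n",
			c.rate, c.proc, calculateOptimalBufferSize(c.rate, c.proc))
	}
}
```

The three cases yield buffers of 11, 35, and 100 (the last one clamped from 510). Note that with a fixed safety margin of 10, the lower bound of 5 is never reached for non-negative inputs.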
4. Memory Optimization
Reduce memory allocations in hot paths:
// Use sync.Pool for frequent allocations
var taskPool = sync.Pool{
    New: func() interface{} {
        return &pb.TaskMessage{}
    },
}
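func processTaskOptimized(task *pb.TaskMessage) {
    // Reuse task objects
    pooledTask := taskPool.Get().(*pb.TaskMessage)
    defer func() {
        pooledTask.Reset() // Clear state before returning the object to the pool
        taskPool.Put(pooledTask)
    }()
    // Copy and process; proto.Merge avoids copying the message's internal state by value
    proto.Merge(pooledTask, task)
    // ... processing logic
}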
Monitoring and Metrics
Throughput Metrics
type ThroughputMetrics struct {
    TasksPerSecond     float64
    ResultsPerSecond   float64
    ProgressPerSecond  float64
    MessagesPerSecond  float64
}
Latency Metrics
type LatencyMetrics struct {
    RoutingLatency     time.Duration // Broker routing time
    ProcessingLatency  time.Duration // Agent processing time
    EndToEndLatency    time.Duration // Total task completion time
    P50, P95, P99      time.Duration // Percentile latencies
}
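The P50, P95, and P99 fields have to be computed from raw samples somewhere. A minimal nearest-rank calculation is sketched below; `percentile` and `millis` are hypothetical helpers for this example. Production systems more often rely on histograms, such as the Prometheus one shown later, than on sorting raw samples.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the nearest-rank p-th percentile (1 <= p <= 100) of samples.
// It sorts a copy, so the caller's slice is left untouched.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// rank = ceil(p/100 * n), computed in integers to avoid float rounding
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// millis converts integer millisecond values into durations.
func millis(vals ...int) []time.Duration {
	out := make([]time.Duration, len(vals))
	for i, v := range vals {
		out[i] = time.Duration(v) * time.Millisecond
	}
	return out
}

func main() {
	samples := millis(12, 3, 7, 5, 9, 1, 4, 8, 2, 40)
	fmt.Println("P50:", percentile(samples, 50))
	fmt.Println("P95:", percentile(samples, 95))
	fmt.Println("P99:", percentile(samples, 99))
}
```

With only ten samples, P95 and P99 both land on the slowest request, which is a reminder that tail percentiles need a reasonably large sample window to be meaningful.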
Resource Metrics
type ResourceMetrics struct {
    ActiveAgents       int
    ActiveTasks        int
    MemoryUsage        int64
    CPUUsage           float64
    GoroutineCount     int
    OpenConnections    int
}
Monitoring Implementation
import "github.com/prometheus/client_golang/prometheus"
var (
    taskCounter = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "agenthub_tasks_total",
            Help: "Total number of tasks processed",
        },
        []string{"task_type", "status"},
    )
    latencyHistogram = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "agenthub_task_duration_seconds",
            Help:    "Task processing duration",
            Buckets: prometheus.DefBuckets,
        },
        []string{"task_type"},
    )
)
Scaling Recommendations
Small Deployments (1-100 agents)
- Single broker instance: Sufficient for most small deployments
- Vertical scaling: Add CPU/memory as needed
- Simple monitoring: Basic logging and health checks
Medium Deployments (100-1,000 agents)
- Load balancing: Implement agent distribution
- Resource monitoring: Track CPU, memory, and throughput
- Optimization: Tune channel buffer sizes and timeouts
Large Deployments (1,000+ agents)
- Horizontal scaling: Multiple broker instances
- Partitioning strategy: Implement agent or task type partitioning
- Advanced monitoring: Full metrics and alerting
- Performance testing: Regular load testing and optimization
High-Throughput Scenarios (10,000+ tasks/second)
- Message batching: Implement batch processing
- Connection optimization: Use connection pooling
- Hardware optimization: SSD storage, high-speed networking
- Profiling: Regular performance profiling and optimization
Troubleshooting Performance Issues
1. High Latency
Symptoms: Slow task processing times
Causes: Network latency, overloaded agents, inefficient routing
Solutions: Optimize routing, add caching, scale horizontally
2. Memory Leaks
Symptoms: Increasing memory usage over time
Causes: Unclosed channels, goroutine leaks, connection leaks
Solutions: Proper cleanup, monitoring, garbage collection tuning
3. Connection Limits
Symptoms: New agents can’t connect
Causes: OS file descriptor limits, broker resource limits
Solutions: Increase limits, implement connection pooling
4. Message Loss
Symptoms: Tasks not reaching agents or results not returned
Causes: Timeout issues, network problems, buffer overflows
Solutions: Increase timeouts, improve error handling, adjust buffer sizes
Load Testing Script
func loadTest() {
    // Create multiple publishers
    publishers := make([]Publisher, 10)
    for i := range publishers {
        publishers[i] = NewPublisher(fmt.Sprintf("publisher_%d", i))
    }
    // Send tasks concurrently
    taskRate := 1000 // tasks per second
    duration := 60 * time.Second
    ticker := time.NewTicker(time.Second / time.Duration(taskRate))
    defer ticker.Stop() // Release ticker resources when the test ends
    timeout := time.After(duration)
    for {
        select {
        case <-ticker.C:
            publisher := publishers[rand.Intn(len(publishers))]
            go publisher.PublishTask(generateRandomTask())
        case <-timeout:
            return
        }
    }
}
The AgentHub architecture provides solid performance for most use cases and clear scaling paths for growing deployments. Regular monitoring and optimization ensure continued performance as your agent ecosystem evolves.
4.3.4 - The Unified Abstraction Library
The AgentHub Unified Abstraction Library dramatically simplifies the development of agents and brokers while providing built-in observability, environment-based configuration, and automatic correlation tracking.
The A2A-Compliant Unified Abstraction Library
Overview
The AgentHub Unified Abstraction Library (internal/agenthub/) is a comprehensive set of A2A protocol-compliant abstractions that dramatically simplifies the development of A2A agents and brokers while providing built-in observability, environment-based configuration, and automatic correlation tracking.
Key Benefits
Before and After Comparison
Before (Legacy approach):
- broker/main_observability.go: 380+ lines of boilerplate
- Manual OpenTelemetry setup in every component
- Duplicate configuration handling across components
- Manual correlation ID management
- Separate observability and non-observability variants
After (Unified abstractions):
- broker/main.go: 29 lines using abstractions
- Automatic OpenTelemetry integration
- Environment-based configuration
- Automatic correlation ID generation and propagation
- Single implementation with built-in observability
Core Components
1. gRPC Abstractions (grpc.go)
AgentHubServer
Provides a complete gRPC server abstraction with:
- Automatic OpenTelemetry instrumentation
- Environment-based configuration
- Built-in health checks
- Metrics collection
- Graceful shutdown
// Create and start a broker with a single call
func StartBroker(ctx context.Context) error {
    config := NewGRPCConfig("broker")
    server, err := NewAgentHubServer(config)
    if err != nil {
        return err
    }
    return server.Start(ctx)
}
AgentHubClient
Provides a complete gRPC client abstraction with:
- Automatic connection management
- Built-in observability
- Environment-based server discovery
- Health monitoring
// Create a client with built-in observability
config := agenthub.NewGRPCConfig("publisher")
client, err := agenthub.NewAgentHubClient(config)
2. A2A Task Management Abstractions (a2a.go)
A2ATaskPublisher
Simplifies A2A task publishing with:
- Automatic A2A message generation
- Built-in observability tracing
- A2A context management
- Structured error handling
- A2A-compliant message formatting
a2aPublisher := &agenthub.A2ATaskPublisher{
    Client:         client.Client,
    TraceManager:   client.TraceManager,
    MetricsManager: client.MetricsManager,
    Logger:         client.Logger,
    ComponentName:  "a2a_publisher",
}
// Create A2A task with structured message content
task := &a2a.Task{
    Id:        "task_greeting_" + uuid.New().String(),
    ContextId: "conversation_123",
    Status: &a2a.TaskStatus{
        State: a2a.TaskState_TASK_STATE_SUBMITTED,
        Update: &a2a.Message{
            MessageId: "msg_" + uuid.New().String(),
            Role:      a2a.Role_USER,
            Content: []*a2a.Part{
                {
                    Part: &a2a.Part_Text{
                        Text: "Please process greeting task",
                    },
                },
                {
                    Part: &a2a.Part_Data{
                        Data: &a2a.DataPart{
                            Data:        greetingParams,
                            Description: "Greeting parameters",
                        },
                    },
                },
            },
        },
        Timestamp: timestamppb.Now(),
    },
}
err := a2aPublisher.PublishA2ATask(ctx, task, &pb.AgentEventMetadata{
    FromAgentId: "publisher_id",
    ToAgentId:   "subscriber_id",
    EventType:   "task.submitted",
    Priority:    pb.Priority_PRIORITY_MEDIUM,
})
A2ATaskProcessor
Provides full observability for A2A task processing:
- Automatic A2A trace propagation
- Rich A2A span annotations with context and message details
- A2A message processing metrics
- A2A conversation context tracking
- Error tracking with A2A-compliant error messages
3. A2A Subscriber Abstractions (a2a_subscriber.go)
A2ATaskSubscriber
Complete A2A subscriber implementation with:
- A2A-compliant task handler system
- Built-in A2A message processors
- Automatic A2A artifact publishing
- Full A2A observability integration
- A2A conversation context awareness
a2aSubscriber := agenthub.NewA2ATaskSubscriber(client, agentID)
a2aSubscriber.RegisterDefaultA2AHandlers()
// Custom A2A task handlers
a2aSubscriber.RegisterA2ATaskHandler("greeting", func(ctx context.Context, event *pb.AgentEvent) error {
    task := event.GetTask()
    if task == nil {
        return fmt.Errorf("no task in event")
    }
    // Process A2A task content
    requestMessage := task.Status.Update
    response := a2aSubscriber.ProcessA2AMessage(ctx, requestMessage)
    // Create completion artifact
    artifact := &a2a.Artifact{
        ArtifactId: "artifact_" + uuid.New().String(),
        Name:       "Greeting Response",
        Description: "Processed greeting task result",
        Parts: []*a2a.Part{
            {
                Part: &a2a.Part_Text{
                    Text: response,
                },
            },
        },
    }
    // Complete task with artifact
    return a2aSubscriber.CompleteA2ATaskWithArtifact(ctx, task, artifact)
})
go a2aSubscriber.SubscribeToA2ATasks(ctx)
go a2aSubscriber.SubscribeToA2AMessages(ctx)
4. A2A Broker Service (a2a_broker.go)
Complete A2A-compliant AgentHub service implementation that handles:
- A2A message routing and delivery
- A2A subscription management with context filtering
- A2A artifact distribution
- A2A task state management
- EDA+A2A hybrid routing
- Full A2A observability
// A2A broker service with unified abstractions
type A2ABrokerService struct {
    // A2A-specific components
    MessageRouter    *A2AMessageRouter
    TaskManager      *A2ATaskManager
    ContextManager   *A2AContextManager
    ArtifactManager  *A2AArtifactManager
    // EDA integration
    EventBus         *EDAEventBus
    SubscriptionMgr  *A2ASubscriptionManager
    // Observability
    TraceManager     *TraceManager
    MetricsManager   *A2AMetricsManager
}
A2A Environment-Based Configuration
The library uses environment variables for zero-configuration A2A setup:
# Core AgentHub A2A Settings
export AGENTHUB_BROKER_ADDR=localhost
export AGENTHUB_BROKER_PORT=50051
# A2A Protocol Configuration
export AGENTHUB_A2A_PROTOCOL_VERSION=1.0
export AGENTHUB_MESSAGE_BUFFER_SIZE=100
export AGENTHUB_CONTEXT_TIMEOUT=30s
export AGENTHUB_ARTIFACT_MAX_SIZE=10MB
# Observability Endpoints
export JAEGER_ENDPOINT=127.0.0.1:4317
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317
# A2A Health Check Ports
export AGENTHUB_HEALTH_PORT=8080
export A2A_PUBLISHER_HEALTH_PORT=8081
export A2A_SUBSCRIBER_HEALTH_PORT=8082
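To show how such variables can become typed settings, the sketch below parses two of them with the standard library. The `Settings` type, the `loadSettings` function, and the fallback defaults (100 and 30s, mirroring the example values above) are assumptions made for illustration, not the library's actual loader.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Settings holds typed values for two of the variables listed above.
type Settings struct {
	MessageBufferSize int
	ContextTimeout    time.Duration
}

// loadSettings reads values through getenv, falling back to defaults when a
// variable is unset and returning an error when a value cannot be parsed.
func loadSettings(getenv func(string) string) (Settings, error) {
	s := Settings{MessageBufferSize: 100, ContextTimeout: 30 * time.Second}
	if v := getenv("AGENTHUB_MESSAGE_BUFFER_SIZE"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n <= 0 {
			return s, fmt.Errorf("AGENTHUB_MESSAGE_BUFFER_SIZE: invalid value %q", v)
		}
		s.MessageBufferSize = n
	}
	if v := getenv("AGENTHUB_CONTEXT_TIMEOUT"); v != "" {
		d, err := time.ParseDuration(v) // accepts forms such as "30s" or "2m"
		if err != nil {
			return s, fmt.Errorf("AGENTHUB_CONTEXT_TIMEOUT: %w", err)
		}
		s.ContextTimeout = d
	}
	return s, nil
}

func main() {
	s, err := loadSettings(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("buffer=%d timeout=%s\n", s.MessageBufferSize, s.ContextTimeout)
}
```

Failing fast on an unparseable value, instead of silently using the default, tends to surface typos in deployment manifests early.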
A2A Automatic Observability
A2A Distributed Tracing
- Automatic A2A instrumentation: OpenTelemetry gRPC interceptors handle A2A trace propagation
- A2A service naming: Unified “agenthub” service with A2A component differentiation
- Rich A2A annotations: Message content, conversation context, task state transitions, and artifact details
- A2A context tracking: Complete conversation thread visibility across multiple agents
A2A Metrics Collection
- A2A message metrics: Message processing rates, A2A error rates, latencies by message type
- A2A task metrics: Task completion rates, state transition times, artifact production metrics
- A2A context metrics: Conversation context tracking, multi-agent coordination patterns
- A2A system metrics: Health checks, A2A connection status, protocol version compatibility
- A2A component metrics: Per-agent A2A performance, broker routing efficiency
Health Monitoring
- Automatic endpoints: /health, /ready, /metrics
- Component tracking: Individual health per service
- Graceful shutdown: Proper cleanup and connection management
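The difference between a liveness and a readiness endpoint can be sketched with `net/http` alone. The handler below is an illustration under assumptions, not AgentHub's health server: `newHealthMux` and `statusOf` are hypothetical helpers, and `/metrics` is left out because it would need a metrics library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// newHealthMux returns handlers for /health (liveness) and /ready (readiness).
// isReady is consulted on every /ready request.
func newHealthMux(isReady func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if !isReady() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ready")
	})
	return mux
}

// statusOf issues an in-memory GET against mux and returns the status code.
func statusOf(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	var ready atomic.Bool // flipped to true once startup has finished
	mux := newHealthMux(ready.Load)
	fmt.Println("/ready before startup:", statusOf(mux, "/ready"))
	ready.Store(true)
	fmt.Println("/ready after startup:", statusOf(mux, "/ready"))
	// In a service, the mux would be served with http.ListenAndServe(":8080", mux).
}
```

Separating the two endpoints lets an orchestrator restart a process that is no longer alive while merely withholding traffic from one that is still starting up.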
A2A Correlation and Context Tracking
Automatic A2A Correlation IDs
// A2A task ID generation
taskID := fmt.Sprintf("task_%s_%s", taskDescription, uuid.New().String())
// A2A message ID generation
messageID := fmt.Sprintf("msg_%d_%s", time.Now().Unix(), uuid.New().String())
// A2A context ID for conversation threading
contextID := fmt.Sprintf("ctx_%s_%s", workflowType, uuid.New().String())
A2A Context Propagation
- A2A conversation threading: Context IDs link related tasks across agents
- A2A message history: Complete audit trail of all messages in a conversation
- A2A workflow tracking: End-to-end visibility of multi-agent workflows
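The threading idea above can be sketched as a per-context message log: every message carries a context ID, and the history for a conversation is the ordered list of messages sharing that ID. The `Message` and `ConversationLog` types are simplified stand-ins invented for this example, not the library's A2A types.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a simplified stand-in for an A2A message (assumption).
type Message struct {
	MessageID string
	ContextID string
	Text      string
}

// ConversationLog keeps an ordered message history per context ID.
type ConversationLog struct {
	mu      sync.Mutex
	threads map[string][]Message
}

// NewConversationLog returns an empty log.
func NewConversationLog() *ConversationLog {
	return &ConversationLog{threads: make(map[string][]Message)}
}

// Append records a message under its context ID, preserving arrival order.
func (c *ConversationLog) Append(m Message) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.threads[m.ContextID] = append(c.threads[m.ContextID], m)
}

// History returns a copy of the messages seen for one conversation.
func (c *ConversationLog) History(contextID string) []Message {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]Message(nil), c.threads[contextID]...)
}

func main() {
	log := NewConversationLog()
	log.Append(Message{MessageID: "msg_1", ContextID: "ctx_a", Text: "plan trip"})
	log.Append(Message{MessageID: "msg_2", ContextID: "ctx_b", Text: "unrelated"})
	log.Append(Message{MessageID: "msg_3", ContextID: "ctx_a", Text: "book hotel"})
	for _, m := range log.History("ctx_a") {
		fmt.Println(m.MessageID, m.Text)
	}
}
```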
Trace Propagation
- W3C Trace Context: Standard distributed tracing headers
- Automatic propagation: gRPC interceptors handle context passing
- End-to-end visibility: Publisher → Broker → Subscriber traces
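For readers unfamiliar with W3C Trace Context, the `traceparent` header that interceptors pass along has the form `version-traceid-spanid-flags`. The sketch below parses and re-serializes version `00` headers by hand, purely to show the format; in practice the OpenTelemetry propagators do this work, and the `TraceParent` type here is a hypothetical helper rather than part of AgentHub.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the fields of a version-00 traceparent header.
type TraceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the trace-flags byte
}

// validHex reports whether s is n lowercase hex characters and not all zeros.
func validHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	if _, err := hex.DecodeString(s); err != nil {
		return false
	}
	return strings.Trim(s, "0") != ""
}

// ParseTraceParent parses a version-00 header such as
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
func ParseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("traceparent: expected version 00 with four fields")
	}
	if !validHex(parts[1], 32) || !validHex(parts[2], 16) {
		return TraceParent{}, errors.New("traceparent: invalid trace or span ID")
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil || len(flags) != 1 {
		return TraceParent{}, errors.New("traceparent: invalid flags")
	}
	return TraceParent{TraceID: parts[1], SpanID: parts[2], Sampled: flags[0]&0x01 == 1}, nil
}

// String re-serializes the header, keeping only the sampled flag bit.
func (t TraceParent) String() string {
	flags := 0
	if t.Sampled {
		flags = 1
	}
	return fmt.Sprintf("00-%s-%s-%02x", t.TraceID, t.SpanID, flags)
}

func main() {
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(tp.TraceID, tp.SpanID, tp.Sampled)
}
```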
A2A Migration Guide
From Legacy EventBus to A2A Abstractions
Before (Legacy EventBus):
// 50+ lines of observability setup
obs, err := observability.New(ctx, observability.Config{...})
server := grpc.NewServer(grpc.UnaryInterceptor(...))
pb.RegisterEventBusServer(server, &eventBusService{...})
// Manual task message creation
task := &pb.TaskMessage{
    TaskId:   "task_123",
    TaskType: "greeting",
    // ... manual field population
}
After (A2A Abstractions):
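// One-line A2A broker startup
if err := agenthub.StartA2ABroker(ctx); err != nil {
    log.Fatalf("failed to start A2A broker: %v", err)
}
// A2A task creation with abstractions
task := a2aPublisher.CreateA2ATask("greeting", greetingContent, "conversation_123")
if err := a2aPublisher.PublishA2ATask(ctx, task, routingMetadata); err != nil {
    log.Printf("failed to publish A2A task: %v", err)
}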
Best Practices
1. Use Environment Configuration
Let the library handle configuration automatically:
source .envrc  # Load all environment variables
go run broker/main.go
2. Register Custom A2A Handlers
Extend functionality with custom A2A task handlers:
a2aSubscriber.RegisterA2ATaskHandler("my_task", myCustomA2AHandler)
// A2A handler signature with event and context
func myCustomA2AHandler(ctx context.Context, event *pb.AgentEvent) error {
    task := event.GetTask()
    // Process A2A message content and build resultArtifact from it (elided here)
    return a2aSubscriber.CompleteA2ATaskWithArtifact(ctx, task, resultArtifact)
}
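Conceptually, handler registration of this kind amounts to a map from task type to handler function plus a dispatch step. The following sketch shows that pattern with stand-in types; `Task`, `HandlerRegistry`, `recordingHandler`, and `dispatchAll` are hypothetical names, and the library's real subscriber may be implemented quite differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Task is a simplified stand-in for an A2A task (assumption).
type Task struct {
	ID   string
	Type string
}

// TaskHandler processes one task.
type TaskHandler func(ctx context.Context, task Task) error

// ErrNoHandler is returned when no handler matches the task type.
var ErrNoHandler = errors.New("no handler registered for task type")

// HandlerRegistry maps task types to handlers.
type HandlerRegistry struct {
	mu       sync.RWMutex
	handlers map[string]TaskHandler
}

// NewHandlerRegistry returns an empty registry.
func NewHandlerRegistry() *HandlerRegistry {
	return &HandlerRegistry{handlers: make(map[string]TaskHandler)}
}

// Register associates a handler with a task type, replacing any earlier one.
func (r *HandlerRegistry) Register(taskType string, h TaskHandler) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.handlers[taskType] = h
}

// Dispatch looks up the handler for the task's type and invokes it.
func (r *HandlerRegistry) Dispatch(ctx context.Context, task Task) error {
	r.mu.RLock()
	h, ok := r.handlers[task.Type]
	r.mu.RUnlock()
	if !ok {
		return fmt.Errorf("%w: %q", ErrNoHandler, task.Type)
	}
	return h(ctx, task)
}

// recordingHandler appends each handled task ID to seen (demo helper).
func recordingHandler(seen *[]string) TaskHandler {
	return func(_ context.Context, task Task) error {
		*seen = append(*seen, task.ID)
		return nil
	}
}

// dispatchAll dispatches tasks in order and collects one error per task.
func dispatchAll(r *HandlerRegistry, tasks ...Task) []error {
	errs := make([]error, 0, len(tasks))
	for _, t := range tasks {
		errs = append(errs, r.Dispatch(context.Background(), t))
	}
	return errs
}

func main() {
	var seen []string
	reg := NewHandlerRegistry()
	reg.Register("my_task", recordingHandler(&seen))
	errs := dispatchAll(reg, Task{ID: "t1", Type: "my_task"}, Task{ID: "t2", Type: "other"})
	fmt.Println(seen, errs)
}
```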
3. Leverage Built-in Observability
Tracing, metrics, and health endpoints are enabled by default, so no additional setup is required.
4. Use A2A Structured Logging
The library provides structured loggers with A2A trace correlation:
// A2A-aware logging with context
client.Logger.InfoContext(ctx, "Processing A2A task",
    "task_id", task.GetId(),
    "context_id", task.GetContextId(),
    "message_count", len(task.GetHistory()),
    "current_state", task.GetStatus().GetState().String(),
)
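The `InfoContext` call above has the same shape as the standard `log/slog` API. Assuming the logger is slog-based (which this document does not confirm), trace correlation could be sketched as a handler wrapper that copies a trace ID from the context into every record. The context key, `WithTraceID`, `traceHandler`, and `logRecord` are all hypothetical names for this illustration.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

// traceIDKey is a private context key for this sketch.
type traceIDKey struct{}

// WithTraceID stores a trace ID in the context.
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler wraps another slog.Handler and adds trace_id when present.
type traceHandler struct{ inner slog.Handler }

func (h traceHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

func (h traceHandler) WithAttrs(a []slog.Attr) slog.Handler {
	return traceHandler{h.inner.WithAttrs(a)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.inner.WithGroup(name)}
}

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok {
		r = r.Clone() // avoid mutating state shared with other handlers
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.inner.Handle(ctx, r)
}

// NewLogger returns a JSON logger that injects trace IDs from the context.
func NewLogger(w io.Writer) *slog.Logger {
	return slog.New(traceHandler{slog.NewJSONHandler(w, nil)})
}

// logRecord emits one entry and decodes it so the fields can be inspected.
func logRecord(traceID, taskID string) map[string]any {
	var buf bytes.Buffer
	ctx := WithTraceID(context.Background(), traceID)
	NewLogger(&buf).InfoContext(ctx, "Processing A2A task", "task_id", taskID)
	fields := map[string]any{}
	if err := json.Unmarshal(buf.Bytes(), &fields); err != nil {
		return nil
	}
	return fields
}

func main() {
	rec := logRecord("4bf92f3577b34da6a3ce929d0e0e4736", "task_123")
	fmt.Println(rec["msg"], rec["task_id"], rec["trace_id"])
}
```

Because the trace ID travels in the context, call sites only pass task-specific fields and correlation is added in one place.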
A2A Architecture Benefits
Code Reduction with A2A Abstractions
- A2A Broker: 380+ lines → 29 lines (92% reduction)
- A2A Publisher: 150+ lines → 45 lines (70% reduction)
- A2A Subscriber: 200+ lines → 55 lines (72% reduction)
- A2A Message Handling: Complex manual parsing → automatic Part processing
- A2A Context Management: Manual tracking → automatic conversation threading
A2A Maintainability
- A2A protocol compliance: Centralized A2A message handling ensures protocol adherence
- Consistent A2A patterns: Same abstractions across all A2A components
- A2A-aware configuration: Environment variables tuned for A2A performance
- A2A context preservation: Automatic conversation context management
A2A Developer Experience
- Zero A2A boilerplate: Built-in A2A message parsing and artifact handling
- A2A-native architecture: Easy to extend with custom A2A message processors
- Automatic A2A setup: One-line A2A service creation with protocol compliance
- A2A debugging: Rich conversation context and message history for troubleshooting
A2A Future Extensibility
The A2A abstraction library is designed for A2A protocol extension:
- Custom A2A Part types: Easy to add new content types (text, data, files, custom)
- Custom A2A observability: Extend A2A metrics and conversation tracing
- A2A configuration: Override A2A protocol defaults with environment variables
- A2A transport options: Extend beyond gRPC while maintaining A2A compliance
- A2A protocol evolution: Built-in version compatibility and migration support
A2A Protocol Extension Points
// Custom A2A Part type
type CustomPart struct {
    CustomData interface{} `json:"custom_data"`
    Format     string      `json:"format"`
}
// Custom A2A artifact processor
type CustomArtifactProcessor struct {
    SupportedTypes []string
    ProcessFunc    func(ctx context.Context, artifact *a2a.Artifact) error
}
// Custom A2A context manager
type CustomContextManager struct {
    ContextRules map[string]ContextRule
    RouteFunc    func(contextId string, message *a2a.Message) []string
}
This A2A-compliant unified approach provides a solid foundation for building complex multi-agent systems with full Agent2Agent protocol support while maintaining simplicity, comprehensive observability, and rich conversation capabilities.