Documentation
AgentHub Documentation
Welcome to the AgentHub documentation! This comprehensive guide will help you understand, install, and use AgentHub, an Agent2Agent (A2A) protocol-compliant communication broker for building sophisticated multi-agent systems with Event-Driven Architecture (EDA) scalability.
Quick Start
New to AgentHub?
Start with our Installation and Setup tutorial, then follow the Running the Demo guide to see AgentHub in action.
Want to understand the concepts?
Read about The Agent2Agent Principle to understand the philosophy behind AgentHub.
Ready to build agents?
Use our guides to Create a Publisher or Create a Subscriber.
Need technical details?
Check the API Reference and Configuration Reference.
Documentation Types
Our documentation follows the Diátaxis framework with four distinct types:
Tutorials - Learning by doing
Step-by-step guides that teach you how to use AgentHub through practical exercises:
How-to Guides - Solving specific problems
Goal-oriented guides for accomplishing specific tasks:
Reference - Technical specifications
Comprehensive technical documentation and API details:
Explanation - Understanding concepts
In-depth discussions that provide context and background:
Learning Paths
For Beginners
- Installation and Setup
- The Agent2Agent Principle
- Running the Demo
- Understanding Tasks
- Create a Subscriber
For Experienced Developers
- Running the Demo
- API Reference
- Create a Publisher
- Performance and Scaling
For System Architects
- The Agent2Agent Principle
- Broker Architecture
- Performance and Scaling
- Configuration Reference
System Overview
AgentHub enables autonomous agents to collaborate through A2A protocol-compliant task delegation with EDA scalability:
- A2A Protocol Compliance with standardized Message, Task, and Artifact formats
- Event-Driven Architecture for scalable asynchronous communication
- Flexible agent addressing (direct, broadcast, topic-based routing)
- Rich task semantics with A2A lifecycle states and priorities
- Built-in resilience with EDA patterns and graceful failure handling
- Comprehensive observability with distributed tracing and metrics
Key Technologies
- Protocol: gRPC with Protocol Buffers
- Language: Go 1.24+
- Architecture: Event-driven with pub/sub patterns
- Observability: OpenTelemetry integration
- Message Format: Structured protobuf with flexible JSON parameters
1 - Tutorials
Learning-oriented guides that take you through practical exercises to master AgentHub
Tutorials
These hands-on tutorials will guide you through learning AgentHub by doing. Each tutorial is designed to be followed step-by-step and will help you build practical experience with the system.
Tutorial Categories
Learning Path
1. Start Here
Begin with Getting Started tutorials to install and run your first examples
2. Build Systems
Progress to Workflows to create sophisticated agent interactions
3. Monitor & Observe
Master Observability to monitor and troubleshoot your deployments
Prerequisites
Before starting these tutorials, make sure you have:
- Go 1.24 or later installed
- Basic understanding of command-line tools
- Familiarity with distributed systems concepts (helpful but not required)
Tutorial Tips
- Follow tutorials in order for the best learning experience
- Each tutorial builds on concepts from previous ones
- Code examples are tested and should work as written
- Don’t hesitate to experiment beyond the tutorial steps
Note
All tutorials are designed to work on Linux, macOS, and Windows. Platform-specific instructions are provided where needed.
1.1 - Getting Started
Essential tutorials to get you up and running with AgentHub
Getting Started Tutorials
Step-by-step tutorials to help you get AgentHub installed, configured, and running your first examples.
Available Tutorials
1.1.1 - Installation and Setup Tutorial
Guide for installing AgentHub and setting up your development environment from scratch. Get a working A2A-compliant AgentHub installation ready for building agent systems.
Installation and Setup Tutorial
This tutorial will guide you through installing AgentHub and setting up your development environment from scratch. By the end, you’ll have a working A2A-compliant AgentHub installation ready for building Agent2Agent protocol systems.
Prerequisites Check
Before we begin, let’s verify you have the required software installed.
Step 1: Verify Go Installation
Check if Go 1.24+ is installed:
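go version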
You should see output like:
go version go1.24.0 darwin/amd64
If Go is not installed or the version is older than 1.24:
macOS (using Homebrew):
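brew install go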
Linux (using package manager):
# Ubuntu/Debian
sudo apt update && sudo apt install golang-go
# CentOS/RHEL
sudo yum install golang
# Arch Linux
sudo pacman -S go
Windows:
Download from https://golang.org/dl/ and run the installer.
Step 2: Verify Protocol Buffers Compiler
Check if protoc is installed:
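protoc --version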
You should see output like:
libprotoc 3.21.12
If protoc is not installed:
macOS (using Homebrew):
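brew install protobuf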
Linux:
# Ubuntu/Debian
sudo apt update && sudo apt install protobuf-compiler
# CentOS/RHEL
sudo yum install protobuf-compiler
# Arch Linux
sudo pacman -S protobuf
Windows:
Download from Protocol Buffers releases and add to PATH.
Step 3: Install Go Protocol Buffer Plugins
Install the required Go plugins for Protocol Buffers:
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
Verify the plugins are in your PATH:
which protoc-gen-go
which protoc-gen-go-grpc
Both commands should return paths to the installed plugins.
Installing AgentHub
Step 4: Clone the Repository
Clone the AgentHub repository:
git clone https://github.com/owulveryck/agenthub.git
cd agenthub
Step 5: Verify Project Structure
Let’s explore what we have:
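ls -la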
You should see:
drwxr-xr-x agents/ # Sample A2A agent implementations
drwxr-xr-x broker/ # A2A-compliant AgentHub broker server
drwxr-xr-x documentation/ # Complete A2A documentation
drwxr-xr-x events/ # Generated A2A protocol code
drwxr-xr-x internal/ # Internal packages and abstractions
-rw-r--r-- go.mod # Go module definition
-rw-r--r-- Makefile # Build automation
drwxr-xr-x proto/ # A2A protocol definitions
-rw-r--r-- README.md # Project overview
Step 6: Initialize Go Module
Ensure Go modules are properly initialized:
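go mod download   # 'go mod tidy' also works here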
This downloads all required dependencies. You should see output about downloading packages.
Step 7: Generate Protocol Buffer Code
Generate the Go code from Protocol Buffer definitions:
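make proto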
You should see:
Generating protobuf code for A2A protocol definitions...
Generating proto/eventbus.proto...
Generating proto/a2a.proto...
Protobuf code generated successfully.
Verify the generated files exist:
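ls events/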
You should see:
a2a/ # A2A protocol definitions
eventbus/ # AgentHub broker definitions
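ls events/a2a/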
You should see:
a2a.pb.go
a2a_grpc.pb.go
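ls events/eventbus/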
You should see:
eventbus.pb.go
eventbus_grpc.pb.go
Step 8: Build All Components
Build the AgentHub components:
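make build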
You should see:
Building A2A-compliant server binary...
Building A2A publisher binary...
Building A2A subscriber binary...
Build complete. A2A-compliant binaries are in the 'bin/' directory.
Verify the binaries were created:
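ls bin/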
You should see:
agenthub-server # A2A-compliant AgentHub broker
publisher # A2A message publisher
subscriber # A2A message subscriber
Verification Test
Let’s verify everything works by running a quick test.
Step 9: Test the Installation
Start the A2A-compliant broker server in the background:
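./bin/agenthub-server &   # '&' runs the broker in the background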
You should see:
2025/09/28 10:00:00 A2A-compliant AgentHub broker gRPC server listening on [::]:50051
2025/09/28 10:00:00 AgentHub service ready for A2A protocol communication
Start an A2A subscriber agent:
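./bin/subscriber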
You should see:
A2A Agent started. Listening for A2A events and tasks. Press Enter to stop.
2025/09/28 10:00:05 A2A Agent agent_demo_subscriber subscribing to A2A tasks...
2025/09/28 10:00:05 Successfully subscribed to A2A tasks for agent agent_demo_subscriber. Waiting for A2A tasks...
Run the A2A publisher to send test tasks:
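./bin/publisher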
You should see A2A tasks being published and processed with conversation context and structured artifacts.
Clean up the test processes:
pkill -f agenthub-server
pkill -f subscriber
Development Environment Setup
Step 10: Configure Your Editor
For VS Code users:
Install the Go extension:
- Open VS Code
- Go to Extensions (Ctrl+Shift+X)
- Search for “Go” and install the official Go extension
- Open the AgentHub project folder
For other editors:
Ensure your editor has Go language support and Protocol Buffer syntax highlighting.
Step 11: Set Up Environment Variables (Recommended)
AgentHub uses environment variables for configuration. Create a .envrc file for local development:
cat > .envrc << EOF
# Core A2A AgentHub Configuration
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
export AGENTHUB_GRPC_PORT=":50051"
# A2A Protocol Configuration
export AGENTHUB_A2A_PROTOCOL_VERSION="1.0"
export AGENTHUB_MESSAGE_BUFFER_SIZE="100"
export AGENTHUB_CONTEXT_TIMEOUT="30s"
export AGENTHUB_ARTIFACT_MAX_SIZE="10MB"
# Health Check Ports
export AGENTHUB_HEALTH_PORT="8080"
export A2A_PUBLISHER_HEALTH_PORT="8081"
export A2A_SUBSCRIBER_HEALTH_PORT="8082"
# Observability (optional for development)
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export SERVICE_NAME="agenthub-dev"
export SERVICE_VERSION="dev"
export ENVIRONMENT="development"
export LOG_LEVEL="DEBUG"
EOF
Install direnv for automatic loading (recommended):
# macOS
brew install direnv
# Ubuntu/Debian
sudo apt install direnv
# After installation, add to your shell
echo 'eval "$(direnv hook bash)"' >> ~/.bashrc # For bash
echo 'eval "$(direnv hook zsh)"' >> ~/.zshrc # For zsh
Allow the environment file:
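direnv allow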
Alternative: Manual loading
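source .envrc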
For the complete environment variable reference, see the Environment Variables Reference.
Step 12: Verify Make Targets
Test all available make targets:
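make help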
You should see all available commands:
Makefile for gRPC Event Bus
Usage:
make <target>
Targets:
all Builds all binaries (default).
proto Generates Go code from .proto files.
build Builds the server, publisher, and subscriber binaries.
run-server Runs the event bus gRPC server.
run-publisher Runs the publisher client.
run-subscriber Runs the subscriber client.
clean Removes generated Go files and build artifacts.
help Displays this help message.
Common Issues and Solutions
Issue: “protoc-gen-go: program not found”
Solution: Ensure Go bin directory is in your PATH:
export PATH=$PATH:$(go env GOPATH)/bin
echo 'export PATH=$PATH:$(go env GOPATH)/bin' >> ~/.bashrc
source ~/.bashrc
Issue: “go.mod not found”
Solution: Ensure you’re in the AgentHub project directory:
pwd # Should show .../agenthub
ls go.mod # Should exist
Issue: Port 50051 already in use
Solution: Kill existing processes or change the port:
lsof -ti:50051 | xargs kill -9
Issue: Permission denied on binaries
Solution: Make binaries executable:
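chmod +x bin/*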
Next Steps
Now that you have AgentHub installed and verified:
- Learn the basics: Follow the Running the Demo tutorial
- Build your first agent: Try Create a Subscriber
- Understand the concepts: Read The Agent2Agent Principle
Getting Help
If you encounter issues:
- Check the troubleshooting section above
- Review the complete documentation
- Open an issue on the GitHub repository
Congratulations! You now have a fully functional AgentHub development environment ready for building autonomous agent systems.
1.1.2 - Running the A2A-Compliant AgentHub Demo
Walk through setting up and running the complete A2A-compliant AgentHub EDA broker system. Learn how agents communicate using Agent2Agent protocol messages through the Event-Driven Architecture broker.
Running the A2A-Compliant AgentHub Demo
This tutorial will walk you through setting up and running the complete Agent2Agent (A2A) protocol-compliant AgentHub Event-Driven Architecture (EDA) broker system. By the end of this tutorial, you’ll have agents communicating using standardized A2A messages through the scalable EDA broker.
Prerequisites
- Go 1.24 or later installed
- Protocol Buffers compiler (protoc) installed
- Basic understanding of gRPC and message brokers
Step 1: Build the A2A-Compliant Components
First, let’s build all the A2A-compliant components using the Makefile:
# Build all A2A-compliant binaries (generates protobuf files first)
make build
This will:
- Generate A2A protocol files from proto/a2a_core.proto and proto/eventbus.proto
- Build the A2A-compliant broker, publisher, and subscriber binaries
- Place all binaries in the bin/ directory
You should see output like:
Building A2A-compliant server binary...
Building A2A-compliant publisher binary...
Building A2A-compliant subscriber binary...
Build complete. A2A-compliant binaries are in the 'bin/' directory.
Step 2: Start the AgentHub Broker Server
Open a terminal and start the AgentHub broker server:
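make run-server   # or run ./bin/agenthub-server directly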
You should see output like:
time=2025-09-29T11:51:26.612+02:00 level=INFO msg="Starting health server" port=8080
time=2025-09-29T11:51:26.611+02:00 level=INFO msg="AgentHub gRPC server with observability listening" address=[::]:50051 health_endpoint=http://localhost:8080/health metrics_endpoint=http://localhost:8080/metrics component=broker
Keep this terminal open - the AgentHub broker needs to run continuously.
Step 3: Start an Agent (Subscriber)
Open a second terminal and start an agent that can receive and process tasks:
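make run-subscriber   # or run ./bin/subscriber directly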
You should see output indicating the agent has started:
time=2025-09-29T11:52:04.727+02:00 level=INFO msg="AgentHub client started with observability" broker_addr=localhost:50051 component=subscriber
time=2025-09-29T11:52:04.727+02:00 level=INFO msg="Starting health server" port=8082
time=2025-09-29T11:52:04.728+02:00 level=INFO msg="Agent started with observability. Listening for events and tasks."
time=2025-09-29T11:52:04.728+02:00 level=INFO msg="Subscribing to task results" agent_id=agent_demo_subscriber
time=2025-09-29T11:52:04.728+02:00 level=INFO msg="Subscribing to tasks" agent_id=agent_demo_subscriber
This agent can process several types of tasks:
- greeting: Simple greeting messages
- math_calculation: Basic arithmetic operations
- random_number: Random number generation
- Any unknown task type will be rejected
Step 4: Send A2A-Compliant Tasks
Open a third terminal and run the publisher to send A2A protocol-compliant task messages:
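make run-publisher   # or run ./bin/publisher directly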
You’ll see the publisher send various A2A-compliant task messages through the AgentHub EDA broker:
time=2025-09-29T14:41:11.237+02:00 level=INFO msg="Starting publisher demo"
time=2025-09-29T14:41:11.237+02:00 level=INFO msg="Testing Agent2Agent Task Publishing via AgentHub with observability"
time=2025-09-29T14:41:11.237+02:00 level=INFO msg="Publishing A2A task" task_id=task_greeting_1759149671 task_type=greeting responder_agent_id=agent_demo_subscriber context_id=ctx_greeting_1759149671
time=2025-09-29T14:41:11.242+02:00 level=INFO msg="A2A task published successfully" task_id=task_greeting_1759149671 task_type=greeting event_id=evt_msg_greeting_1759149671_1759149671
time=2025-09-29T14:41:11.242+02:00 level=INFO msg="Published greeting task" task_id=task_greeting_1759149671
time=2025-09-29T14:41:14.243+02:00 level=INFO msg="Publishing A2A task" task_id=task_math_calculation_1759149674 task_type=math_calculation responder_agent_id=agent_demo_subscriber context_id=ctx_math_calculation_1759149674
time=2025-09-29T14:41:14.247+02:00 level=INFO msg="A2A task published successfully" task_id=task_math_calculation_1759149674 task_type=math_calculation event_id=evt_msg_math_calculation_1759149674_1759149674
time=2025-09-29T14:41:16.248+02:00 level=INFO msg="Publishing A2A task" task_id=task_random_number_1759149676 task_type=random_number responder_agent_id=agent_demo_subscriber context_id=ctx_random_number_1759149676
time=2025-09-29T14:41:16.249+02:00 level=INFO msg="Published random number task" task_id=task_random_number_1759149676
Notice how the A2A implementation includes:
- Context IDs: Each task is grouped in a conversation context (ctx_greeting_...)
- Event IDs: EDA wrapper events have unique identifiers for tracing
- A2A Task Structure: Tasks use A2A-compliant Message and Part formats
Step 5: Observe A2A Task Processing
Switch back to the subscriber terminal to see the agent processing A2A tasks in real-time:
time=2025-09-29T14:41:11.243+02:00 level=INFO msg="Task processing completed" task_id=task_greeting_1759149671 status=TASK_STATE_COMPLETED has_artifact=true
time=2025-09-29T14:41:14.253+02:00 level=INFO msg="Task processing completed" task_id=task_math_calculation_1759149674 status=TASK_STATE_COMPLETED has_artifact=true
time=2025-09-29T14:41:16.249+02:00 level=INFO msg="Task processing completed" task_id=task_random_number_1759149676 status=TASK_STATE_COMPLETED has_artifact=true
Notice the A2A-compliant processing:
- Task States: Using A2A standard states (TASK_STATE_COMPLETED)
- Artifacts: Each completed task generates A2A artifacts (has_artifact=true)
- Structured Processing: Tasks are processed using A2A Message and Part handlers
Step 6: Check the Broker Logs
In the first terminal (broker server), you’ll see logs showing message routing:
2025/09/27 16:34:33 Received task request: task_greeting_1758983673 (type: greeting) from agent: agent_demo_publisher
2025/09/27 16:34:35 Received task result for task: task_greeting_1758983673 from agent: agent_demo_subscriber
2025/09/27 16:34:35 Received task progress for task: task_greeting_1758983673 (100%) from agent: agent_demo_subscriber
Understanding What Happened
A2A Message Creation: The publisher created A2A-compliant messages with:
- Message Structure: Using A2A Message format with Part content
- Context Grouping: Each task belongs to a conversation context
- Task Association: Messages are linked to specific A2A tasks
- Role Definition: Messages specify USER (requester) or AGENT (responder) roles
EDA Event Routing: The AgentHub EDA broker:
- Wrapped A2A Messages: A2A messages wrapped in AgentEvent for EDA transport
- Event-Driven Routing: Used EDA patterns for scalable message delivery
- Task Storage: Stored A2A tasks with full message history and artifacts
- Status Tracking: Managed A2A task lifecycle (SUBMITTED → WORKING → COMPLETED)
A2A Task Processing: The subscriber agent:
- A2A Task Reception: Received A2A tasks via EDA event streams
- Message Processing: Processed A2A Message content using Part handlers
- Artifact Generation: Generated structured A2A artifacts as task output
- Status Updates: Published A2A-compliant status updates through EDA events
Hybrid Architecture Benefits:
- A2A Compliance: Full interoperability with other A2A-compliant systems
- EDA Scalability: Event-driven patterns for high-throughput scenarios
- Standards-Based: Using industry-standard Agent2Agent protocol
- Observable: Built-in tracing and metrics for production deployment
Next Steps
Now that you have the basic system working, you can:
- Create Multiple Agents: Run multiple subscriber instances with different agent IDs to see task distribution
- Add Custom Task Types: Modify the subscriber to handle new types of tasks
- Build a Request-Response Flow: Create an agent that both requests and processes tasks
- Monitor Task Progress: Build a dashboard that subscribes to task progress updates
Troubleshooting
Port Already in Use: If you see “bind: address already in use”, kill any existing processes:
lsof -ti:50051 | xargs kill -9
Agent Not Receiving Tasks: Ensure the agent ID in the publisher matches the subscriber’s agent ID (agent_demo_subscriber).
Build Errors: Regenerate A2A protocol buffer files and ensure all imports are correct:
# Clean old protobuf files
make clean
# Regenerate A2A protobuf files
make proto
# Rebuild everything
make build
A2A Compliance Issues: Verify A2A protocol structures are correctly generated:
# Check A2A core types
ls events/a2a/
# Should show: a2a_core.pb.go eventbus.pb.go eventbus_grpc.pb.go
You now have a working A2A-compliant AgentHub EDA broker system! The agents can exchange standardized A2A messages, maintain conversation contexts, generate structured artifacts, and track task lifecycles - all through your scalable Event-Driven Architecture broker with full Agent2Agent protocol compliance.
1.2 - Observability
Tutorials for monitoring and observing AgentHub systems
Observability Tutorials
Learn how to monitor, trace, and observe your AgentHub deployments with comprehensive observability features.
Available Tutorials
1.2.1 - Interactive Dashboard Tour
Take a guided tour through AgentHub’s Grafana dashboards while the system is running, learning to interpret metrics, identify issues, and understand system behavior in real-time.
Interactive Dashboard Tour
Learn by doing: Take a guided tour through AgentHub’s Grafana dashboards while the system is running, learning to interpret metrics, identify issues, and understand system behavior in real-time.
Prerequisites
- Observability stack running (from the Observability Demo)
- Observable agents running (broker, publisher, subscriber)
- Grafana open at http://localhost:3333
- 10-15 minutes for the complete tour
Quick Setup Reminder
If you haven’t completed the observability demo yet:
# Start observability stack
cd agenthub/observability
docker-compose up -d
# Run observable agents (3 terminals)
go run broker/main.go
go run agents/subscriber/main.go
go run agents/publisher/main.go
Dashboard Navigation
Accessing the Main Dashboard
- Open Grafana: http://localhost:3333
- Login: admin / admin (skip password change for demo)
- Navigate: Dashboards → Browse → AgentHub → “AgentHub EDA System Observatory”
- Bookmark: Save this URL for quick access: http://localhost:3333/d/agenthub-eda-dashboard
Dashboard Layout Overview
The dashboard is organized in 4 main rows:
Row 1: Event Processing Overview
├── Event Processing Rate (events/sec)
└── Event Processing Error Rate (%)
Row 2: Event Analysis
├── Event Types Distribution (pie chart)
└── Event Processing Latency (p50, p95, p99)
Row 3: Distributed Tracing
└── Jaeger Integration Panel
Row 4: System Health
├── Service CPU Usage (%)
├── Service Memory Usage (MB)
├── Go Goroutines Count
└── Service Health Status
Interactive Tour
Tour 1: Understanding Event Flow (3 minutes)
Step 1: Watch the Event Processing Rate
Location: Top-left panel
What to observe: Real-time lines showing events per second
Identify the services:
- Green line: agenthub-broker (should be highest, since it processes all events)
- Blue line: agenthub-publisher (events being created)
- Orange line: agenthub-subscriber (events being processed)
Watch the pattern:
- Publisher creates bursts of events
- Broker immediately processes them (routing)
- Subscriber processes them shortly after
Understand the flow:
Publisher (creates) → Broker (routes) → Subscriber (processes)
      50/sec             150/sec             145/sec
Tour Insight: The broker rate is higher because it processes both incoming tasks AND outgoing results.
Step 2: Monitor Error Rates
Location: Top-right panel (gauge)
What to observe: Error percentage gauge
Healthy system: Should show 0-2% (green zone)
If you see higher errors:
- Check if all services are running
- Look for red traces in Jaeger (we’ll do this next)
Error rate calculation:
Error Rate = (Failed Events / Total Events) × 100
Action: Note your current error rate - we’ll compare it later.
Tour 2: Event Analysis Deep Dive (3 minutes)
Step 3: Explore Event Types
Location: Middle-left panel (pie chart)
What to observe: Distribution of different event types
Identify event types:
- greeting: Most common (usually 40-50%)
- math_calculation: Compute-heavy tasks (30-40%)
- random_number: Quick tasks (15-25%)
- unknown_task: Error-generating tasks (2-5%)
Business insights:
- Larger slices = more frequent tasks
- Small red slice = intentional error tasks for testing
Tour Insight: The publisher randomly generates different task types to simulate real-world workload diversity.
Step 4: Analyze Processing Latency
Location: Middle-right panel
What to observe: Three latency lines (p50, p95, p99)
Understand percentiles:
- p50 (blue): 50% of events process faster than this
- p95 (green): 95% of events process faster than this
- p99 (red): 99% of events process faster than this
Healthy ranges:
- p50: < 50ms (very responsive)
- p95: < 200ms (good performance)
- p99: < 500ms (acceptable outliers)
Pattern recognition:
- Spiky p99 = occasional slow tasks (normal)
- Rising p50 = systemic slowdown (investigate)
- Flat lines = no activity or measurement issues
Action: Hover over the lines to see exact values at different times.
Tour 3: Distributed Tracing Exploration (4 minutes)
Step 5: Jump into Jaeger
Location: Middle section - “Distributed Traces” panel
Action: Click the “Explore” button
This opens Jaeger in a new tab. Let’s explore:
In Jaeger UI:
- Service dropdown: Select “agenthub-broker”
- Operation: Leave as “All”
- Click “Find Traces”
Pick a trace to examine:
- Look for traces that show multiple spans
- Click on any trace line to open details
Understand the trace structure:
Timeline View:
agenthub-publisher: publish_event [2ms]
└── agenthub-broker: process_event [1ms]
    └── agenthub-subscriber: consume_event [3ms]
        ├── agenthub-subscriber: process_task [15ms]
        └── agenthub-subscriber: publish_result [2ms]
Explore span details:
- Click individual spans to see:
- Tags: event_type, event_id, agent names
- Process: Which service handled the span
- Duration: Exact timing information
Tour Insight: Each event creates a complete “trace” showing its journey from creation to completion.
Step 6: Find and Analyze an Error
Search for error traces:
- In Jaeger, add a tag filter: error=true
- Or look for traces with red spans
Examine the error trace:
- Red spans indicate errors
- Error tags show the error type and message
- Stack traces help with debugging
Follow the error propagation:
- See how errors affect child spans
- Notice error context in span attributes
Action: Find a trace with the “unknown_task” event type - these are designed to fail for demonstration.
Tour 4: System Health Monitoring (3 minutes)
Step 7: Monitor Resource Usage
Location: Bottom row panels
What to observe: System resource consumption
CPU Usage Panel (Bottom-left):
- Normal range: 10-50% for demo workload
- Watch for: Sustained high CPU (>70%)
- Services comparison: See which service uses most CPU
Memory Usage Panel (Bottom-center-left):
- Normal range: 30-80MB per service for demo
- Watch for: Continuously growing memory (memory leaks)
- Pattern: Sawtooth = normal GC, steady growth = potential leak
Goroutines Panel (Bottom-center-right):
- Normal range: 10-50 goroutines per service
- Watch for: Continuously growing count (goroutine leaks)
- Pattern: Stable baseline with activity spikes
Step 8: Verify Service Health
Location: Bottom-right panel
What to observe: Service up/down status
Health indicators:
- Green: Service healthy and responding
- Red: Service down or health check failing
- Yellow: Service degraded but operational
Health check details:
- Each service exposes a /health endpoint
- Prometheus monitors these endpoints
- Dashboard shows aggregated status
Action: Open http://localhost:8080/health in a new tab to see raw health data.
Tour 5: Time-based Analysis (2 minutes)
Step 9: Change Time Ranges
Location: Top-right of dashboard (time picker)
Current: Likely showing “Last 5 minutes”
Try different ranges:
- Last 15 minutes: See longer trends
- Last 1 hour: See full demo session
- Custom range: Pick specific time period
Observe pattern changes:
- Longer ranges: Show trends and patterns
- Shorter ranges: Show real-time detail
- Custom ranges: Zoom into specific incidents
Step 10: Use Dashboard Filters
Location: Top of dashboard - variable dropdowns
Service Filter:
- Select “All” to see everything
- Pick specific service to focus analysis
- Useful for isolating service-specific issues
Event Type Filter:
- Filter to specific event types
- Compare performance across task types
- Identify problematic event categories
Tour Insight: Filtering helps you drill down from a system-wide view to specific components or workloads.
Hands-on Experiments
Experiment 1: Create a Service Outage
Goal: See how the dashboard shows service failures
Stop the subscriber:
# In subscriber terminal, press Ctrl+C
Watch the dashboard changes:
- Error rate increases (top-right gauge turns red)
- Subscriber metrics disappear from bottom panels
- Service health shows subscriber as down
Check Jaeger for failed traces:
- Look for traces that don’t complete
- See where the chain breaks
Restart subscriber:
go run agents/subscriber/main.go
Learning: The dashboard immediately shows the impact of service failures.
Experiment 2: Generate High Load
Goal: See system behavior under stress
Modify publisher to generate more events:
# Edit agents/publisher/main.go
# Change: time.Sleep(5 * time.Second)
# To: time.Sleep(1 * time.Second)
Watch dashboard changes:
- Processing rate increases
- Latency may increase
- CPU/memory usage grows
Observe scaling behavior:
- How does the system handle increased load?
- Do error rates increase?
- Where are the bottlenecks?
Learning: The dashboard shows system performance characteristics under load.
Dashboard Interpretation Guide
What Good Looks Like
- Event Processing Rate: Steady activity matching workload
- Error Rate: < 5% (green zone)
- Event Types: Expected distribution
- Latency: p95 < 200ms, p99 < 500ms
- CPU Usage: < 50% sustained
- Memory: Stable or slow growth with GC cycles
- Goroutines: Stable baseline with activity spikes
- Service Health: All services green/up
Warning Signs
- Error Rate: 5-10% (yellow zone)
- Latency: p95 > 200ms or rising trend
- CPU: Sustained > 70%
- Memory: Continuous growth without GC
- Missing data: Gaps in metrics (service issues)
Critical Issues
- Error Rate: > 10% (red zone)
- Latency: p95 > 500ms
- CPU: Sustained > 90%
- Memory: Rapid growth or OOM
- Service Health: Any service showing red/down
- Traces: Missing or broken trace chains
Next Steps After the Tour
For Daily Operations:
- Bookmark: Save dashboard URL for quick access
- Set up alerts: Configure notifications for critical metrics
- Create views: Use filters to create focused views for your team
For Development:
For Deep Understanding:
Troubleshooting Tour Issues
| Issue | Solution |
|---|---|
| Dashboard shows no data | Verify observability environment variables are set |
| Grafana won’t load | Check docker-compose ps in observability/ |
| Metrics missing | Verify Prometheus targets at http://localhost:9090/targets |
| Jaeger empty | Ensure trace context propagation is working |
Congratulations! You’ve completed the interactive dashboard tour and learned to read AgentHub’s observability signals like a pro!
Ready for More?
Master the Tools: Use Grafana Dashboards - Advanced dashboard usage
Troubleshoot Issues: Debug with Distributed Tracing - Use Jaeger effectively
1.2.2 - AgentHub Observability Demo Tutorial
Experience the complete observability stack with distributed tracing, real-time metrics, and intelligent alerting in under 10 minutes through hands-on learning.
AgentHub Observability Demo Tutorial
Learn by doing: Experience the complete observability stack with distributed tracing, real-time metrics, and intelligent alerting in under 10 minutes.
What You’ll Learn
By the end of this tutorial, you’ll have:
- Seen distributed traces flowing across multiple agents
- Monitored real-time metrics in beautiful Grafana dashboards
- Understood event correlation through trace IDs
- Experienced intelligent alerting when things go wrong
- Explored the complete observability stack components
Prerequisites
- Go 1.24+ installed
- Docker and Docker Compose installed
- Environment variables configured (see Installation and Setup)
- 10 minutes of your time
- Basic terminal knowledge
Environment Note: AgentHub agents automatically enable observability when JAEGER_ENDPOINT is configured. See the Environment Variables Reference for all configuration options.
Step 1: Clone and Setup (1 minute)
# Clone the repository
git clone https://github.com/owulveryck/agenthub.git
cd agenthub
# Verify you have the observability files
ls observability/
# You should see: docker-compose.yml, grafana/, prometheus/, etc.
Step 2: Start the Observability Stack (2 minutes)
# Navigate to observability directory
cd observability
# Start all monitoring services
docker-compose up -d
# Verify services are running
docker-compose ps
Expected Output:
NAME COMMAND SERVICE STATUS
agenthub-grafana "/run.sh" grafana running
agenthub-jaeger "/go/bin/all-in-one" jaeger running
agenthub-prometheus "/bin/prometheus --cβ¦" prometheus running
agenthub-otel-collector "/otelcol-contrib --β¦" otel-collector running
Checkpoint 1: All services should be “running”. If not, check Docker logs: docker-compose logs <service-name>
Step 3: Access the Dashboards (1 minute)
Open these URLs in your browser (keep them open in tabs):
| Service | URL | Purpose |
|---|---|---|
| Grafana | http://localhost:3333 | Main observability dashboard |
| Jaeger | http://localhost:16686 | Distributed tracing |
| Prometheus | http://localhost:9090 | Raw metrics and alerts |
Grafana Login: admin / admin (skip password change for demo)
Checkpoint 2: You should see Grafana’s welcome page and Jaeger’s empty trace list.
Step 4: Start the Observable Broker (1 minute)
Open a new terminal and navigate back to the project root:
# From agenthub root directory
go run broker/main.go
Expected Output:
time=2025-09-28T21:00:00.000Z level=INFO msg="Starting health server on port 8080"
time=2025-09-28T21:00:00.000Z level=INFO msg="AgentHub broker gRPC server with observability listening" address="[::]:50051" health_endpoint="http://localhost:8080/health" metrics_endpoint="http://localhost:8080/metrics"
Checkpoint 3:
- Broker is listening on port 50051
- Health endpoint available at http://localhost:8080/health
- Metrics endpoint available at http://localhost:8080/metrics
Step 5: Start the Observable Subscriber (1 minute)
Open another terminal:
go run agents/subscriber/main.go
Expected Output:
time=2025-09-28T21:00:01.000Z level=INFO msg="Starting health server on port 8082"
time=2025-09-28T21:00:01.000Z level=INFO msg="Starting observable subscriber"
time=2025-09-28T21:00:01.000Z level=INFO msg="Agent started with observability. Listening for events and tasks."
Checkpoint 4:
- Subscriber is connected and listening
- Health available at http://localhost:8082/health
Step 6: Generate Events with the Publisher (2 minutes)
Open a third terminal:
go run agents/publisher/main.go
Expected Output:
time=2025-09-28T21:00:02.000Z level=INFO msg="Starting health server on port 8081"
time=2025-09-28T21:00:02.000Z level=INFO msg="Starting observable publisher demo"
time=2025-09-28T21:00:02.000Z level=INFO msg="Publishing task" task_id=task_greeting_1727557202 task_type=greeting responder_agent_id=agent_demo_subscriber
time=2025-09-28T21:00:02.000Z level=INFO msg="Task published successfully" task_id=task_greeting_1727557202 task_type=greeting
Checkpoint 5: You should see:
- Publisher creating and sending tasks
- Subscriber receiving and processing tasks
- Broker routing messages between them
Step 7: Explore Real-time Metrics in Grafana (2 minutes)
- Go to Grafana: http://localhost:3333
- Navigate to Dashboards → Browse → AgentHub → “AgentHub EDA System Observatory”
- Observe the real-time data:
What You’ll See:
Event Processing Rate (Top Left)
- Lines showing events/second for each service
- Should show activity spikes when publisher runs
Error Rate (Top Right)
- Gauge showing error percentage
- Should be green (< 5% errors)
Event Types Distribution (Middle Left)
- Pie chart showing task types: greeting, math_calculation, random_number
- Different colors for each task type
Processing Latency (Middle Right)
- Three lines: p50, p95, p99 latencies
- Should show sub-second processing times
System Health (Bottom)
- CPU usage, memory usage, goroutines
- Service health status (all should be UP)
Checkpoint 6: The dashboard should show live metrics with recent activity.
Step 8: Explore Distributed Traces in Jaeger (2 minutes)
- Go to Jaeger: http://localhost:16686
- Select Service: Choose “agenthub-broker” from dropdown
- Click “Find Traces”
- Click on any trace to see details
What You’ll See:
Complete Event Journey:
agenthub-publisher: publish_event (2ms)
└── agenthub-broker: process_event (1ms)
    └── agenthub-subscriber: consume_event (5ms)
        ├── agenthub-subscriber: process_task (15ms)
        └── agenthub-subscriber: publish_result (2ms)
Trace Details:
- Span Tags: event_id, event_type, service names
- Timing Information: Exact start/end times and durations
- Log Correlation: Each span linked to structured logs
Error Detection:
- Look for red spans indicating errors
- Trace the “unknown_task” type to see how errors propagate
Checkpoint 7: You should see complete traces showing the full event lifecycle.
Step 9: Correlate Logs with Traces (1 minute)
Copy a trace ID from Jaeger (the long hex string)
Check broker logs for that trace ID:
# In your broker terminal, look for lines like:
time=2025-09-28T21:00:02.000Z level=INFO msg="Received task request" task_id=task_greeting_1727557202 trace_id=a1b2c3d4e5f6...
Check subscriber logs for the same trace ID
Checkpoint 8: You should find the same trace_id in logs across multiple services.
Step 10: Experience Intelligent Alerting (Optional)
To see alerting in action:
Simulate errors by stopping the subscriber:
# In subscriber terminal, press Ctrl+C
Keep publisher running (it will fail to process tasks)
Check Prometheus alerts:
- Go to http://localhost:9090/alerts
- After ~5 minutes, you should see “HighEventProcessingErrorRate” firing
Restart subscriber to clear the alert
Congratulations!
You’ve successfully experienced the complete AgentHub observability stack!
Summary: What You Accomplished
- Deployed a complete observability stack with Docker Compose
- Ran observable agents with automatic instrumentation
- Monitored real-time metrics in Grafana dashboards
- Traced event flows across multiple services with Jaeger
- Correlated logs with traces using trace IDs
- Experienced intelligent alerting with Prometheus
- Understood the complete event lifecycle from publisher to subscriber
Key Observability Concepts You Learned
Distributed Tracing
- Events get unique trace IDs that follow them everywhere
- Each processing step creates a “span” with timing information
- Complete request flows are visible across service boundaries
Metrics Collection
- 47+ different metrics automatically collected
- Real-time visualization of system health and performance
- Historical data for trend analysis
Structured Logging
- All logs include trace context for correlation
- Consistent format across all services
- Easy debugging and troubleshooting
Intelligent Alerting
- Proactive monitoring for error rates and performance
- Automatic notifications when thresholds are exceeded
- Helps prevent issues before they impact users
Next Steps
For Development:
For Operations:
For Understanding:
Troubleshooting
| Issue | Solution |
|---|---|
| Services won’t start | Run docker-compose down && docker-compose up -d |
| No metrics in Grafana | Check Prometheus targets: http://localhost:9090/targets |
| No traces in Jaeger | Verify JAEGER_ENDPOINT environment variable is set correctly |
| Permission errors | Ensure Docker has proper permissions |
Clean Up
When you’re done exploring:
# Stop the observability stack
cd observability
docker-compose down
# Stop the Go applications
# Press Ctrl+C in each terminal running the agents
Ready for More?
Production Usage: Add Observability to Your Agent
Deep Understanding: Distributed Tracing Explained
1.3 - Workflows
Tutorials for building complex multi-agent workflows
Workflow Tutorials
Learn to design and implement sophisticated multi-agent workflows and orchestration patterns.
Available Tutorials
1.3.1 - Building Multi-Agent Workflows
Learn to create complex workflows involving multiple specialized agents working together to accomplish sophisticated tasks. Build a real document processing pipeline with multiple agents handling different stages.
Building Multi-Agent Workflows
This advanced tutorial teaches you to create complex workflows involving multiple specialized agents working together to accomplish sophisticated tasks. You’ll build a real document processing pipeline with multiple agents handling different stages.
What You’ll Build
By the end of this tutorial, you’ll have an A2A-compliant multi-agent system that:
- Ingests documents through an A2A Document Intake Agent
- Validates content using an A2A Validation Agent
- Extracts metadata with an A2A Metadata Extraction Agent
- Processes text through an A2A Text Processing Agent
- Generates summaries using an A2A Summary Agent
- Orchestrates the workflow with an A2A Workflow Coordinator Agent
This demonstrates real-world A2A agent collaboration patterns with conversation context, structured message content, and artifact-based results used in production systems.
Prerequisites
Architecture Overview
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  A2A Workflow   │    │    AgentHub     │    │ A2A Specialized │
│   Coordinator   │    │   A2A Broker    │    │     Agents      │
│                 │    │                 │    │                 │
│ • A2A context   │───►│ • Routes A2A    │───►│ • Document      │
│   management    │    │   messages      │    │   Intake        │
│ • Conversation  │    │ • Tracks A2A    │    │ • Validation    │
│   threading     │    │   conversations │    │ • Metadata      │
│ • Artifact      │    │ • Manages A2A   │    │ • Text Proc     │
│   aggregation   │    │   state         │    │ • Summary       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Step 1: Create the Workflow Coordinator
First, let’s create the main coordinator that manages the document processing pipeline.
Create the coordinator agent:
mkdir -p agents/coordinator
Create agents/coordinator/main.go:
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/google/uuid"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/protobuf/types/known/structpb"
"google.golang.org/protobuf/types/known/timestamppb"
a2a "github.com/owulveryck/agenthub/events/a2a"
pb "github.com/owulveryck/agenthub/events/eventbus"
)
const (
agentHubAddr = "localhost:50051"
agentID = "a2a_workflow_coordinator"
)
type A2ADocumentWorkflow struct {
DocumentID string
ContextID string // A2A conversation context
Status string
CurrentStage string
TaskHistory []*a2a.Task // Complete A2A task history
Artifacts []*a2a.Artifact // Collected artifacts from stages
Results map[string]interface{} // Per-stage results, keyed by stage name
StartTime time.Time
client pb.AgentHubClient // A2A-compliant client
}
func main() {
conn, err := grpc.Dial(agentHubAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
log.Fatalf("Failed to connect: %v", err)
}
defer conn.Close()
client := pb.NewAgentHubClient(conn)
coordinator := &A2AWorkflowCoordinator{
client: client,
workflows: make(map[string]*A2ADocumentWorkflow),
}
ctx := context.Background()
// Start listening for A2A task results
go coordinator.subscribeToResults(ctx)
// Start processing documents with A2A workflow
coordinator.startA2ADocumentProcessing(ctx)
// Keep running
select {}
}
type A2AWorkflowCoordinator struct {
client pb.AgentHubClient
workflows map[string]*A2ADocumentWorkflow
}
func (wc *A2AWorkflowCoordinator) startA2ADocumentProcessing(ctx context.Context) {
// Simulate document arrival with A2A structured content
documents := []map[string]interface{}{
{
"document_id": "doc_001",
"content": "This is a sample business document about quarterly results.",
"filename": "q3_results.txt",
"source": "email_attachment",
"doc_type": "business_report",
},
{
"document_id": "doc_002",
"content": "Technical specification for the new API endpoints and authentication mechanisms.",
"filename": "api_spec.txt",
"source": "file_upload",
"doc_type": "technical_spec",
},
}
for _, doc := range documents {
wc.processA2ADocument(ctx, doc)
time.Sleep(5 * time.Second)
}
}
func (wc *A2AWorkflowCoordinator) processA2ADocument(ctx context.Context, document map[string]interface{}) {
documentID := document["document_id"].(string)
contextID := fmt.Sprintf("doc_workflow_%s_%s", documentID, uuid.New().String())
workflow := &A2ADocumentWorkflow{
DocumentID: documentID,
ContextID: contextID,
Status: "started",
CurrentStage: "intake",
TaskHistory: make([]*a2a.Task, 0),
Artifacts: make([]*a2a.Artifact, 0),
Results: make(map[string]interface{}),
StartTime: time.Now(),
client: wc.client,
}
wc.workflows[documentID] = workflow
log.Printf("Starting A2A document processing workflow for %s with context %s", documentID, contextID)
// Record the workflow ID in the document parameters so downstream agents echo it back in results
document["workflow_id"] = documentID
// Stage 1: A2A Document Intake (target ID must match the intake agent defined below)
wc.publishA2ATask(ctx, "document_intake", document, "document_intake_agent", workflow)
}
func (wc *A2AWorkflowCoordinator) publishA2ATask(ctx context.Context, taskDescription string, params map[string]interface{}, targetAgent string, workflow *A2ADocumentWorkflow) {
taskID := fmt.Sprintf("task_%s_%s", taskDescription, uuid.New().String())
messageID := fmt.Sprintf("msg_%d_%s", time.Now().Unix(), uuid.New().String())
// Create A2A structured content
paramsData, err := structpb.NewStruct(params)
if err != nil {
log.Printf("Error creating parameters: %v", err)
return
}
// Create A2A message with structured parts
requestMessage := &a2a.Message{
MessageId: messageID,
ContextId: workflow.ContextID,
TaskId: taskID,
Role: a2a.Role_USER,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: fmt.Sprintf("Please process %s for document %s", taskDescription, workflow.DocumentID),
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: paramsData,
Description: fmt.Sprintf("%s parameters", taskDescription),
},
},
},
},
}
// Create A2A task
task := &a2a.Task{
Id: taskID,
ContextId: workflow.ContextID,
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: requestMessage,
Timestamp: timestamppb.Now(),
},
History: []*a2a.Message{requestMessage},
Metadata: paramsData,
}
// Store in workflow history
workflow.TaskHistory = append(workflow.TaskHistory, task)
// Publish A2A task update
req := &pb.PublishTaskUpdateRequest{
Task: task,
Routing: &pb.AgentEventMetadata{
FromAgentId: agentID,
ToAgentId: targetAgent,
EventType: "task.submitted",
Priority: pb.Priority_PRIORITY_MEDIUM,
},
}
log.Printf("Publishing A2A %s task for workflow %s in context %s", taskDescription, workflow.DocumentID, workflow.ContextID)
_, err = wc.client.PublishTaskUpdate(ctx, req)
if err != nil {
log.Printf("Error publishing A2A task: %v", err)
}
}
func (wc *A2AWorkflowCoordinator) subscribeToResults(ctx context.Context) {
req := &pb.SubscribeToTaskResultsRequest{
RequesterAgentId: agentID,
}
stream, err := wc.client.SubscribeToTaskResults(ctx, req)
if err != nil {
log.Printf("Error subscribing to results: %v", err)
return
}
for {
result, err := stream.Recv()
if err != nil {
log.Printf("Error receiving result: %v", err)
return
}
wc.handleTaskResult(ctx, result)
}
}
func (wc *A2AWorkflowCoordinator) handleTaskResult(ctx context.Context, result *pb.TaskResult) {
params := result.GetResult().AsMap()
workflowID := params["workflow_id"].(string)
stage := params["stage"].(string)
workflow, exists := wc.workflows[workflowID]
if !exists {
log.Printf("Unknown workflow ID: %s", workflowID)
return
}
log.Printf("Received result for workflow %s, stage %s: %s",
workflowID, stage, result.GetStatus().String())
if result.GetStatus() == pb.TaskStatus_TASK_STATUS_FAILED {
workflow.Status = "failed"
log.Printf("Workflow %s failed at stage %s: %s",
workflowID, stage, result.GetErrorMessage())
return
}
// Store stage results
workflow.Results[stage] = params
// Advance to next stage
wc.advanceWorkflow(ctx, workflow, stage)
}
func (wc *A2AWorkflowCoordinator) advanceWorkflow(ctx context.Context, workflow *A2ADocumentWorkflow, completedStage string) {
switch completedStage {
case "document_intake":
// Move to validation
workflow.CurrentStage = "validation"
data := workflow.Results["document_intake"]
wc.publishTask(ctx, "document_validation", data.(map[string]interface{}), "validation_agent", workflow.DocumentID)
case "document_validation":
// Move to metadata extraction
workflow.CurrentStage = "metadata_extraction"
data := workflow.Results["document_validation"]
wc.publishTask(ctx, "metadata_extraction", data.(map[string]interface{}), "metadata_agent", workflow.DocumentID)
case "metadata_extraction":
// Move to text processing
workflow.CurrentStage = "text_processing"
data := workflow.Results["metadata_extraction"]
wc.publishTask(ctx, "text_processing", data.(map[string]interface{}), "text_processor_agent", workflow.DocumentID)
case "text_processing":
// Move to summary generation
workflow.CurrentStage = "summary_generation"
data := workflow.Results["text_processing"]
wc.publishTask(ctx, "summary_generation", data.(map[string]interface{}), "summary_agent", workflow.DocumentID)
case "summary_generation":
// Workflow complete
workflow.Status = "completed"
workflow.CurrentStage = "finished"
duration := time.Since(workflow.StartTime)
log.Printf("Workflow %s completed successfully in %v", workflow.DocumentID, duration)
wc.printWorkflowSummary(workflow)
}
}
func (wc *A2AWorkflowCoordinator) printWorkflowSummary(workflow *A2ADocumentWorkflow) {
fmt.Printf("\n=== WORKFLOW SUMMARY ===\n")
fmt.Printf("Document ID: %s\n", workflow.DocumentID)
fmt.Printf("Status: %s\n", workflow.Status)
fmt.Printf("Duration: %v\n", time.Since(workflow.StartTime))
fmt.Printf("Stages completed:\n")
for stage, result := range workflow.Results {
fmt.Printf(" - %s: %v\n", stage, result)
}
fmt.Printf("=======================\n\n")
}
Step 2: Create Specialized Agents
Now let’s create each specialized agent that handles specific stages of the pipeline.
Document Intake Agent
Create agents/document_intake/main.go:
package main
import (
"context"
"crypto/md5"
"fmt"
"io"
"log"
"strings"
"time"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/protobuf/types/known/structpb"
"google.golang.org/protobuf/types/known/timestamppb"
pb "github.com/owulveryck/agenthub/events/a2a"
)
const (
agentHubAddr = "localhost:50051"
agentID = "document_intake_agent"
)
func main() {
conn, err := grpc.Dial(agentHubAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
log.Fatalf("Failed to connect: %v", err)
}
defer conn.Close()
client := pb.NewEventBusClient(conn)
agent := &DocumentIntakeAgent{client: client}
ctx := context.Background()
agent.start(ctx)
}
type DocumentIntakeAgent struct {
client pb.EventBusClient
}
func (dia *DocumentIntakeAgent) start(ctx context.Context) {
log.Printf("Document Intake Agent %s starting...", agentID)
req := &pb.SubscribeToTasksRequest{
AgentId: agentID,
TaskTypes: []string{"document_intake"},
}
stream, err := dia.client.SubscribeToTasks(ctx, req)
if err != nil {
log.Fatalf("Error subscribing: %v", err)
}
log.Printf("Subscribed to document intake tasks")
for {
task, err := stream.Recv()
if err == io.EOF {
return
}
if err != nil {
log.Printf("Error receiving task: %v", err)
return
}
go dia.processTask(ctx, task)
}
}
func (dia *DocumentIntakeAgent) processTask(ctx context.Context, task *pb.TaskMessage) {
log.Printf("Processing document intake task: %s", task.GetTaskId())
params := task.GetParameters().AsMap()
// Simulate document intake processing
time.Sleep(2 * time.Second)
// Generate document hash
content := params["content"].(string)
hash := fmt.Sprintf("%x", md5.Sum([]byte(content)))
// Extract basic metadata
wordCount := len(strings.Fields(content))
charCount := len(content)
result := map[string]interface{}{
"document_id": params["document_id"],
"workflow_id": params["workflow_id"],
"stage": "document_intake",
"content": content,
"filename": params["filename"],
"source": params["source"],
"document_hash": hash,
"word_count": wordCount,
"char_count": charCount,
"intake_timestamp": time.Now().Format(time.RFC3339),
"status": "intake_complete",
}
dia.publishResult(ctx, task, result, pb.TaskStatus_TASK_STATUS_COMPLETED, "")
}
func (dia *DocumentIntakeAgent) publishResult(ctx context.Context, originalTask *pb.TaskMessage, result map[string]interface{}, status pb.TaskStatus, errorMsg string) {
resultStruct, err := structpb.NewStruct(result)
if err != nil {
log.Printf("Error creating result struct: %v", err)
return
}
taskResult := &pb.TaskResult{
TaskId: originalTask.GetTaskId(),
Status: status,
Result: resultStruct,
ErrorMessage: errorMsg,
ExecutorAgentId: agentID,
CompletedAt: timestamppb.Now(),
}
req := &pb.PublishTaskResultRequest{Result: taskResult}
_, err = dia.client.PublishTaskResult(ctx, req)
if err != nil {
log.Printf("Error publishing result: %v", err)
} else {
log.Printf("Published result for task %s", originalTask.GetTaskId())
}
}
Validation Agent
Create agents/validation/main.go:
package main
import (
"context"
"io"
"log"
"strings"
"time"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/protobuf/types/known/structpb"
"google.golang.org/protobuf/types/known/timestamppb"
pb "github.com/owulveryck/agenthub/events/a2a"
)
const (
agentHubAddr = "localhost:50051"
agentID = "validation_agent"
)
func main() {
conn, err := grpc.Dial(agentHubAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
log.Fatalf("Failed to connect: %v", err)
}
defer conn.Close()
client := pb.NewEventBusClient(conn)
agent := &ValidationAgent{client: client}
ctx := context.Background()
agent.start(ctx)
}
type ValidationAgent struct {
client pb.EventBusClient
}
func (va *ValidationAgent) start(ctx context.Context) {
log.Printf("Validation Agent %s starting...", agentID)
req := &pb.SubscribeToTasksRequest{
AgentId: agentID,
TaskTypes: []string{"document_validation"},
}
stream, err := va.client.SubscribeToTasks(ctx, req)
if err != nil {
log.Fatalf("Error subscribing: %v", err)
}
log.Printf("Subscribed to document validation tasks")
for {
task, err := stream.Recv()
if err == io.EOF {
return
}
if err != nil {
log.Printf("Error receiving task: %v", err)
return
}
go va.processTask(ctx, task)
}
}
func (va *ValidationAgent) processTask(ctx context.Context, task *pb.TaskMessage) {
log.Printf("Processing validation task: %s", task.GetTaskId())
params := task.GetParameters().AsMap()
// Simulate validation processing
time.Sleep(1500 * time.Millisecond)
content := params["content"].(string)
// Perform validation checks
validationResults := va.validateDocument(content)
result := map[string]interface{}{
"document_id": params["document_id"],
"workflow_id": params["workflow_id"],
"stage": "document_validation",
"content": content,
"filename": params["filename"],
"source": params["source"],
"document_hash": params["document_hash"],
"word_count": params["word_count"],
"char_count": params["char_count"],
"intake_timestamp": params["intake_timestamp"],
"validation_results": validationResults,
"validation_timestamp": time.Now().Format(time.RFC3339),
"status": "validation_complete",
}
var status pb.TaskStatus
var errorMsg string
if validationResults["is_valid"].(bool) {
status = pb.TaskStatus_TASK_STATUS_COMPLETED
} else {
status = pb.TaskStatus_TASK_STATUS_FAILED
errorMsg = "Document validation failed: " + validationResults["errors"].(string)
}
va.publishResult(ctx, task, result, status, errorMsg)
}
func (va *ValidationAgent) validateDocument(content string) map[string]interface{} {
// Simple validation rules
isValid := true
var errors []string
// Check minimum length
if len(content) < 10 {
isValid = false
errors = append(errors, "content too short")
}
// Check for suspicious content
suspiciousTerms := []string{"malware", "virus", "hack"}
for _, term := range suspiciousTerms {
if strings.Contains(strings.ToLower(content), term) {
isValid = false
errors = append(errors, "suspicious content detected")
break
}
}
// Check language (simple heuristic)
isEnglish := va.isEnglishContent(content)
return map[string]interface{}{
"is_valid": isValid,
"is_english": isEnglish,
"errors": strings.Join(errors, "; "),
"length_ok": len(content) >= 10,
"safe_content": !strings.Contains(strings.ToLower(content), "malware"),
}
}
func (va *ValidationAgent) isEnglishContent(content string) bool {
// Simple heuristic: check for common English words
commonWords := []string{"the", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"}
lowerContent := strings.ToLower(content)
matches := 0
for _, word := range commonWords {
if strings.Contains(lowerContent, " "+word+" ") {
matches++
}
}
return matches >= 2
}
func (va *ValidationAgent) publishResult(ctx context.Context, originalTask *pb.TaskMessage, result map[string]interface{}, status pb.TaskStatus, errorMsg string) {
resultStruct, err := structpb.NewStruct(result)
if err != nil {
log.Printf("Error creating result struct: %v", err)
return
}
taskResult := &pb.TaskResult{
TaskId: originalTask.GetTaskId(),
Status: status,
Result: resultStruct,
ErrorMessage: errorMsg,
ExecutorAgentId: agentID,
CompletedAt: timestamppb.Now(),
}
req := &pb.PublishTaskResultRequest{Result: taskResult}
_, err = va.client.PublishTaskResult(ctx, req)
if err != nil {
log.Printf("Error publishing result: %v", err)
} else {
log.Printf("Published result for task %s", originalTask.GetTaskId())
}
}
Step 3: Build and Test the Multi-Agent System
Update the Makefile to include the new agents:
# Add to Makefile build target
build: proto
@echo "Building server binary..."
go build $(GO_BUILD_FLAGS) -o bin/$(SERVER_BINARY) broker/main.go
@echo "Building coordinator binary..."
go build $(GO_BUILD_FLAGS) -o bin/coordinator agents/coordinator/main.go
@echo "Building document intake agent..."
go build $(GO_BUILD_FLAGS) -o bin/document-intake agents/document_intake/main.go
@echo "Building validation agent..."
go build $(GO_BUILD_FLAGS) -o bin/validation agents/validation/main.go
@echo "Building publisher binary..."
go build $(GO_BUILD_FLAGS) -o bin/$(PUBLISHER_BINARY) agents/publisher/main.go
@echo "Building subscriber binary..."
go build $(GO_BUILD_FLAGS) -o bin/$(SUBSCRIBER_BINARY) agents/subscriber/main.go
@echo "Build complete. Binaries are in the 'bin/' directory."
Build all components:
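make build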
Step 4: Run the Multi-Agent Workflow
Now let’s run the complete multi-agent system:
Terminal 1 - Start the broker:
Terminal 2 - Start the document intake agent:
Terminal 3 - Start the validation agent:
Terminal 4 - Start the workflow coordinator:
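Assuming SERVER_BINARY=broker in your Makefile (adjust the broker binary name if yours differs), the commands look like:
# Terminal 1 - broker
./bin/broker

# Terminal 2 - document intake agent
./bin/document-intake

# Terminal 3 - validation agent
./bin/validation

# Terminal 4 - workflow coordinator
./bin/coordinator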
Step 5: Observe the Workflow
You’ll see the workflow coordinator processing documents through multiple stages:
- Document Intake: Receives and processes raw documents
- Validation: Checks content for safety and validity
- Metadata Extraction: Extracts structured metadata
- Text Processing: Processes and analyzes text content
- Summary Generation: Creates document summaries
Each agent processes its stage and passes results to the next stage via the AgentHub broker.
Understanding the Multi-Agent Pattern
This tutorial demonstrates several key patterns:
1. Workflow Orchestration
The coordinator agent manages the overall workflow, determining which stage comes next and handling failures.
2. Specialized Agents
Each agent has a specific responsibility and can be developed, deployed, and scaled independently.
3. Asynchronous Processing
Agents work asynchronously, allowing for better resource utilization and scalability.
4. Error Handling
The system handles failures gracefully, with the coordinator managing workflow state.
5. Data Flow
Structured data flows between agents, with each stage adding value to the processing pipeline.
Next Steps
Now that you understand multi-agent workflows:
- Add more agents: Create metadata extraction, text processing, and summary agents
- Implement error recovery: Add retry mechanisms and failure handling
- Add monitoring: Create a dashboard agent that tracks workflow progress
- Scale the system: Run multiple instances of each agent type
- Add persistence: Store workflow state in a database for recovery
This pattern scales to handle complex business processes, data pipelines, and automated workflows in production systems.
Common Patterns and Best Practices
Workflow State Management
- Store workflow state persistently for recovery
- Use unique workflow IDs for tracking
- Implement timeouts for stuck workflows (see the sketch after this list)
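A minimal sketch of the unique-ID and timeout advice above, using only the Go standard library (the names and the 5-minute deadline are illustrative, not part of the AgentHub API):
// Give every workflow a unique ID and a hard deadline.
workflowID := fmt.Sprintf("wf_%d", time.Now().UnixNano())
// Bound the workflow with a context deadline so stuck workflows eventually fail.
wfCtx, cancel := context.WithTimeout(ctx, 5*time.Minute)
defer cancel()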
Agent Communication
- Use structured messages with clear schemas
- Include metadata for routing and tracking
- Implement progress reporting for long-running tasks
Error Handling
- Design for partial failures
- Implement retry mechanisms with backoff (see the sketch after this list)
- Provide clear error messages and recovery paths
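A minimal sketch of the retry-with-backoff advice, assuming only the standard library ("context" and "time"); retryWithBackoff is a hypothetical helper name, not an AgentHub API:
// retryWithBackoff retries fn up to attempts times, doubling the delay
// after each failure and aborting early if the context is cancelled.
func retryWithBackoff(ctx context.Context, attempts int, base time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		select {
		case <-time.After(base << i): // base, 2*base, 4*base, ...
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}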
Monitoring and Observability
- Log all state transitions
- Track workflow performance metrics
- Implement health checks for agents
You now have the foundation for building sophisticated multi-agent systems that can handle complex, real-world workflows!
1.4 - Cortex Tutorials
Learn how to use Cortex, the asynchronous AI orchestration engine
Cortex Tutorials
These hands-on tutorials will teach you how to use Cortex to build asynchronous, AI-powered multi-agent systems.
What is Cortex?
Cortex is an asynchronous orchestration engine that:
- Manages conversations across multiple agents
- Uses LLMs to make intelligent routing decisions
- Enables non-blocking task execution
- Maintains conversation state and context
Prerequisites
- AgentHub installed and configured
- Go 1.24 or later (matching the documentation-wide prerequisite)
- Basic understanding of the A2A protocol
Available Tutorials
- Getting Started with Cortex - Run your first Cortex demo
- Building a Custom Agent - Create agents that work with Cortex
- Async Task Orchestration - Handle long-running tasks
Quick Start
Run the Cortex demo to see it in action:
cd /path/to/agenthub
./demo_cortex.sh
This starts:
- Event Bus (broker)
- Cortex orchestrator
- Echo agent (example)
- Interactive CLI
Type messages and see how Cortex orchestrates responses!
Learning Path
- Start with Getting Started to understand the basics
- Read Cortex Architecture for deeper understanding
- Try Building a Custom Agent to extend functionality
- Explore Advanced Orchestration for complex workflows
1.4.1 - Getting Started with Cortex
Run your first Cortex orchestration demo and understand how it works
Getting Started with Cortex
This tutorial will guide you through running your first Cortex demo and understanding the asynchronous orchestration pattern.
What You’ll Learn
By the end of this tutorial, you will:
- Understand what Cortex does and why it’s useful
- Run the complete Cortex demo system
- Send messages through Cortex and see orchestration in action
- Understand the message flow between components
Prerequisites
- AgentHub repository cloned locally
- Go 1.24+ installed
- Basic terminal/command-line knowledge
Step 1: Build the Components
First, let’s build all the necessary binaries:
cd /path/to/agenthub
# Build the broker (Event Bus)
go build -o bin/broker ./broker
# Build Cortex orchestrator
go build -o bin/cortex ./agents/cortex/cmd
# Build Echo agent (example agent)
go build -o bin/echo_agent ./agents/echo_agent
# Build CLI interface
go build -o bin/chat_cli ./agents/chat_cli
Verify all binaries were created:
ls -lh bin/ | grep -E "(broker|cortex|echo|chat_cli)"
You should see all four executables listed.
Step 2: Understanding the Architecture
Before we run the demo, let’s understand what each component does:
┌─────────────┐      ┌────────────┐       ┌──────────┐
│  Chat CLI   │─────>│ Event Bus  │<──────│  Cortex  │
│ (You type)  │      │  (Broker)  │       │ (Brain)  │
└─────────────┘      └────────────┘       └──────────┘
       ▲                   ▲                    │
       │                   │                    │
       │ Responses         │ Results            │ Tasks
       │                   │                    │
       │             ┌─────────────┐            │
       └─────────────│ Echo Agent  │<───────────┘
                     │  (Worker)   │
                     └─────────────┘
Components:
- Event Bus (Broker) - Routes all messages between components
- Cortex - The “brain” that decides what to do with messages
- Echo Agent - A simple worker that echoes messages back
- Chat CLI - Your interface to interact with the system
Step 3: Run the Demo (Automated)
The easiest way to start everything is using the demo script:
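./demo_cortex.sh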
This script will:
- ✅ Start the Event Bus (broker)
- ✅ Start Cortex orchestrator
- ✅ Start Echo agent
- ✅ Launch the interactive CLI
You should see:
╔══════════════════════════════════════════════════╗
║             Cortex POC Demo Launcher             ║
╚══════════════════════════════════════════════════╝
Starting Event Bus (Broker)...
✓ Broker started (PID: 12345)
Starting Cortex Orchestrator...
✓ Cortex started (PID: 12346)
Starting Echo Agent...
✓ Echo Agent started (PID: 12347)
────────────────────────────────────────────────────
All services started successfully!
────────────────────────────────────────────────────
╔══════════════════════════════════════════════════╗
║            Cortex Chat CLI - POC Demo            ║
╚══════════════════════════════════════════════════╝
Session ID: cli_session_1234567890
Type your messages and press Enter.
Type 'exit' or 'quit' to end the session.
Press Ctrl+C to shutdown.
>
Step 4: Interact with Cortex
Now you can type messages! Try these:
> Hello Cortex
🤖 Cortex: Echo: Hello Cortex
> How are you today?
🤖 Cortex: Echo: How are you today?
> Testing async orchestration
🤖 Cortex: Echo: Testing async orchestration
What Just Happened?
Let’s trace what happens when you type “Hello Cortex”:
- You type β CLI creates an A2A Message (role=USER)
- CLI publishes β Event Bus receives message
- Cortex receives β Retrieves conversation state
- Cortex decides β LLM analyzes: “This is a greeting, respond friendly”
- Cortex publishes β Sends response back through Event Bus
- CLI receives β Displays the response to you
All of this happens asynchronously through event-driven architecture!
Step 5: Understanding Message Flow
Let’s look at what’s happening under the hood.
Message Structure
Every message contains:
{
"message_id": "cli_msg_1234567890",
"context_id": "cli_session_1234567890", // Session ID
"role": "ROLE_USER", // or ROLE_AGENT
"content": [
{
"text": "Hello Cortex"
}
],
"metadata": {
"task_type": "chat_request",
"from_agent": "agent_chat_cli"
}
}
Key Fields:
- `message_id` - Unique identifier for this message
- `context_id` - Groups messages in the same conversation
- `role` - USER (from human) or AGENT (from AI/system)
- `content` - The actual message text
- `metadata` - Additional context
Conversation State
Cortex maintains state for each session:
ConversationState {
SessionID: "cli_session_1234567890"
Messages: [
{role: USER, text: "Hello Cortex"},
{role: AGENT, text: "Echo: Hello Cortex"},
{role: USER, text: "How are you today?"},
{role: AGENT, text: "Echo: How are you today?"},
// ... full history
]
PendingTasks: {}
RegisteredAgents: {"agent_echo": {...}}
}
This allows Cortex to:
- Remember conversation history
- Track which tasks are in-flight
- Know which agents are available
Step 6: Run Manually (Optional)
For learning purposes, you can run each component manually in separate terminals:
Terminal 1: Event Bus
export AGENTHUB_GRPC_PORT=127.0.0.1:50051
./bin/broker
Terminal 2: Cortex
export AGENTHUB_BROKER_ADDR=127.0.0.1
./bin/cortex
Terminal 3: Echo Agent
export AGENTHUB_BROKER_ADDR=127.0.0.1
./bin/echo_agent
Terminal 4: CLI
export AGENTHUB_BROKER_ADDR=127.0.0.1
./bin/chat_cli
This gives you visibility into each component’s logs.
Step 7: Observing the Logs
When running manually, you’ll see detailed logs from each component.
Cortex Logs
INFO Cortex initialized agent_id=cortex llm_client=mock state_manager=in-memory
INFO Starting Cortex Orchestrator
INFO Cortex received message message_id=cli_msg_... context_id=cli_session_... role=ROLE_USER
INFO Cortex successfully processed message message_id=cli_msg_...
Echo Agent Logs
INFO Echo agent registered successfully agent_id=agent_echo
INFO Received echo request message_id=task_request_... context_id=cli_session_...
INFO Published echo response message_id=msg_echo_response_... echo_text="Echo: Hello"
Event Bus Logs
INFO Agent registered agent_id=cortex
INFO Agent registered agent_id=agent_echo
INFO Agent registered agent_id=agent_chat_cli
Step 8: Shutting Down
To stop the demo:
- In the CLI, type `exit` or `quit`
- Or press `Ctrl+C`
The demo script will automatically clean up all processes.
If running manually, press Ctrl+C in each terminal (start with Terminal 4 and work backwards).
What You’ve Learned
✅ Architecture - You understand the four main components
✅ Message Flow - You know how messages route through the system
✅ Orchestration - You see how Cortex coordinates agents
✅ State Management - You understand conversation state
✅ Async Pattern - You grasp the non-blocking nature
Next Steps
Now that you’ve run the basic demo:
- Build a Custom Agent - Create your own worker agent
- Understand Cortex Architecture - Deep dive into design
- Async Task Orchestration - Handle long-running tasks
Troubleshooting
Broker fails to start
Error: failed to listen on port 50051
Solution: Port is already in use. Kill existing process:
lsof -ti:50051 | xargs kill -9
Cortex can’t connect to broker
Error: Failed to create AgentHub client
Solution: Ensure broker is running first and environment variables are set:
export AGENTHUB_BROKER_ADDR=127.0.0.1
export AGENTHUB_GRPC_PORT=127.0.0.1:50051
No response from Cortex
Check:
- All services running? Check with `ps aux | grep -E "(broker|cortex|echo)"`
- Check logs for errors
- Ensure the Echo agent started successfully
Messages not routing
Debug:
- Check broker logs for registration confirmations
- Verify all agents registered successfully
- Ensure `context_id` is consistent in your session
Key Concepts Recap
| Concept | What It Does |
|---------|--------------|
| Event Bus | Routes all messages between components |
| Cortex | Decides what to do with each message |
| Agent | Performs specific tasks (echo, transcribe, etc.) |
| Session | Groups related messages (context_id) |
| State | Remembers conversation history |
| Async | Non-blocking - user can chat while work happens |
Code to Explore
If you want to dive into the code:
- Cortex core logic: `agents/cortex/cortex.go`
- State management: `agents/cortex/state/memory.go`
- LLM interface: `agents/cortex/llm/interface.go`
- Echo agent: `agents/echo_agent/main.go`
- CLI: `agents/chat_cli/main.go`
Each file is well-documented with comments explaining the logic.
Congratulations! You’ve successfully run your first Cortex orchestration demo. You’re now ready to build custom agents and create sophisticated multi-agent systems.
2 - How-To Guides
Goal-oriented guides that solve specific problems and accomplish particular tasks
How-To Guides
These practical guides will help you solve specific problems and accomplish particular tasks with AgentHub. Each guide focuses on a specific goal and assumes you have some familiarity with the basic concepts.
How-to Categories
Before You Start
These guides assume you have:
- Completed the Installation and Setup tutorial
- Basic understanding of AgentHub concepts
- A working development environment
💡 How to Use These Guides
- Each guide is self-contained and focuses on one specific task
- Guides can be followed in any order based on your needs
- Code examples are production-ready where possible
- Links to related concepts and references are provided
Tip
If you’re new to AgentHub, start with the tutorials before diving into these how-to guides.
2.1 - Agent Development
Practical guides for creating and managing agents
Agent Development How-to Guides
Step-by-step guides for creating, configuring, and managing different types of agents in AgentHub.
Getting Started with Agents
Basic Agent Patterns
2.1.1 - How to Create an A2A Task Publisher
Learn how to create an agent that publishes Agent2Agent (A2A) protocol-compliant tasks to other agents through the AgentHub EDA broker.
How to Create an A2A Task Publisher
This guide shows you how to create an agent that publishes Agent2Agent (A2A) protocol-compliant tasks to other agents through the AgentHub Event-Driven Architecture (EDA) broker.
Basic Setup
Using AgentHub’s unified abstractions, creating a publisher is straightforward:
package main
import (
	"context"
	"fmt"
	"time"

	pb "github.com/owulveryck/agenthub/events/a2a"
	"github.com/owulveryck/agenthub/internal/agenthub"
	"google.golang.org/protobuf/types/known/structpb"
)
const (
myAgentID = "my_publisher_agent"
)
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
// Create configuration with automatic observability
config := agenthub.NewGRPCConfig("publisher")
config.HealthPort = "8081" // Unique port for this publisher
// Create AgentHub client with built-in observability
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic("Failed to create AgentHub client: " + err.Error())
}
// Automatic graceful shutdown
defer func() {
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()
if err := client.Shutdown(shutdownCtx); err != nil {
client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
}
}()
// Start the client (enables observability)
if err := client.Start(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
panic(err)
}
// Create A2A task publisher with automatic tracing and metrics
taskPublisher := &agenthub.A2ATaskPublisher{
Client: client.Client,
TraceManager: client.TraceManager,
MetricsManager: client.MetricsManager,
Logger: client.Logger,
ComponentName: "publisher",
AgentID: myAgentID,
}
// Your A2A task publishing code goes here
}
Publishing a Simple A2A Task
Here’s how to publish a basic A2A task using the A2ATaskPublisher abstraction:
func publishSimpleTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) error {
// Create A2A-compliant content parts
content := []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Hello! Please provide a greeting for Claude.",
},
},
}
// Publish A2A task using the unified abstraction
task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
TaskType: "greeting",
Content: content,
RequesterAgentID: myAgentID,
ResponderAgentID: "agent_demo_subscriber", // Target agent
Priority: pb.Priority_PRIORITY_HIGH,
ContextID: "ctx_greeting_demo", // Optional: conversation context
})
if err != nil {
return fmt.Errorf("failed to publish greeting task: %w", err)
}
taskPublisher.Logger.InfoContext(ctx, "Published A2A greeting task",
"task_id", task.GetId(),
"context_id", task.GetContextId())
return nil
}
Publishing Different Task Types
Math Calculation Task with A2A Data Parts
func publishMathTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) error {
// Create A2A-compliant content with structured data
content := []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Please perform the following mathematical calculation:",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: &structpb.Struct{
Fields: map[string]*structpb.Value{
"operation": structpb.NewStringValue("multiply"),
"a": structpb.NewNumberValue(15.0),
"b": structpb.NewNumberValue(7.0),
},
},
},
},
},
}
// Publish A2A math task
task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
TaskType: "math_calculation",
Content: content,
RequesterAgentID: myAgentID,
ResponderAgentID: "agent_demo_subscriber",
Priority: pb.Priority_PRIORITY_MEDIUM,
ContextID: "ctx_math_demo",
})
if err != nil {
return fmt.Errorf("failed to publish math task: %w", err)
}
taskPublisher.Logger.InfoContext(ctx, "Published A2A math task",
"task_id", task.GetId(),
"operation", "multiply")
return nil
}
Data Processing Task
func publishDataProcessingTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) {
	// Describe the request as A2A content parts: a text part plus structured data.
	content := []*pb.Part{
		{
			Part: &pb.Part_Text{
				Text: "Please run summary statistics on the customer dataset:",
			},
		},
		{
			Part: &pb.Part_Data{
				Data: &pb.DataPart{
					Data: &structpb.Struct{
						Fields: map[string]*structpb.Value{
							"dataset_path":  structpb.NewStringValue("/data/customer_data.csv"),
							"analysis_type": structpb.NewStringValue("summary_statistics"),
							"output_format": structpb.NewStringValue("json"),
							"filters": structpb.NewStructValue(&structpb.Struct{
								Fields: map[string]*structpb.Value{
									"date_range": structpb.NewStringValue("last_30_days"),
									"status":     structpb.NewStringValue("active"),
								},
							}),
							"workflow_id": structpb.NewStringValue("workflow_123"),
							"user_id":     structpb.NewStringValue("user_456"),
						},
					},
				},
			},
		},
	}
	// Task IDs and tracing metadata are handled automatically by A2ATaskPublisher.
	_, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
		TaskType:         "data_processing",
		Content:          content,
		RequesterAgentID: myAgentID,
		ResponderAgentID: "data_agent",
		Priority:         pb.Priority_PRIORITY_HIGH,
		ContextID:        "ctx_data_processing",
	})
	if err != nil {
		panic(fmt.Sprintf("Failed to publish data processing task: %v", err))
	}
}
Broadcasting Tasks (No Specific Responder)
To broadcast a task to all available agents, omit the ResponderAgentID:
func broadcastTask(ctx context.Context, taskPublisher *agenthub.A2ATaskPublisher) {
	content := []*pb.Part{
		{
			Part: &pb.Part_Text{
				Text: "Server maintenance in 30 minutes. No action is required.",
			},
		},
	}
	_, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
		TaskType:         "announcement",
		Content:          content,
		RequesterAgentID: myAgentID,
		// ResponderAgentID is omitted, so the task is broadcast to all agents.
		Priority: pb.Priority_PRIORITY_LOW,
	})
	if err != nil {
		panic(fmt.Sprintf("Failed to publish announcement: %v", err))
	}
}
Subscribing to Task Results
As a publisher, you’ll want to receive results from tasks you’ve requested. You can use the AgentHub client directly:
func subscribeToResults(ctx context.Context, client *agenthub.AgentHubClient) {
req := &pb.SubscribeToTaskResultsRequest{
RequesterAgentId: myAgentID,
// TaskIds: []string{"specific_task_id"}, // Optional: filter specific tasks
}
stream, err := client.Client.SubscribeToTaskResults(ctx, req)
if err != nil {
client.Logger.ErrorContext(ctx, "Error subscribing to results", "error", err)
return
}
client.Logger.InfoContext(ctx, "Subscribed to task results", "agent_id", myAgentID)
for {
result, err := stream.Recv()
if err != nil {
client.Logger.ErrorContext(ctx, "Error receiving result", "error", err)
return
}
handleTaskResult(ctx, client, result)
}
}
func handleTaskResult(ctx context.Context, client *agenthub.AgentHubClient, result *pb.TaskResult) {
client.Logger.InfoContext(ctx, "Received task result",
"task_id", result.GetTaskId(),
"status", result.GetStatus().String())
switch result.GetStatus() {
case pb.TaskStatus_TASK_STATUS_COMPLETED:
client.Logger.InfoContext(ctx, "Task completed successfully",
"task_id", result.GetTaskId(),
"result", result.GetResult().AsMap())
case pb.TaskStatus_TASK_STATUS_FAILED:
client.Logger.ErrorContext(ctx, "Task failed",
"task_id", result.GetTaskId(),
"error", result.GetErrorMessage())
case pb.TaskStatus_TASK_STATUS_CANCELLED:
client.Logger.InfoContext(ctx, "Task was cancelled",
"task_id", result.GetTaskId())
}
}
Monitoring Task Progress
Subscribe to progress updates to track long-running tasks:
func subscribeToProgress(ctx context.Context, client *agenthub.AgentHubClient) {
req := &pb.SubscribeToTaskResultsRequest{
RequesterAgentId: myAgentID,
}
stream, err := client.Client.SubscribeToTaskProgress(ctx, req)
if err != nil {
client.Logger.ErrorContext(ctx, "Error subscribing to progress", "error", err)
return
}
client.Logger.InfoContext(ctx, "Subscribed to task progress", "agent_id", myAgentID)
for {
progress, err := stream.Recv()
if err != nil {
client.Logger.ErrorContext(ctx, "Error receiving progress", "error", err)
return
}
client.Logger.InfoContext(ctx, "Task progress update",
"task_id", progress.GetTaskId(),
"progress_percentage", progress.GetProgressPercentage(),
"progress_message", progress.GetProgressMessage())
}
}
Complete Publisher Example
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
// Create configuration with automatic observability
config := agenthub.NewGRPCConfig("publisher")
config.HealthPort = "8081"
// Create AgentHub client with built-in observability
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic("Failed to create AgentHub client: " + err.Error())
}
defer func() {
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()
if err := client.Shutdown(shutdownCtx); err != nil {
client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
}
}()
// Start the client (enables observability)
if err := client.Start(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
panic(err)
}
	// Create A2A task publisher with automatic tracing and metrics
	taskPublisher := &agenthub.A2ATaskPublisher{
		Client:         client.Client,
		TraceManager:   client.TraceManager,
		MetricsManager: client.MetricsManager,
		Logger:         client.Logger,
		ComponentName:  "publisher",
		AgentID:        myAgentID,
	}
client.Logger.InfoContext(ctx, "Starting publisher demo")
// Publish various tasks with automatic observability
publishMathTask(ctx, taskPublisher)
time.Sleep(2 * time.Second)
publishDataProcessingTask(ctx, taskPublisher)
time.Sleep(2 * time.Second)
broadcastTask(ctx, taskPublisher)
client.Logger.InfoContext(ctx, "All tasks published! Check subscriber logs for results")
}
Best Practices
Always set a unique task ID: Use timestamps, UUIDs, or sequential IDs to ensure uniqueness (see the example after this list).
Use appropriate priorities: Reserve PRIORITY_CRITICAL for urgent tasks that must be processed immediately.
Set realistic deadlines: Include deadlines for time-sensitive tasks to help agents prioritize.
Handle results gracefully: Always subscribe to task results and handle failures appropriately.
Include helpful metadata: Add context information that might be useful for debugging or auditing.
Validate parameters: Ensure task parameters are properly structured before publishing.
Use specific responder IDs when possible: This ensures tasks go to the most appropriate agent.
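For the unique-ID advice above, a minimal sketch using only the standard library (illustrative only; the A2ATaskPublisher abstraction may already generate IDs for you):
// Combine the agent ID with a nanosecond timestamp.
// For stronger uniqueness guarantees, use a UUID library instead.
taskID := fmt.Sprintf("task_%s_%d", myAgentID, time.Now().UnixNano())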
Your publisher is now ready to send tasks to agents and receive results!
2.1.2 - How to Create an A2A Task Subscriber (Agent)
Learn how to create an agent that can receive, process, and respond to Agent2Agent (A2A) protocol tasks through the AgentHub EDA broker using A2A-compliant abstractions.
How to Create an A2A Task Subscriber (Agent)
This guide shows you how to create an agent that can receive, process, and respond to Agent2Agent (A2A) protocol tasks through the AgentHub Event-Driven Architecture (EDA) broker using AgentHub’s A2A-compliant abstractions.
Basic Agent Setup
Start by creating the basic structure for your agent using the unified abstraction:
package main
import (
"context"
"os"
"os/signal"
"syscall"
"time"
"github.com/owulveryck/agenthub/internal/agenthub"
pb "github.com/owulveryck/agenthub/events/a2a"
"google.golang.org/protobuf/types/known/structpb"
)
const (
agentID = "my_agent_processor"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Create configuration with automatic observability
config := agenthub.NewGRPCConfig("subscriber")
config.HealthPort = "8082" // Unique port for this agent
// Create AgentHub client with built-in observability
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic("Failed to create AgentHub client: " + err.Error())
}
// Automatic graceful shutdown
defer func() {
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()
if err := client.Shutdown(shutdownCtx); err != nil {
client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
}
}()
// Start the client (enables observability)
if err := client.Start(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
panic(err)
}
// Create A2A task subscriber with automatic observability
taskSubscriber := agenthub.NewA2ATaskSubscriber(client, agentID)
// Register A2A task handlers (see below for examples)
taskSubscriber.RegisterDefaultHandlers()
// Handle graceful shutdown
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-sigChan
client.Logger.Info("Received shutdown signal")
cancel()
}()
client.Logger.InfoContext(ctx, "Starting subscriber agent")
// Start task subscription (with automatic observability)
go func() {
if err := taskSubscriber.SubscribeToTasks(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Task subscription failed", "error", err)
}
}()
// Optional: Subscribe to task results if this agent also publishes tasks
go func() {
if err := taskSubscriber.SubscribeToTaskResults(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Task result subscription failed", "error", err)
}
}()
client.Logger.InfoContext(ctx, "Agent started with observability. Listening for tasks.")
// Wait for context cancellation
<-ctx.Done()
client.Logger.Info("Agent shutdown complete")
}
Default Task Handlers
The RegisterDefaultHandlers() method provides built-in handlers for common task types:
- `greeting`: Simple greeting with a name parameter
- `math_calculation`: Basic arithmetic operations (add, subtract, multiply, divide)
- `random_number`: Random number generation with seed
Custom Task Handlers
Simple Custom Handler
Add your own task handlers using RegisterTaskHandler():
func setupCustomHandlers(taskSubscriber *agenthub.A2ATaskSubscriber) {
// Register a custom data processing handler
taskSubscriber.RegisterTaskHandler("data_processing", handleDataProcessing)
// Register a file conversion handler
taskSubscriber.RegisterTaskHandler("file_conversion", handleFileConversion)
// Register a status check handler
taskSubscriber.RegisterTaskHandler("status_check", handleStatusCheck)
}
func handleDataProcessing(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
params := task.GetParameters()
datasetPath := params.Fields["dataset_path"].GetStringValue()
analysisType := params.Fields["analysis_type"].GetStringValue()
if datasetPath == "" {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "dataset_path parameter is required"
}
// Simulate data processing
time.Sleep(2 * time.Second)
result, err := structpb.NewStruct(map[string]interface{}{
"dataset_path": datasetPath,
"analysis_type": analysisType,
"records_processed": 1500,
"processing_time": "2.1s",
"summary": map[string]interface{}{
"mean": 42.7,
"median": 41.2,
"stddev": 8.3,
},
"processed_at": time.Now().Format(time.RFC3339),
})
if err != nil {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "Failed to create result structure"
}
return result, pb.TaskStatus_TASK_STATUS_COMPLETED, ""
}
Advanced Handler with Validation
func handleFileConversion(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
params := task.GetParameters()
// Extract and validate parameters
inputPath := params.Fields["input_path"].GetStringValue()
outputFormat := params.Fields["output_format"].GetStringValue()
if inputPath == "" {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "input_path parameter is required"
}
if outputFormat == "" {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "output_format parameter is required"
}
// Validate output format
validFormats := []string{"pdf", "docx", "txt", "html"}
isValidFormat := false
for _, format := range validFormats {
if outputFormat == format {
isValidFormat = true
break
}
}
if !isValidFormat {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, fmt.Sprintf("unsupported output format: %s", outputFormat)
}
// Simulate file conversion process
time.Sleep(1 * time.Second)
outputPath := strings.Replace(inputPath, filepath.Ext(inputPath), "."+outputFormat, 1)
result, err := structpb.NewStruct(map[string]interface{}{
"input_path": inputPath,
"output_path": outputPath,
"output_format": outputFormat,
"file_size": "2.5MB",
"conversion_time": "1.2s",
"status": "success",
"converted_at": time.Now().Format(time.RFC3339),
})
if err != nil {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "Failed to create result structure"
}
return result, pb.TaskStatus_TASK_STATUS_COMPLETED, ""
}
Handler with External Service Integration
func handleStatusCheck(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
params := task.GetParameters()
serviceURL := params.Fields["service_url"].GetStringValue()
if serviceURL == "" {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "service_url parameter is required"
}
// Create HTTP client with timeout
client := &http.Client{
Timeout: 10 * time.Second,
}
// Perform health check
resp, err := client.Get(serviceURL + "/health")
if err != nil {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, fmt.Sprintf("Failed to reach service: %v", err)
}
defer resp.Body.Close()
// Determine status
isHealthy := resp.StatusCode >= 200 && resp.StatusCode < 300
status := "unhealthy"
if isHealthy {
status = "healthy"
}
result, err := structpb.NewStruct(map[string]interface{}{
"service_url": serviceURL,
"status": status,
"status_code": resp.StatusCode,
"response_time": "150ms",
"checked_at": time.Now().Format(time.RFC3339),
})
if err != nil {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "Failed to create result structure"
}
return result, pb.TaskStatus_TASK_STATUS_COMPLETED, ""
}
Complete Agent Example
Here’s a complete agent that handles multiple task types:
package main
import (
"context"
"fmt"
"net/http"
"os"
"os/signal"
"path/filepath"
"strings"
"syscall"
"time"
"github.com/owulveryck/agenthub/internal/agenthub"
pb "github.com/owulveryck/agenthub/events/a2a"
"google.golang.org/protobuf/types/known/structpb"
)
const agentID = "multi_task_agent"
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Create AgentHub client with observability
config := agenthub.NewGRPCConfig("subscriber")
config.HealthPort = "8082"
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic("Failed to create AgentHub client: " + err.Error())
}
defer func() {
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()
if err := client.Shutdown(shutdownCtx); err != nil {
client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
}
}()
if err := client.Start(ctx); err != nil {
panic(err)
}
// Create and configure task subscriber
	taskSubscriber := agenthub.NewA2ATaskSubscriber(client, agentID)
// Register both default and custom handlers
taskSubscriber.RegisterDefaultHandlers()
setupCustomHandlers(taskSubscriber)
// Graceful shutdown handling
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-sigChan
client.Logger.Info("Received shutdown signal")
cancel()
}()
client.Logger.InfoContext(ctx, "Starting multi-task agent")
// Start subscriptions
go func() {
if err := taskSubscriber.SubscribeToTasks(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Task subscription failed", "error", err)
}
}()
go func() {
if err := taskSubscriber.SubscribeToTaskResults(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Task result subscription failed", "error", err)
}
}()
client.Logger.InfoContext(ctx, "Agent ready to process tasks",
"supported_tasks", []string{"greeting", "math_calculation", "random_number", "data_processing", "file_conversion", "status_check"})
<-ctx.Done()
client.Logger.Info("Agent shutdown complete")
}
func setupCustomHandlers(taskSubscriber *agenthub.A2ATaskSubscriber) {
taskSubscriber.RegisterTaskHandler("data_processing", handleDataProcessing)
taskSubscriber.RegisterTaskHandler("file_conversion", handleFileConversion)
taskSubscriber.RegisterTaskHandler("status_check", handleStatusCheck)
}
// ... (include the handler functions from above)
Automatic Features
The unified abstraction provides automatic features:
Observability
- Distributed tracing for each task processing
- Metrics collection for processing times and success rates
- Structured logging with correlation IDs
Task Management
- Automatic result publishing back to the broker
- Error handling and status reporting
- Progress tracking capabilities
Resource Management
- Graceful shutdown handling
- Connection management to the broker
- Health endpoints for monitoring (see the probe example below)
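For example, the health endpoint configured through HealthPort can be probed during development. The /health path is an assumption here; adjust it to whatever your build exposes:
# Hypothetical probe against the agent's health port (8082 in this guide).
curl -s http://localhost:8082/health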
Best Practices
Parameter Validation: Always validate task parameters before processing
if requiredParam == "" {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, "required_param is missing"
}
Error Handling: Provide meaningful error messages
if err != nil {
return nil, pb.TaskStatus_TASK_STATUS_FAILED, fmt.Sprintf("Processing failed: %v", err)
}
Timeouts: Use context with timeouts for external operations
client := &http.Client{Timeout: 10 * time.Second}
Resource Cleanup: Always clean up resources in handlers
defer file.Close()
defer resp.Body.Close()
Structured Results: Return well-structured result data
result, _ := structpb.NewStruct(map[string]interface{}{
"status": "completed",
"timestamp": time.Now().Format(time.RFC3339),
"data": processedData,
})
Handler Function Signature
All task handlers must implement the TaskHandler interface:
type TaskHandler func(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string)
Return values:
- `*structpb.Struct`: The result data (can be nil on failure)
- `pb.TaskStatus`: One of `pb.TaskStatus_TASK_STATUS_COMPLETED`, `pb.TaskStatus_TASK_STATUS_FAILED`, or `pb.TaskStatus_TASK_STATUS_CANCELLED`
- `string`: Error message (empty string on success)
Your agent is now ready to receive and process tasks from other agents in the system with full observability and automatic result publishing!
2.1.3 - How to Create an Agent with Cortex Auto-Discovery
Learn how to create an agent using the SubAgent library, which handles registration, discovery, and observability boilerplate for you.
How to Create an Agent with Cortex Auto-Discovery
This guide shows you how to create an agent using the SubAgent library, which handles all the boilerplate and lets you focus on your agent’s business logic.
What You’ll Build
An agent that:
- Automatically registers with the broker on startup
- Gets discovered by Cortex for LLM-based task delegation
- Processes delegated tasks and returns results
- Has built-in observability (tracing, logging, metrics)
- Handles graceful shutdown
All with ~50 lines of code instead of 500+.
Prerequisites
- AgentHub broker running
- Cortex orchestrator running
- Basic understanding of Go
- Familiarity with the A2A protocol (helpful but not required)
Step 1: Import the SubAgent Library
package main
import (
"context"
"fmt"
"log"
"time"
pb "github.com/owulveryck/agenthub/events/a2a"
"github.com/owulveryck/agenthub/internal/subagent"
"google.golang.org/protobuf/types/known/structpb"
)
Step 2: Define Your Agent Configuration
func main() {
// Configure your agent with required fields
config := &subagent.Config{
AgentID: "agent_translator", // Unique agent identifier
ServiceName: "translator_service", // Optional gRPC service name
Name: "Translation Agent", // Human-readable name
Description: "Translates text between languages using AI models",
Version: "1.0.0", // Optional, defaults to 1.0.0
HealthPort: "8087", // Optional, defaults to 8080
}
// Create the subagent
agent, err := subagent.New(config)
if err != nil {
log.Fatal(err)
}
Step 3: Register Your Skills
Skills are capabilities your agent provides. Each skill needs a handler function:
// Add a translation skill
agent.MustAddSkill(
"Language Translation", // Skill name (shown to LLM)
"Translates text from one language to another", // Description
translateHandler, // Your handler function
)
// You can add multiple skills
agent.MustAddSkill(
"Language Detection",
"Detects the language of input text",
detectLanguageHandler,
)
Best Practices for Skill Definition
- Clear Names: Use descriptive skill names that the LLM can understand
- Specific Descriptions: Explain what the skill does and when to use it
- Multiple Skills: An agent can have multiple related skills
Step 4: Implement Your Handler Functions
A handler function receives a task and returns a result:
// Handler signature: (ctx, task, message) -> (artifact, state, errorMessage)
func translateHandler(ctx context.Context, task *pb.Task, message *pb.Message) (*pb.Artifact, pb.TaskState, string) {
// 1. Extract input from the message
var inputText string
for _, part := range message.Content {
if text := part.GetText(); text != "" {
inputText = text
break
}
}
if inputText == "" {
return nil, pb.TaskState_TASK_STATE_FAILED, "No input text provided"
}
// 2. Extract parameters from metadata (optional)
targetLang := "en" // default
if message.Metadata != nil && message.Metadata.Fields != nil {
if lang, exists := message.Metadata.Fields["target_language"]; exists {
targetLang = lang.GetStringValue()
}
}
// 3. Perform your business logic
translatedText, err := performTranslation(ctx, inputText, targetLang)
if err != nil {
return nil, pb.TaskState_TASK_STATE_FAILED, fmt.Sprintf("Translation failed: %v", err)
}
// 4. Create an artifact with your result
artifact := &pb.Artifact{
ArtifactId: fmt.Sprintf("translation_%s_%d", task.GetId(), time.Now().Unix()),
Name: "translation_result",
Description: fmt.Sprintf("Translation to %s", targetLang),
Parts: []*pb.Part{
{
Part: &pb.Part_Text{
Text: translatedText,
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"original_text": structpb.NewStringValue(inputText),
"target_language": structpb.NewStringValue(targetLang),
"translated_at": structpb.NewStringValue(time.Now().Format(time.RFC3339)),
},
},
}
// 5. Return success
return artifact, pb.TaskState_TASK_STATE_COMPLETED, ""
}
func performTranslation(ctx context.Context, text, targetLang string) (string, error) {
// Implement your actual translation logic here
// This could call an external API, use a local model, etc.
// Example placeholder:
return fmt.Sprintf("[Translated to %s]: %s", targetLang, text), nil
}
Handler Return Values
Your handler returns three values:
- `*pb.Artifact`: The result data (or nil if failed)
- `pb.TaskState`: Status code (`TASK_STATE_COMPLETED`, `TASK_STATE_FAILED`, etc.)
- `string`: Error message (empty string if successful)
Step 5: Run Your Agent
// Run the agent (blocks until shutdown signal)
if err := agent.Run(context.Background()); err != nil {
log.Fatal(err)
}
}
That’s it! The SubAgent library handles:
- ✅ gRPC client setup and connection
- ✅ Agent card creation with A2A-compliant structure
- ✅ Broker registration and auto-discovery by Cortex
- ✅ Task subscription and routing
- ✅ Distributed tracing (automatic span creation)
- ✅ Structured logging (all operations logged)
- ✅ Graceful shutdown (SIGINT/SIGTERM handling)
- ✅ Health checks
- ✅ Error handling
Complete Example
Here’s a full working agent in ~80 lines:
package main
import (
"context"
"fmt"
"log"
"time"
pb "github.com/owulveryck/agenthub/events/a2a"
"github.com/owulveryck/agenthub/internal/subagent"
"google.golang.org/protobuf/types/known/structpb"
)
func main() {
config := &subagent.Config{
AgentID: "agent_translator",
ServiceName: "translator_service",
Name: "Translation Agent",
Description: "Translates text between languages using AI models",
Version: "1.0.0",
HealthPort: "8087",
}
agent, err := subagent.New(config)
if err != nil {
log.Fatal(err)
}
agent.MustAddSkill(
"Language Translation",
"Translates text from one language to another. Supports major languages including English, Spanish, French, German, Japanese, and Chinese",
translateHandler,
)
if err := agent.Run(context.Background()); err != nil {
log.Fatal(err)
}
}
func translateHandler(ctx context.Context, task *pb.Task, message *pb.Message) (*pb.Artifact, pb.TaskState, string) {
var inputText string
for _, part := range message.Content {
if text := part.GetText(); text != "" {
inputText = text
break
}
}
if inputText == "" {
return nil, pb.TaskState_TASK_STATE_FAILED, "No input text provided"
}
targetLang := "en"
if message.Metadata != nil && message.Metadata.Fields != nil {
if lang, exists := message.Metadata.Fields["target_language"]; exists {
targetLang = lang.GetStringValue()
}
}
translatedText := fmt.Sprintf("[Translated to %s]: %s", targetLang, inputText)
artifact := &pb.Artifact{
ArtifactId: fmt.Sprintf("translation_%s_%d", task.GetId(), time.Now().Unix()),
Name: "translation_result",
Description: fmt.Sprintf("Translation to %s", targetLang),
Parts: []*pb.Part{
{Part: &pb.Part_Text{Text: translatedText}},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"original_text": structpb.NewStringValue(inputText),
"target_language": structpb.NewStringValue(targetLang),
},
},
}
return artifact, pb.TaskState_TASK_STATE_COMPLETED, ""
}
Build and Test
# Build your agent
go build -o bin/translator ./agents/translator
# Start broker (if not running)
./bin/broker &
# Start Cortex (if not running)
./bin/cortex &
# Start your agent
./bin/translator
Verify Registration
Check the logs:
# Your agent logs should show:
time=... level=INFO msg="Agent card registered" agent_id=agent_translator skills=1
time=... level=INFO msg="Agent started successfully" agent_id=agent_translator name="Translation Agent" skills=1
# Cortex logs should show:
time=... level=INFO msg="Received agent card event" agent_id=agent_translator event_type=registered
time=... level=INFO msg="Agent registered with Cortex orchestrator" agent_id=agent_translator total_agents=N
Test with Chat CLI
./bin/chat_cli
# Try these prompts (if using VertexAI LLM):
> Can you translate "hello world" to Spanish?
> Translate "good morning" to French
How It Works
sequenceDiagram
participant A as Your Agent
participant SL as SubAgent Library
participant B as Broker
participant C as Cortex
participant L as LLM
participant U as User
Note over A: agent.Run() called
A->>SL: Initialize
SL->>B: RegisterAgent(AgentCard)
B->>C: AgentCardEvent
C->>C: Register agent
Note over C: Agent now available
U->>C: "Translate this to Spanish"
C->>L: Decide(history, agents, message)
L-->>C: Delegate to translator
C->>B: Publish task
B->>SL: Route to agent
SL->>SL: Start tracing span
SL->>SL: Log task receipt
SL->>A: Call your handler
A->>A: Process translation
A-->>SL: Return artifact
SL->>SL: Log completion
SL->>SL: End tracing span
SL->>B: Publish result
B->>C: Route result
    C->>U: "Aquí está: Hola mundo"

Advanced Usage
Accessing the Client
If you need access to the underlying AgentHub client:
client := agent.GetClient()
logger := agent.GetLogger()
config := agent.GetConfig()
Custom Configuration
config := &subagent.Config{
AgentID: "my_agent",
ServiceName: "custom_service_name", // Optional
Name: "My Agent",
Description: "Does amazing things",
Version: "2.0.0",
HealthPort: "9000",
BrokerAddr: "broker.example.com", // Optional
BrokerPort: "50051", // Optional
}
Multiple Skills Example
agent.MustAddSkill("Skill A", "Description A", handlerA)
agent.MustAddSkill("Skill B", "Description B", handlerB)
agent.MustAddSkill("Skill C", "Description C", handlerC)
// Each skill gets its own handler function
// The SubAgent library routes tasks to the correct handler based on task type
Error Handling in Handlers
func myHandler(ctx context.Context, task *pb.Task, message *pb.Message) (*pb.Artifact, pb.TaskState, string) {
result, err := doWork(ctx, message)
if err != nil {
// Return failure with error message
return nil, pb.TaskState_TASK_STATE_FAILED, err.Error()
}
// Return success with result
artifact := createArtifact(result)
return artifact, pb.TaskState_TASK_STATE_COMPLETED, ""
}
What the SubAgent Library Provides
Automatic Setup
- gRPC client connection to broker
- Health check endpoint
- Signal handling for graceful shutdown
- Configuration validation with defaults
A2A-Compliant AgentCard
- Correct protocol version (0.2.9)
- Proper capabilities structure
- Complete skill definitions with all required fields
- Automatic skill ID generation and tagging
Observability
- Tracing: Automatic span creation for each task with attributes
- Logging: Structured logging for all operations (registration, task receipt, completion, errors)
- Metrics: Built-in metrics collection (via AgentHub client)
Task Management
- Automatic task subscription
- Skill-based handler routing
- Error handling and reporting
- Result publishing
Developer Experience
- Simple 3-step API: `New()` → `AddSkill()` → `Run()`
- Clear error messages
- Type-safe handler functions
- Automatic resource cleanup
Best Practices
1. Skill Design
- Be specific: Clear descriptions help the LLM delegate correctly
- Single responsibility: Each skill should do one thing well
- Related skills: Group related capabilities in one agent
2. Handler Implementation
- Validate input: Always check that required data is present
- Handle errors gracefully: Return meaningful error messages
- Include metadata: Add useful context to your artifacts
- Keep it focused: Handlers should do one thing
3. Configuration
- Unique ports: Each agent needs a unique health port
- Meaningful names: Use descriptive agent IDs and names
- Version appropriately: Use semantic versioning
4. Testing
- Unit test handlers: Test business logic independently
- Integration test: Verify agent works with broker and Cortex
- E2E test: Test the complete flow with the LLM
Troubleshooting
Build Errors
Import issues:
go mod tidy
go mod download
Agent Not Registering
Check:
- Broker is running and accessible
- Config has all required fields (AgentID, Name, Description)
- Health port is not in use by another service
- Logs for specific error messages
Tasks Not Reaching Agent
Check:
- Cortex is running and has discovered your agent
- Skill names and descriptions match what users are asking for
- LLM is configured (not using mock LLM for delegation)
- Check broker and Cortex logs for routing events
Handler Errors
Check:
- Handler function signature matches the `TaskHandler` type
- Input validation is working correctly
- Error messages are being returned properly
- Logs show task receipt and processing
Next Steps
With the SubAgent library, creating production-ready agents is now as simple as defining your configuration, implementing your business logic, and calling Run()!
2.1.4 - How to Design Effective Agent Cards
Learn how to design AgentCards that enable effective LLM-based discovery and delegation in the Cortex orchestration system.
How to Design Effective Agent Cards
This guide shows you how to design AgentCards that enable effective LLM-based discovery and delegation in the Cortex orchestration system.
Why AgentCards Matter
When you register an agent with AgentHub, the Cortex orchestrator uses your AgentCard to:
- Understand your agent’s capabilities - What can it do?
- Match user requests - Does this request fit this agent?
- Generate LLM prompts - Include your agent in decision-making
- Delegate tasks - Route appropriate work to your agent
The quality of your AgentCard directly impacts how effectively Cortex can use your agent.
AgentCard Structure
type AgentCard struct {
ProtocolVersion string // A2A protocol version (e.g., "0.2.9")
Name string // Unique agent identifier
Description string // Human-readable description
Version string // Agent version (e.g., "1.0.0")
Url string // Service endpoint (optional)
Capabilities *AgentCapabilities // Technical capabilities
Skills []*AgentSkill // What the agent can do
// ... other fields
}
The most important field for Cortex integration is Skills.
Designing Skills
Each skill represents a specific capability your agent offers. The LLM uses skill information to decide when to delegate to your agent.
Skill Structure
type AgentSkill struct {
Id string // Unique skill identifier
Name string // Human-readable skill name
Description string // Detailed description of what this skill does
Tags []string // Categorization and keywords
Examples []string // Example user requests that match this skill
InputModes []string // Supported input types (e.g., "text/plain")
OutputModes []string // Supported output types
}
Writing Effective Descriptions
❌ Poor description:
Description: "Processes data"
✅ Good description:
Description: "Analyzes time-series data to detect anomalies using statistical methods. " +
"Supports multiple algorithms including Z-score, moving average, and ARIMA. " +
"Returns anomaly locations, severity scores, and confidence intervals."
Why the good description works:
- Specific about what it does (“analyzes time-series data”)
- Mentions the method (“statistical methods”)
- Lists supported features (“Z-score, moving average, ARIMA”)
- Describes output (“anomaly locations, severity scores, confidence intervals”)
Writing Powerful Examples
Examples are critical - the LLM uses them to recognize when a user request matches your skill.
❌ Weak examples:
Examples: []string{
"analyze data",
"find problems",
}
✅ Strong examples:
Examples: []string{
"Can you detect anomalies in this time series?",
"Find unusual patterns in the sensor data",
"Analyze this dataset for outliers",
"Check if there are any abnormal readings",
"Identify spikes or drops in the data",
"Run anomaly detection on this log file",
"Are there any suspicious values in this series?",
}
Why strong examples work:
- Variety: Different phrasings (“detect anomalies”, “find unusual patterns”, “outliers”)
- Natural language: How users actually ask questions
- Specific: Mention domain terms (“time series”, “sensor data”, “log file”)
- Action-oriented: Clear about what to do
- Multiple formats: Questions and commands
Example Categories to Cover
For each skill, include examples that cover:
- Direct requests: “Translate this text to Spanish”
- Questions: “Can you convert this to French?”
- Implied tasks: “I need this in German”
- Variations: “Spanish translation please”
- With context: “Translate the following paragraph to Japanese: …”
- Different phrasings: “Convert to Spanish”, “Change to Spanish”, “Make it Spanish”
Complete Examples
Example 1: Translation Agent
agentCard := &pb.AgentCard{
ProtocolVersion: "0.2.9",
Name: "agent_translator",
Description: "Professional-grade language translation service powered by neural machine translation. " +
"Supports 100+ languages with context-aware translation and proper handling of idioms, " +
"technical terms, and cultural nuances.",
Version: "2.1.0",
Capabilities: &pb.AgentCapabilities{
Streaming: false,
PushNotifications: false,
},
Skills: []*pb.AgentSkill{
{
Id: "translate_text",
Name: "Text Translation",
Description: "Translates text between any pair of 100+ supported languages including " +
"English, Spanish, French, German, Chinese, Japanese, Arabic, Russian, and many more. " +
"Preserves formatting, handles idioms, and maintains context. " +
"Supports both short phrases and long documents.",
Tags: []string{
"translation", "language", "nlp", "localization",
"multilingual", "i18n", "communication",
},
Examples: []string{
"Translate this to Spanish",
"Can you convert this text to French?",
"I need this paragraph in Japanese",
"Translate from English to German",
"What does this mean in Chinese?",
"Convert this Spanish text to English",
"Please translate to Portuguese",
"How do you say this in Italian?",
"Russian translation needed",
"Change this to Arabic",
},
InputModes: []string{"text/plain", "text/html"},
OutputModes: []string{"text/plain", "text/html"},
},
{
Id: "detect_language",
Name: "Language Detection",
Description: "Automatically identifies the language of input text with high accuracy. " +
"Can detect 100+ languages and provides confidence scores. " +
"Useful for routing, preprocessing, and automatic translation workflows.",
Tags: []string{"language", "detection", "nlp", "identification"},
Examples: []string{
"What language is this text in?",
"Detect the language",
"Can you identify this language?",
"Which language is this?",
"Tell me what language this is",
},
InputModes: []string{"text/plain"},
OutputModes: []string{"text/plain"},
},
},
}
Example 2: Data Analysis Agent
agentCard := &pb.AgentCard{
ProtocolVersion: "0.2.9",
Name: "agent_data_analyst",
Description: "Advanced data analysis and statistical computing agent. Performs exploratory " +
"data analysis, statistical tests, correlation analysis, and generates insights from datasets.",
Version: "1.5.2",
Capabilities: &pb.AgentCapabilities{
Streaming: true, // Can stream large results
PushNotifications: false,
},
Skills: []*pb.AgentSkill{
{
Id: "analyze_dataset",
Name: "Dataset Analysis",
Description: "Performs comprehensive statistical analysis on datasets including " +
"descriptive statistics (mean, median, std dev), distribution analysis, " +
"outlier detection, correlation matrices, and trend identification. " +
"Supports CSV, JSON, and structured data formats.",
Tags: []string{
"data-analysis", "statistics", "analytics", "dataset",
"eda", "exploratory", "insights",
},
Examples: []string{
"Analyze this dataset",
"Can you provide statistics for this data?",
"What are the key insights from this CSV?",
"Run an analysis on this data file",
"Give me a statistical summary",
"Find correlations in this dataset",
"What patterns do you see in this data?",
"Analyze the distribution of these values",
"Calculate descriptive statistics",
"Identify trends in this time series",
},
InputModes: []string{"text/csv", "application/json", "text/plain"},
OutputModes: []string{"application/json", "text/plain", "text/html"},
},
{
Id: "visualize_data",
Name: "Data Visualization",
Description: "Creates charts and graphs from data including line charts, bar charts, " +
"scatter plots, histograms, box plots, and heatmaps. Returns visualization " +
"specifications in various formats.",
Tags: []string{"visualization", "charts", "graphs", "plotting"},
Examples: []string{
"Create a chart from this data",
"Visualize this dataset",
"Make a graph of these values",
"Plot this time series",
"Show me a chart",
"Generate a histogram",
"Can you create a scatter plot?",
},
InputModes: []string{"text/csv", "application/json"},
OutputModes: []string{"image/png", "application/json", "text/html"},
},
},
}
Example 3: Image Processing Agent
agentCard := &pb.AgentCard{
ProtocolVersion: "0.2.9",
Name: "agent_image_processor",
Description: "Image processing and computer vision agent with capabilities for transformation, " +
"enhancement, analysis, and object detection. Supports all major image formats.",
Version: "3.0.0",
Skills: []*pb.AgentSkill{
{
Id: "resize_image",
Name: "Image Resizing",
Description: "Resizes images to specified dimensions while maintaining aspect ratio " +
"and quality. Supports various scaling algorithms including bicubic, lanczos, " +
"and nearest neighbor. Can batch process multiple images.",
Tags: []string{"image", "resize", "scale", "transform", "dimensions"},
Examples: []string{
"Resize this image to 800x600",
"Make this image smaller",
"Scale this photo to 50%",
"Can you resize to thumbnail size?",
"Change image dimensions",
"Shrink this image",
"Make it 1920x1080",
},
InputModes: []string{"image/jpeg", "image/png", "image/webp"},
OutputModes: []string{"image/jpeg", "image/png", "image/webp"},
},
{
Id: "detect_objects",
Name: "Object Detection",
Description: "Detects and identifies objects in images using deep learning models. " +
"Can recognize 1000+ object categories including people, animals, vehicles, " +
"furniture, and more. Returns bounding boxes and confidence scores.",
Tags: []string{
"computer-vision", "object-detection", "ai", "recognition",
"detection", "classification",
},
Examples: []string{
"What objects are in this image?",
"Detect objects in this photo",
"What do you see in this picture?",
"Identify items in this image",
"Find all people in this photo",
"Detect cars in this image",
"What's in this picture?",
},
InputModes: []string{"image/jpeg", "image/png"},
OutputModes: []string{"application/json", "text/plain"},
},
},
}
Best Practices Checklist
✅ Description Quality
✅ Skill Design
✅ Examples Coverage
Testing Your AgentCard
1. Manual Testing
Start your agent and check Cortex logs:
grep "Agent skills registered" cortex.log
You should see your skill descriptions displayed.
2. LLM Prompt Testing
Check what the LLM sees by enabling DEBUG logging in Cortex:
LOG_LEVEL=DEBUG ./bin/cortex
Look for prompts that include:
Available agents:
- your_agent: Your agent description
Skills:
* Skill Name: Skill description
3. Integration Testing
Test with actual user requests:
# Start services
./bin/broker &
./bin/cortex &
./bin/your_agent &
# Use chat CLI
./bin/chat_cli
# Try requests that match your examples
> Can you translate this to Spanish?
Watch the logs to see if Cortex delegates to your agent.
Common Mistakes to Avoid
❌ Vague Descriptions
Description: "Does useful things"
Problem: LLM can’t determine if this agent is suitable
❌ Too Few Examples
Examples: []string{"do the thing"}
Problem: LLM won’t recognize variations
❌ Technical Jargon in Examples
Examples: []string{
"Execute POST /api/v1/translate with payload",
}
Problem: Users don’t talk like this
❌ Overly Broad Skills
{
Name: "Do Everything",
Description: "This agent can help with anything",
}
Problem: LLM can’t make good decisions
❌ Missing Context
{
Name: "Process",
Description: "Processes the input",
}
Problem: What kind of processing? What input?
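For contrast, here is the same skill rewritten with the missing context filled in (the skill itself is illustrative, not part of AgentHub):
{
    Id:   "clean_text",
    Name: "Text Cleaning",
    Description: "Normalizes raw text: strips HTML tags, repairs character encoding, " +
        "and collapses redundant whitespace. Accepts plain text or HTML input and " +
        "returns cleaned plain text.",
}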
Advanced Topics
Multi-Language Support
Include examples in multiple languages if your agent supports them:
Examples: []string{
"Translate to Spanish",
"Traduire en franΓ§ais",
"Γbersetzen Sie nach Deutsch",
"ζ₯ζ¬θͺγ«ηΏ»θ¨³",
}
Conditional Capabilities
Use metadata to indicate conditional features:
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"requires_api_key": structpb.NewBoolValue(true),
"max_input_size": structpb.NewNumberValue(10000),
"rate_limit": structpb.NewStringValue("100/minute"),
},
}
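A client deciding whether to delegate can then read these flags back. A minimal sketch, assuming a card variable like the examples above (GetMetadata and GetFields are the standard protobuf-generated getters):
// requiresAPIKey reports whether the card declares the requires_api_key flag.
func requiresAPIKey(card *pb.AgentCard) bool {
	meta := card.GetMetadata()
	if meta == nil {
		return false
	}
	if v, ok := meta.GetFields()["requires_api_key"]; ok {
		return v.GetBoolValue()
	}
	return false
}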
Skill Dependencies
Indicate if skills build on each other:
{
Id: "advanced_analysis",
Description: "Advanced statistical analysis. Requires dataset to be preprocessed " +
"using the 'clean_data' skill first.",
}
Iteration and Improvement
Your AgentCard isn’t set in stone. Improve it based on:
- Usage patterns: What requests do users actually make?
- Delegation success: Is Cortex routing appropriate tasks?
- User feedback: Are users getting what they expect?
- LLM behavior: What decisions is the LLM making?
Update your AgentCard and restart your agent to reflect improvements.
Next Steps
Well-designed AgentCards are the key to effective AI orchestration!
2.2 - Observability
Practical guides for monitoring and observability setup
Observability How-to Guides
Practical step-by-step guides for setting up monitoring, metrics, and observability in your AgentHub deployments.
Available Guides
2.2.1 - How to Add Observability to Your Agent
Use AgentHub’s unified abstractions to automatically get distributed tracing, metrics, and structured logging in your agents.
How to Add Observability to Your Agent
Goal-oriented guide: Use AgentHub’s unified abstractions to automatically get distributed tracing, metrics, and structured logging in your agents with minimal configuration.
Prerequisites
- Go 1.24+ installed
- Basic understanding of AgentHub concepts
- 10-15 minutes
Overview: What You Get Automatically
With AgentHub’s unified abstractions, you automatically get:
✅ Distributed Tracing - OpenTelemetry traces with correlation IDs
✅ Comprehensive Metrics - Performance and health monitoring
✅ Structured Logging - JSON logs with trace correlation
✅ Health Endpoints - HTTP health checks and metrics endpoints
✅ Graceful Shutdown - Clean resource management
Quick Start: Observable Agent in 5 Minutes
Step 1: Create Your Agent Using Abstractions
package main
import (
"context"
"time"
"github.com/owulveryck/agenthub/internal/agenthub"
)
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
// Create configuration (observability included automatically)
config := agenthub.NewGRPCConfig("my-agent")
config.HealthPort = "8083" // Unique port for your agent
// Create AgentHub client (observability built-in)
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic("Failed to create AgentHub client: " + err.Error())
}
// Automatic graceful shutdown
defer func() {
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer shutdownCancel()
if err := client.Shutdown(shutdownCtx); err != nil {
client.Logger.ErrorContext(shutdownCtx, "Error during shutdown", "error", err)
}
}()
// Start the client (enables observability)
if err := client.Start(ctx); err != nil {
client.Logger.ErrorContext(ctx, "Failed to start client", "error", err)
panic(err)
}
// Your agent logic here...
client.Logger.Info("My observable agent is running!")
// Keep running
select {}
}
That’s it! Your agent now has full observability.
Step 2: Configure Observability via Environment Variables
Set observability configuration via environment:
# Tracing configuration
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export OTEL_SERVICE_NAME="my-agent"
export OTEL_SERVICE_VERSION="1.0.0"
# Health server port
export BROKER_HEALTH_PORT="8083"
# Broker connection
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
Step 3: Run Your Observable Agent
Run the agent as you would any Go program (for example, go run . from your agent's module directory).
Expected Output:
time=2025-09-29T10:00:00.000Z level=INFO msg="Starting health server" port=8083
time=2025-09-29T10:00:00.000Z level=INFO msg="AgentHub client connected" broker_addr=localhost:50051
time=2025-09-29T10:00:00.000Z level=INFO msg="My observable agent is running!"
Available Observability Features
Automatic Health Endpoints
Your agent automatically exposes:
- Health Check: http://localhost:8083/health
- Metrics: http://localhost:8083/metrics (Prometheus format)
- Readiness: http://localhost:8083/ready
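If you want to verify these endpoints programmatically (for example, in a smoke test), a minimal sketch using only the standard library:
package main

import (
	"fmt"
	"net/http"
)

// Probe the automatic endpoints; the port matches the HealthPort set above.
func main() {
	for _, path := range []string{"/health", "/ready", "/metrics"} {
		resp, err := http.Get("http://localhost:8083" + path)
		if err != nil {
			fmt.Printf("%s: unreachable: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: %s\n", path, resp.Status)
		resp.Body.Close()
	}
}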
Structured Logging
All logs are automatically structured with trace correlation:
{
"time": "2025-09-29T10:00:00.000Z",
"level": "INFO",
"msg": "Task published",
"trace_id": "abc123...",
"span_id": "def456...",
"task_type": "process_document",
"correlation_id": "req_789"
}
Distributed Tracing
Traces are automatically created for:
- gRPC calls to broker
- Task publishing and subscribing
- Custom operations (when you use the TraceManager)
Metrics Collection
Automatic metrics include:
- Task processing duration
- Success/failure rates
- gRPC call metrics
- Health check status
Advanced Usage
Adding Custom Tracing
Use the built-in TraceManager for custom operations:
// Custom operation with tracing
ctx, span := client.TraceManager.StartPublishSpan(ctx, "my_operation", "document")
defer span.End()
// Add custom attributes
client.TraceManager.AddComponentAttribute(span, "my-component")
span.SetAttributes(attribute.String("document.id", "doc-123"))
// Your operation logic
result, err := doCustomOperation(ctx)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
}
Adding Custom Metrics
Use the MetricsManager for custom metrics:
// Start timing an operation
timer := client.MetricsManager.StartTimer()
defer timer(ctx, "my_operation", "my-component")
// Your operation
processDocument()
Custom Log Fields
Use the structured logger with context:
client.Logger.InfoContext(ctx, "Processing document",
"document_id", "doc-123",
"user_id", "user-456",
"processing_type", "ocr",
)
Publisher Example with Observability
package main
import (
"context"
"time"
"github.com/owulveryck/agenthub/internal/agenthub"
pb "github.com/owulveryck/agenthub/events/a2a"
"google.golang.org/protobuf/types/known/structpb"
)
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
// Observable client setup
config := agenthub.NewGRPCConfig("publisher")
config.HealthPort = "8081"
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic(err)
}
defer client.Shutdown(context.Background())
if err := client.Start(ctx); err != nil {
panic(err)
}
// Create observable task publisher
publisher := &agenthub.TaskPublisher{
Client: client.Client,
TraceManager: client.TraceManager,
MetricsManager: client.MetricsManager,
Logger: client.Logger,
ComponentName: "publisher",
}
// Publish task with automatic tracing
data, _ := structpb.NewStruct(map[string]interface{}{
"message": "Hello, observable world!",
})
task := &pb.TaskMessage{
TaskId: "task-123",
TaskType: "greeting",
Data: data,
Priority: pb.Priority_MEDIUM,
}
// Automatically traced and metered
if err := publisher.PublishTask(ctx, task); err != nil {
client.Logger.ErrorContext(ctx, "Failed to publish task", "error", err)
} else {
client.Logger.InfoContext(ctx, "Task published successfully", "task_id", task.TaskId)
}
}
Subscriber Example with Observability
package main
import (
"context"
"os"
"os/signal"
"syscall"
"github.com/owulveryck/agenthub/internal/agenthub"
pb "github.com/owulveryck/agenthub/events/a2a"
"google.golang.org/protobuf/types/known/structpb"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Observable client setup
config := agenthub.NewGRPCConfig("subscriber")
config.HealthPort = "8082"
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic(err)
}
defer client.Shutdown(context.Background())
if err := client.Start(ctx); err != nil {
panic(err)
}
// Create observable task subscriber
subscriber := agenthub.NewTaskSubscriber(client, "my-subscriber")
// Register handler with automatic tracing
subscriber.RegisterHandler("greeting", func(ctx context.Context, task *pb.TaskMessage) (*structpb.Struct, pb.TaskStatus, string) {
// This is automatically traced and logged
client.Logger.InfoContext(ctx, "Processing greeting task", "task_id", task.TaskId)
// Your processing logic
result, _ := structpb.NewStruct(map[string]interface{}{
"response": "Hello back!",
})
return result, pb.TaskStatus_COMPLETED, ""
})
// Start processing with automatic observability
go subscriber.StartProcessing(ctx)
// Graceful shutdown
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
<-sigChan
}
Configuration Reference
Complete Reference: For all environment variables and configuration options, see Environment Variables Reference
Key Environment Variables
| Variable | Description | Default |
|---|---|---|
| JAEGER_ENDPOINT | Jaeger tracing endpoint | "" (tracing disabled) |
| SERVICE_NAME | Service name for tracing | "agenthub-service" |
| SERVICE_VERSION | Service version | "1.0.0" |
| BROKER_HEALTH_PORT | Health endpoint port | "8080" |
| AGENTHUB_BROKER_ADDR | Broker address | "localhost" |
| AGENTHUB_BROKER_PORT | Broker port | "50051" |
Health Endpoints
Each agent exposes these endpoints:
| Endpoint | Purpose | Response |
|---|---|---|
| /health | Overall health status | JSON status |
| /metrics | Prometheus metrics | Metrics format |
| /ready | Readiness check | 200 OK or 503 |
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| No traces in Jaeger | Set JAEGER_ENDPOINT environment variable |
| Health endpoint not accessible | Check BROKER_HEALTH_PORT is unique |
| Logs not structured | Ensure using client.Logger not standard log |
| Missing correlation IDs | Use context.Context in all operations |
Verification Steps
Check health endpoint:
curl http://localhost:8083/health
Verify metrics:
curl http://localhost:8083/metrics
Check traces in Jaeger:
- Open http://localhost:16686
- Search for your service name
Migration from Manual Setup
If you have existing agents using manual observability setup:
Old Approach (Manual)
// 50+ lines of OpenTelemetry setup
obs, err := observability.NewObservability(config)
traceManager := observability.NewTraceManager(serviceName)
// Manual gRPC client setup
// Manual health server setup
New Approach (Unified)
// 3 lines - everything automatic
config := agenthub.NewGRPCConfig("my-agent")
client, err := agenthub.NewAgentHubClient(config)
client.Start(ctx)
The unified abstractions provide the same observability features with 90% less code and no manual setup required.
With AgentHub’s unified abstractions, observability is no longer an add-on feature but a built-in capability that comes automatically with every agent. Focus on your business logic while the platform handles monitoring, tracing, and health checks for you.
2.2.2 - How to Use Grafana Dashboards
Master the AgentHub observability dashboards to monitor, analyze, and troubleshoot your event-driven system effectively.
How to Use Grafana Dashboards
Goal-oriented guide: Master the AgentHub observability dashboards to monitor, analyze, and troubleshoot your event-driven system effectively.
Prerequisites
- AgentHub observability stack running (docker-compose up -d)
- AgentHub agents running with observability enabled
- Basic understanding of metrics concepts
- 10-15 minutes
Quick Access
- Grafana Dashboard: http://localhost:3333 (admin/admin)
- Direct Dashboard: http://localhost:3333/d/agenthub-eda-dashboard
Dashboard Overview
The AgentHub EDA System Observatory provides comprehensive monitoring across three main areas:
- Event Metrics (Top Row) - Event processing performance
- Distributed Tracing (Middle) - Request flow visualization
- System Health (Bottom Row) - Infrastructure monitoring
Panel-by-Panel Guide
Event Processing Rate (Top Left)
What it shows: Events processed per second by each service
How to use:
- Monitor throughput: See how many events your system processes
- Identify bottlenecks: Low rates may indicate performance issues
- Compare services: See which agents are busiest
Reading the chart:
Green line: agenthub-broker (150 events/sec)
Blue line: agenthub-publisher (50 events/sec)
Red line: agenthub-subscriber (145 events/sec)
Troubleshooting:
- Flat lines: No activity - check if agents are running
- Dropping rates: Performance degradation - check CPU/memory
- Spiky patterns: Bursty workloads - consider load balancing
Event Processing Error Rate (Top Right)
What it shows: Percentage of events that failed processing
How to use:
- Monitor reliability: Should stay below 5% (green zone)
- Alert threshold: Yellow above 5%, red above 10%
- Quick health check: Assess system reliability at a single glance
Color coding:
- Green (0-5%): Healthy system
- Yellow (5-10%): Moderate issues
- Red (>10%): Critical problems
Troubleshooting:
- High error rates: Check Jaeger for failing traces
- Sudden spikes: Look for recent deployments or config changes
- Persistent errors: Check logs for recurring issues
Event Types Distribution (Middle Left)
What it shows: Breakdown of event types by volume
How to use:
- Understand workload: See what types of tasks dominate
- Capacity planning: Identify which task types need scaling
- Anomaly detection: Unusual distributions may indicate issues
Example interpretation:
greeting: 40% (blue) - Most common task type
math_calculation: 35% (green) - Heavy computational tasks
random_number: 20% (yellow) - Quick tasks
unknown_task: 5% (red) - Error-generating tasks
Troubleshooting:
- Missing task types: Check if specific agents are down
- Unexpected distributions: May indicate upstream issues
- Dominant error types: Focus optimization efforts
Event Processing Latency (Middle Right)
What it shows: Processing time percentiles (p50, p95, p99)
How to use:
- Performance monitoring: Track how fast events are processed
- SLA compliance: Ensure latencies meet requirements
- Outlier detection: p99 shows worst-case scenarios
Understanding percentiles:
- p50 (median): 50% of events process faster than this
- p95: 95% of events process faster than this
- p99: 99% of events process faster than this
Healthy ranges:
- p50: < 50ms (very responsive)
- p95: < 200ms (good performance)
- p99: < 500ms (acceptable outliers)
Troubleshooting:
- Rising latencies: Check CPU/memory usage
- High p99: Look for resource contention or long-running tasks
- Flatlined metrics: May indicate measurement issues
Distributed Traces (Middle Section)
What it shows: Integration with Jaeger for trace visualization
How to use:
- Click “Explore” to open Jaeger
- Select service from dropdown
- Find specific traces to debug issues
- Analyze request flows across services
When to use:
- Debugging errors: Find root cause of failures
- Performance analysis: Identify slow operations
- Understanding flows: See complete request journeys
Service CPU Usage (Bottom Left)
What it shows: CPU utilization by service
How to use:
- Capacity monitoring: Ensure services aren’t overloaded
- Resource planning: Identify when to scale
- Performance correlation: High CPU often explains high latency
Healthy ranges:
- < 50%: Comfortable utilization
- 50-70%: Moderate load
- > 70%: Consider scaling
Service Memory Usage (Bottom Center)
What it shows: Memory consumption by service
How to use:
- Memory leak detection: Watch for continuously growing usage
- Capacity planning: Ensure sufficient memory allocation
- Garbage collection: High usage may impact performance
Monitoring tips:
- Steady growth: May indicate memory leaks
- Sawtooth pattern: Normal GC behavior
- Sudden spikes: Check for large event batches
Go Goroutines (Bottom Right)
What it shows: Number of concurrent goroutines per service
How to use:
- Concurrency monitoring: Track parallel processing
- Resource leak detection: Continuously growing numbers indicate leaks
- Performance tuning: Optimize concurrency levels
Normal patterns:
- Stable baseline: Normal operation
- Activity spikes: During high load
- Continuous growth: Potential goroutine leaks
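To cross-check this panel from inside an agent, Go's runtime package exposes the same figure. A minimal sketch using the structured logger from the previous guide (client and ctx as in that example; requires the "runtime" import):
// Log the current goroutine count so it can be correlated with the panel.
client.Logger.InfoContext(ctx, "runtime stats",
	"goroutines", runtime.NumGoroutine(),
)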
Service Health Status (Bottom Far Right)
What it shows: Up/down status of each service
How to use:
- Quick status check: See if all services are running
- Outage detection: Immediately identify down services
- Health monitoring: Green = UP, Red = DOWN
Dashboard Variables and Filters
Service Filter
Location: Top of dashboard
Purpose: Filter metrics by specific services
Usage:
- Select “All” to see everything
- Choose specific services to focus analysis
- Useful for isolating problems to specific components
Event Type Filter
Location: Top of dashboard
Purpose: Filter by event/task types
Usage:
- Analyze specific workflow types
- Debug particular task categories
- Compare performance across task types
Time Range Selector
Location: Top right of dashboard
Purpose: Control time window for analysis
Common ranges:
- 5 minutes: Real-time monitoring
- 1 hour: Recent trend analysis
- 24 hours: Daily pattern analysis
- 7 days: Weekly trend and capacity planning
Advanced Usage Patterns
Daily Monitoring Workflow
Start with Overview:
- Check error rates (should be < 5%)
- Verify processing rates look normal
- Scan for any red/yellow indicators
Drill Down on Issues:
- If high error rates β check distributed traces
- If high latency β examine CPU/memory usage
- If low throughput β check service health
Root Cause Analysis:
- Use time range selector to find when problems started
- Filter by specific services to isolate issues
- Correlate metrics across different panels
Capacity Planning Workflow
Analyze Peak Patterns:
- Set time range to 7 days
- Identify peak usage periods
- Note maximum throughput achieved
Resource Utilization:
- Check CPU usage during peaks
- Monitor memory consumption trends
- Verify goroutine scaling behavior
Plan Scaling:
- If CPU > 70% during peaks, scale up
- If memory continuously growing, investigate leaks
- If error rates spike during load, optimize before scaling
Troubleshooting Workflow
Identify Symptoms:
- High error rates: Focus on traces and logs
- High latency: Check resource utilization
- Low throughput: Verify service health
Time Correlation:
- Use time range to find when issues started
- Look for correlated changes across metrics
- Check for deployment or configuration changes
Service Isolation:
- Use service filter to identify problematic components
- Compare healthy vs unhealthy services
- Check inter-service dependencies
Dashboard Customization
Adding New Panels
- Click “+ Add panel” in top menu
- Choose visualization type:
- Time series for trends
- Stat for current values
- Gauge for thresholds
- Configure query:
# Example: Custom error rate
rate(my_custom_errors_total[5m]) / rate(my_custom_requests_total[5m]) * 100
Creating Alerts
- Edit existing panel or create new one
- Click “Alert” tab
- Configure conditions:
Query: rate(event_errors_total[5m]) / rate(events_processed_total[5m]) * 100
Condition: IS ABOVE 5
Evaluation: Every 1m for 2m
- Set notification channels
Custom Time Ranges
- Click time picker (top right)
- Select “Custom range”
- Set specific dates/times for historical analysis
- Use “Refresh” settings for auto-updating
Troubleshooting Dashboard Issues
Dashboard Not Loading
# Check Grafana status
docker-compose ps grafana
# Check Grafana logs
docker-compose logs grafana
# Restart if needed
docker-compose restart grafana
No Data in Panels
# Check Prometheus connection
curl http://localhost:9090/api/v1/targets
# Verify agents are exposing metrics
curl http://localhost:8080/metrics
curl http://localhost:8081/metrics
curl http://localhost:8082/metrics
# Check Prometheus configuration
docker-compose logs prometheus
Slow Dashboard Performance
- Reduce time range: Use shorter windows for better performance
- Limit service selection: Filter to specific services
- Optimize queries: Use appropriate rate intervals
- Check resource usage: Ensure Prometheus has enough memory
Authentication Issues
- Default credentials: admin/admin
- Reset password: Through Grafana UI after first login
- Lost access: Restart Grafana container to reset
Best Practices
Regular Monitoring
- Check dashboard daily: Quick health overview
- Weekly reviews: Trend analysis and capacity planning
- Set up alerts: Proactive monitoring for critical metrics
Performance Optimization
- Use appropriate time ranges: Don’t query more data than needed
- Filter effectively: Use service and event type filters
- Refresh intervals: Balance real-time needs with performance
Team Usage
- Share dashboard URLs: Bookmark specific views
- Create annotations: Mark deployments and incidents
- Export snapshots: Share findings with team members
Jaeger Integration
- Click Explore in traces panel
- Auto-links to Jaeger with service context
- Correlate traces with metrics timeframes
Prometheus Integration
- Click Explore on any panel
- Edit queries in Prometheus query language
- Access raw metrics for custom analysis
Log Correlation
- Use trace IDs from Jaeger
- Search logs for matching trace IDs
- Correlate log events with metric spikes
Next Steps:
Deep Debugging: Debug with Distributed Tracing
Production Setup: Configure Alerts
Understanding: Observability Architecture Explained
2.3 - Agent2Agent Protocol
Learn how to work with Agent2Agent (A2A) protocol components including messages, conversation contexts, artifacts, and task lifecycle management.
Agent2Agent Protocol How-To Guides
This section provides practical guides for working with the Agent2Agent (A2A) protocol in AgentHub. These guides show you how to implement A2A-compliant communication patterns for building robust agent systems.
Available Guides
Learn how to create, structure, and process A2A messages with text, data, and file content parts. This is the foundation for all A2A communication.
Understand how to manage conversation contexts for multi-turn interactions, workflow coordination, and state preservation across agent communications.
Master the creation and handling of A2A artifacts - structured outputs that deliver rich results from completed tasks.
Learn how to manage the complete task lifecycle from creation through completion, including state transitions, progress updates, and error handling.
A2A Protocol Benefits
The Agent2Agent protocol provides:
- Structured Communication: Standardized message formats with rich content types
- Conversation Threading: Context-aware message grouping for complex workflows
- Rich Artifacts: Structured outputs with multiple content types
- Lifecycle Management: Complete task state tracking from submission to completion
- Interoperability: Standards-based communication for multi-vendor agent systems
Prerequisites
Before following these guides:
- Complete the Installation and Setup tutorial
- Run the AgentHub Demo to see A2A in action
- Understand the Agent2Agent Principle
Implementation Approach
These guides use AgentHub’s unified abstractions from internal/agenthub which provide:
- A2ATaskPublisher: Simplified A2A task creation and publishing
- A2ATaskSubscriber: Streamlined A2A task processing and response generation
- Automatic Observability: Built-in tracing, metrics, and logging
- Environment Configuration: Zero-config setup with environment variables
Start with the A2A Messages guide to learn the fundamentals, then progress through the other guides to build complete A2A-compliant agent systems.
2.3.1 - How to Work with A2A Messages
Learn how to create, structure, and work with Agent2Agent protocol messages including text, data, and file parts.
How to Work with A2A Messages
This guide shows you how to create and work with Agent2Agent (A2A) protocol messages using AgentHub’s unified abstractions. A2A messages are the foundation of all agent communication.
Understanding A2A Message Structure
A2A messages consist of several key components:
- Message ID: Unique identifier for the message
- Context ID: Groups related messages in a conversation
- Task ID: Links the message to a specific task
- Role: Indicates if the message is from USER (requester) or AGENT (responder)
- Content Parts: The actual message content (text, data, or files)
- Metadata: Additional context for routing and processing
Creating Basic A2A Messages
Text Messages
Create a simple text message:
package main
import (
"context"
"fmt"
"log"
"github.com/google/uuid"
pb "github.com/owulveryck/agenthub/events/a2a"
"google.golang.org/protobuf/types/known/timestamppb"
)
func createTextMessage() *pb.Message {
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: "conversation_greeting",
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Hello! Please process this greeting request.",
},
},
},
Metadata: nil, // Optional
}
}
Data Messages
Include structured data in your message:
import (
"google.golang.org/protobuf/types/known/structpb"
)
func createDataMessage() *pb.Message {
// Create structured data
data, err := structpb.NewStruct(map[string]interface{}{
"operation": "calculate",
"numbers": []float64{10, 20, 30},
"formula": "sum",
"precision": 2,
})
if err != nil {
log.Fatal(err)
}
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: "conversation_math",
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Please perform the calculation described in the data.",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: data,
Description: "Calculation parameters",
},
},
},
},
}
}
File Reference Messages
Reference files in your messages:
func createFileMessage() *pb.Message {
// Create file metadata
fileMetadata, _ := structpb.NewStruct(map[string]interface{}{
"source": "user_upload",
"category": "image",
"permissions": "read-only",
})
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: "conversation_image_analysis",
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Please analyze the uploaded image.",
},
},
{
Part: &pb.Part_File{
File: &pb.FilePart{
FileId: "file_abc123",
Filename: "analysis_target.jpg",
MimeType: "image/jpeg",
SizeBytes: 2048576, // 2MB
Metadata: fileMetadata,
},
},
},
},
}
}
Working with Mixed Content
Combine multiple part types in a single message:
func createMixedContentMessage() *pb.Message {
// Configuration data
config, _ := structpb.NewStruct(map[string]interface{}{
"format": "json",
"output_dir": "/results",
"compress": true,
})
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: "conversation_data_processing",
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Process the dataset with the following configuration and source file.",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: config,
Description: "Processing configuration",
},
},
},
{
Part: &pb.Part_File{
File: &pb.FilePart{
FileId: "dataset_xyz789",
Filename: "raw_data.csv",
MimeType: "text/csv",
SizeBytes: 5242880, // 5MB
},
},
},
},
}
}
Publishing A2A Messages
Use AgentHub’s unified abstractions to publish messages:
package main
import (
"context"
"log"
"github.com/owulveryck/agenthub/internal/agenthub"
pb "github.com/owulveryck/agenthub/events/eventbus"
)
func publishA2AMessage(ctx context.Context) error {
// Create AgentHub client
config := agenthub.NewGRPCConfig("message_publisher")
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
return err
}
defer client.Shutdown(ctx)
// Create A2A message
message := createTextMessage()
// Publish using AgentHub client
response, err := client.Client.PublishMessage(ctx, &pb.PublishMessageRequest{
Message: message,
Routing: &pb.AgentEventMetadata{
FromAgentId: "message_publisher",
ToAgentId: "message_processor",
EventType: "a2a.message",
Priority: pb.Priority_PRIORITY_MEDIUM,
},
})
if err != nil {
return err
}
log.Printf("A2A message published: %s", response.GetEventId())
return nil
}
Processing Received A2A Messages
Handle incoming A2A messages in your agent:
func processA2AMessage(ctx context.Context, message *pb.Message) (string, error) {
var response string
// Process each content part
for i, part := range message.GetContent() {
switch content := part.GetPart().(type) {
case *pb.Part_Text:
log.Printf("Text part %d: %s", i, content.Text)
response += fmt.Sprintf("Processed text: %s\n", content.Text)
case *pb.Part_Data:
log.Printf("Data part %d: %s", i, content.Data.GetDescription())
// Process structured data
data := content.Data.GetData()
response += fmt.Sprintf("Processed data: %s\n", content.Data.GetDescription())
// Access specific fields
if operation, ok := data.GetFields()["operation"]; ok {
log.Printf("Operation: %s", operation.GetStringValue())
}
case *pb.Part_File:
log.Printf("File part %d: %s (%s)", i, content.File.GetFilename(), content.File.GetMimeType())
response += fmt.Sprintf("Processed file: %s\n", content.File.GetFilename())
// Handle file processing based on MIME type
switch content.File.GetMimeType() {
case "image/jpeg", "image/png":
// Process image
response += "Image analysis completed\n"
case "text/csv":
// Process CSV data
response += "CSV data parsed\n"
}
}
}
return response, nil
}
Message Role Management
Properly set message roles for A2A compliance:
// User message (requesting work)
func createUserMessage(content string) *pb.Message {
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{Text: content},
},
},
}
}
// Agent response message
func createAgentResponse(contextId, taskId, response string) *pb.Message {
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextId,
TaskId: taskId,
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{Text: response},
},
},
}
}
Message Validation
Validate A2A messages before publishing:
func validateA2AMessage(message *pb.Message) error {
if message.GetMessageId() == "" {
return fmt.Errorf("message_id is required")
}
if message.GetRole() == pb.Role_ROLE_UNSPECIFIED {
return fmt.Errorf("role must be specified (USER or AGENT)")
}
if len(message.GetContent()) == 0 {
return fmt.Errorf("message must have at least one content part")
}
// Validate each part
for i, part := range message.GetContent() {
if part.GetPart() == nil {
return fmt.Errorf("content part %d is empty", i)
}
}
return nil
}
Best Practices
1. Always Use Unique Message IDs
messageID := fmt.Sprintf("msg_%d_%s", time.Now().Unix(), uuid.New().String())
2. Group Related Messages with Context IDs
contextID := fmt.Sprintf("ctx_%s_%s", workflowType, uuid.New().String())
3. Provide Descriptions for Data Parts
dataPart := &pb.DataPart{
Data: structData,
Description: "User preferences for recommendation engine",
}
4. Validate Messages Before Publishing
if err := validateA2AMessage(message); err != nil {
return fmt.Errorf("invalid A2A message: %w", err)
}
5. Handle All Part Types in Message Processors
switch content := part.GetPart().(type) {
case *pb.Part_Text:
// Handle text
case *pb.Part_Data:
// Handle structured data
case *pb.Part_File:
// Handle file references
default:
log.Printf("Unknown part type: %T", content)
}
This guide covered the fundamentals of working with A2A messages. Next, learn about A2A Conversation Context to group related messages and maintain conversation state across multiple interactions.
2.3.2 - How to Work with A2A Conversation Context
Learn how to manage conversation contexts in Agent2Agent protocol for multi-turn interactions and workflow coordination.
How to Work with A2A Conversation Context
This guide shows you how to use A2A conversation contexts to group related messages, maintain state across interactions, and coordinate multi-agent workflows.
Understanding A2A Conversation Context
A2A conversation context is identified by a context_id that groups related messages and tasks. This enables:
- Multi-turn conversations between agents
- Workflow coordination across multiple tasks
- State preservation throughout long-running processes
- Message threading for audit trails
- Context-aware routing based on conversation history
Creating Conversation Contexts
Simple Conversation Context
Start a basic conversation context:
package main
import (
"fmt"
"github.com/google/uuid"
pb "github.com/owulveryck/agenthub/events/a2a"
)
func createConversationContext(workflowType string) string {
return fmt.Sprintf("ctx_%s_%s", workflowType, uuid.New().String())
}
func startConversation() *pb.Message {
contextID := createConversationContext("user_onboarding")
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextID,
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Please start the user onboarding process for new user.",
},
},
},
}
}
Workflow-Specific Contexts
Create contexts for different workflow types:
func createWorkflowContexts() map[string]string {
return map[string]string{
"data_analysis": createConversationContext("data_analysis"),
"image_processing": createConversationContext("image_processing"),
"user_support": createConversationContext("user_support"),
"integration_test": createConversationContext("integration_test"),
}
}
Multi-Turn Conversations
Conversation Initiation
Start a conversation with initial context:
import (
"google.golang.org/protobuf/types/known/structpb"
)
func initiateDataAnalysisConversation() *pb.Message {
contextID := createConversationContext("data_analysis")
// Initial conversation metadata
contextMetadata, _ := structpb.NewStruct(map[string]interface{}{
"workflow_type": "data_analysis",
"initiated_by": "user_12345",
"priority": "high",
"expected_steps": []string{"validation", "processing", "analysis", "report"},
"timeout_minutes": 30,
})
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextID,
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Please analyze the uploaded dataset and provide insights.",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: contextMetadata,
Description: "Conversation context and workflow parameters",
},
},
},
},
Metadata: contextMetadata,
}
}
Continuing the Conversation
Add follow-up messages to the same context:
func continueConversation(contextID, previousMessageID string) *pb.Message {
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextID, // Same context as initial message
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Can you also include trend analysis in the report?",
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"follows_message": structpb.NewStringValue(previousMessageID),
"conversation_turn": structpb.NewNumberValue(2),
},
},
}
}
Agent Responses in Context
Agents respond within the same conversation context:
func createAgentResponse(contextID, requestMessageID, response string) *pb.Message {
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextID, // Same context as request
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: response,
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"responding_to": structpb.NewStringValue(requestMessageID),
"agent_id": structpb.NewStringValue("data_analysis_agent"),
},
},
}
}
Context-Aware Task Management
Creating Tasks with Context
Link tasks to conversation contexts:
import (
"google.golang.org/protobuf/types/known/timestamppb"
)
func createContextAwareTask(contextID string) *pb.Task {
taskID := fmt.Sprintf("task_%s_%s", "analysis", uuid.New().String())
return &pb.Task{
Id: taskID,
ContextId: contextID, // Link to conversation
Status: &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_SUBMITTED,
Update: &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextID,
TaskId: taskID,
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Task submitted for data analysis workflow",
},
},
},
},
Timestamp: timestamppb.Now(),
},
History: []*pb.Message{}, // Will be populated during processing
Artifacts: []*pb.Artifact{}, // Will be populated on completion
}
}
Context-Based Task Querying
Retrieve all tasks for a conversation context:
func getTasksForContext(ctx context.Context, client pb.AgentHubClient, contextID string) ([]*pb.Task, error) {
response, err := client.ListTasks(ctx, &pb.ListTasksRequest{
ContextId: contextID,
Limit: 100,
})
if err != nil {
return nil, err
}
return response.GetTasks(), nil
}
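A coordinator can use this to check workflow progress. For example, a sketch building on getTasksForContext above (log is the standard library logger):
tasks, err := getTasksForContext(ctx, client, contextID)
if err != nil {
	return err
}
for _, task := range tasks {
	// Report each task's lifecycle state within the conversation.
	log.Printf("task %s in context %s: state=%s",
		task.GetId(), task.GetContextId(), task.GetStatus().GetState())
}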
Workflow Coordination
Multi-Agent Workflow with Shared Context
Coordinate multiple agents within a single conversation:
type WorkflowCoordinator struct {
client pb.AgentHubClient
contextID string
logger *log.Logger
}
func (wc *WorkflowCoordinator) ExecuteDataPipeline(ctx context.Context) error {
// Step 1: Data Validation
validationTask := &pb.Task{
Id: fmt.Sprintf("task_validation_%s", uuid.New().String()),
ContextId: wc.contextID,
Status: &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_SUBMITTED,
Update: &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: wc.contextID,
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Validate uploaded dataset for quality and completeness",
},
},
},
},
Timestamp: timestamppb.Now(),
},
}
// Publish validation task
_, err := wc.client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
Task: validationTask,
Routing: &pb.AgentEventMetadata{
FromAgentId: "workflow_coordinator",
ToAgentId: "data_validator",
EventType: "task.validation",
Priority: pb.Priority_PRIORITY_HIGH,
},
})
if err != nil {
return err
}
// Step 2: Data Processing (after validation)
processingTask := &pb.Task{
Id: fmt.Sprintf("task_processing_%s", uuid.New().String()),
ContextId: wc.contextID, // Same context
Status: &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_SUBMITTED,
Update: &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: wc.contextID,
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Process validated dataset and extract features",
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"depends_on": structpb.NewStringValue(validationTask.GetId()),
"workflow_step": structpb.NewNumberValue(2),
},
},
},
Timestamp: timestamppb.Now(),
},
}
// Publish processing task
_, err = wc.client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
Task: processingTask,
Routing: &pb.AgentEventMetadata{
FromAgentId: "workflow_coordinator",
ToAgentId: "data_processor",
EventType: "task.processing",
Priority: pb.Priority_PRIORITY_MEDIUM,
},
})
return err
}
Context State Management
Tracking Conversation State
Maintain state throughout the conversation:
type ConversationState struct {
ContextID string `json:"context_id"`
WorkflowType string `json:"workflow_type"`
CurrentStep int `json:"current_step"`
TotalSteps int `json:"total_steps"`
CompletedTasks []string `json:"completed_tasks"`
PendingTasks []string `json:"pending_tasks"`
Variables map[string]interface{} `json:"variables"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
}
func (cs *ConversationState) ToMetadata() (*structpb.Struct, error) {
data := map[string]interface{}{
"context_id": cs.ContextID,
"workflow_type": cs.WorkflowType,
"current_step": cs.CurrentStep,
"total_steps": cs.TotalSteps,
"completed_tasks": cs.CompletedTasks,
"pending_tasks": cs.PendingTasks,
"variables": cs.Variables,
"updated_at": cs.UpdatedAt.Format(time.RFC3339),
}
return structpb.NewStruct(data)
}
func (cs *ConversationState) UpdateFromMessage(message *pb.Message) {
cs.UpdatedAt = time.Now()
// Extract state updates from message metadata
if metadata := message.GetMetadata(); metadata != nil {
if step, ok := metadata.GetFields()["current_step"]; ok {
cs.CurrentStep = int(step.GetNumberValue())
}
if vars, ok := metadata.GetFields()["variables"]; ok {
if varsStruct := vars.GetStructValue(); varsStruct != nil {
for key, value := range varsStruct.GetFields() {
cs.Variables[key] = value
}
}
}
}
}
State-Aware Message Creation
Include conversation state in messages:
func createStateAwareMessage(contextID string, state *ConversationState, content string) *pb.Message {
stateMetadata, _ := state.ToMetadata()
return &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextID,
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: content,
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: stateMetadata,
Description: "Current conversation state",
},
},
},
},
Metadata: stateMetadata,
}
}
Context-Based Routing
Route Messages Based on Context
Use conversation context for intelligent routing:
func routeByContext(contextID string) *pb.AgentEventMetadata {
// Determine routing based on context type
var targetAgent string
var eventType string
if strings.Contains(contextID, "data_analysis") {
targetAgent = "data_analysis_agent"
eventType = "data.analysis"
} else if strings.Contains(contextID, "image_processing") {
targetAgent = "image_processor"
eventType = "image.processing"
} else if strings.Contains(contextID, "user_support") {
targetAgent = "support_agent"
eventType = "support.request"
} else {
targetAgent = "" // Broadcast to all agents
eventType = "general.message"
}
return &pb.AgentEventMetadata{
FromAgentId: "context_router",
ToAgentId: targetAgent,
EventType: eventType,
Subscriptions: []string{eventType},
Priority: pb.Priority_PRIORITY_MEDIUM,
}
}
Subscribe to Context-Specific Events
Agents can subscribe to specific conversation contexts:
func subscribeToContextEvents(ctx context.Context, client pb.AgentHubClient, agentID, contextPattern string) error {
stream, err := client.SubscribeToMessages(ctx, &pb.SubscribeToMessagesRequest{
AgentId: agentID,
ContextPattern: contextPattern, // e.g., "ctx_data_analysis_*"
})
if err != nil {
return err
}
for {
event, err := stream.Recv()
if err != nil {
return err
}
if message := event.GetMessage(); message != nil {
log.Printf("Received context message: %s in context: %s",
message.GetMessageId(), message.GetContextId())
// Process message within context
processContextMessage(ctx, message)
}
}
}
Best Practices
1. Use Descriptive Context IDs
contextID := fmt.Sprintf("ctx_%s_%s_%s", workflowType, userID, uuid.New().String())
2. Preserve Context Across All Related Messages
// All messages in the same workflow should use the same context_id
message.ContextId = existingContextID
3. Include Context Metadata for State Tracking
contextMetadata := map[string]interface{}{
"workflow_type": "data_pipeline",
"initiated_by": userID,
"current_step": stepNumber,
"total_steps": totalSteps,
}
4. Use Context for Task Dependencies
taskMetadata := map[string]interface{}{
"context_id": contextID,
"depends_on": previousTaskID,
"workflow_step": stepNumber,
}
5. Handle Context Cleanup
// Set context expiration for long-running workflows
contextMetadata["expires_at"] = time.Now().Add(24 * time.Hour).Format(time.RFC3339)
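Agents can then honor this convention before doing work. A minimal sketch, assuming the expires_at field set above (requires the "time" and structpb imports):
// contextExpired reports whether a context's expires_at timestamp has passed.
func contextExpired(metadata *structpb.Struct) bool {
	if metadata == nil {
		return false
	}
	v, ok := metadata.GetFields()["expires_at"]
	if !ok {
		return false // no expiration set
	}
	expiry, err := time.Parse(time.RFC3339, v.GetStringValue())
	if err != nil {
		return false // malformed timestamp: treat as non-expiring
	}
	return time.Now().After(expiry)
}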
This guide covered conversation context management in A2A protocol. Next, learn about Working with A2A Artifacts to understand how to create and manage structured outputs from completed tasks.
2.3.3 - How to Work with A2A Artifacts
Learn how to create, structure, and deliver Agent2Agent protocol artifacts as structured outputs from completed tasks.
How to Work with A2A Artifacts
This guide shows you how to create and work with Agent2Agent (A2A) protocol artifacts, which are structured outputs delivered when tasks are completed. Artifacts provide rich, typed results that can include text reports, data files, structured data, and more.
Understanding A2A Artifacts
A2A artifacts are structured containers for task outputs that include:
- Artifact ID: Unique identifier for the artifact
- Name: Human-readable name for the artifact
- Description: Explanation of what the artifact contains
- Parts: The actual content (text, data, files)
- Metadata: Additional context about the artifact
Artifacts are typically generated when tasks reach TASK_STATE_COMPLETED status.
Creating Basic Artifacts
Text Report Artifacts
Create simple text-based results:
package main
import (
"fmt"
"github.com/google/uuid"
pb "github.com/owulveryck/agenthub/events/a2a"
"google.golang.org/protobuf/types/known/structpb"
)
func createTextReportArtifact(taskID, reportContent string) *pb.Artifact {
return &pb.Artifact{
ArtifactId: fmt.Sprintf("artifact_%s_%s", taskID, uuid.New().String()),
Name: "Analysis Report",
Description: "Detailed analysis results and recommendations",
Parts: []*pb.Part{
{
Part: &pb.Part_Text{
Text: reportContent,
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"artifact_type": structpb.NewStringValue("report"),
"format": structpb.NewStringValue("text"),
"task_id": structpb.NewStringValue(taskID),
"generated_at": structpb.NewStringValue(time.Now().Format(time.RFC3339)),
},
},
}
}
Data Analysis Artifacts
Create artifacts with structured analysis results:
func createDataAnalysisArtifact(taskID string, results map[string]interface{}) *pb.Artifact {
// Convert results to structured data
resultsData, err := structpb.NewStruct(results)
if err != nil {
log.Printf("Error creating results data: %v", err)
resultsData = &structpb.Struct{}
}
// Create summary statistics
summary := map[string]interface{}{
"total_records": results["record_count"],
"processing_time": results["duration_ms"],
"success_rate": results["success_percentage"],
"anomalies_found": results["anomaly_count"],
}
summaryData, _ := structpb.NewStruct(summary)
return &pb.Artifact{
ArtifactId: fmt.Sprintf("artifact_analysis_%s", uuid.New().String()),
Name: "Data Analysis Results",
Description: "Complete analysis results with statistics and insights",
Parts: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Data analysis completed successfully. See attached results for detailed findings.",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: resultsData,
Description: "Complete analysis results",
},
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: summaryData,
Description: "Summary statistics",
},
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"artifact_type": structpb.NewStringValue("analysis"),
"analysis_type": structpb.NewStringValue("statistical"),
"data_source": structpb.NewStringValue(results["source"].(string)),
"record_count": structpb.NewNumberValue(results["record_count"].(float64)),
"processing_time": structpb.NewNumberValue(results["duration_ms"].(float64)),
},
},
}
}
File-Based Artifacts
Create artifacts that reference generated files:
func createFileArtifact(taskID, fileID, filename, mimeType string, sizeBytes int64) *pb.Artifact {
// File metadata
fileMetadata, _ := structpb.NewStruct(map[string]interface{}{
"generated_by": "data_processor_v1.2",
"file_version": "1.0",
"encoding": "utf-8",
"compression": "gzip",
"checksum_sha256": "abc123...", // Calculate actual checksum
})
return &pb.Artifact{
ArtifactId: fmt.Sprintf("artifact_file_%s", uuid.New().String()),
Name: "Processed Dataset",
Description: "Cleaned and processed dataset ready for analysis",
Parts: []*pb.Part{
{
Part: &pb.Part_Text{
Text: fmt.Sprintf("Dataset processing completed. Generated file: %s", filename),
},
},
{
Part: &pb.Part_File{
File: &pb.FilePart{
FileId: fileID,
Filename: filename,
MimeType: mimeType,
SizeBytes: sizeBytes,
Metadata: fileMetadata,
},
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"artifact_type": structpb.NewStringValue("file"),
"file_type": structpb.NewStringValue("dataset"),
"original_task": structpb.NewStringValue(taskID),
"processing_stage": structpb.NewStringValue("cleaned"),
},
},
}
}
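The checksum placeholder above can be computed with Go's standard library. A minimal sketch:
import (
	"crypto/sha256"
	"encoding/hex"
)

// checksumSHA256 returns the hex digest for the checksum_sha256 metadata field.
func checksumSHA256(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}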
Complex Multi-Part Artifacts
Complete Analysis Package
Create comprehensive artifacts with multiple content types:
func createCompleteAnalysisArtifact(taskID string, analysisResults map[string]interface{}) *pb.Artifact {
// Executive summary
summary := fmt.Sprintf(`
Analysis Complete: %s
Key Findings:
- Processed %v records
- Found %v anomalies
- Success rate: %v%%
- Processing time: %v ms
Recommendations:
%s
`,
analysisResults["dataset_name"],
analysisResults["record_count"],
analysisResults["anomaly_count"],
analysisResults["success_percentage"],
analysisResults["duration_ms"],
analysisResults["recommendations"],
)
// Detailed results data
detailedResults, _ := structpb.NewStruct(analysisResults)
// Configuration used
configData, _ := structpb.NewStruct(map[string]interface{}{
"algorithm": "statistical_analysis_v2",
"confidence_level": 0.95,
"outlier_threshold": 2.5,
"normalization": "z-score",
})
return &pb.Artifact{
ArtifactId: fmt.Sprintf("artifact_complete_%s", uuid.New().String()),
Name: "Complete Analysis Package",
Description: "Full analysis results including summary, data, configuration, and generated files",
Parts: []*pb.Part{
{
Part: &pb.Part_Text{
Text: summary,
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: detailedResults,
Description: "Detailed analysis results and metrics",
},
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: configData,
Description: "Analysis configuration parameters",
},
},
},
{
Part: &pb.Part_File{
File: &pb.FilePart{
FileId: "results_visualization_123",
Filename: "analysis_charts.png",
MimeType: "image/png",
SizeBytes: 1024000,
},
},
},
{
Part: &pb.Part_File{
File: &pb.FilePart{
FileId: "results_dataset_456",
Filename: "processed_data.csv",
MimeType: "text/csv",
SizeBytes: 5120000,
},
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"artifact_type": structpb.NewStringValue("complete_package"),
"analysis_type": structpb.NewStringValue("comprehensive"),
"includes_files": structpb.NewBoolValue(true),
"includes_data": structpb.NewBoolValue(true),
"includes_summary": structpb.NewBoolValue(true),
"file_count": structpb.NewNumberValue(2),
"total_size_bytes": structpb.NewNumberValue(6144000),
},
},
}
}
Publishing Artifacts
Using A2A Task Completion
Publish artifacts when completing tasks:
import (
"context"
"github.com/owulveryck/agenthub/internal/agenthub"
eventbus "github.com/owulveryck/agenthub/events/eventbus"
)
func completeTaskWithArtifact(ctx context.Context, client eventbus.AgentHubClient, task *pb.Task, artifact *pb.Artifact) error {
// Update task status to completed
task.Status = &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_COMPLETED,
Update: &pb.Message{
MessageId: fmt.Sprintf("msg_completion_%s", uuid.New().String()),
ContextId: task.GetContextId(),
TaskId: task.GetId(),
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Task completed successfully. Artifact has been generated.",
},
},
},
},
Timestamp: timestamppb.Now(),
}
// Add artifact to task
task.Artifacts = append(task.Artifacts, artifact)
// Publish task completion
_, err := client.PublishTaskUpdate(ctx, &eventbus.PublishTaskUpdateRequest{
Task: task,
Routing: &eventbus.AgentEventMetadata{
FromAgentId: "processing_agent",
ToAgentId: "", // Broadcast completion
EventType: "task.completed",
Priority: eventbus.Priority_PRIORITY_MEDIUM,
},
})
if err != nil {
return fmt.Errorf("failed to publish task completion: %w", err)
}
// Separately publish artifact update
return publishArtifactUpdate(ctx, client, task.GetId(), artifact)
}
func publishArtifactUpdate(ctx context.Context, client eventbus.AgentHubClient, taskID string, artifact *pb.Artifact) error {
_, err := client.PublishTaskArtifact(ctx, &eventbus.PublishTaskArtifactRequest{
TaskId: taskID,
Artifact: artifact,
Routing: &eventbus.AgentEventMetadata{
FromAgentId: "processing_agent",
ToAgentId: "", // Broadcast to interested parties
EventType: "artifact.created",
Priority: eventbus.Priority_PRIORITY_LOW,
},
})
return err
}
Using A2A Abstractions
Use AgentHub’s simplified artifact publishing:
func completeTaskWithA2AArtifact(ctx context.Context, subscriber *agenthub.A2ATaskSubscriber, task *pb.Task, artifact *pb.Artifact) error {
return subscriber.CompleteA2ATaskWithArtifact(ctx, task, artifact)
}
Processing Received Artifacts
Artifact Event Handling
Handle incoming artifact notifications:
func handleArtifactEvents(ctx context.Context, client eventbus.AgentHubClient, agentID string) error {
stream, err := client.SubscribeToAgentEvents(ctx, &eventbus.SubscribeToAgentEventsRequest{
AgentId: agentID,
EventTypes: []string{"artifact.created", "task.completed"},
})
if err != nil {
return err
}
for {
event, err := stream.Recv()
if err != nil {
return err
}
switch payload := event.GetPayload().(type) {
case *eventbus.AgentEvent_ArtifactUpdate:
artifactEvent := payload.ArtifactUpdate
log.Printf("Received artifact: %s for task: %s",
artifactEvent.GetArtifact().GetArtifactId(),
artifactEvent.GetTaskId())
// Process the artifact
err := processArtifact(ctx, artifactEvent.GetArtifact())
if err != nil {
log.Printf("Error processing artifact: %v", err)
}
case *eventbus.AgentEvent_Task:
task := payload.Task
if task.GetStatus().GetState() == pb.TaskState_TASK_STATE_COMPLETED {
// Process completed task artifacts
for _, artifact := range task.GetArtifacts() {
err := processArtifact(ctx, artifact)
if err != nil {
log.Printf("Error processing task artifact: %v", err)
}
}
}
}
}
}
Artifact Content Processing
Process different types of artifact content:
func processArtifact(ctx context.Context, artifact *pb.Artifact) error {
log.Printf("Processing artifact: %s - %s", artifact.GetName(), artifact.GetDescription())
for i, part := range artifact.GetParts() {
switch content := part.GetPart().(type) {
case *pb.Part_Text:
log.Printf("Text part %d: Processing text content (%d chars)", i, len(content.Text))
// Process text content
err := processTextArtifact(content.Text)
if err != nil {
return fmt.Errorf("failed to process text part: %w", err)
}
case *pb.Part_Data:
log.Printf("Data part %d: Processing structured data (%s)", i, content.Data.GetDescription())
// Process structured data
err := processDataArtifact(content.Data.GetData())
if err != nil {
return fmt.Errorf("failed to process data part: %w", err)
}
case *pb.Part_File:
log.Printf("File part %d: Processing file %s (%s, %d bytes)",
i, content.File.GetFilename(), content.File.GetMimeType(), content.File.GetSizeBytes())
// Process file reference
err := processFileArtifact(ctx, content.File)
if err != nil {
return fmt.Errorf("failed to process file part: %w", err)
}
}
}
return nil
}
func processTextArtifact(text string) error {
// Extract insights, save to database, etc.
log.Printf("Extracting insights from text artifact...")
return nil
}
func processDataArtifact(data *structpb.Struct) error {
// Parse structured data, update metrics, etc.
log.Printf("Processing structured data artifact...")
// Access specific fields
if recordCount, ok := data.GetFields()["record_count"]; ok {
log.Printf("Records processed: %v", recordCount.GetNumberValue())
}
return nil
}
func processFileArtifact(ctx context.Context, file *pb.FilePart) error {
// Download file, process content, etc.
log.Printf("Processing file artifact: %s", file.GetFileId())
// Handle different file types
switch file.GetMimeType() {
case "text/csv":
return processCSVFile(ctx, file.GetFileId())
case "image/png", "image/jpeg":
return processImageFile(ctx, file.GetFileId())
case "application/json":
return processJSONFile(ctx, file.GetFileId())
default:
log.Printf("Unknown file type: %s", file.GetMimeType())
}
return nil
}
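The file handlers referenced above (processCSVFile, processImageFile, processJSONFile) depend on your storage backend and are not defined in this guide. A minimal sketch of processCSVFile, assuming a hypothetical downloadFile helper that resolves a file ID to raw bytes:
func processCSVFile(ctx context.Context, fileID string) error {
    // downloadFile is a hypothetical helper that fetches the file's bytes
    // from your storage backend by file ID.
    data, err := downloadFile(ctx, fileID)
    if err != nil {
        return fmt.Errorf("failed to download file %s: %w", fileID, err)
    }
    // Parse the CSV payload (requires "bytes" and "encoding/csv" imports).
    records, err := csv.NewReader(bytes.NewReader(data)).ReadAll()
    if err != nil {
        return fmt.Errorf("failed to parse CSV: %w", err)
    }
    log.Printf("Parsed %d CSV rows from file %s", len(records), fileID)
    return nil
}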
Artifact Chaining
Use artifacts from one task as inputs to another:
func chainArtifactProcessing(ctx context.Context, client eventbus.AgentHubClient, inputArtifact *pb.Artifact) error {
// Create a new task using the artifact as input
contextID := fmt.Sprintf("ctx_chained_%s", uuid.New().String())
chainedTask := &pb.Task{
Id: fmt.Sprintf("task_chained_%s", uuid.New().String()),
ContextId: contextID,
Status: &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_SUBMITTED,
Update: &pb.Message{
MessageId: fmt.Sprintf("msg_%s", uuid.New().String()),
ContextId: contextID,
Role: pb.Role_USER,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Please process the results from the previous analysis task.",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: &structpb.Struct{
Fields: map[string]*structpb.Value{
"input_artifact_id": structpb.NewStringValue(inputArtifact.GetArtifactId()),
"processing_type": structpb.NewStringValue("enhancement"),
},
},
Description: "Processing parameters with input artifact reference",
},
},
},
},
},
Timestamp: timestamppb.Now(),
},
}
// Publish the chained task
_, err := client.PublishTaskUpdate(ctx, &eventbus.PublishTaskUpdateRequest{
Task: chainedTask,
Routing: &eventbus.AgentEventMetadata{
FromAgentId: "workflow_coordinator",
ToAgentId: "enhancement_processor",
EventType: "task.chained",
Priority: eventbus.Priority_PRIORITY_MEDIUM,
},
})
return err
}
Best Practices
1. Use Descriptive Artifact Names and Descriptions
artifact := &pb.Artifact{
Name: "Customer Segmentation Analysis Results",
Description: "Complete customer segmentation with demographics, behavior patterns, and actionable insights",
// ...
}
2. Include Comprehensive Metadata
metadata := map[string]interface{}{
"artifact_type": "analysis",
"domain": "customer_analytics",
"data_source": "customer_transactions_2024",
"algorithm": "k_means_clustering",
"confidence": 0.94,
"generated_by": "analytics_engine_v2.1",
"valid_until": time.Now().Add(30*24*time.Hour).Format(time.RFC3339),
}
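To attach such a map to an artifact, convert it with structpb.NewStruct. This is a sketch that assumes the Artifact message exposes a Metadata field, as Task does:
metadataStruct, err := structpb.NewStruct(metadata)
if err != nil {
    return fmt.Errorf("metadata contains structpb-incompatible types: %w", err)
}
artifact.Metadata = metadataStruct // assumes Artifact has a Metadata field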
3. Structure Multi-Part Artifacts Logically
// Order parts from most important to least important
parts := []*pb.Part{
textSummaryPart, // Human-readable summary first
structuredDataPart, // Machine-readable data second
configurationPart, // Configuration details third
fileReferencePart, // File references last
}
4. Validate Artifacts Before Publishing
func validateArtifact(artifact *pb.Artifact) error {
if artifact.GetArtifactId() == "" {
return fmt.Errorf("artifact_id is required")
}
if len(artifact.GetParts()) == 0 {
return fmt.Errorf("artifact must have at least one part")
}
return nil
}
5. Handle Large Artifacts Appropriately
// For large data, use file references instead of inline data
if len(dataBytes) > 1024*1024 { // 1MB threshold
// Save to file storage and reference
fileID := saveToFileStorage(dataBytes)
part = createFileReferencePart(fileID, filename, mimeType)
} else {
// Include data inline
part = createInlineDataPart(data)
}
This guide covered creating and working with A2A artifacts. Next, learn about A2A Task Lifecycle Management to understand how to properly manage task states and coordinate complex workflows.
2.3.4 - How to Work with A2A Task Lifecycle
Learn how to manage Agent2Agent protocol task states, handle lifecycle transitions, and coordinate complex task workflows.
How to Work with A2A Task Lifecycle
This guide shows you how to manage the complete lifecycle of Agent2Agent (A2A) protocol tasks, from creation through completion. Understanding task states and transitions is essential for building reliable agent workflows.
Understanding A2A Task States
A2A tasks progress through the following states:
- TASK_STATE_SUBMITTED: Task created and submitted for processing
- TASK_STATE_WORKING: Task accepted and currently being processed
- TASK_STATE_COMPLETED: Task finished successfully with results
- TASK_STATE_FAILED: Task failed with error information
- TASK_STATE_CANCELLED: Task cancelled before completion
Each state transition is recorded with a timestamp and status message.
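A guard like the following can reject invalid transitions before an update is published. This is an illustrative sketch, not part of the AgentHub API:
// validTransitions encodes the state machine described above.
var validTransitions = map[pb.TaskState][]pb.TaskState{
    pb.TaskState_TASK_STATE_SUBMITTED: {
        pb.TaskState_TASK_STATE_WORKING,
        pb.TaskState_TASK_STATE_CANCELLED,
    },
    pb.TaskState_TASK_STATE_WORKING: {
        pb.TaskState_TASK_STATE_COMPLETED,
        pb.TaskState_TASK_STATE_FAILED,
        pb.TaskState_TASK_STATE_CANCELLED,
    },
}

// canTransition reports whether a state change is allowed.
func canTransition(from, to pb.TaskState) bool {
    for _, allowed := range validTransitions[from] {
        if allowed == to {
            return true
        }
    }
    return false
}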
Creating A2A Tasks
Basic Task Creation
Create a new task with initial state:
package main
import (
"fmt"
"time"
"github.com/google/uuid"
pb "github.com/owulveryck/agenthub/events/a2a"
"google.golang.org/protobuf/types/known/structpb"
"google.golang.org/protobuf/types/known/timestamppb"
)
func createA2ATask(contextID, taskType string, content []*pb.Part) *pb.Task {
taskID := fmt.Sprintf("task_%s_%s", taskType, uuid.New().String())
messageID := fmt.Sprintf("msg_%s", uuid.New().String())
return &pb.Task{
Id: taskID,
ContextId: contextID,
Status: &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_SUBMITTED,
Update: &pb.Message{
MessageId: messageID,
ContextId: contextID,
TaskId: taskID,
Role: pb.Role_USER,
Content: content,
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"task_type": structpb.NewStringValue(taskType),
"submitted_by": structpb.NewStringValue("user_agent"),
"priority": structpb.NewStringValue("medium"),
},
},
},
Timestamp: timestamppb.Now(),
},
History: []*pb.Message{},
Artifacts: []*pb.Artifact{},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"task_type": structpb.NewStringValue(taskType),
"created_at": structpb.NewStringValue(time.Now().Format(time.RFC3339)),
"expected_duration": structpb.NewStringValue("5m"),
},
},
}
}
Task with Complex Requirements
Create tasks with detailed specifications:
func createComplexAnalysisTask(contextID string) *pb.Task {
// Task configuration
taskConfig, _ := structpb.NewStruct(map[string]interface{}{
"algorithm": "advanced_ml_analysis",
"confidence_level": 0.95,
"max_processing_time": "30m",
"output_formats": []string{"json", "csv", "visualization"},
"quality_threshold": 0.9,
})
// Input data specification
inputSpec, _ := structpb.NewStruct(map[string]interface{}{
"dataset_id": "customer_data_2024",
"required_fields": []string{"customer_id", "transaction_amount", "timestamp"},
"date_range": map[string]string{"start": "2024-01-01", "end": "2024-12-31"},
"preprocessing": true,
})
content := []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Perform comprehensive customer behavior analysis on the specified dataset with advanced ML algorithms.",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: taskConfig,
Description: "Analysis configuration parameters",
},
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: inputSpec,
Description: "Input dataset specification",
},
},
},
}
task := createA2ATask(contextID, "customer_analysis", content)
// Add complex task metadata
task.Metadata = &structpb.Struct{
Fields: map[string]*structpb.Value{
"task_type": structpb.NewStringValue("customer_analysis"),
"complexity": structpb.NewStringValue("high"),
"estimated_duration": structpb.NewStringValue("30m"),
"required_resources": structpb.NewListValue(&structpb.ListValue{
Values: []*structpb.Value{
structpb.NewStringValue("gpu_compute"),
structpb.NewStringValue("large_memory"),
},
}),
"deliverables": structpb.NewListValue(&structpb.ListValue{
Values: []*structpb.Value{
structpb.NewStringValue("analysis_report"),
structpb.NewStringValue("customer_segments"),
structpb.NewStringValue("predictions"),
},
}),
},
}
return task
}
Task State Transitions
Accepting a Task (SUBMITTED → WORKING)
When an agent accepts a task:
func acceptTask(task *pb.Task, agentID string) *pb.Task {
// Create acceptance message
acceptanceMessage := &pb.Message{
MessageId: fmt.Sprintf("msg_accept_%s", uuid.New().String()),
ContextId: task.GetContextId(),
TaskId: task.GetId(),
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: fmt.Sprintf("Task accepted by agent %s. Beginning processing.", agentID),
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"accepting_agent": structpb.NewStringValue(agentID),
"estimated_completion": structpb.NewStringValue(
time.Now().Add(15*time.Minute).Format(time.RFC3339),
),
},
},
}
// Update task status
task.Status = &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_WORKING,
Update: acceptanceMessage,
Timestamp: timestamppb.Now(),
}
// Add to history
task.History = append(task.History, acceptanceMessage)
return task
}
Progress Updates (WORKING → WORKING)
Send progress updates during processing:
func sendProgressUpdate(task *pb.Task, progressPercentage int, currentPhase, details string) *pb.Task {
// Create progress data
progressData, _ := structpb.NewStruct(map[string]interface{}{
"progress_percentage": progressPercentage,
"current_phase": currentPhase,
"details": details,
"estimated_remaining": calculateRemainingTime(progressPercentage),
"memory_usage_mb": getCurrentMemoryUsage(),
"cpu_usage_percent": getCurrentCPUUsage(),
})
progressMessage := &pb.Message{
MessageId: fmt.Sprintf("msg_progress_%s_%d", uuid.New().String(), progressPercentage),
ContextId: task.GetContextId(),
TaskId: task.GetId(),
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: fmt.Sprintf("Progress update: %d%% complete. Current phase: %s",
progressPercentage, currentPhase),
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: progressData,
Description: "Detailed progress information",
},
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"update_type": structpb.NewStringValue("progress"),
"progress_percentage": structpb.NewNumberValue(float64(progressPercentage)),
"phase": structpb.NewStringValue(currentPhase),
},
},
}
// Update task status (still WORKING, but with new message)
task.Status = &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_WORKING,
Update: progressMessage,
Timestamp: timestamppb.Now(),
}
// Add to history
task.History = append(task.History, progressMessage)
return task
}
func calculateRemainingTime(progressPercentage int) string {
if progressPercentage <= 0 {
return "unknown"
}
// Simplified estimation logic
remainingMinutes := (100 - progressPercentage) * 15 / 100
return fmt.Sprintf("%dm", remainingMinutes)
}
Completing a Task (WORKING → COMPLETED)
Complete a task with results:
func completeTask(task *pb.Task, results string, artifacts []*pb.Artifact) *pb.Task {
// Create completion message
completionMessage := &pb.Message{
MessageId: fmt.Sprintf("msg_complete_%s", uuid.New().String()),
ContextId: task.GetContextId(),
TaskId: task.GetId(),
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: fmt.Sprintf("Task completed successfully. %s", results),
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"completion_status": structpb.NewStringValue("success"),
"processing_time": structpb.NewStringValue(
time.Since(getTaskStartTime(task)).String(),
),
"artifact_count": structpb.NewNumberValue(float64(len(artifacts))),
},
},
}
// Update task status
task.Status = &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_COMPLETED,
Update: completionMessage,
Timestamp: timestamppb.Now(),
}
// Add completion message to history
task.History = append(task.History, completionMessage)
// Add artifacts
task.Artifacts = append(task.Artifacts, artifacts...)
return task
}
Handling Task Failures (WORKING → FAILED)
Handle task failures with detailed error information:
func failTask(task *pb.Task, errorMessage, errorCode string, errorDetails map[string]interface{}) *pb.Task {
// Create error data
errorData, _ := structpb.NewStruct(map[string]interface{}{
"error_code": errorCode,
"error_message": errorMessage,
"error_details": errorDetails,
"failure_phase": getCurrentProcessingPhase(task),
"retry_possible": determineRetryPossibility(errorCode),
"diagnostic_info": map[string]interface{}{
"memory_at_failure": getCurrentMemoryUsage(),
"cpu_at_failure": getCurrentCPUUsage(),
"logs_reference": getLogReference(),
},
})
failureMessage := &pb.Message{
MessageId: fmt.Sprintf("msg_failure_%s", uuid.New().String()),
ContextId: task.GetContextId(),
TaskId: task.GetId(),
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: fmt.Sprintf("Task failed: %s (Code: %s)", errorMessage, errorCode),
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: errorData,
Description: "Detailed error information and diagnostics",
},
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"failure_type": structpb.NewStringValue("processing_error"),
"error_code": structpb.NewStringValue(errorCode),
"retry_possible": structpb.NewBoolValue(determineRetryPossibility(errorCode)),
},
},
}
// Update task status
task.Status = &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_FAILED,
Update: failureMessage,
Timestamp: timestamppb.Now(),
}
// Add failure message to history
task.History = append(task.History, failureMessage)
return task
}
func determineRetryPossibility(errorCode string) bool {
// Determine if the error is retryable
retryableErrors := []string{
"TEMPORARY_RESOURCE_UNAVAILABLE",
"NETWORK_TIMEOUT",
"RATE_LIMIT_EXCEEDED",
}
for _, retryable := range retryableErrors {
if errorCode == retryable {
return true
}
}
return false
}
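For retryable errors, a caller can resubmit the task with exponential backoff. A sketch, where resubmitTask is a hypothetical helper that republishes the task through your AgentHub client:
func retryWithBackoff(ctx context.Context, task *pb.Task, errorCode string, maxAttempts int) error {
    if !determineRetryPossibility(errorCode) {
        return fmt.Errorf("error %s is not retryable", errorCode)
    }
    delay := time.Second
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(delay):
        }
        // resubmitTask is a hypothetical helper, not part of the AgentHub API.
        if err := resubmitTask(ctx, task); err == nil {
            return nil
        }
        delay *= 2 // double the wait between attempts
    }
    return fmt.Errorf("task %s still failing after %d attempts", task.GetId(), maxAttempts)
}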
Cancelling Tasks (ANY → CANCELLED)
Handle task cancellation:
func cancelTask(task *pb.Task, reason, cancelledBy string) *pb.Task {
cancellationMessage := &pb.Message{
MessageId: fmt.Sprintf("msg_cancel_%s", uuid.New().String()),
ContextId: task.GetContextId(),
TaskId: task.GetId(),
Role: pb.Role_AGENT,
Content: []*pb.Part{
{
Part: &pb.Part_Text{
Text: fmt.Sprintf("Task cancelled: %s", reason),
},
},
},
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"cancellation_reason": structpb.NewStringValue(reason),
"cancelled_by": structpb.NewStringValue(cancelledBy),
"previous_state": structpb.NewStringValue(task.GetStatus().GetState().String()),
},
},
}
// Update task status
task.Status = &pb.TaskStatus{
State: pb.TaskState_TASK_STATE_CANCELLED,
Update: cancellationMessage,
Timestamp: timestamppb.Now(),
}
// Add cancellation message to history
task.History = append(task.History, cancellationMessage)
return task
}
Publishing Task Updates
Using AgentHub Client
Publish task updates through the AgentHub broker:
import (
"context"
eventbus "github.com/owulveryck/agenthub/events/eventbus"
)
func publishTaskUpdate(ctx context.Context, client eventbus.AgentHubClient, task *pb.Task, fromAgent, toAgent string) error {
_, err := client.PublishTaskUpdate(ctx, &eventbus.PublishTaskUpdateRequest{
Task: task,
Routing: &eventbus.AgentEventMetadata{
FromAgentId: fromAgent,
ToAgentId: toAgent,
EventType: fmt.Sprintf("task.%s", task.GetStatus().GetState().String()),
Priority: getPriorityFromTaskState(task.GetStatus().GetState()),
},
})
return err
}
func getPriorityFromTaskState(state pb.TaskState) eventbus.Priority {
switch state {
case pb.TaskState_TASK_STATE_FAILED:
return eventbus.Priority_PRIORITY_HIGH
case pb.TaskState_TASK_STATE_COMPLETED:
return eventbus.Priority_PRIORITY_MEDIUM
case pb.TaskState_TASK_STATE_WORKING:
return eventbus.Priority_PRIORITY_LOW
default:
return eventbus.Priority_PRIORITY_MEDIUM
}
}
Using A2A Abstractions
Use simplified A2A task management:
import (
"github.com/owulveryck/agenthub/internal/agenthub"
)
func manageTaskWithA2A(ctx context.Context, subscriber *agenthub.A2ATaskSubscriber, task *pb.Task) error {
// Process the task
artifact, status, errorMsg := processTaskContent(ctx, task)
switch status {
case pb.TaskState_TASK_STATE_COMPLETED:
return subscriber.CompleteA2ATaskWithArtifact(ctx, task, artifact)
case pb.TaskState_TASK_STATE_FAILED:
return subscriber.FailA2ATask(ctx, task, errorMsg)
default:
return subscriber.UpdateA2ATaskProgress(ctx, task, 50, "Processing data", "Halfway complete")
}
}
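The processTaskContent function above is your business logic and is not provided by the library. A sketch of its expected shape:
// processTaskContent inspects the task's request message and returns an
// artifact, a final state, and an error message on failure (sketch).
func processTaskContent(ctx context.Context, task *pb.Task) (*pb.Artifact, pb.TaskState, string) {
    request := task.GetStatus().GetUpdate()
    if request == nil || len(request.GetContent()) == 0 {
        return nil, pb.TaskState_TASK_STATE_FAILED, "task has no request content"
    }
    artifact := &pb.Artifact{
        ArtifactId: fmt.Sprintf("artifact_%s", task.GetId()),
        Name:       "Processing result",
        Parts: []*pb.Part{
            {Part: &pb.Part_Text{Text: "Processing completed successfully."}},
        },
    }
    return artifact, pb.TaskState_TASK_STATE_COMPLETED, ""
}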
Task Monitoring and Querying
Get Task Status
Query task status and history:
func getTaskStatus(ctx context.Context, client eventbus.AgentHubClient, taskID string) (*pb.Task, error) {
task, err := client.GetTask(ctx, &eventbus.GetTaskRequest{
TaskId: taskID,
HistoryLength: 10, // Get last 10 messages
})
if err != nil {
return nil, err
}
// Log current status
log.Printf("Task %s status: %s", taskID, task.GetStatus().GetState().String())
log.Printf("Last update: %s", task.GetStatus().GetUpdate().GetContent()[0].GetText())
log.Printf("History length: %d messages", len(task.GetHistory()))
log.Printf("Artifacts: %d", len(task.GetArtifacts()))
return task, nil
}
List Tasks by Context
Get all tasks for a conversation context:
func getTasksInContext(ctx context.Context, client eventbus.AgentHubClient, contextID string) ([]*pb.Task, error) {
response, err := client.ListTasks(ctx, &eventbus.ListTasksRequest{
ContextId: contextID,
States: []pb.TaskState{}, // All states
Limit: 100,
})
if err != nil {
return nil, err
}
tasks := response.GetTasks()
log.Printf("Found %d tasks in context %s", len(tasks), contextID)
// Analyze task distribution
stateCount := make(map[pb.TaskState]int)
for _, task := range tasks {
stateCount[task.GetStatus().GetState()]++
}
for state, count := range stateCount {
log.Printf(" %s: %d tasks", state.String(), count)
}
return tasks, nil
}
Workflow Coordination
Sequential Task Workflow
Create dependent tasks that execute in sequence:
type TaskWorkflow struct {
ContextID string
Tasks []*pb.Task
Current int
}
func (tw *TaskWorkflow) ExecuteNext(ctx context.Context, client eventbus.AgentHubClient) error {
if tw.Current >= len(tw.Tasks) {
return fmt.Errorf("workflow completed")
}
currentTask := tw.Tasks[tw.Current]
// Add dependency metadata if not first task
if tw.Current > 0 {
previousTask := tw.Tasks[tw.Current-1]
dependencyMetadata := map[string]interface{}{
"depends_on": previousTask.GetId(),
"workflow_step": tw.Current + 1,
"total_steps": len(tw.Tasks),
}
metadata, _ := structpb.NewStruct(dependencyMetadata)
currentTask.Metadata = metadata
}
// Publish the task
err := publishTaskUpdate(ctx, client, currentTask, "workflow_coordinator", "")
if err != nil {
return err
}
tw.Current++
return nil
}
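Example usage, reusing createA2ATask from earlier in this guide (a sketch; in practice you would wait for each task to complete, e.g. via SubscribeToTasks, before publishing the next):
func runWorkflow(ctx context.Context, client eventbus.AgentHubClient) error {
    contextID := fmt.Sprintf("ctx_workflow_%s", uuid.New().String())
    workflow := &TaskWorkflow{
        ContextID: contextID,
        Tasks: []*pb.Task{
            createA2ATask(contextID, "data_extraction", nil), // content omitted for brevity
            createA2ATask(contextID, "data_analysis", nil),
        },
    }
    for workflow.Current < len(workflow.Tasks) {
        if err := workflow.ExecuteNext(ctx, client); err != nil {
            return err
        }
    }
    return nil
}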
Parallel Task Execution
Execute multiple tasks concurrently:
func executeParallelTasks(ctx context.Context, client eventbus.AgentHubClient, tasks []*pb.Task) error {
var wg sync.WaitGroup
errors := make(chan error, len(tasks))
for _, task := range tasks {
wg.Add(1)
go func(t *pb.Task) {
defer wg.Done()
// Add parallel execution metadata
t.Metadata = &structpb.Struct{
Fields: map[string]*structpb.Value{
"execution_mode": structpb.NewStringValue("parallel"),
"batch_id": structpb.NewStringValue(uuid.New().String()),
"batch_size": structpb.NewNumberValue(float64(len(tasks))),
},
}
err := publishTaskUpdate(ctx, client, t, "parallel_coordinator", "")
if err != nil {
errors <- err
}
}(task)
}
wg.Wait()
close(errors)
// Check for errors
for err := range errors {
if err != nil {
return err
}
}
return nil
}
Best Practices
1. Always Update Task Status
// Update status for every significant state change
task = acceptTask(task, agentID)
publishTaskUpdate(ctx, client, task, agentID, "")
2. Provide Meaningful Progress Updates
// Send regular progress updates during long-running tasks
for progress := 10; progress <= 90; progress += 10 {
task = sendProgressUpdate(task, progress, currentPhase, details)
publishTaskUpdate(ctx, client, task, agentID, "")
time.Sleep(processingInterval)
}
3. Provide Detailed Error Information on Failure
errorDetails := map[string]interface{}{
"input_validation_errors": validationErrors,
"system_resources": resourceSnapshot,
"retry_strategy": "exponential_backoff",
}
task = failTask(task, "Data validation failed", "INVALID_INPUT", errorDetails)
4. Maintain Complete Message History
// Always append to history, never replace
task.History = append(task.History, statusMessage)
5. Include Contextual Metadata
// Include context for debugging and monitoring
metadata := map[string]interface{}{
"processing_node": hostname,
"resource_usage": resourceMetrics,
"performance_metrics": performanceData,
}
This guide covered the complete A2A task lifecycle management. You now have the tools to create, manage, and coordinate complex task workflows with proper state management and comprehensive observability.
2.4 - Debugging
Troubleshooting and debugging guides for AgentHub
Debugging How-to Guides
Practical troubleshooting guides to help you diagnose and resolve issues in your AgentHub deployments.
Available Guides
2.4.1 - How to Debug Agent Issues
Practical steps for troubleshooting common issues when developing and deploying agents with AgentHub.
How to Debug Agent Issues
This guide provides practical steps for troubleshooting common issues when developing and deploying agents with AgentHub.
Common Connection Issues
Problem: Agent Can't Connect to Broker
Symptoms:
Failed to connect: connection refused
Solutions:
Check if broker is running:
# Check if broker process is running
ps aux | grep broker
# Check if port 50051 is listening
netstat -tlnp | grep 50051
# or
lsof -i :50051
Verify broker address and configuration:
// Using unified abstraction - configuration via environment or code
config := agenthub.NewGRPCConfig("subscriber")
config.BrokerAddr = "localhost" // Default
config.BrokerPort = "50051" // Default
// Or set via environment variables:
// export AGENTHUB_BROKER_ADDR="localhost"
// export AGENTHUB_BROKER_PORT="50051"
Check firewall settings:
# On Linux, check if port is blocked
sudo ufw status
# Allow port if needed
sudo ufw allow 50051
Problem: TLS/SSL Errors
Symptoms:
transport: authentication handshake failed
Solution:
The unified abstraction handles TLS configuration automatically:
// TLS and connection management handled automatically
config := agenthub.NewGRPCConfig("subscriber")
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
panic(err)
}
Task Processing Issues
Problem: Agent Not Receiving Tasks
Debug Steps:
Check subscription logs:
log.Printf("Agent %s subscribing to tasks...", agentID)
// Should see: "Successfully subscribed to tasks for agent {agentID}"
Verify agent ID matching:
// In publisher
ResponderAgentId: "my_processing_agent"
// In subscriber (must match exactly)
const agentID = "my_processing_agent"
Check task type filtering:
req := &pb.SubscribeToTasksRequest{
AgentId: agentID,
TaskTypes: []string{"math_calculation"}, // Remove to receive all types
}
Monitor broker logs:
# Broker should show:
Received task request: task_xyz (type: math) from agent: publisher_agent
# And either:
No subscribers for task from agent 'publisher_agent' # Bad - no matching agents
# Or task routing to subscribers # Good - task delivered
Problem: Tasks Timing Out
Debug Steps:
Check task processing time:
func processTask(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
start := time.Now()
defer func() {
log.Printf("Task %s took %v to process", task.GetTaskId(), time.Since(start))
}()
// Your processing logic
}
Add timeout handling:
func processTaskWithTimeout(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
// Create timeout context
taskCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
// Process with timeout
select {
case <-taskCtx.Done():
if taskCtx.Err() == context.DeadlineExceeded {
sendResult(ctx, task, nil, pb.TaskStatus_TASK_STATUS_FAILED, "Task timeout", client)
}
return
default:
// Process normally
}
}
Monitor progress updates:
// Send progress every few seconds
ticker := time.NewTicker(5 * time.Second)
defer ticker.Stop()
go func() {
progress := 0
for range ticker.C {
progress += 10
if progress > 100 {
return
}
sendProgress(ctx, task, int32(progress), "Still processing...", client)
}
}()
Message Serialization Issues
Problem: Parameter Marshaling Errors
Symptoms:
Error creating parameters struct: proto: invalid value type
Solution:
Ensure all parameter values are compatible with structpb:
// Bad - channels, functions, complex types not supported
params := map[string]interface{}{
"callback": func() {}, // Not supported
"channel": make(chan int), // Not supported
}
// Good - basic types only
params := map[string]interface{}{
"name": "value", // string
"count": 42, // number
"enabled": true, // boolean
"items": []string{"a", "b"}, // array
"config": map[string]interface{}{ // nested object
"timeout": 30,
},
}
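If you are unsure whether a map is compatible, structpb.NewStruct validates it and returns an error for unsupported types:
func validateParams(params map[string]interface{}) (*structpb.Struct, error) {
    paramsStruct, err := structpb.NewStruct(params)
    if err != nil {
        // e.g. unsupported values such as channels or functions
        return nil, fmt.Errorf("parameters not structpb-compatible: %w", err)
    }
    return paramsStruct, nil
}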
Problem: Result Unmarshaling Issues
Debug Steps:
Check result structure:
func handleTaskResult(result *pb.TaskResult) {
log.Printf("Raw result: %+v", result.GetResult())
resultMap := result.GetResult().AsMap()
log.Printf("Result as map: %+v", resultMap)
// Type assert carefully
if value, ok := resultMap["count"].(float64); ok {
log.Printf("Count: %f", value)
} else {
log.Printf("Count field missing or wrong type: %T", resultMap["count"])
}
}
Handle type conversion safely:
func getStringField(m map[string]interface{}, key string) (string, error) {
if val, ok := m[key]; ok {
if str, ok := val.(string); ok {
return str, nil
}
return "", fmt.Errorf("field %s is not a string: %T", key, val)
}
return "", fmt.Errorf("field %s not found", key)
}
func getNumberField(m map[string]interface{}, key string) (float64, error) {
if val, ok := m[key]; ok {
if num, ok := val.(float64); ok {
return num, nil
}
return 0, fmt.Errorf("field %s is not a number: %T", key, val)
}
return 0, fmt.Errorf("field %s not found", key)
}
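Combining these helpers keeps result handling robust. The field names below ("status", "record_count") are hypothetical examples:
func summarizeResult(result *pb.TaskResult) {
    m := result.GetResult().AsMap()
    status, err := getStringField(m, "status")
    if err != nil {
        log.Printf("status unavailable: %v", err)
        return
    }
    count, err := getNumberField(m, "record_count")
    if err != nil {
        log.Printf("record_count unavailable: %v", err)
        return
    }
    log.Printf("Result: status=%s, records=%.0f", status, count)
}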
Stream and Connection Issues
Problem: Stream Disconnections
Symptoms:
Error receiving task: rpc error: code = Unavailable desc = connection error
Solutions:
Implement retry logic:
func subscribeToTasksWithRetry(ctx context.Context, client pb.EventBusClient) {
for {
err := subscribeToTasks(ctx, client)
if err != nil {
log.Printf("Subscription error: %v, retrying in 5 seconds...", err)
time.Sleep(5 * time.Second)
continue
}
break
}
}
Handle context cancellation:
for {
task, err := stream.Recv()
if err == io.EOF {
log.Printf("Stream closed by server")
return
}
if err != nil {
if ctx.Err() != nil {
log.Printf("Context cancelled: %v", ctx.Err())
return
}
log.Printf("Stream error: %v", err)
return
}
// Process task
}
Problem: Memory Leaks in Long-Running Agents
Debug Steps:
Monitor memory usage:
# Check memory usage
ps -o pid,ppid,cmd,%mem,%cpu -p $(pgrep -f "your-agent")
# Continuous monitoring
watch -n 5 'ps -o pid,ppid,cmd,%mem,%cpu -p $(pgrep -f "your-agent")'
Profile memory usage:
import _ "net/http/pprof"
import "net/http"
func main() {
// Start pprof server
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
// Your agent code
}
Access profiles at http://localhost:6060/debug/pprof/
Check for goroutine leaks:
import "runtime"
func logGoroutines() {
ticker := time.NewTicker(30 * time.Second)
go func() {
for range ticker.C {
log.Printf("Goroutines: %d", runtime.NumGoroutine())
}
}()
}
Problem: Slow Task Processing
Debug Steps:
Add timing measurements:
func processTask(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
timings := make(map[string]time.Duration)
start := time.Now()
// Phase 1: Parameter validation
timings["validation"] = time.Since(start)
last := time.Now()
// Phase 2: Business logic
// ... your logic here ...
timings["processing"] = time.Since(last)
last = time.Now()
// Phase 3: Result formatting
// ... result creation ...
timings["formatting"] = time.Since(last)
log.Printf("Task %s timings: %+v", task.GetTaskId(), timings)
}
Profile CPU usage:
import "runtime/pprof"
import "os"
func startCPUProfile() func() {
f, err := os.Create("cpu.prof")
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
return func() {
pprof.StopCPUProfile()
f.Close()
}
}
func main() {
stop := startCPUProfile()
defer stop()
// Your agent code
}
Monitor queue sizes:
type Agent struct {
taskQueue chan *pb.TaskMessage
}
func (a *Agent) logQueueSize() {
ticker := time.NewTicker(10 * time.Second)
go func() {
for range ticker.C {
log.Printf("Task queue size: %d/%d", len(a.taskQueue), cap(a.taskQueue))
}
}()
}
General Debugging Techniques
1. Enable Verbose Logging
import "log"
import "os"
func init() {
// Enable verbose logging
log.SetFlags(log.LstdFlags | log.Lshortfile)
// Set log level from environment
if os.Getenv("DEBUG") == "true" {
log.SetOutput(os.Stdout)
}
}
2. Add Structured Logging
import "encoding/json"
import "time"
type LogEntry struct {
Timestamp string `json:"timestamp"`
Level string `json:"level"`
AgentID string `json:"agent_id"`
TaskID string `json:"task_id,omitempty"`
Message string `json:"message"`
Data map[string]interface{} `json:"data,omitempty"`
}
func logInfo(agentID, taskID, message string, data map[string]interface{}) {
entry := LogEntry{
Timestamp: time.Now().Format(time.RFC3339),
Level: "INFO",
AgentID: agentID,
TaskID: taskID,
Message: message,
Data: data,
}
if jsonData, err := json.Marshal(entry); err == nil {
log.Println(string(jsonData))
}
}
3. Health Check Endpoint
import "net/http"
import "encoding/json"
type HealthStatus struct {
Status string `json:"status"`
AgentID string `json:"agent_id"`
Uptime string `json:"uptime"`
TasksProcessed int64 `json:"tasks_processed"`
LastTaskTime time.Time `json:"last_task_time"`
}
func startHealthServer(agent *Agent) {
http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
status := HealthStatus{
Status: "healthy",
AgentID: agent.ID,
Uptime: time.Since(agent.StartTime).String(),
TasksProcessed: agent.TasksProcessed,
LastTaskTime: agent.LastTaskTime,
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(status)
})
log.Printf("Health server starting on :8080")
log.Fatal(http.ListenAndServe(":8080", nil))
}
4. Task Tracing
import "context"
type TraceID string
func withTraceID(ctx context.Context) context.Context {
traceID := TraceID(fmt.Sprintf("trace-%d", time.Now().UnixNano()))
return context.WithValue(ctx, "trace_id", traceID)
}
func getTraceID(ctx context.Context) TraceID {
if traceID, ok := ctx.Value("trace_id").(TraceID); ok {
return traceID
}
return ""
}
func processTaskWithTracing(ctx context.Context, task *pb.TaskMessage, client pb.EventBusClient) {
ctx = withTraceID(ctx)
traceID := getTraceID(ctx)
log.Printf("[%s] Starting task %s", traceID, task.GetTaskId())
defer log.Printf("[%s] Finished task %s", traceID, task.GetTaskId())
// Your processing logic with trace ID logging
}
Common Error Patterns
1. Resource Exhaustion
Signs:
- Tasks start failing after running for a while
- Memory usage continuously increases
- File descriptor limits reached
Solutions:
- Implement proper resource cleanup
- Add connection pooling
- Set task processing limits
2. Deadlocks
Signs:
- Agent stops processing tasks
- Health checks show agent as "stuck"
Solutions:
- Avoid blocking operations in main goroutines
- Use timeouts for all operations
- Implement deadlock detection
3. Race Conditions
Signs:
- Intermittent task failures
- Inconsistent behavior
- Data corruption
Solutions:
- Use proper synchronization primitives
- Run the race detector: go run -race your-agent.go
- Add mutex protection for shared state
With these debugging techniques, you should be able to identify and resolve most agent-related issues efficiently.
3 - Reference
Comprehensive technical documentation and API specifications
Reference Documentation
This section contains comprehensive technical documentation for all AgentHub components, APIs, and configuration options. Use this as your authoritative source for implementation details.
Documentation Sections
- Configuration - Complete configuration options and settings
- API Reference - gRPC APIs, unified abstractions, and tracing interfaces
- Observability - Metrics, health endpoints, and monitoring
- Tasks - Task message specifications and data structures
How to Use This Reference
- Accuracy: All information is kept up-to-date with the latest version
- Completeness: Every public API and configuration option is documented
- Examples: Code examples illustrate usage where helpful
- Structure: Information is organized by component and function
Quick Navigation
By Component
By Use Case
Note
This reference documentation describes the current stable version. For experimental features, check the explanation section or source code.
3.1 - Configuration
Configuration reference and settings documentation
Configuration Reference
This section provides comprehensive documentation for all AgentHub configuration options, environment variables, and settings.
Available Documentation
3.1.1 - Environment Variables Reference
Complete reference for all environment variables used by AgentHub's unified abstractions for configuration and observability.
Environment Variables Reference
This reference documents all environment variables used by AgentHub's unified abstraction system. All components automatically load these variables for configuration.
Core Configuration
Broker Connection
| Variable | Default | Description | Used By |
|---|---|---|---|
| AGENTHUB_BROKER_ADDR | localhost | Broker server hostname or IP address | Agents |
| AGENTHUB_BROKER_PORT | 50051 | Broker gRPC port number | Agents |
| AGENTHUB_GRPC_PORT | :50051 | Server listen address (for broker) | Broker |
Example:
export AGENTHUB_BROKER_ADDR="production-broker.example.com"
export AGENTHUB_BROKER_PORT="50051"
export AGENTHUB_GRPC_PORT=":50051"
Health Monitoring
| Variable | Default | Description | Used By |
|---|---|---|---|
| BROKER_HEALTH_PORT | 8080 | Broker health check endpoint port | Broker |
| PUBLISHER_HEALTH_PORT | 8081 | Publisher health check endpoint port | Publishers |
| SUBSCRIBER_HEALTH_PORT | 8082 | Subscriber health check endpoint port | Subscribers |
Health Endpoints Available:
- http://localhost:8080/health - Health check
- http://localhost:8080/metrics - Prometheus metrics
- http://localhost:8080/ready - Readiness check
Example:
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
Observability Configuration
Distributed Tracing
| Variable | Default | Description | Used By |
|---|---|---|---|
| JAEGER_ENDPOINT | 127.0.0.1:4317 | Jaeger OTLP endpoint for traces | All components |
| SERVICE_NAME | agenthub-service | Service name for tracing | All components |
| SERVICE_VERSION | 1.0.0 | Service version for telemetry | All components |
Example:
export JAEGER_ENDPOINT="http://jaeger.example.com:14268/api/traces"
export SERVICE_NAME="my-agenthub-app"
export SERVICE_VERSION="2.1.0"
Jaeger Integration:
- When JAEGER_ENDPOINT is set: automatic tracing enabled
- When empty or unset: tracing disabled (minimal overhead)
- Supports both gRPC (4317) and HTTP (14268) endpoints
Metrics Collection
| Variable | Default | Description | Used By |
|---|---|---|---|
| PROMETHEUS_PORT | 9090 | Prometheus server port | Observability stack |
| GRAFANA_PORT | 3333 | Grafana dashboard port | Observability stack |
| ALERTMANAGER_PORT | 9093 | AlertManager port | Observability stack |
Example:
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
export ALERTMANAGER_PORT="9093"
OpenTelemetry Collector
| Variable | Default | Description | Used By |
|---|---|---|---|
| OTLP_GRPC_PORT | 4320 | OTLP Collector gRPC port | Observability stack |
| OTLP_HTTP_PORT | 4321 | OTLP Collector HTTP port | Observability stack |
Example:
export OTLP_GRPC_PORT="4320"
export OTLP_HTTP_PORT="4321"
Service Configuration
General Settings
| Variable | Default | Description | Used By |
|---|---|---|---|
| ENVIRONMENT | development | Deployment environment | All components |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARN, ERROR) | All components |
Example:
export ENVIRONMENT="production"
export LOG_LEVEL="WARN"
Environment-Specific Configurations
Development Environment
# .envrc for development
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
export AGENTHUB_GRPC_PORT=":50051"
# Health ports
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
# Observability (local stack)
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
# Service metadata
export SERVICE_NAME="agenthub-dev"
export SERVICE_VERSION="dev"
export ENVIRONMENT="development"
export LOG_LEVEL="DEBUG"
Staging Environment
# .envrc for staging
export AGENTHUB_BROKER_ADDR="staging-broker.example.com"
export AGENTHUB_BROKER_PORT="50051"
# Health ports (non-conflicting)
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
# Observability (staging stack)
export JAEGER_ENDPOINT="http://staging-jaeger.example.com:14268/api/traces"
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
# Service metadata
export SERVICE_NAME="agenthub-staging"
export SERVICE_VERSION="1.2.0-rc1"
export ENVIRONMENT="staging"
export LOG_LEVEL="INFO"
Production Environment
# .envrc for production
export AGENTHUB_BROKER_ADDR="prod-broker.example.com"
export AGENTHUB_BROKER_PORT="50051"
# Health ports
export BROKER_HEALTH_PORT="8080"
export PUBLISHER_HEALTH_PORT="8081"
export SUBSCRIBER_HEALTH_PORT="8082"
# Observability (production stack)
export JAEGER_ENDPOINT="http://jaeger.prod.example.com:14268/api/traces"
export PROMETHEUS_PORT="9090"
export GRAFANA_PORT="3333"
export ALERTMANAGER_PORT="9093"
# Service metadata
export SERVICE_NAME="agenthub-prod"
export SERVICE_VERSION="1.2.0"
export ENVIRONMENT="production"
export LOG_LEVEL="WARN"
Configuration Loading
Automatic Loading by Unified Abstractions
The unified abstractions automatically load environment variables:
// Automatic configuration loading
config := agenthub.NewGRPCConfig("my-component")
// Results in:
// config.BrokerAddr = "localhost:50051" (AGENTHUB_BROKER_ADDR + AGENTHUB_BROKER_PORT)
// config.ServerAddr = ":50051" (AGENTHUB_GRPC_PORT)
// config.HealthPort = "8080" (BROKER_HEALTH_PORT)
// config.ComponentName = "my-component" (from parameter)
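A minimal bootstrap sketch that ties configuration loading to client creation, using the same abstractions shown in the how-to guides:
package main

import (
    "log"

    "github.com/owulveryck/agenthub/internal/agenthub"
)

func main() {
    config := agenthub.NewGRPCConfig("my-component")
    client, err := agenthub.NewAgentHubClient(config)
    if err != nil {
        log.Fatalf("failed to create AgentHub client: %v", err)
    }
    _ = client // subscribe to events, publish messages, etc.
}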
Using direnv (Recommended)
Install direnv: https://direnv.net/docs/installation.html
Create .envrc file:
# Create .envrc in project root
cat > .envrc << 'EOF'
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export SERVICE_NAME="my-agenthub-app"
EOF
Allow direnv:
direnv allow
Automatic loading: Variables load automatically when entering the directory
Manual Loading
# Source variables manually
source .envrc
# Or set individually
export AGENTHUB_BROKER_ADDR="localhost"
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
Configuration Validation
Required Variables
Minimal configuration (all have defaults):
- No variables are strictly required
- Defaults work for local development
Production recommendations:
- Set JAEGER_ENDPOINT for tracing
- Set SERVICE_NAME for identification
- Set ENVIRONMENT to "production"
- Configure unique health ports if running multiple services
Configuration Verification
Check loaded configuration:
config := agenthub.NewGRPCConfig("test")
fmt.Printf("Broker: %s\n", config.BrokerAddr)
fmt.Printf("Health: %s\n", config.HealthPort)
fmt.Printf("Component: %s\n", config.ComponentName)
Verify health endpoints:
# Check if configuration is working
curl http://localhost:8080/health
curl http://localhost:8081/health # Publisher
curl http://localhost:8082/health # Subscriber
Verify tracing:
- Open Jaeger UI: http://localhost:16686
- Look for traces from your service name
- Check spans are being created
Common Patterns
Docker Compose
# docker-compose.yml
version: '3.8'
services:
broker:
build: .
command: go run broker/main.go
environment:
- AGENTHUB_GRPC_PORT=:50051
- BROKER_HEALTH_PORT=8080
- JAEGER_ENDPOINT=http://jaeger:14268/api/traces
- SERVICE_NAME=agenthub-broker
ports:
- "50051:50051"
- "8080:8080"
publisher:
build: .
command: go run agents/publisher/main.go
environment:
- AGENTHUB_BROKER_ADDR=broker
- AGENTHUB_BROKER_PORT=50051
- PUBLISHER_HEALTH_PORT=8081
- JAEGER_ENDPOINT=http://jaeger:14268/api/traces
- SERVICE_NAME=agenthub-publisher
ports:
- "8081:8081"
Kubernetes ConfigMap
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: agenthub-config
data:
AGENTHUB_BROKER_ADDR: "agenthub-broker.default.svc.cluster.local"
AGENTHUB_BROKER_PORT: "50051"
JAEGER_ENDPOINT: "http://jaeger.observability.svc.cluster.local:14268/api/traces"
SERVICE_NAME: "agenthub-k8s"
SERVICE_VERSION: "1.0.0"
ENVIRONMENT: "production"
LOG_LEVEL: "INFO"
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: agenthub-publisher
spec:
template:
spec:
containers:
- name: publisher
image: agenthub:latest
envFrom:
- configMapRef:
name: agenthub-config
env:
- name: PUBLISHER_HEALTH_PORT
value: "8080"
Troubleshooting
Common Issues
| Problem | Solution |
|---|---|
| Agent can't connect to broker | Check AGENTHUB_BROKER_ADDR and AGENTHUB_BROKER_PORT |
| Health endpoint not accessible | Verify *_HEALTH_PORT variables and port availability |
| No traces in Jaeger | Set JAEGER_ENDPOINT and ensure Jaeger is running |
| Port conflicts | Use different ports for each component's health endpoints |
| Configuration not loading | Ensure variables are exported; check with printenv |
Debug Configuration
Check environment variables:
# List all AgentHub variables
printenv | grep AGENTHUB
# List all observability variables
printenv | grep -E "(JAEGER|SERVICE|PROMETHEUS|GRAFANA)"
# Check specific variable
echo $AGENTHUB_BROKER_ADDR
Test configuration:
# Quick test with temporary override
AGENTHUB_BROKER_ADDR=test-broker go run agents/publisher/main.go
# Verify health endpoint responds
curl -f http://localhost:8080/health || echo "Health check failed"
Configuration Precedence
- Environment variables (highest priority)
- Default values (from code)
Example: If AGENTHUB_BROKER_ADDR is set, it overrides the default “localhost”
This environment variable reference provides comprehensive documentation for configuring AgentHub using the unified abstraction system. For practical usage examples, see the Installation and Setup Tutorial and Configuration Reference.
3.1.2 - Configuration Reference
Comprehensive reference for configuring AgentHub components using the unified abstraction library with environment-based configuration.
Configuration Reference
This document provides comprehensive reference for configuring AgentHub components using the unified abstraction library with environment-based configuration.
Unified Abstraction Configuration
AgentHub uses environment variables for all configuration with the unified abstraction library providing automatic configuration setup.
Core Environment Variables
gRPC Connection Configuration
| Variable | Default | Description |
|---|---|---|
| AGENTHUB_BROKER_ADDR | localhost | Broker server hostname or IP address |
| AGENTHUB_BROKER_PORT | 50051 | Broker gRPC port number |
| AGENTHUB_GRPC_PORT | :50051 | Server listen address (for broker) |
Note: The unified abstraction automatically combines AGENTHUB_BROKER_ADDR and AGENTHUB_BROKER_PORT into a complete broker address (e.g., localhost:50051).
Health Monitoring Configuration
| Variable | Default | Description |
|---|---|---|
| BROKER_HEALTH_PORT | 8080 | Broker health check endpoint port |
| PUBLISHER_HEALTH_PORT | 8081 | Publisher health check endpoint port |
| SUBSCRIBER_HEALTH_PORT | 8082 | Subscriber health check endpoint port |
Observability Configuration
| Variable | Default | Description |
|---|---|---|
| JAEGER_ENDPOINT | 127.0.0.1:4317 | Jaeger OTLP endpoint for distributed tracing |
| PROMETHEUS_PORT | 9090 | Prometheus metrics collection port |
| GRAFANA_PORT | 3333 | Grafana dashboard web interface port |
| ALERTMANAGER_PORT | 9093 | AlertManager web interface port |
| OTLP_GRPC_PORT | 4320 | OpenTelemetry Collector gRPC port |
| OTLP_HTTP_PORT | 4321 | OpenTelemetry Collector HTTP port |
Service Metadata Configuration
| Variable | Default | Description |
|---|---|---|
| SERVICE_VERSION | 1.0.0 | Service version for telemetry and observability |
| ENVIRONMENT | development | Deployment environment (development, staging, production) |
A2A Protocol Configuration
| Variable | Default | Description |
|---|---|---|
| AGENTHUB_MESSAGE_BUFFER_SIZE | 100 | Buffer size for A2A message processing |
| AGENTHUB_TASK_UPDATE_INTERVAL | 1s | Interval for publishing task status updates |
| AGENTHUB_ARTIFACT_MAX_SIZE | 10MB | Maximum size for task artifacts |
| AGENTHUB_CONTEXT_TIMEOUT | 30s | Timeout for A2A message context |
| AGENTHUB_A2A_PROTOCOL_VERSION | 1.0 | A2A protocol version for compatibility |
| AGENTHUB_MESSAGE_HISTORY_LIMIT | 50 | Maximum message history per task |
Unified Abstraction Usage
Using Configuration with the Unified Abstraction
The unified abstraction library automatically loads configuration from environment variables:
// Create configuration from environment variables
config := agenthub.NewGRPCConfig("my-component")
// Configuration is automatically populated:
// - config.BrokerAddr: "localhost:50051" (combined from AGENTHUB_BROKER_ADDR + AGENTHUB_BROKER_PORT)
// - config.ServerAddr: ":50051" (from AGENTHUB_GRPC_PORT)
// - config.HealthPort: "8080" (from BROKER_HEALTH_PORT)
// - config.ComponentName: "my-component" (from parameter)
Environment Variable Loading
The recommended way to load environment variables:
Option 1: Using direnv (recommended)
# Place variables in .envrc file
direnv allow
Option 2: Source manually
source .envrc
Option 3: Set individual variables
export AGENTHUB_BROKER_ADDR=localhost
export AGENTHUB_BROKER_PORT=50051
export JAEGER_ENDPOINT=127.0.0.1:4317
Configuration Override Examples
You can override defaults by setting environment variables before running:
# Use different broker address
export AGENTHUB_BROKER_ADDR=remote-broker.example.com
export AGENTHUB_BROKER_PORT=9090
go run broker/main.go
# Use different health ports to avoid conflicts
export BROKER_HEALTH_PORT=8083
export PUBLISHER_HEALTH_PORT=8084
export SUBSCRIBER_HEALTH_PORT=8085
go run agents/publisher/main.go
# Use custom observability endpoints
export JAEGER_ENDPOINT=jaeger.example.com:4317
export PROMETHEUS_PORT=9091
go run broker/main.go
Configuration Best Practices
- Use .envrc for Development: Keep all environment variables in .envrc for a consistent development experience
- Override Selectively: Only override specific variables when needed; use defaults otherwise
- Environment-Specific Configs: Use different variable values for development, staging, and production
- Health Port Management: Use different health ports for each component to avoid conflicts
- Observability Integration: Always configure observability endpoints for production deployments
Legacy Configuration Migration
If migrating from previous versions of AgentHub:
Old Configuration Pattern:
// Manual server setup (deprecated)
lis, err := net.Listen("tcp", ":50051")
server := grpc.NewServer()
// ... extensive setup code
New Unified Abstraction Pattern:
// Automatic configuration from environment
config := agenthub.NewGRPCConfig("broker")
server, err := agenthub.NewAgentHubServer(config)
service := agenthub.NewAgentHubService(server)
pb.RegisterAgentHubServer(server.Server, service)
server.Start(ctx)
Command-Line Usage
Basic Commands
The unified abstraction provides simplified command execution:
agenthub-server [OPTIONS]
Options:
-port int
Server port (default 50051)
-host string
Server host (default "0.0.0.0")
-config string
Configuration file path
-log-level string
Log level: debug, info, warn, error (default "info")
-log-file string
Log file path (default: stdout)
-max-connections int
Maximum concurrent connections (default 1000)
-channel-buffer-size int
Channel buffer size (default 10)
-help
Show help message
-version
Show version information
Configuration File
The broker can also be configured using a YAML configuration file:
# agenthub.yaml
server:
host: "0.0.0.0"
port: 50051
max_connections: 1000
timeout: "30s"
logging:
level: "info"
format: "json"
file: "/var/log/agenthub/broker.log"
performance:
channel_buffer_size: 10
max_message_size: "4MB"
keepalive_time: "30s"
keepalive_timeout: "5s"
limits:
max_agents: 10000
max_tasks_per_agent: 100
memory_limit: "1GB"
security:
tls_enabled: false
cert_file: ""
key_file: ""
ca_file: ""
Loading Configuration:
agenthub-server -config /path/to/agenthub.yaml
Agent Configuration
Environment Variables
Agents can be configured using environment variables:
Connection Configuration
| Variable | Default | Description |
|---|---|---|
| AGENTHUB_BROKER_ADDRESS | localhost:50051 | Broker server address |
| AGENTHUB_AGENT_ID | Generated | Unique agent identifier |
| AGENTHUB_CONNECTION_TIMEOUT | 10s | Connection timeout |
| AGENTHUB_RETRY_ATTEMPTS | 3 | Connection retry attempts |
| AGENTHUB_RETRY_DELAY | 1s | Delay between retries |
Task Processing Configuration
| Variable | Default | Description |
|---|---|---|
| AGENTHUB_MAX_CONCURRENT_TASKS | 5 | Maximum concurrent task processing |
| AGENTHUB_TASK_TIMEOUT | 300s | Default task timeout |
| AGENTHUB_PROGRESS_INTERVAL | 5s | Progress reporting interval |
| AGENTHUB_TASK_TYPES | "" | Comma-separated list of supported task types |
Logging Configuration
| Variable | Default | Description |
|---|---|---|
| AGENTHUB_AGENT_LOG_LEVEL | info | Agent logging level |
| AGENTHUB_AGENT_LOG_FORMAT | text | Agent log format |
| AGENTHUB_AGENT_LOG_FILE | "" | Agent log file path |
Agent Configuration Examples
Publisher Configuration
package main
import (
"os"
"strconv"
"time"
)
type PublisherConfig struct {
BrokerAddress string
AgentID string
ConnectionTimeout time.Duration
RetryAttempts int
RetryDelay time.Duration
LogLevel string
}
func LoadPublisherConfig() *PublisherConfig {
config := &PublisherConfig{
BrokerAddress: getEnv("AGENTHUB_BROKER_ADDRESS", "localhost:50051"),
AgentID: getEnv("AGENTHUB_AGENT_ID", generateAgentID()),
ConnectionTimeout: getDuration("AGENTHUB_CONNECTION_TIMEOUT", "10s"),
RetryAttempts: getInt("AGENTHUB_RETRY_ATTEMPTS", 3),
RetryDelay: getDuration("AGENTHUB_RETRY_DELAY", "1s"),
LogLevel: getEnv("AGENTHUB_AGENT_LOG_LEVEL", "info"),
}
return config
}
func getEnv(key, defaultValue string) string {
if value := os.Getenv(key); value != "" {
return value
}
return defaultValue
}
func getInt(key string, defaultValue int) int {
if value := os.Getenv(key); value != "" {
if i, err := strconv.Atoi(value); err == nil {
return i
}
}
return defaultValue
}
func getDuration(key string, defaultValue string) time.Duration {
if value := os.Getenv(key); value != "" {
if d, err := time.ParseDuration(value); err == nil {
return d
}
}
d, _ := time.ParseDuration(defaultValue)
return d
}
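The generateAgentID function referenced above is not defined in this guide; a simple sketch (requiring the fmt import in addition to those shown) derives an ID from the hostname and a timestamp:
func generateAgentID() string {
    hostname, err := os.Hostname()
    if err != nil {
        hostname = "agent"
    }
    return fmt.Sprintf("%s_%d", hostname, time.Now().UnixNano())
}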
Subscriber Configuration
type SubscriberConfig struct {
BrokerAddress string
AgentID string
MaxConcurrentTasks int
TaskTimeout time.Duration
ProgressInterval time.Duration
SupportedTaskTypes []string
LogLevel string
}
func LoadSubscriberConfig() *SubscriberConfig {
taskTypesStr := getEnv("AGENTHUB_TASK_TYPES", "")
var taskTypes []string
if taskTypesStr != "" {
taskTypes = strings.Split(taskTypesStr, ",")
for i, taskType := range taskTypes {
taskTypes[i] = strings.TrimSpace(taskType)
}
}
config := &SubscriberConfig{
BrokerAddress: getEnv("AGENTHUB_BROKER_ADDRESS", "localhost:50051"),
AgentID: getEnv("AGENTHUB_AGENT_ID", generateAgentID()),
MaxConcurrentTasks: getInt("AGENTHUB_MAX_CONCURRENT_TASKS", 5),
TaskTimeout: getDuration("AGENTHUB_TASK_TIMEOUT", "300s"),
ProgressInterval: getDuration("AGENTHUB_PROGRESS_INTERVAL", "5s"),
SupportedTaskTypes: taskTypes,
LogLevel: getEnv("AGENTHUB_AGENT_LOG_LEVEL", "info"),
}
return config
}
Agent Configuration File
Agents can also use configuration files:
# agent.yaml
agent:
id: "data_processor_001"
broker_address: "broker.example.com:50051"
connection_timeout: "10s"
retry_attempts: 3
retry_delay: "1s"
task_processing:
max_concurrent_tasks: 5
task_timeout: "300s"
progress_interval: "5s"
supported_task_types:
- "data_analysis"
- "data_transformation"
- "data_validation"
logging:
level: "info"
format: "json"
file: "/var/log/agenthub/agent.log"
health:
port: 8080
endpoint: "/health"
check_interval: "30s"
Security Configuration
TLS Configuration
Broker TLS Setup
# broker configuration
security:
tls_enabled: true
cert_file: "/etc/agenthub/certs/server.crt"
key_file: "/etc/agenthub/certs/server.key"
ca_file: "/etc/agenthub/certs/ca.crt"
client_auth: "require_and_verify"
Agent TLS Setup
// Agent TLS connection
func createTLSConnection(address string) (*grpc.ClientConn, error) {
config := &tls.Config{
ServerName: "agenthub-broker",
// Load client certificates if needed
}
creds := credentials.NewTLS(config)
conn, err := grpc.Dial(address, grpc.WithTransportCredentials(creds))
if err != nil {
return nil, fmt.Errorf("failed to connect with TLS: %v", err)
}
return conn, nil
}
Authentication Configuration
JWT Authentication
# broker configuration
security:
auth_enabled: true
auth_method: "jwt"
jwt_secret: "your-secret-key"
jwt_issuer: "agenthub-broker"
jwt_expiry: "24h"
// Agent authentication
type AuthenticatedAgent struct {
client pb.AgentHubClient
token string
agentID string
}
func (a *AuthenticatedAgent) authenticate() error {
// Add authentication token to context
ctx := metadata.AppendToOutgoingContext(context.Background(),
"authorization", "Bearer "+a.token)
// Use authenticated context for A2A requests
_, err := a.client.PublishMessage(ctx, request)
return err
}
Production Configuration Examples
# production-broker.yaml
server:
host: "0.0.0.0"
port: 50051
max_connections: 5000
timeout: "60s"
performance:
channel_buffer_size: 50
max_message_size: "16MB"
keepalive_time: "10s"
keepalive_timeout: "3s"
limits:
max_agents: 50000
max_tasks_per_agent: 500
memory_limit: "8GB"
logging:
level: "warn"
format: "json"
file: "/var/log/agenthub/broker.log"
security:
tls_enabled: true
cert_file: "/etc/ssl/certs/agenthub.crt"
key_file: "/etc/ssl/private/agenthub.key"
Cluster Agent Configuration
# cluster-agent.yaml
agent:
id: "${HOSTNAME}_${POD_ID}"
broker_address: "agenthub-broker.agenthub.svc.cluster.local:50051"
connection_timeout: "15s"
retry_attempts: 5
retry_delay: "2s"
task_processing:
max_concurrent_tasks: 10
task_timeout: "1800s" # 30 minutes
progress_interval: "10s"
logging:
level: "info"
format: "json"
file: "stdout"
health:
port: 8080
endpoint: "/health"
check_interval: "30s"
metrics:
enabled: true
port: 9090
endpoint: "/metrics"
Environment-Specific Configurations
Development Environment
# .env.development
AGENTHUB_PORT=50051
AGENTHUB_LOG_LEVEL=debug
AGENTHUB_LOG_FORMAT=text
AGENTHUB_MAX_CONNECTIONS=100
AGENTHUB_CHANNEL_BUFFER_SIZE=5
# Agent settings
AGENTHUB_BROKER_ADDRESS=localhost:50051
AGENTHUB_MAX_CONCURRENT_TASKS=2
AGENTHUB_TASK_TIMEOUT=60s
AGENTHUB_AGENT_LOG_LEVEL=debug
Staging Environment
# .env.staging
AGENTHUB_PORT=50051
AGENTHUB_LOG_LEVEL=info
AGENTHUB_LOG_FORMAT=json
AGENTHUB_MAX_CONNECTIONS=1000
AGENTHUB_CHANNEL_BUFFER_SIZE=20
# Security
AGENTHUB_TLS_ENABLED=true
AGENTHUB_CERT_FILE=/etc/certs/staging.crt
AGENTHUB_KEY_FILE=/etc/certs/staging.key
# Agent settings
AGENTHUB_BROKER_ADDRESS=staging-broker.example.com:50051
AGENTHUB_MAX_CONCURRENT_TASKS=5
AGENTHUB_TASK_TIMEOUT=300s
Production Environment
# .env.production
AGENTHUB_PORT=50051
AGENTHUB_LOG_LEVEL=warn
AGENTHUB_LOG_FORMAT=json
AGENTHUB_LOG_FILE=/var/log/agenthub/broker.log
AGENTHUB_MAX_CONNECTIONS=5000
AGENTHUB_CHANNEL_BUFFER_SIZE=50
# Security
AGENTHUB_TLS_ENABLED=true
AGENTHUB_CERT_FILE=/etc/ssl/certs/agenthub.crt
AGENTHUB_KEY_FILE=/etc/ssl/private/agenthub.key
AGENTHUB_CA_FILE=/etc/ssl/certs/ca.crt
# Performance
AGENTHUB_MAX_MESSAGE_SIZE=16MB
AGENTHUB_KEEPALIVE_TIME=10s
AGENTHUB_MEMORY_LIMIT=8GB
# Agent settings
AGENTHUB_BROKER_ADDRESS=agenthub-prod.example.com:50051
AGENTHUB_MAX_CONCURRENT_TASKS=10
AGENTHUB_TASK_TIMEOUT=1800s
AGENTHUB_CONNECTION_TIMEOUT=15s
AGENTHUB_RETRY_ATTEMPTS=5
Configuration Validation
Broker Configuration Validation
type BrokerConfig struct {
Port int `yaml:"port" validate:"min=1,max=65535"`
Host string `yaml:"host" validate:"required"`
MaxConnections int `yaml:"max_connections" validate:"min=1"`
Timeout time.Duration `yaml:"timeout" validate:"min=1s"`
ChannelBufferSize int `yaml:"channel_buffer_size" validate:"min=1"`
}
func (c *BrokerConfig) Validate() error {
validate := validator.New()
return validate.Struct(c)
}
Agent Configuration Validation
type AgentConfig struct {
BrokerAddress string `yaml:"broker_address" validate:"required"`
AgentID string `yaml:"agent_id" validate:"required,min=1,max=64"`
MaxConcurrentTasks int `yaml:"max_concurrent_tasks" validate:"min=1,max=100"`
TaskTimeout time.Duration `yaml:"task_timeout" validate:"min=1s"`
}
func (c *AgentConfig) Validate() error {
validate := validator.New()
if err := validate.Struct(c); err != nil {
return err
}
// Custom validation
if !strings.Contains(c.BrokerAddress, ":") {
return errors.New("broker_address must include port")
}
return nil
}
This comprehensive configuration reference covers all aspects of configuring AgentHub for different environments and use cases.
3.2 - API Reference
Complete API documentation and specifications
API Reference Documentation
This section contains comprehensive API documentation for all AgentHub interfaces, including gRPC APIs, unified abstractions, and tracing interfaces.
Available Documentation
3.2.1 - A2A-Compliant AgentHub API Reference
Complete technical reference for the A2A-compliant AgentHub API, including all gRPC services, message types, and operational details.
A2A-Compliant AgentHub API Reference
This document provides complete technical reference for the Agent2Agent (A2A) protocol-compliant AgentHub API, including all gRPC services, message types, and operational details.
gRPC Service Definition
The AgentHub broker implements the AgentHub service as defined in proto/eventbus.proto:
service AgentHub {
// ===== A2A Message Publishing (EDA style) =====
// PublishMessage submits an A2A message for delivery through the broker
rpc PublishMessage(PublishMessageRequest) returns (PublishResponse);
// PublishTaskUpdate notifies subscribers about A2A task state changes
rpc PublishTaskUpdate(PublishTaskUpdateRequest) returns (PublishResponse);
// PublishTaskArtifact delivers A2A task output artifacts to subscribers
rpc PublishTaskArtifact(PublishTaskArtifactRequest) returns (PublishResponse);
// ===== A2A Event Subscriptions (EDA style) =====
// SubscribeToMessages creates a stream of A2A message events for an agent
rpc SubscribeToMessages(SubscribeToMessagesRequest) returns (stream AgentEvent);
// SubscribeToTasks creates a stream of A2A task events for an agent
rpc SubscribeToTasks(SubscribeToTasksRequest) returns (stream AgentEvent);
// SubscribeToAgentEvents creates a unified stream of all events for an agent
rpc SubscribeToAgentEvents(SubscribeToAgentEventsRequest) returns (stream AgentEvent);
// ===== A2A Task Management (compatible with A2A spec) =====
// GetTask retrieves the current state of an A2A task by ID
rpc GetTask(GetTaskRequest) returns (a2a.Task);
// CancelTask cancels an active A2A task and notifies subscribers
rpc CancelTask(CancelTaskRequest) returns (a2a.Task);
// ListTasks returns A2A tasks matching the specified criteria
rpc ListTasks(ListTasksRequest) returns (ListTasksResponse);
// ===== Agent Discovery (A2A compatible) =====
// GetAgentCard returns the broker's A2A agent card for discovery
rpc GetAgentCard(google.protobuf.Empty) returns (a2a.AgentCard);
// RegisterAgent registers an agent with the broker for event routing
rpc RegisterAgent(RegisterAgentRequest) returns (RegisterAgentResponse);
}
A2A Message Types
Core A2A Types
A2A Message
Represents an A2A-compliant message for agent communication.
message Message {
string message_id = 1; // Required: Unique message identifier
string context_id = 2; // Optional: Conversation context
string task_id = 3; // Optional: Associated task
Role role = 4; // Required: USER or AGENT
repeated Part content = 5; // Required: Message content parts
google.protobuf.Struct metadata = 6; // Optional: Additional metadata
repeated string extensions = 7; // Optional: Protocol extensions
}
Field Details:
- message_id: Must be unique across all messages. Generated automatically if not provided
- context_id: Groups related messages in a conversation or workflow
- task_id: Links the message to a specific A2A task
- role: Indicates whether the message is from USER (requesting agent) or AGENT (responding agent)
- content: Array of A2A Part structures containing the actual message content
- metadata: Additional context for routing, processing, or debugging
- extensions: Protocol extension identifiers for future compatibility
A2A Part
Represents content within an A2A message.
message Part {
oneof part {
string text = 1; // Text content
DataPart data = 2; // Structured data
FilePart file = 3; // File reference
}
}
message DataPart {
google.protobuf.Struct data = 1; // Structured data content
string description = 2; // Optional data description
}
message FilePart {
string file_id = 1; // File identifier or URI
string filename = 2; // Original filename
string mime_type = 3; // MIME type
int64 size_bytes = 4; // File size in bytes
google.protobuf.Struct metadata = 5; // Additional file metadata
}
A2A Task
Represents an A2A-compliant task with lifecycle management.
message Task {
string id = 1; // Required: Task identifier
string context_id = 2; // Optional: Conversation context
TaskStatus status = 3; // Required: Current task status
repeated Message history = 4; // Message history for this task
repeated Artifact artifacts = 5; // Task output artifacts
google.protobuf.Struct metadata = 6; // Task metadata
}
message TaskStatus {
TaskState state = 1; // Current task state
Message update = 2; // Status update message
google.protobuf.Timestamp timestamp = 3; // Status timestamp
}
enum TaskState {
TASK_STATE_SUBMITTED = 0; // Task created and submitted
TASK_STATE_WORKING = 1; // Task in progress
TASK_STATE_COMPLETED = 2; // Task completed successfully
TASK_STATE_FAILED = 3; // Task failed with error
TASK_STATE_CANCELLED = 4; // Task cancelled
}
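The states form a simple lifecycle (SUBMITTED -> WORKING -> COMPLETED/FAILED/CANCELLED). The guard below is a hypothetical helper, not part of the generated API, sketching how an agent might reject illegal transitions:
// allowedTransitions encodes the legal A2A lifecycle moves (illustrative only).
var allowedTransitions = map[pb.TaskState][]pb.TaskState{
	pb.TaskState_TASK_STATE_SUBMITTED: {pb.TaskState_TASK_STATE_WORKING, pb.TaskState_TASK_STATE_CANCELLED},
	pb.TaskState_TASK_STATE_WORKING: {
		pb.TaskState_TASK_STATE_COMPLETED,
		pb.TaskState_TASK_STATE_FAILED,
		pb.TaskState_TASK_STATE_CANCELLED,
	},
}

func canTransition(from, to pb.TaskState) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false // terminal states allow no further transitions
}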
A2A Artifact
Represents structured output from completed tasks.
message Artifact {
string artifact_id = 1; // Required: Artifact identifier
string name = 2; // Human-readable name
string description = 3; // Artifact description
repeated Part parts = 4; // Artifact content parts
google.protobuf.Struct metadata = 5; // Artifact metadata
}
EDA Event Wrapper Types
AgentEvent
Wraps A2A messages for Event-Driven Architecture transport.
message AgentEvent {
string event_id = 1; // Unique event identifier
google.protobuf.Timestamp timestamp = 2; // Event timestamp
// A2A-compliant payload
oneof payload {
a2a.Message message = 10; // A2A Message
a2a.Task task = 11; // A2A Task
TaskStatusUpdateEvent status_update = 12; // Task status change
TaskArtifactUpdateEvent artifact_update = 13; // Artifact update
}
// EDA routing metadata
AgentEventMetadata routing = 20;
// Observability context
string trace_id = 30;
string span_id = 31;
}
AgentEventMetadata
Provides routing and delivery information for events.
message AgentEventMetadata {
string from_agent_id = 1; // Source agent identifier
string to_agent_id = 2; // Target agent ID (empty = broadcast)
string event_type = 3; // Event classification
repeated string subscriptions = 4; // Topic-based routing tags
Priority priority = 5; // Delivery priority
}
Request/Response Messages
PublishMessageRequest
message PublishMessageRequest {
a2a.Message message = 1; // A2A message to publish
AgentEventMetadata routing = 2; // EDA routing info
}
SubscribeToTasksRequest
message SubscribeToTasksRequest {
string agent_id = 1; // Agent ID for subscription
repeated string task_types = 2; // Optional task type filter
repeated a2a.TaskState states = 3; // Optional state filter
}
GetTaskRequest
message GetTaskRequest {
string task_id = 1; // Task identifier
int32 history_length = 2; // History limit (optional)
}
API Operations
Publishing A2A Messages
PublishMessage
Publishes an A2A message for delivery through the EDA broker.
Go Example:
// Create A2A message content
content := []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Hello! Please process this request.",
},
},
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: &structpb.Struct{
Fields: map[string]*structpb.Value{
"operation": structpb.NewStringValue("process_data"),
"dataset_id": structpb.NewStringValue("dataset_123"),
},
},
},
},
},
}
// Create A2A message
message := &pb.Message{
MessageId: "msg_12345",
ContextId: "conversation_abc",
TaskId: "task_67890",
Role: pb.Role_ROLE_USER,
Content: content,
Metadata: &structpb.Struct{
Fields: map[string]*structpb.Value{
"priority": structpb.NewStringValue("high"),
},
},
}
// Publish through AgentHub
response, err := client.PublishMessage(ctx, &pb.PublishMessageRequest{
Message: message,
Routing: &pb.AgentEventMetadata{
FromAgentId: "requester_agent",
ToAgentId: "processor_agent",
EventType: "task_message",
Priority: pb.Priority_PRIORITY_HIGH,
},
})
Subscribing to A2A Events
SubscribeToTasks
Creates a stream of A2A task events for an agent.
Go Example:
req := &pb.SubscribeToTasksRequest{
AgentId: "processor_agent",
TaskTypes: []string{"data_processing", "image_analysis"}, // Optional filter
}
stream, err := client.SubscribeToTasks(ctx, req)
if err != nil {
return err
}
for {
event, err := stream.Recv()
if err == io.EOF {
break
}
if err != nil {
return err
}
// Process different event types
switch payload := event.GetPayload().(type) {
case *pb.AgentEvent_Task:
task := payload.Task
log.Printf("Received A2A task: %s", task.GetId())
// Process task using A2A handler
artifact, status, errorMsg := processA2ATask(ctx, task)
// Publish completion
publishTaskCompletion(ctx, client, task, artifact, status, errorMsg)
case *pb.AgentEvent_StatusUpdate:
update := payload.StatusUpdate
log.Printf("Task %s status: %s", update.GetTaskId(), update.GetStatus().GetState())
case *pb.AgentEvent_ArtifactUpdate:
artifact := payload.ArtifactUpdate
log.Printf("Received artifact for task %s", artifact.GetTaskId())
}
}
A2A Task Management
GetTask
Retrieves the current state of an A2A task.
Go Example:
req := &pb.GetTaskRequest{
TaskId: "task_67890",
HistoryLength: 10, // Optional: limit message history
}
task, err := client.GetTask(ctx, req)
if err != nil {
return err
}
log.Printf("Task %s status: %s", task.GetId(), task.GetStatus().GetState())
log.Printf("Message history: %d messages", len(task.GetHistory()))
log.Printf("Artifacts: %d artifacts", len(task.GetArtifacts()))
CancelTask
Cancels an active A2A task.
Go Example:
req := &pb.CancelTaskRequest{
TaskId: "task_67890",
Reason: "User requested cancellation",
}
task, err := client.CancelTask(ctx, req)
if err != nil {
return err
}
log.Printf("Task %s cancelled", task.GetId())
Agent Discovery
GetAgentCard
Returns the broker’s A2A agent card for discovery.
Go Example:
card, err := client.GetAgentCard(ctx, &emptypb.Empty{})
if err != nil {
return err
}
log.Printf("AgentHub broker: %s v%s", card.GetName(), card.GetVersion())
log.Printf("Protocol version: %s", card.GetProtocolVersion())
log.Printf("Capabilities: streaming=%v", card.GetCapabilities().GetStreaming())
for _, skill := range card.GetSkills() {
log.Printf("Skill: %s - %s", skill.GetName(), skill.GetDescription())
}
RegisterAgent
Registers an agent with the broker.
Go Example:
agentCard := &pb.AgentCard{
ProtocolVersion: "0.2.9",
Name: "my-processor-agent",
Description: "Data processing agent with A2A compliance",
Version: "1.0.0",
Capabilities: &pb.AgentCapabilities{
Streaming: true,
},
Skills: []*pb.AgentSkill{
{
Id: "data_processing",
Name: "Data Processing",
Description: "Process structured datasets",
Tags: []string{"data", "analysis"},
},
},
}
response, err := client.RegisterAgent(ctx, &pb.RegisterAgentRequest{
AgentCard: agentCard,
Subscriptions: []string{"data_processing", "analytics"},
})
if response.GetSuccess() {
log.Printf("Agent registered with ID: %s", response.GetAgentId())
} else {
log.Printf("Registration failed: %s", response.GetError())
}
High-Level A2A Client Abstractions
A2ATaskPublisher
Simplified interface for publishing A2A tasks.
taskPublisher := &agenthub.A2ATaskPublisher{
Client: client,
TraceManager: traceManager,
MetricsManager: metricsManager,
Logger: logger,
ComponentName: "my-publisher",
AgentID: "my-agent-id",
}
task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
TaskType: "data_analysis",
Content: contentParts,
RequesterAgentID: "my-agent-id",
ResponderAgentID: "data-processor",
Priority: pb.Priority_PRIORITY_MEDIUM,
ContextID: "analysis-session-123",
})
A2ATaskSubscriber
Simplified interface for processing A2A tasks.
taskSubscriber := agenthub.NewA2ATaskSubscriber(client, "my-agent-id")
// Register task handlers
taskSubscriber.RegisterTaskHandler("data_analysis", func(ctx context.Context, task *pb.Task, message *pb.Message) (*pb.Artifact, pb.TaskState, string) {
// Process the A2A task
result := processDataAnalysis(task, message)
// Return A2A artifact
artifact := &pb.Artifact{
ArtifactId: fmt.Sprintf("result_%s", task.GetId()),
Name: "analysis_result",
Description: "Data analysis results",
Parts: []*pb.Part{
{
Part: &pb.Part_Data{
Data: &pb.DataPart{
Data: result,
},
},
},
},
}
return artifact, pb.TaskState_TASK_STATE_COMPLETED, ""
})
// Start processing A2A tasks
err := taskSubscriber.SubscribeToTasks(ctx)
Error Handling
gRPC Status Codes
AgentHub uses standard gRPC status codes:
InvalidArgument (Code: 3)
- Missing required fields (message_id, role, content)
- Invalid A2A message structure
- Malformed Part content
NotFound (Code: 5)
- Task ID not found in GetTask/CancelTask
- Agent not registered
Internal (Code: 13)
- Server-side processing errors
- Message routing failures
- A2A validation errors
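A sketch of branching on these codes with the standard google.golang.org/grpc/status package (handlePublishError is a hypothetical helper, not part of the AgentHub API):
import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func handlePublishError(err error) error {
	st, ok := status.FromError(err)
	if !ok {
		return err // not a gRPC status error
	}
	switch st.Code() {
	case codes.InvalidArgument:
		return fmt.Errorf("fix the A2A message; retrying will not help: %s", st.Message())
	case codes.NotFound:
		return fmt.Errorf("unknown task or agent: %s", st.Message())
	case codes.Internal:
		return fmt.Errorf("broker-side failure, safe to retry: %s", st.Message())
	default:
		return err
	}
}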
Retry Patterns
func publishWithRetry(ctx context.Context, client pb.AgentHubClient, req *pb.PublishMessageRequest) error {
for attempt := 0; attempt < 3; attempt++ {
_, err := client.PublishMessage(ctx, req)
if err == nil {
return nil
}
// Check if error is retryable
if status.Code(err) == codes.InvalidArgument {
return err // Don't retry validation errors
}
// Exponential backoff
time.Sleep(time.Duration(1<<attempt) * time.Second)
}
return fmt.Errorf("max retries exceeded")
}
Message Size Limits
- Maximum message size: 4MB (gRPC default)
- Recommended size: <100KB for optimal A2A compliance
- Large content: Use FilePart references for large data (see the sketch below)
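For example, instead of inlining a large payload, a message part can carry only a reference. A sketch under stated assumptions: the file_id URI and sizes below are hypothetical values, and the field names follow standard protoc-gen-go conventions for the FilePart message shown earlier.
part := &pb.Part{
	Part: &pb.Part_File{
		File: &pb.FilePart{
			FileId:    "s3://my-bucket/datasets/large_input.parquet", // hypothetical reference
			Filename:  "large_input.parquet",
			MimeType:  "application/vnd.apache.parquet",
			SizeBytes: 250 * 1024 * 1024, // content stays out of the message body
		},
	},
}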
A2A Best Practices
- Use structured Parts: Prefer DataPart for structured data over text
- Context management: Group related messages with context_id (sketch after this list)
- Artifact structure: Return well-formed Artifact objects
- Task lifecycle: Properly manage TaskState transitions
- Connection reuse: Maintain persistent gRPC connections
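As referenced above, a short sketch of context management: two related messages share one context_id so subscribers can reconstruct the conversation (the IDs are illustrative):
const contextID = "analysis-session-123"

first := &pb.Message{
	MessageId: "msg_001",
	ContextId: contextID,
	Role:      pb.Role_ROLE_USER,
	Content:   []*pb.Part{{Part: &pb.Part_Text{Text: "Start the quarterly analysis."}}},
}

followUp := &pb.Message{
	MessageId: "msg_002",
	ContextId: contextID, // same context groups the two messages
	Role:      pb.Role_ROLE_USER,
	Content:   []*pb.Part{{Part: &pb.Part_Text{Text: "Also include the Q3 dataset."}}},
}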
This completes the comprehensive A2A-compliant API reference for AgentHub, covering all message types, operations, and integration patterns with practical examples.
3.2.2 - AgentHub Tracing API Reference
Complete API documentation for AgentHub’s OpenTelemetry tracing integration, span management, context propagation, and instrumentation patterns.
AgentHub Tracing API Reference
Technical reference: Complete API documentation for AgentHub’s OpenTelemetry tracing integration, span management, context propagation, and instrumentation patterns.
Core Components
TraceManager
The TraceManager provides high-level tracing operations for AgentHub events.
Constructor
func NewTraceManager(serviceName string) *TraceManager
Parameters:
serviceName - Name of the service creating spans
Returns: Configured TraceManager instance
Methods
StartPublishSpan
func (tm *TraceManager) StartPublishSpan(ctx context.Context, responderAgentID, eventType string) (context.Context, trace.Span)
Purpose: Creates a span for event publishing operations
Parameters:
- ctx - Parent context (may contain existing trace)
- responderAgentID - Target agent for the event
- eventType - Type of event being published
Returns:
- context.Context - New context with active span
- trace.Span - The created span
Attributes Set:
- event.type - Event type being published
- responder.agent - Target agent ID
- operation.type - "publish"
Usage:
ctx, span := tm.StartPublishSpan(ctx, "agent_subscriber", "greeting")
defer span.End()
// ... publishing logic
StartEventProcessingSpan
func (tm *TraceManager) StartEventProcessingSpan(ctx context.Context, eventID, eventType, requesterAgentID, responderAgentID string) (context.Context, trace.Span)
Purpose: Creates a span for event processing operations
Parameters:
- ctx - Context with extracted trace information
- eventID - Unique identifier for the event
- eventType - Type of event being processed
- requesterAgentID - Agent that requested processing
- responderAgentID - Agent performing processing
Returns:
- context.Context - Context with processing span
- trace.Span - The processing span
Attributes Set:
- event.id - Event identifier
- event.type - Event type
- requester.agent - Requesting agent ID
- responder.agent - Processing agent ID
- operation.type - "process"
StartBrokerSpan
func (tm *TraceManager) StartBrokerSpan(ctx context.Context, operation, eventType string) (context.Context, trace.Span)
Purpose: Creates spans for broker operations
Parameters:
- ctx - Request context
- operation - Broker operation (route, subscribe, unsubscribe)
- eventType - Event type being handled
Returns:
- context.Context - Context with broker span
- trace.Span - The broker span
Attributes Set:
- operation.type - Broker operation type
- event.type - Event type being handled
- component - "broker"
InjectTraceContext
func (tm *TraceManager) InjectTraceContext(ctx context.Context, headers map[string]string)
Purpose: Injects trace context into headers for propagation
Parameters:
- ctx - Context containing trace information
- headers - Map to inject headers into
Headers Injected:
- traceparent - W3C trace context header
- tracestate - W3C trace state header (if present)
Usage:
headers := make(map[string]string)
tm.InjectTraceContext(ctx, headers)
// headers now contain trace context for propagation
ExtractTraceContext
func (tm *TraceManager) ExtractTraceContext(ctx context.Context, headers map[string]string) context.Context
Purpose: Extracts trace context from headers
Parameters:
- ctx - Base context
- headers - Headers containing trace context
Returns: Context with extracted trace information
Usage:
// Extract from event metadata
if metadata := event.GetMetadata(); metadata != nil {
if traceHeaders, ok := metadata.Fields["trace_headers"]; ok {
headers := structFieldsToStringMap(traceHeaders.GetStructValue().Fields)
ctx = tm.ExtractTraceContext(ctx, headers)
}
}
RecordError
func (tm *TraceManager) RecordError(span trace.Span, err error)
Purpose: Records an error on a span with proper formatting
Parameters:
- span - Span to record error on
- err - Error to record
Effects:
- Sets span status to error
- Records error as span event
- Adds error type attribute
SetSpanSuccess
func (tm *TraceManager) SetSpanSuccess(span trace.Span)
Purpose: Marks a span as successful
Parameters:
span - Span to mark as successful
Effects:
- Sets span status to OK
- Ensures span is properly completed
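A typical pairing of the two methods around a unit of work (publishGreeting is a hypothetical call standing in for your publishing logic):
ctx, span := tm.StartPublishSpan(ctx, "agent_subscriber", "greeting")
defer span.End()

if err := publishGreeting(ctx); err != nil { // hypothetical publish call
	tm.RecordError(span, err) // status=error, error recorded as span event
	return err
}
tm.SetSpanSuccess(span) // status=OK
return nil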
Context Propagation
W3C Trace Context Standards
AgentHub uses the W3C Trace Context specification for interoperability.
traceparent
Format: 00-{trace-id}-{span-id}-{trace-flags}
- 00 - Version (currently always 00)
- trace-id - 32-character hex string
- span-id - 16-character hex string
- trace-flags - 2-character hex flags
Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate
Format: Vendor-specific key-value pairs
Example: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
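To inspect these values for the current span, for example when correlating logs with traces, read the span context. A minimal sketch using go.opentelemetry.io/otel/trace:
// Print the W3C traceparent components of the active span
sc := trace.SpanContextFromContext(ctx)
if sc.IsValid() {
	log.Printf("traceparent: 00-%s-%s-%s",
		sc.TraceID(), sc.SpanID(), sc.TraceFlags())
}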
Propagation Implementation
Manual Injection
// Create headers map
headers := make(map[string]string)
// Inject trace context
otel.GetTextMapPropagator().Inject(ctx, propagation.MapCarrier(headers))
// Headers now contain trace context
// Convert to protobuf metadata if needed
metadataStruct, err := structpb.NewStruct(map[string]interface{}{
"trace_headers": headers,
"timestamp": time.Now().Format(time.RFC3339),
})
Manual Extraction
// Extract from protobuf metadata
if metadata := task.GetMetadata(); metadata != nil {
if traceHeaders, ok := metadata.Fields["trace_headers"]; ok {
headers := make(map[string]string)
for k, v := range traceHeaders.GetStructValue().Fields {
headers[k] = v.GetStringValue()
}
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(headers))
}
}
Span Lifecycle Management
Creating Spans
Basic Span Creation
tracer := otel.Tracer("my-service")
ctx, span := tracer.Start(ctx, "operation_name")
defer span.End()
Span with Attributes
ctx, span := tracer.Start(ctx, "operation_name", trace.WithAttributes(
attribute.String("operation.type", "publish"),
attribute.String("event.type", "greeting"),
attribute.Int("event.priority", 1),
))
defer span.End()
Child Span Creation
// Parent span
ctx, parentSpan := tracer.Start(ctx, "parent_operation")
defer parentSpan.End()
// Child span (automatically linked)
ctx, childSpan := tracer.Start(ctx, "child_operation")
defer childSpan.End()
Span Attributes
Standard Attributes
AgentHub uses consistent attribute naming:
// Event attributes
attribute.String("event.id", taskID)
attribute.String("event.type", taskType)
attribute.Int("event.priority", priority)
// Agent attributes
attribute.String("agent.id", agentID)
attribute.String("agent.type", agentType)
attribute.String("requester.agent", requesterID)
attribute.String("responder.agent", responderID)
// Operation attributes
attribute.String("operation.type", "publish|process|route")
attribute.String("component", "broker|publisher|subscriber")
// Result attributes
attribute.Bool("success", true)
attribute.String("error.type", "validation|timeout|network")
Custom Attributes
span.SetAttributes(
attribute.String("business.unit", "sales"),
attribute.String("user.tenant", "acme-corp"),
attribute.Int("batch.size", len(items)),
attribute.Duration("timeout", 30*time.Second),
)
Span Events
Adding Events
// Simple event
span.AddEvent("validation.started")
// Event with attributes
span.AddEvent("cache.miss", trace.WithAttributes(
attribute.String("cache.key", key),
attribute.String("cache.type", "redis"),
))
// Event with timestamp
span.AddEvent("external.api.call", trace.WithAttributes(
attribute.String("api.endpoint", "/v1/users"),
attribute.Int("api.status_code", 200),
), trace.WithTimestamp(time.Now()))
Common Event Patterns
// Processing milestones
span.AddEvent("processing.started")
span.AddEvent("validation.completed")
span.AddEvent("business.logic.completed")
span.AddEvent("result.published")
// Error events
span.AddEvent("error.occurred", trace.WithAttributes(
attribute.String("error.message", err.Error()),
attribute.String("error.stack", string(debug.Stack())), // debug.Stack() returns []byte
))
Span Status
Setting Status
// Success
span.SetStatus(codes.Ok, "")
// Error with message
span.SetStatus(codes.Error, "validation failed")
// Error without message
span.SetStatus(codes.Error, "")
Status Code Mapping
// OpenTelemetry span status supports only Ok, Error, and Unset,
// so map error conditions to a descriptive status message instead
statusCode := codes.Ok
msg := ""
if err != nil {
	statusCode = codes.Error
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		msg = "deadline exceeded"
	case errors.Is(err, context.Canceled):
		msg = "cancelled"
	default:
		msg = err.Error()
	}
}
span.SetStatus(statusCode, msg)
Advanced Instrumentation
Baggage Propagation
Setting Baggage
// Add baggage to the context (stable otel-go baggage API)
userMember, _ := baggage.NewMember("user.id", userID)
tenantMember, _ := baggage.NewMember("tenant.id", tenantID)
requestMember, _ := baggage.NewMember("request.id", requestID)
bag, _ := baggage.New(userMember, tenantMember, requestMember)
ctx = baggage.ContextWithBaggage(ctx, bag)
Reading Baggage
// Read baggage anywhere in the trace
if member := baggage.FromContext(ctx).Member("user.id"); member.Value() != "" {
userID := member.Value()
// Use user ID for business logic
}
Span Links
Creating Links
// Link to related span
linkedSpanContext := trace.SpanContextFromContext(relatedCtx)
ctx, span := tracer.Start(ctx, "operation", trace.WithLinks(
trace.Link{
SpanContext: linkedSpanContext,
Attributes: []attribute.KeyValue{
attribute.String("link.type", "related_operation"),
},
},
))
Sampling Control
Conditional Sampling
// Force sampling for important operations
ctx, span := tracer.Start(ctx, "critical_operation",
trace.WithNewRoot(), // Start new trace
trace.WithSpanKind(trace.SpanKindServer),
)
// Add sampling priority
span.SetAttributes(
attribute.String("sampling.priority", "high"),
)
Integration Patterns
gRPC Integration
Server Interceptor
func TracingUnaryInterceptor(tracer trace.Tracer) grpc.UnaryServerInterceptor {
return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
ctx, span := tracer.Start(ctx, info.FullMethod)
defer span.End()
resp, err := handler(ctx, req)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
}
return resp, err
}
}
Client Interceptor
func TracingUnaryClientInterceptor(tracer trace.Tracer) grpc.UnaryClientInterceptor {
return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
ctx, span := tracer.Start(ctx, method)
defer span.End()
err := invoker(ctx, method, req, reply, cc, opts...)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
}
return err
}
}
HTTP Integration
HTTP Handler Wrapper
func TracingHandler(tracer trace.Tracer, next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
ctx, span := tracer.Start(ctx, r.Method+" "+r.URL.Path)
defer span.End()
span.SetAttributes(
attribute.String("http.method", r.Method),
attribute.String("http.url", r.URL.String()),
attribute.String("http.user_agent", r.UserAgent()),
)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
Error Handling
Error Recording Best Practices
Complete Error Recording
if err != nil {
// Record error on span
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
// Add error context
span.SetAttributes(
attribute.String("error.type", classifyError(err)),
attribute.Bool("error.retryable", isRetryable(err)),
)
// Log with context
logger.ErrorContext(ctx, "Operation failed",
slog.Any("error", err),
slog.String("operation", "event_processing"),
)
return err
}
Error Classification
func classifyError(err error) string {
switch {
case errors.Is(err, context.DeadlineExceeded):
return "timeout"
case errors.Is(err, context.Canceled):
return "cancelled"
case strings.Contains(err.Error(), "connection"):
return "network"
case strings.Contains(err.Error(), "validation"):
return "validation"
default:
return "unknown"
}
}
Performance Characteristics
Span Creation Overhead
- Span creation: ~1-2ΞΌs per span
- Attribute setting: ~100ns per attribute
- Event recording: ~200ns per event
- Context propagation: ~500ns per injection/extraction
Memory Usage
- Active span: ~500 bytes
- Completed span buffer: ~1KB per span
- Context overhead: ~100 bytes per context
Best Practices
- Limit span attributes to essential information
- Use batch exporters to reduce network overhead
- Sample appropriately for high-throughput services
- Pool span contexts where possible
- Avoid deep span nesting (>10 levels)
Troubleshooting
Missing Spans Checklist
- OpenTelemetry properly initialized
- Tracer retrieved from global provider
- Context propagated correctly
- Spans properly ended
- Exporter configured and accessible
Common Issues
Broken Trace Chains
// Wrong: creates a new root trace
ctx, span := tracer.Start(context.Background(), "operation")

// Correct: continues the existing trace
ctx, span := tracer.Start(ctx, "operation")
Missing Context Propagation
// Wrong: context not propagated
go func() {
	ctx, span := tracer.Start(context.Background(), "async_work")
	// work...
}()

// Correct: context properly propagated
go func(ctx context.Context) {
	ctx, span := tracer.Start(ctx, "async_work")
	// work...
}(ctx)
Next Steps:
Implementation: Add Observability to Your Agent
Debugging: Debug with Distributed Tracing
Metrics: Observability Metrics Reference
3.2.3 - Unified Abstraction Library API Reference
The AgentHub unified abstraction library provides simplified APIs for building gRPC-based agent communication systems with built-in observability, automatic configuration, and correlation tracking.
Unified Abstraction Library API Reference
The AgentHub unified abstraction library provides simplified APIs for building gRPC-based agent communication systems with built-in observability, automatic configuration, and correlation tracking.
Package: internal/agenthub
The internal/agenthub package contains the core unified abstraction components that dramatically simplify AgentHub development by providing high-level APIs with automatic observability integration.
Overview
The unified abstraction library reduces agent implementation complexity from 380+ lines to ~29 lines by providing:
- Automatic gRPC Setup: One-line server and client creation
- Built-in Observability: Integrated OpenTelemetry tracing and metrics
- Environment-Based Configuration: Automatic configuration from environment variables
- Correlation Tracking: Automatic correlation ID generation and propagation
- Pluggable Architecture: Simple task handler registration
Core Components
GRPCConfig
Configuration structure for gRPC servers and clients with environment-based initialization.
type GRPCConfig struct {
ServerAddr string // gRPC server listen address (e.g., ":50051")
BrokerAddr string // Broker connection address (e.g., "localhost:50051")
HealthPort string // Health check endpoint port
ComponentName string // Component identifier for observability
}
Constructor
func NewGRPCConfig(componentName string) *GRPCConfig
Creates a new gRPC configuration with environment variable defaults:
| Environment Variable | Default | Description |
|---|---|---|
| AGENTHUB_BROKER_ADDR | localhost | Broker server host |
| AGENTHUB_BROKER_PORT | 50051 | Broker gRPC port |
| AGENTHUB_GRPC_PORT | :50051 | Server listen port |
| BROKER_HEALTH_PORT | 8080 | Health endpoint port |
Example:
config := agenthub.NewGRPCConfig("my-agent")
// Results in BrokerAddr: "localhost:50051" (automatically combined)
AgentHubServer
High-level gRPC server wrapper with integrated observability.
type AgentHubServer struct {
Server *grpc.Server // Underlying gRPC server
Listener net.Listener // Network listener
Observability *observability.Observability // OpenTelemetry integration
TraceManager *observability.TraceManager // Distributed tracing
MetricsManager *observability.MetricsManager // Metrics collection
HealthServer *observability.HealthServer // Health monitoring
Logger *slog.Logger // Structured logging
Config *GRPCConfig // Configuration
}
Constructor
func NewAgentHubServer(config *GRPCConfig) (*AgentHubServer, error)
Creates a complete gRPC server with:
- OpenTelemetry instrumentation
- Health check endpoints
- Metrics collection
- Structured logging with trace correlation
Methods
func (s *AgentHubServer) Start(ctx context.Context) error
Starts the server with automatic:
- Health endpoint setup (/health, /ready, /metrics)
- Metrics collection goroutine
- gRPC server with observability
func (s *AgentHubServer) Shutdown(ctx context.Context) error
Gracefully shuts down all components:
- gRPC server graceful stop
- Health server shutdown
- Observability cleanup
Example:
config := agenthub.NewGRPCConfig("broker")
server, err := agenthub.NewAgentHubServer(config)
if err != nil {
log.Fatal(err)
}
// Register services
eventBusService := agenthub.NewEventBusService(server)
pb.RegisterEventBusServer(server.Server, eventBusService)
// Start server
if err := server.Start(ctx); err != nil {
log.Fatal(err)
}
AgentHubClient
High-level gRPC client wrapper with integrated observability.
type AgentHubClient struct {
Client pb.EventBusClient // gRPC client
Connection *grpc.ClientConn // Connection
Observability *observability.Observability // OpenTelemetry integration
TraceManager *observability.TraceManager // Distributed tracing
MetricsManager *observability.MetricsManager // Metrics collection
HealthServer *observability.HealthServer // Health monitoring
Logger *slog.Logger // Structured logging
Config *GRPCConfig // Configuration
}
Constructor
func NewAgentHubClient(config *GRPCConfig) (*AgentHubClient, error)
Creates a complete gRPC client with:
- OpenTelemetry instrumentation
- Connection health monitoring
- Metrics collection
- Automatic retry and timeout handling
Methods
func (c *AgentHubClient) Start(ctx context.Context) error
Initializes client with health monitoring and metrics collection.
func (c *AgentHubClient) Shutdown(ctx context.Context) error
Gracefully closes connection and cleans up resources.
Example:
config := agenthub.NewGRPCConfig("publisher")
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
log.Fatal(err)
}
err = client.Start(ctx)
if err != nil {
log.Fatal(err)
}
// Use client.Client for gRPC calls
Service Abstractions
EventBusService
Broker service implementation with built-in observability and correlation tracking.
type EventBusService struct {
Server *AgentHubServer
subscriptions map[string][]Subscription
resultSubs map[string][]ResultSubscription
progressSubs map[string][]ProgressSubscription
mu sync.RWMutex
}
Constructor
func NewEventBusService(server *AgentHubServer) *EventBusService
Creates an EventBus service with automatic:
- Subscription management
- Task routing and correlation
- Observability integration
Key Methods
func (s *EventBusService) PublishTask(ctx context.Context, req *pb.PublishTaskRequest) (*pb.PublishResponse, error)
Publishes tasks with automatic:
- Input validation
- Correlation ID generation
- Distributed tracing
- Metrics collection
func (s *EventBusService) SubscribeToTasks(req *pb.SubscribeToTasksRequest, stream pb.EventBus_SubscribeToTasksServer) error
Manages task subscriptions with:
- Automatic subscription lifecycle
- Context cancellation handling
- Error recovery
SubscriberAgent
High-level subscriber implementation with pluggable task handlers.
type SubscriberAgent struct {
client *AgentHubClient
agentID string
handlers map[string]TaskHandler
ctx context.Context
cancel context.CancelFunc
}
Constructor
func NewSubscriberAgent(client *AgentHubClient, agentID string) *SubscriberAgent
Task Handler Interface
type TaskHandler interface {
Handle(ctx context.Context, task *pb.TaskMessage) (*pb.TaskResult, error)
}
Methods
func (s *SubscriberAgent) RegisterHandler(taskType string, handler TaskHandler)
Registers handlers for specific task types with automatic:
- Task routing
- Error handling
- Result publishing
func (s *SubscriberAgent) Start(ctx context.Context) error
Starts the subscriber with automatic:
- Task subscription
- Handler dispatch
- Observability integration
Example:
type GreetingHandler struct{}
func (h *GreetingHandler) Handle(ctx context.Context, task *pb.TaskMessage) (*pb.TaskResult, error) {
// Process greeting task
return result, nil
}
// Register handler
subscriber.RegisterHandler("greeting", &GreetingHandler{})
Utility Functions
func ExtractCorrelationID(ctx context.Context) string
func InjectCorrelationID(ctx context.Context, correlationID string) context.Context
func GenerateCorrelationID() string
Automatic correlation ID management for distributed tracing.
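A minimal sketch wiring the three helpers together (signatures as listed above):
// Attach a correlation ID at the edge of the system...
ctx := agenthub.InjectCorrelationID(context.Background(), agenthub.GenerateCorrelationID())

// ...and recover it anywhere downstream for logging or outgoing metadata
log.Printf("processing request %s", agenthub.ExtractCorrelationID(ctx))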
Metrics Helpers
func NewMetricsTicker(ctx context.Context, manager *observability.MetricsManager) *MetricsTicker
Automatic metrics collection with configurable intervals.
Configuration Reference
Environment Variables
The unified abstraction library uses environment-based configuration:
| Variable | Type | Default | Description |
|---|---|---|---|
| AGENTHUB_BROKER_ADDR | string | localhost | Broker server hostname |
| AGENTHUB_BROKER_PORT | string | 50051 | Broker gRPC port |
| AGENTHUB_GRPC_PORT | string | :50051 | Server listen address |
| BROKER_HEALTH_PORT | string | 8080 | Health endpoint port |
| SERVICE_VERSION | string | 1.0.0 | Service version for observability |
| ENVIRONMENT | string | development | Deployment environment |
Observability Integration
The unified abstraction automatically configures:
- OpenTelemetry Tracing: Automatic span creation and context propagation
- Prometheus Metrics: 47+ built-in metrics for performance monitoring
- Health Checks: Comprehensive health endpoints for service monitoring
- Structured Logging: Correlated logging with trace context
Performance Comparison
| Metric | Standard gRPC | Unified Abstraction | Overhead |
|---|---|---|---|
| Setup Complexity | 380+ lines | ~29 lines | -92% code |
| Throughput | 10,000+ tasks/sec | 9,500+ tasks/sec | -5% |
| Latency | Baseline | +10ms for tracing | +10ms |
| Memory | Baseline | +50MB per agent | +50MB |
| CPU | Baseline | +5% for observability | +5% |
Migration Guide
From Standard gRPC
Before (Standard gRPC):
// 380+ lines of boilerplate code
lis, err := net.Listen("tcp", ":50051")
server := grpc.NewServer()
// ... extensive setup code
After (Unified Abstraction):
// 29 lines total
config := agenthub.NewGRPCConfig("my-service")
server, err := agenthub.NewAgentHubServer(config)
service := agenthub.NewEventBusService(server)
pb.RegisterEventBusServer(server.Server, service)
server.Start(ctx)
Observability Benefits
The unified abstraction provides automatic:
- Distributed Tracing: Every request automatically traced
- Metrics Collection: 47+ metrics without configuration
- Health Monitoring: Built-in health and readiness endpoints
- Error Correlation: Automatic error tracking across services
- Performance Monitoring: Latency, throughput, and error rates
Error Handling
The unified abstraction provides comprehensive error handling:
- Automatic Retries: Built-in retry logic for transient failures
- Circuit Breaking: Protection against cascading failures
- Graceful Degradation: Service continues operating during partial failures
- Error Correlation: Distributed error tracking across service boundaries
Best Practices
1. Configuration Management
// Use environment-based configuration
config := agenthub.NewGRPCConfig("my-service")
// Override specific values if needed
config.HealthPort = "8083"
2. Handler Registration
// Register handlers before starting
subscriber.RegisterHandler("task-type", handler)
subscriber.Start(ctx)
3. Graceful Shutdown
// Always implement proper shutdown
defer func() {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
server.Shutdown(ctx)
}()
4. Error Handling
// Use context for cancellation
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
result, err := client.Client.PublishTask(ctx, request)
if err != nil {
// Error is automatically traced and logged
return fmt.Errorf("failed to publish task: %w", err)
}
3.3 - Observability
Monitoring, metrics, and observability reference
Observability Reference
This section provides reference documentation for all observability features, including metrics, health endpoints, and monitoring capabilities.
Available Documentation
3.3.1 - AgentHub Health Endpoints Reference
Complete documentation for AgentHub’s health monitoring APIs, endpoint specifications, status codes, and integration patterns.
AgentHub Health Endpoints Reference
Technical reference: Complete documentation for AgentHub’s health monitoring APIs, endpoint specifications, status codes, and integration patterns.
Overview
Every observable AgentHub service exposes standardized health endpoints for monitoring, load balancing, and operational management.
Standard Endpoints
Health Check Endpoint
/health
Purpose: Comprehensive service health status
Method: GET
Port: Service-specific (8080-8083)
Response Format:
{
"status": "healthy|degraded|unhealthy",
"timestamp": "2025-09-28T21:00:00.000Z",
"service": "agenthub-broker",
"version": "1.0.0",
"uptime": "2h34m12s",
"checks": [
{
"name": "self",
"status": "healthy",
"message": "Service is running normally",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "1.2ms"
},
{
"name": "database_connection",
"status": "healthy",
"message": "Database connection is active",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "15.6ms"
}
]
}
Status Codes:
- 200 OK - All checks healthy
- 503 Service Unavailable - One or more checks unhealthy
- 500 Internal Server Error - Health check system failure
Readiness Endpoint
/ready
Purpose: Service readiness for traffic acceptance
Method: GET
Response Format:
{
"ready": true,
"timestamp": "2025-09-28T21:00:00.000Z",
"service": "agenthub-broker",
"dependencies": [
{
"name": "grpc_server",
"ready": true,
"message": "gRPC server listening on :50051"
},
{
"name": "observability",
"ready": true,
"message": "OpenTelemetry initialized"
}
]
}
Status Codes:
- 200 OK - Service ready for traffic
- 503 Service Unavailable - Service not ready
Metrics Endpoint
/metrics
Purpose: Prometheus metrics exposure
Method: GET
Content-Type: text/plain
Response Format:
# HELP events_processed_total Total number of events processed
# TYPE events_processed_total counter
events_processed_total{service="agenthub-broker",event_type="greeting",success="true"} 1234
# HELP system_cpu_usage_percent CPU usage percentage
# TYPE system_cpu_usage_percent gauge
system_cpu_usage_percent{service="agenthub-broker"} 23.4
Status Codes:
- 200 OK - Metrics available
- 500 Internal Server Error - Metrics collection failure
Service-Specific Configurations
Broker (Port 8080)
Health Checks:
- self - Basic service health
- grpc_server - gRPC server status
- observability - OpenTelemetry health
Example URLs:
- Health: http://localhost:8080/health
- Ready: http://localhost:8080/ready
- Metrics: http://localhost:8080/metrics
Publisher (Port 8081)
Health Checks:
- self - Basic service health
- broker_connection - Connection to AgentHub broker
- observability - Tracing and metrics health
Example URLs:
- Health: http://localhost:8081/health
- Ready: http://localhost:8081/ready
- Metrics: http://localhost:8081/metrics
Subscriber (Port 8082)
Health Checks:
- self - Basic service health
- broker_connection - Connection to AgentHub broker
- task_processor - Task processing capability
- observability - Observability stack health
Example URLs:
- Health: http://localhost:8082/health
- Ready: http://localhost:8082/ready
- Metrics: http://localhost:8082/metrics
Custom Agents (Port 8083+)
Configurable Health Checks:
- Custom business logic checks
- External dependency checks
- Resource availability checks
Health Check Types
BasicHealthChecker
Purpose: Simple function-based health checks
Implementation:
checker := observability.NewBasicHealthChecker("database", func(ctx context.Context) error {
return db.Ping()
})
healthServer.AddChecker("database", checker)
Use Cases:
- Database connectivity
- File system access
- Configuration validation
- Memory/disk space checks
GRPCHealthChecker
Purpose: gRPC connection health verification
Implementation:
checker := observability.NewGRPCHealthChecker("broker_connection", "localhost:50051")
healthServer.AddChecker("broker_connection", checker)
Use Cases:
- AgentHub broker connectivity
- External gRPC service dependencies
- Service mesh health
HTTPHealthChecker
Purpose: HTTP endpoint health verification
Implementation:
checker := observability.NewHTTPHealthChecker("api_gateway", "http://gateway:8080/health")
healthServer.AddChecker("api_gateway", checker)
Use Cases:
- REST API dependencies
- Web service health
- Load balancer backends
Custom Health Checkers
Interface:
type HealthChecker interface {
Check(ctx context.Context) error
Name() string
}
Custom Implementation Example:
type BusinessLogicChecker struct {
name string
validator func() error
}
func (c *BusinessLogicChecker) Check(ctx context.Context) error {
return c.validator()
}
func (c *BusinessLogicChecker) Name() string {
return c.name
}
// Usage
checker := &BusinessLogicChecker{
name: "license_validation",
validator: func() error {
if time.Now().After(licenseExpiry) {
return errors.New("license expired")
}
return nil
},
}
Health Check Configuration
Check Intervals
Default Intervals:
- Active checks: Every 30 seconds
- On-demand checks: Per request
- Startup checks: During service initialization
Configurable Timing:
config := observability.HealthConfig{
CheckInterval: 15 * time.Second,
Timeout: 5 * time.Second,
RetryCount: 3,
RetryDelay: 1 * time.Second,
}
Timeout Configuration
Per-Check Timeouts:
checker := observability.NewBasicHealthChecker("slow_service",
func(ctx context.Context) error {
// This check will timeout after 10 seconds
return slowOperation(ctx)
}).WithTimeout(10 * time.Second)
Global Timeout:
healthServer := observability.NewHealthServer("8080", "my-service", "1.0.0")
healthServer.SetGlobalTimeout(30 * time.Second)
Integration Patterns
Kubernetes Integration
Liveness Probe
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
Readiness Probe
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 2
Startup Probe
startupProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 30
Load Balancer Integration
HAProxy Configuration
backend agentHub_brokers
balance roundrobin
option httpchk GET /health
server broker1 broker1:8080 check
server broker2 broker2:8080 check
NGINX Configuration
upstream agenthub_backend {
server broker1:8080;
server broker2:8080;
}
location /health_check {
proxy_pass http://agenthub_backend/health;
proxy_set_header Host $host;
}
Prometheus Integration
Service Discovery
- job_name: 'agenthub-health'
static_configs:
- targets:
- 'broker:8080'
- 'publisher:8081'
- 'subscriber:8082'
metrics_path: '/metrics'
scrape_interval: 10s
scrape_timeout: 5s
Health Check Metrics
# Health check status (1=healthy, 0=unhealthy)
health_check_status{service="agenthub-broker",check="database"}
# Health check duration
health_check_duration_seconds{service="agenthub-broker",check="database"}
# Service uptime
service_uptime_seconds{service="agenthub-broker"}
Status Definitions
Service Status Levels
Healthy
Definition: All health checks passing
HTTP Status: 200 OK
Criteria:
- All registered checks return no error
- Service is fully operational
- All dependencies available
Degraded
Definition: Service operational but with limitations
HTTP Status: 200 OK (with warning indicators)
Criteria:
- Critical checks passing
- Non-critical checks may be failing
- Service can handle requests with reduced functionality
Unhealthy
Definition: Service cannot handle requests properly
HTTP Status: 503 Service Unavailable
Criteria:
- One or more critical checks failing
- Service should not receive new requests
- Requires intervention or automatic recovery
Check-Level Status
Passing
- Check completed successfully
- No errors detected
- Within acceptable parameters
Warning
- Check completed with minor issues
- Service functional but attention needed
- May indicate future problems
Critical
- Check failed
- Service functionality compromised
- Immediate attention required
Monitoring and Alerting
Critical Alerts
# Service down alert
- alert: ServiceHealthCheckFailing
expr: health_check_status == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Service health check failing"
description: "{{ $labels.service }} health check {{ $labels.check }} is failing"
# Service not ready alert
- alert: ServiceNotReady
expr: up{job=~"agenthub-.*"} == 0
for: 30s
labels:
severity: critical
annotations:
summary: "Service not responding"
description: "{{ $labels.instance }} is not responding to health checks"
Warning Alerts
# Slow health checks
- alert: SlowHealthChecks
expr: health_check_duration_seconds > 5
for: 5m
labels:
severity: warning
annotations:
summary: "Health checks taking too long"
description: "{{ $labels.service }} health check {{ $labels.check }} taking {{ $value }}s"
# Service degraded
- alert: ServiceDegraded
expr: service_status == 1 # degraded status
for: 10m
labels:
severity: warning
annotations:
summary: "Service running in degraded mode"
description: "{{ $labels.service }} is degraded but still operational"
API Response Examples
Healthy Service Response
curl http://localhost:8080/health
{
"status": "healthy",
"timestamp": "2025-09-28T21:00:00.000Z",
"service": "agenthub-broker",
"version": "1.0.0",
"uptime": "2h34m12s",
"checks": [
{
"name": "self",
"status": "healthy",
"message": "Service is running normally",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "1.2ms"
},
{
"name": "grpc_server",
"status": "healthy",
"message": "gRPC server listening on :50051",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "0.8ms"
},
{
"name": "observability",
"status": "healthy",
"message": "OpenTelemetry exporter connected",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "12.4ms"
}
]
}
Unhealthy Service Response
curl http://localhost:8080/health
{
"status": "unhealthy",
"timestamp": "2025-09-28T21:00:00.000Z",
"service": "agenthub-broker",
"version": "1.0.0",
"uptime": "2h34m12s",
"checks": [
{
"name": "self",
"status": "healthy",
"message": "Service is running normally",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "1.2ms"
},
{
"name": "grpc_server",
"status": "unhealthy",
"message": "Failed to bind to port :50051: address already in use",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "0.1ms"
},
{
"name": "observability",
"status": "healthy",
"message": "OpenTelemetry exporter connected",
"last_checked": "2025-09-28T21:00:00.000Z",
"duration": "12.4ms"
}
]
}
Best Practices
Health Check Design
- Fast Execution: Keep checks under 5 seconds
- Meaningful Tests: Test actual functionality, not just process existence
- Idempotent Operations: Checks should not modify system state
- Appropriate Timeouts: Set reasonable timeouts for external dependencies
- Clear Messages: Provide actionable error messages
Dependency Management
- Critical vs Non-Critical: Distinguish between essential and optional dependencies
- Cascade Prevention: Avoid cascading failures through dependency chains
- Circuit Breakers: Implement circuit breakers for flaky dependencies
- Graceful Degradation: Continue operating when non-critical dependencies fail
Operational Considerations
- Monitoring: Set up alerts for health check failures
- Documentation: Document what each health check validates
- Testing: Test health checks in development and staging
- Versioning: Version health check APIs for compatibility
Next Steps:
Implementation: Add Observability to Your Agent
Monitoring: Use Grafana Dashboards
Metrics: Observability Metrics Reference
3.3.2 - AgentHub Observability Metrics Reference
Complete catalog of all metrics exposed by AgentHub’s observability system, their meanings, usage patterns, and query examples.
AgentHub Observability Metrics Reference
Technical reference: Complete catalog of all metrics exposed by AgentHub’s observability system, their meanings, usage patterns, and query examples.
Overview
AgentHub automatically collects 47+ distinct metrics across all observable services, providing comprehensive visibility into event processing, system health, and performance characteristics.
Metric Categories
A2A Message Processing Metrics
a2a_messages_processed_total
Type: Counter
Description: Total number of A2A messages processed by service
Labels:
- service - Service name (agenthub, publisher, subscriber)
- message_type - Type of A2A message (task_update, message, artifact)
- success - Processing success (true/false)
- context_id - A2A conversation context (for workflow tracking)
Usage:
# A2A message processing rate per service
rate(a2a_messages_processed_total[5m])
# Success rate by A2A message type
rate(a2a_messages_processed_total{success="true"}[5m]) / rate(a2a_messages_processed_total[5m]) * 100
# Error rate across all A2A services
rate(a2a_messages_processed_total{success="false"}[5m]) / rate(a2a_messages_processed_total[5m]) * 100
# Workflow processing rate by context
rate(a2a_messages_processed_total[5m]) by (context_id)
a2a_messages_published_total
Type: Counter
Description: Total number of A2A messages published by agents
Labels:
- message_type - Type of A2A message published
- from_agent_id - Publishing agent identifier
- to_agent_id - Target agent identifier (empty for broadcast)
Usage:
# A2A publishing rate by message type
rate(a2a_messages_published_total[5m]) by (message_type)
# Most active A2A publishers
topk(5, rate(a2a_messages_published_total[5m]) by (from_agent_id))
# Broadcast vs direct messaging ratio
rate(a2a_messages_published_total{to_agent_id=""}[5m]) / rate(a2a_messages_published_total[5m])
a2a_message_processing_duration_seconds
Type: Histogram
Description: Time taken to process A2A messages
Labels:
- service - Service processing the message
- message_type - Type of A2A message being processed
- task_state - Current A2A task state (for task-related messages)
Buckets: 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10
Usage:
# p95 A2A message processing latency
histogram_quantile(0.95, rate(a2a_message_processing_duration_seconds_bucket[5m]))
# p99 latency by service
histogram_quantile(0.99, rate(a2a_message_processing_duration_seconds_bucket[5m])) by (service)
# Average A2A processing time by task state
rate(a2a_message_processing_duration_seconds_sum[5m]) / rate(a2a_message_processing_duration_seconds_count[5m]) by (task_state)
a2a_message_errors_total
Type: Counter
Description: Total number of A2A message processing errors
Labels:
- service - Service where error occurred
- message_type - Type of A2A message that failed
- error_type - Category of error (grpc_error, validation_error, protocol_error, etc.)
- a2a_version - A2A protocol version for compatibility tracking
Usage:
# A2A error rate by error type
rate(a2a_message_errors_total[5m]) by (error_type)
# Services with highest A2A error rates
topk(3, rate(a2a_message_errors_total[5m]) by (service))
# A2A protocol version compatibility issues
rate(a2a_message_errors_total{error_type="protocol_error"}[5m]) by (a2a_version)
AgentHub Broker Metrics
agenthub_connections_total
Type: Gauge
Description: Number of active agent connections to AgentHub broker
Labels:
- connection_type - Type of connection (a2a_publisher, a2a_subscriber, unified)
- agent_type - Classification of connected agent
Usage:
# Current AgentHub connection count
agenthub_connections_total
# A2A connection growth over time
increase(agenthub_connections_total[1h])
# Connection distribution by type
agenthub_connections_total by (connection_type)
agenthub_subscriptions_total
Type: Gauge
Description: Number of active A2A message subscriptions
Labels:
- agent_id - Subscriber agent identifier
- subscription_type - Type of A2A subscription (tasks, messages, agent_events)
- filter_criteria - Applied subscription filters (task_types, states, etc.)
Usage:
# Total active A2A subscriptions
sum(agenthub_subscriptions_total)
# A2A subscriptions by agent
sum(agenthub_subscriptions_total) by (agent_id)
# Most popular A2A subscription types
sum(agenthub_subscriptions_total) by (subscription_type)
# Filtered vs unfiltered subscriptions
sum(agenthub_subscriptions_total{filter_criteria!=""}) / sum(agenthub_subscriptions_total)
agenthub_message_routing_duration_seconds
Type: Histogram
Description: Time taken to route A2A messages through AgentHub broker
Labels:
- routing_type - Type of routing (direct, broadcast, filtered)
- message_size_bucket - Message size classification (small, medium, large)
Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1
Usage:
# AgentHub A2A routing latency percentiles
histogram_quantile(0.95, rate(agenthub_message_routing_duration_seconds_bucket[5m]))
# A2A routing performance by type
rate(agenthub_message_routing_duration_seconds_sum[5m]) / rate(agenthub_message_routing_duration_seconds_count[5m]) by (routing_type)
# Message size impact on routing
histogram_quantile(0.95, rate(agenthub_message_routing_duration_seconds_bucket[5m])) by (message_size_bucket)
agenthub_queue_size
Type: Gauge
Description: Number of A2A messages queued awaiting routing
Labels:
- queue_type - Type of queue (incoming, outgoing, dead_letter, retry)
- priority - Message priority level
- context_active - Whether messages belong to active A2A contexts
Usage:
# Current A2A queue sizes
agenthub_queue_size by (queue_type)
# A2A queue growth rate
rate(agenthub_queue_size[5m])
# Priority queue distribution
agenthub_queue_size by (priority)
# Active context message backlog
agenthub_queue_size{context_active="true"}
System Health Metrics
system_cpu_usage_percent
Type: Gauge
Description: CPU utilization percentage
Labels:
- service - Service name
Usage:
# Current CPU usage
system_cpu_usage_percent
# High CPU services
system_cpu_usage_percent > 80
# Average CPU over time
avg_over_time(system_cpu_usage_percent[1h])
system_memory_usage_bytes
Type: Gauge
Description: Memory usage in bytes
Labels:
- service - Service name
- type - Memory type (heap, stack, total)
Usage:
# Memory usage in MB
system_memory_usage_bytes / 1024 / 1024
# Memory growth rate
rate(system_memory_usage_bytes[10m])
# Memory usage by type
system_memory_usage_bytes by (type)
system_goroutines_total
Type: Gauge
Description: Number of active goroutines
Labels:
- service - Service name
Usage:
# Current goroutine count
system_goroutines_total
# Goroutine leaks detection
increase(system_goroutines_total[1h]) > 1000
# Goroutine efficiency
system_goroutines_total / system_cpu_usage_percent
system_file_descriptors_used
Type: Gauge
Description: Number of open file descriptors
Labels:
- service - Service name
Usage:
# Current FD usage
system_file_descriptors_used
# FD growth rate
rate(system_file_descriptors_used[5m])
A2A Task-Specific Metrics
a2a_tasks_created_total
Type: Counter
Description: Total number of A2A tasks created
Labels:
- task_type - Type classification of the task
- context_id - A2A conversation context
- priority - Task priority level
Usage:
# A2A task creation rate
rate(a2a_tasks_created_total[5m])
# Task creation by type
rate(a2a_tasks_created_total[5m]) by (task_type)
# High priority task rate
rate(a2a_tasks_created_total{priority="PRIORITY_HIGH"}[5m])
a2a_task_state_transitions_total
Type: Counter
Description: Total number of A2A task state transitions
Labels:
- from_state - Previous task state
- to_state - New task state
- task_type - Type of task transitioning
Usage:
# Task completion rate
rate(a2a_task_state_transitions_total{to_state="TASK_STATE_COMPLETED"}[5m])
# Task failure rate
rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m])
# Task state transition patterns
rate(a2a_task_state_transitions_total[5m]) by (from_state, to_state)
a2a_task_duration_seconds
Type: Histogram
Description: Duration of A2A task execution from submission to completion
Labels:
- task_type - Type of task
- final_state - Final task state (COMPLETED, FAILED, CANCELLED)
Buckets: 0.1, 0.5, 1, 5, 10, 30, 60, 300, 600, 1800
Usage:
# A2A task completion time percentiles
histogram_quantile(0.95, rate(a2a_task_duration_seconds_bucket{final_state="TASK_STATE_COMPLETED"}[5m]))
# Task duration by type
histogram_quantile(0.50, rate(a2a_task_duration_seconds_bucket[5m])) by (task_type)
# Failed vs successful task duration comparison
histogram_quantile(0.95, rate(a2a_task_duration_seconds_bucket[5m])) by (final_state)
a2a_artifacts_produced_total
Type: Counter
Description: Total number of A2A artifacts produced by completed tasks
Labels:
- artifact_type - Type of artifact (data, file, text)
- task_type - Type of task that produced the artifact
- artifact_size_bucket - Size classification of artifact
Usage:
# Artifact production rate
rate(a2a_artifacts_produced_total[5m])
# Artifacts by type
rate(a2a_artifacts_produced_total[5m]) by (artifact_type)
# Large artifact production rate
rate(a2a_artifacts_produced_total{artifact_size_bucket="large"}[5m])
gRPC Metrics
grpc_server_started_total
Type: Counter
Description: Total number of RPCs started on the AgentHub server
Labels:
- grpc_method - gRPC method name (PublishMessage, SubscribeToTasks, etc.)
- grpc_service - gRPC service name (AgentHub)
Usage:
# AgentHub RPC request rate
rate(grpc_server_started_total[5m])
# Most called A2A methods
topk(5, rate(grpc_server_started_total[5m]) by (grpc_method))
# A2A vs EDA method usage
rate(grpc_server_started_total{grpc_method=~".*Message.*|.*Task.*"}[5m])
grpc_server_handled_total
Type: Counter
Description: Total number of RPCs completed on the AgentHub server
Labels:
- grpc_method - gRPC method name
- grpc_service - gRPC service name (AgentHub)
- grpc_code - gRPC status code
- a2a_operation - A2A operation type (publish, subscribe, get, cancel)
Usage:
# AgentHub RPC success rate
rate(grpc_server_handled_total{grpc_code="OK"}[5m]) / rate(grpc_server_handled_total[5m]) * 100
# A2A operation error rate
rate(grpc_server_handled_total{grpc_code!="OK"}[5m]) by (a2a_operation)
# A2A method-specific success rates
rate(grpc_server_handled_total{grpc_code="OK"}[5m]) / rate(grpc_server_handled_total[5m]) by (grpc_method)
grpc_server_handling_seconds
Type: Histogram
Description: Histogram of response latency of AgentHub RPCs
Labels:
- grpc_method - gRPC method name
- grpc_service - gRPC service name (AgentHub)
- a2a_operation - A2A operation type
Usage:
# AgentHub gRPC latency percentiles
histogram_quantile(0.95, rate(grpc_server_handling_seconds_bucket[5m]))
# Slow A2A operations
histogram_quantile(0.95, rate(grpc_server_handling_seconds_bucket[5m])) by (a2a_operation) > 0.1
# A2A method performance comparison
histogram_quantile(0.95, rate(grpc_server_handling_seconds_bucket[5m])) by (grpc_method)
Health Check Metrics
health_check_status
Type: Gauge
Description: Health check status (1=healthy, 0=unhealthy)
Labels:
- service - Service name
- check_name - Name of the health check
- endpoint - Health check endpoint
Usage:
# Unhealthy services
health_check_status == 0
# Health check success rate
avg_over_time(health_check_status[5m])
health_check_duration_seconds
Type: Histogram
Description: Time taken to execute health checks
Labels:
- service - Service name
- check_name - Name of the health check
Usage:
# Health check latency
histogram_quantile(0.95, rate(health_check_duration_seconds_bucket[5m]))
# Slow health checks
histogram_quantile(0.95, rate(health_check_duration_seconds_bucket[5m])) by (check_name) > 0.5
OpenTelemetry Metrics
otelcol_processor_batch_batch_send_size_count
Type: Counter
Description: Number of batches sent by OTEL collector
Labels: None
otelcol_exporter_sent_spans
Type: Counter
Description: Number of spans sent to tracing backend
Labels:
exporter - Exporter name (jaeger, otlp)
Usage:
# Span export rate
rate(otelcol_exporter_sent_spans[5m])
# Export success by backend
rate(otelcol_exporter_sent_spans[5m]) by (exporter)
Common Query Patterns
# Top 5 slowest A2A message types
topk(5,
histogram_quantile(0.95,
rate(a2a_message_processing_duration_seconds_bucket[5m])
) by (message_type)
)
# A2A task completion time analysis
histogram_quantile(0.95,
rate(a2a_task_duration_seconds_bucket{final_state="TASK_STATE_COMPLETED"}[5m])
) by (task_type)
# Services exceeding A2A latency SLA (>500ms p95)
histogram_quantile(0.95,
rate(a2a_message_processing_duration_seconds_bucket[5m])
) by (service) > 0.5
# A2A throughput efficiency (messages per CPU percent)
rate(a2a_messages_processed_total[5m]) / system_cpu_usage_percent
# Task success rate by type
rate(a2a_task_state_transitions_total{to_state="TASK_STATE_COMPLETED"}[5m]) /
rate(a2a_tasks_created_total[5m]) by (task_type)
A2A Error Analysis
# A2A message error rate by service over time
rate(a2a_message_errors_total[5m]) / rate(a2a_messages_processed_total[5m]) * 100
# A2A task failure rate
rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m]) /
rate(a2a_tasks_created_total[5m]) * 100
# Most common A2A error types
topk(5, rate(a2a_message_errors_total[5m]) by (error_type))
# A2A protocol compatibility issues
rate(a2a_message_errors_total{error_type="protocol_error"}[5m]) by (a2a_version)
# Services with increasing A2A error rates
increase(a2a_message_errors_total[1h]) by (service) > 10
A2A Capacity Planning
# Peak hourly A2A message throughput
max_over_time(
rate(a2a_messages_processed_total[5m])[1h:]
) * 3600
# Peak A2A task creation rate
max_over_time(
rate(a2a_tasks_created_total[5m])[1h:]
) * 3600
# Resource utilization during peak A2A load
(
max_over_time(system_cpu_usage_percent[1h:]) +
max_over_time(system_memory_usage_bytes[1h:] / 1024 / 1024 / 1024)
) by (service)
# AgentHub connection scaling needs
max_over_time(agenthub_connections_total[24h:])
# A2A queue depth trends
max_over_time(agenthub_queue_size[24h:]) by (queue_type)
A2A System Health
# Overall A2A system health score (0-1)
avg(health_check_status)
# A2A services with degraded performance
(
system_cpu_usage_percent > 70 or
system_memory_usage_bytes > 1e9 or
rate(a2a_message_errors_total[5m]) / rate(a2a_messages_processed_total[5m]) > 0.05
)
# A2A task backlog health
agenthub_queue_size{queue_type="incoming"} > 1000
# A2A protocol health indicators
rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m]) /
rate(a2a_tasks_created_total[5m]) > 0.1
# Resource leak detection
increase(system_goroutines_total[1h]) > 1000 or
increase(system_file_descriptors_used[1h]) > 100
Alert Rule Examples
Critical A2A Alerts
# High A2A message processing error rate alert
- alert: HighA2AMessageProcessingErrorRate
expr: |
(
rate(a2a_message_errors_total[5m]) /
rate(a2a_messages_processed_total[5m])
) * 100 > 10
for: 2m
annotations:
summary: "High A2A message processing error rate"
description: "{{ $labels.service }} has {{ $value }}% A2A error rate"
# High A2A task failure rate alert
- alert: HighA2ATaskFailureRate
expr: |
(
rate(a2a_task_state_transitions_total{to_state="TASK_STATE_FAILED"}[5m]) /
rate(a2a_tasks_created_total[5m])
) * 100 > 15
for: 3m
annotations:
summary: "High A2A task failure rate"
description: "{{ $value }}% of A2A tasks are failing for task type {{ $labels.task_type }}"
# AgentHub service down alert
- alert: AgentHubServiceDown
expr: health_check_status == 0
for: 1m
annotations:
summary: "AgentHub service health check failing"
description: "{{ $labels.service }} health check {{ $labels.check_name }} is failing"
# A2A queue backlog alert
- alert: A2AQueueBacklog
expr: agenthub_queue_size{queue_type="incoming"} > 1000
for: 5m
annotations:
summary: "A2A message queue backlog"
description: "AgentHub has {{ $value }} messages queued"
A2A Warning Alerts
# High A2A message processing latency warning
- alert: HighA2AMessageProcessingLatency
expr: |
histogram_quantile(0.95,
rate(a2a_message_processing_duration_seconds_bucket[5m])
) > 0.5
for: 5m
annotations:
summary: "High A2A message processing latency"
description: "{{ $labels.service }} A2A p95 latency is {{ $value }}s"
# Slow A2A task completion warning
- alert: SlowA2ATaskCompletion
expr: |
histogram_quantile(0.95,
rate(a2a_task_duration_seconds_bucket{final_state="TASK_STATE_COMPLETED"}[5m])
) > 300
for: 10m
annotations:
summary: "Slow A2A task completion"
description: "A2A tasks of type {{ $labels.task_type }} taking {{ $value }}s to complete"
# High CPU usage warning
- alert: HighCPUUsage
expr: system_cpu_usage_percent > 80
for: 5m
annotations:
summary: "High CPU usage"
description: "{{ $labels.service }} CPU usage is {{ $value }}%"
# A2A protocol version compatibility warning
- alert: A2AProtocolVersionMismatch
expr: |
rate(a2a_message_errors_total{error_type="protocol_error"}[5m]) > 0.1
for: 3m
annotations:
summary: "A2A protocol version compatibility issues"
description: "A2A protocol errors detected for version {{ $labels.a2a_version }}"
Metric Retention and Storage
Retention Policies
- Raw metrics: 15 days at 15-second resolution
- 5m averages: 60 days
- 1h averages: 1 year
- 1d averages: 5 years
Storage Requirements
- Per service: ~2MB/day for all metrics
- Complete system: ~10MB/day for 5 services
- 1 year retention: ~3.6GB total
Prometheus Configuration
- Scrape interval: 10 seconds (configurable)
- Evaluation interval: 15 seconds for alerts
- Query timeout: 30 seconds
- Max samples: 50M per query
Integration Examples
Grafana Dashboard Variables
{
"service": {
"query": "label_values(a2a_messages_processed_total, service)",
"refresh": "on_time_range_changed"
},
"message_type": {
"query": "label_values(a2a_messages_processed_total{service=\"$service\"}, message_type)",
"refresh": "on_dashboard_load"
},
"task_type": {
"query": "label_values(a2a_tasks_created_total, task_type)",
"refresh": "on_dashboard_load"
},
"context_id": {
"query": "label_values(a2a_messages_processed_total{service=\"$service\"}, context_id)",
"refresh": "on_dashboard_load"
}
}
Custom A2A Application Metrics
// Register custom A2A counter
a2aCustomCounter, err := meter.Int64Counter(
"a2a_custom_business_metric_total",
metric.WithDescription("Custom A2A business metric"),
)
// Increment with A2A context and labels
a2aCustomCounter.Add(ctx, 1, metric.WithAttributes(
attribute.String("task_type", "custom_analysis"),
attribute.String("context_id", contextID),
attribute.String("agent_type", "analytics_agent"),
attribute.String("a2a_version", "1.0"),
))
// Register A2A task-specific histogram
a2aTaskHistogram, err := meter.Float64Histogram(
"a2a_custom_task_processing_seconds",
metric.WithDescription("Custom A2A task processing time"),
metric.WithUnit("s"),
)
// Record A2A task timing
start := time.Now()
// ... process A2A task ...
duration := time.Since(start).Seconds()
a2aTaskHistogram.Record(ctx, duration, metric.WithAttributes(
attribute.String("task_type", taskType),
attribute.String("task_state", "TASK_STATE_COMPLETED"),
))
Troubleshooting Metrics
Missing Metrics Checklist
- ✅ Service built with -tags observability
- ✅ Prometheus can reach metrics endpoint
- ✅ Correct port in Prometheus config
- ✅ Service is actually processing events
- ✅ OpenTelemetry exporter configured correctly
High Cardinality Warning
Avoid metrics with unbounded label values:
- ❌ User IDs as labels (millions of values)
- ❌ Timestamps as labels
- ❌ Request IDs as labels
- ✅ Event types (limited set)
- ✅ Service names (limited set)
- ✅ Status codes (limited set)
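To make the distinction concrete, here is a sketch in Go using the same OpenTelemetry APIs as the custom-metrics example below; the counter name and attribute values are illustrative assumptions, not part of the AgentHub metric set.
// Hypothetical counter registered via an OpenTelemetry meter.
eventCounter, _ := meter.Int64Counter(
	"example_events_processed_total",
	metric.WithDescription("Events processed, labeled only with bounded values"),
)
// Good: bounded label values drawn from small, fixed sets.
eventCounter.Add(ctx, 1, metric.WithAttributes(
	attribute.String("event_type", "task.status_update"), // limited set
	attribute.String("service", "agenthub-broker"),       // limited set
))
// Bad: unbounded label values create one time series per user/request.
// eventCounter.Add(ctx, 1, metric.WithAttributes(
//	attribute.String("user_id", userID),       // millions of values
//	attribute.String("request_id", requestID), // unbounded
// ))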
🎯 Next Steps:
- Implementation: Add Observability to Your Agent
- Monitoring: Use Grafana Dashboards
- Understanding: Distributed Tracing Explained
3.4 - Tasks
Task message specifications and reference
Tasks Reference
This section provides detailed reference documentation for task messages, specifications, and data structures used throughout AgentHub.
Available Documentation
- Task Reference - Detailed task message specifications and data structures
3.4.1 - A2A Task Reference
Comprehensive reference for all task-related message types and operations in the Agent2Agent protocol implementation.
A2A Task Reference
This document provides a comprehensive reference for all task-related message types and operations in the Agent2Agent (A2A) protocol implementation within AgentHub’s hybrid Event-Driven Architecture.
Core A2A Task Types
A2A Task
The primary message type for managing work requests between agents in the Agent2Agent protocol.
message Task {
string id = 1; // Required: Task identifier
string context_id = 2; // Optional: Conversation context
TaskStatus status = 3; // Required: Current task status
repeated Message history = 4; // Message history for this task
repeated Artifact artifacts = 5; // Task output artifacts
google.protobuf.Struct metadata = 6; // Task metadata
}
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Globally unique identifier for the task |
| context_id | string | No | Groups related tasks in a workflow or conversation |
| status | TaskStatus | Yes | Current execution state and last update |
| history | Message[] | No | Complete message history for this task |
| artifacts | Artifact[] | No | Output artifacts produced by the task |
| metadata | Struct | No | Additional context information |
Task IDs should be globally unique and meaningful for debugging:
// Recommended formats:
taskID := fmt.Sprintf("task_%s_%d", taskType, time.Now().Unix())
taskID := fmt.Sprintf("task_%s_%s", taskType, uuid.New().String())
taskID := fmt.Sprintf("%s_%s_%d", requesterID, taskType, sequence)
A2A TaskStatus
Represents the current state and latest update for a task.
message TaskStatus {
TaskState state = 1; // Current task state
Message update = 2; // Status update message
google.protobuf.Timestamp timestamp = 3; // Status timestamp
}
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| state | TaskState | Yes | Current execution state |
| update | Message | No | Latest status message from the executing agent |
| timestamp | Timestamp | Yes | When this status was last updated |
A2A Message
Agent-to-agent communication within task context.
message Message {
string message_id = 1; // Required: Unique message identifier
string context_id = 2; // Optional: Conversation context
string task_id = 3; // Optional: Associated task
Role role = 4; // Required: USER or AGENT
repeated Part content = 5; // Required: Message content parts
google.protobuf.Struct metadata = 6; // Optional: Additional metadata
repeated string extensions = 7; // Optional: Protocol extensions
}
Message Content Parts
Messages contain structured content using A2A Part definitions:
message Part {
oneof part {
string text = 1; // Text content
DataPart data = 2; // Structured data
FilePart file = 3; // File reference
}
}
message DataPart {
google.protobuf.Struct data = 1; // Structured data content
string description = 2; // Optional data description
}
message FilePart {
string file_id = 1; // File identifier or URI
string filename = 2; // Original filename
string mime_type = 3; // MIME type
int64 size_bytes = 4; // File size in bytes
google.protobuf.Struct metadata = 5; // Additional file metadata
}
A2A Artifact
Structured output produced by completed tasks.
message Artifact {
string artifact_id = 1; // Required: Artifact identifier
string name = 2; // Human-readable name
string description = 3; // Artifact description
repeated Part parts = 4; // Artifact content parts
google.protobuf.Struct metadata = 5; // Artifact metadata
}
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| artifact_id | string | Yes | Unique identifier for this artifact |
| name | string | No | Human-readable artifact name |
| description | string | No | Description of the artifact contents |
| parts | Part[] | Yes | Structured content using A2A Part format |
| metadata | Struct | No | Additional artifact information |
Enumerations
TaskState
Current state of A2A task execution.
enum TaskState {
TASK_STATE_SUBMITTED = 0; // Task created and submitted
TASK_STATE_WORKING = 1; // Task in progress
TASK_STATE_COMPLETED = 2; // Task completed successfully
TASK_STATE_FAILED = 3; // Task failed with error
TASK_STATE_CANCELLED = 4; // Task cancelled
}
State Transition Rules
Valid state transitions:
TASK_STATE_SUBMITTED → TASK_STATE_WORKING → TASK_STATE_COMPLETED
TASK_STATE_SUBMITTED → TASK_STATE_WORKING → TASK_STATE_FAILED
TASK_STATE_SUBMITTED → TASK_STATE_WORKING → TASK_STATE_CANCELLED
TASK_STATE_SUBMITTED → TASK_STATE_CANCELLED (before execution starts)
Invalid transitions:
- Any state → TASK_STATE_SUBMITTED
- TASK_STATE_COMPLETED → any other state
- TASK_STATE_FAILED → any other state (except for retry scenarios)
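These rules lend themselves to a table-driven check; a minimal sketch (the map and helper are illustrative, not part of the AgentHub API):
// validTransitions encodes the legal transitions listed above.
var validTransitions = map[a2a.TaskState][]a2a.TaskState{
	a2a.TaskState_TASK_STATE_SUBMITTED: {
		a2a.TaskState_TASK_STATE_WORKING,
		a2a.TaskState_TASK_STATE_CANCELLED, // cancel before execution starts
	},
	a2a.TaskState_TASK_STATE_WORKING: {
		a2a.TaskState_TASK_STATE_COMPLETED,
		a2a.TaskState_TASK_STATE_FAILED,
		a2a.TaskState_TASK_STATE_CANCELLED,
	},
	// COMPLETED and FAILED are terminal: no outgoing transitions.
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to a2a.TaskState) bool {
	for _, allowed := range validTransitions[from] {
		if to == allowed {
			return true
		}
	}
	return false
}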
Role
Identifies the role of the message sender in A2A communication.
enum Role {
USER = 0; // Message from requesting agent
AGENT = 1; // Message from responding agent
}
Priority
Task priority levels for scheduling and resource allocation.
enum Priority {
PRIORITY_UNSPECIFIED = 0; // Default value, treated as MEDIUM
PRIORITY_LOW = 1; // Low priority, can be delayed
PRIORITY_MEDIUM = 2; // Normal priority
PRIORITY_HIGH = 3; // High priority, expedited processing
PRIORITY_CRITICAL = 4; // Critical priority, immediate processing
}
Priority Usage Guidelines
| Priority | Use Cases | SLA Expectations |
|---|---|---|
| LOW | Background jobs, cleanup tasks, analytics | Hours to days |
| MEDIUM | Standard user requests, routine processing | Minutes to hours |
| HIGH | User-visible operations, time-sensitive tasks | Seconds to minutes |
| CRITICAL | Emergency operations, system health tasks | Immediate |
AgentHub EDA Request/Response Messages
Task Publishing
PublishTaskUpdateRequest
Request to publish a task status update through the EDA broker.
message PublishTaskUpdateRequest {
a2a.Task task = 1; // Updated A2A task
AgentEventMetadata routing = 2; // EDA routing metadata
}
PublishTaskArtifactRequest
Request to publish a task artifact through the EDA broker.
message PublishTaskArtifactRequest {
string task_id = 1; // Associated task ID
a2a.Artifact artifact = 2; // A2A artifact
AgentEventMetadata routing = 3; // EDA routing metadata
}
Task Subscription
SubscribeToTasksRequest
Request to subscribe to A2A task events through the EDA broker.
message SubscribeToTasksRequest {
string agent_id = 1; // Agent ID for subscription
repeated string task_types = 2; // Optional task type filter
repeated a2a.TaskState states = 3; // Optional state filter
}
Usage Examples
// Subscribe to all tasks for this agent
req := &pb.SubscribeToTasksRequest{
AgentId: "data_processor_01",
}
// Subscribe only to working and completed tasks
req := &pb.SubscribeToTasksRequest{
AgentId: "workflow_orchestrator",
States: []a2a.TaskState{
a2a.TaskState_TASK_STATE_WORKING,
a2a.TaskState_TASK_STATE_COMPLETED,
},
}
Task Management
GetTaskRequest
Request to retrieve the current state of an A2A task.
message GetTaskRequest {
string task_id = 1; // Task identifier
int32 history_length = 2; // History limit (optional)
}
CancelTaskRequest
Request to cancel an active A2A task.
message CancelTaskRequest {
string task_id = 1; // Task to cancel
string reason = 2; // Cancellation reason
}
ListTasksRequest
Request to list A2A tasks matching criteria.
message ListTasksRequest {
string agent_id = 1; // Filter by agent
repeated a2a.TaskState states = 2; // Filter by states
google.protobuf.Timestamp since = 3; // Filter by timestamp
int32 limit = 4; // Results limit
}
gRPC Service Methods
Task Publishing Methods
PublishTaskUpdate
Publishes a task status update to the EDA broker.
rpc PublishTaskUpdate (PublishTaskUpdateRequest) returns (PublishResponse);
Example:
// Create updated task status
status := &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_WORKING,
Update: &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
TaskId: taskID,
Role: a2a.Role_AGENT,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Processing data analysis...",
},
},
},
},
Timestamp: timestamppb.Now(),
}
task := &a2a.Task{
Id: taskID,
Status: status,
}
req := &pb.PublishTaskUpdateRequest{
Task: task,
Routing: &pb.AgentEventMetadata{
FromAgentId: "processor_01",
EventType: "task.status_update",
},
}
res, err := client.PublishTaskUpdate(ctx, req)
PublishTaskArtifact
Publishes a task artifact to the EDA broker.
rpc PublishTaskArtifact (PublishTaskArtifactRequest) returns (PublishResponse);
Example:
// Create artifact with results
artifact := &a2a.Artifact{
ArtifactId: "artifact_" + uuid.New().String(),
Name: "Analysis Results",
Description: "Statistical analysis of sales data",
Parts: []*a2a.Part{
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: structData, // Contains analysis results
Description: "Sales analysis summary statistics",
},
},
},
{
Part: &a2a.Part_File{
File: &a2a.FilePart{
FileId: "file_123",
Filename: "analysis_report.pdf",
MimeType: "application/pdf",
SizeBytes: 1024576,
},
},
},
},
}
req := &pb.PublishTaskArtifactRequest{
TaskId: taskID,
Artifact: artifact,
Routing: &pb.AgentEventMetadata{
FromAgentId: "processor_01",
EventType: "task.artifact",
},
}
res, err := client.PublishTaskArtifact(ctx, req)
Task Subscription Methods
SubscribeToTasks
Subscribes to receive A2A task events through the EDA broker.
rpc SubscribeToTasks (SubscribeToTasksRequest) returns (stream AgentEvent);
Returns: Stream of AgentEvent objects containing A2A task updates
Example:
req := &pb.SubscribeToTasksRequest{
AgentId: "processor_01",
States: []a2a.TaskState{a2a.TaskState_TASK_STATE_SUBMITTED},
}
stream, err := client.SubscribeToTasks(ctx, req)
for {
event, err := stream.Recv()
if err != nil {
break
}
// Extract A2A task from event
if task := event.GetTask(); task != nil {
go processA2ATask(task)
}
}
Task Management Methods
GetTask
Retrieves the current state of an A2A task by ID.
rpc GetTask (GetTaskRequest) returns (a2a.Task);
CancelTask
Cancels an active A2A task and notifies subscribers.
rpc CancelTask (CancelTaskRequest) returns (a2a.Task);
ListTasks
Returns A2A tasks matching the specified criteria.
rpc ListTasks (ListTasksRequest) returns (ListTasksResponse);
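The management methods follow the request types defined earlier; a short usage sketch (the task and agent IDs are illustrative, and error handling is abbreviated):
// Fetch current task state, limiting history to the last 10 messages.
task, err := client.GetTask(ctx, &pb.GetTaskRequest{
	TaskId:        "task_analysis_123",
	HistoryLength: 10,
})

// Cancel a task that is no longer needed.
task, err = client.CancelTask(ctx, &pb.CancelTaskRequest{
	TaskId: "task_analysis_123",
	Reason: "superseded by newer request",
})

// List this agent's unfinished tasks.
resp, err := client.ListTasks(ctx, &pb.ListTasksRequest{
	AgentId: "processor_01",
	States: []a2a.TaskState{
		a2a.TaskState_TASK_STATE_SUBMITTED,
		a2a.TaskState_TASK_STATE_WORKING,
	},
	Limit: 50,
})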
A2A Task Workflow Patterns
Simple Request-Response
// 1. Agent A creates and publishes task request
task := &a2a.Task{
Id: "task_analysis_123",
ContextId: "workflow_456",
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
TaskId: "task_analysis_123",
Role: a2a.Role_USER,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Please analyze the Q4 sales data",
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: dataStruct, // Contains parameters
},
},
},
},
},
Timestamp: timestamppb.Now(),
},
}
// 2. Agent B receives task and updates status to WORKING
// 3. Agent B publishes progress updates during execution
// 4. Agent B publishes final artifacts and COMPLETED status
Multi-Step Workflow
// 1. Orchestrator creates main task
mainTask := &a2a.Task{
Id: "workflow_main_789",
ContextId: "workflow_context_789",
// ... initial message
}
// 2. Create subtasks with same context_id
subtask1 := &a2a.Task{
Id: "subtask_data_prep_790",
ContextId: "workflow_context_789", // Same context
// ... data preparation request
}
subtask2 := &a2a.Task{
Id: "subtask_analysis_791",
ContextId: "workflow_context_789", // Same context
// ... analysis request (depends on subtask1)
}
// 3. Tasks linked by context_id for workflow tracking
Error Handling Reference
A2A Task Error Patterns
Parameter Validation Errors
// Task fails with validation error
failedTask := &a2a.Task{
Id: taskID,
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_FAILED,
Update: &a2a.Message{
Role: a2a.Role_AGENT,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Task failed: Required parameter 'dataset_path' is missing",
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: errorDetails, // Structured error info
Description: "Validation error details",
},
},
},
},
},
Timestamp: timestamppb.Now(),
},
}
Resource Errors
// Task fails due to resource unavailability
failedTask := &a2a.Task{
Id: taskID,
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_FAILED,
Update: &a2a.Message{
Role: a2a.Role_AGENT,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Cannot access dataset file: /data/sales_2024.csv",
},
},
},
},
Timestamp: timestamppb.Now(),
},
}
Error Handling Best Practices
- Use structured error messages in A2A format for programmatic handling
- Include actionable error descriptions in text parts for human operators
- Add detailed error data in data parts for debugging and retry logic
- Maintain task history to preserve error context
- Consider partial results using artifacts for partially successful operations
Migration from Legacy EventBus
Message Type Mappings
| Legacy EventBus | A2A Equivalent | Notes |
|---|---|---|
| TaskMessage | a2a.Task with initial Message | Task creation with request message |
| TaskResult | a2a.Task with final Artifact | Task completion with result artifacts |
| TaskProgress | a2a.Task with status Message | Progress updates via status messages |
| TaskStatus enum | a2a.TaskState enum | State names updated (e.g., IN_PROGRESS → TASK_STATE_WORKING) |
API Method Mappings
| Legacy EventBus | A2A Equivalent | Notes |
|---|---|---|
| PublishTask | PublishTaskUpdate | Now publishes A2A task objects |
| PublishTaskResult | PublishTaskArtifact | Results published as artifacts |
| PublishTaskProgress | PublishTaskUpdate | Progress via task status updates |
| SubscribeToTasks | SubscribeToTasks | Now returns A2A task events |
| SubscribeToTaskResults | SubscribeToTasks (filtered) | Filter by COMPLETED state |
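As a sketch of what this mapping means in code, a legacy PublishTaskResult call becomes an artifact publication followed by a terminal status update (taskID and resultArtifact are illustrative variables, following the examples earlier in this reference):
// Legacy: client.PublishTaskResult(ctx, &pb.TaskResult{TaskId: taskID, ...})
// A2A: publish the result as an artifact...
_, err := client.PublishTaskArtifact(ctx, &pb.PublishTaskArtifactRequest{
	TaskId:   taskID,
	Artifact: resultArtifact, // *a2a.Artifact built from the legacy result payload
	Routing:  &pb.AgentEventMetadata{FromAgentId: "processor_01", EventType: "task.artifact"},
})
// ...then mark the task completed.
if err == nil {
	_, err = client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
		Task: &a2a.Task{
			Id: taskID,
			Status: &a2a.TaskStatus{
				State:     a2a.TaskState_TASK_STATE_COMPLETED,
				Timestamp: timestamppb.Now(),
			},
		},
		Routing: &pb.AgentEventMetadata{FromAgentId: "processor_01", EventType: "task.status_update"},
	})
}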
This reference provides the complete specification for A2A task-related messages and operations in the AgentHub Event-Driven Architecture, enabling robust distributed task coordination with full Agent2Agent protocol compliance.
3.5 - Cortex Reference
Technical reference for the Cortex orchestration engine
Cortex Reference Documentation
Complete API and configuration reference for Cortex.
Overview
Cortex is an asynchronous AI orchestration engine that coordinates multi-agent workflows through event-driven architecture.
Version: 0.1.0 (POC)
Status: Production-Ready Architecture (Mock LLM)
Core Interfaces
StateManager Interface
Manages conversation persistence.
package state
type StateManager interface {
// Get retrieves conversation state for a session
Get(sessionID string) (*ConversationState, error)
// Set persists conversation state
Set(sessionID string, state *ConversationState) error
// Delete removes conversation state
Delete(sessionID string) error
// WithLock executes a function with exclusive session access
WithLock(sessionID string, fn func(*ConversationState) error) error
}
Implementations:
- InMemoryStateManager - POC implementation (in-memory)
- Future: RedisStateManager, PostgresStateManager
Usage:
sm := state.NewInMemoryStateManager()
// Get state
state, err := sm.Get("session-123")
// Update with lock
err = sm.WithLock("session-123", func(state *ConversationState) error {
state.Messages = append(state.Messages, newMessage)
return nil
})
LLM Client Interface
Abstraction for AI decision-making.
package llm
type Client interface {
// Decide analyzes state and returns actions to take
Decide(
ctx context.Context,
conversationHistory []*pb.Message,
availableAgents []*pb.AgentCard,
newEvent *pb.Message,
) (*Decision, error)
}
Decision Structure:
type Decision struct {
Reasoning string // Why these actions were chosen
Actions []Action // Actions to execute
}
type Action struct {
Type string // "chat.response" or "task.request"
// For chat.response
ResponseText string
// For task.request
TaskType string
TaskPayload map[string]interface{}
TargetAgent string
CorrelationID string
}
Implementations:
- MockClient - Testing implementation
- Future: VertexAIClient, OpenAIClient
Usage:
llmClient := llm.NewMockClient()
decision, err := llmClient.Decide(
ctx,
conversationHistory,
availableAgents,
newMessage,
)
for _, action := range decision.Actions {
// Execute action
}
MessagePublisher Interface
Publishes messages to Event Bus.
package cortex
type MessagePublisher interface {
PublishMessage(
ctx context.Context,
msg *pb.Message,
routing *pb.AgentEventMetadata,
) error
}
Implementation:
type AgentHubMessagePublisher struct {
client *agenthub.AgentHubClient
}
func (a *AgentHubMessagePublisher) PublishMessage(
ctx context.Context,
msg *pb.Message,
routing *pb.AgentEventMetadata,
) error {
_, err := a.client.Client.PublishMessage(ctx, &pb.PublishMessageRequest{
Message: msg,
Routing: routing,
})
return err
}
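Usage is a single call through the adapter; an illustrative publication (the sender ID and event type here are assumptions, not fixed values):
publisher := &AgentHubMessagePublisher{client: agentHubClient}

err := publisher.PublishMessage(ctx, &pb.Message{
	MessageId: "msg-123",
	ContextId: "session-456",
	Role:      pb.Role_ROLE_USER,
	Content:   []*pb.Part{{Part: &pb.Part_Text{Text: "Hello"}}},
}, &pb.AgentEventMetadata{
	FromAgentId: "chat-cli",     // illustrative sender ID
	EventType:   "chat.message", // illustrative event type
})
if err != nil {
	log.Printf("publish failed: %v", err)
}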
Data Structures
ConversationState
type ConversationState struct {
SessionID string // Unique session identifier
Messages []*pb.Message // Full conversation history
PendingTasks map[string]*TaskContext // In-flight tasks
RegisteredAgents map[string]*pb.AgentCard // Available agents
}
Fields:
- SessionID - Unique identifier for the conversation (e.g., “session-123”)
- Messages - Complete message history (USER and AGENT messages)
- PendingTasks - Tasks awaiting completion (keyed by task_id)
- RegisteredAgents - Agents available for this session (keyed by agent_id)
TaskContext
type TaskContext struct {
TaskID string // Unique task identifier
TaskType string // Type of task (e.g., "transcription")
RequestedAt int64 // Unix timestamp when task was created
OriginalInput *pb.Message // Original user message that triggered task
UserNotified bool // Whether user received acknowledgment
}
Cortex Core API
Constructor
func NewCortex(
stateManager StateManager,
llmClient llm.Client,
messagePublisher MessagePublisher,
) *Cortex
Parameters:
- stateManager - State persistence implementation
- llmClient - LLM decision engine
- messagePublisher - Message publishing adapter
Returns: Configured Cortex instance
Example:
cortex := cortex.NewCortex(
state.NewInMemoryStateManager(),
llm.NewMockClient(),
&AgentHubMessagePublisher{client: agentHubClient},
)
RegisterAgent
func (c *Cortex) RegisterAgent(agentID string, card *pb.AgentCard)
Registers an agent’s capabilities with Cortex.
Parameters:
- agentID - Unique agent identifier
- card - Agent capability card
Example:
cortex.RegisterAgent("transcriber-1", &pb.AgentCard{
Name: "Audio Transcriber",
Description: "Transcribes audio files to text",
Skills: []*pb.AgentSkill{
{
Id: "transcribe",
Name: "Transcription",
Description: "Converts speech to text",
},
},
})
GetAvailableAgents
func (c *Cortex) GetAvailableAgents() []*pb.AgentCard
Returns all registered agents.
Returns: Slice of agent cards
Example:
agents := cortex.GetAvailableAgents()
for _, agent := range agents {
fmt.Printf("Agent: %s - %s\n", agent.Name, agent.Description)
}
HandleMessage
func (c *Cortex) HandleMessage(ctx context.Context, msg *pb.Message) error
Main entry point for processing messages.
Parameters:
- ctx - Context for cancellation and tracing
- msg - A2A protocol message
Returns: Error if processing failed
Message Types Handled:
- Chat requests (role=USER) → Cortex decides response
- Task results (role=AGENT, task_id set) → Cortex synthesizes result
Example:
message := &pb.Message{
MessageId: "msg-123",
ContextId: "session-456",
Role: pb.Role_ROLE_USER,
Content: []*pb.Part{
{Part: &pb.Part_Text{Text: "Hello"}},
},
}
err := cortex.HandleMessage(ctx, message)
Configuration
Environment Variables
# LLM Configuration
CORTEX_LLM_MODEL=vertex-ai://gemini-2.0-flash # LLM model to use
# AgentHub Connection
AGENTHUB_GRPC_PORT=127.0.0.1:50051 # Broker gRPC address
AGENTHUB_BROKER_ADDR=127.0.0.1 # Broker host
# Health Check
CORTEX_HEALTH_PORT=8086 # Health check HTTP port
# Logging
LOG_LEVEL=info # Log level (debug, info, warn, error)
Programmatic Configuration
// Create state manager
stateManager := state.NewInMemoryStateManager()
// Or for production:
// stateManager := redis.NewRedisStateManager(redisClient)
// Create LLM client
llmClient := llm.NewMockClient()
// Or for production:
// llmClient := vertexai.NewClient(os.Getenv("CORTEX_LLM_MODEL"))
// Create message publisher
agentHubClient, _ := agenthub.NewAgentHubClient(config)
messagePublisher := &AgentHubMessagePublisher{client: agentHubClient}
// Create Cortex
cortex := cortex.NewCortex(stateManager, llmClient, messagePublisher)
Message Correlation
Session Management
Each conversation has a unique context_id (session ID).
Session Lifecycle:
- CLI creates session: cli_session_<timestamp>
- All messages in conversation share this context_id
- Cortex maintains state per context_id
- State persists across restarts (if using persistent storage)
Task Correlation
Tasks are correlated via task_id.
Task Lifecycle:
- User message triggers task
- Cortex creates task_id: task_<timestamp>
- Cortex adds to PendingTasks map
- Agent receives task (with task_id)
- Agent publishes result (same task_id)
- Cortex matches result to pending task
- Cortex removes from PendingTasks
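A sketch of the matching step, assuming the ConversationState and TaskContext structures above (the helper itself is illustrative, not the actual Cortex code):
// matchTaskResult correlates an incoming AGENT message carrying a task_id
// with the session's pending tasks.
func matchTaskResult(state *ConversationState, result *pb.Message) (*TaskContext, bool) {
	taskCtx, ok := state.PendingTasks[result.GetTaskId()]
	if !ok {
		return nil, false // unknown or already-completed task
	}
	delete(state.PendingTasks, result.GetTaskId()) // remove from PendingTasks
	state.Messages = append(state.Messages, result) // keep the full history
	return taskCtx, true
}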
Error Handling
State Errors
type StateError struct {
Op string // Operation that failed
Err string // Error message
}
Common Errors:
- Empty session ID
- Nil state
- Lock timeout
Handling:
state, err := sm.Get(sessionID)
if err != nil {
if stateErr, ok := err.(*state.StateError); ok {
log.Printf("State operation %s failed: %s", stateErr.Op, stateErr.Err)
}
}
LLM Errors
Handling:
decision, err := llmClient.Decide(ctx, history, agents, event)
if err != nil {
// Fallback: send generic error response to user
cortex.publishErrorResponse(ctx, sessionID, err)
return err
}
Message Processing Errors
Errors during HandleMessage are logged but don’t crash Cortex:
err := cortex.HandleMessage(ctx, msg)
if err != nil {
logger.ErrorContext(ctx, "Failed to handle message",
"error", err,
"message_id", msg.GetMessageId(),
)
// Cortex continues processing other messages
}
Concurrency
- Thread-Safe: All operations use proper locking
- Per-Session Locks: No global bottleneck
- Tested: 100 concurrent goroutines, zero lost updates
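The per-session locking mentioned above can be backed by a small lock registry; a minimal sketch (not the actual InMemoryStateManager internals):
// sessionLocks hands out one mutex per session so sessions never contend.
type sessionLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *sessionLocks) get(sessionID string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	l, ok := s.locks[sessionID]
	if !ok {
		l = &sync.Mutex{}
		s.locks[sessionID] = l
	}
	return l
}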
Complexity
| Operation | Time Complexity | Notes |
|---|---|---|
| Get State | O(1) | Map lookup |
| Set State | O(1) | Map insert |
| WithLock | O(1) + fn | Per-session lock |
| HandleMessage | O(n) | n = message history for LLM |
Scalability
Vertical Scaling:
- In-memory state limited by RAM
- Recommendation: ~10,000 active sessions per instance
Horizontal Scaling (Future):
- Partition sessions by context_id hash
- Multiple Cortex instances
- Shared persistent state (Redis Cluster)
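Partitioning by context_id hash could be as simple as the following sketch (illustrative; not implemented in the POC):
// instanceFor picks a Cortex instance for a session by hashing its context_id.
func instanceFor(contextID string, numInstances uint32) uint32 {
	h := fnv.New32a() // hash/fnv from the standard library
	h.Write([]byte(contextID))
	return h.Sum32() % numInstances
}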
Testing
Unit Tests
Run all tests:
go test -v ./agents/cortex/...
Coverage:
- State manager: 5 tests
- LLM client: 4 tests
- Cortex core: 4 tests
Integration Testing
Use the demo script, demo_cortex.sh, for end-to-end testing.
Mock LLM
For testing custom decision logic:
mockLLM := llm.NewMockClientWithFunc(
func(ctx context.Context, history []*pb.Message, agents []*pb.AgentCard, event *pb.Message) (*llm.Decision, error) {
// Custom logic
return &llm.Decision{
Actions: []llm.Action{
{Type: "chat.response", ResponseText: "Test response"},
},
}, nil
},
)
cortex := cortex.NewCortex(stateManager, mockLLM, publisher)
Migration Guide
From Mock LLM to Real LLM
1. Implement the llm.Client interface:
type VertexAIClient struct {
client *genai.Client
model string
}
func (v *VertexAIClient) Decide(
	ctx context.Context,
	conversationHistory []*pb.Message,
	availableAgents []*pb.AgentCard,
	newEvent *pb.Message,
) (*llm.Decision, error) {
// Build prompt
prompt := buildPrompt(conversationHistory, availableAgents, newEvent)
// Call Vertex AI
response, err := v.client.Generate(ctx, prompt)
if err != nil {
return nil, err
}
// Parse response into Decision
return parseDecision(response)
}
2. Update configuration:
llmClient := vertexai.NewClient(os.Getenv("CORTEX_LLM_MODEL"))
cortex := cortex.NewCortex(stateManager, llmClient, publisher)
From In-Memory to Persistent State
1. Implement the state.StateManager interface:
type RedisStateManager struct {
client *redis.Client
}
func (r *RedisStateManager) Get(sessionID string) (*ConversationState, error) {
	data, err := r.client.Get(context.Background(), sessionID).Bytes()
	if err != nil {
		return nil, err
	}
	// Deserialize and return
	var state ConversationState
	if err := json.Unmarshal(data, &state); err != nil {
		return nil, err
	}
	return &state, nil
}
func (r *RedisStateManager) Set(sessionID string, state *ConversationState) error {
	data, err := json.Marshal(state)
	if err != nil {
		return err
	}
	return r.client.Set(context.Background(), sessionID, data, 24*time.Hour).Err()
}
- Update configuration:
redisClient := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
stateManager := redis.NewRedisStateManager(redisClient)
cortex := cortex.NewCortex(stateManager, llmClient, publisher)
Resources
4 - Explanations
Understanding-oriented discussions that provide context and background
Explanations
This section provides in-depth discussions and explanations to help you understand the concepts, design decisions, and architecture behind AgentHub. These materials are designed to broaden your understanding beyond just how to use the system.
Explanation Categories
- Architecture - System design and architectural principles
- Core Concepts - Fundamental concepts and mental models
- Features - Deep dives into specific features and capabilities
🎯 How to Use These Explanations
These documents are designed to:
- Provide context for why things work the way they do
- Explain trade-offs and design decisions
- Offer multiple perspectives on the same concepts
- Help you make informed decisions about using AgentHub
Reading Path
For Understanding Core Concepts
- Core Concepts - Start with fundamental principles
- Architecture - Understand system design
- Features - Explore specific capabilities
For System Design
- Architecture - System design patterns
- Features - Performance and scaling considerations
For Implementation Details
- Features - Technical implementation deep dives
💡 Discussion Topics
These explanations discuss:
- Design philosophy and principles
- Architectural decisions and their rationale
- Performance considerations and trade-offs
- Future directions and possibilities
- Alternative approaches and their pros/cons
Note
Explanations focus on understanding rather than implementation. For practical guidance, see the tutorials and how-to guides.
4.1 - Architecture
Deep dive into AgentHub’s system architecture and design
Architecture Explanations
Understand the fundamental architecture and design principles behind AgentHub’s distributed agent system.
Available Documentation
4.1.1 - A2A-Compliant EDA Broker Architecture
Deep dive into the internal architecture of the AgentHub EDA broker, how it implements Agent2Agent (A2A) protocol-compliant communication patterns while maintaining Event-Driven Architecture benefits.
AgentHub A2A-Compliant EDA Broker Architecture
This document explains the internal architecture of the AgentHub Event-Driven Architecture (EDA) broker, how it implements Agent2Agent (A2A) protocol-compliant communication patterns, and the design decisions behind its hybrid approach.
Architectural Overview
The AgentHub broker serves as a centralized Event-Driven Architecture hub that transports Agent2Agent (A2A) protocol-compliant messages between distributed agents. It combines the scalability benefits of EDA with the interoperability guarantees of the A2A protocol.
+--------------------------------------------------------------------+
|                           AgentHub Broker                          |
+--------------------------------------------------------------------+
|  +------------------+  +------------------+  +------------------+  |
|  |   Task Router    |  |    Subscriber    |  |     Progress     |  |
|  |                  |  |     Manager      |  |     Tracker      |  |
|  | - Route tasks    |  |                  |  |                  |  |
|  | - Apply filters  |  | - Manage agent   |  | - Track task     |  |
|  | - Broadcast      |  |   subscriptions  |  |   progress       |  |
|  | - Load balance   |  | - Handle         |  | - Update         |  |
|  |                  |  |   disconnects    |  |   requesters     |  |
|  +------------------+  +------------------+  +------------------+  |
+--------------------------------------------------------------------+
|                           gRPC Interface                           |
+--------------------------------------------------------------------+
|  +------------------+  +------------------+  +------------------+  |
|  | PublishTask      |  | SubscribeToTasks |  | SubscribeToTask  |  |
|  | PublishResult    |  | SubscribeToRes   |  |   Progress       |  |
|  | PublishProgress  |  |                  |  |                  |  |
|  +------------------+  +------------------+  +------------------+  |
+--------------------------------------------------------------------+
Core Components
1. Event Bus Server
The main server implementation at broker/main.go:22 provides the central coordination point:
type eventBusServer struct {
pb.UnimplementedEventBusServer
// Subscription management
taskSubscribers map[string][]chan *pb.TaskMessage
taskResultSubscribers map[string][]chan *pb.TaskResult
taskProgressSubscribers map[string][]chan *pb.TaskProgress
taskMu sync.RWMutex
}
Key characteristics:
- Thread-safe: Uses sync.RWMutex to protect concurrent access to subscriber maps
- Channel-based: Uses Go channels for efficient message passing
- Non-blocking: Implements timeouts to prevent blocking on slow consumers
- Stateless: No persistent storage - all state is in-memory
2. Task Routing Engine
The routing logic determines how tasks are delivered to agents:
Direct Routing
When a task specifies a ResponderAgentId, it’s routed directly to that agent:
if responderID := req.GetTask().GetResponderAgentId(); responderID != "" {
if subs, ok := s.taskSubscribers[responderID]; ok {
targetChannels = subs
}
}
Broadcast Routing
When no specific responder is set, tasks are broadcast to all subscribed agents:
} else {
// Broadcast to all task subscribers
for _, subs := range s.taskSubscribers {
targetChannels = append(targetChannels, subs...)
}
}
Routing Features
- Immediate delivery: Tasks are routed immediately upon receipt
- Multiple subscribers: Single agent can have multiple subscription channels
- Timeout protection: 5-second timeout prevents blocking on unresponsive agents
- Error isolation: Failed delivery to one agent doesn’t affect others
3. Subscription Management
The broker manages three types of subscriptions:
Task Subscriptions
Agents subscribe to receive tasks assigned to them:
func (s *eventBusServer) SubscribeToTasks(req *pb.SubscribeToTasksRequest, stream pb.EventBus_SubscribeToTasksServer) error
- Agent-specific: Tasks are delivered based on agent ID
- Type filtering: Optional filtering by task types
- Long-lived streams: Connections persist until agent disconnects
- Automatic cleanup: Subscriptions are removed when connections close
Result Subscriptions
Publishers subscribe to receive results of tasks they requested:
func (s *eventBusServer) SubscribeToTaskResults(req *pb.SubscribeToTaskResultsRequest, stream pb.EventBus_SubscribeToTaskResultsServer) error
Progress Subscriptions
Publishers can track progress of long-running tasks:
func (s *eventBusServer) SubscribeToTaskProgress(req *pb.SubscribeToTaskResultsRequest, stream pb.EventBus_SubscribeToTaskProgressServer) error
4. Message Flow Architecture
Task Publication Flow
- Validation: Incoming tasks are validated for required fields
- Routing: Tasks are routed to appropriate subscribers
- Delivery: Messages are sent via Go channels with timeout protection
- Response: Publisher receives acknowledgment of successful publication
Result Flow
- Receipt: Agents publish task results back to the broker
- Broadcasting: Results are broadcast to all result subscribers
- Filtering: Subscribers receive results for their requested tasks
- Delivery: Results are streamed back to requesting agents
Progress Flow
- Updates: Executing agents send periodic progress updates
- Distribution: Progress updates are sent to interested subscribers
- Real-time delivery: Updates are streamed immediately upon receipt
Design Decisions and Trade-offs
In-Memory State Management
Decision: Store all subscription state in memory using Go maps and channels.
Benefits:
- High performance: No database overhead for message routing
- Low latency: Sub-millisecond message routing
- Simplicity: Easier to develop, test, and maintain
- Concurrent efficiency: Go’s garbage collector handles channel cleanup
Trade-offs:
- No persistence: Broker restart loses all subscription state
- Memory usage: Large numbers of agents increase memory requirements
- Single point of failure: No built-in redundancy
When this works well:
- Development and testing environments
- Small to medium-scale deployments
- Scenarios where agents can re-establish subscriptions on broker restart
Asynchronous Message Delivery
Decision: Use Go channels with timeout-based delivery.
Implementation:
go func(ch chan *pb.TaskMessage, task pb.TaskMessage) {
select {
case ch <- &task:
// Message sent successfully
case <-ctx.Done():
log.Printf("Context cancelled while sending task %s", task.GetTaskId())
case <-time.After(5 * time.Second):
log.Printf("Timeout sending task %s. Dropping message.", task.GetTaskId())
}
}(subChan, taskToSend)
Benefits:
- Non-blocking: Slow agents don’t block the entire system
- Fault tolerance: Timeouts prevent resource leaks
- Scalability: Concurrent delivery to multiple agents
- Resource protection: Prevents unbounded queue growth
Trade-offs:
- Message loss: Timed-out messages are dropped
- Complexity: Requires careful timeout tuning
- No delivery guarantees: No acknowledgment of successful processing
gRPC Streaming for Subscriptions
Decision: Use bidirectional gRPC streams for agent subscriptions.
Benefits:
- Real-time delivery: Messages are pushed immediately
- Connection awareness: Broker knows when agents disconnect
- Flow control: gRPC handles backpressure automatically
- Type safety: Protocol Buffer messages ensure data consistency
Trade-offs:
- Connection overhead: Each agent maintains persistent connections
- Resource usage: Streams consume memory and file descriptors
- Network sensitivity: Transient network issues can break connections
Concurrent Access Patterns
Decision: Use read-write mutexes with channel-based message passing.
Implementation:
s.taskMu.RLock()
// Read subscriber information
var targetChannels []chan *pb.TaskMessage
for _, subs := range s.taskSubscribers {
targetChannels = append(targetChannels, subs...)
}
s.taskMu.RUnlock()
// Send messages without holding locks
for _, subChan := range targetChannels {
go func(ch chan *pb.TaskMessage, task pb.TaskMessage) {
// Async delivery
}(subChan, taskToSend)
}
Benefits:
- High concurrency: Multiple readers can access subscriptions simultaneously
- Lock-free delivery: Message sending doesn’t hold locks
- Deadlock prevention: Clear lock ordering and minimal critical sections
- Performance: Read operations are optimized for the common case
Scalability Characteristics
Throughput
- Task routing: ~10,000+ tasks/second on modern hardware
- Concurrent connections: Limited by file descriptor limits (typically ~1,000s)
- Memory usage: ~1KB per active subscription
Latency
- Task routing: <1ms for local network delivery
- End-to-end: <10ms for simple task processing cycles
- Progress updates: Real-time streaming with minimal buffering
Resource Usage
- CPU: Low CPU usage, primarily network I/O bound
- Memory: Linear growth with number of active subscriptions
- Network: Efficient binary Protocol Buffer encoding
Error Handling and Resilience
Connection Failures
- Automatic cleanup: Subscriptions are removed when connections close
- Graceful degradation: Failed agents don’t affect others
- Reconnection support: Agents can re-establish subscriptions
Message Delivery Failures
- Timeout handling: Messages that can’t be delivered are dropped
- Logging: All failures are logged for debugging
- Isolation: Per-agent timeouts prevent cascading failures
Resource Protection
- Channel buffering: Limited buffer sizes prevent memory exhaustion
- Timeout mechanisms: Prevent resource leaks from stuck operations
- Graceful shutdown: Proper cleanup during server shutdown
Monitoring and Observability
Built-in Logging
The broker provides comprehensive logging:
- Task routing decisions
- Subscription lifecycle events
- Error conditions and recovery
- Performance metrics
Integration Points
- Health checks: HTTP endpoints for monitoring
- Metrics export: Prometheus/metrics integration points
- Distributed tracing: Context propagation support
Future Enhancements
Persistence Layer
- Database backend: Store subscription state for broker restarts
- Message queuing: Durable task queues for reliability
- Transaction support: Atomic message delivery guarantees
Clustering Support
- Horizontal scaling: Multiple broker instances
- Load balancing: Distribute agents across brokers
- Consensus protocols: Consistent state across brokers
Advanced Routing
- Capability-based routing: Route tasks based on agent capabilities
- Load-aware routing: Consider agent load in routing decisions
- Geographic routing: Route based on agent location
Security Enhancements
- Authentication: Agent identity verification
- Authorization: Task-level access controls
- Encryption: TLS for all communications
The AgentHub broker architecture provides a solid foundation for Agent2Agent communication while maintaining simplicity and performance. Its design supports the immediate needs of most agent systems while providing clear paths for future enhancement as requirements evolve.
4.1.2 - Cortex Architecture
Understanding the Cortex asynchronous AI orchestration engine
Cortex Architecture
Cortex is an asynchronous, event-driven AI orchestration engine that serves as the “brain” of multi-agent systems. It manages conversations, coordinates tasks across specialized agents, and uses LLM-based decision-making to route work intelligently.
Overview
Traditional chatbots block on long-running operations. Cortex enables non-blocking conversations where users can interact while background tasks execute asynchronously.
Key Innovation
Traditional: User → Request → [BLOCKED] → Response
Cortex:      User → Request → Immediate Ack → [Async Work] → Final Response
Users receive instant acknowledgments and can continue conversing while agents process tasks in the background.
Architecture Diagram
+----------------+        +---------------+        +---------------+
|    Chat CLI    | -----> |   Event Bus   | <----- |    Cortex     |
|   (User I/O)   |        |   (Broker)    |        | Orchestrator  |
+----------------+        +---------------+        +---------------+
        ^                        ^                        |
        | chat.response          | task.result            | task.request
        |                        |                        v
        |                 +---------------+               |
        +---------------- |    Agent(s)   | <-------------+
                          |   (Workers)   |
                          +---------------+
Core Components
1. Cortex Orchestrator
The central decision-making engine that:
- Maintains conversation state - Full history per session
- Registers agents dynamically - Discovers capabilities via Agent Cards
- Decides actions via LLM - Uses AI to route work intelligently
- Coordinates tasks - Tracks pending work and correlates results
File: agents/cortex/cortex.go
2. State Manager
Manages conversational state with thread-safe operations:
type ConversationState struct {
SessionID string
Messages []*pb.Message
PendingTasks map[string]*TaskContext
RegisteredAgents map[string]*pb.AgentCard
}
Key Features:
- Per-session locking (no global bottleneck)
- Interface-based (swappable implementations)
- Currently in-memory (POC), production uses Redis/PostgreSQL
Files: agents/cortex/state/
3. LLM Client
Abstraction for AI-powered decision-making:
type Client interface {
Decide(
ctx context.Context,
conversationHistory []*pb.Message,
availableAgents []*pb.AgentCard,
newEvent *pb.Message,
) (*Decision, error)
}
The LLM analyzes:
- Conversation history
- Available agent capabilities
- New incoming messages
And returns decisions:
- chat.response - Reply to user
- task.request - Dispatch work to agent
Files: agents/cortex/llm/
IntelligentDecider: Context-Aware Orchestration
The IntelligentDecider is a mock LLM implementation that demonstrates intelligent, intent-based task orchestration. Unlike simple dispatchers that route every message to agents, it analyzes user intent before deciding whether to orchestrate with specialized agents or respond directly.
Key Characteristics:
Intent Detection: Analyzes message content for keywords indicating specific needs
- Echo requests: “echo”, “repeat”, “say back”
- Future: “translate”, “summarize”, “transcribe”, etc.
Conditional Orchestration: Only dispatches to agents when user explicitly requests functionality
- User: “echo hello” → Dispatches to echo_agent
- User: “hello” → Responds directly (no agent needed)
Transparent Reasoning: Always explains decision-making process
- All decisions include detailed reasoning visible in observability traces
- Users understand why Cortex chose specific actions
Example Flow:
// User message: "echo hello world"
decision := IntelligentDecider()(ctx, history, agents, userMsg)
// Returns:
Decision{
Reasoning: "User message 'echo hello world' contains an explicit echo request (detected keywords: echo/repeat/say back). I'm dispatching this to the echo_agent which specializes in repeating messages back.",
Actions: [
{
Type: "chat.response",
ResponseText: "I detected you want me to echo something. I'm asking the echo agent to handle this for you.",
},
{
Type: "task.request",
TaskType: "echo",
TargetAgent: "agent_echo",
TaskPayload: {"input": "echo hello world"},
},
],
}
Comparison to Simple Dispatchers:
| Approach | Every Message | Intent Detection | Explains Reasoning | Responds Directly |
|---|---|---|---|---|
| TaskDispatcherDecider (deprecated) | Dispatches to agent | No | Minimal | No |
| IntelligentDecider | Analyzes first | Yes | Detailed | Yes |
Design Benefits:
- Reduced Latency: Simple queries get immediate responses without agent roundtrip
- Resource Efficiency: Agents only invoked when their specialized capabilities are needed
- Better UX: Users understand what Cortex is doing and why
- Debuggability: Reasoning in traces makes orchestration logic transparent
- Extensibility: Easy to add new intent patterns for new agent types
Future Evolution:
In production, the IntelligentDecider pattern should be replaced with a real LLM that performs function calling:
// Production LLM receives tools/functions
tools := convertAgentCardsToTools(availableAgents)
decision := realLLM.Decide(history, tools, newMsg)
// LLM naturally decides:
// - "hello" → No function call, direct response
// - "echo hello" → Calls echo_agent function
// - "translate this to French" → Calls translation_agent function
The IntelligentDecider serves as a working example of the decision patterns a real LLM would follow.
4. Message Publisher
Interface for publishing messages to the Event Bus:
type MessagePublisher interface {
PublishMessage(
ctx context.Context,
msg *pb.Message,
routing *pb.AgentEventMetadata,
) error
}
Adapts AgentHub client to Cortex’s needs.
Message Flow
Simple Chat Request
1. User types "Hello" in CLI
   ↓
2. CLI publishes A2A Message (role=USER, context_id=session-1)
   ↓
3. Event Bus routes to Cortex
   ↓
4. Cortex retrieves conversation state
   ↓
5. Cortex calls LLM.Decide(history, agents, newMsg)
   ↓
6. LLM returns Decision: [chat.response: "Hello! How can I help?"]
   ↓
7. Cortex publishes A2A Message (role=AGENT, response text)
   ↓
8. Event Bus routes to CLI
   ↓
9. CLI displays response
   ↓
10. Cortex updates state with both messages
Asynchronous Task Execution
1. User: "Transcribe this audio file"
   ↓
2. Cortex LLM decides: [chat.response + task.request]
   ↓
3. Cortex publishes:
   - Message to user: "I'll start transcription, this may take a few minutes"
   - Task request to transcription agent
   ↓
4. User sees immediate acknowledgment ✅
   User can continue chatting!
   ↓
5. Transcription agent processes (background, may take minutes)
   ↓
6. Agent publishes task.result with transcribed text
   ↓
7. Cortex receives result, calls LLM.Decide()
   ↓
8. LLM decides: [chat.response: "Transcription complete: <text>"]
   ↓
9. Cortex publishes final response to user
   ↓
10. User sees final result
Design Patterns
1. Interface Segregation
All major components are interfaces:
- StateManager - Easy to swap (in-memory → Redis)
- LLM Client - Easy to test (mock → real AI)
- MessagePublisher - Decoupled from transport
Benefits:
- Testability (use mocks)
- Flexibility (swap implementations)
- Clear contracts
2. Session-Level Concurrency
Each session has its own lock:
// NOT this (global bottleneck):
globalMutex.Lock()
updateState()
globalMutex.Unlock()
// But this (per-session):
sessionLock := getSessionLock(sessionID)
sessionLock.Lock()
updateState()
sessionLock.Unlock()
Benefits:
- Multiple sessions can update concurrently
- No contention across sessions
- Scales horizontally
3. LLM as Control Plane
Instead of hard-coded if/else routing:
// Old way:
if strings.Contains(input, "transcribe") {
dispatchToTranscriber()
} else if strings.Contains(input, "translate") {
dispatchToTranslator()
}
// Cortex way:
decision := llm.Decide(history, agents, input)
executeActions(decision.Actions)
Benefits:
- Flexible - LLM adapts to context
- Extensible - Add agents, LLM discovers them
- Natural - Mimics human reasoning
Implementation: The IntelligentDecider (see LLM Client section above) demonstrates this pattern by analyzing user intent and making intelligent routing decisions with transparent reasoning.
4. Message Self-Containment
Every message is fully self-describing:
message Message {
string message_id = 1; // Unique ID
string context_id = 2; // Session/conversation ID
string task_id = 3; // Task correlation (if applicable)
Role role = 4; // USER or AGENT
repeated Part content = 5;
Struct metadata = 6;
}
Benefits:
- Agents are stateless (all context in message)
- Easy correlation (context_id, task_id)
- Traceable (message_id)
State Management
ConversationState Structure
type ConversationState struct {
SessionID string
Messages []*pb.Message // Full history
PendingTasks map[string]*TaskContext
RegisteredAgents map[string]*pb.AgentCard
}
TaskContext Tracking
type TaskContext struct {
TaskID string
TaskType string
RequestedAt int64
OriginalInput *pb.Message
UserNotified bool // Did we acknowledge?
}
Cortex tracks:
- Which tasks are in-flight
- What the user originally requested
- Whether we’ve acknowledged the request
State Lifecycle
1. Get session state (or create new)
2. Lock session for updates
3. Add new message to history
4. Call LLM to decide actions
5. Execute actions (publish messages)
6. Update pending tasks
7. Save state
8. Release lock
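Steps 2 through 8 map naturally onto the WithLock helper (a sketch with assumed helper names publishActions and trackPendingTasks):
// Sketch: the whole update cycle runs inside one per-session lock.
err := sm.WithLock(sessionID, func(state *ConversationState) error {
	state.Messages = append(state.Messages, newMsg)                   // 3. add message
	decision, err := llm.Decide(ctx, state.Messages, agents, newMsg)  // 4. decide
	if err != nil {
		return err
	}
	publishActions(ctx, decision.Actions) // 5. execute actions (assumed helper)
	trackPendingTasks(state, decision)    // 6. update pending tasks (assumed helper)
	return nil                            // 7-8. state saved, lock released on return
})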
Agent Discovery
Agent Card Registration
Agents publish capabilities on startup:
type AgentCard struct {
Name string
Description string
Skills []*AgentSkill
}
Cortex maintains a registry:
registeredAgents map[string]*pb.AgentCard
When making LLM decisions, Cortex provides available agents:
decision := llm.Decide(
ctx,
conversationHistory,
cortex.GetAvailableAgents(), // ← dynamic list
newEvent,
)
The LLM sees which tools are available and chooses appropriately.
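The decision that comes back might have a shape like this (hypothetical; the real type lives in agents/cortex/llm):
// Hypothetical Decision shape: the LLM's reasoning plus the actions
// Cortex should execute (e.g. chat.response, task.request).
type Decision struct {
	Reasoning string
	Actions   []Action
}

type Action struct {
	Type    string      // "chat.response" or "task.request"
	Payload interface{} // response text or task parameters
}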
Concurrency Model
- Lock Granularity: Per-session (not global)
- State Access: O(1) lookups via map
- Message Processing: Asynchronous (non-blocking)
Horizontal Scaling
Future: Partition sessions across multiple Cortex instances:
Cortex-1: handles sessions A-M
Cortex-2: handles sessions N-Z
Event Bus routes messages to correct instance based on context_id.
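A simple way to realize this routing would be hashing on context_id (a sketch of the future work described above, not current behavior):
// Sketch: pick a Cortex instance by hashing the session's context_id.
func instanceFor(contextID string, instances []string) string {
	h := fnv.New32a()
	h.Write([]byte(contextID)) // hash.Hash32's Write never returns an error
	return instances[h.Sum32()%uint32(len(instances))]
}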
Performance
- State Get: O(1) with read lock
- State Set: O(1) with write lock
- Concurrent Sessions: No contention (per-session locks)
Tested: 100 goroutines updating the same session produced zero lost updates.
Error Handling
Agent Failures
When an agent fails:
- Agent publishes task.result with status="failed"
- Cortex receives result
- LLM decides how to handle (inform user, retry, try alternative)
- Cortex publishes response
LLM Failures
If LLM client errors:
decision, err := llm.Decide(...)
if err != nil {
// Fallback: publish generic error response
publishErrorResponse(ctx, session, err)
return err
}
State Corruption
Protected by:
- Transaction-like WithLock pattern
- Copy-on-read to prevent external mutations
- Validation on state load/save
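Copy-on-read can be as simple as cloning the slice headers before returning state (a sketch of the idea; field and error names are assumptions, not the actual implementation):
// Sketch: return a shallow copy so callers cannot mutate stored state.
func (m *InMemoryStateManager) Get(sessionID string) (*ConversationState, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	stored, ok := m.states[sessionID]
	if !ok {
		return nil, ErrSessionNotFound // assumed sentinel error
	}
	cp := *stored
	cp.Messages = append([]*pb.Message(nil), stored.Messages...)
	return &cp, nil
}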
Implementation Status
Implemented (POC)
- Core orchestrator logic
- In-memory state management
- Mock LLM client with IntelligentDecider (intent-based routing)
- Agent registration
- Message routing
- Task correlation
- CLI interface
- Echo agent (demo)
- Distributed tracing with OpenTelemetry
Future Work
- Persistent state (Redis, PostgreSQL)
- Real LLM integration (Vertex AI, OpenAI)
- Agent health monitoring
- Web UI with WebSockets
- Retry logic & timeouts
- Advanced error recovery
Code Organization
agents/cortex/
├── cortex.go            # Core orchestrator with full observability
├── cortex_test.go       # Core tests (4 tests)
├── state/
│   ├── interface.go     # StateManager interface
│   ├── memory.go        # In-memory implementation
│   └── memory_test.go   # State tests (5 tests)
├── llm/
│   ├── interface.go     # LLM Client interface
│   ├── mock.go          # Mock implementations
│   │                    #   - IntelligentDecider (intent-based)
│   │                    #   - TaskDispatcherDecider (deprecated)
│   │                    #   - SimpleEchoDecider
│   └── mock_test.go     # LLM tests (4 tests)
└── cmd/
    └── main.go          # Service entry point
Total: ~1,200 lines of production code + 500 lines of tests
Testing Strategy
Unit Tests
- State Manager: CRUD, concurrency, locking (5 tests)
- LLM Client: Mock behavior, decision functions (4 tests)
- Cortex Core: Registration, chat, tasks (4 tests)
All tests use interfaces and mocks (no external dependencies).
Concurrency Testing
func TestInMemoryStateManager_Concurrency(t *testing.T) {
// Launch 100 goroutines updating the same session
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
sm.WithLock(sessionID, func(state *ConversationState) error {
state.Messages = append(state.Messages, msg)
return nil
})
}()
}
wg.Wait()
// Assert: exactly 100 messages (no lost updates)
}
Integration Testing
Demo script (demo_cortex.sh) tests:
- Broker startup
- Cortex initialization
- Agent registration
- End-to-end message flow
Configuration
Environment Variables
# LLM Model (future)
CORTEX_LLM_MODEL=vertex-ai://gemini-2.0-flash
# AgentHub connection
AGENTHUB_GRPC_PORT=127.0.0.1:50051
AGENTHUB_BROKER_ADDR=127.0.0.1
# Health check
CORTEX_HEALTH_PORT=8086
Programmatic Configuration
cortex := cortex.NewCortex(
state.NewInMemoryStateManager(), // or Redis/Postgres
llm.NewVertexAIClient(model), // or Mock for testing
messagePublisher,
)
Observability
Logging
Structured logging with context:
client.Logger.InfoContext(ctx, "Cortex received message",
"message_id", message.GetMessageId(),
"context_id", message.GetContextId(),
"role", message.GetRole().String(),
)
Tracing
OpenTelemetry spans already in AgentHub client:
- Trace ID propagation
- Span relationships (parent β child)
- Error recording
Metrics (Future)
- Messages processed per session
- LLM decision latency
- Task completion rates
- Error rates by type
Security Considerations
Current (POC)
- No authentication (all agents trusted)
- No authorization (all agents can do anything)
- No message validation (trusts well-formed protobufs)
Future
- Agent authentication via mTLS
- Message signing & verification
- Rate limiting per agent
- Input sanitization for LLM prompts
Best Practices
For Cortex Operators
- Monitor state size - Large conversation histories impact memory
- Configure LLM timeouts - Prevent hanging on slow AI responses
- Use persistent state - In-memory is POC only
- Enable tracing - Essential for debugging async flows
For Agent Developers
- Publish clear Agent Cards - Cortex needs good descriptions
- Handle errors gracefully - Publish failed task results, don’t crash
- Use correlation IDs - Essential for Cortex to track work
- Be stateless - All context should be in messages
Comparison to Alternatives
| Approach | Blocking | State Management | Extensibility |
|---|---|---|---|
| Traditional Chatbot | Yes | Simple | Hard-coded |
| Function Calling | Yes | Per-request | Config files |
| Cortex | No | Persistent | Dynamic |
Cortex enables truly asynchronous, extensible AI systems.
Resources
Next Steps
- Read the Cortex Tutorial to build your first orchestrator
- See How to Create Agents for agent development
- Check Cortex API Reference for detailed interface documentation
4.1.3 - Hexagonal Architecture & A2A Protocol Implementation
Understanding AgentHub’s hexagonal architecture with A2A protocol, gRPC communication, and event-driven design
Hexagonal Architecture & A2A Protocol Implementation
This document explains how AgentHub implements hexagonal architecture principles with the Agent2Agent (A2A) protocol, gRPC communication, and event-driven design patterns.
Overview
AgentHub follows hexagonal architecture (also known as Ports and Adapters) to achieve:
- Domain isolation: Core A2A protocol logic separated from infrastructure
- Testability: Clean interfaces enable comprehensive testing
- Flexibility: Multiple adapters for different communication protocols
- Maintainability: Clear separation of concerns and dependencies
System Architecture
graph TB
subgraph "AgentHub Ecosystem"
subgraph "External Agents"
A["Agent A<br/>(Chat REPL)"]
B["Agent B<br/>(Chat Responder)"]
C["Agent C<br/>(Custom Agent)"]
end
subgraph "AgentHub Broker"
subgraph "Adapters (Infrastructure)"
GRPC["gRPC Server<br/>Adapter"]
HEALTH["Health Check<br/>Adapter"]
METRICS["Metrics<br/>Adapter"]
TRACING["Tracing Adapter<br/>(OTLP/Jaeger)"]
end
subgraph "Ports (Interfaces)"
SP["AgentHub<br/>Service Port"]
PP["Message<br/>Publisher Port"]
EP["Event<br/>Subscriber Port"]
OP["Observability<br/>Port"]
end
subgraph "Domain (Core Logic)"
A2A["A2A Protocol<br/>Engine"]
ROUTER["Event Router<br/>& Broker"]
VALIDATOR["Message<br/>Validator"]
CONTEXT["Context<br/>Manager"]
TASK["Task<br/>Lifecycle"]
end
end
subgraph "External Systems"
OTLP["OTLP Collector<br/>& Jaeger"]
STORE["Event Store<br/>(Memory)"]
end
end
%% External agent connections
A -->|"gRPC calls<br/>(PublishMessage,<br/>SubscribeToMessages)"| GRPC
B -->|"gRPC calls"| GRPC
C -->|"gRPC calls"| GRPC
%% Adapter to Port connections
GRPC -->|"implements"| SP
HEALTH -->|"implements"| OP
METRICS -->|"implements"| OP
TRACING -->|"implements"| OP
%% Port to Domain connections
SP -->|"delegates to"| A2A
PP -->|"delegates to"| ROUTER
EP -->|"delegates to"| ROUTER
OP -->|"observes"| A2A
%% Domain internal connections
A2A -->|"uses"| VALIDATOR
A2A -->|"uses"| CONTEXT
A2A -->|"uses"| TASK
ROUTER -->|"persists events"| STORE
TRACING -->|"exports traces"| OTLP
%% Styling
classDef agents fill:#add8e6
classDef adapters fill:#ffa500
classDef ports fill:#e0ffff
classDef domain fill:#ffb6c1
classDef external fill:#dda0dd
class A,B,C agents
class GRPC,HEALTH,METRICS,TRACING adapters
class SP,PP,EP,OP ports
class A2A,ROUTER,VALIDATOR,CONTEXT,TASK domain
class OTLP,STORE external
Architecture Notes:
- Domain Core: Pure A2A protocol logic with message validation, event routing, context correlation, and task state management
- Ports: Clean, technology-agnostic interfaces providing testable contracts and dependency inversion
- Adapters: Infrastructure concerns including gRPC communication, observability exports, and protocol adaptations
A2A Message Flow
sequenceDiagram
participant REPL as Chat REPL<br/>Agent
participant gRPC as gRPC<br/>Adapter
participant A2A as A2A Protocol<br/>Engine
participant Router as Event<br/>Router
participant Responder as Chat Responder<br/>Agent
rect rgb(240, 248, 255)
Note over REPL, Router: A2A Message Publishing
REPL->>+gRPC: PublishMessage(A2AMessage)
gRPC->>+A2A: validateA2AMessage()
A2A->>A2A: check MessageId, Role, Content
A2A-->>-gRPC: validation result
gRPC->>+Router: routeA2AEvent(messageEvent)
Router->>Router: identify subscribers<br/>by agent_id/broadcast
Router->>Router: create tracing span<br/>with A2A attributes
Router-->>Responder: deliver message event
Router-->>-gRPC: routing success
gRPC-->>-REPL: PublishResponse(event_id)
end
rect rgb(255, 248, 240)
Note over Responder, Router: A2A Message Processing
Responder->>+gRPC: SubscribeToMessages(agent_id)
gRPC->>Router: register subscriber
Router-->>gRPC: subscription stream
gRPC-->>-Responder: message stream
Note over Responder: Process A2A message<br/>with tracing spans
Responder->>+gRPC: PublishMessage(A2AResponse)
gRPC->>A2A: validateA2AMessage()
A2A->>A2A: check AGENT role,<br/>ContextId correlation
gRPC->>Router: routeA2AEvent(responseEvent)
Router-->>REPL: deliver response event
gRPC-->>-Responder: PublishResponse
end
Note over REPL, Responder: A2A Protocol ensures:<br/>• Message structure compliance<br/>• Role semantics (USER/AGENT)<br/>• Context correlation<br/>• Event-driven routing
Core Components
1. A2A Protocol Engine (Domain Core)
The heart of the system implementing A2A protocol specifications:
// Core domain logic - technology agnostic
type A2AProtocolEngine struct {
messageValidator MessageValidator
contextManager ContextManager
taskLifecycle TaskLifecycle
}
// A2A message validation
func (e *A2AProtocolEngine) ValidateMessage(msg *Message) error {
// A2A compliance checks
if msg.MessageId == "" { return ErrMissingMessageId }
if msg.Role == ROLE_UNSPECIFIED { return ErrInvalidRole }
if len(msg.Content) == 0 { return ErrEmptyContent }
return nil
}
2. Event Router (Domain Core)
Manages event-driven communication between agents:
type EventRouter struct {
messageSubscribers map[string][]chan *AgentEvent
taskSubscribers map[string][]chan *AgentEvent
eventSubscribers map[string][]chan *AgentEvent
}
func (r *EventRouter) RouteEvent(event *AgentEvent) error {
// Route based on A2A metadata
routing := event.GetRouting()
subscribers := r.getSubscribers(routing.ToAgentId, event.PayloadType)
// Deliver with tracing
for _, sub := range subscribers {
go r.deliverWithTracing(sub, event)
}
return nil
}
3. gRPC Adapter (Infrastructure)
Translates between gRPC and domain logic:
type GrpcAdapter struct {
a2aEngine A2AProtocolEngine
eventRouter EventRouter
tracer TracingAdapter
}
func (a *GrpcAdapter) PublishMessage(ctx context.Context, req *PublishMessageRequest) (*PublishResponse, error) {
// Start tracing span
ctx, span := a.tracer.StartA2AMessageSpan(ctx, "publish_message", req.Message.MessageId, req.Message.Role)
defer span.End()
// Validate using domain logic
if err := a.a2aEngine.ValidateMessage(req.Message); err != nil {
a.tracer.RecordError(span, err)
return nil, err
}
// Route using domain logic
event := a.createA2AEvent(req)
if err := a.eventRouter.RouteEvent(event); err != nil {
return nil, err
}
return &PublishResponse{Success: true, EventId: event.EventId}, nil
}
Hexagonal Architecture Benefits
1. Domain Isolation
- A2A protocol logic is pure, testable business logic
- No infrastructure dependencies in the core domain
- Technology-agnostic implementation
2. Adapter Pattern
- gRPC Adapter: Handles Protocol Buffer serialization/deserialization
- Tracing Adapter: OTLP/Jaeger integration without domain coupling
- Health Adapter: Service health monitoring
- Metrics Adapter: Prometheus metrics collection
3. Port Interfaces
// Clean, testable interfaces
type MessagePublisher interface {
PublishMessage(ctx context.Context, msg *Message) (*PublishResponse, error)
}
type EventSubscriber interface {
SubscribeToMessages(ctx context.Context, agentId string) (MessageStream, error)
}
type ObservabilityPort interface {
StartSpan(ctx context.Context, operation string) (context.Context, Span)
RecordMetric(name string, value float64, labels map[string]string)
}
4. Dependency Inversion
- Domain depends on abstractions (ports), not concrete implementations
- Adapters depend on domain through well-defined interfaces
- Easy testing with mock implementations
A2A Protocol Integration
Message Structure Compliance
classDiagram
class A2AMessage {
+string MessageId
+string ContextId
+Role Role
+Part Content
+Metadata Metadata
+string TaskId
}
class Part {
+string Text
+bytes Data
+FileData File
}
class EventMetadata {
+string FromAgentId
+string ToAgentId
+string EventType
+Priority Priority
}
class Role {
<<enumeration>>
USER
AGENT
}
class Metadata {
+Fields map
}
A2AMessage "1" --> "0..*" Part : contains
A2AMessage "1" --> "1" EventMetadata : routed_with
A2AMessage "1" --> "1" Role : has
A2AMessage "1" --> "0..1" Metadata : includes
Event-Driven Architecture
The system implements pure event-driven architecture:
- Publishers emit A2A-compliant events
- Broker routes events based on metadata
- Subscribers receive relevant events
- Correlation through ContextId maintains conversation flow
Observability Integration
Distributed Tracing
sequenceDiagram
participant A as Agent A
participant B as Broker
participant AB as Agent B
participant OTLP as OTLP Collector
participant J as Jaeger
A->>+B: PublishMessage<br/>[trace_id: 123]
B->>B: Create A2A spans<br/>with structured attributes
B->>+AB: RouteEvent<br/>[trace_id: 123]
AB->>AB: Process with<br/>child spans
AB->>-B: PublishResponse<br/>[trace_id: 123]
B->>-A: Success<br/>[trace_id: 123]
par Observability Export
B->>OTLP: Export spans<br/>with A2A attributes
OTLP->>J: Store traces
J->>J: Build trace timeline<br/>with correlation
end
Note over A, J: End-to-end tracing<br/>with A2A protocol visibility
Structured Attributes
Each span includes A2A-specific attributes:
- a2a.message.id
- a2a.message.role
- a2a.context.id
- a2a.event.type
- a2a.routing.from_agent
- a2a.routing.to_agent
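With OpenTelemetry, these can be attached directly to a span (a sketch using the standard attribute API from go.opentelemetry.io/otel; the exact helper AgentHub uses may differ):
// Sketch: record A2A attributes on a span.
func annotateSpan(span trace.Span, msg *pb.Message, eventType string) {
	span.SetAttributes(
		attribute.String("a2a.message.id", msg.GetMessageId()),
		attribute.String("a2a.message.role", msg.GetRole().String()),
		attribute.String("a2a.context.id", msg.GetContextId()),
		attribute.String("a2a.event.type", eventType),
	)
}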
Testing Strategy
Unit Testing (Domain Core)
func TestA2AEngine_ValidateMessage(t *testing.T) {
engine := NewA2AProtocolEngine()
// Test A2A compliance
msg := &Message{
MessageId: "test_msg_123",
Role: ROLE_USER,
Content: []*Part{{Text: "hello"}},
}
err := engine.ValidateMessage(msg)
assert.NoError(t, err)
}
Integration Testing (Adapters)
func TestGrpcAdapter_PublishMessage(t *testing.T) {
// Mock domain dependencies
mockEngine := &MockA2AEngine{}
mockRouter := &MockEventRouter{}
adapter := NewGrpcAdapter(mockEngine, mockRouter)
// Test adapter behavior
resp, err := adapter.PublishMessage(ctx, validRequest)
assert.NoError(t, err)
assert.True(t, resp.Success)
}
Conclusion
AgentHub’s hexagonal architecture with A2A protocol provides:
- Clean Architecture: Separation of concerns with domain-driven design
- A2A Compliance: Full protocol implementation with validation
- Event-Driven Design: Scalable, loosely-coupled communication
- Rich Observability: Comprehensive tracing and metrics
- Testability: Clean interfaces enable thorough testing
- Flexibility: Easy to extend with new adapters and protocols
This architecture ensures maintainable, scalable, and observable agent communication while maintaining strict A2A protocol compliance.
4.2 - Core Concepts
Fundamental concepts and principles of AgentHub
Core Concepts
Explore the fundamental concepts, principles, and mental models that underpin AgentHub’s agent-to-agent communication system.
Available Documentation
4.2.1 - Agent2Agent (A2A) Protocol Migration
Understanding the migration to Agent2Agent protocol compliance while maintaining Event-Driven Architecture benefits.
Agent2Agent (A2A) Protocol Migration
This document explains the migration of AgentHub to full Agent2Agent (A2A) protocol compliance while maintaining the essential Event-Driven Architecture (EDA) patterns that make the system scalable and resilient.
What is the Agent2Agent Protocol?
The Agent2Agent (A2A) protocol is a standardized specification for communication between AI agents. It defines:
- Standardized Message Formats: Using Message, Part, Task, and Artifact structures
- Task Lifecycle Management: Clear states (SUBMITTED, WORKING, COMPLETED, FAILED, CANCELLED)
- Agent Discovery: Using AgentCard for capability advertisement
- Interoperability: Ensuring agents can communicate across different platforms
Why Migrate to A2A?
Benefits of A2A Compliance
- Interoperability: AgentHub can now communicate with any A2A-compliant agent or system
- Standardization: Clear, well-defined message formats reduce integration complexity
- Ecosystem Compatibility: Join the growing ecosystem of A2A-compatible tools
- Future-Proofing: Built on industry standards rather than custom protocols
Maintained EDA Benefits
- Scalability: Event-driven routing scales to thousands of agents
- Resilience: Asynchronous communication handles network partitions gracefully
- Flexibility: Topic-based routing and priority queues enable sophisticated workflows
- Observability: Built-in tracing and metrics for production deployments
Hybrid Architecture
AgentHub implements a hybrid approach that combines the best of both worlds:
┌─────────────────────────────────────────────────────────────────┐
│                       A2A Protocol Layer                        │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐  │
│  │ A2A Message │ │  A2A Task   │ │ A2A Artifact│ │ A2A Agent │  │
│  │ (standard)  │ │ (standard)  │ │ (standard)  │ │   Card    │  │
│  └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                       EDA Transport Layer                       │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐  │
│  │ AgentEvent  │ │Event Router │ │ Subscribers │ │ Priority  │  │
│  │  Wrapper    │ │             │ │   Manager   │ │  Queues   │  │
│  └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                       gRPC Infrastructure                       │
└─────────────────────────────────────────────────────────────────┘
How It Works
- A2A Messages are created using standard A2A structures (Message, Task, etc.)
- EDA Wrapper wraps A2A messages in AgentEvent for transport
- Event Routing uses EDA patterns (pub/sub, priority, topics) for delivery
- A2A Compliance ensures messages follow A2A protocol semantics
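Concretely, the wrapping step looks roughly like this (field names follow the AgentEvent definitions shown elsewhere in this documentation; the Message payload variant is an assumption):
// Sketch: wrap an A2A message in the EDA transport envelope.
event := &pb.AgentEvent{
	EventId:   "event_" + uuid.New().String(),
	Timestamp: timestamppb.Now(),
	Payload:   &pb.AgentEvent_Message{Message: a2aMessage}, // assumed oneof variant
	Routing: &pb.AgentEventMetadata{
		FromAgentId: "my_agent",
		ToAgentId:   "target_agent", // empty string would broadcast
		EventType:   "message.sent",
		Priority:    pb.Priority_PRIORITY_MEDIUM,
	},
}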
API Changes
Before (Legacy API)
// Legacy TaskMessage (deprecated)
taskPublisher.PublishTask(ctx, &agenthub.PublishTaskRequest{
TaskType: "greeting",
Parameters: map[string]interface{}{
"name": "Claude",
},
RequesterAgentID: "my_agent",
ResponderAgentID: "target_agent",
})
After (A2A-Compliant API)
// A2A-compliant task publishing
content := []*pb.Part{
{
Part: &pb.Part_Text{
Text: "Hello! Please provide a greeting for Claude.",
},
},
}
task, err := taskPublisher.PublishTask(ctx, &agenthub.A2APublishTaskRequest{
TaskType: "greeting",
Content: content,
RequesterAgentID: "my_agent",
ResponderAgentID: "target_agent",
Priority: pb.Priority_PRIORITY_MEDIUM,
ContextID: "conversation_123",
})
Message Structure Changes
message Message {
string message_id = 1; // Unique message identifier
string context_id = 2; // Conversation context
string task_id = 3; // Associated task (optional)
Role role = 4; // USER or AGENT
repeated Part content = 5; // Message content parts
google.protobuf.Struct metadata = 6; // Additional metadata
}
message Part {
oneof part {
string text = 1; // Text content
DataPart data = 2; // Structured data
FilePart file = 3; // File reference
}
}
message Task {
string id = 1; // Task identifier
string context_id = 2; // Conversation context
TaskStatus status = 3; // Current status
repeated Message history = 4; // Message history
repeated Artifact artifacts = 5; // Task outputs
google.protobuf.Struct metadata = 6; // Task metadata
}
enum TaskState {
TASK_STATE_SUBMITTED = 0; // Task created
TASK_STATE_WORKING = 1; // Task in progress
TASK_STATE_COMPLETED = 2; // Task completed successfully
TASK_STATE_FAILED = 3; // Task failed
TASK_STATE_CANCELLED = 4; // Task cancelled
}
Migration Guide
For Publishers
- Replace TaskPublisher with A2ATaskPublisher
- Use A2APublishTaskRequest with A2A Part structures
- Handle returned A2A Task objects
For Subscribers
- Replace TaskSubscriber with A2ATaskSubscriber
- Update handlers to process A2A Task and Message objects
- Return A2A Artifact objects instead of custom results
For Custom Integrations
- Update protobuf imports to use the events/a2a package
- Replace custom message structures with A2A equivalents
- Use the AgentHub service instead of EventBus
Backward Compatibility
The migration maintains wire-level compatibility through:
- Deprecated Types: Legacy message types marked as deprecated but still supported
- Automatic Conversion: EDA broker converts between legacy and A2A formats when needed
- Graceful Migration: Existing agents can migrate incrementally
Testing A2A Compliance
Run the demo to verify A2A compliance:
# Terminal 1: Start A2A broker
make run-server
# Terminal 2: Start A2A subscriber
make run-subscriber
# Terminal 3: Start A2A publisher
make run-publisher
Expected output shows successful A2A task processing:
- Publisher: “Published A2A task”
- Subscriber: “Task processing completed”
- Artifacts generated in A2A format
Best Practices
- Use A2A Types: Always use A2A message structures for new code
- Context Management: Use context_id to group related messages
- Proper Parts: Structure content using appropriate Part types
- Artifact Returns: Return structured Artifact objects from tasks
- Status Updates: Properly manage task lifecycle states
The A2A migration ensures AgentHub remains both standards-compliant and highly scalable through its hybrid EDA+A2A architecture.
4.2.2 - Understanding Tasks in Agent2Agent Communication
Tasks are the fundamental unit of work exchange in the Agent2Agent protocol. Deep dive into task semantics, lifecycle, and design patterns.
Understanding Tasks in Agent2Agent Communication
Tasks are the fundamental unit of work exchange in the Agent2Agent protocol. This document provides a deep dive into task semantics, lifecycle, and design patterns.
Task Anatomy
Core Components
Every task in the Agent2Agent system consists of several key components that define its identity, purpose, and execution context:
A2A Task Identity
string id = 1; // Unique task identifier
string context_id = 2; // Optional conversation context
The id serves as a unique identifier that allows all participants to track the task throughout its lifecycle. It should be globally unique and meaningful for debugging purposes.
The context_id groups related tasks in a conversation or workflow context, enabling sophisticated multi-task coordination patterns.
Task classification in A2A is handled through the initial Message content rather than a separate task_type field, providing more flexibility for complex task descriptions.
A2A Task Status and History
TaskStatus status = 3; // Current task status
repeated Message history = 4; // Message history for this task
repeated Artifact artifacts = 5; // Task output artifacts
google.protobuf.Struct metadata = 6; // Task metadata
In A2A, task data is contained within Message content using the structured Part format:
// A2A task request message
message Message {
string message_id = 1;
string context_id = 2;
string task_id = 3;
Role role = 4; // USER (requester) or AGENT (responder)
repeated Part content = 5; // Structured task content
}
message Part {
oneof part {
string text = 1; // Text description
DataPart data = 2; // Structured data
FilePart file = 3; // File references
}
}
// Example: A2A data analysis task
taskMessage := &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
ContextId: "analysis_workflow_123",
TaskId: "task_analysis_456",
Role: a2a.Role_USER,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Please perform trend analysis on Q4 sales data",
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: analysisParams, // Structured parameters
Description: "Analysis configuration",
},
},
},
},
}
Metadata in A2A tasks provides additional context for execution, auditing, or debugging:
// A2A task metadata
taskMetadata, _ := structpb.NewStruct(map[string]interface{}{
"workflow_id": "workflow_abc123",
"user_id": "user_456",
"request_source": "web_ui",
"correlation_id": "trace_789",
"priority": "high",
"expected_duration": "5m",
})
task := &a2a.Task{
Id: "task_analysis_456",
ContextId: "analysis_workflow_123",
Metadata: taskMetadata,
}
A2A Agent Coordination
In A2A, agent coordination is handled through the EDA routing metadata:
message AgentEventMetadata {
string from_agent_id = 1; // Source agent identifier
string to_agent_id = 2; // Target agent ID (empty = broadcast)
string event_type = 3; // Event classification
repeated string subscriptions = 4; // Topic-based routing tags
Priority priority = 5; // Delivery priority
}
This enables flexible routing patterns:
- from_agent_id identifies the requesting agent
- to_agent_id can specify a target agent or be empty for broadcast
- subscriptions enable topic-based routing for specialized agents
- priority ensures urgent tasks get precedence
A2A Execution Context
A2A handles execution context through the TaskStatus structure:
message TaskStatus {
TaskState state = 1; // SUBMITTED, WORKING, COMPLETED, FAILED, CANCELLED
Message update = 2; // Latest status message
google.protobuf.Timestamp timestamp = 3; // Status timestamp
}
enum TaskState {
TASK_STATE_SUBMITTED = 0;
TASK_STATE_WORKING = 1;
TASK_STATE_COMPLETED = 2;
TASK_STATE_FAILED = 3;
TASK_STATE_CANCELLED = 4;
}
Together with the priority carried in routing metadata and any deadline or creation time recorded in task metadata, this context helps agents make intelligent scheduling decisions:
- deadline enables time-sensitive prioritization
- priority provides explicit urgency ranking
- created_at enables age-based scheduling policies
Task Lifecycle
1. A2A Task Creation and Publishing
A2A tasks begin their lifecycle when a requesting agent creates a task with an initial message:
// Create A2A task with initial request message
taskID := "task_analysis_" + uuid.New().String()
task := &a2a.Task{
Id: taskID,
ContextId: "workflow_orchestration_123",
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
TaskId: taskID, // reuse the same ID so the message correlates with the task
Role: a2a.Role_USER,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Please analyze the quarterly sales data for trends",
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: analysisParams,
Description: "Analysis configuration",
},
},
},
},
},
Timestamp: timestamppb.Now(),
},
}
// Publish to AgentHub broker
client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
Task: task,
Routing: &pb.AgentEventMetadata{
FromAgentId: "data_orchestrator",
ToAgentId: "data_processor_01", // Optional: specific agent
EventType: "task.submitted",
Priority: pb.Priority_PRIORITY_HIGH,
},
})
2. A2A Task Discovery and Acceptance
Agents subscribe to A2A task events and evaluate whether to accept them:
// Agent receives A2A task event
func (a *Agent) evaluateA2ATask(event *pb.AgentEvent) bool {
task := event.GetTask()
if task == nil || task.Status.State != a2a.TaskState_TASK_STATE_SUBMITTED {
return false
}
// Analyze task content to understand requirements
requestMessage := task.Status.Update
taskDescription := a.extractTaskDescription(requestMessage)
// Check if agent can handle this task type
if !a.canHandleTaskType(taskDescription) {
return false
}
// Check capacity constraints
if a.getCurrentLoad() > a.maxCapacity {
return false
}
// Estimate duration from task content and metadata
estimatedDuration := a.estimateA2ATaskDuration(task)
if estimatedDuration > a.maxTaskDuration {
return false
}
return true
}
func (a *Agent) extractTaskDescription(msg *a2a.Message) string {
for _, part := range msg.Content {
if textPart := part.GetText(); textPart != "" {
return textPart
}
}
return ""
}
3. A2A Task Execution with Progress Reporting
Accepted A2A tasks enter the execution phase with regular status updates:
func (a *Agent) executeA2ATask(task *a2a.Task) {
// Update task to WORKING state
a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Task started")
// Phase 1: Preparation
a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Preparing data analysis")
prepareResult := a.prepareA2AExecution(task)
// Phase 2: Main processing
a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Processing data - 50% complete")
processResult := a.processA2AData(prepareResult)
// Phase 3: Finalization
a.updateTaskStatus(task, a2a.TaskState_TASK_STATE_WORKING, "Finalizing results - 75% complete")
finalResult := a.finalizeA2AResults(processResult)
// Completion with artifacts
a.completeTaskWithArtifacts(task, finalResult)
}
func (a *Agent) updateTaskStatus(task *a2a.Task, state a2a.TaskState, message string) {
statusUpdate := &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
TaskId: task.Id,
Role: a2a.Role_AGENT,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: message,
},
},
},
}
task.Status = &a2a.TaskStatus{
State: state,
Update: statusUpdate,
Timestamp: timestamppb.Now(),
}
// Publish task update
a.client.PublishTaskUpdate(context.Background(), &pb.PublishTaskUpdateRequest{
Task: task,
Routing: &pb.AgentEventMetadata{
FromAgentId: a.agentId,
EventType: "task.status_update",
},
})
}
4. A2A Result Delivery
A2A task completion delivers results through structured artifacts:
func (a *Agent) completeTaskWithArtifacts(task *a2a.Task, resultData interface{}) {
// Create completion message
completionMessage := &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
TaskId: task.Id,
Role: a2a.Role_AGENT,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Analysis completed successfully",
},
},
},
}
// Create result artifact
resultArtifact := &a2a.Artifact{
ArtifactId: "artifact_" + uuid.New().String(),
Name: "Analysis Results",
Description: "Quarterly sales trend analysis",
Parts: []*a2a.Part{
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: resultData.(*structpb.Struct), // assert to the pointer type structpb produces
Description: "Analysis results and metrics",
},
},
},
},
}
// Update task to completed
task.Status = &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_COMPLETED,
Update: completionMessage,
Timestamp: timestamppb.Now(),
}
task.Artifacts = append(task.Artifacts, resultArtifact)
// Publish final task update
a.client.PublishTaskUpdate(context.Background(), &pb.PublishTaskUpdateRequest{
Task: task,
Routing: &pb.AgentEventMetadata{
FromAgentId: a.agentId,
EventType: "task.completed",
},
})
// Publish artifact separately
a.client.PublishTaskArtifact(context.Background(), &pb.PublishTaskArtifactRequest{
TaskId: task.Id,
Artifact: resultArtifact,
Routing: &pb.AgentEventMetadata{
FromAgentId: a.agentId,
EventType: "task.artifact",
},
})
}
A2A Task Design Patterns
1. Simple A2A Request-Response
The most basic pattern where one agent requests work from another using A2A messages:
Agent A ──[A2A Task]──> AgentHub ──[TaskEvent]──> Agent B
Agent A <─[Artifact]─── AgentHub <─[TaskUpdate]── Agent B
A2A Implementation:
// Agent A creates task
task := &a2a.Task{
Id: "simple_task_123",
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
Role: a2a.Role_USER,
Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Convert CSV to JSON"}}},
},
},
}
// Agent B responds with artifact
artifact := &a2a.Artifact{
Name: "Converted Data",
Parts: []*a2a.Part{{Part: &a2a.Part_File{File: &a2a.FilePart{FileId: "converted.json"}}}},
}
Use cases:
- File format conversion
- Simple calculations
- Data validation
- Content generation
2. A2A Broadcast Processing
One agent broadcasts a task to multiple potential processors using A2A context-aware routing:
Agent A ──[A2A Task]──> AgentHub ──[TaskEvent]──> Agent B₁
                                 ──[TaskEvent]──> Agent B₂
                                 ──[TaskEvent]──> Agent B₃
A2A Implementation:
// Broadcast task with shared context
task := &a2a.Task{
Id: "broadcast_task_456",
ContextId: "parallel_processing_context",
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
Role: a2a.Role_USER,
Content: []*a2a.Part{
{Part: &a2a.Part_Text{Text: "Process data chunk"}},
{Part: &a2a.Part_Data{Data: &a2a.DataPart{Data: chunkData}}},
},
},
},
}
// Publish without specific target (broadcast)
client.PublishTaskUpdate(ctx, &pb.PublishTaskUpdateRequest{
Task: task,
Routing: &pb.AgentEventMetadata{
FromAgentId: "orchestrator",
// No ToAgentId = broadcast
EventType: "task.broadcast",
},
})
Use cases:
- Distributed computation
- Load testing
- Content distribution
- Parallel processing
3. A2A Pipeline Processing
Tasks flow through a series of specialized agents using shared A2A context:
Agent A ──[A2A Task₁]──> Agent B ──[A2A Task₂]──> Agent C ──[A2A Task₃]──> Agent D
        <──[Final Artifact]────────────────────────────────────────────────
A2A Implementation:
// Shared context for pipeline
pipelineContext := "data_pipeline_" + uuid.New().String()
// Stage 1: Data extraction
task1 := &a2a.Task{
Id: "extract_" + uuid.New().String(),
ContextId: pipelineContext,
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
Role: a2a.Role_USER,
Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Extract data from source"}}},
},
},
}
// Stage 2: Data transformation (triggered by Stage 1 completion)
task2 := &a2a.Task{
Id: "transform_" + uuid.New().String(),
ContextId: pipelineContext, // Same context
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
Role: a2a.Role_USER,
Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Transform extracted data"}}},
},
},
}
// Context linking enables pipeline coordination
Use cases:
- Data processing pipelines
- Image processing workflows
- Document processing chains
- ETL operations
4. A2A Hierarchical Decomposition
Complex tasks are broken down into subtasks using A2A context hierarchy:
Agent A ──[A2A ComplexTask]──> Coordinator
                               ├──[A2A SubTask₁]──> Specialist₁
                               ├──[A2A SubTask₂]──> Specialist₂
                               └──[A2A SubTask₃]──> Specialist₃
A2A Implementation:
// Parent task
parentTask := &a2a.Task{
Id: "complex_analysis_789",
ContextId: "business_workflow_123",
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
Role: a2a.Role_USER,
Content: []*a2a.Part{{Part: &a2a.Part_Text{Text: "Perform comprehensive business analysis"}}},
},
},
}
// Coordinator creates subtasks with hierarchical context
subtask1Meta, _ := structpb.NewStruct(map[string]interface{}{
"parent_task_id": "complex_analysis_789",
"subtask_type": "financial",
})
subtask1 := &a2a.Task{
Id: "financial_analysis_790",
ContextId: "business_workflow_123", // Same parent context
Metadata: subtask1Meta,
}
subtask2Meta, _ := structpb.NewStruct(map[string]interface{}{
"parent_task_id": "complex_analysis_789",
"subtask_type": "market",
})
subtask2 := &a2a.Task{
Id: "market_analysis_791",
ContextId: "business_workflow_123", // Same parent context
Metadata: subtask2Meta,
}
// Context enables coordination and result aggregation
Use cases:
- Complex business workflows
- Multi-step analysis
- Orchestrated services
- Batch job coordination
5. Competitive Processing
Multiple agents compete to handle the same task (first-come-first-served):
Agent A ──[Task]──> Broker ──[Task]──> Agent B₁ (accepts)
                           ──[Task]──> Agent B₂ (rejects)
                           ──[Task]──> Agent B₃ (rejects)
Use cases:
- Resource-constrained environments
- Load balancing
- Fault tolerance
- Performance optimization
A2A Task Content and Semantics
A2A Message-Based Classification
In A2A, task classification is handled through message content rather than rigid type fields, providing more flexibility:
Content-Based Classification
// Data processing task
message := &a2a.Message{
Content: []*a2a.Part{
{Part: &a2a.Part_Text{Text: "Analyze quarterly sales data for trends"}},
{Part: &a2a.Part_Data{Data: &a2a.DataPart{Description: "Analysis parameters"}}},
},
}
// Image processing task
message := &a2a.Message{
Content: []*a2a.Part{
{Part: &a2a.Part_Text{Text: "Generate product image with specifications"}},
{Part: &a2a.Part_Data{Data: &a2a.DataPart{Description: "Image requirements"}}},
},
}
// Notification task
message := &a2a.Message{
Content: []*a2a.Part{
{Part: &a2a.Part_Text{Text: "Send completion notification to user"}},
{Part: &a2a.Part_Data{Data: &a2a.DataPart{Description: "Notification details"}}},
},
}
Operation-Based Classification
- create.* - Creation operations
- update.* - Modification operations
- delete.* - Removal operations
- analyze.* - Analysis operations
- transform.* - Transformation operations
Complexity-Based Classification
- simple.* - Quick, low-resource tasks
- standard.* - Normal processing tasks
- complex.* - Resource-intensive tasks
- background.* - Long-running batch tasks
A2A Content Design Guidelines
Be Explicit: Include all information needed for execution in structured Parts
// Good: Explicit A2A content
// Build the structured parameters first (structpb.NewStruct returns an error)
conversionParams, _ := structpb.NewStruct(map[string]interface{}{
"source_format": "csv",
"target_format": "json",
"include_headers": true,
"delimiter": ",",
"encoding": "utf-8",
})
content := []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Convert CSV file to JSON format with specific options",
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: conversionParams,
Description: "Conversion parameters",
},
},
},
{
Part: &a2a.Part_File{
File: &a2a.FilePart{
FileId: "source_data.csv",
Filename: "data.csv",
MimeType: "text/csv",
},
},
},
}
// Poor: Ambiguous A2A content
content := []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Convert file", // Too vague
},
},
}
Use Standard Data Types: Leverage common formats for interoperability
// Good: Standard formats
{
"timestamp": "2024-01-15T10:30:00Z", // ISO 8601
"amount": "123.45", // String for precision
"coordinates": {"lat": 40.7128, "lng": -74.0060}
}
Include Validation Information: Help agents validate inputs
{
"email": "user@example.com",
"email_format": "rfc5322",
"max_length": 254,
"required": true
}
A2A Error Handling and Edge Cases
A2A Task Rejection
Agents should provide meaningful rejection reasons using A2A message format:
func (a *Agent) rejectA2ATask(task *a2a.Task, reason string) {
// Build structured rejection details (structpb.NewStruct returns an error)
details, _ := structpb.NewStruct(map[string]interface{}{
"rejection_reason": reason,
"agent_id": a.agentId,
"timestamp": time.Now().Unix(),
})
// Create rejection message
rejectionMessage := &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
TaskId: task.Id,
Role: a2a.Role_AGENT,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Task rejected: " + reason,
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: details,
Description: "Rejection details",
},
},
},
},
}
// Update task status to failed
task.Status = &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_FAILED,
Update: rejectionMessage,
Timestamp: timestamppb.Now(),
}
a.publishTaskUpdate(task)
}
Common rejection reasons:
- UNSUPPORTED_TASK_TYPE: Agent doesn't handle this task type
- CAPACITY_EXCEEDED: Agent is at maximum capacity
- DEADLINE_IMPOSSIBLE: Cannot complete within deadline
- INVALID_PARAMETERS: Task parameters are malformed
- RESOURCE_UNAVAILABLE: Required external resources unavailable
Timeout Handling
Both requesters and processors should handle timeouts gracefully:
// Requester timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
select {
case result := <-resultChannel:
// Process result
case <-ctx.Done():
// Handle timeout - possibly retry or fail
}
// Processor timeout
func (a *Agent) executeWithTimeout(task *pb.TaskMessage) {
deadline := task.GetDeadline().AsTime()
ctx, cancel := context.WithDeadline(context.Background(), deadline)
defer cancel()
select {
case result := <-a.processTask(ctx, task):
a.publishResult(task, result, pb.TaskStatus_TASK_STATUS_COMPLETED)
case <-ctx.Done():
a.publishResult(task, nil, pb.TaskStatus_TASK_STATUS_FAILED, "Deadline exceeded")
}
}
Partial Results
For long-running tasks, consider supporting partial results:
type PartialResult struct {
TaskId string
CompletedPortion float64 // 0.0 to 1.0
IntermediateData interface{}
CanResume bool
ResumeToken string
}
Best Practices
Task Design
- Make task types granular but not too fine-grained
- Design for idempotency when possible
- Include retry information in metadata
- Use consistent parameter naming across similar task types
- Version your task schemas to enable evolution
- Batch related tasks when appropriate
- Use appropriate priority levels to avoid starvation
- Set realistic deadlines based on historical performance
- Include resource hints to help with scheduling
- Monitor task completion rates to identify bottlenecks
Security Considerations
- Validate all task parameters before processing
- Sanitize user-provided data in task parameters
- Include authorization context in metadata
- Log task execution for audit trails
- Encrypt sensitive parameters when necessary
A2A tasks form the foundation of Agent2Agent communication, enabling sophisticated distributed processing patterns through structured messages, artifacts, and context-aware coordination. The A2A protocol’s flexible message format and EDA integration provide robust, scalable agent networks with clear semantics and strong observability. Proper A2A task design leverages the protocol’s strengths for building maintainable, interoperable agent systems.
4.2.3 - Agent Discovery Workflow Explained
This document explains how the agent discovery workflow operates in AgentHub, enabling dynamic registration and LLM-based orchestration.
Overview
Agent discovery is the process by which agents dynamically register their capabilities with the Cortex orchestrator, making themselves available for intelligent task delegation via an LLM (Large Language Model).
The Problem This Solves
Traditional multi-agent systems require:
- Hard-coded agent configurations
- Static routing rules
- Manual updates when adding new agents
- No intelligence in task routing
Agent discovery with Cortex provides:
- Dynamic registration: Agents announce themselves when they start
- Intelligent routing: LLM decides which agent to use based on capabilities
- Zero configuration: No central registry to update
- Scalable: Add or remove agents without system changes
How It Works
The Five-Step Flow
┌──────────────┐
│ 1. Agent     │  Agent starts and creates an AgentCard
│    Startup   │  describing its capabilities
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 2. Register  │  Agent calls RegisterAgent RPC
│    with      │  sending the AgentCard to broker
│    Broker    │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 3. Event     │  Broker publishes AgentCardEvent
│    Publishing│  broadcasting to all subscribers
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 4. Cortex    │  Cortex receives event and stores
│    Discovery │  agent in its registry
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 5. LLM       │  Agent is now available in LLM
│  Integration │  prompts for intelligent delegation
└──────────────┘
Step 1: Agent Startup
When an agent starts, it creates an AgentCard that describes its identity and capabilities:
agentCard := &pb.AgentCard{
Name: "agent_translator",
Description: "Language translation service",
Version: "1.0.0",
Skills: []*pb.AgentSkill{
{
Name: "Text Translation",
Description: "Translates text between languages",
Examples: []string{
"Translate this to Spanish",
"Convert to French",
},
},
},
}
Key Components:
- Name: Unique identifier (used for routing)
- Description: What the agent does (helps LLM understand)
- Skills: Specific capabilities with examples (used for matching)
Step 2: Registration with Broker
The agent registers by calling the broker’s RegisterAgent RPC:
client.RegisterAgent(ctx, &pb.RegisterAgentRequest{
AgentCard: agentCard,
Subscriptions: []string{"translation_request"},
})
What happens:
- Broker validates the AgentCard
- Stores agent in its registry: registeredAgents[agentID] = card
- Returns success response
Step 3: Event Publishing
The broker immediately publishes an AgentCardEvent:
event := &pb.AgentEvent{
EventId: "agent_registered_translator_...",
Timestamp: timestamppb.Now(),
Payload: &pb.AgentEvent_AgentCard{
AgentCard: &pb.AgentCardEvent{
AgentId: "agent_translator",
AgentCard: agentCard,
EventType: "registered",
},
},
Routing: &pb.AgentEventMetadata{
FromAgentId: "agent_translator",
ToAgentId: "", // Broadcast
EventType: "agent.registered",
Priority: PRIORITY_HIGH,
},
}
Routing characteristics:
- Broadcast to all subscribers (empty ToAgentId)
- High priority (processed immediately)
- Event type clearly marked as “agent.registered”
Step 4: Cortex Discovery
Cortex subscribes to agent events:
stream, _ := client.SubscribeToAgentEvents(ctx, &pb.SubscribeToAgentEventsRequest{
AgentId: "cortex",
EventTypes: []string{"agent.registered", "agent.updated"},
})
When receiving an agent card event, Cortex:
func handleAgentCardEvent(event *pb.AgentCardEvent) {
agentID := event.GetAgentId()
agentCard := event.GetAgentCard()
// Store agent
cortex.RegisterAgent(agentID, agentCard)
// Log skills for visibility
log.Info("Agent registered",
"agent_id", agentID,
"skills", extractSkillNames(agentCard.Skills))
}
Result: Agent is now in Cortex’s registeredAgents map.
Step 5: LLM Integration
When a user sends a request, Cortex queries the LLM:
decision, _ := llm.Decide(
conversationHistory,
availableAgents, // Includes our new agent!
newUserMessage,
)
The LLM sees:
Available agents:
- agent_translator: Language translation service
Skills:
* Text Translation: Translates text between languages
Examples: "Translate this to Spanish", "Convert to French"
Decision making:
- User asks: “Can you translate this to Spanish?”
- LLM sees “agent_translator” with matching examples
- LLM decides: Delegate to agent_translator
- Cortex sends task to agent_translator
- Agent processes and responds
- Cortex synthesizes final response
Message Flow Diagram
sequenceDiagram
participant A as Translation Agent
participant B as Broker
participant C as Cortex
participant L as LLM (VertexAI)
participant U as User
Note over A: Step 1: Startup
A->>A: Create AgentCard
Note over A,B: Step 2: Registration
A->>B: RegisterAgent(card)
B->>B: Store in registry
Note over B: Step 3: Event Publishing
B->>C: AgentCardEvent (broadcast)
Note over C: Step 4: Discovery
C->>C: RegisterAgent(id, card)
C->>C: total_agents++
Note over U,L: Step 5: LLM Integration
U->>C: "Translate to Spanish"
C->>L: Decide(availableAgents)
Note over L: Sees translator agent<br/>with matching examples
L-->>C: {delegate: agent_translator}
C->>A: Task message
A->>A: Process translation
A->>C: Result
C->>L: Synthesize
L-->>C: Final response
C->>U: "Here's the Spanish: ..."
Technical Implementation Details
Thread Safety
Agent registration is thread-safe:
type AgentHubService struct {
registeredAgents map[string]*pb.AgentCard
agentsMu sync.RWMutex
}
func (s *AgentHubService) RegisterAgent(...) {
s.agentsMu.Lock()
s.registeredAgents[agentID] = card
s.agentsMu.Unlock()
}
Multiple agents can register concurrently without conflicts.
Event Delivery
Events are delivered asynchronously:
for _, subChan := range targetChannels {
go func(ch chan *pb.AgentEvent) {
select {
case ch <- event:
// Delivered
case <-time.After(5 * time.Second):
// Timeout
}
}(subChan)
}
Benefits:
- Non-blocking: Broker doesn’t wait for all deliveries
- Resilient: Timeout prevents hanging
- Concurrent: Multiple subscribers receive events in parallel
LLM Prompt Generation
Cortex builds prompts dynamically:
func buildOrchestrationPrompt(availableAgents []*pb.AgentCard) string {
prompt := "Available agents:\n"
for _, agent := range availableAgents {
prompt += fmt.Sprintf("- %s: %s\n", agent.Name, agent.Description)
// Print the "Skills:" header once per agent, not once per skill
if len(agent.Skills) > 0 {
prompt += "  Skills:\n"
}
for _, skill := range agent.Skills {
prompt += fmt.Sprintf("  * %s: %s\n", skill.Name, skill.Description)
}
}
return prompt
}
Updated automatically when new agents register.
Performance
Typical timings for agent discovery:
Agent startup: 100-200ms
RegisterAgent RPC: < 10ms
Event publishing: < 5ms
Event delivery: < 50ms
Cortex processing: < 10ms
Total discovery time: < 300ms
Fast enough that agents are available for routing within milliseconds of starting.
Error Handling
Registration Failures
If registration fails:
_, err := client.RegisterAgent(ctx, req)
if err != nil {
log.Error("Registration failed", "error", err)
// Agent should retry or exit
panic(err)
}
Common causes:
- Broker not running
- Network issues
- Invalid AgentCard (empty name)
Event Delivery Failures
If event delivery fails:
if err := s.routeEvent(ctx, event); err != nil {
log.Warn("Event routing failed", "error", err)
// Continue anyway - registration still succeeded
}
Graceful degradation: Registration succeeds even if event routing fails.
Cortex Not Subscribed
If Cortex isn’t subscribed yet:
- Events are still published
- Cortex can query the GetAgentCard() RPC later
- Or register when Cortex starts
Resilient: System handles various startup orders.
Observability
Broker Logs
level=INFO msg="Agent registered" agent_id=agent_translator
level=DEBUG msg="Routing event to subscribers"
event_type=agent.registered subscriber_count=2
level=DEBUG msg="Event delivered to subscriber"
Cortex Logs
level=INFO msg="Received agent card event"
agent_id=agent_translator event_type=registered
level=INFO msg="Agent skills registered"
skills="[Text Translation: Translates...]"
level=INFO msg="Agent registered with Cortex orchestrator"
total_agents=3
Distributed Tracing
Agent registration creates trace spans:
agent_registered_translator
└─ broker.route_event
   ├─ deliver_to_cortex
   └─ deliver_to_monitor
Visibility into the entire discovery flow.
Lifecycle Management
Agent Startup Sequence
1. Create AgentHub client
2. Connect to broker
3. Create AgentCard
4. Call RegisterAgent
5. Subscribe to messages
6. Enter processing loop
Agent Shutdown
Currently agents don’t explicitly unregister. For graceful shutdown:
// In future enhancement:
defer client.UnregisterAgent(ctx, &pb.UnregisterAgentRequest{
AgentId: myAgentID,
})
This would trigger an “agent.unregistered” event.
Agent Updates
To update capabilities:
// Modify AgentCard
agentCard.Skills = append(agentCard.Skills, newSkill)
// Re-register
client.RegisterAgent(ctx, &pb.RegisterAgentRequest{
AgentCard: agentCard,
})
// Triggers "agent.updated" event
Cortex receives update and refreshes its registry.
Comparison with Other Patterns
vs. Service Discovery (Consul, etcd)
Agent Discovery:
- Includes capability metadata (skills)
- Optimized for LLM consumption
- Event-driven notification
- Rich semantic information
Service Discovery:
- Network location only
- Health checks
- Static metadata
- Pull-based queries
vs. API Gateway
Agent Discovery:
- Dynamic routing based on content
- LLM makes intelligent decisions
- Supports complex multi-step workflows
API Gateway:
- Path-based routing
- Static configuration
- Single request-response
vs. Message Queues
Agent Discovery:
- Agents know their capabilities
- Centralized intelligence (Cortex)
- Rich metadata for decisions
Message Queues:
- Topic-based routing
- No central intelligence
- Minimal metadata
Design Decisions
Why Broadcast Events?
Decision: Publish agent cards to all subscribers
Alternatives considered:
- Point-to-point to Cortex only
- Store-and-query model
Rationale:
- Multiple orchestrators can coexist
- Monitoring agents can track all agents
- Extensible for future use cases
- Low overhead (events are small)
Why High Priority?
Decision: Agent registration events use PRIORITY_HIGH
Rationale:
- New agents should be available quickly
- User requests may come immediately
- Discovery is time-sensitive
- Low volume (not many registrations)
Why Skills with Examples?
Decision: Include example user requests in skills
Rationale:
- LLMs learn by example
- Natural language is ambiguous
- Examples disambiguate capabilities
- Improves matching accuracy
Future Enhancements
See AGENT_DECIDE.md for planned improvements:
- Agent Health Monitoring: Track agent availability
- Agent Deregistration: Explicit removal from registry
- Agent Versioning: Support multiple versions simultaneously
- Capability Queries: Search agents by capability
- Load Balancing: Distribute work among multiple instances
Conclusion
The agent discovery workflow enables:
- Zero-configuration agent deployment
- Intelligent routing via LLM
- Dynamic scaling of agent pools
- Automatic orchestration based on capabilities
- Flexible, extensible multi-agent systems
This architecture supports truly autonomous, self-organizing agent networks that can adapt to changing requirements without manual intervention.
4.2.4 - The Agent2Agent Protocol and AgentHub Implementation
This document explores the core principles of Google’s Agent2Agent protocol and how AgentHub implements a communication broker based on these concepts. We distinguish between the Agent2Agent protocol specification (task structures and communication patterns) and our custom AgentHub broker implementation.
Agent2Agent vs AgentHub: What’s What
Agent2Agent Protocol (Google)
The Agent2Agent protocol defines:
- Task Message Structures: TaskMessage, TaskResult, TaskProgress with their fields and semantics
- Task Status and Priority Enums: Standardized task lifecycle and priority levels
- Communication Patterns: Asynchronous task delegation and result reporting concepts
AgentHub Implementation (This Project)
AgentHub provides:
- Event Bus Broker: Centralized gRPC service that routes tasks between agents
- Pub/Sub Architecture: Publisher-subscriber pattern for task distribution
- Subscription Mechanisms: SubscribeToTasks, SubscribeToTaskResults, SubscribeToTaskProgress methods
- Agent Implementations: Sample publisher and subscriber agents demonstrating the protocol
Philosophy and Core Concepts
Beyond Simple Request-Response
Traditional software architectures rely heavily on synchronous request-response patterns where a client requests a service and waits for an immediate response. While effective for simple operations, this pattern has limitations when dealing with:
- Complex, multi-step processes that require coordination between multiple specialized services
- Long-running operations that may take minutes or hours to complete
- Dynamic workload distribution where the best processor for a task may vary over time
- Autonomous decision-making where agents need to collaborate without central coordination
The Agent2Agent protocol addresses these limitations by defining task structures and communication patterns for autonomous agents. AgentHub implements a broker-based system that enables agents to communicate using Agent2Agent-inspired task structures:
- Delegating work to other agents based on their capabilities
- Accepting and processing tasks according to their specializations
- Reporting progress during long-running operations
- Making collaborative decisions about task distribution and execution
Autonomous Collaboration
In an Agent2Agent system, each agent operates with a degree of autonomy, making decisions about:
- Which tasks to accept based on current capacity and capabilities
- How to prioritize work when multiple tasks are pending
- When to delegate subtasks to other specialized agents
- How to report progress and handle failures
This autonomy enables the system to be more resilient, scalable, and adaptive compared to centrally-controlled architectures.
Key Design Principles
1. Asynchronous Communication
Agent2Agent communication is fundamentally asynchronous. When Agent A requests work from Agent B:
- Agent A doesn’t block waiting for completion
- Agent B can process the task when resources are available
- Progress updates provide visibility into long-running operations
- Results are delivered when the work is complete
This asynchronicity enables:
- Better resource utilization as agents aren’t blocked waiting
- Improved scalability as systems can handle more concurrent operations
- Enhanced resilience as temporary agent unavailability doesn’t block the entire system
2. Rich Task Semantics (Agent2Agent Protocol)
The Agent2Agent protocol defines rich task message structures that AgentHub implements:
message TaskMessage {
string task_id = 1; // Unique identifier for tracking
string task_type = 2; // Semantic type (e.g., "data_analysis")
google.protobuf.Struct parameters = 3; // Flexible parameters
string requester_agent_id = 4; // Who requested the work
string responder_agent_id = 5; // Who should do the work (optional)
google.protobuf.Timestamp deadline = 6; // When it needs to be done
Priority priority = 7; // How urgent it is
google.protobuf.Struct metadata = 8; // Additional context
}
This rich structure enables:
- Intelligent routing based on task type and agent capabilities
- Priority-based scheduling to ensure urgent tasks are handled first
- Deadline awareness for time-sensitive operations
- Context preservation for better decision-making
3. Explicit Progress Tracking
Long-running tasks benefit from explicit progress reporting:
message TaskProgress {
string task_id = 1; // Which task this refers to
TaskStatus status = 2; // Current status
string progress_message = 3; // Human-readable description
int32 progress_percentage = 4; // Quantitative progress (0-100)
google.protobuf.Struct progress_data = 5; // Structured progress information
}
This enables:
- Visibility into system operations for monitoring and debugging
- User experience improvements with real-time progress indicators
- Resource planning by understanding how long operations typically take
- Early failure detection when progress stalls unexpectedly
4. Flexible Agent Addressing
The protocol supports multiple addressing patterns:
- Direct addressing: Tasks sent to specific agents by ID
- Broadcast addressing: Tasks sent to all capable agents
- Capability-based routing: Tasks routed based on agent capabilities
- Load-balanced routing: Tasks distributed among agents with similar capabilities
This flexibility enables different architectural patterns within the same system.
Architectural Patterns
Microservices Enhancement
In a microservices architecture, Agent2Agent can enhance service communication by:
- Replacing synchronous HTTP calls with asynchronous task delegation
- Adding progress visibility to long-running service operations
- Enabling service composition through task chaining
- Improving resilience through task retry and timeout mechanisms
Event-Driven Architecture Integration
Agent2Agent complements event-driven architectures by:
- Adding structure to event processing with explicit task semantics
- Enabling bidirectional communication where events can trigger tasks that produce responses
- Providing progress tracking for complex event processing workflows
- Supporting task-based coordination alongside pure event broadcasting
Workflow Orchestration
Complex business processes can be modeled as Agent2Agent workflows:
- Process Initiation: A workflow agent receives a high-level business request
- Task Decomposition: The request is broken down into specific tasks
- Agent Coordination: Tasks are distributed to specialized agents
- Progress Aggregation: Individual task progress is combined into overall workflow status
- Result Assembly: Task results are combined into a final business outcome
Benefits and Trade-offs
Benefits
Scalability: Asynchronous operation and agent autonomy enable horizontal scaling without central bottlenecks.
Resilience: Agent failures don’t cascade as easily since tasks can be retried or redistributed.
Flexibility: New agent types can be added without modifying existing agents.
Observability: Rich task semantics and progress reporting provide excellent visibility into system operations.
Modularity: Agents can be developed, deployed, and scaled independently.
Trade-offs
Complexity: The system requires more sophisticated error handling and state management compared to simple request-response patterns.
Latency: For simple operations, the overhead of task creation and routing may add latency compared to direct calls.
Debugging: Distributed, asynchronous operations can be more challenging to debug than synchronous call chains.
Consistency: Managing data consistency across asynchronous agent operations requires careful design.
When to Use Agent2Agent
Agent2Agent is particularly well-suited for:
Complex Processing Pipelines
When work involves multiple steps that can be performed by different specialized agents:
- Data ingestion → validation → transformation → analysis → reporting
- Image upload → virus scan → thumbnail generation → metadata extraction
- Order processing → inventory check → payment processing → fulfillment
Long-Running Operations
When operations take significant time and users need progress feedback:
- Large file processing
- Machine learning model training
- Complex data analysis
- Batch job processing
Dynamic Load Distribution
When workload characteristics vary and different agents may be better suited for different tasks:
- Multi-tenant systems with varying customer requirements
- Resource-intensive operations that need specialized hardware
- Geographic distribution where local processing is preferred
System Integration
When connecting heterogeneous systems that need to coordinate:
- Third-party service coordination
- Cross-platform workflows
Comparison with Other Patterns
vs. Message Queues
Traditional message queues provide asynchronous communication but lack:
- Rich task semantics
- Progress tracking
- Bidirectional result delivery
- Priority and deadline awareness
vs. RPC/HTTP APIs
RPC and HTTP APIs provide structured communication but are typically:
- Synchronous (blocking)
- Lacking progress visibility
- Point-to-point rather than flexible routing
- Without built-in retry and timeout semantics
vs. Event Sourcing
Event sourcing provides audit trails and state reconstruction but:
- Focuses on state changes rather than work coordination
- Lacks explicit progress tracking
- Doesn’t provide direct task completion feedback
- Requires more complex query patterns for current state
The SubAgent Library: Simplifying Agent Development
While the Agent2Agent protocol and AgentHub broker provide powerful capabilities for building distributed agent systems, implementing agents from scratch requires significant boilerplate code. The SubAgent library addresses this by providing a high-level abstraction that handles infrastructure concerns, letting developers focus on business logic.
The Problem: Too Much Boilerplate
Traditional agent implementation requires:
- ~200+ lines of setup code: gRPC client configuration, connection management, health checks
- A2A protocol compliance: Correct AgentCard structure with all required fields
- Subscription management: Setting up task streams and handling lifecycle
- Observability integration: Manual tracing span creation, logging, metrics
- Error handling: Graceful shutdown, signal handling, resource cleanup
This creates several issues:
- High barrier to entry: New agents require deep knowledge of the infrastructure
- Code duplication: Every agent reimplements the same patterns
- Maintenance burden: Infrastructure changes require updates across all agents
- Inconsistent quality: Some agents may have better observability or error handling than others
The Solution: Infrastructure as a Library
The SubAgent library encapsulates all infrastructure concerns into a simple, composable API:
// 1. Configure your agent
config := &subagent.Config{
AgentID: "my_agent",
Name: "My Agent",
Description: "Does something useful",
}
// 2. Create and register skills
agent, _ := subagent.New(config)
agent.MustAddSkill("Skill Name", "Description", handlerFunc)
// 3. Run (everything else is automatic)
agent.Run(ctx)
This reduces agent implementation from ~200 lines to ~50 lines (75% reduction), letting developers focus entirely on their domain logic.
Architecture
The SubAgent library implements a layered architecture:
┌───────────────────────────────────────────┐
│           Your Business Logic             │
│     (Handler Functions: ~30 lines)        │
├───────────────────────────────────────────┤
│             SubAgent Library              │
│  - Config & Validation                    │
│  - AgentCard Creation (A2A compliant)     │
│  - Task Subscription & Routing            │
│  - Automatic Observability                │
│  - Lifecycle Management                   │
├───────────────────────────────────────────┤
│          AgentHub Client Library          │
│  - gRPC Connection                        │
│  - Message Publishing/Subscription        │
│  - TraceManager, Metrics, Logging         │
├───────────────────────────────────────────┤
│             AgentHub Broker               │
│  - Event Routing                          │
│  - Agent Registry                         │
│  - Task Distribution                      │
└───────────────────────────────────────────┘
Key Features
1. Declarative Configuration
Instead of imperative setup code, agents use declarative configuration:
config := &subagent.Config{
AgentID: "agent_translator", // Required
Name: "Translation Agent", // Required
Description: "Translates text", // Required
Version: "1.0.0", // Optional, defaults
HealthPort: "8087", // Optional, defaults
}
The library:
- Validates all required fields
- Applies sensible defaults for optional fields
- Returns clear error messages for configuration issues
2. Skill-Based Programming Model
Agents define capabilities as “skills” - discrete units of functionality:
agent.MustAddSkill(
"Language Translation", // Name (shown to LLM)
"Translates text between languages", // Description
translateHandler, // Implementation
)
Each skill maps to a handler function with a clear signature:
func (ctx, task, message) -> (artifact, state, errorMessage)
This model:
- Encourages single-responsibility design
- Makes capabilities explicit and discoverable
- Simplifies testing (handlers are pure functions)
- Enables skill-based task routing
3. Automatic A2A Compliance
The library generates complete, A2A-compliant AgentCards:
// Developer writes:
agent.MustAddSkill("Translate", "Translates text", handler)
// Library generates:
&pb.AgentCard{
ProtocolVersion: "0.2.9",
Name: "agent_translator",
Description: "Translation Agent",
Version: "1.0.0",
Skills: []*pb.AgentSkill{
{
Id: "skill_0",
Name: "Translate",
Description: "Translates text",
Tags: []string{"Translate"},
InputModes: []string{"text/plain"},
OutputModes: []string{"text/plain"},
},
},
Capabilities: &pb.AgentCapabilities{
Streaming: false,
PushNotifications: false,
},
}
This ensures all agents follow protocol standards without manual effort.
4. Built-In Observability
Every task execution is automatically wrapped with observability:
Tracing:
// Automatic span creation for each task
ctx, taskSpan := traceManager.StartSpan(ctx, "agent.{agentID}.handle_task")
traceManager.AddA2ATaskAttributes(taskSpan, taskID, skillName, contextID, ...)
traceManager.SetSpanSuccess(taskSpan) // or RecordError()
Logging:
// Automatic structured logging
logger.InfoContext(ctx, "Processing task", "task_id", taskID, "skill", skillName)
logger.ErrorContext(ctx, "Task failed", "error", err)
Metrics:
- Task processing duration
- Success/failure counts
- Active task count
- (via AgentHubClient metrics)
Developers get full distributed tracing and logging without writing any observability code.
5. Lifecycle Management
The library handles the complete agent lifecycle:
Startup:
- Validate configuration
- Connect to broker (with retries)
- Register AgentCard
- Subscribe to tasks
- Start health check server
- Signal “ready”
Runtime:
- Receive tasks from broker
- Route to appropriate handler
- Execute with tracing/logging
- Publish results
- Handle errors gracefully
Shutdown:
- Catch SIGINT/SIGTERM signals
- Stop accepting new tasks
- Wait for in-flight tasks (with timeout)
- Close broker connection
- Cleanup resources
- Exit cleanly
All automatically - developers never write lifecycle code.
Design Patterns
The Handler Pattern
Handlers are pure functions that transform inputs to outputs:
func myHandler(ctx context.Context, task *pb.Task, message *pb.Message) (*pb.Artifact, pb.TaskState, string) {
	// Extract input
	input := extractInput(message)

	// Validate
	if err := validate(input); err != nil {
		return nil, pb.TaskState_TASK_STATE_FAILED, err.Error()
	}

	// Process
	result := process(ctx, input)

	// Create artifact
	artifact := createArtifact(result)
	return artifact, pb.TaskState_TASK_STATE_COMPLETED, ""
}
This pattern:
- Testable: Pure functions are easy to unit test
- Composable: Handlers can call other functions
- Error handling: Explicit return of state and error message
- Context-aware: Receives context for cancellation and tracing
The Configuration Pattern
Configuration is separated from code:
// Development
config := &subagent.Config{
AgentID: "my_agent",
HealthPort: "8080",
}
// Production (from environment)
config := &subagent.Config{
AgentID: os.Getenv("AGENT_ID"),
BrokerAddr: os.Getenv("BROKER_ADDR"),
HealthPort: os.Getenv("HEALTH_PORT"),
}
This enables:
- Different configs for dev/staging/prod
- Easy testing with mock configs
- Container-friendly (12-factor app)
Benefits
For Developers:
- Faster development: 75% less code to write
- Lower complexity: Focus on business logic, not infrastructure
- Better quality: Automatic best practices (observability, error handling)
- Easier testing: Handler functions are pure and testable
- Clearer structure: Skill-based organization is intuitive
For Operations:
- Consistent observability: All agents have same tracing/logging
- Standard health checks: Uniform health endpoints
- Predictable behavior: Lifecycle management is consistent
- Easy monitoring: Metrics are built-in
- Reliable shutdown: Graceful handling is automatic
For the System:
- Better integration: All agents follow same patterns
- Easier debugging: Consistent trace structure across agents
- Simplified maintenance: Library updates improve all agents
- Reduced errors: Less custom code means fewer bugs
Evolution Path
The SubAgent library provides a clear evolution path for agent development:
Phase 1: Simple Agents (Current)
- Single skills, synchronous processing
- Text input/output
- Uses library defaults
Phase 2: Advanced Agents
- Multiple skills per agent
- Streaming responses
- Custom capabilities
- Extended AgentCard fields
Phase 3: Specialized Agents
- Custom observability (additional traces/metrics)
- Advanced error handling
- Multi-modal input/output
- Stateful processing
The library supports all phases through its extensibility points (GetClient(), GetLogger(), custom configs).
Comparison with Manual Implementation
| Aspect | Manual Implementation | SubAgent Library |
|---|---|---|
| Lines of Code | ~200 lines setup | ~50 lines total |
| Configuration | 50+ lines imperative | 10 lines declarative |
| AgentCard | Manual struct creation | Automatic generation |
| Observability | Manual span/log calls | Automatic wrapping |
| Lifecycle | Custom signal handling | Built-in management |
| Error Handling | Scattered throughout | Centralized in library |
| Testing | Must mock infrastructure | Test handlers directly |
| Maintenance | Per-agent updates needed | Library update benefits all |
| Learning Curve | High (need infrastructure knowledge) | Low (focus on logic) |
| Time to First Agent | Several hours | Under 30 minutes |
Real-World Impact
The Echo Agent demonstrates the library’s impact:
Before SubAgent Library (211 lines):
- Manual client setup: 45 lines
- AgentCard creation: 30 lines
- Task subscription: 60 lines
- Handler implementation: 50 lines
- Lifecycle management: 26 lines
With SubAgent Library (82 lines):
- Configuration: 10 lines
- Skill registration: 5 lines
- Handler implementation: 50 lines
- Run: 2 lines
- Everything else: automatic
The business logic (50 lines) stays the same, but infrastructure code (161 lines) is eliminated.
When to Use SubAgent Library
Use SubAgent Library when:
- Building new agents from scratch
- Agent has 1-10 skills with clear boundaries
- Standard A2A protocol is sufficient
- You want consistent observability across agents
- Quick development time is important
Consider Manual Implementation when:
- Highly custom protocol requirements
- Need very specific lifecycle control
- Existing agent migration (may not be worth refactoring)
- Experimental/research agents with non-standard patterns
For 99% of agent development, the SubAgent library is the right choice.
Future Evolution
The Agent2Agent principle opens possibilities for:
Intelligent Agent Networks
Agents that learn about each other’s capabilities and performance characteristics to make better delegation decisions.
Self-Organizing Systems
Agent networks that automatically reconfigure based on workload patterns and agent availability.
Cross-Organization Collaboration
Extending Agent2Agent protocols across organizational boundaries for B2B workflow automation.
AI Agent Integration
Natural integration points for AI agents that can understand task semantics and make autonomous decisions about task acceptance and delegation.
The Agent2Agent principle represents a foundational shift toward more intelligent, autonomous, and collaborative software systems that can handle the complexity of modern distributed applications while providing the visibility and control that operators need.
4.3 - Features
Deep explanations of AgentHub’s key features and capabilities
Feature Explanations
Detailed explanations of AgentHub’s advanced features, their design rationale, and implementation details.
Available Documentation
4.3.1 - Distributed Tracing & OpenTelemetry
Deep dive into distributed tracing concepts, OpenTelemetry architecture, and how AgentHub implements comprehensive observability for event-driven systems.
🔍 Distributed Tracing & OpenTelemetry
Understanding-oriented: Deep dive into distributed tracing concepts, OpenTelemetry architecture, and how AgentHub implements comprehensive observability for event-driven systems.
The Problem: Observing Distributed Systems
Traditional monolithic applications are relatively easy to debug: everything happens in one process, on one machine, with one log file. But modern event-driven architectures like AgentHub present unique challenges:
The Complexity of Event-Driven Systems
Request Flow in AgentHub:
User → Publisher Agent → AgentHub Broker → Subscriber Agent → Result → Publisher Agent
Each step involves:
- Different processes (potentially on different machines)
- Asynchronous communication (events, not direct calls)
- Multiple protocol layers (gRPC, HTTP, network)
- Independent failure modes (network partitions, service crashes)
- Varying performance characteristics (CPU, memory, I/O)
Traditional Debugging Challenges
Without distributed tracing:
Publisher logs: "Published task task_123 at 10:00:01"
Broker logs: "Received task from agent_pub at 10:00:01"
"Routed task to agent_sub at 10:00:01"
Subscriber logs: "Processing task task_456 at 10:00:02"
"Completed task task_789 at 10:00:03"
Questions you can’t answer:
- Which subscriber processed task_123?
- How long did task_123 take end-to-end?
- Where did task_123 fail?
- What was the complete flow for a specific request?
The Solution: Distributed Tracing
Distributed tracing solves these problems by creating a unified view of requests as they flow through multiple services.
Core Concepts
Trace
A trace represents a complete request journey through the system. In AgentHub, a trace might represent:
- Publishing a task
- Processing the task
- Publishing the result
- Receiving the result
Trace ID: a1b2c3d4e5f67890
Duration: 150ms
Services: 3 (publisher, broker, subscriber)
Spans: 5
Status: Success
Span
A span represents a single operation within a trace. Each span has:
- Name: What operation it represents
- Start/End time: When it happened
- Tags: Metadata about the operation
- Logs: Events that occurred during the operation
- Status: Success, error, or timeout
Span: "publish_event"
Service: agenthub-publisher
Duration: 25ms
Tags:
event.type: "greeting"
event.id: "task_123"
responder.agent: "agent_demo_subscriber"
Status: OK
Span Context
The glue that connects spans across service boundaries. Contains:
- Trace ID: Unique identifier for the entire request
- Span ID: Unique identifier for the current operation
- Trace Flags: Sampling decisions, debug mode, etc.
How Tracing Works in AgentHub
1. Trace Initiation
When a publisher creates a task, it starts a new trace:
// Publisher starts a trace
ctx, span := tracer.Start(ctx, "publish_event")
defer span.End()
// Add metadata
span.SetAttributes(
attribute.String("event.type", "greeting"),
attribute.String("event.id", taskID),
)
2. Context Propagation
The trace context is injected into the task metadata:
// Inject trace context into task headers
headers := make(map[string]string)
otel.GetTextMapPropagator().Inject(ctx, propagation.MapCarrier(headers))
// Embed headers in task metadata
task.Metadata = &structpb.Struct{
Fields: map[string]*structpb.Value{
"trace_headers": structpb.NewStructValue(&structpb.Struct{
Fields: stringMapToStructFields(headers),
}),
},
}
3. Context Extraction
The broker and subscriber extract the trace context:
// Extract trace context from task metadata
if metadata := task.GetMetadata(); metadata != nil {
if traceHeaders, ok := metadata.Fields["trace_headers"]; ok {
headers := structFieldsToStringMap(traceHeaders.GetStructValue().Fields)
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(headers))
}
}
// Continue the trace
ctx, span := tracer.Start(ctx, "process_event")
defer span.End()
4. Complete Request Flow
The result is a complete trace showing the entire request journey:
Trace: a1b2c3d4e5f67890
├── publish_event (agenthub-publisher) [25ms]
│   ├── event.type: greeting
│   └── event.id: task_123
├── route_task (agenthub-broker) [2ms]
│   ├── source.agent: agent_demo_publisher
│   └── target.agent: agent_demo_subscriber
├── consume_event (agenthub-subscriber) [5ms]
│   └── messaging.operation: receive
├── process_task (agenthub-subscriber) [98ms]
│   ├── task.type: greeting
│   ├── task.parameter.name: Claude
│   └── processing.status: completed
└── publish_result (agenthub-subscriber) [20ms]
    └── result.status: success
OpenTelemetry Architecture
OpenTelemetry is the observability framework that powers AgentHub’s tracing implementation.
The OpenTelemetry Stack
┌─────────────────────────────────────────────────────────┐
│                      Applications                       │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐    │
│  │  Publisher  │   │   Broker    │   │ Subscriber  │    │
│  └─────────────┘   └─────────────┘   └─────────────┘    │
└────────┬─────────────────┬─────────────────┬────────────┘
         │                 │                 │
┌────────┴─────────────────┴─────────────────┴────────────┐
│                   OpenTelemetry SDK                     │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐    │
│  │   Tracer    │   │    Meter    │   │   Logger    │    │
│  └─────────────┘   └─────────────┘   └─────────────┘    │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────┐
│                 OpenTelemetry Collector                 │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐    │
│  │  Receivers  │   │ Processors  │   │  Exporters  │    │
│  └─────────────┘   └─────────────┘   └─────────────┘    │
└────────┬─────────────────┬─────────────────┬────────────┘
         │                 │                 │
┌────────┴────────┐ ┌──────┴───────┐ ┌───────┴───────┐
│     Jaeger      │ │  Prometheus  │ │     Logs      │
│    (Tracing)    │ │  (Metrics)   │ │   (Logging)   │
└─────────────────┘ └──────────────┘ └───────────────┘
Core Components
Tracer
Creates and manages spans:
tracer := otel.Tracer("agenthub-publisher")
ctx, span := tracer.Start(ctx, "publish_event")
defer span.End()
Meter
Creates and manages metrics:
meter := otel.Meter("agenthub-publisher")
counter, _ := meter.Int64Counter("events_published_total")
counter.Add(ctx, 1)
Propagators
Handle context propagation across service boundaries:
// Inject context
otel.GetTextMapPropagator().Inject(ctx, carrier)
// Extract context
ctx = otel.GetTextMapPropagator().Extract(ctx, carrier)
Exporters
Send telemetry data to backend systems:
- OTLP Exporter: Sends to OpenTelemetry Collector
- Jaeger Exporter: Sends directly to Jaeger
- Prometheus Exporter: Exposes metrics for Prometheus
AgentHub’s OpenTelemetry Implementation
Configuration
func NewObservability(config Config) (*Observability, error) {
// Create resource (service identification)
res, err := resource.New(ctx,
resource.WithAttributes(
semconv.ServiceName(config.ServiceName),
semconv.ServiceVersion(config.ServiceVersion),
),
)
// Setup tracing
traceExporter, err := otlptracegrpc.New(ctx,
otlptracegrpc.WithEndpoint(config.JaegerEndpoint),
otlptracegrpc.WithInsecure(),
)
tracerProvider := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(traceExporter),
sdktrace.WithResource(res),
sdktrace.WithSampler(sdktrace.AlwaysSample()),
)
otel.SetTracerProvider(tracerProvider)
// Setup metrics
meterProvider := sdkmetric.NewMeterProvider(
sdkmetric.WithResource(res),
sdkmetric.WithReader(promExporter),
)
otel.SetMeterProvider(meterProvider)
}
Custom slog Handler Integration
AgentHub’s custom logging handler automatically correlates logs with traces:
func (h *ObservabilityHandler) Handle(ctx context.Context, r slog.Record) error {
// Extract trace context
if span := trace.SpanFromContext(ctx); span.SpanContext().IsValid() {
spanCtx := span.SpanContext()
attrs = append(attrs,
slog.String("trace_id", spanCtx.TraceID().String()),
slog.String("span_id", spanCtx.SpanID().String()),
)
}
// Structured log output with trace correlation
logData := map[string]interface{}{
"time": r.Time.Format(time.RFC3339),
"level": r.Level.String(),
"msg": r.Message,
"trace_id": spanCtx.TraceID().String(),
"span_id": spanCtx.SpanID().String(),
"service": h.serviceName,
}
}
Observability Patterns in Event-Driven Systems
Pattern 1: Event Correlation
Challenge: Correlating events across async boundaries
Solution: Inject trace context into event metadata
// Publisher injects context
headers := make(map[string]string)
otel.GetTextMapPropagator().Inject(ctx, propagation.MapCarrier(headers))
event.Metadata["trace_headers"] = headers
// Consumer extracts context
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(event.Metadata["trace_headers"]))
Pattern 2: Async Operation Tracking
Challenge: Tracking operations that complete asynchronously
Solution: Create child spans that can outlive their parents
// Start async operation
ctx, span := tracer.Start(ctx, "async_operation")
go func() {
defer span.End()
// Long-running async work
processTask()
	span.SetStatus(codes.Ok, "") // success; codes is go.opentelemetry.io/otel/codes
}()
// Parent can continue/return immediately
Pattern 3: Error Propagation
Challenge: Understanding how errors flow through the system
Solution: Record errors at each span and propagate error status
if err != nil {
span.RecordError(err)
	span.SetStatus(codes.Error, err.Error()) // codes is go.opentelemetry.io/otel/codes
// Optionally add error details
span.SetAttributes(
attribute.String("error.type", "validation_error"),
attribute.String("error.message", err.Error()),
)
}
Pattern 4: Latency Breakdown
Challenge: Understanding where time is spent in complex flows
Solution: Detailed span hierarchy with timing
// High-level operation
ctx, span := tracer.Start(ctx, "process_task")
defer span.End()
// Sub-operations
ctx, validateSpan := tracer.Start(ctx, "validate_input")
// ... validation logic
validateSpan.End()
ctx, computeSpan := tracer.Start(ctx, "compute_result")
// ... computation logic
computeSpan.End()
ctx, persistSpan := tracer.Start(ctx, "persist_result")
// ... persistence logic
persistSpan.End()
Benefits of AgentHub’s Observability Implementation
1. Complete Request Visibility
- See every step of event processing
- Understand inter-service dependencies
- Track request flows across async boundaries
2. Performance Analysis
- Identify bottlenecks in event processing
- Understand where time is spent
- Optimize critical paths
3. Error Diagnosis
- Pinpoint exactly where failures occur
- Understand error propagation patterns
- Correlate errors with system state
4. Capacity Planning
- Understand system throughput characteristics
- Identify scaling bottlenecks
- Plan resource allocation
5. Troubleshooting
- Correlate logs, metrics, and traces
- Understand system behavior under load
- Debug complex distributed issues
Advanced Tracing Concepts
Sampling
Not every request needs to be traced. Sampling reduces overhead:
// Probability sampling (trace 10% of requests)
sdktrace.WithSampler(sdktrace.ParentBased(
sdktrace.TraceIDRatioBased(0.1),
))
// Rate-limiting sampling (max 100 traces/second): the Go SDK has no built-in
// rate-limited sampler, so rateLimitedSampler below stands in for a custom
// Sampler implementation
sdktrace.WithSampler(sdktrace.ParentBased(
	rateLimitedSampler{maxPerSecond: 100},
))
Custom Attributes
Add business context to spans:
span.SetAttributes(
attribute.String("user.id", userID),
attribute.String("tenant.id", tenantID),
attribute.Int("batch.size", len(items)),
attribute.String("workflow.type", workflowType),
)
Span Events
Add timestamped events within spans:
span.AddEvent("validation.started")
// ... validation logic
span.AddEvent("validation.completed", trace.WithAttributes(
attribute.Int("validation.rules.evaluated", ruleCount),
))
Baggage
Propagate key-value pairs across the entire trace:
// Set baggage
ctx = baggage.ContextWithValues(ctx,
baggage.String("user.tier", "premium"),
baggage.String("feature.flag", "new_algorithm"),
)
// Read baggage in any service
if member := baggage.FromContext(ctx).Member("user.tier"); member.Value() == "premium" {
// Use premium algorithm
}
Overhead Analysis
AgentHub’s observability adds:
- CPU: ~5% overhead for tracing
- Memory: ~50MB per service for buffers and metadata
- Network: Minimal (async batched export)
- Latency: ~10ms additional end-to-end latency
Optimization Strategies
- Sampling: Reduce trace volume for high-throughput systems
- Batching: Export spans in batches to reduce network overhead
- Async Processing: Never block business logic for observability
- Resource Limits: Use memory limiters in the collector
Production Recommendations
- Enable sampling for high-volume systems
- Monitor collector performance and scale horizontally if needed
- Set retention policies for traces and metrics
- Use dedicated infrastructure for observability stack
Troubleshooting Common Issues
Missing Traces
Symptoms: No traces appear in Jaeger
Causes:
- Context not propagated correctly
- Exporter configuration issues
- Collector connectivity problems
Debugging:
# Check if spans are being created
curl http://localhost:8080/metrics | grep trace
# Check collector logs
docker-compose logs otel-collector
# Verify Jaeger connectivity
curl http://localhost:16686/api/traces
Broken Trace Chains
Symptoms: Spans appear disconnected
Causes:
- Context not extracted properly
- New context created instead of continuing existing
Debugging:
// Always check if context contains active span
if span := trace.SpanFromContext(ctx); span.SpanContext().IsValid() {
fmt.Printf("Active trace: %s\n", span.SpanContext().TraceID())
} else {
fmt.Println("No active trace context")
}
High Memory Usage
Symptoms: Observability causing OOM errors
Causes:
- Too many spans in memory
- Large span attributes
- Export failures causing backlog
Solutions:
// Configure memory limits
config := sdktrace.NewTracerProvider(
sdktrace.WithSpanLimits(sdktrace.SpanLimits{
AttributeCountLimit: 128,
EventCountLimit: 128,
LinkCountLimit: 128,
}),
)
The Future of Observability
Emerging Trends
- eBPF-based Observability: Automatic instrumentation without code changes
- AI-Powered Analysis: Automatic anomaly detection and root cause analysis
- Unified Observability: Single pane of glass for metrics, traces, logs, and profiles
- Real-time Alerting: Faster detection and response to issues
OpenTelemetry Roadmap
- Profiling: Continuous profiling integration
- Client-side Observability: Browser and mobile app tracing
- Database Instrumentation: Automatic query tracing
- Infrastructure Correlation: Link application traces to infrastructure metrics
Conclusion
Distributed tracing transforms debugging from guesswork into precise investigation. AgentHub’s implementation with OpenTelemetry provides:
- Complete visibility into event-driven workflows
- Performance insights for optimization
- Error correlation for faster resolution
- Business context through custom attributes
The investment in observability pays dividends in:
- Reduced MTTR (Mean Time To Resolution)
- Improved performance through data-driven optimization
- Better user experience through proactive monitoring
- Team productivity through better tooling
🎯 Ready to Implement?
Hands-on: Observability Demo Tutorial
Production: Add Observability to Your Agent
Deep Dive: Observability Architecture
4.3.2 - Observability Span Naming Convention
Standard naming convention for OpenTelemetry spans across all agents
Observability Span Naming Convention
Overview
To enable quick visual identification of which component is performing which operation in distributed traces, AgentHub follows a consistent span naming convention across all agents and services.
Naming Standard
All manually-created spans MUST follow this format:
{component}.{operation}
Where:
- component: The agent or service name (lowercase, underscores for multi-word)
- operation: The operation being performed (lowercase, underscores for multi-word)
Examples
| Component | Operation | Span Name |
|---|---|---|
| Broker | publish_event | broker.publish_event |
| Broker | route_event | broker.route_event |
| Broker | subscribe_messages | broker.subscribe_messages |
| Cortex | handle_message | cortex.handle_message |
| Cortex | llm_decide | cortex.llm_decide |
| Cortex | execute_actions | cortex.execute_actions |
| Cortex | send_chat_response | cortex.send_chat_response |
| Echo Agent | handle_request | echo_agent.handle_request |
| Echo Agent | publish_response | echo_agent.publish_response |
| Chat CLI | publish_message | chat_cli.publish_message |
| Chat CLI | display_response | chat_cli.display_response |
Component Names
Standard component names for AgentHub:
| Component | Span Prefix | Description |
|---|---|---|
| Event Bus Broker | broker. | Core message routing service |
| Cortex Orchestrator | cortex. | AI orchestration engine |
| Echo Agent | echo_agent. | Echo/repeat agent |
| Chat CLI | chat_cli. | Command-line chat interface |
| Publisher Agent | publisher. | Demo publisher |
| Subscriber Agent | subscriber. | Demo subscriber |
| Chat Responder | chat_responder. | Chat response agent |
For new agents, use the agent’s ID or a short descriptive name.
Implementation
Creating Spans
When creating spans, use the component name as prefix:
// Good: Clear component identification
ctx, span := traceManager.StartSpan(ctx, "cortex.handle_message")
defer span.End()
// Bad: Missing component prefix
ctx, span := traceManager.StartSpan(ctx, "handle_message") // β
defer span.End()
Component Attribute
In addition to the span name prefix, ALWAYS add the component attribute for filtering and querying:
ctx, span := traceManager.StartSpan(ctx, "cortex.handle_message")
defer span.End()
// Add component attribute
traceManager.AddComponentAttribute(span, "cortex")
This enables:
- Visual identification: Span name shows component in trace waterfall
- Query filtering: Component attribute enables filtering traces by component
A2A Message Spans
For spans specifically tracking A2A message handling, use the specialized method:
ctx, span := traceManager.StartA2AMessageSpan(
ctx,
"cortex.handle_message", // Note: includes component prefix
message.GetMessageId(),
message.GetRole().String(),
)
defer span.End()
// Add component attribute
traceManager.AddComponentAttribute(span, "cortex")
Operation Naming Guidelines
Use Action Verbs
Operations should describe what the component is doing:
- handle_message - Processing an incoming message
- publish_event - Publishing an event
- route_event - Routing an event to subscribers
- execute_actions - Executing a list of actions
- send_response - Sending a response
Be Specific
When possible, be specific about what kind of operation:
- ✅ cortex.llm_decide (specific: LLM decision making)
- ❌ cortex.decide (too generic)
- ✅ echo_agent.handle_echo_request (specific: echo request handling)
- ❌ echo_agent.handle (too generic)
Use Underscores
Separate words with underscores, not hyphens or camelCase:
- ✅ broker.publish_event
- ❌ broker.publish-event (hyphens)
- ❌ broker.publishEvent (camelCase)
Auto-Generated Spans
Some spans are auto-generated by instrumentation libraries (e.g., gRPC):
agenthub.AgentHub/PublishMessage
agenthub.AgentHub/SubscribeToMessages
agenthub.AgentHub/RegisterAgent
Why These Are Acceptable
These auto-generated gRPC spans are acceptable and should NOT be changed because:
1. Standard Format: They follow gRPC’s OpenTelemetry standard naming convention: package.Service/Method
2. Automatic Instrumentation: Generated automatically by gRPC’s built-in OpenTelemetry interceptors
3. Breaking Changes: Modifying them would break standard gRPC tracing and compatibility with observability tools
4. Clear Indication: The format clearly indicates these are RPC calls (the / separator is distinctive)
5. Component Context: Parent spans provide the component context
Visual Example in Traces
In practice, you’ll see this pattern:
cortex.handle_message                         ← Manual span (component prefix)
├── cortex.llm_decide                         ← Manual span (component prefix)
├── cortex.execute_actions                    ← Manual span (component prefix)
│   ├── cortex.send_chat_response             ← Manual span (component prefix)
│   │   └── agenthub.AgentHub/PublishMessage  ← Auto-generated gRPC span (standard)
│   └── broker.publish_event                  ← Manual span (component prefix)
│       └── broker.route_event                ← Manual span (component prefix)
└── echo_agent.handle_request                 ← Manual span (component prefix)
    └── echo_agent.publish_response           ← Manual span (component prefix)
        └── agenthub.AgentHub/PublishMessage  ← Auto-generated gRPC span (standard)
Key Point: The gRPC spans (agenthub.AgentHub/*) are nested within component-prefixed spans, so the component context is always clear from the parent span.
Best Practice: Create Parent Spans
For clarity, always create a parent span with your component prefix that wraps gRPC calls:
// Good: Parent span with component prefix
ctx, span := traceManager.StartSpan(ctx, "cortex.send_chat_response")
defer span.End()
traceManager.AddComponentAttribute(span, "cortex")
// Child span will be auto-created by gRPC instrumentation
_, err := client.Client.PublishMessage(ctx, request)
// Creates child span: agenthub.AgentHub/PublishMessage
// Result in trace:
// cortex.send_chat_response (your span)
// └── agenthub.AgentHub/PublishMessage (gRPC auto-span)
This pattern ensures:
- Component identification at the operation level
- Standard gRPC tracing compatibility
- Clear parent-child relationships
- No modification of auto-generated spans
Benefits
Visual Clarity
In trace visualizations, you can immediately identify components:
broker.publish_event
├── broker.route_event
│   ├── cortex.handle_message
│   │   ├── cortex.llm_decide
│   │   ├── cortex.execute_actions
│   │   └── cortex.send_chat_response
│   └── echo_agent.handle_request
│       └── echo_agent.publish_response
└── broker.route_event
    └── chat_cli.display_response
Query & Filter
Filter traces by component using attributes:
# All cortex operations
component = "cortex"
# All message handling across components
span.name LIKE "%.handle_message"
# Echo agent operations only
component = "echo_agent"
Debugging
When debugging issues:
- Look at span name to identify which component failed
- No need to expand span details to find component
- Quickly trace request flow across components
Migration
Existing code should be updated to follow this convention:
Before
// Inconsistent naming
ctx, span := traceManager.StartSpan(ctx, "handle_message") // Who is handling?
ctx, span := traceManager.StartSpan(ctx, "cortex_chat_request") // Inconsistent separator
ctx, span := traceManager.StartSpan(ctx, "cli_publish_user_message") // Inconsistent prefix
After
// Consistent naming
ctx, span := traceManager.StartSpan(ctx, "cortex.handle_message")
ctx, span := traceManager.StartSpan(ctx, "cortex.chat_request")
ctx, span := traceManager.StartSpan(ctx, "chat_cli.publish_message")
Validation
To ensure compliance, span names should be validated:
- Code Review: Check span names follow the {component}.{operation} format
- Testing: Verify the component attribute is set on all spans
- Trace Review: Inspect actual traces to confirm naming consistency
4.3.3 - Architecture Evolution: From Build Tags to Unified Abstractions
Understanding AgentHub’s evolution from build tag-based conditional compilation to unified abstractions with built-in observability.
Understanding-oriented: Learn how AgentHub evolved from build tag-based conditional compilation to a unified abstraction approach that dramatically simplifies development while providing comprehensive observability.
AgentHub originally used Go build tags to handle different deployment scenarios:
- Development: Fast builds with minimal features (go build)
- Production: Full observability builds (go build -tags observability)
- Testing: Lightweight versions for testing environments
Problems with Build Tags:
- Maintenance overhead: Separate code paths for different builds
- Testing complexity: Hard to ensure feature parity across variants
- Developer experience: Multiple build commands and configurations
- Binary complexity: Different feature sets in different binaries
Modern Solution: Unified Abstractions
AgentHub now uses a unified abstraction layer (internal/agenthub/) that provides:
- Single codebase: No more separate files for different builds
- Built-in observability: Always available, configured via environment
- Simplified development: One build command, one binary
- Runtime configuration: Features controlled by environment variables
The New Architecture
Core Components
The unified abstraction provides these key components:
1. AgentHubServer
// Single server implementation with built-in observability
server, err := agenthub.NewAgentHubServer(config)
if err != nil {
return err
}
// Automatic OpenTelemetry, metrics, health checks
err = server.Start(ctx)
2. AgentHubClient
// Single client implementation with built-in observability
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
return err
}
// Automatic tracing, metrics, structured logging
err = client.Start(ctx)
3. TaskPublisher & TaskSubscriber
// High-level abstractions with automatic correlation
publisher := &agenthub.TaskPublisher{
Client: client.Client,
TraceManager: client.TraceManager,
// Built-in observability
}
subscriber := agenthub.NewTaskSubscriber(client, agentID)
// Automatic task processing with tracing
Before vs After Comparison
Old Build Tag Approach
File Structure (Legacy):
agents/publisher/
├── main.go                 # Basic version (~200 lines)
├── main_observability.go   # Observable version (~380 lines)
├── shared.go               # Common code
└── config.go               # Configuration
broker/
├── main.go                 # Basic broker (~150 lines)
├── main_observability.go   # Observable broker (~300 lines)
└── server.go               # Core logic
Build Commands (Legacy):
# Basic build
go build -o bin/publisher agents/publisher/
# Observable build
go build -tags observability -o bin/publisher-obs agents/publisher/
# Testing observable features
go test -tags observability ./...
New Unified Approach
File Structure (Current):
agents/publisher/
└── main.go                 # Single implementation (~50 lines)
agents/subscriber/
└── main.go                 # Single implementation (~60 lines)
broker/
└── main.go                 # Single implementation (~30 lines)
internal/agenthub/          # Unified abstraction layer
├── grpc.go                 # Client/server with observability
├── subscriber.go           # Task processing abstractions
├── broker.go               # Event bus implementation
└── metadata.go             # Correlation and metadata
Build Commands (Current):
# Single build for all use cases
go build -o bin/publisher agents/publisher/
go build -o bin/subscriber agents/subscriber/
go build -o bin/broker broker/
# Testing (no special tags needed)
go test ./...
Configuration Evolution
Environment-Based Configuration
Instead of build tags, features are now controlled via environment variables:
# Observability configuration
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export OTEL_SERVICE_NAME="agenthub"
export OTEL_SERVICE_VERSION="1.0.0"
# Health and metrics ports
export BROKER_HEALTH_PORT="8080"
# Broker connection
export AGENTHUB_BROKER_ADDR="localhost"
export AGENTHUB_BROKER_PORT="50051"
Automatic Feature Detection
The unified abstractions automatically configure features based on environment:
// Observability is automatically configured
config := agenthub.NewGRPCConfig("publisher")
client, err := agenthub.NewAgentHubClient(config)
// If JAEGER_ENDPOINT is set → tracing enabled
// If BROKER_HEALTH_PORT is set → health server enabled
// Always includes structured logging and basic metrics
Benefits of the New Architecture
1. Developer Experience
- Single build command: No more tag confusion
- Consistent behavior: Same binary for all environments
- Easier testing: No need for multiple test runs
- Simplified CI/CD: One build pipeline
2. Maintenance Reduction
- 90% less code: From 380+ lines to 29 lines for broker
- Single code path: No more duplicate implementations
- Unified testing: Test once, works everywhere
- Automatic features: Observability included by default
3. Operational Benefits
- Runtime configuration: Change behavior without rebuilding
- Consistent deployment: Same binary across environments
- Better observability: Always available when needed
- Easier debugging: Full context always present
Migration Guide
For users migrating from the old build tag approach:
Old Commands → New Commands
# OLD: Basic builds
go build -o bin/publisher agents/publisher/
# NEW: Same command (unchanged)
go build -o bin/publisher agents/publisher/
# OLD: Observable builds
go build -tags observability -o bin/publisher-obs agents/publisher/
# NEW: Same binary, configure via environment
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
go build -o bin/publisher agents/publisher/
# OLD: Testing with tags
go test -tags observability ./...
# NEW: Standard testing
go test ./...
Configuration Migration
# OLD: Feature controlled by build tags
go build -tags observability
# NEW: Feature controlled by environment
export JAEGER_ENDPOINT="http://localhost:14268/api/traces"
export OTEL_SERVICE_NAME="my-service"
Architecture Philosophy
From Compile-Time to Runtime
The move from build tags to unified abstractions represents a fundamental shift:
Build Tags Philosophy (Old):
- “Choose features at compile time”
- “Different binaries for different needs”
- “Minimize what’s included”
Unified Abstractions Philosophy (New):
- “Include everything, configure at runtime”
- “One binary, many configurations”
- “Maximize developer experience”
Why This Change?
- Cloud-Native Reality: Modern deployments use containers with environment-based config
- Developer Productivity: Unified approach eliminates confusion and errors
- Testing Simplicity: One code path means reliable testing
- Operational Excellence: Runtime configuration enables better operations
Resource Impact
The unified approach has minimal overhead:
Binary Size:
- Old basic: ~8MB
- Old observable: ~15MB
- New unified: ~12MB
Memory Usage:
- Baseline: ~10MB
- With observability: ~15MB (when enabled)
- Without observability: ~10MB (minimal overhead)
Startup Time:
- With observability enabled: ~150ms
- With observability disabled: ~50ms
Optimization Strategy
The abstractions use lazy initialization:
// Observability components only initialize if configured
if config.JaegerEndpoint != "" {
// Initialize tracing
}
if config.HealthPort != "" {
// Start health server
}
// Always minimal logging and basic metrics
Future Evolution
Planned Enhancements
- Plugin Architecture: Dynamic feature loading
- Configuration Profiles: Predefined environment sets
- Feature Flags: Runtime feature toggling
- Auto-Configuration: Intelligent environment detection
Compatibility Promise
The unified abstractions maintain backward compatibility:
- Old environment variables still work
- Gradual migration path available
- No breaking changes in core APIs
This architectural evolution demonstrates how AgentHub prioritizes developer experience and operational simplicity while maintaining full observability capabilities. The move from build tags to unified abstractions represents a maturation of the platform toward cloud-native best practices.
4.3.4 - Performance and Scaling Considerations
Explore the performance characteristics of AgentHub, scaling patterns, and optimization strategies for different deployment scenarios.
This document explores the performance characteristics of AgentHub, scaling patterns, and optimization strategies for different deployment scenarios.
Performance Benchmarks
Test Environment:
- 4-core Intel i7 processor
- 16GB RAM
- Local network (localhost)
- Go 1.24
Measured Performance:
- Task throughput: 8,000-12,000 tasks/second
- Task routing latency: 0.1-0.5ms average
- End-to-end latency: 2-10ms (including processing)
- Memory per agent: ~1KB active subscription state
- Concurrent agents: 1,000+ agents per broker instance
1. Task Routing
Task routing is the core performance bottleneck in AgentHub:
// Fast path: Direct agent routing
if responderID := req.GetTask().GetResponderAgentId(); responderID != "" {
if subs, ok := s.taskSubscribers[responderID]; ok {
targetChannels = subs // O(1) lookup
}
}
Optimization factors:
- Direct routing: O(1) lookup time for targeted tasks
- Broadcast routing: O(n) where n = number of subscribed agents
- Channel delivery: Concurrent delivery via goroutines
- Lock contention: Read locks allow concurrent routing
2. Message Serialization
Protocol Buffers provide efficient serialization:
- Binary encoding: ~60% smaller than JSON
- Zero-copy operations: Direct memory mapping where possible
- Schema evolution: Backward/forward compatibility
- Type safety: Compile-time validation
3. Memory Usage Patterns
// Memory usage breakdown per agent:
type agentMemoryFootprint struct {
SubscriptionState int // ~200 bytes (map entry + channel)
ChannelBuffer int // ~800 bytes (10 message buffer * 80 bytes avg)
ConnectionOverhead int // ~2KB (gRPC stream state)
// Total: ~3KB per active agent
}
Memory optimization strategies:
- Bounded channels: Prevent unbounded growth
- Connection pooling: Reuse gRPC connections
- Garbage collection: Go’s GC handles cleanup automatically
Scaling Patterns
Vertical Scaling (Scale Up)
Increasing resources on a single broker instance:
CPU Scaling
- Multi-core utilization: Go’s runtime leverages multiple cores
- Goroutine efficiency: Lightweight concurrency (2KB stack)
- CPU-bound operations: Message serialization, routing logic
# Configure for CPU optimization
export GOMAXPROCS=8  # match available CPU cores
Memory Scaling
- Linear growth: Memory usage scales with number of agents
- Buffer tuning: Adjust channel buffer sizes based on throughput
// Memory-optimized configuration
subChan := make(chan *pb.TaskMessage, 5) // Smaller buffers for memory-constrained environments
// vs
subChan := make(chan *pb.TaskMessage, 50) // Larger buffers for high-throughput environments
Network Scaling
- Connection limits: OS file descriptor limits (ulimit -n)
- Bandwidth utilization: Protocol Buffers minimize bandwidth usage
- Connection keepalive: Efficient connection reuse
Horizontal Scaling (Scale Out)
Distributing load across multiple broker instances:
1. Agent Partitioning
Static Partitioning:
Agent Groups:
├── Broker 1: agents_1-1000
├── Broker 2: agents_1001-2000
└── Broker 3: agents_2001-3000
Hash-based Partitioning:
func selectBroker(agentID string) string {
hash := fnv.New32a()
hash.Write([]byte(agentID))
brokerIndex := hash.Sum32() % uint32(len(brokers))
return brokers[brokerIndex]
}
2. Task Type Partitioning
Specialized Brokers:
Task Routing:
├── Broker 1: data_processing, analytics
├── Broker 2: image_processing, ml_inference
└── Broker 3: notifications, logging
3. Geographic Partitioning
Regional Distribution:
Geographic Deployment:
├── US-East: Broker cluster for East Coast agents
├── US-West: Broker cluster for West Coast agents
└── EU: Broker cluster for European agents
Load Balancing Strategies
1. Round-Robin Agent Distribution
type LoadBalancer struct {
brokers []string
current int
mu sync.Mutex
}
func (lb *LoadBalancer) NextBroker() string {
lb.mu.Lock()
defer lb.mu.Unlock()
broker := lb.brokers[lb.current]
lb.current = (lb.current + 1) % len(lb.brokers)
return broker
}
2. Capacity-Based Routing
type BrokerMetrics struct {
ActiveAgents int
TasksPerSec float64
CPUUsage float64
MemoryUsage float64
}
func selectBestBroker(brokers []BrokerMetrics) int {
// Select broker with lowest load score
bestIndex := 0
bestScore := calculateLoadScore(brokers[0])
for i, broker := range brokers[1:] {
score := calculateLoadScore(broker)
if score < bestScore {
bestScore = score
bestIndex = i + 1
}
}
return bestIndex
}
Performance Optimization Techniques
1. Message Batching
For high-throughput scenarios, implement message batching:
type BatchProcessor struct {
tasks []*pb.TaskMessage
batchSize int
timeout time.Duration
ticker *time.Ticker
}
func (bp *BatchProcessor) processBatch() {
batch := make([]*pb.TaskMessage, len(bp.tasks))
copy(batch, bp.tasks)
bp.tasks = bp.tasks[:0] // Clear slice
// Process entire batch
go bp.routeBatch(batch)
}
2. Connection Pooling
Optimize gRPC connections for better resource utilization:
type ConnectionPool struct {
connections map[string]*grpc.ClientConn
maxConns int
mu sync.RWMutex
}
func (cp *ConnectionPool) GetConnection(addr string) (*grpc.ClientConn, error) {
cp.mu.RLock()
if conn, exists := cp.connections[addr]; exists {
cp.mu.RUnlock()
return conn, nil
}
cp.mu.RUnlock()
// Create new connection
return cp.createConnection(addr)
}
3. Adaptive Channel Sizing
Dynamically adjust channel buffer sizes based on load:
func calculateOptimalBufferSize(avgTaskRate float64, processingTime time.Duration) int {
// Buffer size = rate * processing time + safety margin
bufferSize := int(avgTaskRate * processingTime.Seconds()) + 10
// Clamp to reasonable bounds
if bufferSize < 5 {
return 5
}
if bufferSize > 100 {
return 100
}
return bufferSize
}
4. Memory Optimization
Reduce memory allocations in hot paths:
// Use sync.Pool for frequent allocations
var taskPool = sync.Pool{
New: func() interface{} {
return &pb.TaskMessage{}
},
}
func processTaskOptimized(task *pb.TaskMessage) {
// Reuse task objects
pooledTask := taskPool.Get().(*pb.TaskMessage)
defer taskPool.Put(pooledTask)
	// Copy and process; protobuf messages must not be copied by value,
	// so use proto.Reset + proto.Merge (google.golang.org/protobuf/proto)
	proto.Reset(pooledTask)
	proto.Merge(pooledTask, task)
// ... processing logic
}
Monitoring and Metrics
Throughput Metrics
type ThroughputMetrics struct {
TasksPerSecond float64
ResultsPerSecond float64
ProgressPerSecond float64
MessagesPerSecond float64
}
Latency Metrics
type LatencyMetrics struct {
RoutingLatency time.Duration // Broker routing time
ProcessingLatency time.Duration // Agent processing time
EndToEndLatency time.Duration // Total task completion time
P50, P95, P99 time.Duration // Percentile latencies
}
Resource Metrics
type ResourceMetrics struct {
ActiveAgents int
ActiveTasks int
MemoryUsage int64
CPUUsage float64
GoroutineCount int
OpenConnections int
}
Monitoring Implementation
import "github.com/prometheus/client_golang/prometheus"
var (
taskCounter = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "agenthub_tasks_total",
Help: "Total number of tasks processed",
},
[]string{"task_type", "status"},
)
latencyHistogram = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "agenthub_task_duration_seconds",
Help: "Task processing duration",
Buckets: prometheus.DefBuckets,
},
[]string{"task_type"},
)
)
Scaling Recommendations
Small Deployments (1-100 agents)
- Single broker instance: Sufficient for most small deployments
- Vertical scaling: Add CPU/memory as needed
- Simple monitoring: Basic logging and health checks
Medium Deployments (100-1,000 agents)
- Load balancing: Implement agent distribution
- Resource monitoring: Track CPU, memory, and throughput
- Optimization: Tune channel buffer sizes and timeouts
Large Deployments (1,000+ agents)
- Horizontal scaling: Multiple broker instances
- Partitioning strategy: Implement agent or task type partitioning
- Advanced monitoring: Full metrics and alerting
- Performance testing: Regular load testing and optimization
High-Throughput Scenarios (10,000+ tasks/second)
- Message batching: Implement batch processing
- Connection optimization: Use connection pooling
- Hardware optimization: SSD storage, high-speed networking
- Profiling: Regular performance profiling and optimization
Common Performance Issues
1. High Latency
Symptoms: Slow task processing times
Causes: Network latency, overloaded agents, inefficient routing
Solutions: Optimize routing, add caching, scale horizontally
2. Memory Leaks
Symptoms: Increasing memory usage over time
Causes: Unclosed channels, goroutine leaks, connection leaks
Solutions: Proper cleanup, monitoring, garbage collection tuning
3. Connection Limits
Symptoms: New agents can’t connect
Causes: OS file descriptor limits, broker resource limits
Solutions: Increase limits, implement connection pooling
4. Message Loss
Symptoms: Tasks not reaching agents or results not returned
Causes: Timeout issues, network problems, buffer overflows
Solutions: Increase timeouts, improve error handling, adjust buffer sizes
Load Testing Script
func loadTest() {
// Create multiple publishers
publishers := make([]Publisher, 10)
for i := range publishers {
publishers[i] = NewPublisher(fmt.Sprintf("publisher_%d", i))
}
// Send tasks concurrently
taskRate := 1000 // tasks per second
duration := 60 * time.Second
ticker := time.NewTicker(time.Duration(1e9 / taskRate))
timeout := time.After(duration)
for {
select {
case <-ticker.C:
publisher := publishers[rand.Intn(len(publishers))]
go publisher.PublishTask(generateRandomTask())
case <-timeout:
return
}
}
}
The AgentHub architecture provides solid performance for most use cases and clear scaling paths for growing deployments. Regular monitoring and optimization ensure continued performance as your agent ecosystem evolves.
4.3.5 - The Unified Abstraction Library
The AgentHub Unified Abstraction Library dramatically simplifies the development of agents and brokers while providing built-in observability, environment-based configuration, and automatic correlation tracking.
The A2A-Compliant Unified Abstraction Library
Overview
The AgentHub Unified Abstraction Library (internal/agenthub/) is a comprehensive set of A2A protocol-compliant abstractions that dramatically simplifies the development of A2A agents and brokers while providing built-in observability, environment-based configuration, and automatic correlation tracking.
Key Benefits
Before and After Comparison
Before (Legacy approach):
broker/main_observability.go: 380+ lines of boilerplate
- Manual OpenTelemetry setup in every component
- Duplicate configuration handling across components
- Manual correlation ID management
- Separate observability and non-observability variants
After (Unified abstractions):
broker/main.go: 29 lines using abstractions
- Automatic OpenTelemetry integration
- Environment-based configuration
- Automatic correlation ID generation and propagation
- Single implementation with built-in observability
Core Components
1. gRPC Abstractions (grpc.go)
AgentHubServer
Provides a complete gRPC server abstraction with:
- Automatic OpenTelemetry instrumentation
- Environment-based configuration
- Built-in health checks
- Metrics collection
- Graceful shutdown
// Create and start a broker in one line
func StartBroker(ctx context.Context) error {
config := NewGRPCConfig("broker")
server, err := NewAgentHubServer(config)
if err != nil {
return err
}
return server.Start(ctx)
}
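A broker binary might wrap this in a signal-aware context so an interrupt shuts the server down cleanly; the import path below is a placeholder for the module's actual path:
package main

import (
	"context"
	"log"
	"os"
	"os/signal"

	"example.com/agenthub/internal/agenthub" // placeholder module path
)

func main() {
	// Cancel the context on interrupt so Start can shut down gracefully.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	config := agenthub.NewGRPCConfig("broker")
	server, err := agenthub.NewAgentHubServer(config)
	if err != nil {
		log.Fatal(err)
	}
	if err := server.Start(ctx); err != nil {
		log.Fatal(err)
	}
}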
AgentHubClient
Provides a complete gRPC client abstraction with:
- Automatic connection management
- Built-in observability
- Environment-based server discovery
- Health monitoring
// Create a client with built-in observability
config := agenthub.NewGRPCConfig("publisher")
client, err := agenthub.NewAgentHubClient(config)
if err != nil {
	log.Fatalf("failed to create AgentHub client: %v", err)
}
2. A2A Task Management Abstractions (a2a.go)
A2ATaskPublisher
Simplifies A2A task publishing with:
- Automatic A2A message generation
- Built-in observability tracing
- A2A context management
- Structured error handling
- A2A-compliant message formatting
a2aPublisher := &agenthub.A2ATaskPublisher{
Client: client.Client,
TraceManager: client.TraceManager,
MetricsManager: client.MetricsManager,
Logger: client.Logger,
ComponentName: "a2a_publisher",
}
// Create A2A task with structured message content
task := &a2a.Task{
Id: "task_greeting_" + uuid.New().String(),
ContextId: "conversation_123",
Status: &a2a.TaskStatus{
State: a2a.TaskState_TASK_STATE_SUBMITTED,
Update: &a2a.Message{
MessageId: "msg_" + uuid.New().String(),
Role: a2a.Role_USER,
Content: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: "Please process greeting task",
},
},
{
Part: &a2a.Part_Data{
Data: &a2a.DataPart{
Data: greetingParams,
Description: "Greeting parameters",
},
},
},
},
},
Timestamp: timestamppb.Now(),
},
}
err := a2aPublisher.PublishA2ATask(ctx, task, &pb.AgentEventMetadata{
FromAgentId: "publisher_id",
ToAgentId: "subscriber_id",
EventType: "task.submitted",
Priority: pb.Priority_PRIORITY_MEDIUM,
})
A2ATaskProcessor
Provides full observability for A2A task processing:
- Automatic A2A trace propagation
- Rich A2A span annotations with context and message details
- A2A message processing metrics
- A2A conversation context tracking
- Error tracking with A2A-compliant error messages
3. A2A Subscriber Abstractions (a2a_subscriber.go)
A2ATaskSubscriber
Complete A2A subscriber implementation with:
- A2A-compliant task handler system
- Built-in A2A message processors
- Automatic A2A artifact publishing
- Full A2A observability integration
- A2A conversation context awareness
a2aSubscriber := agenthub.NewA2ATaskSubscriber(client, agentID)
a2aSubscriber.RegisterDefaultA2AHandlers()
// Custom A2A task handlers
a2aSubscriber.RegisterA2ATaskHandler("greeting", func(ctx context.Context, event *pb.AgentEvent) error {
task := event.GetTask()
if task == nil {
return fmt.Errorf("no task in event")
}
// Process A2A task content
requestMessage := task.Status.Update
response := a2aSubscriber.ProcessA2AMessage(ctx, requestMessage)
// Create completion artifact
artifact := &a2a.Artifact{
ArtifactId: "artifact_" + uuid.New().String(),
Name: "Greeting Response",
Description: "Processed greeting task result",
Parts: []*a2a.Part{
{
Part: &a2a.Part_Text{
Text: response,
},
},
},
}
// Complete task with artifact
return a2aSubscriber.CompleteA2ATaskWithArtifact(ctx, task, artifact)
})
go a2aSubscriber.SubscribeToA2ATasks(ctx)
go a2aSubscriber.SubscribeToA2AMessages(ctx)
4. A2A Broker Service (a2a_broker.go)
Complete A2A-compliant AgentHub service implementation that handles:
- A2A message routing and delivery
- A2A subscription management with context filtering
- A2A artifact distribution
- A2A task state management
- EDA+A2A hybrid routing
- Full A2A observability
// A2A broker service with unified abstractions
type A2ABrokerService struct {
// A2A-specific components
MessageRouter *A2AMessageRouter
TaskManager *A2ATaskManager
ContextManager *A2AContextManager
ArtifactManager *A2AArtifactManager
// EDA integration
EventBus *EDAEventBus
SubscriptionMgr *A2ASubscriptionManager
// Observability
TraceManager *TraceManager
MetricsManager *A2AMetricsManager
}
A2A Environment-Based Configuration
The library uses environment variables for zero-configuration A2A setup:
# Core AgentHub A2A Settings
export AGENTHUB_BROKER_ADDR=localhost
export AGENTHUB_BROKER_PORT=50051
# A2A Protocol Configuration
export AGENTHUB_A2A_PROTOCOL_VERSION=1.0
export AGENTHUB_MESSAGE_BUFFER_SIZE=100
export AGENTHUB_CONTEXT_TIMEOUT=30s
export AGENTHUB_ARTIFACT_MAX_SIZE=10MB
# Observability Endpoints
export JAEGER_ENDPOINT=127.0.0.1:4317
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317
# A2A Health Check Ports
export AGENTHUB_HEALTH_PORT=8080
export A2A_PUBLISHER_HEALTH_PORT=8081
export A2A_SUBSCRIBER_HEALTH_PORT=8082
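The resolution pattern behind this is plain lookup-with-default; a sketch (envOrDefault and resolveBrokerTarget are hypothetical helpers, not the library's actual functions):
import "os"

// envOrDefault returns the variable's value, or a fallback when unset.
func envOrDefault(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

func resolveBrokerTarget() string {
	addr := envOrDefault("AGENTHUB_BROKER_ADDR", "localhost")
	port := envOrDefault("AGENTHUB_BROKER_PORT", "50051")
	return addr + ":" + port
}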
A2A Automatic Observability
A2A Distributed Tracing
- Automatic A2A instrumentation: OpenTelemetry gRPC interceptors handle A2A trace propagation
- A2A service naming: Unified “agenthub” service with A2A component differentiation
- Rich A2A annotations: Message content, conversation context, task state transitions, and artifact details
- A2A context tracking: Complete conversation thread visibility across multiple agents
A2A Metrics Collection
- A2A message metrics: Message processing rates, A2A error rates, latencies by message type
- A2A task metrics: Task completion rates, state transition times, artifact production metrics
- A2A context metrics: Conversation context tracking, multi-agent coordination patterns
- A2A system metrics: Health checks, A2A connection status, protocol version compatibility
- A2A component metrics: Per-agent A2A performance, broker routing efficiency
Health Monitoring
- Automatic endpoints: /health, /ready, /metrics
- Component tracking: Individual health per service
- Graceful shutdown: Proper cleanup and connection management
A2A Correlation and Context Tracking
Automatic A2A Correlation IDs
// A2A task ID generation
taskID := fmt.Sprintf("task_%s_%s", taskDescription, uuid.New().String())
// A2A message ID generation
messageID := fmt.Sprintf("msg_%d_%s", time.Now().Unix(), uuid.New().String())
// A2A context ID for conversation threading
contextID := fmt.Sprintf("ctx_%s_%s", workflowType, uuid.New().String())
A2A Context Propagation
- A2A conversation threading: Context IDs link related tasks across agents
- A2A message history: Complete audit trail of all messages in a conversation
- A2A workflow tracking: End-to-end visibility of multi-agent workflows
Trace Propagation
- W3C Trace Context: Standard distributed tracing headers
- Automatic propagation: gRPC interceptors handle context passing
- End-to-end visibility: Publisher → Broker → Subscriber traces
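The mechanism underneath is the standard OpenTelemetry gRPC instrumentation; the abstraction library wires this for you, but a hand-rolled client would look roughly like this (assuming the otelgrpc contrib package):
import (
	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func dialBroker(addr string) (*grpc.ClientConn, error) {
	// The stats handler injects and extracts W3C Trace Context on every RPC.
	return grpc.NewClient(addr,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
	)
}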
A2A Migration Guide
From Legacy EventBus to A2A Abstractions
Before (Legacy EventBus):
// 50+ lines of observability setup
obs, err := observability.New(ctx, observability.Config{...})
server := grpc.NewServer(grpc.UnaryInterceptor(...))
pb.RegisterEventBusServer(server, &eventBusService{...})
// Manual task message creation
task := &pb.TaskMessage{
TaskId: "task_123",
TaskType: "greeting",
// ... manual field population
}
After (A2A Abstractions):
// One line A2A broker startup
err := agenthub.StartA2ABroker(ctx)
// A2A task creation with abstractions
task := a2aPublisher.CreateA2ATask("greeting", greetingContent, "conversation_123")
err := a2aPublisher.PublishA2ATask(ctx, task, routingMetadata)
Best Practices
1. Use Environment Configuration
Let the library handle configuration automatically:
source .envrc # Load all environment variables
go run broker/main.go
2. Register Custom A2A Handlers
Extend functionality with custom A2A task handlers:
a2aSubscriber.RegisterA2ATaskHandler("my_task", myCustomA2AHandler)
// A2A handler signature with event and context
func myCustomA2AHandler(ctx context.Context, event *pb.AgentEvent) error {
task := event.GetTask()
// Process A2A message content
return a2aSubscriber.CompleteA2ATaskWithArtifact(ctx, task, resultArtifact)
}
3. Leverage Built-in Observability
The library provides comprehensive observability by default - no additional setup required.
4. Use A2A Structured Logging
The library provides structured loggers with A2A trace correlation:
// A2A-aware logging with context
client.Logger.InfoContext(ctx, "Processing A2A task",
"task_id", task.GetId(),
"context_id", task.GetContextId(),
"message_count", len(task.GetHistory()),
"current_state", task.GetStatus().GetState().String(),
)
A2A Architecture Benefits
Code Reduction with A2A Abstractions
- A2A Broker: 380+ lines → 29 lines (92% reduction)
- A2A Publisher: 150+ lines → 45 lines (70% reduction)
- A2A Subscriber: 200+ lines → 55 lines (72% reduction)
- A2A Message Handling: Complex manual parsing → automatic Part processing
- A2A Context Management: Manual tracking → automatic conversation threading
A2A Maintainability
- A2A protocol compliance: Centralized A2A message handling ensures protocol adherence
- Consistent A2A patterns: Same abstractions across all A2A components
- A2A-aware configuration: Environment variables tuned for A2A performance
- A2A context preservation: Automatic conversation context management
A2A Developer Experience
- Zero A2A boilerplate: Built-in A2A message parsing and artifact handling
- A2A-native architecture: Easy to extend with custom A2A message processors
- Automatic A2A setup: One-line A2A service creation with protocol compliance
- A2A debugging: Rich conversation context and message history for troubleshooting
A2A Future Extensibility
The A2A abstraction library is designed for A2A protocol extension:
- Custom A2A Part types: Easy to add new content types (text, data, files, custom)
- Custom A2A observability: Extend A2A metrics and conversation tracing
- A2A configuration: Override A2A protocol defaults with environment variables
- A2A transport options: Extend beyond gRPC while maintaining A2A compliance
- A2A protocol evolution: Built-in version compatibility and migration support
A2A Protocol Extension Points
// Custom A2A Part type
type CustomPart struct {
CustomData interface{} `json:"custom_data"`
Format string `json:"format"`
}
// Custom A2A artifact processor
type CustomArtifactProcessor struct {
SupportedTypes []string
ProcessFunc func(ctx context.Context, artifact *a2a.Artifact) error
}
// Custom A2A context manager
type CustomContextManager struct {
ContextRules map[string]ContextRule
RouteFunc func(contextId string, message *a2a.Message) []string
}
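For instance, a custom artifact processor might be instantiated like this (the media types and handling are purely illustrative):
// Handle image artifacts with a custom processor.
imageProcessor := &CustomArtifactProcessor{
	SupportedTypes: []string{"image/png", "image/jpeg"},
	ProcessFunc: func(ctx context.Context, artifact *a2a.Artifact) error {
		// Walk each part and hand it to an image pipeline as appropriate.
		for _, part := range artifact.GetParts() {
			_ = part // e.g. decode file parts, skip text parts
		}
		return nil
	},
}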
This A2A-compliant unified approach provides a solid foundation for building complex multi-agent systems with full Agent2Agent protocol support while maintaining simplicity, comprehensive observability, and rich conversation capabilities.