Agent Discovery Workflow Explained
This document explains how the agent discovery workflow operates in AgentHub, enabling dynamic registration and LLM-based orchestration.
Overview
Agent discovery is the process by which agents dynamically register their capabilities with the Cortex orchestrator, making themselves available for intelligent task delegation via an LLM (Large Language Model).
The Problem This Solves
Traditional multi-agent systems require:
- Hard-coded agent configurations
- Static routing rules
- Manual updates when adding new agents
- No intelligence in task routing
Agent discovery with Cortex provides:
- Dynamic registration: Agents announce themselves when they start
- Intelligent routing: LLM decides which agent to use based on capabilities
- Zero configuration: No central registry to update
- Scalable: Add or remove agents without system changes
How It Works
The Five-Step Flow
┌──────────────┐
│ 1. Agent │ Agent starts and creates an AgentCard
│ Startup │ describing its capabilities
└──────┬───────┘
│
▼
┌──────────────┐
│ 2. Register │ Agent calls RegisterAgent RPC
│ with │ sending the AgentCard to broker
│ Broker │
└──────┬───────┘
│
▼
┌──────────────┐
│ 3. Event │ Broker publishes AgentCardEvent
│ Publishing│ broadcasting to all subscribers
└──────┬───────┘
│
▼
┌──────────────┐
│ 4. Cortex │ Cortex receives event and stores
│ Discovery │ agent in its registry
└──────┬───────┘
│
▼
┌──────────────┐
│ 5. LLM │ Agent is now available in LLM
│ Integration│ prompts for intelligent delegation
└──────────────┘
Step 1: Agent Startup
When an agent starts, it creates an AgentCard that describes:
agentCard := &pb.AgentCard{
Name: "agent_translator",
Description: "Language translation service",
Version: "1.0.0",
Skills: []*pb.AgentSkill{
{
Name: "Text Translation",
Description: "Translates text between languages",
Examples: [
"Translate this to Spanish",
"Convert to French",
],
},
},
}
Key Components:
- Name: Unique identifier (used for routing)
- Description: What the agent does (helps LLM understand)
- Skills: Specific capabilities with examples (used for matching)
Step 2: Registration with Broker
The agent registers by calling the broker’s RegisterAgent RPC:
client.RegisterAgent(ctx, &pb.RegisterAgentRequest{
AgentCard: agentCard,
Subscriptions: []string{"translation_request"},
})
What happens:
- Broker validates the AgentCard
- Stores agent in its registry:
registeredAgents[agentID] = card - Returns success response
Step 3: Event Publishing
The broker immediately publishes an AgentCardEvent:
event := &pb.AgentEvent{
EventId: "agent_registered_translator_...",
Timestamp: now(),
Payload: &pb.AgentEvent_AgentCard{
AgentCard: &pb.AgentCardEvent{
AgentId: "agent_translator",
AgentCard: agentCard,
EventType: "registered",
},
},
Routing: &pb.AgentEventMetadata{
FromAgentId: "agent_translator",
ToAgentId: "", // Broadcast
EventType: "agent.registered",
Priority: PRIORITY_HIGH,
},
}
Routing characteristics:
- Broadcast to all subscribers (empty
ToAgentId) - High priority (processed immediately)
- Event type clearly marked as “agent.registered”
Step 4: Cortex Discovery
Cortex subscribes to agent events:
stream, _ := client.SubscribeToAgentEvents(ctx, &pb.SubscribeToAgentEventsRequest{
AgentId: "cortex",
EventTypes: []string{"agent.registered", "agent.updated"},
})
When receiving an agent card event, Cortex:
func handleAgentCardEvent(event *pb.AgentCardEvent) {
agentID := event.GetAgentId()
agentCard := event.GetAgentCard()
// Store agent
cortex.RegisterAgent(agentID, agentCard)
// Log skills for visibility
log.Info("Agent registered",
"agent_id", agentID,
"skills", extractSkillNames(agentCard.Skills))
}
Result: Agent is now in Cortex’s registeredAgents map.
Step 5: LLM Integration
When a user sends a request, Cortex queries the LLM:
decision, _ := llm.Decide(
conversationHistory,
availableAgents, // Includes our new agent!
newUserMessage,
)
The LLM sees:
Available agents:
- agent_translator: Language translation service
Skills:
* Text Translation: Translates text between languages
Examples: "Translate this to Spanish", "Convert to French"
Decision making:
- User asks: “Can you translate this to Spanish?”
- LLM sees “agent_translator” with matching examples
- LLM decides: Delegate to agent_translator
- Cortex sends task to agent_translator
- Agent processes and responds
- Cortex synthesizes final response
Message Flow Diagram
sequenceDiagram
participant A as Translation Agent
participant B as Broker
participant C as Cortex
participant L as LLM (VertexAI)
participant U as User
Note over A: Step 1: Startup
A->>A: Create AgentCard
Note over A,B: Step 2: Registration
A->>B: RegisterAgent(card)
B->>B: Store in registry
Note over B: Step 3: Event Publishing
B->>C: AgentCardEvent (broadcast)
Note over C: Step 4: Discovery
C->>C: RegisterAgent(id, card)
C->>C: total_agents++
Note over U,L: Step 5: LLM Integration
U->>C: "Translate to Spanish"
C->>L: Decide(availableAgents)
Note over L: Sees translator agent<br/>with matching examples
L-->>C: {delegate: agent_translator}
C->>A: Task message
A->>A: Process translation
A->>C: Result
C->>L: Synthesize
L-->>C: Final response
C->>U: "Here's the Spanish: ..."Technical Implementation Details
Thread Safety
Agent registration is thread-safe:
type AgentHubService struct {
registeredAgents map[string]*pb.AgentCard
agentsMu sync.RWMutex
}
func (s *AgentHubService) RegisterAgent(...) {
s.agentsMu.Lock()
s.registeredAgents[agentID] = card
s.agentsMu.Unlock()
}
Multiple agents can register concurrently without conflicts.
Event Delivery
Events are delivered asynchronously:
for _, subChan := range targetChannels {
go func(ch chan *pb.AgentEvent) {
select {
case ch <- event:
// Delivered
case <-time.After(5 * time.Second):
// Timeout
}
}(subChan)
}
Benefits:
- Non-blocking: Broker doesn’t wait for all deliveries
- Resilient: Timeout prevents hanging
- Concurrent: Multiple subscribers receive events in parallel
LLM Prompt Generation
Cortex builds prompts dynamically:
func buildOrchestrationPrompt(availableAgents []*pb.AgentCard) string {
prompt := "Available agents:\n"
for _, agent := range availableAgents {
prompt += fmt.Sprintf("- %s: %s\n",
agent.Name, agent.Description)
for _, skill := range agent.Skills {
prompt += fmt.Sprintf(" Skills:\n")
prompt += fmt.Sprintf(" * %s: %s\n",
skill.Name, skill.Description)
}
}
return prompt
}
Updated automatically when new agents register.
Timing and Performance
Typical timings for agent discovery:
Agent startup: 100-200ms
RegisterAgent RPC: < 10ms
Event publishing: < 5ms
Event delivery: < 50ms
Cortex processing: < 10ms
Total discovery time: < 300ms
Fast enough that agents are available for routing within milliseconds of starting.
Error Handling
Registration Failures
If registration fails:
_, err := client.RegisterAgent(ctx, req)
if err != nil {
log.Error("Registration failed", "error", err)
// Agent should retry or exit
panic(err)
}
Common causes:
- Broker not running
- Network issues
- Invalid AgentCard (empty name)
Event Delivery Failures
If event delivery fails:
if err := s.routeEvent(ctx, event); err != nil {
log.Warn("Event routing failed", "error", err)
// Continue anyway - registration still succeeded
}
Graceful degradation: Registration succeeds even if event routing fails.
Cortex Not Subscribed
If Cortex isn’t subscribed yet:
- Events are still published
- Cortex can query
GetAgentCard()RPC later - Or register when Cortex starts
Resilient: System handles various startup orders.
Observability
Broker Logs
level=INFO msg="Agent registered" agent_id=agent_translator
level=DEBUG msg="Routing event to subscribers"
event_type=agent.registered subscriber_count=2
level=DEBUG msg="Event delivered to subscriber"
Cortex Logs
level=INFO msg="Received agent card event"
agent_id=agent_translator event_type=registered
level=INFO msg="Agent skills registered"
skills="[Text Translation: Translates...]"
level=INFO msg="Agent registered with Cortex orchestrator"
total_agents=3
Distributed Tracing
Agent registration creates trace spans:
agent_registered_translator
└─ broker.route_event
├─ deliver_to_cortex
└─ deliver_to_monitor
Visibility into the entire discovery flow.
Lifecycle Management
Agent Startup Sequence
1. Create AgentHub client
2. Connect to broker
3. Create AgentCard
4. Call RegisterAgent
5. Subscribe to messages
6. Enter processing loop
Agent Shutdown
Currently agents don’t explicitly unregister. For graceful shutdown:
// In future enhancement:
defer client.UnregisterAgent(ctx, &pb.UnregisterAgentRequest{
AgentId: myAgentID,
})
This would trigger an “agent.unregistered” event.
Agent Updates
To update capabilities:
// Modify AgentCard
agentCard.Skills = append(agentCard.Skills, newSkill)
// Re-register
client.RegisterAgent(ctx, &pb.RegisterAgentRequest{
AgentCard: agentCard,
})
// Triggers "agent.updated" event
Cortex receives update and refreshes its registry.
Comparison with Other Patterns
vs. Service Discovery (Consul, etcd)
Agent Discovery:
- Includes capability metadata (skills)
- Optimized for LLM consumption
- Event-driven notification
- Rich semantic information
Service Discovery:
- Network location only
- Health checks
- Static metadata
- Pull-based queries
vs. API Gateway
Agent Discovery:
- Dynamic routing based on content
- LLM makes intelligent decisions
- Supports complex multi-step workflows
API Gateway:
- Path-based routing
- Static configuration
- Single request-response
vs. Message Queues
Agent Discovery:
- Agents know their capabilities
- Centralized intelligence (Cortex)
- Rich metadata for decisions
Message Queues:
- Topic-based routing
- No central intelligence
- Minimal metadata
Design Decisions
Why Broadcast Events?
Decision: Publish agent cards to all subscribers
Alternatives considered:
- Point-to-point to Cortex only
- Store-and-query model
Rationale:
- Multiple orchestrators can coexist
- Monitoring agents can track all agents
- Extensible for future use cases
- Low overhead (events are small)
Why High Priority?
Decision: Agent registration events use PRIORITY_HIGH
Rationale:
- New agents should be available quickly
- User requests may come immediately
- Discovery is time-sensitive
- Low volume (not many registrations)
Why Skills with Examples?
Decision: Include example user requests in skills
Rationale:
- LLMs learn by example
- Natural language is ambiguous
- Examples disambiguate capabilities
- Improves matching accuracy
Future Enhancements
See AGENT_DECIDE.md for planned improvements:
- Agent Health Monitoring: Track agent availability
- Agent Deregistration: Explicit removal from registry
- Agent Versioning: Support multiple versions simultaneously
- Capability Queries: Search agents by capability
- Load Balancing: Distribute work among multiple instances
Conclusion
The agent discovery workflow enables:
- Zero-configuration agent deployment
- Intelligent routing via LLM
- Dynamic scaling of agent pools
- Automatic orchestration based on capabilities
- Flexible, extensible multi-agent systems
This architecture supports truly autonomous, self-organizing agent networks that can adapt to changing requirements without manual intervention.
Related Documentation
- Create Agent with Cortex - Implementation guide
- Design Agent Cards - Best practices
- A2A Protocol - Protocol specification
- Cortex Architecture - Orchestrator design
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.