Building an AI Agent System with Hermes: A Remote Architecture Guide
How I set up Hermes AI agents to run on a VPS while connecting to a home LLM backend via Tailscale. Complete setup guide with architecture diagrams.
- AI Agents
- Hermes
- LLM
- Remote Setup
- Tailscale
- Telegram
Note: This article was written using the Hermes agent itself via Telegram — the same system described here.
Introduction
Autonomous AI agents have transformed how I approach development tasks. After evaluating several frameworks, I chose Hermes — an open-source agent framework that combines powerful LLM reasoning with practical tool integrations.
The core challenge: I wanted 24/7 availability and reliable network access from a VPS, while keeping my expensive GPU hardware at home for cost efficiency. This guide documents my complete setup and architecture decisions, and provides step-by-step instructions for replicating it.
System Architecture
This setup combines a reliable VPS for 24/7 availability with cost-effective home GPU inference. Here’s how the components interconnect:
┌─────────────────────────────────────────────────────────────────────────────┐
│ HOME NETWORK (LAN) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ GPU Server (RTX 3090 24GB) │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │ │
│ │ │ llama.cpp │ │ Quantized │ │ Local Storage & │ │ │
│ │ │ Server │◄─┤ LLM Models │◄─┤ Model Weights │ │ │
│ │ │ (port 8000) │ │ (GGUF) │ │ │ │ │
│ │ └────────┬────────┘ └─────────────────┘ └─────────────────────┘ │ │
│ │ │ API Keys │ │ │
│ │ │ │ │
│ └───────────┼────────────────────────────────────────────────────────────┘ │
│ │ │
└──────────────┼────────────────────────────────────────────────────────────────┘
│
│ Tailscale Encrypted Tunnel
│ (100.64.x.x network)
│
┌──────────────┼────────────────────────────────────────────────────────────────┐
│ ▼ VPS (Remote Server) │
│ ┌─────────────────────────────────────────────────────────────────────────────┐│
│ │ Hermes Agent Gateway ││
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────────┐ ││
│ │ │ Telegram │ │ Tool │ │ Cron │ │ Session │ ││
│ │ │ Bot API │ │ Executor │ │ Scheduler │ │ Manager │ ││
│ │ │ │ │ │ │ │ │ │ ││
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └───────┬────────┘ ││
│ │ │ │ │ │ ││
│ │ └─────────────────┬┴──────────────────┴──────────────────┘ ││
│ │ │ ││
│ │ ┌──────▼──────┐ ││
│ │ │ LLM Client │─────────────────┐ ││
│ │ │ (OpenAI API) │ │ ││
│ │ └──────────────┘ │ ││
│ └──────────────────────────────────────────────────────┼───────────────────────┘│
│ │ │
│ ┌─────────────────┐ ┌─────────────────────────────────┼───────────────────────┐│
│ │ Tools Available│ │ │ ││
│ │ ───────────────│ │ External Services │ ││
│ │ • Terminal │ │ ┌──────────────┐ ┌─────────────▼─────────────┐ ││
│ │ • File I/O │ │ │ GitHub │ │ Internet │ ││
│ │ • Web Search │ │ │ API │ │ (Web Scraping/APIs) │ ││
│ │ • Browser │ │ └──────────────┘ └───────────────────────────┘ ││
│ │ • Delegate │ │ ││
│ │ • GitHub │◄─┤ ││
│ │ • Code Exec │ │ ││
│ │ • Cron/Sched │ │ ││
│ └─────────────────┘ └──────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
Data Flow:
- Message arrives at the Telegram bot
- Hermes Gateway receives it via the Telegram Bot API
- Gateway constructs a prompt with context and available tools
- Request routes through Tailscale tunnel to the home GPU
- llama.cpp generates the response
- Hermes executes any requested tool calls
- Final response returns to Telegram
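The data flow above amounts to a loop: ask the model, run any tool calls it requests, feed the results back, and stop once it answers in plain text. The following is a minimal sketch of that loop, not Hermes's actual implementation. `call_llm` and the `TOOLS` registry are hypothetical stand-ins, and the message shapes assume the OpenAI chat-completions tool-calling format.

```python
import json
from typing import Callable

# Hypothetical tool registry for illustration; real agent tools are far richer.
TOOLS: dict[str, Callable[..., str]] = {
    "echo": lambda text: text,
}


def run_agent_turn(call_llm: Callable[[list[dict]], dict],
                   user_text: str,
                   max_rounds: int = 5) -> str:
    """Ask the model, execute requested tools, and repeat until it answers."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_rounds):
        reply = call_llm(messages)  # one assistant message as a dict
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # No tool requested: this is the final answer for the user.
            return reply.get("content") or ""
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
            result = TOOLS[fn["name"]](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
    raise RuntimeError("tool loop did not finish within max_rounds")
```

Passing `call_llm` in as a parameter keeps the loop independent of the transport, so the same code works whether the model sits behind the Tailscale address or the HTTPS gateway.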
Key Components
1. Home GPU Server
- Hardware: RTX 3090 (24GB VRAM) for running quantized LLMs
- Software: llama.cpp serving GGUF models via OpenAI-compatible API
- Benefits: Cost-effective inference with full control over model selection and quantization
2. llama.cpp Server
- Purpose: Serve local LLMs with an OpenAI-compatible HTTP API
- Configuration:
llama-server --model models/qwen-2.5-7b-instruct-q4_k_m.gguf \
--host 0.0.0.0 --port 8000 \
--ctx-size 8192 --threads 8
- Why llama.cpp: Excellent GGUF support, minimal memory overhead, straightforward HTTP API
3. Tailscale VPN
- Purpose: Secure, encrypted tunnel between VPS and home network
- Configuration:
  - Home server receives a Tailscale IP (e.g., 100.64.1.10)
  - VPS joins the same Tailscale network
  - LLM backend listens on 100.64.1.10:8000
4. API Gateway (api.lucasnicolas.dev)
- Purpose: Single, stable OpenAI-compatible endpoint for Hermes agents and Honcho memory services
- Implementation: Caddy reverse proxy on the VPS, terminating TLS and routing by path
- Current routing:
  - POST /v1/chat/completions → llama.cpp backend (home GPU, port 8080)
  - GET /v1/models → llama.cpp backend
  - POST /v1/embeddings → embedding backend (intfloat/multilingual-e5-large-instruct, port 8081)
- Active model: qwen3.5-35b (served via llama.cpp with GGUF quantization)
- Benefit: Hermes uses one HTTPS URL regardless of backend port or service changes; clean separation between chat and embedding backends
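The routing rule is simple enough to express as a function. This sketch only mirrors the decision the proxy makes; it is not how Caddy is implemented. The upstream addresses are the example values from this guide.

```python
# Example upstreams from this guide; adjust host and ports to your own setup.
CHAT_UPSTREAM = "100.64.1.10:8080"
EMBED_UPSTREAM = "100.64.1.10:8081"


def pick_upstream(path: str) -> str:
    """Return the backend address for a request path, mimicking the proxy rule."""
    if path.startswith("/v1/embeddings"):
        return EMBED_UPSTREAM
    # Everything else (chat completions, model listing) goes to llama.cpp.
    return CHAT_UPSTREAM
```

Clients never see these addresses. They only know the public base URL, which is what lets the backends move without any client changes.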
5. Hermes Agent Gateway
- Location: VPS (Ubuntu/Debian)
- Role: Central orchestrator managing:
- Message routing from Telegram
- Tool execution and result aggregation
- Conversation state management
- Scheduled task coordination
6. Telegram Bot
- Purpose: Primary interface for agent interaction
- Benefits: Mobile access, push notifications, persistent chat history
- Setup: BotFather creates the bot token; stored securely in ~/.hermes/.env
Why This Architecture?
Advantages
✅ Cost Efficiency: Home GPU costs far less than cloud GPU instances
✅ Privacy: Prompts are processed on your own hardware instead of a third-party LLM API (messages still transit Telegram and the VPS)
✅ Reliability: VPS ensures 24/7 uptime with a public IP
✅ Flexibility: Easy to swap models, add tools, or customize behavior
✅ Security: Tailscale provides encrypted, private networking
✅ Low Latency: llama.cpp has minimal overhead compared to heavier frameworks
Trade-offs
⚠️ Latency: Network hop adds ~10-50ms per API call
⚠️ Complexity: More components to configure and maintain
⚠️ Bandwidth: Model weights transfer once (then cached locally)
⚠️ Home Internet: Upload speed affects request/response times
Complete Setup Guide
Step 1: Prepare Your Home GPU Server
Install llama.cpp
Option A: Build from source (Recommended)
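# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Build with CUDA support (for NVIDIA GPUs) using CMake
# Note: recent llama.cpp versions build with CMake only and use the GGML_CUDA flag;
# older checkouts used LLAMA_CUDA instead
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)
# Binaries (including llama-server) are written to build/bin/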
Option B: Use pre-built release
# Download latest release from GitHub
wget https://github.com/ggerganov/llama.cpp/releases/download/b3062/llama-b3062-bin-ubuntu-x64.zip
unzip llama-b3062-bin-ubuntu-x64.zip
Download and Serve a Model
# Download a quantized model (example: Qwen 2.5 7B)
# Get GGUF models from Hugging Face: https://huggingface.co/models?search=gguf
# Example: Qwen 2.5 7B Instruct (Q4_K_M quantization)
wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
# Start the server
./llama-server \
--model qwen2.5-7b-instruct-q4_k_m.gguf \
--host 0.0.0.0 \
--port 8000 \
--ctx-size 8192 \
--threads 8 \
--gpu-layers 35
Configure Tailscale
# Install Tailscale on home server
curl -fsSL https://tailscale.com/install.sh | sh
# Start Tailscale and authenticate
sudo tailscale up
# Note your Tailscale IP (e.g., 100.64.1.10)
tailscale ip -4
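Optional: Advertise a subnet route only if the VPS needs to reach other devices on your home LAN; it is not needed to reach the GPU server itself. Advertise the LAN subnet (not the Tailscale IP), enable IP forwarding on the home server, and approve the route in the Tailscale admin console. The subnet below is an example; use your own:
sudo tailscale up --advertise-routes=192.168.1.0/24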
Expose the LLM Backend
If behind a firewall/NAT:
# Option 1: Port forward on router (less secure: exposes the API to the public internet)
# Forward external port 8000 → internal IP:8000
# Option 2: Reach the server over the Tailscale network (recommended; no port forwarding needed)
# Access via: http://100.64.1.10:8000
Step 2: Set Up the VPS
Prerequisites
- Ubuntu 22.04 LTS or Debian 12+
- Python 3.10+
- Node.js 18+ (for some tools)
- Git
- Tailscale client
Install Dependencies
# Update system
sudo apt update && sudo apt upgrade -y
# Install Python and pip
sudo apt install -y python3 python3-pip python3-venv
# Install Node.js (if needed for certain tools)
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
# Install Git
sudo apt install -y git
# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
Configure Tailscale on VPS
# Start Tailscale and authenticate with same account as home server
sudo tailscale up
# Verify connection to home server
ping 100.64.1.10
# Test LLM backend connectivity
curl http://100.64.1.10:8000/v1/models
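The same reachability check can be scripted, which is handy for a monitoring job or a pre-start hook. This is a minimal sketch using only the standard library; the function name is my own.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open("100.64.1.10", 8000)` from the VPS checks the example Tailscale address used in this guide. A `False` result means either the tunnel is down or llama.cpp is not listening on that interface.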
Configure api.lucasnicolas.dev (Caddy Reverse Proxy)
Rather than pointing Hermes directly to a Tailscale IP, I expose a single stable HTTPS endpoint on the VPS that proxies traffic to the appropriate backend. This setup provides:
- TLS termination with automatic Let’s Encrypt certificates
- Path-based routing to separate chat and embedding backends
- Port abstraction so backend changes don’t require client updates
Why llama.cpp over vLLM?
I evaluated switching to vLLM for better throughput, but chose to stay with llama.cpp because:
- Unsloth GGUF workflow: I’m using Unsloth-quantized GGUF models, which are native to llama.cpp. Moving to the formats vLLM handles best (AWQ/GPTQ) would mean re-quantizing from the original full-precision weights, which loses the Unsloth optimizations.
- Single-agent use case: For my primary workflow (single-agent interactive work), the performance difference is negligible.
- Simplicity: llama.cpp works directly with GGUF files without additional tooling.
When to consider vLLM:
- Running multiple concurrent agents (better batching)
- Production serving with high throughput requirements
- You have native PyTorch checkpoints (not GGUF)
See the Performance Tuning section for tuning llama.cpp on your RTX 3090.
Create /etc/caddy/Caddyfile:
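api.lucasnicolas.dev {
    encode gzip
    # Embeddings endpoint (CPU-friendly embedding server)
    # Use handle, not handle_path: handle_path strips the matched prefix,
    # so the backend would receive / instead of /v1/embeddings
    handle /v1/embeddings* {
        reverse_proxy 100.64.1.10:8081
    }
    # All other OpenAI-compatible routes (llama.cpp on home GPU)
    # The upstream port must match the --port that llama-server is started with
    handle {
        reverse_proxy 100.64.1.10:8080
    }
}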
Apply and validate:
sudo caddy validate --config /etc/caddy/Caddyfile
sudo systemctl reload caddy
# Verify routes
curl https://api.lucasnicolas.dev/v1/models
curl -X POST https://api.lucasnicolas.dev/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model":"intfloat/multilingual-e5-large-instruct","input":"hello"}'
This gives Hermes, Honcho, and any other OpenAI-compatible client a single base URL to use:
https://api.lucasnicolas.dev/v1
Clone and Install Hermes
# Create working directory
mkdir -p ~/.hermes && cd ~/.hermes
# Clone Hermes repository (if using source)
git clone https://github.com/hermes-agent/hermes-agent.git
cd hermes-agent
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -e .
Step 3: Configure Hermes
Create Configuration File
# Create config directory
mkdir -p ~/.hermes
# Generate default config
hermes setup
Edit ~/.hermes/config.yaml:
# ~/.hermes/config.yaml
display:
  theme: default
  tool_progress_command: kawaii
model:
  provider: custom
  # Point to your HTTPS OpenAI-compatible gateway
  model: "https://api.lucasnicolas.dev/v1"
  api_key: "${LLM_API_KEY}" # Set in ~/.hermes/.env
tools:
  enabled_toolsets:
    - terminal
    - file
    - web
    - delegate
    - github
# Telegram configuration (set via env var)
telegram:
  bot_token: "${HERMES_TELEGRAM_BOT_TOKEN}"
  allowed_users:
    - "your_telegram_username"
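The `${LLM_API_KEY}` and `${HERMES_TELEGRAM_BOT_TOKEN}` placeholders are resolved from the environment at load time. The sketch below shows one way such expansion can work. It is an illustration under my own assumptions, not Hermes's actual loader, and `expand_env` is a hypothetical helper.

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace ${NAME} placeholders with values from env; fail on missing names."""
    env = os.environ if env is None else env

    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_sub, value)
```

Failing loudly on a missing variable is a deliberate choice here. The standard library's `os.path.expandvars` leaves unknown placeholders untouched, which would send the literal string `${LLM_API_KEY}` to the API as the key.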
Set Environment Variables
Create ~/.hermes/.env:
# ~/.hermes/.env
HERMES_TELEGRAM_BOT_TOKEN=***
GITHUB_TOKEN=***
LLM_API_KEY=***
Get your Telegram Bot Token:
- Open Telegram and message @BotFather
- Send /newbot and follow prompts
- Copy the API token it provides
Get your GitHub Token:
- Go to https://github.com/settings/tokens
- Create a new token with repo scope
- Copy the token
Set your LLM API key:
- Use the key expected by your OpenAI-compatible endpoint (api.lucasnicolas.dev)
- Store it as LLM_API_KEY in ~/.hermes/.env
- Keep file permissions strict:
chmod 600 ~/.hermes/.env
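If you want to verify the permissions from a script (for example before starting the gateway), a check like the following works. It is a small sketch and `is_private` is my own helper name, not part of Hermes.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Mode 600 passes; 640 or 644 fail because group/other bits are set.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A startup script could call `is_private(os.path.expanduser("~/.hermes/.env"))` and refuse to continue when it returns `False`.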
Test the Connection
# Activate virtual environment
cd ~/.hermes/hermes-agent
source venv/bin/activate
# Test LLM backend connectivity
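python3 -c "
import requests
# Short timeout so a dead tunnel fails fast instead of hanging
response = requests.get('https://api.lucasnicolas.dev/v1/models', timeout=10)
print('LLM Backend Status:', response.status_code)
response.raise_for_status()
print('Models:', response.json())
"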
# Optional: test embeddings route
curl -X POST https://api.lucasnicolas.dev/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model":"intfloat/multilingual-e5-large-instruct","input":"Hermes health check"}'
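The checks above send no credentials. If your endpoint enforces the API key, requests need an Authorization header. The sketch below builds such a request with the standard library, assuming the gateway expects the usual OpenAI-style `Bearer` scheme (the Caddyfile shown earlier does not enforce a key by itself). Both function names are my own.

```python
import json
import urllib.request

BASE_URL = "https://api.lucasnicolas.dev/v1"


def build_chat_request(prompt: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping construction separate from sending makes it easy to inspect the request before anything goes over the network.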
Step 4: Run the Agent
Start the Gateway (Telegram Mode)
# In virtual environment
cd ~/.hermes/hermes-agent
# Start Telegram gateway
python3 -m hermes.gateway --platform telegram
The gateway will:
- Connect to your Telegram bot
- Listen for messages
- Forward requests to your home LLM backend
- Execute tools and return responses
Make it Persistent with systemd
Create ~/.hermes/hermes-gateway.service:
[Unit]
Description=Hermes AI Agent Gateway
After=network.target tailscaled.service
[Service]
Type=simple
User=lucas
WorkingDirectory=/home/lucas/.hermes/hermes-agent
Environment="PATH=/home/lucas/.hermes/hermes-agent/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/home/lucas/.hermes/hermes-agent/venv/bin/python3 -m hermes.gateway --platform telegram
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Enable and start:
# Copy to systemd directory
sudo cp ~/.hermes/hermes-gateway.service /etc/systemd/system/
# Reload systemd
sudo systemctl daemon-reload
# Enable and start
sudo systemctl enable hermes-gateway
sudo systemctl start hermes-gateway
# Check status
sudo systemctl status hermes-gateway
# View logs
sudo journalctl -u hermes-gateway -f
Step 5: Verify Everything Works
Test from Telegram
- Open Telegram and find your bot
- Send a simple greeting: Hello
- You should receive a response from the agent
Test Tool Access
# In Telegram, try these commands:
/git status # Check git tool access
/web_search "Hermes AI" # Test web search
/terminal "echo hello" # Test terminal access
Monitor Logs
# View real-time logs
sudo journalctl -u hermes-gateway -f
# Or check gateway output manually
tail -f ~/.hermes/hermes-agent/logs/gateway.log
Troubleshooting
Common Issues
“Connection refused” when accessing LLM backend
Symptoms: Agent can’t connect to home GPU server
Solutions:
- Verify Tailscale is running on both machines:
tailscale status
- Check if you can ping the home IP:
ping 100.64.1.10
- Ensure llama.cpp is listening on 0.0.0.0, not just 127.0.0.1:
# Correct - listens on all interfaces
./llama-server --host 0.0.0.0 --port 8000
# Wrong - only listens locally
./llama-server --host 127.0.0.1 --port 8000
“401 Unauthorized” from Telegram
Symptoms: Bot token rejected
Solutions:
- Verify bot token in ~/.hermes/.env
- Ensure bot is active (not deleted)
- Check allowed_users list includes your Telegram username
Model loading fails or times out
Symptoms: llama.cpp returns errors or OOM
Solutions:
- Check GPU memory usage on home server:
nvidia-smi
- Use a smaller context size:
./llama-server --ctx-size 4096 # instead of 8192
- Try a smaller quantization (Q4_K_M instead of Q8_0):
# Q4_K_M uses ~4.5GB for 7B model
# Q8_0 uses ~7GB for 7B model
Out of GPU memory
Symptoms: CUDA out of memory errors
Solutions:
- Reduce --gpu-layers parameter:
# Fewer layers on GPU = less VRAM
./llama-server --gpu-layers 20
- Use CPU offload for some layers:
./llama-server --gpu-layers 10 --threads 8
- Try a smaller model (7B instead of 14B)
“model not found” on /v1/embeddings
Symptoms: API returns an error like:
The provided model=BAAI/bge-m3 has not been found ... use model=intfloat/multilingual-e5-large-instruct instead.
Root cause: Your embedding backend is serving intfloat/multilingual-e5-large-instruct (1024-d vectors), but configuration files reference a different model.
Solutions:
- Update embedding model everywhere to match what your endpoint serves:
  - intfloat/multilingual-e5-large-instruct
  - Vector dimensions: 1024
- Check all config locations:
# Honcho config
grep -r "BAAI/bge-m3" ~/honcho/
# Hermes config
grep -r "embedding" ~/.hermes/config.yaml
# Environment files
grep -r "EMBEDDING_MODEL" ~/.hermes/ ~/honcho/
- Restart services that cache config:
cd ~/honcho && docker compose restart api deriver
sudo systemctl restart hermes-gateway
- Verify the fix:
curl -X POST https://api.lucasnicolas.dev/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model":"intfloat/multilingual-e5-large-instruct","input":"test"}'
Important: The embedding model name must exactly match what your backend serves. If unsure, check your embedding server logs or test with different model names.
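A mismatched model usually shows up as a dimension mismatch in the vector store, so it helps to check the vector length the endpoint actually returns. The sketch below assumes the OpenAI-style response shape, with a `data` list whose items carry an `embedding` array. The helper names are hypothetical.

```python
def embedding_dims(response: dict) -> int:
    """Return the vector length from an OpenAI-style /v1/embeddings response."""
    vectors = [item["embedding"] for item in response["data"]]
    if not vectors:
        raise ValueError("response contains no embeddings")
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent vector lengths: {sorted(lengths)}")
    return lengths.pop()


def check_dims(response: dict, expected: int = 1024) -> None:
    """Raise if the served model's dimension differs from what the store expects."""
    actual = embedding_dims(response)
    if actual != expected:
        raise ValueError(f"expected {expected}-d vectors, got {actual}-d")
```

Feed it the parsed JSON from the verification request. An error from `check_dims` means the backend and the memory store disagree, even when the model name is accepted.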
Debugging Tips
# Test gateway endpoint directly from VPS
curl -X POST https://api.lucasnicolas.dev/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-35b",
"messages": [{"role": "user", "content": "Hello"}],
"temperature": 0.7
}'
# Test embeddings route and model name
curl -X POST https://api.lucasnicolas.dev/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model":"intfloat/multilingual-e5-large-instruct","input":"health check"}'
# Check Tailscale network (upstream path from VPS proxy to home server)
tailscale ping 100.64.1.10
tailscale status
# Verify file permissions
ls -la ~/.hermes/.env
chmod 600 ~/.hermes/.env
# Monitor llama.cpp directly (on home server)
# Check for errors in the terminal output
Advanced Configuration
Adding More Tools
Edit ~/.hermes/config.yaml:
tools:
  enabled_toolsets:
    - terminal # Shell commands
    - file # File I/O operations
    - web # Web search and extraction
    - delegate # Subagent delegation
    - github # GitHub repository access
    - browser # Browser automation (requires API key)
    - cron # Scheduled task management
  disabled_toolsets: []
Honcho Memory Integration
Hermes integrates with Honcho, a self-hosted memory system for long-term conversation context and semantic search.
Architecture:
- Storage: PostgreSQL + Redis on the VPS (~/honcho directory)
- Embeddings: Routed through api.lucasnicolas.dev/v1/embeddings to your embedding server at port 8081
- Vector dimensions: 1024 (matches intfloat/multilingual-e5-large-instruct)
Verification:
# Check Hermes Honcho connection
hermes honcho status
# Expected output:
# ✓ Connection OK
# - API: http://localhost:8000
# - Embeddings: https://api.lucasnicolas.dev/v1/embeddings
# - Dimensions: 1024
If Honcho shows connection issues:
- Verify Honcho services are running:
cd ~/honcho && docker compose ps
# Expected: api, database, redis, deriver all healthy
- Check embedding configuration:
# Ensure EMBEDDING_MODEL matches your backend
grep "EMBEDDING_MODEL" ~/honcho/.env
# Should be: intfloat/multilingual-e5-large-instruct
- Restart Honcho if needed:
cd ~/honcho && docker compose restart api deriver
Configuring Scheduled Tasks
Hermes supports cron-like scheduling for automated tasks:
# In your Telegram chat, send:
/cron create "Daily backup" "0 3 * * *" "tar -czf /backup/home.tar.gz /home"
/cron list # View all scheduled jobs
/cron pause "Daily backup"
/cron resume "Daily backup"
/cron remove "Daily backup"
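The schedule string `0 3 * * *` uses the standard five-field cron layout, which reads as "minute 0 of hour 3, every day". For readers less familiar with the format, this small sketch labels the fields. It only splits and names them. It does not validate ranges and it is not Hermes's scheduler.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr: str) -> dict[str, str]:
    """Split a five-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(FIELDS, parts))
```

For the backup job above, `parse_cron("0 3 * * *")` gives minute `0`, hour `3`, and `*` for the remaining fields, so the job runs daily at 03:00 in the server's timezone.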
Custom Skill Sets
Create custom skills in ~/.hermes/skills/:
mkdir -p ~/.hermes/skills/my-skills
Example skill file (~/.hermes/skills/my-skills/deploy.md):
---
name: deploy-to-vps
description: Deploy code to production server
version: 1.0.0
---
## Deployment Workflow
1. Build the project
2. Test locally
3. Push to production branch
4. Restart services
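The skill file is a block of metadata between `---` markers followed by free-form instructions. To show how the two parts separate, here is a simplified sketch that handles only flat `key: value` lines. Hermes's real loader may well use a full YAML parser, and `split_front_matter` is a hypothetical name.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---' delimited header of 'key: value' lines from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body
```

Applied to the example skill, the metadata holds `name`, `description`, and `version`, while the body starts at the "Deployment Workflow" heading.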
Model Selection for llama.cpp
| Model | GGUF File | VRAM (Q4_K_M) | Use Case |
|---|---|---|---|
| Qwen 2.5 7B | qwen2.5-7b-instruct-q4_k_m.gguf | ~6GB | Fast, general purpose |
| Llama 3.1 8B | llama-3.1-8b-instruct-q4_k_m.gguf | ~6GB | Reasoning, code |
| Mistral 7B | mistral-7b-instruct-v0.3-q4_k_m.gguf | ~6GB | Code, reasoning |
| Qwen 2.5 14B | qwen2.5-14b-instruct-q4_k_m.gguf | ~10GB | Complex reasoning |
| Qwen 2.5 32B | qwen2.5-32b-instruct-q4_k_m.gguf | ~20GB | Research, deep analysis |
Recommendation: Start with Qwen 2.5 7B or Llama 3.1 8B for most use cases.
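To sanity-check whether a model fits before downloading it, you can estimate weights and KV cache separately. The sketch below is back-of-the-envelope arithmetic under stated assumptions. The bits-per-weight figures are approximations that vary by model, and the layer and head counts in the usage note are illustrative values for a 7B-class model with grouped-query attention. Read the real ones from your model's metadata.

```python
GIB = 1024 ** 3

# Approximate bits per weight (assumed values); real GGUF files vary by tensor mix.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5}


def weight_bytes(n_params: float, quant: str) -> float:
    """Estimate the size of the quantized weights in bytes."""
    return n_params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_size: int, bytes_per_value: int = 2) -> int:
    """Estimate K and V cache size for a full context (2 bytes per value = f16)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_size * bytes_per_value
```

For example, `weight_bytes(7.6e9, "Q4_K_M") / GIB` comes to roughly 4.3 GiB, and `kv_cache_bytes(28, 4, 128, 8192) / GIB` adds about 0.44 GiB at an 8192-token context. Compute buffers and CUDA overhead come on top of both, which is why the table lists a higher figure than the raw file size.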
Performance Tuning
# Example llama.cpp settings for RTX 3090 (24GB); tune for your model and context size
# --gpu-layers and --n-gpu-layers are aliases, so pass only one of them
./llama-server \
--model qwen2.5-14b-instruct-q4_k_m.gguf \
--host 0.0.0.0 \
--port 8000 \
--ctx-size 8192 \
--threads 8 \
--gpu-layers 40 \
--batch-size 512 \
--ubatch-size 128
Performance Considerations
Latency Breakdown
| Component | Typical Latency / Throughput |
|---|---|
| Telegram API | 50-150ms |
| Tailscale tunnel | 10-50ms |
| llama.cpp generation speed (7B, RTX 3090) | 20-50 tokens/s (throughput, not latency) |
| VPS internal processing | <10ms |
| Total first token latency | ~200-500ms |
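The table mixes fixed delays with a generation rate, so it helps to combine them into one estimate of how long a full reply takes. This is a rough sketch. The prompt-evaluation time in the usage note is an assumed figure that is not in the table.

```python
def first_token_ms(telegram_ms: float, tunnel_ms: float, vps_ms: float,
                   prompt_eval_ms: float) -> float:
    """Sum the fixed delays before the first generated token appears."""
    return telegram_ms + tunnel_ms + vps_ms + prompt_eval_ms


def reply_seconds(first_token_delay_ms: float, n_tokens: int,
                  tokens_per_s: float) -> float:
    """Total wall time for a reply: first-token delay plus generation time."""
    return first_token_delay_ms / 1000 + n_tokens / tokens_per_s
```

With mid-range values (100 ms Telegram, 30 ms tunnel, 10 ms VPS, and an assumed 150 ms of prompt evaluation), the first token arrives after 290 ms. A 200-token reply at 40 tokens/s then takes about 5.3 seconds in total, so generation speed dominates, not the network hop.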
Cost Analysis
| Component | Monthly Cost | Notes |
|---|---|---|
| Home GPU electricity | ~$15-30 | RTX 3090 idle + active usage |
| VPS (basic) | $5-20 | DigitalOcean, Linode, etc. |
| Tailscale | Free | Personal plan (up to 3 users, 100 devices) |
| Total | ~$20-50/month | vs. $100-300 for cloud GPU |
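The electricity line depends heavily on your usage pattern and tariff, so it is worth recomputing with your own numbers. The sketch below is a simple estimate. The wattages, hours, and price in the usage note are assumptions for illustration, not measurements from my setup.

```python
def monthly_electricity_cost(active_watts: float, idle_watts: float,
                             active_hours_per_day: float,
                             price_per_kwh: float, days: int = 30) -> float:
    """Estimate monthly cost from active/idle power draw and daily active hours."""
    idle_hours = 24 - active_hours_per_day
    kwh_per_day = (active_watts * active_hours_per_day
                   + idle_watts * idle_hours) / 1000
    return kwh_per_day * days * price_per_kwh
```

For instance, 350 W under load for 4 hours a day, 60 W idle the rest of the time, and $0.25 per kWh comes to about $19.50 per month, which sits inside the range in the table.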
Optimization Tips
- Use quantized models: Q4_K_M offers best quality/size ratio
- Batch requests: Combine multiple queries when possible
- Cache responses: Hermes supports prompt caching for repeated queries
- Keep context small: Smaller contexts = faster inference
- Use GPU layers wisely: Balance GPU/CPU offload for your hardware
Conclusion
This remote Hermes setup gives you the best of both worlds:
- Reliability: VPS provides 24/7 availability and public IP
- Privacy: Inference runs on your home hardware, not a third-party LLM API
- Cost: Home GPU is far cheaper than cloud alternatives
- Flexibility: Easy to modify, extend, and customize
- Performance: llama.cpp is lean and fast with minimal overhead
- Memory: Honcho provides semantic search and long-term context
The key ingredients are Tailscale for secure networking, api.lucasnicolas.dev as a unified gateway, and a clear understanding of the trade-offs between latency and cost.
Why This Setup Works
- llama.cpp + GGUF: Leverages Unsloth-quantized models without format conversion
- Single HTTPS endpoint: api.lucasnicolas.dev/v1 abstracts away backend complexity
- Separate embedding route: Keeps CPU-intensive embeddings off the main inference pipeline
- Self-hosted memory: Honcho runs locally with 1024-d vectors matching your embedding model
Next Steps
- Start small: Begin with a 7B Q4_K_M model and basic tools
- Monitor performance: Use the built-in logging to track latency
- Iterate: Add features gradually as you become comfortable
- Share: Contribute back to the Hermes community with your improvements
Written by: Lucas Nicolas
Last Updated: April 12, 2026
License: MIT
Source Code: github.com/lucas-nicolas-viseo/portfolio
This article was written using the Hermes agent via Telegram, demonstrating exactly the kind of workflow this architecture enables.