Add repository harness for coding agents (#1157)

Co-authored-by: xingzhi <chuzihao.czh@alibaba-inc.com>
sir1st
2026-05-30 18:57:22 +08:00
committed by GitHub
parent ce04b10eee
commit 6da5cd605a
9 changed files with 490 additions and 0 deletions
+3
@@ -36,6 +36,9 @@ jobs:
npm ci --ignore-scripts
npm rebuild node-pty
- name: Check repository harness
run: npm run harness:check
- name: Test with coverage
run: npm run test:coverage
+52
@@ -0,0 +1,52 @@
# Agent Map
This file is a short map for coding agents. Keep detailed guidance in `docs/`
and keep this file small enough to fit into every task context.
## First Reads
- `DEVELOPMENT.md` - project commands, coding rules, test rules, and PR shape.
- `ARCHITECTURE.md` - package boundaries, data ownership, and runtime flow.
- `docs/harness/README.md` - how this repository is prepared for agent work.
- `docs/harness/validation.md` - which checks to run for each change type.
- `docs/harness/worktree-runbook.md` - isolated local dev and test setup.
- `docs/harness/pr-review.md` - self-review checklist before pushing.
## Common Commands
```bash
npm ci --ignore-scripts
npm run harness:check
npm run test
npm run test:e2e
npm run build
```
Use the smallest relevant check while iterating. Before a broad PR, run
`npm run harness:check`, `npm run test:coverage`, `npm run test:e2e`, and
`npm run build`.
## Code Ownership Map
- `packages/client/src` - Vue 3 client, stores, routes, i18n, API helpers.
- `packages/server/src` - Koa API, Socket.IO, persistence, Hermes integration.
- `packages/desktop` - Electron wrapper, bundled Python/Hermes runtime, release artifacts.
- `tests/client`, `tests/server`, `tests/shared` - Vitest coverage.
- `tests/e2e` - Playwright browser coverage with mocked backend services.
- `.github/workflows` - CI, release, Docker, and desktop packaging automation.
## Hard Rules
- Keep routes thin: put request handling in controllers and reusable behavior in services.
- Keep Web UI state under the directory set by `HERMES_WEB_UI_HOME` or `HERMES_WEBUI_STATE_DIR`.
- Keep Hermes Agent state separate from Web UI state.
- Register local API routes before proxy catch-all routes.
- Use structured APIs and argument arrays instead of shell string construction.
- Add user-facing strings to every locale file.
- Do not mix unrelated refactors into a bug fix.
## When The Agent Gets Stuck
Improve the harness instead of repeating the same prompt. Add missing docs,
tests, logs, scripts, or CI checks so the next agent can see and verify the
constraint directly.
+89
@@ -0,0 +1,89 @@
# Architecture
Hermes Web UI is a TypeScript monorepo that ships a browser dashboard, a Koa
backend, and an Electron desktop distribution around Hermes Agent.
## Package Boundaries
| Area | Path | Responsibility |
| --- | --- | --- |
| Client | `packages/client/src` | Vue UI, routing, Pinia stores, API wrappers, i18n, browser-visible state. |
| Server | `packages/server/src` | HTTP API, auth, Socket.IO, SQLite stores, file access, Hermes runtime integration. |
| Desktop | `packages/desktop` | Electron shell, local Web UI server bootstrap, updater, bundled Python/Hermes runtime. |
| Tests | `tests` | Vitest unit/integration tests and Playwright browser tests. |
| CI | `.github/workflows` | Build, e2e, lockfile, Docker, and desktop release automation. |
## Request Flow
1. The browser loads the Vite-built client from the Koa server.
2. Client modules call API helpers from `packages/client/src/api`.
3. Server routes in `packages/server/src/routes` wire HTTP paths to controllers.
4. Controllers handle request-level concerns such as input validation and delegate reusable behavior to services.
5. Services own side effects: files, SQLite, Hermes profiles, subprocesses, bridges, and credentials.
6. Long-running chat and group-chat flows use Socket.IO namespaces managed by server services.
Keep each layer narrow. Routes should not grow business logic, and client code
should not duplicate server persistence rules.
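
The layering described above can be sketched without any framework. This is a minimal illustration, not the project's actual code: `createNoteService`, `createNoteController`, `createRoutes`, and the `POST /api/notes` path are all hypothetical names, and an in-memory `Map` stands in for SQLite.

```javascript
// Service: owns the side effect (an in-memory Map stands in for SQLite here).
function createNoteService(store = new Map()) {
  return {
    save(id, text) {
      store.set(id, text)
      return { id, text }
    },
  }
}

// Controller: validates the request, then delegates to the service.
function createNoteController(service) {
  return {
    create(request) {
      const { id, text } = request.body ?? {}
      if (typeof id !== 'string' || typeof text !== 'string' || text.length === 0) {
        return { status: 400, body: { error: 'id and text are required' } }
      }
      return { status: 201, body: service.save(id, text) }
    },
  }
}

// Route table: only maps a method and path to a controller function.
function createRoutes(controller) {
  return new Map([['POST /api/notes', (request) => controller.create(request)]])
}

const routes = createRoutes(createNoteController(createNoteService()))
const handler = routes.get('POST /api/notes')

console.log(handler({ body: { id: 'n1', text: 'hello' } }).status) // 201
console.log(handler({ body: {} }).status) // 400
```

Because the route table holds no logic of its own, the controller and the service can each be tested in isolation with plain objects.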
## State And Data Ownership
- Web UI state defaults to `~/.hermes-web-ui` through `config.appHome`.
- `HERMES_WEB_UI_HOME` and `HERMES_WEBUI_STATE_DIR` override Web UI state location.
- Hermes Agent state lives under Hermes profile directories and must stay distinct from Web UI state.
- Uploads default to `config.uploadDir`, which is derived from the Web UI home unless `UPLOAD_DIR` is set.
- Profile-scoped Hermes data should use existing profile helpers instead of manually joining paths.
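
The ownership rules above amount to a small resolution order. The sketch below is illustrative only: `resolveAppHome` and `resolveUploadDir` are hypothetical helpers, the precedence between the two environment variables is an assumption, and the `uploads` subdirectory name is invented for the example. The authoritative logic is whatever produces `config.appHome` and `config.uploadDir` in the server.

```javascript
import os from 'node:os'
import path from 'node:path'

// Assumption: HERMES_WEB_UI_HOME is consulted before HERMES_WEBUI_STATE_DIR.
// The real precedence lives in the server config and may differ.
function resolveAppHome(env, homeDir = os.homedir()) {
  return (
    env.HERMES_WEB_UI_HOME ||
    env.HERMES_WEBUI_STATE_DIR ||
    path.join(homeDir, '.hermes-web-ui')
  )
}

// Assumption: uploads default to an "uploads" child of the Web UI home.
function resolveUploadDir(env, appHome) {
  return env.UPLOAD_DIR || path.join(appHome, 'uploads')
}

// An isolated worktree overrides the location; an empty environment falls back
// to the default under the user's home directory.
const isolatedHome = resolveAppHome({ HERMES_WEB_UI_HOME: '/tmp/worktree/.tmp/hermes-web-ui' })
const defaultHome = resolveAppHome({}, '/home/dev')

console.log(isolatedHome)
console.log(defaultHome)
console.log(resolveUploadDir({}, defaultHome))
```

Passing the environment and home directory as arguments keeps the helpers free of global state, which makes the resolution order easy to test.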
## Server Structure
- `routes/` registers HTTP and WebSocket entry points.
- `controllers/` handles request-level behavior.
- `services/` owns reusable IO, domain behavior, external process calls, and integration logic.
- `db/` owns SQLite schemas and stores.
- `middleware/` owns request middleware such as user auth.
- `shared/` contains cross-server constants and helpers.
Architecture rules:
- Register local API routes before proxy catch-all routes.
- Keep auth behavior centralized in `packages/server/src/services/auth.ts`.
- Prefer `execFile` or `spawn` with argument arrays over shell command strings.
- Use structured file and YAML/JSON parsers when editing structured data.
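
To illustrate the `execFile` rule above, here is a minimal sketch using Node's built-in `node:child_process` module. `runTool` is a hypothetical helper, not a function from this repository, and the child process is simply the current Node binary so the example runs anywhere.

```javascript
import { execFile } from 'node:child_process'
import { promisify } from 'node:util'

const execFileAsync = promisify(execFile)

// Hypothetical helper: each argument is its own array element, so no shell
// ever parses the values.
async function runTool(command, args) {
  const { stdout } = await execFileAsync(command, args, { timeout: 10_000 })
  return stdout.trim()
}

// A value that would be dangerous if interpolated into a shell command string.
const untrusted = 'notes.txt; echo injected'

// The child prints its last argument, which arrives as one intact string.
const echoed = await runTool(process.execPath, [
  '-e',
  'console.log(process.argv.at(-1))',
  untrusted,
])

console.log(echoed === untrusted)
```

With a shell string such as `` `tool ${untrusted}` ``, the `;` would start a second command. With an argument array it is just part of one argument.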
## Client Structure
- `views/` contains route-level screens.
- `components/` contains reusable UI.
- `stores/` contains Pinia state.
- `api/` contains HTTP clients and should use `packages/client/src/api/client.ts`.
- `i18n/` contains locale messages for user-facing strings.
- `styles/` contains global styling and theme primitives.
Frontend rules:
- Use Vue 3 Composition API with `<script setup lang="ts">`.
- Use existing Naive UI patterns before adding new UI conventions.
- Add visible text to all locale files.
- Keep component styles scoped unless the style is intentionally global.
## Desktop Release Flow
Desktop packaging is intentionally split:
- Pull requests run a Linux desktop smoke test in `.github/workflows/build.yml`.
- Published releases and manual dispatches run `.github/workflows/desktop-release.yml`.
- Each release matrix target uploads only the artifact globs for its own platform.
Do not make a Windows job require macOS `.dmg` files or a Linux job require
Windows installers. Keep `fail_on_unmatched_files: true` where platform-specific
artifact lists make the expectation explicit.
## Validation Surface
The minimum mechanical harness is:
- `npm run harness:check` for repository docs, workflow, and package-script invariants.
- `npm run test` or focused Vitest tests for local logic.
- `npm run test:e2e` for browser-visible routing/auth/chat regressions.
- `npm run build` for type checking and production bundles.
See `docs/harness/validation.md` for change-specific commands.
+40
@@ -0,0 +1,40 @@
# Harness Overview
This harness turns recurring project knowledge into files and checks that an
agent can discover without chat history.
## Goals
- Make repository context legible through short maps and deeper docs.
- Keep architecture constraints close to the code they protect.
- Give agents a deterministic validation path before opening or updating a PR.
- Prefer mechanical checks over reminder text when a rule can be verified.
## Entry Points
- `AGENTS.md` is the root map for coding agents.
- `ARCHITECTURE.md` documents package boundaries and state ownership.
- `DEVELOPMENT.md` remains the reference for contributor rules and commands.
- `docs/harness/validation.md` maps change types to checks.
- `docs/harness/worktree-runbook.md` explains isolated worktree development.
- `docs/harness/pr-review.md` provides a PR self-review checklist.
- `scripts/harness-check.mjs` enforces baseline repository invariants.
## Operating Model
1. Read the root map and the specific doc for the task.
2. Make the smallest scoped change.
3. Add or update focused tests when behavior changes.
4. Run `npm run harness:check` and the relevant validation commands.
5. If a failure pattern repeats, improve this harness with docs, tests, scripts,
or CI instead of relying on a longer prompt.
## What Belongs In The Harness
- Facts that future agents must know to work safely.
- Checklists that prevent repeated PR review comments.
- Scripts that fail fast on repository-wide invariants.
- Runbooks for local, CI, release, and desktop packaging flows.
Do not put long implementation notes in `AGENTS.md`. Add them under `docs/` and
link to them from the map.
+41
@@ -0,0 +1,41 @@
# PR Self-Review
Use this checklist before pushing or updating a pull request.
## Scope
- The PR title states the behavior being changed.
- The diff is limited to the requested task and required harness updates.
- Unrelated formatting or refactors are not bundled into the change.
- User-facing text has locale coverage.
## Architecture
- Client code uses shared API helpers and existing UI patterns.
- Server routes stay thin and delegate reusable behavior to controllers/services.
- Web UI state uses `config.appHome` or documented helpers.
- Hermes Agent state and Web UI state remain separate.
- Subprocess calls use argument arrays instead of shell string construction.
## Tests And Validation
- A focused test was added or updated for behavior changes.
- Browser-visible flows have e2e coverage when the risk justifies it.
- `npm run harness:check` passes.
- The PR body lists validation commands that actually ran.
- Known limitations or follow-ups are called out.
## Release And CI
- Workflow changes were checked with `npm run harness:check`.
- Desktop release artifacts remain platform-specific.
- `fail_on_unmatched_files: true` is preserved when each matrix target has its
own expected artifact list.
- Package manifest changes have matching lockfile changes when dependencies
change.
## Before Merge
- CI is green or failures are explained as unrelated.
- The branch is mergeable.
- The PR does not depend on hidden local state, credentials, or uncommitted files.
+68
@@ -0,0 +1,68 @@
# Validation Guide
Run the smallest relevant checks while iterating. Escalate to the broad checks
when touching shared behavior, release automation, auth, persistence, or chat.
## Always Run For PRs
```bash
npm run harness:check
```
For broad or shared changes, also run:
```bash
npm run test:coverage
npm run test:e2e
npm run build
```
## Change-Type Matrix
| Change | Minimum local validation |
| --- | --- |
| Docs only | `npm run harness:check` |
| Client component/store/API | focused `npm run test -- <pattern>`, then `npm run build` |
| User-visible browser flow | focused Vitest plus `npm run test:e2e` |
| Server controller/service/db | focused `npm run test -- tests/server/<file>` |
| Auth, profile, or credential behavior | focused server tests plus relevant e2e auth tests |
| Chat, Socket.IO, group chat | focused server tests plus relevant e2e chat tests |
| Desktop packaging | `npm run harness:check`, `npm run build`, and a platform-specific desktop build when practical |
| GitHub workflow | `npm run harness:check` and `actionlint` when available |
| Package manifests | `npm ci --ignore-scripts`, then confirm `package-lock.json` is synchronized as the NPM lockfile workflow expects |
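
Part of the matrix above can be approximated mechanically. The sketch below is a hypothetical helper, not part of the repository: `suggestChecks` and its path rules are assumptions that cover only the rows that can be inferred from a file path. It returns broad commands, so narrowing `npm run test` to a focused pattern and deciding on e2e runs remain judgment calls.

```javascript
// Hypothetical path rules distilled from the matrix; the first match wins per file.
const rules = [
  { test: (p) => p.startsWith('.github/workflows/'), commands: ['npm run harness:check'] },
  { test: (p) => p.startsWith('packages/desktop/'), commands: ['npm run harness:check', 'npm run build'] },
  { test: (p) => p.startsWith('packages/client/src/'), commands: ['npm run test', 'npm run build'] },
  { test: (p) => p.startsWith('packages/server/src/'), commands: ['npm run test'] },
  { test: (p) => p.endsWith('.md'), commands: ['npm run harness:check'] },
]

// Returns a de-duplicated command list; the harness check is always included.
function suggestChecks(changedPaths) {
  const commands = new Set(['npm run harness:check'])
  for (const changed of changedPaths) {
    const rule = rules.find((candidate) => candidate.test(changed))
    for (const command of rule?.commands ?? []) commands.add(command)
  }
  return [...commands]
}

console.log(suggestChecks(['docs/harness/README.md']))
console.log(suggestChecks(['packages/client/src/stores/chat.ts', 'packages/server/src/services/auth.ts']))
```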
## CI Mapping
- Build workflow: installs dependencies, runs `npm run harness:check`, runs coverage,
builds production assets, then runs a Linux desktop smoke test on pull requests.
- Playwright workflow: runs browser e2e tests.
- NPM lockfile workflow: verifies `package-lock.json` is synchronized.
- Desktop release workflow: builds and uploads platform-specific desktop artifacts
for published releases and manual dispatches.
- Docker workflow: builds and publishes release images.
## Release Workflow Guardrail
Desktop release jobs must upload only the artifacts that their matrix target can
produce. Keep artifact globs in matrix data and keep
`fail_on_unmatched_files: true` so that a missing expected file still fails the job.
Expected desktop release outputs:
| Target | Required release globs |
| --- | --- |
| macOS | `*.dmg`, `*.dmg.blockmap`, `latest*.yml` |
| Windows | `*.exe`, `*.exe.blockmap`, `latest*.yml` |
| Linux x64 | `*.AppImage`, `*.deb`, `latest*.yml` |
| Linux arm64 | `*.AppImage`, `latest*.yml` |
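
The table above can be expressed as data and checked against a list of produced files. This is a rough sketch under stated assumptions: the target keys (`macos`, `linux-x64`, and so on), the file names, and the helpers `globToRegExp` and `unmatchedGlobs` are hypothetical, and the glob support is deliberately limited to `*`.

```javascript
// Required globs per target, copied from the table above.
const requiredGlobs = {
  macos: ['*.dmg', '*.dmg.blockmap', 'latest*.yml'],
  windows: ['*.exe', '*.exe.blockmap', 'latest*.yml'],
  'linux-x64': ['*.AppImage', '*.deb', 'latest*.yml'],
  'linux-arm64': ['*.AppImage', 'latest*.yml'],
}

// Minimal glob support: only "*" is a wildcard; everything else is literal.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*')
  return new RegExp(`^${escaped}$`)
}

// Returns the required globs that no produced file satisfies.
function unmatchedGlobs(target, producedFiles) {
  return requiredGlobs[target].filter((glob) => {
    const pattern = globToRegExp(glob)
    return !producedFiles.some((file) => pattern.test(file))
  })
}

// A macOS build that forgot the blockmap would be flagged.
console.log(unmatchedGlobs('macos', ['Hermes.dmg', 'latest-mac.yml'])) // [ '*.dmg.blockmap' ]
console.log(unmatchedGlobs('linux-arm64', ['Hermes-arm64.AppImage', 'latest-linux-arm64.yml'])) // []
```

An empty result corresponds to the case where `fail_on_unmatched_files: true` would let the upload step pass.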
## Failure Handling
When a command fails:
1. Read the first actionable error, not just the final stack trace.
2. Check whether the failure indicates missing context, missing test coverage,
or a missing mechanical rule.
3. Fix the product bug when there is one.
4. Update docs or `scripts/harness-check.mjs` when the same class of mistake
should be prevented next time.
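
As an example of step 4, a recurring "string missing from one locale" mistake could become a mechanical rule. The sketch below is hypothetical: `flattenKeys` and `findMissingLocaleKeys` do not exist in `scripts/harness-check.mjs`, and the inline locale objects stand in for the real locale files, whose layout is not shown here.

```javascript
// Flattens nested message objects into dotted keys such as "chat.send".
function flattenKeys(messages, prefix = '') {
  return Object.entries(messages).flatMap(([key, value]) => {
    const dotted = prefix ? `${prefix}.${key}` : key
    return value !== null && typeof value === 'object' ? flattenKeys(value, dotted) : [dotted]
  })
}

// Returns one message for every key that some locale defines and another lacks.
function findMissingLocaleKeys(locales) {
  const allKeys = new Set(Object.values(locales).flatMap((messages) => flattenKeys(messages)))
  const problems = []
  for (const [name, messages] of Object.entries(locales)) {
    const present = new Set(flattenKeys(messages))
    for (const key of allKeys) {
      if (!present.has(key)) problems.push(`${name} is missing ${key}`)
    }
  }
  return problems
}

const problems = findMissingLocaleKeys({
  en: { chat: { send: 'Send', stop: 'Stop' } },
  zh: { chat: { send: '发送' } },
})

console.log(problems) // [ 'zh is missing chat.stop' ]
```

In a harness script, each returned message could be passed to the failure collector so that it appears in the same summary as the other invariants.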
+64
@@ -0,0 +1,64 @@
# Worktree Runbook
Use a separate git worktree for agent changes so local user work remains
untouched.
## Create A Worktree
```bash
git fetch origin --prune
git worktree add -b codex/<short-topic> ../worktrees/hermes-web-ui-<short-topic> origin/main
cd ../worktrees/hermes-web-ui-<short-topic>
```
If the repository uses a fork remote, push to the remote requested by the task.
Do not rewrite or reset unrelated branches.
## Install
```bash
npm ci --ignore-scripts
npm rebuild node-pty
```
Desktop package dependencies are separate:
```bash
npm ci --prefix packages/desktop --no-audit --no-fund
```
## Isolated Runtime
Use per-worktree state and ports to avoid colliding with a running local app:
```bash
export PORT=18648
export HERMES_WEB_UI_HOME="$PWD/.tmp/hermes-web-ui"
export HERMES_WEBUI_STATE_DIR="$HERMES_WEB_UI_HOME"
export UPLOAD_DIR="$PWD/.tmp/uploads"
npm run dev
```
Do not point `HERMES_WEB_UI_HOME` at a user's real `~/.hermes-web-ui` when a task
only needs local verification.
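
When several worktrees run at once, a fixed port such as `18648` can still collide. One possible approach, sketched here with Node's built-in `node:net` module, is to ask the operating system for an unused port. `findFreePort` is a hypothetical helper, not something this repository provides.

```javascript
import net from 'node:net'

// Listening on port 0 asks the operating system to pick an unused TCP port.
function findFreePort(host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    const server = net.createServer()
    server.once('error', reject)
    server.listen(0, host, () => {
      const { port } = server.address()
      // Release the port before handing the number back to the caller.
      server.close((error) => (error ? reject(error) : resolve(port)))
    })
  })
}

const port = await findFreePort()
console.log(`export PORT=${port}`)
```

The port is released before the dev server starts, so another process could claim it in between. For local verification that small race is usually acceptable.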
## Browser Checks
For browser-visible changes:
```bash
npm run test:e2e
```
Prefer existing Playwright fixtures and mocked backend services. Add real-service
requirements only when the behavior cannot be represented with mocks.
## Cleanup
After a PR is pushed and no more local work is needed:
```bash
git worktree remove ../worktrees/hermes-web-ui-<short-topic>
```
Only remove the worktree you created.
+1
@@ -38,6 +38,7 @@
"dev:client": "cross-env HERMES_WEB_UI_BACKEND_PORT=8647 vite --host --port 8649 --strictPort",
"dev:server": "nodemon",
"build": "vue-tsc -b && vite build && tsc --noEmit -p packages/server/tsconfig.json && node scripts/build-server.mjs",
"harness:check": "node scripts/harness-check.mjs",
"prepare": "[ -d dist ] || npm run build",
"preview": "NODE_ENV=production vite preview",
"test": "vitest run",
+132
@@ -0,0 +1,132 @@
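#!/usr/bin/env node
import { readFile } from 'node:fs/promises'
import { existsSync, statSync } from 'node:fs'
import path from 'node:path'

// All paths resolve against the working directory, so run this from the repository root.
const root = process.cwd()
const failures = []

// Failures are collected rather than thrown so one run reports every problem.
function fail(message) {
  failures.push(message)
}

async function readText(relativePath) {
  try {
    return await readFile(path.join(root, relativePath), 'utf8')
  } catch (error) {
    // Record unreadable files as failures so the summary still prints.
    fail(`Cannot read ${relativePath}: ${error.message}`)
    return ''
  }
}

function requireFile(relativePath) {
  if (!existsSync(path.join(root, relativePath))) {
    fail(`Missing required harness file: ${relativePath}`)
  }
}

function requireDir(relativePath) {
  const absolutePath = path.join(root, relativePath)
  // A regular file with the same name must not satisfy a directory requirement.
  if (!existsSync(absolutePath) || !statSync(absolutePath).isDirectory()) {
    fail(`Missing required project directory: ${relativePath}`)
  }
}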

for (const file of [
  'AGENTS.md',
  'ARCHITECTURE.md',
  'DEVELOPMENT.md',
  'docs/harness/README.md',
  'docs/harness/validation.md',
  'docs/harness/worktree-runbook.md',
  'docs/harness/pr-review.md',
]) {
  requireFile(file)
}

for (const dir of [
  'packages/client/src',
  'packages/server/src',
  'packages/desktop',
  'tests/client',
  'tests/server',
  'tests/e2e',
  '.github/workflows',
]) {
  requireDir(dir)
}

const agents = await readText('AGENTS.md')
const agentLines = agents.trimEnd().split(/\r?\n/)
if (agentLines.length > 120) {
  fail(`AGENTS.md should stay short; found ${agentLines.length} lines, expected <= 120`)
}

// A plain substring match: the path only has to be mentioned, not formatted as a link.
for (const requiredLink of [
  'DEVELOPMENT.md',
  'ARCHITECTURE.md',
  'docs/harness/README.md',
  'docs/harness/validation.md',
  'docs/harness/worktree-runbook.md',
  'docs/harness/pr-review.md',
]) {
  if (!agents.includes(requiredLink)) {
    fail(`AGENTS.md must link to ${requiredLink}`)
  }
}

// Treat empty content as an empty manifest so missing scripts are reported
// instead of a JSON syntax error.
const packageJson = JSON.parse((await readText('package.json')) || '{}')
for (const scriptName of [
  'harness:check',
  'test',
  'test:coverage',
  'test:e2e',
  'build',
]) {
  if (!packageJson.scripts?.[scriptName]) {
    fail(`package.json is missing script: ${scriptName}`)
  }
}

const architecture = await readText('ARCHITECTURE.md')
for (const phrase of [
  'packages/client/src',
  'packages/server/src',
  'packages/desktop',
  'HERMES_WEB_UI_HOME',
  'fail_on_unmatched_files: true',
]) {
  if (!architecture.includes(phrase)) {
    fail(`ARCHITECTURE.md should document: ${phrase}`)
  }
}

const buildWorkflow = await readText('.github/workflows/build.yml')
if (!buildWorkflow.includes('npm run harness:check')) {
  fail('Build workflow must run npm run harness:check')
}

const desktopReleaseWorkflow = await readText('.github/workflows/desktop-release.yml')
// Single quotes keep `${{ ... }}` literal: it is GitHub Actions expression syntax,
// not a JavaScript template placeholder.
if (!desktopReleaseWorkflow.includes('files: ${{ matrix.artifact_files }}')) {
  fail('desktop-release.yml must upload matrix-specific artifact_files')
}

for (const target of ['target_os: darwin', 'target_os: win32', 'target_os: linux']) {
  if (!desktopReleaseWorkflow.includes(target)) {
    fail(`desktop-release.yml is missing matrix target ${target}`)
  }
}

for (const expectedGlob of ['*.dmg', '*.exe', '*.AppImage', 'latest*.yml']) {
  if (!desktopReleaseWorkflow.includes(expectedGlob)) {
    fail(`desktop-release.yml is missing expected artifact glob ${expectedGlob}`)
  }
}

if (!desktopReleaseWorkflow.includes('fail_on_unmatched_files: true')) {
  fail('desktop-release.yml must keep fail_on_unmatched_files: true')
}

// Print every collected failure, then exit non-zero so CI marks the step as failed.
if (failures.length > 0) {
  console.error('Harness check failed:')
  for (const failure of failures) {
    console.error(`- ${failure}`)
  }
  process.exit(1)
}

console.log('Harness check passed')