Add repository harness for coding agents (#1157)
Co-authored-by: xingzhi <chuzihao.czh@alibaba-inc.com>
@@ -36,6 +36,9 @@ jobs:
          npm ci --ignore-scripts
          npm rebuild node-pty

      - name: Check repository harness
        run: npm run harness:check

      - name: Test with coverage
        run: npm run test:coverage
@@ -0,0 +1,52 @@
# Agent Map

This file is a short map for coding agents. Keep detailed guidance in `docs/`
and keep this file small enough to fit into every task context.

## First Reads

- `DEVELOPMENT.md` - project commands, coding rules, test rules, and PR shape.
- `ARCHITECTURE.md` - package boundaries, data ownership, and runtime flow.
- `docs/harness/README.md` - how this repository is prepared for agent work.
- `docs/harness/validation.md` - which checks to run for each change type.
- `docs/harness/worktree-runbook.md` - isolated local dev and test setup.
- `docs/harness/pr-review.md` - self-review checklist before pushing.

## Common Commands

```bash
npm ci --ignore-scripts
npm run harness:check
npm run test
npm run test:e2e
npm run build
```

Use the smallest relevant check while iterating. Before a broad PR, run
`npm run harness:check`, `npm run test:coverage`, `npm run test:e2e`, and
`npm run build`.

## Code Ownership Map

- `packages/client/src` - Vue 3 client, stores, routes, i18n, API helpers.
- `packages/server/src` - Koa API, Socket.IO, persistence, Hermes integration.
- `packages/desktop` - Electron wrapper, bundled Python/Hermes runtime, release artifacts.
- `tests/client`, `tests/server`, `tests/shared` - Vitest coverage.
- `tests/e2e` - Playwright browser coverage with mocked backend services.
- `.github/workflows` - CI, release, Docker, and desktop packaging automation.

## Hard Rules

- Keep routes thin: put request handling in controllers and reusable behavior in services.
- Keep Web UI state under `HERMES_WEB_UI_HOME` or `HERMES_WEBUI_STATE_DIR`.
- Keep Hermes Agent state separate from Web UI state.
- Register local API routes before proxy catch-all routes.
- Use structured APIs and argument arrays instead of shell string construction.
- Add user-facing strings to every locale file.
- Do not mix unrelated refactors into a bug fix.

## When The Agent Gets Stuck

Improve the harness instead of repeating the same prompt. Add missing docs,
tests, logs, scripts, or CI checks so the next agent can see and verify the
constraint directly.
@@ -0,0 +1,89 @@
# Architecture

Hermes Web UI is a TypeScript monorepo that ships a browser dashboard, a Koa
backend, and an Electron desktop distribution around Hermes Agent.

## Package Boundaries

| Area | Path | Responsibility |
| --- | --- | --- |
| Client | `packages/client/src` | Vue UI, routing, Pinia stores, API wrappers, i18n, browser-visible state. |
| Server | `packages/server/src` | HTTP API, auth, Socket.IO, SQLite stores, file access, Hermes runtime integration. |
| Desktop | `packages/desktop` | Electron shell, local Web UI server bootstrap, updater, bundled Python/Hermes runtime. |
| Tests | `tests` | Vitest unit/integration tests and Playwright browser tests. |
| CI | `.github/workflows` | Build, e2e, lockfile, Docker, and desktop release automation. |

## Request Flow

1. The browser loads the Vite-built client from the Koa server.
2. Client modules call API helpers from `packages/client/src/api`.
3. Server routes in `packages/server/src/routes` wire HTTP paths to controllers.
4. Controllers validate request concerns and delegate reusable behavior to services.
5. Services own side effects: files, SQLite, Hermes profiles, subprocesses, bridges, and credentials.
6. Long-running chat and group-chat flows use Socket.IO namespaces managed by server services.

Keep each layer narrow. Routes should not grow business logic, and client code
should not duplicate server persistence rules.

## State And Data Ownership

- Web UI state defaults to `~/.hermes-web-ui` through `config.appHome`.
- `HERMES_WEB_UI_HOME` and `HERMES_WEBUI_STATE_DIR` override Web UI state location.
- Hermes Agent state lives under Hermes profile directories and must stay distinct from Web UI state.
- Uploads default to `config.uploadDir`, which is derived from the Web UI home unless `UPLOAD_DIR` is set.
- Profile-scoped Hermes data should use existing profile helpers instead of manually joining paths.
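
The lookup order in the bullets above can be sketched as two small functions. This is only an illustration, not the project's `config` module: `resolveAppHome` and `resolveUploadDir` are hypothetical names, the precedence of `HERMES_WEB_UI_HOME` over `HERMES_WEBUI_STATE_DIR` is an assumption (the text does not say which wins), and the `uploads` folder name is made up.

```javascript
// Illustrative only: the real logic sits behind `config.appHome` and
// `config.uploadDir`. Precedence between the two overrides is an assumption.
function resolveAppHome(env, homeDir) {
  const override = env.HERMES_WEB_UI_HOME || env.HERMES_WEBUI_STATE_DIR
  return override || `${homeDir}/.hermes-web-ui`
}

// Assumption: uploads live in an `uploads` folder under the Web UI home
// unless UPLOAD_DIR is set. The folder name is hypothetical.
function resolveUploadDir(env, homeDir) {
  return env.UPLOAD_DIR || `${resolveAppHome(env, homeDir)}/uploads`
}

console.log(resolveAppHome({}, '/home/alice'))
console.log(resolveUploadDir({ HERMES_WEB_UI_HOME: '/tmp/wt/.tmp/hermes-web-ui' }, '/home/alice'))
```

Passing `env` and `homeDir` as parameters, rather than reading `process.env` directly, keeps the sketch testable without touching a real state directory.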

## Server Structure

- `routes/` registers HTTP and WebSocket entry points.
- `controllers/` handles request-level behavior.
- `services/` owns reusable IO, domain behavior, external process calls, and integration logic.
- `db/` owns SQLite schemas and stores.
- `middleware/` owns request middleware such as user auth.
- `shared/` contains cross-server constants and helpers.

Architecture rules:

- Register local API routes before proxy catch-all routes.
- Keep auth behavior centralized in `packages/server/src/services/auth.ts`.
- Prefer `execFile` or `spawn` with argument arrays over shell command strings.
- Use structured file and YAML/JSON parsers when editing structured data.
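
Why registration order matters for the first rule can be shown without Koa. The sketch below is a hypothetical first-match dispatcher, not the project's router and not Koa's API; `createRouter` and the paths `/api/local/status` and `/api/*` are made-up examples.

```javascript
// Hypothetical first-match dispatcher: the first registered route whose
// pattern matches the path handles the request.
function createRouter() {
  const routes = []
  return {
    register(pattern, handlerName) {
      routes.push({ pattern, handlerName })
    },
    dispatch(path) {
      const hit = routes.find(({ pattern }) =>
        pattern.endsWith('*') ? path.startsWith(pattern.slice(0, -1)) : path === pattern
      )
      return hit ? hit.handlerName : 'not-found'
    },
  }
}

// Correct order: the local route is registered before the catch-all proxy.
const good = createRouter()
good.register('/api/local/status', 'local-status')
good.register('/api/*', 'proxy')

// Wrong order: the catch-all shadows the local route.
const bad = createRouter()
bad.register('/api/*', 'proxy')
bad.register('/api/local/status', 'local-status')

console.log(good.dispatch('/api/local/status')) // local-status
console.log(bad.dispatch('/api/local/status')) // proxy
```

In the wrong order the local handler is never reached, which is the failure mode the rule is meant to prevent.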

## Client Structure

- `views/` contains route-level screens.
- `components/` contains reusable UI.
- `stores/` contains Pinia state.
- `api/` contains HTTP clients and should use `packages/client/src/api/client.ts`.
- `i18n/` contains locale messages for user-facing strings.
- `styles/` contains global styling and theme primitives.

Frontend rules:

- Use Vue 3 Composition API with `<script setup lang="ts">`.
- Use existing Naive UI patterns before adding new UI conventions.
- Add visible text to all locale files.
- Keep component styles scoped unless the style is intentionally global.

## Desktop Release Flow

Desktop packaging is intentionally split:

- Pull requests run a Linux desktop smoke test in `.github/workflows/build.yml`.
- Published releases and manual dispatches run `.github/workflows/desktop-release.yml`.
- Each release matrix target uploads only the artifact globs for its own platform.

Do not make a Windows job require macOS `.dmg` files or a Linux job require
Windows installers. Keep `fail_on_unmatched_files: true` where platform-specific
artifact lists make the expectation explicit.

## Validation Surface

The minimum mechanical harness is:

- `npm run harness:check` for repository docs, workflow, and package-script invariants.
- `npm run test` or focused Vitest tests for local logic.
- `npm run test:e2e` for browser-visible routing/auth/chat regressions.
- `npm run build` for type checking and production bundles.

See `docs/harness/validation.md` for change-specific commands.
@@ -0,0 +1,40 @@
# Harness Overview

This harness turns recurring project knowledge into files and checks that an
agent can discover without chat history.

## Goals

- Make repository context legible through short maps and deeper docs.
- Keep architecture constraints close to the code they protect.
- Give agents a deterministic validation path before opening or updating a PR.
- Prefer mechanical checks over reminder text when a rule can be verified.

## Entry Points

- `AGENTS.md` is the root map for coding agents.
- `ARCHITECTURE.md` documents package boundaries and state ownership.
- `DEVELOPMENT.md` remains the contributor rules and command reference.
- `docs/harness/validation.md` maps change types to checks.
- `docs/harness/worktree-runbook.md` explains isolated worktree development.
- `docs/harness/pr-review.md` provides a PR self-review checklist.
- `scripts/harness-check.mjs` enforces baseline repository invariants.

## Operating Model

1. Read the root map and the specific doc for the task.
2. Make the smallest scoped change.
3. Add or update focused tests when behavior changes.
4. Run `npm run harness:check` and the relevant validation commands.
5. If a failure pattern repeats, improve this harness with docs, tests, scripts,
   or CI instead of relying on a longer prompt.

## What Belongs In The Harness

- Facts that future agents must know to work safely.
- Checklists that prevent repeated PR review comments.
- Scripts that fail fast on repository-wide invariants.
- Runbooks for local, CI, release, and desktop packaging flows.

Do not put long implementation notes in `AGENTS.md`. Add them under `docs/` and
link to them from the map.
@@ -0,0 +1,41 @@
# PR Self-Review

Use this checklist before pushing or updating a pull request.

## Scope

- The PR title states the behavior being changed.
- The diff is limited to the requested task and required harness updates.
- Unrelated formatting or refactors are not bundled into the change.
- User-facing text has locale coverage.

## Architecture

- Client code uses shared API helpers and existing UI patterns.
- Server routes stay thin and delegate reusable behavior to controllers/services.
- Web UI state uses `config.appHome` or documented helpers.
- Hermes Agent state and Web UI state remain separate.
- Subprocess calls use argument arrays instead of shell string construction.

## Tests And Validation

- A focused test was added or updated for behavior changes.
- Browser-visible flows have e2e coverage when the risk justifies it.
- `npm run harness:check` passes.
- The PR body lists validation commands that actually ran.
- Known limitations or follow-ups are called out.

## Release And CI

- Workflow changes were checked with `npm run harness:check`.
- Desktop release artifacts remain platform-specific.
- `fail_on_unmatched_files: true` is preserved when each matrix target has its
  own expected artifact list.
- Package manifest changes have matching lockfile changes when dependencies
  change.

## Before Merge

- CI is green or failures are explained as unrelated.
- The branch is mergeable.
- The PR does not depend on hidden local state, credentials, or uncommitted files.
@@ -0,0 +1,68 @@
# Validation Guide

Run the smallest relevant checks while iterating. Escalate to the broad checks
when touching shared behavior, release automation, auth, persistence, or chat.

## Always Run For PRs

```bash
npm run harness:check
```

For broad or shared changes, also run:

```bash
npm run test:coverage
npm run test:e2e
npm run build
```

## Change-Type Matrix

| Change | Minimum local validation |
| --- | --- |
| Docs only | `npm run harness:check` |
| Client component/store/API | focused `npm run test -- <pattern>`, then `npm run build` |
| User-visible browser flow | focused Vitest plus `npm run test:e2e` |
| Server controller/service/db | focused `npm run test -- tests/server/<file>` |
| Auth, profile, or credential behavior | focused server tests plus relevant e2e auth tests |
| Chat, Socket.IO, group chat | focused server tests plus relevant e2e chat tests |
| Desktop packaging | `npm run harness:check`, `npm run build`, and a platform-specific desktop build when practical |
| GitHub workflow | `npm run harness:check` and `actionlint` when available |
| Package manifests | `npm ci --ignore-scripts` and lockfile workflow expectations |
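
For an agent that picks checks programmatically, part of the matrix can be treated as data. The following is a rough sketch rather than anything in the repository: the keys (`docs`, `client`, `server`, `workflow`) and the `checksFor` helper are hypothetical, and the commands are copied from the table with placeholders such as `<pattern>` left as-is.

```javascript
// A subset of the matrix as data. Keys are hypothetical labels.
const minimumValidation = new Map([
  ['docs', ['npm run harness:check']],
  ['client', ['npm run test -- <pattern>', 'npm run build']],
  ['server', ['npm run test -- tests/server/<file>']],
  // The table says to run actionlint only when it is available.
  ['workflow', ['npm run harness:check', 'actionlint']],
])

// Every PR runs the harness check, so it always comes first.
// A Set removes duplicates while keeping insertion order.
function checksFor(changeTypes) {
  const commands = new Set(['npm run harness:check'])
  for (const type of changeTypes) {
    const known = minimumValidation.get(type)
    if (!known) throw new Error(`Unknown change type: ${type}`)
    for (const command of known) commands.add(command)
  }
  return [...commands]
}

console.log(checksFor(['client', 'docs']))
```

Throwing on an unknown change type is a deliberate choice in this sketch: a silent fallback would hide gaps in the matrix.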

## CI Mapping

- Build workflow: installs dependencies, runs coverage, builds production assets,
  then runs a Linux desktop smoke test on pull requests.
- Playwright workflow: runs browser e2e tests.
- NPM lockfile workflow: verifies `package-lock.json` is synchronized.
- Desktop release workflow: builds and uploads platform-specific desktop artifacts
  for release tags.
- Docker workflow: builds and publishes release images.

## Release Workflow Guardrail

Desktop release jobs must upload only the artifacts that their matrix target can
produce. Keep artifact globs in matrix data and keep
`fail_on_unmatched_files: true` so missing expected files still fail.

Expected desktop release outputs:

| Target | Required release globs |
| --- | --- |
| macOS | `*.dmg`, `*.dmg.blockmap`, `latest*.yml` |
| Windows | `*.exe`, `*.exe.blockmap`, `latest*.yml` |
| Linux x64 | `*.AppImage`, `*.deb`, `latest*.yml` |
| Linux arm64 | `*.AppImage`, `latest*.yml` |
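
The table above can be expressed as data and checked mechanically. This is a sketch under assumptions: the target keys and the `foreignGlobs` helper are hypothetical, and the real workflow keeps its globs in the matrix of `desktop-release.yml`.

```javascript
// Globs copied from the table above. The keys are hypothetical labels.
const releaseGlobs = {
  macos: ['*.dmg', '*.dmg.blockmap', 'latest*.yml'],
  windows: ['*.exe', '*.exe.blockmap', 'latest*.yml'],
  'linux-x64': ['*.AppImage', '*.deb', 'latest*.yml'],
  'linux-arm64': ['*.AppImage', 'latest*.yml'],
}

// Returns the globs in `actual` that the target is not expected to produce.
function foreignGlobs(target, actual) {
  const expected = new Set(releaseGlobs[target] ?? [])
  return actual.filter((glob) => !expected.has(glob))
}

console.log(foreignGlobs('windows', ['*.exe', '*.dmg', 'latest*.yml'])) // [ '*.dmg' ]
```

A non-empty result means a job lists an artifact that only another platform can build, which is the situation the guardrail is meant to rule out.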

## Failure Handling

When a command fails:

1. Read the first actionable error, not just the final stack trace.
2. Check whether the failure indicates missing context, missing test coverage,
   or a missing mechanical rule.
3. Fix the product bug when there is one.
4. Update docs or `scripts/harness-check.mjs` when the same class of mistake
   should be prevented next time.
@@ -0,0 +1,64 @@
# Worktree Runbook

Use a separate git worktree for agent changes so local user work remains
untouched.

## Create A Worktree

```bash
git fetch origin --prune
git worktree add -b codex/<short-topic> ../worktrees/hermes-web-ui-<short-topic> origin/main
cd ../worktrees/hermes-web-ui-<short-topic>
```

If the repository uses a fork remote, push to the remote requested by the task.
Do not rewrite or reset unrelated branches.
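
If worktree creation is scripted from Node rather than typed by hand, the repository's argument-array rule applies to it as well. A minimal sketch, assuming a hypothetical `worktreeAddArgs` helper; the topic-name check is an assumed policy, not something this runbook requires.

```javascript
// Builds the argument array for the `git worktree add` command shown above.
// The topic check is an assumed policy, not a rule stated by the runbook.
function worktreeAddArgs(topic) {
  if (!/^[a-z0-9][a-z0-9-]*$/.test(topic)) {
    throw new Error(`Unsafe topic name: ${topic}`)
  }
  return [
    'worktree', 'add',
    '-b', `codex/${topic}`,
    `../worktrees/hermes-web-ui-${topic}`,
    'origin/main',
  ]
}

// Each element reaches git as exactly one argument, so no shell quoting is
// involved. With Node's child_process the array would be passed as:
//   execFile('git', worktreeAddArgs('fix-login'))
console.log(worktreeAddArgs('fix-login').join(' '))
```

Because the array is never joined into a shell string for execution, a topic containing spaces or `;` cannot turn into extra commands; the validation only keeps branch and directory names tidy.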

## Install

```bash
npm ci --ignore-scripts
npm rebuild node-pty
```

Desktop package dependencies are separate:

```bash
npm ci --prefix packages/desktop --no-audit --no-fund
```

## Isolated Runtime

Use per-worktree state and ports to avoid colliding with a running local app:

```bash
export PORT=18648
export HERMES_WEB_UI_HOME="$PWD/.tmp/hermes-web-ui"
export HERMES_WEBUI_STATE_DIR="$HERMES_WEB_UI_HOME"
export UPLOAD_DIR="$PWD/.tmp/uploads"
npm run dev
```

Do not point `HERMES_WEB_UI_HOME` at a user's real `~/.hermes-web-ui` when a task
only needs local verification.

## Browser Checks

For browser-visible changes:

```bash
npm run test:e2e
```

Prefer existing Playwright fixtures and mocked backend services. Add real-service
requirements only when the behavior cannot be represented with mocks.

## Cleanup

After a PR is pushed and no more local work is needed:

```bash
git worktree remove ../worktrees/hermes-web-ui-<short-topic>
```

Only remove the worktree you created.
@@ -38,6 +38,7 @@
    "dev:client": "cross-env HERMES_WEB_UI_BACKEND_PORT=8647 vite --host --port 8649 --strictPort",
    "dev:server": "nodemon",
    "build": "vue-tsc -b && vite build && tsc --noEmit -p packages/server/tsconfig.json && node scripts/build-server.mjs",
    "harness:check": "node scripts/harness-check.mjs",
    "prepare": "[ -d dist ] || npm run build",
    "preview": "NODE_ENV=production vite preview",
    "test": "vitest run",
@@ -0,0 +1,132 @@
#!/usr/bin/env node
import { readFile } from 'node:fs/promises'
import { existsSync } from 'node:fs'
import path from 'node:path'

const root = process.cwd()
const failures = []

function fail(message) {
  failures.push(message)
}

async function readText(relativePath) {
  return readFile(path.join(root, relativePath), 'utf8')
}

function requireFile(relativePath) {
  if (!existsSync(path.join(root, relativePath))) {
    fail(`Missing required harness file: ${relativePath}`)
  }
}

function requireDir(relativePath) {
  if (!existsSync(path.join(root, relativePath))) {
    fail(`Missing required project directory: ${relativePath}`)
  }
}

for (const file of [
  'AGENTS.md',
  'ARCHITECTURE.md',
  'DEVELOPMENT.md',
  'docs/harness/README.md',
  'docs/harness/validation.md',
  'docs/harness/worktree-runbook.md',
  'docs/harness/pr-review.md',
]) {
  requireFile(file)
}

for (const dir of [
  'packages/client/src',
  'packages/server/src',
  'packages/desktop',
  'tests/client',
  'tests/server',
  'tests/e2e',
  '.github/workflows',
]) {
  requireDir(dir)
}

const agents = await readText('AGENTS.md')
const agentLines = agents.trimEnd().split(/\r?\n/)
if (agentLines.length > 120) {
  fail(`AGENTS.md should stay short; found ${agentLines.length} lines, expected <= 120`)
}

for (const requiredLink of [
  'DEVELOPMENT.md',
  'ARCHITECTURE.md',
  'docs/harness/README.md',
  'docs/harness/validation.md',
  'docs/harness/worktree-runbook.md',
  'docs/harness/pr-review.md',
]) {
  if (!agents.includes(requiredLink)) {
    fail(`AGENTS.md must link to ${requiredLink}`)
  }
}

const packageJson = JSON.parse(await readText('package.json'))
for (const scriptName of [
  'harness:check',
  'test',
  'test:coverage',
  'test:e2e',
  'build',
]) {
  if (!packageJson.scripts?.[scriptName]) {
    fail(`package.json is missing script: ${scriptName}`)
  }
}

const architecture = await readText('ARCHITECTURE.md')
for (const phrase of [
  'packages/client/src',
  'packages/server/src',
  'packages/desktop',
  'HERMES_WEB_UI_HOME',
  'fail_on_unmatched_files: true',
]) {
  if (!architecture.includes(phrase)) {
    fail(`ARCHITECTURE.md should document: ${phrase}`)
  }
}

const buildWorkflow = await readText('.github/workflows/build.yml')
if (!buildWorkflow.includes('npm run harness:check')) {
  fail('Build workflow must run npm run harness:check')
}

const desktopReleaseWorkflow = await readText('.github/workflows/desktop-release.yml')
if (!desktopReleaseWorkflow.includes('files: ${{ matrix.artifact_files }}')) {
  fail('desktop-release.yml must upload matrix-specific artifact_files')
}

for (const target of ['target_os: darwin', 'target_os: win32', 'target_os: linux']) {
  if (!desktopReleaseWorkflow.includes(target)) {
    fail(`desktop-release.yml is missing matrix target ${target}`)
  }
}

for (const expectedGlob of ['*.dmg', '*.exe', '*.AppImage', 'latest*.yml']) {
  if (!desktopReleaseWorkflow.includes(expectedGlob)) {
    fail(`desktop-release.yml is missing expected artifact glob ${expectedGlob}`)
  }
}

if (!desktopReleaseWorkflow.includes('fail_on_unmatched_files: true')) {
  fail('desktop-release.yml must keep fail_on_unmatched_files: true')
}

if (failures.length > 0) {
  console.error('Harness check failed:')
  for (const failure of failures) {
    console.error(`- ${failure}`)
  }
  process.exit(1)
}

console.log('Harness check passed')
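
One possible way to make checks like the `AGENTS.md` ones above unit-testable is to express them as pure functions that return failure messages instead of calling `fail` directly. The sketch below assumes that refactor; `checkAgentMap` is a hypothetical name and is not part of the committed script.

```javascript
// Pure variant of the AGENTS.md checks: takes the file text, returns failures.
function checkAgentMap(text, { maxLines = 120, requiredLinks = [] } = {}) {
  const failures = []
  const lineCount = text.trimEnd().split(/\r?\n/).length
  if (lineCount > maxLines) {
    failures.push(`AGENTS.md should stay short; found ${lineCount} lines, expected <= ${maxLines}`)
  }
  for (const link of requiredLinks) {
    if (!text.includes(link)) failures.push(`AGENTS.md must link to ${link}`)
  }
  return failures
}

// Sample input: three lines, linking to only one of the two required files.
const sample = '# Agent Map\n\n- `DEVELOPMENT.md`\n'
console.log(checkAgentMap(sample, { requiredLinks: ['DEVELOPMENT.md', 'ARCHITECTURE.md'] }))
```

With this shape, a Vitest case could feed in short strings and assert on the returned array without touching the file system.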