Initial commit: LinkSyncServer and LinkSyncExtension projects with complete documentation, models, API endpoints, tests, and extension implementation

DavidSaylor
2026-05-11 17:37:10 -05:00
parent ad0b12b452
commit aed69afdfd
691 changed files with 181874 additions and 28 deletions

@@ -0,0 +1,429 @@
# Agent Evaluation Framework
This document defines how to evaluate agent performance and decide when to re-think an approach across your MyWorkspace projects.
## Evaluation Criteria
### Primary Metrics
| Metric | Threshold | Action |
|--------|-----------|--------|
| **Progress Rate** | < 10% per 30 min | Re-evaluate approach |
| **Same Error Pattern** | > 3 failures | Investigate root cause |
| **Test Harness** | Time per iteration (tracked, no fixed threshold) | Track convergence speed |
| **File Changes** | No meaningful changes | Agent stuck or unclear task |
| **Time Elapsed** | > 2x estimate | Re-think strongly advised |
| **Time Elapsed** | > 3x estimate | Re-think required |
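A minimal sketch of how the thresholds in the table could be checked in code. The `SessionMetrics` type, its field names, and `primary_metric_actions` are hypothetical names introduced here for illustration; only the thresholds and action labels come from the table.

```python
from dataclasses import dataclass


@dataclass
class SessionMetrics:
    # All field names are illustrative assumptions, not part of any tool's API.
    progress_pct_last_30_min: float  # progress gained over the most recent 30 minutes
    same_error_failures: int         # failures showing the same error pattern
    meaningful_file_changes: bool    # whether the agent produced meaningful changes
    elapsed_min: float               # time spent so far
    estimated_min: float             # original time estimate


def primary_metric_actions(m: SessionMetrics) -> list[str]:
    """Return the actions from the Primary Metrics table that currently apply."""
    if m.estimated_min <= 0:
        raise ValueError("estimated_min must be positive")
    actions = []
    if m.progress_pct_last_30_min < 10:
        actions.append("Re-evaluate approach")
    if m.same_error_failures > 3:
        actions.append("Investigate root cause")
    if not m.meaningful_file_changes:
        actions.append("Agent stuck or unclear task")
    ratio = m.elapsed_min / m.estimated_min
    if ratio > 3:
        actions.append("Re-think required")
    elif ratio > 2:
        actions.append("Re-think strongly advised")
    return actions
```

The Test Harness row is left out because the table gives it no numeric threshold.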
### Secondary Metrics
- **Context Usage**: Monitor token usage in chatlog
- **Git Commits**: Track meaningful changes
- **Test Pass Rate**: Monitor improvement over iterations
- **API Call Success**: For browser automation tasks
## Re-think Decision Tree
```
Task running with agent
  │
  ▼
Time > 50% of estimate?
  ├─ NO  → Keep running
  └─ YES → Check progress: still on track?
             ├─ YES → Continue to next checkpoint
             └─ NO  → Review blockers
  │
  ▼
Time > 90% of estimate?
  ├─ NO  → Keep running
  └─ YES → Near completion: keep going → Complete
  │
  ▼
Time > 2x estimate?
  ├─ NO  → Keep running
  └─ YES → Re-evaluate
             - Check task
             - Review AGENTS.md
             - Adjust approach
  │
  ▼
Time > 3x estimate?
  ├─ NO  → Keep running
  └─ YES → Strong re-think
             - Clear task
             - New brief
             - New tool
```
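The decision tree can be read as a sequence of time-ratio checkpoints. The sketch below is one possible encoding, assuming the checks are evaluated from the largest ratio down; `time_checkpoint` and its `on_track` parameter are hypothetical names, not part of any agent tool.

```python
def time_checkpoint(elapsed_min: float, estimated_min: float, on_track: bool = True) -> str:
    """Map elapsed time against the estimate to the matching step of the tree."""
    if estimated_min <= 0:
        raise ValueError("estimated_min must be positive")
    ratio = elapsed_min / estimated_min
    if ratio > 3:
        return "Strong re-think: clear task, new brief, new tool"
    if ratio > 2:
        return "Re-evaluate: check task, review AGENTS.md, adjust approach"
    if ratio > 0.9:
        return "Near completion: keep going"
    if ratio > 0.5:
        # Past the halfway checkpoint the outcome depends on a progress review.
        return "Continue to next checkpoint" if on_track else "Review blockers"
    return "Keep running"
```

For a 60-minute estimate, `time_checkpoint(35, 60, on_track=False)` returns `"Review blockers"`, while `time_checkpoint(130, 60)` returns the re-evaluate step.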
## Agent-Specific Evaluation
### OpenCode Evaluation
**Expected Behavior:**
- Reads AGENTS.md for context
- Writes files directly to project
- Runs tests repeatedly
- Reports blockers clearly
**Good Signs:**
- Multiple git commits per session
- Test failure patterns changing
- Iteration time decreasing
- Clear progress indicators
**Bad Signs:**
- Repeating same error
- Only small/pointless changes
- Session time increasing
- Agent "thinking" with no output
**Actions:**
- **Minor stall**: Wait 5-10 min
- **Repeated errors**: Update AGENTS.md, clarify task
- **No progress**: Pause, re-evaluate task brief
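One way to detect the "repeating same error" sign is to count how many of the most recent failures share the same message. This is a sketch under the assumption that error messages are collected as plain strings, one per failed iteration; the function names are illustrative, and the threshold of 3 mirrors the "> 3 failures" row of the Primary Metrics table.

```python
def repeated_error_streak(errors: list[str]) -> int:
    """Length of the run of identical error messages at the end of the list."""
    if not errors:
        return 0
    last = errors[-1]
    streak = 0
    for message in reversed(errors):
        if message != last:
            break
        streak += 1
    return streak


def is_stuck_on_error(errors: list[str], threshold: int = 3) -> bool:
    """True when the same error has repeated more than `threshold` times in a row."""
    return repeated_error_streak(errors) > threshold
```

Comparing exact strings is the simplest choice; real error output often contains timestamps or paths, so some normalization would likely be needed first.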
### Aider Evaluation
**Expected Behavior:**
- CLI-based, simple interactions
- Works well for single-file changes
- Requires model configuration
**Good Signs:**
- Quick response times
- Clean diff output
- Minimal context needed
**Bad Signs:**
- Repeated file overwrites
- Model timeout errors
- Large context required
### Playwright Evaluation
**Expected Behavior:**
- Test files in `tests/` folder
- HTML report output
- Screenshot on failure
**Good Signs:**
- Tests running successfully
- Reports capturing issues
- Network interception working
**Bad Signs:**
- Browser not launching
- API calls timing out
- Element not found errors
## Task Progress Tracking
### For Each Task
Create/Update: `<project-root>/tasks.md`
```markdown
# Task: Increase Test Coverage for LinkdingSync
## Start Time
2026-05-09 08:00
## Estimated Duration
45 minutes
## Current Progress
25% - Test structure created
## Current Blockers
None
## Next Steps
1. Implement auth test
2. Implement API call test
3. Run full suite
```
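If `tasks.md` is updated by a script rather than by hand, the template above could be rendered from a small record. The following is a sketch only; `TaskStatus` and `render_tasks_md` are hypothetical names, and the section headings are copied from the template.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStatus:
    # Field names are illustrative; they mirror the headings of the template.
    title: str
    start_time: str
    estimated_minutes: int
    progress: str
    blockers: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def render_tasks_md(t: TaskStatus) -> str:
    """Render a TaskStatus in the tasks.md layout shown above."""
    blockers = "\n".join(f"- {b}" for b in t.blockers) or "None"
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(t.next_steps, start=1))
    lines = [
        f"# Task: {t.title}",
        "## Start Time", t.start_time,
        "## Estimated Duration", f"{t.estimated_minutes} minutes",
        "## Current Progress", t.progress,
        "## Current Blockers", blockers,
        "## Next Steps", steps,
    ]
    return "\n".join(lines) + "\n"
```

Writing the result to `<project-root>/tasks.md` is then a single `Path.write_text` call.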
### Checkpoint Questions
**At 50% time:**
1. Is the agent still making progress?
2. Are tests converging or regressing?
3. Have blockers been identified?
**At 90% time:**
1. Is the task near completion?
2. What work remains?
3. Decide: continue or adjust
**After 2x time:**
1. Review AGENTS.md for missing context
2. Check task brief clarity
3. Consider tool change
**After 3x time:**
1. Strong evidence of stuck loop
2. Re-think required
3. New approach or tool needed
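The four checkpoints can be turned into wall-clock times once the start time and estimate are known. A small sketch, assuming the start time is available as a `datetime`; `CHECKPOINTS` and `checkpoint_times` are names introduced here for illustration.

```python
from datetime import datetime, timedelta

# Checkpoint labels mapped to multiples of the estimated duration.
CHECKPOINTS = {"50%": 0.5, "90%": 0.9, "2x": 2.0, "3x": 3.0}


def checkpoint_times(start: datetime, estimated_min: float) -> dict[str, datetime]:
    """Return the wall-clock time at which each checkpoint is reached."""
    return {
        label: start + timedelta(minutes=estimated_min * factor)
        for label, factor in CHECKPOINTS.items()
    }
```

For the sample task started at 2026-05-09 08:00 with a 45-minute estimate, the checkpoints fall at 08:22:30, 08:40:30, 09:30, and 10:15.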
## Tool Evaluation
### When to Switch Tools
| Current Tool | Switch If... | To... |
|--------------|---------------|-------|
| OpenCode | Simple one-off | Aider |
| OpenCode | Very complex refactoring | Consider re-scoping |
| Aider | Complex iterative task | OpenCode |
| Playwright | Test runner errors | Fix config, continue |
| Any | 3x time with no progress | Re-evaluate approach |
### Cross-Project Patterns
**Document in `docs/tools.md`:**
- What worked well
- What didn't work
- Tool preferences by project type
- Configuration lessons learned
## Documentation Requirements
### AGENTS.md (Per Project)
````markdown
# AGENTS.md
## Project Overview
[What this project does]
## Setup Commands
```bash
npm install
npm run dev
npm test
```
## Architecture
[Brief notes]
## Testing
- Unit tests: `npm test`
- E2E tests: `npx playwright test`
- Coverage target: 80%
## Conventions
- Use TypeScript strict mode
- Error handling with try/catch
- API calls must timeout
## Known Issues
- [List if any]
## Project Tools
- Playwright for browser tests
- OpenCode for iteration
- API: `https://api.linkding.com`
````
### task-brief.md (Per Task)
```markdown
# Task Brief
## Context
[Why this task]
## Goal
[What needs done]
## Acceptance Criteria
- [ ] Criterion 1
- [ ] Criterion 2
## Constraints
- [ ] Constraint 1
## Related Files
- File 1
- File 2
```
## Example Evaluation Log
```markdown
# Evaluation Log: LinkdingSync Test Harness
## Session 1 (2026-05-09)
### Agent: OpenCode
### Task: Add Playwright tests
### Progress
- [x] Test structure created
- [x] First test implemented
- [ ] Tests converging
### Time Elapsed
30 min (of 60 estimated)
### Issues
- API calls timing out intermittently
### Decision
Continue - tests improving
---
## Session 2 (2026-05-09)
### Time Elapsed
55 min
### Progress
- [x] Tests converging
- [ ] 2 of 3 scenarios passing
### Issues
- Resolved API timeout with retry logic
### Decision
Continue - approaching completion
---
## Final Summary
### Time Actual: 75 min
### Time Estimated: 60 min
### Deviation: +25%
### Outcome
SUCCESS - All acceptance criteria met
### Lessons
- API retry logic needed upfront
- Playwright config requires specific timeout values
```
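The deviation figure in the final summary is the overrun relative to the estimate. A minimal sketch of that arithmetic, with `deviation_pct` as an illustrative helper name:

```python
def deviation_pct(actual_min: float, estimated_min: float) -> float:
    """Percentage by which actual time differs from the estimate (positive = overrun)."""
    if estimated_min <= 0:
        raise ValueError("estimated_min must be positive")
    return (actual_min - estimated_min) / estimated_min * 100


def format_deviation(actual_min: float, estimated_min: float) -> str:
    """Format the deviation with an explicit sign, as used in the log."""
    return f"{deviation_pct(actual_min, estimated_min):+.0f}%"
```

With the values from the log, `format_deviation(75, 60)` gives `"+25%"`, which is a ratio of 1.25x and therefore below the 2x re-think threshold.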
## Integration with Chat Logs
### Automatic Logging
Chat logs are automatically written to:
- `<project-root>/chatlog.md`
### Key Information to Capture
**At task start:**
- Task brief summary
- AGENTS.md reference
- Estimated time
**At checkpoints:**
- Current progress
- Issues encountered
- Decision made
**At completion:**
- Time actual vs estimated
- Lessons learned
- Recommendations
## Re-think Workflow
When re-thinking is triggered:
1. **Stop agent** (if running in terminal)
2. **Review chatlog.md** for session history
3. **Check tasks.md** for progress notes
4. **Review AGENTS.md** for missing context
5. **Document in tasks.md**:
- What went wrong
- What's changed
- New estimates
6. **Clarify or update the task brief**
7. **Resume or restart** agent
## Escalation Path
```
Agent Struggling → Check AGENTS.md → Update context
    → Continue → Still stuck → Re-evaluate approach
    → Clear approach → Time > 2x → Re-think
                ↓
    Time > 3x or No Progress
                ↓
    Re-think Required:
    - New task brief
    - Different tool
    - New approach
```
## Quick Reference Commands
### OpenCode
```bash
# Start new task
opencode --task task-brief.md
# Stop (Ctrl+C in terminal)
```
### Aider
```bash
# Start
aider
# Stop: press Ctrl+C in the terminal
```
### Playwright
```bash
# Run tests
npx playwright test
# With specific project
npx playwright test --project=chromium
```
### Git for Verification
```bash
# Check recent commits
git log --oneline -10
# Check what changed
git diff HEAD~5..HEAD
# Check for stuck state: uncommitted changes, then commits from the last 30 minutes
git status --short
git log --oneline --since="30 minutes ago"