Initial commit: LinkSyncServer and LinkSyncExtension projects with complete documentation, models, API endpoints, tests, and extension implementation

2026-05-11 17:37:10 -05:00
parent ad0b12b452
commit aed69afdfd
691 changed files with 181874 additions and 28 deletions
--- a/docs/agent-evaluation-framework.md
+++ b/docs/agent-evaluation-framework.md
@@ -0,0 +1,429 @@
+# Agent Evaluation Framework
+
+This document defines how to evaluate agent performance and decide when to re-think an approach across your MyWorkspace projects.
+
+## Evaluation Criteria
+
+### Primary Metrics
+
+| Metric | Threshold | Action |
+|--------|-----------|--------|
+| **Progress Rate** | < 10% per 30 min | Re-evaluate approach |
+| **Same Error Pattern** | > 3 failures | Investigate root cause |
+| **Test Harness** | Iteration time not decreasing | Check whether tests are converging |
+| **File Changes** | No meaningful changes | Agent may be stuck or the task unclear; review the task brief |
+| **Time Elapsed** | > 2x estimate | Re-think strongly advised |
+| **Time Elapsed** | > 3x estimate | Re-think required |
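+
+The thresholds in the table can be checked mechanically. The following is a minimal Python sketch, not part of the framework itself: the function name `evaluate_metrics`, its parameters, and the returned action strings are assumptions chosen for illustration. Only the threshold values come from the table.
+
+```python
+from typing import List
+
+
+def evaluate_metrics(
+    progress_pct_per_30_min: float,
+    same_error_failures: int,
+    elapsed_min: float,
+    estimate_min: float,
+) -> List[str]:
+    """Return the actions triggered by the primary-metric thresholds.
+
+    Hypothetical helper: names and action strings are illustrative;
+    the threshold values mirror the Primary Metrics table.
+    """
+    if estimate_min <= 0:
+        raise ValueError("estimate_min must be positive")
+
+    actions = []
+    if progress_pct_per_30_min < 10:
+        actions.append("Re-evaluate approach")
+    if same_error_failures > 3:
+        actions.append("Investigate root cause")
+
+    # Only the most severe time-overrun action is reported.
+    ratio = elapsed_min / estimate_min
+    if ratio > 3:
+        actions.append("Re-think required")
+    elif ratio > 2:
+        actions.append("Re-think strongly advised")
+    return actions
+```
+
+An empty list means no threshold has been crossed, so the agent can keep running until the next checkpoint.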
+
+### Secondary Metrics
+
+- **Context Usage**: Monitor token usage in chatlog
+- **Git Commits**: Track meaningful changes
+- **Test Pass Rate**: Monitor improvement over iterations
+- **API Call Success**: For browser automation tasks
+
+## Re-think Decision Tree
+
+```
+┌─────────────────────────────────────┐
+│   Task Running with Agent           │
+└─────────────────────────────────────┘
+                ↓
+    ┌─────────────────────────────┐
+    │ Is time > 50% of estimate?   │
+    └─────────────────────────────┘
+            │          │
+         YES │         NO
+            ↓          │
+    ┌──────────────────┐
+    │ Check progress    │
+    │ Still on track?   │
+    └──────────────────┘
+            │
+       YES │ NO
+            ↓          ↓
+   Continue to ┌──────────────┐
+   checkpoint  │ Review       │
+               │ blockers     │
+               └──────────────┘
+            ↓          │
+    ┌──────────────────┐
+    │ Time > 90%?      │
+    └──────────────────┘
+            │
+       YES │ NO
+            ↓          │
+    ┌──────────────────┐
+    │ Near completion  │
+    │ Keep going       │
+    └──────────────────┘
+            ↓          │
+   Complete      ┌──────────────┐
+                 │ Time > 2x?   │
+                 └──────────────┘
+                         │
+                    YES │ NO
+                         ↓          │
+                    ┌────────────────────┐
+                    │ Re-evaluate        │
+                    │ - Check task       │
+                    │ - Review AGENTS.md │
+                    │ - Adjust approach  │
+                    └────────────────────┘
+                         ↓
+                    ┌──────────────┐
+                    │ Time > 3x?   │
+                    └──────────────┘
+                             │
+                        YES │ NO
+                             ↓          │
+                        ┌──────────────┐
+                        │ Re-think     │
+                        │ required     │
+                        │ - Clear task │
+                        │ - New brief  │
+                        │ - New tool   │
+                        └──────────────┘
+```
+
+## Agent-Specific Evaluation
+
+### OpenCode Evaluation
+
+**Expected Behavior:**
+- Reads AGENTS.md for context
+- Writes files directly to project
+- Runs tests repeatedly
+- Reports blockers clearly
+
+**Good Signs:**
+- Multiple git commits per session
+- Test failure patterns changing
+- Iteration time decreasing
+- Clear progress indicators
+
+**Bad Signs:**
+- Repeating same error
+- Only trivial or cosmetic changes
+- Iteration time increasing
+- Agent "thinking" with no output
+
+**Actions:**
+- **Minor stall**: Wait 5-10 min
+- **Repeated errors**: Update AGENTS.md, clarify task
+- **No progress**: Pause, re-evaluate task brief
+
+### Aider Evaluation
+
+**Expected Behavior:**
+- CLI-based, simple interactions
+- Works well for single-file changes
+- Requires model configuration
+
+**Good Signs:**
+- Quick response times
+- Clean diff output
+- Minimal context needed
+
+**Bad Signs:**
+- Repeated file overwrites
+- Model timeout errors
+- Large context required
+
+### Playwright Evaluation
+
+**Expected Behavior:**
+- Test files in `tests/` folder
+- HTML report output
+- Screenshot on failure
+
+**Good Signs:**
+- Tests running successfully
+- Reports capturing issues
+- Network interception working
+
+**Bad Signs:**
+- Browser not launching
+- API calls timing out
+- Element not found errors
+
+## Task Progress Tracking
+
+### For Each Task
+
+Create/Update: `<project-root>/tasks.md`
+
+```markdown
+# Task: Increase Test Coverage for LinkdingSync
+
+## Start Time
+2026-05-09 08:00
+
+## Estimated Duration
+45 minutes
+
+## Current Progress
+25% - Test structure created
+
+## Current Blockers
+None
+
+## Next Steps
+1. Implement auth test
+2. Implement API call test
+3. Run full suite
+```
+
+### Checkpoint Questions
+
+**At 50% time:**
+1. Is the agent still making progress?
+2. Are tests converging or regressing?
+3. Have blockers been identified?
+
+**At 90% time:**
+1. Is the task near completion?
+2. What work remains?
+3. Should the agent continue as-is or adjust its approach?
+
+**After 2x time:**
+1. Review AGENTS.md for missing context
+2. Check task brief clarity
+3. Consider tool change
+
+**After 3x time:**
+1. Strong evidence of stuck loop
+2. Re-think required
+3. New approach or tool needed
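+
+The checkpoint times follow directly from the start time and the estimate. Here is a small Python sketch of that arithmetic; the function name `checkpoint_times` and the label strings are hypothetical, and the sample values are taken from the `tasks.md` example (start 08:00, estimate 45 minutes).
+
+```python
+from datetime import datetime, timedelta
+from typing import Dict
+
+
+def checkpoint_times(start: datetime, estimate_min: float) -> Dict[str, datetime]:
+    """Map each checkpoint label to the wall-clock time it falls due.
+
+    Hypothetical helper: the multipliers mirror the 50%, 90%, 2x and 3x
+    checkpoints described in this document.
+    """
+    multipliers = {"50%": 0.5, "90%": 0.9, "2x": 2.0, "3x": 3.0}
+    return {
+        label: start + timedelta(minutes=estimate_min * factor)
+        for label, factor in multipliers.items()
+    }
+
+
+# Values from the tasks.md example: start 08:00, estimate 45 minutes.
+times = checkpoint_times(datetime(2026, 5, 9, 8, 0), 45)
+for label, due in times.items():
+    print(f"{label}: {due:%H:%M}")
+```
+
+For that example the 2x checkpoint falls at 09:30 and the 3x checkpoint at 10:15, which gives concrete times to note in `tasks.md` when the task starts.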
+
+## Tool Evaluation
+
+### When to Switch Tools
+
+| Current Tool | Switch If... | To... |
+|--------------|---------------|-------|
+| OpenCode | Simple one-off | Aider |
+| OpenCode | Very complex refactoring | Consider re-scoping |
+| Aider | Complex iterative task | OpenCode |
+| Playwright | Test runner errors | Fix config, continue |
+| Any | 3x time with no progress | Re-think: new brief, tool, or approach |
+
+### Cross-Project Patterns
+
+**Document in `docs/tools.md`:**
+- What worked well
+- What didn't work
+- Tool preferences by project type
+- Configuration lessons learned
+
+## Documentation Requirements
+
+### AGENTS.md (Per Project)
+
+````markdown
+# AGENTS.md
+
+## Project Overview
+[What this project does]
+
+## Setup Commands
+```bash
+npm install
+npm run dev
+npm test
+```
+
+## Architecture
+[Brief notes]
+
+## Testing
+- Unit tests: `npm test`
+- E2E tests: `npx playwright test`
+- Coverage target: 80%
+
+## Conventions
+- Use TypeScript strict mode
+- Error handling with try/catch
+- API calls must set a timeout
+
+## Known Issues
+- [List if any]
+
+## Project Tools
+- Playwright for browser tests
+- OpenCode for iteration
+- API: `https://api.linkding.com`
+````
+
+### task-brief.md (Per Task)
+
+```markdown
+# Task Brief
+
+## Context
+[Why this task]
+
+## Goal
+[What needs done]
+
+## Acceptance Criteria
+- [ ] Criterion 1
+- [ ] Criterion 2
+
+## Constraints
+- [ ] Constraint 1
+
+## Related Files
+- File 1
+- File 2
+```
+
+## Example Evaluation Log
+
+```markdown
+# Evaluation Log: LinkdingSync Test Harness
+
+## Session 1 (2026-05-09)
+
+### Agent: OpenCode
+### Task: Add Playwright tests
+
+### Progress
+- [x] Test structure created
+- [x] First test implemented
+- [ ] Tests converging
+
+### Time Elapsed
+30 min (of 60 estimated)
+
+### Issues
+- API calls timing out intermittently
+
+### Decision
+Continue - tests improving
+
+---
+
+## Session 2 (2026-05-09)
+
+### Time Elapsed
+55 min (of 60 estimated)
+
+### Progress
+- [x] Tests converging
+- [ ] All scenarios passing (2 of 3 so far)
+
+### Issues
+- Resolved API timeout with retry logic
+
+### Decision
+Continue - approaching completion
+
+---
+
+## Final Summary
+
+### Time Actual: 75 min
+### Time Estimated: 60 min
+### Deviation: +25%
+
+### Outcome
+SUCCESS - All acceptance criteria met
+
+### Lessons
+- API retry logic needed upfront
+- Playwright config requires specific timeout values
+```
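+
+The deviation figure in the Final Summary is the signed overrun expressed as a percentage of the estimate. A minimal Python sketch of that calculation follows; the function name `deviation_pct` is an assumption used only for illustration.
+
+```python
+def deviation_pct(actual_min: float, estimated_min: float) -> float:
+    """Signed overrun as a percentage of the estimate (hypothetical helper)."""
+    if estimated_min <= 0:
+        raise ValueError("estimated_min must be positive")
+    return (actual_min - estimated_min) / estimated_min * 100
+
+
+# Figures from the Final Summary above: 75 min actual, 60 min estimated.
+print(f"{deviation_pct(75, 60):+.0f}%")  # prints "+25%"
+```
+
+A negative result means the task finished under its estimate.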
+
+## Integration with Chat Logs
+
+### Automatic Logging
+
+Chat logs are automatically written to:
+- `<project-root>/chatlog.md`
+
+### Key Information to Capture
+
+**At task start:**
+- Task brief summary
+- AGENTS.md reference
+- Estimated time
+
+**At checkpoints:**
+- Current progress
+- Issues encountered
+- Decision made
+
+**At completion:**
+- Time actual vs estimated
+- Lessons learned
+- Recommendations
+
+## Re-think Workflow
+
+When re-thinking is triggered:
+
+1. **Stop agent** (if running in terminal)
+2. **Review chatlog.md** for session history
+3. **Check tasks.md** for progress notes
+4. **Review AGENTS.md** for missing context
+5. **Document in tasks.md**:
+   - What went wrong
+   - What's changed
+   - New estimates
+6. **Clarify or rewrite the task brief**
+7. **Resume or restart** agent
+
+## Escalation Path
+
+```
+Agent Struggling → Check AGENTS.md → Update context
+                 → Continue → Still stuck → Re-evaluate approach
+                 → Clear approach → Time > 2x → Re-think
+                                    ↓
+                            Time > 3x or No Progress
+                                    ↓
+                            Re-think Required:
+                            - New task brief
+                            - Different tool
+                            - New approach
+```
+
+## Quick Reference Commands
+
+### OpenCode
+```bash
+# Start new task
+opencode --task task-brief.md
+
+# Stop (Ctrl+C in terminal)
+```
+
+### Aider
+```bash
+# Start
+aider
+
+# Stop (Ctrl+C in terminal)
+```
+
+### Playwright
+```bash
+# Run tests
+npx playwright test
+
+# With specific project
+npx playwright test --project=chromium
+```
+
+### Git for Verification
+```bash
+# Check recent commits
+git log --oneline -10
+
+# Check what changed
+git diff HEAD~5..HEAD
+
+# Check for uncommitted changes (a clean tree with no new commits suggests a stuck agent)
+git status
+```