The Actor-Critic Loop
This document explains the core feedback loop that drives codeloops.
Concept
The actor-critic pattern comes from reinforcement learning, where:
- The actor takes actions (makes code changes)
- The critic evaluates those actions (reviews the changes)
In codeloops:
- The actor is a coding agent executing your task
- The critic is another agent instance evaluating the work
- Feedback flows from critic to actor until the task is complete
State Machine
        ┌──────────────────┐
        │      START       │
        └────────┬─────────┘
                 │
                 ▼
        ┌──────────────────┐
┌──────▶│ ACTOR_EXECUTING  │◀──────┐
│       └────────┬─────────┘       │
│                │                 │
│                ▼                 │
│       ┌──────────────────┐       │
│       │  CAPTURING_DIFF  │       │
│       └────────┬─────────┘       │
│                │                 │
│                ▼                 │
│       ┌──────────────────┐       │
│       │CRITIC_EVALUATING │       │
│       └────────┬─────────┘       │
│                │                 │
│     ┌──────────┼──────────┐      │
│     │          │          │      │
│     ▼          ▼          ▼      │
│  ┌────┐    ┌────────┐  ┌─────┐   │
│  │DONE│    │CONTINUE│  │ERROR│   │
│  └──┬─┘    └───┬────┘  └──┬──┘   │
│     │          │          │      │
│     │          │ feedback │      │
│     │          └──────────┼──────┘
│     │                     │
│     │            recovery │
│     │          suggestion │
│     │          ┌──────────┘
│     │          │
│     ▼          ▼
│ ┌──────┐  ┌──────────────┐
│ │ END  │  │  FEED_BACK   │
│ └──────┘  └──────┬───────┘
│                  │
└──────────────────┘
Decision Types
The critic returns one of three decisions:
DONE
The task is complete. The critic has determined that:
- All requirements in the prompt are met
- The implementation is correct
- No further changes are needed
Response includes:
- Summary of what was accomplished
- Confidence score (0.0 to 1.0)
CONTINUE
More work is needed. The critic has determined that:
- The prompt requirements are not fully met
- There are issues that need addressing
- Additional changes are required
Response includes:
- Feedback explaining what's missing or wrong
- Guidance for the next iteration
ERROR
Something went wrong. This occurs when:
- The actor's exit code indicates failure
- The actor produced error output
- The changes broke something
Response includes:
- Analysis of what went wrong
- Recovery suggestion for the actor
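The three decisions and their payloads map naturally onto a single enum. The sketch below is illustrative only: the variant names follow the `Decision` type used in the implementation excerpt at the end of this document, but the exact fields (in particular `analysis`) and the `next_actor_feedback` helper are assumptions, not the project's confirmed API.

```rust
/// Hypothetical model of the critic's three decisions.
#[derive(Debug, Clone, PartialEq)]
pub enum Decision {
    /// Task complete: summary plus a confidence score in 0.0..=1.0.
    Done { summary: String, confidence: f64 },
    /// More work needed: feedback and guidance for the next iteration.
    Continue { feedback: String },
    /// Something went wrong: analysis plus a recovery suggestion.
    Error { analysis: String, recovery: String },
}

impl Decision {
    /// Text handed back to the actor on the next iteration, if any.
    /// (Helper name is an assumption made for this sketch.)
    pub fn next_actor_feedback(&self) -> Option<&str> {
        match self {
            Decision::Done { .. } => None,
            Decision::Continue { feedback } => Some(feedback.as_str()),
            Decision::Error { recovery, .. } => Some(recovery.as_str()),
        }
    }
}
```

Modeling the decision as an enum means the compiler forces every caller to handle all three cases.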
Iteration Flow
1. Actor Execution
The actor receives:
- Original prompt (always)
- Previous feedback (only when the previous decision was CONTINUE or ERROR)
┌─────────────────────────────────────────────────┐
│                   Actor Input                   │
├─────────────────────────────────────────────────┤
│ Original Prompt:                                │
│   Add input validation to the login endpoint.   │
│                                                 │
│ Previous Feedback (if any):                     │
│   The email validation is good, but password    │
│   validation is missing. Please add checks for  │
│   minimum length and required characters.       │
└─────────────────────────────────────────────────┘
The actor executes with full filesystem access in the working directory.
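As a rough sketch of how the actor input could be assembled, the function below joins the original prompt with optional feedback. The function name `build_actor_prompt` and the section labels are hypothetical; codeloops' real prompt template may differ.

```rust
/// Builds the actor's input from the original prompt and optional feedback.
/// The section labels are illustrative, not codeloops' actual wording.
pub fn build_actor_prompt(original_prompt: &str, previous_feedback: Option<&str>) -> String {
    // The original prompt is always included.
    let mut input = format!("Original Prompt:\n{}\n", original_prompt.trim());

    // Feedback is only present after a CONTINUE or ERROR decision.
    if let Some(feedback) = previous_feedback {
        input.push_str(&format!("\nPrevious Feedback:\n{}\n", feedback.trim()));
    }
    input
}
```

On the first iteration the caller would pass `None`; on later iterations it would pass the critic's feedback or recovery suggestion.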
2. Diff Capture
After the actor completes, codeloops captures:
- Git diff (all changes since session start)
- Number of files changed
- Actor stdout and stderr
- Actor exit code
- Execution duration
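The captured data can be pictured as one record per iteration. The struct and helper below are assumptions made for illustration (none of these names come from the codeloops source). The file count is derived by counting `diff --git` headers, which is one simple way to get it from a unified git diff.

```rust
use std::time::Duration;

/// Hypothetical record of what is captured after one actor run.
#[derive(Debug)]
pub struct IterationCapture {
    pub git_diff: String,
    pub files_changed: usize,
    pub stdout: String,
    pub stderr: String,
    pub exit_code: i32,
    pub duration: Duration,
}

/// Counts changed files by counting `diff --git` headers in a git diff.
/// Lines inside hunks start with '+', '-', or ' ', so they never match.
pub fn count_files_changed(git_diff: &str) -> usize {
    git_diff
        .lines()
        .filter(|line| line.starts_with("diff --git "))
        .count()
}

impl IterationCapture {
    /// Builds a capture record, deriving `files_changed` from the diff text.
    pub fn new(
        git_diff: String,
        stdout: String,
        stderr: String,
        exit_code: i32,
        duration: Duration,
    ) -> Self {
        let files_changed = count_files_changed(&git_diff);
        Self { git_diff, files_changed, stdout, stderr, exit_code, duration }
    }
}
```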
3. Critic Evaluation
The critic receives:
- Original prompt
- Actor's output (stdout)
- Git diff of changes
- Iteration number
- Previous history (summarized)
┌─────────────────────────────────────────────────┐
│                  Critic Input                   │
├─────────────────────────────────────────────────┤
│ Task Prompt:                                    │
│   Add input validation to the login endpoint.   │
│                                                 │
│ Actor Output:                                   │
│   I've added email validation using regex and   │
│   password length checking...                   │
│                                                 │
│ Git Diff:                                       │
│   diff --git a/src/auth.rs b/src/auth.rs        │
│   + if !is_valid_email(&email) { ... }          │
│   + if password.len() < 8 { ... }               │
│                                                 │
│ Iteration: 1                                    │
└─────────────────────────────────────────────────┘
4. Decision Parsing
The critic's response is parsed to extract:
- Decision (DONE, CONTINUE, or ERROR)
- Feedback or summary text
- Confidence score (for DONE)
Expected critic output format:
DECISION: DONE
SUMMARY: Input validation has been added to the login endpoint.
Email addresses are validated using RFC 5321 compliant regex.
Passwords require minimum 8 characters.
CONFIDENCE: 0.95
Or for CONTINUE:
DECISION: CONTINUE
FEEDBACK: The email validation looks good, but password validation
only checks length. The requirements also specified:
- At least one uppercase letter
- At least one number
Please add these additional password requirements.
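A parser for this `KEY: value` format could look roughly like the following. This is a sketch under stated assumptions, not the project's actual parser: the type `ParsedDecision`, the functions `split_fields` and `parse_decision`, and the rule that lines without a known key continue the previous field are all choices made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of a critic response.
#[derive(Debug, PartialEq)]
pub enum ParsedDecision {
    Done { summary: String, confidence: Option<f64> },
    Continue { feedback: String },
    Error { analysis: String, recovery: String },
}

const KEYS: [&str; 6] = ["DECISION", "SUMMARY", "CONFIDENCE", "FEEDBACK", "ANALYSIS", "RECOVERY"];

/// Splits the response into `KEY: value` fields. Lines that do not start
/// with a known key are appended to the most recent field.
fn split_fields(response: &str) -> HashMap<&'static str, String> {
    let mut fields: HashMap<&'static str, String> = HashMap::new();
    let mut current: Option<&'static str> = None;
    for line in response.lines() {
        let header = KEYS.iter().find_map(|&key| {
            line.strip_prefix(key)
                .and_then(|rest| rest.strip_prefix(':'))
                .map(|value| (key, value))
        });
        match (header, current) {
            (Some((key, value)), _) => {
                fields.insert(key, value.trim().to_string());
                current = Some(key);
            }
            (None, Some(key)) => {
                let entry = fields.entry(key).or_default();
                entry.push('\n');
                entry.push_str(line.trim_end());
            }
            (None, None) => {} // ignore text before the first field
        }
    }
    fields
}

/// Returns `None` when the DECISION field is missing or unrecognized.
pub fn parse_decision(response: &str) -> Option<ParsedDecision> {
    let mut fields = split_fields(response);
    let decision = fields.remove("DECISION")?;
    match decision.trim() {
        "DONE" => Some(ParsedDecision::Done {
            summary: fields.remove("SUMMARY").unwrap_or_default(),
            confidence: fields
                .remove("CONFIDENCE")
                .and_then(|c| c.trim().parse::<f64>().ok()),
        }),
        "CONTINUE" => Some(ParsedDecision::Continue {
            feedback: fields.remove("FEEDBACK").unwrap_or_default(),
        }),
        "ERROR" => Some(ParsedDecision::Error {
            analysis: fields.remove("ANALYSIS").unwrap_or_default(),
            recovery: fields.remove("RECOVERY").unwrap_or_default(),
        }),
        _ => None,
    }
}
```

Returning `None` for an unparseable response leaves it to the caller to decide whether to retry the critic or fail the session.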
5. Loop Control
Based on the decision:
DONE: Session ends successfully
- SessionEnd written with outcome="success"
- Summary and confidence recorded
- Exit code 0
CONTINUE: Actor runs again
- Feedback passed to actor
- New iteration begins
- Same prompt + feedback
ERROR: Recovery attempted
- Recovery suggestion passed to actor
- New iteration begins
- Repeated errors may cause the session to fail
Termination Conditions
The loop ends when:
- Success: Critic returns DONE
- Max iterations: Configured limit reached (exit code 1)
- Error: Unrecoverable error occurs (exit code 2)
- Interrupt: User presses Ctrl+C (exit code 130)
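The mapping from termination condition to exit code can be written as a small enum. The type and variant names here are hypothetical; only the numeric codes come from the list above.

```rust
/// How a session ended (names are illustrative, not from the codeloops source).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionOutcome {
    Success,
    MaxIterationsReached,
    Failed,
    Interrupted,
}

impl SessionOutcome {
    /// Process exit code for each termination condition.
    pub fn exit_code(self) -> i32 {
        match self {
            SessionOutcome::Success => 0,
            SessionOutcome::MaxIterationsReached => 1,
            SessionOutcome::Failed => 2,
            // 128 + 2 (SIGINT), the usual shell convention for Ctrl+C.
            SessionOutcome::Interrupted => 130,
        }
    }
}
```

A wrapper script can branch on these codes, for example retrying with a higher limit when it sees exit code 1.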
Confidence Scoring
When the critic returns DONE, it provides a confidence score:
| Score | Meaning |
|---|---|
| 0.9 - 1.0 | High confidence, all requirements clearly met |
| 0.7 - <0.9 | Good confidence, requirements met with minor uncertainty |
| 0.5 - <0.7 | Moderate confidence, some requirements unclear |
| < 0.5 | Low confidence, task may be incomplete |
The score is recorded but doesn't affect loop behavior. It's informational for users reviewing sessions.
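Since the score is informational, a tool that reviews sessions might turn it into a label. The helper below is a hypothetical sketch. It assumes each band includes its lower bound, which the table does not state explicitly.

```rust
/// Maps a confidence score to the bands in the table above.
/// Assumption: each band includes its lower bound (0.9 counts as "high").
pub fn confidence_label(score: f64) -> Option<&'static str> {
    // Reject out-of-range values; NaN also fails this check.
    if !(0.0..=1.0).contains(&score) {
        return None;
    }
    let label = if score >= 0.9 {
        "high"
    } else if score >= 0.7 {
        "good"
    } else if score >= 0.5 {
        "moderate"
    } else {
        "low"
    };
    Some(label)
}
```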
Feedback Quality
Good critic feedback:
- Specific about what's missing or wrong
- References the original requirements
- Provides actionable guidance
- Prioritizes issues by importance
Example of good feedback:
FEEDBACK: The validation is partially implemented:
DONE:
- Email format validation using regex
MISSING:
1. Password minimum length check (required: 8 characters)
2. Password uppercase letter requirement
3. Password digit requirement
Please implement the missing password validations and return
appropriate error messages for each case.
Actor Recovery
When the actor fails (non-zero exit code), the critic provides recovery guidance:
DECISION: ERROR
ANALYSIS: The actor encountered a compilation error:
error[E0599]: no method named `validate_email` found
RECOVERY: The `validate_email` method doesn't exist. You need to
either:
1. Import it from the `validators` crate, or
2. Implement it in src/utils/validation.rs
Check the project's existing validation patterns in src/utils/.
The actor then receives this recovery suggestion and attempts to fix the issue.
Iteration Limits
Without a limit, loops could run indefinitely. Set limits with:
codeloops --max-iterations 5
Or in configuration:
max_iterations = 5
When the limit is reached:
- Outcome is "max_iterations_reached"
- Exit code is 1
- The session ends, but the task may be unfinished
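The effect of the limit can be illustrated with a stripped-down loop in which a closure stands in for one full actor and critic round. `Verdict`, `Outcome`, and `run_with_limit` are hypothetical names used only for this sketch, and treating `None` as "no limit" is an assumption.

```rust
/// Minimal stand-in for the critic's verdict on one iteration.
pub enum Verdict {
    Done,
    Continue,
}

/// Result of the bounded loop.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success { iterations: u32 },
    MaxIterationsReached { iterations: u32 },
}

/// Runs `iterate` until it reports `Done` or `max_iterations` is exhausted.
/// `None` means no limit, so the loop only ends on `Done`.
pub fn run_with_limit<F>(max_iterations: Option<u32>, mut iterate: F) -> Outcome
where
    F: FnMut(u32) -> Verdict,
{
    let mut iteration = 0;
    loop {
        // Check the limit before starting another iteration.
        if let Some(limit) = max_iterations {
            if iteration >= limit {
                return Outcome::MaxIterationsReached { iterations: iteration };
            }
        }
        iteration += 1;
        if let Verdict::Done = iterate(iteration) {
            return Outcome::Success { iterations: iteration };
        }
    }
}
```

With a limit of 5 and a critic that never returns DONE, this loop stops after exactly five iterations and reports `MaxIterationsReached`.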
Best Practices
For Prompts
Clear prompts lead to accurate critic evaluation:
- Include acceptance criteria
- Be specific about requirements
- Define what "done" looks like
For Iteration Limits
Choose limits based on task complexity:
- Simple fixes: 2-3 iterations
- Medium features: 5-10 iterations
- Complex tasks: Consider breaking into smaller prompts
For Agent Selection
Consider critic thoroughness:
- More thorough critic = better feedback but slower
- Faster critic = quicker iterations but may miss issues
Implementation Details
The loop is implemented in codeloops-core/src/loop_runner.rs:
pub async fn run(&self, mut context: LoopContext) -> Result<LoopOutcome, LoopError> {
    loop {
        // Run actor
        let actor_output = self.actor.execute(&context.build_prompt()).await?;

        // Capture diff
        let diff = self.diff_capture.capture()?;

        // Run critic
        let critic_output = self.critic.evaluate(&actor_output, &diff).await?;

        // Parse decision
        match critic_output.decision {
            Decision::Done { summary, confidence } => {
                return Ok(LoopOutcome::Success { ... });
            }
            Decision::Continue { feedback } => {
                // Pass the critic's feedback to the next actor run
                context.set_feedback(feedback);
                continue;
            }
            Decision::Error { recovery } => {
                // Pass the recovery suggestion to the next actor run
                context.set_feedback(recovery);
                continue;
            }
        }
    }
}