<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>AI on Code is cheap, let&#39;s talk</title>
    <link>https://blog.ferstar.org/en/tags/ai/</link>
    <description>Code is cheap, let&#39;s talk</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <copyright>Copyright 2026 ferstar</copyright>
    <lastBuildDate>Sat, 09 May 2026 14:38:00 +0800</lastBuildDate>
    <ttl>60</ttl><atom:link href="https://blog.ferstar.org/en/tags/ai/index.xml" rel="self" type="application/rss+xml" /><image>
      <url>https://blog.ferstar.org/site-logo.png</url>
      <title>Code is cheap, let&#39;s talk</title>
      <link>https://blog.ferstar.org/</link>
    </image>
    
    <item>
      <title>Putting Semantic Search into an AI Coding Harness: Notes on Open-Sourcing ace-wrapper</title>
      <link>https://blog.ferstar.org/en/posts/ace-wrapper-semantic-search-ai-coding-harness/</link>
      <pubDate>Sat, 09 May 2026 14:38:00 +0800</pubDate>
      
      <guid isPermaLink="true">https://blog.ferstar.org/en/posts/ace-wrapper-semantic-search-ai-coding-harness/</guid>
      <description>Long AI coding tasks often fail because the agent reads the wrong files; use ace-wrapper to put semantic retrieval into Read -&gt; Search -&gt; Change -&gt; Verify; let agents find candidate files first, then verify evidence to reduce blind edits and wasted context.</description><content:encoded><![CDATA[<blockquote><p>I am not a native English speaker; this article was translated by AI.</p>
</blockquote><p>In the <a href="/en/posts/ai-coding-harness-engineering-workflow/" >previous post</a> about Harness Engineering, I compressed my default AI coding workflow into a few steps:</p>
<ol>
<li>Read</li>
<li>Search</li>
<li>Change</li>
<li>Verify</li>
<li>Record</li>
</ol>
<p>Among these steps, <code>Search</code> is the easiest one to underestimate.</p>
<p>Many agents fail because they read the wrong place first. The user describes a behavior, a bug, or a cross-layer workflow, while the code may not contain a function with the same name. Running <code>rg login</code>, <code>rg upload</code>, or <code>rg session</code> is fast, but it only works when the keyword is already known. If the keyword is unknown, speed just helps the agent drift faster.</p>
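<p>A hypothetical example of that failure mode: the user reports that “login keeps failing,” but the code never uses that word, so the obvious keyword finds nothing:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">rg -n "login" src/         # no matches: this codebase calls it "authenticate"
</span></span><span class="line"><span class="cl">rg -n "authenticate" src/  # the right keyword, but only if you already knew it</span></span></code></pre></div></div>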
<p>So I open-sourced a small layer I have been using recently:</p>
<p><a href="https://github.com/ferstar/ace-wrapper"  target="_blank" rel="noreferrer">ferstar/ace-wrapper</a></p>
<p>It does one narrow thing: wrap Augment Context Engine’s filesystem context search as an <code>ace</code> command, so coding agents can do semantic retrieval from the shell before editing.</p>

<h3 class="relative group">Why This Layer Exists
    <div id="why-this-layer-exists" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-this-layer-exists" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>The target is concrete: make the search action part of the harness.</p>
<p>I used to see this path often:</p>
<pre class="not-prose mermaid">
flowchart LR
  A[User describes behavior] --> B[Agent guesses keywords]
  B --> C[Reads nearby files]
  C --> D[Edits plausible code]
  D --> E[Verification fails]
  E --> B
</pre>

<p>The problem with this loop is that, after a failure, the agent often keeps circling the same wrong files. It can already edit code; what it lacks is a better entry point into candidate files.</p>
<p><code>ace-wrapper</code> is meant to patch this part:</p>
<pre class="not-prose mermaid">
flowchart LR
  A[User describes behavior] --> B[ace semantic retrieval]
  B --> C[Candidate files]
  C --> D[Read returned files]
  D --> E[rg / tests confirm evidence]
  E --> F[Small patch]
  F --> G[Verify]
</pre>

<p>The important part is the order: <code>ace</code> only finds candidate files. Conclusions still require reading files, exact search, and tests.</p>

<h3 class="relative group">Usage Is Short
    <div id="usage-is-short" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#usage-is-short" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>Install it:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">uv tool install ace-wrapper</span></span></code></pre></div></div>
<p>Install a local development checkout:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">uv tool install /path/to/ace-wrapper</span></span></code></pre></div></div>
<p>Search for a workflow when the exact keyword is unknown:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">timeout 60s ace <span class="s2">"user uploads an unsupported file and should see skipped-file feedback"</span> -w /repo
</span></span><span class="line"><span class="cl">rg -n <span class="s2">"unsupported|skipped|upload|file"</span> /repo</span></span></code></pre></div></div>
<p>The first command answers “which files may be relevant.” The second command confirms “which identifiers, events, copy, or tests actually exist in the code.”</p>
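<p>A sketch of what the confirm step looks like afterwards, assuming ace pointed at a hypothetical src/upload/validate.ts:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sed -n '1,120p' src/upload/validate.ts  # read the candidate before trusting it
</span></span><span class="line"><span class="cl">rg -n "skipped" src/upload/ tests/      # exact search for the copy and its test</span></span></code></pre></div></div>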
<p>I usually put this rule into a project’s <code>AGENTS.md</code>:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Use `timeout 60s ace "<query>" -w <repo-root>` for semantic codebase discovery.
</span></span><span class="line"><span class="cl">Treat `ace` results as candidate files.
</span></span><span class="line"><span class="cl">After it returns results, read the relevant files and use exact search before using them as evidence.</span></span></code></pre></div></div>
<p>These lines work better than “read more context,” because they give the agent a concrete action and a boundary against false conclusions.</p>

<h3 class="relative group">How It Works with rg
    <div id="how-it-works-with-rg" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-it-works-with-rg" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p><code>ace</code> and <code>rg</code> work better as consecutive steps.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Scenario</th>
          <th style="text-align: left">Use First</th>
          <th style="text-align: left">Why</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">You know the behavior but not the implementation location</td>
          <td style="text-align: left"><code>ace</code></td>
          <td style="text-align: left">Behavior descriptions can find candidate entry points across files and naming styles</td>
      </tr>
      <tr>
          <td style="text-align: left">You know the function name, event name, or error text</td>
          <td style="text-align: left"><code>rg</code></td>
          <td style="text-align: left">It is exact, complete, and enumerable</td>
      </tr>
      <tr>
          <td style="text-align: left">You need a structural refactor</td>
          <td style="text-align: left"><code>ast-grep</code></td>
          <td style="text-align: left">AST-level matching is needed; textual proximity falls short</td>
      </tr>
      <tr>
          <td style="text-align: left">You need to confirm whether a feature exists</td>
          <td style="text-align: left"><code>ace</code> + read files + <code>rg</code></td>
          <td style="text-align: left">A semantic hit cannot prove the feature exists</td>
      </tr>
  </tbody>
</table>
<p>I intentionally wrote this boundary into the README: ACE returns candidate files, while evidence still has to come from code and tests. That boundary matters.</p>
<p>Semantic retrieval returns “nearby” things. If you ask about a feature that does not exist, it may still find files that look related. If an agent treats “there are results” as “the feature exists,” it starts inventing a story. A conclusion is only defensible after reading an implementation, test, route, config, or call site.</p>
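<p>For the structural-refactor row in the table above, a minimal ast-grep sketch (the pattern and language are placeholders):</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"># match every console.log call at the AST level, regardless of formatting
</span></span><span class="line"><span class="cl">ast-grep --pattern 'console.log($ARG)' --lang ts src/
</span></span><span class="line"><span class="cl"># propose a rewrite to a logger call; ast-grep prints the diff for review
</span></span><span class="line"><span class="cl">ast-grep --pattern 'console.log($ARG)' --rewrite 'logger.debug($ARG)' --lang ts src/</span></span></code></pre></div></div>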

<h3 class="relative group">Where It Fits in Harness Engineering
    <div id="where-it-fits-in-harness-engineering" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#where-it-fits-in-harness-engineering" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p><code>ace-wrapper</code> is small, and I want it to stay that way. It is closer to a small gear in the harness: it turns open-ended code discovery into a repeatable, constrained command.</p>
<p>I now prefer this project rule:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Read -> Search -> Change -> Verify</span></span></code></pre></div></div>
<p>Here, <code>Search</code> means choosing the tool by problem type:</p>
<ul>
<li>Open-ended behavior and cross-layer workflows: use <code>ace</code> first</li>
<li>Exact identifiers, errors, routes, and config keys: use <code>rg</code></li>
<li>Structural replacements: use <code>ast-grep</code></li>
<li>External strategy and industry practice: use web research</li>
<li>Old decisions and repeated lessons: use memory</li>
</ul>
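<p>Applied as commands, the split looks roughly like this (the queries and pattern are hypothetical):</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">timeout 60s ace "how a stop request aborts the generation loop" -w /repo  # open-ended behavior
</span></span><span class="line"><span class="cl">rg -n "stopGeneration|AbortController" /repo/src                          # exact identifiers
</span></span><span class="line"><span class="cl">ast-grep --pattern 'fetch($URL)' --lang ts /repo/src                      # structural match</span></span></code></pre></div></div>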
<p>The useful part of this split is reduced agent randomness. The agent first uses semantic retrieval to narrow the reading surface, then uses deterministic tools to confirm facts, and only then changes code.</p>

<h3 class="relative group">The Prompt Matters Most
    <div id="the-prompt-matters-most" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-prompt-matters-most" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>A good <code>ace</code> query describes behavior and avoids keyword piles:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">timeout 60s ace <span class="s2">"frontend sends requestId to backend and starts a processing job"</span> -w /repo
</span></span><span class="line"><span class="cl">timeout 60s ace <span class="s2">"用户拖入不支持的文件后应该显示跳过文件提示"</span> -w /repo
</span></span><span class="line"><span class="cl">timeout 60s ace <span class="s2">"how provider config is persisted and restored after app restart"</span> -w /repo</span></span></code></pre></div></div>
<p>I try to include four kinds of information:</p>
<ul>
<li>User action: click, drag, upload, stop generation</li>
<li>Runtime boundary: frontend to backend, CLI handler to core service</li>
<li>Expected effect: persist config, abort loop, show skipped-file feedback</li>
<li>Known fields: <code>sessionId</code>, <code>requestId</code>, <code>files</code>, <code>workspace</code></li>
</ul>
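<p>Assembled from those four parts, one query might read (the fields are illustrative):</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">timeout 60s ace <span class="s2">"user clicks stop generation in the frontend, the backend aborts the streaming loop and releases sessionId and requestId"</span> -w /repo</span></span></code></pre></div></div>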
<p>This is much more stable than only searching <code>upload</code> or <code>provider</code>. It lets the retrieval system look for behavior and data flow, and it reminds the agent that this step is still semantic retrieval.</p>

<h3 class="relative group">Why I Open-Sourced It
    <div id="why-i-open-sourced-it" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-i-open-sourced-it" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p><code>ace-wrapper</code> has very little code. The core is just <code>FileSystemContext.create(str(workspace))</code> plus <code>context.search(args.query)</code>. I wanted to preserve the workflow constraints around those few lines:</p>
<ol>
<li>If the keyword is unknown, start with semantic retrieval.</li>
<li>Ask one workflow per query.</li>
<li>Treat results as candidate files.</li>
<li>Read the files, then use <code>rg</code> to confirm exact evidence.</li>
<li>Do not conclude without evidence.</li>
</ol>
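<p>Rules 1 and 4 are easy to turn into a shell habit; a sketch with a hypothetical query:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">timeout 60s ace "how provider config is restored on startup" -w "$REPO"  # rule 1: semantic first
</span></span><span class="line"><span class="cl">rg -n "providerConfig|restore" "$REPO/src"                               # rule 4: exact confirmation</span></span></code></pre></div></div>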
<p>Once these rules live in the tool README, the skill definition, and the agent prompt, they are much more likely to stick. Otherwise every session depends on a human reminding the agent again.</p>
<p>The previous post said Harness Engineering means putting an engineering track around AI. <code>ace-wrapper</code> is one small piece of that track: its job is modest, helping the agent read the right place first.</p>
]]></content:encoded>
      
    </item>
    
    <item>
      <title>From Vibe Coding to Harness Engineering: How My AI Coding Workflow Changed</title>
      <link>https://blog.ferstar.org/en/posts/ai-coding-harness-engineering-workflow/</link>
      <pubDate>Sat, 09 May 2026 14:19:00 +0800</pubDate>
      
      <guid isPermaLink="true">https://blog.ferstar.org/en/posts/ai-coding-harness-engineering-workflow/</guid>
      <description>AI coding can generate code but long-running delivery drifts easily; use Harness Engineering to control tasks, context, verification, and recovery; turn AI output into an executable, verifiable, reviewable engineering workflow.</description><content:encoded><![CDATA[<blockquote><p>I am not a native English speaker; this article was translated by AI.</p>
</blockquote><p>This is the written version of an internal team sharing session. The slides are here:</p>
<p><a href="/slides/harness-engineering-ai-coding/" >From Vibe Coding to Harness Engineering</a></p>
<div style="position:relative;width:100%;aspect-ratio:16/9;margin:1.5rem 0 2rem;border:1px solid rgba(127,127,127,.25);overflow:hidden;">
  <iframe src="/slides/harness-engineering-ai-coding/" title="From Vibe Coding to Harness Engineering" style="position:absolute;inset:0;width:100%;height:100%;border:0;" loading="lazy" allowfullscreen></iframe>
</div>
<p>In the previous phase, I cared about one question: can AI take over most coding work?</p>
<p>The answer is now fairly clear. If project context, quality gates, and verification workflows are in place, AI-generated code can enter the engineering workflow reliably. Human time gradually moves from “writing” to “verifying”: requirement breakdown, architecture judgment, context organization, boundary checks, and failure handling.</p>
<p>Recent practice moved one step further. The problem is no longer just “how to write prompts.” The real question is whether the whole workflow can support long-running tasks.</p>

<h3 class="relative group">What Changed
    <div id="what-changed" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-changed" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>Early Vibe Coding solved the entry problem: explain the requirement clearly, put project rules into <code>AGENTS.md</code> / <code>CLAUDE.md</code>, and let tests, lint, and review catch model output.</p>
<p>That still works, but it is closer to single-task engineering. Once a task runs longer, new problems show up:</p>
<ul>
<li>Context keeps growing until the model loses the important part.</li>
<li>Repeated retries can push the fix further away from the actual problem.</li>
<li>Without external references, strategy turns into guesswork.</li>
<li>After many rounds, it becomes hard to tell which changes should be kept.</li>
<li>User rejection, permission blocks, and empty output need explicit stop semantics.</li>
</ul>
<p>So I now prefer calling this layer <strong>Harness Engineering</strong>. The focus is to put an engineering track around AI so that tasks are executable, results are verifiable, and failures are recoverable.</p>
<pre class="not-prose mermaid">
flowchart LR
  A[Task scope] --> B[Context route]
  B --> C[Agent loop]
  C --> D[Verification gate]
  D --> E[Recovery / memory]
  D -->|failed| F[Patch harness]
  F --> C
</pre>


<h3 class="relative group">The Four Things I Manage First
    <div id="the-four-things-i-manage-first" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-four-things-i-manage-first" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>The first thing is task boundaries.</p>
<p>Before a medium-sized task starts, I want at least <code>done when</code>, <code>out of scope</code>, the change surface, and the verification command. This does not need to be a long document. Five lines are often enough. The key is to let the executor know when to stop.</p>
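<p>A minimal sketch of such a spec, for a hypothetical task:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Done when: unsupported uploads show a skipped-file notice, covered by a test
</span></span><span class="line"><span class="cl">Out of scope: upload size limits, retry logic
</span></span><span class="line"><span class="cl">Change surface: upload handler, notification component
</span></span><span class="line"><span class="cl">Verify: pnpm test upload
</span></span><span class="line"><span class="cl">Stop if: the fix requires touching the storage schema</span></span></code></pre></div></div>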
<p>The second thing is context routing.</p>
<p><code>AGENTS.md</code> should not become an encyclopedia. It works better as an index: what the project rules are, where the entry points are, what command verifies the change, what must not be touched, and where the next layer of docs lives. Long context should be read on demand instead of being dumped into the session.</p>
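<p>An index-style AGENTS.md can stay under ten lines; a sketch with hypothetical paths:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Rules: docs/rules.md (naming, error handling)
</span></span><span class="line"><span class="cl">Entry points: src/cli/main.py, src/server/app.py
</span></span><span class="line"><span class="cl">Verify: make lint && make test
</span></span><span class="line"><span class="cl">Do not touch: migrations/, vendored/
</span></span><span class="line"><span class="cl">Deeper docs: docs/architecture.md (read on demand, not by default)</span></span></code></pre></div></div>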
<p>The third thing is the verification loop.</p>
<p>My default order is now:</p>
<ol>
<li>Read: read README, AGENTS, older notes, and key implementation files</li>
<li>Search: use <code>ace</code>, <code>rg</code>, <code>ast-grep</code>, <code>nmem</code>, and Exa to find evidence</li>
<li>Change: apply a small patch and avoid drive-by refactors</li>
<li>Verify: run narrow checks first, then expand by risk</li>
<li>Record: write repeated lessons back into rules, tests, or memory</li>
</ol>
<p>This order looks ordinary, but it prevents many runaway cases. Reading and searching first reduce model guesswork. Verifying narrowly avoids ending up with one large change where nobody can tell which step broke.</p>
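<p>One round of the loop, sketched as commands (the paths and the query are hypothetical):</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sed -n '1,80p' AGENTS.md                          # Read: the index first
</span></span><span class="line"><span class="cl">timeout 60s ace "how uploads are validated" -w .  # Search: semantic candidates
</span></span><span class="line"><span class="cl">rg -n "validate_upload" src/ tests/               # Search: exact evidence
</span></span><span class="line"><span class="cl">git diff --stat                                   # Change: confirm the patch stays small
</span></span><span class="line"><span class="cl">pytest tests/test_upload.py -q                    # Verify: narrow check before the full suite</span></span></code></pre></div></div>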
<p>The fourth thing is failure handling.</p>
<p>After a failure, I classify it first: stop, retry, patch the harness, or record it.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Type</th>
          <th style="text-align: left">When to Use It</th>
          <th style="text-align: left">Handling</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">Stop</td>
          <td style="text-align: left">User rejection, permission block, side effect risk, repeated spinning</td>
          <td style="text-align: left">Break the loop and return control</td>
      </tr>
      <tr>
          <td style="text-align: left">Retry</td>
          <td style="text-align: left">Network jitter, fixable parameter, read failure without side effects</td>
          <td style="text-align: left">Retry in small steps and keep logs</td>
      </tr>
      <tr>
          <td style="text-align: left">Patch</td>
          <td style="text-align: left">Same class of error appears twice</td>
          <td style="text-align: left">Add tests, rules, scripts, or logs</td>
      </tr>
      <tr>
          <td style="text-align: left">Record</td>
          <td style="text-align: left">The case will likely happen again</td>
          <td style="text-align: left">Save trigger conditions, verification commands, and evidence entry points</td>
      </tr>
  </tbody>
</table>
<p>I used to treat many failures as “try again.” Now I am more careful. Retry only the failures that are actually retryable. Stop conditions must stop.</p>
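<p>The stop condition has to be mechanical rather than a feeling. A sketch, with a hypothetical command and error text:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">set -o pipefail
</span></span><span class="line"><span class="cl">for attempt in 1 2 3; do
</span></span><span class="line"><span class="cl">  make test 2>&1 | tee "/tmp/attempt-$attempt.log" && break  # retryable: keep logs, try again
</span></span><span class="line"><span class="cl">  if grep -q "Permission denied" "/tmp/attempt-$attempt.log"; then
</span></span><span class="line"><span class="cl">    echo "stop: permission block, returning control" >&2
</span></span><span class="line"><span class="cl">    exit 1                                                   # stop conditions must stop
</span></span><span class="line"><span class="cl">  fi
</span></span><span class="line"><span class="cl">done</span></span></code></pre></div></div>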

<h3 class="relative group">Where External Research Fits
    <div id="where-external-research-fits" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#where-external-research-fits" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>In this workflow, Exa or similar web search tools have a clearer role.</p>
<p>I usually do not search for broad trends. I search for concrete engineering questions:</p>
<ul>
<li>What timeout should be used?</li>
<li>Should this failure be retried?</li>
<li>How should the default strategy be split?</li>
<li>What boundaries do mainstream tools provide?</li>
<li>What failure samples show up in real issues?</li>
</ul>
<p>I do not copy the external solution directly. External material gives me a reference frame, and the final decision still has to fit the current repo. Useful conclusions should land in specs, project rules, tests, or scripts. Otherwise I will have to search again next time.</p>

<h3 class="relative group">Autoresearch and Ralph Loop
    <div id="autoresearch-and-ralph-loop" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#autoresearch-and-ralph-loop" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>Autoresearch works best for long loops with a clear metric. Give the agent a goal, a guard, and a verification command first. Each round should allow only one rollback-friendly change.</p>
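<p>The track can be written down before the loop starts; a sketch with a hypothetical goal:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Goal: p95 request latency under 200 ms
</span></span><span class="line"><span class="cl">Guard: one file per round, no schema changes
</span></span><span class="line"><span class="cl">Verify: make bench && make test
</span></span><span class="line"><span class="cl">Keep rule: keep a round only if the metric improves and tests stay green
</span></span><span class="line"><span class="cl">Rollback: each round is one commit, reverted as a unit</span></span></code></pre></div></div>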
<p>I currently treat Ralph Loop as persistent single-owner execution. The same owner keeps driving the work. PRD and test spec come first, then the agent runs the long task. The point is to preserve context, judgment, and verification clues during long-running work before bringing in more agents.</p>
<p>Both patterns share the same idea: define the track before letting the agent run. The track needs metrics, boundaries, verification, and keep/discard rules.</p>

<h3 class="relative group">Three Steps Worth Copying First
    <div id="three-steps-worth-copying-first" class="anchor"></div>
    
    <span
        class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
        <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#three-steps-worth-copying-first" aria-label="Anchor">#</a>
    </span>
    
</h3>
<p>To move this into a team workflow, I would not start with platform work. Three steps are enough, and they can be copied starting tomorrow:</p>
<ol>
<li>Write <code>done when</code> and <code>out of scope</code> for every medium-sized task.</li>
<li>Ask the agent to list files, evidence, and the change surface before allowing edits.</li>
<li>After one failure, patch tests, rules, or scripts before letting the agent continue.</li>
</ol>
<p>Once these three steps are in place, AI coding moves a bit from “it can produce output” toward “it can be shipped.” Autoresearch, Ralph Loop, team workers, and memory become much easier to reason about after that.</p>
]]></content:encoded>
      
    </item>
    
  </channel>
</rss>
