# Curated · X (Twitter) 🔥🔥🔥

> Author: Kyle Jeong (@kylejeong) · Platform: X (Twitter) · Date: 2026-05-07

> Original source: https://x.com/kylejeong/status/2052103973377867913

## Summary

# Autobrowse: The "Mythic Moment" for Browser Agents Has Arrived

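**TL;DR**: Browser agents have amnesia. Every run explores the site again as if starting from scratch, so they keep paying an expensive "exploration tax". Autobrowse fixes this: it lets an agent iterate on a real task until it converges, then turns the winning strategy into a durable, reusable skill. That skill is the "memory": the next agent, teammate, or customer can run it directly without relearning what has already been learned.

Special thanks to @_shubhankar for building Autobrowse internally and helping write this post.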

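## A Genius Without a Hippocampus

If you have ever deployed a browser agent to production, you know exactly what this problem looks like.

The first run on a new site is always exciting. The agent wanders around, feels its way through the pages, and eventually completes the task. The second run looks almost identical. By the hundredth run it is just frustrating. By then you have paid for the same exploration a hundred times, your cost curve is a straight line climbing upward, and you still don't have a clean artifact to hand a teammate and say, "This is how we run this job."

Real websites are messy. They serve different content to different user agents, lock content behind JavaScript, hide the data you actually want in undocumented JSON endpoints, throw CAPTCHAs when they don't recognize a session, and sometimes redesign a flow out of nowhere on a Tuesday. A general-purpose agent loop handles all of this fluently in the moment, but once the session closes it forgets everything. The reasoning that solved Monday's problem evaporates when the session ends.

The real bottleneck for browser agents in production is memory, in a form both humans and agents can read and trust. Reasoning ability stopped being the limiting factor a while ago.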

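## What Is Autobrowse?

Autobrowse is a workflow that uses AI to improve AI. You give an agent a real task on a real website. It runs the task end to end, studies the trace it produced, iterates on its strategy, and keeps going until the workflow is reliable rather than merely lucky. Once it converges, it turns the winning strategy into a reusable skill: a Markdown file plus the deterministic glue code needed to repeat the job (CLI calls, fetches, selectors, helper scripts).

It is similar to Karpathy's autoresearch harness, but aimed at learning browser skills faster and more cheaply. The first run is deliberately expensive, because that run pays for all the work that follows.

The artifact is the point. Every Autobrowse run produces a durable Markdown file that any future agent can load and execute, on top of whatever value you got from the run itself.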

## How It Works

The learning loop:

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778117390925-iaHHiYmLeaIAEfwkYpng.png)

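The core loop is simple:

1. **Objective**: Give the agent a real task on a real website (for example: "Book a 7 p.m. dinner at this restaurant on OpenTable").

2. **Run**: Let the agent attempt the task end to end against a live browser.

3. **Study**: The agent reads its own trace. Where did it get stuck? Where did it guess? Where did it burn tokens it didn't need to?

4. **Strategy**: The "outer loop" of the run maintains a `strategy.md` file, essentially a scratchpad where the agent dumps its observations after each iteration (what worked, what broke, what to try next, what to stop doing). On the next iteration, the agent reads `strategy.md` first and uses it as context, so improvements accumulate instead of resetting on every run.

5. **Iterate**: Refine the strategy based on those notes. Delete steps that aren't pulling their weight. Lean on deterministic helpers (`browse fetch`, `browse search`, custom Python scripts) wherever possible.

6. **Converge**: Once consecutive iterations stop improving meaningfully on cost or step count, exit early.

7. **Graduate**: Write `SKILL.md` and any helper files to the public skills repository.

In practice, we cap iterations low (roughly 3 to 5) and exit early aggressively. The goal is a path that is reliable, cheap, and good enough to reuse (technically, it may fall short of the theoretical global optimum).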
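The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Autobrowse harness: `RunResult`, `converged`, `autobrowse_loop`, the 10% tolerance, and the `demo_runner` numbers are all assumptions made up for the sketch. The only details taken from the post are the `strategy.md` scratchpad, the low iteration cap, and the early exit once cost and step count stop improving.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List
import tempfile


@dataclass
class RunResult:
    success: bool
    cost_usd: float
    steps: int
    notes: str  # what worked, what broke, what to try next


def converged(prev: RunResult, curr: RunResult, tol: float = 0.10) -> bool:
    """True when both runs succeeded and neither cost nor step count
    improved by at least `tol` (relative). The threshold is an assumption."""
    if not (prev.success and curr.success):
        return False
    cost_gain = (prev.cost_usd - curr.cost_usd) / max(prev.cost_usd, 1e-9)
    step_gain = (prev.steps - curr.steps) / max(prev.steps, 1)
    return cost_gain < tol and step_gain < tol


def autobrowse_loop(
    run_task: Callable[[str], RunResult],
    strategy_path: Path,
    max_iters: int = 5,
) -> List[RunResult]:
    """Run, study, append notes to strategy.md, repeat until converged."""
    history: List[RunResult] = []
    for i in range(1, max_iters + 1):
        # Read the accumulated notes first so improvements carry over.
        strategy = strategy_path.read_text() if strategy_path.exists() else ""
        result = run_task(strategy)
        history.append(result)
        with strategy_path.open("a") as f:
            f.write(f"## iter {i}\n{result.notes}\n")
        if len(history) >= 2 and converged(history[-2], history[-1]):
            break  # early exit: no meaningful gain over the previous run
    return history


def demo_runner(strategy: str) -> RunResult:
    """Hypothetical stand-in for a real agent run; numbers are invented."""
    n = strategy.count("## iter")
    costs = [1.40, 0.60, 0.25, 0.24, 0.24]
    steps = [40, 22, 12, 12, 12]
    return RunResult(True, costs[n], steps[n], f"cost={costs[n]} steps={steps[n]}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        runs = autobrowse_loop(demo_runner, Path(d) / "strategy.md")
        print(len(runs), "iterations, final cost", runs[-1].cost_usd)
```

In a real harness the `run_task` callable would drive a live browser session, and graduation (writing `SKILL.md`) would follow once the loop returns.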

What comes out?

What comes out the other end is a small, readable file. No transcripts, no embedding vectors, no screenshot carousels. Just Markdown (skim it):

```markdown
---
name: "craigslist-search-listings"
title: "Craigslist Search Listings"
description: "Search Craigslist in a given city and category for listings matching a query, returning each listing's title, price, location, posting date, and listing URL."
website: "craigslist.org"
category: "marketplace"
tags: ["craigslist", "marketplace", "listings", "search"]
status: "autobrowsed-run-004 + prod-validated-002"
source: "autobrowse + browser-trace · 4 iters · converged 2026-04-30 · cross-region prod validation 2026-05-01 (NY/SF/Boston, cta+apa) · postal-override discovery 2026-05-01 (Chicago/NYC, mca+apa)"
updated: "2026-05-01"
recommended_method: "api"
alternative_methods:
  - method: "browser"
    rationale: "When the JSON API is rate-limited or blocked (rare — no auth or anti-bot today), fall back to browsing the city subdomain's /search/{category} page and extracting listings from the rendered text + anchor hrefs."
---

# Craigslist Search Listings — Browser Skill

## Purpose

Return a list of Craigslist postings matching a query in a given city and category — title, price, location, posting time, lat/lon, posting ID, and canonical listing URL. Read-only; never posts or edits.

## When to Use

- Daily / hourly monitoring of new listings matching a query.
- Bulk extraction across multiple cities or categories.
- Anywhere you'd otherwise scrape Craigslist HTML — the API is faster, cheaper, and structurally more reliable.

## Workflow

The Craigslist web UI is a thin client over a public JSON API at `https://sapi.craigslist.org` — no auth, cookies, session state, or anti-bot stealth. Send a `Referer` header matching the target city subdomain; if your outbound IP is in a different region than the target city, add `postal=<zip>&search_distance=<mi>` to the query — the API geo-scopes by IP only when no `postal` is supplied (see the gotcha below). **A residential proxy is not required.** Lead with the API path; the browser path works as a fallback but pays a ~100× cost premium because the search page is fully JS-rendered (`browse snapshot` returns 0 a11y refs and harvesting per-listing URLs costs ~3 turns each).

1. **Pick city + category** (and optionally subarea). City is the Craigslist subdomain (`sfbay`, `newyork`, `losangeles`, `seattle`, ...). Category is the search-path abbreviation (`sss` for-sale-all, `cta` cars+trucks, `apa` apartments, `ggg` for-sale-by-owner, etc.). To scope to a specific subarea (city-within-region), prefix the category in `searchPath` — e.g. `searchPath=sfc/apa` for SF-proper apartments, `searchPath=eby/cta` for East Bay cars. Subarea codes are listed in each response's `data.decode.locations[i][2]`. Subarea-scoping is significantly more efficient than fetching region-wide and filtering client-side (e.g. `apa` returns 9,798 bay-wide vs. 253 for `sfc/apa`).

2. **First page**:


       GET https://sapi.craigslist.org/web/v8/postings/search/full
           ?searchPath={cat}
           &query={q}
           &sort={date|rel|priceasc|pricedsc}
           &batch=1-0-360-1-0
           &lang=en&cc=us
       Referer: https://{city}.craigslist.org/


   Returns JSON with `data.totalResultCount`, `data.items[]`, and decode tables under `data.decode`. Confirm the response is scoped to the right region via `data.areas` (e.g. `{"1": {"name": "newyork"}}`) — if it shows the wrong city, add `postal=<zip>&search_distance=<mi>` (any ZIP in the target metro) to override the IP-based geo-scope.

   **Common filter params** (append as query args; check `data.humanReadableParams` to confirm acceptance): `min_price`, `max_price`, `min_bedrooms`, `max_bedrooms`, `min_bathrooms`, `bundleDuplicates=1`, `hasPic=1`, `postal=<zip>`, `search_distance=<mi>`, `availabilityMode=available`, `auto_make_model=<text>`, `min_auto_year`/`max_auto_year`, `min_auto_miles`/`max_auto_miles`. Unrecognized params are silently dropped.

3. **Decode each item**. `data.items[]` is an array of positional arrays. **Critical: many fields are offsets / lookup keys, not absolute values** — always read against `data.decode.*`:
   - `item[0]` — `postingIdOffset`. Absolute id = `data.decode.minPostingId + item[0]`.
   - `item[1]` — `postedDateOffset` (seconds). Absolute epoch = `data.decode.minPostedDate + item[1]`.
   - `item[2]` — `categoryId` (integer). Maps to a 3-letter sub-category abbreviation (`cat3`) used in canonical URLs. The mapping is **not** in the response — it's a fixed Craigslist enum. Observed during iter-2 verification: `68 → bik` (bicycles), `93 → spo` (sporting goods), `122 → pts` (parts), `197 → bop` (bicycle parts/accessories). Other categories will need to be back-derived from the `data.decode` block or a click-through verification.
   - `item[3]` — price as integer (0 or missing for free items).
   - `item[4]` — `"locIdx:hoodDescIdx:hoodIdx~lat~lon"`. Look up `data.decode.locations[locIdx]` → `[1, city, subareaAbbr]`; `data.decode.locationDescriptions[hoodDescIdx]` → display location string; parse `lat~lon` for coordinates.
   - **Title** — last array element that is a plain string (i.e. not a tagged `[code, ...]` block). For `cta` (cars+trucks) this is `item[-1]`. For `apa` (apartments) and other housing categories, a trailing `[5, beds, sqft]` housing-meta block pushes the title earlier — iterate from the end and take the first plain string.
   - Tagged blocks `[code, value]` mid-array: `code === 5` is `[beds, sqft]` (housing categories); `code === 6` is the URL slug; `code === 10` is the formatted price string ("$1,350"); other codes carry image refs and metadata.

4. **Construct canonical post URL**:


       https://{city}.craigslist.org/{subareaAbbr}/{cat3}/d/{slug}/{postingId}.html


   - `postingId` from step 3 (offset + minPostingId)
   - `subareaAbbr` from `data.decode.locations[locIdx][2]` (e.g. `nby`, `sby`, `sfc`, `eby`, `pen`)
   - `cat3` from the categoryId enum (step 3)
   - `slug` from the `[6, ...]` tagged block

   **Wrong `cat3` will 404**. If you don't know the mapping for a categoryId, fall back to `https://{city}.craigslist.org/search/{cat}?postingId={postingId}` which redirects to the canonical URL.

5. **Paginate** (only if results > 360):


       GET https://sapi.craigslist.org/web/v8/postings/search/batch
           ?batch=1-{OFFSET}-1080-1-0-{startTs}-{endTs}
           &cacheId={cacheId from step 2}
       Referer: https://{city}.craigslist.org/


   Increment `OFFSET` in steps of 1080. `startTs`/`endTs` are the `data.cacheTs` and current epoch.

## Site-Specific Gotchas

- **Geo-redirect on bare domain**: `https://www.craigslist.org/` redirects to a city based on the request IP (in our trace, `provo.craigslist.org`). Always open `{city}.craigslist.org` directly.
- **API geolocates by request IP — `postal=<zip>&search_distance=<mi>` overrides it**: No auth, no cookies, no anti-bot — but if no `postal` is supplied, the API scopes results to the city corresponding to the request's source IP, not the `Referer` header (e.g. a NY query from an SF IP silently returns `{"1": {"name": "sfbay"}}` results). Adding `postal=<zip>` for any ZIP in the target metro plus `search_distance=<mi>` forces the result set to that region. Verified 2026-05-01 with direct curl from an SF IP returning correct Chicago (`postal=60601&search_distance=50`, 0.21s, 41 results) and NYC (`postal=10001&search_distance=10`, 2.19s, 475 results) listings. **A residential proxy is not required** — `bb fetch --proxies` *without* `postal` is also geo-locked to sfbay (proxy doesn't change the source-IP region for this API), and adding `postal` to a direct curl is ~8× faster than the proxy path (0.21s vs 1.63s on the same Chicago query). Always verify scope via `data.areas` in the response.
- **Snapshot returns 0 refs on `/search/`**: The search page is fully JS-rendered. Don't use `browse snapshot`/`click` to enumerate listings.
- **Compact response format**: `data.items[]` uses positional arrays + `data.decode.*` lookup tables to keep the response small. Don't expect named fields per item — decode by position.
- **Pagination batch sizes**: First page is ~360 (`batch=1-0-360-1-0`); subsequent batches are 1080 each (`batch=1-OFFSET-1080-1-0`).
- **Free items have no price**: `item[3]` may be `0` or absent.
- **Posting time precision**: The rendered page shows relative ("< 1 hr ago"); absolute epoch = `data.decode.minPostedDate + item[1]`.
- **`item[0]` is NOT the postingId** — it's an offset from `data.decode.minPostingId`. Naïvely treating `item[0]` as the postingId produces 404s.
- **`data.decode.locations` indexing is per-response, not stable.** Iter-3 saw `locations[1] → ["sfbay","sfc"]`; iter-4 saw `locations[1] → ["sfbay","eby"]` for the same city/query. The decode block is rebuilt per cache TTL — **always look up `locations[locIdx]` from the response in hand**, never cache or hardcode the table.
- **Neighborhood labels are unreliable**: `data.decode.locationDescriptions` varies per response and per category. The same neighborhood may appear under different label-table indices across responses, may be missing in some categories (e.g. "Russian Hill" shows up in `apa` but is absent from `cta` decode tables), and is sometimes replaced by a generic city-level label by the poster. For neighborhood-scoped searches, use **lat/lon bounding-box matching** on `item[4]`'s coordinates as a fallback or supplement to label-string matching. Example bbox for North Beach + Russian Hill: `lat 37.794–37.810, lon -122.425 to -122.404`.
- **Categories are an undocumented enum** — the response decode tables don't include the `categoryId → cat3` mapping; observed values across iters: `68→bik, 93→spo, 101→foa, 122→pts, 197→bop, 5→fua` (and likely many more for non-bicycle queries). The redirect URL `https://{city}.craigslist.org/search/{cat}?postingId={id}` is the safest fallback when an unknown categoryId is encountered.
- **Rate-limit self-imposed**: No formal block but Craigslist throttles aggre
```
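To make the decoding rules in steps 3 and 4 of the skill concrete, here is a small sketch that applies them to one positional item. It follows the field layout exactly as the skill file describes it and has not been checked against a live response. The `decode_item` function, its return shape, and the sample values in the usage note are hypothetical; the `categoryId` mapping contains only the values the skill lists as observed.

```python
from typing import Any, Dict, List

# categoryId -> cat3 values listed as observed in the skill file (incomplete).
CAT3_BY_ID = {5: "fua", 68: "bik", 93: "spo", 101: "foa", 122: "pts", 197: "bop"}


def decode_item(
    item: List[Any], decode: Dict[str, Any], city: str, search_cat: str
) -> Dict[str, Any]:
    """Decode one positional item against the decode tables of the SAME response."""
    posting_id = decode["minPostingId"] + item[0]    # item[0] is an offset
    posted_epoch = decode["minPostedDate"] + item[1]  # seconds offset
    price = item[3] or 0                              # 0 or missing for free items

    # item[4] looks like "locIdx:hoodDescIdx:hoodIdx~lat~lon".
    ids, lat, lon = item[4].split("~")
    loc_idx, hood_desc_idx, _hood_idx = (int(x) for x in ids.split(":"))
    subarea = decode["locations"][loc_idx][2]
    location = decode["locationDescriptions"][hood_desc_idx]

    # Title: last plain string after the fixed fields; tagged blocks are lists.
    tail = item[5:]
    title = next((x for x in reversed(tail) if isinstance(x, str)), None)
    slug = next(
        (b[1] for b in tail if isinstance(b, list) and len(b) > 1 and b[0] == 6),
        None,
    )

    cat3 = CAT3_BY_ID.get(item[2])
    if cat3 and slug:
        url = f"https://{city}.craigslist.org/{subarea}/{cat3}/d/{slug}/{posting_id}.html"
    else:
        # Unknown categoryId: use the redirecting fallback URL from step 4.
        url = f"https://{city}.craigslist.org/search/{search_cat}?postingId={posting_id}"

    return {
        "posting_id": posting_id,
        "posted_epoch": posted_epoch,
        "price": price,
        "location": location,
        "lat": float(lat),
        "lon": float(lon),
        "title": title,
        "url": url,
    }
```

Called with invented data such as `decode_item([12345, 3600, 68, 450, "1:2:0~37.8~-122.41", [6, "trek-road-bike"], "Trek road bike"], decode, "sfbay", "sss")`, the function returns a plain dict with the absolute posting ID and the canonical URL. Because the skill warns that `data.decode.locations` is rebuilt per response, the sketch takes `decode` as an argument on every call instead of caching it.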

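If the agent discovers an undocumented JSON endpoint, that endpoint gets written down. If a form needs a brief wait before submission, that gets written down too. If a domain-specific helper script (`helpers/amazon.py`, `helpers/opentable.py`, `helpers/sf-portal.py`) is worth keeping, it gets checked in alongside the skill.

This is the same pattern we use in `bb`, our internal general-purpose agent. In our post on how we build agents at Browserbase, we wrote that every internal workflow (feature requests, session investigations, PRs, sales triage) runs through an agent that loads small Markdown skills on demand. The general loop stays simple. Domain knowledge lives in skills, where it can be read, edited, versioned, and reused.

Autobrowse takes the idea one step further: the agent writes its own skills, and learns them by actually doing the task.

Importantly, hand-written skills inside `browse` and Autobrowse-graduated skills in the public Browse CLI ecosystem are the same kind of artifact. Once a skill exists, the way an agent loads or runs it does not depend on whether a human or another agent wrote it.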

---

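## What Is It Good At?

Autobrowse shines on sites that genuinely require exploration.

- Hidden or undocumented APIs that are invisible in the rendered page but show up in network traffic.
- Heavy client-side rendering, where content appears only after a sequence of interactions.
- Multi-step login or wizard flows where the right path is not obvious at first glance.
- Any UI whose shortest reliable path is complex enough that a human would need hours to reverse-engineer it.
- Token-saving opportunities where parts of the loop are redundant (for example, skipping `browse screenshot` when the UI has not changed meaningfully).

For example, we used Autobrowse to study a federal grants portal and found an undocumented JSON endpoint that returns every current grant in a single call. What looked like a 28-page crawl collapsed into a single `browse fetch`, and that discovery is now baked into the graduated skill, so we never have to find it again.

This is the recurring pattern that makes the whole approach worth the investment: the agent tries things a human would never try, and finds things a human would never see.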

---

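## A Concrete Benchmark: Craigslist

One concrete internal benchmark we can share today is Craigslist.

- Conventional Claude Code loop: about $0.22, about 71 seconds
- Graduated Autobrowse skill: about $0.12, 27 seconds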

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778117390875-iaHHiauSkboAAqxgijpg.jpg)

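The shape matters more than the absolute numbers. The first run on a site costs roughly what you would expect from a general-purpose agent loop. The resulting skill changes the unit economics of every subsequent run by an order of magnitude or more, because it encodes the shortest reliable path the agent could find and reuses it instead of re-deriving it.

We see the same shape on smaller tasks. In an early form-filling experiment, cost dropped from $1.40 per run to $0.24 per run after four iterations, simply by letting the agent notice and delete the parts of its own approach that weren't doing any work.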
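The arithmetic behind these figures is easy to check. The sketch below computes the improvement factors from the numbers quoted above and a break-even point for the one-time learning cost. The post does not state what the learning runs cost, so the 88-cent figure in the usage note (four iterations at the conventional $0.22) is purely an assumption, and both function names are hypothetical. Amounts are kept in integer cents to avoid floating-point noise.

```python
def improvement_factor(before: float, after: float) -> float:
    """How many times cheaper or faster the 'after' figure is."""
    return before / after


def break_even_runs(learning_cost_cents: int, before_cents: int, after_cents: int) -> int:
    """Smallest number of skill runs whose savings cover the learning cost."""
    saving = before_cents - after_cents
    if saving <= 0:
        raise ValueError("the skill must be cheaper than the baseline")
    return -(-learning_cost_cents // saving)  # integer ceiling division


if __name__ == "__main__":
    print(f"Craigslist cost:   {improvement_factor(22, 12):.2f}x")
    print(f"Craigslist time:   {improvement_factor(71, 27):.2f}x")
    print(f"Form-filling cost: {improvement_factor(140, 24):.2f}x")
    # Assumed learning cost: 4 iterations at 22 cents each (not from the post).
    print("Break-even after", break_even_runs(88, 22, 12), "runs")
```

Under that assumed learning cost, the Craigslist skill pays for itself after nine runs, and every run after that saves about ten cents.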

---

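## Where Autobrowse Fails

It is easy to overhype this. Autobrowse is genuinely strong on one class of problem and exactly the wrong tool for another. The discipline of not reaching for Autobrowse is part of using it well.

When the task is deterministic parsing, Autobrowse is not the right tool. We learned this the hard way on a 167-row static HTML state directory. The data was right there in the markup. No JavaScript, no auth, no anti-bot measures, just rows.

We threw Autobrowse at it anyway, because the "let the agent figure it out" framing was too tempting. Four iterations and roughly $24 later, the loop still could not return all 167 rows in a single output. The model's per-response output cap kept truncating its reasoning, and the iteration loop kept trying to be clever about a problem that needed no cleverness.

Once we recognized the mismatch, the agent switched to writing about 200 lines of deterministic Python using `browse fetch` and BeautifulSoup. It runs in under a second, costs nothing in inference, and surfaces all 167 rows.

That lesson is now written into the skill itself: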

```bash
# Step 1: probe with fetch first.
browse fetch "https://example.gov/programs"
# If the data comes back cleanly in the response, write the parser.
# If the response is empty / dynamic / gated, escalate to Autobrowse.
```
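The "probe with fetch first" rule can be sketched as a tiny decision function. This is an illustration under assumptions, not the team's parser: it takes the fetched page as a raw HTML string (how `browse fetch` returns its output is not shown in the post), it swaps BeautifulSoup for the standard-library `html.parser` so the example has no third-party dependency, and `TableRowParser`, `extract_rows`, and `choose_approach` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


def choose_approach(html: str, min_rows: int = 1) -> str:
    """'parser' if the rows are already in the markup, else 'autobrowse'."""
    return "parser" if len(extract_rows(html)) >= min_rows else "autobrowse"
```

A static directory page yields its rows immediately and routes to the deterministic parser, while an empty JavaScript shell such as `<div id="app"></div>` yields no rows and escalates to Autobrowse.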

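Browser agents come in different levels of agency, from static scripts with no LLM loop, through router-style and tool-using agents, all the way to fully autonomous loops that can spawn other agents and define their own tools. Choosing the right level is a real engineering decision.

Autobrowse sits at the high-agency end of that spectrum, and like any high-agency tool, you reach for it only after the cheaper, more deterministic options have given up.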

---

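## Why This Changes the Workflow

A skill works like a customer handoff, with all the weight that implies.

Today, when an agent completes a job, the customer's engineering team gets a trace, maybe a session replay, maybe a chunk of natural-language reasoning. None of these is legible to the person who actually owns the workflow.

A skill is readable. It is durable, debuggable, human-auditable, and ownable. An engineer can read it, edit it, and commit it. A non-engineer (a technical PM, a technical VP, a grants manager who knows the portal inside out) can read it too and get a rough sense of what the agent is doing without touching any code.

We go from "just trust the agent's output" to "read the agent's playbook". In our view, this is what makes browser agents robust enough to live inside serious enterprise workflows instead of sitting awkwardly beside them.

The compounding effect matters too. Every new site an agent encounters yields a durable skill. The inventory grows. The agent gets cheaper and faster across the long tail of repetitive workflows because it stops paying the exploration tax.

Autobrowse has become a factory for browser-agent capabilities, far beyond what any single agent could achieve on its own. A single Autobrowse skill is useful. A growing public catalog that anyone running a browser agent can access is the real prize.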

---

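## What We're Working on Next

Smarter stopping

Today we cap iterations at a small number and exit early when cost and step count converge across consecutive runs. It is a reasonable heuristic, but a crude one. We are working on having the agent reason more explicitly about its own convergence, comparing not just cost and steps but also the structure of traces across runs.

Some of Autobrowse's most useful wins (such as the federal portal's JSON endpoint) came from the agent randomly changing its approach and stumbling onto a much shorter path. We don't want to optimize that variability away too aggressively.

Better priors for how to explore

We want to make sure the agent tries our `fetch` and `search` primitives before spinning up a full browser session. Many problems that look like they need exploration can be solved with a single fetch.

For more advanced tasks, it makes sense to let the agent inspect browser traces, network events, and CDP logs, so it can discover internal APIs by watching network requests rather than guessing from the rendered DOM.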
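As a rough illustration of what "watching network requests" could look like, the sketch below scans a list of captured events for successful XHR/Fetch responses that returned JSON. It assumes the events are Chrome DevTools Protocol `Network.responseReceived` messages already serialized as Python dicts; how Autobrowse actually captures or stores its traces is not described in the post, and `find_json_endpoints` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def find_json_endpoints(events: Iterable[Dict[str, Any]]) -> List[str]:
    """Return unique URLs of successful XHR/Fetch responses with a JSON MIME type."""
    seen: List[str] = []
    for event in events:
        if event.get("method") != "Network.responseReceived":
            continue
        params = event.get("params", {})
        response = params.get("response", {})
        if params.get("type") not in ("XHR", "Fetch"):
            continue  # skip documents, images, scripts, stylesheets
        if not 200 <= response.get("status", 0) < 300:
            continue
        if "json" not in response.get("mimeType", "").lower():
            continue
        url = response.get("url")
        if url and url not in seen:
            seen.append(url)  # keep first-seen order
    return seen
```

Candidate URLs found this way would still need a direct fetch to confirm that they work outside the browser session before being written into a skill.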

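Recursive Autobrowse

The most exciting direction is recursion: Autobrowse improving Autobrowse. Today the iteration loop, the convergence heuristic, and the skill template are mostly hand-crafted. Just as we use Autobrowse to graduate skills for individual sites, we can use it to graduate improvements to its own harness.

Better prompts for the iteration step. Better priors for which primitives to prefer. Better final skill templates for a given class of task.

The bigger picture

A dominant narrative about browser agents right now is that they get better when the underlying models get better: we are one Anthropic or OpenAI release away from agents that "just work" on the web. We don't fully buy it.

Even a perfect model has to discover, on every new site, the things a perfect model would already know if it had been there before. Without a place to store what agents learn, every run is a fresh start.

The real bottleneck is memory, in a form both humans and agents can understand and trust.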

→ Kyle

## Tags

Agent, Skills, open-source projects, automation, Autobrowse