# Curated · X (Twitter) 🔥🔥🔥🔥

> 📖 Full content index for this site (documentation index): [llms.txt](/llms.txt)

> Author: Liquid AI (@liquidai) · Platform: X (Twitter) · Date: 2026-06-18

> Original source: https://x.com/liquidai/status/2067610173024219225

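## Summary

Liquid AI has released LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M, which offer fast, accurate retrieval across 11 languages.

Both 350M-parameter models are the first bidirectional encoders in Liquid AI's LFM family. They are designed for short-text settings such as product catalogs, FAQs, and knowledge bases, and are optimized for enterprise use, with end-to-end retrieval latency as low as 1.5 ms.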

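**Model positioning and differences**
Liquid AI notes that retrieval models usually trade speed against accuracy, and the two models target different ends of that trade-off:
- **LFM2.5-Embedding-350M**: a dense bi-encoder that maps each document to a single vector. It suits cases that need the fastest search and the smallest index.
- **LFM2.5-ColBERT-350M**: a late-interaction model that maps each token to its own vector and compares query and document token by token with MaxSim. It suits cases that need higher accuracy and can tolerate a larger index.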
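
The difference between the two scoring schemes can be sketched in a few lines of plain Python. This is a minimal illustration, not Liquid AI's implementation: the function names and the toy 2-dimensional vectors are assumptions made up for the example, and real models produce much higher-dimensional embeddings.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def dense_score(query_vec, doc_vec):
    """Bi-encoder scoring: one vector per text, compared by cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction scoring (MaxSim): for each query token, keep the
    similarity of its best-matching document token, then sum over query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy data (hypothetical): two query tokens, two candidate documents.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains both query tokens
doc_b_tokens = [[1.0, 1.0], [1.0, 1.0]]              # only partial matches

print(maxsim_score(query_tokens, doc_a_tokens))  # 2.0
print(maxsim_score(query_tokens, doc_b_tokens))  # about 1.414
```

The sketch also shows where the index-size difference comes from: the dense model stores one vector per document, while the late-interaction model stores one vector per document token (three for `doc_a_tokens` here).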

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/ee14889c9845adf4.jpg)
> On the multilingual retrieval benchmarks NanoBEIR and MKQA, Liquid AI's LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M outperform their predecessor and comparable models such as Qwen3-Embedding-0.6B and gte-multilingual-base, giving best-in-class retrieval quality.

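**Architecture and performance**
Both models are built on `LFM2.5-350M-Base` and adapted with bidirectional patches, which turn the original causal decoder into a bidirectional encoder so that every token attends to context on both its left and its right.
- **Deployment flexibility**: GGUF builds for `llama.cpp` run on CPUs, laptops, and edge devices, with end-to-end query embedding latency under 10 ms.
- **Enterprise performance**: on GPU with a custom runtime, end-to-end query embedding latency can drop below 2 ms.
- **Language support**: Arabic, German, English, Spanish, French, Italian, Japanese, Korean, Norwegian, Portuguese, and Swedish.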
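
To make the causal-versus-bidirectional distinction concrete, the sketch below builds the two attention masks for a four-token sequence. It only illustrates the general idea for attention layers. The post does not describe how the bidirectional patches are actually applied to LFM2.5, and the helper names here are hypothetical.

```python
def causal_mask(n):
    """mask[i][j] is True when token i may attend to token j.
    A causal decoder only allows positions at or before i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """A bidirectional encoder lets every token attend to every position."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """List the positions that token i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


n_tokens = 4
print(visible_positions(causal_mask(n_tokens), 1))         # [0, 1]
print(visible_positions(bidirectional_mask(n_tokens), 1))  # [0, 1, 2, 3]
```

Under the causal mask, the second token never sees what follows it, which is fine for generation but discards right-hand context that helps when encoding a whole query or document at once.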

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/322f95c4943053d6.jpg)
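> In tests on an internal GPU stack with a single H100 GPU, LFM2.5-ColBERT query embedding reaches the highest queries per second (QPS) at high concurrency (Concurrency=32), exceeding 5,000 QPS and far ahead of the configurations that add MaxSim or document embedding.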

**Implementation guide (using LFM2.5-ColBERT-350M as the example)**
To index and retrieve documents with `PyLate`, follow these steps:

1. Install the required package:
   ```bash
   pip install -U pylate
   ```

2. Load the model and initialize the index:
   ```python
   from pylate import indexes, models, retrieve

   # Load the model
   model = models.ColBERT(
       model_name_or_path="LiquidAI/LFM2.5-ColBERT-350M",
       trust_remote_code=True,
   )
   model.tokenizer.pad_token = model.tokenizer.eos_token

   # Initialize the PLAID index (override=True replaces any existing index with this name)
   index = indexes.PLAID(
       index_folder="pylate-index",
       index_name="index",
       override=True,
   )
   ```

3. Encode the documents and add them to the index:
   ```python
   documents = ["document 1 text", "document 2 text", "document 3 text"]
   documents_ids = ["1", "2", "3"]
   documents_embeddings = model.encode(
       documents,
       batch_size=32,
       is_query=False,
       show_progress_bar=True,
   )
   index.add_documents(
       documents_ids=documents_ids,
       documents_embeddings=documents_embeddings,
   )
   ```

4. Run retrieval:
   ```python
   retriever = retrieve.ColBERT(index=index)
   queries_embeddings = model.encode(
       ["query for document 3", "query for document 1"],
       batch_size=32,
       is_query=True,
       show_progress_bar=True,
   )
   scores = retriever.retrieve(
       queries_embeddings=queries_embeddings,
       k=10,
   )
   ```
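
Once retrieval has run, the results usually need to be paired back with the queries. The sketch below assumes that `retriever.retrieve` returns one list per query, each holding dictionaries with `id` and `score` keys, which is the shape shown in PyLate's documentation; check it against the installed version. The helper `top_hits` and the sample values are hypothetical and are not output from the model.

```python
def top_hits(results, queries, limit=3):
    """Pair each query with its best (document id, score) hits, highest score first."""
    report = {}
    for query, hits in zip(queries, results):
        ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
        report[query] = [(hit["id"], hit["score"]) for hit in ranked[:limit]]
    return report


# Hypothetical stand-in for the `scores` value returned in step 4.
sample_queries = ["query for document 3", "query for document 1"]
sample_results = [
    [{"id": "3", "score": 11.3}, {"id": "1", "score": 2.6}],
    [{"id": "1", "score": 10.7}, {"id": "3", "score": 2.4}],
]

report = top_hits(sample_results, sample_queries, limit=1)
for query, hits in report.items():
    print(query, "->", hits)
```

With real data, pass the `scores` value from step 4 and the same query list that was given to `model.encode`.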

For more details, see the [official blog post](https://www.liquid.ai/blog/lfm2-5-retrievers), the [Hugging Face page](https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M), or the [official documentation](http://docs.liquid.ai).

## Media content

**Retrieval benchmark results on NanoBEIR and MKQA (data from the first image)**

**Table 1: LFM2.5-ColBERT-350M retrieval benchmarks**

|   | LFM2.5-ColBERT-350M | LFM2-ColBERT-350M | GTE-ModernColBERT-v1 | LateOn |
| --- | --- | --- | --- | --- |
| NanoBEIR ndcg@10 | 0.60 | 0.54 | 0.49 | 0.48 |
| MKQA recall@20 | 0.69 | 0.65 | 0.46 | 0.45 |
| MKQA recall@100 | 0.76 | 0.72 | 0.55 | 0.55 |

**Table 2: LFM2.5-Embedding-350M retrieval benchmarks**

|   | LFM2.5-Embedding-350M | Qwen3-Embedding-0.6B | gte-multilingual-base | DenseOn | gte-modernbert-base | bge-large-en-v1.5 |
| --- | --- | --- | --- | --- | --- | --- |
| NanoBEIR ndcg@10 | 0.58 | 0.56 | 0.53 | 0.43 | 0.38 | 0.36 |
| MKQA recall@20 | 0.69 | 0.64 | 0.68 | 0.43 | 0.29 | 0.41 |
| MKQA recall@100 | 0.76 | 0.72 | 0.75 | 0.54 | 0.42 | 0.52 |
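
For readers unfamiliar with the two metrics in these tables, the sketch below computes recall@k and nDCG@k for a single query with binary relevance labels. The post does not say how Liquid AI computed its numbers, so treat this as the textbook definition rather than the exact evaluation protocol. The document ids are made up for illustration.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Normalized discounted cumulative gain with binary relevance:
    a hit at 0-based rank r contributes 1 / log2(r + 2)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(k, len(relevant_ids))
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / ideal if ideal else 0.0


# Hypothetical ranking for one query; "d1" and "d4" are the relevant documents.
ranked = ["d2", "d7", "d1", "d9"]
relevant = {"d1", "d4"}

print(recall_at_k(ranked, relevant, k=2))  # 0.0
print(recall_at_k(ranked, relevant, k=4))  # 0.5
print(ndcg_at_k(ranked, relevant, k=4))
```

Benchmark scores such as those in the tables are averages of these per-query values over the whole query set. Recall only asks whether relevant documents made the cut, while nDCG also rewards placing them near the top.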

**Query throughput on a single H100 GPU, internal GPU stack (data from the second image)**

**Table 3: Throughput by concurrency**

|   | Start (Concurrency=1) | Best (Concurrency=32) | End (Concurrency=32) |
| --- | --- | --- | --- |
| LFM2.5-ColBERT Query Embedding | 750 QPS | 5150 QPS | 5150 QPS |
| LFM2.5-Embedding Query Embedding | 650 QPS | 4750 QPS | 4750 QPS |
| LFM2.5-ColBERT Query Embedding + MaxSim | 400 QPS | 900 QPS | 900 QPS |
| LFM2.5-ColBERT Query+Doc Embedding + MaxSim | 50 QPS | 50 QPS | 50 QPS |
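
The throughput figures can be turned into rough per-request latencies with Little's law (mean latency ≈ requests in flight / throughput). The sketch below does that arithmetic on the table values. It assumes the benchmark kept exactly `concurrency` requests in flight at all times, which the post does not state, so the results are estimates rather than measured latencies. The dictionary keys are shortened labels chosen for this example.

```python
# QPS values copied from the table above: {configuration: {concurrency: qps}}.
throughput_qps = {
    "ColBERT query embedding": {1: 750, 32: 5150},
    "Embedding query embedding": {1: 650, 32: 4750},
    "ColBERT query embedding + MaxSim": {1: 400, 32: 900},
    "ColBERT query+doc embedding + MaxSim": {1: 50, 32: 50},
}


def mean_latency_ms(concurrency, qps):
    """Little's law estimate: average time one request spends in the system."""
    return 1000.0 * concurrency / qps


def scaling_factor(qps_by_concurrency):
    """How much throughput grows from the lowest to the highest concurrency."""
    low = min(qps_by_concurrency)
    high = max(qps_by_concurrency)
    return qps_by_concurrency[high] / qps_by_concurrency[low]


for name, qps in throughput_qps.items():
    print(
        f"{name}: {mean_latency_ms(1, qps[1]):.2f} ms at concurrency 1, "
        f"{mean_latency_ms(32, qps[32]):.2f} ms at concurrency 32, "
        f"{scaling_factor(qps):.1f}x throughput scaling"
    )
```

For ColBERT query embedding this gives about 1.3 ms per request at concurrency 1 and about 6.2 ms at concurrency 32, under the stated assumption.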

## Tags

Embedding, New product, Open-source project, Liquid AI