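This document explains how to configure and use the dgp.filter.ai.kvcache filter in Dubbo-go-Pixiu.
The filter integrates with vLLM (/tokenize) and the LMCache controller API (/lookup, /pin, /compress, /evict) to add cache-aware routing and KV cache management.
dgp.filter.ai.kvcache is an HTTP Decode filter. The typical processing flow is: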
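1. Parse model and prompt from the request (falling back to extracting the prompt from messages when necessary).
2. Check the token cache (keyed by model + prompt).
3. Call /lookup and, on a hit, set the routing hint (llm_preferred_endpoint_id).
4. Call /tokenize and /lookup.
5. Apply the cache strategy: compress / pin / evict.

Cache routing currently depends on instance IDs being aligned: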
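- The kvcache filter sets llm_preferred_endpoint_id.
- dgp.filter.llm.proxy reads that value and selects the target instance by endpoint.id.

For routing to take effect, the following must therefore hold:
- The instance_id returned by the LMCache lookup matches the endpoint.id of the pixiu cluster.

If they do not match, the request automatically falls back to normal load balancing.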
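
The alignment rule above can be sketched in Go. This is a minimal illustration, not the filter's actual implementation: the Endpoint type and the pickEndpoint function are hypothetical names introduced here, and the only behavior taken from the text is "match the hint against endpoint.id, otherwise fall back to normal load balancing".

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in for a Pixiu cluster endpoint.
type Endpoint struct {
	ID      string
	Address string
}

// pickEndpoint returns the endpoint whose ID equals preferredID.
// The boolean is false when the hint is empty or matches no endpoint,
// in which case the caller should use normal load balancing.
func pickEndpoint(endpoints []Endpoint, preferredID string) (Endpoint, bool) {
	if preferredID == "" {
		return Endpoint{}, false
	}
	for _, ep := range endpoints {
		if ep.ID == preferredID {
			return ep, true
		}
	}
	return Endpoint{}, false
}

func main() {
	endpoints := []Endpoint{
		{ID: "vllm-instance-1", Address: "127.0.0.1:8000"},
		{ID: "vllm-instance-2", Address: "127.0.0.1:8001"},
	}
	// The second hint simulates an LMCache instance_id that is not aligned
	// with any endpoint.id.
	for _, hint := range []string{"vllm-instance-2", "unknown-instance"} {
		if ep, ok := pickEndpoint(endpoints, hint); ok {
			fmt.Printf("hint %q: route to %s\n", hint, ep.Address)
		} else {
			fmt.Printf("hint %q: no matching endpoint.id, use normal load balancing\n", hint)
		}
	}
}
```

The point of the sketch is that a mismatched instance_id is not an error; it simply means the hint is ignored.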
Notes:
- /query_worker_info

listeners:
  - name: net/http
    protocol_type: HTTP
    address:
      socket_address:
        address: 0.0.0.0
        port: 8888
    filter_chains:
      filters:
        - name: dgp.filter.httpconnectionmanager
          config:
            route_config:
              routes:
                - match:
                    prefix: /
                  route:
                    cluster: vllm_cluster
            http_filters:
              - name: dgp.filter.ai.kvcache
                config:
                  enabled: true
                  vllm_endpoint: "http://127.0.0.1:8000"
                  lmcache_endpoint: "http://127.0.0.1:9000"
                  default_model: "demo"
                  request_timeout: "2s"
                  lookup_routing_timeout: "50ms"
                  hot_window: "5m"
                  hot_max_records: 300
                  token_cache:
                    enabled: true
                    max_size: 1024
                    ttl: "10m"
                  cache_strategy:
                    enable_compression: true
                    enable_pinning: true
                    enable_eviction: true
                    load_threshold: 0.7
                    memory_threshold: 0.85
                    hot_content_threshold: 10
                    pin_instance_id: "vllm-instance-1"
                    pin_location: "LocalCPUBackend"
                    compress_instance_id: "vllm-instance-1"
                    compress_location: "LocalCPUBackend"
                    compress_method: "zstd"
                    evict_instance_id: "vllm-instance-1"
                  retry:
                    max_attempts: 3
                    base_backoff: "100ms"
                    max_backoff: "2s"
                  circuit_breaker:
                    failure_threshold: 5
                    recovery_timeout: "10s"
                    half_open_max_calls: 2
              - name: dgp.filter.llm.proxy
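Key configuration fields:
- enabled
- vllm_endpoint: upstream address for /tokenize.
- lmcache_endpoint
- lookup_routing_timeout
- token_cache: the cache key is the SHA-256 of model + "\x00" + prompt.
- cache_strategy.load_threshold: value range [0,1].
- cache_strategy.memory_threshold: value range [0,1]; used for eviction decisions.
- cache_strategy.hot_content_threshold: once content reaches this number of accesses within hot_window, it is considered hot and becomes a candidate for pin.
- retry: retry parameters for the lookup/pin/compress/evict calls.
- circuit_breaker

Additional notes:
- Log messages use the [kvcache] prefix.
- load_threshold is treated as a ratio.
- Use a mock /tokenize and a mock LMCache API to verify that the integration and the routing hints are correct.
- Keep endpoint.id aligned with the LMCache instance_id.
- Place dgp.filter.ai.kvcache before dgp.filter.llm.proxy.

Sample: https://github.com/apache/dubbo-go-pixiu-samples (ai/kvcache)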
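
The token cache key (SHA-256 of model + "\x00" + prompt) can be computed as in the following Go sketch. The function name tokenCacheKey and the hex encoding of the digest are assumptions made for illustration; the source only states which bytes are hashed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tokenCacheKey hashes model + "\x00" + prompt with SHA-256.
// Hex encoding of the digest is an assumption made for readability.
func tokenCacheKey(model, prompt string) string {
	sum := sha256.Sum256([]byte(model + "\x00" + prompt))
	return hex.EncodeToString(sum[:])
}

func main() {
	// The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding,
	// which plain concatenation would not.
	fmt.Println(tokenCacheKey("ab", "c") == tokenCacheKey("a", "bc")) // false
	fmt.Println(len(tokenCacheKey("demo", "hello")))                  // 64 hex characters
}
```

The separator is the reason the key is not simply a hash of the concatenated strings: without it, different (model, prompt) pairs could produce the same input bytes.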