feat: add shared progress API and resume/skip support for douyin batch
@@ -29,12 +29,14 @@ python3 -m venv .venv
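- Replaces the core endpoints of the original `nas.nepiedg.site:9002`
- `GET /api/layer/get_area`: reads the area list from the `area_new` database table and returns it to `js/douyin.js`
- `POST /api/layer/index`: receives the search data posted back by the script, saves the raw JSON locally first, then decides from the request parameters whether to insert it into the database
- `GET/POST /api/layer/progress`: shared collection checkpoint across multiple devices (the `layer_progress` table is created automatically)
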
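As a rough illustration of the "save the raw JSON first" step of `POST /api/layer/index`, the sketch below appends each payload to a per-day JSONL file. The file name pattern follows the `douyin_index_YYYYMMDD.jsonl` path shown in this README; the function names `raw_file_for` and `append_raw` are hypothetical, and the real handler may be organized differently.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def raw_file_for(day: date, raw_dir: str) -> Path:
    """Return the per-day JSONL path, e.g. <raw_dir>/douyin_index_20240131.jsonl."""
    return Path(raw_dir) / f"douyin_index_{day:%Y%m%d}.jsonl"


def append_raw(payload: dict, raw_dir: str, day: Optional[date] = None) -> Path:
    """Append one payload as a single JSON line and return the file written."""
    path = raw_file_for(day or date.today(), raw_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Chinese text readable in the raw file.
        fh.write(json.dumps(payload, ensure_ascii=False) + "\n")
    return path
```

Writing one JSON object per line keeps appends cheap and lets a later import job replay a single day's file without parsing one large array.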
Current database insertion rules for `/api/layer/index` (based on `payload.data.user_list[].user_info`):

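- Primarily extracts the phone number from `signature` (the profile bio) with a regular expression
- If the bio yields no match, falls back to extracting the phone number from WeChat-related markers (`微信/wx/vx/v`) and from `unique_id/versatile_display`
- `url` is always written as `https://www.douyin.com/user/{sec_uid}` (falls back to the API URL when `sec_uid` is empty)
- A record must match a keyword (default: `律师,律所`) to be inserted; the list can be adjusted via `DOUYIN_LAWYER_KEYWORDS`
- `url` is always written as `https://www.douyin.com/user/{sec_uid}` (the record is skipped and not inserted when `sec_uid` is empty)
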
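The rules above could be sketched roughly as follows. This is not the service's actual code: the regular expressions, the use of `nickname` plus `signature` for the keyword check, and the names `extract_phone` and `build_record` are all assumptions made for illustration. Where the list gives two variants of the `url` rule, the sketch follows the one that skips records with an empty `sec_uid`.

```python
import re
from typing import Optional

# Assumption: mainland China mobile numbers, 11 digits starting with 1[3-9].
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
# Assumption: a WeChat marker, an optional colon, then digits that may be
# separated by spaces or dashes (e.g. "vx: 138 1234 5678").
WECHAT_RE = re.compile(r"(?:微信|wx|vx|v)\s*[::]?\s*([0-9][0-9\s-]{9,})", re.IGNORECASE)
DEFAULT_KEYWORDS = ("律师", "律所")


def extract_phone(user_info: dict) -> Optional[str]:
    """Try the bio first, then WeChat-marked text, then the ID fields."""
    signature = user_info.get("signature") or ""
    hit = PHONE_RE.search(signature)
    if hit:
        return hit.group(0)
    for marked in WECHAT_RE.finditer(signature):
        digits = re.sub(r"\D", "", marked.group(1))  # drop spaces and dashes
        hit = PHONE_RE.search(digits)
        if hit:
            return hit.group(0)
    for field in ("unique_id", "versatile_display"):
        hit = PHONE_RE.search(str(user_info.get(field) or ""))
        if hit:
            return hit.group(0)
    return None


def build_record(user_info: dict, keywords=DEFAULT_KEYWORDS) -> Optional[dict]:
    """Return the row to insert, or None when the user must be skipped."""
    sec_uid = (user_info.get("sec_uid") or "").strip()
    if not sec_uid:
        return None  # empty sec_uid: skip, do not insert
    # Assumption: keywords are matched against nickname and bio.
    text = f"{user_info.get('nickname') or ''} {user_info.get('signature') or ''}"
    if not any(keyword in text for keyword in keywords):
        return None
    return {
        "url": f"https://www.douyin.com/user/{sec_uid}",
        "phone": extract_phone(user_info),  # may be None in this sketch
    }
```

For example, `build_record({"sec_uid": "abc", "nickname": "张律师", "signature": "电话13812345678"})` yields a row whose `url` ends in `/user/abc`, while a user without `sec_uid` or without a keyword match yields `None`.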
Startup:

@@ -53,6 +55,9 @@ AREA_DOMAIN=maxlaw
DOUYIN_DOMAIN=抖音
DOUYIN_RAW_DIR=/www/wwwroot/lawyers/data/douyin_raw
DOUYIN_SAVE_ONLY=1
DOUYIN_LAWYER_KEYWORDS=律师,律所
LAYER_PROGRESS_TABLE=layer_progress
LAYER_PROGRESS_DEFAULT_KEY=douyin_batch_default
```
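A minimal sketch of how a Python service might read these variables. Only the keyword default (`律师,律所`) is stated in this README; the other defaults simply reuse the sample values above, treating `DOUYIN_SAVE_ONLY=1` as "true" is an assumption, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    raw_dir: str
    save_only: bool
    lawyer_keywords: Tuple[str, ...]
    progress_table: str
    progress_default_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    raw_keywords = env.get("DOUYIN_LAWYER_KEYWORDS", "律师,律所")
    # Split on commas and drop empty entries such as a trailing comma.
    keywords = tuple(k.strip() for k in raw_keywords.split(",") if k.strip())
    return Settings(
        raw_dir=env.get("DOUYIN_RAW_DIR", "/www/wwwroot/lawyers/data/douyin_raw"),
        save_only=env.get("DOUYIN_SAVE_ONLY", "1") == "1",  # assumed semantics
        lawyer_keywords=keywords,
        progress_table=env.get("LAYER_PROGRESS_TABLE", "layer_progress"),
        progress_default_key=env.get("LAYER_PROGRESS_DEFAULT_KEY", "douyin_batch_default"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the process environment.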

API examples:
@@ -84,6 +89,19 @@ curl -X POST 'http://127.0.0.1:9002/api/layer/index?save_only=1' \

# Raw data directory (one file per day)
# /www/wwwroot/lawyers/data/douyin_raw/douyin_index_YYYYMMDD.jsonl

# Read the shared checkpoint (multi-device)
curl 'http://127.0.0.1:9002/api/layer/progress?server=1&progress_key=douyin_batch_default'

# Update the shared checkpoint
curl -X POST 'http://127.0.0.1:9002/api/layer/progress?server=1' \
  -H 'Content-Type: application/json' \
  -d '{"progress_key":"douyin_batch_default","device_id":"device-a","next_city_index":120,"area_signature":"xxxx","area_total":551,"current_city":"北京","reason":"city_done","status":"running"}'

# Clear the shared checkpoint
curl -X POST 'http://127.0.0.1:9002/api/layer/progress?server=1' \
  -H 'Content-Type: application/json' \
  -d '{"action":"clear","progress_key":"douyin_batch_default"}'
```
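The same three checkpoint calls can be issued from Python with only the standard library. The sketch below mirrors the curl commands above; the helper names are hypothetical, and `send` assumes the endpoint answers with a JSON body, which this README does not specify.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:9002/api/layer/progress"
DEFAULT_KEY = "douyin_batch_default"


def read_request(progress_key: str = DEFAULT_KEY) -> urllib.request.Request:
    """GET request that reads the shared checkpoint."""
    query = urllib.parse.urlencode({"server": 1, "progress_key": progress_key})
    return urllib.request.Request(f"{BASE_URL}?{query}", method="GET")


def post_request(body: dict) -> urllib.request.Request:
    """POST request carrying a JSON body, as in the curl examples."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}?server=1",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def update_request(device_id: str, next_city_index: int, **extra) -> urllib.request.Request:
    """Update the checkpoint; extra fields (current_city, status, ...) pass through."""
    body = {"progress_key": DEFAULT_KEY, "device_id": device_id,
            "next_city_index": next_city_index, **extra}
    return post_request(body)


def clear_request(progress_key: str = DEFAULT_KEY) -> urllib.request.Request:
    """Clear the checkpoint for the given key."""
    return post_request({"action": "clear", "progress_key": progress_key})


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A device would typically call `send(read_request())` on startup to find where to resume, then `send(update_request("device-a", 120, current_city="北京", status="running"))` after finishing each city.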

If port 9002 is still occupied by an old process, run the following first:
