feat: replace nas 9002 with local layer service and update douyin ingestion
@@ -20,6 +20,79 @@ python3 -m venv .venv
./common_sites/start.sh
```

## Area sync service (Python)

New service script: `services/area_sync_service.py`

Purpose:
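- Replaces the core endpoints of the former `nas.nepiedg.site:9002` service
- `GET /api/layer/get_area`: reads the area list from the `area_new` database table and returns it to `js/douyin.js`
- `POST /api/layer/index`: accepts the search data posted back by the script, saves the raw JSON locally first, then uses the request parameters to decide whether to write it to the database

Current ingestion rules for `/api/layer/index` (based on `payload.data.user_list[].user_info`):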
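- Phone numbers are extracted mainly from `signature` (the profile bio) with a regular expression
- If the bio has no match, phone numbers are extracted from WeChat-related markers (`微信/wx/vx/v`) and from `unique_id/versatile_display`
- `url` is always written as `https://www.douyin.com/user/{sec_uid}` (falling back to the API request URL when `sec_uid` is empty)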
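The lookup order above can be sketched in Python as follows. This is an illustration, not the code in `services/area_sync_service.py`: the phone regex (11-digit mainland-China mobile numbers), the WeChat-marker pattern that tolerates spaces or dashes inside the number, and the helper names `extract_phone` and `profile_url` are all assumptions.

```python
import re
from typing import Optional

# Assumption: 11-digit mainland-China mobile numbers (1[3-9] followed by 9 digits).
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")

# Assumption: after a WeChat-style marker (微信 / wx / vx / v) the number may be
# split by spaces or dashes, e.g. "vx: 139 1234 5678".
WECHAT_RE = re.compile(
    r"(?:微信|wx|vx|v)\s*[:：]?\s*(1[3-9](?:[\s-]?\d){9})(?!\d)", re.IGNORECASE
)


def _text(user_info: dict, key: str) -> str:
    """Read a field as a string, treating missing/None values as empty."""
    value = user_info.get(key)
    return "" if value is None else str(value)


def extract_phone(user_info: dict) -> Optional[str]:
    """Return the first phone number found, or None (hypothetical helper)."""
    signature = _text(user_info, "signature")
    # 1. Primary source: a contiguous mobile number in the bio.
    match = PHONE_RE.search(signature)
    if match:
        return match.group(0)
    # 2. Fallback: WeChat markers (bio and ID fields), then plain numbers in the ID fields.
    id_fields = [_text(user_info, "unique_id"), _text(user_info, "versatile_display")]
    for text in [signature, *id_fields]:
        marked = WECHAT_RE.search(text)
        if marked:
            # Strip the separators so the stored value is 11 bare digits.
            return re.sub(r"\D", "", marked.group(1))
    for text in id_fields:
        match = PHONE_RE.search(text)
        if match:
            return match.group(0)
    return None


def profile_url(user_info: dict, request_url: str) -> str:
    """Profile URL built from sec_uid, falling back to the API request URL."""
    sec_uid = _text(user_info, "sec_uid")
    return f"https://www.douyin.com/user/{sec_uid}" if sec_uid else request_url
```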
To start the service:
```bash
cd /www/wwwroot/lawyers
./.venv/bin/python ./services/area_sync_service.py
```

Common environment variables:
```bash
AREA_SERVICE_HOST=0.0.0.0
AREA_SERVICE_PORT=9002
AREA_TARGET_TABLE=area_new
AREA_DOMAIN=maxlaw
DOUYIN_DOMAIN=抖音
DOUYIN_RAW_DIR=/www/wwwroot/lawyers/data/douyin_raw
DOUYIN_SAVE_ONLY=1
```
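A minimal sketch of how these variables might drive the save-first, ingest-later flow of `POST /api/layer/index`. The defaults copy the example values listed above; the names `ServiceConfig`, `load_config`, `save_raw`, `should_ingest` and `target_domain`, the meaning of `DOUYIN_SAVE_ONLY=1` as "skip the database", and the rule that the `save_only`/`save_domain` query parameters override the environment are all assumptions for illustration.

```python
import json
import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    host: str
    port: int
    target_table: str
    area_domain: str
    douyin_domain: str
    raw_dir: Path
    save_only: bool


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Read settings from the environment; defaults copy the example values."""
    return ServiceConfig(
        host=env.get("AREA_SERVICE_HOST", "0.0.0.0"),
        port=int(env.get("AREA_SERVICE_PORT", "9002")),
        target_table=env.get("AREA_TARGET_TABLE", "area_new"),
        area_domain=env.get("AREA_DOMAIN", "maxlaw"),
        douyin_domain=env.get("DOUYIN_DOMAIN", "抖音"),
        raw_dir=Path(env.get("DOUYIN_RAW_DIR", "/www/wwwroot/lawyers/data/douyin_raw")),
        # Assumption: "1" means save raw payloads only and skip the database.
        save_only=env.get("DOUYIN_SAVE_ONLY", "1") == "1",
    )


def save_raw(payload: dict, raw_dir: Path, now: Optional[float] = None) -> Path:
    """Append one payload as a JSON line to douyin_index_YYYYMMDD.jsonl (local date)."""
    day = time.strftime("%Y%m%d", time.localtime(now))
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"douyin_index_{day}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(payload, ensure_ascii=False) + "\n")
    return path


def should_ingest(query: Mapping[str, str], cfg: ServiceConfig) -> bool:
    """A save_only=0/1 query parameter, if present, overrides DOUYIN_SAVE_ONLY."""
    flag = query.get("save_only")
    save_only = cfg.save_only if flag is None else flag == "1"
    return not save_only


def target_domain(query: Mapping[str, str], cfg: ServiceConfig) -> str:
    """save_domain, if given, replaces the default lawyer.domain value (DOUYIN_DOMAIN)."""
    return query.get("save_domain") or cfg.douyin_domain
```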
API examples:
```bash
# Health check
curl 'http://127.0.0.1:9002/health'

# Read areas from the database (returns a bare array by default, compatible with js/douyin.js)
curl 'http://127.0.0.1:9002/api/layer/get_area?server=1'

# Include summary statistics
curl 'http://127.0.0.1:9002/api/layer/get_area?table=area_new&domain=maxlaw&meta=1'

# Accept results posted back by douyin.js and write them to the database (writes lawyer.domain=抖音 by default)
curl -X POST 'http://127.0.0.1:9002/api/layer/index?server=1&save_only=0' \
  -H 'Content-Type: application/json' \
  -d '{"source":"xhr","url":"https://www.douyin.com/aweme/v1/web/discover/search/","ts":1772811111,"cityIndex":0,"data":{"desc":"联系方式 13812345678"}}'

# Optional: override the domain written to the database (for testing)
curl -X POST 'http://127.0.0.1:9002/api/layer/index?save_domain=codex_test_douyin' \
  -H 'Content-Type: application/json' \
  -d '{"source":"xhr","url":"https://www.douyin.com/aweme/v1/web/discover/search/","ts":1772811111,"cityIndex":0,"data":{"desc":"联系方式 13812345678"}}'

# Save the raw payload only (no database write)
curl -X POST 'http://127.0.0.1:9002/api/layer/index?save_only=1' \
  -H 'Content-Type: application/json' \
  -d '{"source":"xhr","url":"https://www.douyin.com/aweme/v1/web/discover/search/","ts":1772811111,"cityIndex":0,"data":{"desc":"联系方式 13812345678"}}'

# Raw payloads are written to disk, one file per day:
# /www/wwwroot/lawyers/data/douyin_raw/douyin_index_YYYYMMDD.jsonl
```
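The same calls can be made from Python with only the standard library, for example from a test script. In this sketch `BASE_URL`, `build_index_request`, `build_area_request` and `send` are hypothetical names; the query parameters are the ones used in the curl examples, and `send` returns the raw response text because the response format is not documented here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:9002"  # same host/port as the curl examples


def build_index_request(payload: dict, **params: Optional[object]) -> Request:
    """POST request for /api/layer/index; None-valued params are dropped."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/layer/index" + (f"?{query}" if query else "")
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def build_area_request(**params: object) -> Request:
    """GET request for /api/layer/get_area, e.g. server=1 or table/domain/meta."""
    query = urlencode(params)
    return Request(f"{BASE_URL}/api/layer/get_area" + (f"?{query}" if query else ""))


def send(req: Request, timeout: float = 10.0) -> str:
    """Send a request and return the response body as text."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (requires the service to be running):
#   send(build_index_request(payload, server=1, save_only=0))
#   send(build_area_request(server=1))
```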
If an old process is still listening on port 9002, stop it first:
```bash
lsof -iTCP:9002 -sTCP:LISTEN -t
kill <PID>
```
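Where `lsof` is not installed, a rough stdlib-only check is to see whether anything accepts TCP connections on the port before starting the service. This sketch (the `port_in_use` helper is hypothetical) only reports that the port is taken; it does not return a PID, so `lsof` or a similar tool is still needed to find the process to kill.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if some process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on connection errors.
        return sock.connect_ex((host, port)) == 0


# Example: check port_in_use(9002) before launching services/area_sync_service.py
```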
## Startup parameters

By default, `start.sh` launches the scrapers for 5 sites in parallel (the 大律师 site uses `dls_fresh.py`).