# lawyers

Standalone scraping project for `common_sites`.
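
## Directory layout

- `common_sites/`: scrapers for 5 sites (大律师, 找法网, 法律快车, 律图, 华律)
- `one_off_sites/`: one-off / temporary site scrapers (not part of the batch launch of the common sites)
- `request/proxy_config.py`: proxy configuration loading logic
- `request/proxy_settings.json`: proxy configuration file
- `Db.py`: database connection and basic operations
- `config.py`: database and request header configuration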

## Running

```bash
cd /www/wwwroot/lawyers
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
./common_sites/start.sh
```

## Startup options

By default, `start.sh` launches all 5 site scrapers in parallel (大律师 uses `dls_fresh.py`).

- Log directory: `/www/wwwroot/lawyers/logs`
- 大律师 JSON output: `/www/wwwroot/lawyers/data/dls_records.jsonl`
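
To sanity-check the 大律师 JSON output after a run, a minimal sketch like the one below counts the records and stops at the first malformed line. It assumes the `.jsonl` file follows the JSON Lines convention (one JSON object per line); `check_jsonl` is a hypothetical helper, not part of this repo, and it relies on the `python3` interpreter already used for the virtualenv.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): validate a JSON Lines file.
# Prints the number of non-blank records; exits non-zero on the first bad line.
check_jsonl() {
  python3 - "$1" <<'PY'
import json
import sys

path = sys.argv[1]
count = 0
with open(path, encoding="utf-8") as fh:
    for lineno, line in enumerate(fh, 1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except ValueError as exc:
            sys.exit(f"line {lineno}: {exc}")  # message to stderr, exit code 1
        count += 1
print(count)
PY
}

# Demo on a throwaway file: two records and one blank line.
demo="$(mktemp)"
printf '%s\n' '{"name": "a"}' '' '{"name": "b"}' > "$demo"
check_jsonl "$demo"
```

Against the real output this would be `check_jsonl /www/wwwroot/lawyers/data/dls_records.jsonl`.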

Common environment variables:

```bash
# Run sequentially (default: parallel)
RUN_MODE=sequential ./common_sites/start.sh

# 大律师: limit the crawl scope
DLS_CITY_FILTER=beijing DLS_MAX_CITIES=1 DLS_MAX_PAGES=1 ./common_sites/start.sh

# 大律师: connect directly (bypass the proxy) / export JSON only, without writing to the database
DLS_DIRECT=1 DLS_NO_DB=1 ./common_sites/start.sh
```
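
The parallel/sequential switch controlled by `RUN_MODE` can be pictured with the sketch below. It is hypothetical and is not the contents of `common_sites/start.sh`: the `run_all` function and the `job_<n>.log` naming are illustrative assumptions, and the `echo` demo commands stand in for the real site scrapers.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real common_sites/start.sh.
# run_all MODE LOG_DIR CMD...
# Runs each CMD via bash, sending its stdout/stderr to LOG_DIR/job_<n>.log.
run_all() {
  local mode="$1" log_dir="$2"
  shift 2
  mkdir -p "$log_dir"
  local n=0 cmd
  for cmd in "$@"; do
    n=$((n + 1))
    if [ "$mode" = "sequential" ]; then
      bash -c "$cmd" >"$log_dir/job_$n.log" 2>&1     # finish before starting the next
    else
      bash -c "$cmd" >"$log_dir/job_$n.log" 2>&1 &   # start in the background
    fi
  done
  wait  # parallel: block until all background jobs exit; sequential: returns at once
}

# Demo: two fake "sites" that just print their names.
demo_logs="$(mktemp -d)"
run_all "${RUN_MODE:-parallel}" "$demo_logs" "echo site_a" "echo site_b"
cat "$demo_logs"/job_*.log
```

In parallel mode a slow site does not hold up the others, but all jobs may hit the proxies and the database at the same time; sequential mode trades total run time for lower concurrent load.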

## Exporting to Excel

New export script: `common_sites/export_lawyers_excel.py`

```bash
# With no arguments: export the last 7 days of data (phone number / name / law firm / province / city/district / site name),
# and by default also parse the extended info in params (email / address / license number / years in practice / areas of expertise, etc.)
./.venv/bin/python ./common_sites/export_lawyers_excel.py

# Export by create_time timestamp range
./.venv/bin/python ./common_sites/export_lawyers_excel.py \
  --start-ts 1772380000 --end-ts 1772429999 \
  --output ./data/lawyers_20260302.xlsx

# Export a single site only, including technical fields (url / domain / time, etc.)
./.venv/bin/python ./common_sites/export_lawyers_excel.py \
  --domain 大律师 --include-extra

# Skip parsing the extended info in params
./.venv/bin/python ./common_sites/export_lawyers_excel.py --no-parse-params
```
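
`--start-ts` and `--end-ts` appear to take Unix timestamps in seconds (10-digit values). A minimal sketch for deriving a whole-day range from a calendar date, assuming GNU `date` (as on most Linux servers) and that the day should be interpreted in UTC+8. `day_range` is a hypothetical helper, and whether the exporter treats `--end-ts` as inclusive is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: day_range YYYY-MM-DD -> prints "START END" (Unix seconds)
# covering that calendar day in UTC+8. Requires GNU date for -d parsing.
day_range() {
  local day="$1" start
  start="$(date -d "$day 00:00:00 +0800" +%s)" || return 1
  echo "$start $((start + 86399))"   # last second of the same day
}

range="$(day_range 2026-03-02)"
start_ts="${range% *}"   # first field
end_ts="${range#* }"     # second field
echo "--start-ts $start_ts --end-ts $end_ts"
# prints: --start-ts 1772380800 --end-ts 1772467199
```

The two values can then be passed to the exporter as `--start-ts "$start_ts" --end-ts "$end_ts"`.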

## One-off site (众法利)

Script: `one_off_sites/zhongfali_single.py`

```bash
# Scrape and write JSON only (output goes to data/one_off_sites/ by default)
./.venv/bin/python ./one_off_sites/zhongfali_single.py --direct --no-db

# Scrape and write to the lawyer table (domain=众法利单页)
./.venv/bin/python ./one_off_sites/zhongfali_single.py --direct
```