# lawyers-common-sites

A standalone project containing `common_sites`, extracted from `/www/wwwroot/lawyer`.

## Directory layout

- `common_sites/`: site scraping scripts
- `request/`: proxy configuration
- `utils/`: shared utilities
- `Db.py`: database wrapper
- `config.py`: project configuration

## Quick start

```bash
cd /www/wwwroot/lawyers
# Create and activate an isolated virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies, then launch the scrapers
pip install -r requirements.txt
bash common_sites/start.sh
```

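## Notes

- This project directly reuses the original project's database and proxy configuration.
- Scraping depends on tables in the original database, including `lawyer`, `area_new`, `area`, and `area2`.
- Logs are written to `common_sites/*.log` by default.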
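
Because the scrapers depend on tables from the original database, a pre-flight check can catch a missing table before a run starts. The following is a minimal sketch, not part of this repository: `missing_tables` is a hypothetical helper, the table list covers only the names mentioned above (the real set may be larger), and how the existing table names are fetched depends on the database backend.

```python
# Table names mentioned in this README; the actual set may be larger.
REQUIRED_TABLES = ("lawyer", "area_new", "area", "area2")


def missing_tables(existing, required=REQUIRED_TABLES):
    """Return the required tables absent from `existing`, keeping their order."""
    present = set(existing)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # `existing` would normally come from the database, for example the rows
    # of a `SHOW TABLES` query if the backend is MySQL (an assumption).
    existing = ["lawyer", "area", "unrelated_table"]
    print(missing_tables(existing))
```

An empty result means every table named above is present.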
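
Since logs go to `common_sites/*.log` by default, a quick way to check on the scrapers is to read the tail of each log file. The sketch below is illustrative only: `tail_logs` is a hypothetical helper, and it assumes the logs are plain text and that the script is run from the project root.

```python
from collections import deque
from pathlib import Path


def tail_logs(log_dir="common_sites", lines=20):
    """Map each *.log file name in `log_dir` to its last `lines` lines."""
    tails = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            # A deque with maxlen keeps only the newest lines while streaming.
            recent = deque(handle, maxlen=lines)
        tails[path.name] = [line.rstrip("\n") for line in recent]
    return tails


if __name__ == "__main__":
    for name, recent in tail_logs().items():
        print(f"== {name} ==")
        print("\n".join(recent))
```

Streaming through a bounded `deque` avoids loading a large log file into memory all at once.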