Commit Graph

10 Commits

Author SHA1 Message Date
hello-dd-code ff5e04d986 feat: add baidu lvlin crawler 2026-03-20 10:40:07 +08:00
hello-dd-code 7d5f5b1054 feat: add gaode crawler and export domain exclusion 2026-03-20 10:07:48 +08:00
hello-dd-code 38e7c284e8 feat: enhance project configuration and improve data export functionality
- Updated `.gitignore` to streamline ignored files and added logging for common sites.
- Expanded `config.py` with new configurations for Weixin and Redis, and improved database connection settings.
- Refined `README.md` to clarify project structure and usage instructions.
- Enhanced `requirements.txt` with additional dependencies for MongoDB and Redis support.
- Refactored multiple spider scripts to utilize a session-based approach for HTTP requests, improving error handling and proxy management.
- Updated `export_lawyers_excel.py` to include a default timestamp for data exports.
2026-03-18 10:02:25 +08:00
hello-dd-code c2b77975c1 feat: add douyin data export functionality to lawyer export script
- Introduced a new command-line argument `--douyin-only` to export data specifically for Douyin, including additional fields such as sec_uid, douyin_uid, and user information.
- Updated the README to include instructions for exporting Douyin data.
- Enhanced the export logic to accommodate new fields when exporting Douyin-specific data.
2026-03-09 21:26:50 +08:00
hello-dd-code e10437cd90 feat: add shared progress API and resume/skip support for douyin batch 2026-03-07 01:06:40 +08:00
hello-dd-code 86cf933913 chore: commit remaining local changes 2026-03-06 23:57:43 +08:00
hello-dd-code a96b9a50e4 feat: replace nas 9002 with local layer service and update douyin ingestion 2026-03-06 23:56:55 +08:00
hello-dd-code bc4a2aa4d5 chore: move zhongfali crawler to one_off_sites 2026-03-04 09:43:35 +08:00
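hello-dd-code 19cf9ce901 Refactor crawler scripts and add time-based Excel export
- Unified the crawling logic and startup scripts for the five sites.
- Added the dls_fresh crawling flow and improved logging.
- Added export_lawyers_excel, which exports by time condition.
- Export the last 7 days by default, with support for parsing extended fields.
- Tidied .gitignore to ignore local artifacts under data/logs.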
2026-03-02 11:46:05 +08:00
hello-dd-code 03847a4b8e chore: initialize lawyers crawler project 2026-03-02 00:19:48 +08:00