PhD Hunter - PhD Advisor Application Assistant
PhD Hunter is a lightweight tool for collecting PhD advisor information. It automates gathering CS professor profiles and their latest papers.
Current Features
CSRankings Data Crawling: Automatically crawl the CSRankings website for university and professor information
arXiv Paper Fetching: Search papers by exact title (extracted from professor homepages) to avoid name collisions; fall back to author search when needed
Homepage Crawling: Scrape professor homepages to generate AI summaries and extract recent paper titles
SQLite Storage: All data persisted locally
Web Visualization Interface: Flask-based interface for interactive professor browsing, filtering, and management
AI Analysis: LLM-powered professor match scoring and cold email generation
Profile Management: Upload CV/PS, manage arXiv papers, set research preferences
Priority Tagging: Assign a priority to each professor (Reach/Match/Target/Safety/Not Considered)
Multi-dimensional Filtering: Filter the professor list by priority, research area, university, and score
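The title-first lookup described under arXiv Paper Fetching can be sketched as follows. This is a minimal illustration, not the project's actual crawler: the function names (`build_title_query`, `build_author_query`, `plan_queries`) are hypothetical, and the fallback rule shown here (use author search only when no titles were extracted) is an assumption about what "when needed" means. It only builds query URLs for the public arXiv API, using its `ti:` and `au:` field prefixes, and performs no network requests.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; "ti:" searches titles, "au:" searches authors.
ARXIV_API = "http://export.arxiv.org/api/query"


def _phrase(text: str) -> str:
    """Collapse whitespace and drop double quotes so text fits in a quoted phrase."""
    return " ".join(text.replace('"', " ").split())


def build_title_query(title: str, max_results: int = 1) -> str:
    """Build a URL that searches arXiv for a title phrase (hypothetical helper)."""
    params = {"search_query": f'ti:"{_phrase(title)}"', "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def build_author_query(author: str, max_results: int = 10) -> str:
    """Build a URL that lists an author's newest submissions (hypothetical helper)."""
    params = {
        "search_query": f'au:"{_phrase(author)}"',
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def plan_queries(author: str, homepage_titles: list[str]) -> list[str]:
    """Prefer one query per extracted title; fall back to a single author query.

    Assumption: the fallback triggers only when no usable titles were extracted.
    """
    titles = [t for t in homepage_titles if t.strip()]
    if titles:
        return [build_title_query(t) for t in titles]
    return [build_author_query(author)]
```

Searching by title sidesteps the ambiguity of common author names. Because `ti:"..."` is a phrase match rather than a strict equality check, a caller would likely still compare the returned title against the extracted one before accepting a result.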
Project Architecture
phd_hunter/
├── main.py                      # CLI entry (root directory)
├── pyproject.toml               # Project configuration
├── README.md                    # Project documentation
├── docs/                        # Sphinx documentation
└── src/phd_hunter/
    ├── models.py                # Pydantic data models
    ├── database.py              # SQLite database operations
    ├── api_infra/               # LLM API infrastructure
    ├── crawlers/
    │   ├── base.py              # Crawler base class (cache support)
    │   ├── csrankings.py        # CSRankings crawler
    │   ├── arxiv_crawler.py     # arXiv crawler
    │   └── homepage_crawler.py  # Homepage crawler
    ├── hound/
    │   └── scorer.py            # Professor scoring
    ├── analyzer/
    │   ├── analyzer.py          # Professor analysis
    │   └── prompts.py           # Prompt templates
    ├── utils/
    │   ├── logger.py            # Logging configuration
    │   ├── helpers.py           # Utility functions
    │   └── pdf_extract.py       # PDF text extraction
    └── frontend/                # Web frontend interface
        ├── app.py               # Flask API server
        ├── index.html           # Main page
        ├── static/
        │   ├── styles.css       # Stylesheet
        │   ├── app.js           # Frontend logic
        │   └── windsurf.svg     # AI avatar icon
        └── templates/           # HTML templates
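To illustrate how the SQLite layer in database.py might support priority tagging and multi-dimensional filtering, here is a minimal sketch using only the standard library. The table layout, column names, and the `create_schema` / `filter_professors` functions are assumptions made for illustration; the project's real schema may differ. Only the priority labels are taken from the feature list.

```python
import sqlite3

# Priority labels from the feature list; the schema below is hypothetical.
PRIORITIES = ("Reach", "Match", "Target", "Safety", "Not Considered")


def create_schema(conn: sqlite3.Connection) -> None:
    """Create an illustrative professors table (assumed layout)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS professors (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            university TEXT NOT NULL,
            area TEXT,
            score REAL,
            priority TEXT
        )
        """
    )


def filter_professors(conn, priority=None, area=None, university=None, min_score=None):
    """Return rows matching every filter that was provided, best score first."""
    clauses, params = [], []
    if priority is not None:
        if priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {priority!r}")
        clauses.append("priority = ?")
        params.append(priority)
    if area is not None:
        clauses.append("area = ?")
        params.append(area)
    if university is not None:
        clauses.append("university = ?")
        params.append(university)
    if min_score is not None:
        clauses.append("score >= ?")
        params.append(min_score)

    sql = "SELECT name, university, area, score, priority FROM professors"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY score DESC, name"
    # Values are bound as parameters, never interpolated into the SQL text.
    return conn.execute(sql, params).fetchall()
```

Building the WHERE clause from only the supplied filters lets a single function serve any combination of priority, research area, university, and score, which mirrors the filtering options the web interface exposes.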
Quick Start
Requirements
Python 3.10+
uv (recommended) or pip
Chrome/Chromium browser (for Selenium)
Installation Steps
Clone the repository
    git clone <repository-url>
    cd phd-hunter
Install dependencies
Using uv (recommended):
    uv sync
Or using pip:
    python -m venv .venv
    .venv\Scripts\activate  # Windows
    pip install -e .
Install api_infra (REQUIRED for LLM features)
    cd src/phd_hunter/api_infra
    pip install -e .
    cd ../../..
Run the application
Command line mode:
    # Crawl professor data
    python main.py crawl --area ai --region world --max-professors 5
    # Fetch papers
    python main.py fetch-papers --max-papers 10
    # View statistics
    python main.py stats
Web interface mode:
    # Start Flask server (Linux / macOS)
    PYTHONPATH=src python -m phd_hunter.frontend.app
    # Windows (Command Prompt):
    set PYTHONPATH=src && python -m phd_hunter.frontend.app
    # Windows (PowerShell):
    $env:PYTHONPATH="src"; python -m phd_hunter.frontend.app
    # Then open http://localhost:8080 in browser