PhD Hunter - PhD Advisor Application Assistant

PhD Hunter is a lightweight tool for collecting PhD advisor information. It automates gathering CS professor profiles and their latest papers.

Current Features

CSRankings Data Crawling: Automatically crawls the CSRankings website for university and professor information
arXiv Paper Fetching: Searches papers by exact title (extracted from professor homepages) to avoid name collisions; falls back to author search when needed
Homepage Crawling: Scrapes professor homepages to produce AI summaries and extract recent paper titles
SQLite Storage: All data persisted locally
Web Visualization Interface: Flask-based interactive professor browsing, filtering, and management
AI Analysis: LLM-powered professor match scoring and cold email generation
Profile Management: Upload CV/PS, manage arXiv papers, set research preferences
Priority Tagging: Assign a priority to each professor (Reach/Match/Target/Safety/Not Considered)
Multi-dimensional Filtering: Filter the professor list by priority, research area, university, and score
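
The title-first lookup described under arXiv Paper Fetching can be sketched as follows. This is a minimal illustration, not the code in arxiv_crawler.py: the function names build_query_url and plan_queries are hypothetical, and the sketch only builds request URLs for the public arXiv API (using its ti: and au: field prefixes) without sending them.

```python
from typing import List, Optional
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_query_url(field: str, phrase: str, max_results: int = 10) -> str:
    """Build an arXiv API URL that searches one field for a quoted phrase.

    Hypothetical helper for illustration only.
    """
    # Remove embedded double quotes and collapse whitespace so the phrase
    # stays a single quoted search term.
    cleaned = " ".join(phrase.replace('"', " ").split())
    params = {"search_query": f'{field}:"{cleaned}"', "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def plan_queries(title: Optional[str], author: str, max_results: int = 10) -> List[str]:
    """Return URLs to try in order: exact title first, author search as fallback."""
    urls = []
    if title and title.strip():
        # A quoted title is far less ambiguous than a common author name.
        urls.append(build_query_url("ti", title, max_results))
    # The author query is only a fallback for when the title search finds nothing.
    urls.append(build_query_url("au", author, max_results))
    return urls
```

A caller would request each URL in order and stop at the first one that returns entries, so the author search runs only when the title search comes back empty or no title was extracted.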

Project Architecture

phd_hunter/
├── main.py                       # CLI entry (root directory)
├── pyproject.toml                # Project configuration
├── README.md                     # Project documentation
├── docs/                         # Sphinx documentation
└── src/phd_hunter/
    ├── models.py                 # Pydantic data models
    ├── database.py               # SQLite database operations
    ├── api_infra/                # LLM API infrastructure
    ├── crawlers/
    │   ├── base.py               # Crawler base class (cache support)
    │   ├── csrankings.py         # CSRankings crawler
    │   ├── arxiv_crawler.py      # arXiv crawler
    │   └── homepage_crawler.py   # Homepage crawler
    ├── hound/
    │   └── scorer.py             # Professor scoring
    ├── analyzer/
    │   ├── analyzer.py           # Professor analysis
    │   └── prompts.py            # Prompt templates
    ├── utils/
    │   ├── logger.py             # Logging configuration
    │   ├── helpers.py            # Utility functions
    │   └── pdf_extract.py        # PDF text extraction
    └── frontend/                 # Web frontend interface
        ├── app.py                # Flask API server
        ├── index.html            # Main page
        ├── static/
        │   ├── styles.css        # Stylesheet
        │   ├── app.js            # Frontend logic
        │   └── windsurf.svg      # AI avatar icon
        └── templates/            # HTML templates
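
The tree above notes that crawlers/base.py provides cache support. One plausible shape for such a base class is sketched below, under the assumption that responses are cached on disk and expire after a fixed time. The class name CachedFetcher, its parameters, and the JSON file layout are all hypothetical and are not taken from the project.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable, Optional


class CachedFetcher:
    """Hypothetical base class: caches fetched text on disk, keyed by URL hash."""

    def __init__(self, cache_dir: str, ttl_seconds: float = 86400.0,
                 fetch: Optional[Callable[[str], str]] = None):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds
        # An injected callable makes the class testable without network access.
        self._fetch = fetch or self._download

    def _download(self, url: str) -> str:
        # Subclasses would perform the real HTTP or Selenium request here.
        raise NotImplementedError

    def _path_for(self, url: str) -> Path:
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get(self, url: str) -> str:
        path = self._path_for(url)
        if path.exists():
            entry = json.loads(path.read_text(encoding="utf-8"))
            if time.time() - entry["fetched_at"] < self.ttl_seconds:
                return entry["body"]  # fresh cache hit, no request made
        # Cache miss or stale entry: fetch again and overwrite the cache file.
        body = self._fetch(url)
        path.write_text(
            json.dumps({"fetched_at": time.time(), "body": body}),
            encoding="utf-8",
        )
        return body
```

Hashing the URL keeps cache file names short and filesystem-safe, and the expiry check lets repeated crawl runs skip pages that were fetched recently.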

Quick Start

Requirements

Python 3.10+
uv (recommended) or pip
Chrome/Chromium browser (for Selenium)

Installation Steps

Clone the repository

git clone <repository-url>
cd phd-hunter

Install dependencies

Using uv (recommended):

uv sync

Or using pip:

python -m venv .venv
.venv\Scripts\activate     # Windows
source .venv/bin/activate  # Linux / macOS
pip install -e .

Install api_infra (REQUIRED for LLM features)

cd src/phd_hunter/api_infra
pip install -e .
cd ../../..

Run the application

Command line mode:

# Crawl professor data
python main.py crawl --area ai --region world --max-professors 5

# Fetch papers
python main.py fetch-papers --max-papers 10

# View statistics
python main.py stats

Web interface mode:

# Start Flask server (Linux / macOS)
PYTHONPATH=src python -m phd_hunter.frontend.app

# Windows (Command Prompt); the quotes keep a trailing space out of the variable's value:
set "PYTHONPATH=src" && python -m phd_hunter.frontend.app

# Windows (PowerShell):
$env:PYTHONPATH="src"; python -m phd_hunter.frontend.app

# Then open http://localhost:8080 in your browser