PhD Hunter - PhD Advisor Application Assistant

PhD Hunter is a lightweight PhD advisor information collection tool focused on automating the acquisition of CS professor information and their latest papers.

Current Features

  • CSRankings Data Crawling: Automatically crawl CSRankings website to get university and professor information

  • arXiv Paper Fetching: Search papers by exact title (extracted from professor homepages) to avoid name collisions; falls back to author search when needed

  • Homepage Crawling: Scrape professor homepages for AI summaries and recent paper title extraction

  • SQLite Storage: All data persisted locally

  • Web Visualization Interface: Interactive professor browsing, filtering, and management based on Flask

  • AI Analysis: LLM-powered professor matching scoring and cold email generation

  • Profile Management: Upload CV/PS, manage arXiv papers, set research preferences

  • Priority Tagging: Mark priority for each professor (Reach/Match/Target/Safety/Not Considered)

  • Multi-dimensional Filtering: Filter professor list by priority, research area, university, score

Project Architecture

phd_hunter/
├── main.py                       # CLI entry (root directory)
├── pyproject.toml                # Project configuration
├── README.md                     # Project documentation
├── docs/                         # Sphinx documentation
└── src/phd_hunter/
    ├── models.py                 # Pydantic data models
    ├── database.py               # SQLite database operations
    ├── api_infra/                # LLM API infrastructure
    ├── crawlers/
    │   ├── base.py               # Crawler base class (cache support)
    │   ├── csrankings.py         # CSRankings crawler
    │   ├── arxiv_crawler.py      # arXiv crawler
    │   └── homepage_crawler.py   # Homepage crawler
    ├── hound/
    │   └── scorer.py             # Professor scoring
    ├── analyzer/
    │   ├── analyzer.py           # Professor analysis
    │   └── prompts.py            # Prompt templates
    ├── utils/
    │   ├── logger.py             # Logging configuration
    │   ├── helpers.py            # Utility functions
    │   └── pdf_extract.py        # PDF text extraction
    └── frontend/                 # Web frontend interface
        ├── app.py                # Flask API server
        ├── index.html            # Main page
        ├── static/
        │   ├── styles.css        # Stylesheet
        │   ├── app.js            # Frontend logic
        │   └── windsurf.svg      # AI avatar icon
        └── templates/            # HTML templates

Quick Start

Requirements

  • Python 3.10+

  • uv (recommended) or pip

  • Chrome/Chromium browser (for Selenium)

Installation Steps

  1. Clone the repository

    git clone <repository-url>
    cd phd-hunter
    
  2. Install dependencies

    Using uv (recommended):

    uv sync
    

    Or using pip:

    python -m venv .venv
    .venv\Scripts\activate  # Windows
    pip install -e .
    
  3. Install api_infra (REQUIRED for LLM features)

    cd src/phd_hunter/api_infra
    pip install -e .
    cd ../../..
    
  4. Run the application

    Command line mode:

    # Crawl professor data
    python main.py crawl --area ai --region world --max-professors 5
    
    # Fetch papers
    python main.py fetch-papers --max-papers 10
    
    # View statistics
    python main.py stats
    

    Web interface mode:

    # Start Flask server (Linux / macOS)
    PYTHONPATH=src python -m phd_hunter.frontend.app
    
    # Windows (Command Prompt):
    set PYTHONPATH=src && python -m phd_hunter.frontend.app
    
    # Windows (PowerShell):
    $env:PYTHONPATH="src"; python -m phd_hunter.frontend.app
    
    # Then open http://localhost:8080 in browser
    

Documentation

Indices and Tables