PhD Hunter - PhD Advisor Application Assistant
===============================================

PhD Hunter is a lightweight tool for collecting PhD advisor information. It automates the acquisition of CS professor information and their latest papers.

Current Features
----------------

* **CSRankings Data Crawling**: Automatically crawl the CSRankings website to get university and professor information
* **arXiv Paper Fetching**: Search papers by exact title (extracted from professor homepages) to avoid name collisions; falls back to author search when needed
* **Homepage Crawling**: Scrape professor homepages for AI summaries and recent paper title extraction
* **SQLite Storage**: All data persisted locally
* **Web Visualization Interface**: Interactive professor browsing, filtering, and management based on Flask
* **AI Analysis**: LLM-powered professor match scoring and cold email generation
* **Profile Management**: Upload CV/PS, manage arXiv papers, set research preferences
* **Priority Tagging**: Mark a priority for each professor (Reach/Match/Target/Safety/Not Considered)
* **Multi-dimensional Filtering**: Filter the professor list by priority, research area, university, and score

Project Architecture
--------------------

.. code-block:: text

   phd_hunter/
   ├── main.py                     # CLI entry (root directory)
   ├── pyproject.toml              # Project configuration
   ├── README.md                   # Project documentation
   ├── docs/                       # Sphinx documentation
   └── src/phd_hunter/
       ├── models.py               # Pydantic data models
       ├── database.py             # SQLite database operations
       ├── api_infra/              # LLM API infrastructure
       ├── crawlers/
       │   ├── base.py             # Crawler base class (cache support)
       │   ├── csrankings.py       # CSRankings crawler
       │   ├── arxiv_crawler.py    # arXiv crawler
       │   └── homepage_crawler.py # Homepage crawler
       ├── hound/
       │   └── scorer.py           # Professor scoring
       ├── analyzer/
       │   ├── analyzer.py         # Professor analysis
       │   └── prompts.py          # Prompt templates
       ├── utils/
       │   ├── logger.py           # Logging configuration
       │   ├── helpers.py          # Utility functions
       │   └── pdf_extract.py      # PDF text extraction
       └── frontend/               # Web frontend interface
           ├── app.py              # Flask API server
           ├── index.html          # Main page
           ├── static/
           │   ├── styles.css      # Stylesheet
           │   ├── app.js          # Frontend logic
           │   └── windsurf.svg    # AI avatar icon
           └── templates/          # HTML templates

Quick Start
-----------

Requirements
~~~~~~~~~~~~

* Python 3.10+
* uv (recommended) or pip
* Chrome/Chromium browser (for Selenium)

Installation Steps
~~~~~~~~~~~~~~~~~~

1. **Clone the repository**

   .. code-block:: bash

      git clone
      cd phd-hunter

2. **Install dependencies**

   Using uv (recommended):

   .. code-block:: bash

      uv sync

   Or using pip:

   .. code-block:: bash

      python -m venv .venv
      .venv\Scripts\activate        # Windows
      source .venv/bin/activate     # Linux / macOS
      pip install -e .

3. **Install api_infra** (REQUIRED for LLM features)

   .. code-block:: bash

      cd src/phd_hunter/api_infra
      pip install -e .
      cd ../../..

4. **Run the application**

   Command line mode:

   .. code-block:: bash

      # Crawl professor data
      python main.py crawl --area ai --region world --max-professors 5

      # Fetch papers
      python main.py fetch-papers --max-papers 10

      # View statistics
      python main.py stats

   Web interface mode:

   .. code-block:: bash

      # Start Flask server (Linux / macOS)
      PYTHONPATH=src python -m phd_hunter.frontend.app

      # Windows (Command Prompt; the quotes keep a trailing space out of the value):
      set "PYTHONPATH=src" && python -m phd_hunter.frontend.app

      # Windows (PowerShell):
      $env:PYTHONPATH="src"; python -m phd_hunter.frontend.app

      # Then open http://localhost:8080 in a browser

Documentation
-------------

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   installation
   architecture
   crawlers
   api
   contributing
   changelog

Indices and Tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
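
Example: Title-First arXiv Lookup
---------------------------------

The arXiv Paper Fetching feature is described as searching by exact title first and falling back to an author search when needed. The sketch below illustrates one way that ordering could work. It is not the code in ``arxiv_crawler.py``: the function names ``title_query``, ``author_query``, ``query_url``, and ``find_papers`` are hypothetical, and the ``search`` callable is an assumed stand-in for whatever performs the HTTP request. Only the endpoint and the ``ti:`` / ``au:`` field prefixes come from the public arXiv API.

.. code-block:: python

   from urllib.parse import urlencode

   # Public arXiv API endpoint.
   ARXIV_API = "http://export.arxiv.org/api/query"


   def _phrase(text):
       """Collapse whitespace and drop double quotes so the text is one phrase."""
       return " ".join(text.replace('"', " ").split())


   def title_query(title):
       """Exact-phrase search on the title field (hypothetical helper)."""
       return f'ti:"{_phrase(title)}"'


   def author_query(author):
       """Phrase search on the author field (hypothetical helper)."""
       return f'au:"{_phrase(author)}"'


   def query_url(search_query, max_results=10):
       """Build the request URL for one query string."""
       params = {"search_query": search_query, "max_results": max_results}
       return f"{ARXIV_API}?{urlencode(params)}"


   def find_papers(titles, author, search):
       """Try every title query first; use the author query only if none hit.

       ``search`` is any callable that takes a query string and returns a list
       of results, so the ordering can be exercised without network access.
       """
       results = []
       for title in titles:
           results.extend(search(title_query(title)))
       if not results:
           # Fallback path: no title matched, so accept the noisier author search.
           results = list(search(author_query(author)))
       return results

Passing ``search`` in as a parameter keeps the fallback logic separate from the network layer, which makes it easy to test with a stub. A real caller would presumably wrap ``query_url`` in an HTTP request and parse the returned Atom feed.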