Changelog

All major changes to PhD Hunter will be recorded here.

[0.1.1] - 2026-04-26

Added

  • OpenAlex Crawler - Replaces arXiv author search as the primary paper source
    • Institution + author matching for accurate professor identification

    • arXiv link extraction from OpenAlex locations / open_access fields

    • Graceful handling of non-arXiv papers (conference/journal work without arXiv ID)

  • arXiv Abstract Enrichment - Post-process OpenAlex papers with accurate arXiv abstracts
    • ArxivCrawler.fetch_by_ids(): batch query arXiv by ID list

    • Updates DB abstract and openaccess_pdf fields when arXiv data is better

  • Professor Modal Enhancements
    • Rescore button: re-run LLM scoring after paper edits

    • Add Paper: paste arXiv URL to manually add a paper (with author verification)

    • Delete Paper: remove incorrect papers via × button

  • Scorer Daemon Reliability
    • Persistent event loop in a daemon thread (avoids "Event loop is closed" errors)

    • Reduced polling frequency (30s interval) and a 5s delay between professors to avoid API rate limits

  • Database
    • update_paper_by_arxiv_id(): update paper fields by arxiv_id + professor_id

    • delete_paper(): delete paper by database ID
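
A minimal sketch of the persistent-loop pattern, assuming the daemon submits async LLM calls from synchronous code. ScorerLoop and score_professor are hypothetical stand-ins, not the project's code. A single loop lives for the whole daemon, so async clients bound to it are never reused after their loop has closed, which repeated asyncio.run() calls can cause.

```python
import asyncio
import threading


class ScorerLoop:
    """Hypothetical helper: one asyncio loop running forever in a daemon thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # The loop is created once and reused for every scoring call.
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro, timeout=None):
        # Schedule the coroutine on the background loop and block for its result.
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout)

    def stop(self):
        # Stop the loop from its own thread, wait for the thread, then close it.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def score_professor(professor_id):
    # Placeholder for an async LLM scoring request; always returns score 3.
    await asyncio.sleep(0.01)
    return professor_id, 3


if __name__ == "__main__":
    runner = ScorerLoop()
    for pid in (1, 2, 3):
        print(runner.submit(score_professor(pid), timeout=10))
    runner.stop()
```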
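
The two database helpers can be sketched with the standard sqlite3 module. The papers table below is an assumption built only from field names mentioned in this changelog (arxiv_id, professor_id, abstract, openaccess_pdf); the real schema and signatures may differ.

```python
import sqlite3

# Hypothetical schema; only the column names come from the changelog.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    professor_id INTEGER NOT NULL,
    arxiv_id TEXT,
    title TEXT,
    abstract TEXT,
    openaccess_pdf TEXT
)
"""

# Column names are interpolated into SQL, so restrict them to a whitelist.
UPDATABLE_FIELDS = {"title", "abstract", "openaccess_pdf"}


def update_paper_by_arxiv_id(conn, arxiv_id, professor_id, **fields):
    """Update the given fields of one professor's paper; return rows changed."""
    unknown = set(fields) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if not fields:
        return 0
    assignments = ", ".join(f"{name} = ?" for name in fields)
    params = [*fields.values(), arxiv_id, professor_id]
    cur = conn.execute(
        f"UPDATE papers SET {assignments} WHERE arxiv_id = ? AND professor_id = ?",
        params,
    )
    conn.commit()
    return cur.rowcount


def delete_paper(conn, paper_id):
    """Delete a paper by its database ID; return rows removed (0 or 1)."""
    cur = conn.execute("DELETE FROM papers WHERE id = ?", (paper_id,))
    conn.commit()
    return cur.rowcount
```

Matching on both arxiv_id and professor_id presumably keeps an update from touching another professor's copy of the same co-authored paper.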

Changed

  • Paper Fetching Flow: OpenAlex → save to DB → arXiv enrichment (abstract + PDF URL)

  • arXiv Crawler: fetch_by_titles() now supports progressive query degradation (full title → 5 words → 3 words) with Jaccard similarity filtering

  • Author Verification: _is_author_match() handles initials, last-name-only, and case-insensitive matching

  • Frontend: all POST routes now parse JSON defensively with request.get_json(silent=True)
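
A sketch of progressive query degradation with Jaccard filtering. It assumes the shortened queries keep the first 5 and first 3 words of the title, scores candidates by word-level Jaccard similarity |A ∩ B| / |A ∪ B|, and accepts matches at 0.8 or above; the word selection, the threshold, find_paper_by_title, and the injected search callable are all hypothetical, not taken from fetch_by_titles().

```python
import re


def tokenize(text):
    # Lowercased alphanumeric words; punctuation and case are ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Word-set Jaccard similarity: |A ∩ B| / |A ∪ B|.
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def degraded_queries(title, cutoffs=(5, 3)):
    # Full title first, then shorter prefixes (assumed: first N words).
    words = title.split()
    queries = [title]
    for n in cutoffs:
        if len(words) > n:
            queries.append(" ".join(words[:n]))
    return queries


def find_paper_by_title(title, search, threshold=0.8):
    """Try each query in turn; return the closest result at or above threshold."""
    for query in degraded_queries(title):
        candidates = [r for r in search(query) if jaccard(title, r["title"]) >= threshold]
        if candidates:
            return max(candidates, key=lambda r: jaccard(title, r["title"]))
    return None
```

Here search is any callable that returns dicts with a "title" key, for example a thin wrapper around an arXiv query; shorter queries widen recall, while the Jaccard check against the full title keeps precision.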
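
One possible reading of the name rules listed for _is_author_match(), written as a standalone hypothetical function. The tokenization and the requirement that the final tokens (last names) agree are assumptions; "Last, First" ordering and multi-word surnames are not handled.

```python
import re


def _name_tokens(name):
    # Case-insensitive word tokens; "J.-P." -> ["j", "p"].
    return re.findall(r"\w+", name.casefold())


def is_author_match(target, candidate):
    """Hypothetical re-implementation of the rules described for _is_author_match()."""
    t, c = _name_tokens(target), _name_tokens(candidate)
    if not t or not c or t[-1] != c[-1]:
        return False  # assumed: last names must agree exactly
    t_given, c_given = t[:-1], c[:-1]
    if not t_given or not c_given:
        return True  # last-name-only on either side
    a, b = t_given[0], c_given[0]
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]  # an initial matches a full first name or another initial
    return a == b
```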

Fixed

  • OpenAlex arXiv source matching: exact == "arXiv" → case-insensitive substring match (handles "arXiv (Cornell University)" / "ArXiv.org")

  • arXiv ID version stripping: 2512.02589v2 → 2512.02589 for consistent DB keys

  • Second crawl no longer re-processes all existing professors (tracks existing_ids_before)
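
A sketch combining the two arXiv fixes with the OpenAlex link extraction listed under Added. It reads the locations, source.display_name, landing_page_url, pdf_url and open_access.oa_url fields of an OpenAlex work record; the helper names and the new-style ID regex (YYMM.NNNN or YYMM.NNNNN) are assumptions, and old-style IDs such as hep-th/9901001 are not matched.

```python
import re

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN) with an optional version suffix.
ARXIV_URL_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)


def is_arxiv_source(display_name):
    # Case-insensitive substring match instead of display_name == "arXiv".
    return bool(display_name) and "arxiv" in display_name.lower()


def strip_version(arxiv_id):
    # "2512.02589v2" -> "2512.02589", so DB keys do not depend on the version.
    return re.sub(r"v\d+$", "", arxiv_id)


def extract_arxiv_id(work):
    """Return a version-less arXiv ID from an OpenAlex work dict, or None."""
    urls = []
    for loc in work.get("locations") or []:
        source = loc.get("source") or {}
        if is_arxiv_source(source.get("display_name")):
            urls += [loc.get("landing_page_url"), loc.get("pdf_url")]
    # Fall back to the open-access URL, which may also point at arXiv.
    urls.append((work.get("open_access") or {}).get("oa_url"))
    for url in urls:
        match = ARXIV_URL_RE.search(url or "")
        if match:
            return strip_version(match.group(1))
    return None  # e.g. a conference or journal paper with no arXiv copy
```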

[0.1.0] - 2026-04-25

Added

  • Analyzer Module - LLM-powered professor analysis and cold email generation
    • Auto-generate professor analysis report + cold email draft on first chat

    • Multi-round conversation to refine cold emails

    • Personalized generation based on user Profile (CV/PS/papers)

  • Profile Page - Complete user profile management
    • CV/PS PDF upload and text extraction

    • arXiv paper link addition and parsing

    • Research preference settings

  • Professor Matching Scoring - LLM-driven scoring system
    • Direction Match Score (1-5): how closely the professor's research direction matches the user's

    • Admission Difficulty Score (1-5): Admission difficulty assessment

    • Background polling that scores professors automatically, with configurable iterations

  • Homepage Crawler - HTTP + LLM summary
    • Automatically fetch professor personal homepages

    • AI extraction of research focus, recruiting status, and a content summary

    • Extract recent paper titles from the homepage for precise arXiv search

  • arXiv Title Search - Precise paper fetching by title
    • Primary flow: extract paper titles from the homepage, then search arXiv by exact title

    • Author verification on every result to prevent name collisions

    • Fallback to author-name search when the homepage lacks a publication list

  • LLM Config Modal - Configure API Key, model, URL, temperature, etc.

  • Chat Page Improvements
    • User/AI avatar distinction

    • Message deletion feature

    • “Analyzing…” loading animation

    • Auto-scroll messages

  • Web Interface Improvements
    • Top bar displays Avg Match / Avg Diff statistics

    • Professor detail paper titles link to arXiv

    • Simplified Basic Info / Metrics layout

Changed

  • Added api_infra module for unified LLM client calls

  • Added utils/pdf_extract.py for PDF text extraction, decoupling scorer and analyzer

  • Database: added direction_match_score, admission_difficulty_score, homepage_summary, and messages fields

[0.0.1] - 2026-04-21

Added

  • ArxivCrawler: Search papers by author

  • CLI command fetch-papers: Batch fetch professor papers

  • CLI commands stats / list: Database queries

  • Web frontend: Professor browsing, filtering, priority marking

Changed

  • Main entry moved to root main.py

  • Simplified project structure

[0.0.0] - 2026-04-19

Added

  • Initial project structure

  • uv dependency management

  • Sphinx documentation system

  • CSRankings crawler basic implementation

  • SQLite database models