Changelog
=========

All major changes to PhD Hunter will be recorded here.

[0.1.1] - 2026-04-26
---------------------

Added
~~~~~

* **OpenAlex Crawler** - Replaced arXiv author search as primary paper source
    * Institution + author matching for accurate professor identification
    * arXiv link extraction from OpenAlex ``locations`` / ``open_access`` fields
    * Graceful handling of non-arXiv papers (conference/journal work without arXiv ID)
* **arXiv Abstract Enrichment** - Post-process OpenAlex papers with accurate arXiv abstracts
    * ``ArxivCrawler.fetch_by_ids()``: batch query arXiv by ID list
    * Updates DB ``abstract`` and ``openaccess_pdf`` fields when arXiv data is better
* **Professor Modal Enhancements**
    * **Rescore** button: re-run LLM scoring after paper edits
    * **Add Paper**: paste arXiv URL to manually add a paper (with author verification)
    * **Delete Paper**: remove incorrect papers via × button
* **Scorer Daemon Reliability**
    * Persistent event loop in daemon thread (avoids ``event loop closed`` errors)
    * Reduced polling frequency (30s) and inter-professor delay (5s) to avoid API rate limits
* **Database**
    * ``update_paper_by_arxiv_id()``: update paper fields by arxiv_id + professor_id
    * ``delete_paper()``: delete paper by database ID

Changed
~~~~~~~

* **Paper Fetching Flow**: OpenAlex → save to DB → arXiv enrichment (abstract + PDF URL)
* **arXiv Crawler**: ``fetch_by_titles()`` now supports progressive query degradation (full title → 5 words → 3 words) with Jaccard similarity filtering
* **Author Verification**: ``_is_author_match()`` handles initials, last-name-only, and case-insensitive matching
* **Frontend**: defensive ``request.get_json(silent=True)`` across all POST routes

Fixed
~~~~~

* OpenAlex arXiv source matching: exact ``== "arXiv"`` → case-insensitive substring match (handles ``"arXiv (Cornell University)"`` / ``"ArXiv.org"``)
* arXiv ID version stripping: ``2512.02589v2`` → ``2512.02589`` for consistent DB keys
* Second crawl no longer re-processes all existing professors (tracks ``existing_ids_before``)

[0.1.0] - 2026-04-25
---------------------

Added
~~~~~

* **Analyzer Module** - LLM-powered professor analysis and cold email generation
    * Auto-generate professor analysis report + cold email draft on first chat
    * Multi-round conversation to refine cold emails
    * Personalized generation based on user Profile (CV/PS/papers)
* **Profile Page** - Complete user profile management
    * CV/PS PDF upload and text extraction
    * arXiv paper link addition and parsing
    * Research preference settings
* **Professor Matching Scoring** - LLM-driven scoring system
    * Direction Match Score (1-5): Research direction matching degree
    * Admission Difficulty Score (1-5): Admission difficulty assessment
    * Background auto-polling scoring with configurable iterations
* **Homepage Crawler** - HTTP + LLM summary
    * Automatically fetch professor personal homepages
    * AI extraction of research focus, recruiting status, content summary
    * **Extract recent paper titles** from homepage for precise arXiv search
* **arXiv Title Search** - Precise paper fetching by title
    * Primary flow: extract paper titles from homepage, search arXiv by exact title
    * Author verification on every result to prevent name collisions
    * Fallback to author-name search when homepage lacks publication list
* **LLM Config Modal** - Configure API Key, model, URL, temperature, etc.
* **Chat Page Improvements**
    * User/AI avatar distinction
    * Message deletion feature
    * "Analyzing..." loading animation
    * Auto-scroll messages
* **Web Interface Improvements**
    * Top bar displays Avg Match / Avg Diff statistics
    * Professor detail paper titles link to arXiv
    * Simplified Basic Info / Metrics layout

Changed
~~~~~~~

* Added ``api_infra`` module for unified LLM client calls
* Added ``utils/pdf_extract.py`` for PDF text extraction, decoupling scorer and analyzer
* Database tables added ``direction_match_score``, ``admission_difficulty_score``, ``homepage_summary``, ``messages`` fields

[0.0.1] - 2026-04-21
---------------------

Added
~~~~~

* ArxivCrawler: Search papers by author
* CLI command ``fetch-papers``: Batch fetch professor papers
* CLI commands ``stats`` / ``list``: Database queries
* Web frontend: Professor browsing, filtering, priority marking

Changed
~~~~~~~

* Main entry moved to root ``main.py``
* Simplified project structure

[0.0.0] - 2026-04-19
---------------------

Added
~~~~~

* Initial project structure
* uv dependency management
* Sphinx documentation system
* CSRankings crawler basic implementation
* SQLite database models