Changelog
All major changes to PhD Hunter will be recorded here.
[0.1.1] - 2026-04-26
Added
- OpenAlex Crawler - Replaced arXiv author search as primary paper source
Institution + author matching for accurate professor identification
arXiv link extraction from OpenAlex locations/open_access fields
Graceful handling of non-arXiv papers (conference/journal work without arXiv ID)
- arXiv Abstract Enrichment - Post-process OpenAlex papers with accurate arXiv abstracts
ArxivCrawler.fetch_by_ids(): batch query arXiv by ID list
Updates DB abstract and openaccess_pdf fields when arXiv data is better
- Professor Modal Enhancements
Rescore button: re-run LLM scoring after paper edits
Add Paper: paste arXiv URL to manually add a paper (with author verification)
Delete Paper: remove incorrect papers via × button
- Scorer Daemon Reliability
Persistent event loop in daemon thread (avoids "event loop closed" errors)
Reduced polling frequency (30s) and inter-professor delay (5s) to avoid API rate limits
- Database
update_paper_by_arxiv_id(): update paper fields by arxiv_id + professor_id
delete_paper(): delete paper by database ID
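The persistent event loop noted under Scorer Daemon Reliability could look roughly like the sketch below. It is an illustration only: the class name LoopThread and the score() coroutine are hypothetical and are not taken from the PhD Hunter code base. The idea is that one loop is created once and kept running in a daemon thread, so later calls never hit a loop that was already closed.

```python
import asyncio
import threading


class LoopThread:
    """Hypothetical helper: owns one event loop that lives in a daemon thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Bind the loop to this thread and keep it alive until stop() is called.
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def submit(self, coro):
        # Schedule a coroutine from any thread; returns a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(coro, self._loop)

    def stop(self) -> None:
        # Ask the loop to stop from outside its thread, then clean up.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def score(name: str) -> str:
    # Stand-in for an async LLM scoring call (assumption for illustration).
    await asyncio.sleep(0)
    return f"scored:{name}"
```

Each submit() call reuses the same loop, in contrast to calling asyncio.run() per professor, which creates and closes a fresh loop every time.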
Changed
Paper Fetching Flow: OpenAlex → save to DB → arXiv enrichment (abstract + PDF URL)
arXiv Crawler: fetch_by_titles() now supports progressive query degradation (full title → 5 words → 3 words) with Jaccard similarity filtering
Author Verification: _is_author_match() handles initials, last-name-only, and case-insensitive matching
Frontend: defensive request.get_json(silent=True) across all POST routes
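The progressive query degradation with Jaccard filtering described for fetch_by_titles() can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes the shortened queries use the first 5 and first 3 words, that similarity is computed on lowercase word sets, and that 0.6 is a reasonable threshold. The helper names (degraded_queries, jaccard, best_match) and the search callable are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercase alphanumeric word set used for the similarity comparison.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the two token sets; 0.0 if either is empty.
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def degraded_queries(title: str) -> list[str]:
    # Full title, then first 5 words, then first 3 words (duplicates dropped).
    words = title.split()
    candidates = [" ".join(words), " ".join(words[:5]), " ".join(words[:3])]
    queries: list[str] = []
    for query in candidates:
        if query and query not in queries:
            queries.append(query)
    return queries


def best_match(title: str, search, threshold: float = 0.6):
    # Try each query in turn; accept the most similar hit above the threshold.
    for query in degraded_queries(title):
        scored = [(jaccard(title, hit), hit) for hit in search(query)]
        scored = [pair for pair in scored if pair[0] >= threshold]
        if scored:
            return max(scored)[1]
    return None
```

Filtering every result against the original full title, rather than against the shortened query, keeps the looser queries from admitting unrelated papers.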
Fixed
OpenAlex arXiv source matching: exact == "arXiv" → case-insensitive substring match (handles "arXiv (Cornell University)" / "ArXiv.org")
arXiv ID version stripping: 2512.02589v2 → 2512.02589 for consistent DB keys
Second crawl no longer re-processes all existing professors (tracks existing_ids_before)
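The two arXiv-related fixes above can be illustrated with a short sketch. The function names strip_arxiv_version and is_arxiv_source are hypothetical and chosen for this example; the actual code may be organised differently.

```python
import re

# A trailing "v" followed by digits is the arXiv version suffix.
_VERSION_SUFFIX = re.compile(r"v\d+$")


def strip_arxiv_version(arxiv_id: str) -> str:
    # "2512.02589v2" and "2512.02589" both map to the same DB key.
    return _VERSION_SUFFIX.sub("", arxiv_id.strip())


def is_arxiv_source(display_name) -> bool:
    # Case-insensitive substring match instead of an exact == "arXiv" comparison.
    return "arxiv" in (display_name or "").lower()
```

With these helpers, a source labelled "arXiv (Cornell University)" or "ArXiv.org" is recognised as arXiv, and a versioned ID such as 2512.02589v2 is stored under 2512.02589.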
[0.1.0] - 2026-04-25
Added
- Analyzer Module - LLM-powered professor analysis and cold email generation
Auto-generate professor analysis report + cold email draft on first chat
Multi-round conversation to refine cold emails
Personalized generation based on user Profile (CV/PS/papers)
- Profile Page - Complete user profile management
CV/PS PDF upload and text extraction
arXiv paper link addition and parsing
Research preference settings
- Professor Matching Scoring - LLM-driven scoring system
Direction Match Score (1-5): how well the professor's research direction matches the user's
Admission Difficulty Score (1-5): estimated difficulty of admission
Background auto-polling scoring with configurable iterations
- Homepage Crawler - HTTP + LLM summary
Automatically fetch professor personal homepages
AI extraction of research focus, recruiting status, content summary
Extract recent paper titles from homepage for precise arXiv search
- arXiv Title Search - Precise paper fetching by title
Primary flow: extract paper titles from homepage, search arXiv by exact title
Author verification on every result to prevent name collisions
Fallback to author-name search when homepage lacks publication list
- LLM Config Modal - Configure API Key, model, URL, temperature, etc.
- Chat Page Improvements
User/AI avatar distinction
Message deletion feature
“Analyzing…” loading animation
Auto-scroll messages
- Web Interface Improvements
Top bar displays Avg Match / Avg Diff statistics
Professor detail paper titles link to arXiv
Simplified Basic Info / Metrics layout
Changed
Added api_infra module for unified LLM client calls
Added utils/pdf_extract.py for PDF text extraction, decoupling scorer and analyzer
Database tables: added direction_match_score, admission_difficulty_score, homepage_summary, and messages fields
[0.0.1] - 2026-04-21
Added
ArxivCrawler: Search papers by author
CLI command fetch-papers: Batch fetch professor papers
CLI commands stats/list: Database queries
Web frontend: Professor browsing, filtering, priority marking
Changed
Main entry moved to root main.py
Simplified project structure
[0.0.0] - 2026-04-19
Added
Initial project structure
uv dependency management
Sphinx documentation system
CSRankings crawler basic implementation
SQLite database models