API Reference

This project exposes two external interfaces, a command-line interface (CLI) and a web REST API built on Flask. The Database class is also available for direct use from Python.

Web REST API

Base URL: http://localhost:8080

The web frontend interacts with the backend through the following JSON APIs:

API Endpoints
Method	Path	Description
`GET`	`/`	Serve homepage (return `index.html`)
`GET`	`/api/stats`	Get statistics (professor count, paper count, average scores, etc.)
`GET`	`/api/professors`	List all professors
`GET`	`/api/professor/<int:id>`	Get single professor details
`POST`	`/api/professor/<int:id>/priority`	Update professor priority
`GET`	`/api/chat/<int:id>`	Get chat messages for a professor
`POST`	`/api/chat/<int:id>`	Send message or trigger first-time analysis
`DELETE`	`/api/chat/<int:id>/message`	Delete a chat message
`GET`	`/api/hound-config`	Get LLM configuration
`POST`	`/api/hound-config`	Update LLM configuration
`GET`	`/api/profile`	Get applicant profile
`POST`	`/api/profile`	Update applicant profile
`POST`	`/api/profile/cv`	Upload CV PDF
`POST`	`/api/profile/ps`	Upload PS PDF
`POST`	`/api/profile/papers`	Add arXiv paper links
`POST`	`/api/arxiv/resolve`	Resolve arXiv URL to metadata

Detailed examples:

import requests

# Get statistics
stats = requests.get('http://localhost:8080/api/stats').json()
print(f"Professors: {stats['professors']}, Papers: {stats['papers']}")

# Get professor list
data = requests.get('http://localhost:8080/api/professors').json()
professors = data['professors']

# Update priority
response = requests.post(
    'http://localhost:8080/api/professor/1/priority',
    json={'priority': 1}
)
# Returns: {"success": true, "priority": 1}

# Send chat message
response = requests.post(
    'http://localhost:8080/api/chat/1',
    json={'message': 'Can you make the email shorter?'}
)
# Returns: {"success": true, "response": "..."}
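The other endpoints in the table follow the same pattern. As a minimal sketch using only the standard library, the snippet below builds requests for a few of them without sending anything. `build_request` and `send` are hypothetical helpers, not part of the project, and the `{"url": ...}` body for `/api/arxiv/resolve` is an assumption, because the payload schema is not documented on this page.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"  # base URL given on this page


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request for one of the documented endpoints."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # JSON-encode the body and label it, as the examples above do via `json=`.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# GET /api/chat/<int:id>: chat messages for professor 1.
chat_req = build_request("GET", "/api/chat/1")

# GET /api/hound-config: current LLM configuration.
config_req = build_request("GET", "/api/hound-config")

# POST /api/arxiv/resolve: the {"url": ...} body shape is an assumption.
resolve_req = build_request(
    "POST", "/api/arxiv/resolve", {"url": "https://arxiv.org/abs/1706.03762"}
)
```

With the Flask server running, `send(chat_req)` would perform the call. Keeping construction separate from sending makes it possible to inspect the method, URL, and body before anything reaches the backend.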

Note

The web server entry point is src/phd_hunter/frontend/app.py. Static files (HTML/CSS/JS) are served from src/phd_hunter/frontend/static/.

Command Line Reference

Main entry: python main.py

Commands

phd-hunter crawl [OPTIONS]
phd-hunter fetch-papers [OPTIONS]
phd-hunter stats
phd-hunter list [OPTIONS]

For details, see Architecture Overview.
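For scripting, the commands above can be driven from Python. The following is only a sketch: `build_argv` and `run_cli` are hypothetical helpers, it assumes the `phd-hunter` executable is on `PATH`, and it passes no options because the `[OPTIONS]` flags are not listed on this page.

```python
import shutil
import subprocess
from typing import List, Optional

# Subcommands exactly as listed on this page.
SUBCOMMANDS = ("crawl", "fetch-papers", "stats", "list")


def build_argv(subcommand: str, extra: Optional[List[str]] = None) -> List[str]:
    """Return the argument list for one documented subcommand."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return ["phd-hunter", subcommand, *(extra or [])]


def run_cli(subcommand: str, extra: Optional[List[str]] = None) -> Optional[str]:
    """Run the command and return its stdout, or None if phd-hunter is not installed."""
    argv = build_argv(subcommand, extra)
    if shutil.which(argv[0]) is None:
        return None
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_cli("stats")` would return whatever the `stats` command prints. Any flags would go in `extra` once they are confirmed from the CLI's own help output.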

Database API

Direct programmatic access is available through the phd_hunter.database.Database class.

Quick example:

from phd_hunter.database import Database
from phd_hunter.models import Professor, University

db = Database(db_path="phd_hunter.db")

# List professors
professors = db.list_professors(limit=10)

# Get single professor
prof = db.get_professor(prof_id=1)

# Get professor papers
papers = db.get_papers_by_professor(professor_id=1)

# Export to JSON
db.export_to_json("output.json")

Database Class

Main Methods

Connection and Initialization

__init__(db_path: str = "phd_hunter.db"): Initializes the database connection and creates the tables.

Professor Operations

list_professors(status, min_match_score, limit) -> List[Dict]: Lists professors, with optional filtering.
get_professor(prof_id) -> Optional[Dict]: Gets a professor by ID.
get_professor_by_name(name, university_name) -> Optional[Dict]: Gets a professor by name, optionally restricted to a university.
upsert_professor(prof: Professor, university: University) -> int: Inserts or updates a professor record and returns its database ID.
update_professor_scores(professor_id, direction_match, admission_difficulty): Updates a professor's matching scores.
update_professor_messages(professor_id, messages): Updates a professor's chat messages.
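As an illustration of how two of these methods might be combined, the sketch below updates one professor's scores and then lists the professors above a score threshold. The keyword names come from the signatures above. The argument types, the use of `status=None` to mean "no status filter", and the helper `rescore_and_list` itself are assumptions. The function accepts any object with these two methods, so it can be tried against a stand-in before touching a real database.

```python
from typing import Any, Dict, List, Optional, Protocol


class ProfessorStore(Protocol):
    """The two Database methods used below; argument types are assumed."""

    def list_professors(
        self, status: Optional[str], min_match_score: Optional[float], limit: Optional[int]
    ) -> List[Dict[str, Any]]: ...

    def update_professor_scores(
        self, professor_id: int, direction_match: float, admission_difficulty: float
    ) -> None: ...


def rescore_and_list(
    db: ProfessorStore,
    professor_id: int,
    direction_match: float,
    admission_difficulty: float,
    min_match_score: float,
    limit: int = 10,
) -> List[Dict[str, Any]]:
    """Store new scores for one professor, then list professors above a threshold."""
    db.update_professor_scores(
        professor_id=professor_id,
        direction_match=direction_match,
        admission_difficulty=admission_difficulty,
    )
    # status=None is assumed to disable the status filter.
    return db.list_professors(status=None, min_match_score=min_match_score, limit=limit)
```

With the real class this could be called as `rescore_and_list(Database(db_path="phd_hunter.db"), 1, 0.9, 0.4, min_match_score=0.5)`. The score values here are illustrative, since the scale is not specified on this page.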

Paper Operations

get_papers_by_professor(professor_id, limit) -> List[Dict]: Gets the papers for a professor, up to limit.
upsert_paper(professor_id, paper_data) -> int: Inserts or updates a paper record.
get_professor_with_papers(professor_id) -> Optional[Dict]: Gets a professor together with all of their papers (joined query).
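A short sketch of reading papers through these methods follows. It relies only on `get_professor` and `get_papers_by_professor` as listed above. The `"title"` key in each paper dictionary is an assumption, because the row schema is not shown on this page, and `paper_titles` is a hypothetical helper.

```python
from typing import Any, List


def paper_titles(db: Any, professor_id: int, limit: int = 5) -> List[str]:
    """Return up to `limit` paper titles for a professor, or [] if the ID is unknown."""
    # get_professor is documented to return None when no record matches.
    if db.get_professor(professor_id) is None:
        return []
    papers = db.get_papers_by_professor(professor_id=professor_id, limit=limit)
    # "title" is an assumed key; fall back to a placeholder if it is absent.
    return [paper.get("title", "<untitled>") for paper in papers]
```

Checking `get_professor` first distinguishes an unknown ID from a professor who simply has no papers stored yet. If both the profile and the papers are needed, `get_professor_with_papers` would avoid the second query.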
