JOB SCOUT — MULTI-SOURCE SCRAPER
Each run fetches listings from 6+ job boards simultaneously: Indeed RSS feeds (30+ targeted searches), USAJobs federal API, Adzuna API, RemoteOK, Remotive, Jobicy, and We Work Remotely. A rotating keyword bank expands search coverage across successive runs so the same queries are never repeated back-to-back.
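The rotating keyword bank can be sketched as a persisted offset that advances each run. This is a minimal illustration; the bank contents, file name, and function names are assumptions, not the project's actual API:

```python
# Illustrative sketch of a rotating keyword bank. KEYWORD_BANK contents,
# STATE_FILE name, and rotate_keywords() are assumptions for this example.
import json
from pathlib import Path

KEYWORD_BANK = [
    "python developer", "data engineer", "backend engineer",
    "automation engineer", "devops engineer", "site reliability engineer",
]

STATE_FILE = Path("keyword_state.json")

def rotate_keywords(batch_size: int = 3) -> list[str]:
    """Return the next batch of keywords, advancing a persisted offset so
    successive runs never repeat the same queries back-to-back."""
    offset = 0
    if STATE_FILE.exists():
        offset = json.loads(STATE_FILE.read_text()).get("offset", 0)
    batch = [KEYWORD_BANK[(offset + i) % len(KEYWORD_BANK)]
             for i in range(batch_size)]
    new_offset = (offset + batch_size) % len(KEYWORD_BANK)
    STATE_FILE.write_text(json.dumps({"offset": new_offset}))
    return batch
```

Because the offset wraps modulo the bank size, every keyword is eventually revisited while consecutive runs always query different slices.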
DEDUPLICATION + SEEN-JOBS CACHE
URLs are deduplicated across all sources before scoring. A persistent JSON cache tracks every seen job URL with a timestamp. Jobs are skipped for 30 days after first seen, then automatically re-enter the pool — so the agent gets deeper into the job boards over time without ever re-scoring something you already reviewed.
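A minimal sketch of the 30-day seen-jobs cache, assuming a flat JSON map of URL to first-seen timestamp (the file name and function names are illustrative):

```python
# Sketch of the seen-jobs cache: dedupe within a run, skip anything seen
# in the last 30 days, and let expired entries re-enter the pool.
# CACHE_FILE and filter_new() are illustrative names, not the real API.
import json
import time
from pathlib import Path

CACHE_FILE = Path("seen_jobs.json")
TTL_SECONDS = 30 * 24 * 3600  # jobs re-enter the pool after 30 days

def filter_new(urls: list[str]) -> list[str]:
    """Drop URLs seen within the last 30 days; record and return the rest."""
    now = time.time()
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    # prune expired entries so the cache does not grow without bound
    cache = {u: t for u, t in cache.items() if now - t < TTL_SECONDS}
    # dict.fromkeys dedupes within the run while preserving order
    fresh = [u for u in dict.fromkeys(urls) if u not in cache]
    for u in fresh:
        cache[u] = now
    CACHE_FILE.write_text(json.dumps(cache))
    return fresh
```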
LOCATION FILTER — HAVERSINE DISTANCE
Every listing is checked against a strict two-pass location filter. First, remote signals are detected in the title, location field, and description. Then, non-remote jobs are run through coordinate lookup against a database of 60+ cities using the Haversine formula. Only remote jobs or jobs within 50 miles of home pass through. Listings in far-away states are hard-rejected before coordinate lookup runs.
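The Haversine distance check behind the 50-mile radius filter looks roughly like this (the example coordinates are illustrative, not entries from the project's city database):

```python
# Haversine great-circle distance, as used by the 50-mile location filter.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def within_radius(home: tuple, city: tuple, radius_miles: float = 50) -> bool:
    """True when a job's city falls inside the commute radius."""
    return haversine_miles(*home, *city) <= radius_miles
```

New York to Philadelphia, for example, comes out to roughly 80 miles, so a Philadelphia listing would be rejected for a New York home base under the 50-mile cutoff.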
CLAUDE AI SCORING — HAIKU MODEL
Each qualifying listing is scored 1–10 by Claude Haiku against a detailed candidate profile covering target roles, skills, pay requirements, and location preferences. The per-job prompt is kept under 200 tokens and the profile is sent as a cached system prompt, keeping per-run API costs under $0.05. Score rationale is stored with each result for human review.
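The request shape for a cached system prompt follows Anthropic's prompt-caching convention: the large, unchanging candidate profile goes in a system block marked `cache_control`, while the short per-job prompt varies. The profile text and model alias below are placeholders, and the dict is what would be passed to `client.messages.create()`:

```python
# Sketch of the scoring request with a cached system prompt. The profile
# text and model name are placeholders; the real profile is much longer.
CANDIDATE_PROFILE = "Target roles, skills, pay floor, location preferences..."

def build_scoring_request(job_title: str, job_snippet: str) -> dict:
    """Build kwargs for client.messages.create(). The large candidate
    profile sits in the system block with cache_control, so repeated
    runs reuse the cached prefix instead of paying for it each time."""
    return {
        "model": "claude-3-5-haiku-latest",
        "max_tokens": 150,
        "system": [{
            "type": "text",
            "text": CANDIDATE_PROFILE,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{
            "role": "user",
            # keep the per-job portion short (under ~200 tokens)
            "content": f"Score this job 1-10 for fit: {job_title}\n{job_snippet}",
        }],
    }
```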
DIGEST EMAIL + GOOGLE SHEETS LOGGING
All qualifying jobs are batched into a single HTML digest email sent via Gmail API — no per-job emails, no PDF attachments. Each job entry shows score, company, location, pay range, rationale, and a direct link to the posting. Results are simultaneously logged to Google Sheets for historical tracking.
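Batching the run's results into one HTML message can be sketched with the standard library's MIME tools; the field names and minimal markup here are assumptions, and the real template is presumably styled:

```python
# Sketch of the single-email HTML digest. Job dict keys (score, company,
# location, pay, rationale, url) are illustrative field names.
from email.mime.text import MIMEText

def build_digest(jobs: list[dict]) -> MIMEText:
    """Render all qualifying jobs, highest score first, into one HTML email."""
    rows = "".join(
        f"<li><b>{j['score']}/10</b> {j['company']} | {j['location']} | "
        f"{j['pay']}<br>{j['rationale']}<br>"
        f"<a href='{j['url']}'>View posting</a></li>"
        for j in sorted(jobs, key=lambda j: j["score"], reverse=True)
    )
    html = f"<h2>Job Scout digest: {len(jobs)} matches</h2><ul>{rows}</ul>"
    msg = MIMEText(html, "html")
    msg["Subject"] = f"Job Scout: {len(jobs)} new matches"
    return msg
```

The resulting message would then be base64-encoded and handed to the Gmail API's send endpoint.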
FLASK DASHBOARD — STATUS TRACKING + ON-DEMAND GENERATION
A local Flask dashboard provides full visibility into all discovered jobs. Status tracking (Found → Applied → Interview → Offer/Rejected) is persisted to JSON. On-demand resume and cover letter generation calls Claude to tailor documents to each specific job, generating PDFs only when requested — not on every pipeline run.
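The status lifecycle persisted behind the dashboard can be sketched as a small transition map over a JSON file; the file name, function name, and exact allowed transitions are assumptions:

```python
# Sketch of the Found -> Applied -> Interview -> Offer/Rejected lifecycle.
# STATUS_FILE, ALLOWED, and set_status() are illustrative, not the real API.
import json
from pathlib import Path

STATUS_FILE = Path("job_status.json")
ALLOWED = {
    "Found": {"Applied", "Rejected"},
    "Applied": {"Interview", "Rejected"},
    "Interview": {"Offer", "Rejected"},
}

def set_status(url: str, status: str) -> bool:
    """Advance a job's status, rejecting jumps outside the lifecycle."""
    data = json.loads(STATUS_FILE.read_text()) if STATUS_FILE.exists() else {}
    current = data.get(url, "Found")  # every discovered job starts as Found
    if status != current and status not in ALLOWED.get(current, set()):
        return False
    data[url] = status
    STATUS_FILE.write_text(json.dumps(data))
    return True
```

A Flask route updating status would simply call `set_status()` and re-render the job list from the same JSON file.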
TECH STACK
- Python 3.12 — core language
- Anthropic API — AI scoring + document generation
- Flask — local web dashboard
- Gmail API + Google Sheets API — dispatch + logging
- feedparser — RSS ingestion
- ReportLab — PDF generation
- requests — API integrations
- Windows Task Scheduler — hourly automation
PROJECT HIGHLIGHTS
- Runs hourly 7am–6pm via Task Scheduler
- Searches 6+ job boards per run
- 30-day seen-job cache with auto-expiry
- Rotating keyword bank across runs
- Haversine distance filtering (50mi radius)
- AI scoring optimized to under $0.05/run
- On-demand tailored resume + cover letter
- Full status lifecycle tracking
- Single HTML digest email per run
- Built entirely as a personal job search tool
Full source code available on GitHub.
Sanitized for public release — API keys and personal data removed. Includes setup instructions and .env template.