Background
The question I set out to answer: What recurring findings of noncompliance surface across FERC's audits of electric utilities, and could that history one day predict the issues in a new filing?
The Federal Energy Regulatory Commission (FERC) audits electric utilities, ISOs, and RTOs and publishes its final audit reports (FY2015–present) at ferc.gov/audits. Each report documents specific findings of noncompliance and the corrective recommendations that follow. That is useful to anyone navigating energy regulation, but each report ships as one more standalone PDF in a pile of dozens, so the patterns across the corpus stay invisible. My use case is to read those audits as a body of evidence: which findings keep coming back, what FERC recommends, and how both differ between financial (FA) and non-financial (PA) audits. The longer-term goal is an "audit-my-document" tool that flags the likely issues in a new filing using patterns mined from this history.
How It Works
Every finding links straight back to the official FERC audit it came from. The tool surfaces what the reports already say and doesn't add anything on top. I vibe-coded the extraction-and-structuring pipeline and then pinned it down with a pytest suite so the parsing stays honest:
- Data sources: FERC's final audit reports (FY2015–present), all public, from ferc.gov/audits. A scraped index of 71 reports seeds a Python CLI pipeline (fetch → extract → structure → patterns → build). The pipeline downloads each PDF (rate-limited and cached) and classifies it by FERC form → industry to isolate the Form 1 (electric) subset: both financial (FA) and non-financial (PA) audits of electric utilities, ISOs, and RTOs. It then turns each PDF into per-page text and structures the findings of noncompliance, the corrective recommendations, and the cross-report themes. The pipeline is idempotent (re-runs skip cached downloads). v1 deliberately structures the most recent electric reports as a proof of concept; gas (Form 2) and oil (Form 6) reports are classified but out of scope for now.
- Where it comes from: every record carries the source URL of the official FERC report and the date it was captured, so any finding can be traced to the audit it came from.
- UX: a static, vanilla HTML/CSS/JS site (no backend, no framework) that reads baked JSON and is deployed to GitHub Pages.
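To make the fetch, classify, and build steps above concrete, here is a minimal sketch. It is not the project's actual code. The field names, the `fetch_cached` and `build_json` helpers, the hashed cache filename, the one-second delay, and the `"unknown"` fallback are all assumptions for illustration; only the Form 1/2/6 to electric/gas/oil mapping, the cached and rate-limited download, and the source URL plus capture date on every record come from the description. The downloader is passed in as a plain function so the sketch runs without touching the network.

```python
from __future__ import annotations

import hashlib
import json
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable

# Form → industry mapping as described above; "unknown" is an assumed fallback.
FORM_TO_INDUSTRY = {"Form 1": "electric", "Form 2": "gas", "Form 6": "oil"}


@dataclass(frozen=True)
class ReportRecord:
    """One audit report plus its provenance (field names are hypothetical)."""
    title: str
    audit_type: str   # "FA" (financial) or "PA" (non-financial)
    industry: str
    source_url: str   # link back to the official FERC report
    captured_on: str  # ISO date on which the PDF was fetched


def classify_industry(form: str) -> str:
    """Map a FERC form name to its industry."""
    return FORM_TO_INDUSTRY.get(form, "unknown")


def fetch_cached(
    url: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
    delay_s: float = 1.0,
) -> tuple[Path, bool]:
    """Return (path, was_downloaded); a cached file is never fetched again."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".pdf"
    target = cache_dir / name
    if target.exists():
        return target, False  # idempotent re-run: skip the download
    time.sleep(delay_s)  # crude rate limit: pause before each real download
    target.write_bytes(download(url))
    return target, True


def build_json(records: list[ReportRecord], in_scope: str = "electric") -> str:
    """Serialize only the in-scope records as JSON for the static site."""
    kept = [asdict(r) for r in records if r.industry == in_scope]
    return json.dumps(kept, indent=2, sort_keys=True)
```

Injecting `download` keeps the cache logic testable with a fake function in pytest, and filtering by industry at build time means gas and oil reports can stay classified in the index without reaching the site.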
A Look Inside
Each view is shown on mobile and desktop; tap any image to open the live site.



