3 Months of JobIntel: What I Built, What I Learned, What's Next

Brian Will · 13 min read
building-in-public, saas, jobintel, startup

On March 9, 2026, I launched JobIntel with 110,000 lines of code, 236 API endpoints, 4,869 automated tests, and a hypothesis: that job seekers deserve the same quality of intelligence that employers have had for decades. Three months later, I have data on whether that hypothesis holds.

This is an honest retrospective. The numbers that follow are real - good and bad. Building a SaaS in 2026 as a solo founder using AI-assisted development is a story worth telling transparently, not because everything worked, but because the lessons are more useful when you see the full picture.

Three months ago, I shared how I built the platform using AI - an 8-agent Claude Code Agent Team, 112 sprints, and the kind of test coverage that would survive a code review at any company I have worked for. That post described what I started with. Here is where it stands now.

In "Enterprise AI Adoption", I outlined a framework for how organizations should evaluate, implement, and govern AI systems. The 8-agent development team I described in "I Built an Entire SaaS Platform Using AI" was that framework in practice - sprint protocols, completion gates, security reviews, all the enterprise discipline I had prescribed for organizations with hundreds of engineers, applied to a team of one. Three months in, the framework held.

The numbers, side by side

Metric | At Launch (Mar 9) | Month 1 (Apr 9) | Month 2 (May 8) | Now (May 25)
Lines of code | 110,000 | 293,579 | 373,055 | ~400,000
Tests passing | 4,869 | 6,200+ | 6,500+ | ~6,700
API endpoints | 236 | 410 | 414 | ~425
Database migrations | 53 | 109 | 124 | ~130
Git commits | | 1,288 | 1,727 | ~1,950
Production releases | 1 | 3 | 6 | 7

Performance over the trailing 30 days, measured at the load balancer across 117,612 requests: p50 10 ms, p95 435 ms, p99 1.12 s. Full uptime 99.965% (infrastructure uptime 99.987% - the small gap between the two is from application errors, not infrastructure failures).
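For readers less familiar with latency percentiles, here is a minimal Python sketch of how p50, p95, and p99 can be derived from raw request timings. It uses the nearest-rank definition, which is an assumption: the post does not say which method the load balancer uses, and the function name and sample data are hypothetical.

```python
def nearest_rank_percentile(samples_ms: list[float], percentile: int) -> float:
    """Return the nearest-rank percentile (1-100) of a non-empty sample list."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    ordered = sorted(samples_ms)
    # Integer ceiling division avoids floating-point errors in the rank.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical data: 100 requests taking 1 ms, 2 ms, ..., 100 ms.
latencies = [float(ms) for ms in range(1, 101)]
print(nearest_rank_percentile(latencies, 50))  # 50.0
print(nearest_rank_percentile(latencies, 95))  # 95.0
print(nearest_rank_percentile(latencies, 99))  # 99.0
```

Read this way, a p99 of 1.12 s means roughly one request in a hundred took that long or longer, even though the median request finished in 10 ms.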

The numbers I want you to look at are the endpoint and test rows.

Month 1 added 174 endpoints (236 → 410). Month 2 added four. Month 3 added a handful, almost all for the India market. That is not a slowdown - it is a different kind of work showing up in the same column.

Now look at tests. Three months added ~1,800. About 200 of them came in the last 30 days, almost all from a single ticket - the Spine framework, which I will get to in a minute. The test count keeps climbing even as the endpoint count flattens. That is the shape of a codebase shifting from "ship features" to "harden what is here."

Three months of build does not look like a single trend line. It looks like three different jobs.

Three months, three different kinds of work

Month 1 was features. Mock Interview Simulator. Chrome Extension across nine boards. Geographic radius search backed by 41,000 ZIP codes. Multi-language support across English, German, Spanish, and French. Resume Builder. TOTP. Passkeys. Magic links. Coach Console Beta. Resume Intelligence v1.1 with five-version library and side-by-side JD compare. The user-visible surface of the product more than doubled.

Month 2 was stabilization. Two coordinated architecture-remediation waves in five days. The first was an autonomous overnight window on April 23 where the agent system filed 47 tickets and shipped 8 architecture fixes to main while I was asleep. The second was a single 80-ticket Phase Completion Protocol on April 26 - five parallel agents working different domains, 24 broken database relationships repaired, 89 Redis cache keys consolidated into one registry, a stuck-scoring recovery system shipped with proper safety limits and a feature flag, an internal route renamed because the old name was intermittently breaking deployments. None of it ships in a release-notes bullet a user reads. All of it determines whether the platform survives the next twelve months. I wrote about Month 2 in detail in Building in Public - Month 2: The Work Changed.
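To make "consolidated into one registry" concrete, here is a minimal Python sketch of the general idea: every cache key is built from a named template in one place, so an ad-hoc key string fails loudly instead of silently creating a new key. The key names, templates, and the `cache_key` helper are all hypothetical; the post does not describe how JobIntel's registry is actually implemented.

```python
from string import Formatter

# Hypothetical registry: the single place where cache key templates live.
CACHE_KEYS = {
    "job_score": "score:job:{job_id}",
    "user_profile": "user:{user_id}:profile",
}


def cache_key(name: str, **params: object) -> str:
    """Build a cache key from a registered template, rejecting bad input."""
    template = CACHE_KEYS[name]  # raises KeyError for unregistered names
    # Collect the placeholder names the template expects.
    expected = {field for _, field, _, _ in Formatter().parse(template) if field}
    if expected != set(params):
        raise ValueError(f"{name} expects {sorted(expected)}, got {sorted(params)}")
    return template.format(**params)


print(cache_key("job_score", job_id=42))  # score:job:42
```

The design choice is that callers never write a key string themselves, which is what makes a later rename or audit a one-file change.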

Month 3 was about smarter gates. The work shifted again, this time from "harden the platform" to "harden the system that builds the platform." Three threads ran in parallel: a new test framework (Spine + Page Integrity Audit) that makes "click and nothing happens" structurally impossible to ship; the JobIntel for India launch with geo-aware INR pricing, PIN-code search, and lakhs-and-crores numbering; and a set of CI gates that close categories of bugs at the workflow layer rather than the code layer.
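Lakhs-and-crores numbering groups digits differently from the Western convention: the last three digits form one group, and every group to the left of that has two digits. A minimal Python sketch, assuming whole-rupee integer amounts; the function name `group_indian` is hypothetical and is not taken from the JobIntel codebase.

```python
def group_indian(amount: int) -> str:
    """Group digits Indian-style: last three digits, then pairs to the left."""
    sign = "-" if amount < 0 else ""
    digits = str(abs(amount))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    # Split the leading digits into groups of two, working from the right.
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups) + "," + tail


print(group_indian(1500000))   # 15,00,000 (15 lakh)
print(group_indian(25000000))  # 2,50,00,000 (2.5 crore)
```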

The system catches its own bugs

The most important thing that happened in three months is a pattern, not a feature. The build started catching its own mistakes.

The admin authentication regression (Month 2). A change to admin auth, if shipped, would have locked everyone out of the admin console including me. The Preview Verification Suite I built in March - 31 checks covering page rendering, visual changes, basic functionality, and file downloads - refused to go green. The deployment did not progress past preview. I rolled the change back, fixed the gate logic, ran the suite again, and shipped clean. The PVS exists because of a March deployment that passed every test I had at the time and still failed for real users; the lesson stuck.

The placeholder content near-leak (Month 3). A blog post I had pre-staged with bracketed placeholder markers - the kind that say "fill in the real platform number here" - and a deliberately far-future publish date got its date flipped to the past somewhere along the way. The publish workflow synced it to S3. It started rendering on Preview with 23 unfilled markers visible. Production only escaped by accident: a separate Terraform drift had Prod still serving from the old bundled image. The next Prod deploy would have shipped placeholders to real readers. I deleted the offending file and added a workflow gate that scans every push for nine flavors of unfilled marker (the kinds of bracketed TODOs and placeholders that have no business reaching customers). Push fails before any AWS calls if any are found. This piece, the one you are reading, is the first long-form blog post I have edited under that gate.
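A gate like this can be quite small. The Python sketch below shows the general shape under stated assumptions: the three marker patterns are hypothetical stand-ins (the post does not enumerate the nine flavors the real gate checks), and `gate_exit_code` is an illustrative name for the step that would fail the push before any upload happens.

```python
import re

# Hypothetical marker patterns; the real gate's nine flavors are not listed in the post.
MARKER_PATTERNS = [
    re.compile(r"\[(?:TODO|TBD|PLACEHOLDER)\b[^\]]*\]", re.IGNORECASE),
    re.compile(r"\[\[[^\]]+\]\]"),
    re.compile(r"\{\{[^}]+\}\}"),
]


def find_unfilled_markers(text: str) -> list[tuple[int, str]]:
    """Return (line number, marker text) for every unfilled marker found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in MARKER_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits


def gate_exit_code(text: str) -> int:
    """Return 1 (fail the push) if any marker is present, otherwise 0."""
    return 1 if find_unfilled_markers(text) else 0


draft = "Tests passing: [TODO: real number]\nAll good here.\nUptime: {{uptime}}"
print(find_unfilled_markers(draft))
print(gate_exit_code(draft))  # 1
```

Running the check before any network call is what keeps a bad draft from ever reaching storage, rather than relying on cleanup afterwards.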

The Spine framework's first real bug (Month 3). Spine is the new test framework: every declared interactive element must produce an explicit ExpectedSideEffect - navigation, modal, toast, network call, or visible UI change. The first time I ran it across the full admin surface, it surfaced a real accessibility bug on the admin announcements Switch component. Not a regression. A latent bug that had been there since the announcement system shipped, invisible to every other test I had. That is the value of a test framework that asserts on shape, not on behavior: it finds the bugs the behavioral tests cannot.
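The core rule can be sketched in a few lines of Python. This is an illustration of the idea only, not the Spine implementation: the five effect kinds come from the description above, while the `InteractiveElement` class, the selectors, and the `undeclared_elements` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExpectedSideEffect(Enum):
    NAVIGATION = "navigation"
    MODAL = "modal"
    TOAST = "toast"
    NETWORK_CALL = "network_call"
    VISIBLE_UI_CHANGE = "visible_ui_change"


@dataclass(frozen=True)
class InteractiveElement:
    selector: str
    expected: Optional[ExpectedSideEffect] = None  # None means nothing was declared


def undeclared_elements(elements: list[InteractiveElement]) -> list[str]:
    """Return selectors of elements with no declared side effect."""
    return [element.selector for element in elements if element.expected is None]


# Hypothetical page inventory.
page = [
    InteractiveElement("#save", ExpectedSideEffect.TOAST),
    InteractiveElement("#open-help", ExpectedSideEffect.MODAL),
    InteractiveElement("#announcements-switch"),  # nothing declared: audit fails
]
print(undeclared_elements(page))  # ['#announcements-switch']
```

A real audit would presumably also drive the page and verify that the declared effect occurs; the sketch covers only the declaration half of the rule.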

The pattern across all three: the system is starting to catch the mistakes that human review would also catch, but faster and more consistently. That is the difference between Month 1, where I caught bugs by reading code, and Month 3, where the build catches them by running.

Investing in how the work gets done

A lot of Month 2 and Month 3 went into the meta-work of running the build itself. Most of this is invisible to users. All of it changes what kind of work I can do tomorrow.

My agent instructions file, CLAUDE.md, was 1,774 lines on April 21. It is 388 lines today - a 78% cut in one afternoon. The agents got faster. The cost-per-task dropped. The signal-to-noise ratio in the file got dramatically better.

Seven automated pre-commit hooks went live in late April and have been running on every commit since. One catches a class of frontend build-arg bugs that used to silently break the production bundle. One stops an IAM policy from growing past the size the cloud provider will accept. One scans for security checks someone tried to silence with a comment. One catches when a feature name from the codebase shows up in a marketing draft before the feature has actually shipped - which sounds paranoid until you remember that shipping a press release about a feature that was reverted last week is exactly the kind of avoidable embarrassment a check can prevent.
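The marketing-draft check is the easiest of these to picture. A minimal Python sketch, assuming a mapping of feature names to a shipped flag is available to the hook; the feature names and the `unshipped_mentions` function are hypothetical, and the post does not say how the real hook gets its feature list.

```python
import re


def unshipped_mentions(draft: str, features: dict[str, bool]) -> list[str]:
    """Return names of unshipped features that appear in the draft text."""
    flagged = []
    for name, shipped in features.items():
        if shipped:
            continue
        # Whole-phrase, case-insensitive match on the feature name.
        if re.search(rf"\b{re.escape(name)}\b", draft, re.IGNORECASE):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical feature list: name -> has it shipped?
features = {"Example Feature A": True, "Example Feature B": False}
draft = "This release brings Example Feature A and example feature b to everyone."
print(unshipped_mentions(draft, features))  # ['Example Feature B']
```

A hook built this way would exit non-zero whenever the returned list is not empty, which blocks the commit until the draft or the feature status changes.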

A dedicated code-reviewer agent joined the team. An architecture review workflow now runs on demand and commits its findings directly into the repository as documentation. Mutation testing - testing the tests by intentionally breaking the code - went from ad-hoc to nightly. A diff-cover ratchet enforces that any new code includes at least 90% test coverage. The import-linter is now wired in as a fitness function, so architectural rules I had been writing in documents and hoping people would follow are now enforced by the build. Anthropic spend now breaks down per feature and per user, which means I can finally answer "what does it actually cost me to run a Mock Interview" in a query instead of a guess.
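The arithmetic behind a diff-coverage ratchet is simple: look only at the lines a change touched, and ask what fraction of them the tests executed. The Python sketch below illustrates that calculation; it is not the diff-cover tool's implementation, and the function names and line numbers are hypothetical.

```python
def diff_coverage(changed_lines: set[int], covered_lines: set[int]) -> float:
    """Percentage of changed lines that were executed by the test suite."""
    if not changed_lines:
        return 100.0  # nothing changed, so nothing can be uncovered
    covered_changes = changed_lines & covered_lines
    return 100.0 * len(covered_changes) / len(changed_lines)


def passes_ratchet(changed_lines: set[int], covered_lines: set[int],
                   threshold: float = 90.0) -> bool:
    """True when the changed lines meet the minimum coverage threshold."""
    return diff_coverage(changed_lines, covered_lines) >= threshold


changed = set(range(10, 20))            # ten new lines: 10..19
covered = set(range(1, 19))             # tests executed lines 1..18
print(diff_coverage(changed, covered))  # 90.0
print(passes_ratchet(changed, covered))  # True
```

Because the check is scoped to the diff, older untested code does not block a commit, but new code cannot lower the bar.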

The two newest doc-drift rules (Month 3, SCRUM-1053) catch a specific failure mode that bit me in May: release notes drifting from ship truth. R6 makes publishedAt dates immutable once shipped. R7 lints release notes for internal-plumbing items that should never appear in customer-facing notes. Both came out of one incident where v3-4-0 release notes had drifted to the wrong date and v3-5-0 had picked up seven internal items.
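An immutability rule like R6 comes down to comparing two snapshots of the release-notes metadata. A minimal Python sketch, assuming each snapshot maps a release identifier to its publishedAt string; the dates below are made up for illustration, and `changed_published_dates` is a hypothetical name rather than the rule's real implementation.

```python
def changed_published_dates(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return releases whose publishedAt value differs between two snapshots."""
    return sorted(
        release
        for release, published_at in before.items()
        if release in after and after[release] != published_at
    )


# Hypothetical snapshots: the base branch versus the proposed change.
base = {"v3-4-0": "2026-05-01", "v3-5-0": "2026-05-15"}
proposed = {"v3-4-0": "2026-05-03", "v3-5-0": "2026-05-15", "v3-6-0": "2026-05-25"}
print(changed_published_dates(base, proposed))  # ['v3-4-0']
```

New releases pass untouched; only an edit to an already-shipped date is reported, which is the drift the rule is meant to stop.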

Misfires

I am going to be specific about two places where building with agents went badly wrong before it went right.

The $2,639 Anthropic bill. The 30-day Anthropic bill at the end of April was $2,639. The recent 10-day pace was $163 a day. JobIntel had effectively zero paying customers at the time, and at that rate I was on track for about $59,500 a year in Claude spend alone. "I am spending too much on Claude" is not a fixable observation. The hard question, and the one that took two ticket cycles to answer, was: on what? Every API call was billing against the same key. The first investment was a ledger. I split the one key into five separate keys, one per context: production, staging, test, automation tooling, my laptop. Six hours after the cutover, the per-key view told a story I had not been able to see before - production was burning 53 cents in three hours; the automation tooling was burning 60 dollars in the same window. The fix that mattered most was a single configuration edit: I changed an automated security review workflow to run only on release tags instead of every code push. Five minutes of work after several hours of forensics. Conservative recurring savings about $1,000 a month. The full forensic write-up is in the Month 2 Substack - the abbreviated lesson is that the trigger filter on a CI workflow captured forty percent of the recurring monthly burn. Not the app. Not the model. The trigger filter.
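The figures in this section can be checked with a few lines of Python. All inputs are taken from the text above; the variable names are only for illustration.

```python
DAILY_PACE_USD = 163.0        # recent 10-day pace
MONTHLY_BILL_USD = 2639.0     # 30-day bill at the end of April
MONTHLY_SAVINGS_USD = 1000.0  # conservative recurring savings

annual_run_rate = DAILY_PACE_USD * 365
savings_share = MONTHLY_SAVINGS_USD / MONTHLY_BILL_USD

# Per-key burn over the same three-hour window after the key split.
production_per_hour = 0.53 / 3
tooling_per_hour = 60.0 / 3
burn_ratio = tooling_per_hour / production_per_hour

print(f"annual run rate: ${annual_run_rate:,.0f}")  # $59,495
print(f"savings share:   {savings_share:.1%}")      # 37.9%
print(f"tooling vs prod: {burn_ratio:.0f}x")         # 113x
```

The conservative $1,000 figure works out to about 38% of the monthly bill, which fits the "forty percent" summary, and the automation tooling was burning money roughly 113 times faster than production in that window.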

The CLAUDE.md context-bloat. For two months I had been letting context bloat in the agent instructions file. Every observation, every edge case, every "remember to..." went into a single sprawling document the agents had to re-read on every invocation. By April 21 the file was 1,774 lines. Cutting it took an afternoon. The cost was that the slowdown had been gradual enough that I did not notice it until I looked at the file as a system, not as a list. The lesson: agent instruction files need the same discipline as code files. Refactor regularly. Reject the impulse to "just append a note." If you cannot summarize the rule in one line, you do not yet understand the rule.
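One way to keep that discipline from depending on memory is a line budget enforced by a check. A minimal Python sketch; the 400-line budget and the function names are assumptions chosen for illustration, not something the post says is in place.

```python
def line_count(text: str) -> int:
    """Number of lines in an instructions file's contents."""
    return len(text.splitlines())


def within_budget(text: str, budget: int = 400) -> bool:
    """True when the file stays at or under the line budget (hypothetical: 400)."""
    return line_count(text) <= budget


def percent_cut(before: int, after: int) -> float:
    """Size reduction as a percentage of the original line count."""
    return 100.0 * (before - after) / before


print(round(percent_cut(1774, 388), 1))          # 78.1
print(within_budget("rule one\nrule two\n"))      # True
print(within_budget("\n".join(["rule"] * 1774)))  # False
```

The first line reproduces the 78% figure from the April 21 cut (1,774 lines down to 388).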

What I gave up to get here

I shipped 174 endpoints in Month 1. Then four in Month 2. Then a handful in Month 3. The user-visible surface of the product moved fast at first and then slowed dramatically. If you measure by "what shipped that customers can see," Months 2 and 3 look slow.

I made that call deliberately. The codebase had reached the size at which the cost of every new feature was starting to compound. The choice was either to keep piling features on top of an architecture that was beginning to creak, or pause feature work, do the remediation, and ship the next feature batch into a sturdier substrate. There is no version of this that does not feel slow at the time.

The bet is that the second three months of JobIntel will be much faster because of the time spent in April and May not adding endpoints. I will know if that bet was right around the end of August.

What comes next

The first three months proved the platform can exist, survive its first round of feature-load, and start catching its own mistakes. The next three are about extending the platform to surface the intelligence that the data can now support.

Q3 2026 technical roadmap: Extend international support to EUR (Europe), building on the India launch. Updated Chrome Extension. More extensive video tutorials. Native iOS support.

The platform-data report I had originally scheduled for early May is on hold until the dataset reaches a size that supports honest analysis. More to come.

The simple truth

Three months in, the hypothesis holds. The 8-agent development framework I described in "I Built an Entire SaaS Platform Using AI" absorbed a near-quadrupling of the codebase (110,000 lines to roughly 400,000), two coordinated architecture-remediation waves, a new international market, and a meta-overhaul of how the build itself runs - without an outage and without an incident that reached real users. Month 1 was about proving the system could exist. Month 2 was about proving it could harden under load. Month 3 was about proving it could start catching its own mistakes.

The thing I would not have predicted on launch day, when I was eighteen hours into shipping seven hotfix patches, was how fast the work would shift from "agents help me write code" to "agents do the meta-work I used to do by hand." That is not a shortcut. It is a different kind of leverage, and it is what made the architecture work in Month 2 and the gate work in Month 3 possible at all. No solo founder is closing 80 tickets in a single session by hand.

Three months in, the work changed twice. That is exactly the right thing to have happened.


Try JobIntel free at jobintel.com. See credibility scores, skill matches, and salary data for every listing. $8.99/month.
