How coding agents have built, screwed up and fixed koenverstrepen.com
I chose to build this website from scratch instead of using a pre-built platform like WordPress or Substack. This turned into a fascinating experiment in AI-assisted development.
Why build from scratch?
The main reason is flexibility. If I want to add a chatbot in the future, or integrate some custom recommendation engine, or experiment with interactive visualizations—that's significantly harder with pre-cooked platforms like WordPress.
Normally, building from scratch comes at a high cost. But that cost isn't as much of an issue anymore for two reasons:
- It's a learning opportunity: Understanding how these AI coding agents work in practice is valuable in itself.
- AI coding agents dramatically reduce the cost: What used to take days or weeks now takes hours.
I was also curious about how different models would perform when building something from scratch. I had experience making PRs on large existing codebases and doing some prototyping in Lovable, but this was a perfect opportunity to test models on a greenfield project.
Finally, Google Antigravity was recently released, and I wanted to try it out.
Model performance: a reality check
Here's what I learned testing different AI coding agents on this project:
Planning with GPT-5.1-thinking
I started by discussing with GPT-5.1-thinking how the site should look. When I asked it to create a prompt for a coding agent, it generated something significantly longer and more detailed than I expected. The planning was thorough.
Gemini 3 Pro (High) in Antigravity: magical
I used Gemini 3 Pro (High) to execute the prompt in Antigravity. It was magical.
The model created a great result without any issues, in the style I wanted—not some generic AI aesthetic like you often get from Lovable. Really impressed.
I was also blown away by the fact that it opened the browser to check if things actually worked. That said, its conclusions weren't always right: it would say "this looks great!" when it had actually missed a big styling issue 😂
The big issue with Gemini 3 Pro in Antigravity: rate limits. This became a real bottleneck.
Sonnet 4.5 Thinking: not the same, but still ok
When I hit rate limits, I switched to Sonnet 4.5 Thinking. It's not at the same level as Gemini 3 Pro, but it's still decent. A solid fallback.
GPT-5.1-codex-max in VSCode: disappointing
When Sonnet also hit its rate limits, I switched to VSCode + GPT-5.1-codex-max. I had very good experiences with Codex in the past.
But this was a disappointment.
The models keep improving at a fast rate. One month ago, I was still impressed by Codex. Now it feels inferior to Gemini 3 Pro.
Codex has been gaining a lot of momentum on social media, and I was on board with that hype. But after this experience, my opinion changed significantly.
Claude: the next best thing
I never really tried Claude models properly before. After this project, I have to say they are the next best thing after Gemini. I should use them more.
Gemini 3 Pro fixes Codex's mess
In the end, Gemini 3 Pro had to fix the mess created by Codex. And it did so without too much effort.
That's really a big burn for Codex.
What I'll do next
- Use Gemini 3 Pro (Low) more to avoid the heavy token usage from the high version.
- Keep an eye out for a paid option when one becomes available—I'd happily pay to avoid rate limits.
- Check out Gemini 3 Pro in Cursor—it seems to be available there now.
The tech stack
Here's what I used to build this site, with comparisons to the best alternatives:
Framework: Next.js 15 (App Router)
Best competitor: Vanilla HTML/CSS/JS (no framework)
Why Next.js?
- Server-side rendering (SSR) and static generation are better for SEO
- File-based routing makes adding new pages trivial
- Built-in image and font optimization when I need it
- I expect to add React-based interactive experiments later
- MDX integration is seamless
Why not vanilla HTML? I could have built this with plain HTML, CSS, and JavaScript. It would be lighter and simpler for a basic blog. But Next.js gives me better SEO out of the box, and room to grow when I want to add interactive demos or dynamic features.
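The file-based routing mentioned above is easy to picture: in the App Router, the folder path of each `page.tsx` under `app/` becomes the page's URL, and `[bracketed]` folders are dynamic segments. A tiny sketch of that mapping (`routeFromAppPath` is a hypothetical helper written here to illustrate the convention, not a Next.js API):

```typescript
// Mimics the App Router convention: the folder path of a page.tsx
// under app/ is the route, and [slug] folders are dynamic segments.
// Illustration only; Next.js does this resolution internally.
function routeFromAppPath(filePath: string): string {
  const inner = filePath
    .replace(/^app\//, "") // strip the app/ root folder
    .replace(/\/?page\.tsx$/, ""); // the page.tsx file itself adds no segment
  return "/" + inner;
}
```

So `routeFromAppPath("app/page.tsx")` gives `/`, and `routeFromAppPath("app/blog/[slug]/page.tsx")` gives `/blog/[slug]`. Adding a page really is just adding a file.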
Content: MDX
Best competitor: Markdown + Headless CMS (Sanity, Contentful)
Why MDX?
- Content stays version-controlled alongside code
- I can drop in custom React components whenever I want
- Frontmatter gives structured metadata for listings and SEO
- No API calls, no external dependencies
Why not a headless CMS? A CMS would help for multi-author workflows, but it's overkill for a single-author site. MDX keeps everything simple.
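To make the frontmatter point concrete, here is a minimal, hand-rolled sketch of pulling metadata out of an MDX file. Real setups typically use a library such as gray-matter; this version only handles flat `key: value` pairs, and the `title`/`date` fields are just examples:

```typescript
// Minimal frontmatter extraction: a `---`-delimited YAML-ish block
// at the top of the file becomes `data`, the rest is the MDX body.
function parseFrontmatter(source: string): {
  data: Record<string, string>;
  content: string;
} {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(source);
  if (!match) return { data: {}, content: source };

  const data: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const i = line.indexOf(":");
    if (i === -1) continue; // skip lines that aren't key: value pairs
    data[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { data, content: source.slice(match[0].length) };
}

const post = `---
title: Hello
date: 2025-01-01
---
Some **MDX** body.`;

const { data, content } = parseFrontmatter(post);
// data.title is "Hello"; content is the MDX body without the frontmatter.
```

The listing and SEO pages can then be built from `data` alone, without touching the post bodies.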
Styling: Tailwind CSS
Best competitor: Vanilla CSS
Why Tailwind?
- Gemini 3 Pro works really well with it
- Utility classes keep styling consistent and fast to iterate
- No runtime cost
Why not vanilla CSS? Honestly, I have no strong preference. Tailwind just happens to be what the coding agent is good at, so I went with it.
Deployment: Vercel
Best competitor: Netlify
Why Vercel?
- Free hobby tier
- Zero-config Next.js deploys
- Good experiences using it in the past
Why not Netlify? I've had good experiences with Vercel, so there's no reason to try something else. It just works.
Comments: Giscus (GitHub Discussions)
Best competitor: Database-backed comments (PostgreSQL + custom backend)
Why Giscus?
- I prefer lightweight solutions when there's no reason not to
- Zero database infrastructure to maintain
- Comments are stored in GitHub Discussions
- Free and open-source
Why not a database? If I can avoid managing a database, great. Giscus leverages GitHub's infrastructure—I get comments without running any servers.
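For reference, the Giscus embed really is just one script tag whose data attributes point at a repo and a discussion category. The values below are placeholders; the real repo/category IDs come from the configuration page at giscus.app:

```html
<script src="https://giscus.app/client.js"
        data-repo="your-user/your-repo"
        data-repo-id="YOUR_REPO_ID"
        data-category="Announcements"
        data-category-id="YOUR_CATEGORY_ID"
        data-mapping="pathname"
        crossorigin="anonymous"
        async>
</script>
```

That's the whole integration: no database, no backend, no servers to run.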
Final thoughts
Building this site was a reminder of how fast things are moving. The gap between Gemini 3 Pro and earlier models like Codex is staggering—and we're talking about differences measured in weeks, not years.
If you're building with AI coding agents, my advice:
- Gemini 3 Pro (High) in Antigravity is the best I've used for full-stack development.
- Claude is a solid second choice when you hit rate limits.
- Don't assume yesterday's best model is still competitive today. Re-evaluate often.
- Browser verification is powerful but not perfect—always check the output yourself.
- Rate limits are the real constraint—plan around them.
This site is a living experiment. Every article, every feature, every styling tweak was built with coding agents. I'll keep documenting what works, what breaks, and what I learn.