How coding agents have built, screwed up and fixed koenverstrepen.com
I chose to build this website from scratch instead of using a pre-built platform like WordPress or Substack. This turned into a fascinating experiment in AI-assisted development.
Why build from scratch?
The main reason is flexibility. If I want to add a chatbot in the future, or integrate some custom recommendation engine, or experiment with interactive visualizations—that's significantly harder with pre-cooked platforms like WordPress.
Normally, building from scratch comes at a high cost. But that cost isn't as much of an issue anymore for two reasons:
- It's a learning opportunity: Understanding how these AI coding agents work in practice is valuable in itself.
- AI coding agents dramatically reduce the cost: What used to take days or weeks now takes hours.
I was also curious about how different models would perform when building something from scratch. I had experience making PRs on large existing codebases and doing some prototyping in Lovable, but this was a perfect opportunity to test models on a greenfield project.
Finally, Google Antigravity was recently released, and I wanted to try it out.
Model performance: a reality check
Here's what I learned testing different AI coding agents on this project:
Planning with GPT-5.1-thinking
I started by discussing with GPT-5.1-thinking how the site should look. When I asked it to create a prompt for a coding agent, it generated something significantly longer and more detailed than I expected. The planning was thorough.
Gemini 3 Pro (High) in Antigravity: magical
I used Gemini 3 Pro (High) to execute the prompt in Antigravity. It was magical.
The model created a great result without any issues, in the style I wanted—not some generic AI aesthetic like you often get from Lovable. Really impressed.
I was also blown away by the fact that it opened the browser to check if things actually worked. That said, its conclusions weren't always right: it would say "this looks great!" when it had actually missed a big styling issue 😂
The big issue with Gemini 3 Pro in Antigravity: rate limits. This became a real bottleneck.
Sonnet 4.5 Thinking: not the same, but still ok
When I hit rate limits, I switched to Sonnet 4.5 Thinking. It's not at the same level as Gemini 3 Pro, but it's still decent. A solid fallback.
GPT-5.1-codex-max in VSCode: disappointing
When Sonnet also hit its rate limits, I switched to VSCode + GPT-5.1-codex-max. I had very good experiences with Codex in the past.
But this was a disappointment.
The models keep improving at a fast rate. One month ago, I was still impressed by Codex. Now it feels inferior to Gemini 3 Pro.
Codex has been gaining a lot of momentum on social media, and I was on board with that hype. But after this experience, my opinion changed significantly.
Claude: the next best thing
I never really tried Claude models properly before. After this project, I have to say they are the next best thing after Gemini. I should use them more.
Gemini 3 Pro fixes Codex's mess
In the end, Gemini 3 Pro had to fix the mess created by Codex. And it did so without too much effort.
That's really a big burn for Codex.
What I'll do next
- Use Gemini 3 Pro (Low) more to avoid the heavy token usage from the high version.
- Keep an eye out for a paid option when one becomes available—I'd happily pay to avoid rate limits.
- Check out Gemini 3 Pro in Cursor—it seems to be available there now.
The tech stack
Here's what I used to build this site, with comparisons to the best alternatives:
Framework: Next.js 15 (App Router)
Best competitor: Vanilla HTML/CSS/JS (no framework)
Why Next.js?
- Server-side rendering (SSR) and static generation are better for SEO
- File-based routing makes adding new pages trivial
- Built-in image and font optimization when I need it
- I expect to add React-based interactive experiments later
- MDX integration is seamless
Why not vanilla HTML? I could have built this with plain HTML, CSS, and JavaScript. It would be lighter and simpler for a basic blog. But Next.js gives me better SEO out of the box, and room to grow when I want to add interactive demos or dynamic features.
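The file-based routing mentioned above is easy to picture: in the App Router, the folder path of each `page.tsx` under `app/` becomes the page's URL, and `[bracketed]` folders are dynamic segments. A tiny sketch of that mapping (`routeFromAppPath` is a hypothetical helper written here to illustrate the convention, not a Next.js API):

```typescript
// Mimics the App Router convention: the folder path of a page.tsx
// under app/ is the route, and [slug] folders are dynamic segments.
// Illustration only; Next.js does this resolution internally.
function routeFromAppPath(filePath: string): string {
  const inner = filePath
    .replace(/^app\//, "") // strip the app/ root folder
    .replace(/\/?page\.tsx$/, ""); // the page.tsx file itself adds no segment
  return "/" + inner;
}
```

So `routeFromAppPath("app/page.tsx")` gives `/`, and `routeFromAppPath("app/blog/[slug]/page.tsx")` gives `/blog/[slug]`. Adding a page really is just adding a file.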
Content: MDX
Best competitor: Markdown + Headless CMS (Sanity, Contentful)
Why MDX?
- Content stays version-controlled alongside code
- I can drop in custom React components whenever I want
- Frontmatter gives structured metadata for listings and SEO
- No API calls, no external dependencies
Why not a headless CMS? A CMS would help for multi-author workflows, but it's overkill for a single-author site. MDX keeps everything simple.
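To make the frontmatter point concrete, here is a minimal, hand-rolled sketch of pulling metadata out of an MDX file. Real setups typically use a library such as gray-matter; this version only handles flat `key: value` pairs, and the `title`/`date` fields are just examples:

```typescript
// Minimal frontmatter extraction: a `---`-delimited YAML-ish block
// at the top of the file becomes `data`, the rest is the MDX body.
function parseFrontmatter(source: string): {
  data: Record<string, string>;
  content: string;
} {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(source);
  if (!match) return { data: {}, content: source };

  const data: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const i = line.indexOf(":");
    if (i === -1) continue; // skip lines that aren't key: value pairs
    data[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { data, content: source.slice(match[0].length) };
}

const post = `---
title: Hello
date: 2025-01-01
---
Some **MDX** body.`;

const { data, content } = parseFrontmatter(post);
// data.title is "Hello"; content is the MDX body without the frontmatter.
```

The listing and SEO pages can then be built from `data` alone, without touching the post bodies.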
Styling: Tailwind CSS
Best competitor: Vanilla CSS
Why Tailwind?
- Gemini 3 Pro works really well with it
- Utility classes keep styling consistent and fast to iterate
- No runtime cost
Why not vanilla CSS? Honestly, I have no strong preference. Tailwind just happens to be what the coding agent is good at, so I went with it.
Deployment: Vercel
Best competitor: Netlify
Why Vercel?
- Free hobby tier
- Zero-config Next.js deploys
- Good experiences using it in the past
Why not Netlify? I've had good experiences with Vercel, so there's no reason to try something else. It just works.
Comments: Giscus (GitHub Discussions)
Best competitor: Database-backed comments (PostgreSQL + custom backend)
Why Giscus?
- I prefer lightweight solutions when there's no reason not to
- Zero database infrastructure to maintain
- Comments are stored in GitHub Discussions
- Free and open-source
Why not a database? If I can avoid managing a database, great. Giscus leverages GitHub's infrastructure—I get comments without running any servers.
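For reference, the Giscus embed really is just one script tag whose data attributes point at a repo and a discussion category. The values below are placeholders; the real repo/category IDs come from the configuration page at giscus.app:

```html
<script src="https://giscus.app/client.js"
        data-repo="your-user/your-repo"
        data-repo-id="YOUR_REPO_ID"
        data-category="Announcements"
        data-category-id="YOUR_CATEGORY_ID"
        data-mapping="pathname"
        crossorigin="anonymous"
        async>
</script>
```

That's the whole integration: no database, no backend, no servers to run.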
Final thoughts
Building this site was a reminder of how fast things are moving. The gap between Gemini 3 Pro and earlier models like Codex is staggering—and we're talking about differences measured in weeks, not years.
If you're building with AI coding agents, my advice:
- Gemini 3 Pro (High) in Antigravity is the best I've used for full-stack development.
- Claude is a solid second choice when you hit rate limits.
- Don't assume yesterday's best model is still competitive today. Re-evaluate often.
- Browser verification is powerful but not perfect—always check the output yourself.
- Rate limits are the real constraint—plan around them.
This site is a living experiment. Every article, every feature, every styling tweak was built with coding agents. I'll keep documenting what works, what breaks, and what I learn.