What I trust AI to do as a solo Founding Engineer

I am a solo founding engineer at a seed-stage YC startup. The founders are based in SF and I'm in India. By the time they wake up, I've usually shipped two full features, three on a good day.

This sounds like a flex but it's just the reality of what AI tools have made possible. I joined about a year ago, and back then this setup wouldn't have worked. We ran into all sorts of problems, like communication issues (being in timezones 12 hours apart) and missed context. On top of that, taking a feature to production took a huge amount of time, because back then Claude Code didn't exist, Cursor hallucinated into oblivion most of the time and coding by hand was the norm.

It wouldn't have worked then. But now it does, and I want to walk through how - what I actually use, what I trust, and what I still don't.

The workflow

Every feature usually starts the same way. Problems get scoped, and I add them to Linear. Features that need extra context or a longer description go into Notion as well.

Then I open Claude Code.

First pass is almost always planning. I use Opus 4.6 with high effort and let it think through the approach. My CLAUDE.md has a rule that's become non-negotiable - do not make any assumptions. Whenever there are knowledge gaps or confusion, ask follow-ups. This one rule alone has saved me from so many bad implementations. Without it, Claude will confidently build something that's 80% right and 20% wrong in ways you won't catch until production.

Second pass is implementation. Same model, same effort. Claude writes the code, I skim through it. If it looks good, I test it manually.

Yeah, we don't have unit tests. I know. At this stage I'd rather ship fast and test by hand than spend days writing test infrastructure that'll change in two weeks anyway.

If something breaks, I iterate with Claude Code until it's clean. Then I push to GitHub.

This is where Cursor Bugbot comes in. It runs automatically on the PR and catches things I missed - edge cases, potential bugs, style issues. If it flags something, I go back to Claude Code, fix it, push again.

Once it's all clean, I put it up for review. My CTO wakes up in SF, skims the PR, approves, I merge and deploy. A full feature cycle, done async across timezones.

The three MCPs that changed everything

If there's one thing that's made the biggest difference in my workflow recently, it's MCP integrations. Before MCPs, there was so much context I had to manually copy into Claude Code. Now the model just has access.

Figma MCP - This one's huge. I use it to pull designs directly into Claude Code so it can implement UI components from the actual Figma file. But here's the thing - I had to add explicit rules in my CLAUDE.md to make this work well. Without guardrails, Claude will look at a Figma screenshot and decide text-md is close enough when the design clearly says text-[10px]. It'll approximate colors instead of using exact hex values. It'll eyeball border widths. So now my CLAUDE.md says - make sure colours, font sizes, widths, borders and other UI elements are implemented verbatim from the design. No interpretation, no approximation. It still messes up sometimes and I have to step in manually. But it is still better than before, when I had to paste screenshots and describe every change individually, and it would still get things wrong in various places. Now it gets 90% of the way to where I want it to be and I do the rest myself.

Datadog MCP - When something's off in production, I used to switch to Datadog in the browser, dig through traces, figure out what happened, then go back to Claude Code and explain the problem. Now Claude Code just reads the traces directly. It can reason about what went wrong with the actual production data in front of it. Recently there was an error spike caused by us hitting OpenAI rate limits. Earlier I would have had to go into the traces and manually look through the logs to see what actually went wrong. This time I just handed the alert to Claude Code and it did a full diagnosis of which LLM calls caused it, which services were affected and what the correct fix would be. The time saved was massive.

Supabase MCP (read only) - This is the one where trust matters most. I gave Claude Code access to our Supabase instance so it can check data schemas while writing APIs. But it's read only. I am not letting an AI model write to my production database and wipe out data. The read access alone is a massive unlock though - no more copying schema definitions back and forth. Claude just looks at the actual tables and writes code accordingly. This MCP is still tricky though. Even though read-only mode won't change your DB, data leaks can still happen. Supabase recommends connecting to staging DBs only, and that's what I do.

The common thread - every one of these removes a context-switching step where I used to be the bottleneck, copying information between tools. Now the model has direct access to the source of truth.

What goes in CLAUDE.md and why

I think of CLAUDE.md as the set of rules I've learned the hard way. Every rule in there exists because something went wrong without it. The biggest one - do not make any assumptions. Whenever there are knowledge gaps or confusion, always ask follow-ups. This sounds obvious but the default behavior of LLMs is to be confidently helpful. If you don't explicitly tell them to stop and ask when they're unsure, they'll fill in the gaps with plausible-sounding guesses. When you're a solo engineer with no one reviewing the logic, those guesses become production bugs.

The Figma fidelity rules I mentioned above - those came from shipping a component that looked "close enough" in development and then having the designer point out six differences. Now it's a hard rule - verbatim implementation. No creative interpretation.

Where I don't trust it yet

Testing. This is the honest part.

We don't have unit tests. For regular code, I test manually and rely on Bugbot to catch what I miss. It works at our current scale but I know it won't forever. For our LLM features, I have a basic eval setup. I write test cases in a YAML file - input/expected output pairs - and tweak prompts until I hit above 95% accuracy. It's vanilla. It works for now. But it's not robust enough for where we're heading. The LLM features still behave in weird ways with real data. There are always cases I miss, and each one means another round of prompt tweaks. Updating models also comes with new rounds of tests and prompt changes.
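As a rough illustration of that kind of eval loop (a sketch under assumptions, not the exact harness described here): the cases below are hypothetical and written inline so the example runs offline, where the real setup would parse them from the YAML file. `fake_model` is a stand-in keyword classifier, not a real LLM call, and the exact-match-after-normalization scoring is just one possible way to compare outputs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    input: str
    expected: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise isn't a failure."""
    return " ".join(text.lower().split())


def run_eval(
    cases: List[EvalCase],
    model_fn: Callable[[str], str],
    threshold: float = 0.95,
) -> Tuple[float, bool, List[Tuple[EvalCase, str]]]:
    """Run every case through model_fn and report accuracy.

    Returns (accuracy, passed, failures), where passed means accuracy is
    strictly above the threshold and failures pairs each failing case
    with the output the model actually produced.
    """
    if not cases:
        raise ValueError("no eval cases provided")
    failures = []
    for case in cases:
        actual = model_fn(case.input)
        if normalize(actual) != normalize(case.expected):
            failures.append((case, actual))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return accuracy, accuracy > threshold, failures


# Hypothetical cases; in practice these would be loaded from the YAML file.
CASES = [
    EvalCase("Refund my order", "billing"),
    EvalCase("I forgot my password", "account"),
    EvalCase("Where is my invoice?", "Billing "),
    EvalCase("The app crashes on launch", "bug"),
]


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call: a keyword classifier (assumption for the demo)."""
    text = prompt.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "password" in text or "login" in text:
        return "account"
    return "other"


if __name__ == "__main__":
    accuracy, passed, failures = run_eval(CASES, fake_model)
    print(f"accuracy={accuracy:.2f} passed={passed}")
    for case, actual in failures:
        print(f"FAIL input={case.input!r} expected={case.expected!r} got={actual!r}")
```

Returning the failing cases alongside the accuracy number is the useful part: the failures list is what tells you which prompt to tweak next, and rerunning the same cases after a model update shows what regressed.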

This is the area where I most feel the gap of being a solo engineer. At a bigger company, there'd be someone whose whole job is eval infrastructure. Here it's just me, and honestly the feature work always wins the priority battle.

What this actually means

A year ago, a solo engineer in India could not have shipped production features async with a founding team in SF at this pace. The tooling wasn't there. Now it is - not perfectly, not without guardrails, but enough that the workflow works. I am not saying this is the optimal way of doing things. There are definitely a lot of tools I don't know about that can make this more streamlined and cohesive. This is a system that works for me.

The key insight for me has been that AI tools don't just write code faster. They collapse the context-switching tax. Figma MCP means I don't leave my editor to check a design. Datadog MCP means I don't leave my editor to investigate a bug. Supabase MCP means I don't leave my editor to look up a schema. Each one of those switches used to break my flow for five to ten minutes. Across a full day, that adds up to hours.

It's not magic. It's plumbing. But good plumbing means by the time my founders open their laptops in San Francisco, the work is already done. That's the thing that would have seemed impossible a year ago - not that AI writes code, but that the coordination overhead of being a solo engineer 12 timezones away mostly just went away. The tools didn't change what I'm building. They changed what I have to stop building to go do something else.