
What 50 startups taught us about hosting MCP servers

Francisc Toth · OCT 22, 2025 · 7 minutes

We've onboarded around 50 startups in the last three months. Most of them tried to host their own MCP servers first and gave up. Here's what we learned watching them.

The first server takes a weekend

Every team thinks the first server will take a weekend. The protocol is simple. The SDKs are decent. The hosting choices feel obvious.

Then they hit auth. MCP's authorization spec is optional and only covers the HTTP-based transports, so in practice each team designs its own scheme. Teams pick API keys, get something working, then realize they need rotation, revocation, and per-team scoping. Now there's a database, an admin UI, and an SSO integration.
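To make "rotation, revocation, per-team scoping" concrete, here is a minimal sketch of the key store teams end up building. It assumes plain API keys held in memory; `KeyStore`, the `key_id.secret` key format, and every method name are hypothetical, and a real version would live in the database behind the admin UI.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


def _hash(secret: str) -> str:
    # Store only a SHA-256 digest so a leaked table doesn't leak usable keys.
    return hashlib.sha256(secret.encode()).hexdigest()


@dataclass
class KeyRecord:
    key_id: str
    team: str
    digest: str
    revoked: bool = False


@dataclass
class KeyStore:
    records: Dict[str, KeyRecord] = field(default_factory=dict)

    def issue(self, team: str) -> str:
        """Create a key scoped to one team; the plaintext is returned only once."""
        key_id = secrets.token_hex(4)
        secret = secrets.token_urlsafe(24)
        self.records[key_id] = KeyRecord(key_id, team, _hash(secret))
        # The key_id prefix lets us find the record without scanning every digest.
        return f"{key_id}.{secret}"

    def authenticate(self, key: str) -> Optional[KeyRecord]:
        """Return the record for a valid, unrevoked key, else None."""
        key_id, _, secret = key.partition(".")
        record = self.records.get(key_id)
        if record is None or record.revoked:
            return None
        # Constant-time comparison avoids leaking digest prefixes through timing.
        if not hmac.compare_digest(record.digest, _hash(secret)):
            return None
        return record

    def authorize(self, key: str, team: str) -> bool:
        """True only if the key is valid and scoped to the requested team."""
        record = self.authenticate(key)
        return record is not None and record.team == team

    def revoke(self, key_id: str) -> None:
        self.records[key_id].revoked = True

    def rotate(self, key: str) -> str:
        """Issue a replacement for the same team, then revoke the old key."""
        record = self.authenticate(key)
        if record is None:
            raise PermissionError("cannot rotate an invalid key")
        new_key = self.issue(record.team)
        self.revoke(record.key_id)
        return new_key
```

Even this toy version needs issue, authenticate, authorize, rotate, and revoke paths, which is how auth quietly grows into its own subsystem.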

After auth comes observability. Tool calls fail silently. Models retry the same tool five times because the description was ambiguous. Latency spikes during burst load. Without logs and traces, debugging takes hours of staring at network tabs.
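A first step toward that observability is one structured log line per tool call. This is a minimal sketch using only Python's standard library; the `traced_tool` decorator, the `mcp.tools` logger name, and the `lookup_customer` tool are hypothetical, and a production setup would more likely emit traces through something like OpenTelemetry.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("mcp.tools")  # hypothetical logger name


def traced_tool(func):
    """Log one JSON line per tool call: tool name, outcome, and duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Record the failure instead of letting it vanish, then re-raise.
            status = f"error: {type(exc).__name__}"
            raise
        finally:
            logger.info(json.dumps({
                "tool": func.__name__,
                "status": status,
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            }))
    return wrapper


@traced_tool
def lookup_customer(account_id: str) -> dict:
    # Hypothetical tool body standing in for a real lookup.
    if not account_id:
        raise ValueError("account_id is required")
    return {"account_id": account_id, "plan": "pro"}
```

Call `logging.basicConfig(level=logging.INFO)` at startup to see the lines. Because failures are logged before they are re-raised, a failing tool no longer fails silently, and the duration field makes latency spikes visible per tool.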

By the time the first server is genuinely production-ready, the weekend has stretched into three weeks. Most teams stop here. They've built something that works on one engineer's laptop but not for the team. The server lives on someone's personal machine until that person leaves the company.

This is the most common pattern we see. It's why we exist.

The "just deploy it" trap

A surprising number of teams try Vercel or Cloud Run first. Both are excellent for stateless HTTP services. Neither is great for MCP servers.

The issue is connection lifecycle. SSE and WebSocket transports hold a connection open for minutes or hours per client. Serverless platforms bill for the time each request stays open, so a mostly idle connection that lasts an hour is billed as an hour of compute. That single connection can cost more than a small VPS that handles 100 such connections in parallel.
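The billing argument is simple arithmetic, sketched below. All three constants are placeholders, not real prices from Vercel, Cloud Run, or any VPS provider; only the 100-connections-per-box figure comes from the text. What matters is the shape of the two formulas: duration billing grows with connection-hours, flat billing grows with box-hours.

```python
import math

# Placeholder prices for illustration only; substitute your provider's real rates.
SERVERLESS_PRICE_PER_SECOND = 0.00003  # hypothetical $ per second a request stays open
VPS_PRICE_PER_HOUR = 0.01              # hypothetical $ per hour for a small VPS
VPS_CONNECTIONS_PER_BOX = 100          # connections one small VPS holds (figure from the text)


def serverless_cost(connections: int, hours: float) -> float:
    """Duration billing: every open connection is billed for every second it stays open."""
    return connections * hours * 3600 * SERVERLESS_PRICE_PER_SECOND


def vps_cost(connections: int, hours: float) -> float:
    """Flat billing: pay per box-hour, however idle each connection is."""
    boxes = max(1, math.ceil(connections / VPS_CONNECTIONS_PER_BOX))
    return boxes * hours * VPS_PRICE_PER_HOUR
```

With these placeholder numbers, one connection held open for an hour costs 0.108 against 0.01 for a box holding 100 of them. The sketch assumes each open connection is billed separately; platforms that let one instance serve many concurrent requests, as Cloud Run can, narrow the gap, so plug in your own platform's pricing model before drawing conclusions.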

Worse is the cold-start problem. A serverless MCP server that sleeps after 5 minutes of inactivity has to wake up the next time a tool is called. That cold start takes 200ms to 2 seconds, depending on the platform. The model waits, the user waits, the chat feels broken.

Teams that ship serverless MCP servers usually end up with a "keep-warm" cron hitting their endpoint every minute. At that point you're paying for an always-on service anyway, so you might as well run a real one.

The transport mismatch

A third of the teams we onboarded had picked the wrong transport for their use case. The most common mismatch: WebSocket for what should have been SSE.

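WebSocket is the obvious choice for bidirectional real-time communication, but that isn't what MCP traffic looks like, and WebSocket isn't one of the standard transports the MCP specification defines. Tool calls flow client-to-server and results flow server-to-client; the server-initiated messages MCP does have, such as progress notifications, can ride the same event stream. SSE handles this asymmetry just fine and doesn't bring the WebSocket operational burden.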

Teams that started on WebSocket often spent a week fighting their CDN. Cloudflare, Fastly, and AWS CloudFront each have WebSocket-specific configuration that is easy to get wrong. SSE is plain HTTP, so it usually runs on the same path as your existing API; the main thing to check is that no proxy in front of it buffers the response.
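Part of why SSE is less work is that it is ordinary HTTP on the wire: a content type, a couple of headers, and events separated by blank lines. The sketch below assumes MCP's Streamable HTTP transport, in which the server may answer a POSTed JSON-RPC request with a `text/event-stream` body. `sse_event`, `tool_call_stream`, and the id and progress-token values are hypothetical; in a real exchange the client supplies the progress token in its request. `X-Accel-Buffering: no` is an nginx-specific header that stops nginx from buffering the stream.

```python
import json
from typing import Iterator, Optional

# Response headers for an SSE stream. X-Accel-Buffering is nginx-specific.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",
}


def sse_event(message: dict, event_id: Optional[int] = None) -> str:
    """Frame one JSON-RPC message as a Server-Sent Events event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # json.dumps escapes newlines inside strings, so one data: field is enough.
    lines.append("data: " + json.dumps(message, separators=(",", ":")))
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def tool_call_stream(request_id: int, progress_token: str, text: str) -> Iterator[str]:
    """Yield a progress notification, then the final tools/call result."""
    yield sse_event({
        "jsonrpc": "2.0",
        "method": "notifications/progress",
        "params": {"progressToken": progress_token, "progress": 1, "total": 2},
    }, event_id=1)
    yield sse_event({
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": text}]},
    }, event_id=2)
```

Each event carries one JSON-RPC message. The progress notification and the final `tools/call` result share one response body, so server-to-client traffic needs no second channel.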

The data-shape problem

The single biggest performance issue we've debugged: tools that return too much data.

A "search documents" tool that returns full document text for every hit is going to flood the model's context window. The model spends most of its tokens reading documents instead of reasoning about the query. The chat feels slow because the model is genuinely doing more work than necessary.

The fix is almost always to return summaries by default and add a separate "fetch full document" tool. Models naturally chain to that when they need detail. The first tool stays fast, the second one is targeted, total tokens drop.
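A minimal sketch of the summaries-by-default pattern, assuming an in-memory corpus. `Document`, `CORPUS`, `search_documents`, `fetch_document`, and the 160-character snippet length are all hypothetical; a real server would query its search index and register both functions as MCP tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    doc_id: str
    title: str
    body: str


# Hypothetical corpus standing in for a real document store.
CORPUS = [
    Document("doc-1", "Refund policy", "Refunds are issued within 14 days. " * 200),
    Document("doc-2", "Shipping times", "Orders ship in 2-3 business days. " * 200),
]

SNIPPET_CHARS = 160  # assumption: enough for the model to judge relevance


def search_documents(query: str, limit: int = 5) -> List[dict]:
    """Return ids, titles, and short snippets only, never full bodies."""
    q = query.lower()
    hits = [d for d in CORPUS if q in d.title.lower() or q in d.body.lower()]
    return [
        {"id": d.doc_id, "title": d.title, "snippet": d.body[:SNIPPET_CHARS]}
        for d in hits[:limit]
    ]


def fetch_document(doc_id: str) -> dict:
    """Return the full text of the one document the model chose to read."""
    for d in CORPUS:
        if d.doc_id == doc_id:
            return {"id": d.doc_id, "title": d.title, "body": d.body}
    raise KeyError(f"unknown document: {doc_id}")
```

Each search hit costs the model a title and a snippet; a full body only enters the context when the model asks for one document by id.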

Teams that don't see this pattern often blame the model. The model isn't slow; it's reading 50KB of JSON to answer a one-line question.

What the successful teams have in common

A few patterns across the teams that got MCP working well.

They started with one server and made it good before adding a second. The temptation to expose every internal API as an MCP server is strong. Resist it. A focused server with five well-described tools outperforms a sprawling server with fifty.

They wrote tool descriptions like they were writing API docs for a confused colleague. "Returns customer data" is useless. "Returns a customer's email, signup date, and current plan tier given their account ID" is what works. The description is the prompt. Treat it that way.
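Tool definitions reach the model as entries in MCP's `tools/list` result: a `name`, a `description`, and an `inputSchema` written as JSON Schema. The sketch contrasts the two descriptions from the paragraph above and adds a hypothetical lint, `description_problems`, with an arbitrary eight-word threshold. It is an illustrative heuristic, not part of MCP or any SDK, and the `get_customer` tool is hypothetical.

```python
from typing import List

# Same tool, two descriptions, in the shape of a tools/list entry.
VAGUE = {
    "name": "get_customer",
    "description": "Returns customer data",
    "inputSchema": {"type": "object", "properties": {"id": {"type": "string"}}},
}

SPECIFIC = {
    "name": "get_customer",
    "description": (
        "Returns a customer's email, signup date, and current plan tier "
        "given their account ID."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "account_id": {
                "type": "string",
                "description": "The customer's account ID.",
            }
        },
        "required": ["account_id"],
    },
}


def description_problems(tool: dict, min_words: int = 8) -> List[str]:
    """Flag common description gaps (a heuristic, not part of MCP)."""
    problems = []
    if len(tool.get("description", "").split()) < min_words:
        problems.append("description too short to tell the model what the tool returns")
    properties = tool.get("inputSchema", {}).get("properties", {})
    for name, schema in properties.items():
        # Parameter descriptions are part of the prompt too.
        if "description" not in schema:
            problems.append(f"parameter '{name}' has no description")
    return problems
```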

They monitored tool call counts per session. A tool that gets called more than two or three times in one session is usually being called in a loop. That's a signal to improve the description or rethink the tool's shape.
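The per-session counter can be a few lines. This sketch assumes the server can attribute every call to a session id; `SessionToolCounter` is hypothetical, and the threshold of three encodes the "more than two or three calls" rule of thumb rather than any standard value.

```python
from collections import Counter, defaultdict
from typing import List

LOOP_THRESHOLD = 3  # assumption: the "more than two or three calls" rule of thumb


class SessionToolCounter:
    """Counts tool calls per session and flags tools that look like they loop."""

    def __init__(self, threshold: int = LOOP_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(Counter)  # session_id -> Counter of tool names

    def record(self, session_id: str, tool: str) -> bool:
        """Count one call; return True only on the call that crosses the threshold."""
        self.counts[session_id][tool] += 1
        return self.counts[session_id][tool] == self.threshold + 1

    def suspicious(self, session_id: str) -> List[str]:
        """Tools called more than `threshold` times in this session, sorted by name."""
        return sorted(t for t, n in self.counts[session_id].items() if n > self.threshold)
```

Because `record` returns True exactly once per tool and session, it can drive a single alert instead of one per call.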

They used resources for static context instead of tools that return static data. A get_schema tool returning the same schema on every call is wasted overhead. A schema://tables resource that the client caches is the right pattern.
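A sketch of the resource version of `get_schema`, assuming the shapes MCP uses for `resources/list` entries and `resources/read` results (a `contents` list of items with `uri`, `mimeType`, and `text`). The schema itself, `read_resource`, and the client-side `ResourceCache` are hypothetical. MCP doesn't mandate client caching, so this shows what a caching client could do, dropping an entry when it receives `notifications/resources/updated` for that URI.

```python
import json
from typing import Callable, Dict

# Hypothetical schema that never changes between calls.
SCHEMA = {"customers": ["id", "email", "plan"], "orders": ["id", "customer_id", "total"]}

# Listing entry in the shape of a resources/list result item.
SCHEMA_RESOURCE = {
    "uri": "schema://tables",
    "name": "Database schema",
    "mimeType": "application/json",
}


def read_resource(uri: str) -> dict:
    """Server side: answer a resources/read request for the schema resource."""
    if uri != SCHEMA_RESOURCE["uri"]:
        raise KeyError(f"unknown resource: {uri}")
    return {"contents": [{"uri": uri, "mimeType": "application/json", "text": json.dumps(SCHEMA)}]}


class ResourceCache:
    """Client side: read each resource once and reuse it until told it changed."""

    def __init__(self, fetch: Callable[[str], dict]):
        self.fetch = fetch
        self.entries: Dict[str, dict] = {}
        self.fetches = 0

    def get(self, uri: str) -> dict:
        if uri not in self.entries:
            self.fetches += 1
            self.entries[uri] = self.fetch(uri)
        return self.entries[uri]

    def invalidate(self, uri: str) -> None:
        # Call this when a notifications/resources/updated message names the URI.
        self.entries.pop(uri, None)
```

Ten reads of `schema://tables` cost one fetch, where ten `get_schema` tool calls would cost ten round trips and ten copies of the schema in the context.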

What we changed in our product

These patterns shaped Toolcall directly. We default to SSE because it's the right transport for most teams. We expose tool call counts per session as the first metric in the dashboard. We added resource support in v0.2 because tools-for-static-data was the most common anti-pattern we saw.

If you're a few weeks into building your own MCP infrastructure and the operational scope keeps growing, we'd love to talk.