“Give me tokens. Just give me tokens. I want them fast. I want them cheap. I want them now.”
That’s the mantra for developers building software on generative AI models, or at least what Parasail CEO Mike Henry hears. Parasail provides a cloud computing service to companies running AI models for inference, and Henry told TechCrunch it generates 500 billion tokens a day. How’s that for tokenmaxxing?
Henry was an executive at Groq, the LLM-focused chipmaker, where he built the company’s cloud offering, an early recognition that developers building software on AI models would want cloud processing specialized to their needs. A year after coming out of stealth, Parasail has raised a $32 million Series A to do that at scale.
Henry has a background in physical chip design, but Parasail isn’t committed to owning its own chips. The company owns some of its GPUs, but it mainly rents processing time at 40 data centers in 15 countries and buys more capacity on liquidity markets, orchestrating all of it behind the scenes to drive down the cost of inference requests.
By allocating workloads cleverly and avoiding demand peaks, the company aims to compete with firms that own their own silicon and might be constrained by existing customer commitments and workloads.
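The brokerage idea above can be sketched in a few lines: route each inference job to the cheapest provider that still has room, and fail over when a demand peak exhausts capacity. This is a minimal illustration, not Parasail’s actual scheduler; the provider names, prices, and routing policy are all assumptions.

```python
# Minimal sketch of compute brokerage: route each job to the cheapest
# provider with enough free capacity. All names and numbers are illustrative.
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price_per_m_tokens: float  # USD per million tokens
    free_capacity: int         # tokens this provider can absorb right now


def route(job_tokens: int, providers: list[Provider]) -> Provider:
    """Pick the cheapest provider that can absorb the job."""
    candidates = [p for p in providers if p.free_capacity >= job_tokens]
    if not candidates:
        raise RuntimeError("no capacity: wait out the peak or rent more")
    best = min(candidates, key=lambda p: p.price_per_m_tokens)
    best.free_capacity -= job_tokens  # reserve the capacity for this job
    return best


providers = [
    Provider("dc-us-east", 0.90, 2_000_000),
    Provider("dc-eu-west", 0.60, 500_000),
    Provider("spot-market", 0.40, 100_000),
]
print(route(400_000, providers).name)  # → dc-eu-west (cheapest with room)
```

A real broker would also weigh latency, model availability per data center, and spot-price volatility, but the core economics are this simple: arbitrage across 40 data centers beats a single owned fleet that is already committed to other customers.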
The company’s potential relies on the continued proliferation of open-source models and agents outside of frontier labs. Parasail’s executives and investors say this is driven by the growing cost and friction of using offerings from companies like Anthropic and OpenAI.
Instead, a hybrid architecture is emerging, according to Andreas Stuhlmüller, the CEO of Elicit, a startup that has raised a $22 million Series A to develop a research assistant for scientific literature. His customers at top pharmaceutical companies use the LLM-based tool to review and analyze data from tens of thousands of scientific papers.
“We’ve moved more towards open models because it’s pretty rough sending 100,000s of requests to an API endpoint,” Stuhlmüller told TechCrunch, especially now that the company is relying on agents to improve its offering, splitting up tasks and working more strategically over longer time horizons. Open models handle the initial screening to drive down the cost of the work, before a more capable frontier model provides a final answer.
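The hybrid pattern Stuhlmüller describes is a classic model cascade: a cheap open model screens the full document set, and only the survivors reach a pricier frontier model. Here is a toy sketch of that flow; the model calls are stand-in functions with a keyword-matching heuristic, not Elicit’s pipeline or any real API.

```python
# Toy sketch of a two-stage model cascade: cheap screening, expensive answer.
# Both "model" functions are stand-ins, not real API calls.

def open_model_relevance(doc: str, query: str) -> float:
    """Stand-in for a cheap open-weights model scoring relevance (0..1)."""
    words = query.lower().split()
    return sum(w in doc.lower() for w in words) / len(words)


def frontier_model_answer(docs: list[str], query: str) -> str:
    """Stand-in for one expensive frontier-model call on the shortlist."""
    return f"Answer to {query!r} based on {len(docs)} screened documents"


def cascade(docs: list[str], query: str, threshold: float = 0.5) -> str:
    # Stage 1: cheap pass over every document drives down total cost.
    shortlist = [d for d in docs if open_model_relevance(d, query) >= threshold]
    # Stage 2: the frontier model only sees the much smaller shortlist.
    return frontier_model_answer(shortlist, query)


papers = [
    "Trial results for the oncology drug candidate",
    "Annual facilities maintenance report",
    "Phase II oncology trial methodology",
]
print(cascade(papers, "oncology trial"))
```

The cost win comes from the asymmetry: at tens of thousands of papers, the screening stage handles nearly all the tokens on cheap open-model inference, so the expensive frontier call stays small and infrequent.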
The proliferation of model queries, as agents become an increasingly common part of software development, is driving the investment in companies like Parasail that provide the infrastructure for cheap inference. Samir Kumar, a partner at Touring Capital who co-led this round, told TechCrunch he expects inference to be at least 20% of the cost of building software in the future.
How much of that market could be Parasail’s? In the crowded cloud compute space, Henry argues that his firm’s focus on inference (no training allowed) and its willingness to take on startup customers without long-term commitments set his offering apart from larger cloud-computing companies focused on enterprise business, and even from better-funded competitors in the cloud inference space, like Fireworks AI and Baseten.
Of course, there’s a different kind of risk when all of your customers are seed and Series B startups in the unpredictable AI sector.
Steve Jang, a partner at Kindred Ventures, the round’s other co-lead, says the economics of deploying models will demand the kind of compute brokerage Parasail provides. And that’s before widespread use of models for content generation and robotics.
“Everyone thought there was an AI bubble. There’s no AI bubble,” he told TechCrunch. “Inference demand is far outstripping supply.”