
Lessons Learned from Building an AI Startup

3 Mar 2024

Building an AI startup today feels like the early days of the internet and the app store, with a rarely seen pace of innovation and lots of new opportunities. Many business cases that were neither possible nor economical before are now becoming feasible thanks to the rapid advances in AI.

It's as if your product automatically becomes cheaper, more reliable, and more scalable with each new major AI advancement. However, developing a production-ready AI application has been anything but straightforward over the last year:

We struggled with limited context windows

One of the first hurdles we encountered with LLMs was their limited context windows. Before analyzing the structure of a website, we had to slice its DOM into multiple smaller chunks while still preserving the overall context. This was both error-prone and slow.
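To give a concrete idea, here is a minimal sketch of this kind of chunking, assuming a rough characters-per-token estimate and overlapping chunks so that structural context is not lost at the boundaries; the sizes and helper below are illustrative, not our production code:

```python
# Minimal sketch: split a large DOM string into overlapping chunks that fit
# a model's context window. The 4-chars-per-token estimate and the sizes
# below are rough assumptions, not production values.

def chunk_dom(dom_html: str, max_tokens: int = 3000, overlap_tokens: int = 200) -> list[str]:
    approx_chars_per_token = 4
    chunk_size = max_tokens * approx_chars_per_token
    overlap = overlap_tokens * approx_chars_per_token

    chunks = []
    start = 0
    while start < len(dom_html):
        end = min(start + chunk_size, len(dom_html))
        chunks.append(dom_html[start:end])
        if end == len(dom_html):
            break
        # Overlap chunks so structural context (open tags, headings) carries over
        start = end - overlap
    return chunks
```

Each chunk then had to be analyzed separately and the partial results merged, which is where most of the errors and latency came from.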

Over time, context windows grew significantly, which allowed us to reduce the number of LLM calls and improve reliability. Here's a brief overview of how context window capacities have evolved across various models:

| Model | Context Window |
| --- | --- |
| GPT-3 | 2,048 tokens |
| GPT-3.5 | 4,096 tokens |
| GPT-3.5-turbo | 16,385 tokens |
| Claude 1.0 | 100,000 tokens |
| GPT-4 | 4,096 tokens |
| gpt-4-0613 | 8,192 tokens |
| GPT-4 (latest) | 128,000 tokens |
| Claude 2.1 | 200,000 tokens |
| Gemini 1.5 | 1,000,000 tokens |

We had issues with consistent JSON output

We also encountered challenges in generating consistent JSON output, which was critical for our data processing pipelines. This inconsistency often led to additional overhead in parsing, data validation, and retry mechanisms.

Early workarounds included manual JSON parsing of the LLM outputs:

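Below is a simplified sketch of that kind of parser; the fence stripping and regex fallback are illustrative, not our exact implementation:

```python
import json
import re

def parse_llm_json(raw_output: str) -> dict:
    """Best-effort extraction of a JSON object from a raw LLM completion."""
    # Strip the markdown code fences the model often wrapped around the JSON
    cleaned = re.sub(r"```(?:json)?", "", raw_output).strip()

    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to grabbing the first {...} block in the text
        match = re.search(r"\{.*\}", cleaned, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError("No valid JSON found in LLM output")
```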

The introduction of function calling and dedicated JSON output modes largely resolved this issue and made our code much more robust.
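With the current OpenAI Python SDK, this comes down to a single parameter; a minimal sketch (model name, prompts, and input are placeholders):

```python
from openai import OpenAI

client = OpenAI()

page_text = "Acme Widget, $19.99"  # placeholder input

# JSON mode guarantees syntactically valid JSON; the prompt still has to
# mention JSON and describe the expected fields.
response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # any JSON-mode-capable model works
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": "Extract the product name and price. Respond with JSON only."},
        {"role": "user", "content": page_text},
    ],
)

print(response.choices[0].message.content)  # a valid JSON string
```

Function calling achieves the same effect when the output has to match a specific schema.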

We had rate limiting and performance issues with 3rd party models

We encountered numerous rate limiting and performance issues, especially during traffic spikes when many users concurrently used our product. This was not just a technical hurdle but also a critical business constraint. Our startup relied on synchronous GPT-4 calls for configuring new data workflows.

OpenAI provided a form to request a rate limit increase, but most startups like ours never received a response, likely because OpenAI was overwhelmed by demand and had limited infrastructure capacity. You can find many similar stories on the OpenAI forum.


Over time, the issue of rate limits has largely been mitigated. OpenAI introduced various usage tiers that offer higher rate limits for increased monthly usage.

We also added fallback logic to route traffic to other providers such as Anthropic or Mistral when we encounter rate limits or outages, ensuring we can maintain operations. Tools like Ollama even offer built-in compatibility with the OpenAI API, making self-hosted models easily interchangeable.
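In simplified form, such fallback logic looks roughly like the sketch below; the model names and the choice of Anthropic as the fallback provider are illustrative:

```python
import anthropic
from openai import OpenAI, RateLimitError, APIConnectionError

openai_client = OpenAI()
anthropic_client = anthropic.Anthropic()

def chat_with_fallback(prompt: str) -> str:
    """Try OpenAI first; fall back to Anthropic on rate limits or outages."""
    try:
        response = openai_client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except (RateLimitError, APIConnectionError):
        # Same prompt, different provider, so workflows keep running
        response = anthropic_client.messages.create(
            model="claude-2.1",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text
```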

Groq's LPU Inference Engine, designed for high-speed language model inference, promises significantly lower latency and higher throughput, enabling many new near-real-time LLM use cases.

Hosting OSS models was a pain

Hosting OSS models was initially quite challenging, particularly in terms of setup and operational overhead. However, the rapidly advancing OSS landscape has begun to catch up. There are several helpful libraries and tools that simplify the distribution and running of open source models, such as llamafile. Additionally, the community has produced excellent resources on self-hosting, such as the comprehensive guide on deploying Ollama on GCP.

The improvements in ease of use for self-hosting LLMs over the last few months have been amazing. With the launch of powerful new open source models like Google Gemma, Llama 2, and Mistral, I only expect this trend to continue.

By leveraging self-hosted LLMs for well-defined and constrained tasks, such as data transformation (e.g. mapping or classification) or browser navigation, we significantly reduced our costs and improved our operational efficiency.
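As an example of such a constrained task, a product classification call along these lines runs well against a small self-hosted model; the local Ollama endpoint, model name, and category list below are illustrative assumptions:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint, so the same client code
# works against a locally hosted model (model name and categories are examples).
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

CATEGORIES = ["electronics", "clothing", "furniture", "other"]

def classify_product(title: str) -> str:
    response = local.chat.completions.create(
        model="gemma:7b",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify the product into exactly one of: "
                        f"{', '.join(CATEGORIES)}. Answer with the category name only."},
            {"role": "user", "content": title},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    # Constrain the output to the known label set
    return answer if answer in CATEGORIES else "other"

print(classify_product("Noise-cancelling wireless headphones"))
```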

As open source models continue to evolve and become increasingly easy to self-host, we plan to continue the transition from relying on third-party APIs to self-hosting our own models for smaller, less complex LLM tasks.

Differentiate with everything “non-AI”

As entry barriers fall and established players integrate AI into their offerings while leveraging their distribution advantage, every startup still needs to build defensibility and differentiate with everything “non-AI”. And, as in every hype cycle: when all you have is a hammer, everything looks like a nail.

Here are some strategies to consider:

  • Differentiate with everything “non-AI”: the best products will be defined by everything they do that's not AI, such as workflows, UI, UX, and performance.
  • Combine AI with traditional engineering: Integrate AI only where necessary to solve specific problems that standard coding cannot efficiently address.
  • Avoid over-reliance on 3rd party models: Find the right abstraction to switch and deploy models quickly.
  • Consider cost and performance limitations: LLMs can be expensive and slow. Sometimes traditional approaches are simply more efficient.

Conclusion

Transitioning from a cool MVP with low complexity to a production-ready and efficient AI application has been challenging, but it has become much easier over time.

Even though the current market favors incumbents that embed AI into their existing products and leverage their distribution advantage, there are still many opportunities for startups in creating differentiated vertical or horizontal solutions.

We're still only at the beginning of AI adoption, similar to the early days of cloud adoption. I'd say it's one of the most exciting times to build an AI startup.