The Most Productive LLM Stack (for me)

How I build and deploy large language model applications

By Yves Junqueira

Over the past several months, I've built and deployed multiple applications that rely on large language models (LLMs). A non-exhaustive list:

  • a customer support chatbot for a bank
  • a customer onboarding chatbot for medical clinics
  • a legal document analysis tool
  • an app to search my Kindle library using natural language queries
  • a fast development loop tool that lets you edit the code of large software projects

Two of them are in production serving real users as we speak.

After a lot of trial and error, I've gained a good understanding of the tech stack that works best for these kinds of applications.

Writing code is so easy these days that the hard part is picking the right tools.

In this post, I'll share the stack I've landed on for building robust, scalable, and efficient LLM-enabled business-to-business (B2B) applications.

The Most Productive LLM Stack

  • TypeScript: type-safe development
  • Instructor: typed LLM responses
  • LlamaIndexTS: document processing and retrieval
  • Milvus: efficient vector storage
  • Logging+Eval: LLM logging and analysis
  • Next.js: full-stack React framework
  • MongoDB: flexible, scalable database
  • Stytch: B2B authentication
  • Vercel: seamless deployment
  • HappyDevKit: feature flag management
  • Sentry: error monitoring

TypeScript

TypeScript forms the backbone of this stack. It provides type safety and improved developer experience across the entire project. I've found this particularly crucial when developing LLM applications.

TypeScript allows you to use the same type definitions on both the frontend and the backend, which is a huge productivity boost. It also catches many bugs at compile time, which is great for maintainability.
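
Here's a minimal sketch of the idea, with illustrative names; the same interface is imported by the API handler and by the React code that calls it:

```typescript
// shared/types.ts -- one definition, imported by both sides
export interface SupportAnswer {
  question: string;
  answer: string;
  confidence: number;
}

// backend (e.g., an API route): produces the shared type
export async function answerQuestion(question: string): Promise<SupportAnswer> {
  // ... call your LLM pipeline here ...
  return { question, answer: "You can reset it from Settings.", confidence: 0.92 };
}

// frontend: the compiler checks the same shape on the consuming side
async function loadAnswer(question: string): Promise<SupportAnswer> {
  const res = await fetch(`/api/answer?q=${encodeURIComponent(question)}`);
  return (await res.json()) as SupportAnswer;
}
```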

This is in contrast to my earlier LLM projects, where I used Python for the backend and something different for the frontend. While Python is a good choice for LLM prototypes, the frontend frameworks for Python are very limited and you end up having to write a lot of custom JavaScript and CSS to make the application look modern.

With React, TypeScript, Tailwind CSS, and Next.js (discussed below), you can build a modern full-stack system with a modern UI and a pleasant feedback loop.

Instructor and llm-polyglot

For LLM integration, I strongly recommend Instructor, a library that provides typed responses, as well as llm-polyglot to switch between different LLM providers.

Typed LLM responses give you predictability. If you ask the LLM for a sentence, you know the response will be a string; if you ask for a list of sentences, an array of strings; if you ask for a list of sentences with metadata, an array of objects each carrying a string and a metadata field. Instructor is the best library I have found for this purpose.
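
Here's a minimal sketch of that last case using instructor-js with a Zod schema. The model name and schema are illustrative, and the exact options may differ between versions:

```typescript
import Instructor from "@instructor-ai/instructor";
import OpenAI from "openai";
import { z } from "zod";

// Describe the exact shape you want back: sentences, each with metadata
const SentenceList = z.object({
  sentences: z.array(
    z.object({
      text: z.string(),
      metadata: z.object({ topic: z.string() }),
    })
  ),
});

const oai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const client = Instructor({ client: oai, mode: "TOOLS" });

// The response is validated against the schema, so `result` is fully typed
const result = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize our refund policy in three sentences." }],
  response_model: { schema: SentenceList, name: "SentenceList" },
});

for (const s of result.sentences) console.log(s.text, s.metadata.topic);
```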

llm-polyglot (still in beta) goes hand-in-hand with Instructor. It lets you switch between LLM providers without changing your code: you can use OpenAI, Anthropic, GroqCloud, or Ollama (shout-out to r/LocalLLaMA on Reddit) with the same codebase. You can also use different LLMs for different parts of your application, or even for different users. This is useful because different LLMs have different strengths and weaknesses, and the best options change frequently. In theory, you can switch LLMs to match your specific needs while keeping a consistent interface. In practice, each AI provider has quirks you still have to handle (e.g., OpenAI lets you place system messages anywhere, while Anthropic requires them at the top), but any help reducing boilerplate is welcome.
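
A sketch of what that looks like; since llm-polyglot is still in beta, treat the exact API surface as an assumption and check it against the current docs:

```typescript
import { createLLMClient } from "llm-polyglot";

// Same OpenAI-style interface, different provider behind it
const anthropic = createLLMClient({ provider: "anthropic" });

const completion = await anthropic.chat.completions.create({
  model: "claude-3-5-sonnet-20240620", // illustrative model name
  max_tokens: 256,
  messages: [{ role: "user", content: "Classify this support ticket: ..." }],
});
console.log(completion.choices[0]?.message?.content);
```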

LlamaIndexTS

LlamaIndexTS is a powerful RAG tool. It can extract data from structured documents, generate embeddings, assemble complex queries, and orchestrate calls to LLMs and vector stores. However, given how quickly their libraries change, I recommend carefully selecting the specific features that meet your needs and sticking to those, rather than relying on the entire framework.
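
For a sense of the core loop, here's a minimal in-memory indexing and querying sketch; the API moves quickly, so verify it against the version you pin:

```typescript
import { Document, VectorStoreIndex } from "llamaindex";

// Index a document, then ask a natural-language question against it
const documents = [
  new Document({ text: "Our clinic is open Monday through Friday, 9am to 5pm." }),
];
const index = await VectorStoreIndex.fromDocuments(documents);

const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "When is the clinic open?" });
console.log(response.toString());
```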

Milvus

Milvus is a good choice for vector storage in your LLM stack. It provides efficient similarity search and management of vector data. I recommend starting with Milvus Lite during development (yes, it is a Python project, but think of it as just a runner for the Milvus server). Then, when you need to scale, Zilliz offers a fully managed Milvus service that is easy to operate and maintain.
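
For illustration, a similarity search with the Node SDK looks roughly like this; the collection and field names are made up, and the search options vary a bit between SDK versions:

```typescript
import { MilvusClient } from "@zilliz/milvus2-sdk-node";

const milvus = new MilvusClient({ address: "localhost:19530" });

// Stand-in for a real embedding of the user's query
const queryVector: number[] = new Array(768).fill(0.1);

const res = await milvus.search({
  collection_name: "support_docs", // illustrative
  data: [queryVector],
  limit: 5,
  output_fields: ["text"],
});
console.log(res.results);
```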

MongoDB

On the database front, MongoDB strikes the right balance of flexibility and scalability. You can get started easily and evolve your data layout as you go; once you are happy with it, you can enforce robustness via schema validation. Its native support for JavaScript objects makes it an ideal storage system for TypeScript.
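
For example, with the official driver you can type a collection with the same interface the rest of your TypeScript code uses (a sketch; names are illustrative):

```typescript
import { MongoClient } from "mongodb";

interface ChatMessage {
  sessionId: string;
  role: "user" | "assistant";
  content: string;
  createdAt: Date;
}

const mongo = new MongoClient(process.env.MONGODB_URI!);
// Inserts and queries are now checked against ChatMessage at compile time
const messages = mongo.db("app").collection<ChatMessage>("messages");

await messages.insertOne({
  sessionId: "abc123",
  role: "user",
  content: "What are your opening hours?",
  createdAt: new Date(),
});
```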

Next.js

Next.js serves as my full-stack framework. It offers server-side rendering capabilities and good performance out of the box. Its API is stable, so you can confidently use Copilot or GPTs as coding assistants to write anything, which is a huge productivity boost.
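
A typical building block is an App Router API route; here's a minimal sketch (the file path and handler body are illustrative):

```typescript
// app/api/chat/route.ts
import { NextResponse } from "next/server";

export async function POST(request: Request) {
  const { question } = await request.json();
  // ... call your LLM pipeline here ...
  return NextResponse.json({ answer: `You asked: ${question}` });
}
```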

Stytch

For authentication, Stytch offers robust B2B-focused features, including OAuth (login with Google/Facebook/Microsoft/GitHub) and Magic Links (get a login link via email). I've found its organization management features particularly valuable for enterprise applications.
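
To give a flavor, sending a B2B magic link with the Node SDK looks roughly like this; I'm writing this from memory, so double-check the method names against Stytch's docs:

```typescript
import * as stytch from "stytch";

const client = new stytch.B2BClient({
  project_id: process.env.STYTCH_PROJECT_ID!,
  secret: process.env.STYTCH_SECRET!,
});

// Email the user a login link scoped to their organization
await client.magicLinks.email.loginOrSignup({
  organization_id: "organization-test-123", // illustrative
  email_address: "user@example.com",
});
```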

Logging+Eval

You will need a way to log your LLM's responses and evaluate them. Don't build this yourself: I have done it in the past and it was a huge waste of time. Instead, use a service like Velvet, which stores the requests and responses to the LLM APIs and lets you analyze them later. As of writing this, I haven't personally used Velvet, but I've heard positive feedback about it.

Vercel

I recommend Vercel for hosting, given its tight integration with Next.js and excellent performance monitoring tools. Its scalability and pricing model are great for growing B2B applications.

When I push my code to GitHub, Vercel automatically builds and deploys the full stack. This seamless integration allows me to focus on developing features rather than managing infrastructure.

HappyDevKit

I picked HappyDevKit to handle feature flag management because it's simple to use with Next.js. It allows controlled rollouts and A/B testing of new AI features. I've found this essential for iterating on development and managing risk in LLM systems once they are in production.

Sentry

Last but not least, Sentry provides error monitoring and alerting for your application. I configure it to send me alerts on Slack and by email when there are errors in the system. This is essential for maintaining the reliability of the application.
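
Setup is a few lines with the Next.js SDK; this is a minimal sketch, and the Slack/email alert routing itself is configured in Sentry's dashboard rather than in code:

```typescript
// sentry.server.config.ts (the Sentry wizard generates these files)
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0.1, // sample 10% of transactions for performance data
});

// Anywhere in the app: report a handled error with context
Sentry.captureException(new Error("LLM provider timed out"), {
  tags: { provider: "openai" },
});
```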

Key Takeaways: Recommended Tech Stack for LLM systems

These recommendations are based on my experience developing B2B applications that leverage LLMs. I found this tech stack to be effective, but your mileage may vary depending on your specific requirements. I encourage you to experiment with different combinations and find the ones that work best for you. Let me know what you think, and good luck!

- Yves