Improving intent classification is an important task in the conversational AI space. In this blog post, we analyze the benefits of using a hybrid NLU/LLM intent classification architecture across small, medium, and large conversational AI datasets. After four months of production testing with a small cohort, we find that the hybrid approach outperforms NLU-only models on smaller datasets and slightly outperforms full LLM solutions on larger datasets at 3x-5x lower cost. We also find that state-of-the-art models don't always outperform older models and that performance is heavily dataset-dependent. We examine these performance, cost, and UX benefits in the following sections.
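To make the idea concrete, here is a minimal sketch of one common way to wire such a hybrid: route each utterance through the NLU model first and fall back to an LLM only when the NLU confidence is low. The function names, the threshold value, and the routing criterion are illustrative assumptions, not a description of our production implementation. Under this kind of routing, only low-confidence utterances incur LLM cost, which is one way a hybrid can stay cheaper than an LLM-only setup.

```python
# Minimal sketch of confidence-threshold routing between an NLU classifier
# and an LLM fallback. `nlu_classify`, `llm_classify`, and the 0.8 threshold
# are hypothetical placeholders, not the exact setup discussed in this post.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    intent: str
    confidence: float


def hybrid_classify(
    utterance: str,
    nlu_classify: Callable[[str], Prediction],  # fast, cheap NLU model
    llm_classify: Callable[[str], str],         # slower, costlier LLM call
    threshold: float = 0.8,                     # route to the LLM below this score
) -> str:
    """Return the NLU intent when it is confident; otherwise defer to the LLM."""
    pred = nlu_classify(utterance)
    if pred.confidence >= threshold:
        return pred.intent          # most traffic stays on the cheap NLU path
    return llm_classify(utterance)  # ambiguous utterances go to the LLM
```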