Case Study: CollegeVine Detects Incorrect Claims from AI Agent Responses at Scale with Refuel


About CollegeVine

CollegeVine provides a platform that enables higher education institutions to deploy AI agents for their operations, with millions of daily interactions between agents and students. CollegeVine’s platform leverages AI models from a range of providers, including OpenAI, Anthropic and more.

Because these models can hallucinate and may rely on outdated training data, CollegeVine sought a means of improving output quality to meet the high standard of trust required by higher ed institutions and their constituents.

Specifically, CollegeVine sought to catch incorrect claims by blocking and fixing messages within a matter of seconds.

Situation

Initially, CollegeVine was using GPT-4o-mini to detect incorrect claims about a given academic institution (e.g. majors, courses, programs offered by the institution).

In production, GPT-4o-mini landed below 85% accuracy, meaning incorrect claims were still being shown to end users, while valid claims were occasionally filtered out by mistake.

A larger model such as GPT-4o wasn’t an option: latency would increase by 5x, while accuracy would rise only to 87%. CollegeVine then turned to Refuel to explore whether the needed latency, quality and throughput were achievable.

How Refuel helped

CollegeVine uploaded a small fraction of their production traffic to Refuel and, using Refuel Cloud, classified and labeled the veracity of specific claims.

Using this human-verified, labeled data, CollegeVine fine-tuned Refuel-LLM to achieve 93% accuracy on a held-out dataset. The fine-tuned model was then deployed as an endpoint for inference.
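To make "accuracy on a held-out dataset" concrete, here is a minimal sketch of how such a number is typically computed: part of the labeled data is set aside before training and used only for scoring. This is not Refuel's or CollegeVine's code; the names `split_held_out`, `accuracy`, and the `predict` callable are hypothetical, and the 80/20 split is an assumption for illustration.

```python
import random
from typing import Callable, List, Tuple

# (claim text, human-verified label: True if the claim is correct)
Example = Tuple[str, bool]


def split_held_out(
    examples: List[Example], held_out_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Shuffle the labeled examples and set a fraction aside for evaluation only."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - held_out_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict: Callable[[str], bool], examples: List[Example]) -> float:
    """Share of examples where the model's verdict matches the human label."""
    if not examples:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)
```

In use, the training portion would go to fine-tuning, and `accuracy` would be called once on the held-out portion with `predict` wrapping whatever model is being evaluated.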

CollegeVine then incorporated the deployed endpoint into their production pipeline, while leveraging Refuel Cloud for further feedback and improvement.

Impact

CollegeVine’s custom-built model with Refuel achieved 50% fewer errors, 40% faster speeds, and 60% lower cost compared to GPT-4o.
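As a rough sanity check, the "fewer errors" figure can be related to the accuracy numbers quoted earlier (87% for GPT-4o, 93% for the fine-tuned model). The sketch below assumes that the error rate is simply 1 minus accuracy and that the two accuracy figures are comparable; the case study does not state that they were measured on the same dataset, so treat the result as approximate. The function name is illustrative.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Fraction of baseline errors eliminated, treating error rate as 1 - accuracy."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Accuracy figures quoted in this case study (assumed comparable).
reduction = relative_error_reduction(0.87, 0.93)
print(f"{reduction:.0%} fewer errors")  # prints "46% fewer errors"
```

Under these assumptions the error rate drops from 13% to 7%, a reduction of about 46%, which is in line with the rounded 50% figure.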

The model effortlessly auto-scaled with CollegeVine’s traffic and currently serves 2B+ tokens per day, all while requiring virtually zero infrastructure effort from their engineering team.

The end-to-end process required less than 2 days of engineering effort, compared to the 3-4 weeks of engineering needed for projects of similar scope without Refuel.

“We are amazed at how easily we are able to deploy fine-tuned models that outperform GPT for classification tasks. We have hundreds of such use cases at CollegeVine and are excited to partner with the Refuel team as we scale our agent platform for higher ed in 2025.”

– Chris Coffey, CTO at CollegeVine





