Best LLM in the world for unsexy data tasks
On a benchmark of 30 data labeling and enrichment tasks, RefuelLLM-2 (83.82%) outperforms all current state-of-the-art LLMs, including GPT-4-Turbo (80.88%), Claude-3-Opus (79.19%), and Gemini-1.5-Pro (74.59%).
On the same benchmark, RefuelLLM-2-small (79.67%) outperforms all comparable LLMs, including Claude-3-Sonnet (70.99%), Claude-3-Haiku (69.23%), and GPT-3.5-Turbo (68.13%).