A guide to improving marketplace search, data quality, and onboarding with LLMs


September 3, 2024

by Rishabh Bhargava

Introduction to LLMs and Marketplaces


Refuel solves messy enterprise data tasks, and marketplaces have some of the most interesting and sophisticated data challenges.

Over the last year, Refuel has worked with a number of marketplaces across a range of focus areas and scales to demonstrate how LLMs can transform key workflows for supplier onboarding, product catalog quality, and trust & safety, reducing costs and increasing GMV while improving overall marketplace health.

Below, we highlight what we have learned about the problems marketplaces face that LLMs can address, and how Refuel is uniquely positioned to solve those problems compared to other LLM providers.

Walmart’s CEO, Doug McMillon, described how using LLMs for these sorts of applications would work at scale:

Walmart quote on LLMs

Key Operational Challenges for Marketplaces


Marketplaces connect buyers and sellers with varying degrees of involvement in the process. Dan Hockenmaier, Chief Strategy Officer at Faire, summarizes a useful framework in his essay on the future of marketplaces.

Marketplace coordination evolution


On one end, we have light marketplaces, where the degree of involvement of the marketplace is minimal, leaving it up to the buyer to understand what they are purchasing.

More managed marketplaces take it upon themselves to establish trust by verifying supply, providing mechanisms for review and feedback, and taking on major components of the supplier cost structure.

Finally, some vertically integrated marketplaces take on and provide all dimensions of the supply (e.g., by hiring or partnering), keeping 100% of the economics.

While the structure and degree of involvement of each marketplace varies, there are some common critical workflows across most marketplaces, as highlighted in this essay by Bart Dessaint and Wass Ayouch.

These workflows involve a high degree of human involvement, leading to high operational cost:

  • Onboarding - Supplier onboarding remains a human-intensive process, with several touch points to verify supplier information, assess the quality of the data being provided, and watch for potential fraud. This human-intensive nature makes the cost of acquiring supply high, and can lead to more reactive approaches to trust & safety issues.
  • Data Quality for Display - The data suppliers provide for display to consumers, both text descriptions and images, may not meet the standards the marketplace aims to uphold. The existing process requires human evaluation of data quality and flagging of suppliers (or sometimes even buyers!) whose data falls short. Even minute improvements in description quality can have a dramatic impact on conversion rates and associated revenue.
  • Search & Relevance - Search engines often rely on precise keyword matching against product tags when users search for products. Due to data quality issues, the product tags may not match what users are searching for, even when the underlying intent is to find that same product. This leaves users endlessly scrolling through irrelevant search results instead of being shown the most relevant product.

The challenges highlighted above become even more pronounced as the supplier side scales, given the reliance on humans for these processes and the difficulty and cost of scaling human teams.

Fortunately, recent advances in LLMs, and AI more broadly, offer an opportunity to automate major parts of these processes and enable organizations to scale these workflows across a larger set of suppliers economically and quickly.

How LLMs can transform Marketplace Workflows


LLMs can transform key marketplace workflows by automating their most labor-intensive aspects. They can streamline supplier verification, detect fraud during onboarding, facilitate high-fidelity product tagging, and perform automated content review:

  • Supplier Verification - Verifying supplier information such as business registration and compliance details has traditionally been a manual, time-consuming process. LLMs can automate this verification by parsing the relevant parts of supplier information and comparing them against the appropriate databases or sources.
  • Fraud Detection during Onboarding - Assessing fraud requires careful analysis of the information a supplier provides and of patterns historically associated with fraud. LLMs can analyze the supplier's textual data and assess it against these common fraud patterns. Additionally, LLMs can check that the image and text information provided contains no prohibited items or IP infringements.
  • High-Fidelity Product Tagging - Traditionally, product tags are either provided by the supplier or added by human annotators based on the supplier's text and images. In either case, the fidelity of these tags is often not high enough for display or search. LLMs can analyze product descriptions and images more effectively to generate high-fidelity tags, including tags with semantically similar meanings, providing a more comprehensive set of tags for display and search.
  • Automated Content Review and Personalization - Beyond product tagging, the content suppliers provide must be accurate and of sufficient quality for buyers to identify the products they are looking for. Once quality is ensured, content can be further personalized for individual consumers to drive higher conversion rates and a more bespoke customer experience.
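To make the tagging workflow concrete, here is a minimal Python sketch of the idea; this is not Refuel's implementation, and the taxonomy, prompt wording, and helper names are invented for illustration. It builds a prompt that asks an LLM to tag a product against a fixed taxonomy, then validates the model's comma-separated response so only known tags survive:

```python
# Hypothetical sketch of taxonomy-based product tagging (illustrative only).
# The actual LLM call is omitted; any chat-completion API could supply it.

TAG_TAXONOMY = {"footwear", "running", "waterproof", "kids", "leather"}

def build_tagging_prompt(title: str, description: str) -> str:
    """Construct a tagging prompt that lists the allowed tags explicitly."""
    tags = ", ".join(sorted(TAG_TAXONOMY))
    return (
        "You are tagging products for a marketplace search index.\n"
        f"Allowed tags: {tags}\n"
        "Return a comma-separated list of applicable tags only.\n\n"
        f"Title: {title}\nDescription: {description}"
    )

def parse_tags(raw_response: str) -> list[str]:
    """Keep only tags that appear in the taxonomy, discarding anything else."""
    candidates = [t.strip().lower() for t in raw_response.split(",")]
    return [t for t in candidates if t in TAG_TAXONOMY]

# Example with a canned model response (no API call is made here):
response = "running, waterproof, bestseller"
print(parse_tags(response))  # ['running', 'waterproof']
```

Validating against the taxonomy on the way out is what keeps a free-text model from injecting tags the search index has never seen.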

Shortcomings in Existing LLM Approaches


General-purpose LLMs from existing providers are readily available, but they suffer from three key shortcomings:

  • Accuracy - Given their general-purpose nature, these LLMs provide reasonable accuracy across a range of tasks. However, for tasks that demand consistently high accuracy, off-the-shelf LLMs fall short. Moreover, there is a risk of hallucinations.
  • Cost - With volume-based pricing and a high cost per use, the overall annual cost of these LLMs can become prohibitive.
  • Time to production - Setting up the infrastructure and workflows, and managing the process of fine-tuning an LLM to high levels of accuracy, can take months and requires significant investments in manpower and resources.

Refuel’s Solution


Refuel addresses all of these challenges, making it an attractive option for marketplaces to use for their use cases:

  • Accuracy - Refuel trains LLMs specifically designed for a particular use case. This involves choosing the most appropriate base model, fine-tuning it on examples of your task, and providing a data engine for ongoing improvements (see Figure 1 below). The result is exceptionally high accuracy, making LLMs feasible for your task. Moreover, Refuel's LLMs carry no risk of hallucination, as outputs are constrained to a provided taxonomy.
  • Cost - Our models are available at a fraction of the cost of a general-purpose, state-of-the-art LLM.
  • Time - Refuel cuts time to production from months to hours by providing all of the infrastructure and workflows needed to fine-tune a model to very high accuracy, specifically for marketplace and catalog data tasks.
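The mechanics behind "outputs are constrained to a provided taxonomy" can be illustrated with a toy sketch (this is not Refuel's API; the labels and fallback are invented). Every model output is forced onto a fixed label set, so the pipeline can never emit a label that does not exist:

```python
# Toy illustration: constraining classifier output to a fixed taxonomy.
# Any output outside the taxonomy falls back to a review label instead
# of propagating a hallucinated value downstream.

TAXONOMY = ["approved", "low_quality", "prohibited_item", "needs_review"]

def constrain_to_taxonomy(model_output: str, fallback: str = "needs_review") -> str:
    """Normalize the raw output and map anything unknown to the fallback label."""
    label = model_output.strip().lower()
    return label if label in TAXONOMY else fallback

print(constrain_to_taxonomy("Low_Quality "))   # low_quality
print(constrain_to_taxonomy("great product"))  # needs_review
```

In production systems this idea is usually enforced at decoding time rather than by post-hoc filtering, but the guarantee is the same: the output space is closed.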

Model fine tuning
Figure 1

Impact of Refuel Solution

Refuel has worked with a range of marketplace customers, who have seen measurable impact on both operational and financial metrics.

Case Study 1: Marketplace ($5B+ revenue)


Challenges: Our customer was a marketplace looking to automate the assessment of data quality from suppliers, including textual product descriptions and images.

Refuel Solution: Refuel created a fine-tuned LLM to flag the quality of data from suppliers based on the marketplace's guidelines. Additionally, Refuel produced suggestions to improve data quality in instances where the existing data was of low quality.

Impact:

  • Faster onboarding (2.5 months to 1 week)
  • 80% reduction in operational effort across the onboarding, fraud verification, data quality, and content review workflows
  • Higher trust & safety for customers through more proactive detection of data quality issues and fraud


Refuel allowed this customer to dramatically increase the number of platform transactions while improving margin.

Case Study 2: E-commerce Marketplace (200M+ product catalog size)


Challenges: Our customer was a marketplace that receives product catalog data from a large number of sellers, and this data was not normalized. A key attribute to normalize is size, as user recommendations cannot be made without accurate size information.


Refuel Solution: The marketplace shared their unique product catalog items with different size values and asked Refuel to normalize them to ~50 size values across kids' / women's / men's clothing and footwear. Refuel trained a custom LLM for this task.


Impact: Refuel's custom LLM increased size-prediction accuracy from 46% to 87%. This provided months of time savings and significant downstream value for their users.


Remarks from the CTO: “We have a product catalog of 200M+ items and ensuring clean, structured data is an ongoing challenge. Using Refuel’s customized LLMs, we were able to label millions of items and improve accuracy on a key attribute from 46% to 87%. What would have taken us months, only took a few days with Refuel.”

Curious to learn more?

We’d love to chat about your marketplace catalog quality and workflows, and discuss how LLMs can address your pain points.
Click here to schedule a demo.