Parsing and extracting from resumes with LLMs

August 1, 2024

by Rishabh Bhargava

Challenges with traditional resume parsing approaches


Resumes have traditionally proven to be difficult documents to parse and extract from. There are several underlying reasons:

1. Resumes come in multiple formats (PDF, image, Word, XML, etc.)

2. Jargon - Keywords and terminology change from industry to industry

3. Lack of standardization - The same title or skill can take on different meanings depending on the context of the role

The combination of these challenges has caused traditional “rules-based” parsers to fall short. In fact, a recent study found that traditional Applicant Tracking Systems (ATS), algorithms, and rules-based parsers attained only 60-70% accuracy.

The consequences of this shortcoming are significant. Today, most recruiters and HR systems rely heavily on resume parsing. Without cleanly parsed data, automated systems and recruiters will do a poor job of assessing candidate fit via resumes, leading to talent mismatch, lost opportunities, and wasted effort.

How should I parse resumes instead?

Refuel makes it easy to parse and extract from resumes in bulk, quickly and cost-effectively.

Database of resume data and PDF links

STEP 1: Point Refuel to the resume PDF links in S3, or upload a CSV of raw resume data.

Input fields for specifying task context

STEP 2: Describe the context of the task at hand in natural language and select the model that best suits your task. In our case, the context is "You are an expert in HR and reading and parsing resumes."

Specifying output columns within Refuel task

STEP 3: Specify the output columns. Output columns are new fields we’d like to add to our data set by extracting them from the resume. In our case, let’s define four output columns: education, work experience, location, and skills.
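To make the target shape concrete, here is a minimal Python sketch of what one row with these four output columns might look like once it lands in your own code. The class name `ParsedResume`, the snake_case column names, and the choice of lists versus strings are assumptions for illustration, not something Refuel prescribes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Column names mirror the four output columns from STEP 3 (snake_case is an assumption).
OUTPUT_COLUMNS = ["education", "work_experience", "location", "skills"]


@dataclass
class ParsedResume:
    """Hypothetical container for the four extracted columns."""
    education: List[Dict] = field(default_factory=list)
    work_experience: List[Dict] = field(default_factory=list)
    location: str = ""
    skills: List[str] = field(default_factory=list)


def to_row(parsed: ParsedResume) -> Dict[str, str]:
    """Flatten to one CSV-friendly row: nested columns are JSON-encoded strings."""
    row = asdict(parsed)
    return {k: v if isinstance(v, str) else json.dumps(v) for k, v in row.items()}


example = ParsedResume(location="Austin, TX", skills=["Python", "SQL"])
row = to_row(example)
```

Keeping nested columns as JSON strings is one way to store them next to the original CSV data; a nested document store would work just as well.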

Specifying task guidelines for a given column

STEP 4: For each field, we'll also write down the guidelines for how to extract information in plain and simple natural language. For example, for the education column:

"You will be provided the extracted resume text from a candidate resume. Your job is to extract all the education related fields into a list of JSONs, mapped as closely from the actual resume data as possible."
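As an illustration of what "a list of JSONs" could mean in practice, the sketch below checks that a model response for the education column is a JSON list of objects. The field names in `example_output` (`institution`, `degree`, `dates`) and the helper `parse_education` are hypothetical; the actual keys depend on the resume and on how the guidelines are written.

```python
import json
from typing import Dict, List


def parse_education(raw_output: str) -> List[Dict]:
    """Parse a response that is expected to be a JSON list of objects."""
    data = json.loads(raw_output)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of education entries")
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not a JSON object")
    return data


# Illustrative response only; keys are assumptions, not a fixed schema.
example_output = """[
  {"institution": "Example University",
   "degree": "B.S. Computer Science",
   "dates": "2015 - 2019"}
]"""

entries = parse_education(example_output)
```

A check like this catches malformed responses early, before they reach downstream systems.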

Output columns produced from running task

STEP 5: Once we save the task and let it run on a sample of the data, we'll get an output like the one pictured above. Let's take a look at one of the examples.

Confidence score for task output

STEP 6: For each record, you can assess the confidence score for the label and, if necessary, ask for an explanation of why a particular output was generated.
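One common way to use per-record confidence scores is to route low-confidence records to a human reviewer. The sketch below assumes each record is a dict with a `confidence` key on a 0 to 1 scale and uses an arbitrary threshold of 0.8; both the key name and the threshold are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    records: List[Dict], threshold: float = 0.8
) -> Tuple[List[Dict], List[Dict]]:
    """Return (accepted, needs_review) based on each record's confidence score."""
    accepted, needs_review = [], []
    for record in records:
        if record["confidence"] >= threshold:
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review


# Made-up records for illustration only.
records = [
    {"id": 1, "location": "Austin, TX", "confidence": 0.97},
    {"id": 2, "location": "Unknown", "confidence": 0.41},
]
accepted, needs_review = split_by_confidence(records)
```

The right threshold depends on how costly a wrong label is compared with the cost of a manual review.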

Providing task output feedback

STEP 7: On the off chance that a label is incorrect, you can edit the record and provide feedback, which automatically improves performance on subsequent data points.

Deploying task as an API endpoint

STEP 8: Once you’re happy with the performance, you can hit a single button to deploy the task as a live endpoint that can parse and extract from resumes in production and automatically scale based on your data volumes.

You can watch a video walkthrough of the steps here:

What are the outcomes of LLM-based resume parsing?


With Refuel and an LLM-based approach, we are able to achieve:

1. Higher accuracy: 95% vs. 60-70% with ATS and traditional parsers.

2. Significant time savings: Building resume parsers requires many months of engineering effort. Leveraging LLMs, especially custom models like Refuel-LLM-2, shortens this task to 2 days.

3. Flexible, customizable output schema: Get data exactly in the structure and format you want, and even enrich it with additional information from the internet.

4. Significant cost savings: Most resume parsing services charge $0.08 - $0.10 per resume parsed. Refuel can parse at a fraction of the cost.
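To put the per-resume pricing above in perspective, here is a small worked calculation of the baseline spend at $0.08 to $0.10 per resume. The volume of 100,000 resumes is a made-up figure for illustration, and the helper name `baseline_cost_range` is hypothetical; prices are kept in integer cents to avoid floating-point rounding.

```python
from typing import Tuple


def baseline_cost_range(
    num_resumes: int, low_cents: int = 8, high_cents: int = 10
) -> Tuple[float, float]:
    """Return the (low, high) cost in dollars at the quoted per-resume prices."""
    return num_resumes * low_cents / 100, num_resumes * high_cents / 100


# Hypothetical volume of 100,000 resumes.
low, high = baseline_cost_range(100_000)
print(f"${low:,.0f} to ${high:,.0f}")  # prints "$8,000 to $10,000"
```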