Resumes have traditionally been difficult documents to parse and extract information from, for several reasons:
1. Resumes come in multiple formats (PDF, image, Word, XML, etc.)
2. Jargon - keywords and terminology change from industry to industry
3. Lack of standardization - The same title or skill can take on different meanings depending on the context of the role
The combination of these challenges has caused traditional “rules-based” parsers to fall short. In fact, a recent study found that traditional Applicant Tracking Systems (ATS), algorithms, and rules-based parsers attained only 60-70% accuracy.
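To make that brittleness concrete, here is a minimal sketch of a rules-based extractor. The regular expression and both sample snippets are invented for illustration and are not taken from any real parser. The rule works when a resume uses the exact heading it expects, and returns nothing when the same information sits under a different heading.

```python
import re
from typing import Optional

# Hypothetical rule: the education section is the block of text that
# follows a line reading exactly "Education", up to the next blank line.
EDUCATION_RULE = re.compile(
    r"^Education[ \t]*\n(?P<body>.+?)(?=\n[ \t]*\n|\Z)",
    re.MULTILINE | re.DOTALL,
)

def extract_education(resume_text: str) -> Optional[str]:
    """Return the text under the 'Education' heading, or None if the rule misses."""
    match = EDUCATION_RULE.search(resume_text)
    return match.group("body").strip() if match else None

# Two made-up resumes carrying the same information under different headings.
resume_a = (
    "Jane Doe\n\n"
    "Education\n"
    "B.S. Computer Science, State University, 2018\n\n"
    "Skills\nPython, SQL\n"
)
resume_b = (
    "John Roe\n\n"
    "Academic Background\n"
    "BSc in Computer Science (State University), 2018\n\n"
    "Skills\nPython, SQL\n"
)

print(extract_education(resume_a))  # the rule finds the section
print(extract_education(resume_b))  # the rule misses: heading is worded differently
```

Every new heading, layout, or industry-specific term needs another hand-written rule, which is why this approach struggles with the variety described above.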
The consequences of this shortcoming are significant. Today, most recruiters and HR systems rely heavily on resume parsing. Without cleanly parsed data, automated systems and recruiters will do a poor job of assessing candidate fit from resumes, leading to talent mismatch, lost opportunities, and wasted effort.
Refuel makes it easy to parse and extract information from resumes in bulk, quickly and cost-effectively.
STEP 1: Point Refuel to the resume PDF links in S3 or upload a CSV of raw resume data
STEP 2: Describe the context of the task at hand in natural language and select the model that best suits your task. In our case, our context is "You are an expert in HR and reading and parsing resumes".
STEP 3: Specify the output columns. Output columns are new fields, extracted from the resume, that we’d like to add to our dataset. In our case, let’s define four output columns: education, work experience, location, and skills.
STEP 4: For each field, we'll also write down the guidelines for how to extract information in plain and simple natural language. For example, for the education column:
"You will be provided the extracted resume text from a candidate resume. Your job is to extract all the education related fields into a list of JSONs, mapped as closely from the actual resume data as possible."
STEP 5: Once we save the task and let it run on a sample of the data, we'll get an output like the one pictured above. Let's take a look at one of the examples.
STEP 6: For each record, you can assess the confidence score for the label and, if necessary, ask for an explanation of why a particular output was generated.
STEP 7: On the off chance that a label is incorrect, you can edit the record and provide feedback, which automatically improves performance on subsequent data points.
STEP 8: Once you’re happy with the performance, you can hit a single button to deploy the task as a live endpoint that parses and extracts from resumes in production and automatically scales with your data volumes.
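Steps 3 through 7 can also be pictured in code. The sketch below is not Refuel's API: the field names, the `confidence` key, the 0.8 threshold, and the sample records are all assumptions made for illustration. It shows one plausible shape for the education output (a list of JSON objects), a check that a parsed record matches that shape, and a simple rule that routes low-confidence records to human review.

```python
import json
from typing import Any, Dict, List

# Hypothetical keys for each education entry; the real schema is whatever
# you describe in the guidelines for the output column.
EDUCATION_KEYS = {"institution", "degree", "field_of_study", "end_year"}

# Assumed cut-off below which a record is sent for manual review.
REVIEW_THRESHOLD = 0.8

def validate_education(raw_json: str) -> List[Dict[str, Any]]:
    """Parse a model output string and check it is a list of objects
    that each carry the expected keys."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        raise ValueError("education output must be a JSON list")
    for entry in entries:
        if not isinstance(entry, dict):
            raise ValueError("each education entry must be a JSON object")
        missing = EDUCATION_KEYS - entry.keys()
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return entries

def needs_review(record: Dict[str, Any], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag a record whose (assumed) confidence score falls below the threshold."""
    return record["confidence"] < threshold

# Made-up records standing in for the output of a sample run.
records = [
    {
        "resume_id": "r-001",
        "education": '[{"institution": "State University", "degree": "B.S.", '
                     '"field_of_study": "Computer Science", "end_year": 2018}]',
        "confidence": 0.96,
    },
    {
        "resume_id": "r-002",
        "education": '[{"institution": "City College", "degree": "A.A.", '
                     '"field_of_study": null, "end_year": null}]',
        "confidence": 0.55,
    },
]

parsed = {r["resume_id"]: validate_education(r["education"]) for r in records}
review_queue = [r["resume_id"] for r in records if needs_review(r)]
print(review_queue)
```

Validating the shape before trusting the values keeps malformed outputs out of downstream systems, and a confidence threshold concentrates manual review on the records most likely to be wrong.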
You can watch a video walkthrough of the steps here:
With Refuel and an LLM-based approach, we are able to achieve:
1. Higher accuracy: 95% vs. 60-70% with ATS and traditional parsers.
2. Significant time savings: Building a resume parser requires many months of engineering effort. Leveraging LLMs, especially custom models like Refuel-LLM-2, shortens this task to 2 days.
3. Flexible, customizable output schema: Get data exactly in the structure and format you want, and even enrich it with additional information from the internet.
4. Significant cost savings: Most resume parsing services charge $0.08 - $0.10 per resume parsed. Refuel can parse at a fraction of the cost.