If you are a job seeker and you go to most job boards these days, you will be asked to do one of three things:
- Enter your resume information section by section
- Upload a file containing your resume
- Open your resume in some other program (Word, vi, Pages, etc.), then copy and paste the resume text into the appropriate section on the web site
All of these approaches get the data into the resume database, but some of the approaches are better than others.
- Entering resume data piece by piece is tedious, but it results in clean and neatly segmented data, which helps make sure that your resume appears in search results in the best way possible. From a data management perspective, this is the preferred approach; not so much for the job seeker.
- Uploading a file has the advantage of being quick but is also quite dirty. In general, it’s hard for the website to break out the data in a regular and repeatable fashion. This is particularly true because quite a few people organize their resume in their own unique style. If you’ve ever uploaded a resume and wondered why the name of your university became “Sept 2004-Dec 2008” instead of “UCLA”, you’ve run afoul of an automated process tripping up on the format of your resume.
- Copying and pasting is the worst of all worlds: first, the job seeker is required to take extra steps to open her resume and then manually copy the data into the website. Next, the website still has to work on dividing the resume text into meaningful groupings, this time without the helpful format cues embedded in the resume file from the second option.
The overall parsing process is generally the same from site to site, but the specifics depend on the format of the file. All resume processing sites parse the resume file, attempting to allocate the resume data into neat buckets: this block of words is associated with the work experience at firm “X” during dates “Y”-“Z”, while that block of text is about the time spent at university “A” majoring in “B” from date “C”-“D”.
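To make the "neat buckets" idea concrete, here is a minimal sketch in Python of sorting plain resume text into sections and then separating date ranges from everything else. The heading list, the date pattern, and the function names (`split_into_buckets`, `label_entry`) are all assumptions made up for illustration; no real job board is known to work exactly this way, and a production parser would need a far larger vocabulary and many more date formats.

```python
import re

# Hypothetical heading vocabulary; a real parser would know many more variants.
SECTION_HEADINGS = {
    "experience": "work_experience",
    "work experience": "work_experience",
    "education": "education",
    "skills": "skills",
}

# Assumed date-range shape such as "Sept 2004 - Dec 2008"; other styles are not handled.
DATE_RANGE = re.compile(r"^[A-Za-z]{3,9}\.? \d{4}\s*-\s*[A-Za-z]{3,9}\.? \d{4}$")


def split_into_buckets(resume_text):
    """Group non-empty lines under the most recently seen section heading."""
    buckets = {"header": []}
    current = "header"
    for raw_line in resume_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key = SECTION_HEADINGS.get(line.rstrip(":").lower())
        if key is not None:
            # A recognized heading starts a new bucket and is not stored as content.
            current = key
            buckets.setdefault(current, [])
            continue
        buckets[current].append(line)
    return buckets


def label_entry(lines):
    """Separate the first date-range line from the descriptive lines in a bucket."""
    dates = [line for line in lines if DATE_RANGE.match(line)]
    description = [line for line in lines if not DATE_RANGE.match(line)]
    return {"dates": dates[0] if dates else None, "description": description}


sample = """Jane Doe
Experience
Firm X, Analyst
Jan 2009 - Mar 2012
Education
UCLA, B.S. Mathematics
Sept 2004 - Dec 2008
"""

buckets = split_into_buckets(sample)
education = label_entry(buckets["education"])
```

With this sample, `education["dates"]` is the date range and `education["description"]` holds the UCLA line. Swap the order of those two lines in a parser that assumes "first line is the school name" and you get exactly the “Sept 2004-Dec 2008” university described earlier, which is why this sketch classifies lines by shape rather than by position.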
Depending on how the document is formatted, the format itself (particularly an XML variant like .docx) may contain clues within the structure of the data relating to the content of the data. In other words, metadata. While the schema for .docx (the MS Word format for Office 2007 and later) is not even vaguely resume oriented, the general convention amongst most resume writers is to use various formatting options (bold, italics, underline, white space, etc.) as signifiers for changes between resume data elements.
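As a rough illustration of formatting as metadata, the sketch below reads the main XML part of a .docx document and flags paragraphs whose text is entirely bold, on the assumption that such paragraphs are likely section headings. The function names and the "bold means heading" rule are hypothetical, and the sketch only looks at bold applied directly to a run; bold inherited from a named style is ignored. The sample XML is a hand-written fragment, not output from Word.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def is_bold(run):
    """True if the run carries a direct <w:b> that is not switched off."""
    bold = run.find(f"{W}rPr/{W}b")
    if bold is None:
        return False
    return bold.get(f"{W}val", "true") not in ("0", "false")


def paragraphs_with_bold_flag(document_xml):
    """Return (text, all_bold) for every non-empty paragraph in the XML."""
    root = ET.fromstring(document_xml)
    results = []
    for para in root.iter(f"{W}p"):
        runs = [
            (run, "".join(t.text or "" for t in run.findall(f"{W}t")))
            for run in para.findall(f"{W}r")
        ]
        text = "".join(run_text for _, run_text in runs)
        if not text.strip():
            continue
        # Whitespace-only runs do not count against the paragraph being bold.
        all_bold = all(is_bold(run) for run, run_text in runs if run_text.strip())
        results.append((text, all_bold))
    return results


# Hand-written sample standing in for word/document.xml. For a real file the
# XML could be read with zipfile.ZipFile(path).read("word/document.xml").
SAMPLE_XML = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Education</w:t></w:r></w:p>
    <w:p><w:r><w:t>UCLA, B.S. Mathematics</w:t></w:r></w:p>
  </w:body>
</w:document>"""

flagged = paragraphs_with_bold_flag(SAMPLE_XML)
likely_headings = [text for text, all_bold in flagged if all_bold]
```

Here `likely_headings` ends up containing only "Education". This is precisely the cue that is lost when the same resume arrives as pasted plain text.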
Pretty much any format into which a resume can be saved can also be parsed. Some formats (text, docx) are easier than others (PDF, I’m looking at you); in general, it’s a matter of how much work it will take to find and correctly identify the sections of data. In the next post of this series, we’ll go into further detail for the most common resume storage formats, their associated pluses and minuses and other assorted details.
This is the first of several posts about the technical details on how resumes are commonly stored in the online world.