I'm doing quite a bit of job searching right now. I use hiring.cafe by choice and LinkedIn out of necessity. LinkedIn's search tools leave a lot to be desired and there's a lot of noise to filter through, but that's where most of the hirers are. Needs must when the devil drives, and all that.
I also follow a lot of groups and subreddits on job searching and hiring just to keep up with trends. It dawned on me after reading through the usual complaints about the current job market that AI has really taken the human out of the equation, and that's led to a couple of disturbing trends from a job seeker's standpoint. One of them is the AI interview: some employers are starting to use AI to do first-level screenings of candidates. As you might imagine, this does not sit well with the job seeker community. They already have to get their resume through ATS systems and algorithmic filters, and now even if they clear that bar their next hurdle is an AI screener?! But that's a different post.
This whole situation got me thinking that I could flip that model on its head. So I built an AI interviewee. Essentially a chatbot, but trained on being me. Go to sturgeon.us/chat and start asking questions you'd ask in a recruiter screening. It draws on a corpus of career and biographical information gleaned from my historical resumes, skillsets, management philosophies and job preferences. The project is called Interview ShAIne. The misspelling is intentional - a little silly and punny, I know, but it sticks.
I got the prototype up in about a day, but it was a mess. Not sloppy-coding type of mess. Fundamentally-not-representing-who-I-am type of mess.
What broke first
The initial corpus was mostly resume. I had a pile of files going back years: resumes in different formats, certifications, a couple of reference letters. I dumped them into an inbox folder, used Claude Code to help de-dupe the overlapping content, ran the ingestion script, and started testing.
Early results were fine for direct factual questions. Dates, titles, companies: the model pulled those accurately because they were right there in the content. But when I asked it about the kind of work I want to do, the answers were flat. Generic. Occasionally wrong, or outside the scope of what it could answer at all.
The one that stopped me cold: the tool said I lacked mission-driven job experience.
If you know anything about my career, that's almost exactly backwards. I spent years in healthcare: CareSource, Ad Hoc, CMS and Healthcare.gov. Before that, six years managing the app portfolio for Dayton Public Schools, which was some of the most rewarding work I've done. The work I'm most proud of has real consequences for real people: Medicaid members, ACA marketplace participants, teachers and students navigating systems that don't always make it easy. When I'm evaluating a new opportunity, mission is near the top of the list. It's a filter, not a nice-to-have.
The model didn't know any of that. Not because it couldn't understand it, but because I'd never written it down in a way that made it retrievable. My resumes listed job titles and technical accomplishments. They didn't say: this is what I care about and why.
From Curriculum Vitae to Corpus Vitae
For years the model has been: the resume gets you the interview, then you talk through everything that didn't make the cut. I think that paradigm is shifting. It's always been drilled into us to keep the resume brief, one or two pages, something that can be scanned by a recruiter and has enough highlights to get you that first call. With ATS systems and AI beginning to filter at the screening layer, that approach makes less and less sense.
We haven't quite graduated to replacing the Curriculum Vitae with a Corpus Vitae yet. But thinking of it in those terms led me exactly where I needed to go with this tool. Not just the Course of Life, but the Body of Life. Everything that doesn't fit on two pages but actually explains who you are.
RAG (retrieval-augmented generation, the pattern behind how this tool works) is only as good as what you've given it to retrieve. The model isn't going to infer your values from your job titles. It reflects back exactly what you put in, interpreted through whatever framing you gave it.
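If you haven't worked with RAG before, the retrieval step boils down to something like this. This is a toy sketch, with tiny hand-made vectors standing in for real embeddings and a two-entry corpus standing in for the real one:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: (chunk_text, embedding). Real embeddings come from a model,
# and the real corpus has hundreds of chunks.
corpus = [
    ("Managed the app portfolio for a school district", [0.9, 0.1, 0.0]),
    ("Mission matters more than compensation", [0.1, 0.9, 0.2]),
]

def retrieve(query_vec, k=1, threshold=0.5):
    """Return the top-k chunks whose similarity clears the threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in corpus),
        reverse=True,
    )
    return [(score, text) for score, text in scored[:k] if score >= threshold]

# A query "near" the mission chunk retrieves the mission chunk.
hits = retrieve([0.15, 0.85, 0.1])
```

The model only ever sees what this step hands it, which is exactly why a corpus full of job titles and empty of values produces answers that are factually fine and personally wrong.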
So I started writing things I'd never formally written down. What I look for in a role. What I won't do again. Why healthcare. How I think about management. What it's like to work with me, in the words of people who have. I then asked Claude to act as a Talent Acquisition Specialist and interview me for a Senior Software Engineering Manager role. We went through it for about half an hour, probably a dozen questions, occasionally diving into details when prompted. When done, I loaded that transcript into the corpus and the results were surprising. Not only did it cover a lot of the questions that typically come up in that kind of screening call, but it also surfaced things about my work history that would never have appeared in such detail on a resume: reasons for leaving each job, what I enjoyed about each one, the challenges I faced and so on.
The job-preferences.md file I ended up with is now one of the most important documents in the corpus. It covers things no resume would ever say: what organizational structures I thrive in, what red flags I've learned to spot in job descriptions, why I weigh mission more than compensation. That file gets injected into every query. Not optional context, fixed context, always present.
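In simplified form, "fixed context" just means the preferences document is prepended to every prompt while retrieved chunks come and go. The names and strings here are illustrative, not the tool's actual code:

```python
# Stands in for the contents of job-preferences.md.
PINNED_CONTEXT = "Mission comes near the top of the list when evaluating roles."

def build_prompt(question, retrieved_chunks):
    """Assemble the model prompt: pinned context first, then whatever
    the similarity search returned, then the user's question."""
    parts = ["Fixed context (always present):", PINNED_CONTEXT, ""]
    if retrieved_chunks:
        parts.append("Retrieved context:")
        parts.extend(retrieved_chunks)
        parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)

prompt = build_prompt("Why healthcare?", ["Spent years at CMS and Healthcare.gov"])
```

Even when retrieval comes back empty, the pinned block is still there, so the model never answers without knowing what I actually care about.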
When I started, the corpus was about 56 chunks of 600 characters each. After working through the gaps methodically over a week, it's now over 200, covering everything from early-career technical work to management philosophy to how I handle specific leadership situations. The size matters less than the coverage: a thousand chunks about your job history won't help if nobody ever wrote down what you actually believe. The biggest single jump was a full rewrite of job-preferences.md, which went from about 3,000 characters to over 32,000 in a single session and is now roughly a quarter of the entire corpus.
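In simplified form (the real ingestion script has more to it), splitting documents into roughly 600-character chunks looks something like this. The overlap value is illustrative:

```python
def chunk_text(text, size=600, overlap=100):
    """Split a document into fixed-size character chunks with overlap,
    backing up to the last space so words stay intact."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Don't split mid-word: back up to the last space in range.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        # Overlap adjacent chunks so context isn't lost at boundaries;
        # the max() guarantees forward progress.
        start = max(end - overlap, start + 1)
    return chunks
```

The overlap is there so a sentence that straddles a chunk boundary still appears whole in at least one chunk.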
Teaching it my voice
There's a piece of this that goes beyond content. The interview transcript did something I didn't fully anticipate: it didn't just add more facts, it added cadence. How I transition between topics. How much context I give before getting to the point. When I deflect a question versus when I lean into it. Even the punctuation I use, or don't use (cough em-dash cough).
A chatbot that knows your job history but answers in flat corporate prose isn't really representing you. It's a glorified FAQ. Getting the voice right meant giving it examples of how I actually talk, not just what I've done. That's why the interview transcript mattered as much as the resume content.
It's still not perfect. Voice is the hardest thing to capture in a corpus because it's distributed across everything, not contained in any one document. But it's directionally right, and the missed queries (more on those below) help identify where the tone goes flat.
The problem with honesty
Once the chatbot was working reasonably well, I wanted a second tool: something a hiring manager or recruiter could use to evaluate fit before a call. Paste in a job description, get a graded scorecard across several categories. Honest assessment, specific strengths and gaps.
I already had the corpus, so the hard part was done. I built the JD Fit tool. It worked, but the overall grades were consistently in the B-minus to C-plus range.
That's a problem, not because the grades were wrong, but because the grading scale was broken. On a traditional A-through-F scale, a B-minus sounds mediocre and a C-plus sounds like almost-failing. But that's not what those grades meant in context. A B-minus from this tool meant strong candidate with one real gap worth discussing. A C-plus meant solid alignment, worth a conversation.
The scale carried 30 years of academic baggage that made honest results look like bad results. So I invented new language for it:
C = could work, but needs clarity. B = strong fit. A = hire immediately.
That framing is now on every results page. The grades didn't change, just the frame around them.
This is still an open problem, honestly. Most AI tools in the hiring space are promotional by design. They're built to make the candidate look good, which makes them useless for actual evaluation. When this tool flags a real gap ("this role requires someone who contributes production code, and Shane has clearly stated he doesn't do that as a manager"), that's the tool working correctly. But it requires the reader to trust the scale, and that trust has to be built.
I'm currently working on adding distribution context: where do most analyses actually land? If 90% of results come back B or C and 5% come back A, that changes how you read any individual result. Grading on a curve is a reasonable UX problem to solve.
What the missed queries taught me
One feature I added early and got a lot from: when a question comes in and the similarity search returns nothing above threshold, the query gets logged to a missed_queries table and I get an email from ShAIne. It basically means a recruiter asked a question I couldn't answer, which could mean one of several things: I need to expand the corpus, or there are some angles I haven't considered, or someone's just trying to jailbreak the chatbot. No matter the reason, it's a good signal, and the potential for continuous improvement is high.
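Stripped down, the logging path looks something like this. An in-memory SQLite database stands in for the real one, and the threshold value is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS missed_queries "
    "(query TEXT, best_score REAL, created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

THRESHOLD = 0.5  # illustrative similarity cutoff

def handle_query(query, best_score):
    """Log the query when nothing in the corpus clears the threshold.
    Returns True if the corpus had an answer, False on a miss."""
    if best_score < THRESHOLD:
        conn.execute(
            "INSERT INTO missed_queries (query, best_score) VALUES (?, ?)",
            (query, best_score),
        )
        conn.commit()
        # This is also where the notification email would fire.
        return False
    return True

handle_query("What's your salary history?", 0.31)
missed = [row[0] for row in conn.execute("SELECT query FROM missed_queries")]
```

Every row in that table is a question someone wanted answered that the corpus couldn't cover, which is exactly the feedback loop described below.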
At first I was treating these as failures. But reading through the logs, I realized they were just as valuable as the hits. The missed queries were a map of what I hadn't explained about myself: questions I'd never thought to address, assumptions people bring into a screening call that I'd never proactively documented.
Every non-answer was a potential gap in the corpus. Every gap in the corpus was a gap in how I'd described my own experience. Fixing them made the tool better. It also made me think harder about questions I'd been giving rote answers to for years.
One more thing the data unlocked
Retaining the results opened up something I hadn't planned for. Since the analysis data was being stored anyway, it made sense to generate a shareable link for each report, something that lives for a period of time (180 days for now) and can be sent directly to a hiring manager or recruiter. The volume is low enough that storage is trivial, and having a stable URL for each result means the tool can be used as part of an actual application process, not just a personal gut-check.
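In simplified form, generating a shareable link is just a random token plus an expiry timestamp. The base URL and field names here are illustrative, not the tool's real scheme:

```python
import secrets
from datetime import datetime, timedelta, timezone

REPORT_TTL_DAYS = 180  # current retention window

def make_share_link(report_id, base_url="https://example.com/jd-fit"):
    """Build a stable, unguessable URL for a stored report, plus the
    timestamp after which the link should stop resolving."""
    token = secrets.token_urlsafe(16)
    expires = datetime.now(timezone.utc) + timedelta(days=REPORT_TTL_DAYS)
    return {
        "url": f"{base_url}/{report_id}/{token}",
        "expires_at": expires.isoformat(),
    }

link = make_share_link("rpt-123")
```

The random token matters: report IDs are sequential and guessable, but the token makes each URL effectively private to whoever it was shared with.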
I also set up an email notification any time a report is generated: company name, job title, overall grade, category breakdown, and a link to the permanent report. It's become a useful daily signal of how the tool is being used and whether the corpus is holding up against real job descriptions.
The next post in this series covers the actual architecture and some detailed design decisions: how the RAG pipeline works, the chunking and embedding strategy, how the JD Fit scoring is weighted, and what I'd do differently. If you want to dig into the technical side, that's next.
The tools are live now if you want to try them: Interview ShAIne and the JD Fit Analyzer. Development is largely complete, but both are still evolving: the corpus improves regularly and the scoring is still being refined.
More posts coming. Subscribe for updates or get in touch if you have questions.
