Evidnc · Founder, Designer & Engineer · 2024
AI-Powered Survey Analysis
Turning 40 hours of qualitative research synthesis into 15 minutes, live at evidnc.ai
40hrs → 15min
research synthesis time
10
beta testers ran real sessions
Live
shipped at evidnc.ai
1
person design + engineering

Problem
Qualitative researchers spend 40+ hours per project manually coding transcripts, finding themes, and building synthesis reports. Existing tools offered keyword search, which cannot match on meaning. A researcher saying "I feel lost" and another saying "the navigation confused me" are expressing the same insight while sharing zero keywords. Every existing platform I evaluated missed this.
Constraints
Solo build
One person owning research, design, frontend, and the embedding + clustering pipeline. Every feature had to justify its build cost.
Trust over accuracy
Researchers will not hand their data to a black box. The product had to earn trust before being allowed to help.
Ship, not prototype
The bar was a live, usable product, not a Figma flow. Every screen had to work against real transcripts, not lorem ipsum.
Key Insight
I assumed researchers wanted better organization tools: tags, folders, smart search. But talking to them revealed they wanted pattern detection. They weren't looking for help arranging what they already knew. They wanted the tool to surface things they'd missed. That reframed everything. The design decision and the architecture decision became the same decision: embeddings, not keywords.
Solution
Built a live AI platform that ingests research transcripts, generates embeddings, clusters them semantically, and surfaces patterns researchers would otherwise miss. The interface is designed around confidence and control: every cluster shows the quotes that formed it, the similarity scores, and a drag-to-reorganize interaction so the researcher is always the final decision-maker. Trust comes from visibility, not from hiding the machinery.
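The ingest, embed, cluster, surface loop can be sketched in a few lines. This is a minimal illustration, not Evidnc's actual pipeline: `embed` is a hypothetical stand-in that looks up hand-written toy vectors (a real build would call an embedding model), and the greedy threshold clustering and the 0.8 cutoff are assumptions chosen for brevity.

```python
import math

# Hypothetical toy vectors standing in for the output of a real embedding model.
TOY_VECTORS = {
    "I feel lost": [0.9, 0.1, 0.0],
    "The navigation confused me": [0.8, 0.2, 0.1],
    "Checkout was too expensive": [0.1, 0.9, 0.2],
    "The price felt too high": [0.0, 0.8, 0.3],
}

def embed(quote):
    """Stand-in for an embedding call: returns the toy vector for a known quote."""
    return TOY_VECTORS[quote]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def cluster(quotes, threshold=0.8):
    """Greedy clustering: a quote joins the first cluster whose seed quote is
    similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of (quote, similarity_to_seed) pairs
    for quote in quotes:
        vec = embed(quote)
        for members in clusters:
            seed_vec = embed(members[0][0])
            score = cosine(vec, seed_vec)
            if score >= threshold:
                members.append((quote, round(score, 2)))
                break
        else:
            clusters.append([(quote, 1.0)])
    return clusters

clusters = cluster(list(TOY_VECTORS))
for members in clusters:
    # Every cluster carries its quotes and scores, so the grouping can be inspected.
    print(members)
```

Keeping the similarity score next to each quote is what lets the interface show why a quote landed in a cluster instead of only showing that it did.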
My Role
- Sole designer and engineer
- Owned research, product, UI, and the full technical stack
- React frontend, embedding pipeline, semantic clustering algorithm
- Recruited and ran 10 beta testers through real research synthesis tasks
Duration
Solo build, 2024. Live product.
Team Collaboration
Founder, Designer & Engineer
The choices that shaped the product
Key Decisions
Decision 01
Embeddings over keywords
Themes in qualitative data are semantic, not lexical. "I feel lost" and "the navigation confused me" are the same insight sharing zero keywords. Every existing tool I looked at was solving this with TF-IDF or string matching: fast, cheap, and wrong. I built an embedding-based clustering pipeline so the system understands meaning, not words. This was simultaneously a design decision (dramatically better results for researchers) and an architecture decision (embeddings over TF-IDF). For Evidnc, those two decisions collapsed into one.
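The lexical failure is easy to reproduce. The sketch below uses only the standard library and two helper functions of my own naming (`jaccard`, `bow_cosine`) to measure word overlap between the two example quotes. TF-IDF only reweights the same word counts, so when two quotes share no terms its cosine score is zero as well.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase a quote and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def jaccard(a, b):
    """Share of distinct words the two quotes have in common."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb)

def bow_cosine(a, b):
    """Cosine similarity over raw word counts (a bag-of-words model)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm

q1 = "I feel lost"
q2 = "the navigation confused me"

print(jaccard(q1, q2))     # 0.0: no shared words
print(bow_cosine(q1, q2))  # 0.0: word-count vectors are orthogonal
```

Any method built on word counts scores this pair as unrelated. An embedding model maps each quote to a dense vector learned from usage, so two quotes with no shared vocabulary can still land close together.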

Decision 02
Show the AI's reasoning
Researchers don't trust black boxes, and they're right not to. I designed every cluster to expose its own evidence: the source quotes that formed it, the similarity scores between them, and a drag-to-reorganize affordance so the researcher can override the model at any point. The clustering UI isn't a magic box that spits out themes; it's a lens that shows its work. Trust came from visibility, not from accuracy claims.
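One way to model a cluster that carries its own evidence is sketched below. The class names, fields, transcript identifiers, and scores are all hypothetical, not Evidnc's real schema. The point is that each quote keeps its source and similarity score, and that a drag-to-reorganize action becomes a recorded human override and not a silent edit.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    quote: str
    source: str                        # hypothetical transcript identifier
    similarity: float                  # score displayed next to the quote
    moved_by_researcher: bool = False  # set when a human overrides the model

@dataclass
class Cluster:
    label: str
    evidence: list = field(default_factory=list)

def move_quote(quote, src, dst):
    """Apply a researcher override: move one quote from src to dst and flag it."""
    for i, item in enumerate(src.evidence):
        if item.quote == quote:
            moved = src.evidence.pop(i)
            moved.moved_by_researcher = True
            dst.evidence.append(moved)
            return moved
    raise ValueError(f"quote not found in cluster {src.label!r}")

wayfinding = Cluster("Wayfinding", [
    Evidence("I feel lost", "interview-03", 0.91),
    Evidence("The navigation confused me", "interview-07", 0.88),
    Evidence("I gave up at checkout", "survey-112", 0.61),
])
checkout = Cluster("Checkout friction")

# The researcher drags the low-similarity quote into a better-fitting cluster.
move_quote("I gave up at checkout", wayfinding, checkout)

for c in (wayfinding, checkout):
    print(c.label, [(e.quote, e.similarity, e.moved_by_researcher) for e in c.evidence])
```

Flagging the moved quote means a stakeholder's "why is this here?" always has an answer: either the model's score or the researcher's decision.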

Decision 03
Ship it, don’t prototype it
The easy version of Evidnc would have been a Figma prototype and a pitch deck. But researchers evaluate tools by running their actual data through them, not by imagining what a tool could do. I committed to shipping a live product (real ingest, real embeddings, real clustering, real export) before showing it to anyone. That bar forced every design decision to hold up against real transcripts, and it’s why 10 beta testers ran real sessions on their own research instead of walking through a canned demo.

Context
Qualitative researchers spend 40+ hours per project manually coding survey responses and interview transcripts. That means reading every quote, tagging themes, and reassembling insights into a synthesis report. The work is tedious and slow, and its most valuable part (pattern detection) is the part humans are weakest at.
Every existing tool I evaluated solved this with keyword search or lexical tagging. But themes in qualitative data are semantic, not lexical. A respondent saying "I feel lost" and another saying "the navigation confused me" are expressing the same insight while sharing zero keywords. TF-IDF misses it. String matching misses it. Even "smart search" misses it, because it was never designed to detect meaning.
I built Evidnc as a live, usable product rather than a prototype, because researchers evaluate tools by running their real data through them, not by imagining what a tool could do.
Problem Statement
“Qualitative research synthesis is manual, slow, and blind to semantic similarity. Researchers need pattern detection, not better organization.”
Pain Points
40+ hours per synthesis
Manual coding of transcripts, theme-building, and report assembly dominates the researcher’s week.
Keyword search misses meaning
Existing tools rely on lexical match. Semantically identical quotes go uncounted because they share no words.
Black-box AI is worse
Off-the-shelf LLM summarizers produce confident answers with no traceable evidence. Researchers can’t defend findings to stakeholders.
Context switching kills flow
Researchers jump between a transcript tool, a spreadsheet, a Miro board, and a doc. Evidence lives in four places; synthesis lives nowhere.
Research
Sixteen weeks from market analysis to human truth, validated at every step.
Phase 01
Discovery calls with the ASU UX research team
- Synthesis is not a single-tool problem. It’s a context-switching problem across 4 to 5 disconnected tools.
- The most time-consuming step is open-text coding, and it gets worse as researchers fatigue through long transcripts.
- Researchers already distrust AI tools they’ve tried. Not because of accuracy, but because the tools couldn’t explain their own outputs when stakeholders asked "why did you categorize this that way?"
- The real problem isn’t speed. It’s the gap between what the AI decided and what the researcher can defend.
I used an AI tool to pre-code themes. When a stakeholder asked why a specific response was categorized the way it was, I had no answer. I had to fall back on manual review anyway.
UX Researcher, ASU discovery interview
Phase 02
Quantitative validation survey with 300 researchers
- 89.7% of respondents demand full traceability to source data. The strongest signal in the study (Q9).
- 92.7% say surfacing ambiguous or low-confidence cases would increase their trust in the tool (Q12).
- 76.9% validated the "AI transparency and trust gap" as the #1 composite pain point, outranking even the open-text coding burden.
- 77% said they’d pay for a transparent auto-coding tool. Uniform across UXRs, PMs, and VPs, with no segment differences.
- 72.2% feel a qual-quant disconnect. They can’t easily connect what respondents said to who said it. Universal, not segment-specific.
- Large effect size on AI trust by segment (Cohen’s d = 0.91). PMs adopt AI faster (M=3.75) while UXRs (2.49) and VPs (2.48) need transparency before they’ll adopt at all.
I’d automate first-pass categorization and confidence scoring. Give me a sorted list: high-confidence auto-codes that I approve in bulk, and low-confidence edge cases that I review one by one.
UX Researcher, validation survey
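The workflow this respondent describes maps onto a simple triage step. The sketch below is illustrative only: the `triage` function, the 0.85 threshold, and the sample records are assumptions, not a description of what Evidnc ships.

```python
def triage(codes, threshold=0.85):
    """Split auto-codes into a bulk-approve pile and a review queue.

    The review queue is sorted least-confident first, so the hardest
    edge cases are looked at before the borderline ones.
    """
    bulk = [c for c in codes if c["confidence"] >= threshold]
    review = sorted(
        (c for c in codes if c["confidence"] < threshold),
        key=lambda c: c["confidence"],
    )
    return bulk, review

# Hypothetical auto-coded responses with model confidence scores.
auto_codes = [
    {"quote": "I feel lost", "code": "Wayfinding", "confidence": 0.93},
    {"quote": "It was fine, I guess", "code": "Satisfaction", "confidence": 0.41},
    {"quote": "The navigation confused me", "code": "Wayfinding", "confidence": 0.90},
    {"quote": "Too many steps to pay", "code": "Checkout friction", "confidence": 0.72},
]

bulk, review = triage(auto_codes)
print("approve in bulk:", [c["quote"] for c in bulk])
print("review one by one:", [c["quote"] for c in review])
```

The threshold is the main design lever: raising it sends more items to manual review, which costs time and reduces the number of codes a researcher approves without reading them.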
Phase 03
Beta testing with 10 researchers on live product
- Researchers didn’t ask "how accurate is the clustering?" They asked "why did you group these together?" That validated the transparency-over-accuracy design call.
- The drag-to-reorganize interaction was used more than any AI-suggested action. Control matters more than automation.
- Running real transcripts surfaced edge cases no prototype would have: messy quotes, off-topic responses, language mixing. The product held up because it was live, not mocked.
- The 40-hour-to-15-minute claim held across all 10 sessions on real datasets.
Design Pillars
The principles that guided every decision from sketches to ship.
Semantic over Lexical
Embeddings, not keywords. The system should understand that two sentences mean the same thing even when they share no words.
Transparency as Trust
Every AI output must expose its evidence. No black boxes. Clusters show source quotes, similarity scores, and let the researcher override them.
Researcher Stays in Control
The AI proposes; the researcher disposes. Drag-to-reorganize, manual override, and human-in-the-loop editing at every step.
Ship, Don’t Prototype
Real ingest, real embeddings, real export. The product had to hold up against real transcripts, not a curated demo dataset.
User Flows
Four flows over the same uploaded transcripts.
Thematic Analysis
“I want the tool to find the themes I’d miss reading line-by-line.”

Semantic Search
“I want to search my data by idea, not by exact words.”

Question Analysis with Visuals
“I want to see how responses to one question break down.”

Quick AI Summary
“I need a one-paragraph summary I can drop into a report right now.”

Reflections
The design decision was the architecture decision
Choosing embeddings over TF-IDF wasn’t just a technical optimization. It was the difference between a tool that works and a tool that doesn’t. On Evidnc, the product’s value and the backend’s architecture collapsed into a single call. Owning both roles made that call obvious. I suspect it would have been a months-long debate had design and engineering been separate.
Transparency beats accuracy
Researchers didn’t ask "how accurate is the clustering?" They asked "why did you group these together?" I stopped chasing a higher silhouette score and instead made every cluster explain itself. Trust came from visibility, not from metrics I could have put on a landing page.
Ship so researchers can test on their own data
A prototype would have let me show ten beta testers a pretty demo. Shipping a live product let them upload their own transcripts, and that’s when the feedback got useful. Every interesting thing I learned came from someone running their real work through it.
