Evidnc·Founder, Designer & Engineer·2024

AI-Powered Survey Analysis

Turning 40 hours of qualitative research synthesis into 15 minutes, live at evidnc.ai

40hrs → 15min

research synthesis time

10 beta testers ran real sessions

Live

shipped at evidnc.ai

1-person design + engineering


Problem

Qualitative researchers spend 40+ hours per project manually coding transcripts, finding themes, and building synthesis reports. Existing tools offered keyword search, which cannot capture semantic meaning. A respondent saying "I feel lost" and another saying "the navigation confused me" express the same insight yet share zero keywords. Every platform I evaluated missed this.

Constraints

Solo build

One person owning research, design, frontend, and the embedding + clustering pipeline. Every feature had to justify its build cost.

Trust over accuracy

Researchers will not hand their data to a black box. The product had to earn trust before being allowed to help.

Ship, not prototype

The bar was a live, usable product, not a Figma flow. Every screen had to work against real transcripts, not lorem ipsum.

Key Insight

I assumed researchers wanted better organization tools: tags, folders, smart search. But talking to them revealed they wanted pattern detection. They weren't looking for help arranging what they already knew. They wanted the tool to surface things they'd missed. That reframed everything. The design decision and the architecture decision became the same decision: embeddings, not keywords.

Solution

Built a live AI platform that ingests research transcripts, generates embeddings, clusters them semantically, and surfaces patterns researchers would otherwise miss. The interface is designed around confidence and control: every cluster shows the quotes that formed it, the similarity scores, and a drag-to-reorganize interaction so the researcher is always the final decision-maker. Trust comes from visibility, not from hiding the machinery.

My Role

Sole designer and engineer
Owned research, product, UI, and the full technical stack
React frontend, embedding pipeline, semantic clustering algorithm
Recruited and ran 10 beta testers through real research synthesis tasks

Duration

Solo build, 2024. Live product.

Team Collaboration

Founder, Designer & Engineer

Impact

40hrs → 15min

research synthesis time

10 beta testers ran real sessions

Live

shipped at evidnc.ai

1-person design + engineering

The choices that shaped the product

Key Decisions

Decision 01

Embeddings over keywords

Themes in qualitative data are semantic, not lexical. "I feel lost" and "the navigation confused me" are the same insight sharing zero keywords. Every existing tool I looked at was solving this with TF-IDF or string matching: fast, cheap, and wrong. I built an embedding-based clustering pipeline so the system understands meaning, not words. This was simultaneously a design decision (dramatically better results for researchers) and an architecture decision (embeddings over TF-IDF). For Evidnc, those two decisions collapsed into one.
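The lexical half of this argument can be checked directly. The minimal Python sketch below uses hypothetical `tokens` and `keyword_overlap` helpers (Jaccard similarity over lowercase word tokens, not Evidnc's code) to show that the two quotes share no tokens at all. Because TF-IDF vectors are nonzero only for terms a response actually contains, disjoint vocabularies also give a TF-IDF cosine similarity of exactly zero.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, a stand-in for what a keyword tool would index."""
    return set(re.findall(r"[a-z']+", text.lower()))


def keyword_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the token sets: shared words / all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


quote_a = "I feel lost"
quote_b = "the navigation confused me"
print(tokens(quote_a) & tokens(quote_b))  # set()
print(keyword_overlap(quote_a, quote_b))  # 0.0
```

Any score built only on shared words rates these two quotes as unrelated, which is the gap embeddings are meant to close.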

Decision 02

Show the AI's reasoning

Researchers don't trust black boxes, and they're right not to. I designed every cluster to expose its own evidence: the source quotes that formed it, the similarity scores between them, and a drag-to-reorganize affordance so the researcher can override the model at any point. The clustering UI isn't a magic box that spits out themes; it's a lens that shows its work. Trust came from visibility, not from accuracy claims.
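A cluster that "shows its work" can be modeled as a plain data structure. The sketch below illustrates the idea under stated assumptions and is not Evidnc's implementation: `Quote`, `Cluster`, `evidence()` and `move_quote()` are hypothetical names, cosine similarity is assumed as the similarity score, and the three-dimensional embeddings are invented toy values (real embedding models produce hundreds of dimensions).

```python
import math
from dataclasses import dataclass, field
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


@dataclass
class Quote:
    text: str
    embedding: list[float]


@dataclass
class Cluster:
    label: str
    quotes: list[Quote] = field(default_factory=list)

    def evidence(self) -> list[tuple[str, str, float]]:
        """Every pair of member quotes with its similarity score, strongest first."""
        pairs = [
            (a.text, b.text, round(cosine(a.embedding, b.embedding), 3))
            for a, b in combinations(self.quotes, 2)
        ]
        return sorted(pairs, key=lambda p: p[2], reverse=True)


def move_quote(quote: Quote, source: Cluster, target: Cluster) -> None:
    """Drag-to-reorganize: the researcher's move overrides the model's grouping."""
    source.quotes.remove(quote)
    target.quotes.append(quote)


# Hypothetical toy embeddings, invented for illustration only.
q1 = Quote("I feel lost", [0.9, 0.1, 0.0])
q2 = Quote("the navigation confused me", [0.8, 0.2, 0.1])
q3 = Quote("pricing seems fair", [0.0, 0.2, 0.9])

navigation = Cluster("Navigation", [q1, q2, q3])
pricing = Cluster("Pricing")
for a, b, score in navigation.evidence():
    print(f"{score:.3f}  {a!r} <-> {b!r}")

move_quote(q3, navigation, pricing)  # the researcher drags the low-similarity outlier out
```

Listing pairwise scores makes the weak link visible: the pricing quote scores low against both navigation quotes, which is exactly the cue a researcher needs before dragging it elsewhere.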

Decision 03

Ship it, don’t prototype it

The easy version of Evidnc would have been a Figma prototype and a pitch deck. But researchers evaluate tools by running their actual data through them, not by imagining what a tool could do. I committed to shipping a live product (real ingest, real embeddings, real clustering, real export) before showing it to anyone. That bar forced every design decision to hold up against real transcripts, and it’s why 10 beta testers ran real sessions on their own research instead of walking through a canned demo.

Context

Qualitative researchers spend 40+ hours per project manually coding survey responses and interview transcripts: reading every quote, tagging themes, and reassembling insights into a synthesis report. The work is tedious and slow, and its most valuable part (pattern detection) is the part humans are weakest at.

Every existing tool I evaluated solved this with keyword search or lexical tagging. But themes in qualitative data are semantic, not lexical. A respondent saying "I feel lost" and another saying "the navigation confused me" are expressing the same insight and share zero keywords. TF-IDF misses it. String matching misses it. Even "smart search" misses it, because it was never designed to detect meaning.

I built Evidnc as a live, usable product rather than a prototype, because researchers evaluate tools by running their real data through them, not by imagining what a tool could do.

Problem Statement

“Qualitative research synthesis is manual, slow, and blind to semantic similarity. Researchers need pattern detection, not better organization.”

Pain Points

40+ hours per synthesis

Manual coding of transcripts, theme-building, and report assembly dominates the researcher’s week.

Keyword search misses meaning

Existing tools rely on lexical match. Semantically identical quotes go uncounted because they share no words.

Black-box AI is worse

Off-the-shelf LLM summarizers produce confident answers with no traceable evidence. Researchers can’t defend findings to stakeholders.

Context switching kills flow

Researchers jump between transcript tool, spreadsheet, Miro board, and doc. Evidence lives in four places, synthesis lives nowhere.

Research

Sixteen weeks from market analysis to human truth, validated at every step.

Phase 01

Discovery calls with the ASU UX research team

Synthesis is not a single tool problem. It’s a context-switching problem across 4 to 5 disconnected tools.
The most time-consuming step is open-text coding, and it gets worse as researchers fatigue through long transcripts.
Researchers already distrust AI tools they’ve tried. Not because of accuracy, but because the tools couldn’t explain their own outputs when stakeholders asked "why did you categorize this that way?"
The real problem isn’t speed. It’s the gap between what the AI decided and what the researcher can defend.

Open-ended conversations with UX researchers on the ASU team
Observed how they synthesize survey and interview data today
Captured the tool stack they patch together (Qualtrics, Atlas.ti, Miro, NVivo, Dovetail, Notion)

I used an AI tool to pre-code themes. When a stakeholder asked why a specific response was categorized the way it was, I had no answer. I had to fall back on manual review anyway.
UX Researcher, ASU discovery interview

Phase 02

Quantitative validation survey with 300 researchers

89.7% of respondents demand full traceability to source data, the strongest signal in the study (Q9).
92.7% say surfacing ambiguous or low-confidence cases would increase their trust in the tool (Q12).
76.9% validated the "AI transparency and trust gap" as the #1 composite pain point, outranking even the open-text coding burden.
77% said they’d pay for a transparent auto-coding tool, consistently across UXRs, PMs, and VPs.
72.2% feel a qual-quant disconnect: they can’t easily connect what respondents said to who said it. This holds across every segment.
Large effect of segment on AI trust (Cohen’s d = 0.91). PMs adopt AI faster (M = 3.75), while UXRs (M = 2.49) and VPs (M = 2.48) need transparency before they’ll adopt at all.

15-question instrument: 12 Likert items plus 3 open-ended prompts
Segments: UX Researchers, Product Managers, VP / CX Leaders
Analyzed with ANOVA and Cohen’s d to surface segment-level differences and effect sizes
Used to rank pain-point severity and validate hypotheses before locking the product direction
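For reference, Cohen’s d is the difference between two group means divided by their pooled standard deviation. The sketch below computes it from raw Likert scores with a hypothetical `cohens_d` helper. The page reports only segment means, not standard deviations or which pair the d = 0.91 compares, so the last lines are a back-of-envelope assumption: if the figure compares PMs with UXRs, it implies a pooled standard deviation of roughly 1.38 scale points.

```python
import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: mean difference over the pooled standard deviation (sample variances, n - 1)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Assumption: the reported d = 0.91 compares PMs (M = 3.75) with UXRs (M = 2.49).
implied_pooled_sd = (3.75 - 2.49) / 0.91
print(round(implied_pooled_sd, 2))  # 1.38
```

By the usual rule of thumb, d around 0.8 or above counts as a large effect, which is consistent with the page's description.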

I’d automate first-pass categorization and confidence scoring. Give me a sorted list: high-confidence auto-codes that I approve in bulk, and low-confidence edge cases that I review one by one.
UX Researcher, validation survey
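The workflow this respondent describes, bulk-approving high-confidence codes and hand-reviewing edge cases, reduces to a threshold split. In the sketch below, `CodedResponse`, `triage`, the 0.8 cutoff and the example responses are all hypothetical. The page does not say how Evidnc computes confidence, so the score is simply assumed to be a number between 0 and 1, such as a response's similarity to its theme centroid.

```python
from dataclasses import dataclass


@dataclass
class CodedResponse:
    text: str
    code: str          # theme proposed by the model
    confidence: float  # assumed to lie in [0, 1]


def triage(responses: list[CodedResponse], threshold: float = 0.8):
    """Split proposed codes into a bulk-approve queue and a one-by-one review queue."""
    bulk = sorted(
        (r for r in responses if r.confidence >= threshold),
        key=lambda r: r.confidence,
        reverse=True,  # most certain first, for fast bulk approval
    )
    review = sorted(
        (r for r in responses if r.confidence < threshold),
        key=lambda r: r.confidence,  # least certain first, so ambiguous cases surface early
    )
    return bulk, review


# Hypothetical example responses and scores.
responses = [
    CodedResponse("I feel lost", "Navigation", 0.93),
    CodedResponse("the navigation confused me", "Navigation", 0.88),
    CodedResponse("it's fine I guess", "Satisfaction", 0.41),
    CodedResponse("menus are ok but search is slow", "Navigation", 0.67),
]
bulk, review = triage(responses)
print([r.text for r in bulk])
print([r.text for r in review])
```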

Phase 03

Beta testing with 10 researchers on live product

Researchers didn’t ask "how accurate is the clustering?" They asked "why did you group these together?" That validated the transparency-over-accuracy design call.
The drag-to-reorganize interaction was used more than any AI-suggested action. Control matters more than automation.
Running real transcripts surfaced edge cases no prototype would have: messy quotes, off-topic responses, language mixing. The product held up because it was live, not mocked.
The 40-hour-to-15-minute claim held across all 10 sessions on real datasets.

Recruited 10 researchers and gave them access to the live Evidnc product
Asked them to run real research synthesis tasks on their own transcripts
Observed where they got stuck, what they over-trusted, and what they ignored
Iterated the product between sessions based on recurring friction points

Design Pillars

The principles that guided every decision from sketches to ship.

Semantic over Lexical

Embeddings, not keywords. The system should understand that two sentences mean the same thing even when they share no words.

Transparency as Trust

Every AI output must expose its evidence. No black boxes. Clusters show source quotes, similarity scores, and let the researcher override them.

Researcher Stays in Control

The AI proposes; the researcher disposes. Drag-to-reorganize, manual override, and human-in-the-loop editing at every step.

Ship, Don’t Prototype

Real ingest, real embeddings, real export. The product had to hold up against real transcripts, not a curated demo dataset.

User Flows

Two roles, one shared system. Built so coordination feels invisible.

Thematic Analysis

“I want the tool to find the themes I’d miss reading line-by-line.”

Upload survey responses or interview transcripts
System generates semantic embeddings per response
Clusters form automatically based on meaning
Review, rename, or merge clusters with drag-to-reorganize
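The page doesn't name the clustering algorithm, so the sketch below swaps in one of the simplest options, greedy "leader" clustering on cosine similarity, purely to show the shape of the embed-then-cluster step. `cluster_by_meaning`, the 0.85 threshold and the toy vectors are hypothetical; in practice the vectors would come from an embedding model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster_by_meaning(embeddings: list[list[float]], threshold: float = 0.85) -> list[list[int]]:
    """Greedy leader clustering: join the most similar centroid if it clears the threshold,
    otherwise start a new cluster. Returns lists of response indices."""
    clusters = []  # each entry: {"members": [indices], "centroid": [floats]}
    for i, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"members": [i], "centroid": list(vec)})
        else:
            best["members"].append(i)
            n = len(best["members"])
            # Running mean keeps the centroid equal to the average member embedding.
            best["centroid"] = [(c * (n - 1) + x) / n for c, x in zip(best["centroid"], vec)]
    return [c["members"] for c in clusters]


# Hypothetical toy embeddings: two navigation complaints, two pricing comments.
embeddings = [
    [0.9, 0.1, 0.0],  # "I feel lost"
    [0.8, 0.2, 0.1],  # "the navigation confused me"
    [0.0, 0.2, 0.9],  # "pricing seems fair"
    [0.1, 0.1, 0.8],  # "the price is reasonable"
]
print(cluster_by_meaning(embeddings))
```

Leader clustering is order-dependent and needs a hand-tuned threshold, so it is only a teaching device here; agglomerative clustering is a common alternative that does not depend on input order.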

Semantic Search

“I want to search my data by idea, not by exact words.”

Type a concept or question in natural language
System ranks responses by semantic similarity, not keyword match
Results include quotes that share meaning but no vocabulary
Click any result to jump to its source context
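Ranking by meaning rather than by words comes down to sorting responses by cosine similarity between a query embedding and each response embedding. A minimal sketch, assuming the embeddings already exist; `semantic_search` is a hypothetical name, and the vectors, including the query's embedding, are invented toy values.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, corpus, top_k=3):
    """Rank (text, embedding) pairs by cosine similarity to the query, best match first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings.
corpus = [
    ("I feel lost", [0.9, 0.1, 0.0]),
    ("the navigation confused me", [0.8, 0.2, 0.1]),
    ("pricing seems fair", [0.0, 0.2, 0.9]),
]
query_vec = [0.85, 0.15, 0.05]  # hypothetical embedding of "where am I supposed to click?"
results = semantic_search(query_vec, corpus)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Neither top result shares a word with the query; they rank highly only because their vectors point in a similar direction.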

Question Analysis with Visuals

“I want to see how responses to one question break down.”

Pick any question in the survey
See a themed breakdown of all responses to that question
Visual charts show the distribution of sentiment and themes
Drill into any segment to read source quotes
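Behind the charts, a per-question breakdown is a grouped count. The sketch assumes each response has already been assigned a theme (for instance by the clustering step) and is stored as a dict with hypothetical `question` and `theme` keys; `theme_breakdown` and the sample data are likewise hypothetical.

```python
from collections import Counter


def theme_breakdown(responses: list[dict], question_id: str) -> list[tuple[str, int, float]]:
    """(theme, count, percent) for one question's responses, most common theme first."""
    counts = Counter(r["theme"] for r in responses if r["question"] == question_id)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(theme, n, round(100 * n / total, 1)) for theme, n in counts.most_common()]


# Hypothetical coded responses.
responses = [
    {"question": "q1", "theme": "Navigation"},
    {"question": "q1", "theme": "Navigation"},
    {"question": "q1", "theme": "Pricing"},
    {"question": "q2", "theme": "Onboarding"},
]
for theme, n, pct in theme_breakdown(responses, "q1"):
    print(f"{theme}: {n} ({pct}%)")
```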

Quick AI Summary

“I need a one-paragraph summary I can drop into a report right now.”

Trigger summary generation on any cluster, question, or full dataset
System generates a concise narrative with inline quote citations
Every claim is backed by a linked source quote
Copy-paste straight into Notion, Docs, or a slide
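The rule that every claim is backed by a linked source quote can be enforced mechanically before a summary is shown. The sketch assumes a hypothetical citation format, `[q12]` placed before each sentence's final punctuation, and a hypothetical `uncited_claims` helper; it flags any sentence with no citation or with a citation that does not resolve to a real quote.

```python
import re

# Assumed citation marker format, e.g. "[q12]".
CITATION = re.compile(r"\[(q\d+)\]")


def uncited_claims(summary: str, quotes: dict[str, str]) -> list[str]:
    """Return sentences that lack a citation or cite a quote id that doesn't exist."""
    problems = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        ids = CITATION.findall(sentence)
        if not ids or any(i not in quotes for i in ids):
            problems.append(sentence)
    return problems


# Hypothetical quotes and generated summary.
quotes = {"q1": "I feel lost", "q2": "the navigation confused me"}
summary = (
    "Navigation is the top pain point [q1][q2]. "
    "Pricing is rarely mentioned [q9]. "
    "Onboarding was praised."
)
print(uncited_claims(summary, quotes))
```

A non-empty result would block the copy-paste step until each flagged sentence is either cited or removed.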


Reflections

The design decision was the architecture decision

Choosing embeddings over TF-IDF wasn’t just a technical optimization. It was the difference between a tool that works and a tool that doesn’t. On Evidnc, the product’s value and the backend’s architecture collapsed into a single call. Owning both roles made that call obvious. I suspect it would have been a months-long debate if design and engineering had been separate.

Transparency beats accuracy

Researchers didn’t ask "how accurate is the clustering?" They asked "why did you group these together?" I stopped chasing a higher silhouette score and instead made every cluster explain itself. Trust came from visibility, not from metrics I could have put on a landing page.

Ship so researchers can test on their own data

A prototype would have let me show ten beta testers a pretty demo. Shipping a live product let them upload their own transcripts, and that’s when the feedback got useful. Every interesting thing I learned came from someone running their real work through it.
