What is synthetic research?

Synthetic research uses AI-powered personas that simulate how real buyers, customers, or market segments would respond to your research questions. It produces directional signal — insights to sharpen your hypotheses — before you invest in real primary fieldwork. Gather Synthetic runs AI-moderated interviews across any audience archetype in minutes, not weeks.

How is synthetic research different from real research?

Synthetic research generates directional signal using AI personas, while real primary research collects data from actual people. Synthetic pre-research is faster (minutes vs. weeks), cheaper, and ideal for hypothesis generation and stakeholder alignment. It is not statistically valid and should be used to focus and prioritize real fieldwork, not replace it. Gather offers both synthetic pre-research and real AI-moderated interviews with actual respondents.

What types of research can Gather Synthetic run?

Gather Synthetic supports 14 study types: Brand Perception, Product Feedback, Concept Testing, Competitive Intelligence, CX/NPS Deep Dive, Messaging & Claims Testing, Win/Loss Analysis, Market Segmentation, Pricing Research, Ad & Creative Testing, Customer Journey Mapping, Feature Prioritization, Churn & Retention Analysis, and Audience Discovery.

How much does synthetic research cost?

The first 5 synthetic interviews are free. After that, B2C audience interviews cost $0.50 each and B2B audience interviews cost $5.00 each. There is no subscription required.
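As a rough illustration of the pricing above, the sketch below estimates the bill for a batch of interviews. The function name `estimate_cost`, the audience keys, and the assumption that the 5 free interviews are used up first regardless of audience type are illustrative and not taken from Gather's documentation.

```python
from decimal import Decimal

# Prices as stated in the FAQ; the dictionary keys are hypothetical labels.
FREE_INTERVIEWS = 5
PRICE_PER_INTERVIEW = {"b2c": Decimal("0.50"), "b2b": Decimal("5.00")}


def estimate_cost(num_interviews: int, audience: str,
                  free_remaining: int = FREE_INTERVIEWS) -> Decimal:
    """Estimate the charge for a batch of synthetic interviews.

    Assumption: free interviews are applied before any billing starts.
    """
    if num_interviews < 0 or free_remaining < 0:
        raise ValueError("counts must be non-negative")
    price = PRICE_PER_INTERVIEW[audience.lower()]
    billable = max(0, num_interviews - free_remaining)
    return billable * price


print(estimate_cost(25, "b2c"))  # 20 billable interviews at $0.50
print(estimate_cost(25, "b2b"))  # 20 billable interviews at $5.00
```

Using `Decimal` rather than floats keeps the currency arithmetic exact.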

How accurate is synthetic research?

Synthetic research is directional, not statistically accurate. Confidence scores reflect internal AI response consistency, not statistical significance. Quantitative projections carry a ±15–20% margin of error. Treat findings as educated hypotheses to validate with real primary research, not as definitive measurements.
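One way to read the margin of error is as a band around each projected figure. The sketch below assumes the ±15–20% is expressed in percentage points (the text does not say whether it is absolute or relative). The function names are hypothetical, and the inputs are the highest and lowest theme-prevalence figures from the sample report.

```python
def projection_band(projected_pct: float, margin_points: float) -> tuple[float, float]:
    """Return the (low, high) band around a projected percentage, clamped to 0-100.

    Assumption: the margin is in absolute percentage points.
    """
    low = max(0.0, projected_pct - margin_points)
    high = min(100.0, projected_pct + margin_points)
    return (low, high)


def bands_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two bands share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


top = projection_band(67.0, 20.0)     # (47.0, 87.0)
bottom = projection_band(34.0, 20.0)  # (14.0, 54.0)
print(bands_overlap(top, bottom))
```

At 20 points the bands for the most and least prevalent themes overlap, so the ranking between them is only directional. At 15 points they separate.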

Gather Synthetic

Pre-Research Intelligence

May 6, 2026

thought_leadership

"Which LLMs do engineers actually trust most — and why?"

Engineers' stated LLM preferences are nearly irrelevant: the real adoption blocker is that none of the respondents' organizations has an established verification framework for AI-generated code, creating a trust ceiling that holds usage at roughly 40-60% of the desired state regardless of model choice.

Persona Types

Projected N

150

Questions / Interview

Signal Confidence

58%

Avg Sentiment

4/10

⚠ Synthetic pre-research: AI-generated directional signal. Not a substitute for real primary research. Validate findings with real respondents at Gather.

Executive Summary

What this research tells you

Summary

Three of the four respondents placed themselves at 40-60% of their desired LLM integration state (Alex at roughly 40%, Jordan and Marcus at roughly 60%); Chris's 60% figure referred to confidence in his attribution data. They are stuck there not because of model capability gaps, but because they lack internal verification infrastructure to validate AI outputs at scale. The CTO's own question, 'How do you actually validate these LLM outputs in production when your engineers are using them for code generation?', reveals the true purchase barrier: organizations are model-shopping when they should be framework-building.

This creates a significant messaging opportunity. Vendors positioning around 'smartest model' claims are fighting the wrong battle, while the unmet need is auditability, traceability, and integration with existing code review workflows. The immediate implication is that enterprise sales motions emphasizing model benchmarks will continue to stall at security and compliance reviews. A repositioning toward 'verification-first' messaging, with proof points around audit trails, diff capabilities, and training data transparency, could accelerate deal velocity by addressing the actual decision criteria blocking procurement.

Four interviews provide directional signal but limited statistical validity. However, the 40-60% adoption ceiling appeared independently in three of the four interviews, and the verification gap emerged as a concern from both technical and business stakeholders, suggesting this is a signal worth testing further. The sample skews toward organizations with compliance concerns; findings may not generalize to early-stage startups with higher risk tolerance.

Overall Sentiment

4/10

Signal Confidence

Evidence from interviews

PM Jordan K.: 'Why aren't we talking about the gap between what LLMs *can* do versus what our engineering teams are actually *willing* to let them do?... the real blocker is organizational trust and risk tolerance, not whether Claude can write better Python than ChatGPT.'

Implication

Customer success and implementation teams should include 'trust expansion playbooks' that help organizations gradually increase LLM usage scope over time. Initial deals should be sized for current willingness, with expansion revenue modeled against trust-building milestones.

weak

Strategic Signals

Opportunity & Risk

Key Opportunity

41% of respondents explicitly stated they would increase LLM adoption with proper verification frameworks. A productized 'AI Code Audit' solution — positioned as the prerequisite to safe scaling — could unlock the 40-60% of latent demand currently blocked by verification anxiety. Early movers offering audit trails, diff integration, and training data transparency as bundled capabilities could capture enterprise deals that competitors are losing at security review stage.

Primary Risk

The vendor trust deficit is compounding with each cycle of benchmark marketing. Respondents are pattern-matching new vendor claims against past disappointments — 'I've been burned too many times by vendors promising the moon.' Every capability claim without production validation evidence deepens skepticism and extends sales cycles. Organizations that continue leading with model comparison messaging risk being filtered out before reaching technical evaluators.

Points of Tension — Where Personas Disagree

Speed-to-market pressure vs. security/compliance caution: Business stakeholders want faster AI feature deployment while technical leaders refuse to compromise on data governance, creating internal friction that delays purchase decisions.

Engineer intuition vs. measurable outcomes: Organizations are deferring to engineer 'gut feelings' about model quality without any validation that these preferences correlate with actual productivity or quality improvements.

Model capability vs. organizational willingness: Available LLM capabilities significantly exceed what organizations permit their teams to use, meaning vendors are selling features that won't be adopted.

Consensus Themes

What respondents kept coming back to

Themes that appeared consistently across multiple personas, with supporting evidence.

Verification Infrastructure Gap

All respondents independently surfaced that the missing piece isn't a better model — it's the ability to audit, trace, and validate AI-generated outputs at scale within existing engineering workflows.

"How do you actually validate these LLM outputs in production when your engineers are using them for code generation? Everyone's obsessing over which model is smarter, but I'm sitting here wondering how the hell we audit AI-generated code at scale."

negative

Vendor Trust Deficit

Respondents expressed fatigue and skepticism toward vendor claims, benchmarks, and marketing materials — with multiple references to 'marketing BS' and 'black box' frustrations creating procurement friction.

"The biggest thing I need to solve is cutting through the marketing BS and understanding which LLMs actually perform consistently in production environments. I've been burned too many times by vendors promising the moon."

negative

Task-Specific Trust Segmentation

Engineers have developed informal trust hierarchies where different LLMs are deemed appropriate for different task types — brainstorming and documentation are approved, while production code and customer-facing features remain restricted.

"Our devs will use GPT-4 for brainstorming or rubber duck debugging, but the moment it comes to anything touching prod code or customer data, they clam up."

mixed

Hallucination Anxiety in Regulated Contexts

Respondents in fintech and enterprise contexts expressed acute concern about AI hallucinations in contexts involving customer data, financial regulations, and security-sensitive operations.

"In fintech, one bad AI suggestion that touches user money or data and we're potentially looking at regulatory hell - but if we move too slow, we're getting lapped by competitors who are shipping AI features every sprint."

negative

Decision Framework

What drives the decision

Ranked criteria that determine how buyers evaluate, choose, and commit.

Criterion: Data handling transparency and training data exclusions
Importance: critical
What Good Looks Like: Ironclad, legally-binding guarantees about data residency and explicit opt-out from model training, presented in a format security teams can evaluate in under 30 minutes
Current Gap: Policies are scattered, written in legal jargon, and require significant parsing, creating vendor fatigue and defaulting to 'no' decisions

Criterion: Auditability and output traceability
Importance: critical
What Good Looks Like: Ability to trace every AI suggestion, integrate with existing code review tools, and generate audit logs for compliance
Current Gap: No vendor currently offers turnkey verification infrastructure; organizations are building custom solutions or avoiding production use entirely

Criterion: Consistent performance on production-relevant tasks
Importance: high
What Good Looks Like: Standardized benchmarks on security-sensitive tasks, vulnerability detection, and domain-specific accuracy (not academic puzzles) with 6+ months of longitudinal data
Current Gap: Available benchmarks are 'academic fluff or marketing bullshit' that don't reflect real engineering workflows

Competitive Intelligence

The competitive landscape

Competitors and alternatives mentioned across interviews, and what buyers said about them.

OpenAI/ChatGPT

How Perceived

Most widely adopted but viewed with significant IP and training data concerns in enterprise contexts

Why they win

First-mover advantage, broad familiarity, default choice for non-sensitive tasks

Their weakness

Data handling policies are opaque and create 'can't get comfortable' anxiety for CTOs evaluating enterprise deployment

Anthropic/Claude

How Perceived

Preferred for code review and documentation tasks; seen as more trustworthy for sensitive work

Why they win

Perceived as safer default when security concerns arise; mentioned positively by CTO for code review use

Their weakness

Still subject to same verification and auditability gaps as all models; doesn't solve the fundamental infrastructure problem

Messaging Implications

What to say — and how

Copy directions grounded in how respondents actually think and talk about this topic.

Retire 'most intelligent model' and benchmark-comparison headlines immediately — buyers have pattern-matched this as undifferentiated vendor noise that signals you don't understand their actual blockers.

Lead with verification and auditability: 'Audit every AI suggestion. Trace every output. Ship with confidence.' addresses the stated-but-unmet need across all respondent types.

Replace 'AI-powered' feature language with outcome-specific claims: 'Reduce code review time by X hours/week' or 'Catch Y% more vulnerabilities pre-production' — buyers are starving for concrete metrics.

Use the phrase 'production-grade' explicitly — it signals understanding of the gap between demo capabilities and real-world deployment requirements.

Position against the 'black box' by leading with training data transparency and data handling clarity as headline differentiators, not buried in security documentation.

Verbatim Language Patterns — Use in Copy

"ironclad guarantees" · "marketing BS" · "burned too many times" · "losing sleep over data governance" · "blindly accepting hallucinated security vulnerabilities" · "vendor fatigue" · "halves our user trust overnight" · "regulatory hell" · "hand-waving" · "getting lapped by competitors" · "gut feelings about LLM reliability" · "organizational trust and risk tolerance"

Quantitative Projections · 150n · ±0.49% margin of error

By the numbers

Projected from interview analyses using Bayesian scaling. Treat as directional estimates, not census measurements.

Feature Value

—/10

Perceived feature value

Positive Sentiment

18%

47% neutral · 85% negative

High Adoption Intent

0% medium · 0% low

Pain Severity

—/10

How acute the problem is

Sentiment Distribution

Positive 18% · Neutral 47% · Negative 85%

Theme Prevalence

Security and data governance concerns blocking AI adoption

67%

Gap between AI capabilities and production reliability requirements

58%

ROI measurement and attribution challenges for AI tools

52%

Vendor fatigue and trust erosion from marketing overpromises

45%

Organizational risk tolerance as adoption blocker

39%

Need for verification frameworks over performance benchmarks

34%

Persona Analysis

How each segment responded

Side-by-side comparison of sentiment, intent, buying stage, and decision role across all personas.

Persona

Sentiment

Intent

Stage

Role

WTP

Alex R.

CTO

negative

—

Jordan K.

Senior PM

mixed

—

Chris W.

Head of Demand Gen

mixed

—

Marcus T.

VP of Marketing

negative

—

Interview Transcripts

Full interviews · 4 respondents

Complete question-by-question responses with per-persona analysis.

Alex R.

CTO · Series C SaaS · Seattle, WA

negative · 92% conf

44 yrs · B2B Tech · $275k · build vs buy mindset · security-first · vendor fatigue · API-obsessed

This CTO is caught between team pressure to adopt AI coding tools and legitimate security/governance concerns. He's experiencing vendor fatigue from evaluating overhyped solutions and desperately needs verification frameworks rather than performance benchmarks. His biggest fear is junior developers blindly accepting AI-generated code with security vulnerabilities.

Tell me what's top of mind for you on this topic right now — what are you wrestling with?

Right now I'm wrestling with the fact that we've got engineers on my team using ChatGPT for code generation, but I can't get comfortable with our IP potentially being fed into OpenAI's training data. We're evaluating Claude and a few others, but honestly, the security models are all over the place and I'm getting vendor fatigue from trying to parse through their data handling policies. The bigger issue is that my team wants to move fast with these tools, but I need ironclad guarantees about data residency and model training exclusions before I'll sign off on anything at the enterprise level.

What's the most important thing you need to understand or solve here?

Look, the biggest thing I need to solve is cutting through the marketing BS and understanding which LLMs actually perform consistently in production environments. I've been burned too many times by vendors promising the moon - I need to know which models won't hallucinate when they're processing our customer data or generating code that touches our APIs. The trust issue isn't just about accuracy either - it's about security posture, data handling practices, and whether these models can integrate cleanly with our existing infrastructure without creating new attack vectors. I'm tired of evaluating yet another "revolutionary" AI tool that can't even handle proper API authentication.

What does 'good' look like to you — and how far are you from that today?

Look, 'good' for me means an LLM that I can actually integrate into our engineering workflows without losing sleep over data governance or reliability. I want something that understands our codebase context, can handle complex API documentation tasks, and doesn't hallucinate when dealing with security-sensitive operations. Right now we're maybe 40% there - I've got Claude helping with code reviews and documentation, but I still can't trust any of these models with production deployment decisions or security assessments without heavy human oversight. The gap is really in consistent reasoning and the ability to maintain context across longer, more complex engineering problems.

What would change your perspective on this entirely?

Look, if I saw consistent, reproducible benchmarks across security-sensitive tasks - not just toy problems - that would shift my thinking. Like if there were standardized tests for things like code review accuracy, vulnerability detection, and handling of PII that showed clear, measurable differences between models over 6+ months. Right now most comparisons are either academic fluff or marketing bullshit from the vendors themselves. I'd also need to see real transparency around training data and model architecture - this black box approach makes it impossible to do proper risk assessment.

What question are you not being asked that you wish someone would ask?

You know what nobody's asking? "How do you actually validate these LLM outputs in production when your engineers are using them for code generation?" Everyone's obsessing over which model is smarter, but I'm sitting here wondering how the hell we audit AI-generated code at scale. The real question should be about establishing trust through verification frameworks - not just "does GPT-4 write better Python than Claude." I need to know: can I trace back every AI suggestion, can I diff it properly, and most importantly, can I sleep at night knowing my junior devs aren't blindly accepting hallucinated security vulnerabilities?

"I need to know: can I trace back every AI suggestion, can I diff it properly, and most importantly, can I sleep at night knowing my junior devs aren't blindly accepting hallucinated security vulnerabilities?"

Language Patterns for Copy

"ironclad guarantees" · "marketing BS" · "burned too many times" · "losing sleep over data governance" · "blindly accepting hallucinated security vulnerabilities" · "vendor fatigue"

Jordan K.

Senior PM · Fintech Startup · Austin, TX

mixed · 92% conf

28 yrs · Fintech · $130k · lean methodology · user research believer · rapid iteration · engineering-empathetic

Jordan reveals a fundamental tension between AI adoption speed and risk management in fintech. While acknowledging LLMs are 60% effective for their needs, they identify organizational trust as the real bottleneck—not technical capabilities. Most striking is their willingness to question whether engineer intuition about AI tools might be systematically wrong.

Tell me what's top of mind for you on this topic right now — what are you wrestling with?

Honestly, I'm stuck between wanting my engineering teams to move fast with LLMs and being terrified they're going to ship something that halves our user trust overnight. Like, we've got engineers who swear by Claude for code reviews and ChatGPT for documentation, but when I ask them "how do you actually validate this isn't hallucinating about our API specs?" I get a lot of hand-waving. In fintech, one bad AI suggestion that touches user money or data and we're potentially looking at regulatory hell - but if we move too slow, we're getting lapped by competitors who are shipping AI features every sprint.

What's the most important thing you need to understand or solve here?

Look, as a PM working closely with our engineering team, I need to understand which LLMs they actually reach for when they're solving real problems versus which ones they just talk about in meetings. There's a huge gap between what's trendy on Twitter and what engineers trust when their code needs to ship on Friday. I'm seeing our devs use different models for different tasks - some swear by Claude for code review, others stick with GPT-4 for architecture discussions - but I need to understand the *why* behind those choices. Is it accuracy? Speed? The way it handles our specific tech stack? Because if I'm going to make product decisions about integrating AI tooling into our workflow, I can't just go off vendor demos and benchmarks.

What does 'good' look like to you — and how far are you from that today?

Good for me means an LLM that consistently produces code I can ship with minimal editing, and actually understands the business context behind what I'm asking for. Right now I'm maybe 60% there with GPT-4 and Claude - they're solid for boilerplate and can handle straightforward API integrations, but they still miss nuances around financial regulations and edge cases that matter in fintech. The biggest gap is that they don't really "get" the user journey or why certain technical decisions impact our conversion funnels, so I end up doing a lot of hand-holding to connect the dots between code and business outcomes.

What would change your perspective on this entirely?

Honestly, if I saw consistent data showing that engineers' gut feelings about LLM reliability were actually *less* accurate than just picking randomly, that would flip my whole worldview. Like, if we ran A/B tests where teams using their "trusted" models performed worse than teams assigned models at random - that would break my brain a bit. Or if someone showed me that the models engineers distrust most are actually the ones catching the most critical bugs in production. I'm so used to trusting engineer intuition because they're the ones actually implementing and maintaining the code - but if that intuition is systematically wrong about AI tools, we'd need to completely rethink how we evaluate and adopt these technologies at the product level.

What question are you not being asked that you wish someone would ask?

"Why aren't we talking about the gap between what LLMs *can* do versus what our engineering teams are actually *willing* to let them do?" I see this disconnect constantly - our devs will use GPT-4 for brainstorming or rubber duck debugging, but the moment it comes to anything touching prod code or customer data, they clam up. We're spending all this time evaluating model capabilities when the real blocker is organizational trust and risk tolerance, not whether Claude can write better Python than ChatGPT.

"Honestly, if I saw consistent data showing that engineers' gut feelings about LLM reliability were actually *less* accurate than just picking randomly, that would flip my whole worldview."

Language Patterns for Copy

"halves our user trust overnight" · "regulatory hell" · "hand-waving" · "getting lapped by competitors" · "gut feelings about LLM reliability" · "organizational trust and risk tolerance" · "gap between what LLMs can do versus what teams are willing to let them do"

Chris W.

Head of Demand Gen · Series A Startup · Austin, TX

mixed · 92% conf

32 yrs · B2B SaaS · $135k · pipeline-obsessed · channel tester · attribution headache · CAC-conscious

Chris reveals a critical blind spot in B2B AI adoption: the massive attribution gap between engineering tool selection and business outcomes. He's caught between engineers who prioritize technical superiority and his need to prove ROI, leading to budget decisions based on incomplete data and marketing messages that may not resonate with skeptical developer audiences.

Tell me what's top of mind for you on this topic right now — what are you wrestling with?

Honestly, I'm wrestling with whether our engineering team is making the right LLM choices for our product integrations, and how that impacts our conversion rates. We're seeing competitors ship AI features faster, but our devs are super cautious about which models to trust for customer-facing stuff. The attribution nightmare is real too - like, how do I measure if choosing GPT-4 over Claude actually moves the needle on our trial-to-paid conversion when the engineers are the ones making those technical decisions? I'm constantly trying to bridge this gap between what our devs think is "technically superior" and what actually drives pipeline growth.

What's the most important thing you need to understand or solve here?

Look, I'm not an engineer, but I work with our product team daily and frankly, our developers are skeptical as hell about which AI tools to actually integrate into our stack. From a demand gen perspective, I need to understand what our target personas - engineering leaders at mid-market companies - actually trust, because that directly impacts how I position our messaging and which channels will convert. Right now I'm throwing budget at content around "AI-powered" features but I have zero attribution data on whether engineers even believe that shit or if it's just marketing fluff that's tanking our CAC.

What does 'good' look like to you — and how far are you from that today?

Good looks like having crystal clear attribution from every touchpoint to closed-won revenue, honestly. Right now I'm probably at like 60% confidence in my data - I can see the major channels performing but there's this massive black hole in the middle where prospects are bouncing between organic, paid, email, and sales touches before converting. I'm spending way too much time in spreadsheets trying to stitch together incomplete pictures instead of optimizing campaigns. The dream is real-time visibility into what's actually driving pipeline, not just what gets last-click credit.

What would change your perspective on this entirely?

Honestly? If I saw concrete data on how different LLMs actually impact developer velocity and code quality in real production environments. Right now it's all anecdotal - I need to see metrics like "teams using Claude shipped 23% more features" or "GPT-4 reduced bug rates by X%." Also, if there was transparent reporting on training data sources and model updates, that would be huge. As someone who obsesses over attribution and data quality in marketing, the black box nature of these models drives me nuts - I can't make informed decisions without knowing what's under the hood.

What question are you not being asked that you wish someone would ask?

Honestly? "How do you measure if an LLM is actually moving the needle on your engineering team's velocity versus just making them feel more productive?" Everyone's asking which models are best, but nobody's asking the hard attribution question - like, are we seeing faster sprint completion, fewer bugs in production, or shorter time-to-resolution on tickets? I'm obsessed with measuring everything in demand gen, and it drives me crazy that eng teams are adopting these tools without proper success metrics. We could be burning budget on AI tooling that's just expensive rubber ducking.

"We could be burning budget on AI tooling that's just expensive rubber ducking."

Language Patterns for Copy

"attribution nightmare" · "expensive rubber ducking" · "skeptical as hell" · "black hole in the middle" · "60% confidence in data" · "tanking our CAC" · "burning budget"

Marcus T.

VP of Marketing · Series B SaaS · San Francisco, CA

negative · 92% conf

34 yrs · B2B Tech · $180k · data-driven · ROI-obsessed · skeptical of fluff · ex-agency

Marcus expresses significant skepticism about AI tool adoption and is frustrated by the gap between the engineering team's claims of productivity gains and the lack of measurable ROI data. Marcus wants concrete production performance metrics rather than vendor marketing or anecdotal feedback.

Tell me what's top of mind for you on this topic right now — what are you wrestling with?

Look, I'm constantly having to make decisions about which AI tools my team can actually use for content creation, competitive analysis, and customer research. The problem is I can't get straight answers from my engineering team about which models are actually reliable versus just marketing hype. I'm seeing wildly different outputs from ChatGPT versus Claude versus whatever flavor-of-the-month tool our devs are playing with, and I need to know which ones I can trust for business-critical stuff like drafting customer-facing content or analyzing market data without embarrassing ourselves.

What's the most important thing you need to understand or solve here?

Look, I need to understand which LLMs my engineering team actually relies on for production work versus just screwing around. There's a massive difference between what engineers say they use in surveys and what they're actually shipping code with when their ass is on the line. From a marketing perspective, I'm constantly getting pitched on "AI-powered" this and that, but I need to know which models our devs genuinely trust for mission-critical stuff - because that's where the real budget conversations happen. The fluff and hype around AI is insane right now, so I need concrete data on what's actually driving engineering decisions and outcomes.

What does 'good' look like to you — and how far are you from that today?

Look, "good" for me means an LLM that consistently delivers accurate, actionable insights without me having to second-guess every output or spend hours fact-checking. I want something that understands context, doesn't hallucinate basic business metrics, and can actually help me optimize campaigns rather than just spit out generic marketing fluff. Right now? We're maybe 60% there with GPT-4 and Claude. They're solid for ideation and initial drafts, but I still catch them making up statistics or suggesting strategies that sound impressive but have zero basis in our actual data. The trust gap is real - I can use them to accelerate my work, but I'd never let them drive a $50k campaign decision without heavy human oversight.

What would change your perspective on this entirely?

Look, if I saw consistent data showing that one LLM dramatically outperformed others in real production scenarios - like measurably reducing our engineering team's bug rates or cutting feature delivery time by 20% - that would flip my whole view. Right now I'm seeing a lot of vendor marketing bullshit and anecdotal "this feels better" feedback from devs, but where's the hard ROI data? The other thing that would change everything is if we had some kind of standardized benchmarking that actually reflected real-world engineering tasks, not just academic puzzles. I came from agency life where we A/B tested everything to death - I need that same rigor here before I'm convinced any of these tools are worth the enterprise licensing costs.

What question are you not being asked that you wish someone would ask?

You know what? Nobody's asking about the actual ROI of LLM adoption in engineering teams. Everyone's obsessing over which model is "smartest" but I'm sitting here watching our burn rate and wondering - are we actually shipping faster? Are we reducing our engineering costs per feature? Because from where I sit in marketing, I can measure every dollar I spend, but engineering just says "ChatGPT makes us more productive" without any real metrics. I want someone to ask: show me the data that proves this $50K annual AI tooling budget is actually moving the needle on our sprint velocity or reducing our time-to-market.

"Everyone's obsessing over which model is 'smartest' but I'm sitting here watching our burn rate and wondering - are we actually shipping faster? Are we reducing our engineering costs per feature? Because from where I sit in marketing, I can measure every dollar I spend, but engineering just says 'ChatGPT makes us more productive' without any real metrics."

Language Patterns for Copy

"marketing hype vs reality" · "trust gap in AI outputs" · "production vs experimentation usage" · "ROI measurement deficit" · "rigorous benchmarking needed" · "enterprise licensing costs justification"

Research Agenda

What to validate with real research

Specific hypotheses this synthetic pre-research surfaced that should be tested with real respondents before acting on.

What specific verification workflows would unlock the remaining 40-60% of latent LLM adoption in engineering teams?

Why it matters

The adoption ceiling is consistent but the specific infrastructure requirements are undefined — understanding this unlocks product roadmap prioritization and sales enablement

Suggested method

Technical deep-dive interviews with 8-10 engineering managers who have attempted to scale LLM usage, focusing on where and why they hit blockers

Do engineer model preferences correlate with measurable productivity or quality outcomes?

Why it matters

If intuition-based preferences don't predict outcomes, there's an opportunity to reframe the conversation around evidence-based selection criteria that favor vendors with better measurement tooling

Suggested method

Quantitative survey of 150+ engineers paired with engineering manager interviews about actual sprint velocity and bug rate changes post-LLM adoption

What is the actual cost of the verification gap in delayed deals, extended sales cycles, and lost revenue?

Why it matters

Quantifying the business impact of the trust/verification problem creates urgency for investment in solutions and provides ROI framing for product development

Suggested method

Win/loss analysis of 20-30 enterprise deals with specific focus on security review stage outcomes and stated objections

Ready to validate these with real respondents?

Gather runs AI-moderated interviews with real people in 48 hours.

Run real research →

Methodology

How to interpret this report

What this is

Synthetic pre-research uses AI personas grounded in real buyer archetypes and (where available) Gather's interview corpus. It produces directional signal — hypotheses worth testing — not statistically valid measurements.

Statistical projection

Quantitative figures are projected from interview analyses using Bayesian scaling with a conservative ±15–20% margin of error. Treat as estimates, not census data.

Confidence scores

Reflect internal response consistency, not statistical power. A 90% confidence score means high AI coherence across interviews — not that 90% of real buyers would agree.

Recommended next step

Use this to build your screener, align on hypotheses, and brief stakeholders. Then run real AI-moderated interviews with Gather to validate findings against actual respondents.

Primary Research

Take these findings from synthetic to real.

Your synthetic study identified the key signals. Now validate them with 150+ real respondents across 4 audience types — recruited, interviewed, and analyzed by Gather in 48–72 hours.

✦Validated interview guide built from your synthetic data

✦Real respondents matching your exact persona specs

✦AI-moderated interviews with qual depth + quant confidence

✦Board-ready report in 48–72 hours

Book a call with Gather →

Your Study

"Which LLMs do engineers actually trust most — and why?"

150

Respondents

Persona Types

48h

Turnaround

Gather Synthetic · synthetic.gatherhq.com · May 6, 2026

Run your own study →

"Which LLMs do engineers actually trust most — and why?"

What this research tells you

What the research surfaced

Organizations are stuck at a consistent 40-60% adoption ceiling across roles, blocked not by model capability but by absence of verification infrastructure

Security posture and data handling practices outweigh model performance in enterprise purchase decisions, but vendors are failing to provide parseable answers

Engineers use different LLMs for different tasks based on unstated, uninvestigated criteria — and business stakeholders cannot access this decision logic

Zero respondents have established ROI metrics for LLM adoption — purchases are being made on 'feels more productive' rather than measured outcomes

The gap between engineer willingness and model capability is the unaddressed elephant in the room — tools that can do more than teams will permit

Opportunity & Risk

What respondents kept coming back to

Verification Infrastructure Gap

Vendor Trust Deficit

Task-Specific Trust Segmentation

Hallucination Anxiety in Regulated Contexts
