The next time you need to divide up a group of researchers for a game of dodgeball, send the half with positive things to say about synthetic data to one side and the half with negative perspectives to the other.
Few topics in the insights world are more controversial or divisive. To steer clear of the drama, many stakeholders end up walking the safe line of “human-powered synthetic insights.” This is code for “buy my traditional research data from ‘real’ humans, and we’ll add some synthetic insights on top.” Pure synthetic solutions get eye-rolls, crossed arms, and skeptical muttering. LinkedIn posts follow, detailing flaws in synthetic data and dismissing its insights as unreal, “computer-generated guesswork.”
The biggest synthetic data companies don’t care, because they know that soon, neither will anyone else.
Vibe Insight Enters the Chat
There’s a new buzzword in the AI world: “vibe coding.” Andrej Karpathy, OpenAI co-founder and the AI engineering messiah, coined it to describe a wild new way of working: tell an AI model the “vibes” of what you want, and it figures out the rest. No endless mock-ups, no complex prework, no micro-managing… just vibes.
Anointed with a label, vibe coding went mainstream, and most engineers have now experimented with it. The magic you feel when an LLM answers a question gets even better when you say, “Hey, build me a dashboard,” and the AI pulls in tools, code, web results, whatever it needs. The boundary between what you know and what the machine figures out is fading fast.
Vibe Insights? It’s the same thing, but for research questions. Want to know what’ll happen to your brand if you run a spicy ad? Tell the LLM what you’re after, and if it knows, great, answer provided. If it doesn’t, it’ll go fetch data wherever you let it. Vibe Insights is research with a layer of AI-driven conclusions. If it needs to synthesize those conclusions, then it will.
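To make that loop concrete, here is a toy, self-contained Python sketch of the flow. Every function in it is a stand-in invented for illustration, not a real product or API.

```python
# Toy sketch of the Vibe Insights loop described above.
# All functions are illustrative stand-ins, not a real product or API.

def ask_model(question: str) -> tuple[str, bool]:
    """Stand-in for an LLM call: returns (answer, is_confident)."""
    return "", False  # pretend the model doesn't already know

def fetch(source: str, question: str) -> str:
    """Stand-in for pulling data from a source you've permitted."""
    return f"evidence on '{question}' from {source}"

def synthesize(question: str) -> str:
    """Stand-in for generating synthetic data to fill a gap."""
    return f"synthetic responses on '{question}'"

def vibe_insights(question: str, allowed_sources: list[str]) -> str:
    answer, confident = ask_model(question)
    if confident:                 # the model already knows: done
        return answer
    evidence = [fetch(s, question) for s in allowed_sources]
    if not evidence:              # nothing you let it fetch: synthesize
        evidence.append(synthesize(question))
    return " | ".join(evidence)   # stand-in for the AI-written conclusion

print(vibe_insights("What happens if we run a spicy ad?", ["brand tracker"]))
```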
The Human Understanding Gap
If you take a step back from the insights industry for a minute and look at it from a 50,000-foot view, you’ll notice something… we dumb everything down. The reason we do so much segmentation, cross-tabbing, and demographic pigeonholing is that our brains can’t process the messiness of real life. Grouping people by age or income is a hack to make complex stuff fit into simple charts we understand. The world is weird and complicated, and the only way we process the complexity is to smooth the differences into buckets like “18 to 34 years old” or well-named segments like Kids & Cul-de-Sacs.
Look around and you’ll see it in how we design studies, reports, and even our worldviews. Want a grand unified theory of how brands grow? Take your pick: emotional attachment, meaningfully different, or availability and distinctiveness. While supported by marketing science, these frameworks can present conflicting advice and, frankly, they don’t work for every brand.
This unification has a benefit. Intuitively, we know every brand has its own unique path to building equity, but we simplify to make it easier to understand and communicate. It’s to our benefit to make the complex approachable. Unfortunately, AI models don’t need that benefit.
Most people don’t get just how massive the math behind these AIs is. You want to train a baby LLM with 1,000 parameters, by hand? If you’re doing a multiplication every 2 seconds, you’ll finish in about 15 months (assuming no breaks and a 12-hour workday). Scale it up to a million parameters? Now you’re looking at 23,000 years.
It would take over a century of full-time work from every person on Earth to train a model the size of ChatGPT. And every three months, models grow. AI systems don’t handle complexity, they embody it.
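Here is that back-of-envelope math as a runnable sketch. The multiplication counts are assumptions reverse-engineered to roughly reproduce the figures above, not measurements of any real training run.

```python
# Back-of-envelope arithmetic for "training an LLM by hand."
# Assumptions: one multiplication every 2 seconds, a 12-hour workday,
# no breaks, and total multiplication counts chosen to roughly match
# the figures in the text (illustrative, not measured).

SECONDS_PER_MULT = 2
WORK_SECONDS_PER_DAY = 12 * 60 * 60  # a 12-hour workday

def years_of_hand_math(total_mults: float, workers: float = 1) -> float:
    """Years for `workers` people to grind through `total_mults`."""
    mults_per_day = workers * WORK_SECONDS_PER_DAY / SECONDS_PER_MULT
    return total_mults / mults_per_day / 365

print(years_of_hand_math(1e7))     # ~10M mults for a 1,000-parameter toy
                                   # -> ~1.3 years (about 15 months)
print(years_of_hand_math(1.8e11))  # parameters *and* steps scaled up
                                   # -> ~23,000 years
print(years_of_hand_math(1e19, workers=8e9))  # frontier scale, everyone
                                   # on Earth -> ~160 years, a century-plus
```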
Not only are models faster and able to handle complexity, but they’re also smarter. The latest from OpenAI, Meta, Google, and Anthropic are crushing previous intelligence tests. Researchers had to put together Humanity’s Last Exam: 2,500 PhD-level questions, structured, academic, and a challenge for anyone with a pulse. I don’t know anyone who could score above 1%. Google’s Gemini model scored 21% in April. On Wednesday, Grok 4 dropped and doubled that, scoring 44%. For reference, Grok is at the point where it can ace the SAT and ACT, and could likely earn a PhD in multiple subjects. On top of all that, ChatGPT 5 is around the corner and likely to be even smarter.
Synthetic Data to the Rescue
I don’t think much more evidence is needed to be convinced that AI systems are smart. I am certain, however, that some calls for “real data from real respondents” will persist, and that synthetic will continue to be thrown under the bus. Even so, things are happening that will force change.
The data the big AI companies want for training is being walled off at an alarming rate. The New York Times is embroiled in a legal battle with OpenAI. Disney has gone nuclear on several AI companies. Reddit locked down and sells access for a fee. Cloudflare just put up a paywall for AI crawlers. The content wars have become a battle royale.
So, what do you do when you can’t get fresh data? You make your own. Enter synthetic data: the Frankenstein’s monster of the insights world, which the AI giants have quietly been using for years. With pressure on the data supply, it’s getting new focus, and it’s getting way better, fast.
Perhaps a supply issue sounds familiar? The research industry has been in a panel crisis for years: tiny panels, high churn, professional respondents, and data-quality nightmares. Synthetic isn’t a stopgap; it’s looking like an escape hatch. The more attention it gets from companies with trillions to spend, the better it will become.
Reasoning Models Change the Game
Ok, fine, you might believe synthetic is the next thing, but at least you have your data moat to protect your insights business. Not so fast. Bob McGrew, former head of research at OpenAI, filled in the data moat with a simple post: “I’m skeptical of the power of proprietary data as a moat in the long run.” Why? Because reasoning, not just raw data, is now proving to be a game changer.
The new wave of AI models don’t just memorize; they think and reason. Like a smart high schooler working through a complex math problem, these models don’t need the answer memorized in their training data. They can infer, extrapolate, and fill in the blanks from their base knowledge. As a result, your vast oceans of survey data and tracking studies are less important than you think. Give a reasoning model a little bit of information, and it’ll figure out a lot.
Bob highlighted this sentiment in his post: “How will you value what your proprietary data adds compared to what your competitor’s infinitely smart, infinitely patient agents can estimate from public data?” He goes into detail on a Sequoia podcast, where he relays his experience that even custom industry-specific models, trained on secret datasets, often underperform generalist models that just… think better.
Human Behavior Models: AI That Knows You Better Than You Do
If you still think human generated insights data is required, then it’s time to talk about Centaur. Of course I’m not talking about the half-horse, half-dude. I’m talking about the Centaur AI project.
The Centaur project took a big, general LLM and fine-tuned it on Psych-101: trial-by-trial human behavior results from 60,000+ people across 160 experiments. The result: Centaur didn’t just keep up with domain-specific models, it beat them at predicting how people act.
The team even went so far as to compare Centaur’s internal “thinking” to human neural activity from fMRI scans and found that the model was the best yet at predicting human thought processes. So, yes, the AI is literally learning to think like us, and over time these models are only going to get better.
The largest share of work in the for-profit insights sector is an offshoot of foundational behavioral psychology, steered toward understanding the drivers of consumption. If behavioral psychologists are figuring out how to teach an AI model to think like a human and predict behavior, do you really think the same kind of tech won’t soon figure out which Taco Bell ad moves more burritos?
The “Humwelt” Problem
Allow me to introduce a new term to your vocabulary: Humwelt. The term combines the “H” from human with the German term Umwelt, which describes “the specific way in which organisms of a particular species perceive and experience the world, shaped by the capabilities of their sensory organs and perceptual systems.” For an ant, its umwelt is the backyard, which it perceives as the whole universe.
Humwelt? It’s our little corner of human experience. To me, humwelt describes a particular human bias: a bubble in which we can’t perceive a world where an AI system is as powerful, fast, predictive, smart, creative, or capable as a human. It’s a world where we can’t fathom AI creating valid insights.
When researchers dismiss synthetic respondents or find flaws in synthetic data, they’re likely not wrong… today. But they’re stuck in their humwelt, not imagining how quickly AI will solve those issues. Synthetic insights have flaws right now, but those flaws are getting patched faster than you can say “survey fatigue.”
Thinking about Vibe Insights through a lens of “yeah, but” is humwelt in practice. You are much better off imagining the sci-fi version of the future, where Vibe Insights is all-powerful and the problems of today are just part of the journey.
Vibe Insights and the Next Differentiator
So what comes next?
Despite the fun name, don’t assume that as AI gets better at “vibe coding,” it’ll also get easier to run end-to-end projects. If you try vibe coding, you will learn pretty quickly that it is only as good as the context and environment you give it to work with. It’s easy to ask the AI for the vibe; it’s hard to set up the playground where the AI actually knows what’s relevant, has the tools and data it needs, and knows how to act on them. The same is true of research: a powerful Vibe Insights tool will have the right context and environment for answering questions.
For insight vendors, this is existential. The old edge was in research methods, operational expertise, and better access to people or data. The future advantage is going to be in context engineering: setting up the right frameworks, permissions, API plumbing, data curation, and policy so that when the AI is asked to solve a complicated problem, it knows what to do and has the tools to do it (see the sketch below).
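As a rough illustration of what context engineering might look like in practice, here is a minimal Python sketch of a “context pack.” Every name, field, and string in it is a hypothetical assumption for illustration, not any vendor’s actual product.

```python
# A minimal, hypothetical sketch of "context engineering": bundling the
# curated data, tool permissions, and policies an AI agent needs before
# you ask it for the vibe. All names here are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class ContextPack:
    brand: str
    data_sources: list[str] = field(default_factory=list)  # data curation
    tools: list[str] = field(default_factory=list)         # API plumbing
    policies: list[str] = field(default_factory=list)      # policy/permissions

    def to_system_prompt(self) -> str:
        """Flatten the pack into instructions an LLM agent can act on."""
        return (
            f"You are an insights agent for {self.brand}.\n"
            f"Data you may query: {', '.join(self.data_sources)}.\n"
            f"Tools you may call: {', '.join(self.tools)}.\n"
            f"Rules you must follow: {', '.join(self.policies)}."
        )

pack = ContextPack(
    brand="Acme Snacks",
    data_sources=["brand tracker", "sales panel", "social listening feed"],
    tools=["run_crosstab", "launch_synthetic_sample"],
    policies=["no PII leaves the warehouse", "cite every source used"],
)
print(pack.to_system_prompt())
```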
Insights vendors that stick to their unified theories are bound to face new threats from company-scale thinking machines. On the other hand, those who see themselves as context engineers, architects of the AI operating environment, will be the ones to survive. The question is becoming less “How do brands grow?” and more “Can you build a system where a complex AI model can help my specific brand grow and deliver something new, useful, and relevant every day?”
This is the real differentiator. In the Vibe Insights future, the companies shaping the context will pull ahead.
Why We’ve Got Time
I’ve been around long enough to see every “revolutionary tech” collide with corporate inertia. Companies still hire the old names: McKinsey, Nielsen, Kantar, Ipsos, take your pick. Nobody gets fired for picking the safe option.
That’s the human advantage for now: our stubborn bias toward the familiar, and the humwelt of vendor risk-aversion. In large part, this is because we still control the purse strings, and that isn’t likely to change overnight. But disruption will find a way, either through “white knight” risk-takers or through upstart vendors running around the old guard, carving out new value with classic innovator’s-dilemma tactics while incumbents insist “they’re not competing in our space.”
The Bottom Line
Vibe Insights is eventually going to eat the world; the evidence piles higher every day, whether we like it or not. The next generation of insights leaders will be the ones who figure out how to engineer the vibe, not the ones who cling to their frameworks and vaults of historical data.
But hey, that’s just my two cents. What do you think? Ready to vibe, or are you still clinging to your cross-tabs for dear life?
More progress on AI models seeking to imitate the brain. This one uses System 1/System 2 thinking… https://www.sapient.inc/blog/5