FASRIM: A Data Quality Framework for Analytics & Research
Analytics and research live and die on the quality of their data. It doesn’t matter how elegant your methodology is or how sophisticated your analysis tools are — if your data is garbage, so are your insights. And yet, the industry is awash in low-quality data. Fraud, inattentiveness, misrepresentation — these are just a few of the challenges analysts battle daily.
The problem? Most frameworks for assessing data quality are vague or overly academic. Everyone has their own approach, and the solutions industry bodies propose get gamed by vendors to hide data quality issues. Fundamentally, quality is something clients judge subjectively and vendors simply claim, and it all seems fine until something goes wrong. In fact, clients don’t care about data quality as much as they care about anomalies in their data.
Clients don’t tend to worry about the market-share numbers they’ve been tracking for the last decade all being 20% higher than reality, but when the numbers shift by 10% in a single week because of an underlying issue in the data… the sky is falling.
As an industry we need to move past this. We need credible numbers, because credible numbers are going to be the lifeblood of an AI- and data-driven future for measurement. That’s why we need FASRIM, a practical, structured approach to evaluating the integrity of your data. The acronym stands for Fraud, Attentiveness, Self-Selection, Representativeness, Instrument Bias, and Methodology. The first three — FAS — cover threats to external validity, the things researchers have less control over but must be aware of. The latter three — RIM — are internal validity concerns, meaning researchers have direct influence over them.
External Validity: The Uncontrollables (FAS)
These are the data quality threats that live outside the researcher’s control. You can’t always stop them, but you can detect and mitigate them.
F — Fraud
Fraud in research and analytics is rampant. Pay people to take surveys, and some will game the system. Pay for advertising, and bots will gobble up impressions to claim a share of ad budgets. People will always seek to game a system: they’ll lie about their demographics, use bots, or even hire click farms. This is especially prevalent in lower-income markets, where the incentive to cheat is higher because the payout is in a high-income market’s currency — money that would seem inconsequential to the buyer but supports a good quality of life locally. Without a solid fraud detection strategy — duplicate checks, digital fingerprinting, behavioral analytics — you’ll end up with data that’s worse than useless.
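To make that concrete, here’s a minimal sketch of what duplicate checks might look like in code. The field names (respondent_id, device_hash, open_end) are illustrative assumptions, not any particular platform’s schema.

```python
# A minimal sketch of duplicate-based fraud screening.
# Field names are illustrative, not a real platform's schema.
from collections import Counter

def flag_fraud(responses):
    """Return respondent IDs that trip simple fraud heuristics."""
    device_counts = Counter(r["device_hash"] for r in responses)
    text_counts = Counter(r["open_end"].strip().lower() for r in responses)

    flagged = set()
    for r in responses:
        # The same device fingerprint behind multiple "unique" completes.
        if device_counts[r["device_hash"]] > 1:
            flagged.add(r["respondent_id"])
        # Identical open-ended answers pasted across respondents.
        if text_counts[r["open_end"].strip().lower()] > 1:
            flagged.add(r["respondent_id"])
    return flagged
```

In practice you’d layer this with behavioral analytics and third-party fingerprinting, but even crude de-duplication catches a surprising amount.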
A — Attentiveness
Even legitimate consumers can pollute your data. Why? Because analytical exercises can be boring and ads can be ignored. With surveys, if participants don’t care, they speed through the questions, straight-line their answers, or pick random responses just to finish. Attention-check questions, time tracking, and engagement measures help, but inattentiveness remains a chronic issue. On ad-supported experiences, consumers ignore ads, install ad blockers, or spend time on their phones while ads run unwatched.
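For illustration, here’s a minimal sketch of the survey-side checks, assuming each response records a duration, a grid battery, and an attention-check result; the field names and thresholds are placeholders, not a standard.

```python
# A minimal sketch of attentiveness checks: speeders, straight-liners,
# and failed attention checks. Thresholds and field names are illustrative.
from statistics import median

def flag_inattentive(responses, speed_fraction=0.4):
    """Return respondent IDs showing low-effort behaviour."""
    med = median(r["duration_sec"] for r in responses)
    flagged = set()
    for r in responses:
        # Speeders: finished in a small fraction of the median time.
        if r["duration_sec"] < speed_fraction * med:
            flagged.add(r["respondent_id"])
        # Straight-liners: identical answers across an entire grid battery.
        if len(set(r["grid_answers"])) == 1:
            flagged.add(r["respondent_id"])
        # Explicit attention-check failure.
        if not r["attention_check_passed"]:
            flagged.add(r["respondent_id"])
    return flagged
```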
S — Self-Selection
Even when no outright fraud occurs, bias creeps in through self-selection. People opt into studies and datasets they shouldn’t qualify for because they’re interested in the topic, want to try a new product, or just need the payout. People register for ad-supported websites and often enter inaccurate profiling data, or even use someone else’s login. It’s a softer form of fraud — one where people bend the truth just enough to pass screening. Better screening and validation can reduce the impact, but this is a pervasive issue that needs continuous monitoring.
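One simple validation, sketched below, is cross-checking screener answers against profile data already on file; the field names and values are assumptions for illustration.

```python
# A minimal sketch of screener validation: flag respondents whose screener
# answers contradict the profile data already held about them.
def screener_mismatches(screener, profile, fields=("age", "gender", "country")):
    """Return the fields where the screener contradicts the stored profile."""
    return [f for f in fields if screener.get(f) != profile.get(f)]

# Example: a respondent claiming to be 29 when the panel profile says 41.
print(screener_mismatches(
    screener={"age": 29, "gender": "F", "country": "US"},
    profile={"age": 41, "gender": "F", "country": "US"},
))  # -> ['age']
```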
Internal Validity: The Controllables (RIM)
Now for the parts you can control. These factors dictate whether your measurement leads to valid, actionable insights.
R — Representativeness
Your insights are only as good as your sample. If the audience you’re measuring isn’t the one you need, your results won’t generalize. Maybe your recruitment strategy is flawed. Maybe your vendor gave you the wrong people. Either way, bad sample = bad data. Fixing this means tightening your sampling, verifying demographics, and ensuring your audience reflects the population you need to understand.
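As a sketch of what “verifying demographics” can mean in practice, you can compare the sample’s mix against population targets before you analyze anything; the categories and target shares below are placeholders for real census or universe figures.

```python
# A minimal sketch comparing sample composition with population targets.
# Targets and categories are illustrative placeholders.
def representativeness_gaps(sample, targets, key="age_band"):
    """Return sample share minus target share for each category."""
    counts = {}
    for r in sample:
        counts[r[key]] = counts.get(r[key], 0) + 1
    n = len(sample)
    return {cat: counts.get(cat, 0) / n - share for cat, share in targets.items()}

targets = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}  # illustrative only
sample = [{"age_band": "18-34"}, {"age_band": "18-34"}, {"age_band": "55+"}]
print(representativeness_gaps(sample, targets))
```

Large gaps are a signal to re-quota or weight before drawing conclusions.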
I — Instrument Bias
This is a sneaky one. Even if your sample is perfect, your measurement tool can introduce bias. Ad blockers block web analytics code in an evolutionary game of cat and mouse. The way you word survey questions, the order you ask them in, the interface used to collect responses — all of these can subtly nudge people toward certain answers. If your survey, app, or platform isn’t neutral, your data won’t be either. The fix? Pilot testing, cognitive interviews, and continuous refinement of research instruments.
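One concrete mitigation for order effects, sketched below, is randomizing question order per respondent in a way that can be replayed for auditing; the question names are illustrative.

```python
# A minimal sketch of per-respondent question-order randomization,
# seeded on the respondent ID so the order can be reproduced for audits.
import random

def randomized_order(questions, respondent_id):
    """Return a reproducible per-respondent shuffle of the question list."""
    rng = random.Random(respondent_id)
    order = list(questions)
    rng.shuffle(order)
    return order

questions = ["brand_awareness", "purchase_intent", "ad_recall"]  # illustrative
print(randomized_order(questions, respondent_id=1042))
```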
M — Methodology
Last but not least: Are you using the right method for the job? You wouldn’t run a wave study when an experiment is what’s needed, just like you wouldn’t rely on qualitative data for something that demands hard numbers. Picking the right methodology is everything. A mismatch between method and objective guarantees misleading insights.
Why FASRIM Matters
FASRIM isn’t just another theoretical framework — it’s a reality check. If your data fails on any of these six dimensions, your research is compromised. The first three — Fraud, Attentiveness, and Self-Selection — determine whether your data can even be trusted in the first place. The last three — Representativeness, Instrument Bias, and Methodology — decide whether your research actually answers the right question.
The reality is that no dataset or analysis will ever be flawless across all of these dimensions. Anyone reviewing research should assume the data isn’t 100% robust on FASRIM. But that’s not the point. The point is to use this framework to understand where the weaknesses lie. Knowing where the gaps are allows researchers to adjust their confidence levels, refine their approach, and ensure all bases are covered before drawing conclusions.
Over time we may be able to develop objective measures for some of these items, while others will require a more subjective assessment. As a best practice, I’d encourage consumers of data and insights to ask for a disclosure page or document that includes a transparent assessment of each component of the framework. Knowing what’s happening is half the battle.
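To show what such a disclosure could look like, here’s a minimal sketch of a structured FASRIM self-assessment that could travel with a dataset or report; the fields and example wording are illustrative, not a formal standard.

```python
# A minimal sketch of a FASRIM disclosure that ships alongside a deliverable.
# The example wording is illustrative, not a formal standard.
from dataclasses import dataclass

@dataclass
class FASRIMDisclosure:
    fraud: str
    attentiveness: str
    self_selection: str
    representativeness: str
    instrument_bias: str
    methodology: str

disclosure = FASRIMDisclosure(
    fraud="Duplicate and fingerprint checks applied to all completes",
    attentiveness="Speeders, straight-liners, and attention-check failures removed",
    self_selection="Screener answers validated against panel profiles",
    representativeness="Sample weighted to population targets on key demographics",
    instrument_bias="Instrument piloted; question order randomized",
    methodology="Method chosen to match the measurement objective",
)
print(disclosure)
```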
Bad data is expensive. It leads to bad decisions, wasted budgets, and strategies that don’t work. If you’re not thinking about FASRIM in your research process, you’re gambling with your results.