User Engagement Metrics: The Ones That Lie and the Ones That Don't
Most user engagement metrics count motion, not progress. A rep forced to log in daily scores high and does nothing. Here is which metrics survive contact with Goodhart's Law.
Picture two reps and one dashboard. The first logs in every morning, leaves the tool open all day, and racks up an enviable session count. The second opens the tool for four minutes, updates the one field that moves the deal, and closes it. On every user engagement metric the dashboard tracks, the first rep is winning. On the only thing the company is paying for, the second one is. The dashboard is not lying, exactly. It is answering a question nobody should have asked.
User engagement metrics are the measures of how much people use a product, and they split into surface metrics that count motion (logins, sessions, clicks) and outcome metrics that count progress (did the key behavior happen, did the work move forward), with only the second resisting gaming. Keep that split in front of you, because almost every misleading dashboard is built entirely from the first kind.
The distinction is not new, and the person who sharpened it for software was Eric Ries. In The Lean Startup he drew the line between vanity metrics, the numbers that go up and to the right and make everyone feel good, and actionable metrics, the ones that tell you whether the thing you care about is actually happening (Eric Ries, The Lean Startup). His test was simple and brutal: if a metric cannot change a decision, it is vanity, no matter how impressive the chart. Total registered users only ever rises, so it flatters and informs nothing. Engagement dashboards are full of these. They are popular precisely because they are reassuring, and reassurance is the opposite of the job a metric is supposed to do.
Which user engagement metrics count motion, and which count progress?
The two families look similar on a slide and mean opposite things in practice.
- Surface metrics. Logins, sessions, time-in-app, daily and monthly active users, clicks, page views. Easy to collect, easy to chart, and easy to fake. They tell you a user was present, not that anything happened.
- Outcome metrics. Did the behavior the product exists to enable happen, and did the work move forward because of it. Harder to instrument and far harder to game, because gaming them means doing the work.
The same split runs through adoption metrics generally: the easy ones count presence and the hard ones count whether the work got done. The trap is that the surface metrics are the easy ones, so they are the ones that get tracked, charted, and reported up. A team optimizing for product engagement on those numbers can post a beautiful curve while the behavior that pays the bills sits flat.
Why do login and session metrics mislead?
Because motion is not progress, and the dashboard cannot tell the difference. A rep required to keep the tool open all day looks maximally engaged and may do nothing of value. A rep who uses it briefly but correctly looks lightly engaged and does the real work. The numbers rank them one way and reality ranks them the other. This is the odometer problem: an odometer counts the miles without knowing whether anyone arrived, and a car driven in circles around the block can run it up all day.
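The two-rep picture can be made concrete with a small sketch. Everything here is an assumption for illustration: the event names, the `Event` shape, and the choice of `qualifying_field_updated` as the one event that moves a deal are hypothetical, not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    rep: str
    kind: str          # hypothetical event names: "login", "session", ...
    minutes: int = 0   # session length, only meaningful for "session" events


# Hypothetical name for the one event that actually moves a deal forward.
OUTCOME_EVENT = "qualifying_field_updated"


def surface_score(events, rep):
    """Motion: how often the rep showed up and how long the tool stayed open."""
    mine = [e for e in events if e.rep == rep]
    logins = sum(1 for e in mine if e.kind == "login")
    minutes = sum(e.minutes for e in mine if e.kind == "session")
    return {"logins": logins, "minutes": minutes}


def outcome_score(events, rep):
    """Progress: how many times the key behavior happened."""
    return sum(1 for e in events if e.rep == rep and e.kind == OUTCOME_EVENT)


events = [
    Event("rep_a", "login"),
    Event("rep_a", "session", minutes=480),  # tool left open all day
    Event("rep_b", "login"),
    Event("rep_b", "session", minutes=4),    # four focused minutes
    Event("rep_b", OUTCOME_EVENT),           # the one update that matters
]

for rep in ("rep_a", "rep_b"):
    print(rep, surface_score(events, rep), "outcome:", outcome_score(events, rep))
```

On the surface score the first rep wins by a wide margin; on the outcome score only the second rep registers at all. Both functions read the same event log, so the difference lies in which question is asked, not in what data is collected.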
How does Goodhart’s Law kill an engagement metric?
The moment you turn a surface metric into a target, it stops measuring what you wanted. The economist Charles Goodhart’s observation, now known as Goodhart’s Law, is usually paraphrased as: when a measure becomes a target, it ceases to be a good measure (Goodhart’s Law). Make daily logins the goal and you will get daily logins: people log in, do nothing, and log out, and the metric now records empty logins. You optimized the proxy and lost the thing the proxy stood for.
The cleanest picture of this is the cobra effect, a story that is widely retold though thinly documented. As it goes, a colonial government in Delhi wanted fewer venomous cobras in the city, so it paid a bounty for every dead cobra turned in. The metric, dead cobras collected, climbed beautifully. The goal, fewer cobras, did not, because enterprising residents started breeding cobras to kill for the bounty. When the government noticed and scrapped the program, the breeders released their now-worthless snakes, and the city ended up with more cobras than when it began. The number went up the whole time. The thing the number was supposed to track went the other way.
This is not a quirk of bounties or of British India. Donald Campbell, a social scientist, stated the same law for human systems at around the same time as Goodhart, in the mid-1970s: the more any quantitative indicator is used for decision-making, the more it will be gamed and the more it will distort the process it was meant to monitor (Campbell’s Law). Two independent thinkers, an economist and a social scientist, arrived at the same conclusion from different fields, which is a strong sign the finding is real. A login target on a sales tool is a small cobra bounty. The reps are not villains for clicking through it; the system rewarded presence, so it got presence. Blaming them misreads the problem, which is the metric, not the people.
Outcome metrics survive Goodhart because gaming them requires doing the work. If your metric is “the deal-qualifying step was completed correctly,” the only way to score is to complete the step correctly, which is the behavior you wanted. The measure and the goal are the same thing, so there is no proxy to game.
This is the same reason activity tracking is good but never sufficient on its own: capturing what a user did is necessary and honest, and it must be paired with whether the work moved forward, a pairing laid out in pipeline hygiene and user adoption. The error is not tracking activity. The error is mistaking the activity for the outcome and stopping there.
How do you pick an outcome metric that does not lie?
Find the single behavior the product exists to produce, and measure whether it happened. Sean Ellis, who coined the term growth hacking, built the now-standard North Star Metric practice on exactly this: pick the one number that captures the core value the customer gets, and let the vanity metrics orbit it as diagnostics, never as goals (Sean Ellis on the North Star Metric). For a messaging app the North Star is messages sent, not signups. For a sales tool it is the deciding step run correctly on a real deal, not minutes in the app. The test for a good outcome metric is the gaming test: imagine someone trying hard to make the number go up while delivering no value, and if they cannot, the metric is sound.
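The gaming test can be roughed out in code. This is a minimal sketch under stated assumptions: metrics are plain functions over a list of event names, and `EMPTY_MOTION` is a hypothetical, hand-written list of actions a user can take while delivering no value. The check is only as good as that list, so deciding what counts as valueless remains a human judgment.

```python
def count_logins(events):
    """Surface metric: how many times someone showed up."""
    return sum(1 for e in events if e == "login")


def count_messages_sent(events):
    """Outcome metric for the messaging-app example: messages sent."""
    return sum(1 for e in events if e == "message_sent")


# Hypothetical actions that take effort to perform but deliver nothing.
EMPTY_MOTION = ["login", "page_view", "click", "logout"]


def passes_gaming_test(metric, baseline, empty_motion=EMPTY_MOTION, repeats=100):
    """True if padding the log with valueless actions cannot raise the metric."""
    padded = baseline + empty_motion * repeats
    return metric(padded) <= metric(baseline)


baseline = ["login", "message_sent", "logout"]

print(passes_gaming_test(count_logins, baseline))         # False
print(passes_gaming_test(count_messages_sent, baseline))  # True
```

A metric that fails here is a surface metric whatever it is called on the dashboard. A metric that passes is not automatically safe: if junk messages were added to `EMPTY_MOTION`, messages sent would fail too, which is a reminder to define the outcome event as narrowly as the real behavior.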
A few rules keep an engagement dashboard honest:
- One outcome metric on top. Lead with the single behavior the product exists to produce, and judge everything against it. A dashboard with twelve coequal numbers has no North Star and will drift toward the easiest one.
- Surface metrics as diagnostics only. Logins and sessions earn a place as leading sanity checks, the canary that tells you something upstream broke, but never as a goal anyone is rewarded for hitting.
- The gaming test on every target. Before you set a target on any metric, ask whether someone could hit it while delivering nothing. If yes, it is a surface metric in disguise, and the day it becomes a target is the day it stops telling the truth.
- Pair activity with progress. Track what the user did, because that is honest and necessary, and pair it with whether the work moved, because activity alone advances nothing.
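Two of these rules are mechanical enough to check automatically. The sketch below assumes a dashboard is an ordered list of metric definitions; the `Metric` and `Kind` types, the metric names, and `validate_dashboard` are all hypothetical, not part of any real dashboard tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    OUTCOME = "outcome"   # counts progress
    SURFACE = "surface"   # counts motion


@dataclass(frozen=True)
class Metric:
    name: str
    kind: Kind
    target: Optional[float] = None  # None means "diagnostic only"


def validate_dashboard(metrics):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Rule: an outcome metric leads the dashboard.
    if not metrics or metrics[0].kind is not Kind.OUTCOME:
        problems.append("the first metric must be an outcome metric")
    # Rule: surface metrics are diagnostics and never carry a target.
    for m in metrics:
        if m.kind is Kind.SURFACE and m.target is not None:
            problems.append(f"surface metric '{m.name}' must not carry a target")
    return problems


honest = [
    Metric("qualifying_step_completed_rate", Kind.OUTCOME, target=0.8),
    Metric("daily_logins", Kind.SURFACE),
]
gamed = [
    Metric("daily_logins", Kind.SURFACE, target=5.0),
    Metric("qualifying_step_completed_rate", Kind.OUTCOME),
]

print(validate_dashboard(honest))
print(validate_dashboard(gamed))
```

The check covers ordering and targets only. Whether a metric labeled as an outcome really is one still depends on the gaming test, which no schema can run for you.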
What we recommend
Build the dashboard outcome-first. Lead with the metrics that count progress, the key behavior happening and the work moving forward, and report the surface metrics (logins, sessions) only as leading sanity checks, never as goals. Never set a target on a surface metric you cannot defend against Goodhart, because once people are rewarded for hitting it, it records compliance rather than engagement. The point of measuring engagement was always to know whether people are doing the work the tool exists for. Measure that directly, and the vanity numbers become what they should have been all along: a footnote.
From here: the system view in user adoption, the activity-plus-outcome balance in pipeline hygiene, and why training does not move the number in technology adoption.