We stopped measuring engagement and our product got better
For the first year of building Murror, we optimized for the same metrics every other app optimizes for: daily active users, session length, screens per visit. The dashboard looked healthy. Usage was growing. We felt good about it.
But something was off. Our most engaged users were not our happiest users. People who spent the most time in the app were often the ones who left the harshest feedback. Meanwhile, users who opened the app twice a week for five minutes were writing us emails about how it changed how they handle difficult conversations.
We were measuring activity when we should have been measuring impact.
So we ran an experiment. For one quarter, we replaced our engagement metrics with what we called "outcome metrics." Instead of tracking how long someone stayed, we tracked whether they reported feeling more clarity after a session. Instead of measuring return frequency, we measured whether people said they applied something from Murror in a real-life situation.
The results were counterintuitive. Some of our most "engaging" features scored terribly on outcomes. A beautiful interactive visualization that users loved to play with was not actually helping them understand anything about themselves. And a simple, almost boring two-question prompt that most people finished in under a minute was producing the highest outcome scores we had ever seen.
We started making product decisions based on outcomes instead of engagement. We removed three features that quarter. We simplified two screens. Our session length dropped. Our DAU dipped slightly. And our NPS went from 34 to 61.
The hardest part was trusting the process. Every instinct from years of building products told us that declining engagement metrics meant something was wrong. But we had to keep reminding ourselves: the goal is not to keep people in the app. The goal is to help them understand themselves better and take that understanding into the real world.
We are still early in this shift. We do not have it all figured out. But I genuinely believe that the next generation of AI products, especially ones dealing with something as personal as emotions and self-awareness, will need to rethink what success looks like. Not every product should optimize for time spent.
Curious if anyone else has experimented with outcome-based metrics instead of engagement. What did you measure, and how did it change your product decisions?



Replies
What kind of questions did you ask users for outcome tracking?
Murror
@isaac_dominic1 Great question! We kept it really simple, with two core questions after each session: "Do you feel more clarity about what you were reflecting on?" and "Is there something from this session you'd try in real life?" We also started sending a 3-day follow-up asking if they actually applied anything. The follow-up was the game-changer: it told us whether the app was creating lasting value, not just momentary relief.
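If it helps to picture the mechanics, here's a minimal sketch of how a setup like ours could be wired up. The names, fields, and scheduling helper are hypothetical, not our actual schema:

```typescript
// Minimal sketch of a post-session outcome prompt (illustrative only).
interface OutcomeQuestion {
  id: string;
  text: string;
  type: "yes_no" | "scale_1_5";
}

// The two core questions, asked right after a session ends.
const POST_SESSION_QUESTIONS: OutcomeQuestion[] = [
  {
    id: "clarity",
    text: "Do you feel more clarity about what you were reflecting on?",
    type: "scale_1_5",
  },
  {
    id: "apply",
    text: "Is there something from this session you'd try in real life?",
    type: "yes_no",
  },
];

// When to send the follow-up asking whether they actually applied anything.
function followUpDate(sessionEnd: Date, days = 3): Date {
  return new Date(sessionEnd.getTime() + days * 24 * 60 * 60 * 1000);
}
```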
Did investors push back when DAU dropped?
Murror
@yara_simone Honestly, yes. It was a tough conversation at first. When DAU dipped, our investors naturally had questions. What helped was showing them the NPS jump (34 to 61) alongside the qualitative feedback we were getting. We reframed the narrative: our users weren't leaving; they were just using the app more intentionally. Once we showed that retention actually improved and that users were reporting real behavioral changes, the conversation shifted from "why is DAU down" to "how do we scale this kind of impact."
@monatruong_murror Makes sense; strong retention and NPS matter more than raw DAU.
How long did it take before you trusted the new data?
Murror
@violet_amelia It took about 6-8 weeks before we really started trusting it. The first few weeks were honestly nerve-wracking: the outcome data was messier than engagement metrics. Session length is a clean number; "did you feel more clarity" is subjective. But over time, patterns emerged. We noticed that users who scored high on our outcome questions were also the ones who stayed long-term and referred friends. That correlation gave us the confidence to commit fully to the new approach.
How do you balance simple vs engaging now?
@delaney_rose1 I found this perspective really eye-opening because I have always leaned on engagement metrics as a sign things are working. Seeing how they can hide actual dissatisfaction makes me rethink what I should be tracking.
Murror
@delaney_rose1 It's less about choosing one or the other and more about redefining what "engaging" means. We found that the features users called "engaging" were often just visually interesting: fun to play with but not actually changing anything. The simpler features that scored highest on outcomes also turned out to drive the strongest word-of-mouth. So our new rule of thumb: if it's simple and produces real clarity, it's engaging enough. The engagement just shows up differently, in referrals and long-term return rates rather than time-in-app.
This is super interesting, and we've been experimenting with something similar (not as structured, though).
We moved away from tracking "activity" in our marketing (posts, impressions, etc.) and started looking more at:
- whether content actually gets saved or shared
- whether it leads to inbound conversations
- and whether we can sustain it consistently over time
One thing we noticed: high-effort content often performs worse on actual outcomes than simple, repeatable formats.
Your point about the "boring" feature outperforming everything else really resonates.
Curious: how do you collect those outcome signals at scale? Are users explicitly reporting them, or are you inferring them somehow?
Murror
@judit10 Love that you're experimenting with this in marketing too; tracking whether content leads to inbound conversations is such a better signal than impressions. To answer your question: it's a mix of both. We use short in-app prompts right after sessions (explicit), but we also look at behavioral proxies like whether someone returns within 7 days or refers a friend (inferred). The explicit reporting gets us the "why"; the behavioral data gives us scale. Neither alone tells the full story. And yes, the boring feature outperforming everything was humbling but clarifying!
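To make the mix concrete, here's roughly how the two signal types could be folded into one per-user score. The weights, thresholds, and field names are made up for illustration, not what we actually ship:

```typescript
// Illustrative blend of explicit survey answers and behavioral proxies.
interface UserSignals {
  avgClarityScore: number;      // explicit: mean of 1-5 post-session answers
  returnedWithin7Days: boolean; // inferred from session timestamps
  referredAFriend: boolean;     // inferred from referral events
}

function outcomeScore(s: UserSignals): number {
  const explicit = (s.avgClarityScore - 1) / 4; // normalize 1-5 to 0-1
  const inferred =
    (s.returnedWithin7Days ? 0.5 : 0) + (s.referredAFriend ? 0.5 : 0);
  // Weight explicit reporting higher: it carries the "why".
  return 0.6 * explicit + 0.4 * inferred;
}
```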
Are outcome metrics consistent or do they fluctuate a lot?
Murror
@ethan_marshall They fluctuate more than engagement metrics, especially early on. Engagement numbers feel steady because they're counting simple actions. Outcome metrics are noisier because you're measuring something subjective: "did this help?" varies by person and context. But we found that over 4-6 week windows, clear patterns emerged. The weekly noise averaged out, and the directional signal was actually stronger than engagement ever was. The trick is resisting the urge to react to daily swings.
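For what it's worth, the smoothing itself is nothing fancy. A trailing mean over weekly scores, sketched below purely as an illustration, is enough to see the direction:

```typescript
// Trailing-window average over weekly outcome scores:
// a simple way to damp week-to-week noise before reading a trend.
function trailingAverage(weeklyScores: number[], windowWeeks = 4): number[] {
  return weeklyScores.map((_, i) => {
    const start = Math.max(0, i - windowWeeks + 1);
    const window = weeklyScores.slice(start, i + 1);
    return window.reduce((sum, v) => sum + v, 0) / window.length;
  });
}
```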
Did you automate tracking or is it manual feedback?
Murror
@cole_simmons Both, actually. The in-app outcome questions are automated: they trigger after sessions and as follow-ups. But interpreting the data still involves a lot of manual review. We read qualitative responses, look for patterns in the behavioral data, and cross-reference. We're slowly building more automated dashboards, but honestly, the manual reading of user responses has been some of the most valuable product insight we've ever gotten.
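Roughly, the automated half is just event-driven prompts, with free-text answers routed to a queue we read by hand. Everything named below is a simplified stand-in, not our real pipeline:

```typescript
// Simplified stand-in for the split between automation and manual review.
type PromptTrigger = "session_end" | "followup_due";

interface QualitativeAnswer {
  userId: string;
  freeText: string;
  receivedAt: Date;
}

// Qualitative responses we still read manually.
const reviewQueue: QualitativeAnswer[] = [];

function onTrigger(userId: string, trigger: PromptTrigger): void {
  // In the real app this would show the in-app prompt or send a notification.
  console.log(`Prompting ${userId} after ${trigger}`);
}

function onFreeTextAnswer(userId: string, freeText: string): void {
  // Structured answers feed dashboards; free text goes to the review pile.
  reviewQueue.push({ userId, freeText, receivedAt: new Date() });
}
```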
minimalist phone: creating folders
Pretty interesting idea, but I once read about this approach in a marketing newsletter.
It is actually a good tactic.
For example, Taplio does something similar: it sends a monthly report that reads more like "With your content, you earned $XY" (results). The thing is, we haven't earned any money, so it probably counts "how much time we saved by scheduling things," etc.
Murror
@busmark_w_nika That's a great example with Taplio! The "you earned $XY" framing is clever because it ties the value back to the user's real life rather than just showing activity stats. We're doing something similar: after each session, we ask if the user gained clarity they can apply, rather than just showing them how many sessions they completed. The difference is subtle, but it completely changes how users think about the product's role in their life.
minimalist phone: creating folders
@monatruong_murror The effect is predominantly psychological, but there are many things you can try in terms of messaging and psychological framing :)
Murror
@busmark_w_nika Totally agree: psychological framing is a huge lever we're still exploring. One thing we've been experimenting with is how we phrase the post-session reflection. Even small wording changes (like "what stood out to you?" vs. "what did you learn?") shift how users engage with the outcome. It's less about the data structure and more about how the question makes the user feel. Would love to hear what framing approaches have worked for you!
minimalist phone: creating folders
@monatruong_murror Social proof has always worked! :) at least for me :)
What was the hardest feature to remove?
Murror
@aiden_jenkins An interactive visualization that let users explore their emotional patterns over time. Users loved playing with it, but it scored almost zero on our outcome metrics; it wasn't actually helping anyone understand themselves better. It was just... pretty. Removing it felt risky because it had high engagement, but nothing changed in our outcome scores when it was gone. That was probably the moment we fully trusted the new approach.
NPS jumping from 34 to 61 while DAU drops is the whole story, tbh. Engagement dashboards are addictive for founders too: watching numbers go up feels like progress even when it's just people being confused or stuck.
Murror
@umairnadeem Exactly this. The dashboard addiction is real. We spent months watching numbers climb and feeling productive without ever asking whether users were actually better off. Once we broke that cycle and focused on NPS and real-world application, it changed everything about how we build.