At LaunchDarkly, there’s always been an unwritten goal to identify interesting user sessions to make it easier for our customers to understand their users.
One way to analyze a session is to look at the user’s journey, summarized by the order of page visits and the time spent on each page. The thought is that sessions with unusual user journeys can reveal where users are experiencing frustration or using the app in an unexpected way.
This blog post covers the logic that we use to analyze the user journeys of each session and surface the interesting ones.
Defining a “page”
To define a user journey as “the order of page visits”, we first need to define a “page”. The LaunchDarkly client [records](https://github.com/highlight/highlight/blob/47ac4497d8ba94df83471e54d14ea788e04767b0/sdk/client/src/index.tsx#L-) browser navigation events, so we can get the URLs and timestamps for each page visit. Conceptually, a page is a bit different from a URL, since one or many URLs may map to a single page. An app may also use URL normalization rules (e.g. case insensitivity, or removing trailing slashes).
For example, our error pages have URLs of the form app.highlight.io/{project-id}/errors/{error-id}, but regardless of the project-id and error-id in the URL, we render the same “error details page” with different content displayed.
Normalizing our data
To reduce noise, we wanted to apply some normalization steps before saving a session’s user journey.
A web app may have some redirection logic, such that between two page visits there is an intermediate visit to another page that isn’t meaningful to the user. This doesn’t really impact the user journey, so we can just discard the intermediate page.
To do our best at grouping URLs together as pages, we need to identify resource IDs in URLs. To do this, we wanted an algorithm that would split up a URL and identify which parts are likely resource IDs so we could remove them. There isn’t a single way to accomplish this, but we ended up splitting the URL paths on their slashes and handling each part with a few heuristics that work pretty well for our use case:
- If the part contains a number, treat the part as an ID.
- Split the part on capitals or separator characters (‘-’, ‘_’, ‘~’, ‘+’). If any of the resulting pieces doesn’t contain a vowel or contains more than 5 sequential consonants, treat the part as an ID.
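
The two heuristics above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the function names (`split_words`, `looks_like_id`, `normalize_url`) are made up for this example, and it assumes that only a, e, i, o, u count as vowels (so “y” is a consonant) and that input URLs include a scheme.

```python
import re
from urllib.parse import urlsplit

SEPARATORS = re.compile(r"[-_~+]")
VOWELS = set("aeiou")
# "More than 5 sequential consonants" means a run of 6 or more.
# Assumption: every letter other than a, e, i, o, u is a consonant.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{6,}")


def split_words(part: str) -> list[str]:
    """Split a path part on separator characters and before capital letters."""
    words = []
    for chunk in SEPARATORS.split(part):
        # Zero-width split before each capital (requires Python 3.7+).
        words.extend(re.split(r"(?=[A-Z])", chunk))
    return [w for w in words if w]


def looks_like_id(part: str) -> bool:
    """Apply the two heuristics to a single URL path part."""
    if any(ch.isdigit() for ch in part):
        return True
    for word in split_words(part):
        lower = word.lower()
        if not VOWELS.intersection(lower):
            return True  # no vowel at all
        if CONSONANT_RUN.search(lower):
            return True  # too many consonants in a row
    return False


def normalize_url(url: str) -> str:
    """Drop path parts that look like resource IDs and rejoin the rest."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return "/" + "/".join(p for p in parts if not looks_like_id(p))
```

With these assumptions, `normalize_url("https://app.highlight.io/1234/errors/abcd9876")` reduces to `/errors`, since both the project and error IDs contain digits. Short words without one of the five vowels (such as “my”) would also be treated as IDs, which is the kind of trade-off a heuristic like this makes.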
Scoring interesting sessions
After applying these normalization steps, we can enumerate all of the state transitions in a user journey. For each step in the journey, we can calculate the bigram probability of the next page given the current page using all of the other sessions we’ve seen, creating a Markov chain. An interesting session should be one where the state transitions are less probable. For a LaunchDarkly user session, an example journey with its corresponding probabilities looks like this:

| page | next_page | probability |
|---|---|---|

Multiplying the step probabilities together gives 5.01x10^-6. This is the probability of that exact user journey happening, assuming at each step that the next page was randomly chosen based only on the current page. We could calculate this probability for all sessions, then find the sessions with the lowest probability.
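
To make the calculation concrete, here is a rough Python sketch of estimating bigram probabilities from a set of journeys and multiplying them along one journey. The function names and the sample pages are illustrative assumptions. For simplicity it counts every journey passed in (rather than excluding the session being scored) and gives unseen transitions a probability of 0.

```python
from collections import Counter, defaultdict
from math import prod


def transition_probabilities(journeys):
    """Estimate P(next_page | page) from observed journeys (lists of pages)."""
    counts = defaultdict(Counter)
    for journey in journeys:
        # Each adjacent pair of pages is one observed transition (a bigram).
        for page, next_page in zip(journey, journey[1:]):
            counts[page][next_page] += 1
    probs = {}
    for page, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[page] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


def journey_probability(journey, probs):
    """Multiply the bigram probability of every step in the journey."""
    return prod(
        probs.get(page, {}).get(next_page, 0.0)  # unseen transition -> 0
        for page, next_page in zip(journey, journey[1:])
    )


# Hypothetical sample data, with pages already normalized.
journeys = [
    ["/sessions", "/errors", "/alerts"],
    ["/sessions", "/errors", "/sessions"],
    ["/sessions", "/alerts"],
    ["/sessions", "/errors", "/alerts"],
]
probs = transition_probabilities(journeys)
rare = journey_probability(["/sessions", "/errors", "/sessions"], probs)
common = journey_probability(["/sessions", "/errors", "/alerts"], probs)
```

In this toy data, the journey that returns to `/sessions` has a lower probability than the one that continues to `/alerts`, so it would rank as the more interesting of the two.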
As fun as this sounds, this has a couple of issues:
- Longer sessions will be favored, because probabilities are multiplied together and the probability of every step cannot be greater than 1.
- Navigating among pages with many links to other pages will cause lower scores, as the expected probability is lower. For example, for a current page, if there are five other pages a user can visit and each is equally interesting (equal probability), the step’s probability will be 0.2 regardless of which page is chosen.
To address both issues, we can normalize each step by dividing its probability by the expected probability of a transition out of the current page:
- For the first issue, regardless of how many transitions are made, the expected value of an entire session is 1.
- For the second issue, all transitions are scored as 1 instead of 0.2.

High-probability steps will now have a normalized score greater than 1, and the total score of a session will be less than 1 if it’s more interesting and greater than 1 if it’s less interesting. Applying this normalization to the previous example looks like this:
| normalized |
|---|
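
One way to realize this kind of normalization is sketched below. It rests on an assumption that is ours for this example: the “expected probability” of a step out of a page is taken to be the sum of the squared transition probabilities from that page (each next page’s probability weighted by how likely it is to be chosen). The function names are illustrative, and pages with no observed outgoing transitions simply score 0.

```python
def expected_step_probability(next_probs):
    """Expected probability of a step out of a page: sum of p * p over next pages."""
    return sum(p * p for p in next_probs.values())


def normalized_score(journey, probs):
    """Product of each step's probability divided by its page's expected probability."""
    score = 1.0
    for page, next_page in zip(journey, journey[1:]):
        next_probs = probs.get(page, {})
        expected = expected_step_probability(next_probs)
        if expected == 0:
            return 0.0  # assumption: no known transitions out of this page
        score *= next_probs.get(next_page, 0.0) / expected
    return score


# Five equally likely next pages: every step normalizes to about 1.
hub = {"/home": {f"/page{i}": 0.2 for i in range(5)}}
uniform_step = normalized_score(["/home", "/page3"], hub)

# A skewed page: the common step scores above 1, the rare step below 1.
skewed = {"/a": {"/b": 0.9, "/c": 0.1}}
common_step = normalized_score(["/a", "/b"], skewed)
rare_step = normalized_score(["/a", "/c"], skewed)
```

Under this assumption, a page with five equally likely links no longer drags the score down, and a journey with no steps keeps the neutral score of 1 regardless of length.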