
10 min read
Subscribe to our monthly newsletter to get the latest updates in your inbox
SELECT user_pseudo_id, --unique user id event_timestamp, --time in microseconds UTC as logged on the client TIMESTAMP_MICROS(event_timestamp) AS ts, --transform to TIMESTAMP type LAG(TIMESTAMP_MICROS(event_timestamp)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_evt_ts, --timestamp of the event before the current one IF(event_name = "session_start", 1, 0) AS is_session_start_event FROM `<your project id>.<your dataset id>.<your table id>`The query is pretty straightforward with the exception of the usage of the LAG analytic function, which might be new to those less proficient in SQL. Generally, we're only selecting user id and event timestamp but with the usage of the LAG function we take a look at the timestamp before the current event (we'll use this later to check if 30 minutes has already passed). And lastly, we take note when we encounter event by the name of session_start .
One thing to note: if there's no previous event available
prev_evt_ts will return
null. If
is_session_start_event is 1 that means the row represents a
session_start event. That information will come in handy later.
SELECT *, IF(COALESCE(TIMESTAMP_DIFF(ts, prev_evt_ts, SECOND) / 60, 1e10) >= 30, 1, 0) AS session_start FROM ( SELECT user_pseudo_id, event_timestamp, TIMESTAMP_MICROS(event_timestamp) AS ts, LAG(TIMESTAMP_MICROS(event_timestamp)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_evt_ts, IF(event_name = "session_start", 1, 0) AS is_session_start_event FROM `<your project id>.<your dataset id>.<your table id>` )We’ve wrapped our layer 1 query into another query. The only addition is the session_start column. The purpose is to make a note on which event a new session should be started. TIMESTAMP_DIFF(ts, prev_evt_ts, SECOND) / 60 is used to calculate the difference between the current timestamp and the previous timestamp in minutes. Keep in mind that prev_evt_ts can be null if there is no previous event, meaning that TIMESTAMP_DIFF would also return null. That’s why we wrapped it inside a COALESCE function which returns the first non null value. In case the first one is null we output 1e10 which is a big number and clearly larger than 30, that means we automatically make note to start a new session if there is no previous event before the current one. Also if the difference is larger than 30 that also means a new session should be started as per Firebase’s default session timeout duration.
SELECT *, SUM(session_start) OVER (PARTITION BY user_pseudo_id ORDER BY ts) AS session_id FROM ( SELECT *, IF(COALESCE(TIMESTAMP_DIFF(ts, prev_evt_ts, SECOND) / 60, 1e10) >= 30, 1, 0) AS session_start FROM ( SELECT user_pseudo_id, event_timestamp, TIMESTAMP_MICROS(event_timestamp) AS ts, LAG(TIMESTAMP_MICROS(event_timestamp)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_evt_ts, IF(event_name = "session_start", 1, 0) AS is_session_start_event FROM `<your project id>.<your dataset id>.<your table id>` ) )We did the same thing as before, wrapping our layer 2 query into a new query and added exactly one additional column. Column session_id is meant to represent the id of the user’s session. We achieve that by using analytic functions once again. This time we partition our data by user id and order it by event timestamp in an ascending order and SUM our session_start column that we prepared in the previous layer. The function moves row by row and adds the previous sum to the new sum.
We see how
session_id changes once we need to start with a new session. This is almost good enough. What we're doing is grouping all user events that are no more than 30 minutes apart and making that our session id. However, at the beginning we mentioned that
session_start event plays a role in how a session is defined. That’s why we created an
is_session_start_event column, but we're not yet using it. Firebase will only acknowledge a session if there is a
session_start event inside it.
SELECT user_pseudo_id, event_timestamp, session_id, MAX(is_session_start_event) OVER (PARTITION BY user_pseudo_id, session_id) AS has_session_start_event FROM ( SELECT *, SUM(session_start) OVER (PARTITION BY user_pseudo_id ORDER BY ts) AS session_id FROM ( SELECT *, IF(COALESCE(TIMESTAMP_DIFF(ts, prev_evt_ts, SECOND) / 60, 1e10) >= 30, 1, 0) AS session_start FROM ( SELECT user_pseudo_id, event_timestamp, TIMESTAMP_MICROS(event_timestamp) AS ts, LAG(TIMESTAMP_MICROS(event_timestamp)) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_evt_ts, IF(event_name = "session_start", 1, 0) AS is_session_start_event FROM `<your project id>.<your dataset id>.<your table id>` ) ) )This is the last layer—we expose user id—event timestamp and session id from the previous query and add another new column. has_session_start_event will tell us if the session had session_start event or not. That way we can easily exclude sessions that lasted less than 10 seconds and had no session_start or do just the opposite and analyze only those short, not-acknowledged-by-firebase sessions. The analytic function partitions data by user id and session id and calculates the MAX value of is_session_start_event column, if the result is 1 we know that somewhere inside that session session_start event was sent. Thing to note is that session_start event is not guaranteed to be the first one in the session.
Example is the same as the previous layer. Here we see that
has_session_start_event is set to 1 indicating both sessions are valid.
Above, we see a session with 3 events, but none of those events are
session_start which would make this an invalid session per Firebase’s session definition. That does it. With this query we can now explore user sessions in more detail.
SELECT user_pseudo_id, session_id, COUNT(*) AS events, TIMESTAMP_DIFF(MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND) AS session_duration FROM `<your project id>.<your dataset id>.<your view name>` WHERE has_session_start_event = 1 --only look at valid sessions GROUP BY user_pseudo_id, session_id
Note that we are only looking at valid sessions. From there on it’s fairly easy to further calculate global averages of events per session and average session duration. The idea is that you can use
user_pseudo_id and
event_timestamp from the query (or view) and join it back with the original data so you have that extra data available right beside your main data.
Find a time to talk media, measurement or both!
Our media geniuses, analytics savants and industry-leading data scientists have deep expertise and a passion for helping our clients use it effectively. And they'd love to help you achieve your marketing goals.