Data Explorer queries that I use


(Sarah Hawk) #1

On request from @JoelZaslofsky here are the Discourse Data Explorer queries that I use here at Experts.

Active readers for the past month

Lists users, no. of visits, and no. of posts read in the last month

select user_id, 
count(1) as visits,
sum(posts_read) as posts_read
from user_visits
where posts_read > 0
and visited_at > CURRENT_TIMESTAMP - INTERVAL '30 days'
group by user_id
order by visits desc, posts_read desc

Active readers since N days

Lists no. of users who have read at least 1 post since N days ago

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '30 days',
                         CURRENT_TIMESTAMP - INTERVAL '1 day',
                         INTERVAL '1 days') n
),
latest_visits as (
    select 
        user_id, 
        max(visited_at) as visited_at
    from user_visits
    where posts_read > 0
    group by user_id
)
select 
  COUNT(1) as active_readers_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM users u
left join latest_visits v
on u.id = v.user_id
right join intervals i
on v.visited_at >= i.start_time and v.visited_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc

Users last seen since (weeks)

Shows users that have been absent for N weeks

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '140 days',
                         CURRENT_TIMESTAMP - INTERVAL '7 days',
                         INTERVAL '7 days') n
)
select 
  COUNT(1) as users_seen_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM USERS u
right join intervals i
on u.last_seen_at >= i.start_time and u.last_seen_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc

Recently read topics by user

The specific topics read by an individual user

-- [params]
-- integer :user = 1
-- integer :since_days_ago = 7

with topic_timing as (
  select user_id, topic_id, sum(msecs) / 1000 as seconds
  from post_timings
  where user_id = :user
  group by user_id, topic_id
)
SELECT tv.topic_id,
    tv.user_id,
    tv.viewed_at,
    tt.seconds
from topic_views tv
left join topic_timing tt
on tv.topic_id = tt.topic_id
and tv.user_id = tt.user_id
where tv.user_id = :user
and viewed_at + :since_days_ago > CURRENT_TIMESTAMP
order by seconds desc

Lurkers

Members that have read but never posted (parameter = min no of posts read)

–[params]
– int :num_of_posts = 100

WITH posts_by_user AS (
    SELECT COUNT(*) AS posts, user_id
    FROM posts
    GROUP BY user_id
), posts_read_by_user AS (
    SELECT SUM(posts_read) AS posts_read, user_id
    FROM user_visits
    GROUP BY user_id
)
SELECT 
    u.id,
    u.username_lower AS "username",
    u.email,
    u.created_at,
    u.last_seen_at,
    COALESCE(pbu.posts, 0) AS "posts_created",
    COALESCE(prbu.posts_read, 0) AS "posts_read"
FROM users u
LEFT JOIN posts_by_user pbu ON pbu.user_id = u.id
LEFT JOIN posts_read_by_user prbu ON prbu.user_id = u.id
WHERE u.active = true
AND posts IS NULL
AND posts_read > :num_of_posts
ORDER BY  posts_read DESC

Post count for time period

Returns post count for specified date period (doesn’t count DMs or closed/deleted/archived topics)

-- [params]
-- date :date_from
-- date :date_to
-- int  :min_posts = 1

    SELECT count(p.id)
    FROM posts p
        LEFT JOIN topics t ON t.id = p.topic_id
        WHERE p.created_at::date BETWEEN :date_from::date AND :date_to::date
            AND t.deleted_at IS NULL
            AND t.visible = TRUE
            AND t.closed = FALSE
            AND t.archived = FALSE
            AND t.archetype = 'regular'
            AND p.deleted_at IS NULL

Active users during timeframe

Returns a count of users that made at least one post during the defined period

-- [params]
-- date :date_from
-- date :date_to
-- int  :min_posts = 1

WITH user_activity AS (
    SELECT p.user_id, count (p.id) as posts_count
    FROM posts p
    LEFT JOIN topics t ON t.id = p.topic_id
    WHERE p.created_at::date BETWEEN :date_from::date AND :date_to::date
        AND t.deleted_at IS NULL
        AND t.visible = TRUE
        AND p.deleted_at IS NULL
        
    GROUP BY p.user_id
)
SELECT COUNT(user_id)
FROM user_activity
WHERE posts_count >= :min_posts

Custom field query

Queries custom user fields (in this case I’m looking for Twitter names)

select users.name, users.email, user_custom_fields.user_id, user_custom_fields.name, user_custom_fields.value from users, user_custom_fields
where users.id = user_custom_fields.user_id
and user_custom_fields.value LIKE '%@%'

Query staff notes

Search for a specific term within staff notes

WITH psr AS (
 SELECT regexp_matches(plugin_store_rows.key, '[0-9]+') AS psr_key_value
   , plugin_store_rows.plugin_name AS psr_plugin_name
   , plugin_store_rows.value AS psr_value
 FROM plugin_store_rows
)
SELECT users.id 
 , users.username
 , users.email
FROM users 
JOIN psr
ON
 users.id = CAST (array_to_string(psr.psr_key_value, '') AS int)
WHERE 
  psr.psr_plugin_name LIKE 'staff_notes'
AND  
  psr.psr_value LIKE '%education%'

(Alessio Fattorini) #2

True gold here.


(Joel Zaslofsky) #3

Because Discourse only let’s me :two_hearts: this once, I want to make sure you know that I’d :two_hearts: this ten times if I could. Thank you, thank you!


(Sarah Hawk) #4

Here’s another one – this one is awesome.

Find out what people are searching for in your community

SELECT term, count(*) searches, 
 sum(case when clicked_topic_id is not null then 1 else 0 end) 
 "continued from search"
from search_logs
where created_at > current_timestamp - interval '30' day
group by term
order by count(*) desc
limit 100

Note: If you’re not on a hosted Discourse plan you may not have this data available.


(Alessio Fattorini) #5

Uhm,

PG::UndefinedTable: ERROR: relation “search_logs” does not exist
LINE 11: from search_logs

I have just on my DB
category_search_data
topic_search_data
user_search_data


(Sarah Hawk) #6

That may be because we don’t host your community – you don’t have access to the search logs.
I note the same thing with communities hosted at discoursehosting.com


(Joel Zaslofsky) #7

Bummer. I’m in the same situation as @ale_fattorini as we host our own Discourse install.

Perhaps you could update your OP with the added query and specify that it’s only for people with Discourse hosted installs.