作者:因情语写
链接:https://www.proprogrammar.com/article/682
声明:请尊重原作者的劳动,如需转载请注明出处
动作表:
Actions
+---------------+---------+ | Column Name | Type | +---------------+---------+ | user_id | int | | post_id | int | | action_date | date | | action | enum | | extra | varchar | +---------------+---------+ 此表没有主键,所以可能会有重复的行。 action 字段是 ENUM 类型的,包含:('view', 'like', 'reaction', 'comment', 'report', 'share') extra 字段是可选的信息(可能为 null),其中的信息例如有:1.报告理由(a reason for report) 2.反应类型(a type of reaction)
编写一条SQL,查询每种 报告理由(report reason)在昨天的报告数量。假设今天是 2019-07-05。
查询及结果的格式示例:
Actions table: +---------+---------+-------------+--------+--------+ | user_id | post_id | action_date | action | extra | +---------+---------+-------------+--------+--------+ | 1 | 1 | 2019-07-01 | view | null | | 1 | 1 | 2019-07-01 | like | null | | 1 | 1 | 2019-07-01 | share | null | | 2 | 4 | 2019-07-04 | view | null | | 2 | 4 | 2019-07-04 | report | spam | | 3 | 4 | 2019-07-04 | view | null | | 3 | 4 | 2019-07-04 | report | spam | | 4 | 3 | 2019-07-02 | view | null | | 4 | 3 | 2019-07-02 | report | spam | | 5 | 2 | 2019-07-04 | view | null | | 5 | 2 | 2019-07-04 | report | racism | | 5 | 5 | 2019-07-04 | view | null | | 5 | 5 | 2019-07-04 | report | racism | +---------+---------+-------------+--------+--------+ Result table: +---------------+--------------+ | report_reason | report_count | +---------------+--------------+ | spam | 1 | | racism | 2 | +---------------+--------------+ 注意,我们只关心报告数量非零的结果。
select extra report_reason, count(distinct post_id) report_count
from Actions
where action = 'report' and extra is not null and date_format(action_date, '%Y-%m-%d') = '2019-07-04'
group by extra
学一下is not null,date_format,group+count+distinct
动作表:
Actions
+---------------+---------+ | Column Name | Type | +---------------+---------+ | user_id | int | | post_id | int | | action_date | date | | action | enum | | extra | varchar | +---------------+---------+ 这张表没有主键,并有可能存在重复的行。 action 列的类型是 ENUM,可能的值为 ('view', 'like', 'reaction', 'comment', 'report', 'share')。 extra 列拥有一些可选信息,例如:报告理由(a reason for report)或反应类型(a type of reaction)等。
移除表:
Removals
+---------------+---------+ | Column Name | Type | +---------------+---------+ | post_id | int | | remove_date | date | +---------------+---------+ 这张表的主键是 post_id。 这张表的每一行表示一个被移除的帖子,原因可能是由于被举报或被管理员审查。
编写一段 SQL 来查找:在被报告为垃圾广告的帖子中,被移除的帖子的每日平均占比,四舍五入到小数点后 2 位。
查询结果的格式如下:
Actions table: +---------+---------+-------------+--------+--------+ | user_id | post_id | action_date | action | extra | +---------+---------+-------------+--------+--------+ | 1 | 1 | 2019-07-01 | view | null | | 1 | 1 | 2019-07-01 | like | null | | 1 | 1 | 2019-07-01 | share | null | | 2 | 2 | 2019-07-04 | view | null | | 2 | 2 | 2019-07-04 | report | spam | | 3 | 4 | 2019-07-04 | view | null | | 3 | 4 | 2019-07-04 | report | spam | | 4 | 3 | 2019-07-02 | view | null | | 4 | 3 | 2019-07-02 | report | spam | | 5 | 2 | 2019-07-03 | view | null | | 5 | 2 | 2019-07-03 | report | racism | | 5 | 5 | 2019-07-03 | view | null | | 5 | 5 | 2019-07-03 | report | racism | +---------+---------+-------------+--------+--------+ Removals table: +---------+-------------+ | post_id | remove_date | +---------+-------------+ | 2 | 2019-07-20 | | 3 | 2019-07-18 | +---------+-------------+ Result table: +-----------------------+ | average_daily_percent | +-----------------------+ | 75.00 | +-----------------------+ 2019-07-04 的垃圾广告移除率是 50%,因为有两张帖子被报告为垃圾广告,但只有一个得到移除。 2019-07-02 的垃圾广告移除率是 100%,因为有一张帖子被举报为垃圾广告并得到移除。 其余几天没有收到垃圾广告的举报,因此平均值为:(50 + 100) / 2 = 75% 注意,输出仅需要一个平均值即可,我们并不关注移除操作的日期。
select round(avg(p.per) * 100, 2) average_daily_percent
from(
select count(distinct spam.post_id) / count(distinct s.post_id) per
from Actions s left join (
select distinct a.post_id
from Actions a, Removals r
where a.extra = 'spam' and a.post_id = r.post_id
) spam on s.post_id = spam.post_id
where s.extra = 'spam'
group by s.action_date
) p
先找出移除的垃圾贴子,再找出垃圾贴子,求每天的移除比率,再求平均比率,思路很清楚,学一下distinct, left join, group by + count + distinct, round用法,
再看另一种做法
SELECT round(SUM(delCount / spamCount * 100) / COUNT(DISTINCT action_date), 2) AS average_daily_percent
FROM (
SELECT action_date, COUNT(distinct a.post_id) AS spamCount, count(distinct b.post_id) AS delCount
FROM Actions a
LEFT JOIN Removals b ON a.post_id = b.post_id
where a.extra = 'spam'
GROUP BY a.action_date
) a
上面是三层select嵌套,这里是两层,充分利用了left join的性质(右边可能不存在为null),从而一次连接从连接的左右求出两种想要的结果
亲爱的读者:有时间可以点赞评论一下
月份 | 原创文章数 |
---|---|
202206 | 4 |
202205 | 2 |
202204 | 1 |
202203 | 11 |
202201 | 2 |
202108 | 7 |
202107 | 3 |
202106 | 16 |
202105 | 10 |
202104 | 16 |
202103 | 56 |
202102 | 14 |
202010 | 3 |
202009 | 3 |
202008 | 7 |
202007 | 7 |
202006 | 10 |
202005 | 11 |
202004 | 22 |
202003 | 52 |
202002 | 44 |
202001 | 83 |
201912 | 52 |
201911 | 29 |
201910 | 41 |
201909 | 99 |
201908 | 35 |
201907 | 73 |
201906 | 121 |
201811 | 1 |
201810 | 2 |
201804 | 1 |
201803 | 1 |
201802 | 1 |
201707 | 1 |
全部评论