【LeetCode 2854. 滚动平均步数】[MySQL/Pandas]过去(连续/不连续)n日滑动窗口
题目地址
https://leetcode.cn/problems/rolling-average-steps/description/
MySQL代码
下面的代码是:
- 【过去n_days天,以及今天】的滑动窗口
- 对于每一天,如果有可用数据的情况下,我们会计算以该天为结束的 (n_days+1) 天连续步数的平均值,否则,对于该天来说,(n_days+1)天滚动平均步数是未定义的。
WITH t1 AS
(
SELECT
user_id,
steps_date,
CASE
WHEN COUNT(steps_count) OVER (
PARTITION BY user_id
ORDER BY date(steps_date)
RANGE BETWEEN INTERVAL @n_days DAY PRECEDING AND CURRENT ROW
) >= @n_days + 1 THEN -- 确保至少有n天的数据
ROUND(AVG(steps_count) OVER (
PARTITION BY user_id
ORDER BY date(steps_date)
RANGE BETWEEN INTERVAL @n_days DAY PRECEDING AND CURRENT ROW
), 2)
ELSE NULL -- 如果不足n天,就不计算平均值
END AS rolling_average
FROM
Steps
, (SELECT @n_days := 2 ) vars -- 我不知道怎么改,用的时候把这一行注释掉,把所有@n_days替换为实际数值
)
SELECT * from t1
WHERE rolling_average IS NOT NULL;
等效pandas代码
如果要求【今日和过去2日】这三日连续,pandas没有什么好办法,还是建议for-loop写
import pandas as pd
def rolling_average(steps: pd.DataFrame) -> pd.DataFrame:
n_days: int = 2
# First, ensure the 'steps_date' column is of type datetime
steps['steps_date'] = pd.to_datetime(steps['steps_date'])
# Sort the DataFrame by user_id and steps_date to ensure correct processing
steps = steps.sort_values(['user_id', 'steps_date'])
# Initialize a DataFrame to hold the rolling averages
rolling_averages = pd.DataFrame(columns=['user_id', 'steps_date', 'steps_count', 'rolling_average'])
# Process each user separately
for user_id, user_steps in steps.groupby('user_id'):
user_steps = user_steps.reset_index(drop=True)
# Initialize a list to hold the user's rolling averages
rolling_avg_list = []
# Loop over the user's steps
for i in range(len(user_steps)):
# Check if there are enough previous days to form a window
if i >= n_days:
# Extract the window of days
window = user_steps.iloc[i-n_days:i+1]
# Check if the window is continuous
if window['steps_date'].diff().dt.days.fillna(1).eq(1).all():
# Calculate the rolling average for the continuous window
rolling_avg = window['steps_count'].mean()
rolling_avg_list.append(rolling_avg)
else:
rolling_avg_list.append(None)
else:
rolling_avg_list.append(None)
# Add the rolling averages to the user's DataFrame
user_steps['rolling_average'] = rolling_avg_list
# Append the user's DataFrame to the main DataFrame
rolling_averages = pd.concat([rolling_averages, user_steps], ignore_index=True)
# Drop rows with None in 'rolling_average' and keep relevant columns
rolling_averages = rolling_averages.dropna(subset=['rolling_average'])
rolling_averages = rolling_averages[['user_id', 'steps_date', 'steps_count', 'rolling_average']]
rolling_averages['rolling_average'] = rolling_averages['rolling_average'].round(2)
return rolling_averages[['user_id', 'steps_date' ,'rolling_average']]
如果要求【今日和过去2日】这三日不一定连续
import pandas as pd
def rolling_average(steps: pd.DataFrame) -> pd.DataFrame:
n_days: int = 2
# First, ensure the 'steps_date' column is of type datetime
steps['steps_date'] = pd.to_datetime(steps['steps_date'])
# Sort the DataFrame by user_id and steps_date to ensure the rolling window works correctly
steps = steps.sort_values(['user_id', 'steps_date'])
# Calculate the rolling count of steps_count to find where the window size is at least n_days + 1
steps['valid_window'] = steps.groupby('user_id')['steps_count'].rolling(window=n_days+1, min_periods=n_days+1).count().reset_index(0, drop=True)
# Calculate the rolling average where the window size is valid
steps['rolling_average'] = steps.groupby('user_id')['steps_count'].rolling(window=n_days+1, min_periods=n_days+1).mean().reset_index(0, drop=True)
# Round the rolling average to 2 decimal places where the window is valid
steps.loc[steps['valid_window'] >= n_days + 1, 'rolling_average'] = steps.loc[steps['valid_window'] >= n_days + 1, 'rolling_average'].round(2)
# Only keep rows where the rolling average is not null
result = steps[steps['rolling_average'].notnull()].copy()
# Drop the 'valid_window' column as it is no longer needed
result = result.drop(columns=['valid_window'])
return result