欢迎访问yhm138的博客园博客, 你可以通过 [RSS] 的方式持续关注博客更新

MyAvatar

yhm138

HelloWorld!

【LeetCode 2854. 滚动平均步数】[MySQL/Pandas]过去(连续/不连续)n日滑动窗口

题目地址

https://leetcode.cn/problems/rolling-average-steps/description/

MySQL代码

下面的代码是:

  • 【过去n_days天,以及今天】的滑动窗口
  • 对于每一天,如果有可用数据的情况下,我们会计算以该天为结束的 (n_days+1) 天连续步数的平均值,否则,对于该天来说,(n_days+1)天滚动平均步数是未定义的。
WITH t1 AS
(
    SELECT 
        user_id,
        steps_date,
        CASE
            WHEN COUNT(steps_count) OVER (
                PARTITION BY user_id 
                ORDER BY date(steps_date)
                RANGE BETWEEN INTERVAL @n_days DAY PRECEDING AND CURRENT ROW
            ) >= @n_days + 1 THEN -- 确保至少有n天的数据
                ROUND(AVG(steps_count) OVER (
                    PARTITION BY user_id 
                    ORDER BY date(steps_date)
                    RANGE BETWEEN INTERVAL @n_days DAY PRECEDING AND CURRENT ROW
                ), 2)
            ELSE NULL -- 如果不足n天,就不计算平均值
        END AS rolling_average
    FROM 
        Steps
        , (SELECT @n_days := 2 ) vars   -- 我不知道怎么改,用的时候把这一行注释掉,把所有@n_days替换为实际数值
)

SELECT * from t1 
WHERE rolling_average IS NOT NULL;

等效pandas代码

如果要求【今日和过去2日】这三日连续,pandas没有什么好办法,还是建议for-loop写

import pandas as pd

def rolling_average(steps: pd.DataFrame) -> pd.DataFrame:
    n_days: int = 2
    # First, ensure the 'steps_date' column is of type datetime
    steps['steps_date'] = pd.to_datetime(steps['steps_date'])
    
    # Sort the DataFrame by user_id and steps_date to ensure correct processing
    steps = steps.sort_values(['user_id', 'steps_date'])
    
    # Initialize a DataFrame to hold the rolling averages
    rolling_averages = pd.DataFrame(columns=['user_id', 'steps_date', 'steps_count', 'rolling_average'])
    
    # Process each user separately
    for user_id, user_steps in steps.groupby('user_id'):
        user_steps = user_steps.reset_index(drop=True)
        # Initialize a list to hold the user's rolling averages
        rolling_avg_list = []

        # Loop over the user's steps
        for i in range(len(user_steps)):
            # Check if there are enough previous days to form a window
            if i >= n_days:
                # Extract the window of days
                window = user_steps.iloc[i-n_days:i+1]
                # Check if the window is continuous
                if window['steps_date'].diff().dt.days.fillna(1).eq(1).all():
                    # Calculate the rolling average for the continuous window
                    rolling_avg = window['steps_count'].mean()
                    rolling_avg_list.append(rolling_avg)
                else:
                    rolling_avg_list.append(None)
            else:
                rolling_avg_list.append(None)
        
        # Add the rolling averages to the user's DataFrame
        user_steps['rolling_average'] = rolling_avg_list
        # Append the user's DataFrame to the main DataFrame
        rolling_averages = pd.concat([rolling_averages, user_steps], ignore_index=True)
    
    # Drop rows with None in 'rolling_average' and keep relevant columns
    rolling_averages = rolling_averages.dropna(subset=['rolling_average'])
    rolling_averages = rolling_averages[['user_id', 'steps_date', 'steps_count', 'rolling_average']]
    rolling_averages['rolling_average'] = rolling_averages['rolling_average'].round(2)
    
    return rolling_averages[['user_id', 'steps_date' ,'rolling_average']]

如果要求【今日和过去2日】这三日不一定连续

import pandas as pd

def rolling_average(steps: pd.DataFrame) -> pd.DataFrame:
    n_days: int =  2
    # First, ensure the 'steps_date' column is of type datetime
    steps['steps_date'] = pd.to_datetime(steps['steps_date'])
    
    # Sort the DataFrame by user_id and steps_date to ensure the rolling window works correctly
    steps = steps.sort_values(['user_id', 'steps_date'])
    
    # Calculate the rolling count of steps_count to find where the window size is at least n_days + 1
    steps['valid_window'] = steps.groupby('user_id')['steps_count'].rolling(window=n_days+1, min_periods=n_days+1).count().reset_index(0, drop=True)

    # Calculate the rolling average where the window size is valid
    steps['rolling_average'] = steps.groupby('user_id')['steps_count'].rolling(window=n_days+1, min_periods=n_days+1).mean().reset_index(0, drop=True)
    
    # Round the rolling average to 2 decimal places where the window is valid
    steps.loc[steps['valid_window'] >= n_days + 1, 'rolling_average'] = steps.loc[steps['valid_window'] >= n_days + 1, 'rolling_average'].round(2)
    
    # Only keep rows where the rolling average is not null
    result = steps[steps['rolling_average'].notnull()].copy()
    
    # Drop the 'valid_window' column as it is no longer needed
    result = result.drop(columns=['valid_window'])
    
    return result
posted @ 2024-01-12 09:26  yhm138  阅读(14)  评论(0编辑  收藏  举报