Splitting Time Series Data into Train/Test/Validation Sets
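Time series (and other intrinsically ordered data) can be problematic for standard cross-validation. If some pattern emerges in year 3 and persists through years 4-6, a model trained on randomly assigned folds can pick up on it from the later years, even though nothing in years 1 & 2 would have revealed it. The test score then reflects information from the future that would not be available at prediction time.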
An approach that’s sometimes more principled for time series is forward chaining, where your procedure would be something like this:
- fold 1 : training [1], test [2]
- fold 2 : training [1 2], test [3]
- fold 3 : training [1 2 3], test [4]
- fold 4 : training [1 2 3 4], test [5]
- fold 5 : training [1 2 3 4 5], test [6]
This more accurately models the situation you’ll see at prediction time, where you fit on past data and predict on future data. It also gives you a sense of how much your model’s performance depends on the amount of training data.
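As a rough illustration, the folds listed above can be generated with a few lines of Python. This is only a minimal sketch: the function name `forward_chaining_splits` and the `years` list are made up for this example, and it assumes the periods are already sorted in time order.

```python
def forward_chaining_splits(periods):
    """Yield (train, test) pairs for forward chaining.

    `periods` is assumed to be sorted chronologically. Each test period
    comes strictly after every period in its training set.
    """
    for i in range(1, len(periods)):
        # Train on everything before position i, test on the period at i.
        yield periods[:i], periods[i]


# Hypothetical example data: six consecutive years.
years = [1, 2, 3, 4, 5, 6]
folds = list(forward_chaining_splits(years))

for n, (train, test) in enumerate(folds, start=1):
    print(f"fold {n} : training {train}, test [{test}]")
```

The training window grows by one period per fold, so comparing scores across folds also hints at how much the model benefits from more data. If you use scikit-learn, `sklearn.model_selection.TimeSeriesSplit` provides a similar expanding-window split over row indices.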
REF
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
https://stats.stackexchange.com/questions/346907/splitting-time-series-data-into-train-test-validation-sets
https://stats.stackexchange.com/questions/117350/how-to-split-dataset-for-time-series-prediction
https://stats.stackexchange.com/questions/453386/working-with-time-series-data-splitting-the-dataset-and-putting-the-model-into
https://stats.stackexchange.com/questions/14099/using-k-fold-cross-validation-for-time-series-model-selection
https://community.dataquest.io/t/how-to-split-time-series-data-into-training-and-test-set/4116/2