ijpq - cnblogs (博客园)

[Pinned] git series

Summary: # Basics ![](https://img2022.cnblogs.com/blog/1481923/202206/1481923-20220623184653443-1138374228.png) # gitignore A gitignore file specifies intentionall...

posted @ 2022-06-23 18:50 ijpq Views(37) Comments(0) Recommended(0)

2024-10-14

scientifically practice DP

Summary: I understand your frustration, and it's a common feeling when tackling complex problems like this. Finding these insights often comes down to a combin...

posted @ 2024-10-14 18:27 ijpq Views(4) Comments(0) Recommended(0)

how to write recursive DFS scientifically

Summary: Principles for Writing Correct Recursive Functions in DFS Algorithms. When implementing a Depth-First Search (DFS) algorithm using recursion, it's cruc...

posted @ 2024-10-14 13:57 ijpq Views(6) Comments(0) Recommended(0)

2024-07-22

pytorch contributing - matmul analysis

Summary: as per this issue: https://github.com/pytorch/pytorch/issues/113743

posted @ 2024-07-22 11:47 ijpq Views(5) Comments(0) Recommended(0)

2024-07-11

pytorch contributing compilation

Summary: Refer to this page for the full documentation on contributing: https://github.com/pytorch/pytorch/wiki/The-Ultimate-Guide-to-PyTorch-Contributio ...

posted @ 2024-07-11 21:59 ijpq Views(2) Comments(0) Recommended(0)

2024-07-06

gnu inline asm

Summary: ::: index asm keyword, assembly language in C, inline assembly language, mixing assembly language and C ::: How to Use Inline Assembly Language in C C...

posted @ 2024-07-06 14:24 ijpq Views(15) Comments(0) Recommended(0)

2024-04-02

matmul optimization

Summary: HOW TO OPTIMIZE GEMM. An introduction to some common optimization approaches; reference: https://github.com/flame/how-to-optimize-gemm/wiki baseline /* Create macros so that the matrices are stored in co...

posted @ 2024-04-02 16:22 ijpq Views(43) Comments(0) Recommended(0)

2024-03-27

relocation overflow log

Summary: Problem background: https://airflow-megengine.iap.hh-d.brainpp.cn/log?dag_id=megbrain-release&task_id=prebuild-cu111&execution_date=2022-10-08T06%3A06%3A51%2B00%3A0 ...

posted @ 2024-03-27 13:12 ijpq Views(126) Comments(0) Recommended(0)

milestone

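Summary: 2022.Q3 discussion with wangbiao. Promotion milestones: Q3: how to break down goals and how to act as an owner. Q4: work independently as an owner. 2023.Q1: learn to lead others on projects. Kernel optimization roadmap: cuda c -> tensor core -> cutlass -> tvm Q3 p...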

posted @ 2024-03-27 13:09 ijpq Views(4) Comments(0) Recommended(0)

cutlass progress snapshot

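Summary: 03 Feb 2023: Over the past week, I reworked the rrconv codegen code in dnn, and all rrconv fprop tests in dnn now pass. dnn rrconv dgrad does not pass; some cases compute incorrect results. All rrconv cutlass dgrad tests pass. Came in on Feb 2 and first checked the dgrad codege...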

posted @ 2024-03-27 13:07 ijpq Views(16) Comments(0) Recommended(0)

2024-03-26

CUTLASS: Fast Linear Algebra in CUDA C++

Summary: https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/ Efficient Matrix Multiplication on GPUs. Compute intensity = (time complexity / space complexity) = O(N^3)/O(N^2) = O(N) // ...

posted @ 2024-03-26 13:47 ijpq Views(14) Comments(0) Recommended(0)

0x01

computer arch/parallel programming/