Miscellaneous - Post Category - 馒头and花卷

Communication Efficient Large-Scale Training with Adam's Convergence Speed

Summary: Contents: Overview; 1-bit Adam; 1-bit SGD; Code. Seide F., Fu H., Droppo J., Li G. and Yu D. 1-bit stochastic gradient descent and its application to data-parallel distribute…

posted @ 2025-02-13 21:28 馒头and花卷 Views (4) Comments (0) Recommendations (0)

ZeRO, ZeRO-Offload, ZeRO-Infinity, ZeRO++

Summary: Contents: Overview; Motivation; ZeRO; ZeRO-Offload; ZeRO-Infinity; ZeRO++; Code. Rajbhandari S., Rasley J., Ruwase O. and He Y. ZeRO: Memory optimizations toward training trillion…

posted @ 2025-02-13 14:33 馒头and花卷 Views (22) Comments (0) Recommendations (0)

Quantization

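Summary: Contents: Introduction; Linear Quantization; Symmetric quantization; Asymmetric quantization; Nonlinear quantization; Logarithmic Quantization; Power-of-X; Rounding; Deterministic rounding; Stochastic rounding. [1] 进击的程序猿 - Model compression - Neural network quantization…

posted @ 2024-12-04 19:44 馒头and花卷 Views (108) Comments (0) Recommendations (0)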

An Empirical Model of Large-Batch Training

Summary: Contents: Overview; Gradient Noise Scale. McCandlish S., Kaplan J., Amodei D. and OpenAI Dota Team. An empirical model of large-batch training. 2018. Overview: This paper discusses how, as the batch si…

posted @ 2024-11-26 17:01 馒头and花卷 Views (50) Comments (0) Recommendations (0)

Recursive Algorithm for Sliding Signal Processing

Summary: Contents: Overview; Fast algorithms over sliding windows. Farhang-Boroujeny B. and Gazor S. Generalized sliding FFT and its application to implementation of block LMS adaptive filters. TSP, 1994 …

posted @ 2024-11-12 21:20 馒头and花卷 Views (10) Comments (0) Recommendations (0)

Frequent Directions

Summary: Contents: Overview; Frequent Directions; Frequent Directions over Sliding Windows; Code. Ghashami M., Liberty E., Phillips J. M. and Woodruff D. P. Frequent directions: Sim…

posted @ 2024-11-06 15:47 馒头and花卷 Views (41) Comments (0) Recommendations (0)

Faster Local Solvers for Graph Diffusion Equations

Summary: Contents: Overview; Traditional approximate solvers for Graph Diffusion Equations; Sequential local updates via Successive Overrelaxation (SOR); Code. Bai J., Zhou B., Yang D. and Xiao Y. Faster Local S…

posted @ 2024-10-31 17:27 馒头and花卷 Views (12) Comments (0) Recommendations (0)

Auxiliary Learning by Implicit Differentiation

Summary: Contents: Overview; AuxiLearn; Problem setup; Understanding; Two-stage training; Code. Navon A., Achituve I., Maron H., Chechik G. and Fetaya E. Auxiliary learning by implicit differentiation. ICLR, 2021. Overview: By…

posted @ 2024-10-11 16:34 馒头and花卷 Views (36) Comments (0) Recommendations (0)

Modularity-based Graph Clustering

Summary: Contents: Overview; Notation; Modularity; Agglomerative Hierarchical Clustering; Louvain; Modularity-based Graph Clustering; Rabbit; Code. [1] Newman M. E. J. and Girvan M. Finding and ev…

posted @ 2024-09-20 15:02 馒头and花卷 Views (29) Comments (0) Recommendations (0)

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Summary: Contents: Overview; Notation; Adafactor; Factored Second Moment Estimation; No Momentum; Out-of-Date Second Moment Estimator; Algorithm; Code. Shazeer N. and Stern M. Adafactor: Adaptive learni…

posted @ 2024-09-11 15:28 馒头and花卷 Views (93) Comments (0) Recommendations (0)

Memory-Efficient Adaptive Optimization

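Summary: Contents: Overview; Notation; SM3; Partitioning into intervals; Code. Anil R., Gupta V., Koren T., Singer Y. Memory-efficient adaptive optimization. NeurIPS, 2019. Overview: This paper proposes a memory-efficient optimizer: SM3. …

posted @ 2024-09-10 21:25 馒头and花卷 Views (13) Comments (0) Recommendations (0)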

A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs

Summary: Contents: Overview; METIS; Coarsening; Partitioning phase; Uncoarsening phase. Karypis G. and Kumar V. A fast and high quality multilevel scheme for partitioning irregular gr…

posted @ 2024-09-08 17:21 馒头and花卷 Views (29) Comments (0) Recommendations (0)

Graph Edge Partitioning via Neighborhood Heuristic

Summary: Contents: Overview; Notation; Vertex vs Edge partitioning; NE (Neighbor Expansion); Code. Zhang C., Wei F., Liu Q., Tang Z. G. and Li Z. Graph edge partitioning via neighborhood he…

posted @ 2024-09-08 14:17 馒头and花卷 Views (49) Comments (0) Recommendations (0)

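Notes on Optimization and Convergence Rates

Summary: Contents: Overview; Basic setup; Nonconvex optimization; Convex optimization; Strongly convex optimization. Overview: I have recently gained some new insights into optimization and convergence rates, which I record here. Some of them come from blogs (such as here), others from books. Previously I simply applied standard convergence templates; here I explain how to understand these convergence results from a geometric point of view. Basic setup: Suppose we want to optimize: \[\tag{1} \min_{x …

posted @ 2024-07-18 20:19 馒头and花卷 Views (114) Comments (0) Recommendations (1)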

Regularized Stochastic Learning and Online Optimization

Summary: Contents: Overview; Notation; Motivation; FOBOS (Forward-Backward Splitting); RDA (Regularized Dual Averaging); FTRL-Proximal (Follow The Regularized Leader); FOBOS, RDA, FTRL-Proxi…

posted @ 2024-07-16 09:27 馒头and花卷 Views (53) Comments (0) Recommendations (0)

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Summary: Contents: Overview; Notation; Empirical results; Noisy Model Learning; Code. Chen H., Wang J., Shah A., Tao R., Wei H., Xie X., Sugiyama M. and Raj B. Understanding and mitigating the label n…

posted @ 2024-05-30 16:37 馒头and花卷 Views (41) Comments (0) Recommendations (0)

Learning Latent Permutations with Gumbel-Sinkhorn Networks

Summary: Contents: Overview; Sinkhorn operator. Mena G. E., Belanger D., Linderman S. and Snoek J. Learning latent permutations with Gumbel-Sinkhorn networks. ICLR, 2018. Overview: This paper proposes…

posted @ 2024-05-30 09:58 馒头and花卷 Views (49) Comments (0) Recommendations (0)

Further Generalizations of the Jaccard Index

Summary: Contents: Overview; Jaccard Index; Generalization to multisets; Generalization to multiple sets. Costa L. Further generalizations of the Jaccard index. 2021. Overview: This paper introduces the Jaccard Index (Jaccard Similarity) …

posted @ 2024-05-23 21:50 馒头and花卷 Views (24) Comments (0) Recommendations (0)

Cayley transform

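Summary: Contents: Definition; Properties; General reparameterization of rotation matrices; General reparameterization of orthogonal matrices; Code. [1] Gallier J. Remarks on the Cayley Representation of Orthogonal Matrices and on Perturbing the Diagonal of a Matrix …

posted @ 2024-03-25 16:22 馒头and花卷 Views (359) Comments (0) Recommendations (0)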

Learning to rank: from pairwise approach to listwise approach

Summary: Contents: Overview; ListNet; Permutation Probability; Top-k Probability; ListMLE. Cao Z., Qin T., Liu T., Tsai M. and Li H. Learning to rank: from pairwise approach to listwi…

posted @ 2023-11-19 21:10 馒头and花卷 Views (184) Comments (0) Recommendations (1)
