Blog post category - Miscellaneous Studies
Summary: Contents: Overview; 1-bit Adam; 1-bit SGD; Code. Seide F., Fu H., Droppo J., Li G. and Yu D. 1-bit stochastic gradient descent and its application to data-parallel distribute…
Summary: Contents: Overview; Motivation; ZeRO; ZeRO-Offload; ZeRO-Infinity; ZeRO++; Code. Rajbhandari S., Rasley J., Ruwase O. and He Y. ZeRO: Memory optimizations toward training trillion…
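Summary: Contents: Introduction; Linear Quantization; Symmetric quantization; Asymmetric quantization; Non-linear quantization; Logarithmic Quantization; Power-of-X; Rounding; Deterministic rounding; Stochastic rounding. [1] 进击的程序猿. Model compression - neural network quantization basi…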
Summary: Contents: Overview; Gradient Noise Scale. McCandlish S., Kaplan J., Amodei D. and OpenAI Dota Team. An empirical model of large-batch training. 2018. Overview: This post discusses how, as the batch si…
Summary: Contents: Overview; Fast algorithms over a sliding window. Farhang-Boroujeny B. and Gazor S. Generalized sliding FFT and its application to implementation of block LMS adaptive filters. TSP, 1994…
Summary: Contents: Overview; Frequent Directions; Frequent Directions over Sliding Windows; Code. Ghashami M., Liberty E., Phillips J. M. and Woodruff D. P. Frequent directions: Sim…
Summary: Contents: Overview; Traditional approximate solvers for Graph Diffusion Equations; Sequential local updates via Successive Overrelaxation (SOR); Code. Bai J., Zhou B., Yang D. and Xiao Y. Faster Local S…
Summary: Contents: Overview; AuxiLearn; Problem setup; Understanding; Two-stage training; Code. Navon A., Achituve I., Maron H., Chechik G. and Fetaya E. Auxiliary learning by implicit differentiation. ICLR, 2021. Overview: By…
Summary: Contents: Overview; Notation; Modularity; Agglomerative Hierarchical Clustering; Louvain; Modularity-based Graph Clustering; Rabbit; Code. [1] Newman M. E. J. and Girvan M. Finding and ev…
Summary: Contents: Overview; Notation; Adafactor; Factored Second Moment Estimation; No Momentum; Out-of-Date Second Moment Estimator; Algorithm; Code. Shazeer N. and Stern M. Adafactor: Adaptive learni…
Summary: Contents: Overview; Notation; SM3; Partitioning into intervals; Code. Anil R., Gupta V., Koren T. and Singer Y. Memory-efficient adaptive optimization. NeurIPS, 2019. Overview: This paper proposes a memory-efficient optimizer, SM3…
Summary: Contents: Overview; METIS; Coarsening; Partitioning phase; Uncoarsening phase. Karypis G. and Kumar V. A fast and high quality multilevel scheme for partitioning irregular gr…
Summary: Contents: Overview; Notation; Vertex vs Edge partitioning; NE (Neighbor Expansion); Code. Zhang C., Wei F., Liu Q., Tang Z. G. and Li Z. Graph edge partitioning via neighborhood he…
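Summary: Contents: Overview; Basic setup; Non-convex optimization; Convex optimization; Strongly convex optimization. Overview: Recently I have gained some new insights into optimization and convergence rates, so I am writing them down here. Some of these insights come from blogs (e.g. here), others from books. In the past I simply applied standard convergence templates; here I explain how to understand these convergence results from a geometric perspective. Basic setup: Suppose we wish to optimize: \[\tag{1} \min_{x…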
Summary: Contents: Overview; Notation; Motivation; FOBOS (Forward-Backward Splitting); RDA (Regularized Dual Averaging); FTRL-Proximal (Follow The Regularized Leader); FOBOS, RDA, FTRL-Proxi…
Summary: Contents: Overview; Notation; Empirical results; Noisy Model Learning; Code. Chen H., Wang J., Shah A., Tao R., Wei H., Xie X., Sugiyama M. and Raj B. Understanding and mitigating the label n…
Summary: Contents: Overview; Sinkhorn operator. Mena G. E., Belanger D., Linderman S. and Snoek J. Learning latent permutations with Gumbel-Sinkhorn networks. ICLR, 2018. Overview: This paper proposes…
Summary: Contents: Overview; Jaccard Index; Generalization to multisets; Generalization to multiple sets. Costa L. Further generalizations of the Jaccard index. 2021. Overview: This article introduces the Jaccard Index (Jaccard Similarity)…
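Summary: Contents: Definition; Properties; General reparameterization of rotation matrices; General reparameterization of orthogonal matrices; Code. [1] Gallier J. Remarks on the Cayley Representation of Orthogonal Matrices and on Perturbing the Diagonal of a Matrix…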
Summary: Contents: Overview; ListNet; Permutation Probability; Top-k Probability; ListMLE. Cao Z., Qin T., Liu T., Tsai M. and Li H. Learning to rank: from pairwise approach to listwi…