Summary: Locality sensitive hashing: LSH explained The problem of finding duplicate documents in a list may look like a simple task: use a hash table, and th Read more
posted @ 2020-07-29 00:25 HuangB2ydjm Views (163) Comments (0) Recommended (0)
Summary: Motivation The task of finding nearest neighbours is very common. You can think of applications like finding duplicate or similar documents, audio/vid Read more
posted @ 2020-07-28 23:04 HuangB2ydjm Views (210) Comments (0) Recommended (0)
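Summary: A brief introduction to MinHash signatures. The problem a minhash signature solves is how to use a hashing method to map the subsets of a set (of size n) in a way that preserves similarity, while occupying as few bytes in memory as possible. Hashing itself is not the hard part; the hard part is preserving the similarity information between two subsets. By preserving similarity, we mean Read more
posted @ 2020-07-27 22:57 HuangB2ydjm Views (244) Comments (0) Recommended (0)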
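The summary above is cut off, so the following is only a minimal sketch of the general MinHash idea, not the post's own method. It assumes random linear hash functions of the form `(a * x + b) % PRIME` and uses `zlib.crc32` to turn string items into integers; the names `make_hash_funcs`, `minhash_signature`, and `estimate_similarity` are hypothetical and chosen for illustration.

```python
import random
import zlib

# Assumption: a large Mersenne prime as the modulus for the linear hashes.
PRIME = 2**61 - 1


def make_hash_funcs(k, seed=0):
    """Return k random (a, b) pairs, each defining h(x) = (a*x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, funcs):
    """Signature of a non-empty set of strings: the minimum hash per function."""
    # crc32 gives a stable integer code per item (Python's hash() is salted per run).
    codes = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in codes) for a, b in funcs]


def estimate_similarity(sig_a, sig_b):
    """Fraction of positions where two signatures agree (estimates Jaccard)."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


funcs = make_hash_funcs(200)
sig_a = minhash_signature({"a", "b", "c"}, funcs)
sig_b = minhash_signature({"b", "c", "d"}, funcs)
print(estimate_similarity(sig_a, sig_b))  # an estimate of the true Jaccard, 2/4
```

Each signature holds only `k` integers regardless of the set's size, which is the memory saving the summary alludes to; a larger `k` tightens the estimate at the cost of more bytes per set.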
Summary: Hashing Passwords – Python Cryptography Examples Building a from-scratch server or using a lightweight framework is empowering. With that power comes Read more
posted @ 2020-07-27 20:51 HuangB2ydjm Views (132) Comments (0) Recommended (0)
Summary: (Very) Basic Intro to Hash Functions (SHA-256, MD5, etc) Why Use A Hash Function? Hash functions are used all over the internet in order to securely s Read more
posted @ 2020-07-27 20:10 HuangB2ydjm Views (148) Comments (0) Recommended (0)
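Summary: The shingling algorithm is used to compute the similarity between two documents, for example for web page deduplication. Wikipedia defines w-shingling as follows: In natural language processing a w-shingling is a set of unique "shingles": contiguous subsequ Read more
posted @ 2020-07-25 12:50 HuangB2ydjm Views (272) Comments (0) Recommended (0)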
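As a rough illustration of the w-shingling idea in the summary above, here is a minimal sketch. It assumes whitespace tokenization, lowercasing, and Jaccard similarity as the comparison measure; the helper names `shingles` and `jaccard` are hypothetical and not taken from the post.

```python
def shingles(text, w=4):
    """Return the set of unique w-shingles (runs of w consecutive tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets count as identical
    return len(a & b) / len(a | b)


doc1 = "a rose is a rose is a rose"
doc2 = "a rose is a flower"

s1 = shingles(doc1)  # 3 unique 4-shingles, since repeated ones collapse
s2 = shingles(doc2)  # 2 unique 4-shingles
print(jaccard(s1, s2))  # 0.25: one shared shingle out of four distinct ones
```

Storing shingles in a set is what makes them "unique": the repeated phrase in `doc1` contributes each shingle only once, so heavy repetition does not inflate the similarity score.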
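Summary: A few scattered notes summarizing computer organization for the postgraduate entrance exam Read more
posted @ 2020-07-20 00:26 HuangB2ydjm Views (164) Comments (0) Recommended (0)
Summary: A mix of statistics and language topics Read more
posted @ 2020-07-19 16:12 HuangB2ydjm Views (204) Comments (0) Recommended (0)
Summary: Book excerpts on computing problems Read more
posted @ 2020-07-19 10:16 HuangB2ydjm Views (201) Comments (0) Recommended (0)
Summary: Reflections on historical and military-themed films Read more
posted @ 2020-07-18 22:03 HuangB2ydjm Views (121) Comments (0) Recommended (0)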