[Big Data] Week 2: LSH (Basic)
Question 1
Your Answer | Result | Score | Explanation |
---|---|---|---|
There are 3 pairs at distance 1. | | | |
There is 1 pair at distance 4. | Correct | 1.00 | |
There are 4 pairs at distance 5. | | | |
There is 1 pair at distance 3. | | | |
Total | | 1.00 / 1.00 | |
Question 2
Row | C1 | C2 | C3 | C4 |
---|---|---|---|---|
R1 | 0 | 1 | 1 | 0 |
R2 | 1 | 0 | 1 | 1 |
R3 | 0 | 1 | 0 | 1 |
R4 | 0 | 0 | 1 | 0 |
R5 | 1 | 0 | 1 | 0 |
R6 | 0 | 1 | 0 | 0 |
Perform a minhashing of the data, with the order of rows: R4, R6, R1, R3, R5, R2. Which of the following is the correct minhash value of the stated column? Note: we give the minhash value in terms of the original name of the row, rather than the order of the row in the permutation. These two schemes are equivalent, since we only care whether hash values for two columns are equal, not what their actual values are.
Your Answer | Result | Score | Explanation |
---|---|---|---|
The minhash value for C1 is R6 | | | |
The minhash value for C3 is R4 | Correct | 1.00 | |
The minhash value for C1 is R2 | | | |
The minhash value for C3 is R5 | | | |
Total | | 1.00 / 1.00 | |
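The minhash computation in Question 2 can be checked with a short script. This is a minimal sketch, assuming the matrix is stored as a dictionary of rows; the names `matrix`, `row_order`, and `minhash` are illustrative and not part of the quiz. For each column it walks the rows in the permuted order and reports the first row that holds a 1.

```python
# Characteristic matrix from Question 2: rows R1..R6, columns C1..C4.
matrix = {
    "R1": [0, 1, 1, 0],
    "R2": [1, 0, 1, 1],
    "R3": [0, 1, 0, 1],
    "R4": [0, 0, 1, 0],
    "R5": [1, 0, 1, 0],
    "R6": [0, 1, 0, 0],
}
columns = ["C1", "C2", "C3", "C4"]

# Permuted row order given in the question.
row_order = ["R4", "R6", "R1", "R3", "R5", "R2"]


def minhash(matrix, columns, row_order):
    """For each column, return the first row in row_order that contains a 1."""
    result = {}
    for j, col in enumerate(columns):
        # next(...) stops at the first matching row; None if the column is all zeros.
        result[col] = next((r for r in row_order if matrix[r][j] == 1), None)
    return result


signature = minhash(matrix, columns, row_order)
print(signature)
# {'C1': 'R5', 'C2': 'R6', 'C3': 'R4', 'C4': 'R3'}
```

Only "The minhash value for C3 is R4" matches this output; the minhash value for C1 is R5, which rules out both C1 options.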
Question 3
C1 | C2 | C3 | C4 | C5 | C6 | C7 |
---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | 2 | 5 | 4 |
2 | 3 | 4 | 2 | 3 | 2 | 2 |
3 | 1 | 2 | 3 | 1 | 3 | 2 |
4 | 1 | 3 | 1 | 2 | 4 | 4 |
5 | 2 | 5 | 1 | 1 | 5 | 1 |
6 | 1 | 6 | 4 | 1 | 1 | 4 |
Suppose we use locality-sensitive hashing with three bands of two rows each. Assume there are enough buckets available that the hash function for each band can be the identity function (i.e., columns hash to the same bucket if and only if they are identical in the band). Find all the candidate pairs, and then identify one of them in the list below.
Your Answer | Result | Score | Explanation |
---|---|---|---|
C2 and C3 | | | |
C2 and C5 | Correct | 1.00 | |
C4 and C5 | | | |
C2 and C7 | | | |
Total | | 1.00 / 1.00 | |
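The banding step in Question 3 can be reproduced as follows. This is a sketch under the assumption that the table is a signature matrix with six rows and seven columns C1 to C7; the function name `candidate_pairs` is hypothetical. Each band's two-row slice of a column is used directly as the bucket key, which is the identity-hash assumption stated in the question.

```python
from collections import defaultdict
from itertools import combinations

# Signature matrix from Question 3: 6 rows, columns C1..C7.
signatures = [
    [1, 2, 1, 1, 2, 5, 4],
    [2, 3, 4, 2, 3, 2, 2],
    [3, 1, 2, 3, 1, 3, 2],
    [4, 1, 3, 1, 2, 4, 4],
    [5, 2, 5, 1, 1, 5, 1],
    [6, 1, 6, 4, 1, 1, 4],
]


def candidate_pairs(signatures, bands, rows_per_band):
    """Return column pairs that are identical in at least one band."""
    n_cols = len(signatures[0])
    pairs = set()
    for b in range(bands):
        band = signatures[b * rows_per_band:(b + 1) * rows_per_band]
        buckets = defaultdict(list)
        for c in range(n_cols):
            # Identity hash: the band's values for this column are the bucket key.
            key = tuple(row[c] for row in band)
            buckets[key].append(f"C{c + 1}")
        for members in buckets.values():
            pairs.update(combinations(members, 2))
    return sorted(pairs)


pairs = candidate_pairs(signatures, bands=3, rows_per_band=2)
print(pairs)
# [('C1', 'C3'), ('C1', 'C4'), ('C1', 'C6'), ('C2', 'C5'), ('C4', 'C7')]
```

Of the four listed options, only C2 and C5 appears among these candidate pairs (they agree in the first band, where both are (2, 3)).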
Question 4
Find the set of 2-shingles for the "document":
ABRACADABRA
and also for the "document":
BRICABRAC
Answer the following questions:
- How many 2-shingles does ABRACADABRA have?
- How many 2-shingles does BRICABRAC have?
- How many 2-shingles do they have in common?
- What is the Jaccard similarity between the two "documents"?
Then, find the true statement in the list below.
Your Answer | Result | Score | Explanation |
---|---|---|---|
ABRACADABRA has 10 2-shingles. | | | |
ABRACADABRA has 9 2-shingles. | | | |
There are 5 shingles in common. | Correct | 1.00 | |
There are 4 shingles in common. | | | |
Total | | | |
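The four sub-questions in Question 4 can be answered with a few lines of code. This is a minimal sketch, assuming character-level shingles collected into a set (so repeated shingles count once); the helper name `shingles` is illustrative.

```python
from fractions import Fraction


def shingles(text, k=2):
    """Return the set of distinct k-character shingles of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


a = shingles("ABRACADABRA")
b = shingles("BRICABRAC")
common = a & b

# Jaccard similarity = |intersection| / |union|, kept as an exact fraction.
jaccard = Fraction(len(common), len(a | b))

print(len(a), len(b), len(common), jaccard)
# 7 7 5 5/9
```

ABRACADABRA has 10 shingle positions but only 7 distinct 2-shingles, because AB, BR, and RA each occur twice. The two documents share AB, BR, RA, AC, and CA, and the union has 9 shingles, so the Jaccard similarity is 5/9.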
Question 5
Your Answer | Score | Explanation | |
---|---|---|---|
(53,15) | Correct | 1.00 | |
(58,13) | |||
(52,13) | |||
(54,8) | |||
Total | 1.00 / 1.00 |