[Big Data] Week 3 (Basic)
Question 1
Your Answer | | Score | Explanation
---|---|---|---
The fraction of 1's is 79/99. | | |
The fraction of 1's is 1 - e^(-20/99). | Correct | 1.00 |
The fraction of 1's is 20/99. | | |
The fraction of 0's is 20/99. | | |
Total | | 1.00 / 1.00 |
Question 2
The method of Section 4.2.4 will be used. User ID's will be hashed to a bucket number, from 0 to 999,999. At all times, there will be a threshold t such that the 100-byte records for all the users whose ID's hash to t or less will be retained, and other users' records will not be retained. You may assume that each user generates emails at exactly the same rate as other users. As a function of n, the number of emails in the stream so far, what should the threshold t be in order that the selected records will not exceed the 10^10 bytes available to store records? From the list below, identify the true statement about a value of n and its value of t.
Your Answer | | Score | Explanation
---|---|---|---
n = 10^9; t = 999 | | |
n = 10^12; t = 999 | | |
n = 10^13; t = 9 | Correct | 1.00 |
n = 10^11; t = 1000 | | |
Total | | 1.00 / 1.00 |
The stream holds n emails so far, and user IDs hash to 10^6 buckets. Since every user generates emails at the same rate, each bucket accounts for about n/10^6 emails.
Each email record takes 100 bytes, so the space requirement per bucket is (n/10^6) ∗ 100 bytes.
Records are retained for every bucket numbered t or less. Because the bucket count starts from 0, that is ( t + 1 ) buckets.
The retained records must fit in the available space:
(space requirement per bucket) ∗ (number of retained buckets) <= total available space
(n/10^6) ∗ 100 ∗ ( t + 1 ) <= 10^10
Simplifying gives
t <= ( 10^14 / n ) - 1
For n = 10^13 this gives t <= 9, which matches the answer marked correct.
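As a quick check of the inequality, here is a minimal Python sketch that computes the largest threshold that fits for a given n (the number of emails so far) and compares it with the four answer options. The names `max_threshold`, `BUCKETS`, `RECORD_BYTES`, and `CAPACITY_BYTES` are made up for this illustration, and the sketch assumes the same even spread of emails over buckets as the explanation does.

```python
BUCKETS = 10**6          # hash range 0 .. 999,999
RECORD_BYTES = 100       # size of one stored email record
CAPACITY_BYTES = 10**10  # storage available for retained records


def max_threshold(n: int) -> int:
    """Largest t with (n / BUCKETS) * RECORD_BYTES * (t + 1) <= CAPACITY_BYTES."""
    # Integer arithmetic avoids floating-point rounding for large n.
    t = (CAPACITY_BYTES * BUCKETS) // (RECORD_BYTES * n) - 1
    # t can never exceed the highest bucket number.
    return min(t, BUCKETS - 1)


# The four (n, t) pairs offered in the question.
options = [(10**9, 999), (10**12, 999), (10**13, 9), (10**11, 1000)]
for n, t in options:
    print(f"n = {n:.0e}: option t = {t}, largest t that fits = {max_threshold(n)}")
```

Only the pair n = 10^13, t = 9 agrees with the computed threshold. A result of -1 (for n above 10^14) would mean that even bucket 0 alone no longer fits in the available space.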