PAT-1063 Set Similarity (set container)

1063. Set Similarity

Given two sets of integers, the similarity of the sets is defined to be N_c/N_t*100%, where N_c is the number of distinct common numbers shared by the two sets, and N_t is the total number of distinct numbers in the two sets. Your job is to calculate the similarity of any given pair of sets.

Input Specification:

Each input file contains one test case. Each case first gives a positive integer N (<=50) which is the total number of sets. Then N lines follow, each gives a set with a positive M (<=10⁴) and followed by M integers in the range [0, 10⁹]. After the input of sets, a positive integer K (<=2000) is given, followed by K lines of queries. Each query gives a pair of set numbers (the sets are numbered from 1 to N). All the numbers in a line are separated by a space.

Output Specification:

For each query, print in one line the similarity of the sets, in the percentage form accurate up to 1 decimal place.

Sample Input:

3
3 99 87 101
4 87 101 5 87
7 99 101 18 5 135 18 99
2
1 2
1 3

Sample Output:

50.0%
33.3%
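The sample can be checked by hand. Duplicates within an input line do not count, so set 2 is {87, 101, 5} and set 3 is {99, 101, 18, 5, 135}. Below is a minimal Python sketch of that arithmetic; the function name `similarity` is chosen here for illustration and does not come from the problem statement.

```python
def similarity(a, b):
    """Return N_c / N_t * 100 for two iterables of integers."""
    sa, sb = set(a), set(b)   # drop duplicates within each input line
    n_c = len(sa & sb)        # distinct numbers common to both sets
    n_t = len(sa | sb)        # distinct numbers across both sets
    return n_c / n_t * 100

# The three sets from the sample input, without the leading count M.
s1 = [99, 87, 101]
s2 = [87, 101, 5, 87]
s3 = [99, 101, 18, 5, 135, 18, 99]

print('%.1f%%' % similarity(s1, s2))  # N_c = 2, N_t = 4
print('%.1f%%' % similarity(s1, s3))  # N_c = 2, N_t = 6
```

The two printed lines correspond to the queries `1 2` and `1 3` in the sample.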

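Problem summary: The input contains n sets, each holding some numbers. There are then k queries, each naming two sets to compare. For each query, compute the similarity = Nc / Nt * 100%, where Nc is the size of the intersection of the two sets and Nt is the size of their union.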

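Main idea: Each set may contain duplicate numbers, and many lookups are needed (when building the union, every element of set a is checked for membership in set b). This makes the STL set container a natural choice: it stores no duplicates, and its lookups are fast (O(log n) for std::set). For each query, initialize nc to 0 and nt to the size of set b, then go through every element of set a: if it is also in set b, increment nc; otherwise, increment nt. (Note: according to the author, computing nt as the sum of the two set sizes minus the size of the intersection may exceed the time limit.)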
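The counting step described above can be sketched in a few lines of Python before reading the full C++ listing. This is only an illustration of the loop. The helper name `count_common_and_total` is hypothetical, and both arguments are assumed to be Python `set` objects that have already been deduplicated.

```python
def count_common_and_total(a, b):
    """Return (nc, nt) for two sets using a single pass over a."""
    nc = 0          # size of the intersection found so far
    nt = len(b)     # the union starts out as all of b
    for x in a:
        if x in b:
            nc += 1     # x is in both sets
        else:
            nt += 1     # x is only in a, so it enlarges the union
    return nc, nt

nc, nt = count_common_and_total({99, 87, 101}, {87, 101, 5})
print('%.1f%%' % (nc / nt * 100))
```

Each element of a is looked up in b exactly once, so one query costs one membership test per element of a.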

#pragma warning(disable: 4786)
#include <cstdio>
#include <vector>
#include <set>
using namespace std;
int main(void) {
    int n, i, j;
    
    scanf("%d", &n);
    vector<set<int> > vec(n);
    set<int>::iterator iter; 
    int m, num;
    for (i = 0; i < n; i++) {
        scanf("%d", &m);
        for (j = 0; j < m; j++) {
            scanf("%d", &num);
            vec[i].insert(num);
        }        
    }
    int k, a, b;
    scanf("%d", &k);
    for (i = 0; i < k; i++) { 
        scanf("%d%d", &a, &b);
        int nc = 0, nt = vec[b-1].size();
        for (iter = vec[a-1].begin(); iter != vec[a-1].end(); iter++) {
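            if (vec[b-1].count(*iter))   // equivalent: vec[b-1].find(*iter) != vec[b-1].end()
                nc++;                    // element is in both sets
            else
                nt++;                    // element is only in set a, so it enlarges the union
        }
        // nt = vec[a-1].size() + vec[b-1].size() - nc;   // computing nt this way may time out
        printf("%.1f%%\n", nc * 1.0 / nt * 100);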
    }
    
    return 0;
}

Python's set type makes this even easier: & and | give the intersection and the union, respectively. The only drawback is that one test case timed out.

n = int(input())
L1 = []
for i in range(n):
    st = input()
    L2 = st.split(' ')
    L1.append(set(L2[1:]))
k = int(input())
for i in range(k):
    pair = input().split(' ')
    x, y = int(pair[0]), int(pair[1])
    similarity = len(L1[x-1] & L1[y-1]) / len(L1[x-1] | L1[y-1]) * 100
    print('%.1f%%' % similarity)
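One possible way to reduce the Python running time is sketched below. It makes two changes: it splits the whole input into tokens once instead of calling `input()` per line, and it builds only the intersection, deriving the union size as len(a) + len(b) - nc (`len()` on a Python set is constant time). The function name `solve` and the constant `SAMPLE` are introduced here for illustration. Whether this variant passes the judge's time limit has not been verified.

```python
def solve(text):
    """Answer all queries in `text` (the full problem input) and return the output."""
    tokens = text.split()            # split on any whitespace, newlines included
    pos = 0
    n = int(tokens[pos]); pos += 1
    sets = []
    for _ in range(n):
        m = int(tokens[pos]); pos += 1
        sets.append(set(map(int, tokens[pos:pos + m])))   # set() drops duplicates
        pos += m
    k = int(tokens[pos]); pos += 1
    out = []
    for _ in range(k):
        a = sets[int(tokens[pos]) - 1]       # set numbers are 1-based
        b = sets[int(tokens[pos + 1]) - 1]
        pos += 2
        nc = len(a & b)                      # intersection size
        nt = len(a) + len(b) - nc            # union size without building the union
        out.append('%.1f%%' % (nc / nt * 100))
    return '\n'.join(out)

SAMPLE = """3
3 99 87 101
4 87 101 5 87
7 99 101 18 5 135 18 99
2
1 2
1 3
"""
print(solve(SAMPLE))
```

For an actual submission, the last line would instead be `print(solve(sys.stdin.read()))` after `import sys`.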

posted @ 2017-09-12 21:34 zhayujie

Personal homepage: https://zhayujie.com