LeetCode Most Common Word: really just a test of implementing split yourself

819. Most Common Word

Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words.  It is guaranteed there is at least one word that isn't banned, and that the answer is unique.

Words in the list of banned words are given in lowercase, and free of punctuation.  Words in the paragraph are not case sensitive.  The answer is in lowercase.

 

Example:

Input: 
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.
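This small sketch (not part of the original post) makes the example concrete. It lowercases the sample paragraph, extracts runs of letters with Python's `re.findall`, and counts them with `collections.Counter`. The variable names are illustrative. The pattern `[a-z]+` assumes the paragraph contains only English letters, spaces and punctuation, as in this example.

```python
import re
from collections import Counter

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = {"hit"}

# Lowercase first, then treat every run of letters as one word;
# punctuation such as the comma in "ball," is simply not matched
words = re.findall(r"[a-z]+", paragraph.lower())
counts = Counter(words)
print(counts["hit"], counts["ball"])  # 3 2

# Drop banned words before picking the most frequent remaining one
allowed = Counter(w for w in words if w not in banned)
print(allowed.most_common(1)[0][0])  # ball
```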

 

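class Solution(object):
    def split_word(self, paragraph):
        # Characters that end a word: the punctuation "!?',;." plus the space
        sep = set("!?',;. ")
        ans = []
        pos = 0  # start index of the word currently being scanned
        for i, c in enumerate(paragraph):
            if c in sep:
                word = paragraph[pos:i]
                if word:  # skip empty pieces between adjacent separators
                    ans.append(word)
                pos = i + 1
        # Flush the last word (the paragraph may not end with a separator)
        word = paragraph[pos:]
        if word:
            ans.append(word)
        return ans

    def mostCommonWord(self, paragraph, banned):
        """
        :type paragraph: str
        :type banned: List[str]
        :rtype: str
        """
        banned = set(banned)  # O(1) membership tests instead of scanning a list
        m = {}
        paragraph = paragraph.lower()
        for w in self.split_word(paragraph):
            if w in banned:
                continue
            m[w] = m.get(w, 0) + 1
        # Keep the word with the highest count seen so far
        ans = ""
        max_cnt = 0
        for w, c in m.items():
            if c > max_cnt:
                ans = w
                max_cnt = c
        return ans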
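The title's point is that the real work lies in writing split yourself. The variation below rests on one assumption: every non-letter character acts as a separator, which holds for the inputs this problem describes. Instead of first building a list of words, it counts each word as soon as it ends. `most_common_word_scan` is a hypothetical name chosen for this sketch.

```python
from collections import Counter


def most_common_word_scan(paragraph, banned):
    """Hypothetical helper: count words in one pass over the characters."""
    banned = set(banned)
    counts = Counter()
    chars = []  # letters of the word currently being built
    # Appending a space acts as a sentinel so the final word is flushed too
    for c in paragraph + " ":
        if c.isalpha():
            chars.append(c.lower())
        elif chars:
            # A non-letter ends the current word: count it unless banned
            word = "".join(chars)
            if word not in banned:
                counts[word] += 1
            chars = []
    # Assumes at least one non-banned word exists, as the problem guarantees
    return counts.most_common(1)[0][0]
```

For example, `most_common_word_scan("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"])` returns `"ball"`. No intermediate word list is stored. The trade-off is that the separator rule (`str.isalpha`) is broader than an explicit punctuation set.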

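It can also be done with string replacement. First identify the punctuation characters that can never be part of a word and get rid of them, then split: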

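import java.util.*;

class Solution {
    public static String mostCommonWord(String paragraph, String[] banned) {
        // Replace punctuation with a space (not ""), so "b,b,b" splits into three words
        String[] splitArr = paragraph.replaceAll("[!?',;.]", " ").toLowerCase().split("\\s+");
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
        Map<String, Integer> map = new HashMap<>();
        for (String str : splitArr) {
            // split("\\s+") yields a leading "" when the text starts with whitespace
            if (!str.isEmpty() && !bannedSet.contains(str)) {
                map.put(str, map.getOrDefault(str, 0) + 1);
            }
        }

        int currentMax = 0;
        String res = "";
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            // Update the running maximum only when a strictly larger count appears
            if (e.getValue() > currentMax) {
                currentMax = e.getValue();
                res = e.getKey();
            }
        }
        return res;
    }
}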
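For comparison, the same replace-then-split idea can be written in Python with `str.translate`. This sketch assumes Python 3, where `str.maketrans` accepts two equal-length strings. `PUNCT`, `TO_SPACE` and `most_common_word_translate` are illustrative names. Each punctuation mark is mapped to a space rather than deleted.

```python
from collections import Counter

# The punctuation characters named in the problem
PUNCT = "!?',;."
# Map each punctuation character to a space so neighbouring words stay apart
TO_SPACE = str.maketrans(PUNCT, " " * len(PUNCT))


def most_common_word_translate(paragraph, banned):
    banned = set(banned)
    # str.split() with no argument collapses runs of whitespace,
    # so no empty strings reach the counter
    words = paragraph.translate(TO_SPACE).lower().split()
    return Counter(w for w in words if w not in banned).most_common(1)[0][0]
```

Mapping to a space matters for inputs such as "a, a, a, a, b,b,b,c, c". Deleting the commas there would glue "b,b,b,c" into the single token "bbbc".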

Another option uses Python's built-in regular expressions (the re module):

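Python:
Thanks to @sirxudi, I changed one line from
words = re.sub(r'[^a-zA-Z]', ' ', p).lower().split()
to
words = re.findall(r'\w+', p.lower())

import collections
import re

class Solution(object):
    def mostCommonWord(self, p, banned):
        ban = set(banned)
        # \w+ matches runs of word characters; spaces and punctuation act as separators
        words = re.findall(r'\w+', p.lower())
        # most_common(1) returns [(word, count)]; [0][0] picks out the word
        return collections.Counter(w for w in words if w not in ban).most_common(1)[0][0]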
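Whichever approach is used, punctuation has to act as a separator. The short check below uses an illustrative input, not one taken from the post. It contrasts three ways of handling punctuation: deleting it, replacing it with a space, and the `re.findall(r'\w+', ...)` approach shown above.

```python
import re

paragraph = "a, a, a, a, b,b,b,c, c"

# Deleting punctuation glues neighbouring words together: "b,b,b,c" -> "bbbc"
removed = re.sub(r"[!?',;.]", "", paragraph).lower().split()
# Replacing punctuation with a space keeps the words apart
replaced = re.sub(r"[!?',;.]", " ", paragraph).lower().split()
# findall(r"\w+") never includes punctuation in a match in the first place
found = re.findall(r"\w+", paragraph.lower())

print(removed)            # ['a', 'a', 'a', 'a', 'bbbc', 'c']
print(replaced)           # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c']
print(found == replaced)  # True
```

With banned = ["a"], the correct answer for this input is "b" (three occurrences). Only the second and third tokenizations can produce it.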

 

posted @ bonelee  Views (361)  Comments (0)