How do I count the occurrences of a string in a saved HTML file?

I have a saved HTML file, and I want to find out how many times a given string occurs in it. For example:

string= 'Beautiful days'
text = "those beautiful days were unforgettable. I wish every day was a beautiful day"

output expected = 2 (beautiful days, beautiful day)

What I tried: I attempted this with spaCy but could not get it to work. Can anyone explain the logic behind it?

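Solution:

You can use a stemmer. A stemmer reduces each word to its stem, so "days" and "day" both become "day". It may be overkill, but it lets the match catch the closest variants of each word as well.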

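import nltk
nltk.download('punkt')  # tokenizer data for word_tokenize; newer NLTK releases may ask for 'punkt_tab' instead

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# Lowercase and stem every token of the text, then rejoin with spaces.
sentence = "those beautiful days were unforgettable. I wish every day was a beautiful day"
words = word_tokenize(sentence)
sentence = ""
for w in words:
    sentence += (ps.stem(w.lower()) + " ")

# Normalize the query the same way so both sides are comparable.
query = 'Beautiful days'
words = word_tokenize(query)
query = ""
for w in words:
    query += (ps.stem(w.lower()) + " ")

print(sentence)
print(query)
# Count non-overlapping occurrences of the stemmed query in the stemmed text.
print(sentence.count(query))

Output:

those beauti day were unforgett . i wish everi day wa a beauti day
beauti day
2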
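
The snippet above works on a plain string, while the question starts from a saved HTML file. As a rough sketch of the missing step, the code below pulls the visible text out of the markup with the standard-library html.parser module and then counts the phrase token by token. The names TextExtractor, html_to_text, crude_stem and count_phrase are made up for this illustration, and crude_stem is only a stand-in that drops a trailing "s" so the example runs without NLTK. Passing stem=PorterStemmer().stem should give the stemming behaviour used in the answer.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, pieces joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: drops one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def count_phrase(text, phrase, stem=crude_stem):
    """Count positions where the stemmed phrase tokens appear consecutively."""
    def tokenize(s):
        return [stem(w) for w in re.findall(r"[a-z0-9']+", s.lower())]

    tokens, target = tokenize(text), tokenize(phrase)
    if not target:
        return 0
    n = len(target)
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


html = (
    "<html><body><p>Those <b>beautiful</b> days were unforgettable.</p>"
    "<p>I wish every day was a beautiful day</p></body></html>"
)
print(count_phrase(html_to_text(html), "Beautiful days"))
```

For a file on disk, read it first, for example with open(path, encoding="utf-8") where path is your own file, and pass the contents to html_to_text. Comparing token lists rather than calling str.count on a joined string avoids matches that start or end in the middle of a word. One caveat: joining text nodes with spaces will split a word that is broken up by inline tags.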
