Which segmentation mode does TF-IDF keyword extraction use? #1022

haixingpai · 2024-08-01T00:27:33Z

Does it use the default mode (accurate mode)? How can I change it to full mode myself?

manother · 2024-08-01T00:29:10Z

Email received~

haixingpai · 2024-08-01T13:55:39Z

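One more question. I tested segmentation on the sentence “我喜欢看电视，不喜欢看电影”. Segmenting it directly in default mode gives these tokens: 我, 喜欢, 看电视, ，, 不, 喜欢, 看, 电影. But if I extract keywords with TF-IDF using topK=None (that is, without limiting the number of keywords), the resulting words do not include single characters such as “我” and “不”. What is the reason? How can I make the word list produced by TF-IDF keyword extraction include single characters as well?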

wfs420100 · 2024-09-24T09:26:53Z

During keyword extraction, TFIDF filters out any word shorter than 2 characters:

# jieba/analyse/tfidf.py

class TFIDF(KeywordExtractor):
    ...
    def extract_tags(self, sentence, topK=20, withWeight=False, allowPOS=(), withFlag=False):
            ...
            if len(wc.strip()) < 2 or wc.lower() in self.stop_words:
                continue
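
If you need single characters in the result, one option is to do the segmentation yourself and apply a TF-IDF weighting with a configurable length filter. The following is only a minimal sketch, not jieba's own code: `tfidf_keywords`, `min_len`, `default_idf`, and the empty `idf` table are names and placeholder values made up for illustration, and the weighting (term count times IDF, divided by the number of kept tokens) is my understanding of the usual approach. The token list is hard-coded here; in practice it could come from `jieba.lcut(sentence)` or, for full mode, `jieba.lcut(sentence, cut_all=True)`.

```python
from collections import Counter


def tfidf_keywords(tokens, idf, default_idf, min_len=2,
                   stop_words=frozenset(), top_k=None):
    """Rank pre-segmented tokens by a simple TF-IDF weight.

    tokens      -- list of str produced by any segmenter
    idf         -- dict mapping word -> IDF value (hypothetical table)
    default_idf -- fallback IDF for words missing from the table
    min_len     -- drop tokens shorter than this (2 mirrors the filter above)
    """
    kept = [t for t in tokens
            if len(t.strip()) >= min_len and t.lower() not in stop_words]
    total = len(kept)
    if total == 0:
        return []
    counts = Counter(kept)
    weights = {w: c * idf.get(w, default_idf) / total
               for w, c in counts.items()}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Tokens as reported for the test sentence in default mode.
tokens = ["我", "喜欢", "看电视", "，", "不", "喜欢", "看", "电影"]

# min_len=2 reproduces the behaviour described above: single characters vanish.
print(tfidf_keywords(tokens, idf={}, default_idf=1.0))

# min_len=1 keeps single characters; punctuation then has to be excluded
# explicitly, here by listing it as a stop word.
print(tfidf_keywords(tokens, idf={}, default_idf=1.0,
                     min_len=1, stop_words=frozenset({"，"})))
```

With `min_len=1` the punctuation token would otherwise be counted as a keyword, which is probably one reason the library filters short tokens in the first place.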
