Fixed some bugs #49

Open
wants to merge 7 commits into
base: master

linshouyi

No description provided.

linshouyi and others added 7 commits January 19, 2019 14:59
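Changes made relative to the source project:
1. Made sure the corpus text file is UTF-8 encoded, and added the corpus file corpus.txt. The trained model file model.bin is too large (120 MB) to commit, so it has to be trained locally (LearnTest.class); training takes roughly a few tens of minutes.
2. The corpus provided by the original author separates tokens with tabs, but the code splits on spaces, so every tab has to be replaced with a space. Alternatively, change the code: in Learn.java,
line 271, use String[] split = temp.split("[\\s\\u3000]+"); so that runs of half-width spaces, full-width spaces (U+3000), or tabs are all accepted as separators.
3. Found a bug:
In the two distance methods of Word2Vec, min = result.last().score; should be placed inside the resultSize <
result.size() block.
The last entry's score may be assigned to min, as the minimum accepted score from then on, only once the number of results has exceeded resultSize. While the number of results does not exceed resultSize, min must not be assigned.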
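
The two code-level changes above (items 2 and 3) can be sketched as follows. This is a minimal, self-contained illustration and not the project's actual code: the class PrFixSketch, the methods splitTokens and topN, and the Entry type are hypothetical stand-ins, and the sketch assumes the result set is a TreeSet ordered by descending score, as result.last() and resultSize in the commit message suggest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class PrFixSketch {

    // Item 2: split on runs of ASCII whitespace (space, tab, ...) or the
    // full-width space U+3000. Backslashes are doubled so that the regex
    // engine, not the Java compiler, interprets \s and \u3000.
    static String[] splitTokens(String line) {
        return line.split("[\\s\\u3000]+");
    }

    // Hypothetical stand-in for the project's result entry: a word and its score.
    static final class Entry implements Comparable<Entry> {
        final String name;
        final float score;

        Entry(String name, float score) {
            this.name = name;
            this.score = score;
        }

        // Highest score first, so last() is the lowest-scoring entry kept.
        @Override
        public int compareTo(Entry other) {
            int byScore = Float.compare(other.score, this.score);
            return byScore != 0 ? byScore : this.name.compareTo(other.name);
        }
    }

    // Item 3: keep the resultSize best-scoring names (resultSize >= 1 assumed).
    static List<String> topN(String[] names, float[] scores, int resultSize) {
        if (resultSize < 1) {
            throw new IllegalArgumentException("resultSize must be at least 1");
        }
        TreeSet<Entry> result = new TreeSet<>();
        float min = -Float.MAX_VALUE; // no threshold while the set is not full
        for (int i = 0; i < names.length; i++) {
            if (scores[i] > min) {
                result.add(new Entry(names[i], scores[i]));
                if (resultSize < result.size()) {
                    result.pollLast();         // drop the lowest-scoring entry
                    min = result.last().score; // the set is full, so a cutoff is valid
                }
                // Assigning min here instead (outside the block) would make the
                // first score the cutoff and reject lower-scoring candidates
                // even though the set still has room.
            }
        }
        List<String> ordered = new ArrayList<>();
        for (Entry e : result) {
            ordered.add(e.name);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Tab, double space, and two full-width spaces as separators.
        System.out.println(Arrays.toString(splitTokens("a\tb  c\u3000\u3000d")));
        String[] names = {"a", "b", "c", "d"};
        float[] scores = {0.9f, 0.1f, 0.5f, 0.7f};
        System.out.println(topN(names, scores, 3));
    }
}
```

With the min assignment outside the block, the second call would keep only "a", because 0.9 becomes the cutoff after the first insertion; with the assignment inside the block it keeps the three best-scoring names.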