add output to extraction tool
labusch committed Oct 25, 2021
1 parent 321a8ce commit 07102f9
Showing 2 changed files with 9 additions and 2 deletions.
5 changes: 4 additions & 1 deletion .idea/inspectionProfiles/Project_Default.xml

6 changes: 5 additions & 1 deletion qurator/topic_modeling/cli.py
@@ -89,7 +89,11 @@ def read_linking_table(con, min_proba, min_surface_len, filter_type, min_occuren

     tmp = df.drop_duplicates(subset=['ppn', 'wikidata'])
     vc = tmp.wikidata.value_counts()
-    min_count = int(len(tmp.ppn.unique()) / 100.0 * min_occurences)
+
+    num_ppns = len(tmp.ppn.unique())
+    print("Number of PPNs: {}".format(num_ppns))
+
+    min_count = int(num_ppns / 100.0 * min_occurences)
 
     print('Removing entities that occur less than {} times...'.format(min_count))
     df = df.loc[df.wikidata.isin(vc.loc[vc >= min_count].index)]
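
To illustrate what the hunk above computes, here is a minimal, standalone sketch of the same thresholding logic on a toy table. The helper name `filter_rare_entities` and the toy data are hypothetical and not part of the repository; the sketch only assumes, as the diff suggests, that `min_occurences` is a percentage of the distinct PPNs in which an entity must appear.

```python
import pandas as pd


def filter_rare_entities(df, min_occurences):
    """Hypothetical helper mirroring the logic in the hunk above."""
    # Count each entity at most once per document (PPN).
    tmp = df.drop_duplicates(subset=['ppn', 'wikidata'])
    vc = tmp.wikidata.value_counts()

    # Number of distinct documents, as printed by the new code.
    num_ppns = len(tmp.ppn.unique())

    # min_occurences is treated as a percentage of documents;
    # int() truncates toward zero.
    min_count = int(num_ppns / 100.0 * min_occurences)

    # Keep only rows whose entity appears in at least min_count documents.
    kept = df.loc[df.wikidata.isin(vc.loc[vc >= min_count].index)]
    return kept, num_ppns, min_count


# Toy data (made up for illustration): four documents, three entities.
toy = pd.DataFrame({
    'ppn': ['A', 'A', 'B', 'C', 'D', 'D'],
    'wikidata': ['Q1', 'Q1', 'Q1', 'Q2', 'Q1', 'Q3'],
})

filtered, num_ppns, min_count = filter_rare_entities(toy, 50)
print("Number of PPNs: {}".format(num_ppns))
print("min_count: {}".format(min_count))
print(sorted(filtered.wikidata.unique()))
```

Because `int()` truncates, a small corpus can yield `min_count == 0`, in which case no entity is removed; printing the number of PPNs, as this commit does, makes that situation easier to spot.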