List boundary discards one token in the context window #10

Open
jonnybluesman opened this issue Sep 9, 2021 · 4 comments


@jonnybluesman

enumerate(word_ids[max(i - boundary, 0):i + boundary]) if u != v]

I think i + boundary should be i + boundary + 1 to make the slice inclusive on the right; otherwise the right context contains one token fewer than the left context in the resulting skipgrams.
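To illustrate the off-by-one (a minimal sketch; the values of i and boundary are made up, not taken from the repo):

word_ids = [0, 1, 2, 3, 4, 5, 6]
i, boundary = 3, 2

# Slicing is exclusive on the right, so the right context only
# reaches i + boundary - 1: one token short of the left side.
word_ids[max(i - boundary, 0):i + boundary]      # [1, 2, 3, 4]

# Proposed fix: add 1 so both sides span `boundary` tokens.
word_ids[max(i - boundary, 0):i + boundary + 1]  # [1, 2, 3, 4, 5]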

@francesco-mollica

Why create a boundary like this:
boundary = np.random.randint(1, self.window_size)

and not simply use the window_size value instead of boundary?

@jonnybluesman
Author

> Why create a boundary like this: boundary = np.random.randint(1, self.window_size)
>
> and not simply use the window_size value instead of boundary?

Because with the random function you implicitly give "more importance" to the closest words in the neighbourhood, by generating more training pairs from those "close" tokens.
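A quick simulation of that effect (a sketch with made-up numbers, assuming boundary is resampled per center word as in the line quoted above):

import numpy as np

rng = np.random.default_rng(0)
window_size = 5
counts = np.zeros(window_size + 1)  # counts[d] = pairs generated at distance d

for _ in range(100_000):
    boundary = rng.integers(1, window_size)  # upper bound exclusive, like np.random.randint
    counts[1:boundary + 1] += 1              # each distance 1..boundary yields one pair

print(counts[1:] / counts[1])  # roughly [1.0, 0.75, 0.5, 0.25, 0.0]

So a fixed window weights every distance equally, while the sampled boundary produces linearly more pairs for nearer neighbours; this is the same dynamic-window trick used in the original word2vec C implementation.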

@francesco-mollica

francesco-mollica commented Nov 16, 2021

Do you mean explicitly? Then why not reduce the window size and fix it, instead of using a boundary? Is this use of a boundary found in other implementations?
Just to be clear: this implementation uses a random window size drawn from the range (1, window_size), and the boundary changes with each new sentence, correct?
Thanks for the quick response!

@francesco-mollica

francesco-mollica commented Nov 20, 2021

Can the concept of boundary be applied to CBOW-style training? I implemented it and I'm stuck: since the size of the context varies from phrase to phrase as the boundary changes, putting everything into a single tensor creates big problems for me!
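For what it's worth, a common workaround (a sketch, not from this repo; pad_contexts and all sizes are made up) is to pad every context to a fixed width and take a masked mean, so padding tokens contribute nothing:

import torch

PAD = 0  # assumes index 0 is reserved for padding

def pad_contexts(contexts, max_len):
    # Pad variable-length context lists to a fixed width.
    return torch.tensor([ctx + [PAD] * (max_len - len(ctx)) for ctx in contexts])

embedding = torch.nn.Embedding(num_embeddings=1000, embedding_dim=8, padding_idx=PAD)

contexts = [[5, 7], [3, 4, 9, 2]]                   # contexts of different sizes
batch = pad_contexts(contexts, max_len=4)           # shape: (2, 4)

emb = embedding(batch)                              # (2, 4, 8); PAD rows are zero
lengths = (batch != PAD).sum(dim=1, keepdim=True)   # real tokens per row
cbow_input = emb.sum(dim=1) / lengths               # masked mean over the context

Because padding_idx keeps the PAD embedding at zero, summing and dividing by the true length gives the usual CBOW average regardless of how the boundary varied per sentence.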
