Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add confidences to word pair representation #156

Merged
merged 6 commits into from
Feb 24, 2025
Merged

Conversation

Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Feb 20, 2025

This change is Reviewable

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)


machine/corpora/aligned_word_pair.py line 36 at r1 (raw file):

                        alignment_score = float(token[second_colon_index + 1 : len(token)])
                    except:
                        pass

We should throw if the numbers are invalid.


machine/corpora/aligned_word_pair.py line 75 at r1 (raw file):

        target_index = "NULL" if self.target_index < 0 else str(self.target_index)
        repr = f"{source_index}-{target_index}"
        if include_scores and self.translation_score >= 0 or self.alignment_score >= 0:

I would prefer not to change the current behavior. There might be existing code that is dependent on it.

@codecov-commenter
Copy link

codecov-commenter commented Feb 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.49%. Comparing base (7877bc7) to head (a3094e2).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #156      +/-   ##
==========================================
+ Coverage   88.38%   88.49%   +0.10%     
==========================================
  Files         275      276       +1     
  Lines       16389    16415      +26     
==========================================
+ Hits        14485    14526      +41     
+ Misses       1904     1889      -15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 2 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)


machine/corpora/aligned_word_pair.py line 36 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We should throw if the numbers are invalid.

Just like this? Or throw a new exception?


machine/corpora/aligned_word_pair.py line 75 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would prefer not to change the current behavior. There might be existing code that is dependent on it.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)


machine/corpora/aligned_word_pair.py line 33 at r3 (raw file):

                if second_colon_index > 0:
                    translation_score = float(token[colon_index + 1 : second_colon_index])
                    alignment_score = float(token[second_colon_index + 1 : len(token)])

We should handle the case where the alignment score isn't specified.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


machine/corpora/aligned_word_pair.py line 33 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We should handle the case where the alignment score isn't specified.

Done.

@johnml1135
Copy link
Collaborator

tests/corpora/test_aligned_word_pair.py line 10 at r4 (raw file):

    wps = list(AlignedWordPair.from_string("1-0:0.111111"))
    assert len(wps) == 1
    assert wps[0].translation_score == 0.111111

Can you verify both alignment and word scores for both?

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @Enkidu93)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)

@johnml1135
Copy link
Collaborator

tests/corpora/test_aligned_word_pair.py line 10 at r4 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Can you verify both alignment and word scores for both?

Would alignment_score be -1 here? Would it be null?

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 2 files reviewed, 1 unresolved discussion (waiting on @johnml1135)


tests/corpora/test_aligned_word_pair.py line 10 at r4 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Would alignment_score be -1 here? Would it be null?

Done.

@Enkidu93 Enkidu93 force-pushed the add_word_pair_confidences branch from a9203c5 to a3094e2 Compare February 24, 2025 19:49
@Enkidu93 Enkidu93 merged commit 546fb1c into main Feb 24, 2025
13 of 14 checks passed
@Enkidu93 Enkidu93 deleted the add_word_pair_confidences branch February 24, 2025 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants