Idea: expressions and particles #255

Open
Kala-J opened this issue Apr 28, 2020 · 2 comments

Comments


Kala-J commented Apr 28, 2020

This is similar to #169, which was closed, but I think that if the concept is expanded to other particles used in expressions, it is worth considering. Expressions and the particles used in them are somewhat fluid, and the dictionary does not always list them in the form a user encounters. There are two primary cases: one where particles are simply omitted, which is very common in dialogue, and another where different particles are interchangeable with no difference in meaning.

For example, I encountered 頭抱える while reading, and normal parsing does not reveal what it actually means, since it is an expression (頭を抱える) with the particle dropped. As another example, I found 無駄口叩く, but the dictionary lists it as 無駄口を叩く. Perhaps logic could be added so that, after a range of text has been identified and selected, 'test' particles are inserted after it to see whether a larger block can be matched as part of an expression.
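To make the 'test particle' idea concrete, here is a minimal sketch. It is not the extension's actual lookup code: `sampleDictionary`, `TEST_PARTICLES`, and `findWithInsertedParticle` are hypothetical names, the dictionary is a tiny stand-in holding only the words mentioned above, and deinflection is ignored entirely.

```typescript
// Hypothetical stand-in for the real dictionary: only the words mentioned above.
const sampleDictionary = new Set<string>([
  "頭",
  "抱える",
  "頭を抱える",
  "無駄口",
  "無駄口を叩く",
]);

// Particles to try inserting. Limiting this to を is an assumption.
const TEST_PARTICLES = ["を"];

// Inserts each test particle after the first `matchLength` characters of `text`
// and returns the longest dictionary entry found that way, or null if none.
function findWithInsertedParticle(
  text: string,
  matchLength: number
): string | null {
  const head = text.slice(0, matchLength);
  const tail = text.slice(matchLength);
  for (const particle of TEST_PARTICLES) {
    // Try the longest continuation first so the largest block wins.
    for (let end = tail.length; end > 0; end--) {
      const candidate = head + particle + tail.slice(0, end);
      if (sampleDictionary.has(candidate)) {
        return candidate;
      }
    }
  }
  return null;
}
```

With this stand-in data, `findWithInsertedParticle("頭抱える", 1)` finds 頭を抱える and `findWithInsertedParticle("無駄口叩く", 3)` finds 無駄口を叩く.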

For the second case (interchangeable particles), I find that の and が are the main culprits. Another real-world example: I encountered 趣がある, which is not in the dictionary, although 趣のある is. Other expressions I've found do have both forms in the dictionary; compare 人当たりの良い and 人当たりが良い, or 我が強い and 我の強い. Throughout my reading I have encountered numerous expressions like this, some with both entries and some with only one. For this case, perhaps logic could be added to try swapping the particle for a different one.
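A rough sketch of the particle-swapping idea, under the assumption that only の and が are swapped and only one character at a time. `knownExpressions`, `particleVariants`, and `lookupWithSwap` are hypothetical names, and the set is a stand-in for the real dictionary. Note that this treats every の or が character as a particle, which would also produce variants for words that merely contain those characters.

```typescript
// Hypothetical stand-in for dictionary headwords.
const knownExpressions = new Set<string>([
  "趣のある",
  "人当たりの良い",
  "人当たりが良い",
]);

// Particles treated as interchangeable. Limiting this to の/が is an assumption.
const PARTICLE_SWAPS = new Map<string, string>([
  ["の", "が"],
  ["が", "の"],
]);

// Returns every variant of `text` with exactly one の or が swapped.
function particleVariants(text: string): string[] {
  const variants: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const swapped = PARTICLE_SWAPS.get(text.charAt(i));
    if (swapped !== undefined) {
      variants.push(text.slice(0, i) + swapped + text.slice(i + 1));
    }
  }
  return variants;
}

// Looks up `text` as written first, then falls back to swapped variants.
// `exact` is false when the match is only a best guess.
function lookupWithSwap(
  text: string
): { match: string; exact: boolean } | null {
  if (knownExpressions.has(text)) {
    return { match: text, exact: true };
  }
  for (const variant of particleVariants(text)) {
    if (knownExpressions.has(variant)) {
      return { match: variant, exact: false };
    }
  }
  return null;
}
```

Returning an `exact` flag is one way to let the UI mark a swapped match as a guess instead of presenting it as what the text literally says.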

Of course, this is only an idea and may not be feasible! In particular, it may be necessary to restrict it to certain particles (for example, only testing for a dropped を, or only swapping の and が). Since this would basically offer the user a 'best guess' rather than parsing exactly what is in the text, perhaps it could be a toggleable option?

Edit: an example of the first case where an expression is in the dictionary with を and also without -- 間髪を入れず and 間髪入れず

birtles (Member) commented Apr 29, 2020

That's an interesting idea. Substituting が for の (and vice versa) is definitely feasible and certainly very common.

Adding in a dropped を is, as you mention, trickier. I suppose if the longest match were a noun, we could try adding を and see if we get a longer match.
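Something along these lines, perhaps. This is only a sketch of the noun-then-を retry: `Entry`, `sampleEntries`, `longestMatch`, and `lookupWithDroppedWo` are hypothetical names, the entry list is a stand-in for the real data, and deinflection is left out.

```typescript
// Hypothetical entry shape; the real data carries far more detail.
type Entry = {
  headword: string;
  partOfSpeech: "noun" | "verb" | "expression";
};

// Stand-in data using words from this thread.
const sampleEntries: Entry[] = [
  { headword: "無駄口", partOfSpeech: "noun" },
  { headword: "叩く", partOfSpeech: "verb" },
  { headword: "無駄口を叩く", partOfSpeech: "expression" },
];

// Returns the longest entry whose headword is a prefix of `text`.
function longestMatch(text: string): Entry | null {
  let best: Entry | null = null;
  for (const entry of sampleEntries) {
    const isPrefix = text.startsWith(entry.headword);
    if (isPrefix && (best === null || entry.headword.length > best.headword.length)) {
      best = entry;
    }
  }
  return best;
}

// If the longest match is a noun, retry with を inserted right after it and
// keep the retry only when it yields a strictly longer match.
function lookupWithDroppedWo(text: string): Entry | null {
  const first = longestMatch(text);
  if (first === null || first.partOfSpeech !== "noun") {
    return first;
  }
  const withWo = first.headword + "を" + text.slice(first.headword.length);
  const retry = longestMatch(withWo);
  if (retry !== null && retry.headword.length > first.headword.length) {
    return retry;
  }
  return first;
}
```

Gating the retry on the part of speech keeps the extra lookup off the common path, so it only costs anything when the first match is a noun.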

I think this might be a bit easier once I've moved the database over to IndexedDB, since hopefully the lookup time will be a little faster (and the grammatical information pre-parsed).

@nicolasmaia

Another idea would be to replace the Kansai-dialect へん with ない in expressions like かもしれへん.
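As a rough illustration, the substitution could generate lookup candidates like this. `kansaiNegativeVariants` is a hypothetical name, and a plain string replacement is an assumption: it will also fire on unrelated occurrences of へん, so the candidates would still need to be checked against the dictionary.

```typescript
// Returns one variant of `text` per occurrence of へん, with that occurrence
// replaced by ない. Returns an empty array when へん does not occur.
function kansaiNegativeVariants(text: string): string[] {
  const variants: string[] = [];
  let index = text.indexOf("へん");
  while (index !== -1) {
    // "へん" is two characters long, so skip two when rejoining the tail.
    variants.push(text.slice(0, index) + "ない" + text.slice(index + 2));
    index = text.indexOf("へん", index + 1);
  }
  return variants;
}
```

For かもしれへん this yields the single candidate かもしれない, which can then go through the normal lookup.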
