According to Wikipedia, ELIZA is an early natural language processing computer program that simulates a conversation. I attempted to recreate the intended behavior of Eliza so that it can communicate in Slovak. The implementation is based on this paper and this tutorial.
Every line of code was developed and tested using SWI-Prolog version 8.0.3 (64-bit). All the pieces of code are intended to be used together; they are in separate files only for better readability.
- The main part, the conversation tool, is the predicate `eliza/0`. It holds a conversation with the user in Slovak until it encounters a line consisting of the single keyword "dovidenia" or until the user has used obscenities more than 3 times.
- Other useful tools include:
  - predicate `conj(+Verb, -Person, -Time)`, which returns the conjugation of `Verb`:
    - `Time = now`, `Person` in {`sg1`, `sg2`, `sg3`, `pl1`, `pl2`, `pl3`}
    - `Time = past`, `Person` in {`sg`, `pl`}
    - `Time = neuter`, `Person = neuter`
  - predicate `read_atomics/1`, which reads standard input, splits it on whitespace, lowercases it and returns it as a list of atoms
  - predicate `comment/1`, which writes its argument, which has to be a list of atoms, to standard output
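As a rough illustration of the last two helpers, the sketch below tokenizes a line into lowercase atoms and prints a list of atoms. The names `line_atomics/2`, `sketch_read_atomics/1` and `sketch_comment/1` are hypothetical; the project's own `read_atomics/1` and `comment/1` may be implemented differently.

```prolog
:- use_module(library(readutil)).   % read_line_to_string/2
:- use_module(library(apply)).      % exclude/3, maplist/3

% string_atom(+String, -Atom): argument-swapped atom_string/2 for use with maplist/3.
string_atom(String, Atom) :-
    atom_string(Atom, String).

% line_atomics(+Line, -Atoms): lowercase Line, split it on spaces and tabs,
% drop empty fragments and convert every remaining word to an atom.
line_atomics(Line, Atoms) :-
    string_lower(Line, Lower),
    split_string(Lower, " \t", " \t", Parts),
    exclude(==(""), Parts, Words),
    maplist(string_atom, Words, Atoms).

% sketch_read_atomics(-Atoms): read one line from standard input and tokenize it.
sketch_read_atomics(Atoms) :-
    read_line_to_string(user_input, Line),
    line_atomics(Line, Atoms).

% sketch_comment(+Atoms): print a list of atoms as one space-separated line.
sketch_comment(Atoms) :-
    atomic_list_concat(Atoms, ' ', Text),
    writeln(Text).
```

With these definitions, `line_atomics("Vsetci  MUZI su rovnaki", As)` should bind `As` to `[vsetci, muzi, su, rovnaki]`.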
Example run from paper
Dobry den porozpravajte mi o vasom probleme!
> vsetci muzi su rovnaki
Akym sposobom?
> stale nas s niecim otravuju
Viete vymysliet nejaky konkretny priklad?
> napriklad, moj priatel ma prinutil prist sem
Vas priatel vas prinutil prist sem?
> hovori, ze som stale depresivna
Je mi luto, ked pocujem, ze ste depresivna
> je to pravda, som nestastna
Myslite si, ze ked ste nestastna tak vam pomoze rozpravat sa so mnou?
> potrebujem nejaku pomoc, to je iste
Preco chcete nejaku pomoc?
> pretoze sa chcem pohodnut s mojou mamou
Rozpravate sa casto s mamou?
> ano, moja mama sa o mna stara
Je este niekto vo vasej rodine kto sa o vas stara?
> moj otec
Vas otec?
> obcas sa podobas na mojho otca, Eliza
Vidite nejake spolocne vlastnosti?
> nie si velmi agresivna ale myslim, ze len nechces, aby som si to vsimla
Hovorite nie len preto aby ste zneli nastvane?
> dovidenia
Dakujem za podnetny rozhovor. Dufam, ze sa este niekedy stretneme.
Dobry den porozpravajte mi o vasom probleme!
> citim sa uplne strasne
Povedzte mi viac o tychto pocitoch
> citim sa ako na horskej drahe
Ako casto sa citite ako na horskej drahe?
> citim si nohy
Uzivate si ked si citite nohy?
> citim sa ako v deliriu
Aky pocit vam to pripomina?
> dovidenia
Dakujem za podnetny rozhovor. Dufam, ze sa este niekedy stretneme.
Dobry den porozpravajte mi o vasom probleme!
> jebat tento svet
Poprosim vas nenadavajte v tejto konverzacii
> ale chod do pice
Mozno vam to je prirodzene, ale upozornujem vas nenadavajte mi tu
> vies co si ? obycajna svina !
Posledny krat vas ziadam nenadavajte mi tu!
> ty kurva, co ma upozornujes ?
Chod do prdele ty drbo!
Below I walk through the workflow of the predicate `eliza/0`, which is the main component of this semester project.
- At the start of the predicate's workflow, Eliza prompts the user to start the conversation with the script from `start_script/1` and prints it to standard output with `comment/1` (`start_script` is a simple fact containing the opening message of the conversation; `comment` prints the list of atoms obtained from `start_script` to standard output)
- `read_atomics/1` reads standard input and breaks it into a list of atoms, which is processed by `get_comment`:
  - `traverse_input_stem_lemm(+User_input, -Stemmed_lemmed)`
    - Since Slovak has many synonyms, conjugations and declensions, we first need to:
      - unify some kinds of words (`stale` -> `vzdy` {`every_time` -> `always`})
      - transform possessives (`moj` -> `tvoj` {`my` -> `your`}) so that we don't have to worry about them later when processing the input (when the user talks about something belonging to them, we want to ask about it, so we need these transformations)
      - transform conjugations (`robim` -> `robite` {`I do` -> `you do`}) for the same reasons as above
  - `get_scripts_matching_keywords(+Stemmed_input, -Scripts)`
    - In this phase we traverse the input a second time and use all the conditional transformations to find the keywords
    - "Conditional transformation": a matching concept in which we don't actually change the user input, but treat a word as if it were a similar keyword
      - for example, when the user says `I'm depressed` we should treat it like `I'm sad` because the meanings are similar, although we don't transform `depressed` -> `sad` because the meanings aren't the same
  - now there are two possibilities for how to continue, based on the output of `get_scripts_matching_keywords/2`; they are basically the same, so I'll unify their description even though they are in separate code scopes
    - `get_initial_uninformed_memory_comment(-Output, -Keyword, -Pattern_index, -Priority)`
      - gets the last entry in `memory/1` and fails if there is only an empty list in `memory/1`; thanks to this predicate we can get an answer/query to the user input even though we failed to find keywords in it (How does `memory/1` work?)
      - if `get_initial_uninformed_memory_comment/4` fails, we call `get_initial_uninformed_comment(-Output, -Keyword, -Pattern_index, -Priority)`, which stores the last used entry from the `none` script in the simple fact `memory_current_action(-Keyword, -Pattern_index, -Action_index)`; again, thanks to this predicate we can get an answer/query to the user input even though we failed to find keywords in it (How does `memory_current_action/3` work?)
    - with the aid of `find_best_from_scripts/5`, traverses all the scripts and makes use of the following predicate: `get_informed_comment(+User_input, +Script, -Action, -Keyword, -Pattern_index)`
      - this predicate uses many hidden matching gems which are described here. It tries to match the user input to any of the patterns of the `Script` and unify the result with the actions of the pattern, hence it exploits the script structure. If it succeeds, it returns the `Action` which has to be done in the script search, the `Keyword` of the matched script and the `Pattern_index` of the matched pattern.
  - after having found the answer to the user input, `get_comment/2` returns it as `Output`
- print `Output` with `comment/1`
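The two input passes described in the workflow can be sketched as follows. This is a minimal illustration, not the project's code: the rewrite table contains only the three examples from the text, and `looks_like/2`, `keyword_priority/2` and the priorities themselves are hypothetical.

```prolog
:- use_module(library(lists)).   % member/2
:- use_module(library(apply)).   % maplist/3
:- use_module(library(pairs)).   % pairs_values/2

% Pass 1: unconditional rewriting (the three examples from the text).
rewrite(stale, vzdy).      % synonym:     every time -> always
rewrite(moj,   tvoj).      % possessive:  my -> your
rewrite(robim, robite).    % conjugation: I do -> you do

% normalize_word(+Word, -New): rewrite Word if a rule exists, else keep it.
normalize_word(Word, New) :-
    (   rewrite(Word, Replacement)
    ->  New = Replacement
    ;   New = Word
    ).

normalize_input(Words, Normalized) :-
    maplist(normalize_word, Words, Normalized).

% Pass 2: conditional transformation. The input is left untouched;
% a word is merely treated as if it were a similar keyword.
looks_like(depressed, sad).

% Hypothetical keyword priorities.
keyword_priority(sad, 3).
keyword_priority(my,  2).

% candidate(+Word, -Keyword): Word itself or a keyword it resembles.
candidate(Word, Word).
candidate(Word, Keyword) :- looks_like(Word, Keyword).

% matching_keywords(+Words, -Keywords): keywords found in Words,
% highest priority first.
matching_keywords(Words, Keywords) :-
    findall(Priority-Keyword,
            ( member(Word, Words),
              candidate(Word, Keyword),
              keyword_priority(Keyword, Priority) ),
            Pairs),
    sort(0, @>=, Pairs, Sorted),
    pairs_values(Sorted, Keywords).
```

Here `normalize_input([moj, pes, stale, spi], N)` should give `[tvoj, pes, vzdy, spi]`, while `matching_keywords([my, friend, is, depressed], Ks)` should give `[sad, my]` without altering the input list.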
- `scripts(script(keyword(actual_keyword, actual_keyword_priority), list_of_patterns))`
- each `actual_keyword` is followed by the `list_of_patterns` it may appear in
- each pattern has the following structure: `pattern(matched(to_be_matched), actions(list_of_actions))`; each of these is described below
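To make the shape of a script concrete, here is a small sketch of two `scripts/1` facts and two accessors. The keywords' priorities, the responses and the representation of `to_be_matched` as a plain list are assumptions made for illustration; `script_priority/2` and `script_actions/2` are hypothetical helpers.

```prolog
:- use_module(library(lists)).   % member/2

% Illustrative scripts only: priorities and responses are made up.
scripts(script(keyword(prepac, 1),
    [ pattern(matched([prepac]),
              actions([ response([neospravedlnujte, sa]),
                        newkey ]))
    ])).
scripts(script(keyword(ospravedlnujem, 1),
    [ pattern(matched([ospravedlnujem, sa]),
              actions([ equivalence(prepac) ]))
    ])).

% script_priority(?Keyword, ?Priority): priority stored with a keyword.
script_priority(Keyword, Priority) :-
    scripts(script(keyword(Keyword, Priority), _)).

% script_actions(?Keyword, -Actions): action list of each pattern of Keyword.
script_actions(Keyword, Actions) :-
    scripts(script(keyword(Keyword, _), Patterns)),
    member(pattern(_, actions(Actions)), Patterns).
```

Keeping the keyword, its priority and its patterns in one term means a single `scripts/1` call is enough to enumerate everything needed for keyword detection.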
- the hidden gem of the implementation
- because of the structure of patterns, the matched keyword is automatically unified with the output in `actions`
- matching is done with the aid of the predicate `match(+User_input, +To_be_matched)`
  - we distinguish 3 types of words to be matched:
    - atom_is_to_be_matched -> we check all the declensions of the word from the user input and if any of them is equal to the atom, we proceed to the rest of User_input and Pattern_input, otherwise we fail
    - class_to_be_matched -> we have a list of predicates that help us match different kinds of words, which are called with `call(Predicate, Arg1, Arg2...)`:
      - synonyms (for example `sad`, `happy`...), which have the structure `synonym(Word)`; this checks whether the word from the user input is a declension of any of the synonyms and unifies it with `Word` (that means we can mention `Word` in actions and it'll be unified with user_input)
      - conjugations (for example `dream`, `remember`), which have the structure `conjugation(+Verb, +Time, +Number, -Base)`
        - succeeds if `Verb` has the defined `Time` and `Number`
    - variable_to_be_matched - same concept as the above
- therefore, when the user_input is matched by any of the patterns, the results of matching are unified with the actions of the pattern
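The idea that matching and the response share variables can be sketched as below. This is a simplified stand-in for `match/2`, not the project's implementation: it ignores declensions, represents a class as `class(Predicate, Matched)`, and lets an unbound variable absorb any run of words. `demo_match/2`, `sad_word/1`, `demo_pattern/1`, `demo_reply/2` and the response text are all hypothetical.

```prolog
:- use_module(library(lists)).   % append/3

% Hypothetical word class.
sad_word(smutna).
sad_word(nestastna).

% demo_match(+Input, +Pattern): Pattern elements are atoms,
% class(Pred, Matched) terms, or unbound variables.
demo_match([], []).
demo_match(Input, [Var|Rest]) :-
    var(Var), !,                      % variable: absorbs zero or more words
    append(Var, Tail, Input),
    demo_match(Tail, Rest).
demo_match([Word|Words], [class(Pred, Matched)|Rest]) :-
    !,                                % class: Word must satisfy Pred
    call(Pred, Word),
    Matched = Word,
    demo_match(Words, Rest).
demo_match([Word|Words], [Atom|Rest]) :-
    atom(Atom),                       % atom: must be the same word
    Word == Atom,
    demo_match(Words, Rest).

% Feeling occurs both in matched/1 and in actions/1, so a successful
% match fills in the response automatically.
demo_pattern(pattern(matched([_Before, som, class(sad_word, Feeling), _After]),
                     actions([response([preco, ste, Feeling])]))).

demo_reply(Input, Reply) :-
    demo_pattern(pattern(matched(ToMatch), actions([response(Reply)|_]))),
    demo_match(Input, ToMatch).
```

For the input `[je, to, pravda, som, nestastna]`, `demo_reply/2` should bind the reply to `[preco, ste, nestastna]`; no separate substitution step is needed because unification has already filled in the response.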
- `response(list_of_atoms)` -> `list_of_atoms` is Eliza's comment on the user input; we can conclude the search for the output
- `newkey` -> proceed to another keyword; don't look for anything in the current pattern anymore
- `equivalence(Keyword)` -> the keyword in the current pattern is equivalent to `Keyword`; proceed to this keyword when looking for Eliza's comment
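One way to read the three action types is as a small interpreter over a prioritized keyword list. The sketch below assumes that reading; `demo_actions/2`, `answer/2`, `run_action/3`, the keyword `mozno` and the response text are hypothetical and only illustrate the control flow.

```prolog
% Hypothetical keyword -> action list table.
demo_actions(ospravedlnujem, [equivalence(prepac)]).
demo_actions(prepac,         [response([neospravedlnujte, sa])]).
demo_actions(mozno,          [newkey]).

% answer(+Keywords, -Reply): run the first action of the first keyword.
answer([Keyword|Rest], Reply) :-
    demo_actions(Keyword, [Action|_]),
    run_action(Action, Rest, Reply).

% response/1 ends the search with a reply.
run_action(response(Reply), _, Reply).
% newkey abandons the current keyword and moves to the next one.
run_action(newkey, Rest, Reply) :-
    answer(Rest, Reply).
% equivalence/1 continues the search with the equivalent keyword.
run_action(equivalence(Other), Rest, Reply) :-
    answer([Other|Rest], Reply).
```

Under these assumptions `answer([ospravedlnujem], R)` and `answer([mozno, prepac], R)` should both yield `[neospravedlnujte, sa]`, while `answer([mozno], R)` fails because no keyword is left to try.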
- sometimes Eliza can't find any keyword in the `user_input`; to be able to simulate a real conversation it is good to remember some of the user's phrases
  - in particular, we remember each phrase matched with the keyword `your` and the keyword `family`
- how does "remembering" work?
  - dynamic predicate `memory(List_of_matched_responses)`
  - each time the `your` or `family` keyword is encountered, we use the predicate `get_random_memory_pattern(-Pattern, +Keyword_encountered)`, which, based on `Keyword_encountered` (which can be `your` or `family`), returns some memory pattern; this pattern is then matched with user_input and the response is appended to the end of `List_of_matched_responses` by the predicate `append_to_memory_list`
- what happens in `get_initial_uninformed_memory_comment`?
  - we access the first element in `List_of_matched_responses` and return it with the keyword `memory`
- what happens if we fail to find a keyword?
  - we remove the response from the head of memory with the aid of `remove_head_memory_list`, so that previous answers are not repeated
- `remove_head_memory_list` makes use of `retract`
- `append_to_memory_list` makes use of `asserta`
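The memory behaves like a queue stored in a single dynamic fact. A minimal sketch of that idea, using the hypothetical names `demo_memory/1`, `remember/1` and `recall/1` rather than the project's own predicates:

```prolog
:- use_module(library(lists)).   % append/3
:- dynamic demo_memory/1.

% The memory starts as an empty list.
demo_memory([]).

% remember(+Response): append Response to the end of the stored list.
remember(Response) :-
    retract(demo_memory(Old)),
    append(Old, [Response], New),
    asserta(demo_memory(New)).

% recall(-Response): take the oldest response and remove it from memory.
% Fails, leaving the memory untouched, when nothing is stored.
recall(Response) :-
    retract(demo_memory([Response|Rest])),
    asserta(demo_memory(Rest)).
```

Because `recall/1` removes the head it returns, a remembered phrase is used at most once, which matches the goal of not repeating previous answers.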
- in order not to repeat answers we have the dynamic predicate `memory_current_action(?Keyword, ?Pattern_index, ?Action_index)`
- each time some action is selected as the answer for `user_input`, we use the predicate `assert_next_action`, which increments `Action_index` modulo `number_of_actions`
- again we make use of `retract` and `asserta`
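The rotation through a pattern's actions can be sketched as a counter kept in a dynamic fact. The names `demo_current_action/3` and `next_action_index/4` are hypothetical, and starting at index 0 is an assumption.

```prolog
:- dynamic demo_current_action/3.

% next_action_index(+Keyword, +Pattern, +Count, -Index):
% Index is the action to use now; the stored index is then advanced
% modulo Count so that the next call picks the following action.
next_action_index(Keyword, Pattern, Count, Index) :-
    (   retract(demo_current_action(Keyword, Pattern, Index))
    ->  true
    ;   Index = 0                      % first use of this pattern
    ),
    Next is (Index + 1) mod Count,
    asserta(demo_current_action(Keyword, Pattern, Next)).
```

For a pattern with three actions, four successive calls should return the indices 0, 1, 2 and 0 again.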
- it's considerably harder to detect keywords in Slovak than in English, mainly because of the declension of nouns and the conjugation of verbs
The scripts below are examples of transformations applied to input text.
During the implementation of Eliza I tried 2 basic approaches to detecting the word 'sorry'. Since there are many ways of saying 'sorry' in Slovak, I definitely needed to transform its synonyms into 'sorry', e.g. (`I'm sorry = I apologize`) <=> (`prepáč = ospravedlňujem sa`).
- First approach:
  - simple stemming: `prepac = prepac, prepacte = prepac`
  - simple lemmatization: `ospravedlnujem = prepac, ospravedlnte = prepac, osprave... = prepac`
  - The keyword 'sorry' doesn't have a high priority. When the user says `my sister apologized for her actions`, the keyword isn't `(apologized = sorry)` but `my`. Because of lemmatization, Eliza would obtain the sentence `my sister sorry for her actions`, which is obvious nonsense.
- Second approach:
  - no stemming and no lemmatization in preprocessing
  - lemmatization and stemming only in the keyword_matching phase
  - Eliza obtains the sentence in its original form, so she can infer the transformation in the answering_phase. There are many problems in conjugation and declension transformations, but not in the 'sorry' script.
My implementation of Eliza is hugely inspired by the work of @bartosz-witkowski. In his implementation he introduced `class(family)` and worked with it in the `your` script. This approach might work well in English, but it's inconvenient in Slovak.
Problems :
- when an English speaker talks about a member of their family, they say `my family member`. In Slovak there isn't any such convention, so a Slovak speaker talks about `sister`, meaning `my sister`. Therefore I've introduced the `family script`, called after detection of any family member.
- in Slovak there are 3 grammatical genders: feminine, masculine and neuter
  - this means that we use different declensions of adjectives with words (`father`, `brother`) and with words (`mother`, `sister`)
Approaches :
- `class(family)` matched in the `your` script
  - it's affected by all the problems described above
- `class(family)` and a separate script for each of its members, implemented as `scripts(Script) :- family(Member), Script=script(keyword(Member, 2),...)`
  - While I've resolved the problem with `your` in front of `family member`, another problem emerged: the `findall` predicate has to go through many more scripts in the `keyword_detection phase`
  - This doesn't handle any declensions at all.
  - Introduce a `family` script and handle the `keyword_detection` phase similarly to the `sorry_script`
- `family` keyword with `family_masculine` and `family_feminine` declension handling; this means that there are 2 possible `adjective declensions`. Example of communication:
  - `> otec nevie programovat` (my father can't code)
  - `Vas otec?` (your father? <= in masculine)
  - `> ani mama nevie programovat` (my mother neither)
  - `Vasa mama ?` (your mother? <= in feminine)
  - recognition of grammatical genders in grammatical cases other than nominative
- introduction of a check of grammatical cases in `eliza_language_utils.pl`; after detection of any `family member` we also detect its grammatical case with the aid of `gram_case_masculine_sg` and `gram_case_feminine_sg`
- introduction of patterns handling matched grammatical cases through `class(family_feminine, Word, Case)` and `class(family_masculine, Word, Case)`
- many times we need to force the presence of another word after the keyword -> introduction of `class(atom, X)` to be able to query appropriately
- This is the final approach, with the following results:
  - `> sestre nedochadza, ze to nie je pravda` (my sister doesn't realize that it isn't true)
  - `Je este niekto vo vasej rodine komu nedochadza, ze to nie je pravda?` (is there anybody else in your family who doesn't realize that it isn't true?)
  - `> s mojou mamkou sme sa dneska ucili molove stupnice` (today we were practicing minor scales with my mother)
  - `Rozpravate sa casto s mamkou?` (do you often speak with your mother?)
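Gender and case detection for family members can be sketched with a small lookup table. The table below is an assumption built only from word forms that occur in the examples; `family_form/3`, `possessive/2`, `family_member_in/4` and `echo_family/2` are hypothetical and much simpler than `gram_case_masculine_sg` and `gram_case_feminine_sg`.

```prolog
:- use_module(library(lists)).   % member/2

% family_form(?Word, ?Gender, ?Case): a few forms taken from the examples.
family_form(otec,   masculine, nominative).
family_form(mama,   feminine,  nominative).
family_form(sestre, feminine,  dative).
family_form(mamkou, feminine,  instrumental).

% Polite possessive in the nominative, by gender.
possessive(masculine, vas).
possessive(feminine,  vasa).

% family_member_in(+Words, -Word, -Gender, -Case): find a family member
% in the input together with its gender and grammatical case.
family_member_in(Words, Word, Gender, Case) :-
    member(Word, Words),
    family_form(Word, Gender, Case).

% echo_family(+Words, -Reply): echo a nominative family member with
% the possessive that agrees with its gender.
echo_family(Words, [Possessive, Word]) :-
    family_member_in(Words, Word, Gender, nominative),
    possessive(Gender, Possessive),
    !.
```

With this table, `echo_family([otec, nevie, programovat], R)` should give `[vas, otec]` and `echo_family([ani, mama, nevie, programovat], R)` should give `[vasa, mama]`; forms such as `sestre` are recognized by `family_member_in/4` but need case-specific patterns to be answered.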
- predicate `conjugation(Verb, Number, Time, Base)`, which, based on `Verb`, finds `Number` (`sg`, `pl`) and `Time` (`now`, `past`, `neuter`)
- basic transformations
  - `sg1` -> `pl2` (when the user speaks about themselves, we want to ask them questions, and we use the polite form, which is why we use `pl2` instead of `sg2`)
  - `pl2` -> `sg1` | `sg2` -> `pl1` (same reasons as above)
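The person swap can be sketched on top of a conjugation table. The table here is hypothetical, covers only the `robim`/`robite` example from the text, and uses person codes such as `sg1` and `pl2` in the second argument; only the `sg1` <-> `pl2` swaps are shown.

```prolog
% demo_conjugation(?Verb, ?Person, ?Time, ?Base): hypothetical table.
demo_conjugation(robim,  sg1, now, robit).   % I do
demo_conjugation(robite, pl2, now, robit).   % you do (polite form)

% swap_person(?From, ?To): person transformations between user and Eliza.
swap_person(sg1, pl2).
swap_person(pl2, sg1).

% swap_verb(+Verb, -Swapped): same base verb and tense, swapped person.
swap_verb(Verb, Swapped) :-
    demo_conjugation(Verb, Person, Time, Base),
    swap_person(Person, Target),
    demo_conjugation(Swapped, Target, Time, Base).
```

With these facts `swap_verb(robim, V)` should bind `V` to `robite`, and `swap_verb(robite, V)` should bind it back to `robim`.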