Human players prefer training with human opponents over agents, as the latter are distinctly different from humans in level and style.
Agents designed for human-agent play are capable of adjusting their level; however, their style is not aligned with that of human players.
In this work, we implement an approach for designing agents whose playing style is closer to that of human players.

Basic Agent

Our agents (agent1 folder) are based on


Read the documentation of a significant part of our work here: zeroHex_doc.


We found lists of Elo-ranked players and their games (board size 11x11) at

Original Format

The games were originally stored in the following format:

(;FF[4]EV[]PB[Maciej Celuch]PW[pensando]SZ[13]RE[B]GC[ game #1254910]SO[];W[mf];B[gg];W[de];B[df];W[ji];B[if];W[ld];B[jg];W[jf];B[ig];W[kg];B[kc];W[mb];B[ma];W[lb];B[la];W[kb];B[ka];W[jb];B[ja];W[hc];B[hb];W[fd];B[fc];W[ib];B[ia];W[gc];B[gb];W[cd];B[da];W[eb];B[dd];W[ce];B[dc];W[cc];B[db];W[bb];B[le];W[kf];B[ke];W[je];B[il];W[jj];B[kk];W[hk];B[resign])

These records needed to be parsed and rearranged.
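As a rough illustration of that parsing step, the sketch below splits one record of the format shown above into its header properties and its move list. It is not the converter shipped with this project; `parse_game`, `MOVE_RE`, `PROP_RE` and `SAMPLE` are hypothetical names. Each two-letter move is turned into a pair of zero-based indices in the order the letters appear, and which of the two is the row or the column is left open here.

```python
import re

# Matches one move node such as ";W[mf]" or ";B[resign]".
MOVE_RE = re.compile(r";([BW])\[([a-z]+)\]")
# Matches a header property such as "PB[Maciej Celuch]" or "SZ[13]".
PROP_RE = re.compile(r"([A-Z]+)\[([^\]]*)\]")


def parse_game(record):
    """Return (header properties, move list) for one game record."""
    # Everything before the first move node is the header.
    header, _, _ = record.lstrip("(;").partition(";")
    props = dict(PROP_RE.findall(header))
    moves = []
    for color, value in MOVE_RE.findall(record):
        if len(value) == 2:
            # Two letters: convert each to a zero-based index ('a' -> 0).
            first = ord(value[0]) - ord("a")
            second = ord(value[1]) - ord("a")
            moves.append((color, (first, second)))
        else:
            # Anything else (for example "resign") is kept as a word.
            moves.append((color, value))
    return props, moves


# A shortened version of the record shown above.
SAMPLE = "(;FF[4]EV[]PB[Maciej Celuch]PW[pensando]SZ[13]RE[B];W[mf];B[gg];B[resign])"
props, moves = parse_game(SAMPLE)
```

Keeping non-coordinate values as plain words means special entries are not silently turned into board cells.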

More Sources

Also check the site There is potential to mine more games from it.

Our Datasets

We produced, among others, the following datasets:

  • Files of all the players according to their rank (for instance, the file 1234.txt belongs to a player with rank 1234). The mapping from players to ranks is not one-to-one, so multiple players can share the same rank.
  • Files of all the players according to their rank, with the name of the player on the first line.
  • The files used to train the humanized agents, separated.
  • The original batch of games from littlegolem.
  • The records of the games between our agent and wolve.

Work Components

Our work contains, among others, the following components:

  • The code of the modified and the original agents. It contains the original files to create and train the original agent.
    The file can be run by model1 model2 to compare the models.
    The file can be run by to train the agent to fit the rank of a batch of games.
  • Inside the agent folder there are also files that are used to run our agents against the agent wolve.
    It can be run by Modify the hard-coded paths inside.
    Some integration details (commands, parameters and more) can be found inside
  • The convertor contains scripts to handle and arrange the original game records.
  • The evaluation tool can be used to evaluate the accuracy of predicting the next move given a board state.
    It can be run by Notice that the paths and games are hard-coded inside (and can easily be modified).
  • The GUI can be used by a human to play against the agents. The original application had some bugs, which we fixed in this version.
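To make the accuracy measure mentioned for the evaluation tool concrete, here is a minimal sketch of top-1 next-move accuracy. It is an assumption about how such a measure can be computed, not the code of the evaluation tool itself; `next_move_accuracy`, `positions` and `predict` are hypothetical names, and `predict` stands in for whatever agent is being evaluated.

```python
def next_move_accuracy(positions, predict):
    """Fraction of positions where the predicted move equals the played move.

    positions: iterable of (board, played_move) pairs taken from game records.
    predict:   hypothetical callable that maps a board to a single move.
    """
    total = 0
    correct = 0
    for board, played in positions:
        total += 1
        if predict(board) == played:
            correct += 1
    # Avoid division by zero when no positions are given.
    return correct / total if total else 0.0
```

A score of 1.0 would mean the agent always picks the move the human actually played; comparing this score between the original and the humanized agents is one way to quantify how human-like each one is.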


Find more agents to test against here


Playing against the agent can be done by using

About Us

This work was done by Avshalom, Ori and Shlomo as an undergraduate final project at Bar Ilan University.

Some Theory

At the beginning of the work, we needed to catch up on some theoretical material.


Note that the focus in humanizing the agent was on adjusting the search tree's parameters and not the neural network's.

Reinforcement Learning

Reinforcement learning differs from supervised learning in that, in supervised learning, the training data comes with an answer key, so the model is trained on the correct answers themselves. In reinforcement learning there is no answer key; the agent itself decides what to do to perform the given task. In the absence of a training dataset, it has to learn from its own experience.


Building the MCTS (Monte Carlo tree search) search tree consists of four stages:

1. Selection

Used for nodes we've seen before.
Pick according to UCB.

2. Expansion

Used when we reach the frontier.
Add one node per playout.

3. Simulation

Used beyond the search frontier.
Don't bother with UCB, just play randomly.

4. Backpropagation

After reaching a terminal node.
Update the value and visit counts of the states traversed during selection and expansion.
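
The four stages above can be sketched in a few lines of code. The example below is a toy illustration, not the search used by our agents: instead of Hex it plays a tiny take-away game (a pile of stones, each player removes 1 or 2, whoever takes the last stone wins), it uses plain random playouts, and the names `Node`, `legal_moves`, `ucb` and `mcts` as well as the exploration constant `c=1.4` are assumptions made for the sketch.

```python
import math
import random


def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.player = player      # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move          # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node


def ucb(child, parent_visits, c=1.4):
    # UCB1: average reward plus an exploration bonus.
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.wins / child.visits + bonus


def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add one new node per playout.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: play randomly until the pile is empty.
        stones, player = node.stones, node.player
        while stones > 0:
            stones -= rng.choice(legal_moves(stones))
            player = 1 - player
        winner = 1 - player       # the player who took the last stone
        # 4. Backpropagation: update visits and wins along the path.
        while node is not None:
            node.visits += 1
            if winner != node.player:
                node.wins += 1
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


best_move = mcts(4)
```

In this toy game a pile that is a multiple of 3 is lost for the player to move, so from 4 stones the strong move is to take 1. Parameters such as `c` and the number of iterations are the kind of search-tree settings that can be tuned without touching any learned model.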

Read & Watch more and more at