# Model-based algorithms
## Policy iteration
Number of policy iterations needed: 4
Lake:
[['&' '.' '.' '.']
['.' '#' '.' '#']
['.' '.' '.' '#']
['#' '.' '.' '$']]
Policy:
[['_' '>' '_' '<']
['_' '^' '_' '^']
['>' '_' '_' '^']
['^' '>' '>' '^']]
Value:
[[0.455 0.504 0.579 0.505]
[0.508 0. 0.653 0. ]
[0.584 0.672 0.768 0. ]
[0. 0.771 0.887 1. ]]
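To make the policy-iteration run above concrete, here is a minimal tabular sketch. It assumes a model stored as `P[s][a]`, a list of `(probability, next_state, reward)` triples. That layout, the function names, and the three-state `CHAIN_MDP` used to exercise it are hypothetical and are not the code that produced this output. The loop alternates full policy evaluation with greedy improvement and stops once the policy no longer changes. The returned count includes the final pass that confirms the policy is stable, and the original program may count iterations differently.

```python
# Minimal tabular policy iteration (a sketch, not the code that produced the output above).
# Hypothetical model layout: P[s][a] is a list of (probability, next_state, reward) triples.


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s and then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def greedy_action(P, V, s, gamma):
    """Action maximizing q_value; max() keeps the lowest action index on ties."""
    return max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))


def policy_evaluation(P, policy, gamma, theta=1e-8):
    """In-place iterative evaluation of a deterministic policy until changes fall below theta."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V


def policy_iteration(P, gamma, theta=1e-8, max_iterations=100):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    for iteration in range(1, max_iterations + 1):
        V = policy_evaluation(P, policy, gamma, theta)
        improved = [greedy_action(P, V, s, gamma) for s in range(len(P))]
        if improved == policy:
            return policy, V, iteration
        policy = improved
    return policy, V, max_iterations


# Hypothetical 3-state chain: action 0 = left, action 1 = right; state 2 is an absorbing goal.
CHAIN_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0
    [[(1.0, 0, 0.0)], [(1.0, 2, 1.0)]],  # state 1: moving right reaches the goal, reward 1
    [[(1.0, 2, 0.0)], [(1.0, 2, 0.0)]],  # state 2: absorbing, no further reward
]

if __name__ == "__main__":
    policy, V, n = policy_iteration(CHAIN_MDP, gamma=0.9)
    print(policy, [round(v, 3) for v in V], n)  # [1, 1, 0] [0.9, 1.0, 0.0] 3
```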
## Value iteration
Number of value iterations needed: 11
Lake:
[['&' '.' '.' '.']
['.' '#' '.' '#']
['.' '.' '.' '#']
['#' '.' '.' '$']]
Policy:
[['_' '>' '_' '<']
['_' '^' '_' '^']
['>' '_' '_' '^']
['^' '>' '>' '^']]
Value:
[[0.455 0.504 0.579 0.505]
[0.508 0. 0.653 0. ]
[0.584 0.672 0.768 0. ]
[0. 0.771 0.887 1. ]]
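Value iteration reaches the same policy and values as policy iteration here (the two grids above agree to three decimals), since both compute the optimal value function. A comparable sketch, under the same hypothetical `P[s][a]` layout and toy chain, applies the Bellman optimality backup in place until the largest change drops below `theta`, then reads off the greedy policy. Names and the example MDP are illustrative assumptions.

```python
# Minimal tabular value iteration (a sketch, not the code that produced the output above).
# Hypothetical model layout: P[s][a] is a list of (probability, next_state, reward) triples.


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s and then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def greedy_action(P, V, s, gamma):
    """Action maximizing q_value; max() keeps the lowest action index on ties."""
    return max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))


def value_iteration(P, gamma, theta=1e-8, max_iterations=1000):
    """In-place Bellman optimality backups until the largest change is below theta."""
    V = [0.0] * len(P)
    for iteration in range(1, max_iterations + 1):
        delta = 0.0
        for s in range(len(P)):
            v_new = max(q_value(P, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            break
    # Extract the greedy policy from the converged values.
    policy = [greedy_action(P, V, s, gamma) for s in range(len(P))]
    return policy, V, iteration


# Hypothetical 3-state chain: action 0 = left, action 1 = right; state 2 is an absorbing goal.
CHAIN_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0
    [[(1.0, 0, 0.0)], [(1.0, 2, 1.0)]],  # state 1: moving right reaches the goal, reward 1
    [[(1.0, 2, 0.0)], [(1.0, 2, 0.0)]],  # state 2: absorbing, no further reward
]

if __name__ == "__main__":
    policy, V, n = value_iteration(CHAIN_MDP, gamma=0.9)
    print(policy, [round(v, 3) for v in V], n)  # [1, 1, 0] [0.9, 1.0, 0.0] 3
```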
# Model-free algorithms
## Sarsa
Lake:
[['&' '.' '.' '.']
['.' '#' '.' '#']
['.' '.' '.' '#']
['#' '.' '.' '$']]
Policy:
[['_' '<' '_' '<']
['_' '^' '_' '^']
['>' '>' '_' '^']
['^' '>' '>' '>']]
Value:
[[0.039 0.025 0.181 0. ]
[0.044 0. 0.403 0. ]
[0.258 0.322 0.663 0. ]
[0. 0.705 0.816 1. ]]
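Sarsa is on-policy temporal-difference control: its target uses the action the epsilon-greedy agent actually takes next. The sketch below assumes a hypothetical environment whose `reset()` returns a state and whose `step(a)` returns `(state, reward, done)`, with a constant step size and exploration rate. `ChainEnv`, `epsilon_greedy`, and every hyperparameter value are illustrative assumptions, not the settings behind the output above.

```python
# Minimal tabular Sarsa with epsilon-greedy exploration (a sketch under stated assumptions).
import random


class ChainEnv:
    """Hypothetical 3-state chain: action 0 = left, 1 = right; reaching state 2 pays 1 and ends."""

    n_states, n_actions = 3, 2

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == 2
        return self.state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else a greedy action (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def sarsa(env, episodes, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """On-policy TD control: bootstrap from Q(s', a') for the next action actually chosen."""
    rng = random.Random(seed)
    Q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q[s], epsilon, rng)
        for _ in range(max_steps):
            s2, r, done = env.step(a)
            if done:
                Q[s][a] += alpha * (r - Q[s][a])  # terminal transition: no bootstrap term
                break
            a2 = epsilon_greedy(Q[s2], epsilon, rng)
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q


if __name__ == "__main__":
    Q = sarsa(ChainEnv(), episodes=500)
    print([[round(q, 3) for q in row] for row in Q])
```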
## Q-learning
Lake:
[['&' '.' '.' '.']
['.' '#' '.' '#']
['.' '.' '.' '#']
['#' '.' '.' '$']]
Policy:
[['>' '>' '_' '<']
['^' '^' '_' '^']
['>' '>' '_' '^']
['^' '>' '>' '^']]
Value:
[[0.447 0.504 0.571 0.502]
[0.369 0. 0.638 0. ]
[0.455 0.652 0.777 0. ]
[0. 0.769 0.892 1. ]]
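Q-learning differs from Sarsa only in its target, which bootstraps from the greedy value max_a Q(s', a) regardless of which action is taken next; this makes it off-policy. The sketch reuses the same hypothetical `reset()`/`step(a)` interface and toy chain; `ChainEnv`, `epsilon_greedy`, and the hyperparameters are assumptions for illustration only.

```python
# Minimal tabular Q-learning with epsilon-greedy exploration (a sketch under stated assumptions).
import random


class ChainEnv:
    """Hypothetical 3-state chain: action 0 = left, 1 = right; reaching state 2 pays 1 and ends."""

    n_states, n_actions = 3, 2

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == 2
        return self.state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else a greedy action (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning(env, episodes, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Off-policy TD control: bootstrap from max_a Q(s', a) whatever the agent does next."""
    rng = random.Random(seed)
    Q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = epsilon_greedy(Q[s], epsilon, rng)
            s2, r, done = env.step(a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    return Q


if __name__ == "__main__":
    Q = q_learning(ChainEnv(), episodes=500)
    print([[round(q, 3) for q in row] for row in Q])
```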
## Linear Sarsa
Lake:
[['&' '.' '.' '.']
['.' '#' '.' '#']
['.' '.' '.' '#']
['#' '.' '.' '$']]
Policy:
[['_' '<' '_' '>']
['_' '^' '_' '^']
['>' '_' '_' '^']
['^' '>' '>' '<']]
Value:
[[0.423 0.34 0.295 0.082]
[0.485 0. 0.602 0. ]
[0.554 0.634 0.752 0. ]
[0. 0.777 0.885 1. ]]
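Linear Sarsa replaces the table with q(s, a) = θ · φ(s, a) and updates θ by a semi-gradient TD step, θ ← θ + α δ φ(s, a), where δ is the Sarsa TD error. The sketch assumes one-hot (state, action) features, under which the linear model is exactly equivalent to a table, so it mainly illustrates the form of the update. The feature map, `ChainEnv`, helper names, and hyperparameters are hypothetical.

```python
# Minimal linear Sarsa with semi-gradient TD updates (a sketch under stated assumptions).
import random


class ChainEnv:
    """Hypothetical 3-state chain: action 0 = left, 1 = right; reaching state 2 pays 1 and ends."""

    n_states, n_actions = 3, 2

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == 2
        return self.state, (1.0 if done else 0.0), done


def one_hot_features(n_states, n_actions, s):
    """phi(s, a) for every action a: one indicator per (state, action) pair (an assumed choice)."""
    feats = []
    for a in range(n_actions):
        phi = [0.0] * (n_states * n_actions)
        phi[s * n_actions + a] = 1.0
        feats.append(phi)
    return feats


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def q_values(theta, feats):
    """Linear action values theta . phi(s, a) for each action's feature vector."""
    return [dot(theta, phi) for phi in feats]


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else a greedy action (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def linear_sarsa(env, episodes, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Sarsa with a linear action-value function."""
    rng = random.Random(seed)
    theta = [0.0] * (env.n_states * env.n_actions)
    for _ in range(episodes):
        feats = one_hot_features(env.n_states, env.n_actions, env.reset())
        a = epsilon_greedy(q_values(theta, feats), epsilon, rng)
        for _ in range(max_steps):
            s2, r, done = env.step(a)
            if done:
                delta = r - dot(theta, feats[a])  # terminal transition: no bootstrap term
            else:
                feats2 = one_hot_features(env.n_states, env.n_actions, s2)
                a2 = epsilon_greedy(q_values(theta, feats2), epsilon, rng)
                delta = r + gamma * dot(theta, feats2[a2]) - dot(theta, feats[a])
            # The gradient of theta . phi(s, a) with respect to theta is phi(s, a).
            theta = [t + alpha * delta * f for t, f in zip(theta, feats[a])]
            if done:
                break
            feats, a = feats2, a2
    return theta


if __name__ == "__main__":
    theta = linear_sarsa(ChainEnv(), episodes=500)
    print([round(t, 3) for t in theta])
```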
## Linear Q-learning
Lake:
[['&' '.' '.' '.']
['.' '#' '.' '#']
['.' '.' '.' '#']
['#' '.' '.' '$']]
Policy:
[['_' '<' '_' '<']
['_' '^' '_' '^']
['>' '>' '_' '^']
['^' '>' '>' '<']]
Value:
[[0.464 0.407 0.44 0.415]
[0.513 0. 0.569 0. ]
[0.571 0.668 0.768 0. ]
[0. 0.786 0.888 1. ]]
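Linear Q-learning keeps the same linear parameterization but uses the Q-learning target r + γ max_a θ · φ(s', a), or just r on a terminal transition. A sketch with the same hypothetical one-hot features and toy chain; as before, the feature map, `ChainEnv`, helper names, and hyperparameters are assumptions, not the configuration behind the output above.

```python
# Minimal linear Q-learning with semi-gradient TD updates (a sketch under stated assumptions).
import random


class ChainEnv:
    """Hypothetical 3-state chain: action 0 = left, 1 = right; reaching state 2 pays 1 and ends."""

    n_states, n_actions = 3, 2

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == 2
        return self.state, (1.0 if done else 0.0), done


def one_hot_features(n_states, n_actions, s):
    """phi(s, a) for every action a: one indicator per (state, action) pair (an assumed choice)."""
    feats = []
    for a in range(n_actions):
        phi = [0.0] * (n_states * n_actions)
        phi[s * n_actions + a] = 1.0
        feats.append(phi)
    return feats


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def q_values(theta, feats):
    """Linear action values theta . phi(s, a) for each action's feature vector."""
    return [dot(theta, phi) for phi in feats]


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else a greedy action (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def linear_q_learning(env, episodes, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Q-learning with a linear action-value function."""
    rng = random.Random(seed)
    theta = [0.0] * (env.n_states * env.n_actions)
    for _ in range(episodes):
        feats = one_hot_features(env.n_states, env.n_actions, env.reset())
        for _ in range(max_steps):
            a = epsilon_greedy(q_values(theta, feats), epsilon, rng)
            s2, r, done = env.step(a)
            feats2 = one_hot_features(env.n_states, env.n_actions, s2)
            target = r if done else r + gamma * max(q_values(theta, feats2))
            delta = target - dot(theta, feats[a])
            # Semi-gradient step along phi(s, a), the gradient of theta . phi(s, a).
            theta = [t + alpha * delta * f for t, f in zip(theta, feats[a])]
            if done:
                break
            feats = feats2
    return theta


if __name__ == "__main__":
    theta = linear_q_learning(ChainEnv(), episodes=500)
    print([round(t, 3) for t in theta])
```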