{smcl}
{* 19feb2013}{...}
{cmd:help amcmc}
{hline}
{title:Title}
{p 4 4 2}
{bf:amcmc -- Mata Functions and Structures for Adaptive Markov Chain Monte Carlo sampling from distributions}
{title:Introduction}
{p 4 4 2}
{bf:amcmc} refers to a collection of Mata tools for performing adaptive Markov chain Monte Carlo (MCMC) sampling from distributions. Algorithms are
described in Baker (2013) and rely on Markov chain Monte Carlo sampling using a continually adapted multivariate normal proposal
distribution. A user can perform adaptive MCMC either with the function {bf:amcmc()} or by setting up an adaptive MCMC
problem with the suite of commands {bf:amcmc_*()}. The function {bf:amcmc()} is described below under the heading
of {bf:{help mf_amcmc##amcmc_fun: the function amcmc()}}. Setting up a structured object for performing {bf:amcmc_*()}
is discussed under the heading {bf: {help mf_amcmc##amcmc_struct:using amcmc_*() commands}}.
{marker amcmc_fun}{...}
{title:The function amcmc()}
{title:Syntax}
{p 8 40 2}
{it:real matrix}{bind: }
{cmd:amcmc(}{it:alginfo}{cmd:,}
{it:lnf}{cmd:,}
{it:xinit}{cmd:,}
{it:Vinit}{cmd:,}
{it:draws}{cmd:,}
{it:burn}{cmd:,}
{it:damper}{cmd:,}
{it:aopt}{cmd:,}
{it:arate}{cmd:,}
{it:vals}{cmd:,}
{it:lam}{cmd:,}
{it:blocks} [{cmd:,}
{it:M}{cmd:,}
{it:noisy}]{cmd:)}
{p 4 4 2}
where inputs are
{p2colset 9 33 35 2}{...}
{p2col 7 33 35 2: {it:alginfo}: {it:string rowvector}}{p_end}
{p2col 11 33 35 2: {it:lnf}: {it:pointer (real scalar function) scalar f}}{p_end}
{p2col 9 33 35 2: {it:xinit}: {it:real colvector}}{p_end}
{p2col 9 33 35 2: {it:Vinit}: {it:real matrix}}{p_end}
{p2col 9 33 35 2: {it:draws}: {it:real scalar}}{p_end}
{p2col 10 33 35 2: {it:burn}: {it:real scalar}}{p_end}
{p2col 8 33 35 2: {it:damper}: {it:real scalar}}{p_end}
{p2col 10 33 35 2: {it:aopt}: {it:real scalar}}{p_end}
{p2col 9 33 35 2: {it:arate}: {it:transmorphic}}{p_end}
{p2col 10 33 35 2: {it:vals}: {it:transmorphic}}{p_end}
{p2col 11 33 35 2: {it:lam}: {it:transmorphic}}{p_end}
{p2col 8 33 35 2: {it:blocks}: {it:real matrix}}{p_end}
{p2col 13 33 35 2: {it:M}: {it:transmorphic}}{p_end}
{p2col 9 33 35 2: {it:noisy}: {it:string scalar}}{p_end}
{title:Description}
{p 4 4 2}
{bf:amcmc()} performs adaptive MCMC sampling using a multivariate normal proposal distribution. MCMC methods work through
acceptance-rejection sampling; an MCMC algorithm with adaptation is continually
tuned as the algorithm runs so as to achieve a targeted acceptance rate. The degree of tuning
recedes as the algorithm proceeds so as
to achieve an eventually stationary proposal distribution. For a full description of the Mata implementation, see Baker (2013),
which follows the more detailed descriptions of adaptive MCMC algorithms in Andrieu and Thoms (2008).
The draws from the target distribution are returned as a matrix, with each row representing a draw.
{p 4 4 2}
Discussion of {it:alginfo}, the first argument, is postponed until other arguments have been discussed.
{p 4 4 2}
The (pointer) function {it:lnf} specifies the target distribution from which the user wishes to draw. The function describing the target
distribution must be written in log form, so that the (typically, but not always scalar) log-value of the distribution is returned by the function.
{p 4 4 2}
The initial values used in drawing are specified in the argument {it:xinit}, and an initial covariance matrix for the
proposal distribution is specified in {it:Vinit}. {it:draws} tells
the algorithm how many draws to perform, while {it:burn} instructs {bf:amcmc()} to drop the first {it:burn} draws as a burn-in period;
accordingly,
{bf:amcmc()} returns only the last {it:draws-burn} draws from the distribution {it:lnf}.
{p 4 4 2}
{it:damper} is an adjustment parameter telling the algorithm how aggressively or conservatively to adapt the proposal distribution to achieve the
acceptance rate specified by the user in {it:aopt}. {it:damper} should lie between zero and one, with values closer to zero corresponding
to slower adaptation, and values closer to one corresponding to more rapid adaptation to the proposal history.
{it:aopt} is the acceptance rate desired by the user, typically in the range .2 to .44:
optimal acceptance rates for univariate problems have been shown to be around .44, while for large-dimensional problems
optimal rates are around .234. See Andrieu and Thoms (2008) for further discussion.
{p 4 4 2}
{it:arate}, {it:vals}, and {it:lam} are arguments that can be initialized as missing or as anything else by the user, as they are overwritten.
{it:arate} is overwritten with acceptance rates of the algorithm.
{it:vals} is overwritten with the values of the target distribution corresponding with each draw.
{it:lam} is a set of scaling parameters which are tuned as the algorithm proceeds. These parameters
scale the proposal covariance matrix to direct the algorithm towards the desired acceptance rate, with
aggressiveness captured by {it:damper}. In
global, all-at-once sampling, {it:lam} is returned as a scalar,
while if Metropolis-within-Gibbs sampling is specified (so that each component of the
target distribution is drawn sequentially), {it:lam} returns a vector
of lambda values equal in dimension to the target distribution. Finally, if block sampling is specified, {it:lam} returns a vector of lambda values
equal in dimension to the number of blocks. For further description of how different types are specified, see the description of {it:alginfo} below
and the examples. In block sampling or in Metropolis-within-Gibbs sampling, the dimension of {it:arate} matches that of the sampler.
{p 4 4 2}
{it:blocks} is a matrix of zeros and ones which describes how the algorithm is to proceed if the user wishes to draw from the target distribution
not all at once but in a sequence of (Gibbs) steps. Each row of {it:blocks} marks a group of values to be drawn together with ones, with zeros
elsewhere.
{p 4 4 2}
{it:M} can be used to relay additional information to the algorithm. In cases in which the user has assembled a
model statement using {bf:{help mf_moptimize:moptimize()}} or {bf:{help mf_optimize:optimize()}}, this model can be passed to {bf: amcmc()} using {it:M}. The idea is to
save the user the need to respecify things such as missing values, arguments of the function embedded in the model, etc., when switching to {bf:amcmc()}.
In cases in which the user does not have a model, but has a function requiring additional arguments, {it:M} can be a pointer holding additional arguments
of the target distribution; up to ten additional arguments can be passed to {bf:amcmc()} in this fashion. For example, if the target distribution is characterized by
some function {it:lnf(x,Y,Z)} where {it:x} are to be drawn, but {it:Y} and
{it:Z} are also arguments, the user would define a pointer {it:M} containing {it:Y} and {it:Z}.
{it:M} is optional and does not require specification.
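{pstd}
As a quick sketch of this mechanism - here the function {it:lnf3()} and the matrices {it:Y} and {it:Z} are hypothetical stand-ins for the user's own
target distribution and its extra arguments - the pointer {it:M} might be built as follows:
{com}: Y=rnormal(20,2,0,1)
{res}
{com}: Z=rnormal(20,1,0,1)
{res}
{com}: M=J(2,1,NULL)
{res}
{com}: M[1]=&Y
{res}
{com}: M[2]=&Z
{res}
{com}: // passing M as the optional argument makes amcmc() call lnf3(x,Y,Z) on each evaluation
{res}
{com}: X=amcmc(("standalone","global"),&lnf3(),J(1,2,0),I(2),1000,100,2/3,.34,arate=.,vals=.,lam=.,.,M)
{res}{txt}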
{p 4 4 2}
{it:noisy} is a string scalar that can be specified as {bf:"noisy"} or as something else. If {it:noisy=}{bf:"noisy"}, the algorithm produces
feedback: each time the target distribution is evaluated, a {bf:.} is printed, and every 50 calls the value of the target
distribution at the last call of {it:lnf} is also printed.
{p 4 4 2}
Last but not least, the first argument of the function is {it:alginfo}, which is a string rowvector specifying the drawing
scheme desired by the user, what sort of target distribution
evaluator the user has passed to the function when the target is part of a previously specified model statement,
and perhaps some other things about how the algorithm is to be executed. While the examples present
more details, a user may assemble a string row vector composed of one entry from each of the following:
{col 12}{it:Sampling information}{...}
{col 36}{bf:mwg, global, block}
{col 12}{it:Model definition}{...}
{col 36}{bf:moptimize, optimize, standalone}
{col 12}{it:Model evaluator type}{...}
{col 36}{bf:d*, q*, e*, g*, v*}
{col 12}{it:Other information}{...}
{col 36}{bf:fast}
{p 4 4 2}
Thus, if the user wishes to perform Metropolis-within-Gibbs sampling (each component of {it:lnf} sampled alone and in order), and has previously
modeled the target distribution as part of a structure using {bf:{help mf_moptimize:moptimize()}} and a type {bf:d0} evaluator,
the user would specify
(in any order) {it:alginfo=}{bf:"moptimize","d0","mwg"}. Note that each component of {it:alginfo} should be a separate
string entry in a string row vector, so {it:alginfo=}{bf:"moptimize,d0,mwg"} will not work. The final option,
{bf:"fast"}, is somewhat experimental
and should be used with caution. In large problems, global, all-at-once samplers require Cholesky decomposition of the proposal covariance matrix.
This can be slow and time consuming. The option {bf:"fast"} avoids Cholesky decomposition by working with a diagonal proposal covariance matrix that
is continually adjusted as the algorithm proceeds. {bf:"fast"} sometimes works if the target distribution has many independent random variates and
if the proposal distribution is close to the target distribution. See the examples for other possibilities.
{p 4 4 2}
If the user wishes to use a block sampler (which is somewhere between a one-at-a-time Metropolis-within-Gibbs sampler and a global,
all-at-once sampler), the user must also submit a block matrix to communicate to {cmd:amcmc()} which values are to be drawn together.
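{pstd}
For instance, for a hypothetical four-dimensional target {it:lnf4()}, drawing the first two components together and the last two together might be
requested with a block matrix such as:
{com}: blocks=(1,1,0,0 \ 0,0,1,1)
{res}
{com}: alginfo="standalone","block"
{res}
{com}: X=amcmc(alginfo,&lnf4(),J(1,4,0),I(4),5000,1000,2/3,.34,arate=.,vals=.,lam=.,blocks)
{res}{txt}
{pstd}
Here {it:lam} and {it:arate} would be returned with one entry per block.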
{title:Further Options}
{p 4 4 2}
While the function {bf:amcmc()} is designed to perform adaptive MCMC, adaptation can be turned off by the user. This feature is often useful if
a previous set of draws from the target distribution has already been performed and the proposal distribution is already well tuned. In this case, the user can set
{it:damper} equal to missing, i.e., {it:damper=.}, and no adaptation will occur. In this case, the user must also specify a set of conformable values
for {it:lam}.
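{pstd}
A minimal sketch of this usage, with {it:lnf1()} standing in for the user's target distribution: a first, adaptive run tunes the sampler, and the
scaling parameters returned in {it:lam} are then passed back in with {it:damper} set to missing:
{com}: alginfo="standalone","global"
{res}
{com}: X1=amcmc(alginfo,&lnf1(),0,1,5000,1000,2/3,.44,arate=.,vals=.,lam=.,.)
{res}
{com}: // second run: no adaptation, reusing the tuned lam from the first run
{res}
{com}: X2=amcmc(alginfo,&lnf1(),0,1,5000,1000,.,.44,arate=.,vals=.,lam,.)
{res}{txt}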
{title:Examples}
{pstd}{it:Example 1:} Sampling from a univariate normal mixture - with probability 1/2 the mean is -1 or 1
with standard deviation 1 in each case. While this is probably not the most efficient way to sample
from a mixture distribution, it serves to illustrate basic ideas. Initial values for
drawing are set to 0, with initial variance matrix for proposals set at 1.
1000 draws are taken, with the first 100 discarded. A value of {it:damper}=1/2 is a
fairly conservative choice. A type of sampler has not been specified, which means a global sampler will be used by default:
{com}:real scalar mixnorm(x)
>{
>val=1/2*normalden(x,1,1)+1/2*normalden(x,-1,1)
>return(ln(val))
>}
{res}
{com}:alginfo="standalone"
{res}
{com}:X=amcmc(alginfo,&mixnorm(),0,1,1000,100,1/2,.44,arate=.,vals=.,lam=.,.)
{res}{txt}
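{pstd}
The draws can then be summarized with ordinary Mata functions, and the realized acceptance rate stored in {it:arate} can be compared with the .44
target; for instance:
{com}: mean(X)
{res}
{com}: sqrt(variance(X))
{res}
{com}: arate
{res}{txt}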
{pstd}{it:Example 2:} Sampling from a bivariate normal mixture, where with probability {it:p}
the mean is {it:m1}, and with probability {it:1-p} the mean is {it:m2}. {it:p}, {it:m1}, {it:m2} and {it:Sig} - the covariance matrix of the distribution -
are passed to {bf:amcmc()} as additional arguments of the function. A vector of zeros and an identity matrix are used as the starting values for the proposal
distribution:
{com}:real scalar mixnorm2(x,p,m1,m2,Sig)
>{
>dSig=1/sqrt(det(Sig))
>Siginv=invsym(Sig)
>val1=1/2*dSig*exp(-(x-m1)*Siginv*(x-m1)')
>val2=1/2*dSig*exp(-(x-m2)*Siginv*(x-m2)')
>return(ln(p*val1+(1-p)*val2))
>}
{res}
{com}:p=1/2
{res}
{com}:m1=1,1
{res}
{com}:m2=-1,-1
{res}
{com}:Sig=I(2):+.1
{res}
{com}:Args=J(4,1,NULL)
{res}
{com}:Args[1]=&p
{res}
{com}:Args[2]=&m1
{res}
{com}:Args[3]=&m2
{res}
{com}:Args[4]=&Sig
{res}
{com}:alginfo="standalone","global"
{res}
{com}:X=amcmc(alginfo,&mixnorm2(),J(1,2,0),I(2),100000,10000,1,.34,arate=.,vals=.,lam=.,.,Args)
{res}{txt}
{pstd}{it:Example 3:} "Estimating" a negative-binomial count model by simulation. The premise behind the example
is that one can view a likelihood function as a distribution for
parameters conditional on data. The idea is superficially Bayesian; the model is isomorphic to a typical Bayesian
analysis with uninformative prior distributions for parameters, but can be applied more generally; see Chernozhukov and Hong (2003).
Sampling from the parameters' distribution is a sometimes useful method for analyzing models applied to
small samples, as results do not depend upon asymptotics as
they would in a conventional maximum likelihood analysis. While in a more complete analysis one might "thin" the resulting draws,
check for convergence, etc., this is not done in the example, which is really meant to illustrate how a model statement is
passed to {cmd:amcmc()}. The results can be contrasted with those obtained from
estimation via application of {bf:{help nbreg}}.
{com}. clear all
{res}
{com}. use http://www.stata-press.com/data/lf2/couart2, clear
{res}{txt}(Academic Biochemists / S Long)
{res}
{com}. set seed 5150
{res}
{com}. mata:
{res}{txt}{hline 20} mata (type {bf:end} to exit) {hline 20}
{com}:void nb_d0(M,todo,b,lnf,g,H)
>{
>y=moptimize_util_depvar(M,1)
>mu=exp(moptimize_util_xb(M,b,1))
>a=exp(moptimize_util_xb(M,b,2))
>lnfi=lngamma(y:+1/a):-lngamma(1/a):-
> lnfactorial(y):-(y:+1/a):*ln(1:+a*mu):+
> y:*ln(a):+y:*ln(mu)
>lnf=colsum(lnfi)
>}
{res}
{com}:M=moptimize_init()
{res}
{com}:moptimize_init_evaluator(M,&nb_d0())
{res}
{com}:moptimize_init_evaluatortype(M,"d0")
{res}
{com}:moptimize_init_depvar(M,1,"art")
{res}
{com}:moptimize_init_eq_indepvars(M,1,"fem mar kid5 phd ment")
{res}
{com}:moptimize_init_eq_indepvars(M,2,"")
{res}
{com}:moptimize_evaluate(M)
{res}
{com}:alginfo="global","moptimize","d0"
{res}
{com}:X=amcmc(alginfo,&nb_d0(),J(1,7,0),I(7),20000,10000,3/4,.234,arate=.,vals=.,lam=.,.,M){txt}
{res}
{com}:mean(X)'
{res} {txt} 1
{c TLC}{hline 15}{c TRC}
1 {c |} {res}-.2200206506 {c |}
2 {c |} {res} .148970965 {c |}
3 {c |} {res}-.1774053762 {c |}
4 {c |} {res} .0173632397 {c |}
5 {c |} {res} .0290353224 {c |}
6 {c |} {res} .253103149 {c |}
7 {c |} {res}-.7998843153 {c |}
{c BLC}{hline 15}{c BRC}
{res}{txt}
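{pstd}
As a rough check, one might exit Mata and fit the same model by maximum likelihood; in {cmd:mean(X)'} above, rows 1-5 correspond to the coefficients
on {cmd:fem mar kid5 phd ment}, row 6 to the constant, and row 7 to {cmd:lnalpha}:
{com}: end
{res}
{com}. nbreg art fem mar kid5 phd ment
{res}{txt}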
{marker amcmc_struct}{...}
{title: Using amcmc_*() commands}
{title:Syntax}
{p 4 4 2}
Syntax is discussed in three steps:
{help mf_amcmc##syn_step1:Step 1: Initialization}
{help mf_amcmc##syn_step2:Step 2: Problem definition}
{help mf_amcmc##syn_step3:Step 3: Obtaining results}
{marker syn_step1}{...}
{title:Step 1: Initialization}
{p 4 4 2}
To initialize a problem, the user first issues an initialize command:
{p2col 7 33 35 2: {it:A = } {cmd:amcmc_init()}}{p_end}
{p 4 4 2} {cmd:amcmc_init()} sets up an adaptive MCMC problem with the following defaults:
the number of draws is set to one, the burn period is set to zero, and the optimal acceptance rate is set to .234.
{marker syn_step2}{...}
{title:Step 2: Problem definition}
{p 4 4 2}
Problem definition fills in the arguments described under {bf:{help mf_amcmc##amcmc_fun:the function amcmc()}}, so it might
first help the user to peruse that material, in addition to having a look at Baker (2013) and/or Andrieu and Thoms (2008). The user has to
relay information about the target distribution, initial values, how many draws are desired, any additional information about the function or
how drawing should go, and how adaptation of the proposal should occur.
{p 8 40 2}
{cmd:amcmc_lnf(}{it:A}{cmd:,} {it:pointer (real scalar function) scalar lnf}{cmd:)}
{p 12 12 2}
{it:Use:} sets the target distribution as the function {it:lnf}. Note that {it:lnf} must be in log form.
{p 8 40 2}
{cmd:amcmc_args(}{it:A}{cmd:,} {it:pointer matrix Args}{cmd:)}
{p 12 12 2}
{it: Use:} sets any additional arguments of the function {it:lnf}. Note that this option can only be used with a stand-alone target
distribution. That is,
if the user is sampling from a problem with details constructed using {bf:{help mf_moptimize:moptimize()}} or {bf:{help mf_optimize:optimize()}}, the user should
pass any arguments of the function {it:lnf} through those routines.
{p 8 40 2}
{cmd:amcmc_xinit(}{it:A}{cmd:,} {it:real rowvector xinit}{cmd:)}
{p 12 12 2}
{it:Use:} sets the initial values from which drawing begins.
{p 8 40 2}
{cmd:amcmc_Vinit(}{it:A}{cmd:,} {it:real matrix Vinit}{cmd:)}
{p 12 12 2}
{it:Use:} sets initial values of the proposal covariance matrix. If the matrix submitted is not positive definite, by default a conformable identity
matrix is used.
{p 8 40 2}
{cmd:amcmc_aopt(}{it:A}{cmd:,} {it:real scalar aopt}{cmd:)}
{p 12 12 2}
{it:Use:} sets the desired acceptance rate, typically in the range .234 to .45 or so.
{p 8 40 2}
{cmd:amcmc_damper(}{it:A}{cmd:,} {it:real scalar delta}{cmd:)}
{p 12 12 2}
{it:Use:} Specifies the parameter determining how tuning of the algorithm is to occur; the value should be between zero and one.
Values close to zero mean less aggressive tuning of the proposal distribution, while values closer to one mean more aggressive tuning. One can
also specify a missing value for {it:damper} here; i.e., {cmd:amcmc_damper(A,.)}. In this case, no adaptation of the proposal distribution occurs,
and the user must specify scaling parameters using the command {cmd:amcmc_lambda(}{it:A}{cmd:,} {it:real rowvector lambda}{cmd:)}.
{p 8 40 2}
{cmd:amcmc_lambda(}{it:A}{cmd:,} {it:real rowvector lambda}{cmd:)}
{p 12 12 2}
{it:Use:} Specifies the scaling parameters for the proposal covariance matrix. That is, when a draw is performed with proposal covariance matrix {it:W},
the algorithm actually draws using {it:lambda*W}. This option should only be set if {it:damper} (discussed above under
{cmd:amcmc_damper()}) is set to missing so that no adaptation of the proposal distribution occurs.
{p 8 40 2}
{cmd:amcmc_burn(}{it:A}{cmd:,} {it:real scalar burn}{cmd:)}
{p 12 40 2}
{it:Use:} sets the length of the burn-in period, during which draws are discarded.
{p 8 40 2}
{cmd:amcmc_draws(}{it:A}{cmd:,} {it:real scalar draws}{cmd:)}
{p 12 40 2}
{it:Use:} sets the number of draws to be performed.
{p 8 40 2}
{cmd:amcmc_noisy(}{it:A}{cmd:,} {it:string scalar noisy}{cmd:)}
{p 12 12 2}
{it:Use:} If {it:noisy}={bf:"noisy"}, a dot is produced each time {it:lnf} is called; every 50 calls the function value at the last draw is also
produced.
{p 8 40 2}
{cmd:amcmc_model(}{it:A}{cmd:,} {it:transmorphic M}{cmd:)}
{p 12 12 2}
{it:Use:} attaches to the drawing problem a previously assembled model statement
formulated using either {bf:{help mf_moptimize:moptimize()}} or {bf:{help mf_optimize:optimize()}}.
{p 8 40 2}
{cmd:amcmc_blocks(}{it:A}{cmd:,} {it:real matrix blocks}{cmd:)}
{p 12 12 2}
{it:Use:} In conjunction with block samplers. The matrix {it:blocks} contains information about the sequence of draws.
{p 8 40 2}
{cmd:amcmc_alginfo(}{it:A}{cmd:,} {it:string rowvector alginfo}{cmd:)}
{p 12 12 2}
{it:Use:} A sequence of strings that contain information about how drawing is to proceed, what type of sampler to use, if and how models are specified,
etc. Available options are:
{col 12}{it:Sampling information}{...}
{col 36}{bf:mwg, global, block}
{col 12}{it:Model definition}{...}
{col 36}{bf:moptimize, optimize, standalone}
{col 12}{it:Model evaluator type}{...}
{col 36}{bf:d*, q*, e*, g*, v*}
{col 12}{it:Other information}{...}
{col 36}{bf:fast}
{p 12 12 2}
Thus, if the user is sampling from a previously formed model statement using {bf:optimize()} with a type {bf:d0} evaluator, and wishes to sample in blocks, the user
would specify {cmd:amcmc_alginfo(A,("block","optimize","d0"))}.
{p 8 40 2}
{cmd:amcmc_draw(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Execute the sampler.
{p 4 4 2}
Setting up a problem as a structured object has advantages in
situations in which the user is interested in executing adaptive MCMC as a step in a larger algorithm, or in which the user must set up
a sequence of similar adaptive MCMC problems (or both). An additional set of commands that have no direct analogue in the
function {bf:amcmc()} follows:
{p 8 40 2}
{cmd:amcmc_append(}{it:A}{cmd:,} {it:string scalar append}{cmd:)}
{p 12 12 2}
{it:Use:} In the event that the user executes the command {cmd:amcmc_draw(A)} multiple times in sequence, by specifying {it:append}={bf:"append"},
the results of the draw will be attached to the information on previous calls of {cmd:amcmc_draw(A)}. Acceptance rates, the proposal distribution, etc.
are all updated accordingly. This is the default setting; unless the user specifies {it:append}={bf:"overwrite"}, all information about previous
draws will be retained. Specifying {it:append}={bf:"overwrite"} does not restart the draw; everything is updated using past draws, but only the
most recent set of draws is retained. This is useful in large problems in which a sampler is used as a step in a larger sampling algorithm.
{p 8 40 2}
{cmd:amcmc_reeval(}{it:A}{cmd:,} {it:string scalar reeval}{cmd:)}
{p 12 12 2}
{it:Use:} In the event that the user executes the command {cmd:amcmc_draw(A)} as a step in larger algorithm,
the parameters that have been passed to the problem might change. It is then necessary for the algorithm to reevaluate the function
using the last drawn values with the new parameter values before proceeding. If the user wishes to do this, the user may simply
set {it:reeval="reeval"}.
{marker syn_step3}{...}
{title:Step 3: Results}
{p 4 4 2}
Once a run has been executed using {cmd:amcmc_draw()}, the user can access information about the draws, and also recover some information about
initial settings.
{p 8 40 2}
{it:real matrix} {cmd:amcmc_results_draws(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns the draws in a matrix form; each row represents a draw.
{p 8 40 2}
{it:real colvector} {cmd:amcmc_results_vals(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns the values of the target distribution corresponding with each draw.
{p 8 40 2}
{it:real rowvector} {cmd:amcmc_results_arate(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns acceptance rates; if a global scheme is used, this is a single value; otherwise, acceptance rates conform with the number of blocks
(for a block sampler) or the dimension of the target distribution (for a Metropolis-within-Gibbs sampler).
{p 8 40 2}
{it:real scalar} {cmd:amcmc_results_passes(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns the number of times the {cmd:amcmc_draw()} command has been issued.
{p 8 40 2}
{it:real scalar} {cmd:amcmc_results_totaldraws(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns the total number of draws.
{p 8 40 2}
{it:real matrix} {cmd:amcmc_results_acceptances(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns a matrix of zeros and ones conformable with the sampling scheme, where a one marks a draw that was
accepted and a zero marks a draw that was rejected. In short, the function returns the acceptance history of the run.
{p 8 40 2}
{it:real rowvector} {cmd:amcmc_results_propmean(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns the mean of the proposal distribution at the end of the run.
{p 8 40 2}
{it:real matrix} {cmd:amcmc_results_propvar(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns the covariance matrix of the proposal distribution at the end of the run.
{p 8 40 2}
{it:real rowvector} {cmd:amcmc_results_lastdraw(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Returns only the last draw.
{p 8 40 2}
{it:void} {cmd:amcmc_results_report(}{it:A}{cmd:)}
{p 12 12 2}
{it:Use:} Produces a quick summary of a run, which includes the total passes,
draws per pass, the total number of draws, the average value of the target distribution,
the last value of the target distribution, the acceptance rate (average), and the burn-in period.
{p 4 4 2}
Additionally, one can obtain the values that the user specified by using {cmd: amcmc_results_*()}, where {cmd:*} can be any one of:
{bf:alginfo, noisy, blocks, damper, xinit, Vinit,} or {bf:lambda}.
{title: Examples}
{pstd}{it:Example 1:} This example recasts example 2 above. To recap, the
interest lies in sampling from a bivariate normal mixture, where with probability {it:p}
the mean is {it:m1}, and with probability {it:1-p} the mean is {it:m2}. {it:p}, {it:m1}, {it:m2} and {it:Sig} - the covariance matrix of the distribution -
are passed as additional arguments of the function. A vector of zeros and an identity matrix are used as the starting values for the proposal
distribution:
{com}:real scalar mixnorm2(x,p,m1,m2,Sig)
>{
>dSig=1/sqrt(det(Sig))
>Siginv=invsym(Sig)
>val1=1/2*dSig*exp(-(x-m1)*Siginv*(x-m1)')
>val2=1/2*dSig*exp(-(x-m2)*Siginv*(x-m2)')
>return(ln(p*val1+(1-p)*val2))
>}
{res}
{com}:p=1/2
{res}
{com}:m1=1,1
{res}
{com}:m2=-1,-1
{res}
{com}:Sig=I(2):+.1
{res}
{com}:Args=J(4,1,NULL)
{res}
{com}:Args[1]=&p
{res}
{com}:Args[2]=&m1
{res}
{com}:Args[3]=&m2
{res}
{com}:Args[4]=&Sig
{res}
{com}:alginfo="standalone","global"
{res}
{com}:A=amcmc_init()
{res}
{com}:amcmc_alginfo(A,alginfo)
{res}
{com}:amcmc_lnf(A,&mixnorm2())
{res}
{com}:amcmc_draws(A,100000)
{res}
{com}:amcmc_burn(A,10000)
{res}
{com}:amcmc_args(A,Args)
{res}
{com}:amcmc_xinit(A,J(1,2,0))
{res}
{com}:amcmc_Vinit(A,I(2))
{res}
{com}:amcmc_aopt(A,.34)
{res}
{com}:amcmc_damper(A,1)
{res}
{com}:amcmc_draw(A){txt}
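{pstd}
Once the run has finished, results can be recovered with the {cmd:amcmc_results_*()} functions described above; for instance:
{com}: X=amcmc_results_draws(A)
{res}
{com}: mean(X)
{res}
{com}: amcmc_results_arate(A)
{res}
{com}: amcmc_results_report(A)
{res}{txt}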
{pstd}{it: Example 2:} Bayesian estimation of a mixed logit model. The ideas behind this example follow Chapter 12 of Train (2009).
The user is interested in estimating a logit model in which parameters vary across groups according to some distribution.
There are three sets of parameters: 1) beta_n, the values of the group-level parameters for each group, 2) b, the mean of the group-level
parameters across groups, and 3) W, the covariance matrix of the group-level parameters.
{pstd} Following the setup described in Train (2009), suppose that the prior distribution
for b is normal with arbitrarily large variance, while the prior on W is inverted
Wishart with K (the dimension of b) degrees of freedom and identity scale matrix. Under these conditions:
{pstd} 1. The distribution of b given beta_n and W is normal with mean=mean(beta_n), and covariance matrix W/N,
where N is the number of groups.
{pstd} 2. The distribution of W given b and beta_n is an inverted Wishart.
{pstd} 3. The distribution of beta_n given b and W is the product of a normal density
capturing the likelihood of the group's parameters given the mean and variance of group parameters across the sample,
and the likelihood of the individual's choices given beta_n.
{pstd} The example begins by setting up the data, which is {bf:bangladesh.dta}. The data set has information on
the use of contraceptives for individuals in different districts. The user posits that the coefficients vary
randomly by district in accordance with the model. Reading data into Mata:
{com}. clear all
{res}
{com}. webuse bangladesh
{res}{txt}(Bangladesh Fertility Survey, 1989)
{res}
{com}. set seed 90210
{res}
{com}. mata:
{res}{txt}{hline 20} mata (type {bf:end} to exit) {hline 20}
{com}: st_view(X=.,.,"urban age child1 child2 child3")
{res}
{com}: st_view(y=.,.,"c_use")
{res}
{com}: st_view(id=.,.,"district")
{res}
{com}: X=X,J(rows(X),1,1)
{res}
{com}: m=panelsetup(id,1){txt}
{pstd} Mata now contains contraceptive use information in the vector {it:y}, explanatory variables (and a constant) in the matrix {it:X},
and an id code which is organized into a panel using {bf:{help mf_panelsetup:panelsetup()}}. The next step is to code the functions
following Train's (2009) descriptions. For steps 1 and 2, a function producing draws from the
respective distributions is needed; step 3 will be set up to work with {bf:amcmc_*()}, so
this function does not produce draws, but instead returns the log value of the parameter density conditional on data, choices, b, and W.
The notational convention in the necessary functions is that the mean and variance of the group-level parameters are denoted by {bf:b} and
{bf:W}, the (matrix of) group-level parameters is denoted {bf:beta}, and a single set (really, a rowvector) of group-level parameters
is denoted by {bf:beta_n}. The three required functions are:
{com}: real matrix drawb_betaW(beta,W) {
>return(mean(beta)+rnormal(1,cols(beta),0,1)*cholesky(W)')
>}
{res}
{com}: real matrix drawW_bbeta(beta,b)
>{
>v=rnormal(cols(b)+rows(beta),cols(b),0,1)
>S1=variance(beta:-b)
>S=invsym((cols(b)*I(cols(b))+rows(beta)*S1)/(cols(b)+rows(beta)))
>L=cholesky(S)
>R=(L*v')*(L*v')'/(cols(b)+rows(beta))
>return(invsym(R))
>}
{res}
{com}: real scalar lnchoiceprob(beta_n,b,W,yn,Xn)
>{
>mus=rowsum(Xn:*beta_n)
>lnp=yn:*mus:-ln(1:+exp(mus))
>lnprior=-1/2*(beta_n-b)*invsym(W)*(beta_n-b)'-
> 1/2*ln(det(W))-cols(b)/2*ln(2*pi())
>return(sum(lnp)+lnprior)
>}
{res}
{com}:{txt}
{pstd} The next step is setting up a series of adaptive MCMC problems - one for each district - using {bf:amcmc_*()} and Mata's
{bf:J()} function, which allows easy duplication of a single problem.
{com}: Ap=amcmc_init()
{res}
{com}: amcmc_damper(Ap,1)
{res}
{com}: amcmc_aopt(Ap,.4)
{res}
{com}: amcmc_alginfo(Ap,("standalone","global"))
{res}
{com}: amcmc_lnf(Ap,&lnchoiceprob())
{res}
{com}: amcmc_draws(Ap,1)
{res}
{com}: amcmc_append(Ap,"overwrite")
{res}
{com}: amcmc_reeval(Ap,"reeval")
{res}
{com}: A=J(rows(m),1,Ap)
{res}
{com}:{txt}
{pstd} Each problem is set up as a global drawing problem, where one draw is taken. In passing, the author's
experience is that it is sometimes helpful to
let each individual problem run a bit by specifying 5 or 10 draws, say, in this step for better mixing in the early stages of the
algorithm. The option {bf:"overwrite"} specifies that
information on each draw is not to be stored as the algorithm proceeds. The option {bf:"reeval"} specifies that
it is necessary to reevaluate the function, as the two arguments {it:b} and {it:W} are changed as the
algorithm proceeds. Next, some (admittedly poor) starting values for {it:b} and {it:W} are
specified, and an initial draw of the district-level parameters {bf:beta} is taken.
The remaining arguments are then filled in by
setting up a pointer matrix, each row of which points to a district's data and to the mean and variance of the distribution of group-level
parameters, and all of the separate {bf:amcmc_*()} problems are initialized in a loop:
{com}: b=J(1,6,0)
{res}
{com}: W=I(6)*6
{res}
{com}:beta=b:+rnormal(rows(m),cols(X),0,1)*cholesky(W)'
{res}
{com}: Args=J(rows(m),4,NULL)
{res}
{com}: for (i=1;i<=rows(m);i++) {
>Args[i,1]=&b
>Args[i,2]=&W
>Args[i,3]=&panelsubmatrix(y,i,m)
>Args[i,4]=&panelsubmatrix(X,i,m)
>amcmc_args(A[i],Args[i,])
>amcmc_xinit(A[i],b)
>amcmc_Vinit(A[i],W)
>}
{res}
{com}:{txt}
{pstd} The algorithm is now implemented as follows, with 10000 total draws and with the initial values of the group-level
parameters having been taken as a draw
from the normal distribution. The matrices {bf:bvals} and {bf:Wvals} are used to hold the draws of the mean and the covariance matrix:
{com}: its=10000
{res}
{com}: bvals=J(0,cols(beta),.)
{res}
{com}: Wvals=J(0,cols(rowshape(W,1)),.)
{res}
{com}: for (i=1;i<=its;i++) {
>b=drawb_betaW(beta,W/rows(m))
>W=drawW_bbeta(beta,b)
>bvals=bvals\b
>Wvals=Wvals\rowshape(W,1)
>for (j=1;j<=rows(m);j++) {
>amcmc_draw(A[j])
>beta[j,]=amcmc_results_lastdraw(A[j])
>}
>}
{res}
{com}:{txt}
{pstd} While in a typical application, one usually discards some number of initial values and thins results so as to
eliminate the autocorrelation inherent in MCMC sampling, here we just summarize the draws
for the mean of the parameter vector and the covariance matrix:
{com}:mean(bvals)'
{res} {txt} 1
{c TLC}{hline 15}{c TRC}
1 {c |} {res} .9653859143 {c |}
2 {c |} {res}-.0363394328 {c |}
3 {c |} {res} 1.399463261 {c |}
4 {c |} {res} 1.696209143 {c |}
5 {c |} {res} 1.656005711 {c |}
6 {c |} {res}-2.121194287 {c |}
{c BLC}{hline 15}{c BRC}
{res}
{com}:rowshape(mean(Wvals),6)
{txt}[symmetric]
{res} {txt} 1 2 3 4 5 6
{c TLC}{hline 85}{c TRC}{txt}
1 {c |} {res} 1.692919191 {c |}{txt}
2 {c |} {res}-.0236467397 .1201797953 {c |}{txt}
3 {c |} {res} .7638987195 -.0098907509 1.356704652 {c |}{txt}
4 {c |} {res} .5181705857 -.0080073073 .6239961521 1.122827391 {c |}{txt}
5 {c |} {res} 1.087957638 -.0416998767 1.023219879 .6940770407 1.978052313 {c |}{txt}
6 {c |} {res}-1.317031389 .0337024609 -1.044189251 -.812652926 -1.535208409 2.077123236 {c |}{txt}
{c BLC}{hline 85}{c BRC}
{pstd} The example is in fact a fairly complete description of how the user-written command {cmd:bayesmlogit} works.
For more information, type {cmd:findit bayesmlogit} at the Stata command prompt. Further examples can be found in Baker (2013).
{title:Additional Notes}
{p 4 4 2}
Use of {bf:amcmc()} or {bf:amcmc_*()} requires that the user first install Ben Jann's {bf:moremata} package of Mata functions ({cmd:ssc install moremata}).
{title:Author}
{p 4 4 2} Matthew J. Baker, Hunter College and the Graduate Center, CUNY, [email protected]. Comments and suggestions are appreciated.
{title: References}
{p 8 8 2}
Andrieu, Christophe, and Johannes Thoms. 2008. A tutorial on adaptive MCMC. Statistics and Computing 18: 343-373.
{p 8 8 2}
Baker, Matthew J. 2013. Adaptive Markov chain Monte Carlo in Mata. {browse "http://EconPapers.repec.org/RePEc:htr:hcecon:440":Hunter College working paper 440}.
{p 8 8 2}
Chernozhukov, Victor, and Han Hong. 2003. An MCMC approach to classical estimation. Journal of Econometrics 115(2): 293-346.
{p 8 8 2}
Train, Kenneth E. 2009. Discrete choice methods with simulation, 2nd ed. Cambridge and New York: Cambridge University Press.
{title:Also see}
{p 4 13 2}
Online: help for {bf:{help mata}}