update Documentation, readme and docstrings #79

Merged · 15 commits · Jun 8, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
/dev/

.vscode/
docs/Manifest.toml
49 changes: 29 additions & 20 deletions README.md
@@ -1,3 +1,4 @@

# QuasiMonteCarlo.jl

[![Join the chat at https://julialang.zulipchat.com #sciml-bridged](https://img.shields.io/static/v1?label=Zulip&message=chat&color=9558b2&labelColor=389826)](https://julialang.zulipchat.com/#narrow/stream/279055-sciml-bridged)
@@ -28,12 +29,12 @@ ub = [1.0,20.0]
n = 5
d = 2

s = QuasiMonteCarlo.sample(n,lb,ub,GridSample([0.1,0.5]))
s = QuasiMonteCarlo.sample(n,lb,ub,UniformSample())
s = QuasiMonteCarlo.sample(n,lb,ub,GridSample())
s = QuasiMonteCarlo.sample(n,lb,ub,Uniform())
s = QuasiMonteCarlo.sample(n,lb,ub,SobolSample())
s = QuasiMonteCarlo.sample(n,lb,ub,LatinHypercubeSample())
s = QuasiMonteCarlo.sample(n,lb,ub,LatticeRuleSample())
s = QuasiMonteCarlo.sample(n,lb,ub,HaltonSample([10,3], false))
s = QuasiMonteCarlo.sample(n,lb,ub,HaltonSample())
```

The output `s` is a matrix, so one can use things like `@uview` from
@@ -53,11 +54,18 @@ Everything has the same interface:
A = QuasiMonteCarlo.sample(n,lb,ub,sample_method)
```

or to generate points directly in the unit box $[0,1]^d$

```julia
A = QuasiMonteCarlo.sample(n,d,sample_method) # = QuasiMonteCarlo.sample(n,zeros(d),ones(d),sample_method)
```

where:

- `n` is the number of points to sample.
- `lb` is the lower bound for each variable. The length determines the dimensionality.
- `ub` is the upper bound.
- `d` is the dimension of the unit box.
- `sample_method` is the quasi-Monte Carlo sampling strategy.

Additionally, there is a helper function for generating design matrices:
@@ -75,13 +83,13 @@ all sampled from the same low-discrepancy sequence.
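
As a quick illustration of that helper, here is a minimal sketch. The four-argument call mirrors the usage shown later in `docs/src/design_matrix.md`; the point counts and the scrambling parameters are illustrative assumptions, not prescriptions.

```julia
using QuasiMonteCarlo

n, d = 16, 3
# Four independent d × n design matrices, all drawn from a randomized Sobol' sequence.
As = QuasiMonteCarlo.generate_design_matrices(n, d,
    SobolSample(R = OwenScramble(base = 2, pad = 16)), 4)
length(As)   # 4
size(As[1])  # (3, 16)
```
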
Sampling methods `SamplingAlgorithm` are divided into two subtypes

- `DeterministicSamplingAlgorithm`
- `GridSample(dx)` where the grid is given by `lb:dx[i]:ub` in the ith direction.
- `GridSample` for samples on a regular grid.
- `SobolSample` for the Sobol' sequence.
- `FaureSample` for the Faure sequence.
- `LatticeRuleSample` for a randomly-shifted rank-1 lattice rule.
- `HaltonSample(base)` where `base[i]` is the base in the ith direction.
- `HaltonSample` for the Halton sequence.
- `GoldenSample` for a Golden Ratio sequence.
- `KroneckerSample(alpha, s0)` for a Kronecker sequence, where alpha is an length-`d` vector of irrational numbers (often `sqrt(d`)) and `s0` is a length-`d` seed vector (often `0`).
- `KroneckerSample(alpha, s0)` for a Kronecker sequence, where alpha is a length-`d` vector of irrational numbers (often `sqrt(d)`) and `s0` is a length-`d` seed vector (often `0`).
- `RandomSamplingAlgorithm`
- `UniformSample` for uniformly distributed random numbers.
- `LatinHypercubeSample` for a Latin Hypercube.
@@ -115,24 +123,25 @@ end

## Randomization of QMC sequences

Most of the previous methods are deterministic i.e. `sample(n, d, Sampler()::DeterministicSamplingAlgorithm)` always produces the same sequence $X = (X_1, \dots, X_n)$.
The API to randomize sequence is either
- Directly use `QuasiMonteCarlo.sample(n, d, DeterministicSamplingAlgorithm(R = RandomizationMethod()))` or `sample(n, lb, up, DeterministicSamplingAlgorithm(R = RandomizationMethod()))`
- Or given any matrix $X$ ($d\times n$) of $n$ points all in dimension $d$ in $[0,1]^d$ one can do `randomize(x, ::RandomizationMethod())`
Most of the previous methods are deterministic, i.e. `sample(n, d, Sampler()::DeterministicSamplingAlgorithm)` always produces the same sequence $X = (X_1, \dots, X_n)$.
There are two ways to obtain a randomized sequence:

- Either directly use `QuasiMonteCarlo.sample(n, d, DeterministicSamplingAlgorithm(R = RandomizationMethod()))` or `sample(n, lb, ub, DeterministicSamplingAlgorithm(R = RandomizationMethod()))`.
- Or, given $n$ $d$-dimensional points, all in $[0,1]^d$, one can call `randomize(X, ::RandomizationMethod())`, where $X$ is a $d\times n$ matrix.
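
A minimal sketch of both entry points follows; the sampler and randomization names are the ones used elsewhere in this README, while the sizes and the `base`/`pad` values are illustrative assumptions.

```julia
using QuasiMonteCarlo

n, d = 128, 3

# (1) Let the sampler randomize for you via its `R` field.
x_rqmc = QuasiMonteCarlo.sample(n, d, SobolSample(R = OwenScramble(base = 2, pad = 16)))

# (2) Randomize an existing d × n point set in [0,1]^d after the fact.
x_det  = QuasiMonteCarlo.sample(n, d, SobolSample())
x_post = QuasiMonteCarlo.randomize(x_det, MatousekScramble(base = 2, pad = 16))
```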

There are the following randomization methods:
The currently available randomization methods are:

- Scrambling methods `ScramblingMethods(base, pad, rng)` where `base` is the base used to scramble and `pad` the number of bits in `b`-ary decomposition.
- Scrambling methods `ScramblingMethods(b, pad, rng)` where `b` is the base used to scramble and `pad` the number of bits in `b`-ary decomposition.
`pad` is generally chosen as $\gtrsim \log_b(n)$.
The implemented `ScramblingMethods` are
- `DigitalShift`
- `MatousekScramble` a.k.a Linear Matrix Scramble.
- `OwenScramble` a.k.a Nested Uniform Scramble is the most understood theoretically but is more costly to operate.
- `Shift(rng)` a.k.a. Cranley Patterson Rotation.
- `DigitalShift`
- `MatousekScramble` a.k.a. Linear Matrix Scramble.
- `OwenScramble` a.k.a. Nested Uniform Scramble is the most understood theoretically, but is more costly to operate.
- `Shift(rng)` a.k.a. Cranley Patterson Rotation.
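
For instance, a hedged sketch of how `b` and `pad` are typically chosen (the specific numbers are assumptions, not prescriptions):

```julia
using QuasiMonteCarlo

n, d = 2^10, 4
b = 2                           # scrambling base, matching the base of the Sobol' net
pad = ceil(Int, log(b, n)) + 6  # pad ≳ log_b(n); extra digits deepen the randomization

x = QuasiMonteCarlo.sample(n, d, SobolSample(R = MatousekScramble(base = b, pad = pad)))
```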

For numerous independent randomizations, use `generate_design_matrices(n, d, ::DeterministicSamplingAlgorithm, ::RandomizationMethod, num_mats)`, where `num_mats` is the number of independent matrices `X` generated.

### Example
### Randomization Example

Randomization of a Faure sequence with various methods.

@@ -144,7 +153,7 @@ b = QuasiMonteCarlo.nextprime(d)
N = b^m # Number of points
pad = m # Length of the b-ary decomposition number = sum(y[k]/b^k for k in 1:pad)

# Unrandomized low discrepency sequence
# Unrandomized (deterministic) low discrepancy sequence
x_faure = QuasiMonteCarlo.sample(N, d, FaureSample())

# Randomized version
@@ -165,7 +174,7 @@ Plot the resulting sequences along dimensions `1` and `3`.

```julia
d1 = 1
d2 = 3 # you can try every combination of dimension (d1, d2)
d2 = 3 # you can try every combination of dimensions (d1, d2)
sequences = [x_uniform, x_faure, x_shift, x_digital_shift, x_lms, x_nus]
names = ["Uniform", "Faure (deterministic)", "Shift", "Digital Shift", "Matousek Scramble", "Owen Scramble"]
p = [plot(thickness_scaling=1.5, aspect_ratio=:equal) for i in sequences]
@@ -184,4 +193,4 @@ end
plot(p..., size=(1500, 900))
```

![Different randomize methods of the same initial set of points](img/various_randomization.svg)
![Different randomize methods of the same initial set of points](img/various_randomization.svg)
4 changes: 3 additions & 1 deletion docs/Project.toml
@@ -1,6 +1,8 @@
[deps]
QuasiMonteCarlo = "8a4e6c94-4038-4cdc-81c3-7e6ffdb2a71b"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuasiMonteCarlo = "8a4e6c94-4038-4cdc-81c3-7e6ffdb2a71b"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
Documenter = "0.27"
1 change: 1 addition & 0 deletions docs/pages.jl
@@ -4,4 +4,5 @@ pages = [
"Home" => "index.md",
"samplers.md",
"randomization.md",
"design_matrix.md",
]
33 changes: 33 additions & 0 deletions docs/src/design_matrix.md
@@ -0,0 +1,33 @@

# [Design Matrices](@id DesignMatrices)

## API

It is often convenient to generate multiple independent sequences for error estimation (uncertainty quantification).
The resulting sequences can be stored in what is often called a design matrix.
In this package, this is achieved with the `generate_design_matrices(n, d, ::DeterministicSamplingAlgorithm, ::RandomizationMethod, num_mats)` function. `num_mats` is the number of independent realizations. The result is a vector of length `num_mats` whose elements are design matrices.

```@docs
QuasiMonteCarlo.generate_design_matrices
```

!!! warning
The method `generate_design_matrices(n, d, sampler, R::NoRand, num_mats, T = Float64)` is an ad hoc way to produce a design matrix. Indeed, it creates a deterministic point set in dimension `d × num_mats` and splits it into `num_mats` point sets of dimension `d`. The resulting sequences have no QMC guarantees.
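
A conceptual sketch of what that construction amounts to (hypothetical illustration of the idea, not the package's internal code):

```julia
using QuasiMonteCarlo

n, d, num_mats = 8, 2, 3
# One deterministic point set in dimension d * num_mats ...
big = QuasiMonteCarlo.sample(n, d * num_mats, SobolSample())
# ... sliced into num_mats point sets of dimension d. Each slice taken on its
# own is no longer a digital net, which is why the QMC guarantees are lost.
mats = [big[(i - 1) * d + 1:i * d, :] for i in 1:num_mats]
```
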
**Member:**
Interesting. Is there a better way to do this which guarantees the good properties for QMC?

**Contributor Author:**
Yes! That is the point of all the randomization methods (RQMC).

I thought of adding this illustration to the design_matrices page.
Even with a relatively small case, d = 6 and a well-behaved test function F_test, one can observe that the method proposed by Saltelli and implemented with NoRand() produces, from time to time, estimates that are very far off. With M = 500 realizations, the distribution of the Saltelli estimates has heavy tails, whereas one would hope for something close to a Normal distribution.
On the other hand, RQMC is a theoretically grounded way to do this (though it can be more computationally expensive):

```julia
using QuasiMonteCarlo, Random, StatsBase
using Plots
gr() # use plotly() if you want to zoom and interact
Random.seed!(1234)
m = 13
d = 6
b = 2
N = b^m # Number of points
pad = m+4 # Can also choose something like `2m` to get "better" randomization
M = 500 # Number of realizations

# "Saltelli independent" sequences
Saltelli = QuasiMonteCarlo.generate_design_matrices(N, d, SobolSample(R = NoRand()), M)
# Truly independent (randomized) sequences
RQMC = QuasiMonteCarlo.generate_design_matrices(N, d, SobolSample(R = OwenScramble(base = b, pad = pad)), M)

F_test(x) = prod(x)
μ_exact = 1/2^d # = ∫ F_test(x) dᵈx

μ_Saltelli = [mean(F_test(c) for c in eachcol(X)) for X in Saltelli]
μ_RQMC = [mean(F_test(c) for c in eachcol(X)) for X in RQMC]

mean(μ_Saltelli .- μ_exact) # -2.309473096176987e-5
mean(μ_RQMC .- μ_exact) # -6.113472379543662e-7

std(μ_Saltelli .- μ_exact) # 0.0002962911429492179
std(μ_RQMC .- μ_exact) # 2.408945952759537e-5

stephist(μ_Saltelli .- μ_exact, label = "Saltelli", norm = :pdf) # not close to being asymptotically Normal
stephist!(μ_RQMC .- μ_exact, label = "RQMC", norm = :pdf)
vline!([0], label = "true", c = :black, s = :dot)
# yaxis!(:log10, ylims = (1e2, Inf)) # a log scale may or may not help visualization
# savefig("Saltelli_vs_RQMC.png")
```

*(attached figure `Saltelli_vs_RQMC`: histograms of the estimator error μ̂ − μ for Saltelli vs RQMC)*

This seems to have been proposed in Section 5.1 of [*Saltelli, A. et al. (2010)*](https://d1wqtxts1xzle7.cloudfront.net/76482087/PUBLISHED_PAPER-libre.pdf?1639660270=&response-content-disposition=inline%3B+filename%3DVariance_based_sensitivity_analysis_of_m.pdf&Expires=1685981169&Signature=aim5tHldlkb0ewZ9-gSMZsW2F1b88tLvV8euV1FpD61UYrE1mLR3RDERut0BsHNbcibjKQnF1JlsZ8mtEx~E1~eI3A~SOSySbpQllIpbhu556pFGUvD3GV5M6ghwa-5QMDP3-aQczBzflR721N4PCVJgqfmV-y94pkijQYvHSvZaPKb-tsoS8TVxE6H31Ptk4u662H61ofKzXR5JCHv0740qkQ0hORH~GqXOt8s7yQMVWYswZT4pWGSkJ9EehEQHCLo2uDVW-YSopwlSaaMRz~~0O~hkGAVE8sC~TAB7b5KnUgtNXl0jYTfGNTYO4GNJo1XhmHwj~Og~sBLDIXDxsg__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA)[^1] to do uncertainty quantification.
**Member:**
Yes, that is the source.

**Contributor Author (@dmetivie, Jun 8, 2023):**
I am very puzzled as to why this is so widely used in the UQ community. Some of the notation is not up to date with the current QMC literature, and the review cited dates from 1988.
This phenomenon is related to what is explained in the excellent paper On Dropping the First Sobol’ Point.
Basically, the points are constructed so that the full set has good low-discrepancy properties; one cannot take half the sequence, drop dimensions, or remove even a single point and hope to keep those properties.
For the construction used here this is even clearer, since Sobol' points are known to behave poorly in some higher dimensions.

*(excerpt from Owen's book omitted)*

IMO, these practices should be progressively removed from QMC software. For example, in #75 something similar to the Saltelli construction is proposed (plus the story of dropping the zero point).

To nuance my words: for the sensitivity-analysis use case with M = 2, this is probably not as bad as what I show here for small cases.


[^1]: Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., & Tarantola, S. (2010). Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer physics communications, 181(2), 259-270.

## Example

```@example 2
using QuasiMonteCarlo, Random
Random.seed!(1234)
m = 4
d = 3
b = QuasiMonteCarlo.nextprime(d)
N = b^m # Number of points
pad = m # Can also choose something as `2m` to get "better" randomization

# 5 independent Randomized Faure sequences
QuasiMonteCarlo.generate_design_matrices(N, d, FaureSample(R = OwenScramble(base = b, pad = m)), 5)
```
81 changes: 73 additions & 8 deletions docs/src/index.md
Original file line number Diff line number Diff line change
@@ -12,7 +12,9 @@ using Pkg
Pkg.add("QuasiMonteCarlo")
```

## Example
## Get Started

### Basic API

```julia
using QuasiMonteCarlo, Distributions
@@ -21,12 +23,12 @@ ub = [1.0,20.0]
n = 5
d = 2

s = QuasiMonteCarlo.sample(n,lb,ub,GridSample([0.1,0.5]))
s = QuasiMonteCarlo.sample(n,lb,ub,UniformSample())
s = QuasiMonteCarlo.sample(n,lb,ub,GridSample())
s = QuasiMonteCarlo.sample(n,lb,ub,Uniform())
s = QuasiMonteCarlo.sample(n,lb,ub,SobolSample())
s = QuasiMonteCarlo.sample(n,lb,ub,LatinHypercubeSample())
s = QuasiMonteCarlo.sample(n,lb,ub,LatticeRuleSample())
s = QuasiMonteCarlo.sample(n,lb,ub,HaltonSample([10,3]))
s = QuasiMonteCarlo.sample(n,lb,ub,HaltonSample())
```

The output `s` is a matrix, so one can use things like `@uview` from
@@ -38,6 +40,54 @@ using UnsafeArrays
@uview s[:,i]
```

### MC vs QMC

We illustrate the gain of QMC methods over plain Monte Carlo using the 5-dimensional example from Section 15.9 in the [book by A. Owen](https://artowen.su.domains/mc/qmcstuff.pdf).
```@example MCvsQMC; continued = true
f₁(𝐱) = prod(1 + √(12)/5*(xⱼ - 1/2) for xⱼ ∈ 𝐱)
μ_exact = 1 # = ∫ f₁(𝐱) d⁵𝐱
```

One can estimate the integral $\mu$ using plain Monte Carlo, Quasi-Monte Carlo, or Randomized Quasi-Monte Carlo. See the other sections of this documentation for more information on the functions used in this example.

```@example MCvsQMC
using QuasiMonteCarlo, Random, Distributions
using Plots; default(fontfamily="Computer Modern")
Random.seed!(1234)
d = 5 # Dimension (= prime base for Faure net)
b = 2 # Base for Sobol net
m_max = 19
m_max_Faure = 8
N = b^m_max

# Generate sequences
seq_MC = QuasiMonteCarlo.sample(N, d, Uniform()) # Monte Carlo i.i.d Uniform sampling
seq_QMC_Sobol = QuasiMonteCarlo.sample(N, d, SobolSample()) # Sobol net
seq_RQMC_Sobol = QuasiMonteCarlo.sample(N, d, SobolSample(R = OwenScramble(base = b, pad = 32))) # Randomized version of Sobol net
seq_RQMC_Faure = QuasiMonteCarlo.sample(d^m_max_Faure, d, FaureSample(R = OwenScramble(base = d, pad = 32))) # Randomized version of Faure net

# Estimate the integral for different n with different estimator μ̂ₙ
μ_MC = [mean(f₁(x) for x in eachcol(seq_MC[:, 1:b^m])) for m in 1:m_max]
μ_QMC_Sobol = [mean(f₁(x) for x in eachcol(seq_QMC_Sobol[:, 1:b^m])) for m in 1:m_max]
μ_RQMC_Sobol = [mean(f₁(x) for x in eachcol(seq_RQMC_Sobol[:, 1:b^m])) for m in 1:m_max]
μ_RQMC_Faure = [mean(f₁(x) for x in eachcol(seq_RQMC_Faure[:, 1:d^m])) for m in 1:m_max_Faure]

# Plot the error |μ̂-μ| vs n
plot(b.^(1:m_max), abs.(μ_MC .- μ_exact), label="MC")
plot!(b.^(1:m_max), abs.(μ_QMC_Sobol .- μ_exact), label="QMC Sobol")
plot!(b.^(1:m_max), abs.(μ_RQMC_Sobol .- μ_exact), label="RQMC Sobol")
plot!(d .^(1:m_max_Faure), abs.(μ_RQMC_Faure .- μ_exact), label="RQMC Faure")
plot!(n -> n^(-1/2), b.^(1:m_max), c = :black, s = :dot, label = "n^(-1/2)")
plot!(n -> n^(-3/2), b.^(1:m_max), c = :black, s = :dash, label = "n^(-3/2)")
# n^(-3/2) is the theoretical scaling for scrambled nets e.g. Theorem 17.5. in https://artowen.su.domains/mc/qmcstuff.pdf
xlims!(1, 1e6)
ylims!(1e-9, 1)
xaxis!(:log10)
yaxis!(:log10)
xlabel!("n", legend = :bottomleft)
ylabel!("|μ̂-μ|")
```

## Adding a new sampling method

Adding a new sampling method is a two-step process:
@@ -62,68 +112,83 @@ function sample(n,lb,ub,::NewAmazingSamplingAlgorithm)
end
end
```
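
To make the two steps concrete, here is a minimal hedged sketch. The struct layout, the supertype choice, and the uniform-sampling body are assumptions for illustration; they are not the collapsed snippet from this diff.

```julia
using QuasiMonteCarlo, Random

# Step 1: define the new algorithm type.
struct NewAmazingSamplingAlgorithm{RNG <: Random.AbstractRNG} <: QuasiMonteCarlo.RandomSamplingAlgorithm
    rng::RNG
end
NewAmazingSamplingAlgorithm() = NewAmazingSamplingAlgorithm(Random.default_rng())

# Step 2: add a method to `QuasiMonteCarlo.sample` dispatching on that type.
function QuasiMonteCarlo.sample(n, lb, ub, s::NewAmazingSamplingAlgorithm)
    d = length(lb)
    x = rand(s.rng, d, n)         # points in [0,1]^d, one column per point
    return lb .+ (ub .- lb) .* x  # rescale to the box [lb, ub]; output is d × n
end
```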

## Contributing

- Please refer to the
[SciML ColPrac: Contributor's Guide on Collaborative Practices for Community Packages](https://github.com/SciML/ColPrac/blob/master/README.md)
for guidance on PRs, issues, and other matters relating to contributing to SciML.
- There are a few community forums:
- the #diffeq-bridged channel in the [Julia Slack](https://julialang.org/slack/)
- [JuliaDiffEq](https://gitter.im/JuliaDiffEq/Lobby) on Gitter
- on the [Julia Discourse forums](https://discourse.julialang.org)
- see also [SciML Community page](https://sciml.ai/community/)
- the #diffeq-bridged channel in the [Julia Slack](https://julialang.org/slack/)
- [JuliaDiffEq](https://gitter.im/JuliaDiffEq/Lobby) on Gitter
- on the [Julia Discourse forums](https://discourse.julialang.org)
- see also [SciML Community page](https://sciml.ai/community/)

## Reproducibility

```@raw html
<details><summary>The documentation of this SciML package was built using these direct dependencies,</summary>
```

```@example
using Pkg # hide
Pkg.status() # hide
```

```@raw html
</details>
```

```@raw html
<details><summary>and using this machine and Julia version.</summary>
```

```@example
using InteractiveUtils # hide
versioninfo() # hide
```

```@raw html
</details>
```

```@raw html
<details><summary>A more complete overview of all dependencies and their versions is also provided.</summary>
```

```@example
using Pkg # hide
Pkg.status(;mode = PKGMODE_MANIFEST) # hide
```

```@raw html
</details>
```

```@raw html
You can also download the
<a href="
```

```@eval
using TOML
version = TOML.parse(read("../../Project.toml",String))["version"]
name = TOML.parse(read("../../Project.toml",String))["name"]
link = "https://github.com/SciML/"*name*".jl/tree/gh-pages/v"*version*"/assets/Manifest.toml"
```

```@raw html
">manifest</a> file and the
<a href="
```

```@eval
using TOML
version = TOML.parse(read("../../Project.toml",String))["version"]
name = TOML.parse(read("../../Project.toml",String))["name"]
link = "https://github.com/SciML/"*name*".jl/tree/gh-pages/v"*version*"/assets/Project.toml"
```

```@raw html
">project</a> file.
```