
Change discrete distributions to standard #3192

Draft
wants to merge 1 commit into base: cornu/random/continuous_to_standard

Conversation

alkino (Member) commented Nov 9, 2024

Discrete:

  • Binomial
  • Discrete Uniform
  • Poisson
Computation for binomial
KS test 2 samples from old to new
D=0.0024800000000000377, pvalue=0.9174075702167388
Generating graph...
Over

Computation for discunif
KS test 2 samples from old to new
D=0.0030100000000000127, pvalue=0.7542898827264148
Generating graph...
Over

Computation for poisson
KS test 2 samples from old to new
D=0.00269999999999998, pvalue=0.8582934803778028
Generating graph...
Over
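As a side note, the KS test assumes a continuous CDF, so for these discrete distributions a chi-square goodness-of-fit test is a common complement. A minimal sketch (not part of the PR's scripts; numpy stands in for h.Random, parameters match binomial(20, .5) above):

```python
import numpy as np
from scipy import stats

# Chi-square goodness-of-fit for a binomial(20, 0.5) sample.
rng = np.random.default_rng(1)
n, p, nrun = 20, 0.5, 100_000
sample = rng.binomial(n, p, nrun)

observed = np.bincount(sample, minlength=n + 1)
expected = stats.binom.pmf(np.arange(n + 1), n, p) * nrun

# Merge low-count tail bins so every expected count is >= 5
mask = expected >= 5
obs = np.append(observed[mask], observed[~mask].sum())
exp = np.append(expected[mask], expected[~mask].sum())

# Rescale so observed and expected totals match exactly
chi2, pvalue = stats.chisquare(obs, f_exp=exp * obs.sum() / exp.sum())
print(f"chi2={chi2:.3f}, pvalue={pvalue:.3f}")
```

The same pattern would apply to discunif and poisson by swapping in stats.randint or stats.poisson for the expected counts.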

Script to generate values:

from neuron import h
import pickle

r = h.Random()

nrun = int(1e5)

def generate_data(name, *args):
    fun = getattr(r, name)
    fun(*args)
    hist = []
    for i in range(nrun):
        j = r.repick()
        hist.append(j)
    with open(f"{name}.data", "wb") as fh:  # renamed from `h` to avoid shadowing neuron's h
        pickle.dump(hist, fh)

# Discrete
# generate_data("binomial", 20, .5)
# generate_data("discunif", 0, 10)
# generate_data("poisson", 3)

# Continuous
generate_data("negexp", 0.5)
generate_data("normal", -1, .5)
generate_data("lognormal", 5, 2) # mean = 5, variance = 2
generate_data("uniform", 0, 2)
generate_data("erlang", 5, 1)
generate_data("weibull", 5, 1.5)

# Not implemented
# generate_data("geometric", .8)
# generate_data("hypergeo", 10, 150)
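A quick sanity check on the sampled moments can be run independently of NEURON (numpy stands in for h.Random here; the parameters mirror the generate_data calls above):

```python
import numpy as np

rng = np.random.default_rng(0)
nrun = int(1e5)

# (sample, theoretical mean, theoretical variance)
checks = {
    "binomial": (rng.binomial(20, 0.5, nrun), 10.0, 5.0),  # n*p, n*p*(1-p)
    "discunif": (rng.integers(0, 11, nrun), 5.0, 10.0),    # (a+b)/2, ((b-a+1)^2-1)/12
    "poisson": (rng.poisson(3, nrun), 3.0, 3.0),           # lambda, lambda
}

for name, (sample, mean, var) in checks.items():
    print(f"{name}: mean={sample.mean():.3f} (expect {mean}), "
          f"var={sample.var():.3f} (expect {var})")
```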

Script to generate graphs:

import pickle
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import os.path
from math import pow, sqrt, log, exp

stats_name = {
        'erlang': {'fun': stats.erlang, 'args': [25, 0, 1/5]},
        'lognormal': {'fun': stats.lognorm, 'args': [sqrt(log(2/(5*5) + 1)), 0, 5*5/sqrt(2+5*5)]},
        'negexp': {'fun': stats.expon, 'args': [0, 0.5]},
        'normal': {'fun': stats.norm, 'args': [-1, sqrt(.5)]},
        'uniform': {'fun': stats.uniform, 'args': [0, 2]},
        'weibull': {'fun': stats.weibull_min, 'args': [5, 0, pow(1.5, 1/5)]},
        }
def plot(name):
    print(f"Computation for {name}")
    with open(f'old/{name}.data', 'rb') as f:
        old_data = pickle.load(f)
    with open(f'new/{name}.data', 'rb') as f:
        new_data = pickle.load(f)

    if name in stats_name:
        fun = stats_name[name]["fun"]
        args = stats_name[name]["args"]
        print("KS test, 2 samples")
        res = stats.ks_2samp(old_data, new_data)
        print(res)
        print(f"KS test 1 sample from old compare to {name}")
        res = stats.ks_1samp(x=old_data, cdf=fun.cdf, args=args)
        print(res)
        print(f"KS test 1 sample from new compare to {name}")
        res = stats.ks_1samp(x=new_data, cdf=fun.cdf, args=args)
        print(res)
        if name in ('negexp', 'uniform'):
            old_loc, old_scale = fun.fit(old_data, floc=0)  # floc=0 fixes the location at 0, matching the generated parameters
            new_loc, new_scale = fun.fit(new_data, floc=0)

            print(f"Fitted old parameters: scale={old_scale}")
            print(f"Fitted new parameters: scale={new_scale}")
        elif name == 'normal':
            old_loc, old_scale = fun.fit(old_data)  # no floc here: the generated mean is -1, not 0
            new_loc, new_scale = fun.fit(new_data)

            print(f"Fitted old parameters: loc={old_loc}, scale={old_scale}")
            print(f"Fitted new parameters: loc={new_loc}, scale={new_scale}")
        else:
            old_shape, old_loc, old_scale = fun.fit(old_data, floc=0)  # floc=0 fixes the location at 0, e.g. lognormal is defined on (0, inf)
            new_shape, new_loc, new_scale = fun.fit(new_data, floc=0)

            print(f"Fitted old parameters: scale={old_scale}, loc={old_loc}, shape={old_shape}")
            print(f"Fitted new parameters: scale={new_scale}, loc={new_loc}, shape={new_shape}")

        NUM_BINS = 2000  # Increase this number for smaller bins
    else:
        NUM_BINS = 'auto'

    print("Generating graph...")
    plt.hist(old_data, bins=NUM_BINS, density=True, alpha=0.6, color='b')
    plt.hist(new_data, bins=NUM_BINS, density=True, alpha=0.6, color='g')
    plt.title(f"Histogram of Data and Fitted {name} Distribution")
    plt.savefig(f"{name}_comparison.png")
    plt.clf()
    print("Over")

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        plot(sys.argv[1])
    else:
        for n in stats_name.keys():
            if os.path.isfile(os.path.join("old", f"{n}.data")) and os.path.isfile(os.path.join("new", f"{n}.data")):
                plot(n)


@alkino alkino changed the base branch from cornu/random/cpp11 to master November 11, 2024 15:04
@alkino alkino marked this pull request as ready for review November 11, 2024 15:04


cattabiani (Member) commented Nov 12, 2024

The graphs are good and everything, but eyeballing graphs does not give an objective evaluation.

If you want to be more thorough, I suggest using a "goodness of fit" test. For steps, for example, we used the Kolmogorov–Smirnov test; scipy has a nice suite of these in Python.

They all work in this way:

  • Hypothesis: "my sample comes from this distribution"
  • Try to prove that this hypothesis is improbable: run the test
  • Compare the p-value to a threshold you deem acceptable, usually 5%
  • Failing to reject the hypothesis means the sample is consistent with that distribution

It is a matter of a few more lines of Python. If you are not sure how to do it, asking ChatGPT to "add the KS test matching the distribution X" with your current code appended can help you further with the details.
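The steps above can be sketched in a few lines (samples are drawn with numpy here purely for illustration; the PR's actual samples come from h.Random):

```python
import numpy as np
from scipy import stats

# Two samples that should come from the same distribution
rng = np.random.default_rng(42)
old = rng.normal(-1, np.sqrt(0.5), 100_000)
new = rng.normal(-1, np.sqrt(0.5), 100_000)

# Two-sample KS test: small D and large p-value mean we fail to reject
res = stats.ks_2samp(old, new)
print(f"D={res.statistic:.4f}, pvalue={res.pvalue:.4f}")

# Compare the p-value to the chosen 5% threshold
print("consistent" if res.pvalue > 0.05 else "reject")
```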


@alkino alkino changed the title Use binomial from C++ stdlib Change most of distributions to use standard library Nov 13, 2024

@alkino alkino changed the title Change most of distributions to use standard library Change discrete distributions to standard Nov 15, 2024
@alkino alkino marked this pull request as draft November 15, 2024 12:13
@alkino alkino changed the base branch from master to cornu/random/continuous_to_standard November 15, 2024 13:37
