This R package provides a reference class for the Gradational Gaussian Distribution (GGD). With GGD models, it can approximate asymmetric frequency distributions or trace quantiles accurately.
The Gradational Gaussian Distribution (GGD; the name was coined by the author)
is a continuous distribution model
mainly for modeling asymmetric unimodal data that do not follow a normal distribution.
The GGD resembles the Gaussian mixture model (GMM) but is different. The GMM is a linear combination of several normal distributions and is often used for clustering mixed data. The GGD, on the other hand, mixes several normal distributions with a ratio that changes gradually along the x-axis or y-axis, and treats non-normally distributed data as it is. Note that the GGD is not a convolution of normal distributions.
The GGD model may be applied to data that look somewhat like a normal distribution but never follow any normal distribution model because of the effects of (hidden) continuous parameters.
This package provides the following GGD models:
- Horizontal Gradational Distribution
- Vertical Gradational Distribution (with 2 or 3 components)
- Horizontal-Vertical Gradational Distribution
A horizontal gradational distribution is a distribution model in which the mixing ratio of two normal distributions varies gradually along the x-axis. This model is suitable for representing left- or right-skewed distributions.
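As a rough illustration of mixing along the x-axis, the sketch below builds such a density from base R functions only. It assumes the mixing ratios are `1 - pnorm()` of the first component and `pnorm()` of the second, a choice that makes the total probability 1. The function name `d_horizontal` and the parameter values are hypothetical and are not part of this package.

```r
# Density of a horizontal gradational mixture of N(mu1, sd1^2) and N(mu2, sd2^2).
# ASSUMPTION: the mixing ratios are 1 - pnorm(x, mu1, sd1) and pnorm(x, mu2, sd2).
# The first component dominates on the left, the second on the right.
d_horizontal <- function(x, mu1, sd1, mu2, sd2) {
  (1 - pnorm(x, mu1, sd1)) * dnorm(x, mu1, sd1) +
    pnorm(x, mu2, sd2) * dnorm(x, mu2, sd2)
}

# Each of the two terms integrates to 1/2, so the total should be close to 1.
total <- integrate(function(x) d_horizontal(x, 0, 1, 0, 2), -Inf, Inf)$value

# With a wider right-hand component the right tail is heavier than the left one.
left_tail  <- d_horizontal(-2, 0, 1, 0, 2)
right_tail <- d_horizontal( 2, 0, 1, 0, 2)
```

With equal means and `sd2 > sd1`, `right_tail` exceeds `left_tail`, which gives a right-skewed shape; swapping the two standard deviations gives a left-skewed one.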
We denote a horizontal gradational distribution as
Generally it is expressed as
where
Therefore,
Here,
A (2-component) vertical gradational distribution is a distribution model in which the mixing ratio of two normal distributions varies gradually along the y-axis. This model is suitable for representing heavy-tailed, flat-topped, or sharply peaked distributions.
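The idea of mixing along the y-axis can be sketched with base R as follows. The sketch assumes that the mixing ratio of the top-side component is its density relative to its own peak, and that the tail-side component gets one minus its own relative density; with this choice the total probability is 1. The name `d_vertical` and the parameter values are hypothetical.

```r
# Density of a 2-component vertical gradational mixture.
# ASSUMPTION: component 1 describes the tails, component 2 the top, and the
# mixing ratios are based on each density divided by its own peak value.
d_vertical <- function(x, mu1, sd1, mu2, sd2) {
  f1 <- dnorm(x, mu1, sd1)
  f2 <- dnorm(x, mu2, sd2)
  r1 <- f1 / dnorm(mu1, mu1, sd1)  # relative height of component 1, in [0, 1]
  r2 <- f2 / dnorm(mu2, mu2, sd2)  # relative height of component 2, in [0, 1]
  (1 - r1) * f1 + r2 * f2
}

# Equal means, wide tails (sd 2) and a narrow top (sd 1): sharp peak, heavy tails.
total <- integrate(function(x) d_vertical(x, 0, 2, 0, 1), -Inf, Inf)$value
peak  <- d_vertical(0, 0, 2, 0, 1)
```

At the common mean the tail-side ratio vanishes, so `peak` equals the peak of the top-side component, while far from the mean the density approaches that of the tail-side component.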
We denote a vertical gradational distribution as
Here, we call
Generally it is expressed as
where the mixing ratio functions
where
Here,
and inversely,
Or, we can write as
Normalizer
Normally, the shape of the probability density function of
However, note that
This is why I wrote "at least as an image" earlier about
the change in shape of the probability density function of
You can divide the tail-side distribution along the x-axis into a left (lower) side and a right (upper) side.
In other words, we can consider a skewed distribution model in which the probability density function
gradually varies from that of
In this case, we denote the distribution as
where the mixing ratio function
and
Here, we call
Two vertical GGDs
In this case, we denote the distribution as
where
where each
This model is suitable, for example, for representing distributions that are both skewed and heavy-tailed.
This package can generate objects for the following kinds of distribution models:
0. Normal Distribution
1. Mean of 2 Normal Distributions (a kind of Gaussian mixture model)
2. Horizontal Gradational Distribution
3. Vertical Gradational Distribution (2 or 3 components)
4. Horizontal-Vertical Gradational Distribution
Kinds 0 and 1 above are not GGDs. They can be used as baselines for judging whether a GGD is an appropriate distribution model for the data.
Each of kinds 1 to 4 above can be further classified by the conditions on its normal-distribution components, as follows:
- Mean-Differed Sigma-Equaled: a distribution whose components have different means and equal standard deviations
- Mean-Equaled Sigma-Differed: a distribution whose components have equal means and different standard deviations
- Mean-Differed Sigma-Differed: a distribution whose components differ in both mean and standard deviation
Therefore, there are 16 kinds of distribution models in total.
The larger the kind number, the more degrees of freedom the model has and the more complex the distributions it can represent. Simpler models, however, may be easier to use for analyzing data.
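The total of 16 can be reproduced with a short count. This sketch assumes that the 2-component and 3-component vertical models are counted as separate kinds and that the normal distribution, having a single component, is not sub-classified; that is the reading under which the numbers add up. The variable names are illustrative only.

```r
# Kinds that mix several normal components (2- and 3-component vertical
# models are counted separately; this is an assumption).
mixed_kinds <- c("Mean of 2 Normal Distributions",
                 "Horizontal",
                 "Vertical (2 components)",
                 "Vertical (3 components)",
                 "Horizontal-Vertical")

# Conditions on the means and standard deviations of the components.
conditions <- c("Mean-Differed Sigma-Equaled",
                "Mean-Equaled Sigma-Differed",
                "Mean-Differed Sigma-Differed")

# One plain normal distribution plus every kind/condition combination.
n_models <- 1 + length(mixed_kinds) * length(conditions)
```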
Type | Name | Overview |
---|---|---|
Generator | ggd.nls.freq | Generates a GGD object that approximates a frequency distribution. |
〃 | ggd.nls.freq.all | Approximates a frequency distribution with all supported distribution models. |
〃 | ggd.trace.q | Generates a GGD object that traces quantiles. |
〃 | ggd.set.cmp | Generates a GGD object with indicated components. |
Field | median | The median value of the distribution. |
〃 | mean | The mean value of the distribution. |
〃 | sd, usd, lsd | The standard deviation, upper semi-standard deviation and lower semi-standard deviation. |
Method | d | Returns the values of the probability density function. |
〃 | p | Returns the values of the cumulative distribution function. |
〃 | q | Returns the values of the quantile function. |
〃 | r | Returns random samples following the distribution. |
〃 | tex | Displays the formulas of the probability density function and the cumulative distribution function in TeX format. |
〃 | read.csv | Reads the composition of a GGD object from a CSV file. |
〃 | write.csv | Writes the composition of a GGD object to a CSV file. |
The mean and standard deviation are calculated using the dnorm and pnorm functions in the 'stats' package and the four basic arithmetic operations (the semi-standard deviations of the horizontal-vertical gradational distribution are computed with numerical integration). Therefore, their accuracy depends on the accuracy of the dnorm and pnorm functions.
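To show how a mean can come out of dnorm and plain arithmetic, here is a minimal sketch for a horizontal gradational density. It assumes the mixing ratios `1 - pnorm()` and `pnorm()` of the respective components; under that assumption the mean works out to `(mu1 + mu2) / 2 + (sd2 - sd1) / (2 * sqrt(pi))`, and `1 / (2 * sqrt(pi))` equals `dnorm(0, sd = sqrt(2))`. The function names are hypothetical and this is not the package's actual code.

```r
# ASSUMED horizontal gradational density (mixing ratios 1 - pnorm and pnorm).
d_horizontal <- function(x, mu1, sd1, mu2, sd2) {
  (1 - pnorm(x, mu1, sd1)) * dnorm(x, mu1, sd1) +
    pnorm(x, mu2, sd2) * dnorm(x, mu2, sd2)
}

# Closed-form mean under that assumption, using only dnorm and arithmetic.
mean_closed <- function(mu1, sd1, mu2, sd2) {
  (mu1 + mu2) / 2 + (sd2 - sd1) * dnorm(0, sd = sqrt(2))
}

# Cross-check against numerical integration of x * density.
m_closed  <- mean_closed(0, 1, 0.5, 2)
m_numeric <- integrate(function(x) x * d_horizontal(x, 0, 1, 0.5, 2),
                       -Inf, Inf)$value
```

The two values should agree to within the tolerance of `integrate`; the closed form avoids numerical integration entirely.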
```r
# Install devtools from CRAN
install.packages( "devtools" )

# Then use devtools::install_github( "user/repository" ) to install the ggd package from GitHub
devtools::install_github( "Kimitsuna-Goblin/ggd" )
```
The probability density function
This is the ordinary normal distribution. It is provided so that a normal distribution and the GGD models can be compared on how adequately they model the same data. It can also trace 2 quantiles (e.g., tertiles) with its cumulative distribution function.
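Tracing 2 quantiles with a normal distribution means solving for the mean and standard deviation whose cumulative distribution function passes through both points. A minimal base R sketch follows; the helper `trace_normal` and the example values are hypothetical and do not show this package's interface.

```r
# Find the mean and sd of the normal distribution whose CDF passes through
# two (probability, quantile) points: q = mean + sd * qnorm(p) for both points.
trace_normal <- function(p, q) {
  stopifnot(length(p) == 2, length(q) == 2, p[1] != p[2])
  z  <- qnorm(p)                     # standard normal quantiles
  sd <- (q[2] - q[1]) / (z[2] - z[1])
  mu <- q[1] - sd * z[1]
  c(mean = mu, sd = sd)
}

# Example: tertiles at -1 and 2.
fit <- trace_normal(p = c(1/3, 2/3), q = c(-1, 2))

# The fitted CDF reproduces the two given probabilities.
check <- pnorm(c(-1, 2), fit[["mean"]], fit[["sd"]])
```

Because two points determine the two parameters exactly, no iterative solver is needed here; models with more parameters need more quantiles and a numerical solver.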
The probability density function
This is a kind of Gaussian mixture model (GMM). It is provided to compare how adequately the GMM and the GGD model the data. It can also trace 3 or 4 quantiles with its cumulative distribution function.
The probability density function
The horizontal gradational distribution model can trace 3 or 4 left- or right-skewed quantiles with its cumulative distribution function.
Sample images of probability density functions are:
The probability density function
In the following expressions,
3-1. Vertical gradation with 2 components
3-2. Vertical gradation with 3 components
The vertical gradational distribution is a model that emphasizes the kurtosis of the distribution.
It can trace 3 to 6 quantiles with its cumulative distribution function,
but is not suitable for tracing a small number of equally spaced quantiles such as
Sample images of probability density functions are:
- 2-component models
- 3-component models
The probability density function
In the following expressions,
The horizontal-vertical gradational distribution has the most degrees of freedom in this package
and can represent the most complex distributions.
This model can trace 5 to 8 quantiles with its cumulative distribution function.
For example, quantiles of
More than 8 quantiles cannot be traced with any model in this package. If you have more than 8 quantiles, build a frequency distribution from them and try ggd.nls.freq instead.
Sample images of probability density functions are:
The GGD model was devised by the author of this package, but someone may already have come up with it, and there may be prior research. If you know of any research on this model, please let me know.