Assumption: Set theory and proofs familiarity
Source(s):
Most of this material is derived from "Mathematical Statistics with Applications" by Wackerly, Mendenhall, and Scheaffer.
Some of this material is also derived from "Probability and Statistics for Engineering and the Sciences", by Jay Devore, but to a much lesser degree.
The Wackerly book has more formulas (instead of tables), introductions to the mn rule, and other important concepts of combinatorics and statistics.
-- Part 1: The Basics --
- Probability Definition: Events, Sample Points and Sequencing Events Techniques
- How to calculate probability: Combinations, Permutations, Bayes Theorem
- Expected Value, Variance, Standard Deviation, Quartiles
- Discrete Random Variables
- Discrete Probability Distributions: Binomial
- Discrete Probability Distributions: Bernoulli
- Discrete Probability Distributions: Geometric
- Discrete Probability Distributions: Hypergeometric
- Discrete Probability Distributions: Negative Binomial
- Discrete Probability Distributions: Poisson
- Continuous Random Variables
- Probability Distributions "Distribution Functions" for all types of variables
- What is Density? A Mathematician's Perspective (and prep for Density Functions)
- Probability Density Functions: PDF
- Expected Value for a Continuous Random Variable
- Cumulative Distribution Functions (CDFs)
- Uniform Probability Distribution
- Gamma and Exponential Distributions
- Multivariate (Bivariate, Joint) Probability Distributions
- Marginal and Conditional Probability Distributions
- Independent Random Variables
- Expected Value of a Function of Random Variables
- Covariance of Two Random Variables
-- Part Two: Estimation and Application --
- The Normal Distribution
- Z Scores
- The Central Limit Theorem
- Moments and Methods of Estimation (Method of Moments, Maximum Likelihood)
- Confidence Intervals
Probability is the likelihood that an event will occur.
The probability of an event $E$ is the cardinality of the event, $|E|$, divided by the cardinality of the sample space $S$ (the "universe" the event lives in), assuming equiprobable sample points: $P(E) = \dfrac{|E|}{|S|}$.
• For any event, the probability is nonnegative.
• Probability of the entire sample space is 1: $P(S) = 1$.
• For pairwise mutually exclusive events, the probability that at least one occurs is the sum of their individual probabilities.
Law of Total Probability: for a partition $B_1, B_2, \ldots, B_k$ of the sample space, $P(A) = \sum_i P(A \cap B_i) = \sum_i P(A \mid B_i)P(B_i)$.
Law of Conditional Probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Independent Events: $A$ and $B$ are independent if and only if $P(A \cap B) = P(A)P(B)$ (equivalently, $P(A \mid B) = P(A)$).
One really interesting quality here: mutually exclusive events (with nonzero probabilities) are always dependent, since knowing one occurred tells you the other did not;
"negative number" versus "positive number" are dependent events.
Mutually Exclusive is not Independent
Take the "negative number" versus "positive number" setup: "if A, then not B."
Here, the events are mutually exclusive, and therefore dependent.
Multiplicative law: $P(A \cap B) = P(A)P(B \mid A)$.
If A and B are independent, $P(A \cap B) = P(A)P(B)$.
Additive law: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
If A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$.
• Complement rule: $P(\overline{A}) = 1 - P(A)$.
• The complement of "at most one" is "at least two."
• The complement of "at least one type" is "only one type."
The Wackerly probability book is great, and describes the sample-point method for calculating probability.
One example is to toss a pair of dice. The sample space, via the mn rule, is $6 \times 6 = 36$ equally likely sample points.
There will be a list of events such as $E_1 = (1, 1)$, $E_2 = (1, 2)$, and so on;
the probability of a compound event is the sum of the probabilities of the sample points it contains.
See the Wackerly book for more details on this technique, as well as sequenced events.
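As a small sketch of the sample-point method in R (the dice example above; the event "the dice sum to 7" is my own illustrative choice):

```r
# Enumerate the 6 x 6 = 36 sample points for a pair of dice and compute an
# event probability by the sample-point method.
dice <- expand.grid(die1 = 1:6, die2 = 1:6)  # 36 equally likely sample points
nrow(dice)                                   # 36, matching the mn rule
# P(the two dice sum to 7) = (# favorable sample points) / 36
mean(dice$die1 + dice$die2 == 7)             # 6/36 = 1/6
```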
Another technique, after the sample-point technique, is that of sequenced events.
Ordering n items:
$n!$ ways.
Combinations: Order Doesn't Matter
The number of combinations of $n$ distinct items taken $r$ at a time is $C^n_r = \binom{n}{r} = \dfrac{n!}{r!(n-r)!}$.
Examples: Out of the set $S = \{A, B, C\}$, a combination listing (allowing repetition, for illustration) would include $AAA$, $AAB$, $ABC$, etc., and $ABA = BAA$
because order doesn't matter. When order doesn't matter, you don't need to count as many things, e.g. if $AAB$
is equivalent to $ABA$, then those arrangements count as one element of the set, not two.
Permutations: Order Matters
The number of permutations of $n$ distinct items taken $r$ at a time is $P^n_r = \dfrac{n!}{(n-r)!}$.
Note that the denominator is smaller than in combinations (there is no $r!$ term), so there are many more permutation possibilities; because order matters, we have to count every ordering.
Examples: Out of the set $S = \{A, B, C\}$, a permutation listing would include $AAA$, $AAB$, $ABC$, etc. (again allowing repetition, for illustration), and $ABA \neq BAA$.
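A quick R sketch of these counting formulas (the specific values 5 and 2 are just example numbers):

```r
# Combinations: order doesn't matter; permutations: order matters.
choose(5, 2)                     # C(5,2) = 5! / (2! * 3!) = 10
factorial(5) / factorial(5 - 2)  # P(5,2) = 5! / 3! = 20 ordered arrangements
factorial(3)                     # 3! = 6 ways to order 3 distinct items
```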
Bayes Theorem:
Usually used for inversion techniques: "find the probability of a cause, given the effect."
Let $B_1, B_2, \ldots, B_k$ be a partition of the sample space with $P(B_i) > 0$, and let $A$ be an event with $P(A) > 0$.
Then, $P(B_j \mid A) = \dfrac{P(A \mid B_j)P(B_j)}{\sum_{i=1}^{k} P(A \mid B_i)P(B_i)}$.
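A small numeric sketch of Bayes' theorem in R; the prevalence, sensitivity, and false-positive values are hypothetical:

```r
# "Probability of a cause (disease), given an effect (positive test)."
p_disease           <- 0.01  # prevalence, P(B)
p_pos_given_disease <- 0.99  # sensitivity, P(A | B)
p_pos_given_healthy <- 0.05  # false positive rate, P(A | not B)
# Denominator via the law of total probability: P(positive)
p_pos <- p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
# Bayes: P(disease | positive)
p_pos_given_disease * p_disease / p_pos   # about 0.167
```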
Cardinality
Cardinality is the number of elements in a Set.
Expected Value, $\mu$ or $E[Y]$: The average
The expected value, or mean, is a calculation whose exact form differs depending on the probability distribution involved.
Variance, $\sigma^2$: Dispersion From the Mean
Variance is a measure of how far a set of numbers "spreads out" from the mean or average value.
Standard Deviation, $\sigma$: Amount of spread about the mean, in the original units ($\sigma = \sqrt{\sigma^2}$)
A low standard deviation means values are close to the mean; a high standard deviation means values are more spread out.
Quartiles:
A measure of position in statistics; we've heard "upper quartile," etc. There are three actual quartiles: the first is the 25th percentile, then the 50th (the median), and the 75th; the four quarters of the data are just the portions that fall between those cut points.
Expected Value or Mean of a Discrete Random Variable: $\mu = E[Y] = \sum_{y} y\,p(y)$
Variance of a Discrete Random Variable, $\sigma^2$: $\sigma^2 = E[(Y - \mu)^2] = \sum_{y} (y - \mu)^2\, p(y)$
Hacking variance: $Var[Y] = E[Y^2] - [E(Y)]^2$
A trick that's nice to know: the mean of the square minus the square of the mean.
Standard Deviation of a Discrete Random Variable, $\sigma$: $\sigma = \sqrt{\sigma^2}$
Discrete random variables take isolated, countable values; probability is assigned to each value, and the distribution function is stepwise. They are best described via a pmf.
pmf: Probability "mass" function
A pmf gives the probability that a discrete random variable takes on a particular value.
This could be denoted as $P(Y = y)$, or more concretely, $P(Y = 1)$, for example.
Probability mass functions will depend on the particular problem you're trying to solve.
Axioms of pmf's and discrete random variable probabilities:
- Each possible value of the random variable must be assigned a nonnegative probability, $0 \leq p(y) \leq 1$;
- All of the probabilities must sum to a total probability of 1: $\sum_{y} p(y) = 1$.
The binomial distribution models $n$ identical, independent trials. These are uniform experiments consisting of a series of failures and successes, for example counting the heads in $n$ coin flips, with success probability $p$ on each trial.
Distribution:
Using the binomial probability distribution formula, we know that for $Y$ = the number of successes in $n$ trials,
the pmf is represented by:
$p(y) = \binom{n}{y} p^y q^{n-y}$,
for $y = 0, 1, 2, \ldots, n$, where $q = 1 - p$.
Mean, Variance, Std Deviation of Binomial: $\mu = np$, $\sigma^2 = npq$, $\sigma = \sqrt{npq}$
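A quick R sketch of the binomial pmf and CDF (the values $n = 10$, $p = 0.3$, $y = 4$ are just examples):

```r
# Binomial pmf and CDF in R.
dbinom(4, size = 10, prob = 0.3)        # pmf: P(Y = 4)
pbinom(4, size = 10, prob = 0.3)        # CDF: P(Y <= 4)
n <- 10; p <- 0.3
c(mean = n * p, var = n * p * (1 - p))  # mu = np, sigma^2 = npq
```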
The Bernoulli distribution is considered a special case of the binomial distribution, with $n = 1$ (a single trial).
Bernoulli random variables, or distributions, are considered the simplest. This is a binary random variable with "success" occurring with probability $p$ and "failure" with probability $1 - p$, or just $q$, where $q = 1 - p$.
- PMF: $f(x;p) = p$ if success, $f(x;p) = 1-p = q$ if failure.
- Mean: $\mu = p$
- Variance: $\sigma^2 = p(1-p) = pq$
The geometric probability distribution is built on the binomial distribution idea, that of a series of identical, independent trials with successes and failures; the geometric distribution describes the random variable $Y$ = the number of the trial on which the first success occurs.
Looking at the sample space (Wackerly 3.5), we see that
$E_1: S$, $E_2: F, S$, $E_3: F, F, S$, ...
$E_k: F, F, F, \ldots, S$, with success on the $k$th trial,
where there are $k - 1$ failures before that first success.
As such, by independence, $P(E_k) = q^{k-1}p$.
Geometric Probability Distribution: $p(y) = q^{y-1}p$, for $y = 1, 2, 3, \ldots$, with $0 \leq p \leq 1$ and $q = 1 - p$.
Mean, Variance, Std Deviation of Geometric Distribution: $\mu = \dfrac{1}{p}$, $\sigma^2 = \dfrac{1-p}{p^2}$, $\sigma = \dfrac{\sqrt{1-p}}{p}$
Proofs for these are in the Wackerly book chapter 3.5 and are interesting.
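A small R sketch; note that R's geometric functions count failures before the first success rather than the trial number, so the book's $P(Y = y)$ corresponds to dgeom(y - 1, p). The value $p = 0.2$ is just an example:

```r
# Geometric distribution in R (R's x = number of failures before the first success).
p <- 0.2
dgeom(3 - 1, prob = p)                 # P(first success on trial 3)
(1 - p)^2 * p                          # same value from p(y) = q^(y-1) * p = 0.128
c(mean = 1 / p, var = (1 - p) / p^2)   # mu = 1/p, sigma^2 = q/p^2
```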
Distribution:
For random sampling of sample size $n$, without replacement, from a population of $N$ items containing $r$ "successes," the hypergeometric pmf is $p(y) = \dfrac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}$.
The denominator: counting the number of ways to select a subset of $n$ items from the $N$ in the population.
Then for the numerator, we think of choosing $y$ of the $r$ successes, and the remaining $n - y$ items from the $N - r$ failures.
Mean, Variance, Std Deviation of Hypergeometric: $\mu = \dfrac{nr}{N}$, $\sigma^2 = n\left(\dfrac{r}{N}\right)\left(\dfrac{N-r}{N}\right)\left(\dfrac{N-n}{N-1}\right)$.
Then if we define $p = \dfrac{r}{N}$ and $q = 1 - p$, we can write $\mu = np$ and $\sigma^2 = npq\left(\dfrac{N-n}{N-1}\right)$.
Note the factor $\dfrac{N-n}{N-1}$, often called the finite population correction factor.
As $N \rightarrow \infty$ with $n$ fixed, that factor approaches 1.
So for larger population sizes, the variance of the hypergeometric distribution is essentially the same as the binomial's, e.g. $\sigma^2 \approx npq$.
As $n$ approaches $N$, however, the correction factor shrinks toward 0,
and the hypergeometric distribution's variance is smaller than that of the binomial distribution, which would have variance $npq$.
Having lesser variance can be a good thing, so we can see how the hypergeometric distribution is useful for cases where the sample size approaches the population size. "For sampling from a finite population" such as, quality control, genetic hypothesis testing, or statistical hypothesis testing.
Recall the geometric distribution, which is finding the probability of the first success. The negative binomial distribution focuses on the use case for multiple successes occurring.
Depending on the textbook you are using, this is either counting the number of failures, or counting the trial where the $r$th success occurs.
The "rth success".
Distribution (case 1, Wackerly: $Y$ = the number of the trial on which the $r$th success occurs): $p(y) = \binom{y-1}{r-1} p^r q^{y-r}$, for $y = r, r+1, r+2, \ldots$
Distribution (case 2, Devore: $X$ = the number of failures before the $r$th success): $nb(x; r, p) = \binom{x+r-1}{r-1} p^r (1-p)^x$, for $x = 0, 1, 2, \ldots$
The Poisson probability distribution, used for counts of rare events over a period of time (or space), is also used to approximate the binomial distribution, since the binomial distribution converges to the Poisson. The Poisson distribution can approximate the binomial in use cases with large $n$ and small $p$, taking $\lambda = np$.
The Poisson distribution's probability function is $p(y) = \dfrac{\lambda^y e^{-\lambda}}{y!}$, for $y = 0, 1, 2, \ldots$, with $\mu = \lambda$ and $\sigma^2 = \lambda$.
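A brief R sketch of the approximation just described; the values $n = 1000$, $p = 0.003$ are arbitrary "large n, small p" choices:

```r
# Poisson approximation of the binomial: Binomial(n, p) ~ Poisson(lambda = n*p).
n <- 1000; p <- 0.003; lambda <- n * p
dbinom(2, size = n, prob = p)  # exact binomial P(Y = 2)
dpois(2, lambda = lambda)      # Poisson approximation with lambda = 3
```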
Continuous random variables are defined on a continuum, e.g. an interval.
Take the real number line: any interval on it contains infinitely many values, so we cannot attach a positive probability to every individual point.
Hence, axioms of probability for continuous variables cannot be similar to those of discrete.
- If each possible value of the random variable must be assigned a probability,
- And each possible value is a subset of an infinite set within an interval,
- Then the probabilities cannot all sum to 1, as they are infinite.
- Therefore a new set of axioms for continuous random variables must be defined, as follows.
From Wackerly 4.2, this is an important note about the definition of distribution functions, because distribution functions, e.g. cumulative distributions or probability distributions, can be for ANY random variable, whether discrete or continuous:
"Before we can state a formal definition for a continuous random variable, we must define the distribution function (or cumulative distribution function) associated with a random variable."
Let $Y$ denote any random variable. Then $F(y) = P(Y \leq y)$; for example, $F(2) = P(Y \leq 2)$.
The nature of the distribution function associated with a random variable, determines whether the variable is discrete or continuous.
- Discrete random variables have a stepwise function.
- Continuous random variables have a continuous function.
- Continuous random variables have a smooth curve graph, which can be thought of as the limit of histograms (Riemann-summation-style approximations).
- Variables are continuous if their distribution functions are; there is a lot of real analysis continuity material here, regarding "absolute continuity." More importantly,
- For a continuous random variable $Y$, $\forall y \in \mathbb{R}, P(Y = y) = 0$; that is,
continuous random variables have a zero probability at discrete points.
Wackerly uses the example of daily rainfall; the probability of exactly 2.312 inches, a single discrete point, is zero, while the probability of between 2 and 3 inches, an interval, can be quite substantial.
Semantics and Idioms of the R language for probability distributions: considered separate from pure mathematical theory.
Note in R, the "density function," invoked via dhyper(y, r, N-r, n)
, this function measures a discrete random variable's scalar value, such as our hypergeometric example in R; there's a bit of oddness here, since we've used this function for discrete random variables.
Also in R, the "probability distribution function" is invoked via phyper(4, r, N-r, n)
.
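A small sketch of those two calls with hypothetical values $N = 20$, $r = 8$, $n = 5$:

```r
# Hypergeometric pmf and CDF values in R, matching the calls mentioned above.
N <- 20; r <- 8; n <- 5
dhyper(2, r, N - r, n)  # pmf value P(Y = 2) ("density" in R's naming)
phyper(4, r, N - r, n)  # CDF value P(Y <= 4)
```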
This section is also a preparation for density functions in probability.
Note: This is often considered grad-student level Real Analysis work, and the real numbers can arguably be constructed in various ways; the Dedekind cuts are merely my personal favorite.
I ran across this material in Jay Cummings' Real Analysis book, which is about $20 on Amazon and is used by the Wrath of Math (an excellent YouTube math channel).
If you'd prefer to have a social life, you can skip this section, but frankly, without density in Real Analysis, density functions in probability are a bit nonsensical to me.
Recall Real Analysis, and that the real numbers can be constructed via Dedekind cuts of rational numbers; recall that "the rationals are dense in the reals" (see the Stack Exchange and Wikipedia discussions of dense sets and topology).
We could also say "density of the rationals in the reals."
Basically, there are a lot of "density" discussions with the real numbers, as such.
Take any interval on the real number line. "Subdivide" that interval into many "subdivisions."
There are "infinite" real numbers, or subdivisions, in that interval (arguably countable or uncountable).
The big picture is, they're infinite, or close enough to infinite that it doesn't matter.
This is what "density" looks like. (The articles above are about this, regarding the real numbers, as well as rational and irrational numbers, and constructing the real number line from a hybrid of rational and irrational numbers like Dedekind, which is very fun Real Analysis stuff).
So, that's what "density" is: take an interval on the real number line, subdivide it quite a lot into infinite subdivisions, and hey, that's "dense."
Continuous variables are analyzed on an interval, so we care about density in that interval, as the previous section discusses.
PDF: Probability Density Function
A PDF is a function that provides a "likelihood" that a continuous random variable's value is close to that of the value of a sample, or multiple samples.
For more on PDFs, see Wikipedia PDF article.
Probability density: probability per unit length that the RV is near one or more samples.
"Probability density is the probability per unit length... while the absolute likelihood for a continuous random variable to take on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other sample." - Wikipedia
PDF formula: The PDF of a continuous random variable $Y$ is the function $f(y)$ such that,
for an interval $[a,b]$, $a \leq b$,
$P(a \leq Y \leq b) = \int_a^b f(y)\, dy$.
That is, the probability that the continuous random variable falls within an interval
is the area under the curve of the density function between $a$ and $b$.
PDF Axioms:
- The total area under the curve of $f(x)$, from $-\infty$ to $\infty$, is 1.
That is, $\int_{-\infty}^{\infty} f(x)\, dx = 1$.
(Continuous variables have a "smooth curve" graph.)
This axiom is analogous to the discrete RV's probabilities all summing to 1 discretely.
- $f(x) \geq 0, \forall x$: the density function is nonnegative everywhere.
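As a quick numerical sanity check of these axioms in R, using the standard normal density purely as an example:

```r
# Axiom 1: total area under the density is 1 (numerical integration).
integrate(dnorm, lower = -Inf, upper = Inf)  # ~1 with small error
# Axiom 2: the density is nonnegative at every point sampled.
all(dnorm(seq(-5, 5, by = 0.1)) >= 0)        # TRUE
```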
Mean or Expected Value of a continuous random variable: $\mu = E[Y] = \int_{-\infty}^{\infty} y\, f(y)\, dy$.
Similarly, for a function $g(Y)$: $E[g(Y)] = \int_{-\infty}^{\infty} g(y)\, f(y)\, dy$.
Variance of a continuous random variable with PDF $f(y)$:
$\sigma^2 = E[(Y - \mu)^2] = \int_{-\infty}^{\infty} (y - \mu)^2 f(y)\, dy = E[Y^2] - \mu^2$.
The CDF for a continuous random variable $Y$ is $F(y) = P(Y \leq y) = \int_{-\infty}^{y} f(t)\, dt$.
Using $F(y)$ to compute probabilities:
Let $a \leq b$.
Then, $P(a \leq Y \leq b) = F(b) - F(a)$.
Relating PDF and CDF via the fundamental theorem of calculus:
If $F(y)$ is the CDF and $f(y)$ the PDF,
then, $f(y) = \dfrac{dF(y)}{dy}$ wherever the derivative exists.
In a uniform distribution, every possible outcome is equiprobable - for example, handing out a dollar to random passersby without discernment.
Uniform Distributions look like a "block" most of the time, where probability is constant within an interval.
Uniform Distributions for Discrete Random Variables
The probability is 1, divided by total outcomes.
Use cases include the possible outcomes of rolling a 6-sided die,
probability of drawing a particular suit within a deck of cards,
flipping a coin, etc.
All of these are equiprobable discrete cases.
Uniform Distributions for Continuous Random Variables
This can include a random number generator, temperature ranges, and many use cases with an infinite number of possible outcomes within an interval of measurement.
For the continuous random variables, we'll present the probability density function, the cumulative distribution function, and mean and variance.
PDF:
The PDF of the uniform distribution on an interval $[a, b]$ is $f(x) = \dfrac{1}{b-a}$ for $a \leq x \leq b$, and 0 elsewhere.
In the uniform distribution, the probability over a subinterval is proportional to the length of that subinterval.
CDF: $F(x) = 0$ for $x < a$, $F(x) = \dfrac{x-a}{b-a}$ for $a \leq x \leq b$, and $F(x) = 1$ for $x > b$.
$\mu, \sigma^2$: $\mu = \dfrac{a+b}{2}$, $\sigma^2 = \dfrac{(b-a)^2}{12}$.
The gamma distribution (which includes the exponential as a special case) is often used for waiting times and other measurements over temporal intervals, and it is closely related to the Poisson process, which counts the events being waited for.
Exponential Distribution:
With rate parameter $\lambda$ (scale parameter $\dfrac{1}{\lambda}$):
- $\mu = \dfrac{1}{\lambda}$, and $\sigma^2 = \dfrac{1}{\lambda^2}$
- PDF: $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$, else 0
- CDF: $F(x; \lambda) = 1 - e^{-\lambda x}$ for $x > 0$, else 0
Gamma Distribution
With shape parameter $\alpha > 0$ and scale parameter $\beta > 0$:
- PDF: $f(y; \alpha, \beta) = \dfrac{y^{\alpha - 1}e^{-y/\beta}}{\beta^{\alpha}\Gamma(\alpha)}$ for $y \geq 0$,
  where the gamma function is $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\, dy$
- PDF, Standard Gamma Distribution ($\beta = 1$): $f(y; \alpha) = \dfrac{y^{\alpha - 1}e^{-y}}{\Gamma(\alpha)}$
- CDF: $F(y; \alpha) = \int_0^{y} \dfrac{t^{\alpha - 1}e^{-t}}{\Gamma(\alpha)}\, dt$ (the incomplete gamma function, shown here for the standard case)
- $\mu = \alpha\beta$
- $\sigma^2 = \alpha\beta^2$
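A short R sketch of these two distributions; the parameter values are arbitrary examples:

```r
# Exponential: rate = lambda; Gamma: shape = alpha, scale = beta.
lambda <- 0.5
dexp(2, rate = lambda)                  # PDF value: lambda * exp(-lambda * 2)
pexp(2, rate = lambda)                  # CDF value: 1 - exp(-lambda * 2)
alpha <- 2; beta <- 3
dgamma(4, shape = alpha, scale = beta)  # gamma PDF at y = 4
c(mean = alpha * beta, var = alpha * beta^2)
```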
Until now we've seen univariate probability distributions. The same basic axioms and rules tend to apply to multivariate distributions.
Example: toss a pair of dice.
The sample space by the mn rule is $6 \times 6 = 36$ sample points,
with events such as $(Y_1 = 1, Y_2 = 2)$, where $Y_1$ is the value on the first die and $Y_2$ the value on the second.
Hence, the bivariate probability function is $p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = \dfrac{1}{36}$ for each of the 36 pairs.
Joint or Bivariate PMFs for discrete random variables: the probability of a joint event is the sum of $p(y_1, y_2)$ over the pairs belonging to that event.
- Axioms: probabilities all nonnegative, and all probabilities sum to 1.
Example: For tossing two dice, find the probability of an event such as "both dice show at most 2" (an illustrative choice).
Simply sum the probabilities of the qualifying sample points: $p(1,1) + p(1,2) + p(2,1) + p(2,2) = \dfrac{4}{36} = \dfrac{1}{9}$.
Joint or Bivariate CDFs for two jointly continuous random variables take the form of a double integral: $F(y_1, y_2) = P(Y_1 \leq y_1, Y_2 \leq y_2) = \int_{-\infty}^{y_1}\int_{-\infty}^{y_2} f(t_1, t_2)\, dt_2\, dt_1$.
"To find p1(y1), we sum p(y1, y2) over all values of y2 and hence accumulate the probabilities on the y1 axis (or margin)." - Wackerly
Bivariate events such as $(Y_1 = y_1, Y_2 = y_2)$ can thus be collapsed onto one variable's "margin" by summing (or integrating) over the other variable.
Marginal Probability Functions: Fix one var, iterate (sum, integrate) over the other; accumulate.
- Discrete (marginal pmf): $p_x(x) = \sum_{\forall y} p(x,y), \forall x$.
- Continuous (marginal PDF): $f_x(x) = \int_{\forall y} f(x,y)\, dy$.
We know that bivariate or joint events such as $(Y_1 = y_1, Y_2 = y_2)$ can also be conditioned on one of the variables.
Generally, $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, for $P(B) > 0$.
Less generally, for random variables:
Conditional, Discrete: $p(y_1 \mid y_2) = P(Y_1 = y_1 \mid Y_2 = y_2) = \dfrac{p(y_1, y_2)}{p_2(y_2)}$, provided $p_2(y_2) > 0$.
Conditional, Continuous: $f(y_1 \mid y_2) = \dfrac{f(y_1, y_2)}{f_2(y_2)}$, provided $f_2(y_2) > 0$; the conditional distribution function is $F(y_1 \mid y_2) = P(Y_1 \leq y_1 \mid Y_2 = y_2)$.
If $Y_1$ and $Y_2$ are independent, the joint probability function can be written as the product of the marginals: $p(y_1, y_2) = p_1(y_1)\,p_2(y_2)$ in the discrete case, and $f(y_1, y_2) = f_1(y_1)\,f_2(y_2)$ in the continuous case.
Expected values of functions of random variables work the same as in univariate situations: multiply the function's value by the joint (density/mass) function and sum or integrate, e.g. $E[g(Y_1, Y_2)] = \sum_{y_1}\sum_{y_2} g(y_1, y_2)\, p(y_1, y_2)$ in the discrete case, with the corresponding double integral over $f(y_1, y_2)$ in the continuous case.
Covariance and Correlation are measures of (linear) dependency. The larger the covariance in absolute value, the stronger the correlation (zero covariance, zero correlation).
If $Y_1$ and $Y_2$ have means $\mu_1$ and $\mu_2$, then $Cov(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)]$.
After some algebra, we can see that's also $Cov(Y_1, Y_2) = E[Y_1 Y_2] - \mu_1\mu_2$.
Positive covariance indicates proportionality; negative covariance indicates inverse proportionality.
Since covariance is hard to interpret on its own (its size depends on the units), we often use the correlation coefficient instead: $\rho = \dfrac{Cov(Y_1, Y_2)}{\sigma_1\sigma_2}$, where $-1 \leq \rho \leq 1$.
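A short R sketch of how covariance and correlation relate, using simulated, purely illustrative data:

```r
# Sample covariance and correlation for two positively related variables.
set.seed(1)
y1 <- rnorm(1000)
y2 <- 0.7 * y1 + rnorm(1000, sd = 0.5)  # y2 depends positively on y1
cov(y1, y2)                             # positive covariance
cor(y1, y2)                             # correlation coefficient
cov(y1, y2) / (sd(y1) * sd(y2))         # matches cor(y1, y2)
```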
Or, "why did we wait until now to talk about the normal distribution and z scores?"
The answer is, because we use those things for estimation, and they belong best together in an introduction. The normal distribution is the most frequently used probability distribution. We'll learn about that, and then about z scores and moments which feed into estimation.
Z scores also help us with confidence intervals and estimation.
This is the famous "bell curve," the most widely used probability distribution, where the mean is at the center, and standard deviation depicts width around that mean of the curve, indicating its variance - or, its volatility. This relation to volatility helps us understand the bell curve's importance in measuring the relative stability of a metric.
The normal distribution is common in statistics, economics, and finance.
Values pile up near the mean and thin out symmetrically as you move more standard deviations away, which is what creates the bell shape.
The Normal Distribution for a continuous random variable has the PDF: $f(y) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}}$, for $-\infty < y < \infty$.
Parameters of the Normal Distribution:
$\mu, \sigma$
We consider $\mu$ the location parameter (the center of the curve), and
we consider $\sigma$ the scale parameter (its spread).
The notation $Y \sim N(\mu, \sigma^2)$ means $Y$ is normally distributed with mean $\mu$ and variance $\sigma^2$.
Area under the normal density function from a to b: $P(a \leq Y \leq b) = \int_a^b f(y)\, dy$; there is no closed-form antiderivative, so we use tables or software.
R code: pnorm, qnorm
Solving for the Normal Distribution in R
dnorm: density function of the normal distribution
pnorm: cumulative distribution function of the normal distribution
qnorm: quantile function of the normal distribution
rnorm: random sampling from the normal distribution
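A short sketch of those four functions with example arguments:

```r
dnorm(1, mean = 0, sd = 1)   # density value f(1)
pnorm(1.96)                  # P(Z <= 1.96), about 0.975
qnorm(0.975)                 # 97.5th percentile, about 1.96
rnorm(5, mean = 10, sd = 2)  # five random draws from N(10, 2^2)
# Area under the normal density from a to b: pnorm(b, ...) - pnorm(a, ...)
pnorm(2) - pnorm(-1)         # P(-1 <= Z <= 2)
```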
This is the normal distribution with parameter values $\mu = 0$ and $\sigma = 1$.
The PDF of a continuous random variable $Z$ with the standard normal distribution is: $f(z) = \dfrac{1}{\sqrt{2\pi}} e^{-z^2/2}$.
"Z score / Z Value / Standard Score"
From the population (mostly theoretical or edge cases)
Z Scores are called so many things, but they all mean the same thing: the distance of an observed value from the statistical mean. Theoretically, this would be the population mean, although that is hard to measure as we will cover.
$z = \dfrac{x - \mu}{\sigma}$, where the observed value is $x$, the population mean is $\mu$, and the population standard deviation is $\sigma$.
Z Scores represent how far an observed value is from the statistical mean, in units of standard deviations (recall
again that population mean and std dev can be difficult to get to, so we say
"statistical mean" to indicate this abstraction). If a Z Score is 1, that means
the observed raw value $x$ is one standard deviation above the mean.
From the sample (most actual practice)
Outside of "standardized testing" where an entire population is measured (including its mean and standard deviation). So, often, population mean and standard deviation are unknown. In these cases, we use sample statistics instead of population statistics.
Using sample stats,
Unfortunately many statisticians will not make clear the very important difference
between population and sample statistics in their Z scores, making it confusing
to figure out what they are talking about. You will often see
Z Curve The "z -curve" is the standard normal curve.
Z-scores: How many std dev from the mean a value is; areas under the curve
68-95-99.7 rule:
About 68% of the distribution is within one standard deviation of the mean; about 95% within two; about 99.7% within three.
So,
• 68% of all scores: within $\mu \pm \sigma$
• 95% of all scores: within $\mu \pm 2\sigma$
• 99.7% of all scores: within $\mu \pm 3\sigma$
• and 50% of all scores: within roughly $\mu \pm 0.67\sigma$
Z-notation for z-critical values; percentiles
The notation $z_\alpha$ denotes the z-critical value with area $\alpha$ to its right under the standard normal curve; equivalently, $z_\alpha$ is the $100(1-\alpha)$th percentile. For example, $z_{.05} \approx 1.645$ and $z_{.025} \approx 1.96$.
Standardizing (nonstandard) distributions:
Standardizing turns a $N(\mu, \sigma^2)$ variable into one with $\mu = 0, \sigma = 1$.
Recall that distance from the mean in standard deviations was $z = \dfrac{x - \mu}{\sigma}$.
This is similar; the "standardized variable" is $Z = \dfrac{Y - \mu}{\sigma}$:
• Subtracting $\mu$ centers the distribution at 0.
• Dividing by $\sigma$ rescales the spread so the standard deviation is 1.
Standard normal distribution axioms:
• The standardized variable $Z = \dfrac{Y - \mu}{\sigma}$ follows the standard normal distribution, $N(0, 1)$.
Then, when we see $\Phi$ (the standard normal CDF; the lowercase $\phi$ is its PDF), that means we can use probability distribution tables:
• $P(Z \leq z) = \Phi(z)$
• $P(Z > z) = 1 - \Phi(z)$, and $P(a \leq Z \leq b) = \Phi(b) - \Phi(a)$
• The CDF of Z = $\Phi(z) = \int_{-\infty}^{z} \dfrac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$
**Please note the normal distribution markdown file to see an application
of the axioms of std normal distribution, as that is the best way to learn.**
Standard Normal Approximation of Binomial:
An interesting quality of the normal distribution is that its curve approximates the histogram of the binomial distribution when that histogram isn't "too skewed." For these cases, use the normal approximation.
Normal approximation: if $Y \sim Binomial(n, p)$, then $Y$ is approximately $N(\mu = np, \sigma^2 = npq)$.
This approximation is adequate if $np$ and $nq$ are both sufficiently large (a common rule of thumb is $np \geq 10$ and $nq \geq 10$; some texts use 5).
"For large enough n, things are normal."
For a large enough sample size of n, usually
If the sample size is large,
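A short R sketch of both ideas (the normal approximation to a binomial, and the CLT); all of the numbers are illustrative:

```r
# Normal approximation to Binomial(n = 100, p = 0.4).
n <- 100; p <- 0.4
pbinom(45, size = n, prob = p)                         # exact P(Y <= 45)
pnorm(45.5, mean = n * p, sd = sqrt(n * p * (1 - p)))  # approximation (continuity corrected)
# CLT: means of many samples of size 30 from a skewed distribution look normal.
set.seed(2)
sample_means <- replicate(10000, mean(rexp(30, rate = 1)))
c(mean = mean(sample_means), sd = sd(sample_means))    # near 1 and 1/sqrt(30)
```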
Moments of a probability distribution include:
Moment 1: Expected value (mean)
- The first population moment is $E(X) = \mu$;
- The first sample moment is $\overline{x} = \dfrac{1}{n} \sum X_i$. (This makes sense as an average of many points in the sample.)
Moment 2: Variance (the second central moment)
- The second population moment about zero is $E(X^2)$; the variance is the second moment about the mean, $\sigma^2 = E(X^2) - \mu^2$;
- The second sample moment is $\dfrac{1}{n} \sum X_i^2$; the corresponding sample variance is denoted $s^2$.
Moment 3: Skewness (whether the data is skewed to the left or right of the mean), e.g. asymmetry about mean;
Moment 4: Kurtosis ("tail-ness").
Moment k:
- The kth population moment is $E(X^k)$;
- The kth sample moment is $\dfrac{1}{n} \sum X_i^k$.
These moments will be fundamental to techniques of estimation that follow.
The purpose of statistics is to make inferences from data and draw conclusions. We make inferences about a population from its sample(s); all of the population data is rarely known.
So, we estimate parameters to pass into functions (joint probability functions, estimation functions, and so on) in order to estimate, and make inferences about, the population.
When we create estimates they can be a scalar point estimate, or a prediction interval, e.g. a confidence interval.
Notation:
We use $\theta$ to denote the target population parameter, and $\hat{\theta}$ to denote its estimator, a statistic computed from the sample.
Now, remember how we noted above, in the "z scores/standardization" section, that statistics often mixes population and sample data without much explanation. Here we see that in play:
"One example is $\hat{\mu} = \overline{x}$."
That statement is really saying the following:
- There is a population mean $\mu$.
- By changing $\mu$ into $\hat{\mu}$, we are saying "what is the estimator of $\mu$?"
- That question is answered on the RHS of the equation, by $\overline{x}$, the sample mean.
This completely squares with what we discussed earlier, using sample data to estimate (often unavailable) population data.
TODO finish this section
Unbiased Estimator
Bias of a point estimate is $B(\hat{\theta}) = E(\hat{\theta}) - \theta$; the estimator is unbiased when $E(\hat{\theta}) = \theta$, i.e. zero bias.
Good Estimator: Minimum Variance Unbiased Estimator (MVUE)
A good estimator has minimal variance, and has a "skinny" scatter about the mean.
MVUE: among all unbiased estimators of $\theta$, the MVUE is the one with the smallest variance.
Recall also the variance derivation, often used in estimation techniques: $V(\hat{\theta}) = E(\hat{\theta}^2) - [E(\hat{\theta})]^2$.
Method of Estimation: Method of Moments
Typical problem statement: "Use method of moments to obtain an estimator for
$\theta$ ."
For the first population moment, $E(X) = \mu$, the corresponding first sample moment is $\overline{x} = \dfrac{1}{n}\sum X_i$.
For the second population moment, $E(X^2)$, the corresponding second sample moment is $\dfrac{1}{n}\sum X_i^2$.
Let the kth population moment be $E(X^k)$, with corresponding sample moment $\dfrac{1}{n}\sum X_i^k$.
As covered in Devore 6.2, the method of moments estimator is obtained by equating the population moments (which are functions of the unknown parameters) to the corresponding sample moments, and solving for the parameters.
As covered in Wackerly, the nth raw moment (about zero) of a random variable $X$ is
- $E(X^n) = \sum_i x_i^n\, p(x_i)$ for a discrete distribution, very similar to evaluating a pmf-weighted sum such as $E(X) = \sum_i x_i\, p(x_i)$;
- $E(X^n) = \int x^n f(x)\, dx$ for a continuous distribution (the nth central moment, about the mean, is $\int (x-\mu)^n f(x)\, dx$).
That's it. That is the "method of moments" technique for obtaining estimators for $\theta$.
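A hedged R sketch of the method of moments for an exponential rate parameter (a hypothetical example): the first population moment is $E(X) = \dfrac{1}{\lambda}$, so equating it to the sample mean gives $\hat{\lambda} = \dfrac{1}{\overline{x}}$.

```r
# Method-of-moments estimator for an exponential rate parameter lambda.
set.seed(3)
x <- rexp(500, rate = 2)       # simulated data with true lambda = 2
lambda_hat_mom <- 1 / mean(x)  # equate E(X) = 1/lambda to the sample mean
lambda_hat_mom                 # close to 2
```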
Method of Estimation: Method of Maximum Likelihood
Typical problem statement: "Use method of maximum likelihood to obtain an estimator for
$\theta$ ."
Process:
• (1) Take the distribution function, e.g. the joint PMF or PDF of the sample, viewed as a function of the parameter. This is also called
the likelihood function.
This would be a joint PMF/PDF. Recall that, for an independent sample, the joint probability looks like $L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$.
Notably, this joint probability is a product.
• (2) Take its natural log. Why? Because it's easier, due to the logarithm rules that follow.
Recall that the joint probability is a product, and logarithm rules apply:
"The log of a product is the sum of the logs."
Notation and flow for this step would generally be something like: $\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)$.
• (3) Take its derivative with respect to the parameter and set it equal to zero, for the "maximum value."
We are taking the derivative of the log function from the last step. Set it to zero.
• (4) Solve for $\theta$; the solution, $\hat{\theta}$, is the maximum likelihood estimator.
That's it.
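A hedged R sketch of these four steps for an exponential rate parameter (a hypothetical example; the closed-form answer is $\hat{\lambda} = 1/\overline{x}$):

```r
# Maximum likelihood for an exponential rate parameter lambda, numerically.
set.seed(4)
x <- rexp(500, rate = 2)                                            # simulated data, true lambda = 2
loglik <- function(lambda) sum(dexp(x, rate = lambda, log = TRUE))  # steps (1)-(2): log-likelihood
fit <- optimize(loglik, interval = c(0.01, 10), maximum = TRUE)     # steps (3)-(4): maximize
fit$maximum   # numerical MLE, close to 2
1 / mean(x)   # closed-form MLE for the exponential: 1 / xbar
```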
Confidence intervals are another way to obtain estimates. The confidence interval, or "interval estimator," is a rule by which we get the limits/endpoints. We desire:
- A narrow interval;
- That actually encloses the desired parameter, $\theta$.
Confidence coefficient: "$1-\alpha$"
The probability that a confidence interval will enclose the desired parameter $\theta$.
Or, "the fraction of the time, with repeated sampling, that the interval contains $\theta$."
A high confidence coefficient means high confidence: moving forward with sampling,
we can be confident that our resulting confidence interval contains $\theta$.
Two sided confidence interval:
Very similarly to real analysis and delta-epsilon, infimum and supremum arguments, let
the probability that the interval between lower limit $\hat{\theta}_L$ and upper limit $\hat{\theta}_U$ encloses $\theta$ be $1 - \alpha$: $P(\hat{\theta}_L \leq \theta \leq \hat{\theta}_U) = 1 - \alpha$.
One sided confidence interval:
Let $P(\hat{\theta}_L \leq \theta) = 1 - \alpha$ for a lower confidence bound;
less abstractly, we are $100(1-\alpha)\%$ confident that $\theta \geq \hat{\theta}_L$.
Let $P(\theta \leq \hat{\theta}_U) = 1 - \alpha$ for an upper confidence bound;
less abstractly, we are $100(1-\alpha)\%$ confident that $\theta \leq \hat{\theta}_U$.
Finding a confidence interval
Recall the standard normal distribution axioms: distance from the mean in standard
deviations was $z = \dfrac{x - \mu}{\sigma}$.
Subtracting the mean and dividing by the standard deviation standardizes the variable.
This will be similar. The quantity $Z = \dfrac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}}$ has (at least approximately) a standard normal distribution; it is our pivotal quantity.
Let's look at our probability, by selecting tail area values
of $\dfrac{\alpha}{2}$ in each tail: $P\left(-z_{\alpha/2} \leq \dfrac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \leq z_{\alpha/2}\right) = 1 - \alpha$,
where our endpoints comprise the LHS and RHS of the inequality. This is quite abstract,
so let's note that rearranging the inequality for $\theta$ gives the interval endpoints $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$.
[EX] Let the parameter of interest, $\theta$, be the population mean $\mu$.
We know its estimator is the sample mean, $\hat{\theta} = \overline{x}$.
Now, regarding this: "$\sigma_{\hat{\theta}}$." This is the standard deviation
of the estimator (its standard error), which for the sample mean is $\sigma_{\overline{x}} = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation.
Putting this all together, we have that the interval is $\overline{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.
This is also called a $100(1-\alpha)\%$ confidence interval for $\mu$.
When $\sigma$ is known,
a $100(1-\alpha)\%$ confidence interval for $\mu$ is $\overline{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, where
the point estimate of $\mu$ is the sample mean $\overline{x}$.
Remember you can always replace $\sigma$ with the sample standard deviation $s$ when the sample size is large (roughly $n \geq 30$).
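A hedged R sketch of the large-sample (z) interval with simulated, illustrative data:

```r
# 95% z-based confidence interval for mu, using s in place of sigma (large n).
set.seed(5)
x <- rnorm(50, mean = 10, sd = 3)
alpha <- 0.05
xbar <- mean(x); s <- sd(x); n <- length(x)
xbar + c(-1, 1) * qnorm(1 - alpha / 2) * s / sqrt(n)  # lower and upper limits
```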
When $\sigma$ is unknown and $n < 30$, use the t-distribution:
Instead of "z critical scores" $z_{\alpha/2}$, use t critical values $t_{\alpha/2,\,n-1}$, giving the interval $\overline{x} \pm t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}$.
The t-distribution is not a normal distribution; since it is used with small samples, it has heavier tails, to account for the extra uncertainty of estimating $\sigma$ with $s$.
The t-distribution is controlled by the parameter "degrees of freedom." This can be
notated $\nu$ (or df); for a single sample, $\nu = n - 1$.
- When $\nu = 1$, the t-distribution becomes the Cauchy distribution, with very heavy tails.
- When $\nu \rightarrow \infty$, the t-distribution converges to the standard normal distribution, with comparatively light tails.
These principles are also related to kurtosis.
R code related to the t distribution:
- dt: the t-distribution's PDF value.
- pt: the CDF value; returns the area to the left, or to the right with pt(x, df, lower.tail = FALSE).
- qt: the t-distribution's quantile, qt(p, df). For example, qt(.99, df = 20) finds the t-score of the 99th quantile of the Student t distribution with df = 20.
  Or better yet, for a confidence interval critical value, qt(1 - alpha/2, df = n - 1).
- rt: returns a vector of random draws from the t distribution.
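A short sketch tying qt to a small-sample confidence interval (simulated, illustrative data):

```r
# t-based 95% confidence interval for mu when n < 30 and sigma is unknown.
set.seed(6)
x <- rnorm(15, mean = 10, sd = 3)
alpha <- 0.05
mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
t.test(x, conf.level = 0.95)$conf.int  # built-in check; same interval
```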
Confidence Interval Width: Why we sometimes choose $90\%$ over $99\%$
Wider intervals are more reliable, but less precise; we desire narrow intervals whenever possible. For this reason, we often specify our desired CI and interval width, and our output is the required sample size for such a CI and interval width.
(This is also reminiscent of the minimal variance theories of estimation).
The sample size required for a $100(1-\alpha)\%$ confidence interval to have width $w$ is $n = \left(\dfrac{2 z_{\alpha/2}\,\sigma}{w}\right)^2$ (rounded up).
Final notes on Confidence Intervals
- According to the relative frequency viewpoint of probability, the confidence level describes repeated sampling: if the experiment were performed many times, about
$95\%$ of the resulting intervals would contain $\theta$. You can't perform one experiment and say that your single computed interval has a
$95\%$ probability of containing $\theta$; the confidence statement is about the procedure, and that long-run claim is what would have to be tested.