
Definitions

Key Statistics Terms, Definitions and Explanations

What is a Bernoulli distribution and when should I use it?

The Bernoulli distribution models a single trial with exactly two possible outcomes: success, with probability p, and failure, with probability 1 - p. Use it when an experiment is performed once and the result is binary, such as a single coin toss.

What is a Binomial distribution and when should I use it?

The Binomial distribution models the number of successes in a fixed number n of independent trials, each with the same two outcomes and the same success probability p. Use it for repeated binary experiments, such as counting heads in 10 coin tosses.

Comparison: Bernoulli vs Binomial Distribution

Key Differences

| Aspect | Bernoulli Distribution | Binomial Distribution |
| --- | --- | --- |
| Number of Trials | 1 trial | Fixed number of trials (n) |
| Outcomes | Two outcomes (success/failure) | Two outcomes (success/failure) in each of the n trials |
| Probability | Single trial probability (p) | Same probability of success (p) in each of the n trials |
| Random Variable | Single success/failure (0 or 1) | Number of successes (r) in n trials |
| Mean | p | n * p |
| Variance | p(1 - p) | n * p * (1 - p) |
| When to Use | Single binary outcome (success/failure) | Multiple trials with binary outcomes (successes in n trials) |
| Examples | Coin toss (1 toss), yes/no survey (single respondent) | Coin tosses (multiple tosses), quality control (multiple items tested) |
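The mean and variance formulas in the table can be checked numerically. The sketch below (the parameters n = 10 and p = 0.3 are arbitrary choices for illustration) builds the binomial PMF from first principles and recovers n * p and n * p * (1 - p); the Bernoulli distribution is just the n = 1 special case.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r): probability of exactly r successes in n independent
    # Bernoulli(p) trials
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3  # arbitrary example parameters
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binomial_pmf(r, n, p) for r in range(n + 1))

print(mean)  # ≈ n * p = 3.0
print(var)   # ≈ n * p * (1 - p) = 2.1

# Bernoulli is the n = 1 case: P(X = 1) = p
print(binomial_pmf(1, 1, p))  # 0.3
```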

Summary

The Bernoulli distribution models a single trial with two possible outcomes, whereas the Binomial distribution models the number of successes in multiple independent Bernoulli trials. While the Bernoulli distribution focuses on one trial, the Binomial distribution is used when there are several trials, all with the same probability of success.

What is a Continuous random variable and when should I use it?

A continuous random variable is a random variable that can take an infinite number of values within a given range. These variables are often associated with measurements like time, height, temperature, or distance. Unlike discrete random variables, which can only take specific values, continuous random variables have a continuum of possible outcomes.

Key Characteristics

- Can take any value within a given range, so the set of possible outcomes is uncountable.
- Described by a probability density function (PDF) rather than by probabilities assigned to individual values.
- The probability of any single exact value is zero; only intervals have positive probability.

Probability for Continuous Random Variables

Why is the probability for specific values zero and why can we calculate the probability within an interval?

For a continuous random variable, probability corresponds to area under the probability density function (PDF). A single value is an interval of zero width, so the area above it, and therefore its probability, is exactly zero. An interval [a, b], by contrast, has positive width, so the probability of falling within it is meaningful and is calculated by integrating the PDF over that interval.
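This can be illustrated numerically. The sketch below uses an exponential PDF with rate λ = 1 (an arbitrary example) and approximates P(1 ≤ X ≤ 2) by summing thin rectangles under the curve; shrinking the interval to a single point shrinks the area, and hence the probability, to zero.

```python
import math

def exp_pdf(x, lam=1.0):
    # PDF of an exponential distribution with rate lam
    return lam * math.exp(-lam * x)

def prob_interval(a, b, steps=10_000):
    # Midpoint-rule approximation of the integral of the PDF over [a, b]
    dx = (b - a) / steps
    return sum(exp_pdf(a + (i + 0.5) * dx) * dx for i in range(steps))

approx = prob_interval(1.0, 2.0)
exact = math.exp(-1.0) - math.exp(-2.0)  # closed form: F(2) - F(1)
print(approx, exact)  # both ≈ 0.2325

# A "point" is an interval of zero width, so its probability is zero:
print(prob_interval(1.5, 1.5))  # 0.0
```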

Normal Distribution

A normal distribution is a probability distribution that is symmetric about the mean, with a bell-shaped curve. It is widely used in statistics because many natural phenomena follow this distribution.

Key Characteristics

- Symmetric about the mean; the mean, median, and mode coincide.
- Fully described by two parameters: the mean (μ) and the standard deviation (σ).
- Roughly 68% of values fall within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ.

When to Use a Normal Distribution

Use it when data cluster symmetrically around a central value, extreme values are rare, or the statistical method you are applying assumes normality.

Examples

Adult heights, measurement errors, and standardized test scores are all commonly modeled as approximately normal.

Conclusion

Normal distribution is essential for modeling natural and social phenomena. If data is symmetric and bell-shaped, it likely follows a normal distribution, and many statistical methods rely on this assumption.

When to Use

Use a continuous random variable whenever the quantity of interest is measured rather than counted and can take any value within a range.

Examples

The time until a component fails, the height of a randomly chosen person, or tomorrow's noon temperature.

Conclusion

Continuous random variables are used for measurements where the data can take any value within a given range. These are modeled with probability density functions (PDFs) and are applicable in various fields like physics, engineering, and natural sciences.

Normal Distribution and Its Uses

A normal distribution is often used to model random variables when the data follows a symmetric, bell-shaped curve. It is suitable in the following contexts:

Contexts to Use Normal Distribution

- The data are symmetric around a central value, with few extreme observations.
- The quantity arises as the sum of many small, independent effects (for example, measurement errors).
- The statistical method being applied assumes normality, such as z-tests or confidence intervals.

Conclusion

Normal distributions are used to model random variables in cases where the data is symmetric, influenced by many small, independent factors, and exhibits few extreme values. It is commonly applied in natural, social, and financial sciences, as well as in statistical analysis.

Random Sample

A random sample is a subset of individuals or observations selected from a larger population, where each member has an equal chance of being included. This method helps ensure that the sample is representative of the population.

Key Characteristics of a Random Sample

- Every member of the population has an equal chance of being selected.
- Selections are made independently of one another.
- Randomness reduces selection bias, helping the sample represent the population.

Types of Random Sampling

- Simple random sampling: every subset of a given size is equally likely to be chosen.
- Stratified sampling: the population is divided into groups (strata) and a random sample is drawn from each.
- Cluster sampling: whole groups (clusters) are selected at random.
- Systematic sampling: every k-th member is selected after a random starting point.

When to Use a Random Sample

Use a random sample whenever you want to generalize from a sample to a population without surveying every member and need the sample to be free of selection bias.

Example

If you're conducting a survey on the effectiveness of a new product, you might randomly sample 500 customers from a pool of 10,000, ensuring each customer has an equal chance of being selected. This random sample can then be used to make generalizations about the entire customer base.
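The survey example maps directly onto Python's standard library. A minimal sketch (the customer IDs 1 through 10,000 are hypothetical, and the seed is fixed only so the sketch is reproducible):

```python
import random

random.seed(42)  # fixed seed for reproducibility of the sketch

customers = list(range(1, 10_001))      # hypothetical pool of 10,000 customer IDs
survey = random.sample(customers, 500)  # each ID equally likely, no repeats

print(len(survey))       # 500 customers selected
print(len(set(survey)))  # 500 distinct IDs: sampling is without replacement
```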

Sample Proportion

A sample proportion is the proportion of individuals in a sample that exhibit a specific characteristic or outcome. It is denoted by p̂ and is calculated by dividing the number of successes (x) by the total number of individuals in the sample (n).

Formula for Sample Proportion

The formula for the sample proportion is:

p̂ = x / n

Example

If 60 out of 100 people surveyed prefer blue as their favorite color, the sample proportion of people who prefer blue is:

p̂ = 60 / 100 = 0.60

This means 60% of the people in the sample prefer blue.

When to Use Sample Proportion

Use the sample proportion when you want to estimate the fraction of a population that has some characteristic from categorical (yes/no) sample data, for example approval ratings, defect rates, or survey preferences.

Distribution of Sample Proportion

The sample proportion is approximately normally distributed when the sample size is large enough, specifically when both np and n(1 - p) are at least 10. This ensures that the normal approximation holds.

Approximating Normality for Sample Proportions

You can approximate normality for sample proportions when certain conditions are met, as per the Central Limit Theorem.

Conditions for Approximating Normality

- The sample is a random sample of independent observations.
- np ≥ 10 (expected number of successes) and n(1 - p) ≥ 10 (expected number of failures).

Why These Conditions Matter

If np or n(1 - p) is too small, the sampling distribution of the sample proportion is skewed toward 0 or 1, and the symmetric normal curve gives poor probability estimates, especially in the tails.

Example

Suppose you are surveying 200 people, and 40% (p = 0.4) of the population prefers a certain brand of soda. To check if normality can be approximated:

np = 200 × 0.4 = 80 and n(1 - p) = 200 × 0.6 = 120

Since both values are greater than 10, the normal approximation is appropriate.
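The check reduces to two multiplications. A small helper sketch (the function name is our own):

```python
def normal_approx_ok(n, p, threshold=10):
    # Rule of thumb: expected successes n*p and expected failures n*(1-p)
    # must both reach the threshold for the normal approximation to hold
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(200, 0.4))  # True:  80 and 120 are both >= 10
print(normal_approx_ok(20, 0.1))   # False: only 2 expected successes
```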

When Not to Use Normal Approximation

Avoid the normal approximation when the sample is small or p is close to 0 or 1, so that np < 10 or n(1 - p) < 10. In those cases, work with the exact binomial distribution instead.

What is a Confidence Interval?

A confidence interval (CI) is a statistical tool used to estimate the range within which a population parameter (such as the mean or proportion) is likely to fall. It provides an interval estimate, rather than a single value estimate, and reflects the uncertainty associated with sampling.

The confidence level represents how certain we are that the true population parameter lies within this interval. A 90% confidence interval, for example, suggests that if we were to take 100 different samples from the population, approximately 90 of those intervals would contain the true population parameter, while 10 might not.

Confidence Interval Formula for a Proportion:

The formula for a confidence interval for a population proportion p is given by:

CI = p̂ ± z * √(p̂(1 - p̂) / n)

Where:

- p̂ is the sample proportion,
- z is the z-score for the chosen confidence level,
- n is the sample size.

Confidence Interval Formula for a Mean:

The formula for a confidence interval for a population mean μ is:

CI = x̄ ± z * (s / √n)

Where:

- x̄ is the sample mean,
- z is the z-score for the chosen confidence level,
- s is the sample standard deviation,
- n is the sample size.

Confidence Levels and Z-Scores:

- 90% confidence level: z = 1.645
- 95% confidence level: z = 1.96
- 99% confidence level: z = 2.576

These z-scores come from the standard normal distribution and represent the number of standard deviations from the mean that correspond to the desired level of confidence.

Examples:

Example 1: 90% Confidence Interval

Imagine you are estimating the proportion of people who prefer a certain brand of soda. You take a sample of 100 people, and 60 of them say they prefer the brand. The sample proportion is:

p̂ = 60 / 100 = 0.60

To construct a 90% confidence interval for the population proportion:

CI = 0.60 ± 1.645 * √(0.60 * 0.40 / 100) = 0.60 ± 0.0806

So, the 90% confidence interval is approximately: [0.5194, 0.6806].
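Example 1 can be reproduced with a short function. A minimal sketch (the helper name and the hard-coded z-score table are our own):

```python
import math

Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}  # common confidence levels

def proportion_ci(successes, n, level=0.95):
    # CI = p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)
    p_hat = successes / n
    moe = Z[level] * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

low, high = proportion_ci(60, 100, level=0.90)
print(round(low, 4), round(high, 4))  # ≈ 0.5194 0.6806
```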

Example 2: 95% Confidence Interval

Now, imagine you want to estimate the average height of students in a class. From a sample of 50 students, you find the mean height is 170 cm with a sample standard deviation of 10 cm. To construct a 95% confidence interval for the population mean:

CI = 170 ± 1.96 * (10 / √50) = 170 ± 2.77

So, the 95% confidence interval is: [167.23, 172.77].
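Example 2 in code form, a minimal sketch (the helper name is our own; z = 1.96 is the 95% value used in the text):

```python
import math

def mean_ci(xbar, s, n, z=1.96):
    # CI = xbar ± z * (s / sqrt(n))
    moe = z * s / math.sqrt(n)
    return xbar - moe, xbar + moe

low, high = mean_ci(170, 10, 50)
print(round(low, 2), round(high, 2))  # ≈ 167.23 172.77
```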

Example 3: 99% Confidence Interval

Finally, let’s estimate the proportion of people who approve of a new policy. You survey 200 people, and 130 of them approve. The sample proportion is:

p̂ = 130 / 200 = 0.65

To construct a 99% confidence interval for the population proportion:

CI = 0.65 ± 2.576 * √(0.65 * 0.35 / 200) = 0.65 ± 0.0869

So, the 99% confidence interval is approximately: [0.5631, 0.7369].

Conclusion

A confidence interval gives us a range within which we expect a population parameter to lie, with a specified level of confidence. The wider the confidence interval, the less precise the estimate, but the more confident we are that the interval contains the true parameter. The level of confidence (e.g., 90%, 95%, or 99%) determines how likely it is that the interval will capture the true population parameter.

What is the Margin of Error?

The margin of error is a statistic that quantifies the amount of random sampling error in a survey's results. It provides a range within which the true value of a population parameter (such as a population mean or proportion) is expected to lie, given the sample data. The margin of error is typically expressed as a plus-or-minus figure that indicates the range around the sample estimate.

Relation to the Confidence Interval:

The confidence interval (CI) is a range of values that is used to estimate the true population parameter. The margin of error is directly related to the confidence interval as it defines how wide the interval will be.

The confidence interval is constructed by adding and subtracting the margin of error from the sample estimate (e.g., sample mean or sample proportion):

Confidence Interval = Sample Estimate ± Margin of Error

Formula for Margin of Error:

The margin of error is calculated using the sample data and the desired confidence level (e.g., 90%, 95%, or 99%). It is influenced by the sample size, the variability in the data, and the confidence level:

Margin of Error = z × (σ / √n)

Where:

- z is the z-score for the chosen confidence level,
- σ is the population standard deviation (in practice, the sample standard deviation s is often used as an estimate),
- n is the sample size.

Example:

For a 95% confidence level, the margin of error tells us how much we expect the sample statistic (such as a sample mean) to differ from the true population mean. If the margin of error is ±5, the confidence interval would be the sample estimate ± 5.

If a poll reports a 95% confidence interval of 50% ± 3%, the true proportion in the population is expected to fall between 47% and 53% with 95% confidence.
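The poll's ±3% can be reproduced for a proportion, where the σ / √n term becomes √(p̂(1 - p̂) / n). A sketch assuming a hypothetical poll of n = 1,067 respondents with p̂ = 0.5 (the sample size is our own illustrative choice, not from the text):

```python
import math

def margin_of_error_proportion(p_hat, n, z=1.96):
    # For a proportion, sigma / sqrt(n) becomes sqrt(p_hat * (1 - p_hat) / n)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 1,067 respondents, 50% support
moe = margin_of_error_proportion(0.5, 1067)
print(round(moe, 3))  # ≈ 0.03, i.e. the "± 3%" quoted in poll results
```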

Summary: