6 NORMAL PROBABILITY DISTRIBUTION

Author

Wen Tang

| R function | Description |
|---|---|
| `rnorm(n, mean, sd)` | Generate `n` random values from a normal distribution with the given `mean` and `sd` (the defaults `mean = 0`, `sd = 1` give the standard normal distribution). |
| `hist(x)` | Plot a histogram of the data vector `x`. Pass `probability = TRUE` to plot densities instead of counts. Pass `breaks` to set the bin edges, e.g. `breaks = seq(0, 1, by = 0.1)`; `breaks = "FD"` uses the Freedman-Diaconis rule, which chooses the bin width from the interquartile range and the sample size. |
| `seq(from, to, by = step)` | Generate a regular sequence from `from` to `to` in increments of `step`. |
| `density(x)` | Compute a kernel density estimate of the data vector `x`. |
| `lines(x, y)` | Add connected line segments to an existing plot. `y` may be omitted when `x` already holds both coordinates, e.g. the object returned by `density()`. |
| `pnorm(q, mean = 0, sd = 1)` | Calculate the cumulative probability $P(X \le q)$ for a normally distributed random variable $X$ with the given `mean` and `sd`. |
| `diff(x)` | Calculate the lagged differences of a vector `x` (by default `x[i + 1] - x[i]` for each `i`). |
| `qnorm(p, mean = 0, sd = 1)` | Calculate the quantile of the normal distribution for the left-tail probability `p`, i.e. the value $q$ with $P(X \le q) = p$. |
| `scale(x, center, scale)` | Convert data `x` to z-scores, `(x - center) / scale`, using the given mean (`center`) and standard deviation (`scale`), e.g. `scale(x, center = 5, scale = 2)`. The result is returned as a matrix. |
| `rbinom(s, size = n, prob = p)` | Generate `s` random values from a binomial distribution with `n` trials and success probability `p`. |
| `replicate(n, expr)` | Evaluate the expression `expr` `n` times and collect the results; used here to run Monte Carlo simulations. |

6.1 The standard normal distribution

6.1.1 Normal distribution graph (Optional)

```r
set.seed(123)                       # set the seed for reproducibility
x <- rnorm(1000, mean = 0, sd = 1)  # draw 1000 values from the standard normal distribution

# Plot a density-scaled histogram and overlay a kernel density estimate
hist(x, probability = TRUE, col = "lightblue", main = "Standard Normal Distribution")
lines(density(x), col = "red", lwd = 2)
```
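To see how closely the simulated data follow the theoretical curve, the exact standard normal density can be drawn on the same plot. This is a sketch using base R graphics: `dnorm()` gives the normal density, and `curve(..., add = TRUE)` adds it to the current plot.

```r
set.seed(123)                       # same seed and data as above
x <- rnorm(1000, mean = 0, sd = 1)

hist(x, probability = TRUE, col = "lightblue", main = "Standard Normal Distribution")
lines(density(x), col = "red", lwd = 2)  # kernel density estimate from the data

# Theoretical N(0, 1) density as a dashed blue curve
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, col = "blue", lwd = 2, lty = 2)

# Sample summaries should be close to the true mean 0 and sd 1
c(mean = mean(x), sd = sd(x))
```

The red curve is estimated from the 1000 draws. The dashed blue curve is the exact density, so any gap between them reflects sampling variability.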

6.1.2 Find the probability (area) when z-scores are given

```r
# Area under the standard normal curve to the left of 1: P(Z < 1)
pnorm(1, mean = 0, sd = 1)
```

[1] 0.8413447

```r
# Area to the right of 1: P(Z > 1) = 1 - P(Z < 1)
1 - pnorm(1, mean = 0, sd = 1)  # equivalently: pnorm(1, lower.tail = FALSE)
```

[1] 0.1586553

```r
# Area between two values: P(-1 < Z < 1) = P(Z < 1) - P(Z < -1)
diff(pnorm(c(-1, 1), mean = 0, sd = 1))
```

[1] 0.6826895
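`pnorm()` is vectorized, so the same subtraction gives the familiar 68-95-99.7 (empirical) rule for 1, 2 and 3 standard deviations in a single call. The sketch also shows the `lower.tail = FALSE` argument. It returns the right-tail area directly instead of computing `1 - pnorm()`.

```r
k <- 1:3                          # number of standard deviations from the mean
within_k <- pnorm(k) - pnorm(-k)  # P(-k < Z < k) for k = 1, 2, 3
round(within_k, 4)                # 0.6827 0.9545 0.9973

# Right-tail area P(Z > 1) without subtracting from 1
right_tail <- pnorm(1, lower.tail = FALSE)
right_tail                        # 0.1586553
```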

6.1.3 Find z-scores when the area is given

```r
# Find the z-score with a given area to its left.
# For a right-tail area of alpha = 0.05, the area to the left is 1 - alpha = 0.95,
# so this returns the one-tailed critical value z_alpha.
alpha <- 0.05
qnorm(1 - alpha, mean = 0, sd = 1)
```

[1] 1.644854
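In a two-tailed test the area alpha is split between both tails, so the critical value uses `1 - alpha / 2`. The sketch below assumes alpha = 0.05 as above. Feeding the result back into `pnorm()` confirms that `qnorm()` and `pnorm()` are inverses.

```r
alpha <- 0.05
z_one_tail <- qnorm(1 - alpha)      # right-tail area alpha: 1.644854
z_two_tail <- qnorm(1 - alpha / 2)  # area alpha / 2 in each tail: 1.959964

# Central area between -z and z recovers 1 - alpha
central_area <- pnorm(z_two_tail) - pnorm(-z_two_tail)
central_area                        # 0.95
```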

6.2 Real applications of the normal distribution

6.2.1 Convert an individual x value to a z-score

```r
x <- 80      # the individual value
mu <- 75     # the mean of the distribution
sigma <- 10  # the standard deviation of the distribution

# Calculate the z-score of x with scale(), which computes (x - center) / scale
z_score <- scale(x, center = mu, scale = sigma)
cat("Z-score:", z_score, "\n")  # print the z-score
```

Z-score: 0.5

```r
z <- (x - mu) / sigma  # find the z-score with the formula z = (x - mu) / sigma
cat("Z =", z, "\n")    # print the z-score
```

Z = 0.5
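Both `scale()` and the formula work on whole vectors, so several values can be standardized at once. The scores below are hypothetical and chosen only for illustration. `as.vector()` drops the matrix structure and attributes that `scale()` returns.

```r
scores <- c(60, 75, 80, 95)  # hypothetical individual values
mu <- 75
sigma <- 10

z_scale <- as.vector(scale(scores, center = mu, scale = sigma))
z_formula <- (scores - mu) / sigma
z_scale                      # -1.5  0.0  0.5  2.0
```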

6.2.2 Find the probability when an x value is given (Pulse Rates question, page 269)

```r
x1 <- 60       # lower value
x2 <- 80       # upper value
mu <- 69.6     # population mean
sigma <- 11.3  # population standard deviation

# Find the probability that X is less than 60: P(X < 60)
pnorm(x1, mean = mu, sd = sigma)
```

[1] 0.1977856

```r
# Find the probability that X is greater than 80: P(X > 80)
1 - pnorm(x2, mean = mu, sd = sigma)
```

[1] 0.1786939

```r
# Find the probability between two values: P(60 < X < 80)
diff(pnorm(c(x1, x2), mean = mu, sd = sigma))
```

[1] 0.6235205
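The same probability can be found the textbook way: convert each x value to a z-score, then use the standard normal distribution. This sketch reuses the Pulse Rates values above and should agree with the `diff(pnorm(...))` result.

```r
x1 <- 60; x2 <- 80
mu <- 69.6; sigma <- 11.3

z1 <- (x1 - mu) / sigma             # about -0.85
z2 <- (x2 - mu) / sigma             # about 0.92
p_between <- pnorm(z2) - pnorm(z1)  # P(z1 < Z < z2) = P(60 < X < 80)
p_between                           # 0.6235205
```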

6.2.3 Convert a z-score back to x value

```r
z <- 1.96    # the z-score
mu <- 100    # the mean of the distribution
sigma <- 15  # the standard deviation of the distribution
x <- z * sigma + mu  # convert the z-score back to an x value: x = mu + z * sigma
cat("X =", x, "\n")  # print the x value
```

X = 129.4
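When the area is given rather than the z-score, `qnorm()` can return the x value in one step if it is passed the distribution's `mean` and `sd`. This sketch uses mu = 100 and sigma = 15 as above and an assumed left-tail area of 0.975. Both routes give the same answer.

```r
mu <- 100; sigma <- 15
p <- 0.975                                   # assumed area to the left of x

x_direct <- qnorm(p, mean = mu, sd = sigma)  # quantile of N(100, 15^2)
x_via_z <- qnorm(p) * sigma + mu             # z-score first, then x = mu + z * sigma
c(x_direct, x_via_z)                         # both about 129.3995
```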

6.3 Sampling distributions and estimators (Optional)

6.3.1 General behavior of sampling distribution of sample proportions

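```r
set.seed(123)  # set the seed for reproducibility

# Generate data
n <- 10   # sample size
p <- 0.5  # population proportion
# Each replicate draws one sample of size n and records the number of successes
samples <- replicate(50000, rbinom(1, size = n, prob = p))

# Convert counts of successes to sample proportions
sample_props <- samples / n

# Plot the histogram. The possible proportions are 0, 0.1, 0.2, ..., 1, so the bin
# edges are offset by 0.05 to center one bin on each value (edges at exactly
# 0, 0.1, ..., 1 would merge the proportions 0 and 0.1 into the first bin).
hist(sample_props, breaks = seq(-0.05, 1.05, by = 0.1), col = "lightblue",
     main = "Sampling Distribution of Sample Proportion",
     xlab = "Sample Proportion")
```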
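The simulated proportions can be checked against theory. The sampling distribution of the sample proportion has mean p and standard deviation sqrt(p(1 - p)/n). The sketch below regenerates the same simulation and compares the two.

```r
set.seed(123)
n <- 10
p <- 0.5
sample_props <- replicate(50000, rbinom(1, size = n, prob = p)) / n

# Compare simulated and theoretical center and spread
c(simulated = mean(sample_props), theoretical = p)
c(simulated = sd(sample_props), theoretical = sqrt(p * (1 - p) / n))  # theory: about 0.158
```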

6.3.2 General behavior of sampling distribution of sample means

```r
# Input the parameter values
mu <- 3.5     # population mean
sigma <- 1.7  # population standard deviation
n <- 5        # sample size

# Simulate the sampling distribution: 10000 samples of size n, keeping each sample mean
sample_means <- replicate(10000, mean(rnorm(n, mu, sigma)))

# Create a histogram of the sampling distribution of the sample mean
hist(sample_means, breaks = "FD", main = "Sampling Distribution of Sample Mean",
     xlab = "Sample Mean", ylab = "Frequency", col = "lightblue",
     border = "black")
```
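Theory says the sample means should center at mu, with standard deviation (the standard error) sigma / sqrt(n). The sketch below compares the simulation with these values. The seed is added here, not taken from the original code, so that the numbers are reproducible.

```r
set.seed(123)  # added for reproducibility
mu <- 3.5; sigma <- 1.7; n <- 5
sample_means <- replicate(10000, mean(rnorm(n, mu, sigma)))

c(simulated = mean(sample_means), theoretical = mu)
c(simulated = sd(sample_means), theoretical = sigma / sqrt(n))  # theory: about 0.760
```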

6.3.3 General behavior of sampling distribution of sample variances

```r
mu <- 4               # true population mean
sigma <- 8            # population standard deviation
sample_size <- 10     # sample size
num_samples <- 10000  # number of simulated samples

# Sample variance with divisor n - 1 (gives the same result as the built-in var())
sample_variance <- function(sample) {
  n <- length(sample)
  mean_sample <- mean(sample)
  sum_squared_deviations <- sum((sample - mean_sample)^2)
  return(sum_squared_deviations / (n - 1))
}

# Simulate the sampling distribution
sample_variances <- replicate(num_samples,
                              sample_variance(rnorm(sample_size, mu, sigma)))

# Create a histogram of the sampling distribution of the sample variance.
# freq = FALSE plots densities, so the y-axis is labeled "Density".
hist(sample_variances, breaks = "FD", freq = FALSE,
     main = "Sampling Distribution of Sample Variance",
     xlab = "Sample Variance", ylab = "Density", col = "lightblue",
     border = "black")
```
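Two checks can be made on this simulation. First, the divisor n - 1 makes the sample variance unbiased, so the simulated variances should average close to sigma^2 = 64. Second, for a normal population (n - 1)S^2 / sigma^2 follows a chi-square distribution with n - 1 degrees of freedom, which explains the right skew of the histogram. The sketch below uses the built-in `var()`, which has the same divisor as `sample_variance()`. It overlays the rescaled chi-square density using `dchisq()`. The name `dof` is introduced here for illustration.

```r
set.seed(123)
mu <- 4; sigma <- 8; sample_size <- 10
sample_variances <- replicate(10000, var(rnorm(sample_size, mu, sigma)))

mean(sample_variances)  # close to sigma^2 = 64

# Density of S^2 implied by (n - 1) S^2 / sigma^2 ~ chi-square(n - 1):
# f(s) = dchisq(s * dof / sigma^2, dof) * dof / sigma^2
dof <- sample_size - 1
hist(sample_variances, breaks = "FD", freq = FALSE, col = "lightblue",
     main = "Sample Variance vs. Scaled Chi-Square Density",
     xlab = "Sample Variance", ylab = "Density")
curve(dchisq(x * dof / sigma^2, df = dof) * dof / sigma^2,
      add = TRUE, col = "red", lwd = 2)
```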

6.4 The central limit theorem

6.4.1 Find the probability when an individual value is used (Ejection Seat question, page 292)

```r
mu <- 171    # population mean
sigma <- 46  # population standard deviation
n <- 25      # sample size (used in 6.4.2)
x_lower <- 140
x_upper <- 211

# Probability that a single individual value lies between x_lower and x_upper
probability_range <- diff(pnorm(c(x_lower, x_upper), mean = mu, sd = sigma))
probability_range
```

[1] 0.5575477

6.4.2 Find the probability when a sample mean is used (Ejection Seat question, page 292)

```r
# Probability that the sample mean x-bar lies between the two values (CLT):
# x-bar is approximately normal with mean mu and standard error sigma / sqrt(n)
standard_error <- sigma / sqrt(n)  # standard error of the sample mean
probability_range <- diff(pnorm(c(x_lower, x_upper), mean = mu,
                                sd = standard_error))
probability_range
```

[1] 0.9996167
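The CLT result can be checked by simulation. Draw many samples of size 25, compute each sample mean, and count how often it falls between 140 and 211. This sketch assumes a normal population with mu = 171 and sigma = 46, as in the calculation above. The proportion should be close to 0.9996.

```r
set.seed(123)
mu <- 171; sigma <- 46; n <- 25
sim_means <- replicate(20000, mean(rnorm(n, mu, sigma)))

# Proportion of simulated sample means between 140 and 211
p_sim <- mean(sim_means > 140 & sim_means < 211)
p_sim
```

Compare this with the individual-value probability from 6.4.1 (about 0.558). Averaging 25 values shrinks the spread from 46 to 9.2, which makes extreme means far less likely.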