Perform a hypothesis test for a single proportion. Pass the hypothesized (H_0) proportion p if it is not 0.5, e.g. p=0.6. Pass the alternative argument to set the alternative hypothesis: “two.sided” (default), “less”, or “greater”. Note that the conf.int given by the test uses Wilson’s method rather than the Wald method used in the book. The returned object includes $p.value (the p-value), $statistic (the test statistic), and $conf.int (the confidence interval).
t.test(x, conf.level=0.95)
Perform a t-test for a population mean. Accepts an additional alternative argument for H_1. The default hypothesized mean is mu=0. Otherwise, pass a hypothesized mean value.
z.test(x, sigma.x=sigma, conf.level=0.95)
Perform a z-test for a population mean with known sigma.x=sigma. Accepts an additional alternative argument for H_1. The default hypothesized mean is mu=0. Otherwise, pass a hypothesized mean value.
qnorm(p, mean=0, sd=1)
Calculate the quantile of the normal distribution corresponding to the probability p (from left-tail).
qt(p, df)
Calculate the quantile for the probability p of the t-distribution with df degrees of freedom.
qchisq(p, df)
Calculate the quantile for the probability p of the \chi^2-distribution with df degrees of freedom.
attach(df)
Add a data frame df to the search path, which lets you access the variables in df directly by name instead of writing df$var.
table
Tabulate the frequency counts of distinct values.
prop.table(table)
Compute the proportions of a table or data. Pass an argument margin for the direction: 1 for rows, 2 for columns, or NULL for the entire table (default).
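As a small illustration of the margin argument, the sketch below cross-tabulates two mtcars variables (vs and am, chosen here only for illustration) and computes row and column proportions. With margin = 1 each row sums to 1, and with margin = 2 each column sums to 1.

```r
# Cross-tabulate engine shape (vs) against transmission type (am).
# The choice of these two variables is an assumption made for illustration.
tab <- table(mtcars$vs, mtcars$am)

# margin = 1 divides each count by its row total.
row_props <- prop.table(tab, margin = 1)

# margin = 2 divides each count by its column total.
col_props <- prop.table(tab, margin = 2)

print(row_props)
print(col_props)
```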
8.1 Basics of hypothesis testing
We will use the following functions to perform hypothesis tests.
Code
library(BSDA)
# prop.test(x, n, p = NULL,
#           alternative = c("two.sided", "less", "greater"),
#           conf.level = 0.95, correct = TRUE)
# t.test(x, y = NULL,
#        alternative = c("two.sided", "less", "greater"),
#        mu = 0, paired = FALSE, var.equal = FALSE,
#        conf.level = 0.95, ...)
# z.test(
#   x, y = NULL,
#   alternative = "two.sided",
#   mu = 0, sigma.x = NULL, sigma.y = NULL,
#   conf.level = 0.95)
We use the qnorm() and qt() functions to calculate critical values. For example, qnorm(0.95) gives z_{0.95}, the 0.95 quantile of the standard normal distribution, and qt(0.95, 5) gives the critical value t_{0.05, 5}, which cuts off an upper-tail area of \alpha=0.05 under a t-distribution with 5 degrees of freedom, as below.
Code
qnorm(0.95)
[1] 1.644854
Code
qt(0.95, 5)
[1] 2.015048
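For a two-sided test, \alpha is split between the two tails, so the quantile is taken at 1 - \alpha/2 rather than 1 - \alpha. A minimal sketch, with illustrative variable names that are not part of the text above:

```r
alpha <- 0.05

# Two-sided critical values: alpha/2 in each tail.
z_crit <- qnorm(1 - alpha / 2)        # standard normal
t_crit <- qt(1 - alpha / 2, df = 5)   # t-distribution with 5 degrees of freedom

# The lower critical values are the negatives of the upper ones by symmetry.
c(z_lower = -z_crit, z_upper = z_crit, t_lower = -t_crit, t_upper = t_crit)
```

The t critical value is larger than the normal one because the t-distribution has heavier tails at small degrees of freedom.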
8.2 Testing a claim about a proportion
The mtcars dataset has data on 32 automobiles (1973-1974 models) with 11 variables. Among these variables, we are interested in checking whether the proportion of cars with a V-shaped engine (vs = 0) is 0.5. That is, H_0: p = 0.5.
Code
data(mtcars)
attach(mtcars)
table(vs)
vs
0 1
18 14
Code
prop.table(table(vs))
vs
0 1
0.5625 0.4375
8.2.1 Two-sided proportion test using the z-test (method in the textbook)
We first check whether we can use a normal approximation to perform a proportion test. With a sample size of n=32 and a hypothesized proportion p=0.5, the expected numbers of successes and failures are both np = n(1-p) = 32\cdot 0.5 = 16. Since both are greater than 5, we can apply the proportion test using a normal approximation. In our sample, the number of successes (vs=0) is 18 and the sample proportion is 18/32 = 0.5625.
Code
# Example data
successes <- 18   # Number of successes
trials <- 32      # Total number of trials
null_prob <- 0.5  # Hypothesized population proportion under the null hypothesis

# Calculate the sample proportion
sample_proportion <- successes / trials

# Perform the z-test
z_stat <- (sample_proportion - null_prob) / sqrt(null_prob * (1 - null_prob) / trials)

# Calculate the two-sided p-value
p_value <- 2 * (1 - pnorm(abs(z_stat)))

# Calculate the two-sided critical values (alpha is split between the two tails)
alpha <- 0.05
critical_value <- c(qnorm(alpha / 2), qnorm(1 - alpha / 2))

# Print the results
cat("Z-statistic:", z_stat, "\n")
Z-statistic: 0.7071068
Code
cat("p-value:", p_value, "\n")
p-value: 0.4795001
Code
cat("Critical values:", critical_value, "\n")
Critical values: -1.959964 1.959964
We are ready to make a decision using either of the following methods:
p-value method: The p-value 0.4795001 is greater than the significance level \alpha = 0.05, therefore we fail to reject the null hypothesis H_0: p=0.5.
critical value method: The test statistic 0.7071068 lies between the two critical values, so it is not in the rejection region; therefore we fail to reject the null hypothesis.
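The two decision methods above can be bundled into one function. The sketch below is a hypothetical helper (prop_z_test is not part of base R or BSDA) for a two-sided test, so it takes the rejection cutoff at the 1 - \alpha/2 quantile.

```r
# Hypothetical helper: two-sided one-sample z-test for a proportion.
prop_z_test <- function(x, n, p0 = 0.5, alpha = 0.05) {
  p_hat <- x / n
  # Standard error is computed under the null hypothesis.
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
  p_value <- 2 * pnorm(-abs(z))
  z_crit <- qnorm(1 - alpha / 2)
  list(
    statistic = z,
    p.value = p_value,
    critical = z_crit,
    reject_by_p = p_value < alpha,      # p-value method
    reject_by_crit = abs(z) > z_crit    # critical value method
  )
}

out <- prop_z_test(x = 18, n = 32)
out$reject_by_p
out$reject_by_crit
```

Both methods always give the same decision, since |z| exceeds the critical value exactly when the two-sided p-value is below \alpha.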
8.2.2 Two-sided proportion test using the built-in function prop.test
Next we will use R's built-in prop.test() function to perform a one-sample proportion test. The syntax is below.
Code
# prop.test(x, n, p = p_0, conf.level=0.95,
#           alternative=c("two.sided", "less", "greater"))
Depending on the alternative hypothesis H_1, we can choose one among two.sided, less, and greater:
H_1: p \ne p_0: alternative = "two.sided"
H_1: p < p_0: alternative = "less"
H_1: p > p_0: alternative = "greater"
Note that the built-in prop.test() uses the Pearson \chi^2 test statistic (with a continuity correction by default), which differs from the z-test used by the textbook.
H_0: p = 0.5 \quad \textrm{ vs }\quad H_1: p \ne 0.5
Code
res <- prop.test(x = 18, n = 32, p = 0.50, alternative = "two.sided", conf.level = 0.95)
res
1-sample proportions test with continuity correction
data: 18 out of 32, null probability 0.5
X-squared = 0.28125, df = 1, p-value = 0.5959
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.3788033 0.7316489
sample estimates:
p
0.5625
Code
cat("The p-value is given by ", res$p.value, "\n")
The p-value is given by 0.5958831
Code
cat("The chi^2 test statistic is given by ", res$statistic, "\n")
The chi^2 test statistic is given by 0.28125
Code
cat("The confidence interval is given by (", res$conf.int[1], "," ,res$conf.int[2], ")\n")
The confidence interval is given by ( 0.3788033 , 0.7316489 )
Decision:
P-Value: we fail to reject the Null Hypothesis since p-value 0.596 is greater than \alpha=0.05.
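Critical Value: the \chi^2 test rejects only for large values of the statistic, so for a two-sided alternative at \alpha=0.05 the critical value is the 0.95 quantile of the \chi^2 distribution with 1 degree of freedom, found as below. The \chi^2 test statistic 0.281 is smaller than this critical value. Thus, we fail to reject the null hypothesis.
Code
# the critical value can be calculated by the following code.
qchisq(0.95, 1)
[1] 3.841459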
Confidence Interval: the claimed proportion 0.5 falls within the confidence interval of (0.379, 0.732). Thus we fail to reject the null hypothesis.
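To connect prop.test() with the textbook z-test, the sketch below turns off the continuity correction with correct = FALSE. Under that setting the \chi^2 statistic should equal the square of the z statistic, and the two-sided p-values should agree. The variable names are illustrative.

```r
# prop.test without the continuity correction.
res_nocc <- prop.test(x = 18, n = 32, p = 0.5, correct = FALSE)

# Textbook z statistic for the same data.
z_stat <- (18 / 32 - 0.5) / sqrt(0.5 * (1 - 0.5) / 32)

# Compare the chi-squared statistic with z^2, and the two p-values.
c(chisq = unname(res_nocc$statistic), z_squared = z_stat^2)
c(p_chisq = res_nocc$p.value, p_z = 2 * pnorm(-abs(z_stat)))
```

The default output differs from the z-test only because of the continuity correction, which shrinks the statistic and makes the test slightly more conservative.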
8.2.3 One-sided proportion test
H_0: p = 0.5 \quad \textrm{ vs }\quad H_1: p > 0.5
Code
res <-prop.test(x=18, n=32, p =0.50, alternative ="greater", conf.level =0.95)res
1-sample proportions test with continuity correction
data: 18 out of 32, null probability 0.5
X-squared = 0.28125, df = 1, p-value = 0.2979
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
0.4041836 1.0000000
sample estimates:
p
0.5625
Decision:
P-Value: we fail to reject the null hypothesis since p-value 0.298 is greater than \alpha=0.05.
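Critical Value: the \chi^2 statistic does not carry the direction of the alternative, so for a one-sided test we compare its signed square root z = \sqrt{0.28125} \approx 0.530 (positive because the sample proportion 0.5625 exceeds 0.5) with the upper-tail normal critical value z_{0.95} = 1.645. Since 0.530 is less than 1.645, the statistic does not fall in the critical region. Thus, we fail to reject the null hypothesis. The critical value can be found by
Code
# the critical value can be calculated by the following code.
qnorm(0.95)
[1] 1.644854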
Confidence Interval: the claimed proportion 0.5 falls within the confidence interval of (0.404, 1). Thus we fail to reject the null hypothesis.
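The one-sided p-value 0.2979 can be reproduced by hand from the continuity-corrected statistic. The sketch below assumes the correction subtracts 0.5 from |x - np_0| (capped at |x - np_0| itself), which is consistent with the X-squared value of 0.28125 reported above. The variable names are illustrative.

```r
x <- 18; n <- 32; p0 <- 0.5

# Continuity correction, capped so it never exceeds |x - n*p0|.
yates <- min(0.5, abs(x - n * p0))

# Signed, continuity-corrected z statistic; its square is the X-squared value.
z_cc <- sign(x / n - p0) * (abs(x - n * p0) - yates) / sqrt(n * p0 * (1 - p0))

# Upper-tail p-value for H_1: p > p0.
p_greater <- pnorm(z_cc, lower.tail = FALSE)

# Compare with the built-in one-sided test.
res_greater <- prop.test(x, n, p = p0, alternative = "greater")
c(by_hand = p_greater, prop.test = res_greater$p.value)
```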
8.3 Testing a claim about a mean
8.3.1 Unknown \sigma with normality assumption
We use the one-sample t-test with the t.test() function when the population is assumed to be normal or the sample size is large enough. The syntax below tests H_0: \mu = m on a sample vector (variable) x with a given confidence level conf.level, for example conf.level=0.95.
Code
# t.test(x, mu= m, conf.level=0.95, alternative=c("two.sided", "less", "greater"))
Depending on the alternative hypothesis H_1, we can choose one among two.sided, less, and greater.
H_1: \mu \ne m: alternative = "two.sided"
H_1: \mu < m: alternative = "less"
H_1: \mu > m: alternative = "greater"
As an example, we test mpg with H_0: \mu = 22. That is, we test whether the population mean of mpg is equal to 22. The mtcars dataset has 32 observations, so the sample size is large enough to use the t-test, here with \alpha = 0.05.
res <- t.test(mpg, mu = 22, alternative = "two.sided", conf.level = 0.95)
res
One Sample t-test
data: mpg
t = -1.7921, df = 31, p-value = 0.08288
alternative hypothesis: true mean is not equal to 22
95 percent confidence interval:
17.91768 22.26357
sample estimates:
mean of x
20.09062
Code
cat("The p-value is given by ", res$p.value, "\n")
The p-value is given by 0.08287848
Code
cat("The test statistic is given by ", res$statistic, "\n")
The test statistic is given by -1.792127
Code
cat("The confidence interval is given by (", res$conf.int[1], ",", res$conf.int[2], ")\n")
The confidence interval is given by ( 17.91768 , 22.26357 )
Decision:
P-Value: we fail to reject the null hypothesis since p-value 0.083 is greater than \alpha=0.05.
Critical Value: the test statistic t= -1.792 is closer to 0 than the critical values which can be found as below. Thus, we fail to reject the null hypothesis.
Code
# the critical value can be calculated by the following code.
c(qt(0.025, df=31), qt(0.975, df=31))
[1] -2.039513 2.039513
Confidence Interval: the claimed mean 22 falls within the confidence interval of (17.918, 22.264). Thus we fail to reject the null hypothesis.
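The numbers reported by t.test() can be reproduced from the sample mean and standard deviation. The sketch below is a hand calculation with illustrative variable names. It refers to mtcars$mpg explicitly, so it does not depend on attach().

```r
x <- mtcars$mpg
mu0 <- 22
n <- length(x)

# Standard error of the mean, using the sample standard deviation.
se <- sd(x) / sqrt(n)

# t statistic and two-sided p-value with n - 1 degrees of freedom.
t_stat <- (mean(x) - mu0) / se
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean.
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

c(t = t_stat, p = p_two_sided, lower = ci[1], upper = ci[2])
```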
H_0: \mu = 22 \quad \textrm{ vs }\quad H_1: \mu < 22
res <- t.test(mpg, mu = 22, alternative = "less", conf.level = 0.95)
res
One Sample t-test
data: mpg
t = -1.7921, df = 31, p-value = 0.04144
alternative hypothesis: true mean is less than 22
95 percent confidence interval:
-Inf 21.89707
sample estimates:
mean of x
20.09062
Decision:
P-Value: we reject the null hypothesis since p-value 0.041 is less than \alpha=0.05.
Critical Value: the test statistic t= -1.792 falls in the critical region, which lies below -t_{0.05, 31} = -1.696. Thus, we reject the null hypothesis.
Code
# the critical value can be calculated by the following code.
qt(0.05, df=31)
[1] -1.695519
Confidence Interval: the claimed mean \mu=22 does not fall within the confidence interval of (-\infty, 21.897). Thus we reject the null hypothesis.
8.3.2 Known \sigma with normality assumption
We use the one-sample z-test (normal test) with the z.test() function from the BSDA package when the population is assumed to be normal with a known population standard deviation \sigma. The syntax below tests H_0: \mu = m on a sample vector (variable) x with \alpha = 0.05 and a known sigma.
Code
# library(BSDA)
# z.test(x, mu = m, sigma.x = sigma, conf.level = 0.95,
#        alternative = c("two.sided", "less", "greater"))
Depending on the alternative hypothesis H_1, we can choose one among two.sided, less, and greater.
H_1: \mu \ne m: alternative = "two.sided"
H_1: \mu < m: alternative = "less"
H_1: \mu > m: alternative = "greater"
For example, we test for mpg with H_0: \mu = 22. Assume mpg follows a normal distribution with \sigma = 6, then we can use z-test with \alpha = 0.05.
library(BSDA)
res <- z.test(mpg, mu = 22, sigma.x = 6, alternative = "two.sided", conf.level = 0.95)
res
One-sample z-Test
data: mpg
z = -1.8002, p-value = 0.07183
alternative hypothesis: true mean is not equal to 22
95 percent confidence interval:
18.01177 22.16948
sample estimates:
mean of x
20.09062
Code
cat("The p-value is given by ", res$p.value, "\n")
The p-value is given by 0.07183285
Code
cat("The test statistic is given by ", res$statistic, "\n")
The test statistic is given by -1.800176
Code
cat("The confidence interval is given by (", res$conf.int[1], "," , res$conf.int[2], ")\n")
The confidence interval is given by ( 18.01177 , 22.16948 )
Decision:
P-Value: we fail to reject the null hypothesis since p-value 0.072 is greater than \alpha=0.05.
Critical Value: the test statistic z= -1.8 is closer to 0 than the critical values as found below. Thus, we fail to reject the null hypothesis.
Code
# the critical value can be calculated by the following code.
c(qnorm(0.025), qnorm(0.975))
[1] -1.959964 1.959964
Confidence Interval: the claimed mean 22 falls within the confidence interval of (18.012, 22.169). Thus we fail to reject the null hypothesis.
H_0: \mu = 22 \quad \textrm{ vs }\quad H_1: \mu < 22
res <- z.test(mpg, mu = 22, sigma.x = 6, alternative = "less", conf.level = 0.95)
res
One-sample z-Test
data: mpg
z = -1.8002, p-value = 0.03592
alternative hypothesis: true mean is less than 22
95 percent confidence interval:
NA 21.83526
sample estimates:
mean of x
20.09062
Decision:
P-Value: we reject the null hypothesis since p-value 0.036 is less than \alpha=0.05.
Critical Value: the test statistic z= -1.8 falls in the critical region which is less than z_{0.05} = -1.645. Thus, we reject the null hypothesis.
Code
# the critical value can be calculated by the following code.
qnorm(0.05)
[1] -1.644854
Confidence Interval: the claimed mean 22 does not fall within the confidence interval of (-\infty, 21.835); z.test() prints the unbounded lower limit as NA. Thus we reject the null hypothesis.
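The z-test results can also be checked with base R alone, which is useful when the BSDA package is not installed. The sketch below is a hand calculation under the same assumption of a known \sigma = 6. The variable names are illustrative.

```r
x <- mtcars$mpg
mu0 <- 22
sigma <- 6          # assumed known population standard deviation
n <- length(x)

# z statistic using the known sigma instead of the sample standard deviation.
z_stat <- (mean(x) - mu0) / (sigma / sqrt(n))

# Two-sided and lower-tail p-values.
p_two_sided <- 2 * pnorm(-abs(z_stat))
p_less <- pnorm(z_stat)

# Upper bound of the one-sided 95% confidence interval (-Inf, upper).
upper <- mean(x) + qnorm(0.95) * sigma / sqrt(n)

c(z = z_stat, p_two_sided = p_two_sided, p_less = p_less, upper = upper)
```

The one-sided p-value is half the two-sided one here because the statistic lies in the direction of the alternative \mu < 22.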