Z-test is especially useful when you have a large sample size and know the population’s standard deviation. Different tests are used in statistics to compare distinct samples or groups and make conclusions about populations. These tests, also referred to as statistical tests, concentrate on examining the probability or possibility of acquiring the observed data under particular premises or hypotheses. They offer a framework for evaluating the evidence for or against a given hypothesis.

Table of Content

- What is Z-Test?
- Z-Test Formula
- When to Use Z-test
- Hypothesis Testing
- Steps to perform Z-test
- Type of Z-test
- Practice Problems

## What is Z-Test?

Z-test is a statistical test that is used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (>30).

More generally, a Z-test is any hypothesis test whose test statistic approximately follows a normal distribution under the null hypothesis. The same approach is used to decide whether two sample means differ significantly when the population variances are known and the samples are large (n ≥ 30).

## Z-Test Formula

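The Z-test compares the difference between the sample mean and the population mean, scaled by a standard deviation. The resulting Z-score is the number of standard deviations by which the sample mean deviates from the population mean. In its simplest form, which standardizes against the population standard deviation, the Z-score (also known as the Z-statistic) is given below. When testing the mean of a sample of n observations, the denominator becomes the standard error [Tex]\sigma/\sqrt{n}[/Tex], as in the steps described later.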

[Tex]\text{Z-Score} = \frac{\bar{x}-\mu}{\sigma}[/Tex]

where,

- [Tex]\bar{x}[/Tex]: mean of the sample.
- [Tex]\mu[/Tex]: mean of the population.
- [Tex]\sigma[/Tex]: Standard deviation of the population.

**z-test assumes that the test statistic (z-score) follows a standard normal distribution.**

### Example

The average family annual income in India is 200k, with a standard deviation of 5k, and the average family annual income in Delhi is 300k.

Then the Z-score for Delhi is:

[Tex]\begin{aligned}\text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma}\\&=\frac{300-200}{5}\\&=20\end{aligned}[/Tex]

This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).
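As a quick check of the arithmetic above, the following minimal Python sketch evaluates the same formula. The function name `z_score` is an illustrative choice, not part of any library.

```python
def z_score(x_bar, mu, sigma):
    """Number of standard deviations by which x_bar lies above (or below) mu."""
    return (x_bar - mu) / sigma


# Values from the example: incomes in thousands (India: mean 200, sd 5; Delhi: 300)
delhi_z = z_score(300, 200, 5)
print("Z-Score for Delhi:", delhi_z)  # 20.0
```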

## When to Use Z-test

- The sample size should be greater than 30. Otherwise, we should use the t-test.
- Samples should be drawn at random from the population.
- The standard deviation of the population should be known.
- Samples that are drawn from the population should be independent of each other.
- The data should be normally distributed. For a large sample size, however, the sample mean is approximately normally distributed by the central limit theorem, so this requirement can be relaxed.

## Hypothesis Testing

A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis testing is a way to validate the claim of an experiment.

- **Null Hypothesis:** The null hypothesis is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. It is denoted by [Tex]H_0[/Tex].
- **Alternate Hypothesis:** The alternative hypothesis is the statement that the parameter has a value different from the claimed value. It is denoted by [Tex]H_A[/Tex].
- **Level of significance:** The level of significance is the threshold used to decide whether to reject the null hypothesis. Since 100% certainty is not possible in most experiments, we choose the probability of wrongly rejecting a true null hypothesis that we are willing to accept. It is denoted by alpha (α).

## Steps to perform Z-test

- First, identify the null and alternate hypotheses.
- Determine the level of significance (α).
- Find the critical value of z for the chosen level of significance using the z-table.
- Calculate the z-test statistic using the formula below.

[Tex]Z = \frac{(\bar{x}- \mu)}{\left ( \sigma /\sqrt{n} \right )}[/Tex]

where,

- [Tex]\bar{x}[/Tex]: mean of the sample.
- [Tex]\mu[/Tex]: mean of the population.
- [Tex]\sigma[/Tex]: Standard deviation of the population.
- n: sample size.

- Finally, compare the test statistic with the critical value and decide whether to reject or fail to reject the null hypothesis.
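The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-tailed alternative hypothesis. The function name `one_sample_z_test` and the numbers in the usage line are made up for demonstration, and the standard library's `statistics.NormalDist` stands in for a printed z-table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n, alpha=0.05):
    """Two-tailed one-sample z-test (illustrative helper, not a library API)."""
    # Critical value for the chosen significance level (replaces the z-table lookup)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Test statistic: distance from the hypothesized mean in standard errors
    z = (sample_mean - pop_mean) / (pop_std / sqrt(n))
    # Decision: reject H0 when |z| exceeds the critical value
    reject = abs(z) > critical
    return z, critical, reject


# Hypothetical data: sample mean 52 from n = 36, tested against mu = 50 with sigma = 6
z, critical, reject = one_sample_z_test(52.0, 50.0, 6.0, 36)
print(f"z = {z:.3f}, critical = ±{critical:.3f}, reject H0: {reject}")
```

Returning the statistic and the critical value alongside the decision keeps each step of the procedure visible to the caller.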

## Type of Z-test

### Left-tailed Test

In this test, the region of rejection lies at the extreme left of the distribution. The null hypothesis is that the population mean is greater than or equal to the claimed value, and the alternative hypothesis is that it is smaller.

### Right-tailed Test

In this test, the region of rejection lies at the extreme right of the distribution. The null hypothesis is that the population mean is less than or equal to the claimed value, and the alternative hypothesis is that it is larger.

### One-Tailed Test

A school claims that its students are more intelligent than average. On measuring the IQ scores of 50 of its students, the average turns out to be 110. The mean IQ of the population is 100 and the standard deviation is 15. State whether the school's claim is supported at a 5% significance level.

- First, we define the null hypothesis and the alternate hypothesis. Our null hypothesis will be:

[Tex]H_0 : \mu = 100[/Tex]

and our alternate hypothesis:

[Tex]H_A : \mu > 100[/Tex]

- State the level of significance. Here it is given in the question ([Tex]\alpha[/Tex] = 0.05). If it is not given, α = 0.05 is the usual default.
- Now, we compute the Z-score:

Sample mean = 110

Population mean = 100

Population standard deviation = 15

Sample size = 50

[Tex]\begin{aligned}\text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\\&=\frac{110-100}{15/\sqrt{50}}\\&=\frac{10}{2.12}\\&=4.71\end{aligned}[/Tex]

- Now, we look up the z-table. For α = 0.05, the critical z-value for a right-tailed test is 1.645.
- Here 4.71 > 1.645, so we reject the null hypothesis.
- Had the z-test statistic been less than the critical value, we would have failed to reject the null hypothesis.

### Code Implementation of One-Tailed Z-Test

```python
# Import the necessary libraries
import numpy as np
import scipy.stats as stats

# Given information
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

# Compute the z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score :', z_score)

# Approach 1: Using the critical Z-score
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score :', z_critical)

# Decision
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

# Approach 2: Using the p-value
# p-value: probability of a Z-score at least this large if the null hypothesis is true
p_value = 1 - stats.norm.cdf(z_score)
print('p-value :', p_value)

# Decision
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```

**Output:**

```
Z-Score : 4.714045207910317
Critical Z-Score : 1.6448536269514722
Reject Null Hypothesis
p-value : 1.2142337364462463e-06
Reject Null Hypothesis
```

### Two-tailed test

In this test, the region of rejection lies at both extremes of the distribution. The null hypothesis is that the population mean is equal to the claimed value.

The two-sample example in the next section is performed as a two-tailed test.
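The rejection regions for the left-tailed, right-tailed, and two-tailed tests can be located numerically instead of with a z-table. The sketch below is illustrative only: it assumes α = 0.05 and uses the standard library's `statistics.NormalDist`, and the variable names are arbitrary.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Left-tailed: reject H0 when z falls below this value (about -1.645)
left_critical = std_normal.inv_cdf(alpha)

# Right-tailed: reject H0 when z rises above this value (about 1.645)
right_critical = std_normal.inv_cdf(1 - alpha)

# Two-tailed: alpha is split between both tails, reject when |z| exceeds about 1.960
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

print(f"left: {left_critical:.3f}")
print(f"right: {right_critical:.3f}")
print(f"two-tailed: ±{two_tailed_critical:.3f}")
```

Splitting α across both tails is why the two-tailed critical value is larger in magnitude than the one-tailed value at the same significance level.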

### Two-sampled z-test

In this test, we have two normally distributed, independent populations, and we draw a random sample from each. Let [Tex]\mu_1[/Tex] and [Tex]\mu_2[/Tex] be the population means, and [Tex]\overline{X_1}[/Tex] and [Tex]\overline{X_2}[/Tex] the observed sample means. The null hypothesis can be stated as:

[Tex]H_{0} : \mu_{1} - \mu_{2} = 0[/Tex]

and the alternative hypothesis

[Tex]H_{1} : \mu_{1} - \mu_{2} \ne 0[/Tex]

and the formula for calculating the z-test score:

[Tex]Z = \frac{\left ( \overline{X_{1}} - \overline{X_{2}} \right ) - \left ( \mu_{1} - \mu_{2} \right )}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}}[/Tex]

where [Tex]\sigma_1[/Tex] and [Tex]\sigma_2[/Tex] are the standard deviations, and [Tex]n_1[/Tex] and [Tex]n_2[/Tex] the sample sizes, of the populations corresponding to [Tex]\mu_1[/Tex] and [Tex]\mu_2[/Tex].

#### Example:

There are two groups of students preparing for a competition: Group A and Group B. Group A attended offline classes, while Group B attended online classes. After the examination, each student's score is recorded. We want to determine whether there is a difference between the online and offline classes.

Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10

Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12

Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

**Solution:**

Step 1: Null & Alternate Hypothesis

- Null Hypothesis: There is no significant difference between the mean scores of the online and offline classes.

[Tex]\mu_1 -\mu_2 = 0[/Tex]

- Alternate Hypothesis: There is a significant difference between the mean scores of the online and offline classes.

[Tex]\mu_1 -\mu_2 \neq 0[/Tex]

Step 2: Significance Level

- Significance Level: 5%

[Tex]\alpha = 0.05[/Tex]

Step 3: Z-Score

[Tex]\begin{aligned}\text{Z-score} &= \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1 -\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\\ &= \frac{(75-80)-0}{\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}}\\ &= \frac{-5}{\sqrt{2+2.4}}\\ &= \frac{-5}{2.0976}\\&=-2.384\end{aligned}[/Tex]

Step 4: Look up the critical Z-score in the Z-table for alpha/2 = 0.025

- Critical Z-Score = 1.96

Step 5: Compare with the absolute Z-Score value

- absolute(Z-Score) > Critical Z-Score, since 2.384 > 1.96
- Reject the null hypothesis. There is a significant difference between the online and offline classes.

### Code Implementation of Two-sampled Z-test

```python
import numpy as np
import scipy.stats as stats

# Group A (Offline Classes)
n1 = 50
x1 = 75
s1 = 10

# Group B (Online Classes)
n2 = 60
x2 = 80
s2 = 12

# Null hypothesis: mu_1 - mu_2 = 0
# Hypothesized difference (under the null hypothesis)
D = 0

# Set the significance level
alpha = 0.05

# Calculate the test statistic (z-score)
z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))

# Approach 1: Using the critical Z-score
z_critical = stats.norm.ppf(1 - alpha/2)
print('Critical Z-Score:', z_critical)

# Compare the test statistic with the critical value
if np.abs(z_score) > z_critical:
    print("""Reject the null hypothesis.
There is a significant difference between the online and offline classes.""")
else:
    print("""Fail to reject the null hypothesis.
There is not enough evidence to suggest a significant difference between the online and offline classes.""")

# Approach 2: Using the p-value
# Two-sided p-value: probability of a |Z| at least this large if the null hypothesis is true
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value :', p_value)

# Compare the p-value with the significance level
if p_value < alpha:
    print("""Reject the null hypothesis.
There is a significant difference between the online and offline classes.""")
else:
    print("""Fail to reject the null hypothesis.
There is not enough evidence to suggest significant difference between the online and offline classes.""")
```

**Output:**

```
Z-Score: 2.3836564731139807
Critical Z-Score: 1.959963984540054
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
P-Value : 0.01714159544079563
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
```

### Solved Examples

**Problem 1 (One-sample Z-test) : A company claims that the average battery life of their new smartphone is 12 hours. A consumer group tests 100 phones and finds the average battery life to be 11.8 hours with a population standard deviation of 0.5 hours. At a 5% significance level, is there evidence to refute the company’s claim?**

**Solution:**

Step 1: State the hypotheses

H₀: μ = 12 (null hypothesis)

H₁: μ ≠ 12 (alternative hypothesis)

Step 2: Calculate the Z-score

Z = (x̄ – μ) / (σ / √n)

= (11.8 – 12) / (0.5 / √100)

= -0.2 / 0.05

= -4

Step 3: Find the critical value (two-tailed test at 5% significance)

Z₀.₀₂₅ = ±1.96

Step 4: Compare Z-score with critical value

|-4| > 1.96, so we reject the null hypothesis.

Conclusion: There is sufficient evidence to refute the company’s claim about battery life.

**Problem 2 : A researcher wants to compare the effectiveness of two different medications for reducing blood pressure. Medication A is tested on 50 patients, resulting in a mean reduction of 15 mmHg with a standard deviation of 3 mmHg. Medication B is tested on 60 patients, resulting in a mean reduction of 13 mmHg with a standard deviation of 4 mmHg. At a 1% significance level, is there a significant difference between the two medications?**

**Solution:**

Step 1: State the hypotheses

H₀: μ₁ – μ₂ = 0 (null hypothesis)

H₁: μ₁ – μ₂ ≠ 0 (alternative hypothesis)

Step 2: Calculate the Z-score

Z = (x̄₁ – x̄₂) / √((σ₁²/n₁) + (σ₂²/n₂))

= (15 – 13) / √((3²/50) + (4²/60))

= 2 / √(0.18 + 0.2667)

= 2 / 0.6683

= 2.99

Step 3: Find the critical value (two-tailed test at 1% significance)

Z₀.₀₀₅ = ±2.576

Step 4: Compare Z-score with critical value

2.99 > 2.576, so we reject the null hypothesis.

Conclusion: There is a significant difference between the effectiveness of the two medications at the 1% significance level.

**Problem 3 : A polling company claims that 60% of voters support a new policy. In a sample of 1000 voters, 570 support the policy. At a 5% significance level, is there evidence to reject the company’s claim?**

**Solution:**

Step 1: State the hypotheses

H₀: p = 0.60 (null hypothesis)

H₁: p ≠ 0.60 (alternative hypothesis)

Step 2: Calculate the Z-score

p̂ = 570/1000 = 0.57 (sample proportion)

Z = (p̂ – p) / √(p(1-p)/n)

= (0.57 – 0.60) / √(0.60(1-0.60)/1000)

= -0.03 / √(0.24/1000)

= -0.03 / 0.0155

= -1.94

Step 3: Find the critical value (two-tailed test at 5% significance)

Z₀.₀₂₅ = ±1.96

Step 4: Compare Z-score with critical value

|-1.94| < 1.96, so we fail to reject the null hypothesis.

Conclusion: There is not enough evidence to refute the polling company’s claim at the 5% significance level.
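The one-proportion calculation in Problem 3 can be reproduced with a short Python sketch. This is an illustration under stated assumptions: the helper name `one_proportion_z_test` is hypothetical, the test is two-tailed, and the standard error is computed from the claimed proportion as in the worked solution.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-tailed z-test of a sample proportion against a claimed proportion p0."""
    p_hat = successes / n                      # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# Data from Problem 3: 570 supporters out of 1000, claimed proportion 0.60
z, critical, p_value, reject = one_proportion_z_test(570, 1000, 0.60)
print(f"z = {z:.2f}, critical = ±{critical:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```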

**Problem 4 : A manufacturer claims that their light bulbs last an average of 1000 hours. A sample of 100 bulbs has a mean life of 985 hours. The population standard deviation is known to be 50 hours. At a 5% significance level, is there evidence to reject the manufacturer’s claim?**

**Solution:**

H₀: μ = 1000

H₁: μ ≠ 1000

Z = (x̄ – μ) / (σ / √n)

= (985 – 1000) / (50 / √100)

= -15 / 5

= -3

Critical value (α = 0.05, two-tailed): ±1.96

|-3| > 1.96, so reject H₀.

Conclusion: There is sufficient evidence to reject the manufacturer’s claim at the 5% significance level.

**Problem 5 : Two factories produce semiconductors. Factory A’s chips have a mean resistance of 100 ohms with a standard deviation of 5 ohms. Factory B’s chips have a mean resistance of 98 ohms with a standard deviation of 4 ohms. Samples of 50 chips from each factory are tested. At a 1% significance level, is there a difference in mean resistance between the two factories?**

**Solution:**

H₀: μA – μB = 0

H₁: μA – μB ≠ 0

Z = (x̄A – x̄B) / √((σA²/nA) + (σB²/nB))

= (100 – 98) / √((5²/50) + (4²/50))

= 2 / √(0.5 + 0.32)

= 2 / 0.9055

= 2.21

Critical value (α = 0.01, two-tailed): ±2.576

|2.21| < 2.576, so fail to reject H₀.

Conclusion: There is not enough evidence to conclude a difference in mean resistance at the 1% significance level.

**Problem 6 : A political analyst claims that 40% of voters in a certain district support a new tax policy. In a random sample of 500 voters, 220 support the policy. At a 5% significance level, is there evidence to reject the analyst’s claim?**

**Solution:**

H₀: p = 0.40

H₁: p ≠ 0.40

p̂ = 220/500 = 0.44

Z = (p̂ – p) / √(p(1-p)/n)

= (0.44 – 0.40) / √(0.40(1-0.40)/500)

= 0.04 / 0.0219

= 1.83

Critical value (α = 0.05, two-tailed): ±1.96

|1.83| < 1.96, so fail to reject H₀.

Conclusion: There is not enough evidence to reject the analyst’s claim at the 5% significance level.

**Problem 7 : Two advertising methods are compared. Method A results in 150 sales out of 1000 contacts. Method B results in 180 sales out of 1200 contacts. At a 5% significance level, is there a difference in the effectiveness of the two methods?**

**Solution:**

H₀: pA – pB = 0

H₁: pA – pB ≠ 0

p̂A = 150/1000 = 0.15

p̂B = 180/1200 = 0.15

p̂ = (150 + 180) / (1000 + 1200) = 0.15

Z = (p̂A – p̂B) / √(p̂(1-p̂)(1/nA + 1/nB))

= (0.15 – 0.15) / √(0.15(1-0.15)(1/1000 + 1/1200))

= 0 / 0.0153

= 0

Critical value (α = 0.05, two-tailed): ±1.96

|0| < 1.96, so fail to reject H₀.

Conclusion: There is no significant difference in the effectiveness of the two advertising methods at the 5% significance level.

**Problem 8 : A new treatment for a disease is tested in two cities. In City A, 120 out of 400 patients recover. In City B, 140 out of 500 patients recover. At a 5% significance level, is there a difference in the recovery rates between the two cities?**

**Solution:**

H₀: pA – pB = 0

H₁: pA – pB ≠ 0

p̂A = 120/400 = 0.30

p̂B = 140/500 = 0.28

p̂ = (120 + 140) / (400 + 500) = 0.2889

Z = (p̂A – p̂B) / √(p̂(1-p̂)(1/nA + 1/nB))

= (0.30 – 0.28) / √(0.2889(1-0.2889)(1/400 + 1/500))

= 0.02 / 0.0304

= 0.658

Critical value (α = 0.05, two-tailed): ±1.96

|0.658| < 1.96, so fail to reject H₀.

Conclusion: There is not enough evidence to conclude a difference in recovery rates between the two cities at the 5% significance level.
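The pooled two-proportion test used in Problems 7 and 8 can be sketched in Python as follows. The helper name `two_proportion_z_test` is hypothetical, and the sketch assumes a two-tailed test with a pooled proportion under the null hypothesis, as in the worked solutions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x_a, n_a, x_b, n_b, alpha=0.05):
    """Two-tailed z-test for the difference between two proportions (pooled SE)."""
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Pooled proportion: best estimate of the common rate if H0 (pA = pB) is true
    pooled = (x_a + x_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Data from Problem 8: City A 120 of 400 recover, City B 140 of 500 recover
z, critical, reject = two_proportion_z_test(120, 400, 140, 500)
print(f"z = {z:.3f}, critical = ±{critical:.2f}, reject H0: {reject}")
```

Pooling the two samples is appropriate here because the null hypothesis states that both groups share a single underlying proportion.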

**Problem 9 : A company claims that their product weighs 500 grams on average. A sample of 64 products has a mean weight of 498 grams. The population standard deviation is known to be 8 grams. At a 1% significance level, is there evidence to reject the company’s claim?**

**Solution:**

H₀: μ = 500

H₁: μ ≠ 500

Z = (x̄ – μ) / (σ / √n)

= (498 – 500) / (8 / √64)

= -2 / 1

= -2

Critical value (α = 0.01, two-tailed): ±2.576

|-2| < 2.576, so fail to reject H₀.

Conclusion: There is not enough evidence to reject the company’s claim at the 1% significance level.

## Practice Problems

1. A cereal company claims that their boxes contain an average of 350 grams of cereal. A consumer group tests 100 boxes and finds a mean weight of 345 grams with a known population standard deviation of 15 grams. At a 5% significance level, is there evidence to refute the company’s claim?

2. A study compares the effect of two different diets on cholesterol levels. Diet A is tested on 50 people, resulting in a mean reduction of 25 mg/dL with a standard deviation of 8 mg/dL. Diet B is tested on 60 people, resulting in a mean reduction of 22 mg/dL with a standard deviation of 7 mg/dL. At a 1% significance level, is there a significant difference between the two diets?

3. A politician claims that 60% of voters in her district support her re-election. In a random sample of 1000 voters, 570 support her. At a 5% significance level, is there evidence to reject the politician’s claim?

4. Two different teaching methods are compared. Method A results in 80 students passing out of 120 students. Method B results in 90 students passing out of 150 students. At a 5% significance level, is there a difference in the effectiveness of the two methods?

5. A company claims that their new energy-saving light bulbs last an average of 10,000 hours. A sample of 64 bulbs has a mean life of 9,800 hours. The population standard deviation is known to be 500 hours. At a 1% significance level, is there evidence to reject the company’s claim?

6. The mean salary of employees in a large corporation is said to be $75,000 per year. A union representative suspects this is too high and surveys 100 randomly selected employees, finding a mean salary of $72,500. The population standard deviation is known to be $8,000. At a 5% significance level, is there evidence to support the union representative’s suspicion?

7. Two factories produce computer chips. Factory A’s chips have a mean processing speed of 3.2 GHz with a standard deviation of 0.2 GHz. Factory B’s chips have a mean processing speed of 3.3 GHz with a standard deviation of 0.25 GHz. Samples of 100 chips from each factory are tested. At a 5% significance level, is there a difference in mean processing speed between the two factories?

8. A new vaccine is claimed to be 90% effective. In a clinical trial with 500 participants, 440 develop immunity. At a 1% significance level, is there evidence to reject the claim about the vaccine’s effectiveness?

9. Two different advertising campaigns are tested. Campaign A results in 250 sales out of 2000 views. Campaign B results in 300 sales out of 2500 views. At a 5% significance level, is there a difference in the effectiveness of the two campaigns?

10. A quality control manager claims that the defect rate in a production line is 5%. In a sample of 1000 items, 65 are found to be defective. At a 5% significance level, is there evidence to suggest that the actual defect rate is different from the claimed 5%?

## Type I error and Type II error

- **Type I error:** A Type I error occurs when we reject the null hypothesis even though it is true. The probability of this error is denoted by alpha (α).
- **Type II error:** A Type II error occurs when we fail to reject the null hypothesis even though it is false. The probability of this error is denoted by beta (β).

| | Null Hypothesis is TRUE | Null Hypothesis is FALSE |
|---|---|---|
| Reject Null Hypothesis | Type I Error (False Positive) | Correct decision |
| Fail to Reject the Null Hypothesis | Correct decision | Type II Error (False Negative) |
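A small simulation can make the meaning of a Type I error concrete: when the null hypothesis is actually true, a test at significance level α should reject it in roughly a fraction α of repeated experiments. The sketch below is illustrative only. The function name, the population values (mean 100, standard deviation 15), the sample size, and the number of trials are all arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(trials=5000, n=30, mu=100.0, sigma=15.0, alpha=0.05, seed=0):
    """Fraction of two-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean equals the hypothesized mean
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1  # a false positive: H0 is true but was rejected
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f} (nominal alpha = 0.05)")
```

The observed rate should land close to the chosen α, varying slightly with the seed and the number of trials.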

## Summary

Z-tests are statistical tools used to determine if there’s a significant difference between a sample statistic and a population parameter, or between two population parameters. They’re applicable when dealing with large sample sizes (typically n > 30) and known population standard deviations. Z-tests can be used for analyzing means or proportions in both one-sample and two-sample scenarios. The process involves stating hypotheses, calculating a Z-score, comparing it to a critical value based on the chosen significance level (often 5% or 1%), and then making a decision to reject or fail to reject the null hypothesis.

## FAQs

### What is the main limitation of the z-test?

The main limitation of the Z-test is that the population standard deviation is usually unknown. When it is unknown, the sample’s variability is used as an estimate of the population’s variability, which is reasonable only when the sample is large.

### What is the minimum sample for z-test?

A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be employed.

### What is the application of z-test?

The z-test is used to compare a sample mean with a known population mean and to determine whether there is a significant difference between the means of two independent samples. It can also be used to compare a population proportion to an assumed proportion, or to test the difference between the proportions of two populations.

### What is the theory of the z-test?

The z test is a commonly used hypothesis test in inferential statistics that allows us to compare two populations using the mean values of samples from those populations, or to compare the mean of one population to a hypothesized value, when what we are interested in comparing is a continuous variable.
