Analysis of Variance and Chi-Squared Analysis
Using this database as a sample of all North Carolina births has the following limitations:
The data is not an exhaustive set of all North Carolina births.
Many variables in the database have missing values.
These factors could therefore affect our statistical analysis if the data is treated as a sample of all North Carolina births.
Analysis of Variance
We perform an analysis of variance at the 0.05 significance level for the following question:
Question: Do the mother’s smoking habits and marital status during pregnancy have an effect on the child’s birthweight?
We have the following populations:
Let X be the population of birthweight of children whose mother was a non-smoker during pregnancy.
Let Y be the population of birthweight of children whose mother was a smoker during pregnancy.
Let Z be the population of birthweight of children whose mother was married during pregnancy.
Let P be the population of birthweight of children whose mother was unmarried during pregnancy.
The null and alternative hypotheses are as follows:
Null hypothesis: The mean birthweights of the four populations are equal.
Alternative hypothesis: At least one of the population means is different.
After doing the calculations, we arrive at the following results:
Populations       Count   Sum        Average    Variance
X (non-smoker)    878     6262.666   7.132877   2.356056
Y (smoker)        131     884.9056   6.755005   2.259016
Z (married)       618     4497.282   7.277155   2.053943
P (not married)   391     2650.641   6.779134   2.679846

Source of Variation   SS         df     MS         F          P-value    F crit
Between Groups        75.67428   3      25.22476   10.87303   4.38E-07   2.609321
Within Groups         4672.356   2014   2.319938
Total                 4748.03    2017
The p-value is 4.38E-07, which is far below our alpha value of 0.05. Hence we reject the null hypothesis. At the 0.05 significance level, we conclude that at least one of the population means is different.
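The sums of squares and the F statistic reported above can be recomputed from the per-group counts, averages, and variances alone. The following is a minimal sketch in Python under that assumption. The variable names are illustrative, and the critical value is copied from the ANOVA table rather than computed, since the standard library has no F distribution.

```python
# Per-group summary statistics copied from the table: (count, average, variance).
groups = {
    "X (non-smoker)": (878, 7.132877, 2.356056),
    "Y (smoker)": (131, 6.755005, 2.259016),
    "Z (married)": (618, 7.277155, 2.053943),
    "P (not married)": (391, 6.779134, 2.679846),
}

n_total = sum(n for n, _, _ in groups.values())
# Grand mean: count-weighted average of the group averages.
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-group sum of squares: spread of group averages around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within-group sum of squares: recovered from each sample variance.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1        # number of groups minus one
df_within = n_total - len(groups)   # observations minus number of groups

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRIT = 2.609321  # critical value as reported in the ANOVA table (not computed here)

print(f"SS between = {ss_between:.4f}, SS within = {ss_within:.3f}")
print(f"F = {f_stat:.4f}, reject null hypothesis: {f_stat > F_CRIT}")
```

Working from summary statistics keeps the sketch independent of the raw database. With the raw birthweights available, a statistics library would normally be used instead.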
Chi-square test
Using the chi-square test at the 0.05 significance level, we analyze the following question:
Question: Is the smoking habit of a woman during pregnancy associated with the ethnicity of the woman?
We have the following data:
Smoking habits      Hypothesised proportion   Observed   Expected
Non-white mothers   0.5                       30         63
White mothers       0.5                       96         63
The null and alternative hypotheses are as follows:
Null hypothesis: Smoking habits are not associated with the ethnicity of the woman.
Alternative hypothesis: Smoking habits are associated with the ethnicity of the woman.
After performing the calculations, we arrive at the following results:
The p-value is 4.10893E-09.
The chi-squared test statistic is 34.57142858.
The p-value is far below 0.05, so we reject the null hypothesis at the 0.05 significance level and conclude that smoking habits are associated with the ethnicity of the woman. The chi-squared test statistic of 34.57142858 is also clearly greater than the critical value of 3.841458821 (1 degree of freedom), which agrees with the conclusion of rejecting the null hypothesis.
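The reported statistic, p-value, and critical value can be checked with a short Python sketch. It treats the table as a goodness-of-fit test of the observed counts against the hypothesised 0.5/0.5 split, which is the calculation the reported statistic corresponds to. It assumes 1 degree of freedom (two categories) and uses the identity that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so only the standard library is needed. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Observed counts and hypothesised proportions copied from the table.
observed = {"Non-white mothers": 30, "White mothers": 96}
hypothesised = {"Non-white mothers": 0.5, "White mothers": 0.5}

total = sum(observed.values())
# Expected count per group under the hypothesised proportions.
expected = {group: total * p for group, p in hypothesised.items()}

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# Assumption: 1 degree of freedom. Then P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

alpha = 0.05
# Critical value for 1 degree of freedom: square of the two-sided normal quantile.
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2

print(f"chi-squared = {chi_sq:.8f}, p-value = {p_value:.5e}")
print(f"critical value = {critical:.9f}, reject null hypothesis: {chi_sq > critical}")
```

The normal-distribution shortcut only holds for 1 degree of freedom. A table with more categories would need a chi-squared distribution function from a statistics library.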