A consumer products company wishes to focus its marketing efforts on current customers who are likely to be more profitable in the future. Using past data on customer transactions, it has calculated profitability scores (Y) for 200 customers based on their purchasing behavior over the last 5 years. Scores range from 0 (customers with few purchases who have provided little return to the company) to 10 (very profitable customers with many purchases). The company would like to predict future profitability scores for customers acquired within the last year using two potential predictor variables: purchase frequency in the last year (X1) and average purchase amount (in dollars) in the last year (X2). The idea is that the company could focus marketing efforts on recently acquired customers with a high predicted (long-term) profitability score, rather than wasting resources on customers with low predicted profitability scores. Two nested models are fit to the data for the 200 long-term customers. Statistical software output for the (complete) model,
$E(Y) = b_0 + b_1X_1 + b_2X_2 + b_3X_1X_2 + b_4X_1^2 + b_5X_2^2$, is:
Model | Sum of Squares | df | Mean Square | Global F-stat | Pr(>F)
a Response variable: Y.
b Predictors: (Intercept), X1, X2, X1X2, X1sq, X2sq.
Statistical software output for the (reduced) model, $E(Y) = b_0 + b_1X_1 + b_2X_2$, is:
Model | Sum of Squares | df | Mean Square | Global F-stat | Pr(>F)
a Response variable: Y.
b Predictors: (Intercept), X1, X2.
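To make the nesting concrete, here is a minimal Python sketch that builds one row of predictor values for each model. The function names `complete_row`, `reduced_row`, and `residual_df` are hypothetical illustrations, not part of the software output. The reduced row is exactly the first three entries of the complete row, which is what makes the two models nested, and the residual degrees of freedom n - k - 1 reproduce the 194 used in parts (b) and (d).

```python
def complete_row(x1: float, x2: float) -> list[float]:
    """One row of predictors for the complete model, in the order of
    b0..b5: intercept, X1, X2, X1*X2, X1^2, X2^2."""
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]


def reduced_row(x1: float, x2: float) -> list[float]:
    """One row of predictors for the reduced model: intercept, X1, X2."""
    return [1.0, x1, x2]


def residual_df(n: int, k: int) -> int:
    """Residual (denominator) degrees of freedom for a model with
    k predictor terms plus an intercept, fit to n observations."""
    return n - k - 1


# The reduced model's terms are a subset of the complete model's terms.
assert complete_row(2.0, 3.0)[:3] == reduced_row(2.0, 3.0)
print(residual_df(200, 5))  # complete model with 5 terms: prints 194
```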
(a) Write down the null and alternative hypotheses to test whether the complete model is useful for predicting Y.
(b) Conduct the hypothesis test from part (a) using a significance level of 5%. (Use the fact that the 95th percentile of the F-distribution with 5 numerator degrees of freedom and 194 denominator degrees of freedom is, using Excel, FINV(0.05, 5, 194) = 2.26.) Remember to draw a conclusion from the results of the test.
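The global usefulness test compares F = MSR / MSE with the critical value 2.26. A minimal sketch of that computation, assuming the regression (model) and residual sums of squares are read from the complete-model output; `global_f_test` is a hypothetical helper, not a function from any particular statistics package.

```python
def global_f_test(ssr: float, sse: float, n: int, k: int,
                  f_crit: float) -> tuple[float, bool]:
    """Global usefulness test, H0: b1 = b2 = ... = bk = 0.

    ssr: regression (model) sum of squares
    sse: residual (error) sum of squares
    n:   number of observations
    k:   number of predictor terms (excluding the intercept)
    Returns the F-statistic and whether H0 is rejected (F > f_crit).
    """
    df_num = k                # numerator degrees of freedom
    df_den = n - k - 1        # denominator (residual) degrees of freedom
    f_stat = (ssr / df_num) / (sse / df_den)  # MSR / MSE
    return f_stat, f_stat > f_crit


# For the complete model: global_f_test(ssr, sse, n=200, k=5, f_crit=2.26)
```

Equivalently, H0 is rejected when the reported Pr(>F) is below 0.05; the critical-value rule and the p-value rule lead to the same decision.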
(c) Write down the null and alternative hypotheses to test whether the complete model is significantly more useful than the reduced model for predicting Y.
(d) Conduct the hypothesis test from part (c) using a significance level of 5%. (Use the fact that the 95th percentile of the F-distribution with 3 numerator degrees of freedom and 194 denominator degrees of freedom is, using Excel, FINV(0.05, 3, 194) = 2.65.) Which of the two models would be more appropriate to use to predict Y?
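The nested-model test asks whether the drop in residual sum of squares from adding X1X2, X1sq, and X2sq is large relative to the complete model's MSE. A minimal sketch, assuming both residual sums of squares are read from the two outputs; `nested_f_test` is a hypothetical helper.

```python
def nested_f_test(sse_reduced: float, sse_complete: float, n: int,
                  k_complete: int, k_reduced: int,
                  f_crit: float) -> tuple[float, bool]:
    """Nested-model F-test, H0: the coefficients of the extra terms
    in the complete model are all 0 (here b3 = b4 = b5 = 0).

    Returns the F-statistic and whether H0 is rejected (F > f_crit),
    i.e. whether the complete model is significantly more useful.
    """
    df_num = k_complete - k_reduced        # number of extra terms
    df_den = n - k_complete - 1            # residual df of complete model
    extra_ss = sse_reduced - sse_complete  # drop in SSE from extra terms
    f_stat = (extra_ss / df_num) / (sse_complete / df_den)
    return f_stat, f_stat > f_crit


# Here: nested_f_test(sse_reduced, sse_complete, n=200,
#                     k_complete=5, k_reduced=2, f_crit=2.65)
```

If the statistic exceeds 2.65, the complete model is preferred; otherwise the simpler reduced model is usually favored, because the extra terms do not significantly improve the fit.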