MIS-655 Identifying Patterns and Relationships

Directions: Use the information below to complete this assignment.

Part 1

Descriptions of the variables are as follows:

Age: Age of the respondent

Race: Race of the respondent

Sex: Sex of the respondent

Marital Status: Marital status of the respondent

Occupation: Occupational category of the respondent

Education: Highest level of education completed by the respondent

Hours_Per_Week: Number of weekly hours that the individual works

Capital_Gain: Amount of capital gains from tax records (in thousands)

Income: Self-reported income of the respondent – either “>50K” or “<=50K”

The marketing department in your organization believes that customers with more education are likely to have higher incomes than those with less education. Based on analysis previously conducted showing that zip codes within 3 miles of a college campus in the regional market contain a population that is significantly more educated than other zip codes, the department director proposes an ad campaign targeting these zip codes to reach more of these highly educated individuals. The department director received questions from the executive team about the evidence they have showing that those with more education, in fact, have significantly higher incomes. The department director has asked you to analyze existing customer data to determine whether a relationship exists between level of education and income.

Based on your findings, the marketing department will determine whether marketing to targeted zip codes close to a college or university is likely to create a positive return on investment. Use R to complete the following:

Question 1: Check your working directory to ensure your file is saved in the correct location and load the “Adult Incomes” data set into your R workspace. Save the data frame as an object called “incomedata.” Verify that your data has loaded correctly by checking the dimensions of the “incomedata” object. Summarize each of the variables in the data set. Include a screenshot of the R console output as part of the answer.
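The load-and-verify steps in Question 1 might look like the sketch below. It assumes the data set is a CSV file; the file name `adult_incomes.csv` and the toy rows are hypothetical stand-ins so the snippet runs on its own, and the real file name and location will differ.

```r
# Hypothetical stand-in for the "Adult Incomes" file: a tiny CSV written to a
# temporary folder so this sketch is self-contained. Use the real file instead.
toy <- data.frame(
  Age = c(23, 41, 58),
  Education = c("HS-grad", "Bachelors", "Masters"),
  Income = c("<=50K", ">50K", ">50K")
)
csv_path <- file.path(tempdir(), "adult_incomes.csv")  # assumed file name
write.csv(toy, csv_path, row.names = FALSE)

getwd()                                    # confirm the working directory
incomedata <- read.csv(csv_path, stringsAsFactors = TRUE)
dim(incomedata)                            # number of rows, then columns
summary(incomedata)                        # one summary per variable
```

Reading text columns as factors (`stringsAsFactors = TRUE`) is a choice made here so that `summary()` reports category counts rather than only the column length.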

Question 2: When summarizing categorical data, sometimes it is easier to create a new indicator (0/1) variable to obtain a percentage (the average of a binary indicator is a proportion, which can be reported as a percent). Create a new variable in the “incomedata” data set called “Income_GT50,” where the variable is equal to 1 when the income is greater than $50,000 and 0 otherwise. Using the table function, produce a table showing the number of individuals with incomes greater than $50,000. Include a screenshot of the table as part of the output. Using the appropriate R function(s), calculate the overall percentage of individuals in the data set who earn more than $50,000 annually.
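As an illustration of the indicator idea in Question 2, here is a minimal sketch on made-up data. It assumes the Income column holds exactly the strings ">50K" and "<=50K"; if the real values carry stray whitespace, something like `trimws()` may be needed first.

```r
# Toy data standing in for the real data set (values are made up).
incomedata <- data.frame(Income = c("<=50K", ">50K", "<=50K", ">50K", "<=50K"))

# Indicator: 1 when Income is ">50K", 0 otherwise.
incomedata$Income_GT50 <- ifelse(incomedata$Income == ">50K", 1, 0)

table(incomedata$Income_GT50)              # counts of 0s and 1s

# The mean of a 0/1 indicator is a proportion; multiply by 100 for a percent.
pct_gt50 <- mean(incomedata$Income_GT50) * 100
pct_gt50
```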

Question 3: You are interested in understanding the demographics of a particular cohort in your data set: those who have a bachelor’s degree or higher whose occupation is also “Exec-managerial.” (Hint: First write code to subset those with occupational category of “Exec-managerial,” then write code to subset for educational level.) Use the appropriate R function(s) to report the count of individuals who meet these criteria along with their average age and the percentage who earn more than $50,000 annually. Include a screenshot of the R console output as part of the answer.
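The two-step subsetting suggested in the hint could be sketched as follows. The toy rows are invented, and the education labels in `degree_levels` are an assumption; check the actual labels in the data (for example with `table(incomedata$Education)`) before deciding which ones count as a bachelor's degree or higher.

```r
# Toy data standing in for the real data set (values are made up).
incomedata <- data.frame(
  Age = c(30, 45, 52, 38, 61),
  Occupation = c("Exec-managerial", "Sales", "Exec-managerial",
                 "Exec-managerial", "Exec-managerial"),
  Education = c("Bachelors", "Masters", "HS-grad", "Doctorate", "Masters"),
  Income_GT50 = c(1, 1, 0, 1, 0)
)

# Assumed labels for "bachelor's degree or higher"; verify against the real data.
degree_levels <- c("Bachelors", "Masters", "Doctorate")

# Step 1: keep the occupation of interest. Step 2: keep the education levels.
execs  <- subset(incomedata, Occupation == "Exec-managerial")
cohort <- subset(execs, Education %in% degree_levels)

nrow(cohort)                               # count of individuals in the cohort
mean(cohort$Age)                           # average age
mean(cohort$Income_GT50) * 100             # percent earning more than $50,000
```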

Question 4: Using the Age variable, create a new variable that divides Age into age categories. Your categories should be as follows: Under 25; 25-34; 35-44; 45-54; 55-64; 65+. The code to create the first category would be as follows:

incomedata$age_cat[incomedata$Age < 25] <- "<25"

Use this line of code as a template to create the other categories. Ensure that your code worked by showing a table with the age categories and the counts of individuals in each category.
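One way the template might be extended is sketched below on made-up ages, using the data frame name `incomedata` from Question 1. The category labels and the choice of half-open ranges (for example, 25 up to but not including 35) are assumptions made for illustration.

```r
# Toy ages chosen to land on the category boundaries (values are made up).
incomedata <- data.frame(Age = c(19, 25, 34, 35, 44, 50, 64, 65, 80))

# Start with missing values so any age that matches no rule is easy to spot.
incomedata$age_cat <- NA_character_
incomedata$age_cat[incomedata$Age < 25] <- "<25"
incomedata$age_cat[incomedata$Age >= 25 & incomedata$Age < 35] <- "25-34"
incomedata$age_cat[incomedata$Age >= 35 & incomedata$Age < 45] <- "35-44"
incomedata$age_cat[incomedata$Age >= 45 & incomedata$Age < 55] <- "45-54"
incomedata$age_cat[incomedata$Age >= 55 & incomedata$Age < 65] <- "55-64"
incomedata$age_cat[incomedata$Age >= 65] <- "65+"

# Counts per category; useNA = "ifany" reveals rows that were left unassigned.
age_counts <- table(incomedata$age_cat, useNA = "ifany")
age_counts
```

Comparing the sum of the table counts with `nrow(incomedata)` is a quick check that every row received a category.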

Question 5: Create a cross tab showing the count of customers in each age bracket who earn more than $50,000 annually. Use the appropriate R function(s) to calculate the percentage of customers who earn more than $50,000 annually for each age bracket. Include a screenshot of the R console output as part of the answer. Which age category has the highest percentage of individuals who earn over $50,000?
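A cross tab and per-bracket percentage could be sketched as below. The toy rows are invented, and the column names `age_cat` and `Income_GT50` follow the names used in Questions 2 and 4; `table()` and `tapply()` are one possible pairing of functions, not the only one.

```r
# Toy data standing in for the real data set (values are made up).
incomedata <- data.frame(
  age_cat = c("<25", "<25", "25-34", "25-34", "35-44", "35-44", "35-44", "35-44"),
  Income_GT50 = c(0, 0, 1, 0, 1, 1, 1, 0)
)

# Cross tab: rows are age brackets, columns are the 0/1 income indicator.
counts <- table(incomedata$age_cat, incomedata$Income_GT50)
counts

# Mean of the indicator within each bracket, expressed as a percent.
pct_by_age <- tapply(incomedata$Income_GT50, incomedata$age_cat, mean) * 100
pct_by_age

# Bracket with the highest percentage in this toy data.
top_bracket <- names(pct_by_age)[which.max(pct_by_age)]
top_bracket
```

`prop.table(counts, margin = 1)` would give the same row-wise shares directly from the cross tab.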

Question 6: Using the appropriate R function(s), calculate the percentage of customers earning more than $50,000 annually by education level. Include a screenshot of the R console output as part of your answer. Which education level has the highest percentage of individuals who earn over $50,000?
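For the by-education breakdown, one option is `aggregate()` with a formula, sorted so the highest percentage comes first. This is a sketch on invented rows; the education labels are assumptions and the real data will have more levels.

```r
# Toy data standing in for the real data set (values are made up).
incomedata <- data.frame(
  Education = c("HS-grad", "HS-grad", "Bachelors", "Bachelors", "Masters", "Masters"),
  Income_GT50 = c(0, 0, 1, 0, 1, 1)
)

# Percent earning more than $50,000 within each education level.
pct_by_edu <- aggregate(Income_GT50 ~ Education, data = incomedata,
                        FUN = function(x) mean(x) * 100)

# Sort from highest to lowest percentage.
pct_by_edu <- pct_by_edu[order(-pct_by_edu$Income_GT50), ]
pct_by_edu
```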

Part 2 (Analysis of results and recommendations): Based on your analysis, is there a relationship between level of education and income? Present your findings and recommendations to the marketing department in the form of a 250-word (minimum) executive summary that includes relevant data, charts, and tables in Microsoft Word. Be sure to include your R code and R output as a .txt file with your submission.

© 2021. Grand Canyon University. All Rights Reserved.