DAT 650 Use Cases Document
All of the accompanying data sets can be found in the Assignment Guidelines and Rubrics area of your course.
Employee Attrition
The human resources department within GE has recently become aware that many high-potential employees have left the company to pursue other opportunities. The concern was raised by many middle managers and is supported by a recent increase in job postings. Given the need to remain competitive and the total cost and time required to train new employees, GE needs to identify how it can retain talent. In the current environment, the average cost of attrition for an individual employee is 80% of that employee's annual salary.
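The 80% figure above can be turned into a rough cost estimate. The following is a minimal sketch, not part of the GE material: the function names are hypothetical, the income values are made up, and it assumes annual salary can be approximated as 12 times the MONTHLY INCOME field.

```python
def attrition_cost(annual_salary, cost_ratio=0.80):
    """Estimated cost of losing one employee.

    cost_ratio defaults to the 80% of annual salary quoted in the use case.
    """
    return annual_salary * cost_ratio


def total_attrition_cost(monthly_incomes, cost_ratio=0.80):
    """Sum the estimated cost over a group of employees who left.

    Assumption: annual salary is approximated as 12 x monthly income.
    """
    return sum(attrition_cost(12 * income, cost_ratio) for income in monthly_incomes)


# Made-up monthly incomes for three employees who left (illustration only).
example_leavers = [5000, 3200, 7500]
print(f"Estimated attrition cost: {total_attrition_cost(example_leavers):,.0f}")
```

An estimate like this may help frame the GO or NO GO decision in financial terms, but it is only as good as the salary approximation behind it.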
The current data environment is an HR web-based desktop system that contains information about all employees, current and past, including their attrition status: YES (they have left) or NO (they have not left). This environment includes metadata about each employee. It is maintained by HR staff via a web-based Java client-server application. The data is stored in an Oracle database in transactional form. The IT department has built a data warehouse that is updated each evening with the current day's data. The HR team uses the data warehouse as the source of its reports and can run ad hoc extracts into Excel for its own research needs. The HR extracts are limited in the number of rows and to fields that have been corporately pre-approved for extraction.
GE has compiled a file for this pilot project using the extract tool. The dataset provided in the Assignment Guidelines and Rubrics section of the course, Employee Attrition (CSV), includes employee information that the GE HR team believes is relevant to analyzing this problem.
The HR team would like to determine whether this data can be used to identify employees who may leave. It is also important to understand the drivers of attrition for employee groupings such as high performers and role types, and for other pertinent groupings that emerge from the analysis.
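One simple way to start looking at attrition by grouping is to compute the share of leavers within each group. The sketch below is illustrative only: the helper name `attrition_rate_by` is hypothetical, the rows are made up, and the column headers ("JOB ROLE", "ATTRITION") are assumed to match the data description, which may differ from the actual CSV header. ATTRITION is assumed to be coded 0=no, 1=yes.

```python
from collections import defaultdict


def attrition_rate_by(rows, group_key, attrition_key="ATTRITION"):
    """Return {group value: share of rows in that group with attrition == 1}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if int(row[attrition_key]) == 1:
            leavers[group] += 1
    return {group: leavers[group] / totals[group] for group in totals}


# Made-up rows; JOB ROLE uses the numeric codes from the data description.
sample_rows = [
    {"JOB ROLE": "3", "ATTRITION": "1"},
    {"JOB ROLE": "3", "ATTRITION": "0"},
    {"JOB ROLE": "7", "ATTRITION": "0"},
    {"JOB ROLE": "7", "ATTRITION": "0"},
    {"JOB ROLE": "9", "ATTRITION": "1"},
]

rates = attrition_rate_by(sample_rows, "JOB ROLE")
# List groups from highest to lowest attrition rate.
for role, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"JOB ROLE {role}: {rate:.0%}")
```

The same helper could be pointed at other fields, such as PERFORMANCE RATING or DEPARTMENT. Group rates describe where attrition is concentrated; on their own they do not establish what drives it.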
The pilot only needs to show whether this data provides a basis for describing and generally identifying employees who may leave. The management team expects to make a GO or NO GO business decision based on the pilot recommendation. If the decision is GO, GE will allocate new project dollars to arrange for GE resources to develop a fully enterprise-deployed predictive analytic model. Note that the results of this pilot will be used as a basis for that next project.
Data Description/Documentation for Employee Attrition
Name
Description
AGE
Numerical Value
ATTRITION
Employee leaving the company (0=no, 1=yes)
BUSINESS TRAVEL
(1=No Travel, 2=Travel Frequently, 3=Travel Rarely)
DAILY RATE
Numerical Value – Salary Level
DEPARTMENT
(1=HR, 2=R&D, 3=Sales)
DISTANCE FROM HOME
Numerical Value – THE DISTANCE FROM WORK TO HOME
EDUCATION
Numerical Value
EDUCATION FIELD
(1=HR, 2=LIFE SCIENCES, 3=MARKETING, 4=MEDICAL SCIENCES, 5=OTHERS, 6=TECHNICAL)
EMPLOYEE COUNT
Numerical Value
EMPLOYEE NUMBER
Numerical Value – EMPLOYEE ID
ENVIRONMENT SATISFACTION
Numerical Value – SATISFACTION WITH THE ENVIRONMENT
GENDER
(1=FEMALE, 2=MALE)
HOURLY RATE
Numerical Value – HOURLY SALARY
JOB INVOLVEMENT
Numerical Value – JOB INVOLVEMENT
JOB LEVEL
Numerical Value – LEVEL OF JOB
JOB ROLE
(1=HC REP, 2=HR, 3=LAB TECHNICIAN, 4=MANAGER, 5=MANAGING DIRECTOR, 6=RESEARCH DIRECTOR, 7=RESEARCH SCIENTIST, 8=SALES EXECUTIVE, 9=SALES REPRESENTATIVE)
JOB SATISFACTION
Numerical Value – SATISFACTION WITH THE JOB
MARITAL STATUS
(1=DIVORCED, 2=MARRIED, 3=SINGLE)
MONTHLY INCOME
Numerical Value – MONTHLY SALARY
MONTHLY RATE
Numerical Value – MONTHLY RATE
NUMCOMPANIES WORKED
Numerical Value – NO. OF COMPANIES WORKED AT
OVER 18
(1=YES, 2=NO)
OVERTIME
(1=NO, 2=YES)
PERCENT SALARY HIKE
Numerical Value – PERCENTAGE INCREASE IN SALARY
PERFORMANCE RATING
Numerical Value – PERFORMANCE RATING
RELATIONS SATISFACTION
Numerical Value – RELATIONS SATISFACTION
STANDARD HOURS
Numerical Value – STANDARD HOURS
STOCK OPTIONS LEVEL
Numerical Value – STOCK OPTIONS
TOTAL WORKING YEARS
Numerical Value – TOTAL YEARS WORKED
TRAINING TIMES LAST YEAR
Numerical Value – HOURS SPENT TRAINING
WORK LIFE BALANCE
Numerical Value – TIME SPENT BETWEEN WORK AND OUTSIDE
YEARS AT COMPANY
Numerical Value – TOTAL NUMBER OF YEARS AT THE COMPANY
YEARS IN CURRENT ROLE
Numerical Value – YEARS IN CURRENT ROLE
YEARS SINCE LAST PROMOTION
Numerical Value – LAST PROMOTION
YEARS WITH CURRENT MANAGER
Numerical Value – YEARS SPENT WITH CURRENT MANAGER
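Because several fields in the data description are stored as numeric codes, it can help to translate them into labels before reporting. The following is a minimal sketch using only the Python standard library. The column headers are assumed to match the names in the data description (the real CSV header may differ), the sample rows are made up, and `decode_row` is a hypothetical helper.

```python
import csv
import io

# Code-to-label maps taken from the data description above.
CODE_MAPS = {
    "ATTRITION": {0: "No", 1: "Yes"},
    "BUSINESS TRAVEL": {1: "No Travel", 2: "Travel Frequently", 3: "Travel Rarely"},
    "DEPARTMENT": {1: "HR", 2: "R&D", 3: "Sales"},
    "GENDER": {1: "Female", 2: "Male"},
    "MARITAL STATUS": {1: "Divorced", 2: "Married", 3: "Single"},
    "OVERTIME": {1: "No", 2: "Yes"},
}


def decode_row(row):
    """Return a copy of row with coded columns replaced by their labels."""
    decoded = dict(row)
    for column, mapping in CODE_MAPS.items():
        if column in decoded:
            code = int(decoded[column])
            # Keep unexpected codes visible instead of silently dropping them.
            decoded[column] = mapping.get(code, f"Unknown ({code})")
    return decoded


# Made-up sample standing in for the Employee Attrition (CSV) file.
SAMPLE_CSV = """ATTRITION,DEPARTMENT,OVERTIME,AGE
1,3,2,41
0,2,1,49
"""

rows = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for row in rows:
    print(row)
```

To read the course file instead of the inline sample, the `io.StringIO(SAMPLE_CSV)` argument could be replaced with a file object opened on the downloaded CSV. Columns that are not in `CODE_MAPS`, such as AGE, pass through unchanged.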