Given restaurant inspection datasets for three consecutive years (dataset2016, dataset2017, and dataset2018, as uploaded on Canvas for bonus assignment #5), please build a model using all three datasets to determine which restaurants are likely to have another food safety issue in September 2018.
You can train your model using all of the data in dataset2016 and dataset2017, plus the data before April 1, 2018 in dataset2018. Then use the data after April 1, 2018 in dataset2018 as test data to evaluate your model's performance, and report the result as area under the PR curve and area under the ROC curve.
I have already merged the files and transformed the categorical data. You just have to apply the model.
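
To make the requested split and metrics concrete, here is a minimal sketch of what I think is being asked. Everything named in it is an assumption rather than something from the assignment: the column names inspection_date and violation, the feature columns x1 and x2, and the choice of a random forest as a placeholder model. The data is synthetic, generated only so the snippet runs on its own; the real merged file would take its place. Rows dated exactly April 1, 2018 are put in the test set here, since the description does not say where that day belongs. Area under PR is approximated with scikit-learn's average precision score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score

# Assumed cutoff: rows before this date train the model, the rest test it.
CUTOFF = pd.Timestamp("2018-04-01")


def time_split(df, date_col="inspection_date", cutoff=CUTOFF):
    """Split by date: rows before the cutoff are train, on/after are test."""
    dates = pd.to_datetime(df[date_col])
    return df[dates < cutoff], df[dates >= cutoff]


def fit_and_score(df, feature_cols, label_col="violation",
                  date_col="inspection_date"):
    """Fit a placeholder classifier on the train rows, score the test rows."""
    train, test = time_split(df, date_col)
    # Random forest is only a stand-in; any classifier with predict_proba works.
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(train[feature_cols], train[label_col])
    # Column 1 holds the predicted probability of the positive class (label 1).
    proba = model.predict_proba(test[feature_cols])[:, 1]
    return {
        "area_under_pr": average_precision_score(test[label_col], proba),
        "area_under_roc": roc_auc_score(test[label_col], proba),
    }


# Synthetic stand-in for the merged, already-encoded dataset (hypothetical).
rng = np.random.default_rng(0)
n = 600
x1 = rng.normal(size=n)
demo = pd.DataFrame({
    "inspection_date": pd.Timestamp("2016-01-01")
    + pd.to_timedelta(rng.integers(0, 1095, n), unit="D"),
    "x1": x1,
    "x2": rng.normal(size=n),
    "violation": (x1 + rng.normal(scale=0.5, size=n) > 0).astype(int),
})

scores = fit_and_score(demo, feature_cols=["x1", "x2"])
print(scores)
```

With the real data, the merged DataFrame would replace demo and the list of encoded feature columns would replace ["x1", "x2"]. Splitting by date instead of at random keeps later inspections out of training, which is what the April 1 cutoff in the assignment seems to be for.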