MIS772 Predictive Analytics
MIS772 2020 T2
A2 Adv Predictive Models for Business
AirbnbAI approached you again to develop a RapidMiner process
capable of analysing customer feelings (sentiment) about their stay
in one of the Sydney Airbnb rental properties.
AirbnbAI sent you a data set of 36,000 rental listings and the text of
548,000 reviews across 38 Sydney neighbourhoods. The provided
information has been partially cleaned up and includes a variety of
numerical, nominal and text attributes, descriptions of which can be
found on the Inside Airbnb website (see Source below).
AirbnbAI would like you to use RapidMiner to analyse (mainly) the text
contained in the data set. AirbnbAI technical advisers suggested
addressing the following issues and provided some helpful hints:
A) Is there a significant discrepancy between the sentiment of a property host and of
the customers? And if so, in which property-types and neighbourhoods is this
most pronounced? (use Operator Toolbox sentiment tools, Join and Aggregate)
B) What property groups can be identified purely from their textual description, what
are their characteristics and the recent (i.e. 2020) sentiment of customers?
(use text mining, sentiment analysis, data clustering and segmentation analysis)
C) Can the customer sentiment be predicted for newly listed properties purely by
looking at their text description? If not, what other aspects of the rental property
also need to be considered? (use an estimation model)
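As a rough illustration of the Join and Aggregate hint in question A, outside RapidMiner, the sketch below joins per-listing host sentiment to averaged review sentiment and then averages the gap per group. It assumes sentiment scores have already been extracted; the attribute names (`listing_id`, `host_sentiment`, `review_sentiment`), the helper `sentiment_gap_by` and all values are hypothetical and are not taken from the provided data set.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: attribute names and scores are illustrative only.
listings = [
    {"listing_id": 1, "neighbourhood": "Manly", "property_type": "Apartment", "host_sentiment": 0.8},
    {"listing_id": 2, "neighbourhood": "Manly", "property_type": "House", "host_sentiment": 0.6},
    {"listing_id": 3, "neighbourhood": "Sydney", "property_type": "Apartment", "host_sentiment": 0.9},
]
reviews = [
    {"listing_id": 1, "review_sentiment": 0.5},
    {"listing_id": 1, "review_sentiment": 0.7},
    {"listing_id": 2, "review_sentiment": 0.6},
    {"listing_id": 3, "review_sentiment": 0.3},
]


def sentiment_gap_by(group_key, listings, reviews):
    """Mean (host sentiment - mean customer sentiment) per value of group_key."""
    # Aggregate: collect review sentiment per listing.
    per_listing = defaultdict(list)
    for review in reviews:
        per_listing[review["listing_id"]].append(review["review_sentiment"])

    # Inner join on listing_id, then compute the gap for each listing.
    gaps = defaultdict(list)
    for listing in listings:
        scores = per_listing.get(listing["listing_id"])
        if scores:  # listings without reviews are dropped by the inner join
            gaps[listing[group_key]].append(listing["host_sentiment"] - mean(scores))

    # Aggregate again: average the gap within each group.
    return {group: mean(values) for group, values in gaps.items()}


gap_by_neighbourhood = sentiment_gap_by("neighbourhood", listings, reviews)
gap_by_property_type = sentiment_gap_by("property_type", listings, reviews)
```

A large positive gap for a group would suggest hosts describe their properties more favourably than customers review them; whether a gap is significant is a separate question this sketch does not address.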
AirbnbAI wants you to use RapidMiner to clean up and explore the
provided data, then conduct sentiment analysis, text mining and
cluster analysis, as well as develop and evaluate an estimator to
predict the customer sentiment, minimising RMSE and MAE and maximising r2.
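For reference, the three performance measures can be sketched in a few lines of Python. This is only an illustration under one assumption: "r2" is read here as the coefficient of determination (1 - SS_res / SS_tot). RapidMiner may instead report a squared correlation, which can give a different value for the same predictions. The function name `regression_metrics` and the sample numbers are hypothetical.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return RMSE, MAE and the coefficient of determination for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = sqrt(ss_res / n)                      # root mean squared error

    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                     # assumed meaning of "r2"

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical sentiment scores and predictions, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Lower RMSE and MAE indicate smaller prediction errors, while a higher r2 indicates that more of the variation in sentiment is explained by the model.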
The following mini-case study will be used in assignment A2.
Data: http://www.deakin.edu.au/~jlcybuls/pred/data/AirBnB_Reviews_Sydney.zip
Source: http://insideairbnb.com/get-the-data.html
This assignment aims for students to learn how to …
● Articulate problems and solutions in business terms
● Gain insights from text data
● Prepare data for different models
● Develop estimation and clustering models
● Assess and report model performance
● Become curious about the world through data and analytics
Individual Tasks and Deliverables
Partial Submission (Question A – marked with the final submission)
Exec Problem: Define your problem in business terms and, in doing so, answer
question A. Cross-reference with other report sections for support.
Data Preparation: Deal with duplicates, bad and missing values (may use
imputation). Transform the selected attributes or create new ones as
needed. Produce supporting charts and tables to answer question A.
Final Submission (Questions B and C)
Exec Solution: Describe your solution in business terms and, in doing so,
answer questions B and C. Cross-reference with other report sections.
Data Exploration: Use clustering and segmentation analysis to investigate
groups of rental properties based on their text description, consider
customer sentiment as one of the aspects. Deal with anomalies. Visualise
clusters and anomalies. Provide support for question B.
Model: Create two estimation models and their variants, i.e. linear regression
and a decision tree, and optionally an ensemble, to address question C.
Evaluation and Optimisation: Optimise clustering and estimation models.
Evaluate them. Compare the performance of different models and select
the best. Provide support for questions B and C.
Honest Testing: Honest-test the best estimation model and investigate the
results. Also apply the model to a single listing and describe the results.
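The idea behind honest testing, evaluating a model only on data it never saw during training or optimisation, can be sketched as follows. This is a minimal Python illustration rather than the RapidMiner process the assignment requires; it uses a one-variable least-squares line in place of the assignment's models, and the helper names, the 20% hold-out fraction, the seed and the synthetic data are all assumptions made for the example.

```python
import random
from math import sqrt


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and set aside a test portion that training never sees."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root mean squared error of the fitted line on the given rows."""
    return sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows))


# Synthetic, noise-free data (y = 2x + 1), for illustration only.
rows = [(x, 2.0 * x + 1.0) for x in range(20)]
train, test = holdout_split(rows)

slope, intercept = fit_line(train)           # fit on training rows only
test_rmse = rmse(test, slope, intercept)     # honest estimate on held-out rows

# Applying the fitted model to a single new (hypothetical) record.
single_prediction = slope * 100 + intercept
```

With real data the held-out error is normally higher than the training error; a large difference between the two is a sign of overfitting.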
● See CloudDeakin for more info,
especially the assignment template
and the assessment rubric.
● When in doubt students will be asked
to present their work to the markers.
● Weekly and comprehensive progress
submissions are compulsory.
● Late penalty of 5% per day on the final
submission will apply. No extensions /
lateness over 5 days.
Unzip and use the files as you see fit.
