Name/s:
ID number/s:
Research Methods MA 2022/23
Assignment #2 – Databases and data preparation
Please submit the assignment in PDF format only.
For this assignment you need to search for a dataset in the field of social science and examine the data. Specifically, you are asked to find the data yourselves and not to use data that was provided to you in this course. Please use the recitation presentation on databases as a reference and guide on how to do so. Once you find a dataset you want to work with, download it to your personal computer and open it in JMP for further analysis.
Pay attention!
If the database you found does not include variables at all three levels of measurement (nominal, ordinal and continuous), you must find another database.
Do not use screenshots as answers; include them only as attachments. You must write the answers entirely yourselves.
The only two data files you cannot use are the INES 2015 file and the ESS 2010 file, because both are on the course website and we use them on a regular basis.
Part A. (60 Pts)
The dataset you are using must be uploaded to Moodle together with the answer file. If the data file is too large, you can compress it with a zip program and upload the zip file to the website. (10 points)
Briefly describe the dataset (do not copy/paste the information) (20 pts).
What information does it gather?
What is the unit of measurement?
How many observations does it contain?
Who collected the data?
If this is indicated on the website, explain how the data were collected (what the research tool was) and what the sampling method was.
Choose three variables: one at each level of measurement (nominal, ordinal and continuous).
Explain what each variable measures and what the possible answers to each variable are.
Briefly describe and present relevant descriptive statistics (all relevant measures of center and dispersion and a corresponding graph) for each of the variables. The data must be displayed after defining missing values, and defining labels if necessary. (18 points).
For the two categorical variables (nominal and ordinal), present frequency tables (do not copy/paste the tables; prepare your own tables based on the data) (12 pts).
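The assignment itself is carried out in JMP, but the logic behind Part A can be illustrated with a minimal Python sketch: missing values are defined first, and only then are the frequency tables and the statistics suited to each level of measurement computed. Everything in it is an assumption made for illustration: the function names are hypothetical, the three small variables are invented toy data rather than values from any real dataset, and the codes 88, 99 and 999 merely stand in for whatever missing-value codes your dataset's codebook defines.

```python
from collections import Counter
import statistics


def valid_values(values, missing_codes=()):
    """Drop empty cells (None) and user-defined missing codes."""
    return [v for v in values if v is not None and v not in missing_codes]


def frequency_table(values, missing_codes=()):
    """Rows of (category, count, percent of valid responses), sorted by category."""
    valid = valid_values(values, missing_codes)
    counts = Counter(valid)
    return [(cat, n, round(100 * n / len(valid), 1))
            for cat, n in sorted(counts.items())]


def describe(values, level, missing_codes=()):
    """Descriptive statistics suited to the variable's level of measurement."""
    valid = valid_values(values, missing_codes)
    stats = {"n": len(valid), "mode": statistics.mode(valid)}
    if level == "ordinal":
        # median_low always returns a category that actually exists on the scale
        stats["median"] = statistics.median_low(valid)
        stats["range"] = (min(valid), max(valid))
    elif level == "continuous":
        stats["median"] = statistics.median(valid)
        stats["range"] = (min(valid), max(valid))
        stats["mean"] = statistics.mean(valid)
        stats["std"] = statistics.stdev(valid)  # sample standard deviation
    return stats


# Invented toy data (assumption): one variable per level of measurement.
region = ["north", "south", "south", "center", None, "south"]  # nominal
trust = [1, 2, 2, 3, 4, 4, 4, 5, 88, 99]                       # ordinal, 1-5 scale
age = [23, 35, 41, 29, 52, 999, 35]                            # continuous

print(frequency_table(region))
print(frequency_table(trust, missing_codes={88, 99}))
print(describe(region, "nominal"))
print(describe(trust, "ordinal", missing_codes={88, 99}))
print(describe(age, "continuous", missing_codes={999}))
```

Note that the missing-value codes are passed per variable: a code such as 99 may mark a missing answer on a five-point scale but be a perfectly valid value of a continuous variable such as age.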
Part B. (40 pts)
Choose another continuous variable in the dataset (not the one you chose in Part A).
Prepare it for analysis: define its measurement scale and define user missing values.
Describe it: explain what it measures, and present the values of the responses and relevant descriptive statistics (both measures of central tendency and measures of dispersion). (10 pts)
Use ‘Recode’ to create a new variable in which you divide the continuous variable into 3 or more groups. Explain the rationale for dividing the groups – why you decided to divide the continuous variable into these groups and not in another way. (10 points)
Present relevant descriptive statistics (measures of central tendency and dispersion appropriate to the new variable's level of measurement) and a frequency table for the grouped variable you created. (10 points)
Compare the two variables, the original one and the grouped one you created in question 6, and discuss the theoretical and empirical advantages and disadvantages of each kind of variable. Which do you think is preferable for social research, and why? (10 pts)
Note – as researchers, we need to make conscious decisions about how to use our variables. These decisions must be theoretically driven; therefore, automatic use of a given variable or an arbitrary division of values into categories is insufficient.
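The recoding step in Part B is done with JMP's 'Recode', but the idea of grouping by explicit, justified cut points can be illustrated with a minimal Python sketch. The variable (years of education), the cut points, the labels and the data are all hypothetical and chosen only for illustration; in your own work the cut points should follow from your theoretical rationale, not from this example.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical cut points (assumption): each is the inclusive upper bound of a group.
CUT_POINTS = [12, 15]
LABELS = ["up to 12 years", "13-15 years", "16 years or more"]


def recode(value, cut_points=CUT_POINTS, labels=LABELS):
    """Map a continuous value to its group label; missing values stay missing."""
    if value is None:
        return None
    # bisect_left counts how many cut points lie strictly below the value
    return labels[bisect_left(cut_points, value)]


# Invented toy data (assumption): years of education, one missing answer.
years_of_education = [8, 12, 12, 13, 15, 16, 18, None]

grouped = [recode(v) for v in years_of_education]
valid = [g for g in grouped if g is not None]
counts = Counter(valid)

# Frequency table of the grouped variable, in the order of the groups
for label in LABELS:
    share = 100 * counts[label] / len(valid)
    print(f"{label:<18} {counts[label]:>3} {share:>6.1f}%")
```

Writing the cut points down explicitly makes the grouping decision visible and easy to defend or revise, which is the point of the rationale you are asked to give.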
Good Luck!