## Empirical Research Methods - Economics

**Question 1 (15 points)**

a. Why is the IIA property of the multinomial logit model an advantage and a disadvantage?

b. Provide a brief (i.e. no equations) explanation of the Hausman-McFadden test.
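As a starting point for part (a), the minimal sketch below illustrates what the IIA property means in a multinomial logit: the ratio of two choice probabilities does not change when a third alternative is dropped from the choice set. The utility values and the helper name `mnl_probs` are hypothetical, chosen only for illustration; they are not estimates from any dataset used in this exam.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities from a dict of systematic utilities."""
    m = max(utilities.values())  # subtract the max for numerical stability
    expu = {alt: math.exp(v - m) for alt, v in utilities.items()}
    total = sum(expu.values())
    return {alt: e / total for alt, e in expu.items()}

# Hypothetical systematic utilities (illustrative values only).
full = {"air": 0.5, "train": 0.2, "bus": -0.3, "car": 0.0}
# Same utilities with the bus alternative removed from the choice set.
reduced = {alt: v for alt, v in full.items() if alt != "bus"}

p_full = mnl_probs(full)
p_reduced = mnl_probs(reduced)

# Under IIA the odds of air versus train depend only on their own utilities.
ratio_full = p_full["air"] / p_full["train"]
ratio_reduced = p_reduced["air"] / p_reduced["train"]

print(f"P(air)/P(train) with bus:    {ratio_full:.6f}")
print(f"P(air)/P(train) without bus: {ratio_reduced:.6f}")
```

Both printed ratios should agree up to floating-point error, because in the logit formula the denominator cancels when two probabilities are divided.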

**Question 2 (20 points)**

a. Suppose that Dr. Smith’s research assistant mistakenly deleted observations for all respondents who graduated from a Liberal Arts University. Hoping to get some results on the relationship between Starting Salary and work experience for the population, Dr. Smith estimated the same model but on the sample of respondents who attended technological universities, i.e. he estimated

Starting Salary = a + b (*years of work experience*) + c (*male*) + e (*age*) + error term

for those respondents who attended a technological university.

i. Among his results, Dr. Smith found a positive relationship between starting salary and work experience. Briefly explain and graphically illustrate (with *Starting Salary* on the y-axis and *years of work experience* on the x-axis) why these estimates will be biased.

ii. Based upon a logit framework, suggest how Dr. Smith might correct for the bias in ‘i’.

**Question 3 (35 points)**

a. Consider a mode choice problem, using Greene’s data (210 respondents). The Excel data set, mode choice Greene, is located on t-square.

Variables in the dataset include:

| Variable | Description |
| --- | --- |
| Mode | 0/1 indicator for mode chosen |
| Ttme | Terminal waiting time |
| Invc | In-vehicle cost for all stages |
| Invt | In-vehicle time for all stages |
| Gc | Generalized cost measure = Invc + Invt * value of time |
| Hinc | Household income (000) |
| Psize | Travelling party size |
| Id | Household ID |
| Alt | Index for the mode: 1=Air, 2=Train, 3=Bus, 4=Car |

Provide summary statistics by mode and brief comments for variables in the dataset.

b. Using Greene’s data (210 respondents), estimate a (random coefficients) mixed logit model with two variables, in-vehicle travel time and generalized cost, whose coefficients are both assumed to be normally distributed.

$$\hat{\beta}_{ivcost} \sim N(\beta_{ivcost}, \sigma^2_{ivcost})$$

$$\hat{\beta}_{ivtime} \sim N(\beta_{ivtime}, \sigma^2_{ivtime})$$

Automobile is the reference mode. Interpret the results.

c. Based on the results in ‘a’, re-estimate the mixed logit model where only in-vehicle travel time is a random coefficient. Estimate two models, one with psize that is associated only with the automobile alternative and one without psize. Compare these results with those in part ‘b’ and with each other.

d. For the best model, for what percent of the population does in-vehicle travel time have a non-zero effect?

e. Demonstrate that the mixed logit model does not satisfy the IIA property. Is this an advantage or disadvantage?
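One way to approach part (e) numerically is to simulate mixed logit probabilities and check whether the ratio of two choice probabilities changes when an alternative is removed. The sketch below is a toy setup under stated assumptions: a single random coefficient on travel time drawn from a normal distribution, hypothetical travel times, and hypothetical values for the mean and standard deviation. None of these numbers come from Greene's data, and the function names are illustrative only.

```python
import math
import random

def logit_probs(utilities):
    """Standard logit probabilities for one draw of the coefficients."""
    m = max(utilities.values())  # subtract the max for numerical stability
    expu = {a: math.exp(v - m) for a, v in utilities.items()}
    total = sum(expu.values())
    return {a: e / total for a, e in expu.items()}

def mixed_logit_probs(times, beta_mean, beta_sd, n_draws=5000, seed=0):
    """Simulated mixed logit probabilities: average logit probabilities
    over draws of a normally distributed time coefficient."""
    rng = random.Random(seed)  # same seed gives the same draws for every choice set
    sums = {a: 0.0 for a in times}
    for _ in range(n_draws):
        beta = rng.gauss(beta_mean, beta_sd)
        probs = logit_probs({a: beta * t for a, t in times.items()})
        for a, p in probs.items():
            sums[a] += p
    return {a: s / n_draws for a, s in sums.items()}

# Hypothetical travel times (arbitrary units) and coefficient distribution.
times_full = {"air": 1.0, "train": 3.0, "bus": 3.5}
times_reduced = {a: t for a, t in times_full.items() if a != "bus"}

p_full = mixed_logit_probs(times_full, beta_mean=-1.0, beta_sd=1.0)
p_reduced = mixed_logit_probs(times_reduced, beta_mean=-1.0, beta_sd=1.0)

ratio_full = p_full["air"] / p_full["train"]
ratio_reduced = p_reduced["air"] / p_reduced["train"]

print(f"P(air)/P(train) with bus:    {ratio_full:.3f}")
print(f"P(air)/P(train) without bus: {ratio_reduced:.3f}")
```

For any single draw of the coefficient the odds of air versus train are unaffected by bus, but the averaged probabilities are ratios of expectations, so the two printed ratios should differ in this toy setup.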

**Question 4 (30 points)**

Use the dataset mdvisits.xls to answer this question. Variable definitions are in the Excel file.

a. What are the unconditional mean and variance of the number of doctor’s visits?

b. If estimating a Poisson model, what is the expected relationship between the conditional mean and the conditional variance?

c. i. Estimate and interpret a Poisson regression model where the dependent variable is the number of doctor’s visits and the explanatory variables are age, educ, reform, whether the individual is in bad health, and the logarithm of income.

ii. What are the conditional mean and conditional variance? Are they consistent with the assumptions underlying the Poisson regression model?

d. In what sense might the Poisson model be misspecified and how do you test for this?

e. Test the mean-variance equality of the above model by estimating a negative binomial model.
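A quick descriptive check that can precede the formal comparison in parts (a) through (e) is to compare the sample mean and sample variance of the count variable. The sketch below uses a short, made-up list of visit counts as a stand-in for the doctor's visits variable in mdvisits.xls; the numbers and variable names are hypothetical and are not taken from the dataset.

```python
import statistics

# Hypothetical visit counts (illustrative only, not from mdvisits.xls).
visits = [0, 0, 1, 2, 0, 5, 3, 0, 1, 12]

mean_visits = statistics.mean(visits)
var_visits = statistics.variance(visits)  # sample variance (n - 1 denominator)

# A ratio well above 1 suggests the variance exceeds the mean.
dispersion_ratio = var_visits / mean_visits

print(f"mean = {mean_visits:.3f}")
print(f"variance = {var_visits:.3f}")
print(f"variance / mean = {dispersion_ratio:.3f}")
```

This only describes the unconditional moments. The conditional comparison asked for in parts (c) and (e) requires the fitted Poisson and negative binomial models.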