
**STATISTICS**

[Student Name]

[Student Number]

[Date]

**Problem 1**

- Run regression
- The results of the regression of *inc_2p_share* on *inc_tenure*, *ch_qual*, *inc_pos*, and *inc_spend* are shown in the table below:

| | Estimate | Standard error | p-value |
| --- | --- | --- | --- |
| Intercept | 0.06534 | 0.01165 | 0.00000000 |
| inc_tenure | 0.001771 | 0.0006232 | 0.00482 |
| ch_qual | -0.02368 | 0.003225 | 0.00000000 |
| inc_pos | -0.006731 | 0.01606 | 0.67537 |
| inc_spend | -0.00000000862 | 0.000000001677 | 0.0000005 |

Multiple R-squared: 0.3059, Adjusted R-squared: 0.2963

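- The regression equation can be expressed as:

$$\widehat{\text{inc\_2p\_share}} = 0.06534 + 0.001771\,\text{inc\_tenure} - 0.02368\,\text{ch\_qual} - 0.006731\,\text{inc\_pos} - 0.00000000862\,\text{inc\_spend}$$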

The coefficients can be interpreted one variable at a time, holding all other variables constant:

- The coefficient for the number of years the incumbent has served in the Senate (*inc_tenure*) is 0.001771, which indicates that for each additional year served, the incumbent's share of the two-party vote increases by 0.001771.
- The coefficient for the quality of the challenger (*ch_qual*) is -0.02368. Because *ch_qual* enters the model as a single numeric code (inexperienced (0), state legislator (1), local elected official (2), governor (3), former House member (4)), each one-step increase in the code is associated with a decrease of 0.02368 in the incumbent's vote share, so a challenger coded *k* lowers the predicted share by 0.02368 × *k* relative to an inexperienced challenger.
- The coefficient for the incumbent's Common Space score (*inc_pos*) is -0.006731, which indicates that for each unit increase in the measure of voting ideology, the incumbent's vote share decreases by 0.006731.
- The coefficient for the amount the incumbent spent campaigning for the election in dollars (*inc_spend*) is -0.00000000862, which indicates that for every extra dollar spent, the incumbent's vote share decreases by 0.00000000862.
- The intercept value of 0.06534 indicates that when all explanatory variables are equal to zero, the predicted vote share is 0.06534.
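Because the per-unit effects are small, it can help to scale them to larger changes. The worked arithmetic below is only an illustration: the changes of 10 years and 1,000,000 dollars are chosen for convenience and do not come from the assignment.

```latex
\begin{align*}
\Delta\,\widehat{\text{inc\_2p\_share}} &= 10 \times 0.001771 = 0.01771
  && \text{(10 additional years of tenure)} \\
\Delta\,\widehat{\text{inc\_2p\_share}} &= 1{,}000{,}000 \times (-0.00000000862) = -0.00862
  && \text{(1,000,000 additional dollars spent)}
\end{align*}
```

Under the fitted model, ten more years of tenure correspond to roughly 1.8 percentage points more of the two-party vote, and an extra million dollars of incumbent spending to roughly 0.9 percentage points less, holding the other variables constant.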
- For each coefficient, comment on the statistical significance. Indicate if the coefficient is significant at any of the commonly accepted levels (α = 0.10, 0.05, 0.01)

- At α = 0.10

The variables *inc_tenure*, *ch_qual* and *inc_spend* are statistically significant at this level because their p-values are less than 0.1. However, the p-value for *inc_pos* is 0.67537, which is greater than 0.1, so this variable is not statistically significant.

- At α = 0.05

The variables *inc_tenure*, *ch_qual* and *inc_spend* are statistically significant at this level because their p-values are less than 0.05. However, the p-value for *inc_pos* is 0.67537, which is greater than 0.05, so this variable is not statistically significant.

- At α = 0.01

The variables *inc_tenure*, *ch_qual* and *inc_spend* are statistically significant at this level because their p-values are less than 0.01. However, the p-value for *inc_pos* is 0.67537, which is greater than 0.01, so this variable is not statistically significant.
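The comparison of each p-value with each significance level can be sketched in R as follows. The p-values are copied from the regression table above; the object names (`p_values`, `alphas`, `significant`) are illustrative and not part of the original analysis.

```r
# p-values copied from the Problem 1 regression table
p_values <- c(inc_tenure = 0.00482,
              ch_qual    = 0.00000000,
              inc_pos    = 0.67537,
              inc_spend  = 0.0000005)

# Commonly used significance levels
alphas <- c(0.10, 0.05, 0.01)

# TRUE where the p-value is below the significance level
significant <- outer(p_values, alphas, "<")
colnames(significant) <- paste0("alpha_", alphas)

print(significant)
```

Each row of the resulting matrix is one variable and each column one significance level, so the pattern described above can be read off directly.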

- Perform a two-sided test of the null hypothesis that the coefficient on *inc_pos* is 0.5 in the population

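The hypotheses to be tested can be stated as:

$$H_0: \beta_{\text{inc\_pos}} = 0.5 \qquad H_1: \beta_{\text{inc\_pos}} \neq 0.5$$

The t-score test statistic is calculated as:

$$t = \frac{\hat{\beta}_{\text{inc\_pos}} - 0.5}{SE(\hat{\beta}_{\text{inc\_pos}})} = \frac{-0.006731 - 0.5}{0.01606} \approx -31.55$$

The degrees of freedom are computed as $n - k - 1 = n - 5$, where $n$ is the number of observations and $k = 4$ is the number of explanatory variables.

We therefore determine the two-sided p-value as $2 \cdot P(T_{n-5} \geq |t|)$, which is a probability and so cannot exceed 1.

With $|t| \approx 31.55$, the p-value is effectively zero for any realistic sample size.

Since the p-value is far below the significance level (0.05), we reject the null hypothesis that the coefficient on *inc_pos* equals 0.5.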
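A test against a non-zero null value can be checked numerically with a short R sketch. The estimate and standard error are taken from the regression table; the sample size is not reported in this write-up, so the degrees of freedom in `df_candidates` are hypothetical values used only to show that the conclusion does not depend on them.

```r
# Estimate and standard error for inc_pos from the Problem 1 table
beta_hat  <- -0.006731
se_hat    <- 0.01606
beta_null <- 0.5

# t statistic for H0: beta = 0.5
t_stat <- (beta_hat - beta_null) / se_hat

# ASSUMPTION: n is not reported, so these degrees of freedom are hypothetical
df_candidates <- c(30, 100, 300)

# Two-sided p-value: twice the tail area beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df = df_candidates)

print(round(t_stat, 2))
print(p_two_sided)
```

Because the p-value is computed as twice a tail area beyond |t|, it always lies between 0 and 1.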

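- Interpretation of the coefficient, *ch_qual*
- The code snippet below shows how to create indicator variables named ch_house, ch_gov, ch_local and ch_legis

    # Create a data frame for the data set
    df.us_senate <- data.frame(us_senate_data)

    # Replace all NAs with zeros
    df.us_senate[is.na(df.us_senate)] <- 0

    # Add the new indicator variables (1 if the challenger is of that type, 0 otherwise),
    # using the ch_qual coding: 1 = state legislator, 2 = local official, 3 = governor, 4 = House member
    df.us_senate$ch_legis <- as.integer(df.us_senate$ch_qual == 1)
    df.us_senate$ch_local <- as.integer(df.us_senate$ch_qual == 2)
    df.us_senate$ch_gov   <- as.integer(df.us_senate$ch_qual == 3)
    df.us_senate$ch_house <- as.integer(df.us_senate$ch_qual == 4)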
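An alternative, shown here only as a sketch, is to let R build the indicator columns from a factor. The vector of challenger codes below is a small hypothetical example rather than the Senate data, and the labels assume the *ch_qual* coding listed earlier (0 = inexperienced, 1 = state legislator, 2 = local official, 3 = governor, 4 = House member).

```r
# HYPOTHETICAL challenger-quality codes for illustration only (not the real data)
ch_qual <- c(0, 1, 2, 3, 4, 1, 0)

# Label the codes; "none" (code 0, inexperienced) is the baseline category
ch_type <- factor(ch_qual, levels = 0:4,
                  labels = c("none", "legis", "local", "gov", "house"))

# model.matrix() creates one 0/1 column per non-baseline level;
# the first column is the intercept, which is dropped here
dummies <- model.matrix(~ ch_type)[, -1]
colnames(dummies) <- paste0("ch_", levels(ch_type)[-1])

print(dummies)
```

Leaving the inexperienced category out of the indicators makes it the reference group, which is why each dummy coefficient in a regression is read as a difference from an inexperienced challenger.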

- The results of the regression of *inc_2p_share* on *inc_tenure*, *ch_house*, *ch_gov*, *ch_local*, *ch_legis*, *inc_pos*, and *inc_spend* are shown in the table below:

| | Estimate | Standard error | p-value |
| --- | --- | --- | --- |
| Intercept | 4.510e-01 | 3.067e-02 | <2e-16 *** |
| inc_tenure | -2.668e-03 | 1.592e-03 | 0.0946 |
| ch_legis | -1.687e-02 | 3.781e-02 | 0.6557 |
| ch_local | 1.091e-01 | 6.479e-02 | 0.0930 |
| ch_gov | -5.162e-02 | 4.111e-02 | 0.2100 |
| ch_house | 3.118e-03 | 3.768e-02 | 0.9341 |
| inc_pos | -4.505e-02 | 4.210e-02 | 0.2853 |
| inc_spend | 2.564e-08 | 4.275e-09 | 4.83e-09 *** |

Multiple R-squared: 0.1146, Adjusted R-squared: 0.09759

From the table above, none of the newly created dummy variables is statistically significant at the 0.05 level, as their p-values are all greater than 0.05 (*ch_local*, with a p-value of 0.0930, is significant only at the 0.10 level). This is unlike the initial regression model, where the variable *ch_qual* was statistically significant.

Each dummy coefficient is interpreted relative to the omitted category, an inexperienced challenger, holding the other variables constant. The coefficient for *ch_legis* is -1.687e-02, which indicates that if the challenger is a former state legislator, the proportion of the two-party vote won by the incumbent is lower by 0.01687. Further, the coefficient for *ch_local* indicates that if the challenger is a former local elected official, the proportion of the two-party vote won by the incumbent is higher by 0.1091.

The coefficient for *ch_gov* indicates that if the challenger is a former governor, the proportion of the two-party vote won by the incumbent is lower by 0.05162. The coefficient for *ch_house* is 3.118e-03, which indicates that if the challenger is a former House member, the proportion of the two-party vote won by the incumbent is higher by 0.003118.

- Compare your results (coefficients, p-values, and R-squared values) to those estimated in Part A. Did this change alter the results substantively?

Yes. There is a substantive change in the latest regression output compared to that in Part A. In Part A, 3 out of 4 variables were statistically significant, with p-values less than the significance level of 0.05. However, in the latest regression model in Part B, only one variable (*inc_spend*) is statistically significant, with a p-value of 4.83e-09. The coefficient on *inc_spend* also changes sign, from negative in Part A to positive in Part B, and the multiple R-squared falls from 0.3059 to 0.1146.

- Improving our measure of incumbent spending
- Create a new variable named *spend_diff* that is the difference in incumbent spending and challenger spending

The R-code below shows how to create the new variable, spend_diff.

    # Section C Part 1
    # New variable spend_diff: incumbent spending minus challenger spending
    df.us_senate$spend_diff <- df.us_senate$inc_spend - df.us_senate$ch_spend

- The table below shows the regression results (coefficients, standard errors, p-values) when using the variable *spend_diff* in place of *inc_spend*.

| | Estimate | Standard error | p-value |
| --- | --- | --- | --- |
| Intercept | 4.995e-01 | 2.787e-02 | <2e-16 *** |
| inc_tenure | -4.184e-03 | 1.578e-03 | 0.00838 ** |
| ch_legis | 7.742e-03 | 3.784e-02 | 0.83800 |
| ch_local | 1.122e-01 | 6.476e-02 | 0.08398 |
| ch_gov | -4.855e-02 | 4.108e-02 | 0.23807 |
| ch_house | -8.089e-03 | 3.768e-02 | 0.83012 |
| inc_pos | -6.279e-02 | 4.234e-02 | 0.13890 |
| spend_diff | 3.347e-08 | 5.572e-09 | 4.6e-09 *** |

Multiple R-squared: 0.1149, Adjusted R-squared: 0.09783

From the table above, the coefficient of *spend_diff* is 3.347e-08, which indicates that a one-dollar increase in the difference between incumbent spending and challenger spending increases the proportion of the two-party vote won by the incumbent by 3.347e-08. The variable is also highly statistically significant, with a p-value of 4.6e-09, which is below all of the conventional significance levels (0.10, 0.05 and 0.01).
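Since a one-dollar change is tiny relative to campaign budgets, the coefficient is easier to read when scaled up. The worked equation below uses a spending gap of 1,000,000 dollars, which is an illustrative figure and not a value from the data:

```latex
\[
\Delta\,\widehat{\text{inc\_2p\_share}}
  = 1{,}000{,}000 \times 3.347 \times 10^{-8}
  = 0.03347
\]
```

Under the fitted model, an incumbent who outspends the challenger by a further million dollars is predicted to gain about 3.3 percentage points of the two-party vote, holding the other variables constant.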

- Run the regression with the new measure of spending, *spend_diff_percap*, with the tabular results as below:

| | Estimate | Standard error | p-value |
| --- | --- | --- | --- |
| Intercept | 0.472972 | 0.027648 | <2e-16 *** |
| inc_tenure | -0.004688 | 0.001537 | 0.00246 ** |
| ch_legis | 0.022600 | 0.036809 | 0.53961 |
| ch_local | 0.116204 | 0.062772 | 0.06495 |
| ch_gov | -0.019358 | 0.039902 | 0.62788 |
| ch_house | 0.009509 | 0.036565 | 0.79497 |
| inc_pos | -0.061796 | 0.040941 | 0.13207 |
| spend_diff_carp | 0.132425 | 0.016740 | 3.12e-14 *** |

Multiple R-squared: 0.1699, Adjusted R-squared: 0.1539

- The coefficient of the variable *spend_diff_carp* is 0.132425, which indicates that a one-unit increase in the weighted spending difference per person increases the proportion of the two-party vote won by the incumbent by 0.132425. The variable *spend_diff_carp* is highly statistically significant given that its p-value (3.12e-14) is less than the level of significance.

**Problem 2**

- Run a regression of the time-to-ratification *ratificationtime* on Polity IV Democracy score *pol*. Report your results.

| | Estimate | Standard error | t value | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 8.59972 | 0.51560 | 16.679 | <2e-16 |
| pol | -0.19291 | 0.06575 | -2.934 | 0.00409 |

The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 8.59972 - 0.19291\,\text{pol}$$

- Interpret the estimated coefficient on pol and conduct a test of the hypothesis

The coefficient for the Polity IV score is -0.19291, which indicates that for a unit increase in the level of democracy (on the scale of -10 to 10), the time taken by states to sign and ratify the CAT decreases by 0.19291.
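To see what this slope implies across the whole Polity IV scale, the fitted values at the two endpoints can be worked out from the coefficients in the table. This is only an illustrative calculation; the endpoints -10 and 10 are the scale limits mentioned above, and the result is in the units of *ratificationtime*.

```latex
\begin{align*}
\widehat{\text{ratificationtime}}(\text{pol} = -10) &= 8.59972 - 0.19291 \times (-10) = 10.52882 \\
\widehat{\text{ratificationtime}}(\text{pol} = 10)  &= 8.59972 - 0.19291 \times 10 = 6.67062 \\
\text{Difference} &= 20 \times 0.19291 = 3.8582
\end{align*}
```

Moving from the least to the most democratic score is therefore associated with a predicted ratification time that is about 3.86 units shorter.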

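The hypotheses to be tested, taking the null to be that the coefficient on *pol* is zero (the test reported in the regression table), can be stated as:

$$H_0: \beta_{\text{pol}} = 0 \qquad H_1: \beta_{\text{pol}} \neq 0$$

The t-score test statistic is calculated as:

$$t = \frac{\hat{\beta}_{\text{pol}} - 0}{SE(\hat{\beta}_{\text{pol}})} = \frac{-0.19291}{0.06575} \approx -2.934$$

The degrees of freedom are computed as $n - 2$, where $n$ is the number of observations.

We therefore determine the two-sided p-value as $2 \cdot P(T_{n-2} \geq |t|)$, which is a probability and so cannot exceed 1.

This agrees with the p-value of 0.00409 reported in the regression table.

Since the p-value (0.00409) is less than the significance level (0.05), we reject the null hypothesis (Fox, 2008, p. 105).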
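When a two-sided p-value is computed from a negative t value, it matters which tail is doubled. The R sketch below contrasts the upper-tail probability with the two-sided p-value for the t value of *pol*. The sample size is not reported in this write-up, so `df_assumed` is a hypothetical value, and the printed numbers should be read as approximate.

```r
# t value for pol as reported in the regression table
t_stat <- -2.934

# ASSUMPTION: the sample size is not reported, so df = 100 is hypothetical
df_assumed <- 100

# Upper-tail probability P(T > t); close to 1 because t is strongly negative
upper_tail <- pt(t_stat, df = df_assumed, lower.tail = FALSE)

# Two-sided p-value: twice the tail area beyond |t|, always between 0 and 1
p_two_sided <- 2 * pt(-abs(t_stat), df = df_assumed)

print(c(upper_tail = upper_tail, p_two_sided = p_two_sided))
```

Doubling the upper-tail probability would give a value above 1, which is not a valid probability. Doubling the tail beyond |t| gives a value of the same order as the 0.00409 in the regression table.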

- Can your estimated coefficient in A be interpreted causally? Evaluate the assumption of zero conditional mean error – in particular, pay attention to what other variables in the dataset might be correlated with both democracy score and time until ratification.

Yes. The estimated coefficient of Polity IV Democracy *pol* can be causally interpreted.

- Democracy tends to correlate with levels of economic development. Run a regression of the time-to-ratification *ratificationtime* on Polity IV Democracy score *pol* and *lrgdp96pc* real GDP per capita. Present your results in a neatly formatted table

| | Estimate | Standard error | t value | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 26.02824 | 4.60074 | 5.657 | 1.3e-07 |
| pol | -0.03856 | 0.07405 | -0.521 | 0.603694 |
| lrgdp96pc | -2.08361 | 0.54695 | -3.810 | 0.000233 |

The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 26.02824 - 0.03856\,\text{pol} - 2.08361\,\text{lrgdp96pc}$$

*Is the coefficient on GDP per capita statistically significant (we reject the null that it equals zero)? Is the coefficient on Democracy score statistically significant?*

The coefficient of GDP per capita is statistically significant with a p-value of 0.000233 which is less than the level of significance (0.05). The coefficient of democracy score is not significant given that its p-value of 0.603694 is greater than the level of significance (0.05)

*Compare these results to your results from A. What is the difference between how we interpret the coefficient in A and how we interpret the coefficient here?*

In this case, the coefficient of the democracy score is -0.03856, which indicates that a unit increase in the level of democracy (on the scale of -10 to 10) decreases the time taken by states to sign and ratify the CAT by 0.03856, holding real GDP per capita (*lrgdp96pc*) constant. In Part A, where GDP per capita was not controlled for, the corresponding decrease was larger (0.19291).
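The gap between the two *pol* coefficients can be related to the standard omitted-variable decomposition. The derivation below is a sketch: it holds exactly only if both regressions are estimated on the same observations, which is an assumption here, and the symbol $\tilde{\delta}$ (the slope from regressing *lrgdp96pc* on *pol*) is introduced for illustration and was not estimated in the original analysis.

```latex
\begin{align*}
\tilde{\beta}_{\text{pol}} &= \hat{\beta}_{\text{pol}} + \hat{\beta}_{\text{lrgdp96pc}}\,\tilde{\delta} \\
-0.19291 &= -0.03856 + (-2.08361)\,\tilde{\delta} \\
\tilde{\delta} &= \frac{-0.19291 - (-0.03856)}{-2.08361} = \frac{-0.15435}{-2.08361} \approx 0.0741
\end{align*}
```

A positive implied $\tilde{\delta}$ is consistent with more democratic countries having higher GDP per capita, so part of the effect attributed to *pol* in the simple regression is picked up by *lrgdp96pc* once it is included.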

*Researchers have hypothesized that new democracies are most likely to quickly sign on to and ratify human rights treaties*

- Run a regression of *ratificationtime* on *newdem* and report your results in a neatly formatted table. Interpret your regression results. Do we reject the null that there is no association between whether a country is a new democracy and time-to-ratification of the CAT?

| | Estimate | Standard error | t value | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 9.1188 | 0.5392 | 16.911 | <2e-16 |
| newdem | -5.2299 | 1.8851 | -2.774 | 0.00652 |

The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 9.1188 - 5.2299\,\text{newdem}$$

We reject the null hypothesis that there is no association between whether a country is a new democracy and time-to-ratification of the CAT, since the variable *newdem* is statistically significant in determining the ratification time, as evidenced by its p-value (0.00652), which is less than the level of significance (0.05).
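Because *newdem* is a 0/1 indicator and the only regressor, the two fitted values are simply the predicted ratification times for each group. The worked calculation below uses only the coefficients from the table above:

```latex
\begin{align*}
\widehat{\text{ratificationtime}}(\text{newdem} = 0) &= 9.1188 \\
\widehat{\text{ratificationtime}}(\text{newdem} = 1) &= 9.1188 - 5.2299 = 3.8889 \\
t &= \frac{-5.2299}{1.8851} \approx -2.774
\end{align*}
```

The intercept is the predicted time for countries that are not new democracies, and the coefficient on *newdem* is the difference between the two groups.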

- We have reason to believe that factors like rule of law, respect for rights and economic development are associated with both time to ratification and whether a country is a new democracy or not. Run a regression of *ratificationtime* on *newdem*, *lrgdp96pc*, *pol* and *law*. Add these results to your table in Part 1. Interpret your results for the coefficient on *newdem* and evaluate its statistical significance at the given significance level.

| | Estimate | Standard error | t value | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 27.03285 | 4.67713 | 5.780 | 7.74e-08 |
| newdem | -4.87095 | 1.87850 | -2.593 | 0.01087 |
| pol | 0.04084 | 0.07991 | 0.511 | 0.61040 |
| lrgdp96pc | -2.05648 | 0.61006 | -3.371 | 0.00105 |
| law | -0.25280 | 0.40425 | -0.625 | 0.53310 |

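The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 27.03285 - 4.87095\,\text{newdem} + 0.04084\,\text{pol} - 2.05648\,\text{lrgdp96pc} - 0.25280\,\text{law}$$

The coefficient of *newdem* is -4.87095, which indicates that if a country is a new democracy, the time taken to sign and ratify the CAT is reduced by 4.87095, holding the other variables constant. The variable *newdem* is statistically significant at the 0.05 level given that its p-value (0.01087) is less than 0.05, although it is not significant at the 0.01 level.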

- A fellow researcher hypothesizes that countries with common law legal systems take longer to sign international treaties and that new democracies in this period tended to have civil law systems. Re-run your regression from Part 2, but include *legcom* as another independent variable. Add these results to your table in Part 2. Interpret your coefficient on *legcom* and evaluate its statistical significance at the given significance level. Compare the estimated coefficient on *newdem* and p-value of your hypothesis test with your results in Part 2.

| | Estimate | Standard error | t value | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 22.10228 | 4.56819 | 4.838 | 4.56e-06 |
| newdem | -2.82605 | 1.83941 | -1.536 | 0.127480 |
| pol | -0.04095 | 0.07788 | -0.526 | 0.600148 |
| lrgdp96pc | -1.65882 | 0.58158 | -2.852 | 0.005237 |
| law | -0.21420 | 0.37954 | -0.564 | 0.573722 |
| legcom | 3.94475 | 1.01174 | 3.899 | 0.000171 |

The coefficient for *legcom* is 3.94475, which indicates that if a country has a common law legal system, the time taken to sign and ratify the CAT increases by 3.94475, holding the other variables constant. At the 0.05 level of significance, the variable *legcom* is statistically significant given that its p-value (0.000171) is less than 0.05.

In this case, the coefficient of *newdem* is -2.82605 compared to -4.87095 in Part 2. Furthermore, the variable *newdem* is not statistically significant (p-value = 0.127480 > 0.05) in the regression equation of Part 3, unlike in Part 2 where it was statistically significant.
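The size of the change in the *newdem* coefficient can be made explicit with a short calculation based on the two tables above. Expressing the change as a share of the Part 2 estimate is an illustrative choice, not something the assignment asks for:

```latex
\begin{align*}
\hat{\beta}^{\,\text{Part 3}}_{\text{newdem}} - \hat{\beta}^{\,\text{Part 2}}_{\text{newdem}}
  &= -2.82605 - (-4.87095) = 2.04490 \\
\frac{2.04490}{4.87095} &\approx 0.42
\end{align*}
```

Adding *legcom* therefore shrinks the estimated *newdem* effect by roughly 42 percent in magnitude, which fits the fellow researcher's suggestion that legal system and new-democracy status are related.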

**Problem 3**

- Using the concepts discussed in class, choose one issue with a regression model in the previous problems that was not discussed in the problem.

From the regression model in Problem 2 part C question 3, the adjusted R-squared for the model is 0.2993, which implies that 29.93% of the variation in the dependent variable can be explained by the independent variables. However, this is fairly low and could be the result of omitting relevant predictor variables from the regression model.
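For reference, the adjusted R-squared quoted here is related to the ordinary R-squared by the standard formula below, where $n$ is the number of observations and $k$ the number of explanatory variables. Neither $n$ nor the unadjusted R-squared for this model is reported in this write-up, so no numbers are substituted.

```latex
\[
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
\]
```

The adjustment penalises additional regressors, so the adjusted value can fall when a variable with little explanatory power is added, whereas the ordinary R-squared never decreases.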

- Propose an alternative model intended to correct the issue you identify in Part A. Use your alternative model on the replication dataset and provide the new results

I propose running a regression model with *ratificationtime* as the dependent variable and *signtime*, *censoredsign* and *censoredratification* as the independent variables. The results are as shown in the table below:

| | Estimate | Standard error | t value | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 3.67616 | 0.37945 | 9.688 | 2.82e-16 |
| signtime | 0.66221 | 0.05726 | 11.566 | <2e-16 |
| censoredsign | -7.75732 | 1.34694 | -5.759 | 8.34e-08 |
| censoredratification | 9.14801 | 1.06734 | 8.571 | 9.07e-14 |

- Assess the alternative model or method and compare your new results to those in the original. Explain any differences, particularly any that may have substantive implications

The alternative regression model has an adjusted R-squared value of 0.7651, which implies that 76.51% of the variation in the dependent variable can be explained by the independent variables, compared with 29.93% in the original model. Thus, regression models should be built using highly predictive and statistically significant explanatory variables in order to increase the model's accuracy.

Works cited

Fox, J. *Applied Regression Analysis and Generalized Linear Models*. SAGE Publications, 2008.