STATISTICS

[Student Name]

[Student Number]

[Date]

Problem 1

A. Run regression
1. The results of the regression of inc_2p_share on inc_tenure, ch_qual, inc_pos, and inc_spend are shown in the table below:

Variable      Estimate          Standard error    p-value
Intercept      0.06534          0.01165           0.00000000
inc_tenure     0.001771         0.0006232         0.00482
ch_qual       -0.02368          0.003225          0.00000000
inc_pos       -0.006731         0.01606           0.67537
inc_spend     -0.00000000862    0.000000001677    0.0000005
Multiple R-squared: 0.3059, Adjusted R-squared: 0.2963
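A minimal R sketch of how a table like the one above could be produced, assuming the data sit in a data frame (called df.us_senate, as in Part B) whose columns are named exactly as in the table. The helper name fit_part_a is hypothetical, and the synthetic data below exist only so the snippet runs on its own; they do not reproduce the reported estimates.

```r
# Hypothetical helper: fits the Part A model by ordinary least squares
fit_part_a <- function(df) {
  lm(inc_2p_share ~ inc_tenure + ch_qual + inc_pos + inc_spend, data = df)
}

# Synthetic stand-in data so the example runs without the Senate dataset
set.seed(1)
n <- 200
toy <- data.frame(
  inc_tenure = rpois(n, 10),
  ch_qual    = sample(0:4, n, replace = TRUE),
  inc_pos    = runif(n, -1, 1),
  inc_spend  = runif(n, 1e6, 2e7)
)
toy$inc_2p_share <- 0.6 + 0.002 * toy$inc_tenure - 0.02 * toy$ch_qual +
  rnorm(n, sd = 0.05)

fit <- fit_part_a(toy)
# Estimates, standard errors, t values and p-values
coef(summary(fit))
summary(fit)$r.squared
summary(fit)$adj.r.squared
```

With the real data, calling fit_part_a(df.us_senate) would return the model whose summary feeds the table.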
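2. The regression equation can be expressed as:

$$\widehat{\text{inc\_2p\_share}} = 0.06534 + 0.001771\,\text{inc\_tenure} - 0.02368\,\text{ch\_qual} - 0.006731\,\text{inc\_pos} - 0.00000000862\,\text{inc\_spend}$$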

Each coefficient can be interpreted by considering one variable at a time while holding all other variables constant:

• The coefficient on the number of years the incumbent has served in the Senate (inc_tenure) is 0.001771, which indicates that each additional year served is associated with a 0.001771 increase in the incumbent's share of the two-party vote.
• The coefficient on the quality of the challenger (ch_qual) is -0.02368. Challenger quality is coded as inexperienced (0), state legislator (1), local elected official (2), governor (3) and former House member (4), so each one-step increase on this scale is associated with a 0.02368 decrease in the incumbent's vote share.
• The coefficient on the incumbent's Common Space score (inc_pos) is -0.006731, which indicates that each one-unit increase in this measure of voting ideology is associated with a 0.006731 decrease in the incumbent's vote share.
• The coefficient on the amount the incumbent spent campaigning, in dollars (inc_spend), is -0.00000000862, which indicates that each extra dollar spent is associated with a 0.00000000862 decrease in the incumbent's vote share (about 0.00862 per extra million dollars).
• The intercept of 0.06534 is the predicted vote share of an incumbent when all explanatory variables equal zero.
3. For each coefficient, comment on the statistical significance. Indicate if the coefficient is significant at any of the commonly accepted levels (α = 0.10, 0.05, 0.01).
• At α = 0.10

Variables inc_tenure, ch_qual and inc_spend are statistically significant at this level because their p-values are less than 0.10. However, the p-value for inc_pos is 0.67537, which is greater than 0.10, so inc_pos is not statistically significant.

• At α = 0.05

Variables inc_tenure, ch_qual and inc_spend are statistically significant at this level because their p-values are less than 0.05. However, the p-value for inc_pos is 0.67537, which is greater than 0.05, so inc_pos is not statistically significant.

• At α = 0.01

Variables inc_tenure, ch_qual and inc_spend are statistically significant at this level because their p-values are less than 0.01. However, the p-value for inc_pos is 0.67537, which is greater than 0.01, so inc_pos is not statistically significant.
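The same comparisons can be checked mechanically. This small R sketch simply takes the p-values reported in the Part A table and tests each against the three conventional levels; a TRUE entry means p < α.

```r
# p-values from the Part A table
p <- c(inc_tenure = 0.00482, ch_qual = 0.00000000,
       inc_pos = 0.67537, inc_spend = 0.0000005)
alphas <- c(0.10, 0.05, 0.01)

# Rows are coefficients, columns are significance levels; TRUE means p < alpha
sig <- sapply(alphas, function(a) p < a)
colnames(sig) <- paste0("alpha=", alphas)
sig
```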

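4. Perform a two-sided test of the null hypothesis that the coefficient on inc_pos is 0.5 in the population.

The hypotheses to be tested are:

$$H_0: \beta_{\text{inc\_pos}} = 0.5 \quad \text{vs.} \quad H_1: \beta_{\text{inc\_pos}} \neq 0.5$$

The t test statistic is:

$$t = \frac{\hat{\beta}_{\text{inc\_pos}} - 0.5}{\operatorname{se}(\hat{\beta}_{\text{inc\_pos}})} = \frac{-0.006731 - 0.5}{0.01606} \approx -31.55$$

The degrees of freedom are, with n observations and k = 4 explanatory variables:

$$df = n - k - 1 = n - 5$$

The two-sided p-value is twice the area in the tail beyond |t|:

$$p = 2\,P\left(T_{n-5} \geq 31.55\right)$$

A p-value can never exceed 1, and with |t| ≈ 31.55 this tail area is essentially zero.

Since the p-value is far below the significance level (0.05), we reject the null hypothesis that the coefficient on inc_pos equals 0.5.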
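This test can be reproduced in R with the base function pt(). The helper name t_test_coef is hypothetical, and because the number of observations is not shown in the output, df = 200 below is an assumed placeholder that should be replaced by n - 5 from the actual fit; with |t| this large the conclusion does not depend on it.

```r
# Two-sided t test of H0: beta = beta0, given an estimate and its standard error
t_test_coef <- function(estimate, se, beta0, df) {
  t_stat <- (estimate - beta0) / se
  p_value <- 2 * pt(-abs(t_stat), df = df)  # area in both tails beyond |t|
  c(t = t_stat, p = p_value)
}

# inc_pos from the Part A table; df = 200 is an assumed placeholder for n - 5
res <- t_test_coef(estimate = -0.006731, se = 0.01606, beta0 = 0.5, df = 200)
res
res["p"] < 0.05  # TRUE means reject H0 at the 5% level
```

Using -abs(t_stat) guarantees the lower-tail area is computed whatever the sign of t, so doubling it always yields a value between 0 and 1.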

B. Interpretation of the coefficient, ch_qual
1. The code snippet below shows how to create indicator variables named ch_house, ch_gov, ch_local and ch_legis:

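# Create a data frame for the data set
df.us_senate <- data.frame(us_senate_data)

# Replace all NAs with zeros
df.us_senate[is.na(df.us_senate)] <- 0

# Add the new indicator variables: 1 if the challenger has that background, 0 otherwise
# (ch_qual coding: 1 = state legislator, 2 = local elected official, 3 = governor, 4 = former House member)
df.us_senate$ch_legis <- as.integer(df.us_senate$ch_qual == 1)
df.us_senate$ch_local <- as.integer(df.us_senate$ch_qual == 2)
df.us_senate$ch_gov <- as.integer(df.us_senate$ch_qual == 3)
df.us_senate$ch_house <- as.integer(df.us_senate$ch_qual == 4)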
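One way to sanity-check indicator variables built from ch_qual is to compare hand-built indicators with the treatment (dummy) coding that R generates from a factor. This sketch uses a short made-up ch_qual vector and assumes the coding listed in Part A (0 = inexperienced, 1 = state legislator, 2 = local elected official, 3 = governor, 4 = former House member); factor() and model.matrix() are base R.

```r
# Made-up ch_qual values using the Part A coding
ch_qual <- c(0, 1, 2, 3, 4, 1, 0)

# Hand-built indicators, one per challenger background
manual <- cbind(ch_legis = as.integer(ch_qual == 1),
                ch_local = as.integer(ch_qual == 2),
                ch_gov   = as.integer(ch_qual == 3),
                ch_house = as.integer(ch_qual == 4))

# Alternative: let R build the dummies from a factor,
# with level 0 (inexperienced) as the omitted baseline category
f <- factor(ch_qual, levels = 0:4,
            labels = c("none", "legis", "local", "gov", "house"))
auto <- model.matrix(~ f)[, -1]  # drop the intercept column

# TRUE if both approaches produce identical 0/1 columns
all(unname(auto) == unname(manual))
```

Because inexperienced challengers get a 0 in every column, each dummy coefficient is measured relative to that baseline.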

2. The results of the regression of inc_2p_share on inc_tenure, ch_house, ch_gov, ch_local, ch_legis, inc_pos, and inc_spend are shown in the table below:

Variable      Estimate      Standard error    p-value
Intercept      4.510e-01    3.067e-02         <2e-16 ***
inc_tenure    -2.668e-03    1.592e-03         0.0946
ch_legis      -1.687e-02    3.781e-02         0.6557
ch_local       1.091e-01    6.479e-02         0.0930
ch_gov        -5.162e-02    4.111e-02         0.2100
ch_house       3.118e-03    3.768e-02         0.9341
inc_pos       -4.505e-02    4.210e-02         0.2853
inc_spend      2.564e-08    4.275e-09         4.83e-09 ***
Multiple R-squared: 0.1146, Adjusted R-squared: 0.09759

From the table above, none of the newly created dummy variables is statistically significant at the 0.05 level, since all of their p-values exceed 0.05 (ch_local, with a p-value of 0.0930, is significant only at the 0.10 level). This is unlike the initial regression model, where ch_qual was statistically significant.

The coefficient on ch_legis is -1.687e-02, which indicates that, relative to an inexperienced challenger (the omitted category), facing a former state legislator is associated with a 0.01687 decrease in the proportion of the two-party vote won by the incumbent. The coefficient on ch_local indicates that, relative to an inexperienced challenger, facing a former local elected official is associated with a 0.1091 increase in the incumbent's two-party vote share.

The coefficient on ch_gov indicates that, relative to an inexperienced challenger, facing a former governor is associated with a 0.05162 decrease in the proportion of the two-party vote won by the incumbent. The coefficient on ch_house (3.118e-03) indicates that facing a former House member is associated with a 0.003118 increase in the incumbent's two-party vote share.

3. Compare your results (coefficients, p-values, and R-squared values) to those estimated in Part A. Did this change alter the results substantively?

Yes. The results change substantively compared with Part A. In Part A, three of the four explanatory variables were statistically significant, with p-values below the 0.05 significance level. In the Part B model, only inc_spend is statistically significant (p-value = 4.83e-09). Two coefficients also change sign: inc_tenure goes from positive (0.001771) to negative (-2.668e-03), and inc_spend goes from negative (-8.62e-09) to positive (2.564e-08). The fit is weaker as well: the R-squared falls from 0.3059 to 0.1146 and the adjusted R-squared from 0.2963 to 0.09759.

C. Improving our measure of incumbent spending
1. Create a new variable named spend_diff that is the difference between incumbent spending and challenger spending

The R code below creates the new variable spend_diff.

# Section C Part 1
# New variable spend_diff: incumbent spending minus challenger spending, computed for every row at once
df.us_senate$spend_diff <- df.us_senate$inc_spend - df.us_senate$ch_spend

2. The table below shows the regression results (coefficients, standard errors and p-values) using spend_diff in place of inc_spend:

Variable      Estimate      Standard error    p-value
Intercept      4.995e-01    2.787e-02         <2e-16 ***
inc_tenure    -4.184e-03    1.578e-03         0.00838 **
ch_legis       7.742e-03    3.784e-02         0.83800
ch_local       1.122e-01    6.476e-02         0.08398
ch_gov        -4.855e-02    4.108e-02         0.23807
ch_house      -8.089e-03    3.768e-02         0.83012
inc_pos       -6.279e-02    4.234e-02         0.13890
spend_diff     3.347e-08    5.572e-09         4.6e-09 ***
Multiple R-squared: 0.1149, Adjusted R-squared: 0.09783

From the table above, the coefficient on spend_diff is 3.347e-08, which indicates that a one-dollar increase in the difference between incumbent and challenger spending is associated with a 3.347e-08 increase in the proportion of the two-party vote won by the incumbent (about 0.033 per additional million dollars of spending advantage). The variable is also highly statistically significant, with a p-value of 4.6e-09, which is below all of the commonly used significance levels (0.10, 0.05 and 0.01).

3. Run the regression with the new measure of spending, spend_diff_percap. The results are shown in the table below:

Variable             Estimate     Standard error    p-value
Intercept             0.472972    0.027648          <2e-16 ***
inc_tenure           -0.004688    0.001537          0.00246 **
ch_legis              0.022600    0.036809          0.53961
ch_local              0.116204    0.062772          0.06495
ch_gov               -0.019358    0.039902          0.62788
ch_house              0.009509    0.036565          0.79497
inc_pos              -0.061796    0.040941          0.13207
spend_diff_percap     0.132425    0.016740          3.12e-14 ***
Multiple R-squared: 0.1699, Adjusted R-squared: 0.1539

The coefficient on spend_diff_percap is 0.132425, which indicates that a one-unit increase in the weighted spending difference per person is associated with a 0.132425 increase in the proportion of the two-party vote won by the incumbent. The variable is highly statistically significant, given that its p-value (3.12e-14) is far below any conventional significance level.

Problem 2

A. Run a regression of the time-to-ratification ratificationtime on the Polity IV Democracy score pol. Report your results.

Variable     Estimate    Standard error    t value    p-value
Intercept     8.59972    0.51560           16.679     <2e-16
pol          -0.19291    0.06575           -2.934     0.00409

The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 8.59972 - 0.19291\,\text{pol}$$

1. Interpret the estimated coefficient on pol and conduct a test of the hypothesis

The coefficient on the Polity IV score is -0.19291, which indicates that each one-unit increase in the level of democracy (on a scale of -10 to 10) is associated with a 0.19291 decrease in the time taken by states to sign and ratify the CAT.

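The hypotheses to be tested are:

$$H_0: \beta_{\text{pol}} = 0 \quad \text{vs.} \quad H_1: \beta_{\text{pol}} \neq 0$$

The t test statistic is:

$$t = \frac{\hat{\beta}_{\text{pol}} - 0}{\operatorname{se}(\hat{\beta}_{\text{pol}})} = \frac{-0.19291}{0.06575} \approx -2.934$$

The degrees of freedom are, with n observations and one explanatory variable plus the intercept:

$$df = n - 2$$

The two-sided p-value is twice the area in the tail beyond |t|:

$$p = 2\,P\left(T_{n-2} \geq 2.934\right) = 0.00409$$

This matches the p-value reported in the regression output.

Since the p-value (0.00409) is less than the significance level (0.05), we reject the null hypothesis that the coefficient on pol is zero (Fox, 2008, p. 105).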
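A short R sketch of this test using pt(). The number of observations is not shown in the output, so df_resid = 120 is an assumed placeholder for n - 2. The last line illustrates a common slip: taking the upper tail of a negative t statistic gives about 0.998 per tail, and doubling it produces an impossible "p-value" near 2.

```r
# Two-sided t test of H0: beta_pol = 0 using the reported estimate and SE
estimate <- -0.19291
se <- 0.06575
t_stat <- (estimate - 0) / se

# df = n - 2; n is not shown above, so 120 is an assumed placeholder
df_resid <- 120
p_two_sided <- 2 * pt(-abs(t_stat), df = df_resid)

# The slip to avoid: doubling the upper tail when t is negative
wrong <- 2 * pt(t_stat, df = df_resid, lower.tail = FALSE)

c(t = t_stat, p = p_two_sided, wrong = wrong)
```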

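2. Can your estimated coefficient in A be interpreted causally? Evaluate the assumption of zero conditional mean error; in particular, pay attention to what other variables in the dataset might be correlated with both democracy score and time until ratification.

No. A causal interpretation requires the zero conditional mean assumption, E(u | pol) = 0, which is unlikely to hold here. Other variables in the dataset, such as real GDP per capita (lrgdp96pc) and rule of law (law), are plausibly correlated with both democracy score and time to ratification. Because they are omitted from the model in A, their influence is absorbed by the error term, which biases the coefficient on pol. The regression that adds lrgdp96pc supports this: once GDP per capita is included, the coefficient on pol shrinks from -0.19291 to -0.03856 and is no longer statistically significant.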
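To illustrate why omitted variables matter for a causal reading, the following R simulation uses entirely synthetic data (not the CAT replication dataset). It assumes a world in which GDP affects both democracy and ratification time while democracy itself has no effect; the coefficients 2 and -2 are arbitrary choices for the illustration.

```r
# Synthetic illustration of omitted variable bias (not the CAT data)
set.seed(42)
n <- 5000
gdp <- rnorm(n)
pol <- 2 * gdp + rnorm(n)                # democracy is correlated with GDP
ratification <- 10 - 2 * gdp + rnorm(n)  # true effect of pol is zero

short <- lm(ratification ~ pol)          # omits GDP
long  <- lm(ratification ~ pol + gdp)    # controls for GDP

coef(short)["pol"]  # biased: near Cov(pol, y) / Var(pol) = -4 / 5 = -0.8
coef(long)["pol"]   # near the true value of 0
```

The short regression attributes GDP's influence to pol, producing a clearly negative coefficient even though pol has no effect by construction.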

B. Democracy tends to correlate with levels of economic development. Run a regression of the time-to-ratification ratificationtime on Polity IV Democracy score pol and lrgdp96pc real GDP per capita. Present your results in a neatly formatted table.

Variable     Estimate    Standard error    t value    p-value
Intercept    26.02824    4.60074           5.657      1.3e-07
pol          -0.03856    0.07405           -0.521     0.603694
lrgdp96pc    -2.08361    0.54695           -3.810     0.000233

The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 26.02824 - 0.03856\,\text{pol} - 2.08361\,\text{lrgdp96pc}$$

1. Is the coefficient on GDP per capita statistically significant (we reject the null that it equals zero)? Is the coefficient on Democracy score statistically significant?

The coefficient on GDP per capita is statistically significant: its p-value (0.000233) is less than the significance level (0.05). The coefficient on democracy score is not statistically significant, given that its p-value (0.603694) is greater than 0.05.

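2. Compare these results to your results from A. What is the difference between how we interpret the coefficient in A and how we interpret the coefficient here?

In this model, the coefficient on democracy score is -0.03856, which indicates that, holding real GDP per capita constant, each one-unit increase in the level of democracy (on a scale of -10 to 10) is associated with a 0.03856 decrease in the time taken by states to sign and ratify the CAT, a much smaller decrease than in part A (0.19291). The interpretation also differs: in part A, the coefficient on pol captured the association between democracy and ratification time without holding anything else constant, so it also absorbed the association of economic development with both variables. Here it measures the association among countries with the same GDP per capita.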

C. Researchers have hypothesized that new democracies are most likely to quickly sign on to and ratify human rights treaties.
1. Run a regression of ratificationtime on newdem and report your results in a neatly formatted table. Interpret your regression results. Do we reject the null that there is no association between whether a country is a new democracy and time-to-ratification of the CAT?

Variable     Estimate    Standard error    t value    p-value
Intercept     9.1188     0.5392            16.911     <2e-16
newdem       -5.2299     1.8851            -2.774     0.00652

The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 9.1188 - 5.2299\,\text{newdem}$$

Countries that are not new democracies have an average ratification time of 9.1188 (the intercept), while new democracies take 5.2299 less on average. We reject the null hypothesis that there is no association between whether a country is a new democracy and time-to-ratification of the CAT, since newdem is statistically significant: its p-value (0.00652) is less than the significance level (0.05).
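Why the intercept and slope read this way: in a regression on a single 0/1 dummy, OLS reproduces the group means. This R sketch uses made-up numbers (not the CAT data) to show that the intercept equals the mean for newdem = 0 and the slope equals the difference in group means.

```r
# Made-up ratification times for two groups (not the CAT dataset)
y      <- c(8, 10, 12, 9, 3, 5)
newdem <- c(0, 0, 0, 0, 1, 1)

fit <- lm(y ~ newdem)
coef(fit)

# Intercept = mean for countries that are not new democracies
mean(y[newdem == 0])                         # 9.75
# Slope = difference in group means
mean(y[newdem == 1]) - mean(y[newdem == 0])  # -5.75
```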

2. We have reason to believe that factors like rule of law, respect for rights and economic development are associated with both time to ratification and whether a country is a new democracy or not. Run a regression of ratificationtime on newdem and lrgdp96pc, pol and law. Add these results to your table in Part 1. Interpret your results for the coefficient on newdem and evaluate its statistical significance at the α = 0.05 level.

Variable     Estimate    Standard error    t value    p-value
Intercept    27.03285    4.67713           5.780      7.74e-08
newdem       -4.87095    1.87850           -2.593     0.01087
pol           0.04084    0.07991           0.511      0.61040
lrgdp96pc    -2.05648    0.61006           -3.371     0.00105
law          -0.25280    0.40425           -0.625     0.53310

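The regression equation can be expressed as:

$$\widehat{\text{ratificationtime}} = 27.03285 - 4.87095\,\text{newdem} + 0.04084\,\text{pol} - 2.05648\,\text{lrgdp96pc} - 0.25280\,\text{law}$$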

The coefficient on newdem is -4.87095, which indicates that, holding pol, lrgdp96pc and law constant, new democracies take 4.87095 less time on average to sign and ratify the CAT. At the 0.05 level, newdem is statistically significant, given that its p-value (0.01087) is less than 0.05 (it would not be significant at the 0.01 level).

3. A fellow researcher hypothesizes that countries with common law legal systems take longer to sign international treaties and that new democracies in this period tended to have civil law systems. Re-run your regression from Part 2, but include legcom as another independent variable. Add these results to your table in Part 2. Interpret your coefficient on legcom and evaluate its statistical significance at the α = 0.05 level. Compare the estimated coefficient on newdem and p-value of your hypothesis test with your results in Part 2.

Variable     Estimate    Standard error    t value    p-value
Intercept    22.10228    4.56819           4.838      4.56e-06
newdem       -2.82605    1.83941           -1.536     0.127480
pol          -0.04095    0.07788           -0.526     0.600148
lrgdp96pc    -1.65882    0.58158           -2.852     0.005237
law          -0.21420    0.37954           -0.564     0.573722
legcom        3.94475    1.01174           3.899      0.000171

The coefficient on legcom is 3.94475, which indicates that, holding the other variables constant, countries with a common law legal system take 3.94475 longer on average to sign and ratify the CAT. At the 0.05 level, legcom is statistically significant, given that its p-value (0.000171) is less than 0.05.

In this model, the coefficient on newdem is -2.82605, compared with -4.87095 in Part 2. Furthermore, newdem is not statistically significant in the Part 3 regression (p-value = 0.127480 > 0.05), unlike in Part 2, where it was statistically significant.

Problem 3

A. Using the concepts discussed in class, choose one issue with a regression model in the previous problems that was not discussed in the problem.

In the regression model from Problem 2, part C, question 3, the adjusted R-squared is 0.2993, which implies that only about 29.93% of the variation in the dependent variable is explained by the independent variables once the number of predictors is taken into account. This is fairly low and could result from the omission of relevant predictor variables from the regression model.
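For reference, adjusted R-squared penalizes the ordinary R-squared for the number of predictors: 1 - (1 - R^2)(n - 1)/(n - k - 1), with n observations and k predictors excluding the intercept. This R sketch checks the formula against lm() on the built-in mtcars data, chosen only because it ships with R; adj_r2 is a hypothetical helper name.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors k
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Check against lm() on built-in data (mtcars ships with base R)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)
c(manual = adj_r2(s$r.squared, n = nrow(mtcars), k = 3),
  lm = s$adj.r.squared)
```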

B. Propose an alternative model intended to correct the issue you identify in Part A. Use your alternative model on the replication dataset and provide the new results.

I propose running a regression model with ratificationtime as the dependent variable and signtime, censoredsign and censoredratification as the independent variables. The results are as shown in the table below:

Variable                Estimate    Standard error    t value    p-value
Intercept                3.67616    0.37945           9.688      2.82e-16
signtime                 0.66221    0.05726           11.566     <2e-16
censoredsign            -7.75732    1.34694           -5.759     8.34e-08
censoredratification     9.14801    1.06734           8.571      9.07e-14
C. Assess the alternative model or method and compare your new results to those in the original. Explain any differences, particularly any that may have substantive implications.

The alternative regression model has an adjusted R-squared of 0.7651, compared with 0.2993 for the original model, which implies that about 76.51% of the variation in the dependent variable is explained by the independent variables. Thus, building regression models from highly predictive and statistically significant explanatory variables increases the model's explanatory power.

Works cited

Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models. SAGE Publications.
