Regression Analysis
Author
University
Introduction
This study examines which factors are associated with the level of carbon dioxide emissions in various countries. The topic was chosen because the environment is changing continuously and these changes pose a significant threat to humans. Atmospheric carbon dioxide has increased by about 40% since the start of the Industrial Revolution (Think Global Green, 2008). The researcher aims to identify which factors contribute significantly to increases or decreases in a country's carbon dioxide emissions (in kt). Three factors were considered: total population, energy use (kg of oil equivalent per capita), and GDP growth (annual %).
Data Gathering
The data used in this research were taken from the World Bank website for the year 2011. The sample includes 160 randomly selected countries from different parts of the world.
Variables
Prior to seeing the results, the researcher's initial prediction was that the level of carbon dioxide emissions in a country is positively affected by all three factors: GDP growth, total population, and energy use. That is, an increase in any of these three variables would result in an increase in carbon dioxide emissions. These are the independent (predictor) variables. Each is paired with a coefficient parameter (b) in the regression model, which estimates the change in the dependent variable, the level of carbon dioxide emissions, associated with a one-unit change in that predictor.
Table 1 shows the descriptive statistics of the variables. The mean carbon dioxide emission is 0.030 with a standard deviation of 0.944. The mean energy use is -0.005 with a standard deviation of 1.010, while the mean population is 0.009 with a standard deviation of 1.050. Lastly, mean GDP growth is 3.573 with a standard deviation of 6.517.
Table 1 Descriptive Statistics
Statistic | CO2 emissions | Energy Use | Population | GDP growth |
Mean | 0.030 | -0.005 | 0.009 | 3.573 |
Standard Error | 0.075 | 0.080 | 0.083 | 0.515 |
Median | -0.022 | 0.070 | -0.024 | 3.861 |
Standard Deviation | 0.944 | 1.010 | 1.050 | 6.517 |
Sample Variance | 0.891 | 1.020 | 1.103 | 42.468 |
Kurtosis | -0.285 | 0.275 | -0.532 | 65.216 |
Skewness | 0.032 | 0.125 | 0.045 | -6.573 |
Range | 4.643 | 6.120 | 4.375 | 79.367 |
Minimum | -2.485 | -3.060 | -2.015 | -62.076 |
Maximum | 2.158 | 3.060 | 2.360 | 17.291 |
Sum | 4.842 | -0.768 | 1.467 | 571.611 |
Count | 160.000 | 160.000 | 160.000 | 160.000 |
Confidence Level(95.0%) | 0.147 | 0.158 | 0.164 | 1.018 |
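The entries in Table 1 are related to one another: the mean is the sum divided by the count, the standard error is the standard deviation divided by the square root of the count, and the sample variance is the square of the standard deviation. The following is a minimal sketch that checks these relationships using the rounded values from Table 1. The function name and the tolerances are illustrative assumptions that allow for rounding in the table.

```python
import math

# Values copied from Table 1 (n = 160 for every variable).
N = 160
table1 = {
    # name: (mean, standard error, standard deviation, sample variance, sum)
    "CO2 emissions": (0.030, 0.075, 0.944, 0.891, 4.842),
    "Energy Use": (-0.005, 0.080, 1.010, 1.020, -0.768),
    "Population": (0.009, 0.083, 1.050, 1.103, 1.467),
    "GDP growth": (3.573, 0.515, 6.517, 42.468, 571.611),
}


def derived_stats(sd, total, n):
    """Return (mean, standard error, variance) implied by sd, sum and n."""
    return total / n, sd / math.sqrt(n), sd ** 2


checks = {}
for name, (mean, se, sd, var, total) in table1.items():
    d_mean, d_se, d_var = derived_stats(sd, total, N)
    # Tolerances are assumptions chosen to absorb rounding in the table.
    checks[name] = (
        abs(d_mean - mean) < 0.001,
        abs(d_se - se) < 0.001,
        abs(d_var - var) < 0.01,
    )
    print(f"{name}: mean={d_mean:.3f} se={d_se:.3f} var={d_var:.3f}")
```

Under these tolerances, the reported means, standard errors, and variances agree with the values derived from the sums, standard deviations, and count.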
Hypothesis Testing
The null hypothesis states that there is no relationship between the independent and dependent variables, that is, the slope is equal to zero. The alternative hypothesis states that there is a significant relationship between the independent and dependent variables, that is, the slope is not equal to zero. The regression test was conducted at a significance level of 0.05.
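Stated formally, and assuming the usual multiple linear regression model with three predictors, the hypotheses can be written as follows. The symbols \beta_j and \varepsilon are notation introduced here for illustration and do not come from the original output.

```latex
\begin{aligned}
\text{CO}_2 &= \beta_0 + \beta_1\,\text{EnergyUse} + \beta_2\,\text{Population}
             + \beta_3\,\text{GDPgrowth} + \varepsilon \\
H_0 &: \beta_j = 0 \quad \text{(no linear relationship with predictor } j\text{)} \\
H_1 &: \beta_j \neq 0, \qquad j = 1, 2, 3, \qquad \alpha = 0.05
\end{aligned}
```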
Normal Probability Plot
The normal probability plot gives a clearer picture of the distribution of the data. A normal distribution appears to be a reasonable assumption here because the data points follow an almost straight line with no significant outliers. The plot is followed by the multiple linear regression results generated by the statistical software. Three predictors are considered: GDP growth, Energy use, and Population. The hypothesis test checks whether these predictor variables explain the changes seen in the dependent variable (CO2 emission).
Figure 1 Normal Probability Plot
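As a rough illustration of how a normal probability plot is built, the sketch below pairs each sorted observation with the standard normal quantile of its plotting position. The plotting position (i - 0.5) / n is one common convention and is an assumption here; the statistical software used in this study may construct its plot differently. The sample values are hypothetical and are not the study's data.

```python
from statistics import NormalDist


def normal_plot_points(values):
    """Pair each sorted value with the standard normal quantile of its
    plotting position (i - 0.5) / n, a common convention."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), y)
        for i, y in enumerate(ordered, start=1)
    ]


# Hypothetical, illustrative values only (not the study's data).
sample = [-1.2, 0.4, -0.3, 1.1, 0.0, 0.7, -0.8]
points = normal_plot_points(sample)
for q, y in points:
    print(f"{q:+.3f}  {y:+.3f}")
```

If the points fall close to a straight line when plotted, the data are consistent with a normal distribution.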
Regression
The assumptions for this statistical test are as follows. The variables are normally distributed, as seen from the normal probability plot. There is a linear relationship between the dependent and independent variables, which is supported by the random scatter of data points in the residual plots. Furthermore, homoscedasticity was checked using the standardized residual plots.
Using Excel data analysis, the regression output is shown in Table 2. The Significance F value is very small (0.00 to two decimal places). It is the probability of obtaining an F statistic this large by chance if none of the predictors were related to the dependent variable, so the overall model is statistically significant. The p-values of Population and Energy use (both 0.00 to two decimal places) are less than alpha = 0.05, which means that the null hypothesis is rejected for these two predictors: there is a significant relationship between each of them and the dependent variable.
The R square value, or coefficient of determination, is 0.89. This means that about 89% of the variability in the level of carbon dioxide emission is explained by the three predictors together: GDP growth, Energy use, and Total Population. In addition, the Significance F value is less than the chosen significance level of 0.05, which indicates that the model as a whole is statistically significant.
In the coefficients table, the p-value of GDP growth is 0.47, which is greater than alpha = 0.05, so GDP growth is not a statistically significant predictor in the model. On the other hand, the p-values of Energy use and Population (both 0.00 to two decimal places) are less than alpha = 0.05. This shows that these two variables contribute valuable information to the model.
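The decision rule described above can be sketched in a few lines. The example takes the t statistics reported in Table 2 and converts them to two-sided p-values. As a simplifying assumption, it uses the standard normal distribution as a large-sample approximation to the t distribution with 156 residual degrees of freedom, so the results are approximate and are not the exact values produced by Excel. The function name is illustrative.

```python
from statistics import NormalDist


def two_sided_p_normal(t_stat):
    """Two-sided p-value using the standard normal as a large-sample
    approximation to the t distribution (156 residual df here)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# t statistics copied from Table 2.
t_stats = {
    "Intercept": 1.22,
    "Energy Use": 19.05,
    "Population": 32.01,
    "GDP growth": -0.73,
}
ALPHA = 0.05

p_values = {name: two_sided_p_normal(t) for name, t in t_stats.items()}
significant = {name: p < ALPHA for name, p in p_values.items()}
for name, p in p_values.items():
    print(f"{name}: p ~ {p:.2f}, significant={significant[name]}")
```

Under this approximation, Energy use and Population fall below alpha = 0.05, while GDP growth and the intercept do not.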
Table 2 Regression Output
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.94 |
R Square | 0.89 |
Adjusted R Square | 0.88 |
Standard Error | 0.32 |
Observations | 160.00 |
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 3.00 | 125.49 | 41.83 | 402.08 | 0.00 |
Residual | 156.00 | 16.23 | 0.10 | | |
Total | 159.00 | 141.72 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 0.04 | 0.03 | 1.22 | 0.22 | -0.02 | 0.09 |
Energy Use | 0.49 | 0.03 | 19.05 | 0.00 | 0.44 | 0.54 |
Population | 0.79 | 0.02 | 32.01 | 0.00 | 0.74 | 0.84 |
GDP growth | 0.00 | 0.00 | -0.73 | 0.47 | -0.01 | 0.00 |
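The regression statistics in Table 2 can be recovered from the ANOVA sums of squares and degrees of freedom. The sketch below applies the standard formulas to the rounded values in the table, so small differences in the last decimal place are expected. The variable names are illustrative.

```python
import math

# Values copied from the ANOVA block of Table 2.
SS_REGRESSION = 125.49
SS_RESIDUAL = 16.23
DF_REGRESSION = 3
DF_RESIDUAL = 156

ss_total = SS_REGRESSION + SS_RESIDUAL
df_total = DF_REGRESSION + DF_RESIDUAL  # n - 1

# Share of total variation explained by the regression.
r_squared = SS_REGRESSION / ss_total
# Adjusted for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * df_total / DF_RESIDUAL
# Mean squares and the overall F statistic.
ms_regression = SS_REGRESSION / DF_REGRESSION
ms_residual = SS_RESIDUAL / DF_RESIDUAL
f_statistic = ms_regression / ms_residual
# Standard error of the regression and multiple correlation coefficient.
standard_error = math.sqrt(ms_residual)
multiple_r = math.sqrt(r_squared)

print(f"R Square          = {r_squared:.2f}")
print(f"Adjusted R Square = {adjusted_r_squared:.2f}")
print(f"F                 = {f_statistic:.2f}")
print(f"Standard Error    = {standard_error:.2f}")
print(f"Multiple R        = {multiple_r:.2f}")
```

The derived values agree with Table 2 to two decimal places, except for F, which differs slightly because the inputs are rounded.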
Residuals
The residuals are presented in the figures below, and there is no obvious pattern in the data points. This supports the assumptions of linearity and homoscedasticity made for the regression.
Figure 2 Residual Plot for Energy Use
Figure 3 Residual Plot for Population
Figure 4 Residual Plot for GDP growth
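A residual is the observed value minus the value fitted by the model. One informal way to look for a pattern is to compare the typical size of the residuals in the lower and upper halves of a predictor. The sketch below illustrates this idea under stated assumptions: the function names are hypothetical, the split-in-half comparison is a simple stand-in for visual inspection rather than a formal test, and the data are made up for illustration and are not the study's data.

```python
def residuals(observed, fitted):
    """Residual = observed value minus fitted value, one per observation."""
    return [y - y_hat for y, y_hat in zip(observed, fitted)]


def spread_by_half(predictor, resid):
    """Mean absolute residual in the lower and upper half of a predictor.
    Similar values in both halves are consistent with constant variance."""
    paired = sorted(zip(predictor, resid))
    half = len(paired) // 2
    lower = [abs(r) for _, r in paired[:half]]
    upper = [abs(r) for _, r in paired[half:]]
    return sum(lower) / len(lower), sum(upper) / len(upper)


# Hypothetical, illustrative values only (not the study's data).
energy_use = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
observed = [-0.9, -0.1, 0.2, 0.1, 0.7, 0.6]
fitted = [-0.7, -0.2, 0.0, 0.3, 0.5, 0.8]

resid = residuals(observed, fitted)
low_spread, high_spread = spread_by_half(energy_use, resid)
print(f"lower half: {low_spread:.3f}, upper half: {high_spread:.3f}")
```

A large difference between the two halves would suggest that the spread of the residuals changes with the predictor, which would call the homoscedasticity assumption into question.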
Regression Equation
Based on the model, the regression equation is as follows: CO2 level = 0.49*Energy use + 0.79*Population. GDP growth was not included because it is not statistically significant based on its p-value. The intercept (0.04) is also omitted from this equation; it is not significantly different from zero (p = 0.22). The line fit plots presented in the figures below show the “best-fit” line for each independent variable. Energy use and Population are both positively correlated with CO2 emission, while GDP growth is negatively correlated based on the trend line, although this relationship is not statistically significant.
Figure 5 Energy Use Line Fit Plot
Figure 6 Population Line Fit Plot
Figure 7 GDP Growth Line Fit Plot
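The regression equation can be written as a small function for generating predictions. This is a minimal sketch that uses the rounded coefficients from Table 2 and follows the reduced equation in the text, which omits the intercept and GDP growth. The function name and the input values are hypothetical, and the inputs are assumed to be on the same scale as the variables summarized in Table 1.

```python
# Rounded coefficients copied from Table 2.
B_ENERGY_USE = 0.49
B_POPULATION = 0.79


def predict_co2(energy_use, population):
    """Predicted CO2 level from the reduced equation in the text
    (intercept and GDP growth omitted, as in the original equation)."""
    return B_ENERGY_USE * energy_use + B_POPULATION * population


# Hypothetical inputs on the same scale as the variables in Table 1.
example = predict_co2(energy_use=1.0, population=0.5)
print(f"Predicted CO2 level: {example:.3f}")
```

Because both coefficients are positive, an increase in either predictor raises the predicted CO2 level, with Population having the larger effect per unit.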
Conclusion
Based on the results of the analysis, only Population and Energy use were statistically significant predictors of the level of carbon dioxide emission in a country. The number of people in a country can be considered a contributor, especially where economic activity per person is high. Furthermore, widespread energy use contributes significantly to the socioeconomic development of a country, which eventually has a long-term impact on the environment in terms of CO2 levels. Despite ongoing programs geared towards improving community awareness, much remains to be done to slow the increasing rate of carbon dioxide emissions across countries. The model could be improved by adding another potential factor such as the level of industrialization, which reflects fossil fuel use and manufacturing output in a country. Lastly, this research would be strengthened by reviewing previous studies related to the topic and by regularly updating the data used.
References
EIA. (2015, November 23). U.S. Energy Information Administration – EIA – Independent Statistics and Analysis. Retrieved April 09, 2016, from https://www.eia.gov/environment/emissions/carbon/
The World Bank, World Development Indicators (2011). Population, total [Data file]. Retrieved from http://data.worldbank.org/indicator/SP.POP.TOTL
The World Bank, World Development Indicators (2011). CO2 emissions (kt) [Data file]. Retrieved from http://data.worldbank.org/indicator/EN.ATM.CO2E.KT/countries
The World Bank, World Development Indicators (2011). Energy use (kg of oil equivalent per capita) [Data file]. Retrieved from http://data.worldbank.org/indicator/EG.USE.PCAP.KG.OE/countries
The World Bank, World Development Indicators (2011). GDP growth (annual %) [Data file]. Retrieved from http://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG
Think Global Green. (2008). Carbon Dioxide. Retrieved April 09, 2016, from http://www.thinkglobalgreen.org/carbondioxide.html