Statistics Assignment

Applied Multivariate Statistical Analysis

Part A

Response to Question 1:

We chose protein and fat for the scatter plot. Protein and fat levels are very high in Sub-Saharan Africa and in Latin America & Caribbean (the group 2 and 3 regions), with the highest levels in Latin America & Caribbean. The lowest levels are found in East Asia and North Africa, with East Asia the lowest of all. See the attached scatter plot from Excel.

(Johnson & Wichern, 2007, p. 579)

Response to Question 2:

  1. Linear discriminant analysis: We used the eigenvector matrix to transform the sample into the discriminant subspace. The associated equation is given as Y = X – Z. We then established the structure matrix, which holds the pooled within-groups correlations between the discriminating variables and the standardized canonical discriminant functions.
  2. Quadratic discriminant analysis: The sample data were processed in SPSS, and the cases are grouped according to the quadratic boundaries established from that analysis, as shown in the attached output. Note that linear discriminant analysis uses linear boundaries, whereas this method uses quadratic boundaries.
  3. Multinomial logistic regression: We used a model for nominal outcome variables. The predicted probabilities determine the group to which each case belongs. The result was worked out from the attached SPSS output. We used the data to explain the relationship between one nominal dependent variable (food) and the two independent variables (fat and protein), and to classify the cases into groups as shown in the output window.
  4. Classification trees: We took protein as the dependent variable, with fat and food as the independent variables. The groups are then classified by how much the cases have in common in the relationship between the dependent variable (protein) and the independent variables (fat and food).
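
To make the idea of a linear boundary in item 1 concrete, here is a minimal sketch of a two-group linear discriminant rule on two variables, written in plain Python. It is not the SPSS procedure used for this assignment: it assumes only two groups, equal priors and equal misclassification costs. The (protein, fat) values and the names group_a, group_b, fit_lda and classify are made up for illustration.

```python
# Hypothetical (protein, fat) pairs for two groups; NOT the assignment data.
group_a = [(10.0, 20.0), (12.0, 22.0), (11.0, 19.0), (13.0, 23.0)]
group_b = [(4.0, 8.0), (5.0, 10.0), (6.0, 9.0), (3.0, 7.0)]


def mean_vector(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 sum of outer products of deviations from the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_lda(a, b):
    """Return the discriminant weights and the cutoff score."""
    ma, mb = mean_vector(a), mean_vector(b)
    sa, sb = scatter_matrix(a, ma), scatter_matrix(b, mb)
    dof = len(a) + len(b) - 2
    # Pooled within-groups covariance matrix.
    pooled = [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Weights: inverse pooled covariance times the difference in means.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff: the score of the midpoint between the two group means.
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    cutoff = w[0] * mid[0] + w[1] * mid[1]
    return w, cutoff


def classify(point, w, cutoff):
    """Assign a case to group A or B by its side of the linear boundary."""
    score = w[0] * point[0] + w[1] * point[1]
    return "A" if score >= cutoff else "B"


w, cutoff = fit_lda(group_a, group_b)
print(classify((12.0, 21.0), w, cutoff))
print(classify((4.0, 8.0), w, cutoff))
```

The boundary is the set of points whose score equals the cutoff, which is a straight line in the protein-fat plane. A quadratic rule would instead use a separate covariance matrix for each group, which bends that boundary.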

(Johnson & Wichern, 2007, p. 644)

Response to Question 3:

Fat (Positive)  70    Protein (Negative)  81
Fat (Negative)  54    Protein (Positive)  90

Error = (54 + 81)/(151 + 144) × 100 = 45.76%

(Johnson & Wichern, 2007, p. 599)


See the attached SPSS output.

The error rate read from the confusion matrix is the fraction of the sample that is misclassified.
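
The error-rate arithmetic in this response can be checked with a short script. This is a minimal sketch: the four counts are copied from the table above, while the row and column layout and the helper name error_rate are assumptions made for illustration.

```python
# Counts copied from the table in this response. The layout (which axis is
# actual and which is predicted) is an assumption; the text does not say.
confusion = [
    [70, 81],  # first row of the table, row total 151
    [54, 90],  # second row of the table, row total 144
]


def error_rate(matrix, misclassified_cells):
    """Share of all cases that fall in the cells counted as misclassified."""
    total = sum(sum(row) for row in matrix)
    wrong = sum(matrix[r][c] for r, c in misclassified_cells)
    return wrong / total


# The response counts the 81 and 54 cells as the misclassified cases.
rate = error_rate(confusion, [(0, 1), (1, 0)])
print(f"{rate * 100:.2f}%")  # 45.76%
```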

Response to Question 4:

  1. K-means cluster analysis: We used K-means clustering; the resulting groups are given in the attached SPSS sheet.
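
For readers without the SPSS sheet, the basic K-means loop can be sketched in plain Python as follows. This is an illustration only: the (protein, fat) points are hypothetical, the starting centers are simply the first k points (SPSS selects its initial centers differently), and the function names are not taken from any library.

```python
# Hypothetical (protein, fat) pairs; NOT the assignment data.
points = [(10.0, 20.0), (4.0, 8.0), (12.0, 22.0),
          (5.0, 10.0), (11.0, 19.0), (3.0, 7.0)]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(data, k, max_iter=100):
    """Alternate assignment and center updates until labels stop changing."""
    centers = [tuple(p) for p in data[:k]]  # assumed deterministic start
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in data
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the result is stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers


labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

Each case ends up with the label of its nearest final center, which corresponds to the cluster membership column in an SPSS K-means output.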

Johnson, R. A., & Wichern, D. W. (2007). Applied Multivariate Statistical Analysis. Upper Saddle River: Pearson Education Inc.

