Question One

Statistical Inference is the process of drawing conclusions about a population on the basis of sample results (Hogg and Tanis, 1996). Because such inference cannot be absolutely certain, the language of probability is often used in stating conclusions. A parameter is an unknown characteristic of a probability distribution whose value determines a particular distribution (Hogg et al., 2004); it is knowledge of the parameters that specifies a distribution. Since parameters are associated with a population, the term ‘population parameters’ is in common use. On the other hand, a statistic is a function of a random sample which does not depend on the unknown parameter, say θ. Statistical Inference is based on the assumption that some population characteristic can be represented by a random variable X with probability distribution f(x; θ), whose functional form is known except that it contains an unknown parameter θ, lying in its parameter space (Hogg et al., 2004). Point estimation is concerned with methods of using sample observations to estimate the value of θ. The three main methods used in parameter estimation are as follows:

  • Method of Moments, proposed by Karl Pearson in 1891;
  • Method of Maximum Likelihood, proposed by Fisher in 1912; and
  • Bayesian Estimation Approach, based on Bayes’ Theorem.

This paper explores the use of the Maximum Likelihood approach in estimating parameter values. For a random sample x_1, x_2, …, x_n drawn from f(x; θ), this approach defines the likelihood function as:

L(θ) = f(x_1; θ) f(x_2; θ) ⋯ f(x_n; θ) = ∏_{i=1}^{n} f(x_i; θ)

The product form holds because x_1, …, x_n are independent and identically distributed, as noted. Since the log function is monotonically increasing, the value of θ which maximizes log L is the same value that maximizes L; however, it is usually easier to maximize log L. For a vector parameter θ = (θ_1, …, θ_k), the point where the likelihood is a maximum is the solution to the k equations:

∂ log L / ∂θ_j = 0,   j = 1, 2, …, k

and the solution is hence the Maximum Likelihood Estimator.
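The maximization described above can be sketched numerically. The example below is a minimal illustration assuming an exponential density f(x; λ) = λe^(−λx) and a made-up sample; it is not the distribution treated in this paper. Setting the score equation d(log L)/dλ = n/λ − Σx_i to zero gives the closed-form MLE λ̂ = n/Σx_i:

```python
import math

# Hypothetical i.i.d. sample; the exponential density f(x; lam) = lam*exp(-lam*x)
# is an assumed illustrative choice, not the distribution treated in this paper.
sample = [0.8, 1.3, 0.2, 2.1, 0.9, 1.6]

def log_likelihood(lam, xs):
    """log L(lam) = n*log(lam) - lam*sum(xs) for i.i.d. exponential data."""
    return len(xs) * math.log(lam) - lam * sum(xs)

# Setting d(log L)/d(lam) = n/lam - sum(xs) = 0 gives the closed-form MLE:
lam_hat = len(sample) / sum(sample)

# The stationary point of this concave log-likelihood beats nearby candidates:
for d in (-0.1, -0.01, 0.01, 0.1):
    assert log_likelihood(lam_hat, sample) >= log_likelihood(lam_hat + d, sample)

print(round(lam_hat, 4))  # prints 0.8696
```

Because log L here is strictly concave in λ, the single stationary point of the score equation is the global maximum, which the nearby-candidate check confirms.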

Solution 1:

Likelihood Function:


Natural Logarithm of the Likelihood Function:



First Order Conditions: α, β,



























Question Two




From Equation Two:









From Equation Two:






Therefore we have




Question Three

Hessian Matrix:

Hessian matrices are used in classifying stationary points of a function f(x_1, …, x_n) that is twice continuously differentiable, with the first and second order partial derivatives well defined and continuous throughout the region of interest. A Hessian matrix provides a more systematic approach to stationary point classification, as classification is not always possible using the minima, maxima and gradient approaches in common use. A Hessian matrix takes the form shown below, with (i, j) entry ∂²f/∂x_i∂x_j:

H = [ ∂²f/∂x_1²        ∂²f/∂x_1∂x_2   …   ∂²f/∂x_1∂x_n ]
    [ ∂²f/∂x_2∂x_1    ∂²f/∂x_2²       …   ∂²f/∂x_2∂x_n ]
    [      ⋮                 ⋮          ⋱         ⋮      ]
    [ ∂²f/∂x_n∂x_1    ∂²f/∂x_n∂x_2   …   ∂²f/∂x_n²     ]

It is important to note that this matrix is always symmetric, since the mixed partials are equal when they are continuous.
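The symmetry of the Hessian can be checked numerically. The sketch below uses central finite differences on an assumed two-variable example function (chosen purely for illustration, not taken from this paper):

```python
def f(x, y):
    # Assumed example function, chosen only to illustrate the method.
    return x**2 * y + y**3

def hessian2(fn, x, y, h=1e-4):
    """Approximate the 2x2 Hessian of fn at (x, y) by central differences."""
    fxx = (fn(x + h, y) - 2*fn(x, y) + fn(x - h, y)) / h**2
    fyy = (fn(x, y + h) - 2*fn(x, y) + fn(x, y - h)) / h**2
    fxy = (fn(x + h, y + h) - fn(x + h, y - h)
           - fn(x - h, y + h) + fn(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]  # fxy = fyx, so the matrix is symmetric

H = hessian2(f, 1.0, 2.0)
# Analytically: f_xx = 2y = 4, f_xy = 2x = 2, f_yy = 6y = 12 at (1, 2).
```

The single mixed partial fxy is used for both off-diagonal entries, mirroring the equality of mixed partials that makes the analytic Hessian symmetric.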

The first part of this question requires setting up the Hessian matrix of the second partials at the stationary point. Therefore, we must compute the second order derivatives for the three parameters/variables as shown below:








It can be seen from the above that the own-partials for both parameters are negative; the second partial derivative is negative because the first quantity is greater than the second.

We know that the determinant of a three-by-three matrix

M = [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]

is computed by cofactor expansion along the first row:

det(M) = a(ei − fh) − b(di − fg) + c(dh − eg)
Clearly, from the above, the determinant of the Hessian matrix here will be negative at the stationary point.
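The cofactor expansion can be sketched directly in code; the matrix entries below are assumed values for illustration only:

```python
def det3(m):
    """Cofactor expansion of a 3x3 determinant along the first row:
    det(M) = a(ei - fh) - b(di - fg) + c(dh - eg)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e*i - f*h) - b * (d*i - f*g) + c * (d*h - e*g)

# Assumed example matrix, for illustration only.
M = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]

print(det3(M))  # prints 25
```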

Question Four

Given that the probability distribution of the random variable X depends on an unknown parameter θ, the Information matrix, also known as Fisher’s Information matrix, is a way of measuring the amount of information that such a random variable provides about the parameter of interest. The probability density function of X, conditioned on θ, is f(x; θ), which, as defined in the previous section, is the likelihood function for θ. Fisher’s Information is computed as:

I(θ) = E[ (∂/∂θ log f(X; θ))² | θ ]

The above expression denotes the conditional expectation over X with respect to the probability density function f(x; θ). This value is always non-negative. If log f(x; θ) is twice differentiable with respect to θ, then under certain regularity conditions Fisher’s Information may be represented as follows:

I(θ) = −E[ ∂²/∂θ² log f(X; θ) | θ ]

Fisher’s Information is therefore the negative of the expected value of the second partial derivative with respect to θ of the natural log of f(X; θ). We know that the second partial derivatives, arranged in the form of a square matrix, form the Hessian matrix. The Information matrix is therefore the negative of the expected value of the Hessian matrix of the log-likelihood.
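The identity I(θ) = −E[∂² log f/∂θ²] can be illustrated numerically. The sketch below assumes an exponential density (an illustrative choice, not the paper’s distribution), for which the second derivative of the log-density is the same for every observation, so the expectation is trivial:

```python
import math

def log_pdf(lam, x):
    # Exponential log-density log f(x; lam) = log(lam) - lam*x:
    # an assumed illustrative choice, not the paper's distribution.
    return math.log(lam) - lam * x

def second_derivative(g, t, h=1e-5):
    """Central-difference approximation of g''(t)."""
    return (g(t + h) - 2*g(t) + g(t - h)) / h**2

lam, x = 2.0, 1.7  # any x works here: the second derivative does not depend on x
info = -second_derivative(lambda t: log_pdf(t, x), lam)
# Analytically I(lam) = -E[d^2 log f / d lam^2] = 1/lam**2 = 0.25
```

Because ∂² log f/∂λ² = −1/λ² contains no x, negating it already gives the expectation, matching the analytic value 1/λ².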















  1. Hogg, R.V., Tanis, E.A. (1996), Probability and Statistical Inference, 5th Edition, Prentice Hall
  2. Hogg, R.V., Craig, A.T., McKean, J.W. (2004), Introduction to Mathematical Statistics, 6th Edition, Pearson
