Statistical inference is the process of drawing conclusions about a population on the basis of sample results (Hogg and Tanis, 1996). Because such inference cannot be absolutely certain, the language of probability is often used in stating conclusions. A parameter is an unknown characteristic of a probability distribution whose value determines a particular distribution (Hogg et al., 2004). It is knowledge of the parameters that specifies a distribution, and since parameters are associated with a population, the term 'population parameters' is in common use. A statistic, on the other hand, is a function of a random sample that does not depend on the unknown parameter, say θ. Statistical inference rests on the assumption that some population characteristic can be represented by a random variable whose probability distribution, f(x; θ), is of known functional form except for an unknown parameter, θ, lying in a parameter space, Ω (Hogg et al., 2004). Point estimation is concerned with methods of using sample observations to estimate the value of θ. The three main methods used in parameter estimation are as follows:
- Method of Moments, proposed by Karl Pearson in 1891;
- Method of Maximum Likelihood, proposed by Fisher in 1912; and
- Bayesian Estimation Approach, based on Bayes' Theorem.
This paper explores the use of the Maximum Likelihood approach in estimating parameter values. Given a random sample $X_1, X_2, \dots, X_n$ drawn from $f(x; \theta)$, this approach defines the likelihood function as:

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$
This product form holds because $X_1, X_2, \dots, X_n$ are independent and identically distributed, as noted. Since the log function is monotonically increasing, the value of $\theta$ that maximizes $\log L$ is the same value that maximizes $L$; however, it is usually easier to maximize $\log L$. For a vector parameter $\theta = (\theta_1, \dots, \theta_k)$, the point at which the likelihood is a maximum is the solution of the $k$ equations:

$$\frac{\partial \log L(\theta)}{\partial \theta_j} = 0, \qquad j = 1, 2, \dots, k,$$

and the solution $\hat{\theta}$ is the Maximum Likelihood Estimator.
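As an illustration of this maximization step, the sketch below uses a normal sample; the distribution and the sample values are assumptions chosen purely for illustration, not the distribution analysed in this paper. Setting the first-order conditions to zero for the normal log-likelihood gives the sample mean and the (biased) sample variance in closed form, and we can verify that this stationary point beats nearby parameter values:

```python
import math

# Hypothetical sample, for illustration only.
x = [2.1, 1.9, 2.4, 2.0, 1.6]
n = len(x)

# Solving d(log L)/d(mu) = 0 and d(log L)/d(sigma^2) = 0 for the
# normal likelihood yields these closed-form estimates.
mu_hat = sum(x) / n
var_hat = sum((xi - mu_hat) ** 2 for xi in x) / n

def log_lik(mu, var):
    """Normal log-likelihood log L(mu, sigma^2) for the sample."""
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (xi - mu) ** 2 / (2 * var) for xi in x)

# The MLE should dominate perturbed parameter values.
assert log_lik(mu_hat, var_hat) >= log_lik(mu_hat + 0.1, var_hat)
assert log_lik(mu_hat, var_hat) >= log_lik(mu_hat, var_hat + 0.1)
```

The closed-form solution here is a special feature of the normal case; for most distributions the $k$ score equations must be solved numerically.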
Natural Logarithm of the Likelihood Function:
First Order Conditions: α, β,
From Equation 2:
Therefore we have
Hessian matrices are used to classify stationary points of a function that is twice continuously differentiable in the region of interest, so that the first- and second-order partial derivatives are well defined and continuous throughout that region. The Hessian offers a more systematic approach to stationary-point classification than the commonly used inspection of minima, maxima, and gradients, which is not always conclusive. For a function $f(\theta_1, \dots, \theta_k)$, the Hessian matrix takes the form shown below:

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial \theta_1^2} & \dfrac{\partial^2 f}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 f}{\partial \theta_1 \partial \theta_k} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial \theta_k \partial \theta_1} & \dfrac{\partial^2 f}{\partial \theta_k \partial \theta_2} & \cdots & \dfrac{\partial^2 f}{\partial \theta_k^2} \end{bmatrix}$$
It is important to note that this matrix is always symmetric, since the mixed second partial derivatives are equal when the function is twice continuously differentiable.
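The structure of the Hessian can be checked numerically. The sketch below is a generic central-difference approximation, not tied to this paper's particular likelihood; the test function $f(x, y) = x^2 y$ is an arbitrary illustrative choice whose exact Hessian is $[[2y, 2x], [2x, 0]]$:

```python
def hessian(f, p, h=1e-5):
    """Approximate the Hessian of f at point p by central differences.

    Assumes f is twice continuously differentiable near p, so the
    resulting matrix should be (numerically) symmetric.
    """
    n = len(p)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pp = list(p); pp[i] += h; pp[j] += h
            pm = list(p); pm[i] += h; pm[j] -= h
            mp = list(p); mp[i] -= h; mp[j] += h
            mm = list(p); mm[i] -= h; mm[j] -= h
            # Central-difference estimate of d^2 f / (d p_i d p_j).
            H[i][j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * h * h)
    return H

# Example: f(x, y) = x**2 * y has exact Hessian [[2y, 2x], [2x, 0]].
f = lambda p: p[0] ** 2 * p[1]
H = hessian(f, [1.0, 3.0])
```

At the point $(1, 3)$ the computed matrix should be close to $[[6, 2], [2, 0]]$, with the off-diagonal entries equal up to rounding, illustrating the symmetry noted above.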
The first part of this question requires setting up the Hessian matrix of the second partials at the maximum likelihood estimates. We must therefore compute the second-order derivatives with respect to the three parameters, as shown below:
It can be seen from the above that the own-partials for both are negative. The remaining second partial derivative is also negative because the first of its terms is greater than the second.
We know that the determinant of a three-by-three matrix is computed as follows. Given a $3 \times 3$ matrix

$$M = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix},$$

the determinant is defined as:

$$\det(M) = a(ei - fh) - b(di - fg) + c(dh - eg)$$
Clearly, from the above, the determinant of the Hessian matrix here will be negative at the stationary point, as required for a negative definite $3 \times 3$ Hessian and hence a maximum.
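As a quick check of the cofactor expansion, the sketch below computes a $3 \times 3$ determinant along the first row; the numerical matrix is arbitrary and chosen purely for illustration:

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along row one:
    det = a(ei - fh) - b(di - fg) + c(dh - eg)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
# det = 1*(50 - 48) - 2*(40 - 42) + 3*(32 - 35) = 2 + 4 - 9 = -3
assert det3(M) == -3
```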
Given that the probability distribution of the random variable $X$ depends on an unknown parameter $\theta$, the Information matrix, also known as Fisher's Information matrix, is a way of measuring the amount of information that such a random variable provides about the parameter of interest. The probability density function of $X$ conditioned on $\theta$ is $f(x; \theta)$, which, as defined in the previous section, gives the likelihood function for $\theta$. Fisher's Information is computed as:

$$I(\theta) = E\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^{2} \,\middle|\, \theta\right]$$
The above expression denotes the conditional expectation over $X$, taken with respect to the probability density function $f(x; \theta)$. This value is always non-negative. If $\log f(x; \theta)$ is twice differentiable with respect to $\theta$, then under certain regularity conditions Fisher's Information may be represented as follows:

$$I(\theta) = -E\left[\frac{\partial^{2}}{\partial \theta^{2}} \log f(X; \theta) \,\middle|\, \theta\right]$$
Fisher's Information is therefore the negative of the expected value of the second partial derivative, with respect to $\theta$, of the natural log of $f(X; \theta)$. We know that the second partial derivatives, arranged in the form of a square matrix, constitute the Hessian matrix. It follows that the Information matrix is the negative of the expected value of the Hessian matrix of the log-likelihood.
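This identity can be checked directly for a simple case. The sketch below uses the Bernoulli distribution as an assumed example (not the distribution analysed in this paper), computing Fisher's Information as the negative expected second derivative of the log-density and comparing it with the known closed form $I(p) = 1 / (p(1 - p))$:

```python
def fisher_info_bernoulli(p):
    """Fisher information for Bernoulli(p), computed as the negative
    expectation of d^2/dp^2 log f(x; p) over the outcomes x in {0, 1}.

    log f(x; p) = x*log(p) + (1 - x)*log(1 - p), so the second
    derivative is -x/p^2 - (1 - x)/(1 - p)^2.
    """
    def d2_loglik(x):
        return -x / p ** 2 - (1 - x) / (1 - p) ** 2

    # Expectation over the two outcomes, weighted by their probabilities.
    return -(p * d2_loglik(1) + (1 - p) * d2_loglik(0))

p = 0.3
# Matches the closed form I(p) = 1 / (p * (1 - p)).
assert abs(fisher_info_bernoulli(p) - 1 / (p * (1 - p))) < 1e-12
```

Because the Bernoulli parameter is scalar, the "Hessian" here is a single second derivative; for a vector parameter the same construction applies entrywise to the matrix of second partials.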
- Hogg, R.V., Tanis, E.A. (1996), Probability and Statistical Inference, 5th Edition, Prentice Hall
- Hogg, R.V., Craig, A.T., McKean, J.W. (2004), Introduction to Mathematical Statistics, 6th Edition, Pearson