*Question One*

Statistical inference is the process of drawing conclusions about a population on the basis of sample results (Hogg and Tanis, 1996). Because such inference cannot be absolutely certain, the language of probability is often used in stating conclusions. A parameter is an unknown characteristic of a probability distribution whose value determines a particular distribution (Hogg et al., 2004). It is knowledge of the parameters that specifies a distribution. Since parameters are associated with a population, the term ‘population parameters’ is in common use. A statistic, on the other hand, is a function of a random sample that does not depend on the unknown parameter, say $\theta$. Statistical inference is based on the assumption that some population characteristic can be represented by a random variable $X$ with probability distribution $f(x; \theta)$, whose functional form is known except for an unknown parameter $\theta$ in its parameter space $\Omega$ (Hogg et al., 2004). Point estimation is concerned with methods of using observations to estimate the value of $\theta$. The three main methods used in parameter estimation are as follows:

- Method of Moments, proposed by Karl Pearson in 1891;
- Method of Maximum Likelihood, proposed by Fisher in 1912; and
- Bayesian Estimation Approach, based on Bayes’ Theorem.
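As a small illustration of the first method, the method of moments equates sample moments to the corresponding population moments. The sketch below uses a hypothetical Uniform$(0, \theta)$ sample (an assumption chosen for exposition only, not the model in the questions below), where $E[X] = \theta/2$, so matching the first moment gives $\hat{\theta} = 2\bar{x}$:

```python
# Method-of-moments sketch for an assumed Uniform(0, theta) sample.
# The population mean is theta / 2, so equating it to the sample mean
# yields the estimator theta_hat = 2 * sample_mean.
xs = [0.9, 2.1, 3.4, 1.7, 2.9]     # hypothetical observations
sample_mean = sum(xs) / len(xs)
theta_mom = 2 * sample_mean        # method-of-moments estimate of theta
```

The same matching idea extends to higher moments when the model has more than one parameter.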

This paper explores the use of the maximum likelihood approach in estimating parameter values. For a random sample $X_1, X_2, \ldots, X_n$ from $f(x; \theta)$, this approach defines the likelihood function as:

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

The product form holds because $X_1, X_2, \ldots, X_n$ are independent and identically distributed, as noted. Since the log function is monotonically increasing, the value of $\theta$ which maximizes $\log L$ is the same value that maximizes $L$; however, it is usually easier to maximize $\log L$. For a parameter vector $\theta = (\theta_1, \ldots, \theta_k)$, the point where the likelihood is a maximum is the solution to the $k$ equations:

$$\frac{\partial \log L(\theta)}{\partial \theta_j} = 0, \qquad j = 1, \ldots, k$$

The solution, $\hat{\theta}$, is the maximum likelihood estimator.
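As a minimal numerical sketch of this recipe (using an assumed Exponential$(\lambda)$ model, not the distribution treated in Solution 1 below), maximizing $\log L(\lambda) = n \log \lambda - \lambda \sum x_i$ over a bracket recovers the closed-form estimator $\hat{\lambda} = n / \sum x_i$ obtained from the first-order condition:

```python
import math

def exp_log_likelihood(lam, xs):
    # log L(lambda) = n*log(lambda) - lambda * sum(x_i) for an
    # assumed Exponential(lambda) sample.
    return len(xs) * math.log(lam) - lam * sum(xs)

def mle_by_search(xs, lo=1e-6, hi=100.0, iters=200):
    # Ternary search: log L is concave in lambda, so repeatedly
    # shrinking the bracket converges to the maximizer.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if exp_log_likelihood(m1, xs) < exp_log_likelihood(m2, xs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

xs = [0.5, 1.2, 0.8, 2.0, 1.5]      # hypothetical observations
numeric = mle_by_search(xs)
closed_form = len(xs) / sum(xs)      # solving d(log L)/d(lambda) = 0
```

The numeric maximizer and the analytic solution of the first-order condition agree, which is the point of the equivalence between maximizing $L$ and $\log L$.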

*Solution 1:*

Likelihood Function:

Natural Logarithm of the Likelihood Function:

First-Order Conditions with respect to α, β, …:

Therefore:


*Question Two*

**Numerator:**

From Equation 2:

**Denominator:**

From Equation 2:

Therefore, we have:

Similarly:

*Question Three*

**Hessian Matrix:**

Hessian matrices are used in classifying stationary points of a function that is twice continuously differentiable, with the first- and second-order partial derivatives well defined and continuous throughout the region of interest. The Hessian matrix provides a more systematic approach to stationary-point classification than inspecting minima, maxima and gradients directly, which is not always possible. For a function $f(x_1, \ldots, x_n)$, the Hessian matrix takes the form shown below:

$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$$

It is important to note that this matrix is symmetric whenever the second partial derivatives are continuous (Young's theorem).
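The matrix of second partials can be approximated numerically; the sketch below uses central differences on a made-up two-variable function (an illustration only) and also exhibits the symmetry $H_{ij} = H_{ji}$:

```python
def hessian(f, x, h=1e-4):
    # Central-difference approximation of the matrix of second
    # partials, H[i][j] ~ d^2 f / (dx_i dx_j).
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# Example: f(x, y) = x^2 * y + y^3 has exact Hessian [[2y, 2x], [2x, 6y]],
# which at (1, 2) is [[4, 2], [2, 12]].
f = lambda v: v[0] ** 2 * v[1] + v[1] ** 3
H = hessian(f, [1.0, 2.0])
```

The off-diagonal entries come out (numerically) equal, as the symmetry property predicts.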

The first part of this question requires setting up the Hessian matrix of the second partials at the estimated parameter values. Therefore, we must compute the second-order derivatives for the three parameters/variables as shown below:

It can be seen from the above that the **own-partials** for both are *negative*. **The second partial derivative of … is negative because … is greater than … .**

We know that the determinant of a three-by-three matrix

$$M = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$$

is computed by cofactor expansion along the first row:

$$\det M = a(ei - fh) - b(di - fg) + c(dh - eg)$$

Clearly, from the above, the determinant of the Hessian matrix here will be negative at the estimated parameter values. This is consistent with a local maximum: at a strict maximum the Hessian is negative definite, and the determinant of a $3 \times 3$ negative definite matrix, being the product of three negative eigenvalues, is negative.
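The cofactor expansion can be checked directly. In the sketch below the Hessian entries are made-up numbers (not the values from this problem) chosen to form a negative definite matrix, so the $3 \times 3$ determinant comes out negative, as argued above:

```python
def det3(M):
    # Cofactor expansion along the first row:
    # det M = a(ei - fh) - b(di - fg) + c(dh - eg)
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# A hypothetical negative definite Hessian (leading principal minors
# alternate in sign: -2 < 0, 5.75 > 0, det < 0):
H = [[-2.0, 0.5, 0.0],
     [0.5, -3.0, 0.2],
     [0.0, 0.2, -1.0]]
```

Here `det3(H)` evaluates to about −5.67, illustrating the negative determinant expected at a maximum of a three-parameter log-likelihood.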

*Question Four*

Given that the probability distribution of the random variable $X$ depends on an unknown parameter $\theta$, the information matrix, also known as Fisher's information matrix, measures the amount of information that such a random variable provides about the parameter of interest. The probability density function of $X$, conditioned on $\theta$, is $f(x; \theta)$, which, as defined in the previous section, is the likelihood function for $\theta$. Fisher's information is computed as:

$$I(\theta) = E\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2 \,\middle|\, \theta\right]$$

The above expression denotes a conditional expectation over $x$ with respect to the probability density function $f(x; \theta)$; this value is always non-negative. If $\log f(x; \theta)$ is twice differentiable with respect to $\theta$, then under certain regularity conditions Fisher's information may be represented as follows:

$$I(\theta) = -E\left[\frac{\partial^2}{\partial \theta^2} \log f(X; \theta) \,\middle|\, \theta\right]$$

Fisher's information is therefore the negative of the expected value of the second partial derivative, with respect to $\theta$, of the natural log of $f(X; \theta)$. We know that the second partial derivatives arranged in the form of a square matrix form the Hessian matrix. The information matrix is therefore the negative of the expected value of the Hessian matrix of the log-likelihood; its inverse, in turn, gives the asymptotic variance of the maximum likelihood estimator.
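Both characterisations can be checked numerically. The sketch below uses an assumed Exponential$(\lambda)$ model (an illustration, not the model from the earlier questions): here $\log f(x; \lambda) = \log \lambda - \lambda x$, the score is $1/\lambda - x$, and the second derivative is the constant $-1/\lambda^2$, so $I(\lambda) = 1/\lambda^2$ exactly. A Monte Carlo average of the squared score should agree:

```python
import math
import random

random.seed(0)
lam = 2.0
n = 200_000

# Inverse-transform sampling from an Exponential(lam) distribution.
xs = [-math.log(1 - random.random()) / lam for _ in range(n)]

# Score: d/d(lam) log f(x; lam) = 1/lam - x.
# Averaging its square estimates E[(score)^2] = I(lam).
score_sq_mean = sum((1 / lam - x) ** 2 for x in xs) / n

# Second derivative: d^2/d(lam)^2 log f = -1/lam^2 (constant in x),
# so -E[second derivative] = 1/lam^2 with no sampling needed.
neg_expected_hessian = 1 / lam ** 2
```

With $\lambda = 2$ both quantities are close to $0.25$, illustrating the equality of the two forms of Fisher's information.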

*References*

- Hogg, R.V., Tanis, E.A. (1996), *Probability and Statistical Inference*, 5th Edition, Prentice Hall
- Hogg, R.V., Craig, A.T., McKean, J.W. (2004), *Introduction to Mathematical Statistics*, 6th Edition, Pearson