Lognormal Distribution

Lognormal distributions (with two parameters) have a central role in human and ecological risk assessment for at least three reasons. First, many physical, chemical, biological, toxicological, and statistical processes tend to create random variables that follow Lognormal distributions (Hattis and Burmaster, 1994).

For example, the physical dilution of one material (say, a miscible or soluble contaminant) into another material (say, surface water in a bay) tends to create non-equilibrium concentrations which are Lognormal in character (Ott, 1995; Ott, 1990). Second, when the conditions of the Central Limit Theorem obtain (Mood, Graybill, and Boes, 1974), the mathematical process of multiplying a series of random variables will produce a new random variable (the product) which tends (in the limit) to be Lognormal in character, regardless of the distributions from which the input variables arise (Benjamin and Cornell, 1970). Finally, Lognormal distributions are self-replicating under multiplication and division, i.e., products and quotients of independent Lognormal random variables are themselves Lognormal (Crow and Shimizu, 1988; Aitchison and Brown, 1957).
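The second reason can be sketched in one line. Assuming positive, independent factors X_1, ..., X_n (symbols introduced here only for illustration), taking the logarithm turns the product into a sum, to which the Central Limit Theorem applies after the usual standardization:

```latex
\ln\Bigl(\prod_{i=1}^{n} X_i\Bigr) \;=\; \sum_{i=1}^{n} \ln X_i
\;\approx\; \text{normal for large } n
\quad\Longrightarrow\quad
\prod_{i=1}^{n} X_i \;\approx\; \text{lognormal}.
```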

The Standard Normal Distribution

Since the lognormal distribution is defined through the normal distribution, every property of the lognormal can be derived from the properties of the normal distribution.

A random variable Z is said to have the standard normal distribution if it has the probability density function φ given by

$$\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) \quad \text{for } z \in \mathbb{R}.$$

The normal distribution with location parameter μ in R and scale parameter σ > 0 has probability density function f given by



$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \quad \text{for } x \in \mathbb{R}.$$

The Lognormal Distribution

A random variable X is said to have the lognormal distribution, with parameters μ and σ, if ln(X) has the normal distribution with mean μ and standard deviation σ. Equivalently,

X = exp(Y)

where Y is normally distributed with mean μ and standard deviation σ. While the parameter μ can be any real number, the parameter σ must be a positive real number. The lognormal distribution is used to model continuous random quantities when the distribution is believed to be skewed, such as certain income and lifetime variables.

The lognormal density function, with parameters μ and σ, is given by


$$f(x) = \frac{1}{x\,\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{[\ln(x)-\mu]^2}{2\sigma^2}\right\} \quad \text{for } x > 0.$$

The parameter μ is the mean and σ is the standard deviation of the normal random variable ln[X], not of the lognormal random variable X. Although this is sometimes confusing, μ is also the median of the normal random variable ln[X], because μ is the median of N(μ, σ). The relation ln[X] ~ N(μ, σ) represents the lognormal random variable X in "logarithmic space": the random variable ln[X] follows a normal distribution, but the random variable X follows a lognormal distribution.


Note that the lognormal pdf differs from the normal pdf not only by the replacement of x by ln(x) but also by an additional factor x in the denominator x (2π)^(1/2) σ, which comes from the change of variables from X to ln[X].
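The origin of the extra factor can be written out as a short worked step. Here f_Y denotes the normal density of Y = ln(X) (a symbol introduced only for this sketch), and the factor 1/x is the derivative of ln(x):

```latex
f_X(x) \;=\; f_Y(\ln x)\,\left|\frac{d}{dx}\ln x\right|
       \;=\; \frac{1}{x}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}
             \exp\left\{-\frac{[\ln(x)-\mu]^2}{2\sigma^2}\right\},
       \qquad x > 0 .
```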

The moments of the lognormal distribution can be computed from the moment generating function of the normal distribution, since E(X^n) = E[exp(nY)] with Y = ln(X). For a lognormal distribution with parameters μ and σ it follows that

$$E(X^n) = \exp\left(n\mu + \frac{n^2\sigma^2}{2}\right).$$

The mean and variance of X are

  1. $E(X) = \exp(\mu + \sigma^2/2)$.

  2. $\mathrm{var}(X) = \exp[2(\mu + \sigma^2)] - \exp(2\mu + \sigma^2)$.

The median is $\exp(\mu)$ and the mode is $\exp(\mu - \sigma^2)$, see Appendix.

Even though the lognormal distribution has finite moments of all orders, the moment generating function is infinite at any positive number. This property is one of the reasons for the fame of the lognormal distribution.

The lognormal distribution arises from many small, multiplicative random effects, in contrast to additive random effects that lead to the normal distribution. It is used extensively in reliability applications to model failure times. The lognormal and Weibull distributions are probably the most commonly used distributions in reliability applications.

The lognormal is skewed to the right. For a given μ the skewness increases as σ increases.

How important this is for cancer patients can hardly be overstated!


[Figure: examples of lognormal pdfs; the variable x here is the patient survival time T > 0.]

Caution!

Some authors use the base-10 logarithm instead of the natural logarithm, for example because they use logarithmic paper to derive the parameters graphically; that is, they assume that log10(X) is normally distributed. The parameters are then μ10 and σ10, which are related to μ and σ by:

$$\mu = \ln(10)\,\mu_{10}$$

$$\sigma = \ln(10)\,\sigma_{10}$$
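The conversion is a single multiplication by ln(10) ≈ 2.3026. A minimal sketch in C, with a hypothetical helper name:

```c
#include <math.h>

/* Convert parameters fitted to log10(X) into natural-log parameters.
   The function name is illustrative only. */
void log10_to_ln_params(double mu10, double sigma10,
                        double *mu, double *sigma)
{
    const double ln10 = log(10.0);   /* C's log() is the natural logarithm */
    *mu = ln10 * mu10;
    *sigma = ln10 * sigma10;
}
```

For instance, μ10 = 1 and σ10 = 0.5 correspond to μ ≈ 2.3026 and σ ≈ 1.1513.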

Example: Describing survival of cancer patients observed in a truncated time interval.

We consider a medical example with cancer patients. Observation starts at time T0 and ends at TN, and within successive intervals we record the number of patients who died. We require that the integral over the truncated range (the number of patients) is the same for the observed data and for the theoretical distribution. If the range extended from 0 to infinity, this integral would be the number of all patients considered in the study, and the theoretical and observed totals would again be the same. The only parameters to be found are σ and μ.

We consider a real case with cancer patients (mesothelioma). The data set is a table of deaths per time interval: for example, 10 patients died in the interval from 5 to 15 weeks, 8 in the interval from 15 to 25 weeks, and so on. In tabular form, for nT interval boundaries:

T (weeks)    Cases
5-15         10
15-25        8
25-35        9
35-50        9
50-80        10
80-200       10



Is this described by a lognormal distribution? The purpose of the work is to provide a baseline for comparison. We are interested in comparing the results for this data set with another set that considers patients who take a new drug that is supposed to "extend their lifetime".

We transform the T values to x values of the standard normal distribution in order to calculate the cumulative probability (log denotes the natural logarithm, as in C):

x = (log(T) − μ)/σ;

The polynomial f(x), see Appendix, is given by Abramowitz and Stegun but is valid only for non-negative x. For negative x we have to use 1 − f(−x).

We use a function, say PInt(x), to calculate the cumulative probability from −infinity to x for the standard normal distribution N(0,1). It increases from 0 at −infinity to 1 at +infinity and, by symmetry, PInt(0) = 0.5.

#include <math.h>

// Cumulative probability of the standard normal distribution N(0,1)
// from -infinity to x (polynomial approximation, Abramowitz and Stegun)
double PInt(double x)
{
    const double d1 = 0.0498673470;
    const double d2 = 0.0211410061;
    const double d3 = 0.0032776263;
    const double d4 = 0.0000380036;
    const double d5 = 0.0000488906;
    const double d6 = 0.0000053830;
    double Pval;

    // The polynomial is only valid for x >= 0;
    // for negative x use the relation P(-x) = 1 - P(x)
    double y = (x < 0.0) ? -x : x;

    // Use the Horner scheme for the polynomial f(x)
    Pval = 1.0 + y*(d1 + y*(d2 + y*(d3 + y*(d4 + y*(d5 + y*d6)))));
    Pval = 1.0 - 0.5/pow(Pval, 16);

    if (x < 0.0)
        Pval = 1.0 - Pval;

    return Pval;
}


We now calculate the Chi2 and the corresponding probability for the data set, for given parameters σ and μ. We assume that some minimization method is used to minimize this Chi2 with respect to the two parameters.


double SumE = 0.0;
double SumO = 0.0;
double SumChi2 = 0.0;
double Chi2;
int i;

// PxVal[i] holds PInt(x) at the i-th boundary of the time intervals,
// with x = (log(T) - μ)/σ for the boundary time T.
// RVal[i]: integral of the distribution over the i-th time interval
// EVal[i]: expected number of cases in the i-th time interval
for (i = 0; i < nT-1; i++) {
    RVal[i] = PxVal[i+1] - PxVal[i];
    EVal[i] = NFitCases*RVal[i]/(PxVal[nT-1] - PxVal[0]);
}

for (i = 0; i < nT-1; i++) {
    Chi2 = pow((ObsVal[i] - EVal[i]), 2)/EVal[i];
    SumChi2 += Chi2;
    SumE += EVal[i];
    SumO += ObsVal[i];   // These are the observed values from the table
}


For the particular case we obtain from the minimization algorithm (not provided here)

Minimum for lognormal parameters μ = 3.605840, σ = 0.970540

Sum Chi**2 = 0.586 // Chi2

Sum E = 56.000 // Expected Sum

Sum O = 56.000 // Observed Sum

DOF (Degrees of Freedom) = nT − 4 = 3 (nT − 1 = 6 ranges, minus 2 fitted parameters, minus 1 for the normalization to the observed total)

Probability = 0.899604 for this Chi2 and DOF


The results show that the survival times of the patients can be modeled well by the lognormal distribution.


We calculate the probability P that a patient will survive a given time in weeks. We use 1 − PInt(x) to find the fraction of patients still alive after a time T.


T (weeks)    P
20           0.7352
30           0.5835
40           0.4659
50           0.3762
60           0.3074
70           0.2539
80           0.2119
90           0.1785
100          0.1516
120          0.1117
140          0.0844


The results in the table show that after 20 weeks (about 4.6 months) around 26% of the patients will have died, and only approximately 8% will survive 140 weeks (2.7 years).



References


Abramowitz, M. and Stegun, I.A., Eds. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, National Bureau of Standards, Applied Mathematics Series Number 55, Issued June 1964, Tenth Printing with corrections in December 1972, US Government Printing Office, Washington, DC.

Aitchison, J. and Brown, J.A.C. 1957. The Lognormal Distribution, Cambridge University Press, Cambridge, UK.

Benjamin, J.R. and Cornell, C.A. 1970. Probability, Statistics, and Decision for Civil Engineers, McGraw Hill, New York, NY.

Crow, E.L. and Shimizu, K., Eds. 1988. Lognormal Distributions: Theory and Application, Dekker, New York, NY.

McAlister, D. 1879. Proc. Roy. Soc. 29, 367.

Limpert, E., Stahel, W.A. and Abbt, M. 2001. Lognormal distributions across the sciences: keys and clues, Bioscience 51 (5), 341-352.

Mould, R.F., Lederman, M., Tai, P. and Wong, J.K.M. 2002. Methodology to predict long-term cancer survival from short-term data using Tobacco Cancer Risk and Absolute Cancer Cure models, Phys. Med. Biol. 47, 3893–3924. (Note the typing error on page 3901: d6 = 0.0007 005 383 0 should be d6 = 0.0000 05 383, i.e. remove the 7.)

Asselain, B., De Rycke, Y., Savignon, A. and Mould, R.F. 2003. Parametric modelling to predict survival time to first recurrence for breast cancer, Phys. Med. Biol. 48, L31–L33.

Mould, R.F., Lahanas, M. et al. Lognormal modelling of malignant pleural mesothelioma reference baseline survival rates: a study of 5563 cases, submitted for publication February 2004.

Hattis, D.B. and Burmaster, D.E. 1994. Assessment of Variability and Uncertainty Distributions for Practical Risk Assessments, Risk Analysis, Volume 14, Number 5, pp 713 – 730.

Ott, W.R. 1990. A Physical Explanation of the Lognormality of Pollutant Concentrations, Journal of the Air and Waste Management Association, Volume 40, pp 1378 et seq.

Ott, W.R. 1995. Environmental Statistics and Data Analysis, Lewis Publishers, Boca Raton, FL.

Mood, A.M., Graybill, F.A., and Boes, D.C. 1974. Introduction to the Theory of Statistics, Third Edition, McGraw Hill, New York, NY.


Appendix


Polynomial approximation P(x)

Polynomial approximation P(x) for the cumulative normal distribution, which can also be used to calculate the cumulative probability of the lognormal distribution. The polynomial f(x) is valid only for non-negative x: P(x) = f(x) if x is positive or 0, otherwise P(x) = 1 − f(−x).

$$f(x) = 1 - \tfrac{1}{2}\left(1 + d_1 x + d_2 x^2 + d_3 x^3 + d_4 x^4 + d_5 x^5 + d_6 x^6\right)^{-16} + \varepsilon(x), \qquad |\varepsilon(x)| < 1.5 \times 10^{-7}$$

d1 = 0.0498673470
d2 = 0.0211410061
d3 = 0.0032776263
d4 = 0.0000380036
d5 = 0.0000488906
d6 = 0.0000053830

Note that the polynomial is valid only for non-negative x values!

mean
The sum of a list of numbers, divided by the total number of numbers in the list. Also called arithmetic mean

median
"Middle value" of a list. The smallest number such that at least half the numbers in the list are no greater than it. If the list has an odd number of entries, the median is the middle entry in the list after sorting the list into increasing order. If the list has an even number of entries, the median is equal to the sum of the two middle (after sorting) numbers divided by two. The median can be estimated from a histogram by finding the smallest number such that the area under the histogram to the left of that number is 50%

mode
For lists, the mode is the most common (frequent) value. A list can have more than one mode. For histograms, a mode is a relative maximum ("bump")


Lognormal Generalization with 3 parameters


$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma\,(x-a)} \exp\left\{-\frac{[\ln(x-a)-\mu]^2}{2\sigma^2}\right\} \quad \text{for } x > a \ (0 \text{ otherwise}).$$

Parameters

  • Location parameter: a, real number (shift)

  • Shape parameter: σ > 0

  • Scale parameter: exp(μ), with μ a real number


Support: x in (a, ∞)


  • Mode: $a + \exp(\mu - \sigma^2)$

  • Median: $a + \exp(\mu)$

  • Mean: $a + \exp(\mu + \sigma^2/2)$

  • Variance: $\exp(2\mu + \sigma^2)\,[\exp(\sigma^2) - 1]$



Generation Algorithm

Generate a normal random number r from N(μ, σ^2), i.e. with mean μ and variance σ^2

Return a + exp(r)
