Computer-aided diagnosis

We can define computer-aided diagnosis (CAD) as a diagnosis that a physician makes using the output of a computerized analysis of medical data. Such output could be, for example, the likelihood of malignancy estimated from a mammogram, or data obtained from ultrasound or MRI imaging.

Multiple features are used to classify an observation as normal or abnormal. For example, the size, shape and margin sharpness of a potential breast lesion in a mammogram are used to determine whether cancer is present.

The goal in training a diagnostic classifier is to use a limited dataset to determine the classifier's parameters so that the classifier approximates the likelihood-ratio decision rule. A dataset of features from both normal (without disease) and abnormal (with disease) cases is used to “train” the classifier, i.e. to determine its parameter values so that it correctly classifies other cases of unknown pathology.
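The likelihood-ratio decision rule mentioned above can be sketched for the simplest possible case. This is a minimal sketch, assuming a single feature whose class-conditional densities are known Gaussians; the means, standard deviations and threshold are illustrative assumptions, not values from the source. In practice these densities are unknown, which is why a classifier must approximate the rule from a limited dataset.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution N(mean, std**2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x: float,
                     normal: tuple = (0.0, 1.0),
                     abnormal: tuple = (2.0, 1.0)) -> float:
    """p(x | abnormal) / p(x | normal) under ASSUMED Gaussian class densities,
    each given as a (mean, std) pair."""
    return gaussian_pdf(x, *abnormal) / gaussian_pdf(x, *normal)


def lr_decide(x: float, threshold: float = 1.0, **densities) -> str:
    """Likelihood-ratio rule: call x abnormal when the ratio exceeds the threshold."""
    return "abnormal" if likelihood_ratio(x, **densities) > threshold else "normal"


print(lr_decide(0.5))  # normal   (ratio = exp(-1) < 1)
print(lr_decide(1.5))  # abnormal (ratio = exp(1) > 1)
```

With these assumed densities the ratio equals exp(2x - 2), so a threshold of 1 is the same as calling x abnormal when x > 1. Raising the threshold calls fewer cases abnormal, lowering sensitivity and raising specificity.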


Training a classifier can be viewed as an optimization problem in which the quantity to be maximized is the classifier's performance on an independent dataset. Binary classifiers [1] consider two objective functions:


* The sensitivity, which describes how well they classify the abnormal cases.


* The specificity, which describes how well they classify the normal cases.


There is a trade-off between these two objective functions: it is not always possible to improve sensitivity and specificity simultaneously. Traditional methods of classifier training combine these two objective functions, or two analogous class performance measures, into a single scalar objective function that is then optimized with single-objective optimization techniques. Various combination functions are tried until a suitable objective function is found [2]. Most classifiers, such as artificial neural networks (ANNs) trained with a sum-of-squares error function [3], do not aggregate sensitivity and specificity directly.
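A minimal sketch of the traditional single-objective approach, assuming a one-feature threshold classifier, a small hypothetical training set, and a weighted sum alpha * Sens + (1 - alpha) * Spec as the combination function. The weighted sum is only one possible combination, not necessarily the one used in [2], and the names `sens_spec`, `weighted_objective` and `best_threshold` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical single-feature training data (e.g. a lesion score); not from the source.
NORMAL = [0.2, 0.5, 0.9, 1.3, 1.8]
ABNORMAL = [1.1, 1.6, 2.0, 2.4, 2.9]


def sens_spec(t: float, normal: List[float], abnormal: List[float]) -> Tuple[float, float]:
    """Empirical sensitivity and specificity of the rule 'abnormal if x > t'."""
    sens = sum(x > t for x in abnormal) / len(abnormal)
    spec = sum(x <= t for x in normal) / len(normal)
    return sens, spec


def weighted_objective(t: float, alpha: float, normal, abnormal) -> float:
    """One possible scalar combination: alpha * Sens + (1 - alpha) * Spec."""
    sens, spec = sens_spec(t, normal, abnormal)
    return alpha * sens + (1.0 - alpha) * spec


def best_threshold(alpha: float, normal, abnormal) -> float:
    """Single-objective search using the observed feature values as candidate thresholds."""
    candidates = sorted(set(normal) | set(abnormal))
    return max(candidates, key=lambda t: weighted_objective(t, alpha, normal, abnormal))


for alpha in (0.9, 0.1):
    t = best_threshold(alpha, NORMAL, ABNORMAL)
    print(alpha, t, sens_spec(t, NORMAL, ABNORMAL))
# 0.9 0.9 (1.0, 0.6)
# 0.1 1.8 (0.6, 1.0)
```

Each weight yields a single operating point: with these data alpha = 0.9 favors sensitivity and selects threshold 0.9, while alpha = 0.1 favors specificity and selects 1.8. Exploring the whole trade-off this way requires a separate optimization run for every weight.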

A binary classifier separates two classes of observations (e.g. images) and assigns each new observation to one of them: the normal class (no disease evident) or the abnormal class (indicative of disease), denoted by $\pi_n$ and $\pi_a$, respectively.

Certain characteristics of the observations, called features, are used in making the classification decision. The set of features corresponding to an observation can be expressed as a vector $\mathbf{x} = [x_1, x_2, \ldots, x_p]$. A dataset of known pathology, the so-called training dataset, is used to train the classifier. The $p$-dimensional space spanned by the feature vector is denoted by $S$. An automated classifier uses a parameter vector $\mathbf{w}$ to partition this space into two sets: $C_n(\mathbf{w})$, the set of observations assigned to class $\pi_n$, and $C_a(\mathbf{w})$, the set of observations assigned to class $\pi_a$. The vector $\mathbf{w}$ can represent, for example, the weights of an ANN or the threshold values in a rule-based classifier. For a fixed $\mathbf{w}$, $C_n(\mathbf{w}) \cup C_a(\mathbf{w}) = S$ and $C_n(\mathbf{w}) \cap C_a(\mathbf{w}) = \emptyset$.

Given a measurement $\mathbf{x}$, the classifier assigns $\mathbf{x}$ to class $\pi_n$ if $\mathbf{x} \in C_n(\mathbf{w})$ and to class $\pi_a$ if $\mathbf{x} \in C_a(\mathbf{w})$.

The probability that an observation belonging to class $\pi_a$ is correctly classified is referred to as the sensitivity of the classifier, denoted by $\mathrm{Sens}(\mathbf{w})$. Similarly, the probability that an observation belonging to class $\pi_n$ is correctly classified is referred to as the specificity $\mathrm{Spec}(\mathbf{w})$ of the classifier. Both the sensitivity and the specificity depend explicitly on the choice of $\mathbf{w}$ and implicitly on the underlying distributions of the normal and abnormal observations. The sensitivity measures how well the classifier performs on abnormal cases, whereas the specificity measures how well it performs on normal cases. In practice, the fraction of class $\pi_a$ observations that are correctly classified is used as an estimate of $\mathrm{Sens}(\mathbf{w})$. Likewise, the fraction of class $\pi_n$ observations that are correctly classified is used as an estimate of $\mathrm{Spec}(\mathbf{w})$.
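The definitions above can be illustrated with a hypothetical rule-based classifier in which the parameter vector w holds one threshold per feature; the two features (lesion size and a margin-irregularity score), the data and the helper names are all assumptions for illustration. The sketch shows how a fixed w partitions feature space into a normal and an abnormal region, and how sensitivity and specificity are estimated as fractions of correctly classified training cases.

```python
from typing import Sequence, Tuple

Features = Tuple[float, ...]


def classify(x: Features, w: Features) -> str:
    """Hypothetical rule: x falls in C_a(w) ('abnormal') when every feature exceeds
    its threshold in w; otherwise it falls in C_n(w) ('normal'). Every x receives
    exactly one label, so C_n(w) and C_a(w) partition the feature space."""
    return "abnormal" if all(xi > wi for xi, wi in zip(x, w)) else "normal"


def estimate_sens_spec(normal: Sequence[Features],
                       abnormal: Sequence[Features],
                       w: Features) -> Tuple[float, float]:
    """Estimate Sens(w) and Spec(w) as fractions of correctly classified cases."""
    sens = sum(classify(x, w) == "abnormal" for x in abnormal) / len(abnormal)
    spec = sum(classify(x, w) == "normal" for x in normal) / len(normal)
    return sens, spec


# Hypothetical (size in mm, margin-irregularity score) pairs of known pathology.
normal_cases = [(4.0, 0.2), (6.0, 0.3), (9.0, 0.1), (5.0, 0.7)]
abnormal_cases = [(12.0, 0.8), (8.0, 0.6), (15.0, 0.9), (7.0, 0.2)]
w = (6.5, 0.5)  # hypothetical thresholds on size and irregularity

print(estimate_sens_spec(normal_cases, abnormal_cases, w))  # (0.75, 1.0)
```

With w = (6.5, 0.5) all four normal cases are classified correctly, but the abnormal case with a regular margin, (7.0, 0.2), is missed, giving estimates of 0.75 for sensitivity and 1.0 for specificity. A different w moves cases between the two regions and changes both estimates.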


For multiobjective (MO) diagnostic classification, the members of the Pareto-optimal set, i.e. the parameter vectors w for which no other choice improves sensitivity without lowering specificity (or vice versa), correspond to operating points on an optimal receiver operating characteristic (ROC) curve. Their performances describe the limiting sensitivity–specificity trade-offs that the classifier can provide for the given training dataset.
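A brute-force sketch of how a Pareto-optimal set arises, assuming a one-feature threshold classifier and hypothetical data as a stand-in for a real CAD scheme. Exhaustive enumeration replaces a real multiobjective optimizer (a genetic algorithm, for example) and is feasible only because w is a single threshold here; the function names are hypothetical. Each surviving (Sens, Spec) pair maps to an ROC point with true-positive fraction Sens and false-positive fraction 1 - Spec.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (Sens, Spec)

# Hypothetical single-feature training data; not from the source.
NORMAL = [0.2, 0.5, 0.9, 1.3, 1.8]
ABNORMAL = [1.1, 1.6, 2.0, 2.4, 2.9]


def sens_spec(t: float) -> Point:
    """Empirical (Sens, Spec) of the rule 'abnormal if x > t' on the data above."""
    sens = sum(x > t for x in ABNORMAL) / len(ABNORMAL)
    spec = sum(x <= t for x in NORMAL) / len(NORMAL)
    return sens, spec


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in both objectives and better in at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def pareto_front(points: Dict[float, Point]) -> List[Tuple[float, Point]]:
    """Keep only the parameter values whose (Sens, Spec) pair no other pair dominates."""
    return [(t, p) for t, p in points.items()
            if not any(dominates(q, p) for q in points.values())]


# Exhaustive enumeration stands in for a multiobjective optimizer here.
candidates = sorted(set(NORMAL) | set(ABNORMAL))
front = pareto_front({t: sens_spec(t) for t in candidates})
for t, (sens, spec) in front:
    print(f"t={t}: Sens={sens}, Spec={spec}, ROC point (FPF={1 - spec:.1f}, TPF={sens})")
```

Of the ten candidate thresholds only 0.9, 1.3 and 1.8 survive; each of the others is dominated, meaning some surviving threshold is at least as good on both objectives and strictly better on one.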




References


1 L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, New York: Springer-Verlag, 1996.


2 M. A. Anastasio, H. Yoshida, R. Nagel, R. M. Nishikawa and K. Doi, “A genetic algorithm-based method for optimizing the performance of a computer-aided diagnosis scheme for detection of clustered microcalcifications in mammograms,” Med. Phys. 25, 1613, 1998.


3 C. Bishop, Neural Networks for Pattern Recognition, Oxford, UK: Oxford University Press, 1995.
