*AAPG '98 Annual Convention, Advances in Geostatistics, Salt Lake City, Utah, May 17-20, 1998.*


William L. Wingle and
Eileen P. Poeter,
Colorado School of Mines,
Department of Geology and Geological Engineering

In traditional discrete multiple indicator conditional simulation, semivariogram models are based on the spatial variance of data above and below selected thresholds (cut-offs). This approach has two problems: 1) the spatial distribution of a threshold can be difficult to conceptualize, and 2) the ordering of the indicators may influence the results, yet changing the arbitrary order to test sensitivity involves substantial effort. If the conditional simulations are instead based on the indicators themselves (classes), rather than on the thresholds separating the indicators, the spatial statistics are more intuitive and reordering the indicators becomes trivial. When class indicators are used, the indicator order can be switched at any time without recalculating the semivariograms. If thresholds are used and the ordering is changed, all the semivariograms must be recalculated. A final advantage of the class approach is that semivariograms calculated from transition probabilities go directly into the simulation. Despite significant differences between the methods, the simulation results are nearly identical in cases where ordering does not cause differences under the threshold approach. Given the consistency of the class approach and its ease of use, this approach is preferred.

**Introduction**

In traditional Multiple Indicator Conditional Simulation (MICS), the kriged model results are based on semivariograms describing the spatial distribution of the thresholds between indicators. The effect of the order of the indicators on the resulting realizations is rarely evaluated, even though the numerical order is arbitrary. For traditional simulation, the estimated indicator at a location is based on the probability that the location is below each threshold (the number of thresholds equals the number of indicators minus one). A more intuitive approach is based on calculating the probability of occurrence of each individual indicator. This paper presents a technique that uses semivariogram models based on individual indicators (classes), as opposed to the traditional threshold semivariograms, which are based on the indicators below a cut-off versus the indicators above the cut-off.

These differences can be described mathematically as follows. Where the data set has been differentiated
into a finite number of indicators, it is possible to define a random function (Z(x)) whose outcomes will have values
in the range z_{min} to z_{max}. From the definition of the indicators, K thresholds can be defined
(K + 1 equals the number of indicators) where:

(1)

(2)

The first moment of the indicator transform yields:

(3)

where E{I(x:z

(4)

Once the classes are defined, the random variable Z(x) can be transformed into an indicator random variable I(x:z

(5)

and the first moment of the indicator transform yields:

(6)

In this case, the univariate probability distribution function (PDF) is defined. By summing the PDF components, the univariate CDF is obtained.
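As a reading aid, the two indicator transforms described above can be sketched in standard indicator notation. The symbols z_{k} (threshold k), c_{k} (class k), F and f are assumed here, as is the convention that a value exactly equal to a threshold counts as not below it. This is a sketch of the usual formalism and should not be read as the paper's own equations (1) to (6).

```latex
% Threshold indicator for threshold z_k, k = 1, ..., K
I(x : z_k) =
\begin{cases}
1, & Z(x) < z_k \\
0, & \text{otherwise}
\end{cases}
\qquad
E\{I(x : z_k)\} = P\{Z(x) < z_k\} = F(z_k)

% Class indicator for class c_k, k = 1, ..., K + 1
I(x : c_k) =
\begin{cases}
1, & Z(x) \in c_k \\
0, & \text{otherwise}
\end{cases}
\qquad
E\{I(x : c_k)\} = P\{Z(x) \in c_k\} = f(c_k)

% Summing the class (PDF) components gives the threshold (CDF) values
F(z_k) = \sum_{j=1}^{k} f(c_j)
```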

Because the equations that define the class and threshold expectations are fundamentally the same, the class method generates realizations that are as accurate as threshold realizations, but with three advantages. First, it is easier to relate the model semivariograms conceptually to the spatial distribution of the geologic units. When class semivariograms are calculated, the range reflects the average size of the units, whereas the threshold semivariograms represent the distribution of indicators above or below a threshold, which can be difficult to equate to units in complex geologic settings. The first and last class and threshold semivariograms will always be identical (they are based on equivalent indicator sets); however, the intermediate semivariograms may vary substantially. The intuitive sense for the threshold semivariogram range decreases as the number of indicators increases, while class semivariogram ranges still reflect the average size of the unit. The second advantage of using classes is that sensitivity to indicator ordering can be evaluated without developing additional semivariogram models. If thresholds are used, the full suite of threshold semivariogram models must be recalculated for each reordering. The final advantage is that semivariograms can be calculated for transition probabilities (Carle and Fogg, 1996).

The class approach does have several disadvantages: 1) more order relation violations occur, although some violations may also be present with thresholds and simply not be visible because of the techniques used, 2) it is computationally more expensive (one additional kriging matrix must be solved per grid cell), and 3) it requires one additional semivariogram model definition. The last two items are a concern only if ordering sensitivity is not evaluated. If sensitivity to ordering is a concern, preparation for the threshold method requires far more human effort and computer time to develop and analyze the additional semivariogram models.
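The reordering argument can be illustrated with a minimal sketch. The sample labels, the two orderings, and the function names below are hypothetical and chosen only for illustration. The point is that the 0/1 data behind the class semivariograms are the same under any ordering, while the 0/1 data behind the threshold semivariograms change.

```python
def class_indicators(samples, order):
    """Map each class to its 0/1 indicator tuple (1 where the sample is in that class)."""
    return {c: tuple(1 if s == c else 0 for s in samples) for c in order}


def threshold_indicators(samples, order):
    """Return one 0/1 tuple per threshold for a given class ordering.

    Threshold k separates the first k + 1 classes in `order` from the rest,
    so there is one fewer threshold than there are classes.
    """
    rank = {c: i for i, c in enumerate(order)}
    return [tuple(1 if rank[s] <= k else 0 for s in samples)
            for k in range(len(order) - 1)]


# Hypothetical samples and two arbitrary orderings of the same three classes.
samples = ["sand", "clay", "gravel", "clay", "sand"]
order_a = ["clay", "sand", "gravel"]
order_b = ["sand", "gravel", "clay"]

# Class indicator data do not depend on the ordering.
same_class_data = (class_indicators(samples, order_a)
                   == class_indicators(samples, order_b))

# Threshold indicator data do depend on the ordering.
same_threshold_data = (set(threshold_indicators(samples, order_a))
                       == set(threshold_indicators(samples, order_b)))

print(same_class_data)      # True
print(same_threshold_data)  # False
```

Because the class indicator data are unchanged, semivariograms fitted to them can be reused after reordering, whereas each new ordering produces new threshold indicator data that would need new semivariograms.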

**Methods**

To use classes, the threshold simulation process is modified at the data definition level, and in the evaluation of the kriged CDF.

**Data Definition**

To calculate a threshold indicator semivariogram, an individual threshold is selected. All values below the threshold are assigned a 1, and values above the threshold are assigned a 0. For class semivariograms, locations with sample values that are in the class being evaluated are set to 1, the remaining values are set to 0.
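The two coding rules can be sketched as follows. The sample values, the thresholds, and the function names are hypothetical, and treating a value exactly equal to a threshold as not below it is an assumption, since the text does not say how ties are handled.

```python
def threshold_indicator(values, threshold):
    """Code each value as 1 if it is below the threshold, otherwise 0."""
    return [1 if v < threshold else 0 for v in values]


def class_indicator(values, lower, upper):
    """Code each value as 1 if it falls in the class [lower, upper), otherwise 0."""
    return [1 if lower <= v < upper else 0 for v in values]


# Hypothetical sample values with two thresholds (three classes).
values = [0.5, 1.5, 2.5, 1.2]

below_first = threshold_indicator(values, 1.0)
below_second = threshold_indicator(values, 2.0)
middle_class = class_indicator(values, 1.0, 2.0)

print(below_first)   # [1, 0, 0, 0]
print(below_second)  # [1, 1, 0, 1]
print(middle_class)  # [0, 1, 0, 1]
```

In this sketch the middle class indicator equals the difference between the two adjacent threshold indicators, which is the data-level counterpart of obtaining the PDF from the CDF.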

If imprecise soft data are used (data with non-negligible uncertainty), with associated misclassification
probabilities, the following steps are required. First, the probability that the data correctly, or
incorrectly, reflect the class is defined using misclassification probabilities (p_{1} and p_{2}):

- p_{1}: Given that the actual value is less than the threshold (in the class), p_{1} is the probability that the measured value is less than the threshold (in the class).
- p_{2}: Given that the actual value is not less than the threshold (not in the class), p_{2} is the probability that the measured value is less than the threshold (in the class).

Using indicator thresholds, p_{1} and p_{2} are determined by measuring the
ability of soft information to correctly classify the hard training set data above and below a specified
threshold level. The misclassification probabilities are defined as:

(7)

(8)

(9)

(10)

**FIGURE 1:** Graphical method of calculating p_{1} and p_{2} values for a specific class. Data from CSM Survey Field.
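The verbal definitions of p_{1} and p_{2} can be turned into a simple counting estimate from a training set in which hard and soft data are available at the same locations. This is a minimal sketch based only on those definitions: the function name and the example flags are hypothetical, and it is not the graphical method of Figure 1.

```python
def misclassification_probabilities(actual_in, measured_in):
    """Estimate (p1, p2) from paired training data.

    actual_in[i]   : True if the hard (actual) value is in the class,
                     or below the threshold.
    measured_in[i] : True if the soft (measured) value says the same location
                     is in the class, or below the threshold.
    """
    pairs = list(zip(actual_in, measured_in))
    n_in = sum(1 for a, _ in pairs if a)
    n_out = len(pairs) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("training set needs samples both in and out of the class")
    # p1: measured in the class, given that the actual value is in the class.
    p1 = sum(1 for a, m in pairs if a and m) / n_in
    # p2: measured in the class, given that the actual value is not in the class.
    p2 = sum(1 for a, m in pairs if not a and m) / n_out
    return p1, p2


# Hypothetical training set of eight co-located hard and soft samples.
actual = [True, True, True, False, False, False, False, True]
measured = [True, True, False, False, True, False, False, True]

p1, p2 = misclassification_probabilities(actual, measured)
print(p1, p2)  # 0.75 0.25
```

Soft data that classify perfectly would give p_{1} = 1 and p_{2} = 0, while p_{1} = p_{2} would indicate soft data that carry no information about the class.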

**Difference Between Prior Hard and Prior Soft Data CDF's for Class and Threshold Simulations**

An additional and important difference between class and threshold simulation is the definition and treatment of the difference in the hard data and soft data prior probability distributions. Often, hard and soft data collection techniques suggest different percentages of each indicator occurring at the site. If the simulator uses thresholds, the correction term is based on:

If classes are used, the correction term is based on:

The difference is subtle, but important. For the threshold approach, if the probability of a single threshold varies significantly between the hard and soft data, the importance of the remaining thresholds can be under-valued. Reordering the indicators can alleviate some of this problem. For the class approach, the relative occurrence of each indicator is compared directly; when one class has very different prior hard and prior soft probabilities, the other class estimates are not seriously affected, because the error is not cumulative.
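The cumulative effect can be illustrated by comparing raw prior differences per class and per threshold. The prior proportions and function names below are hypothetical, and the sketch only compares the hard and soft priors; it does not implement the correction terms used by either simulator.

```python
from itertools import accumulate


def class_differences(hard_pdf, soft_pdf):
    """Hard minus soft prior probability for each class."""
    return [h - s for h, s in zip(hard_pdf, soft_pdf)]


def threshold_differences(hard_pdf, soft_pdf):
    """Hard minus soft prior cumulative probability at each threshold.

    The last cumulative value is 1 for both priors, so only the first
    K values (one per threshold) are compared.
    """
    hard_cdf = list(accumulate(hard_pdf))
    soft_cdf = list(accumulate(soft_pdf))
    return [h - s for h, s in zip(hard_cdf[:-1], soft_cdf[:-1])]


# Hypothetical priors for four classes: the hard and soft data disagree
# only about the first and last class.
hard_pdf = [0.2, 0.3, 0.3, 0.2]
soft_pdf = [0.4, 0.3, 0.3, 0.0]

class_diff = class_differences(hard_pdf, soft_pdf)
threshold_diff = threshold_differences(hard_pdf, soft_pdf)

print([round(d, 3) for d in class_diff])
print([round(d, 3) for d in threshold_diff])
```

In this example the two middle classes have identical hard and soft priors, so their class differences are zero, yet every threshold carries the disagreement that originates in the first class.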

**Order Relation Violations**

As with traditional threshold simulation, the class CDF for a particular grid location
may not be monotonically increasing and may not sum to 1.0. These are order relation violations
(ORV's). They can be caused by use of inconsistent semivariogram models for the different thresholds
or classes, or by use of different prior probabilities and p_{1}-p_{2} weights
applied to soft data. Threshold and class methods manage ORV's differently, due to differences
in how the CDF's are generated, and technical difficulties in reducing the threshold CDF to a PDF.

One type of ORV occurs when the CDF declines from one threshold to the next (Figure 2a). A CDF is a cumulative probability, so a declining CDF is an impossibility. It is not possible to determine which threshold causes the problem, so the situation is remedied by assigning the average of the two probabilities to both thresholds. For classes, the equivalent problem is an individual class having a negative probability of occurrence (Figure 2b: indicator #2), which is also an impossibility. In this case, though, it is reasonable to assign that class a zero probability of occurrence. There is no reason to distribute the error to another unrelated indicator.

**FIGURE 2:** For both the class and threshold approach, there are two basic types of order relation violations (ORV). a) An individual CDF probability is less than the CDF of a smaller threshold (the CDF is decreasing); this is equivalent to a class having a negative probability of occurrence. This type of ORV is resolved for thresholds by averaging the two CDF's so that they are equal; for classes, a 0.0 probability of occurrence is assigned to the PDF. b) When cumulative probabilities are greater than 1.0, the value is truncated to 1.0 for the threshold approach, while for the class approach, the probability for each class is proportionally rescaled, so that the CDF will sum to 1.0.

These techniques for managing class ORV's are less biased than the threshold method. This is fortunate, since the class method also produces more ORV's. However, these additional ORV's are basically ignored by the threshold approach (compare Figures 2a vs. 2c).
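The two sets of corrections described above can be sketched as follows. The function names and example values are hypothetical. The threshold correction is a single left-to-right pass, which is a simplification, and rescaling the class probabilities when they sum to less than 1.0 (not only when they exceed it) is an assumption of this sketch.

```python
def fix_threshold_cdf(cdf):
    """Correct a kriged threshold CDF for one grid cell.

    A declining pair of values is replaced by the pair's average, and
    cumulative probabilities above 1.0 are truncated to 1.0.
    """
    fixed = list(cdf)
    for k in range(1, len(fixed)):
        if fixed[k] < fixed[k - 1]:
            mean = (fixed[k] + fixed[k - 1]) / 2.0
            fixed[k - 1] = fixed[k] = mean
    return [min(p, 1.0) for p in fixed]


def fix_class_pdf(pdf):
    """Correct kriged class probabilities for one grid cell.

    Negative probabilities are set to 0.0, then all classes are rescaled
    proportionally so that they sum to 1.0.
    """
    clipped = [max(p, 0.0) for p in pdf]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("all class probabilities are zero or negative")
    return [p / total for p in clipped]


# Hypothetical kriged estimates containing both types of ORV.
threshold_cdf = fix_threshold_cdf([0.3, 0.2, 0.7, 1.1])
class_pdf = fix_class_pdf([0.5, -0.1, 0.4, 0.45])

print(threshold_cdf)
print(class_pdf)
```

In the class case the negative estimate is confined to its own class and the remaining classes keep their relative proportions, whereas the threshold averaging changes the probability at a neighboring threshold as well.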

**Conclusions**

Class simulation has significant advantages over threshold simulation:

- Class simulation is more intuitive.
- Testing simulation sensitivity to indicator ordering is trivial to set up.
- The last CDF value is calculated rather than implied.
- Class simulation better identifies ORV's, and correctly adjusts the weights.
- Differences between hard and soft data prior probabilities tend to be smaller.
- Semivariograms can be calculated from transition probabilities.

The class approach also has several disadvantages:

- Class simulation yields poorer p_{1}-p_{2} estimates.
- Class simulation requires one additional semivariogram model.
- Class simulation is computationally more expensive.

**Acknowledgments**

We appreciate the United States Army Corps of Engineers, Waterways Experiment Station for supporting this research.

**References**

Alabert, F.G., 1987, Stochastic Imaging and Spatial Distributions Using Hard and Soft Information. Master's Thesis, Department of Applied Earth Sciences. Stanford, Stanford University.

Carle, S.F. and G.E. Fogg, 1996, "Transition Probability-Based Indicator Geostatistics." Mathematical Geology, Vol. 28, No. 4, pp. 453-476.

Gómez-Hernández, J.J. and R.M. Srivastava, 1990, "ISIM3D: An ANSI-C Three Dimensional Multiple Indicator Conditional Simulation Program." Computers & Geosciences, Vol. 16, No. 4, pp. 395-440.

Journel, A.G. and C.J. Huijbregts, 1978, Mining Geostatistics. London, Academic Press.