Dealing with Classified Data
Many data sources consist of numeric or alphabetic codes that
represent classification schemes or groupings. Classified data
requires special care when choosing a method of data aggregation.
Treatment as nominal or ordinal
In most cases, classified data should be treated as nominal or
ordinal data, using PARS to produce a table of unique class names
and the percentage of the target polygon that each class
intersects. For example, when population data has been classified
into ranges of values rather than recorded as actual counts, it is
appropriate to use ordinal interpolation. The result is a table,
related to the target polygon coverage, that lists each source
population class intersecting the target polygon and the percentage
of the target that it covers.
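The kind of table described above can be sketched in a few lines of
Python. This is only an illustration and not the PARS implementation:
the function name class_percentages, the input layout (a list of
class name and intersected area pairs for one target polygon) and the
sample areas are all assumptions made for the example.

```python
from collections import defaultdict


def class_percentages(pieces, target_area):
    """Summarise the classes intersecting one target polygon.

    pieces      -- iterable of (class_name, intersect_area) pairs, one per
                   piece of a source polygon that falls inside the target
                   (hypothetical input layout, assumed for this sketch)
    target_area -- total area of the target polygon, in the same units
    """
    if target_area <= 0:
        raise ValueError("target_area must be positive")

    # Several source polygons may carry the same class, so sum their areas.
    totals = defaultdict(float)
    for class_name, area in pieces:
        totals[class_name] += area

    # Express each class as a percentage of the target polygon's area.
    return {name: 100.0 * area / target_area for name, area in totals.items()}


# Made-up areas for a single target polygon of 100 area units.
pieces = [
    ("0 to 99 persons", 30.0),
    ("100 to 200 persons", 50.0),
    ("0 to 99 persons", 20.0),
]
table = class_percentages(pieces, target_area=100.0)
for name, pct in sorted(table.items()):
    print(f"{name}: {pct:.1f}% of target")
```

The class labels are kept as labels throughout; no arithmetic is done
on the class ranges themselves, which is the point of treating the
data as nominal or ordinal.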
Conversion to count values
In cases where numeric values are required in the target polygons,
count values can be substituted for classified data, either before
or after the interpolation process. This approach is problematic
because the substitution can introduce substantial errors into the
dataset.
One method of substituting count values for a class range is to
use the midpoint of the class. For example, a class range of "100 to
200 persons" might be represented by the count value 150. The
difficulty with this approach arises when dealing with the extreme
high and low classes. What value should be used to represent the "0
to 99 persons" class? Many areas in Canada have 0 persons, and this
may be a better value to substitute for this class than the midpoint
of roughly 50. It is equally difficult to determine an appropriate
count value for the class "greater than 200 persons", which has no
upper bound and therefore no midpoint; a substitute must be selected
with some knowledge of the data.
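A minimal sketch of midpoint substitution, with explicit overrides for
the extreme classes, might look like the following. The function name
substitute_count and the overrides table are hypothetical, and the
value 350 used for the open-ended class is a placeholder chosen only
to make the example run; it is not a recommended value.

```python
def substitute_count(lower, upper, overrides=None):
    """Return a count value to stand in for the class range lower..upper.

    upper=None marks an open-ended class such as "greater than 200".
    overrides maps (lower, upper) to a value chosen by someone who knows
    the data; it takes precedence over the midpoint.
    """
    overrides = overrides or {}
    key = (lower, upper)
    if key in overrides:
        return overrides[key]
    if upper is None:
        # An open-ended class has no midpoint, so refuse to guess.
        raise ValueError("open-ended class needs an explicit substitute value")
    return (lower + upper) / 2


# Assumed overrides: 0 for the lowest class, as suggested in the text;
# 350 for the open-ended class is purely illustrative.
overrides = {(0, 99): 0, (200, None): 350}

mid_class = substitute_count(100, 200, overrides)    # plain midpoint
low_class = substitute_count(0, 99, overrides)       # overridden
high_class = substitute_count(200, None, overrides)  # overridden
print(mid_class, low_class, high_class)
```

Raising an error for an open-ended class without an override is a
deliberate choice in this sketch: it forces the substitute to be
selected by someone with knowledge of the data rather than silently
invented.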
Source: modified from Ballard and Schut, 1995, by Peter Schut
Contact: Head, CanSIS