Dealing with Classified Data

Many data sources consist of numeric or alphabetic codes that represent classification schemes or groupings. Such classified data requires special care when choosing a method of data aggregation.

Treatment as nominal or ordinal

In most cases, classified data should be treated as nominal or ordinal data, using PARS to produce a table of the unique class names and the percentage of the target polygon that each class occupies. For example, when population data has been classified into ranges of values rather than recorded as actual counts, ordinal interpolation is appropriate. The result is a table, related to the target polygon coverage, that lists each source population class intersecting the target polygon and its percentage of the target.
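
The class-percentage table described above can be illustrated with a minimal sketch. This is not the PARS implementation. It assumes the intersection areas between the source polygons and one target polygon have already been computed by a GIS overlay, and the function name class_percentages, the area values, and the class labels are hypothetical.

```python
from collections import defaultdict


def class_percentages(target_area, pieces):
    """Return {class_name: percent of the target polygon's area}.

    pieces: iterable of (class_name, intersect_area) pairs, one per
    source polygon fragment that falls inside the target polygon.
    """
    if target_area <= 0:
        raise ValueError("target_area must be positive")
    totals = defaultdict(float)
    for class_name, area in pieces:
        # Several fragments may carry the same class; sum their areas.
        totals[class_name] += area
    return {name: 100.0 * area / target_area for name, area in totals.items()}


# Hypothetical target polygon of 80 area units overlapped by four fragments.
pieces = [
    ("0 to 99 persons", 20.0),
    ("100 to 200 persons", 30.0),
    ("0 to 99 persons", 20.0),
    ("greater than 200 persons", 10.0),
]
table = class_percentages(80.0, pieces)
for name, pct in sorted(table.items()):
    print(f"{name}: {pct:.1f}%")
```

The classes are kept as labels throughout, so no numeric meaning is assigned to them. Only the areas are summed and divided.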

Conversion to count values

In cases where numeric values are required in the target polygons, count values can be substituted for classified data, either before or after the interpolation process. This approach is problematic because it can introduce substantial errors into the dataset.

One method of substituting count values for a class range is to use the midpoint of the class. For example, a class range of "100 to 200 persons" might be represented by the count value 150. The difficulty with this approach arises at the extreme high and low classes. What value should represent the "0 to 99 persons" class? Many areas in Canada have 0 persons, so 0 may be a better substitute for this class than the midpoint of about 50. It is equally difficult to determine an appropriate count value for the open-ended class "greater than 200 persons"; a substitute must be selected with some knowledge of the data.
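
The midpoint substitution can be sketched as follows, under stated assumptions. The function name substitute_counts and the override mechanism are hypothetical, and the value 300 chosen for the open-ended class is purely illustrative, not a recommendation. The sketch refuses to guess a value for an open-ended class, since that choice depends on knowledge of the data.

```python
def substitute_counts(classes, overrides=None):
    """Map each class label to a single count value.

    classes: {label: (low, high)}; high is None for an open-ended class.
    overrides: {label: value} chosen by the analyst from knowledge of the data.
    """
    overrides = overrides or {}
    counts = {}
    for label, (low, high) in classes.items():
        if label in overrides:
            # An analyst-supplied value takes priority over the midpoint.
            counts[label] = overrides[label]
        elif high is None:
            # No midpoint exists for an open-ended class.
            raise ValueError(f"open-ended class {label!r} needs an explicit override")
        else:
            counts[label] = (low + high) / 2
    return counts


classes = {
    "0 to 99 persons": (0, 99),
    "100 to 200 persons": (100, 200),
    "greater than 200 persons": (200, None),
}
counts = substitute_counts(
    classes,
    overrides={
        "0 to 99 persons": 0,             # many areas have no residents
        "greater than 200 persons": 300,  # illustrative value only
    },
)
print(counts)
```

Without the override for the lowest class, the function would return its midpoint of 49.5, which may overstate the population of sparsely settled areas.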

Source: modified from Ballard and Schut, 1995