Problem 6: Clustering
A psychologist tests N different individuals. The score for each individual is a pair of
real numbers, x and y, that the psychologist treats as
points in a plane. The psychologist
then wants to separate the individuals into at least 2 groups, with at least 2
individuals in each group, based on the proximity of their scores. The decision is made to use the Euclidian
distance between two scores as a measure of their proximity. That is, for two scores (x1,y1)
and (x2,y2), their proximity is defined by the value of
the expression
.
To place the individuals in groups, the psychologist decides to
use the following steps. (1) The two
individuals whose scores are closest together are placed in group 1; call them A
and B. (2) The next two individuals
(not in group 1) with the closest scores are placed in group 2; call them C and
D. (Clearly at least four individuals
must be tested.) (3) The next ungrouped
individual, called E, to be included in a group is the one closest to any
individual already in a group or any other ungrouped individual. If E is closer to A or B than to C, D, or
any other ungrouped individual, then E joins group 1. Likewise, if E is closer to C or D than to A, B, or any other
ungrouped individual, then E joins group 2.
Finally, if E is closer to another ungrouped individual, say F, than A,
B, C, or D, then a new group is formed from E and F. If the shortest distance to an existing group is within 0.001 of
the shortest distance to an ungrouped individual, or doesnt uniquely identify
the group in which E should be placed, then E is placed in the existing group
created earliest that has a member closest to E. This step (3) is repeated for all remaining ungrouped
individuals.
Input
There will be multiple
cases. For each case there appears an
integer N specifying the number of individuals tested; N will be
at least 4, but no larger than 100.
There will then follow N pairs of real x and y values, one for
each individual. The first pair is for
individual 1, the second pair is for individual 2, and so forth.
Output
For each case, first
display the case number (theyre numbered sequentially starting with 1) on a
line by itself. Then display the
identity of the individuals in each group, one line per group. Use the format shown in the samples
below. Specifically, output lines must
be no longer than 79 characters, indentation (as shown in the samples) is
required, and commas are required between the individual identity numbers.
Sample Input
4
0.0 0.0 0.0 3.0 4.0 0.0 3.0 5.0
6
0.0 0.0 0.0 3.0 4.0 0.0 3.0 5.0 2.0 6.0 3.0
1.0
25
1 0 1 1 100 100 100 101 1 2 1 3 1 4 1 5 1 6
1 7 1 8 1 9
1 10 1 11 1 12 1 13 1 14 1 15 1 16 1 17 1 18
1 19 1 20 1 21 1 22
0
Expected Output
Case 1:
Group 1: 1, 2
Group 2: 3, 4
Case 2:
Group 1: 3, 6
Group 2: 4, 5
Group 3: 1, 2
Case 3:
Group 1: 1, 2, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25
Group 2: 3, 4