Equilateral encoding is a means by which class (categorical) data can be normalized into an array of floating-point values. Equilateral encoding accomplishes the same goal as one-of-n/one-hot/dummy variables: it encodes nominal values (classes) for input into machine learning. It has been reported to sometimes give better results than one-of-n encoding (Masters, 1993).
Equilateral encoding is somewhat rare compared to the much more popular dummy-variable/one-hot encoding methods commonly used today. I frequently used equilateral encoding in some of my earlier work. Because I have not seen a substantial improvement from equilateral encoding, I generally stick with the more common one-hot encoding for categorical data. Equilateral encoding is generally reported to bring two main features to the table:
- Requires one fewer output than one-of-n
- Spreads the “blame” for an error across more neurons than one-of-n
Equilateral encoding uses one fewer output than one-of-n. This means that if you have ten categories to encode, one-of-n will require ten outputs while equilateral will require only nine. This gives you a slight performance boost that might have been more important in 1991 than 2017.
During training, the output neurons are constantly checked against the ideal output values provided in the training set. The error between the actual output and the ideal output is represented by a delta. With one-hot encoding, only a small number of neurons ever contribute to an incorrect answer. Consider a neural network that must handle a categorical value with seven classes, with outputs normalized between 0 and 1. If the ideal were class one and the actual were class two, we would have the following:
Ideal Output:  [1, 0, 0, 0, 0, 0, 0]
Actual Output: [0, 1, 0, 0, 0, 0, 0]
Only two of the output neurons are incorrect, yet the entire group of neurons is part of the answer. Equilateral encoding seeks to spread the “guilt” for this error over more of the neurons. To do this, we must come up with a unique set of values for each class, and each set of values should be the same Euclidean distance from every other. The equal distance makes sure that incorrectly choosing class one for class seven has the same error weight as choosing class two for class four. Equilateral encoding produces a lookup table of n-1 values for each of the n classes. For example, the encoding table for this 7-class example (values rounded to two decimal places) is as follows:
- Class #1: [0.12, 0.28, 0.34, 0.38, 0.40, 0.42]
- Class #2: [0.88, 0.28, 0.34, 0.38, 0.40, 0.42]
- Class #3: [0.50, 0.94, 0.34, 0.38, 0.40, 0.42]
- Class #4: [0.50, 0.50, 0.97, 0.38, 0.40, 0.42]
- Class #5: [0.50, 0.50, 0.50, 0.98, 0.40, 0.42]
- Class #6: [0.50, 0.50, 0.50, 0.50, 0.99, 0.42]
- Class #7: [0.50, 0.50, 0.50, 0.50, 0.50, 1.00]
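If you want to generate this table rather than copy it from a reference, the sketch below shows one common way to construct the codes: start with two classes at -1 and +1, add one class and one dimension at a time while keeping all pairwise distances equal, and finally rescale to the 0-1 range used above. The function name `equilateral_table` is my own choice for illustration, not code from Guiver's original article.

```python
import math


def equilateral_table(n_classes, low=0.0, high=1.0):
    """Build an n_classes x (n_classes - 1) table of equilateral codes.

    Each row is the code for one class, and every pair of rows is the
    same Euclidean distance apart (the rows form a regular simplex).
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")

    # Start with the one-dimensional codes for two classes: -1 and +1.
    m = [[0.0] * (n_classes - 1) for _ in range(n_classes)]
    m[0][0] = -1.0
    m[1][0] = 1.0

    # Add one class (and one dimension) at a time, keeping every pair of
    # codes the same distance apart.
    for k in range(2, n_classes):
        r = float(k)
        f = math.sqrt(r * r - 1.0) / r
        for i in range(k):
            for j in range(k - 1):
                m[i][j] *= f          # shrink the existing simplex
            m[i][k - 1] = -1.0 / r    # push it away from the new vertex
        m[k][k - 1] = 1.0             # the new class sits alone on the new axis

    # Rescale from [-1, 1] to [low, high] (0 to 1 in the table above).
    half = (high - low) / 2.0
    return [[(v + 1.0) * half + low for v in row] for row in m]


if __name__ == "__main__":
    for idx, row in enumerate(equilateral_table(7), start=1):
        print(f"Class #{idx}:", [round(v, 2) for v in row])
```

Running it for seven classes reproduces the table above, row for row.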
Applying this encoding to the earlier example (ideal class one, actual class two), we now have the following:
Ideal Output:  [0.12, 0.28, 0.34, 0.38, 0.40, 0.42]
Actual Output: [0.88, 0.28, 0.34, 0.38, 0.40, 0.42]
Unlike the one-hot encoding example above, none of the target values are zero: every output neuron participates in encoding every class, so the blame for an error is spread across more of the network.
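Decoding works in the opposite direction: the usual approach is to map a trained network's raw outputs back to whichever class code is nearest by Euclidean distance. The sketch below reuses the hypothetical `equilateral_table` function from the previous listing, checks that every pair of codes in the 7-class table really is the same distance apart, and decodes a slightly noisy output vector back to class #1.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def equilateral_decode(output, table):
    """Return the index of the class whose code is nearest to the output."""
    return min(range(len(table)), key=lambda i: euclidean(output, table[i]))


table = equilateral_table(7)  # from the sketch above

# Every pair of class codes is the same distance apart, so confusing any
# class with any other carries the same error weight.
distances = {round(euclidean(table[i], table[j]), 6)
             for i in range(7) for j in range(i + 1, 7)}
print("distinct pairwise distances:", distances)  # one value, about 0.764

# A noisy output vector still decodes to the nearest class (class #1 here).
noisy = [v + 0.05 for v in table[0]]
print("decoded class index:", equilateral_decode(noisy, table))  # prints 0
```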
I originally learned of equilateral encoding from the two sources below. Guiver is the creator of the algorithm, but I have never been able to find Guiver’s original article; I learned of the method from Masters. To see how to calculate the values for equilateral encoding, you can see my JavaScript example.
- Masters, T. (1993). Practical neural network recipes in C++. Morgan Kaufmann.
- Guiver, J. P., & Klimasauskas, C. C. (1991). Applying neural networks, Part IV: Improving performance. PC AI Magazine, 5, 34-41.