I trained an ANN (a two-layer perceptron) that takes a sequence of characters plus two candidate next characters, and produces a number in [0, 1] representing which of the two candidates is more likely. Running the net on every pair of candidate characters and aggregating the results (e.g., counting how often each candidate wins its comparisons) yields the most likely next character after the input sequence. This is an example of ordinal probability, and is nice in this case because it doesn't require a fixed number of discrete outputs; you could potentially have unlimited possible outputs. This is in contrast to using softmaxed vectors (as explicit probability vectors), which limit you to N possible outputs.
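Here's a minimal sketch of the idea, under several assumptions not stated above: one-hot character encoding, an MLP with two hidden layers as the pairwise comparator, and simple win-counting as the aggregation step. All names, sizes, and the PyTorch choice are illustrative, not the original setup.

```python
# Sketch of a pairwise ("ordinal") next-character predictor.
# Assumed: one-hot encoding, two hidden layers, win-count aggregation.
import itertools
import torch
import torch.nn as nn

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")
CTX_LEN = 8  # assumed fixed context length


def one_hot(chars: str) -> torch.Tensor:
    """Encode a string as a flat one-hot vector over VOCAB."""
    v = torch.zeros(len(chars), len(VOCAB))
    for i, c in enumerate(chars):
        v[i, VOCAB.index(c)] = 1.0
    return v.flatten()


class PairwiseComparator(nn.Module):
    """Maps (context, cand_a, cand_b) to a score in [0, 1];
    > 0.5 means cand_a is judged more likely than cand_b."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        in_dim = (CTX_LEN + 2) * len(VOCAB)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, ctx: str, a: str, b: str) -> torch.Tensor:
        x = torch.cat([one_hot(ctx), one_hot(a), one_hot(b)])
        return self.net(x)


def predict_next(model: PairwiseComparator, ctx: str) -> str:
    """Compare every candidate pair and return the biggest winner."""
    wins = {c: 0 for c in VOCAB}
    with torch.no_grad():
        for a, b in itertools.combinations(VOCAB, 2):
            if model(ctx, a, b).item() > 0.5:
                wins[a] += 1
            else:
                wins[b] += 1
    return max(wins, key=wins.get)
```

Note that exhaustive pairwise comparison costs O(N²) forward passes per prediction over an N-character vocabulary; win-counting is just one simple way to "do statistics on the results" of those comparisons.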
Anyway, it seemed to perform better than a straight softmaxed-vector predictor given the same training time, so it might be something to look into more.