The neural network provides a well-founded similarity measure,
based on information-theoretic principles, that allows documents
to be compared according to their content.
Documents that lie close to each other are highly similar
(and vice versa). In mathematical terms, the similarity measure combines
two components: the weighted scalar product of the two document vectors,
corrected by the Kullback-Leibler distance from the main themes, and the
weighted score sum of the matching keywords and of their nodes in the
taxonomy tree.
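The structure of such a measure can be illustrated with a minimal Python
sketch. It is not the patented InfoCodex formula, which is not spelled out
here. The Doc class, the function names, the coefficients alpha, beta and
gamma, the symmetrised Kullback-Leibler term between the two theme
distributions and the use of the minimum score for a match are all
assumptions made for illustration only.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    # All fields are hypothetical; the source does not define a document model.
    vector: list                                   # content vector
    themes: list                                   # probabilities over main themes
    keywords: dict = field(default_factory=dict)   # keyword -> score
    taxonomy: dict = field(default_factory=dict)   # taxonomy node -> score


def weighted_dot(u, v, w):
    """Weighted scalar product: sum of w_i * u_i * v_i."""
    return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q); q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def match_score(a, b):
    """Score sum over entries present in both dicts (assumed: min of the two scores)."""
    return sum(min(a[k], b[k]) for k in a.keys() & b.keys())


def similarity(a, b, w, alpha=1.0, beta=1.0, gamma=1.0):
    # Assumption: the "correction" is a symmetrised KL penalty between theme profiles.
    kl = 0.5 * (kl_divergence(a.themes, b.themes) + kl_divergence(b.themes, a.themes))
    return (weighted_dot(a.vector, b.vector, w)
            - alpha * kl
            + beta * match_score(a.keywords, b.keywords)
            + gamma * match_score(a.taxonomy, b.taxonomy))


# Usage: d1 and d2 share no keywords but have the same content profile.
w = [1.0, 1.0, 1.0]
d1 = Doc([1.0, 0.0, 1.0], [0.8, 0.2], {"neural": 2.0}, {"AI": 1.0})
d2 = Doc([1.0, 0.0, 1.0], [0.8, 0.2], {"network": 2.0}, {"AI": 1.0})
d3 = Doc([0.0, 1.0, 0.0], [0.1, 0.9], {"finance": 2.0}, {"Economy": 1.0})
print(similarity(d1, d2, w), similarity(d1, d3, w))
```

In this toy setup d1 and d2 score higher against each other than d1 does
against d3, even though d1 and d2 have no keyword in common. Their vectors,
theme profiles and taxonomy nodes carry the match.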
The patented similarity measure is independent of the document
language and depends only weakly on the exact wording. It allows
InfoCodex to recognize document families, i.e. documents that are very
similar in content but do not necessarily share the same keywords or
expressions. It is also the fundamental process behind the automatic
generation of abstracts.
The similarity measure is a solid basis for