
Content similarity


Heat map showing the "hot spots"

The neural network provides a well-founded similarity measure, based on information-theoretical principles, that allows documents to be compared according to their content.

Documents that lie close together on the map are highly similar in content, and highly similar documents lie close together. In mathematical terms, the similarity measure is the weighted scalar product of the two document vectors, corrected by the Kullback-Leibler distance from the main themes, and combined with the weighted sum of the scores of the matching keywords and of their nodes in the taxonomy tree.
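The three ingredients named above can be illustrated with a minimal Python sketch. The exact patented formula is not given here, so everything about how the terms are combined is an assumption: the linear combination, the use of the absolute difference of the two Kullback-Leibler distances as the correction, the parameters `kl_weight` and `keyword_weight`, and the dictionary layout of a document are all hypothetical and chosen only for illustration.

```python
import math

def weighted_dot(u, v, weights):
    """Weighted scalar product: sum_i w_i * u_i * v_i."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def similarity(doc_a, doc_b, weights, main_themes, keyword_scores,
               kl_weight=1.0, keyword_weight=1.0):
    """Illustrative combination of the three terms (an assumption, not the
    actual InfoCodex formula)."""
    # 1. Weighted scalar product of the two document vectors.
    base = weighted_dot(doc_a["vector"], doc_b["vector"], weights)
    # 2. Correction: how differently the two documents deviate from the
    #    main themes (hypothetical reading of "corrected by").
    kl_a = kl_divergence(doc_a["themes"], main_themes)
    kl_b = kl_divergence(doc_b["themes"], main_themes)
    correction = kl_weight * abs(kl_a - kl_b)
    # 3. Weighted sum of the scores of the keywords both documents share.
    shared = doc_a["keywords"] & doc_b["keywords"]
    keyword_term = keyword_weight * sum(keyword_scores.get(k, 0.0) for k in shared)
    return base - correction + keyword_term

# Made-up example data: three documents over three themes.
weights = [1.0, 1.0, 0.5]
main_themes = [0.5, 0.3, 0.2]
keyword_scores = {"biomarker": 2.0, "protein": 1.0, "finance": 1.5}
doc_a = {"vector": [0.8, 0.1, 0.1], "themes": [0.7, 0.2, 0.1],
         "keywords": {"biomarker", "protein"}}
doc_b = {"vector": [0.7, 0.2, 0.1], "themes": [0.6, 0.3, 0.1],
         "keywords": {"biomarker"}}
doc_c = {"vector": [0.1, 0.1, 0.8], "themes": [0.1, 0.2, 0.7],
         "keywords": {"finance"}}

sim_ab = similarity(doc_a, doc_b, weights, main_themes, keyword_scores)
sim_ac = similarity(doc_a, doc_c, weights, main_themes, keyword_scores)
print(sim_ab, sim_ac)
```

With this made-up data, the two documents that share a theme profile and a keyword score higher than the pair that shares neither, which is the qualitative behaviour the text describes.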

The patented similarity measure is independent of the document language and depends only weakly on the exact wording. It allows InfoCodex to recognize document families, i.e. documents that are very similar in content but do not necessarily share the same keywords or expressions. This is also the fundamental process for the automatic generation of abstracts.

The similarity measure is a solid basis for:

  • similarity search: finding documents similar to a given text block
  • reliable ranking of search results: scoring the similarity between documents and a search query
  • recognition of new facts: detecting notable differences between new and existing documents
  • matching of documents that are similar in content
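The first three applications above can be sketched in a few lines once some similarity function is available. The sketch below uses plain cosine similarity as a stand-in, not the InfoCodex measure; the function names `rank_by_similarity` and `is_novel`, the `threshold` value, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here only as a stand-in similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(query, corpus, sim=cosine):
    """Similarity search and ranking: score every document against the
    query vector and return (name, score) pairs, best match first."""
    scored = [(name, sim(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def is_novel(new_doc, corpus, threshold=0.8, sim=cosine):
    """Recognition of new facts: a document counts as novel if it is not
    similar (score below threshold) to any existing document."""
    return all(sim(new_doc, vec) < threshold for vec in corpus.values())

# Made-up document vectors for illustration.
corpus = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}

ranking = rank_by_similarity([1.0, 0.0, 0.0], corpus)
print(ranking)
print(is_novel([0.0, 1.0, 0.0], corpus))
```

Because the similarity function is passed in as the `sim` parameter, any other measure with the same two-argument shape could be substituted without changing the search or novelty logic.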
