Why does InfoCodex need a linguistic database?
The linguistic database in InfoCodex contains over three million classified words/phrases linked to a universal taxonomy (ontology). It forms the basis for cross-lingual content recognition ("understanding the content of a document") and for automatic document categorization by the integrated neural network technology ("grouping of the documents in well-organized bookshelves").
It is chiefly the linguistic database that enables InfoCodex to automatically match documents according to their content and to categorize these documents without any cumbersome training.
How is the linguistic database kept up-to-date?
The InfoCodex linguistic database is based on some 100 major sources, such as Princeton University's WordNet, the EU's EuroVoc, and the United Nations' AGROVOC. It is continuously updated from these sources and is also extended with the names of new celebrities and new brands.
Can users supply their own vocabularies and their own thesaurus?
Yes, they can. For example, a German manufacturer of electronic components easily integrated a structured list of 50,000 parts in the form of a front-end linguistic database; entries in this list take priority over the standard InfoCodex linguistic database.
However, in most cases, neither special vocabularies nor a thesaurus is needed. Adding a thesaurus with 8,000 words to InfoCodex is not an improvement from 0 to 8,000 words, but merely an increase from the already existing 3,000,000 words/phrases to 3,008,000, which is an increase of less than 0.3%.
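The two points of this answer, a user-supplied list that takes priority over the standard database and the small relative gain from a typical thesaurus, can be illustrated with a minimal Python sketch. It is not InfoCodex code: the dictionaries, the category strings, the part number, and the function names `classify` and `relative_growth` are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the standard linguistic database:
# each term maps to a taxonomy category.
STANDARD_DB = {
    "resistor": "electronics/components",
    "bridge": "construction/structures",
}

# Hypothetical user-supplied front-end database (e.g. a parts list).
CUSTOM_DB = {
    "bridge": "electronics/rectifiers",          # overrides the standard meaning
    "xr-4711": "electronics/parts/capacitors",   # made-up part number
}


def classify(term: str) -> Optional[str]:
    """Look a term up in the front-end database first, then in the standard one."""
    key = term.lower()
    if key in CUSTOM_DB:
        return CUSTOM_DB[key]
    return STANDARD_DB.get(key)  # None if the term is unknown to both


def relative_growth(existing: int, added: int) -> float:
    """Percentage by which the vocabulary grows when `added` entries are appended."""
    return 100.0 * added / existing


print(classify("bridge"))      # the front-end entry wins over the standard one
print(classify("resistor"))    # falls back to the standard database
print(f"{relative_growth(3_000_000, 8_000):.2f}%")  # growth from an 8,000-word thesaurus
```

Checking the front-end list before the standard database is one simple way to give it priority without modifying the standard entries; the actual mechanism InfoCodex uses is not described here.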