Main Definitions

Token separator

The character used to split text into tokens. The most common token separator is the space character.
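
A minimal sketch of tokenization, assuming the space character is the token separator. The function name `split_tokens` is hypothetical. Punctuation stays attached to tokens because only the token separator is used here.

```python
def split_tokens(text: str, separator: str = " ") -> list[str]:
    """Split text on the token separator and drop the empty tokens
    that consecutive separators would otherwise produce."""
    return [token for token in text.split(separator) if token]


print(split_tokens("dear  Mr. Max"))  # ['dear', 'Mr.', 'Max']
```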

Text separators

Characters used for the logical separation of text. Some of them split text into sentences, while others split sentences into logical parts. Examples of such characters: ,.!/?:;"'-()[]
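
As an illustration, the example characters above can be kept in a set and located in a string. The names `TEXT_SEPARATORS` and `find_separators` are hypothetical, and the set contains only the examples listed, not a complete inventory.

```python
# The example text separators listed above (illustrative, not exhaustive).
TEXT_SEPARATORS = frozenset(",.!/?:;\"'-()[]")


def find_separators(text: str) -> list[tuple[int, str]]:
    """Return (position, character) for every text separator in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in TEXT_SEPARATORS]


print(find_separators("Hi, Max!"))  # [(2, ','), (7, '!')]
```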

Text separator groups

All text separators are divided into two groups: Group1 and Group2.

Group1

This group contains separators that are usually used to split text into sentences. The most common characters in this group are .!? (full stop, exclamation mark, question mark). Separators of this group usually, but not always, mark the end of a sentence. For example, in “dear Mr. Max” the dot does not end a sentence; in this particular case it acts as a separator from Group2.
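
The “Mr.” exception means a Group1 character alone is not enough to detect a sentence end. A minimal sketch, assuming a small hypothetical abbreviation list (`ABBREVIATIONS`) and a hypothetical helper `ends_sentence` that inspects one token at a time:

```python
GROUP1 = frozenset(".!?")
# Hypothetical abbreviation list; a real system would need a much larger one.
ABBREVIATIONS = frozenset({"mr.", "mrs.", "dr.", "e.g.", "i.e."})


def ends_sentence(token: str) -> bool:
    """True if the token ends with a Group1 separator that closes a sentence.
    False if it has no Group1 ending or is a known abbreviation such as "Mr."."""
    if not token or token[-1] not in GROUP1:
        return False
    return token.lower() not in ABBREVIATIONS


print([ends_sentence(t) for t in "dear Mr. Max. Hello!".split()])
# [False, False, True, True]
```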

Group2

This group contains separators that are usually used to split sentences into logical sub-sentences. The most common characters in this group are ;:,-"'() (semicolon, colon, comma, dash, quotes, parentheses). Separators of this group usually, but not always, split a sentence into sub-sentences. For example, in “- Hi - Hello” the dashes do not split a sentence into sub-sentences.
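
A naive sketch of splitting a sentence into sub-sentences at runs of Group2 separators. The names `GROUP2_PATTERN` and `split_sub_sentences` are hypothetical.

```python
import re

# Group2 characters from the list above.
GROUP2 = ";:,-\"'()"
# Matches one or more consecutive Group2 separators.
GROUP2_PATTERN = re.compile("[" + re.escape(GROUP2) + "]+")


def split_sub_sentences(sentence: str) -> list[str]:
    """Naively split a sentence at every run of Group2 separators."""
    parts = GROUP2_PATTERN.split(sentence)
    return [part.strip() for part in parts if part.strip()]


print(split_sub_sentences("Hi, Max: how are you"))  # ['Hi', 'Max', 'how are you']
```

Because the split is purely character-based, “- Hi - Hello” also comes out as two parts, and hyphenated words would be broken apart. These are exactly the cases where a Group2 character does not separate sub-sentences.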

Sentence

A sequence of tokens that carries a semantic piece of information and is surrounded by text separators. A sentence can also contain text separators: usually it is bounded by Group1 separators and contains Group2 separators.
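
A minimal sketch of the definition: text is cut into sentences after Group1 separators that are followed by whitespace, and each sentence is kept as a list of space-separated tokens, so Group2 separators remain inside it. The names `SENTENCE_BOUNDARY` and `split_sentences` are hypothetical.

```python
import re

# Whitespace that directly follows a Group1 separator (. ! ?).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[list[str]]:
    """Split text into sentences at Group1 boundaries, then each sentence
    into space-separated tokens. Group2 separators stay inside sentences."""
    sentences = SENTENCE_BOUNDARY.split(text.strip())
    return [s.split() for s in sentences if s]


print(split_sentences("Hi, Max. How are you?"))
# [['Hi,', 'Max.'], ['How', 'are', 'you?']]
```

This boundary rule also splits “dear Mr. Max” after “Mr.”, which is the Group1 exception described earlier. Handling it requires an extra check such as an abbreviation list.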

Word

A word consists of an approximated root token and a list of tokens with similar forms.
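
One way to picture this structure is a record holding the root and its forms. The sketch below assumes a crude, hypothetical grouping rule: tokens that share their first `prefix_len` lowercase letters are treated as forms of the same word, and the shortest token in the group serves as the approximated root. A real implementation would more likely use a stemmer or lemmatizer.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Word:
    root: str                                        # approximated root token
    forms: list[str] = field(default_factory=list)   # tokens with similar forms


def group_words(tokens: list[str], prefix_len: int = 4) -> list[Word]:
    """Group tokens by their first `prefix_len` lowercase letters (a hypothetical
    stand-in for stemming); the shortest token in each group becomes the root."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in tokens:
        groups[token.lower()[:prefix_len]].append(token)
    return [Word(root=min(forms, key=len), forms=forms) for forms in groups.values()]


for word in group_words(["connect", "connected", "connection", "graph", "graphs"]):
    print(word.root, word.forms)
# connect ['connect', 'connected', 'connection']
# graph ['graph', 'graphs']
```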

Fact

A fact is a sentence that contains one or more proper nouns.
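
A sketch of the fact test on a tokenized sentence. The proper-noun detector here is a hypothetical heuristic: a token counts as a proper noun if it starts with an uppercase letter and is not the first token of the sentence. It will miss sentence-initial names and accept capitalized abbreviations such as “Mr.”.

```python
def proper_nouns(sentence: list[str]) -> list[str]:
    """Hypothetical heuristic: a token is a proper noun if it starts with an
    uppercase letter and is not the first token of the sentence."""
    found = []
    for token in sentence[1:]:
        word = token.strip(",.!?;:\"'()[]-")  # drop surrounding text separators
        if word[:1].isupper():
            found.append(word)
    return found


def is_fact(sentence: list[str]) -> bool:
    """A fact is a sentence with at least one proper noun."""
    return bool(proper_nouns(sentence))


print(is_fact("I met Max in Berlin.".split()))  # True
print(is_fact("It was raining.".split()))       # False
```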

Initial text processing

The process that transforms plain text into:

Semantic text processing

The process that transforms the result of initial text processing into:

• a graph of semantic connections between the words in the text;
• a list of the facts that exist in the text;
• a list of the proper nouns that exist in the text;
• a graph of connections between the proper nouns in the text.
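
The text does not define what connects two proper nouns. The sketch below assumes one simple rule: two proper nouns are connected if they appear in the same fact, and the edge weight counts how many facts they share. The function name `proper_noun_graph` is hypothetical, and each fact is assumed to arrive as the list of proper nouns already extracted from it.

```python
from collections import defaultdict
from itertools import combinations


def proper_noun_graph(facts: list[list[str]]) -> dict[str, dict[str, int]]:
    """Undirected weighted graph: proper nouns are connected when they occur in
    the same fact; the weight is the number of such facts. Nouns that never
    share a fact with another noun do not appear as nodes."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for nouns in facts:
        for a, b in combinations(sorted(set(nouns)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(edges) for node, edges in graph.items()}


facts = [["Max", "Berlin"], ["Max", "Anna"], ["Anna", "Max", "Berlin"]]
print(proper_noun_graph(facts)["Max"])  # {'Berlin': 2, 'Anna': 2}
```

Co-occurrence is only one possible notion of connection. The same adjacency structure would also hold the semantic word graph if a different linking rule were used.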