This page describes the functions that AIF2 can perform with tokens.
AIF2 provides two main token-related functions:
- extracting the token separator characters from the text;
- splitting the text into tokens.
The first function lets you extract the list of token separators from the input text. It is exposed through the ITokenSeparatorExtractor interface (from package: com.aif.language.token). To create an instance of this interface, select a TokenSeparatorExtractor type and get its instance, for example: ITokenSeparatorExtractor.Type.PREDEFINED.getInstance().
For the differences between separator types, see the section below.
Token Separators Extractor Types
The PREDEFINED extractor has a fixed set of characters that are used for token splitting. This extractor type does not parse the text in any way; it simply returns the predefined characters.
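The behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the AIF2 classes: SeparatorExtractorSketch and PredefinedSeparatorExtractorSketch are hypothetical stand-ins, and the separator set (space, newline, tab) is an assumption, not the set AIF2 actually ships with.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT the AIF2 interface.
interface SeparatorExtractorSketch {
    // Returns the separator characters to use for the given text.
    List<Character> extract(String text);
}

// Sketch of the "predefined" behaviour: the input text is never inspected.
class PredefinedSeparatorExtractorSketch implements SeparatorExtractorSketch {
    // Assumed separator set; the real predefined characters may differ.
    private static final List<Character> SEPARATORS = Arrays.asList(' ', '\n', '\t');

    @Override
    public List<Character> extract(String text) {
        // The same list is returned regardless of the text passed in.
        return SEPARATORS;
    }
}
```

Because the text is ignored, two calls with completely different inputs return the same separators, which is the key difference from an extractor that analyses the text.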
To split the text into tokens, you need to:
- create a TokenSplitter instance (from package: com.aif.language.token);
- call the “split” method.
You can create the TokenSplitter in two ways:
- by specifying a Token Separators Extractor Type;
- by using the default Token Separators Extractor Type.
Creating a TokenSplitter with a specific tokenSeparatorExtractor gives you a tokenSplitter that uses that extractor to split the text into tokens. You can also create a TokenSplitter without specifying an extractor, in which case it uses the default ITokenSeparatorExtractor.
By default, this is the token separator extractor returned by ITokenSeparatorExtractor.Type.PREDEFINED.getInstance().
Splitting the text with TokenSplitter
Once you have a TokenSplitter instance, you can split the text into tokens by calling its “split” method.
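To make the splitting step concrete, here is a minimal sketch of how a splitter driven by a list of separator characters could work. TokenSplitterSketch is a hypothetical stand-in written for this page, not the AIF2 TokenSplitter; its constructors, the default separator set, and the List<String> return type are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT the AIF2 TokenSplitter.
class TokenSplitterSketch {
    private final List<Character> separators;

    // Mirrors the "default extractor" case, using an assumed separator set.
    TokenSplitterSketch() {
        this(Arrays.asList(' ', '\n', '\t'));
    }

    // Mirrors the case where the caller chooses the separators.
    TokenSplitterSketch(List<Character> separators) {
        this.separators = separators;
    }

    // Splits the text on any separator character and drops empty tokens.
    List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (separators.contains(c)) {
                // A separator ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last token when the text does not end with a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this sketch, new TokenSplitterSketch().split(text) uses the assumed default separators, while new TokenSplitterSketch(Arrays.asList(',')).split(text) splits on commas only. Consecutive separators do not produce empty tokens.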
Here you can find a usage example.