Today is the first day of work with Alpha4. One of the main improvements we’ve made is the way we tune AIF settings. AIF has many variables in its settings, and for each one we need to determine which value gives the best quality.
note: in this post we leave the quality measurement process out of scope. Quality measurement is the subject of another article.
Here is a config example from the current revision:
Each of these variables can impact quality, so we need a way to build a function that correlates each variable with output quality. In a perfect world we would have one big function that correlates all variables from the settings with quality. Unfortunately, building one global function is a hard task, so we decided to build a function for each variable independently to see how that particular variable impacts quality.
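To make “one function per variable” concrete: keep every other setting at its current value and vary only the one under study. A minimal Python sketch, assuming the settings can be held in a plain dict and that quality measurement (out of scope here) is available as a callable; `sweep_variable` and `evaluate` are hypothetical names, not part of AIF:

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Config = Dict[str, Any]


def sweep_variable(
    baseline: Config,
    name: str,
    values: Iterable[Any],
    evaluate: Callable[[Config], float],
) -> List[Tuple[Any, float]]:
    """Vary one setting while all other settings keep their baseline values.

    `evaluate` is a hypothetical stand-in for the quality measurement,
    which is treated here as a black box returning a score.
    """
    if name not in baseline:
        raise KeyError(f"unknown setting: {name}")
    results = []
    for value in values:
        config = dict(baseline)  # shallow copy so the baseline is never modified
        config[name] = value     # override only the variable under study
        results.append((value, evaluate(config)))
    return results
```

Because every other setting is pinned, the returned (value, quality) pairs describe this one variable in isolation; interactions between variables are deliberately ignored, which is the trade-off of not building the global function.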
Today we will show a small step we took with the “splitter_characters_grouper_search_step” variable.
Our experiment is very simple:
- execute a lot of quality tests with different “splitter_characters_grouper_search_step” values;
- plot the results;
- build a function based on the data from the first step;
- find the global extremum of the function and set the configuration to that value.
So, basically, we need to find the value of “splitter_characters_grouper_search_step” that gives the best quality.
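Before any function is fitted, the simplest reading of this goal is to pick the best of the values actually tested. A sketch, assuming results are (value, quality) pairs and that a higher quality score is better; `best_sampled_value` is a hypothetical helper:

```python
from typing import Sequence, Tuple


def best_sampled_value(results: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (value, quality) pair with the highest quality.

    Assumes a higher score means better quality. On ties, max() keeps
    the first pair encountered.
    """
    if not results:
        raise ValueError("no results to choose from")
    return max(results, key=lambda pair: pair[1])
```

This can only return one of the tested values; fitting a function (steps 3 and 4) is what could point to a better value between or beyond the samples.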
Executing a lot of quality tests
This is done with a simple implementation (it may be refactored in the future):
This code generates a Python array with the data, so we can then use Python to visualize it and see the shape of the function.
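The generator itself is not shown and its language is not stated, so this is only a Python sketch of the output side, assuming results are collected as (value, quality) pairs. It emits two list literals, `x` and `y` (names chosen here for illustration), that a plotting script can paste in or import; `to_python_literal` is a hypothetical helper:

```python
from typing import List, Tuple


def to_python_literal(results: List[Tuple[float, float]]) -> str:
    """Format sweep results as two Python list literals: x (values), y (quality)."""
    xs = [value for value, _ in results]
    ys = [quality for _, quality in results]
    # repr() of a list of floats is itself valid Python source.
    return f"x = {xs!r}\ny = {ys!r}\n"


print(to_python_literal([(0.1, 0.5), (0.2, 0.75)]), end="")
```

Writing the returned string to a file such as `results.py` (a hypothetical name) would let the plotting script simply do `from results import x, y`.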
note: this variable was chosen as the first to be analyzed because we expect its function to be simple: no local extrema at all, just one global extremum, which is the one we are searching for.
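This expectation can be checked on the raw samples before fitting anything. A sketch, assuming the qualities are ordered by increasing variable value: it counts how often the sampled curve changes direction. A result of 1 matches the expected single extremum, 0 means the curve is monotonic over the tested range (the best value sits at an edge), and more than 1 means local extrema, or measurement noise, are present. `count_local_extrema` is a hypothetical helper:

```python
from typing import List, Sequence


def count_local_extrema(qualities: Sequence[float]) -> int:
    """Count interior points where the sampled curve changes direction.

    Flat steps (equal neighbouring values) are skipped, so a plateau
    between a rise and a fall counts as a single extremum.
    """
    signs: List[int] = []
    for prev, cur in zip(qualities, qualities[1:]):
        diff = cur - prev
        if diff != 0:
            signs.append(1 if diff > 0 else -1)  # direction of this step
    # Each flip between rising and falling marks one local max or min.
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```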
Visualizing data with Python
Once we have all the data, we can plot it to see what the quality chart looks like. Here is the Python code that does exactly this:
The result of execution is:
[Figure: plot of quality against “splitter_characters_grouper_search_step” values]
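As a rough illustration of such a plotting script (not the original one), here is a minimal sketch assuming matplotlib is installed and the sweep results are available as two lists, `xs` (tested values) and `ys` (measured quality); `plot_quality` and the default output filename are hypothetical:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt


def plot_quality(xs, ys, path="quality.png"):
    """Plot quality against the tested setting values and save it to `path`."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")  # markers show where actual measurements were taken
    ax.set_xlabel("splitter_characters_grouper_search_step")
    ax.set_ylabel("quality")
    ax.grid(True)
    fig.savefig(path)
    plt.close(fig)  # free the figure when plotting many variables in a loop
    return path
```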
Next steps
Now we can try to build a function that represents the plot. It will tell us whether we can improve quality by reducing the value even further or whether we have already reached the limit of this function. For this we will need to find lim(f(x)) of our function.
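One possible way to do this, sketched under assumptions the post has not settled: model quality as a low-degree polynomial of the variable (degree 2 by default, our choice), fit it with NumPy’s `Polynomial.fit`, and look for the best point of the fitted curve inside the tested range. `fit_and_find_extremum` is a hypothetical helper, and a higher quality score is assumed to be better:

```python
import numpy as np


def fit_and_find_extremum(xs, ys, degree=2):
    """Fit a polynomial to (xs, ys) and return (poly, x of its best point).

    Candidates are the real roots of the derivative that fall inside
    [min(xs), max(xs)], plus both ends of that range, so a curve that is
    still improving at an edge is reported at that edge.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    poly = np.polynomial.Polynomial.fit(xs, ys, degree)
    lo, hi = xs.min(), xs.max()
    roots = poly.deriv().roots()             # stationary points of the fit
    real = roots[np.isreal(roots)].real      # discard complex roots
    inside = real[(real >= lo) & (real <= hi)]
    candidates = np.concatenate([inside, [lo, hi]])
    best = candidates[np.argmax(poly(candidates))]  # assumes higher is better
    return poly, float(best)
```

If the returned value equals the smallest tested value, the fitted curve is still improving at that edge, which hints that reducing the value further may help and that smaller values are worth testing; extrapolating the polynomial beyond the data would be much less reliable.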
We will create a new blog entry when we have new data to show!