(1) Synthetic data:
A data set of 4000 frames was created, with the last 2000 images being exact copies of the first 2000. Each frame has 48*64 pixels and is generated as follows to simulate different rates of change: the first image is filled with random numbers between 0 and 1; each subsequent image is created by copying the previous image, randomly selecting N of its pixels, and assigning each of them a new value in the range [0, 1] that has the maximum difference from its original value. N controls the rate of change and is set to 80, 200, 500 and 120 for the frame ranges [0 400], [400 1200], [1200 1600] and [1600 2000], respectively.
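The generation procedure above can be sketched as follows. This is only a minimal illustration, not the script actually used: "maximum difference" is read here as jumping to whichever end of [0, 1] is farther from the old value, a frame on a range boundary takes the N of the range that starts there, and the function and parameter names are hypothetical.

```python
import random

# (end_frame, N) pairs taken from the frame ranges described above
REPORT_SCHEDULE = [(400, 80), (1200, 200), (1600, 500), (2000, 120)]

def make_synthetic_frames(schedule, height=48, width=64, seed=0):
    """Return a list of frames (flat lists of floats), followed by an exact repeat."""
    rng = random.Random(seed)
    n_pixels = height * width
    # First frame: uniform random values in [0, 1)
    frame = [rng.random() for _ in range(n_pixels)]
    frames = [frame]
    total = schedule[-1][0]
    for t in range(1, total):
        # N for the range that frame t falls into (assumption: boundaries
        # belong to the range that starts there)
        n_changed = next(n for end, n in schedule if t < end)
        frame = list(frame)  # copy the previous frame
        for p in rng.sample(range(n_pixels), n_changed):
            # Assumption: the value farthest from the old one within [0, 1]
            frame[p] = 0.0 if frame[p] >= 0.5 else 1.0
        frames.append(frame)
    # Second half of the data set is an exact copy of the first half
    return frames + [list(f) for f in frames]
```

Calling make_synthetic_frames(REPORT_SCHEDULE) would give 4000 frames of 48*64 pixels under these assumptions; with this update rule, consecutive frames differ in exactly N pixels.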
We tested the data set with different segment sizes. As shown in Figure 1, the smaller segment size produces almost perfect matching results on this data set, but the performance of the larger segment size deteriorates in the range [1200 1600], which is where we assigned the largest between-frame difference (N = 500). A possible explanation is that in an environment where frames change rapidly, a smaller segment produces better recognition results. However, in the results below we do not observe the converse: larger segments do not outperform smaller ones in slowly changing regions, for example in the range [0 400], where the between-frame difference is only 80. A closer examination reveals that although consecutive segments are similar, the normalization and PCA steps magnify their differences, so the similarity between frames is no longer preserved; therefore, even in smoothly changing regions, the model learned with a small segment size is still distinctive enough to separate a segment from other similar segments. I then restricted the test to the extreme cases where the between-frame difference is only 1 or 10. As shown in Figure 2, both the large and small segment models produce poor results in the middle range, but this still does not let us conclude anything about the advantage of large segments in smoothly changing regions. Future ideas to solve this:
(1) Test different between-frame differences to see whether there is a range in which the large segment model produces significantly better results than the small segment model.
(2) Try other dimension reduction methods that better preserve the similarity between frames. Currently PCA is used, which only performs linear dimension reduction.
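One way the magnification mentioned above could arise is sketched below. It assumes the normalization is a per-pixel z-score across the frames of a segment, which the report does not state; the function names are hypothetical. Two frames that differ by 0.001 in a single pixel end up a distance of about 2 apart after normalization, because each varying pixel is rescaled to unit variance regardless of how small the raw change was.

```python
import math

def zscore_per_pixel(frames):
    """Standardize each pixel across frames (assumed normalization, not confirmed)."""
    n = len(frames)
    n_pixels = len(frames[0])
    out = [[0.0] * n_pixels for _ in range(n)]
    for p in range(n_pixels):
        column = [f[p] for f in frames]
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        if std > 0:  # constant pixels stay at 0 after centering
            for i in range(n):
                out[i][p] = (column[i] - mean) / std
    return out

def euclidean(a, b):
    """Euclidean distance between two flat frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two nearly identical frames: only the last pixel differs, by 0.001
frame_a = [0.2, 0.5, 0.9, 0.4]
frame_b = [0.2, 0.5, 0.9, 0.401]

raw_distance = euclidean(frame_a, frame_b)
norm_a, norm_b = zscore_per_pixel([frame_a, frame_b])
normalized_distance = euclidean(norm_a, norm_b)
```

If the actual pipeline normalizes differently (for example per frame rather than per pixel), the effect may be weaker, so this is only a candidate explanation to check against the real preprocessing.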
Figure 1. Top: segment recognition results with 120 segments (smaller segment size). Bottom: segment recognition results with 30 segments (larger segment size).
Figure 2. Top: segment recognition results with 120 segments (smaller segment size). Bottom: segment recognition results with 30 segments (larger segment size).
Most of the work has also been spent on combining the results from different scales. A recent experiment indicates that the combination does improve the recognition rate and helps to filter out some false positives. The details are as follows:
When testing the algorithm on real-world data sets, most of the results show that larger segments produce better recognition results, possibly because a larger segment contains more frames for training the model, which helps to reduce interference from other similar segments. Given this, the current combination strategy is as follows: when combining, we only consider the results from larger segment sizes, and we only trust the current segment recognition result when it is consistent with the results from the larger segments. For example, we test our algorithm on a data set using 20, 30, 50 and 70 segments (note: more segments means a smaller segment size, because the length of the data set is fixed). For 50 segments, each segment recognition produces a ranked list of candidates, from most likely to least likely. After normalizing the results from 20, 30 and 50 segments to a uniform scale, we keep only the candidates that appear in the top M of the recognition lists for all of 20, 30 and 50 segments (M is 3 for 20, 4 for 30 and 7 for 50 segments). Likewise, the results for 70 segments have to be consistent with the results for 20, 30 and 50 segments. The constraint is strict, but Figure 3 still shows a clear improvement after combination. Figure 3 shows the results for the Rowrah data set (http://www.youtube.com/watch?v=_UfLrcVvJ5o).
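The strict combination rule can be sketched as below. This is an illustration under assumptions rather than the implementation used for Figure 3: "normalizing to a uniform scale" is taken to mean mapping segment i of K to the span [i/K, (i+1)/K) of the sequence, two candidates from different scales count as consistent when their spans overlap, and the function names and example ranked lists are made up.

```python
def overlaps(i, k_i, j, k_j):
    """True if segment i of k_i and segment j of k_j overlap on the 0..1 scale.

    Segment i of k_i spans [i/k_i, (i+1)/k_i); the comparison is
    cross-multiplied so that it stays in exact integer arithmetic.
    """
    return i * k_j < (j + 1) * k_i and j * k_i < (i + 1) * k_j

def combine_strict(ranked, top_m, current):
    """Keep top-M candidates at `current` that agree with every coarser scale.

    ranked: {number_of_segments: candidate indices, most to least likely}
    top_m:  {number_of_segments: M}
    """
    kept = []
    for cand in ranked[current][:top_m[current]]:
        consistent = all(
            any(overlaps(cand, current, c, k) for c in ranked[k][:top_m[k]])
            for k in ranked
            if k < current  # fewer segments means larger segment size
        )
        if consistent:
            kept.append(cand)
    return kept

# Hypothetical ranked lists for one query segment at three scales
ranked = {
    20: [4, 11, 7, 2],
    30: [6, 17, 10, 3, 25],
    50: [28, 11, 19, 5, 40, 33, 10, 2],
}
top_m = {20: 3, 30: 4, 50: 7}  # M values quoted in the text

kept_at_50 = combine_strict(ranked, top_m, current=50)
```

With these made-up lists, candidates 28, 11 and 10 survive at 50 segments; candidate 19 agrees with the 20-segment results but not with the 30-segment ones, so it is dropped. An empty result corresponds to the case where no consistent candidate is found.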
Next, I will investigate the match distribution for each segment size. I will also try other combination strategies; for example, instead of requiring the current recognition result to match all of the previous (larger) segment sizes, it only needs to match any one of them, and if there are multiple candidates, we simply choose the one with the highest likelihood.
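The relaxed strategy proposed here might look like the sketch below. It has not been tested on the data; it assumes that candidates from different scales are compared by the overlap of their spans on a 0..1 scale and that the order of the ranked list stands in for likelihood. All names and the example lists are hypothetical.

```python
def overlaps(i, k_i, j, k_j):
    """True if segment i of k_i and segment j of k_j overlap on the 0..1 scale."""
    return i * k_j < (j + 1) * k_i and j * k_i < (i + 1) * k_j

def combine_any(ranked, top_m, current):
    """Return the best-ranked candidate that agrees with at least one coarser scale.

    ranked: {number_of_segments: candidate indices, most to least likely}
    top_m:  {number_of_segments: M}
    Returns None when no candidate agrees with any coarser scale.
    """
    coarser = [k for k in ranked if k < current]
    # Candidates are visited from most to least likely, so the first hit
    # is the one with the highest likelihood among the consistent ones.
    for cand in ranked[current][:top_m[current]]:
        for k in coarser:
            if any(overlaps(cand, current, c, k) for c in ranked[k][:top_m[k]]):
                return cand
    return None

# Hypothetical ranked lists for one query segment
ranked = {
    20: [4, 11, 7, 2],
    30: [6, 17, 10, 3, 25],
    50: [40, 19, 28],
}
top_m = {20: 3, 30: 4, 50: 7}

best_at_50 = combine_any(ranked, top_m, current=50)
```

In this example candidate 40 agrees with no coarser scale and is skipped, while candidate 19 agrees with the 20-segment results only; requiring agreement with every coarser scale would have rejected it, which is the trade-off this strategy is meant to explore.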
Figure 3. Recognition histogram comparison before and after combination. Left: results using a single segment size. Right: results after combination with the larger segment sizes. The x axis indicates the position of the correct match (ground truth) in the ranked list and the y axis indicates the number of segments with that position. For example, a y value of 35 at x = 1 means that 35 segment recognition results rank the correct match as the most likely candidate. In the right figure, x = 0 means that no consistent candidate was found across the different segment sizes.
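A histogram of this kind can be computed roughly as follows. The sketch assumes each segment has a ranked candidate list and a known ground-truth index; bin 0 is used whenever the ground truth is absent from the list, which covers the empty list left when no consistent candidate survives combination. The function name and sample data are hypothetical.

```python
from collections import Counter

def rank_histogram(ranked_lists, ground_truth):
    """Count how often the correct index appears at each 1-based rank.

    Bin 0 collects segments whose ground truth is not in the list at all.
    """
    counts = Counter()
    for candidates, truth in zip(ranked_lists, ground_truth):
        rank = candidates.index(truth) + 1 if truth in candidates else 0
        counts[rank] += 1
    return dict(sorted(counts.items()))

# Made-up example: four query segments with their ranked candidates
example_lists = [[3, 1, 2], [5, 4], [], [7, 8, 9]]
example_truth = [3, 4, 6, 9]
example_hist = rank_histogram(example_lists, example_truth)
```

Here the four segments fall into bins 1, 2, 0 and 3 respectively, so each bin has a count of one.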
A clearer visual comparison is shown in Figure 4: in each picture, the top plot (with red segments) shows the matching index after combination and the bottom plot shows the matching index from the individual recognition result.
Figure 4. Top: matching index after combination. Bottom: matching index from individual recognition.
The top picture of Figure 4 shows that with 120 segments, the results after combining with other scales improve considerably compared to using the 120-segment results alone. A closer investigation confirms this improvement and shows that the good "guidance" from the larger scales (which produce much better estimates) helps to filter out many false estimates at scale 120. Noisy results from a smaller scale can thus be improved by combining them with the results from larger scales.