The selection of the class should be independent of its order in the reference data.
The following code at inner_procedure.r#L249:
index <- which.max(dif[2,])
will pick the first class that has the maximum value. That means the order in which the classes are defined in the reference data dictates which class is selected. We printed the values for each run and saw the following:
[1] "The difference between classes:"
[1] "Deciduous_trees was 0.375"
[1] "Bare_ground was 0.3"
[1] "Xeric_grass was 0.3"
[1] "Fesh_meadow was 0.357142857142857"
[1] "Heather_mature was 0.375"
[1] "Heather_pioneer was 0.409090909090909"
[1] "Heather_scrubby was 0.409090909090909"
As we can see, Heather_pioneer will be selected, but if the order in the reference data were different, Heather_scrubby would be the one selected. We think that when more than one class shares the maximum value, the user should re-sample. Would that be correct?
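To make the tie concrete, here is a minimal sketch of how it could be detected. It assumes `dif` is a matrix whose second row holds the per-class values and whose columns are named after the classes; the real structure in inner_procedure.r may differ. The names `values`, `tol`, and `tied` are ours, and the tolerance is an arbitrary choice.

```r
# Hypothetical stand-in for `dif`, rebuilt from the values printed above.
# Assumption: row 2 holds the per-class differences, columns are class names.
classes <- c("Deciduous_trees", "Bare_ground", "Xeric_grass", "Fesh_meadow",
             "Heather_mature", "Heather_pioneer", "Heather_scrubby")
values <- c(0.375, 0.3, 0.3, 0.357142857142857,
            0.375, 0.409090909090909, 0.409090909090909)
dif <- rbind(seq_along(values), values)
colnames(dif) <- classes

# which.max() returns only the first position of the maximum,
# so the result depends on the column order.
first_only <- which.max(dif[2, ])

# Collect every class whose value is within `tol` of the maximum.
# `tol` is an assumed tolerance for floating-point comparison.
tol <- 1e-8
tied <- which(abs(dif[2, ] - max(dif[2, ])) < tol)

if (length(tied) > 1) {
  message("Tie between: ", paste(names(tied), collapse = ", "),
          "; consider re-sampling.")
} else {
  index <- tied
}
```

With the values above, `first_only` points at Heather_pioneer only, while `tied` contains both Heather_pioneer and Heather_scrubby.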
Related to this, we still have a few questions:
- We would like to understand what these values mean. Are they percentages?
- In some cases the decision is made at the fifth decimal digit. It also happens that these values are all below 0.4, and we have never seen them above 0.5. We wonder whether this information should be shared with the user, to help them decide whether to accept the class selection or re-sample. This depends on the meaning of the values, which is the first question.
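As an illustration of what could be shown to the user, the sketch below reports the top class together with its margin over the runner-up. `selection_summary` is a hypothetical helper of ours, not a function from the package, and the input is a subset of the values printed earlier.

```r
# Hypothetical helper, not part of the package: summarise how decisive a
# selection is, so the user can choose between accepting and re-sampling.
selection_summary <- function(values) {
  ranked <- sort(values, decreasing = TRUE)
  list(
    best = names(ranked)[1],              # class with the highest value
    best_value = ranked[[1]],             # the highest value itself
    margin = ranked[[1]] - ranked[[2]]    # gap to the runner-up
  )
}

# Subset of the values printed above, used only as example input.
values <- c(Deciduous_trees = 0.375,
            Bare_ground = 0.3,
            Heather_pioneer = 0.409090909090909)
s <- selection_summary(values)
message(s$best, " selected with value ", s$best_value,
        " (margin over runner-up: ", s$margin, ")")
```

A margin of zero, or one that only appears at the fifth decimal, would signal that the selection is not decisive.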