set.seed() is always called immediately before a sample() call, which removes the randomness of the sampling. The user can change the parameter seed
to obtain different sampling values. HaSa uses the parameter seed
to set a new seed in the following code blocks:
```r
if (abs(di[1] - di[2]) > min(di) * 0.3) {
  if (which.min(di) == 2) {
    set.seed(seed)
    d3 <- sample(which(pbtn1@data$nam == 1), di[1] - di[2], replace = F)
    pbtn1 <- pbtn1[-d3, ]
    test1 <- test1[-d3, ]
  } else {
    set.seed(seed)
    d4 <- sample(which(pbtn1@data$nam == 2), di[2] - di[1], replace = F)
    pbtn1 <- pbtn1[-d4, ]
    test1 <- test1[-d4, ]
  }
}

if (sum(pbtn1@data$nam == 1) > max_samples_per_class) {
  set.seed(seed)
  dr <- sample(which(pbtn1@data$nam == 1),
               sum(pbtn1@data$nam == 1) - max_samples_per_class,
               replace = F)
  pbtn1 <- pbtn1[-dr, ]
  test1 <- test1[-dr, ]
}

if (sum(pbtn1@data$nam == 2) > max_samples_per_class) {
  set.seed(seed)
  dr <- sample(which(pbtn1@data$nam == 2),
               sum(pbtn1@data$nam == 2) - max_samples_per_class,
               replace = F)
  pbtn1 <- pbtn1[-dr, ]
  test1 <- test1[-dr, ]
}
```
Because set.seed(seed) is placed immediately before each sample() call, the return values are identical on every function invocation. Is this desired? Why is the seed set so many times? It may help to reproduce results, but it removes the randomness of sample(), so we might get stuck in a local optimum.
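To illustrate the concern, here is a minimal standalone sketch (not the actual HaSa code) showing that reseeding immediately before each call makes sample() fully deterministic, while seeding once lets subsequent draws differ:

```r
# Reseeding before every call: both draws are identical.
set.seed(42)
a <- sample(1:100, 5)
set.seed(42)
b <- sample(1:100, 5)
identical(a, b)  # TRUE

# Seeding once: subsequent draws differ.
set.seed(42)
c <- sample(1:100, 5)
d <- sample(1:100, 5)
identical(c, d)  # FALSE
```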
We think set.seed() should be called only once, in outer_procedure.r, and to obtain new/different random values on each function call the user could set seed
to as.numeric(Sys.time()) so it is always different.
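A minimal sketch of what we have in mind (variable names are illustrative, not taken from the HaSa sources):

```r
# In outer_procedure.r: seed once, at the top of the run.
# Using the current time makes each run different; a user who
# wants reproducibility can pass a fixed seed instead.
seed <- as.numeric(Sys.time())
set.seed(seed)

# All later sample() calls then draw fresh values without any
# further set.seed() calls, e.g.:
idx <- sample(seq_len(100), 10, replace = FALSE)
```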
@carstenn what do you think?