The only question is why there is a big difference between the result of Levene's test over two groups (right-tailed, center=mean) and the F test for two variances (two-tailed). I would expect them to have similar p-values.

I set up two random normal samples, one with variance 3 times that of the other, and found a p-value from each method. Like obh, I would have expected a reasonably close match. Here is a graph plotting the pairs of p-values for 100 such trials.
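For anyone who wants to reproduce this kind of comparison, here is a rough sketch in Python with NumPy and SciPy. The sample size of 30 per group, the seed, and the helper names `f_test_two_sided` and `simulate_pvalues` are my own assumptions for illustration; they are not stated above, and this is not necessarily how the original trials were run.

```python
import numpy as np
from scipy import stats


def f_test_two_sided(a, b):
    """Two-tailed F test for equality of two variances (assumes normal data)."""
    f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    lower = stats.f.cdf(f_stat, dfn, dfd)  # P(F <= observed)
    upper = stats.f.sf(f_stat, dfn, dfd)   # P(F >= observed)
    return 2 * min(lower, upper)


def simulate_pvalues(n_trials=100, n=30, var_ratio=3.0, seed=0):
    """Return an (n_trials, 2) array: column 0 is Levene, column 1 is the F test.

    n=30 and seed=0 are assumed values, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    out = np.empty((n_trials, 2))
    for i in range(n_trials):
        a = rng.normal(0.0, 1.0, size=n)
        # Second sample has variance var_ratio times the first.
        b = rng.normal(0.0, np.sqrt(var_ratio), size=n)
        out[i, 0] = stats.levene(a, b, center="mean").pvalue
        out[i, 1] = f_test_two_sided(a, b)
    return out
```

Calling `simulate_pvalues()` gives one pair of p-values per trial, which can then be scatter-plotted against each other to see how closely the two tests agree.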