Why you should not use r or R^2 to gauge model quality
I can maybe just copy-paste the stuff here from that one email rant I sent out.
Pearson’s correlation coefficient ($r$) and the related coefficient of determination ($R^2$) are very common metrics used to describe the relationship between two variables.
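To make the two quantities concrete, here is a minimal sketch of computing $r$ by hand and squaring it. The helper name `pearson_r` and the toy data are made up for illustration. Squaring $r$ gives $R^2$ only in the specific case of an ordinary least-squares line fit with an intercept; it is not a general definition of $R^2$.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (covariance, up to a factor of 1/n).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (variances, up to the same factor).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: y is roughly 2 * x plus a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Equals R^2 only for a least-squares line with an intercept fitted to (x, y).
r_squared = r ** 2
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```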
There are good reasons, however, why they are rarely used to evaluate ML models.
One pitfall