The key thing to check first is the model's calibration, either using the bootstrap to correct for overfitting or using a huge independent sample that was not used for model development or fitting. The best way to assess calibration is with a loess nonparametric smoother of observed outcomes against predicted probabilities. Once you have established calibration you can go on to predictive discrimination, using the pseudo $R^2$ and Somers' $D_{xy}$ rank correlation coefficient, or a simple translation of it to the $c$-index, AKA concordance probability or AUROC. The Brier score is an excellent addition to all this.
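To make the discrimination measures concrete, here is a minimal sketch in plain Python. It assumes a binary outcome coded 0/1 and uses the standard relation $D_{xy} = 2(c - 0.5)$; the function names and the four-observation toy data are made up for illustration, and the calibration step (loess, bootstrap) is not shown.

```python
from itertools import product


def c_index(y_true, p_hat):
    """Concordance probability: the fraction of (event, non-event) pairs in
    which the event got the higher predicted probability. Ties count 1/2."""
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    non_events = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event, p_non in product(events, non_events):
        if p_event > p_non:
            score += 1.0
        elif p_event == p_non:
            score += 0.5
    return score / (len(events) * len(non_events))


def somers_dxy(y_true, p_hat):
    """Somers' Dxy as a rescaling of the c-index to the range [-1, 1]."""
    return 2.0 * c_index(y_true, p_hat) - 1.0


def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# Toy data, purely illustrative.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]

c = c_index(y, p)
dxy = somers_dxy(y, p)
brier = brier_score(y, p)
print(f"c = {c:.3f}, Dxy = {dxy:.3f}, Brier = {brier:.4f}")
# prints: c = 0.750, Dxy = 0.500, Brier = 0.1581
```

Note that these are apparent (in-sample) values; the overfitting correction mentioned above would still have to be applied on top of them.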
I may be wrong about exactly what you are looking for, but if you are worried about the overall distribution of your prediction quality, then I would borrow from machine learning validation tools, since that is exactly what they are concerned with as well.
Here, you could run a 10-fold cross-validation of the area under the ROC curve (AUROC), for instance, if you would like to see how your predictions behave across different class cut-off probabilities. A single hold-out sample would also work, but it would give you no information about how the metric varies over multiple samples of your data.
Another metric could be the mean squared error if you prefer that loss function.
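As a rough sketch of what that cross-validation could look like, the snippet below runs a stratified 10-fold loop and collects AUROC and mean squared error per fold. Everything in it is an assumption made for illustration: the data are simulated, the model is a toy one-feature logistic regression fitted by gradient descent, and the helper names (`fit_logistic`, `stratified_folds`, and so on) are hypothetical. In practice you would plug in your own model and data.

```python
import math
import random
from statistics import mean, stdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def auroc(y_true, p_hat):
    """AUROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def mse(y_true, p_hat):
    return mean((p - y) ** 2 for y, p in zip(y_true, p_hat))


def fit_logistic(xs, ys, lr=0.1, n_iter=500):
    """Toy one-feature logistic regression via batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict(model, xs):
    b0, b1 = model
    return [sigmoid(b0 + b1 * x) for x in xs]


def stratified_folds(ys, k, rng):
    """Deal the shuffled indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, y in enumerate(ys) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Simulated data (an assumption): one predictor, true log-odds = 1.5 * x.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [1 if rng.random() < sigmoid(1.5 * x) else 0 for x in xs]

aucs, mses = [], []
for test_idx in stratified_folds(ys, k=10, rng=rng):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(xs)) if i not in held_out]
    # Fit on nine folds, score on the held-out fold.
    model = fit_logistic([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    p_test = predict(model, [xs[i] for i in test_idx])
    y_test = [ys[i] for i in test_idx]
    aucs.append(auroc(y_test, p_test))
    mses.append(mse(y_test, p_test))

print(f"AUROC: mean {mean(aucs):.3f}, sd {stdev(aucs):.3f}")
print(f"MSE:   mean {mean(mses):.3f}, sd {stdev(mses):.3f}")
```

The spread of the ten per-fold values is the piece a single hold-out sample cannot give you. With folds this small the per-fold estimates are noisy, so repeating the whole procedure with different shuffles may give a steadier picture.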