in the
subdirectory of the target.
The evaluate.rdb files have been updated to include the logs of the
RMSD scores and "clens", a new contact-based evaluation function.
The GDT and smooth_GDT scores are essentially interchangeable, with
smooth_GDT =approx 0.9423 * GDT,
but when the models are really bad, smooth_GDT/GDT is slightly larger---scattered
around 1.
One can predict the log RMSD and log RMSD_CA scores (natural logs) from
the clens, GDT, and smooth_GDT scores:
log_RMSD    =approx  2.8864    * clens      + 0.116975
log_RMSD_CA =approx  3.27581   * clens      - 0.280389
log_RMSD    =approx -0.0298756 * GDT        + 3.50611
log_RMSD_CA =approx -0.0354623 * GDT        + 3.66133
log_RMSD    =approx -0.0317885 * smooth_GDT + 3.50091
log_RMSD_CA =approx -0.0379614 * smooth_GDT + 3.66936
Looking at RMSD values of the fit when the clens or GDT values are
perfect gives us an idea of the lower limit of resolution for the
evaluation method.
For clens=0, RMSD=approx 1.124 and RMSD_CA=approx 0.7555.
For GDT=100, RMSD=approx 1.6796 and RMSD_CA=approx 1.1220.
For smooth_GDT=100, RMSD=approx 1.3800 and RMSD_CA=approx 0.8809.
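The perfect-score limits above can be reproduced by plugging the best
possible score into each linear fit and exponentiating. The following is a
minimal Python sketch, assuming the logs are natural logarithms (that
assumption is what reproduces the quoted values); the names LINEAR_FITS,
PERFECT, and predicted_rmsd are made up for this illustration and are not
part of the evaluation code.

```python
import math

# (slope, intercept) of the linear fits quoted above, keyed by
# (predicted quantity, score used as the predictor).
LINEAR_FITS = {
    ("log_RMSD", "clens"): (2.8864, 0.116975),
    ("log_RMSD_CA", "clens"): (3.27581, -0.280389),
    ("log_RMSD", "GDT"): (-0.0298756, 3.50611),
    ("log_RMSD_CA", "GDT"): (-0.0354623, 3.66133),
    ("log_RMSD", "smooth_GDT"): (-0.0317885, 3.50091),
    ("log_RMSD_CA", "smooth_GDT"): (-0.0379614, 3.66936),
}

# Perfect scores: clens = 0, GDT = smooth_GDT = 100.
PERFECT = {"clens": 0.0, "GDT": 100.0, "smooth_GDT": 100.0}


def predicted_rmsd(target, score_name, score):
    """Return exp(slope * score + intercept) for the chosen fit.

    Assumes the fitted quantity is a natural log of the RMSD.
    """
    slope, intercept = LINEAR_FITS[(target, score_name)]
    return math.exp(slope * score + intercept)


if __name__ == "__main__":
    for name, best in PERFECT.items():
        rmsd = predicted_rmsd("log_RMSD", name, best)
        rmsd_ca = predicted_rmsd("log_RMSD_CA", name, best)
        print(f"{name}={best:g}: RMSD ~ {rmsd:.4f}, RMSD_CA ~ {rmsd_ca:.4f}")
```

The same function gives the fitted RMSD for any other score value, for
example predicted_rmsd("log_RMSD", "GDT", 50.0).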
The only apparent advantage that smooth_GDT has over GDT is that it
allows detection of smaller differences in very good predictions, but
clens is even more sensitive to such small errors.
Unfortunately, some of the outliers for clens vs. GDT are not very
promising for clens: on very short sequences that lack a core (such
as the 24 residues of T0229_1) the clens evaluation seems overly
pessimistic. I still need to look at other outliers to see if clens
or GDT is a better measure of quality for them. I also need to run
evaluations on other people's predictions, since there may be more
extreme outliers there (clens may be more sensitive to overcompaction
than GDT, for example).
The linear fits for log_RMSD_CA and log_RMSD are closest for GDT, very
slightly worse for smooth_GDT, and quite a bit worse for clens.
The only advantage clens seems to have so far is that it is fast and
deterministic: it is computed by comparing distance maps and does not
require sampling superpositions.
If we do a non-linear fit of GDT from clens, we get a pretty good fit with
GDT=approx -123.664*clens^3+125.779*clens^2-100.631*clens+100.888
(restricting the fit to models longer than 40 residues).
Even with this non-linear scaling of clens, clens is not quite as good a
predictor of RMSD as GDT is.
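The cubic fit of GDT from clens can be evaluated directly. The following is
a small Python sketch using only the coefficients quoted above; the names
CUBIC_COEFFS and gdt_from_clens are illustrative, and the sample clens
values are arbitrary. Since the fit was restricted to models longer than 40
residues, it should not be expected to hold for shorter ones.

```python
# Coefficients of the cubic fit quoted above, highest power first.
CUBIC_COEFFS = (-123.664, 125.779, -100.631, 100.888)


def gdt_from_clens(clens):
    """Approximate GDT from clens with the cubic fit (Horner's rule)."""
    result = 0.0
    for coeff in CUBIC_COEFFS:
        result = result * clens + coeff
    return result


if __name__ == "__main__":
    # Arbitrary sample values, chosen only to show the shape of the curve.
    for clens in (0.0, 0.1, 0.2, 0.4):
        print(f"clens={clens:.1f} -> GDT ~ {gdt_from_clens(clens):.2f}")
```

Note that the fitted curve is not pinned to GDT=100 at clens=0; its constant
term puts it at 100.888 there.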
The non-linear fit can be reduced to a one-parameter fit (with x=clens):
GDT =approx 100/b * (1-x) * (x^2+b)
with b=approx 0.757291. The same curve for smooth_GDT can be fit with
b=approx 0.884582.
An even better fit for smooth_GDT is
smooth_GDT =approx 100/c * (1-x) * (x^3+c)
with c=approx 0.6485, but this form is not as good a fit
for GDT as the x^2 form.
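Expanding the x^2 form gives -(100/b)*x^3 + (100/b)*x^2 - 100*x + 100, which
can be compared term by term with the four-parameter cubic fit. The
following is a Python sketch of both one-parameter forms, assuming x is the
clens score; the function and constant names are hypothetical and the sample
x values are arbitrary.

```python
# Fitted constants quoted above.
B_GDT = 0.757291
B_SMOOTH_GDT = 0.884582
C_SMOOTH_GDT = 0.6485


def one_param_fit(x, k, power=2):
    """Evaluate 100/k * (1 - x) * (x**power + k), with x taken to be clens."""
    return 100.0 / k * (1.0 - x) * (x ** power + k)


def expanded_coefficients(b):
    """Coefficients (x^3, x^2, x^1, x^0) of 100/b * (1 - x) * (x^2 + b)."""
    return (-100.0 / b, 100.0 / b, -100.0, 100.0)


if __name__ == "__main__":
    coeffs = expanded_coefficients(B_GDT)
    print("x^2 form for GDT expands to:", [round(c, 3) for c in coeffs])
    # Arbitrary sample values, chosen only to show the shape of the curves.
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        gdt = one_param_fit(x, B_GDT)
        sgdt_sq = one_param_fit(x, B_SMOOTH_GDT)
        sgdt_cube = one_param_fit(x, C_SMOOTH_GDT, power=3)
        print(f"x={x:.2f}: GDT ~ {gdt:.2f}, "
              f"smooth_GDT ~ {sgdt_sq:.2f} (x^2), {sgdt_cube:.2f} (x^3)")
```

By construction both forms give exactly 100 at x=0 and 0 at x=1, so the
single parameter only controls the curvature in between.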