There are two visible issues, but they may have the same root cause:
1) a mismatch in x86 between the default & -noasm
   1a) gcc: with -O0
   1b) clang: with -O2
2) a mismatch between x86 and x86-64
   2a) gcc: with -O2
   2b) clang: with -O2
I didn't really look into it, but for what it's worth, the first occurrence of x86 != x86-64 seems to be v0.4.0-57-gc16cd99a [1]. Bisecting 0.4.0..0.4.4 and 0.4.0..master both land on that same commit.

It's also worth noting that in v1.1.0-69-g289757fe we raise a floating-point exception (FE_INEXACT) in the FastLog function. Changing the double precision to 64 didn't change the result, so it may have to do with the float precision itself, as we were suspecting.
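For illustration only: a minimal sketch of how the same float inputs can give different float results depending on whether each intermediate is rounded to float or kept in a wider type and rounded once at the end. This is not libwebp code; the function names are made up, and whether this effect is what actually happens in FastLog is still only a guess.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: reports how the compiler evaluates float
 * expressions. 0 means "in float", 2 means "in long double" (typical
 * for x87 code); -1 means the macro is not available. */
int float_eval_method(void) {
#ifdef FLT_EVAL_METHOD
  return (int)FLT_EVAL_METHOD;
#else
  return -1;
#endif
}

/* Adds three floats, rounding to float after every operation. The
 * volatile store forces that rounding even if the compiler would
 * otherwise keep the intermediate in a wider register. */
float sum_rounding_each_step(float a, float b, float c) {
  volatile float t = a + b;
  t = t + c;
  return t;
}

/* Adds the same three floats, but keeps the intermediate in double
 * and rounds to float only once, on return. */
float sum_rounding_once(float a, float b, float c) {
  volatile double t = (double)a + (double)b;
  t = t + (double)c;
  return (float)t;
}

/* Returns 1 if the two strategies give different floats. */
int strategies_disagree(float a, float b, float c) {
  assert(a == a && b == b && c == c);  /* NaN inputs are not meaningful */
  return sum_rounding_each_step(a, b, c) != sum_rounding_once(a, b, c);
}
```

With the inputs (16777216.0f, 1.0f, 1.0f), rounding after every step gives 16777216 while rounding once gives 16777218, because 2^24 + 1 is not representable as a float. An encoder that compares costs computed this way could pick different branches on two builds that differ only in how intermediates are held.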
[1]
c16cd99abab37042aa1bc89e10f50aa4f86f348c is the first bad commit
commit c16cd99abab37042aa1bc89e10f50aa4f86f348c
Author: Vikas Arora <vik...@google.com>
Date:   Fri Feb 21 11:41:38 2014 -0800

    Speed up lossless encoder.

    Speedup lossless encoder by 20-25% by optimizing:
    - GetBestColorTransformForTile: Use techniques like binary search and
      local minima search to reduce the search space.
    - VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
      log(1 + x) and increase the threshold for calling the approximate
      version of log_2 (compared to costly call to log()).