13/09/24 16:26:07.84 yDbtGzZA0
On some (but not all) microarchitectures, there are timing differences due to "domain crossing penalties".
For this reason, one should generally use movdqa when the data is being used with integer SSE instructions, and movaps when the data is being used with floating-point instructions.
For more information on this subject, consult the Intel Optimization Manual or Agner Fog's excellent microarchitecture guide.
Note that these delays are most often associated with register-register moves rather than with loads or stores.
So that's how it is: for integer data you're supposed to use movdqa. Way too geeky lol
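A minimal C sketch of the quoted rule, assuming an x86-64 target with SSE2 and a C11 compiler. Integer data goes through the integer intrinsics (`_mm_load_si128`, `_mm_add_epi32`, `_mm_store_si128`), and float data goes through the float ones (`_mm_load_ps`, `_mm_add_ps`, `_mm_store_ps`). The compiler, not the source, picks the actual move instruction: it will usually emit movdqa for the integer loads/stores and movaps for the float ones (or fold a load into the add), but that is not guaranteed. The function names `add_int4` and `add_float4` and the `IS_ALIGNED16` macro are hypothetical, made up for this example.

```c
#include <assert.h>    /* assert */
#include <stdint.h>    /* int32_t, uintptr_t */
#include <xmmintrin.h> /* SSE:  __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */
#include <emmintrin.h> /* SSE2: __m128i, _mm_load_si128, _mm_add_epi32, _mm_store_si128 */

/* Hypothetical helper: aligned SSE loads/stores (movdqa/movaps) fault
   on addresses that are not 16-byte aligned, so check that first. */
#define IS_ALIGNED16(p) (((uintptr_t)(p) & 15u) == 0)

/* Integer data stays in the integer domain: aligned integer load/store
   intrinsics (typically movdqa) around an integer add (paddd). */
void add_int4(const int32_t *a, const int32_t *b, int32_t *out) {
    assert(IS_ALIGNED16(a) && IS_ALIGNED16(b) && IS_ALIGNED16(out));
    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Float data stays in the floating-point domain: aligned float load/store
   intrinsics (typically movaps) around a float add (addps). */
void add_float4(const float *a, const float *b, float *out) {
    assert(IS_ALIGNED16(a) && IS_ALIGNED16(b) && IS_ALIGNED16(out));
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}
```

Reinterpreting one domain's register as the other (for example `_mm_castsi128_ps` on the result of `_mm_add_epi32`, then feeding it to `_mm_add_ps`) still compiles and costs no instruction by itself, but that is the kind of crossing where the quoted bypass delay may appear on CPUs that have one. To see which moves your compiler actually chose, inspect its assembly output (e.g. `gcc -O2 -S`).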