Floating point - Return a float from a 64-bit assembly function that uses x87 FPU
I am trying to write a program that calculates equations (which equation doesn't matter for now) using 64-bit registers, floats, and coprocessor instructions. Unfortunately, I don't know how to access the final outcome of the equation as a float. I can do:

    fist qword ptr [bla]
    mov rax, bla

and change the function type to int and get my value, but I cannot access it as a float. Even when I leave the result in ST(0) (the top of the coprocessor stack), it doesn't work as expected and my C++ program gets the wrong result. My assembly code is:

    public funct

    .data
    bla  qword ?
    bla2 qword 10.0

    .code
    funct proc
        push rbp
        mov rbp, rsp
        push rbx
        mov bla, rcx
        fild qword ptr [bla]
        fld qword ptr [bla2]
        fmul st(0), st(1)
        fist dword ptr [bla]
        pop rbx
        pop rbp
        ret
    funct endp

    end

My C++ code is:

    #include <stdlib.h>
    #include <cstdlib>
    #include <stdio.h>

    extern "C" float funct(long long n);

    int main() {
        float value1 = funct(3);
        return 0;
    }

What is the problem, and how can I fix it?
Your question is a bit ambiguous, and so is the code. I'll present a few ideas using the x87 FPU and SSE instructions. The use of x87 FPU instructions is discouraged in 64-bit code, and SSE/SSE2 is preferred. SSE/SSE2 are available on all 64-bit AMD and 64-bit Intel x86 processors.
32-bit float in 64-bit code using the x87 FPU
If your question is "How do I write 64-bit assembler code that uses 32-bit floats using the x87 FPU?" then your C++ code looks fine, but your assembler code needs some work. Your C++ code suggests that the output type of the function is a 32-bit float:

    extern "C" float funct(long long n);

We need to create a function that returns a 32-bit float. Your assembler code can be modified in the following fashion. I am keeping the stack frame code and the push/pop of RBX from your code, since I assume you are giving us a minimal example and that your real code uses RBX. With that in mind, the following code should work:

    public funct

    .data
    ten real4 10.0              ; Define variable ten as a 32-bit (4-byte) float.
                                ; REAL4 and DWORD are both the same size;
                                ; REAL4 makes for more readable code when using floats.

    .code
    funct proc
        push rbp
        mov rbp, rsp            ; Set up the stack frame.
                                ; RSP is aligned to 16 bytes at this point.
        push rbx
        mov [rbp+16], rcx       ; The 32-byte shadow space is above the return address,
                                ; at RBP+16 (this address is 16-byte aligned). Rather
                                ; than use a temporary variable in the data section to
                                ; store the value of RCX, we store it in the
                                ; shadow space on the stack.
        fild qword ptr [rbp+16] ; Load and convert the 64-bit integer into ST(0)
        fld [ten]               ; ST(0) => ST(1), ST(0) = 10.0
        fmulp                   ; ST(1) = ST(1)*ST(0), ST(1) => ST(0)
        fstp real4 ptr [rbp+16] ; Store the result to the shadow space as a 32-bit float
        movss xmm0, real4 ptr [rbp+16]
                                ; Load the single scalar (32-bit float) into XMM0.
                                ; XMM0 = return value for 32-bit (and 64-bit) floats
                                ; in 64-bit code.
        pop rbx
        mov rsp, rbp            ; Remove the stack frame
        pop rbp
        ret
    funct endp

    end

I've commented the code, but the thing that might be of interest is that I don't use a second variable in the data section. The 64-bit Windows calling convention requires the caller of a function to ensure that the stack is aligned on a 16-byte boundary and that a 32-byte shadow space (also known as the register parameter area) is allocated before making the call. This area can be used as a scratch area. Since we set up a stack frame, the saved RBP is at RBP+0, the return address is at RBP+8, and the scratch area starts at RBP+16. If we weren't using a stack frame, the return address would be at RSP+0 and the shadow space would start at RSP+8. We can store the result of our floating point operation there instead of in the QWORD labelled bla.
It is a reasonable idea to unwind the floating point stack so that nothing remains on it before we exit our function. I use the FPU instructions that pop the registers once we are done using them (FMULP and FSTP above).
The 64-bit Microsoft calling convention requires floating point values to be returned in XMM0. We use the SSE instruction MOVSS to move a scalar single (32-bit float) into the XMM0 register. That is where the C++ code will expect the value to be returned.
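As a rough cross-check for the assembly routine, the same arithmetic can be modelled in plain C++. This is only a sketch: the name funct_reference is hypothetical and not part of the original code, and long double stands in for the x87 register format (on MSVC long double is only 64 bits wide, so there the model is an approximation of the widening step).

```cpp
#include <cassert>

// Hypothetical reference model of the x87 routine (illustrative name only).
float funct_reference(long long n) {
    long double wide = static_cast<long double>(n); // FILD: integer -> wide float
    wide *= 10.0L;                                  // FLD ten / FMULP
    return static_cast<float>(wide);                // FSTP REAL4: round to 32-bit float
}

// Small self-check: 3 * 10 is exactly representable as a float.
void check_funct_reference() {
    assert(funct_reference(3) == 30.0f);
}
```

If the assembly version is linked in, comparing funct(n) against funct_reference(n) for a handful of inputs is one way to confirm that the value really arrives in XMM0.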
32-bit float in 64-bit code using SSE
Building on the ideas in the section above, we can modify the code to use SSE instructions with 32-bit floats. An example of such code follows:

    public funct

    .data
    ten real4 10.0              ; Define variable ten as a 32-bit (4-byte) float.
                                ; REAL4 and DWORD are both the same size;
                                ; REAL4 makes for more readable code when using floats.

    .code
    funct proc
        push rbp
        mov rbp, rsp            ; Set up the stack frame.
                                ; RSP is aligned to 16 bytes at this point.
        push rbx
        cvtsi2ss xmm0, rcx      ; Convert the scalar integer in RCX to a
                                ; scalar single (float) and store it in XMM0
        mulss xmm0, [ten]       ; 32-bit float multiply by 10.0, result stored in XMM0.
                                ; XMM0 = return value for 32-bit (and 64-bit) floats
                                ; in 64-bit code.
        pop rbx
        mov rsp, rbp            ; Remove the stack frame
        pop rbp
        ret
    funct endp

    end

This code removes the usage of the x87 FPU by using SSE instructions. In particular we use:

    cvtsi2ss xmm0, rcx          ; Convert the scalar integer in RCX to a
                                ; scalar single (float) and store it in XMM0

CVTSI2SS converts a scalar integer to a scalar single (float). In this case the 64-bit integer value in RCX is converted to a 32-bit float and stored in XMM0. XMM0 is the register we'll be placing our returned value into. XMM0 through XMM5 are considered volatile, so we don't need to save their values.

    mulss xmm0, [ten]           ; 32-bit float multiply by 10.0, result stored in XMM0.
                                ; XMM0 = return value for 32-bit (and 64-bit) floats
                                ; in 64-bit code.

MULSS is the SSE instruction that multiplies scalar singles (floats). In this case MULSS computes XMM0 = XMM0 * (32-bit float memory operand), which has the effect of doing a 32-bit floating point multiply of XMM0 by the 32-bit float 10.0. Since XMM0 then contains our final result, we have nothing more to do but exit the function.
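For comparison, the same two instructions can be reached from C++ through compiler intrinsics. The sketch below is an illustration rather than part of the original answer: the name funct_sse is hypothetical, and a plain C++ fallback is included on the assumption that the code may also be built for a non-x86-64 target.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>   // SSE intrinsics

// Hypothetical C++ counterpart of the SSE assembly routine.
float funct_sse(long long n) {
    __m128 x = _mm_cvtsi64_ss(_mm_setzero_ps(), n); // CVTSI2SS xmm, r64
    x = _mm_mul_ss(x, _mm_set_ss(10.0f));           // MULSS: low lane * 10.0f
    return _mm_cvtss_f32(x);                        // low lane holds the result
}
#else
// Fallback for other targets: same arithmetic without intrinsics.
float funct_sse(long long n) {
    return static_cast<float>(n) * 10.0f;
}
#endif

// Small self-check: 3 * 10 is exactly representable as a float.
void check_funct_sse() {
    assert(funct_sse(3) == 30.0f);
}
```

The low 32-bit lane of the vector plays the role of the scalar single that the assembly version leaves in XMM0.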
64-bit double float in 64-bit code using the x87 FPU
This is a variation on the first example, but using 64-bit floats, known as the double type in C++, REAL8 (or QWORD) in assembler, and scalar double in SSE2. Since we are using double as the return type, we have to modify the C++ code to be:

    #include <stdlib.h>
    #include <cstdlib>
    #include <stdio.h>

    extern "C" double funct(long long n);

    int main() {
        double value1 = funct(3);
        return 0;
    }

The assembly code would look like:

    public funct

    .data
    ten real8 10.0              ; Define variable ten as a 64-bit (8-byte) float.
                                ; REAL8 and QWORD are both the same size;
                                ; REAL8 makes for more readable code when using floats.

    .code
    funct proc
        push rbp
        mov rbp, rsp            ; Set up the stack frame.
                                ; RSP is aligned to 16 bytes at this point.
        push rbx
        mov [rbp+16], rcx       ; The 32-byte shadow space is above the return address,
                                ; at RBP+16 (this address is 16-byte aligned). Rather
                                ; than use a temporary variable in the data section to
                                ; store the value of RCX, we store it in the
                                ; shadow space on the stack.
        fild qword ptr [rbp+16] ; Load and convert the 64-bit integer into ST(0)
        fld [ten]               ; ST(0) => ST(1), ST(0) = 10.0
        fmulp                   ; ST(1) = ST(1)*ST(0), ST(1) => ST(0)
        fstp real8 ptr [rbp+16] ; Store the result to the shadow space as a 64-bit float
        movsd xmm0, real8 ptr [rbp+16]
                                ; Load the double scalar (64-bit float) into XMM0.
                                ; XMM0 = return value for 32-bit (and 64-bit) floats
                                ; in 64-bit code.
        pop rbx
        mov rsp, rbp            ; Remove the stack frame
        pop rbp
        ret
    funct endp

    end

This code is almost identical to the x87 code using a 32-bit float. We use REAL8 (the same size as QWORD) to store a 64-bit float, and we use MOVSD to move the 64-bit double float (scalar double) into XMM0. MOVSD is an SSE2 instruction. It is important to return the proper size of float in XMM0. If you had used MOVSS, the value returned to the C++ function would be incorrect.
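To see why the size matters, the effect of loading a stored double with MOVSS can be modelled in portable C++. This is a sketch under stated assumptions: double is taken to be an IEEE-754 binary64 value, and read_after_movss is a hypothetical helper name. MOVSS from memory loads only 32 bits and zeroes the rest of the register, so the caller, which reads XMM0 as a double, sees just the low half of the bit pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: what a caller expecting a double would read from XMM0
// if the callee had loaded the stored double with MOVSS instead of MOVSD.
double read_after_movss(double stored) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &stored, sizeof bits);  // raw 64-bit pattern of the double
    bits &= 0xFFFFFFFFull;                     // keep the low 32 bits, zero the rest
    double seen = 0.0;
    std::memcpy(&seen, &bits, sizeof seen);    // reinterpret as the caller would
    return seen;
}

// 30.0 is 0x403E000000000000 as a binary64 value, so its low 32 bits are zero.
void check_read_after_movss() {
    assert(read_after_movss(30.0) == 0.0);
}
```

For funct(3) the caller would therefore see 0.0 rather than 30.0, because all of the significant bits of 30.0 live in the upper half of the 64-bit pattern.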
64-bit double float in 64-bit code using SSE2
This is a variation on the second example, but using 64-bit floats, known as the double type in C++, REAL8 (or QWORD) in assembler, and scalar double in SSE2. The C++ code should be the code from the previous section, where double is used instead of float. The assembler code would be similar to this:

    public funct

    .data
    ten real8 10.0              ; Define variable ten as a 64-bit (8-byte) float.
                                ; REAL8 and QWORD are both the same size;
                                ; REAL8 makes for more readable code when using floats.

    .code
    funct proc
        push rbp
        mov rbp, rsp            ; Set up the stack frame.
                                ; RSP is aligned to 16 bytes at this point.
        push rbx
        cvtsi2sd xmm0, rcx      ; Convert the scalar integer in RCX to a
                                ; scalar double (double float) and store it in XMM0
        mulsd xmm0, [ten]       ; 64-bit float multiply by 10.0, result stored in XMM0.
                                ; XMM0 = return value for 32-bit (and 64-bit) floats
                                ; in 64-bit code.
        pop rbx
        mov rsp, rbp            ; Remove the stack frame
        pop rbp
        ret
    funct endp

    end

The primary difference from the second example is that we use CVTSI2SD instead of CVTSI2SS. The SD in the instruction name means we are converting to a scalar double (64-bit double float). Likewise, we use the MULSD instruction for multiplication using scalar doubles. XMM0 will hold the 64-bit scalar double (double float) that is returned to the calling function.
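The double-precision variant can be sketched with SSE2 intrinsics in the same way. As before, this is an illustration and not part of the original answer: funct_sse2 is a hypothetical name, and the plain C++ fallback assumes the code may also be compiled for a target without these intrinsics.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>   // SSE2 intrinsics

// Hypothetical C++ counterpart of the SSE2 assembly routine.
double funct_sse2(long long n) {
    __m128d x = _mm_cvtsi64_sd(_mm_setzero_pd(), n); // CVTSI2SD xmm, r64
    x = _mm_mul_sd(x, _mm_set_sd(10.0));             // MULSD: low lane * 10.0
    return _mm_cvtsd_f64(x);                         // low lane holds the result
}
#else
// Fallback for other targets: same arithmetic without intrinsics.
double funct_sse2(long long n) {
    return static_cast<double>(n) * 10.0;
}
#endif

// Small self-check: 3 * 10 is exactly representable as a double.
void check_funct_sse2() {
    assert(funct_sse2(3) == 30.0);
}
```

Only the instruction suffix and the operand width change relative to the single-precision sketch, which mirrors the relationship between the two assembly listings.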