c - Is it possible to round-trip a floating point double to two decimal integers with fidelity?


I am trying to discern whether it is possible to decompose a double precision IEEE floating point value into two integers and recompose them later with full fidelity. Imagine something like this:

    double foo = <inputvalue>;
    double ipart = 0;
    double fpart = modf(foo, &ipart);

    int64_t intipart = ipart;
    int64_t intfpart = fpart * <someconstant>;

    double bar = ((double)ipart) + ((double)intfpart) / <someconstant>;

    assert(foo == bar);

It's logically obvious that any 64-bit quantity can be stored in 128 bits (i.e. just store the literal bits). The goal here is to decompose the integer part and the fractional part of the double into integer representations (to interface with an API whose storage format I don't control) and get back a bit-exact double when recomposing the two 64-bit integers.

I have a conceptual understanding of IEEE floating point, and I understand that doubles are stored in base 2. I observe, empirically, that with the above approach foo != bar for some inputs, even with large values of <someconstant>. I've been out of school a while, and I can't quite close the loop in my head in terms of understanding whether this is possible or not given the different bases (or some other factor).

Edit:

I guess this was implied/understood in my brain but not captured here: in my situation, I'm guaranteed that the overall magnitude of the double in question is within +/- 2^63 (and > 2^-64). With that understanding, the integer part is guaranteed to fit within a 64-bit int type, and my expectation is that, with ~16 digits of decimal precision, the fractional part should be representable in a 64-bit int type as well.

If you know the number is in [-2^63, +2^63) and its ULP (the value of the lowest bit in the number) is at least 2^-63, then you can use this:

    double ipart;
    double fpart = modf(foo, &ipart);

    int64_t intipart = ipart;
    int64_t intfpart = fpart * 0x1p63;

    double bar = intipart + intfpart * 0x1p-63;
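The snippet above can be wrapped into a pair of functions to make the round trip easier to check. This is only a sketch under the stated assumptions (value in [-2^63, +2^63), ULP at least 2^-63); the names split_double and join_double are hypothetical and not part of any library.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: split foo into an integer part and a fractional
   part scaled by 2^63. Assumes foo is in [-2^63, +2^63) and that its ULP
   is at least 2^-63, so both conversions are exact. */
static void split_double(double foo, int64_t *intipart, int64_t *intfpart)
{
    double ipart;
    double fpart = modf(foo, &ipart);

    *intipart = (int64_t)ipart;
    /* Scaling by a power of two only changes the exponent, so no bits are lost. */
    *intfpart = (int64_t)(fpart * 0x1p63);
}

/* Hypothetical helper: rebuild the double from the two integers. */
static double join_double(int64_t intipart, int64_t intfpart)
{
    return (double)intipart + (double)intfpart * 0x1p-63;
}
```

The scale factor is a power of two rather than a power of ten, which is why the multiplication and the later scaling back cannot introduce rounding error. Values whose lowest bit is smaller than 2^-63 (for example 1e-4) fall outside the assumption and are not expected to survive the round trip.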

If you just want a couple of integers from which the value can be reconstructed and do not care about the meaning of those integers (e.g., it is not necessary that one of them be the integer part), then you can use frexp to disassemble the number into its significand (with sign) and exponent, and you can use ldexp to reassemble it:

    int exp;
    int64_t i = frexp(foo, &exp) * 0x1p53;  /* signed 53-bit significand as an integer */
    int64_t e = exp;

    double bar = ldexp(i, e-53);

This code will work for any finite value of an IEEE-754 64-bit binary floating-point object. It does not support infinities or NaNs.
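As a rough way to check that claim, the frexp/ldexp pair can be put into two small functions and exercised at the extremes of the finite range. The function names to_ints and from_ints are hypothetical; the comparison uses ==, so it does not distinguish -0.0 from 0.0.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: disassemble foo into a signed integer significand
   and an exponent, following the frexp-based snippet above. */
static void to_ints(double foo, int64_t *i, int64_t *e)
{
    int exp;
    /* frexp returns a fraction with magnitude in [0.5, 1) (or 0);
       scaling by 2^53 makes it an integer without losing bits. */
    *i = (int64_t)(frexp(foo, &exp) * 0x1p53);
    *e = exp;
}

/* Hypothetical helper: reassemble the double from the two integers. */
static double from_ints(int64_t i, int64_t e)
{
    return ldexp((double)i, (int)(e - 53));
}
```

For example, 1.0 decomposes into i = 2^52 and e = 1, because frexp reports it as 0.5 * 2^1.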

It would even be possible to pack i and e into a single int64_t, if you want to go to the trouble.
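One possible packing is sketched below. It is not a general solution: it assumes the range given in the question's edit (zero, or magnitude below 2^63 and above 2^-64), so that the frexp exponent fits in 8 bits next to the 54-bit signed significand. The layout, the bias of 128, and the names pack_double and unpack_double are all choices made for this illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical layout: packed = i * 256 + (exp + 128), where i is the
   signed significand scaled to an integer and exp is the frexp exponent.
   Assumes exp is in [-128, 127], which holds for the question's range. */
static int64_t pack_double(double foo)
{
    int exp;
    int64_t i = (int64_t)(frexp(foo, &exp) * 0x1p53);

    assert(exp >= -128 && exp <= 127);  /* outside the assumed range otherwise */
    return i * 256 + (exp + 128);
}

static double unpack_double(int64_t packed)
{
    /* The low 8 bits hold the biased exponent, even when packed is negative,
       because int64_t is a two's complement type. */
    int64_t biased = packed & 0xFF;
    int64_t i = (packed - biased) / 256;  /* exact division */

    return ldexp((double)i, (int)(biased - 128) - 53);
}
```

Covering every finite double this way would need more care, since the full exponent range no longer fits beside the significand without dropping the implicit leading bit, at which point the result is close to the raw IEEE-754 bit pattern.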

