c - Is it possible to round-trip a floating point double to two decimal integers with fidelity?
I am trying to discern whether it is possible to decompose a double-precision IEEE floating-point value into two integers and recompose them later with full fidelity. Imagine something like this:
    double foo = <inputvalue>;
    double ipart = 0;
    double fpart = modf(foo, &ipart);
    int64_t intipart = ipart;
    int64_t intfpart = fpart * <someconstant>;
    double bar = ((double)intipart) + ((double)intfpart) / <someconstant>;
    assert(foo == bar);
It's logically obvious that any 64-bit quantity can be stored in 128 bits (i.e., just store the literal bits). The goal here is to decompose the integer part and the fractional part of the double into integer representations (to interface with an API and storage format that I don't control) and to get back a bit-exact double when recomposing the two 64-bit integers.
I have a conceptual understanding of IEEE floating point, and I know that doubles are stored in base 2. I observe, empirically, that with the above approach, foo != bar even for large values of <someconstant>. I've been out of school for a while, and I can't quite close the loop in my head in terms of understanding whether this is possible or not given the different bases (or some other factor).
Edit:
I guess I had this implied/understood in my brain but did not capture it here: in my situation, I'm guaranteed that the overall magnitude of the double in question is within +/- 2^63 (and > 2^-64). With that understanding, the integer part is guaranteed to fit within a 64-bit int type, and my expectation is that, with ~16 digits of decimal precision, the fractional part should be representable in a 64-bit int type as well.
If you know the number is in [-2^63, +2^63) and the ULP (the value of the lowest bit in the number) is at least 2^-63, then you can use this:
    double ipart;
    double fpart = modf(foo, &ipart);
    int64_t intipart = ipart;
    int64_t intfpart = fpart * 0x1p63;
    double bar = intipart + intfpart * 0x1p-63;
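The power-of-two scaling is what makes this work: multiplying or dividing by 2^63 only changes the exponent, so no significand bits are rounded away, whereas a decimal constant generally forces a rounding step. Here is a minimal round-trip sketch of the idea, assuming a C99 compiler and IEEE-754 doubles. The names fixed_parts, split_fixed, join_fixed and round_trips_fixed are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not part of the answer above. */
typedef struct {
    int64_t ipart;  /* integer part of the value */
    int64_t fpart;  /* fractional part, scaled by 2^63 */
} fixed_parts;

/* Assumes -2^63 <= x < 2^63 and that the ULP of x is at least 2^-63. */
static fixed_parts split_fixed(double x)
{
    assert(x >= -0x1p63 && x < 0x1p63);  /* range precondition */
    double ip;
    double fp = modf(x, &ip);            /* fp has the sign of x, |fp| < 1 */
    fixed_parts p;
    p.ipart = (int64_t)ip;
    p.fpart = (int64_t)(fp * 0x1p63);    /* power-of-two scaling is exact */
    return p;
}

static double join_fixed(fixed_parts p)
{
    return (double)p.ipart + (double)p.fpart * 0x1p-63;
}

/* Returns 1 if x survives the split/join round trip bit for bit. */
static int round_trips_fixed(double x)
{
    double y = join_fixed(split_fixed(x));
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Two caveats follow from the stated preconditions. A value whose lowest bit is below 2^-63 (for example 0x1.0000000000001p-20) loses that bit when the scaled fraction is truncated. Negative zero comes back as positive zero, which compares equal with == but is not bit-identical.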
If you just want a couple of integers from which the value can be reconstructed and do not care about the meaning of those integers (e.g., it is not necessary that one of them be the integer part), then you can use frexp to disassemble the number into its significand (with sign) and exponent, and you can use ldexp to reassemble it:
    int exp;
    int64_t i = frexp(foo, &exp) * 0x1p53;
    int64_t e = exp;
    double bar = ldexp(i, e-53);
This code will work for any finite value of an IEEE-754 64-bit binary floating-point object. It does not support infinities or NaNs.
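For reference, the frexp/ldexp approach can be wrapped into a pair of helpers and checked across the finite range, including subnormals. This is only a sketch assuming C99 and IEEE-754 doubles. The names sig_exp, split_sig_exp and join_sig_exp are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of the answer above. */
typedef struct {
    int64_t i;  /* signed significand, |i| < 2^53 */
    int64_t e;  /* exponent as returned by frexp */
} sig_exp;

static sig_exp split_sig_exp(double x)
{
    assert(isfinite(x));             /* infinities and NaNs are not supported */
    int exp;
    double frac = frexp(x, &exp);    /* x == frac * 2^exp, 0.5 <= |frac| < 1 or frac == 0 */
    sig_exp p;
    p.i = (int64_t)(frac * 0x1p53);  /* exact: frac has at most 53 significant bits */
    p.e = exp;
    return p;
}

static double join_sig_exp(sig_exp p)
{
    return ldexp((double)p.i, (int)(p.e - 53));
}
```

For example, split_sig_exp(1.0) yields i = 2^52 and e = 1, because frexp reports 1.0 as 0.5 * 2^1. As with the previous approach, negative zero is reconstructed as positive zero.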
It would even be possible to pack i and e into a single int64_t, if you want to go to the trouble.
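One possible packing is sketched below. It is not the answer author's code, and it assumes the input is zero or a normal (non-subnormal) finite double, which matches the range stated in the question. For such values |i| always has bit 52 set, so that bit can be dropped, leaving 52 significand bits, 11 exponent bits and 1 sign bit. The sketch uses uint64_t rather than int64_t to keep the shifts well defined. The names pack_double and unpack_double are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helpers; assume x is zero or a normal finite double. */
static uint64_t pack_double(double x)
{
    if (x == 0) return 0;                      /* zero gets the all-zero code */
    int exp;
    double frac = frexp(x, &exp);              /* 0.5 <= |frac| < 1 */
    assert(isfinite(x) && exp >= -1021);       /* rejects inf, NaN, subnormals */
    int64_t i = (int64_t)(frac * 0x1p53);      /* 2^52 <= |i| < 2^53 */
    uint64_t sign = (i < 0);
    uint64_t mag = (uint64_t)(i < 0 ? -i : i) - (UINT64_C(1) << 52); /* drop leading 1 */
    uint64_t biased = (uint64_t)(exp + 1022);  /* -1021..1024 maps to 1..2046 */
    return (sign << 63) | (biased << 52) | mag;
}

static double unpack_double(uint64_t p)
{
    if (p == 0) return 0.0;
    int exp = (int)((p >> 52) & 0x7FF) - 1022;
    int64_t i = (int64_t)((p & ((UINT64_C(1) << 52) - 1)) | (UINT64_C(1) << 52));
    if (p >> 63) i = -i;                       /* restore the sign */
    return ldexp((double)i, exp - 53);
}
```

With these choices the packed value coincides with the native IEEE-754 binary64 bit pattern for normal numbers, so this is effectively the "store the literal bits" option derived through frexp and ldexp instead of a memory copy.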