Porting C++ codes to the 64-bit platform

Table of Contents

some types

some-integer-types.jpg

data-models.jpg

Use of functions with a variable number of arguments

const char *invalidFormat = "%u";
size_t value = SIZE_MAX;
printf(invalidFormat, value);

It is not taken into account that size_t type is not equivalent to unsigned type on the 64-bit platform. As a result, it will cause printing of an incorrect result if value > UINT_MAX.

// PR_SIZET on Win64 = "I"
// PR_SIZET on Win32 = ""
// PR_SIZET on Linux64 = "l"
// ...
size_t u;
scanf("%" PR_SIZET "u", &u);
// or
size_t x = ...;
ssize_t y = ...;
printf("%zu\n", x);  // prints as unsigned decimal
printf("%zx\n", x);  // prints as hex
printf("%zd\n", y);  // prints as signed decimal

Magic numbers

4       Number of bytes in a pointer type
32      Number of bits in a pointer type
0x7fffffff      The maximum value of a 32-bit signed variable. Mask for zeroing of the high bit in a 32-bit type.
0x80000000      The minimum value of a 32-bit signed variable. Mask for allocation of the high bit in a 32-bit type.
0xffffffff      The maximum value of a 32-bit variable. An alternative record -1 as an error sign.

You should study the code thoroughly in search of magic numbers and replace them with safe numbers and expressions. To do so you can use sizeof() operator, special values from <limits.h>, <inttypes.h> etc.

1) size_t ArraySize = N * 4;
   intptr_t *Array = (intptr_t *)malloc(ArraySize);
2) size_t values[ARRAY_SIZE];
   memset(values, 0, ARRAY_SIZE * 4);
3) size_t n, newexp;
   n = n >> (32 - newexp);

Let's assume that in all cases the size of the types used is always 4 bytes. To make the code correct, we should use the sizeof() operator.

1) size_t ArraySize = N * sizeof(intptr_t);
   intptr_t *Array = (intptr_t *)malloc(ArraySize);
2) size_t values[ARRAY_SIZE];
   memset(values, 0, ARRAY_SIZE * sizeof(size_t));
//or
memset(values, 0, sizeof(values)); //preferred alternative
3) size_t n, newexp;
   n = n >> (CHAR_BIT * sizeof(n) - newexp);

Storing integers in double type

Some programmers use double type for storing of, and working with, integer types.

size_t a = size_t(-1);
double b = a;
--a;
--b;
size_t c = b; // x86: a == c
              // x64: a != c

The given example can be justified on a 32-bit system, as double type has 52 significant bits and is capable of storing a 32-bit integer value without loss. But while trying to store a 64-bit integer in double the exact value can be lost.

size_t_and_double.png

Bit shifting operations

ptrdiff_t SetBitN(ptrdiff_t value, unsigned bitNum) {
  ptrdiff_t mask = 1 << bitNum;
  return value | mask;
}

The given code works only on the 32-bit architecture, and allows the definition of bits with numbers from 0 to 31. After the program is ported to a 64-bit platform, it becomes necessary to define bits from 0 to 63. What value will the SetBitN(0, 32) call return? If you think that the value is 0x100000000, you are wrong. You'll get 0.

Pay attention to the fact that "1" has int type and during the shift on 32 positions, an overflow will occur:

overflow-bit-shift.png

To correct the code, it is necessary to make the constant "1" of the same type as the variable mask.

ptrdiff_t mask = ptrdiff_t(1) << bitNum;

What will be the result of the uncorrected function SetBitN(0, 31) call? The right answer is 0xffffffff80000000. The result of 1 << 31 expression is negative number -2147483648. This number is formed in a 64-bit integer variable as 0xffffffff80000000. You should keep in mind, and take into consideration, the effects of the shift of values of different types. To make you understand the stated information better table N4 contains interesting expressions with shifts on the 64-bit system.

shitfs-and-results-64bit.jpg

Storing of pointer addresses

Unfortunately, the 32-bit code is based on this supposition everywhere. Pointers are often cast to int, unsigned int and other types improper to fulfill address calculations.

1) char *p;
   p = (char *) ((int)p & PAGEOFFSET);
2) DWORD tmp = (DWORD)malloc(ArraySize); 
   ...
   int *ptr = (int *)tmp;

the correct versions:

1) char *p;
   p = (char *) ((intptr_t)p & PAGEOFFSET);
2) DWORD_PTR tmp = (DWORD_PTR)malloc(ArraySize); 
   ...
   int *ptr = (int *)tmp;

Memsize types in unions

When there is a necessity to work with a pointer as an integer, sometimes it is convenient to use the union as it is shown in the example, and work with the numeric form of the type without using explicit conversions.

union PtrNumUnion {
  char *m_p;
  unsigned m_n;
} u;
u.m_p = str;
u.m_n += delta;

This code is correct on 32-bit systems and is incorrect on 64-bit ones.

union PtrNumUnion {
  char *m_p;
  size_t m_n; //type fixed
} u;

Another frequent use of the union is the presentation of one member as a set of other smaller ones.

union SizetToBytesUnion {
  size_t value;
  struct {
    unsigned char b0, b1, b2, b3;
  } bytes;
} u;

SizetToBytesUnion u;
u.value = value;
size_t zeroBitsN = TranslateTable[u.bytes.b0] +
                   TranslateTable[u.bytes.b1] +
                   TranslateTable[u.bytes.b2] +
                   TranslateTable[u.bytes.b3];

Here is a fundamental algorithmic error which consists in the supposition that sizet type consists of 4 bytes.

union SizetToBytesUnion {
  size_t value;
  unsigned char bytes[sizeof(value)];
} u;

SizetToBytesUnion u;
u.value = value;
size_t zeroBitsN = 0;
for (size_t i = 0; i != sizeof(bytes); ++i)
  zeroBitsN += TranslateTable[bytes[i]];

Changing an array type

int array[4] = { 1, 2, 3, 4 };
enum ENumbers { ZERO, ONE, TWO, THREE, FOUR };
//safe cast (for MSVC2005)
ENumbers *enumPtr = (ENumbers *)(array);
cout << enumPtr[1] << " ";
//unsafe cast
size_t *sizetPtr = (size_t *)(array);
cout << sizetPtr[1] << endl;

//Output on 32-bit system: 2 2
//Output on 64 bit system: 2 17179869187

Serialization and data exchange

1) size_t PixelCount;
   fread(&PixelCount, sizeof(PixelCount), 1, inFile);
2) __int32 value_1;
   SSIZE_T value_2;
   inputStream >> value_1 >> value_2;
3) time_t time;
   PackToBuffer(MemoryBuf, &time, sizeof(time));

The usage of such types provides data portability between programs on different platforms, although it needs the usage of odd ones. The three shown examples are written inaccurately, and this will show up on the changing of the capacity of some data types from 32-bit to 64-bit. Taking into account the necessity to support old data formats, the correction may look as follows:

1) size_t PixelCount;
   __uint32 tmp;
   fread(&tmp, sizeof(tmp), 1, inFile);
   PixelCount = static_cast<size_t>(tmp);
2) __int32 value_1;
   __int32 value_2;
   inputStream >> value_1 >> value_2;
3) time_t time;
   __uint32 tmp = static_cast<__uint32>(time);
   PackToBuffer(MemoryBuf, &tmp, sizeof(tmp));

Pointer address arithmetic

unsigned short a16, b16, c16;
char *pointer;
// ...
pointer += a16 * b16 * c16;

This example works correctly with pointers if the value of "a16 * b16 * c16" expression does not exceed UINTMAX (4Gb). Such code may always work correctly on the 32-bit platform, as the program has never allocated arrays of large sizes. On the 64-bit architecture the size of the array exceeded UINTMAX items.

variables a16, b16 and c16 have values 3000, 2000 and 1000 correspondingly. While calculating "a16 * b16 * c16" expression all the variables according to C++ rules will be converted to int type, and only then their multiplication will occur. During the process of multiplication an overflow will occur.

One should take care to avoid possible overflows in pointer arithmetic

short a16, b16, c16;
char *pointer;
// ...
pointer += static_cast<ptrdiff_t>(a16) *
           static_cast<ptrdiff_t>(b16) *
           static_cast<ptrdiff_t>(c16);

This example works in the 32-bit version, but not in the 64-bit version.

int A = -2;
unsigned B = 1;
int array[5] = { 1, 2, 3, 4, 5 };
int *ptr = array + 3;
ptr = ptr + (A + B); //Invalid pointer value on 64-bit platform
printf("%i\n", *ptr); //Access violation on 64-bit platform

Let's follow the calculation flow of the "ptr + (a + b)" expression:

  • According to C++ rules variable A of int type is converted to unsigned type.
  • Addition of A and B occurs. The result we get is value 0xFFFFFFFF of unsigned type.

Then calculation of "ptr + 0xFFFFFFFFu" takes place, but the result of it depends on the pointer size on the particular architecture. If the addition takes place in a 32-bit program, the given expression will be an equivalent of "ptr - 1" and we'll successfully print number 3.

In a 64-bit program 0xFFFFFFFFu value will be added fairly to the pointer and the result will be that the pointer will be outbound of the array. And we'll face problems while getting access to the item of this pointer.

To avoid the situation shown, as well as in the first case, we advise you to use only memsize types in pointer arithmetic. Here are two variants of the code correction:

ptr = ptr + (ptrdiff_t(A) + ptrdiff_t(B));
ptrdiff_t A = -2;
size_t B = 1;
// ...
ptr = ptr + (A + B);

Array indexing

unsigned Index = 0;
while (MyBigNumberField[Index] != id)
  Index++;

The given code won't process an array containing more than UINT_MAX items in a 64-bit program. After the access to the item with UNIT_MAX index, an overflow of the Index variable will occur, and we'll get infinite loop.

class Region {
  float *array;
  int Width, Height, Depth;
  float Region::GetCell(int x, int y, int z) const;
//  ...
};
float Region::GetCell(int x, int y, int z) const {
  return array[x + y * Width + z * Width * Height];
}

he given code, however, will work correctly only with arrays containing less than INTMAX items. The reason for this, is the use of 32-bit int types for calculation of the items index.

Programmers often make a mistake trying to correct the code in the following way:

float Region::GetCell(int x, int y, int z) const {
  return array[static_cast<ptrdiff_t>(x) + y * Width +
               z * Width * Height];
}

They know that according to C++ rules, the expression for calculation of the index will have ptrdiff_t type, and hope to avoid the overflow with its help. But the overflow may occur inside the sub-expression "y * Width" or "z * Width * Height" since the int type is still used to calculate them.

If you want to correct the code without changing types of the variables participating in the expression, you may use explicit type conversion of every variable to memsize type:

float Region::GetCell(int x, int y, int z) const {
  return array[ptrdiff_t(x) +
               ptrdiff_t(y) * ptrdiff_t(Width) +
               ptrdiff_t(z) * ptrdiff_t(Width) *
               ptrdiff_t(Height)];
}

Mixed use of simple integer types and memsize types

Let's provide an example of a small code fragment which shows the danger of inaccurate expressions with mixed types (the results are retrieved in Microsoft Visual C++ 2005, 64-bit compilation mode).

int x = 100000;
int y = 100000;
int z = 100000;
intptr_t size = 1;                  // Result:
intptr_t v1 = x * y * z;            // -1530494976
intptr_t v2 = intptr_t(x) * y * z;  // 1000000000000000
intptr_t v3 = x * y * intptr_t(z);  // 141006540800000
intptr_t v4 = size * x * y * z;     // 1000000000000000
intptr_t v5 = x * y * z * size;     // -1530494976
intptr_t v6 = size * (x * y * z);   // -1530494976
intptr_t v7 = size * (x * y) * z;   // 141006540800000
intptr_t v8 = ((size * x) * y) * z; // 1000000000000000
intptr_t v9 = size * (x * (y * z)); // -1530494976

The right variant:

intptr_t v2 = intptr_t(x) + intptr_t(y) * intptr_t(z); // OK!

Mixed use of types may cause changes in program logic.

ptrdiff_t val_1 = -1;
unsigned int val_2 = 1;
if (val_1 > val_2)
  printf ("val_1 is greater than val_2\n");
else
  printf ("val_1 is not greater than val_2\n");

//Output on 32-bit system: "val_1 is greater than val_2"
//Output on 64-bit system: "val_1 is not greater than val_2"

On the 32-bit system the variable val_1 according to C++ rules was extended to unsigned int, and became value 0xFFFFFFFFu. As a result the condition "0xFFFFFFFFu > 1" was executed. On the 64–bit system, it's the other way around - the variable val_2 is extended to ptrdiff_t type. In this case the expression "-1 > 1" is checked. On picture 6 the occurring changes are shown sketchy.

If you need to return the previous behavior, you should change the val_2 variable type.

ptrdiff_t val_1 = -1;
size_t val_2 = 1;
if (val_1 > val_2)
  printf ("val_1 is greater than val_2\n");
else
  printf ("val_1 is not greater than val_2\n");

Exceptions

Throwing and handling exceptions using integer types is not a good programming practice for the C++ language.

char *ptr1;
char *ptr2;
try {
  try {
    throw ptr2 - ptr1;
  }
  catch (int) {
    std::cout << "catch 1: on x86" << std::endl;
  }
}
catch (ptrdiff_t) {
  std::cout << "catch 2: on x64" << std::endl;
}

Static analyzers

  1. Gimpel Software PC-Lint (http://www.gimpel.com). This analyzer has a large list of supported platforms and a general purpose static analyzer. It allows you to catch errors while porting programs on architectures with LP64 data model.
  2. Parasoft C++test (http://www.parasoft.com). It has a built-in environment, which greatly simplifies the work process and setting of the analysis rules.
  3. Viva64 (http://www.viva64.com). Unlike other analyzers, this one is intended to work with Windows (LLP64) data model. It is integrated into the development environment Visual Studio 2005.

References

  1. 20 issues of porting C++ code to the 64-bit platform: http://www.viva64.com/en/a/0004/
  2. Converting 32-bit Applications Into 64-bit Applications: Things to Consider: http://www.oracle.com/technetwork/server-storage/solaris/ilp32tolp64issues-137107.html
  3. Andrew Josey. Data Size Neutrality and 64-bit Support. http://www.unix.org/whitepapers/64bit.html.
  4. Harsha S. Adiga. Porting Linux applications to 64-bit systems. http://www.ibm.com/developerworks/library/l-port64/index.html.

Author: Shi Shougang

Created: 2016-11-08 Tue 00:04

Emacs 24.3.1 (Org mode 8.2.10)

Validate