Non-alignment integer access performance

When you define a structure like this, the integer field is not placed immediately after the byte field. The compiler inserts padding so that the integer starts on a 4-byte boundary.

struct foo{
  unsigned char v1;
  unsigned int v2;
};

This is because the x86 architecture pays a performance penalty when it accesses non-aligned data. Some other CPU architectures, such as Itanium, do not allow non-aligned access at all.

http://www.intel.com/cd/ids/developer/asmo-na/eng/170533.htm?page=3
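The padding described above can be checked with `offsetof` and `alignof`. This is only a small sketch: the helper name `padding_before_v2` is made up for illustration, and the concrete numbers depend on the compiler and target. On a typical x86 compiler where `unsigned int` is 4 bytes with 4-byte alignment, `v2` should land at offset 4 and `sizeof(foo)` should be 8.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct foo {
  unsigned char v1;
  unsigned int v2;
};

// Hypothetical helper: number of padding bytes between the end of v1
// and the start of v2.
constexpr std::size_t padding_before_v2() {
  return offsetof(foo, v2) - (offsetof(foo, v1) + sizeof(unsigned char));
}

// v2 always starts at a multiple of its own alignment requirement.
static_assert(offsetof(foo, v2) % alignof(unsigned int) == 0,
              "v2 must start on an aligned boundary");
```

Printing `offsetof(foo, v2)`, `padding_before_v2()` and `sizeof(foo)` from a small `main` shows the layout your own compiler chose.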

So, how much slower is non-aligned access on x86? Here is a benchmark.

The test accesses buffers of various sizes and measures the elapsed time. The graphs are based on the average of 10 runs.
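The original benchmark code is not shown in this post, so the following is only a rough sketch of how such a measurement could be set up. The function names `sum_ints` and `average_seconds` are made up for illustration. It also uses `std::memcpy` for the non-aligned load instead of a pointer cast, because that keeps the access well-defined in C++; on x86, optimizing compilers normally turn it into a single load instruction. It assumes the buffer returned by `std::vector` is itself at least 4-byte aligned, which is the usual case.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sum `count` 32-bit values, starting `offset` bytes into buf.
// offset = 0 gives aligned reads; offset = 1 gives non-aligned reads.
std::uint32_t sum_ints(const unsigned char* buf, std::size_t count,
                       std::size_t offset) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    std::uint32_t v;
    std::memcpy(&v, buf + offset + i * sizeof v, sizeof v);
    sum += v;
  }
  return sum;
}

// Average wall-clock seconds for one pass over a buffer of buffer_bytes,
// measured over `runs` passes. offset must be between 0 and 4.
double average_seconds(std::size_t buffer_bytes, std::size_t offset, int runs) {
  // Extra bytes at the end so that shifted reads stay inside the buffer.
  std::vector<unsigned char> buf(buffer_bytes + sizeof(std::uint32_t), 1);
  const std::size_t count = buffer_bytes / sizeof(std::uint32_t);
  volatile std::uint32_t sink = 0;  // keeps the compiler from dropping the loop
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < runs; ++r) {
    sink = sink + sum_ints(buf.data(), count, offset);
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / runs;
}
```

Calling `average_seconds(size, 0, 10)` and `average_seconds(size, 1, 10)` for a range of buffer sizes gives one aligned and one non-aligned data point per size.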

[Figure: Int_read]

In the read test, there is actually little performance difference between aligned and non-aligned access. Using byte-by-byte access to avoid the alignment issue is the worst choice; it is far slower than non-aligned integer access.

[Figure: Int_write]

In the integer write test, however, non-aligned access takes almost twice as long as aligned access. Byte-by-byte access can make sense here: it is slightly faster than non-aligned integer access.
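To make the two write strategies concrete, here is a minimal sketch of a whole-integer store and a byte-by-byte store to an address that may be non-aligned. The names `write_int` and `write_bytes` are made up for illustration and are not taken from the original benchmark. The byte version writes in little-endian order, which matches x86, so on x86 both functions should leave the same bytes in memory. An optimizing compiler may merge the four byte stores into one, so a benchmark would need to inspect the generated code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Store a 32-bit value with one integer-sized copy; dst may be non-aligned.
void write_int(unsigned char* dst, std::uint32_t value) {
  std::memcpy(dst, &value, sizeof value);
}

// Store the same value one byte at a time, least significant byte first
// (little-endian order, as used by x86).
void write_bytes(unsigned char* dst, std::uint32_t value) {
  for (std::size_t i = 0; i < sizeof value; ++i) {
    dst[i] = static_cast<unsigned char>(value >> (8 * i));
  }
}
```

Passing `buffer + 1` as `dst`, where `buffer` is a 4-byte-aligned array, produces the non-aligned case discussed here.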

Now, here is a question. You can see a spike at 512KB in non-aligned access. Where does it come from?


About Moto

Engineer who likes coding
This entry was posted in Optimization.

2 Responses to Non-alignment integer access performance

  1. new299 says:

    Are you allocating with malloc? I’ve noticed that malloc on a Mac can have an overhead of up to 512 bytes; perhaps it’s somehow related:

    http://41j.com/blog/2011/09/finding-out-how-much-memory-was-allocated/

    • Moto says:

      Hi, the “alignment” discussed here is not heap allocation block alignment. It is about the memory access performance of code like the following. The variable “p” points at an odd address while the type it points to is 16 bits wide. I was told that x86 processors issue two write operations in this case, which makes it slower than an aligned 16-bit access. I wanted to see how large the penalty really is.


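      void f(){
        char a;
        char b;
        char c;

        // Treat the address of b as a pointer to a 16-bit value.
        short* p = reinterpret_cast<short*>(&b);
        *p = 0;  // 16-bit store through a pointer that may not be 2-byte aligned
      }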

      Sorry, due to the migration over different blog hosting services, some links were broken and the point was not clear.
