Making a wave which has a specific byte boundary alignment

Hi,

I have recently been coding with OpenCL. To get better data transfer performance, the host buffer has to be aligned to a 4096-byte boundary (see https://software.intel.com/en-us/articles/getting-the-most-from-opencl-… ):

When using the CL_MEM_USE_HOST_PTR flag, if we want to guarantee a zero copy buffer on Intel processor graphics, we need to ensure that we adhere to two device-dependent alignment and size rules. We must create a buffer that is aligned to a 4096 byte boundary and have a size that is a multiple of 64 bytes.

I think the latter requirement is fulfilled. For the former, I tested the pointer to the Igor wave data in C using the following code:

if (((uintptr_t)IgorPtr % 4096u) != 0) { return -903; }

IgorPtr was obtained from WaveData(waveHandle). In this case, -903 was returned in Igor when the XOP was executed, so the wave data is not aligned to 4096 bytes. In contrast, when I created an aligned pointer in C and checked it:

int *pbuf = (int *)_aligned_malloc(sizeof(int) * 1024, 4096);
if (((uintptr_t)pbuf % 4096u) != 0) { return -903; }

-903 was NOT returned, so the alignment was good.

So my question is: how should I create the wave buffer with a specific alignment, e.g. 4096 bytes?
thomas_braun wrote:
Have you looked at https://github.com/pdedecker/IgorCL? Regarding your question: if you need memory aligned to a certain boundary and cannot enforce that at creation time (as is the case for Igor waves), the usual approach is to make the wave a bit larger and skip the first x bytes so that the real data starts at the desired boundary. https://stackoverflow.com/a/227900 has some details on that.
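The "make it a bit larger and skip the first x bytes" approach could look roughly like the sketch below. The helper names align_up and aligned_region are made up for illustration and are not part of the XOP Toolkit or IgorCL; the sketch also assumes the alignment is a power of two and ignores address overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/*
 * Hypothetical helper: given a block of `size` bytes starting at `base`,
 * return the first address inside it that is aligned to `alignment`,
 * or NULL if fewer than `needed` bytes remain after the skipped prefix.
 */
static void *aligned_region(void *base, size_t size,
                            size_t alignment, size_t needed)
{
    uintptr_t start = (uintptr_t)base;
    uintptr_t aligned = align_up(start, (uintptr_t)alignment);
    size_t skipped = (size_t)(aligned - start);

    if (skipped > size || size - skipped < needed)
        return NULL;
    return (void *)aligned;
}
```

In an XOP, base would be the pointer returned by WaveData(waveHandle), and the wave would be created at least alignment - 1 bytes larger than the payload (4095 extra bytes for a 4096-byte boundary). The returned pointer is what would be handed to OpenCL; the skipped prefix is simply left unused.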
Thanks for the reminder about the IgorCL work; I can probably get some hints from it on how the transfer was optimized. I already have working code, but the transfer takes about 2-3 times longer than expected, and I am trying to find out why. Your link probably gives me what I wanted. I will try it later, thanks a lot.

FYI, I have been looking into whether we could add a flag to Make to force alignment to a user-specified boundary. This appears to be impractical. The data of a wave actually starts at the end of an array of wave info, whose final field is: double wData[1]; There are about 2500 places in the code where we directly access this using, for example, double x= xwP->wData[i];

To support alignment, we would have to change double wData[1] to double *wDataPtr which could then be allocated separately. That of course changes 2500 places in the code.
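To illustrate the difference between the two layouts, here is a rough sketch. Only wData and wDataPtr come from the description above; the struct names, the other fields, and pointer_wave_init are invented for the example and do not reflect Igor's real wave structure. It uses C11 aligned_alloc; on MSVC, _aligned_malloc would be the counterpart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header with the data stored inline at the end. */
struct InlineWave {
    int32_t type;
    int64_t npnts;
    char    name[32];
    double  wData[1];   /* data continues past the end of the struct */
};

/* Hypothetical header that points at separately allocated data. */
struct PointerWave {
    int32_t type;
    int64_t npnts;
    char    name[32];
    double *wDataPtr;   /* data lives in its own block */
};

/* Offset of the first data point from the start of the inline allocation. */
static size_t inline_data_offset(void)
{
    return offsetof(struct InlineWave, wData);
}

/* Allocate the data block on a 4096-byte boundary. Returns 0 on success. */
static int pointer_wave_init(struct PointerWave *w, int64_t npnts)
{
    assert(w != NULL && npnts >= 0);
    size_t bytes = (size_t)npnts * sizeof(double);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (bytes + 4095u) & ~(size_t)4095u;
    if (rounded == 0)
        rounded = 4096;

    w->wDataPtr = aligned_alloc(4096, rounded);
    if (w->wDataPtr == NULL)
        return -1;
    w->type = 0;
    w->npnts = npnts;
    return 0;
}
```

With the inline layout, the data address is always the block address plus a fixed header offset, so the data can only land on a 4096-byte boundary if the whole block is placed at a matching odd offset. With the pointer layout, the data block can be aligned independently of the header, at the cost of touching every wData access.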

While researching Apple Metal's MPSMatrixMultiplication, I also found that to use shared memory (to avoid copying), the newBufferWithBytesNoCopy method takes a pointer that has to be allocated with vm_allocate or mmap. Memory allocated by malloc is specifically disallowed. The memory must both start and end on a virtual memory page boundary.
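For reference, getting memory that both starts and ends on a page boundary with mmap could look something like this on a POSIX system. This is only a sketch of the allocation side: page_alloc, page_free and round_up_to_page are hypothetical helper names, and nothing here calls Metal itself.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
static size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}

/*
 * Map at least `len` bytes of zero-filled anonymous memory.
 * The mapping starts on a page boundary and, because its length is
 * rounded up to whole pages, it also ends on one.
 * The actual mapped length is stored in *mapped_len.
 */
static void *page_alloc(size_t len, size_t *mapped_len)
{
    assert(mapped_len != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    size_t total = round_up_to_page(len, (size_t)page);
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *mapped_len = total;
    return p;
}

/* Release a mapping obtained from page_alloc. */
static void page_free(void *p, size_t mapped_len)
{
    munmap(p, mapped_len);
}
```

The rounded length, not the requested one, is what would need to be passed along with the pointer, since the end of the buffer has to sit on a page boundary as well.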

 

Larry Hutchinson wrote:
There are about 2500 places in the code where we directly access this using, for example, double x= xwP->wData[i]

Doing that manually will be no fun at all. From following the git mailing list I've come across Coccinelle [1], which lets you write rules for code refactoring. It is designed only for C, though. In [2] I've seen the DMS Software Reengineering Toolkit, which claims to do the same for C++.

Then there is also clang-rename [3], but AFAIK that only works if your code base is built with CMake, as it requires a compilation database.

[1]: http://coccinelle.lip6.fr
[2]: https://stackoverflow.com/a/2427608
[3]: https://clang.llvm.org/extra/clang-rename.html