How does memcpy( ) work in C lang?
c - Understanding the source code of memcpy() - Stack Overflow
c - memcpy() implementation - Code Review Stack Exchange
c - How does the internal implementation of memcpy work? - Stack Overflow
A long time ago, when I was a wee little lad learning C, before I had learned pointers, I learned how to swap the values of two variables:
uint32_t x = 1;
uint32_t y = 2;
uint32_t tmp;
tmp = y; y = x; x = tmp;
Of course, with pointers, we can do:
void xorSwap (int* x, int* y) {
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}
Yesterday at work, where we write bare-metal C for firmware in a RISC-V environment, I hit a uniquely weird issue.
I had to use memcpy(), with a call shaped something like
memcpy( uint32_t * dest, const uint32_t * source, size_t sz );
What puzzled me was that I had specified the maximum buffer size in the linker script too, so it should have worked.
Below is the actual code:
memcpy(sample_waveform, (void *)(0x7C000), sizeof(sample_waveform));
When sample_waveform was a GLOBAL array of signed doubles, 3K bytes in size, it would not copy. It was able to copy the first 500 bytes, though, when I made sample_waveform smaller.
But when I made sample_waveform a local on the stack, taking 3K of stack space, memcpy() worked.
I can't explain why this is so. I didn't want to copy into, and use up, that much space on the stack.
Has this happened to anyone?
How does memcpy() work? Does it copy to a temporary somewhere and then copy to your destination, or does it copy like that pointer method? How is this different from memmove()?
Thanks!
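On the temp-buffer question, here is a minimal sketch, not any particular libc's code. The names naive_memcpy and naive_memmove are hypothetical. It assumes the simplest possible strategy: copy straight from source to destination with no intermediate buffer. memcpy is only defined for non-overlapping regions, while memmove must behave as if it went through a temporary; one common way to get that effect without a real temporary is to choose the copy direction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. Naive forward byte copy with no temporary buffer.
   Like memcpy, only valid when the regions do not overlap. */
void *naive_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dest;
}

/* Hypothetical name. memmove-style copy: picks the direction so that
   overlapping regions are handled without clobbering unread bytes. */
void *naive_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts below source: copy forward. */
        while (n--) {
            *d++ = *s++;
        }
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination starts above source: copy backward from the end. */
        d += n;
        s += n;
        while (n--) {
            *--d = *--s;
        }
    }
    return dest;
}
```

For example, with char a[8] = "abcdef", calling naive_memmove(a + 2, a, 4) leaves a starting with "ababcd", whereas a forward-only copy over the same overlapping range would smear "abab" across the destination.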
I couldn't understand the if part they do for integers: i < len/sizeof(long). Why is this calculation required?
Because they are copying words, not individual bytes, in this case (as the comment says, it is an optimization: it requires fewer iterations, and the CPU can handle word-aligned data more efficiently).
len is the number of bytes to copy, and sizeof(long) is the size of a single word, so the number of elements to copy (that is, the number of loop iterations to execute) is len / sizeof(long).
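The arithmetic can be sketched as a small helper. The names copy_plan and plan_copy are hypothetical and only illustrate the division; the leftover-byte count is an addition for completeness and is not something the quoted implementation is known to compute.

```c
#include <stddef.h>

/* Hypothetical helper type: how a byte length splits into whole words. */
struct copy_plan {
    size_t words;      /* loop iterations when copying one long at a time */
    size_t tail_bytes; /* bytes left over that do not fill a whole long   */
};

struct copy_plan plan_copy(size_t len)
{
    struct copy_plan p;
    p.words = len / sizeof(long);
    p.tail_bytes = len % sizeof(long);
    return p;
}
```

On a platform where sizeof(long) is 8, a 3072-byte copy needs 384 word iterations instead of 3072 byte iterations.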
to understand how it differs from using a loop. But I couldn't see any difference between using a loop and using memcpy, as memcpy again uses a loop internally to copy
Well then, it uses a loop. Maybe other implementations of libc don't do it like that. Anyway, what's the problem or question if it does use a loop? Also, as you can see, it does more than loop: it checks for alignment and performs a different kind of loop depending on the alignment.
I couldn't understand the if part they do for integers: i < len/sizeof(long). Why is this calculation required?
This is checking for memory word alignment. If the destination and source addresses are word-aligned, and the length to copy is a multiple of the word size, then it performs an aligned copy by word (long), which is faster than copying by byte (char), not only because of the size, but also because most architectures do word-aligned copies much faster.
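A minimal sketch of that check and the two loops it selects between. This is an illustration of the idea described above, not the actual source being discussed; can_copy_by_word and sketch_memcpy are hypothetical names, and using sizeof(long) as the alignment requirement is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when both addresses and the length are
   multiples of sizeof(long), so a word-by-word copy is possible. */
static int can_copy_by_word(const void *dst, const void *src, size_t len)
{
    return ((uintptr_t)dst % sizeof(long) == 0) &&
           ((uintptr_t)src % sizeof(long) == 0) &&
           (len % sizeof(long) == 0);
}

/* Hypothetical name. Assumes the regions do not overlap, like memcpy.
   Note: accessing arbitrary memory through long* can violate strict
   aliasing in portable C; real libc code relies on compiler-specific
   guarantees, so treat this purely as a sketch. */
void *sketch_memcpy(void *dst, const void *src, size_t len)
{
    if (can_copy_by_word(dst, src, len)) {
        long *d = dst;
        const long *s = src;
        for (size_t i = 0; i < len / sizeof(long); i++) {
            d[i] = s[i];            /* one word per iteration */
        }
    } else {
        unsigned char *d = dst;
        const unsigned char *s = src;
        for (size_t i = 0; i < len; i++) {
            d[i] = s[i];            /* one byte per iteration */
        }
    }
    return dst;
}
```

Copying between two long arrays takes the word path; copying to an odd address such as buf + 1 falls back to the byte loop.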
Depends. In general, you couldn't physically copy anything larger than the largest usable register in a single cycle, but that's not really how machines work these days. In practice, you really care less about what the CPU is doing and more about the characteristics of DRAM. The memory hierarchy of the machine is going to play a crucial determining role in performing this copy in the fastest possible manner (e.g., are you loading whole cache-lines? What's the size of a DRAM row with respect to the copy operation?). An implementation might instead choose to use some kind of vector instructions to implement memcpy. Without reference to a specific implementation, it's effectively a byte-for-byte copy with a one-place buffer.
Here's a fun article that describes one person's adventure into optimizing memcpy. The main take-home point is that it is always going to be targeted to a specific architecture and environment based on the instructions you can execute inexpensively.
The implementation of memcpy is highly specific to the system in which it is implemented. Implementations are often hardware-assisted.
Memory-to-memory mov instructions are not that uncommon - they have been around since at least PDP-11 times, when you could write something like this:
MOV FROM, R2
MOV TO, R3
MOV R2, R4
ADD LEN, R4
CP: MOV (R2+), (R3+) ; "(Rx+)" means "*Rx++" in C
CMP R2, R4
BNE CP
The commented line is roughly equivalent to C's
*to++ = *from++;
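For comparison, the whole assembly loop can be rendered in C roughly as follows. This is a sketch under assumptions: copy_words16 is a hypothetical name, uint16_t stands in for the PDP-11's 16-bit word, and the length is given as a count of words. Unlike the assembly, which always copies at least one word before testing, this version tests first and so also handles a count of zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. Copies count 16-bit words from 'from' to 'to'. */
void copy_words16(uint16_t *to, const uint16_t *from, size_t count)
{
    const uint16_t *end = from + count;  /* plays the role of R4 */

    while (from != end) {                /* CMP R2, R4 / BNE CP  */
        *to++ = *from++;                 /* the commented MOV    */
    }
}
```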
Some contemporary CPUs have instructions that implement memcpy directly (x86's rep movsb is one example): you load special registers with the source and destination addresses and the length, invoke a memory copy command, and let the CPU do the rest.