Here:

s2      strobe_s *          0x800a497 <_fflush_r+66>   

s2 is a flash (read-only) address. Copying to read-only memory is semantically wrong and may also trigger an MPU fault if the region is configured as read-only.

It is not clear to me how the original code worked, or indeed why:

 *s1 = ss->strobe;

is not also causing a problem. It certainly won't work as intended, even if no exception is raised.

Answer from Clifford on Stack Overflow
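
One way to catch this class of bug early is to check a destination address against the memory map before copying to it. The following is only a minimal sketch, assuming the usual STM32 layout with SRAM starting at 0x20000000; the 128 KiB size and the name `addr_in_sram` are hypothetical and should be replaced with values from the linker script of the actual part:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed memory map: adjust to the linker script of the actual device. */
#define SRAM_BASE_ADDR  0x20000000u
#define SRAM_SIZE_BYTES 0x00020000u  /* hypothetical 128 KiB of SRAM */

/* Returns 1 if [addr, addr + len) lies entirely inside the assumed SRAM. */
static int addr_in_sram(uintptr_t addr, size_t len)
{
    if (addr < SRAM_BASE_ADDR)
        return 0;

    uintptr_t offset = addr - SRAM_BASE_ADDR;

    /* Written this way so that offset + len cannot overflow. */
    return offset <= SRAM_SIZE_BYTES && len <= SRAM_SIZE_BYTES - offset;
}
```

With these assumed bounds, the flash address shown in the debugger output above (0x800a497) is rejected, so a debug assertion placed in front of the copy would flag it before any write is attempted.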
🌐
GitHub
github.com › zephyrproject-rtos › zephyr › issues › 54670
stm32: memcpy crashes with NEWLIBC · Issue #54670 · zephyrproject-rtos/zephyr
February 9, 2023 - I found a workaround by using -Wl,--wrap=memcpy and implementing the "stupid" version of memcpy (partially copied from the Zephyr implementation in "minimal")
Author   jchabod
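
A byte-wise replacement of the kind described in this workaround might look roughly like the sketch below. It is not the code from the issue or from Zephyr's minimal libc; it only illustrates the GNU ld `--wrap` convention, under which references to `memcpy` are redirected to `__wrap_memcpy`. The volatile-qualified pointers are an assumption made here to keep the compiler from recognizing the loop and replacing it with a call to memcpy, which would recurse into the wrapper:

```c
#include <stddef.h>

/*
 * Linked with -Wl,--wrap=memcpy, every reference to memcpy resolves to
 * __wrap_memcpy; the library version stays reachable as __real_memcpy.
 * Only byte accesses are issued, so no alignment requirement applies.
 */
void *__wrap_memcpy(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    while (n--)
        *d++ = *s++;

    return dst;
}
```

The trade-off is speed: one load and one store per byte is the slowest way to copy, so this is better suited to confirming an alignment problem than to serving as a permanent replacement.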
🌐
STMicroelectronics Community
community.st.com › t5 › stm32-mcus-products › memcpy-function › td-p › 422717
memcpy() function - STMicroelectronics Community
November 6, 2015 - One of the typical memcpy() optimizations is reading multiple words, and then writing multiple words. DMA would have a single holding register
Discussions

Fast memcpy
Check whether your device has a DMA peripheral with RAM-to-RAM capability. If so, program one channel for this kind of transfer and start it whenever you need to copy a chunk of memory. Generally, STM32 and most other Cortex-M devices have DMA. As a plus, if you need padding, some DMA peripherals support it. You can also use them to zero out regions of memory by setting the byte, half-word or word being read to 0 and locking the read address.
🌐 r/embedded
January 26, 2021
memcpy crashes with NEW_LIBC on stm32 cortex m7 with debugger attached
Describe the bug: memcpy crashes due to unaligned access under certain circumstances (only reproducible when using the ST-LINK debugger). My board is an Arm Cortex-M7, more specifically a stm32h747i_disco_m7. I ...
🌐 github.com
February 8, 2023
Top answer
1 of 3
6

Several problems:

  • DMA buffers need to be volatile-qualified, otherwise the compiler might go bananas when generating the code accessing them.

  • You use memcpy incorrectly; it should have been memcpy(Wave_Active, Wave_High, sizeof Wave_Active);

  • The use of memcpy to begin with is often incorrect when it comes to hardware-related programming. Copying 256 bytes takes a lot of time. Worst case, your DAC might even request new data before you are done copying.

    The correct way to write such code would be to have several allocated buffers, then swap an "active" pointer to point at the one used. With the disclaimer that I don't understand the purpose of these arrays, something like this would be an immense speed optimization:

      volatile uint32_t Wave_Low[NS] = {2048,[...],2047};  /* lookup table 1 */
      volatile uint32_t Wave_High[NS] = {4096,[...],4067}; /* lookup table 2 */
      volatile uint32_t *Wave_Active = Wave_High;
    
      ...
      if(DMA_flag)
      {
        Wave_Active = (Wave_Active==Wave_Low) ? Wave_High : Wave_Low;
    
        /* you might have to tell the DMA which array to use next time here */
      }
    
2 of 3
2

@ 0___________ suggested a solution with the right approach: overwriting one half of the lookup table while the DMA reads the other half. This would avoid a byte being written while it is being read. The problem is that I'm sampling faster than memcpy can write the values.

Therefore I've tried the simple approach and bluntly overwrite the array with memcpy no matter what. This sometimes causes funny signal patterns (you can actually see which part of the lookup table gets overwritten first), but overall it works. The signal transition takes around 20 µs, which is sufficient. As I can live with the imperfect signal pattern, that will be my solution. Below is an oscilloscope screenshot of the signal transition from low to high.

Thanks for your help!

🌐
STMicroelectronics Community
community.st.com › t5 › stm32-mcus-products › memcpy-vs-dma › td-p › 444784
Memcpy vs DMA - STMicroelectronics Community
February 11, 2016 - For example, the newlib library provides a speed-optimized version of memcpy(), which automatically detects word-aligned memory transfers. The newlib-nano memcpy(), being optimized for size, doesn't perform this type of check. Moreover, you should also try word-aligned DMA m2m transfers: on some STM32 MCUs you can achieve more than a 4x speed-up.
🌐
VisualGDB
visualgdb.com › tutorials › arm › chronometer › memory
Analyzing STM32 Memory Performance with Chronometer – VisualGDB Tutorials
Switch to the release configuration, enable chronometer for it and run through your program. Observe the times shown in the Chronometer window. In our experiments the memcpy() function was as slow as copying the image byte-by-byte; copying it word-by-word was ~4x faster and using DMA was actually ...
🌐
EmbDev
embdev.net › arm programming with gcc/gnu tools
help! memcpy causes hardfault on STM32F4 - EmbDev.net
December 10, 2012
🌐
TutorialsPoint
tutorialspoint.com › c_standard_library › c_function_memcpy.htm
C library - memcpy() function
The program below uses two functions, puts() and memcpy(), to copy the content from one memory location to another.
🌐
Reddit
reddit.com › r/embedded › fast memcpy
r/embedded on Reddit: Fast memcpy
January 26, 2021 -

Hello,

I've read a lot about fast memcpy, type punning and strict aliasing rule in C99 and I feel a bit confused and would like to make sure that my understanding is correct. So, as far as I understand, the safest way of implementing a memcpy that works with chunks of data bigger than one byte is to use assembly, because:

  1. Accessing a uint8_t buffer with a uint32_t pointer is undefined behavior (due to strict aliasing, not only address alignment):

         if ((uintptr_t)pu8Buffer % 4 == 0)
             *(uint32_t *)pu8Buffer = u32Array[szIndex];

  2. Type punning with unions is specified behavior only with gcc extensions:

         union {
             uint8_t *p8;
             uint32_t *p32;
         } uPointer;

         uPointer.p8 = pu8Buffer;
         *uPointer.p32 = u32Array[szIndex];

Is there really no standard way of implementing a faster memcpy in C99?
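
One commonly cited standard-conforming alternative is to do the word-sized access through memcpy() itself with a small constant size, since copying object bytes this way is well defined in C99 and needs neither a pointer cast nor a union. A minimal sketch follows; the helper names `store_u32` and `load_u32` are hypothetical, and whether the compiler lowers each call to a single load or store depends on the compiler, the target and the optimization level:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value into a byte buffer without casting to uint32_t *. */
static void store_u32(uint8_t *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Loads a 32-bit value from a byte buffer at any alignment. */
static uint32_t load_u32(const uint8_t *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

Because the buffer is only ever accessed as bytes from the language's point of view, both the strict-aliasing and the alignment concerns from the two points above fall away; the cost is relying on the optimizer for speed.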

🌐
Chibios
forum.chibios.org › board index › support section › stm32 support
memcpy and cache on STM32F7 - ChibiOS Free Embedded RTOS
Hi, I think that compiler-provided implementations would never use resources outside the CPU, so no copy using DMA, it would not be even portable. Use of DMA, in any possible scenario, would definitely be affected by cache coherency and that on any M7 device, not just STM32s.
🌐
GitHub
github.com › zephyrproject-rtos › zephyr › issues › 54630
memcpy crashes with NEW_LIBC on stm32 cortex m7 with debugger attached · Issue #54630 · zephyrproject-rtos/zephyr
February 8, 2023 - When using the ST-LINK debugger, the result is not displayed; an exception is raised in memcpy at this point: the first odd byte is copied correctly, thus bringing src (R0) and dst (R1) to odd addresses. The ldrh raises an exception due to the unaligned address ...
Author   jchabod
🌐
STMicroelectronics Community
community.st.com › t5 › stm32cubeide-mcus › problem-with-memcpy › td-p › 92629
Problem with memcpy - STMicroelectronics Community
February 20, 2023 - Hello, I am having a problem with a simple line of code. If I run the code below, the result seems off.

    // Main job of this code is to transform hex into float
    uint32_t hex = 0x3fc51eb8;
    float f;
    memcpy(&f, &hex, 4);

If I go into Live Expressions I get that f = 0x1 (hex), it should be f = 0x3fc51...
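
For reference, the conversion in this question can be written as a small self-contained helper. This is a sketch assuming that float is a 32-bit IEEE 754 single on the target, which holds for the Cortex-M toolchains discussed here; the name `float_from_bits` is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the bits of a 32-bit integer as a float (assumed IEEE 754). */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);  /* sizeof f avoids the hard-coded 4 */
    return f;
}
```

Under that assumption, 0x3fc51eb8 corresponds to a value of approximately 1.54. A debugger view that formats a float as hexadecimal may display the converted numeric value rather than the raw bit pattern, so checking the decimal value is the more reliable test.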
🌐
STMicroelectronics Community
community.st.com › s › question › 0D50X00009XkbBASAZ › help-memcpy-causes-hardfault
help ! memcpy causes hardfault - STMicroelectronics Community
July 31, 2018 - If you use opt level 1 and call memcpy for <= 8 bytes, then possibly memcpy will NOT be invoked; an LDR/STR sequence is emitted instead, which of course is much faster - the Keil compiler really is very smart. (I am not sure whether Keil would do this already at opt level 0 - I never use opt level 0.) Depending on your basic Cortex-M core settings (usually in stm32f4xx-startup.s), a hardfault will or will not be generated on access to a half-word (16-bit) aligned address (ARM can support 16-bit aligned access for certain operations like LDR and STR - check the ARM v7m TRM A3.2 "Alignment behaviour").
🌐
IBM
ibm.com › docs › en › zos › 2.5.0
memcpy() — Copy buffer
🌐
STMicroelectronics Community
community.st.com › t5 › stm32-mcus-products › single-bulk-memory-copy-instead-of-memcpy › td-p › 338248
Single bulk memory copy (instead of memcpy) - STMicroelectronics Community
May 5, 2018 - We are using STM32-767ZI and copying a small array into a memory area which is mapped to an external FPGA. Looking for how to make the copy in a single bulk operation. ... Going to depend on the compiler. memcpy() is most frequently optimized, both in terms of aligning addresses and by using LDR/STR multiple instructions, i.e. fetch 8 words, write 8 words.
🌐
Cprogramming
cboard.cprogramming.com › c-programming › 154333-fast-memcpy-alternative-32-bit-embedded-processor-posted-just-fyi-fwiw.html
Fast memcpy() alternative for a 32-bit embedded processor (Posted just FYI and FWIW!)
February 15, 2013 - Generally optimising code for microcontrollers is a trade off between code size and performance. This code takes performance to an extreme at the cost of really rather bulky code. Compiled with Linaro GCC for Cortex-M4 it's over 500 bytes (with manualCopy inlined twice). The library memcpy is 130, ...