Brave Search

STM32 call to memcpy causes hardfault (the call to memcpy itself, not the execution of memcpy)

stackoverflow.com › questions › 67968027 › stm32-call-to-memcpy-causes-hardfault-the-call-to-memcpy-itself-not-the-execut

Here:

s2      strobe_s *          0x800a497 <_fflush_r+66>

s2 is a flash (read-only) address. Copying to read-only memory is both semantically erroneous and may trigger an MPU fault if the region were set to read-only.

It is not clear to me how the original code worked or indeed how:

 *s1 = ss->strobe;

is not causing a problem too, however. It certainly won't work as intended, even if no exception is raised.

Answer from Clifford on Stack Overflow
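
A defensive check can catch a destination pointer that has strayed into flash before the copy is attempted. The following is only a minimal sketch, assuming the usual STM32 memory map (main flash at 0x08000000, SRAM starting at 0x20000000). The region sizes and the helper names `addr_in_region`, `dest_is_in_flash` and `dest_is_writable_ram` are hypothetical and would need to match the actual part and linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map: the base addresses are the usual STM32 ones, but
 * the sizes are placeholders. Take the real values from the datasheet
 * or the linker script of the target device. */
#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_SIZE      (512u * 1024u)   /* hypothetical */
#define SRAM_BASE_ADDR  0x20000000u
#define SRAM_SIZE       (128u * 1024u)   /* hypothetical */

/* True if [addr, addr + len) lies entirely inside [base, base + size). */
static inline bool addr_in_region(uint32_t addr, uint32_t len,
                                  uint32_t base, uint32_t size)
{
    return addr >= base && len <= size && (addr - base) <= (size - len);
}

/* True if the range lies in flash, i.e. must not be a memcpy destination. */
static inline bool dest_is_in_flash(uint32_t addr, uint32_t len)
{
    return addr_in_region(addr, len, FLASH_BASE_ADDR, FLASH_SIZE);
}

/* True if a copy of len bytes to addr would land entirely in SRAM. */
static inline bool dest_is_writable_ram(uint32_t addr, uint32_t len)
{
    return addr_in_region(addr, len, SRAM_BASE_ADDR, SRAM_SIZE);
}
```

On the target, a call such as `dest_is_writable_ram((uint32_t)(uintptr_t)s2, sizeof *s2)` before the copy would flag an address like 0x800a497 as not being in RAM.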

GitHub

github.com › zephyrproject-rtos › zephyr › issues › 54670

stm32: memcpy crashes with NEWLIBC · Issue #54670 · zephyrproject-rtos/zephyr

February 9, 2023 - I found a workaround by using -Wl,--wrap=memcpy and implementing the "stupid" version of memcpy (partially copied from the Zephyr implementation in "minimal")
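
For illustration, a byte-wise replacement of the kind the workaround describes might look like the sketch below. This is not the code from the issue or from Zephyr's minimal libc; it only assumes the GNU ld behavior that `--wrap=memcpy` redirects undefined references to `memcpy` to a function named `__wrap_memcpy`.

```c
#include <stddef.h>

/* With -Wl,--wrap=memcpy, GNU ld resolves undefined references to memcpy
 * to __wrap_memcpy (the original stays reachable as __real_memcpy).
 * This is a plain byte-wise copy, so it never issues half-word or word
 * accesses that could be unaligned. */
void *__wrap_memcpy(void *dst, const void *src, size_t n)
{
    /* volatile keeps the compiler from recognising the loop as a memcpy
     * idiom and replacing it with a call to memcpy, which could recurse
     * back into this function once the symbol is wrapped. */
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```

The trade-off is speed: every byte is moved individually, so this is a diagnostic or stop-gap measure rather than a tuned copy routine.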

Author jchabod

STMicroelectronics Community

community.st.com › t5 › stm32-mcus-products › memcpy-function › td-p › 422717

memcpy() function - STMicroelectronics Community

November 6, 2015 - One of the typical memcpy() optimizations is reading multiple words, and then writing multiple words. DMA would have a single holding register ...

Discussions

Fast memcpy

Check whether your device has a DMA peripheral with RAM-to-RAM capability. Then program one channel to perform this kind of transfer and call it when you need to copy chunks of memory. Generally, STM32 and most other Cortex-M devices have DMA. As a plus, if you need padding, some DMA peripherals allow for padding. You can also use them to zero out regions of memory by setting the byte, half-word or word being read to 0 and locking the read address. More on reddit.com

r/embedded

January 26, 2021

memcpy crashes with NEW_LIBC on stm32 cortex m7 with debugger attached

Describe the bug: memcpy crashes due to unaligned access under certain circumstances (only reproducible when using the ST-LINK debugger). My board is a Cortex-M7, more specifically a stm32h747i_disco_m7. I ... More on github.com

github.com

February 8, 2023

Stack Overflow

stackoverflow.com › questions › 65828938 › memcpy-command-alternating-data-while-copying

c++ - memcpy command alternating data while copying - Stack Overflow

Top answer

1 of 3

Several problems:

DMA buffers need to be volatile qualified or otherwise the compiler might go bananas when generating the code accessing them.
You use memcpy incorrectly, should have been memcpy(Wave_Active , Wave_High, sizeof Wave_Active);
The use of memcpy to begin with is often incorrect when it comes to hardware-related programming. Copying 256 bytes takes a lot of time. Worst case, your DAC might even request new data before you are done copying.

The correct way to write such code would be to have several allocated buffers, then swap an "active" pointer to point at the one used. With the disclaimer that I don't understand the purpose of these arrays, something like this would be an immense speed optimization:
```
  volatile uint32_t Wave_Low[NS] = {2048,[...],2047};  /* lookup table 1 */
  volatile uint32_t Wave_High[NS] = {4096,[...],4067}; /* lookup table 2 */
  volatile uint32_t* Wave_Active = Wave_High;

  ...
  if(DMA_flag)
  {
    Wave_Active = (Wave_Active==Wave_Low) ? Wave_High : Wave_Low;

    /* you might have to tell the DMA which array to use next time here */
  }
```

2 of 3

@0___________ suggested a solution with the right approach: overwriting one half of the lookup table while the DMA reads the other half. This would avoid reading a byte while it is being overwritten. The problem with this is that I'm sampling faster than memcpy can write the values.

Therefore I've tried the simple approach of bluntly overwriting the array with memcpy no matter what. This partly causes funny signal patterns (you can actually see which part of the lookup table gets overwritten first), but overall it works. The signal transition takes around 20 µs, which is sufficient. As I can live with the imperfect signal pattern, that will be my solution. Below is an oscilloscope screenshot of the signal transition from low to high.

Thanks for your help!

STMicroelectronics Community

community.st.com › t5 › stm32-mcus-products › memcpy-vs-dma › td-p › 444784

Memcpy vs DMA - STMicroelectronics Community

February 11, 2016 - For example, the newlib library provides a speed optimized version of memcpy(), which automatically detects word-aligned memory transfers. The newlib-nano memcpy(), being optimized for size, it doesn't perform this type of check. Moreover, you should also try DMA m2m transfers word-aligned: for some STM32 MCU you can achieve more than 4x speed-up.
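
The alignment detection mentioned in this snippet can be sketched as follows. This is not newlib's implementation, only a minimal illustration of the idea: take a word-at-a-time path when both pointers are 4-byte aligned and fall back to bytes otherwise. The name `copy_aligned_fast` is hypothetical, and the `may_alias` attribute is a GCC/Clang extension assumed to be available.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit type that is allowed to alias any object,
 * so the word loop below does not violate strict aliasing. */
typedef uint32_t __attribute__((may_alias)) word_alias_t;

/* Hypothetical helper: copies word by word when both pointers are
 * 4-byte aligned, otherwise byte by byte. Regions must not overlap. */
void *copy_aligned_fast(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0u) {
        word_alias_t *dw = (word_alias_t *)d;
        const word_alias_t *sw = (const word_alias_t *)s;

        for (; n >= 4u; n -= 4u) {
            *dw++ = *sw++;          /* one 32-bit load and store per step */
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    while (n--) {                   /* tail bytes, or everything if unaligned */
        *d++ = *s++;
    }
    return dst;
}
```

A size-optimized routine that skips the alignment test always takes the byte loop, which is consistent with the speed difference the post describes.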

VisualGDB

visualgdb.com › tutorials › arm › chronometer › memory

Analyzing STM32 Memory Performance with Chronometer – VisualGDB Tutorials

Switch to the release configuration, enable chronometer for it and run through your program: Observe the times shown in the Chronometer window. In our experiments the memcpy() function was as slow as copying the image byte-by-byte; copying it word-by-word was ~4x faster and using DMA was actually ...

Arm Developer

developer.arm.com › documentation › 101655 › latest › Cx51-User-s-Guide › Library-Reference › Reference › memcpy

memcpy

STMicroelectronics Community

community.st.com › s › question › 0D50X00009XkaLtSAJ › use-of-memset-and-memcpy

Use of memset and memcpy - STMicroelectronics Community

July 31, 2018


STMicroelectronics Community

community.st.com › t5 › stm32-mcus-products › memcpy-problem › td-p › 544282

memcpy problem - STMicroelectronics Community

Top answer

1 of 8

Posted on October 11, 2013 at 01:25 · I had a related problem which I'll share as you may be experiencing the same. · I use the GCC compiler which I use to compile both C and C++ applications for STM32. When using memcpy() the compiler includes optimisations to use the stmdb and ldmia instruction to copy multiple items at a time. While the normal ldr and str instructions are able to work just fine with non-aligned memory pointers, the ldmia and stmdb instructions resulted in hard faults (although maybe memory faults). · If you use Google you can find other references to this issue, and also how to specify the variables to stop this optimisation from happening - or at least only be applied when the memory being copied is correctly aligned. I can't remember the exact details though, as I wrote my own memcpy replacement instead.

2 of 8

Posted on October 08, 2013 at 15:27 · More probably the stack, and blowing through the top of RAM. · If you know the specific memcpy(), you could place sanity checks around it to assure you don't exceed the scope of the buffer/whatever that is being used.
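
The sanity check suggested in this answer could be as simple as a wrapper that knows the destination's capacity. The sketch below is one possible shape; `checked_memcpy` is a hypothetical name, and how a rejected copy is reported (assert, error code, log) is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: refuses the copy instead of overrunning dst.
 * dst_size is the capacity of the destination buffer in bytes. */
static inline bool checked_memcpy(void *dst, size_t dst_size,
                                  const void *src, size_t n)
{
    if (dst == NULL || src == NULL || n > dst_size) {
        return false;   /* caller decides how to report the violation */
    }
    memcpy(dst, src, n);
    return true;
}
```

Placing a breakpoint on the `return false` path makes an out-of-range length visible before it turns into a fault somewhere else.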

EmbDev

embdev.net › arm programming with gcc/gnu tools

help! memcpy causes hardfault on STM32F4 - EmbDev.net

December 10, 2012

TutorialsPoint

tutorialspoint.com › c_standard_library › c_function_memcpy.htm

C library - memcpy() function

The program below uses two functions, puts() and memcpy(), to copy content from one memory location to another.

STMicroelectronics Community

community.st.com › s › question › 0D50X00009XkXSXSA3 › single-bulk-memory-copy-instead-of-memcpy

Single bulk memory copy (instead of memcpy) - ST Community

August 31, 2018

reddit.com › r/embedded › fast memcpy

r/embedded on Reddit: Fast memcpy

January 26, 2021 -

Hello,

I've read a lot about fast memcpy, type punning and strict aliasing rule in C99 and I feel a bit confused and would like to make sure that my understanding is correct. So, as far as I understand, the safest way of implementing a memcpy that works with chunks of data bigger than one byte is to use assembly, because:

1) Accessing a uint8_t buffer with a uint32_t pointer is undefined behavior (due to strict aliasing, not only address alignment):

```
if ((uintptr_t)pu8Buffer % 4 == 0)
    *(uint32_t *)pu8Buffer = u32Array[szIndex];
```

2) Type punning with unions is specified behavior only with gcc extensions:

```
union {
    uint8_t *p8;
    uint32_t *p32;
} uPointer;
uPointer.p8 = pu8Buffer;
*uPointer.p32 = u32Array[szIndex];
```

Is there really no standard way of implementing a faster memcpy in C99?

Top answer

1 of 3

2 of 3

Yes, there are a lot of fast C memcpy implementations out there. Word width transfers and loop unrolling are pretty common ways to optimize it. memcpy takes void*, not uint8_t*. Strict aliasing doesn't apply here, because the compiler can't make assumptions for what a void type is. As others have noted, if you have a need for speed, DMA can help you here. But caveat emptor: these implementations can be tricky to get right, and obviously are not very portable. FWIW, a properly optimized pure software memcpy is generally going to be fast enough, unless you've actually profiled your code and can prove otherwise.
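
One commonly used way to move 32 bits at a time without a pointer cast is a fixed-size memcpy into a real `uint32_t` object, which is defined behavior for any alignment. The helpers below are a minimal sketch of that idiom; the names `load_u32` and `store_u32` are hypothetical, and whether the compiler turns each call into a single load or store depends on the target and optimization level.

```c
#include <stdint.h>
#include <string.h>

/* Reads 32 bits from a byte buffer at any alignment. Copying into a real
 * uint32_t object sidesteps both the aliasing and the alignment problem. */
static inline uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Writes 32 bits to a byte buffer at any alignment (native byte order). */
static inline void store_u32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

With these, the question's first example could be written as `store_u32(pu8Buffer, u32Array[szIndex]);` without an alignment test.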

Chibios

forum.chibios.org › board index › support section › stm32 support

memcpy and cache on STM32F7 - ChibiOS Free Embedded RTOS

Hi, I think that compiler-provided implementations would never use resources outside the CPU, so no copy using DMA, it would not be even portable. Use of DMA, in any possible scenario, would definitely be affected by cache coherency and that on any M7 device, not just STM32s.

GitHub

github.com › zephyrproject-rtos › zephyr › issues › 54630

memcpy crashes with NEW_LIBC on stm32 cortex m7 with debugger attached · Issue #54630 · zephyrproject-rtos/zephyr

February 8, 2023 - When using the ST-LINK debugger, the result is not displayed; an exception is raised in memcpy at this point: the first odd byte is copied correctly, thus bringing src (R0) and dst (R1) to odd addresses. The ldrh raises an exception due to unaligned address ... Labels: Stale, bug, platform: STM32, priority: low

Author jchabod

STMicroelectronics Community

community.st.com › t5 › stm32cubeide-mcus › problem-with-memcpy › td-p › 92629

Problem with memcpy - STMicroelectronics Community

February 20, 2023 - Hello, I am having a problem with a simple line of code. If I run the code below, the result seems off. //Main job of this code is to transform hex into float uint32_t hex = 0x3fc51eb8; float f; memcpy(&f, &hex, 4); If I go into Live Expressions I get that f = 0x1 (hex), it should be f = 0x3fc51...
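
The conversion in this snippet can be wrapped in a small helper so that the size assumption is checked at compile time. This is a sketch only, assuming a C11 compiler and a 32-bit IEEE 754 `float` (as on Cortex-M targets); `float_from_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit pattern as a single-precision float.
 * Assumes float is a 32-bit IEEE 754 type. */
static inline float float_from_bits(uint32_t bits)
{
    float f;
    _Static_assert(sizeof f == sizeof bits, "float must be 32 bits wide");
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For the pattern in the post, `float_from_bits(0x3fc51eb8u)` yields a value of about 1.54 when read as a float; a debugger view that displays the variable in hex format shows something different from that decimal value.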

STMicroelectronics Community

community.st.com › s › question › 0D50X00009XkbBASAZ › help-memcpy-causes-hardfault

help ! memcpy causes hardfault - STMicroelectronics Community

July 31, 2018 - If you use opt level 1, and if you use memcpy for <= 8 bytes, then possibly memcpy will NOT be invoked, but an LDR / STR sequence instead, which of course is much faster - the Keil compiler really is very smart. (I am not sure whether Keil would do this already at opt level 0 - I never use opt level 0.) Depending on your basic Cortex-M core settings (usually in stm32f4xx-startup.s), a hardfault will or will not be generated on a half-word (16-bit) address (ARM has the possibility to support 16-bit access for certain operations like LDR and STR - check the ARM v7m TRM A3.2 "Alignment behaviour").

IBM

ibm.com › docs › en › zos › 2.5.0

memcpy() — Copy buffer


STMicroelectronics Community

community.st.com › t5 › stm32-mcus-products › single-bulk-memory-copy-instead-of-memcpy › td-p › 338248

Single bulk memory copy (instead of memcpy) - STMicroelectronics Community

May 5, 2018 - We are using STM32-767ZI and copying a small array into a memory area which is mapped to an external FPGA. Looking for how to make the copy in a single bulk operation. ... Going to depend on the compiler. memcpy() is most frequently optimized, both in terms of aligning addresses, but also using LDR/STR multiple instructions, ie fetch 8 words, write 8 words.

Cprogramming

cboard.cprogramming.com › c-programming › 154333-fast-memcpy-alternative-32-bit-embedded-processor-posted-just-fyi-fwiw.html

Fast memcpy() alternative for a 32-bit embedded processor (Posted just FYI and FWIW!)

February 15, 2013 - Generally optimising code for microcontrollers is a trade off between code size and performance. This code takes performance to an extreme at the cost of really rather bulky code. Compiled with Linaro GCC for Cortex-M4 it's over 500 bytes (with manualCopy inlined twice). The library memcpy is 130, ...