C embedded

Disable optimization for a piece of code

#pragma GCC push_options
#pragma GCC optimize ("O0")   

for(uint i=0; i<T; i++){ __NOP(); }   // must sit inside a function: the pragmas apply to whole functions

#pragma GCC pop_options
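
GCC applies these pragmas to functions defined between them, not to individual statements, so a delay loop is usually wrapped in its own function. A minimal sketch, assuming GCC; the name delay_cycles and the inline nop (standing in for the CMSIS __NOP()) are placeholders chosen for this example:

```c
#include <stdint.h>

#pragma GCC push_options
#pragma GCC optimize ("O0")

/* Hypothetical helper: spins for `count` iterations and returns how many ran.
   With the pragma above, GCC compiles this function without optimization,
   so the loop is not removed. */
uint32_t delay_cycles(uint32_t count)
{
    uint32_t done = 0;
    for (uint32_t i = 0; i < count; i++) {
        __asm__ volatile ("nop");   /* stand-in for __NOP() from CMSIS */
        done++;
    }
    return done;
}

#pragma GCC pop_options
```

Functions defined after the pop_options line are compiled with the command-line optimization level again.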




Multiplication and division by power of 2:

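Use left shift (<<) for multiplication and right shift (>>) for division. Shifts are cheaper than multiply and divide instructions, especially on cores without a hardware multiplier or divider. For simple operations with constant operands the compiler usually makes this substitution itself, but in more complex expressions it may not, and writing the shifts explicitly can help. Two things to watch: + and - bind tighter than << and >>, so each shift must be parenthesized, and >> only matches division for unsigned (or non-negative) values.
Example :

Multiply by 6 : a = (a << 1) + (a << 2);
Multiply by 7 : a = (a << 3) - a;
Divide by 8 : a = a >> 3; // division by power of 2 (unsigned or non-negative a)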
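
The three examples can be checked with a small sketch like the one below. The function names mul6, mul7 and div8 are made up for illustration, and unsigned operands are assumed so that the right shift behaves exactly like division:

```c
#include <stdint.h>

/* a * 6 = a * 2 + a * 4. The parentheses are required because
   + and - bind tighter than << and >>. */
uint32_t mul6(uint32_t a) { return (a << 1) + (a << 2); }

/* a * 7 = a * 8 - a */
uint32_t mul7(uint32_t a) { return (a << 3) - a; }

/* a / 8 for unsigned values: the low three bits are discarded */
uint32_t div8(uint32_t a) { return a >> 3; }
```

For negative signed values the result of >> is implementation-defined, and on common compilers it rounds toward negative infinity while / truncates toward zero, so the two are not interchangeable there.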


Prefer pre Increment/Decrement to post Increment/Decrement


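Pre-increment increments the value and stores it straight back to the variable. Post-increment first copies the old value to a temporary, increments the variable, and then yields the temporary as the value of the expression. If that extra copy happens 1000 times in a loop, it can reduce efficiency. When the result of the expression is not used, compilers normally generate the same code for both forms on built-in types, so the difference matters mainly where the old value is actually kept.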
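
The difference in what the two expressions yield can be seen in a short sketch. The helper names pre_increment and post_increment are hypothetical:

```c
/* Value of the expression ++(*p): the already incremented value. */
int pre_increment(int *p)  { return ++(*p); }

/* Value of the expression (*p)++: the old value, which has to be
   kept in a temporary while the variable itself is incremented. */
int post_increment(int *p) { return (*p)++; }
```

Both helpers leave the variable one higher; only the returned value differs.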

Optimizing Arrays

Suppose you access the members of an array by index like this:

for(int i=0; i<n; i++) nArray[i]=nSomeValue;

Walking a pointer through the array instead can be better, because the element address no longer has to be computed from the index on each iteration:

for(int* ptrInt = nArray; ptrInt< nArray+n; ptrInt++) *ptrInt=nSomeValue;
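
Both loops can be wrapped in functions and compared. This is only a sketch: fill_indexed and fill_pointer are names invented here, and whether the pointer version is actually faster depends on the compiler and target, so it is worth measuring:

```c
#include <stddef.h>

/* Index-based fill, as in the first loop above. */
void fill_indexed(int *nArray, size_t n, int nSomeValue)
{
    for (size_t i = 0; i < n; i++) nArray[i] = nSomeValue;
}

/* Pointer-based fill, as in the second loop above. */
void fill_pointer(int *nArray, size_t n, int nSomeValue)
{
    int *end = nArray + n;   /* one past the last element, computed once */
    for (int *ptrInt = nArray; ptrInt < end; ptrInt++) *ptrInt = nSomeValue;
}
```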

Faster for() loops

It is a simple concept but effective. Ordinarily, we would code a simple for() loop like this:

 
for( i=0;  i<10;  i++){ ... }

Here i loops through the values 0,1,2,3,4,5,6,7,8,9.

If we needn't care about the order of the loop counter, we can do this instead:

 
for( i=10; i--; ) { ... }

Using this code, i loops through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster.

This works because it is quicker to process i-- as the test condition, which says "Is i non-zero? If so, decrement it and continue". For the original code, the processor has to calculate "Subtract i from 10. Is the result non-zero? If so, increment i and continue". In tight loops, this can make a considerable difference.

The syntax is a little strange, but it is perfectly legal. The third statement in the loop is optional (an infinite loop would be written as for( ; ; )). A similar count-down can also be coded as follows, although here the body sees i run from 10 down to 1 rather than from 9 down to 0:

 
for(i=10; i; i--){}

or (to expand it further):

 
for(i=10; i!=0; i--){}

The only things we have to be careful of are that the loop stops at 0 (so this does not work directly if the counter needs to run from 50 to 80) and that the loop counter goes backwards. It's easy to get caught out if your code relies on an ascending loop counter.

Counting down can also help register allocation, because the loop limit no longer has to be kept in a register for the comparison, which leads to more efficient code elsewhere in the function. This technique of initializing the loop counter to the number of iterations required and then decrementing down to zero also applies to while and do statements.
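
As a sketch of the same count-down idea in while and do form, the functions below sum a byte buffer three ways. The names sum_up, sum_down_while and sum_down_do are invented for this example; the order of the elements does not matter for a sum, which is what makes the count-down safe here:

```c
#include <stddef.h>
#include <stdint.h>

/* Conventional ascending loop, for comparison. */
uint32_t sum_up(const uint8_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += data[i];
    return sum;
}

/* while form: n is tested, then decremented, so the body sees n-1 ... 0. */
uint32_t sum_down_while(const uint8_t *data, size_t n)
{
    uint32_t sum = 0;
    while (n--) sum += data[n];
    return sum;
}

/* do form: the body runs at least once, so an empty buffer is handled first. */
uint32_t sum_down_do(const uint8_t *data, size_t n)
{
    uint32_t sum = 0;
    if (n == 0) return sum;
    do {
        sum += data[--n];
    } while (n != 0);
    return sum;
}
```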

 

 

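Constant data (.rodata section)

 1) rodata holds constant data. ro: read only

 2) String literals are placed in rodata automatically by the compiler, and constant data declared with the const keyword is placed in rodata as well.

 3) In some embedded systems, rodata is kept in ROM (or NOR flash) and read directly at run time, without being loaded into RAM.

     So in embedded development, known constant coefficients, table data and the like are commonly built into tables declared with the const keyword. They are stored in ROM and do not take up RAM.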
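
A minimal sketch of such a const table: a 16-entry lookup that counts the set bits of a nibble. The names bit_count_table and count_bits are hypothetical, and whether the table really ends up in flash depends on the toolchain and linker script:

```c
#include <stdint.h>

/* const at file scope: a typical embedded toolchain places this table
   in .rodata, which the linker script can map to flash/ROM. */
const uint8_t bit_count_table[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Counts the set bits in a byte with two table lookups, one per nibble. */
uint8_t count_bits(uint8_t value)
{
    return (uint8_t)(bit_count_table[value & 0x0Fu] + bit_count_table[value >> 4]);
}
```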

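Global variables initialized to a non-zero value (.data section)

 Global variables in .data take up both run-time memory and space in the executable file itself
int data_array[1024*1024]={1};

Uninitialized global variables (.bss section)

The bss section holds global variables that are uninitialized or that are initialized to 0. Global variables in .bss take up only run-time memory, not space in the executable file itself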
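
The three sections can be put side by side in one sketch. The variable names are invented for illustration, and the section comments describe what a typical GCC-style toolchain does rather than a guarantee:

```c
#include <stdint.h>

uint32_t boot_count = 5;          /* non-zero initializer: .data (RAM plus an image copy) */
uint32_t error_count;             /* no initializer: .bss, zeroed at startup */
uint32_t retry_count = 0;         /* zero initializer: typically .bss as well */
const uint32_t max_retries = 3;   /* const: .rodata */

/* Reads all four objects so that none of them is discarded as unused. */
uint32_t section_demo_sum(void)
{
    return boot_count + error_count + retry_count + max_retries;
}
```

Running the toolchain's size utility on the resulting object file is one way to check how much each section grew.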


#if
#if  defined (macro1)  || !defined (macro2) || defined (macro3)
printf( "Hello!\n" );
#endif
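
A small sketch of how the combined condition is evaluated, reusing the placeholder names macro1, macro2 and macro3 from above. The function hello_enabled is hypothetical, and the two #define lines are just one possible configuration:

```c
/* macro1 is left undefined on purpose. */
#define macro2 1   /* defined, so !defined(macro2) is false */
#define macro3 1   /* defined, so defined(macro3) is true   */

/* Returns 1 when the preprocessor keeps the first branch, 0 otherwise.
   One true operand is enough because the operands are joined with ||. */
int hello_enabled(void)
{
#if defined(macro1) || !defined(macro2) || defined(macro3)
    return 1;
#else
    return 0;
#endif
}
```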
 

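The role of the DBG, DMB, DSB and ISB instructions in interrupts

The ARMv8 instruction set provides three memory barrier instructions.

  • Data Memory Barrier (DMB): memory accesses after the barrier are committed only once all memory accesses before it have been performed. What DMB guarantees is the order between all memory access instructions before the DMB and all memory access instructions after it. In other words, the processor will not reorder memory accesses that follow the DMB to before it. DMB does not guarantee that earlier memory accesses have completed by the time the barrier itself executes; it only guarantees the ordering of the memory accesses on either side of it. DMB affects only memory access instructions, data cache instructions, cache maintenance instructions and the like; it does not affect the order of other instructions (arithmetic instructions, for example).
  • Data Synchronization Barrier (DSB): stricter than DMB. Instructions after the DSB execute only once all memory access instructions before it have completed, that is, every later instruction waits for the memory accesses before the DSB to finish. All cache maintenance operations before it (such as branch predictor and TLB maintenance) must also have completed.
  • Instruction Synchronization Barrier (ISB): ensures that every instruction after the ISB is fetched again from the instruction cache or from memory. It flushes the pipeline and the prefetch buffer, and only then fetches the instructions after the ISB from the instruction cache or memory. ISB is typically used to make context-changing operations (such as an ASID change or TLB maintenance) take effect.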
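
The ordering that DMB provides can be sketched with a buffer that is filled before a ready flag is set. The macro DMB_BARRIER, the names shared_buffer, buffer_ready and publish_buffer, and the host fallback are all assumptions made for this example, and GCC-style inline assembly is assumed; CMSIS-based projects would normally call __DMB() instead:

```c
#include <stdint.h>

#if defined(__arm__) || defined(__aarch64__)
#define DMB_BARRIER() __asm__ volatile ("dmb sy" ::: "memory")
#else
/* Host fallback (assumption for this sketch): compiler-only barrier,
   it emits no CPU barrier instruction. */
#define DMB_BARRIER() __asm__ volatile ("" ::: "memory")
#endif

volatile uint32_t shared_buffer[4];   /* hypothetical buffer read by another core or master */
volatile uint32_t buffer_ready;       /* hypothetical flag the consumer polls */

/* Fills the buffer, then raises the flag. The barrier keeps the buffer
   writes ordered before the flag write as seen by other observers. */
void publish_buffer(uint32_t seed)
{
    for (uint32_t i = 0; i < 4; i++) shared_buffer[i] = seed + i;
    DMB_BARRIER();
    buffer_ready = 1u;
}
```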

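Summary:

  1. Out-of-order execution is a property of the processor and has little to do with compiler optimization. Most MCUs have only a 2- or 3-stage pipeline and do not execute out of order, so on most MCUs these barriers are effectively no-ops, placeholders that the OS keeps for portability. Out-of-order execution usually goes with a deep pipeline, although neither strictly implies the other. What the barriers mainly solve is visibility. With a single core, ordering and synchronization make little difference, because that core is the only user of the data and instructions. Once there are several cores, or several bus masters such as a DMA controller, the other masters cannot see your local state, and that is when this kind of synchronization is usually needed.
  2. What __DSB() is for
__DSB() is seen in particular in some interrupt handlers.

When the program enters an interrupt handler on an interrupt signal, it should first clear the corresponding interrupt flag. On some CPUs, however, the core clock is faster than the clock used by the interrupt source, so the write that clears the flag may not have completed when the CPU enters the same handler again, which results in an endless loop. __DSB() is there to prevent this.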
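
The pattern can be sketched as follows. Everything named here is hypothetical: fake_status_reg stands in for a peripheral flag register, Example_IRQHandler and irq_count are invented, and DSB_BARRIER is a stand-in (GCC-style inline assembly assumed) for the CMSIS __DSB() that a real project would call:

```c
#include <stdint.h>

#if defined(__arm__) || defined(__aarch64__)
#define DSB_BARRIER() __asm__ volatile ("dsb sy" ::: "memory")
#else
/* Host fallback (assumption for this sketch): compiler-only barrier. */
#define DSB_BARRIER() __asm__ volatile ("" ::: "memory")
#endif

volatile uint32_t fake_status_reg = 1u;   /* hypothetical flag register, 1 = interrupt pending */
volatile uint32_t irq_count;              /* hypothetical counter of handled interrupts */

void Example_IRQHandler(void)
{
    fake_status_reg = 0u;   /* clear the interrupt flag first */
    DSB_BARRIER();          /* wait until the clearing write has completed */
    irq_count++;            /* the actual interrupt work goes here */
}
```

Some code places the barrier at the very end of the handler instead; in both cases the goal is that the clearing write has completed before the handler returns.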







MEMCPY & MEMMOVE
void * __cdecl memcpy(void *dst, const void *src, size_t count)
{
        void *ret = dst;
        /* copy byte by byte from low to high addresses; the regions must not overlap */
        while (count--) {
                *(char *)dst = *(const char *)src;
                dst = (char *)dst + 1;
                src = (const char *)src + 1;
        }
        return ret;
}

void * __cdecl memmove(void *dst, const void *src, size_t count)
{
        void *ret = dst;
        /* A forward copy is safe in two cases: dst is at or before src,
           or dst starts at or beyond src + count, so nothing is overwritten before it is read */
        if (dst <= src || (char *)dst >= ((const char *)src + count)) {
                while (count--) {
                        *(char *)dst = *(const char *)src;
                        dst = (char *)dst + 1;
                        src = (const char *)src + 1;
                }
        }
        else { /* copy from the end backwards to avoid clobbering source bytes */
                dst = (char *)dst + count - 1;
                src = (const char *)src + count - 1;
                while (count--) {
                        *(char *)dst = *(const char *)src;
                        dst = (char *)dst - 1;
                        src = (const char *)src - 1;
                }
        }
        return ret;
}
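
A short usage sketch of the overlapping case, using the standard library memmove from <string.h> rather than the listing above. The helper name shift_right_by_two is made up; the caller is assumed to provide at least len + 2 bytes of storage:

```c
#include <stddef.h>
#include <string.h>

/* Moves the first len bytes of buf two positions to the right.
   Source buf[0..len-1] and destination buf[2..len+1] overlap with the
   destination after the source, which is the case that needs the
   backwards copy, so memmove is used instead of memcpy. */
void shift_right_by_two(char *buf, size_t len)
{
    memmove(buf + 2, buf, len);
}
```

Calling memcpy with the same arguments would be undefined behavior, because its regions must not overlap.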

 

 

posted on 2023-07-16 18:26  荷树栋
