Code Optimization in Embedded C/C++ Development

Things should be made as simple as possible, but not any simpler.
――Albert Einstein
Although getting the software to work correctly seems like the logical last step of a project, this is not always the case in embedded systems development. The need for low-cost products drives hardware designers to provide just barely enough memory and processing power to get the job done. Of course, during the software development phase it is more important to get the program to work correctly, and toward that end there are usually one or more development boards around, each with extra memory, a faster processor, or both. These boards are used to get the software working; the final phase of the project then becomes code optimization, whose goal is to make the working program run on the lower-cost production version of the hardware.
1 Increasing Code Efficiency
All modern C and C++ compilers provide some degree of code optimization. However, most of the optimizations a compiler performs involve a tradeoff between execution speed and code size: your program can be made faster or smaller, but not both. In fact, an improvement in one of these areas often has a negative impact on the other, and it is up to the programmer to decide which improvement matters most. Given that single piece of information, the compiler's optimization phase can make the appropriate choice whenever a speed-versus-size tradeoff is encountered.
Because you can't have the compiler optimize for both, I recommend letting it do what it can to reduce the size of your program. Execution speed usually matters only in certain time-critical or frequently executed sections of the code, and there is a lot you can do to improve those sections by hand. Code size, on the other hand, is difficult to influence manually, and the compiler is in a much better position to reduce it across all of your software modules.
By the time your program is working, you probably already know, or have a pretty good idea, which subroutines and modules are most critical to overall code efficiency. Interrupt service routines, high-priority tasks, calculations with real-time deadlines, and functions that are compute-intensive or frequently called are all likely candidates. A tool called a profiler, included with some software development suites, can narrow your focus to the routines in which the program spends most (or too much) of its time.
Once you've identified the routines that need greater efficiency, the following techniques can help reduce their execution time:
Inline functions
In C++, the keyword inline can be added to any function declaration. It asks the compiler to replace all calls to the function with copies of the code inside it. This eliminates the runtime overhead of the actual function call and is most effective when the inline function is called frequently but contains only a few lines of code.
Inline functions also provide a perfect example of how execution speed and code size are sometimes inversely linked. The repeated insertion of the function body increases the size of your program in direct proportion to the number of times the function is called, and obviously the larger the function, the more significant the increase. The resulting program runs faster but requires more ROM.
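As a minimal sketch of the idea (the function names here are hypothetical, not from the book), a short, frequently called helper is the ideal inlining candidate:

inline int maxOfTwo(int a, int b)
{
    return (a > b) ? a : b;    /* short body: cheap to duplicate */
}

int clip(int sample, int limit)
{
    /* The compiler may expand maxOfTwo in place here, removing the
     * call/return overhead at the cost of duplicating its body at
     * every call site. */
    return maxOfTwo(sample, limit);
}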
Table lookups

A switch statement is a common programming technique that should be used with care. Each test and jump in its machine-language implementation uses up valuable processor time merely deciding what to do next. To speed things up, try to order the individual cases by their relative frequency of occurrence: put the most likely cases first and the least likely last. This reduces the average execution time, although it does nothing for the worst case.
If there is a lot of work to be done within each case, it might be more efficient to replace the entire switch statement with a table of pointers to functions. For example, the following block of code is a candidate for this improvement:
enum NodeType { NodeA, NodeB, NodeC };

switch (getNodeType())
{
    case NodeA:
        .
        .
    case NodeB:
        .
        .
    case NodeC:
        .
        .
}
To speed things up, we replace the switch statement with the following alternative. The first part is the setup: the creation of an array of function pointers. The second part is a one-line replacement for the switch statement that executes more efficiently.
int processNodeA(void);
int processNodeB(void);
int processNodeC(void);

/*
* Establishment of a table of pointers to functions.
*/
int (* nodeFunctions[])() = { processNodeA, processNodeB, processNodeC };

.
.

/*
* The entire switch statement is replaced by the next line.
*/
status = nodeFunctions[getNodeType()]();

Hand-coded assembly

Some software modules are best written in assembly language, which gives the programmer a chance to make them as efficient as possible. Although most C/C++ compilers produce much better machine code than the average programmer, a good programmer can still beat the average compiler for a given function. For example, early in my career I implemented a digital filtering algorithm in C and targeted it to a TI TMS320C30 DSP. The compiler we had back then was either unaware of, or unable to take advantage of, a special instruction that performed exactly the mathematical operation I needed. By replacing one loop of the C program with inline assembly instructions that did the same thing, I was able to decrease the overall computation time by more than a factor of ten.
Register variables
The keyword register can be used when declaring local variables. It asks the compiler to place the variable in a general-purpose register rather than on the stack. Used judiciously for the most frequently accessed variables, this technique can somewhat enhance the performance of a function; the more often the function is called, the more likely such a change is to help.
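A minimal sketch of the hint (the names are hypothetical); note that many modern compilers treat register as advisory only and may ignore it:

long sumArray(const int *data, int count)
{
    register int  i;           /* ask for a register, not a stack slot */
    register long sum = 0;     /* accessed on every loop iteration */

    for (i = 0; i < count; i++)
        sum += data[i];

    return sum;
}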
Global variables
It is more efficient to use a global variable than to pass a parameter to a function, because the parameter no longer needs to be pushed onto the stack before the call and popped back off afterward. In fact, the most efficient implementation of any subroutine would have no parameters at all. However, global variables can also have negative effects on the program: the software engineering community generally discourages their use in order to promote modularity and reentrancy, which are also important considerations.
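The tradeoff might look like the following sketch (hypothetical names): the global version avoids the per-call push and pop, at the price of modularity and reentrancy:

int gTemperature;                   /* global: no per-call stack traffic */

void logTemperatureGlobal(void)
{
    /* ... reads gTemperature directly; nothing is pushed or popped ... */
}

void logTemperatureParam(int temperature)
{
    /* ... temperature is pushed before each call and popped after ... */
}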
Polling
Interrupt service routines are often used to improve program efficiency. However, in some rare cases the overhead associated with the interrupts actually causes an inefficiency: namely, when the average time between interrupts is of the same order of magnitude as the interrupt latency. In such cases it might be better to use polling to communicate with the hardware device. Of course, this too leads to a less modular software design.
Fixed-point arithmetic
Unless your target platform includes a floating-point coprocessor, your program will pay a large penalty for manipulating float data. The compiler-supplied floating-point library contains a set of software routines that emulate the instruction set of a floating-point coprocessor. Many of these functions take a long time to execute relative to their integer counterparts, and they might not be reentrant.
If you use floating point for only a few calculations, it might be better to reimplement those calculations with fixed-point arithmetic. Although it can be difficult to see just how this is done, it is theoretically possible to perform any floating-point calculation with fixed-point arithmetic. (After all, that's how the floating-point software library does it, right?) Your biggest advantage is that you probably don't need to implement the entire IEEE 754 standard just to perform one or two calculations. If you do need that kind of complete functionality, stick with the compiler's floating-point library and look for other ways to speed up your program.
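Here is one possible sketch using a 16.16 fixed-point format (the representation is an assumption; the text does not prescribe one):

typedef long fixed;                    /* 16 integer bits, 16 fraction bits */

#define TO_FIXED(x)  ((fixed)((x) * 65536L))    /* converted at compile time */
#define TO_INT(f)    ((int)((f) >> 16))

fixed fixed_mul(fixed a, fixed b)
{
    /* A 64-bit intermediate keeps the full product before rescaling. */
    return (fixed)(((long long)a * b) >> 16);
}

/* Example: 2.5 * 1.5 == 3.75, computed with integer instructions only:
 *     fixed result = fixed_mul(TO_FIXED(2.5), TO_FIXED(1.5));           */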
2 Decreasing Code Size
As I said earlier, when it comes to reducing code size your best bet is to let the compiler do the work for you. However, if the resulting program is still too large for your available memory, there are several programming techniques you can use to reduce it further. In this section we'll discuss both automatic and manual code size optimizations.
Of course, Murphy's Law (anything that can go wrong will go wrong) dictates that the first time you enable the compiler's optimizer, your previously working program will suddenly fail. Perhaps the most notorious of the automatic optimizations is dead code elimination, which removes code the compiler believes to be redundant or irrelevant. For example, adding zero to a variable requires no runtime calculation whatsoever. But you might still want the compiler to generate those "irrelevant" instructions if they perform some function the compiler doesn't know about.
For example, given the following block of code, most optimizing compilers would remove the first statement because the value of *pControl is not used before it is overwritten on the third line:
    *pControl = DISABLE;
    *pData    = 'a';
    *pControl = ENABLE;
But what if pControl and pData are actually pointers to memory-mapped device registers? In that case, the peripheral would not receive the DISABLE command before the byte of data was written, which could wreak havoc on all future interactions between the processor and this peripheral. To protect yourself from such problems, you must declare all pointers to memory-mapped registers, as well as all global variables shared between threads (or between a thread and an interrupt service routine), with the keyword volatile. If you miss even one of them, Murphy's Law will come back to haunt you in the final days of your project. I guarantee it.
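A sketch of the fix, with hypothetical register addresses used purely for illustration:

/* volatile tells the compiler that every read and write is observable,
 * so the DISABLE store above can no longer be optimized away. */
volatile unsigned char *pControl = (volatile unsigned char *)0x40001000;
volatile unsigned char *pData    = (volatile unsigned char *)0x40001004;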
Note: never make the mistake of assuming that the optimized program will behave the same as the unoptimized one. You must completely retest your software at each new optimization level to be sure its behavior hasn't changed.
To make matters worse, debugging an optimized program is challenging, to say the least. With optimization enabled, the correlation between a line of source code and the processor instructions that implement it is much weaker: instructions might have moved or been split up, or two similar code blocks might now share a common implementation. In fact, some lines of the high-level program might have been removed altogether (as in the example above)! As a result, you might be unable to set a breakpoint on a particular line or examine the value of a variable of interest.
Once you've got the automatic optimizations working, here are some tips for further reducing the size of your code by hand:
Avoid standard library routines
One of the best things you can do to reduce the size of your program is to avoid large standard library routines. Many of the largest are expensive only because they try to handle all possible cases. It might be possible to implement a subset of the functionality yourself with significantly less code. For example, the standard C library's sprintf routine is notoriously large, and much of that bulk is in the floating-point manipulation routines it depends on. If you don't need to format and display floating-point values (%f or %d), you can write your own integer-only version of sprintf and save several kilobytes of code space. In fact, a few implementations of the standard C library (Cygnus' newlib comes to mind) include just such a function, called siprintf.
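To illustrate the flavor of such a cut-down routine (this is a sketch of the idea, not newlib's actual siprintf), here is an integer-to-decimal converter that pulls in no floating-point support at all:

#include <stddef.h>

size_t int_to_str(int value, char *buf)
{
    char         tmp[12];
    size_t       i = 0, n = 0;
    unsigned int u = (unsigned int)value;

    if (value < 0)
    {
        buf[n++] = '-';
        u = 0u - u;                         /* well-defined even for INT_MIN */
    }
    do
    {
        tmp[i++] = (char)('0' + u % 10);    /* digits come out in reverse */
        u /= 10;
    } while (u != 0);

    while (i > 0)
        buf[n++] = tmp[--i];
    buf[n] = '\0';
    return n;
}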
Native word size
Every processor has a native word size, and the ANSI C and C++ standards state that the data type int must map to it. Manipulating smaller and larger data types sometimes requires extra machine instructions. By consistently using int wherever possible, you might be able to shave a precious few hundred bytes from your program.
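For instance (a minimal sketch), on many targets an int loop counter avoids the sign-extension or masking instructions that a char counter can require:

void clear_flags(unsigned char *flags, int count)
{
    int i;                         /* native word size: cheapest to handle */

    for (i = 0; i < count; i++)
        flags[i] = 0;
}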
Goto statements
As with global variables, good software engineering practice dictates against this technique. But in a pinch, goto statements can eliminate complicated control structures or share a block of oft-repeated code.
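A common pattern is a single shared cleanup block on the error paths, sketched below with hypothetical helper functions:

int acquire_bus(void);             /* hypothetical helpers, defined elsewhere */
int load_settings(void);
void release_bus(void);

int configure_device(void)
{
    int status = -1;

    if (acquire_bus() != 0)
        goto done;                 /* nothing to clean up yet */
    if (load_settings() != 0)
        goto release;              /* shared cleanup, written only once */

    status = 0;                    /* success */

release:
    release_bus();
done:
    return status;
}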
In addition to these techniques, several of those described in the previous section can also help reduce code size, specifically table lookups, hand-coded assembly, register variables, and global variables. Of these, hand-coded assembly usually yields the largest decrease.
3 Reducing Memory Usage
In some cases it is RAM, rather than ROM, that is the limiting factor for your application. In those cases you'll want to reduce your dependence on global data, the stack, and the heap. These are all optimizations better made by the programmer than by the compiler.
Because ROM is usually cheaper than RAM on a per-byte basis, one acceptable strategy for reducing the amount of global data is to move constant data into ROM. This can be done automatically by the compiler if you declare all of your constant data with the keyword const: most C/C++ compilers place the constant global data they encounter into a special data segment that the locator can place in ROM. The technique is most valuable when there are lots of strings or tables of data that do not change at runtime.
If some of the data is fixed once the program is running but not necessarily constant, the constant data segment could instead be placed in a hybrid memory device, which could then be updated over a network or by a technician assigned to make the change. An example of such data is the sales tax rate for each locale in which your product will be deployed: if a tax rate changes, the memory device can be updated, and additional RAM is saved in the meantime.
Stack size reductions can also lower your program's RAM requirement. One way to figure out exactly how much stack you need is to fill the entire memory area reserved for the stack with a special data pattern, then run the software for a while under both normal and stressful conditions and use a debugger to examine the stack area. The part that still contains your special pattern has never been overwritten, so it is safe to reduce the stack by roughly that amount.
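The technique might be coded as follows; the stack-boundary symbols are assumptions standing in for whatever your linker script actually provides:

#include <stddef.h>

#define STACK_PATTERN 0xAA

extern unsigned char stack_bottom[], stack_top[];  /* from the linker script */

void paint_stack(void)             /* call once, before the stack is in use */
{
    unsigned char *p;

    for (p = stack_bottom; p < stack_top; p++)
        *p = STACK_PATTERN;
}

size_t unused_stack(void)          /* call after a long, stressful run */
{
    unsigned char *p = stack_bottom;

    while (p < stack_top && *p == STACK_PATTERN)
        p++;                       /* still-painted bytes were never used */

    return (size_t)(p - stack_bottom);
}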
Be especially conscious of stack space if you are using a real-time operating system. Most operating systems create a separate stack for each task; these stacks are used for function calls and interrupt service routines that occur within the context of the task. You can determine the amount of stack each task needs in the manner described above. You might also try to reduce the number of tasks, or switch to an operating system that has a separate "interrupt stack" on which all interrupt service routines execute. The latter method can significantly reduce the stack requirement of each task.
The size of the heap is limited to the amount of RAM left over after all global data and stack space has been allocated. If the heap is too small, your program will not be able to allocate memory when it is needed, so always compare the result of malloc or new with NULL before dereferencing it. If you've tried all of these suggestions and your program still requires too much memory, you might have no choice but to eliminate the heap altogether.
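The check itself is one line, as in this sketch:

#include <stdlib.h>

void read_samples(void)
{
    int *samples = (int *)malloc(100 * sizeof(int));

    if (samples == NULL)
        return;                    /* heap exhausted: never dereference NULL */

    /* ... use samples ... */
    free(samples);
}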
4 Limiting the Impact of C++ (omitted here; see the original text below)
Original text:
Programming Embedded Systems in C and C++, O'Reilly, 1999
Chapter 10: Optimizing Your Code
Things should be made as simple as possible, but not any simpler.
――Albert Einstein
Though getting the software to work correctly seems like the logical
last step for a project, this is not always the case in embedded systems
development. The need for low-cost versions of our products drives
hardware designers to provide just barely enough memory and processing
power to get the job done. Of course, during the software development
phase of the project it is more important to get the program to work
correctly. And toward that end there are usually one or more
"development" boards around, each with additional memory, a faster
processor, or both. These boards are used to get the software working
correctly, and then the final phase of the project becomes code
optimization. The goal of this final step is to make the working program
run on the lower-cost "production" version of the hardware.
10.1 Increasing Code Efficiency
Some degree of code optimization is provided by all modern C and C++
compilers. However, most of the optimization techniques that are
performed by a compiler involve a tradeoff between execution speed and
code size. Your program can be made either faster or smaller, but not
both. In fact, an improvement in one of these areas can have a negative
impact on the other. It is up to the programmer to decide which of these
improvements is most important to her. Given that single piece of
information, the compiler’s optimization phase can make the appropriate
choice whenever a speed versus size tradeoff is encountered.
Because you can’t have the compiler perform both types of optimization
for you, I recommend letting it do what it can to reduce the size of
your program. Execution speed is usually important only within certain
time-critical or frequently executed sections of the code, and there are
many things you can do to improve the efficiency of those sections by
hand. However, code size is a difficult thing to influence manually, and
the compiler is in a much better position to make this change across
all of your software modules.
By the time your program is working you might already know, or have a
pretty good idea, which subroutines and modules are the most critical
for overall code efficiency. Interrupt service routines, high-priority
tasks, calculations with real-time deadlines, and functions that are
either compute-intensive or frequently called are all likely candidates.
A tool called a profiler, included with some software development
suites, can be used to narrow your focus to those routines in which the
program spends most (or too much) of its time.
Once you’ve identified the routines that require greater code
efficiency, one or more of the following techniques can be used to
reduce their execution time:
Inline functions
In C++, the keyword inline can be added to any function declaration.
This keyword makes a request to the compiler to replace all calls to the
indicated function with copies of the code that is inside. This
eliminates the runtime overhead associated with the actual function call
and is most effective when the inline function is called frequently but
contains only a few lines of code.
Inline functions provide a perfect example of how execution speed and
code size are sometimes inversely linked. The repetitive addition of the
inline code will increase the size of your program in direct proportion
to the number of times the function is called. And, obviously, the
larger the function, the more significant the size increase will be. The
resulting program runs faster, but now requires more ROM.
Table lookups
A switch statement is one common programming technique to be used with
care. Each test and jump that makes up the machine language
implementation uses up valuable processor time simply deciding what work
should be done next. To speed things up, try to put the individual
cases in order by their relative frequency of occurrence. In other
words, put the most likely cases first and the least likely cases last.
This will reduce the average execution time, though it will not improve
at all upon the worst-case time.
If there is a lot of work to be done within each case, it might be more
efficient to replace the entire switch statement with a table of
pointers to functions. For example, the following block of code is a
candidate for this improvement:
enum NodeType { NodeA, NodeB, NodeC };

switch (getNodeType())
{
    case NodeA:
        .
        .
    case NodeB:
        .
        .
    case NodeC:
        .
        .
}
To speed things up, we would replace this switch statement with the
following alternative. The first part of this is the setup: the creation
of an array of function pointers. The second part is a one-line
replacement for the switch statement that executes more efficiently.
int processNodeA(void);
int processNodeB(void);
int processNodeC(void);

/*
 * Establishment of a table of pointers to functions.
 */
int (* nodeFunctions[])() = { processNodeA, processNodeB, processNodeC };

.
.

/*
 * The entire switch statement is replaced by the next line.
 */
status = nodeFunctions[getNodeType()]();
Hand-coded assembly
Some software modules are best written in assembly language. This gives
the programmer an opportunity to make them as efficient as possible.
Though most C/C++ compilers produce much better machine code than the
average programmer, a good programmer can still do better than the
average compiler for a given function. For example, early in my career I
implemented a digital filtering algorithm in C and targeted it to a TI
TMS320C30 DSP. The compiler we had back then was either unaware or
unable to take advantage of a special instruction that performed exactly
the mathematical operations I needed. By manually replacing one loop of
the C program with inline assembly instructions that did the same
thing, I was able to decrease the overall computation time by more than a
factor of ten.
Register variables
The keyword register can be used when declaring local variables. This
asks the compiler to place the variable into a general-purpose register,
rather than on the stack. Used judiciously, this technique provides
hints to the compiler about the most frequently accessed variables and
will somewhat enhance the performance of the function. The more
frequently the function is called, the more likely such a change is to
improve the code’s performance.
Global variables
It is more efficient to use a global variable than to pass a parameter
to a function. This eliminates the need to push the parameter onto the
stack before the function call and pop it back off once the function is
completed. In fact, the most efficient implementation of any subroutine
would have no parameters at all. However, the decision to use a global
variable can also have some negative effects on the program. The
software engineering community generally discourages the use of global
variables, in an effort to promote the goals of modularity and
reentrancy, which are also important considerations.
Polling
Interrupt service routines are often used to improve program efficiency.
However, there are some rare cases in which the overhead associated
with the interrupts actually causes an inefficiency. These are cases in
which the average time between interrupts is of the same order of
magnitude as the interrupt latency. In such cases it might be better to
use polling to communicate with the hardware device. Of course, this too
leads to a less modular software design.
Fixed-point arithmetic
Unless your target platform includes a floating-point coprocessor,
you’ll pay a very large penalty for manipulating float data in your
program. The compiler-supplied floating-point library contains a set of
software subroutines that emulate the instruction set of a
floating-point coprocessor. Many of these functions take a long time to
execute relative to their integer counterparts and also might not be
reentrant.
If you are only using floating-point for a few calculations, it might be
better to reimplement the calculations themselves using fixed-point
arithmetic only. Although it might be difficult to see just how this can
be done, it is theoretically possible to perform any floating-point
calculation with fixed-point arithmetic. (After all, that’s how the
floating-point software library does it, right?) Your biggest advantage
is that you probably don’t need to implement the entire IEEE 754
standard just to perform one or two calculations. If you do need that
kind of complete functionality, stick with the compiler’s floating-point
library and look for other ways to speed up your program.
10.2 Decreasing Code Size
As I said earlier, when it comes to reducing code size your best bet is
to let the compiler do the work for you. However, if the resulting
program is still too large for your available ROM, there are several
programming techniques you can use to further reduce the size of your
program. In this section we’ll discuss both automatic and manual code
size optimizations.
Of course, Murphy’s Law dictates that the first time you enable the
compiler’s optimization feature your previously working program will
suddenly fail. Perhaps the most notorious of the automatic optimizations
is " dead code elimination." This optimization eliminates code that the
compiler believes to be either redundant or irrelevant. For example,
adding zero to a variable requires no runtime calculation whatsoever.
But you might still want the compiler to generate those "irrelevant"
instructions if they perform some function that the compiler doesn’t
know about.
For example, given the following block of code, most optimizing
compilers would remove the first statement because the value of
*pControl is not used before it is overwritten (on the third line):
    *pControl = DISABLE;
    *pData    = 'a';
    *pControl = ENABLE;
But what if pControl and pData are actually pointers to memory-mapped
device registers? In that case, the peripheral device would not receive
the DISABLE command before the byte of data was written. This could
potentially wreak havoc on all future interactions between the processor
and this peripheral. To protect yourself from such problems, you must
declare all pointers to memory-mapped registers and global variables
that are shared between threads (or a thread and an ISR) with the
keyword volatile. And if you miss just one of them, Murphy’s Law will
come back to haunt you in the final days of your project. I guarantee
it.
Never make the mistake of assuming that the optimized program will
behave the same as the unoptimized one. You must completely retest your
software at each new optimization level to be sure its behavior hasn’t
changed.
To make matters worse, debugging an optimized program is challenging, to
say the least. With the compiler’s optimization enabled, the
correlation between a line of source code and the set of processor
instructions that implements that line is much weaker. Those particular
instructions might have moved or been split up, or two similar code
blocks might now share a common implementation. In fact, some lines of
the high-level language program might have been removed from the program
altogether (as they were in the previous example)! As a result, you
might be unable to set a breakpoint on a particular line of the program
or examine the value of a variable of interest.
Once you’ve got the automatic optimizations working, here are some tips for further reducing the size of your code by hand:
Avoid standard library routines
One of the best things you can do to reduce the size of your program is
to avoid using large standard library routines. Many of the largest are
expensive only because they try to handle all possible cases. It might
be possible to implement a subset of the functionality yourself with
significantly less code. For example, the standard C library’s sprintf
routine is notoriously large. Much of this bulk is located within the
floating-point manipulation routines on which it depends. But if you
don’t need to format and display floating-point values (%f or %d ), you
could write your own integer-only version of sprintf and save several
kilobytes of code space. In fact, a few implementations of the standard C
library (Cygnus’ newlib comes to mind) include just such a function,
called siprintf.
Native word size
Every processor has a native word size, and the ANSI C and C++ standards
state that data type int must always map to that size. Manipulation of
smaller and larger data types sometimes requires the use of additional
machine-language instructions. By consistently using int whenever
possible in your program, you might be able to shave a precious few
hundred bytes from your program.
Goto statements
As with global variables, good software engineering practice dictates
against the use of this technique. But in a pinch, goto statements can
be used to remove complicated control structures or to share a block of
oft repeated code.
In addition to these techniques, several of the ones described in the
previous section could be helpful, specifically table lookups,
hand-coded assembly, register variables, and global variables. Of these,
the use of hand-coded assembly will usually yield the largest decrease
in code size.
10.3 Reducing Memory Usage
In some cases, it is RAM rather than ROM that is the limiting factor for
your application. In these cases, you’ll want to reduce your dependence
on global data, the stack, and the heap. These are all optimizations
better made by the programmer than by the compiler.
Because ROM is usually cheaper than RAM (on a per-byte basis), one
acceptable strategy for reducing the amount of global data might be to
move constant data into ROM. This can be done automatically by the
compiler if you declare all of your constant data with the keyword
const. Most C/C++ compilers place all of the constant global data they
encounter into a special data segment that is recognizable to the
locator as ROM-able. This technique is most valuable if there are lots
of strings or table-oriented data that does not change at runtime.
If some of the data is fixed once the program is running but not
necessarily constant, the constant data segment could be placed in a
hybrid memory device instead. This memory device could then be updated
over a network or by a technician assigned to make the change. An
example of such data is the sales tax rate for each locale in which your
product will be deployed. If a tax rate changes, the memory device can
be updated, but additional RAM can be saved in the meantime.
Stack size reductions can also lower your program’s RAM requirement. One
way to figure out exactly how much stack you need is to fill the entire
memory area reserved for the stack with a special data pattern. Then,
after the software has been running for a while (preferably under both
normal and stressful conditions), use a debugger to examine the modified
stack. The part of the stack memory area that still contains your
special data pattern has never been overwritten, so it is safe to reduce
the size of the stack area by that amount.
[1]

Be especially conscious of stack space if you are using a real-time
operating system. Most operating systems create a separate stack for
each task. These stacks are used for function calls and interrupt
service routines that occur within the context of a task. You can
determine the amount of stack required for each task stack in the manner
described earlier. You might also try to reduce the number of tasks or
switch to an operating system that has a separate "interrupt stack" for
execution of all interrupt service routines. The latter method can
significantly reduce the stack size requirement of each task.
The size of the heap is limited to the amount of RAM left over after all
of the global data and stack space has been allocated. If the heap is
too small, your program will not be able to allocate memory when it is
needed, so always be sure to compare the result of malloc or new with
NULL before dereferencing it. If you’ve tried all of these suggestions
and your program is still requiring too much memory, you might have no
choice but to eliminate the heap altogether.
10.4 Limiting the Impact of C++
One of the biggest issues I faced upon deciding to write this book was
whether or not to include C++ in the discussion. Despite my familiarity
with C++, I had written almost all of my embedded software in C and
assembly. In addition, there has been much debate within the embedded
software community about whether C++ is worth the performance penalty.
It is generally agreed that C++ programs produce larger executables that
run more slowly than programs written entirely in C. However, C++ has
many benefits for the programmer, and I wanted to talk about some of
those benefits in the book. So I ultimately decided to include C++ in
the discussion, but to use in my examples only those features with the
least performance penalty.
I believe that many readers will face the same issue in their own
embedded systems programming. Before ending the book, I wanted to
briefly justify each of the C++ features I have used and to warn you
about some of the more expensive features that I did not use.
The Embedded C++ Standard
You might be wondering why the creators of the C++ language included so
many expensive (in terms of execution time and code size) features. You
are not alone; people around the world have wondered the same
thing, especially the users of C++ for embedded programming. Many of
these expensive features are recent additions that are neither strictly
necessary nor part of the original C++ specification. These features
have been added one by one as part of the ongoing "standardization"
process.
In 1996, a group of Japanese processor vendors joined together
to define a subset of the C++ language and libraries that is better
suited for embedded software development. They call their new industry
standard Embedded C++. Surprisingly, for its young age, it has already
generated a great deal of interest and excitement within the C++ user
community.
A proper subset of the draft C++ standard, Embedded C++ omits pretty
much anything that can be left out without limiting the expressiveness
of the underlying language. This includes not only expensive features
like multiple inheritance, virtual base classes, runtime type
identification, and exception handling, but also some of the newest
additions like templates, namespaces, and new-style casts. What’s left
is a simpler version of C++ that is still object-oriented and a superset
of C, but with significantly less runtime overhead and smaller runtime
libraries.
A number of commercial C++ compilers already support the Embedded C++
standard specifically. Several others allow you to manually disable
individual language features, thus enabling you to emulate Embedded C++
or create your very own flavor of the C++ language.
Of course, not everything introduced in C++ is expensive. Many older C++
compilers incorporate a technology called C-front that turns C++
programs into C and feeds the result into a standard C compiler. The
mere fact that this is possible should suggest that the syntactical
differences between the languages have little or no runtime cost
associated with them.
[2]
It is only the newest C++ features, like templates, that cannot be handled in this manner.
For example, the definition of a class is completely benign. The list of
public and private member data and functions are not much different
than a struct and a list of function prototypes. However, the C++
compiler is able to use the public and private keywords to determine
which method calls and data accesses are allowed and disallowed. Because
this determination is made at compile time, there is no penalty paid at
runtime. The addition of classes alone does not affect either the code
size or efficiency of your programs.
Default parameter values are also penalty-free. The compiler simply
inserts code to pass the default value whenever the function is called
without an argument in that position. Similarly, function name
overloading is a compile-time modification. Functions with the same
names but different parameters are each assigned unique names during the
compilation process. The compiler alters the function name each time it
appears in your program, and the linker matches them up appropriately. I
haven’t used this feature of C++ in any of my examples, but I could
have done so without affecting performance.
Operator overloading is another feature I could have used but didn’t.
Whenever the compiler sees such an operator, it simply replaces it with
the appropriate function call. So in the code listing that follows, the
last two lines are equivalent and the performance penalty is easily
understood:
Complex  a, b, c;

c = operator+(a, b);    // The traditional way: Function Call
c = a + b;              // The C++ way: Operator Overloading
Constructors and destructors also have a slight penalty associated with
them. These special methods are guaranteed to be called each time an
object of the type is created or goes out of scope, respectively.
However, this small amount of overhead is a reasonable price to pay for
fewer bugs. Constructors eliminate an entire class of C programming
errors having to do with uninitialized data structures. This feature has
also proved useful for hiding the awkward initialization sequences that
are associated with complex classes like Timer and Task.
Virtual functions also have a reasonable cost/benefit ratio. Without
going into too much detail about what virtual functions are, let’s just
say that polymorphism would be impossible without them. And without
polymorphism, C++ would not be a true object-oriented language. The only
significant cost of virtual functions is one additional memory lookup
before a virtual function can be called. Ordinary function and method
calls are not affected.
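A brief sketch of that cost (with hypothetical classes): the only extra work in the last line below is one vtable lookup before the jump:

class Shape
{
  public:
    virtual ~Shape() {}
    virtual long area() const = 0;     // dispatched through the vtable
};

class Square : public Shape
{
  public:
    Square(long side) : side_(side) {}
    long area() const { return side_ * side_; }

  private:
    long side_;
};

long areaOf(const Shape &s)
{
    return s.area();                   // one extra memory lookup, then the call
}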
The features of C++ that are too expensive for my taste are templates,
exceptions, and runtime type identification. All three of these
negatively impact code size, and exceptions and runtime type
identification also increase execution time. Before deciding whether to
use these features, you might want to do some experiments to see how
they will affect the size and speed of your own application.
[1]
Of course, you might want to leave a little extra space on the
stack, just in case your testing didn't last long enough or did not
accurately reflect all possible runtime scenarios. Never forget that a
stack overflow is a potentially fatal event for your software and to be
avoided at all costs.
[2]
  Moreover, it should be clear that there is no penalty for compiling an ordinary C program with a C++ compiler.

 

Source: http://www.sudu.cn/info/19700101/286892.html

Original link: https://www.cnblogs.com/zjfdbz/archive/2013/03/11/2953357.html
