C++ conventions¶
Code layout¶
Editor configuration¶
vim¶
If you use the vim
or neovim
editor, then these settings will
help you lay out code to match Xapian C++ coding conventions:
set sw=4
set sts=4
set noet
set cinoptions=l1,g0.5s,h0.5s,:0.5s,=0.5s,t0,(0
Indentation¶
Indent C++ code by 4 spaces for a new indentation level, and set your editor to tab-fill indentation (with a tab being 8 spaces wide).
As an exception, “public”, “protected” and “private” declarations in classes and structs should be indented by 2 spaces, and the following code should be indented by 2 more spaces:
class Foo {
public:
method();
};
The rationale for this exception is that class definitions in header files often have fairly long lines, so losing an indent level to the access specifier tends to make class definitions less readable.
The default access for a class is always “private”, so there’s no need to specify that explicitly - in other words, write this:
class Foo {
int internal_method();
public:
int external_method();
};
Don’t write this:
class Foo {
private:
int internal_method();
public:
int external_method();
};
If a class only contains public methods and data, consider declaring it as a “struct” (the only difference in C++ is that the default access for a struct is “public”).
Spaces and line breaks¶
Put a space before the (
after control flow constructs like
for
, if
, while
, and so on. So write if (strlen(p) >
10)
not if(strlen (p) > 10)
. Don’t put a space before the (
in function calls.
When if
, else
, for
, while
, do
, switch
,
case
, default
, try
, or catch
is followed by a block
enclosed in braces, the opening brace should be on the same line, like
so:
if (x > 12) {
foo(x);
x = 12;
} else {
bar(x);
}
The rationale for this is that it conserves vertical space (allowing more code to fit on screen) without reducing readability.
C++ idioms in Xapian¶
If you have an empty loop body, use
{ }
rather than;
as the former stands out more clearly to the reader (but also consider if the code might be clearer written a different way).Prefer
++i;
toi++;
,i += 1;
, ori = i + 1
. For simple integer variables these should generate equivalent (if not identical) code, but ifi
is an iterator object then the pre-increment form can be more efficient in some cases with some compilers. It’s simpler and more consistent to always use the pre-increment form (unless you make use of the old value which the post-increment form returns). For the same reasons, prefer--i;
toi--;
,i -= 1;
, ori = i - 1;
.Prefer
container.empty()
tocontainer.size() == 0
(and!container.empty()
tocontainer.size() != 0
orcontainer.size() > 0
).Some containers (e.g.
std::forward_list
) supportempty()
but notsize()
. Pre-C++11 finding the size of a container wasn’t necessarily a constant time operation for some containers (e.g.std::list
with GCC) - that’s no longer the case for any STL containers since C++11, but it could still be true for non-STL containers.Also the
empty()
form makes the intent of the test more explicit.Prefer not to use
else
when the control flow is diverted elsewhere at the end of theif
block (e.g. byreturn
,continue
,break
,throw
). This eliminates a level of indentation from the code in theelse
block, and typically makes the control flow logic clearer. For example:if (x == 0) { foo(); return; } while (x--) { bar(); }
rather than:
if (x == 0) { foo(); return; } else { while (x--) { bar(); } }
For standard ISO C headers, prefer the C++ form for ISO C headers (e.g.
#include <cstdlib>
rather than#include <stdlib.h>
) unless there’s a good reason (e.g. portability) to do otherwise. Be sure to document such exceptions to avoid another developer changing them to the standard form. Global exceptions: <signal.h> (lots of POSIX stuff which e.g. Sun’s compiler doesn’t provide in <csignal>).For standard ISO C++ headers, always use the ISO C++ form
#include <list>
(pre-ISO compilers used#include <list.h>
, but GCC has generated a warning for this form for years, and GCC 4.3 dropped support entirely).Prefer
new SomeClass
tonew SomeClass()
, since the latter tends to lead one to writeSomeClass foo();
which is a function prototype, and not equivalent to the variable definitionSomeClass foo
. However, note thatnew SomePODType()
is not the same asnew SomePODType
(if SomePODType is a Plain Old Data type) - the former will zero-initialise scalar members of SomePODType.RTTI (
dynamic_cast<>
,typeid
,std::typeinfo
): Needing to use RTTI features in the library most likely indicates a design flaw, and you should avoid use of these features. Where necessary, you can use a technique similar toDatabase::as_networkdatabase()
to replacedynamic_cast<>
.using namespace std;
andusing std::XXX;
- it’s OK to use these in applications, library code, and internal library headers. But in externally visible headers (such as anything included by#include <xapian.h>
) you MUST use explicitstd::
qualifiers - it’s not acceptable to pull anything from namespace std into the namespace of an application which uses Xapian.Use C++ style casts (
static_cast<>
,reinterpret_cast<>
, andconst_cast<>
) or constructor-syntax (e.g.double(value)
) in preference to C style casts. The syntax of the C++ casts is ugly, but they do make the intent much clearer which is definitely a good thing, and they avoid issues such as casting away const when you only meant to cast the type of a pointer.std::pair<>
with an STL class as one (or both) of the members can produce very long symbols (over 4KB!) after name mangling - long enough to overflow the size limits of some vendor compilers or toolchains (so this can affect GCC if it is using the system ld or as). Even where the compiler works, the symbol bloat in an unstripped build is probably best avoided, so it’s preferable to use a simple two member struct instead. The code is probably more readable anyway, and easier to extend if more members are needed later.We try to avoid putting the full definition of virtual methods in header files. This is because current compilers can’t (as far as we know) inline virtual methods, so putting the definition in the header file simply slows down compilation (and, because method definitions often require further header files to be included, this can result in many more files needing recompilation after a change to a header file than is really necessary). Just put the declaration in the header file, and put the definition in a .cc file with the same basename.
Efficient use of std::string
¶
- When passing an empty string to a method expecting
const std::string &
preferstd::string()
to""
orstd::string("")
(it is more efficient with some compilers). - To make a string object empty,
s.resize(0)
(if you want to keep the current reserved space) ors = string()
(if you don’t) seem the best options. - Use
std::string::assign()
rather than building a temporary string object and assigning that. For example,foo = std::string(ptr, len);
is better written asfoo.assign(ptr, len);
. - It’s generally better to build up strings using
+=
rather than combining series of components with+
. Sofoo = a + " and " + c
is better written asfoo = a; foo += " and "; foo += c;
. It’s possible for compilers to handle the former without a lot of temporary string objects by returning a proxy object to allow the concatenation to happen lazily, but not all compilers do this, and it’s likely to still have some overhead. Note that GCC 4.1 seems to produce larger code in some cases for the latter approach, but it’s a definite win with GCC 4.4. std::string(1, '\0')
seems to be slightly more efficient thanstd::string("", 1)
for constructing astd::string
containing a single ASCII nul character.
Use of C++ Features¶
C++11¶
As of Xapian 1.3.3, a compiler with decent support for C++11 is required to build Xapian. We currently aim to allow users to use a non-C++11 compiler to build code which uses Xapian.
There are now several compilers with good C++11 support, but there are a few shortfalls in commonly deployed versions of most of them. Often we can work around this, and we should do where the effort is low compared to the gain (so a compiler version which is widely used is more worth supporting than one which is hardly used by anyone).
However, we shouldn’t have to jump through hoops to cater for compilers where their authors aren’t putting in the effort to keep up with the language standards.
Please avoid the following C++11 features for the time being:
std::to_string()
- this is completely missing on current versions of mingw and cygwin - in the library, you can#include "str.h"
and then use thestr()
function instead for most cases. This is also usually faster thanstd::to_string()
.
C++ features we assume¶
We assume that all compilers will correctly implement the following, so it’s safe to rely on them when working on Xapian.
- We assume
<sstream>
is available. GCC < 2.95.3 didn’t have it but GCC 2.95.3 includes a backported version. We aren’t aware of any other compilers still in use which lack it. - Non-“.h” versions of standard ISO C++ headers (e.g.
#include <list>
rather than#include <list.h>
). We aren’t aware of any compiler still in use which lacks these, and GCC 4.3 no longer has the old versions. If there are any, we could add a directory full of forwarding headers to work around this issue. - Standard header
<limits>
(fornumeric_limits<>
) - for GCC, this was added in GCC 3.0. - Standard header
<streambuf>
(GCC < 3.0 only has<streambuf.h>
).
Exceptions¶
When catching an exception which is an object, do it by const reference, so like this:
try {
foo();
} catch (const ErrorClass& e) {
bar(e);
}
Catching by value is bad because it “slices” the object if an object of a derived type is thrown. Even if derived types aren’t a worry, it also causes the copy constructor to be called needlessly. More information is available in a Standard C++ FAQ entry.
A const reference is preferable to a non-const reference as it stops the object being inadvertently modified. In the rare cases when you want to modify the caught object, a non-const reference is OK.
Exceptions should be avoided except for truly exceptional situations, since throwing and handling them has a significant cost. It also generally makes the API easier to understand, and client code easier to read.
Include ordering for source files¶
To help us move towards a consistent ordering of #include
lines in
source files, please follow the following policy when ordering them:
#include <config.h>
should be first, and use <> not “” (as recommended by the autoconf manual). Always includeconfig.h
from C/C++ source files, but don’t include it from header files - the autoconf manual recommends that it should be included first, so including it from headers is either redundant, or may hide a missing config.h include in the source file the header was included from (better to get an error in this case).- The header corresponding to the source file should be next. This means that compilation of the library ensures that each header with a corresponding source file is “self supporting” (i.e. it implicitly or explicitly includes all of the headers it requires).
- External xapian-core headers, alphabetically. When included from
other external headers, use <> to reduce problems with finding
headers in the user’s source tree by mistake. In sources and
internal headers, use “” (?) - practically this makes no difference
as we have
-I
for srcdir and builddir, but <> suggests installed header files so “” seems more natural). - Internal headers, alphabetically (using “”).
- “Safe” versions of library headers (include these first to avoid issues if other library headers include the ones we want to wrap). Use “” and order alphabetically.
- Library headers, alphabetically.
- Standard C++ headers, alphabetically. Use the modern (no .h suffix) names.
Branch Prediction Hints¶
For compilers which support __builtin_expect()
(GCC >= 3.0 and some others)
you can provide manual hints to assist branch prediction. We’ve wrapped these
in macros which evaluate to just their argument for compilers which don’t
support __builtin_expect()__
.
Within the xapian-core library code, you can mark the expressions in if
and
while
statements as rare
(if the condition is rarely true) or usual
(if the condition is usually true).
For example:
if (rare(something_unusual())) deal_with_it();
while (usual(!end_condition()) keep_going();
It’s easy to make incorrect assumptions about where hotspots are and which
branches are usually taken or not, so except for really obvious cases (such
as if (!consistency_check()) throw_exception();
) you should benchmark
that new rare
and usual
hints help rather than hinder before committing
them to the repository. It’s also likely to be a waste of effort to add them
outside of areas of code which are executed very frequently.
Don’t expect miracles - the first 15 uses added saved approximately 1%.
If you know how to implement the rare
and usual
macros for other
compilers, please let us know.
Use of Assert¶
Use Assert to perform internal consistency checks, and to check for invalid arguments to functions and methods (e.g. passing a NULL pointer when this isn’t permitted). It should NOT be used to check for error conditions such as file read errors, memory allocation failing, etc (since we want to perform such checks in non-debug builds too).
File format errors should also not be tested with Assert - we want to catch a corrupted database or a malformed input file in a non-debug build too.
There are several variants of Assert:
Assert(P)
– asserts that expression P is true.AssertRel(a,rel,b)
– asserts that (a rel b) is true - rel can be a boolean relational operator, i.e. one of==
,!=
,>
,>=
,<
,<=
. The message given if the assertion fails reports the values of a and b, soAssertRel(a,<,b);
is more helpful thanAssert(a < b);
AssertEq(a,b)
– shorthand forAssertRel(a,==,b)
.AssertEqDouble(a,b)
– asserts a and b differ by less thanDBL_EPSILON
AssertParanoid(P)
– a particularly expensive assertion. If you want a build with Asserts enabled, but without a great performance overhead, then passing –enable-assertions=partial to configure and AssertParanoids won’t be checked, but Asserts will. You can also useAssertRelParanoid
andAssertEqParanoid
.
An earlier assert, CompileTimeAssert(P)
, has now been removed,
since we require C++11 support from the compiler, and C++11 added
static_assert
.