C++ For Physicists: 010 Dos and Don'ts (& why)
This isn't just for physicists. Nor is it for all physicists; I know physicists who can program in C++ much better than I can. However, having spent a large amount of my programming time over the last two weeks debugging code written by physicists, I have decided to list some of the common problems and areas for improvement.
These points assume you've done some programming, or have been taught how to program. The information here is mainly based on mistakes I have seen undergraduate physicists make over the last couple of years. Don't just blindly accept my word for things - if you think I've got something wrong, email me, or leave a comment. Also, if you hand in some code using stuff you've read here, and your lecturer disagrees with it, do encourage them to email me - one of us will learn something new :).
The code in this post is entirely public domain, except where noted. Not that any of it does anything useful, mainly on account of it being broken in many places. The methods and concepts described are also just that - well-known programming patterns, which are also free for all. So, if you're a lecturer trying to pin someone for plagiarising this post, go back to finding out what Dark Matter is.1
Do try to understand scope
I'm starting with this (less than helpful) instruction because it came up so much in writing the later points, and it got to the point where I couldn't avoid explaining things in terms of it. The scope of a variable or a function is the sections of code in which it can be used. Consider the following code:
int i;
namespace foo
{
    int bar()
    {
        int j;
        j++;
        return j;
    }
}
Here i and bar have global scope - they will be accessible from any part of the program (including other files). However, bar has been declared in the namespace 'foo', and would have to be called either as foo::bar() or after a using namespace foo. j, on the other hand, only exists inside the function bar. Moreover, its value is not preserved between calls to the function - if this function was called multiple times, it would not return an increasing series of integers2. Generally speaking, the scope of something is the same as the block it's defined in: j is in-scope in the function bar, which is, in turn, in-scope in the namespace foo.
It's generally a good idea to define things only where they need to be defined - I have seen programs broken by for-loop counters being defined globally, and multiple functions overwriting them. Needless to say that particular programmer got ridiculed muchly.
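As a minimal sketch of that loop-counter problem (the function names sumTo and sumOfSums are made up for illustration), declaring each counter inside its own for loop means one function cannot trample another's count:

```cpp
// Adds up the integers from 1 to n.
int sumTo(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i)   // this i exists only inside this loop
    {
        total += i;
    }
    return total;
}

// Adds up sumTo(1), sumTo(2), ..., sumTo(n).
int sumOfSums(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i)   // a different i, unaffected by the one in sumTo
    {
        total += sumTo(i);
    }
    return total;
}
```

If both loops shared a single global i instead, every call to sumTo would overwrite the counter that sumOfSums is still relying on.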
Don't use 'using namespace std'
unless you understand what it does. What it does is bring everything declared in the std namespace, from all of the included files, into the global scope. The std namespace includes all of the C++ standard library and, if you've included a number of header files, you may find yourself importing a large number of things you don't really want - one such example is left (std::left), which is used in formatting cout and other stream outputs. I've seen one piece of code fail because the compiler got confused between the different things called 'left'.
For safety reasons, I would always avoid putting a using statement in a header file - when the header is included in another file, it may cause difficult-to-understand errors, and it is generally bad practice.
In implementation files, it is slightly less of a problem. However, I still prefer to either qualify the types (void printit(std::string str)) or import a particular entity (using std::endl) rather than importing the entire namespace. Especially in situations where a number of libraries might use the same name, both of these make the code more usable. Furthermore, the qualified using syntax is essentially equivalent to Java's import system, which helps people who've worked in Java understand what is going on.
As pointed out by a diligent and intelligent reader, the using statement also has bounded scope - putting it in a function imports the namespace only within that function, which is another good approach to the problem.
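The three alternatives above can be sketched side by side. The function names greet, formatRow and shout are invented for this example; only the standard library names are real.

```cpp
#include <sstream>
#include <string>

// Option 1: qualify the name where it is used.
std::string greet(const std::string& name)
{
    return "Hello, " + name;
}

// Option 2: import a single entity, and only inside this function.
std::string formatRow(int index, double value)
{
    using std::ostringstream;   // visible only until the closing brace
    ostringstream out;
    out << index << ": " << value;
    return out.str();
}

// Option 3: a whole-namespace import confined to one function body.
std::string shout(const std::string& text)
{
    using namespace std;        // does not leak into the rest of the file
    string result = text;
    result += "!";
    return result;
}
```

Outside those function bodies, string and ostringstream still have to be written with their std:: prefix, so nothing unexpected ends up in the global scope.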
Do plot out your programs' information flow
Even if it's just in your head, it'll help when you come to write the code. Just a sketched-out flow chart, showing the actions the code performs and what data is passed between them, is enough. This will often give you a hint on what functions you are going to need, and make you think about the data you're working with.
Don't conflate ideas about file types
For those of you who use Windows, you might expect the extension on a file (e.g. .pdf, .doc, .txt) to actually mean something. To the computer it does not; you can give all your header files .cpp extensions, and all the implementation files .hpp extensions, and your compiler probably won't even blink3. However, to the user, there is a large difference; headers and implementations should be kept separate.
For most programs that model something, there isn't much of a need for code modularisation - it would be ok to have all of your code in one big file, if not for the issue of it being unreadably long4. In that kind of situation, it is enough to have one header file containing the signatures of the functions you've written (e.g. int add(int, int);), and a number of implementation files which define these functions, grouped sensibly. Any other global declarations would also go in the header file, which should be #included in each of the implementation files. Note: this is not a good practice, merely a practical one.
The correct practice is to have a header file for each implementation file, containing the signatures of the (publicly available) functions in that file. This header file is then #included where it is needed. Also, it's worth noting that convention says the extensions are .h and .c for C, and .hpp and .cpp for C++.
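As a rough sketch of that split, here is what a header and its implementation file might contain. The file names maths.hpp and maths.cpp and the function mean are hypothetical, and the include guard (#ifndef / #define / #endif) is a common addition that isn't discussed above. The two parts are shown in one block so that it compiles as a single file; in practice each would live in its own file.

```cpp
// ---- maths.hpp (hypothetical header): declarations only ----
#ifndef MATHS_HPP
#define MATHS_HPP

int add(int a, int b);
double mean(double a, double b);

#endif // MATHS_HPP

// ---- maths.cpp (hypothetical implementation) ----
// In a real project this file would begin with: #include "maths.hpp"
int add(int a, int b)
{
    return a + b;
}

double mean(double a, double b)
{
    return (a + b) / 2.0;
}
```

Any other file that wants to call add or mean then only needs to #include the header; it never has to see the function bodies.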
Do make your compiler do more for you
Compiler design has been an area of research for decades and, regardless of which compiler you're using, there's a lot of work you can get it to do for you. I generally recommend using the GNU Compiler Collection, which is generally considered to contain the last word in C/C++ compilers. It has a lot of other stuff too. However, a lot of you will be used to compiling from within some kind of graphical environment (Microsoft Visual Studio, etc.), and making the transition can be a bit of a leap.
For those of you who do use g++, GCC's C++ compiler, there are a few really useful command line options.
- -Wall, -Wextra: display warnings. -W indicates that you wish to enable a warning; all covers the common mistakes that can generally be avoided. extra covers some which might occasionally be triggered even if you are doing things correctly. There are a few types of warning which aren't included in these, and have to be turned on explicitly, but they normally don't come up anyway.
- -std=c++: specifies the standard to check against. The default value is gnu++98, which is the main standard (c++98), plus certain GNU extensions (newer GCC releases default to a later GNU dialect). Either of these options is acceptable; it's generally a good idea to pick one and put it explicitly in the command.
- -pedantic, -pedantic-errors: "Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++. For ISO C, follows the version of the ISO C standard specified by any -std option used." This is the reason for specifying the standard explicitly, and really does help with spotting things you're doing wrong. Using the -errors version means that the things -pedantic finds are treated as errors rather than warnings.
g++ -std=c++98 -Wall -pedantic
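To give a feel for what these warnings buy you, here is a small sketch (countBelow is a made-up function) written so that it should compile cleanly, with comments marking the slips that -Wall is there to catch. The warning names in the comments are GCC's; whether a given slip is actually reported can depend on the compiler version and optimisation level.

```cpp
#include <cstddef>
#include <vector>

// Counts the entries of values that lie below limit.
int countBelow(const std::vector<double>& values, double limit)
{
    // Written as "int count;" this would be used before being given a value,
    // which -Wall can report (-Wuninitialized).
    int count = 0;

    // Declaring the counter as "int i" would compare a signed int with the
    // unsigned result of size(), which -Wall reports for C++ (-Wsign-compare).
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        // A slip such as "if (count = 0)" (assignment instead of ==) in a
        // condition is reported by -Wall (-Wparentheses).
        if (values[i] < limit)
        {
            ++count;
        }
    }
    return count;
}
```

None of those three slips stops the program compiling without warnings enabled, which is exactly why it's worth turning them on.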
Other compilers have similar options. In Visual Studio, there's the rather daunting Project Properties dialog. Whenever changing anything on this, check that you have the right configuration selected (and, generally, changes you want to make will be across all configurations5). The 'C/C++' submenu contains a number of options that are useful - in General, the warning level (which I recommend setting to the highest level), and various other options spread around (including in Language, Code Generation, and Advanced), which I don't have the time to go into here. There is also a 'Treat Warnings as Errors' flag, which I have found to be more of a nuisance; however, some people may find it useful to make sure they read all the warnings.
Don't leave your code unexplained and unstyled
Just because you know what you're doing doesn't mean your friends and lecturers will be able to understand it. Hard-to-read code nearly always leads to mistakes and problems. One thing I will not do is suggest a preferred code style. There are plenty of people who have already done this for me. The key points in all of them are consistency and distinction; exactly which variant you use doesn't matter as much as always doing things the same way. Also important is to be descriptive, and not to try to optimise the code to the point of it not being understandable. Take, for example, this wonderful line6:
double u2 = ((2 + 10*T(startout + h*i,E))*u1 - (1 - T(startout +(h*(i-1)),E))*u0)/(1-T(startout -(h*(i+1)),E));
Variables and functions here are not consistently named (both E and h are variables, whereas T is a function), and the code is overly complex. Compare this with7:
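// T evaluated at the current, previous and next grid points
double ta = T(startOut + h * i, e);
double tb = T(startOut + h * (i - 1), e);
// Note: the original line has startout - h * (i + 1) here; the + is assumed to be the intended sign
double tc = T(startOut + h * (i + 1), e);
// Turn each T value into the coefficient it contributes
ta = 2 + 10 * ta;
tb = 1 - tb;
tc = 1 - tc;
double u2 = (ta * u1 - tb * u0) / tc;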
If this code is defined within a function, a good compiler (most compilers you'll find in use today) will optimise out the use of the three extra variables, and generate code that is equivalent to the single-expression form.
This code, however, is still missing an explanation as to what it is doing. Out of context, I can assume that i is a loop counter, and prior knowledge allows me to guess that h is a step size. So, this function is looking at the relationship between three 'consecutive' values of another function. Even now, I still don't have any idea as to the why.
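As a sketch of what a more self-explanatory version might look like, the calculation could be wrapped in a named, commented function. The name nextValue, its parameter names and the comments are guesses at the intent (assuming the three T values belong to consecutive grid points); they say what the code does, not why the physics calls for it.

```cpp
// Advances a three-point recurrence by one step.
//   u0, u1 : solution values at the previous and current grid points
//   tPrev, tHere, tNext : the function T evaluated at the previous,
//                         current and next grid points
// Returns the solution value at the next grid point.
double nextValue(double u0, double u1,
                 double tPrev, double tHere, double tNext)
{
    const double currentWeight  = 2.0 + 10.0 * tHere;  // multiplies u1
    const double previousWeight = 1.0 - tPrev;         // multiplies u0
    const double nextWeight     = 1.0 - tNext;         // divides the result

    return (currentWeight * u1 - previousWeight * u0) / nextWeight;
}
```

A quick sanity check: with all three T values set to zero this reduces to 2 * u1 - u0, which is the sort of one-line fact a comment or a test can record for the next reader.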
Do use static variables and functions (with care)
They make a nice way to control the scope of variables and functions. Earlier, I said that the scope of a variable is the block it's defined in. The keyword static modifies this somewhat: when it is applied to a global entity, that entity is only available within the current file (this behaviour only makes sense when you consider the behaviour of the extern keyword). When used within a function, it makes a variable that is only accessible within the function, but which maintains its value between calls to the function. Taking the bar() function from the earlier section on scope, renaming it foo() and modifying it slightly, gives you a function that returns the integer above the one it previously returned, with the first call returning 6.
int foo()
{
    static int k = 5;
    k++;
    return k;
}
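Both uses of static described above can be sketched together. All of the names here (callCount, recordCall, nextId, callsSoFar) are made up for the example.

```cpp
// File-level static: callCount and recordCall are invisible to other
// source files, so another file can reuse these names safely.
static int callCount = 0;

static void recordCall()
{
    ++callCount;
}

// Function-level static: lastId keeps its value between calls,
// but cannot be touched from outside nextId.
int nextId()
{
    static int lastId = 0;   // initialised once, on the first call
    recordCall();
    return ++lastId;
}

// The only way for other files to read the hidden counter.
int callsSoFar()
{
    return callCount;
}
```

The "with care" part is that both counters are hidden state: calling nextId twice gives two different answers, which is easy to forget when debugging.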
Statics are probably most used in classes, which I haven't mentioned much here. There are lots of guides on how to do things with classes in C++, many of which aren't really that much help.
Don't sit outside a closed café, using the free wireless to finish your coursework
Even if it's because the computing labs are closed and you don't have internet at home. Especially when it's -8°C (265 K) outside. This, worryingly, is (almost) a true story. I recommended they walk the few minutes extra to mine, where there was heating and the potential for coffee. By the time it reached 3am, I was almost tempted to go find them and physically drag them here.
For people who are trying to work out the title, it's not binary. Counting the dos and don'ts should give it away; the (n+1)th one probably would have been to do with number formatting or similar, to make sure you don't get caught out by tricks like this. When a number literal starts with a 0, a number of languages (C++ included) will take it to be in octal (base 8).
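In C++ the trick looks like this; the variable and function names are invented for the example:

```cpp
// The same digits mean different values depending on the prefix.
const int decimalTen = 10;    // no prefix: decimal, ten
const int octalTen   = 010;   // leading 0: octal, eight
const int hexTen     = 0x10;  // leading 0x: hexadecimal, sixteen

// The number of dos and don'ts promised by the title.
int sectionsInTitle()
{
    return 010;
}
```

So padding a column of numbers with leading zeros to make it line up quietly changes the values.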
- 1 ↑ This section is mainly for a friend. Hope you get to read it :). Also, if you are a lecturer, I'd actually really, really like to know what Dark Matter is. Higgs' Boson? Easy - it's gravel and it's sitting on my shelf, where CERN will never find it. Dark Matter - more complicated.
- 2 ↑ Actually, depending on how this program was compiled, it just might. However, this would be most unexpected. Also note that there is no guarantee that the same number will be returned each time - j is never initialised, so its initial value may be completely random. The quick test I ran resulted in the same value from each call, but a different one on each run - I'm going to guess that the code is reading some point in memory reserved by my program, but that the compiler has optimised the j++; return j; so that it does not write back to it (which is the correct behaviour).
- 3 ↑ Metaphorically blink, that is. Also, please no-one do this. Every time someone does this, I kick a kitteh. Please: think of the kittehs.
- 4 ↑ A problem that I'm worried might also be applicable to this post.
- 5 ↑ The two default configurations are Debug and Release and, for the most part, they're exactly the same. There is some variation in optimisations, and the Release configuration doesn't include the data to debug the program.
- 6 ↑ Code from G Carpenter. The code he sent me to check actually works as far as either of us can tell; it looks like physics was wrong in the end. The context, if I remember correctly, is an iterative solver for a second order differential equation of some description.
- 7 ↑ I've made this revised version rather quickly - I may have made a mistake. There'll be a prize for the first person to point it out. And a bonus prize if they submit a correct and readable code section.