Fixing templates with constexpr’s

For my hundredth (and a bit) c++ post I decided to do something I never did before: fix my old code!

A long time ago I wrote about template metaprogramming devices. There, I tried to explain that many atrocities have been commited in the name of performance and “compile time evaluation”. Template metaprogramming is probably one of the worse culprits of job security. Its (ab)use can create monstrosities, all in the name of runtime performance. Like, for example, my template device to calculate e. Let’s remember what that atrocious code looks like (follow the link if you want an explanation on how this works):

template <int N, int D> struct Frak {
        static const long Num = N;
        static const long Den = D;
};
  
template <int N, typename F> struct ScalarMultiplication {
    typedef Frak<N*F::Num, N*F::Den> result;
};
  
template <int X, int Y>   struct MCD {
        static const long result = MCD<Y, X % Y>::result;
};
  
template <int X> struct MCD<X, 0> {
        static const long result = X;
};
  
template <class F> struct Simpl {
        static const long mcd = MCD<F::Num, F::Den>::result;
        typedef Frak< F::Num / mcd, F::Den / mcd > result;
};
  
template <typename X1, typename Y1> struct SameBase {
    typedef typename ScalarMultiplication< Y1::Den, X1>::result X;
    typedef typename ScalarMultiplication< X1::Den, Y1>::result Y;
};
  
template <typename X, typename Y> struct Sum {
    typedef SameBase<X, Y> B;
    static const long Num = B::X::Num + B::Y::Num;
    static const long Den = B::Y::Den; // == B::X::Den
    typedef typename Simpl< Frak<Num, Den> >::result result;
};

template <int N> struct Fact {
    static const long result = N * Fact<N-1>::result;
};
template <> struct Fact<0> {
    static const long result = 1;
};
 
template <int N> struct E {
    // e = S(1/n!) = 1/0! + 1/1! + 1/2! + ...
    static const long Den = Fact<N>::result;
    typedef Frak< 1, Den > term;
    typedef typename E<N-1>::result next_term;
    typedef typename Sum< term, next_term >::result result;
};
template <> struct E<0> {
    typedef Frak<1, 1> result;
};
 
int main() {
  typedef E<8>::result X;
  std::cout << "e = " << (1.0 * X::Num / X::Den) << "\n";
  std::cout << "e = " << X::Num <<"/"<< X::Den << "\n";
  return 0;
}

While this is just a toy example to play with templates, it does illustrate code I’ve seen in the wild. Would this look cleaner in c++11? Yes, it would. Constexprs are, in my opinion, one of the most overlooked “killer” features of c++11.

Starting with a simple example:

constexpr int foo(int a, int b) { return a+b; }
static constexpr int n = foo(1, 2);
int bar() { return n; }

Try to compile it with “g++ -std=c++11 -fverbose-asm -O0 -c -S -o /dev/stdout” and see what happens. You should get the equivalent of “return 3” – just as anyone would expect – but note that no optimizations were enabled. What about loops? Let’s try this:

constexpr int f(int n) {
    return (n<2)? 1 : n + f(n-1);
}

constexpr int n = f(999);

You’ll probably get an error about maximum depth exceeded, but that’s alright: we have loops in constexprs too! (note that some of these restrictions have been relaxed in C++17).

In general, if you can express your function as a single const return statement, it should be a valid constexpr. With this new knowledge, let’s convert the template meta-atrocity above to something a bit less hideous:

struct PodFrac {
    int num;
    int den;
};

constexpr int mcd(int a, int b) {
    return (b==0)? a : mcd(b, a%b);
}

constexpr PodFrac simpl(const PodFrac &f) {
    return PodFrac{f.num / mcd(f.num, f.den), f.den / mcd(f.num, f.den)};
}

constexpr PodFrac sum(const PodFrac &a, const PodFrac &b) {
    return simpl(PodFrac{a.num*b.den + b.num*a.den, a.den*b.den});
}

constexpr int fact(int n) {
    return (n==0)? 1 : n*fact(n-1);
}

constexpr PodFrac e(int n) {
    return (n==0)? PodFrac{1, 1} :
                   sum(PodFrac{1, fact(n)}, e(n-1));
}

constexpr float e_num = 1.0 * e(8).num / e(8).den;

float get_e() {
    return e_num;
}

Disclaimer: while I explicitly stated this multiple times in my “C++ template metaprogramming introduction” article, it’s worth re-stating it: this code is meant as an example to showcase a c++ feature, not as a proper way of deriving a mathematical constant in production code.

First thoughts after comparing the two versions: much, much [, much]*100 cleaner.

As you may notice, all constexprs need to be a return statement. There’s no multi-statement constexpr in c++11, which explains why loops are not really supported. For the same reason the implementation of e() is a bit hindered by this limitation: its code would be much more readable splitting it in a few lines with proper names. Good news: some of these restrictions have been relaxed in C++17.

Note that if you analyze your compiler’s output when building without optimizations, you may see either a const with e’s value, or a static initializer that does some trivial operation, like loading e’s value from a fraction: gcc seems to get tired of constexpr evaluation after a few recursive calls, so your results may vary (slightly).

I called constexpr’s one of c++11’s killer features, and hopefully you can see why I’m so enthusiastic about them now: there’s much less incentive for people to write horrible template metaprogramming devices when simply adding a little keyword to a normal function has the same effect, only cleaner.

Advertisements

Quick refresher: argument dependent lookup

Since I wasted a few precious minutes stuck on an ADL problem, I figured I needed a quick reminder on how they work. Check this code: does it compile?

namespace N {
    int foo() {
    }
}

int main() {
    return foo();
}

Of course it doesn’t! You’d expect a ‘foo’ not declared/out of scope error from your compiler. What about this other example?

namespace N {
    struct Dummy;

    int foo(Dummy*) {
        return 0;
    }

    int foo() {
    }
}

int main() {
    return foo((N::Dummy*)0);
}

You’d be tempted to say it won’t work either. (Un?)fortunately, ‘argument dependant lookup’ is a thing, and the second code sample works. How? The compiler will look for ‘foo’ in the global namespace, and also in the namespace of the arguments to ‘foo’. Seeing ‘N::Dummy’ in there, the compiler is allowed to peak into the namespace N for method ‘foo’. Why? Short: operator overloading. Long: check here (the ‘Why ADL’ section is very good).


Google Test: Quarantine for tests?

Google Test: Putting a test under quarantine

GTest works wonders for c++ testing, even more so when combined with GMock. I’ve been using these frameworks for a few side projects. I’ve seen them used in large scale projects too. In all cases, there is a very common problem for which (I think) there is no elegant solution: managing temporarily disabled tests.

It may be because you found a flaky piece of code or a test that exposes a heisenbug. Maybe the test itself is just unstable, or perhaps you are using TDD and want to submit a test to your CI before its implementation is ready. In these cases, you can choose to disable the offending test or let it run, possible halting your CI because of it. When that happens, you maybe masking other, real, problems.

Most people would stick a “DISABLED_” before the test name, to let GTest know not to run it. Maybe even stick a “// TODO: reenable” in there too. When run, GTest will generate a message to let you know there is a disabled test. Even so, I find that people -myself included- tend to forget to re-enable the disabled tests.

For one of my side projects, I hacked GTest to quarantine tests up to a certain date:

TEST(Foo, Bar) {
    QUARANTINE_UNTIL("16/8/22");
    EXPECT_EQ(1,2);
}

In my CI setup, that test will be showing a happy green (and a warning, which I will probably ignore) until the 22nd of August. By the 23rd the test will run again and fail if I haven’t fixed the code. If I have indeed fixed it, it will print a warning to remind me that it’s safe to delete the quarantine statement.

Is there any advantage in this approach over the usual _DISABLE strategy? In my opinion, there is: if you ignore warnings in your test, for whatever reason, a _DISABLE might go unnoticed and it may hide a real problem. In the same scenario, for a quarantined test, nothing bad happens: the warning just says “you should delete this line” but the quarantined test is again part of your safety net.

How does it work? The first caveat in my article mentions it: hackishly. There are a few facilities missing in GTest to make this implementation production-ready but, ugly as it looks, it should work as intended:

#include <ctime>
#include <string>
#include <sstream>
std::string now() {
    time_t t = time(0);
    struct tm *now = localtime(&t);

    std::stringstream formatted_date;
    formatted_date << (now->tm_year+1900) << '/'
                   << (now->tm_mon+1) << '/'
                   << now->tm_mday;

    return formatted_date.str();
}

#define QUARANTINE_UNTIL(date_limit)                                     \
        if (now() < date_limit) {                                        \
            GTEST_LOG_(WARNING) << "Test under quarantine!";             \
            return;                                                      \
        } else {                                                         \
            GTEST_LOG_(WARNING) << "Quarantine expired on " date_limit;  \
        }

If I find there is interest in this approach for real world applications, I may try to come up with a nicer interface for it.


The best hack you should never use

Please don’t do this. But if you do: leave a comment here!

#define private public;
#include "something";
#define private private;

Found in some random project.


Shared pointers: don’t

Ahh, shared pointers! A nice, magical pointer type that will make all of your memory problems go away. Sprinkle some shared_ptrs here and there and, voilà, Valgrind is now happy. Sounds like a silver bullet, doesn’t it? That should be your first clue that something’s up with the promise of getting Java-like memory management in your c++ application.

Java and C(++) have a very different concept of memory management. My Java-foo, obviously enough to anyone reading this blog, is not that great, but, from experience, memory management is seen as a chore better left to the bowels of your system, something you don’t need (nor want) to care about. Sure, you can tweak and somewhat manage memory allocations if you really want to; the default mindset, however, is to ignore those concerns. The garbage collector will, eventually, find any unused resource and deal with it accordingly.

C++ programs, at least those that have been more or less successfully designed as opposed to organically grown, tend to have a radically different approach. Memory management is an integral part of a program’s design and it can’t be left to some automagic memory manager. This leads, again, for those more or less successful designs, to programs with a tree-like hierarchy in which a child or dependent object must live at least as long as its parent scope. This hierarchy leads to easier to understand programs. Some idioms (RAII!) depend on this hierarchy. Some tools (like scoped and unique pointers) make its implementation much simpler. I’ve seen that Rust really builds on this idea (and seems to take it to 11! I’m still trying to figure out if that’s a good or a bad thing, but so far I quite like it).

The tree-like structure of the scopes in C++ also implies single ownership (again something Rust seems to be very idiosyncratic about). While you may “use” someone else’s objects (for example, via a reference) there is always one single owner. If this owner goes away while someone holds a reference to one of its children… well, you get a bug. But sure enough this bug is clear as long as you can visualize the tree scope structure of your program. Shared pointers completely obliterate this tree.

A shared pointer means an object can have multiple owners. Whoever goes out of scope last needs to clean it. Why is that bad? In my (highly subjective but experience based) opinion:

  • It becomes harder to reason about your program. You never know if all the “pieces” you need are in scope. Is an object already initialized? Who is responsible for building it? If you call a method with side effects, will any of the other owners be affected by it?
  • It becomes harder to predict whether going out of scope is trivial, or an operation that can take a minute. If you’re the last one holding a reference to an object through a shared pointer, you may be stuck running its destructor for a long time. That’s not necessarily a bad thing, but not being able to predict it can lead to all sort of obscure bugs.

There are also many ways to avoid shared pointer usage:

  • Try restructuring your code. This will usually yield the biggest benefits, you’ll end up with a clearer structure and less clutter.
  • Try to use scoped pointers from boost or unique pointers if you can. Way too often shared pointers are used when a scoped pointer would be enough.
  • Don’t be scared of copies! Sometimes you can just copy your object and end up with cleaner (and maybe even faster) code. Are you really sure you need to share state?

Does that mean you should never ever use shared pointers? Of course not. In some cases it’s unavoidable. Some algorithms are probably impossible to implement without them (or even impossible without GC). A shared pointer is just one more tool. Just like gotos. And, just like gotos – although not quite as dangerous – they have some appropriate use cases. Just try not to make it your default goto (lol) tool for memory management.

 

// TODO: There is a very good reason I found to use shared pointers: to create weak_ptr’s. Is there a good solution without actually using shared_ptr’s? I don’t know!


C++: Why is undefinedness important

Let’s start with an example:

int *x = (int*) NULL;
x[0] = 42;

Luckily so far I’ve never seen anyone argue about this one: we all know we’re dealing with undefined behavior and that it’s bad. Things get a bit more tricky when the example is not so trivial.

C’s abstract machine

In a way, C and C++ describe a “virtual machine”. This is what the standard defines: what kind of operations are valid in this VM. This VM resembles an old single-thread mono-processor architecture. Most often, the code will run in a modern architecture that will resemble very little the design of C’s VM. “New” features (like caching, vectorization, atomics, pipelining, etc) implemented by the target architecture make the process of mapping our code (in the VM that C defines) much more difficult. The compiler needs to map instructions in C’s simple architecture to a much (*MUCH*) more complex design. To do that, it needs to analyze the code to guarantee certain constrains are met.

Let’s see how these constrains and undefined behavior relate to each other with this snippet:

template <typename T>
bool always_true(T x) {
return (x < x+1);
}

From a math perspective, and assuming that T is a numeric type, always_true should always return true. Is that the case for C’s virtual machine?

If we call always_true with a type like “unsigned int”, then x+1 may overflow and wrap around. This is fine because unsigned int’s are allowed to wrap around. What happens if instead we use a signed type? Things get more interesting.

Signed types are not allowed to overflow. If they do, the standard says the behavior is undefined. And the standard also says that our program can not invoke undefined behavior. This is a very important phrase: the standard says undefined behavior can NOT occur. There is no “if it does”: it just can’t, so the compiler will assume that UB will never happen. What if it does happen? Nasal demons, that’s what!

Knowing that UB can’t happen, and in our example above, the compiler can assume that x+1 will never overflow. If it will never overflow, (x<x+1) will always be true.

The compiler, by analyzing our program, can detect what conditions might trigger undefined behavior. By knowing that undefined behavior is not allowed, it can assume those conditions will never happen. That’s why, for the sample above, any optimizing-compiler will just produce code similar to “return true”, at least for -O2.

Undefined behavior is not (only) to make programmer’s lives miserable, it actually is needed to create optimizing compilers.


I can’t believe this works!

Are you bored? Try pasting this, as is, in a cpp file:

// What is going on here??/
Is this even legal C++??/
Yes, it is!

NB: You may have to use -trigraphs to compile this. Try it out! You can use this command:

echo -e “// What is going on here??/Is this legal C++?” | g++ -E -c -trigraphs –

With some luck, this won’t be legal C++ anymore after C++ 17 deprecates trigraphs.


Initialization oddities: Aggregate initialization

Do you know the quickest way to create a constructor that initializes the elements in this struct?

#include <string>
struct MyStruct {
    int x;
    std::string y;
    const char *z;
};

If you answered “by typing really fast”, you may be interested in knowing that the fastest way to create this constructor is to not write it at all!

MyStruct a = {42, "Hello", "World"};

Yes, the line above works and it’s perfectly legal C++. It’s event C++ 98! This language feature is called aggregate initialization and it says the compiler should be smart enough to initialize MyStruct using each value successively. Of course C++11 has made this syntax somewhat simpler and a lot more uniform:

MyStruct a{42, "Hello", "World"};

There are some caveats when using this initialization, namely that the initialized type must be an aggregate. An aggregate, in standard lingo, is a type that has some restrictions. No virtuals, no privates, etc. You can say it’s a POD and in most cases you’d be right.

Now, is this also legal?

MyStruct a = {42, "Hello"};

You’d be tempted to say that’s a syntax error. It’s not, now z will just be default-initialized. What about this, then?

MyStruct a = {42, "Hello", "World", "Extra!"};

According to the standard, that’s an error. Or… is it? Let’s try out this example:

struct A {
    int x;
};

struct B {
    A a;
    std::string y;
};

struct C {
    B b;
    const char *z;
};

C o = {42, "Hello", "World"};

Yes. Believe it or not, the object o will now contain three members: o.b.a.x, o.b.y and o.z. All three will be properly initialized with their respective value.

Aggregate initializations should, according to the standard, be smart enough to initialize aggregate objects and use any “spill over” to continue initializing other values/aggregate objects recursively.

Bonus I:

Aggregate initialization is also what makes this idiom valid:

char x[] = {1, 2, 3}

In this case, x will be of length 3 because that’s the length of its aggregate initializer.

Bonus II:

I’m sure anyone trying to get up to date with C++11 will have played around with variadic templates. One of the first exercises I’d recommend for this would be a compile-time list of different types. Knowing about aggregate initializations now, how would you write a constructor for this type?

template <typename H, typename... T>
struct Multilist<H, T...> {
    H x;
    Multilist<T...> next;
};

Multilist<int, string, float> foo{42, "XXX", 1.23};

Code and Google translate: awesomeness

Some time ago I found out one of my articles was translated to another language (yay for that, woo for not letting me know about it). To understand what my own article said, I had to use Google translate on the site. Guess what? c++ and Google translate can produce hilarious results:
# The include "throw.h" the extern "the C" {void seppuku () {throw statement the Exception () ; }}
Another one I liked:
the struct the Exception {}; # ifdef __cplusplus the extern "the C" {# endif void seppuku (); # Ifdef __cplusplus} # endif
Now you know it. Next time you’re looking at some incomprehensible c++ code, run it through Google translate. It may improve it.

Gimple

Lately I’ve been toying around with gcc to learn a bit better how its optimization phases work. Understanding Gimple, the intermediate representation used by gcc, is a useful skill for this. Of course actually *understanding* it is quite an ambitious and daunting task, so it may be a bit more useful to skim through it.

Turns out that using -fdump-tree-all and -fdump-rtl-all its possible to get a lot of interesting information on the phases the compiler follows to get your code optimized, but the sheer amount of information produced makes it rather hard to make sense out of it. During the next few posts (weeks? months? probably until I satisfy my curiosity about gcc) I will be investigating a little bit the output of the -fdump options in gcc, to see what can be learned from it.