Wednesday, March 27, 2013

Code structure: Cleanliness and efficiency

There are many ways to help make your code more readable, less bug-prone, and even more efficient. These are ignored by so many people whose code I've seen that I've decided to post about it. To many of you this stuff may be considered common sense, and that's a good thing.

Early out with conditionals when appropriate. This prevents your conditionals from nesting.


Bad:
void processMapNodes()
{
    if ( !map.hasBeenProcessed() )
    {
        map.process();

        // Other stuff here

        map.setProcessed(true);
    }
}
Good:
void processMapNodes()
{
    if ( map.hasBeenProcessed() )
    {
        return;
    }

    map.process();

    // Other stuff here

    map.setProcessed(true);
}

Early out with for loops. If you have to loop to find something, break out as soon as you've found it.


Bad:
void removeMapNodeByName(const std::string& name)
{
    for ( int i = 0; i < map_nodes.size(); ++i )
    {
        if ( map_nodes[i].getName() == name )
        {
            // Possibly some stuff here

            map_nodes.erase( map_nodes.begin() + i );
        }
    }
}
Better:
void removeMapNodeByName(const std::string& name)
{
    for ( int i = 0; i < map_nodes.size(); ++i )
    {
        if ( map_nodes[i].getName() == name )
        {
            // Possibly some stuff here

            map_nodes.erase( map_nodes.begin() + i );
    
            // We've found what we're looking for, so we break and stop looping
            break;
        }
    }
}
Best: (Use a conditional to continue as soon as possible)
void removeMapNodeByName(const std::string& name)
{
    for ( int i = 0; i < map_nodes.size(); ++i )
    {
        if ( map_nodes[i].getName() != name )
        {
            continue;
        }

        // Possibly some stuff here

        map_nodes.erase( map_nodes.begin() + i );
        break;
    }
}

Some of these examples are just to give you an idea of how to break out of code as soon as possible, but sometimes you can get even more efficient. In the example above we're simply going through a vector of data and looking for a match and then removing it. If what you're doing is really that trivial then you can use the standard library remove_if algorithm to do that for you. There are good examples of that, here, and here.

Don't nest your conditionals unnecessarily


Bad: Here we have nested conditionals, it starts to get messy
void foo(Bar* pBar)
{
    if ( pBar )
    {
        if ( pBar->hasData() )
        {
            Data* pData = pBar->getData();
            
            for ( int i = 0; i < pData->size(); ++i )
            {
                if ( pData[i].needsProcessing() )
                {
                    pData[i].process();
                }
            }
        }
    }
}
Good: We unnest our if statements and leave the for loop as soon as possible.
void foo(Bar* pBar)
{
    if ( !pBar || !pBar->hasData() )
    {
        return;
    }
     
    Data* pData = pBar->getData();
            
    for ( int i = 0; i < pData->size(); ++i )
    {
        if ( !pData[i].needsProcessing() )
        {
            continue;
        }

        // Do a bunch of stuff here

        pData[i].process();
    }
}

Create data only once it's needed

Bad: We're creating 'node_pos' at a point before we might exit the function early. This is wasteful and unnecessary.
void moveToMapNode(const std::string& name)
{
    Vector3 node_pos;

    if ( map_nodes.empty() )
    {
        return;
    }

    for ( int i = 0; i < map_nodes.size(); ++i )
    {
        if ( map_nodes[i].getName() != name )
        {
            continue;
        }

        node_pos = map_nodes[i].getPosition();
        break;
    }

    player.setPosition( node_pos );
}
Better: We create node_pos after any point were we could leave the function early, so we're never creating it when it might not be needed.
void moveToMapNode(const std::string& name)
{
    if ( map_nodes.empty() )
    {
        return;
    }

    Vector3 node_pos;

    for ( int i = 0; i < map_nodes.size(); ++i )
    {
        if ( map_nodes[i].getName() != name )
        {
            continue;
        }

        node_pos = map_nodes[i].getPosition();
        break;
    }

    player.setPosition( node_pos );
}
Best: We don't need to create a named variable 'node_pos', and it doesn't need to be used in as large of a scope as it was, we can use the position we get directly within the for loop.
void moveToMapNode(const std::string& name)
{
    if ( map_nodes.empty() )
    {
        return;
    }

    for ( int i = 0; i < map_nodes.size(); ++i )
    {
        if ( map_nodes[i].getName() != name )
        {
            continue;
        }

        // This is still fairly clear, even though we have a function call within a function call.
        // I wouldn't recommend nesting function calls any deeper than this, or debugging them 
        // and their return values can get messy if you end up with any bugs.
        player.setPosition( map_nodes[i].getPosition() );
        break;
    }
}

Sort your conditionals such that the cheapest ones are evaluated first, or the one most likely to evaluate positively is evaluated first.

Bad: We're calling the more expensive function first. Both functions in the conditional must be true for us to continue, so we should be calling the cheap function first, and if it returns false there is no need to call the expensive function.
void someFcn()
{
    if ( isExpensiveFcnTrue() && isCheapFcnTrue() )
    {
        // Do stuff here
    }
}
Good: We're calling the least expensive function first.
void someFcn()
{
    if ( isCheapFcnTrue() && isExpensiveFcnTrue() )
    {
        // Do stuff here
    }
}
Bad: We're placing the cheaper methods first, however, if they're functions that are true most of the time then we're still likely to have to evaluate all of the functions, even the expensive one that is usually false.
void someFcn()
{
    if ( cheapFcnThatUsuallyReturnsTrue() 
      && someOtherCheapFcnThatsUsuallyTrue() 
      && moderatelyExpensiveButUsuallyFalse() )
    {
        // Do stuff here
    }
}
Good: We call the more expensive method first, because it usually returns false, allowing us to skip the calls to the less expensive functions that usually return true. Basically your goal is to pay the lowest overall cost in your conditional statement, sometimes that will be calling cheaper methods first to avoid having to call expensive ones, if all of the functions stand similar chances of returning positive values, and sometimes you'll want to call more expensive ones first if it means you can save yourself numerous calls to other methods afterward. You'll need to make a judgement call based on the expense of the methods and what allows you to bail out as early as possible.
void someFcn()
{
    if ( moderatelyExpensiveButUsuallyFalse() 
      && cheapFcnThatUsuallyReturnsTrue() 
      && someOtherCheapFcnThatsUsuallyTrue() )
    {
        // Do stuff here
    }
}
If you're using || in your conditional, then you want to evaluate to true as soon as you can in your conditional, if you have && then you want to evaluate to false as soon as possible. Whatever the logic you're using, you want to do the least amount of work as possible to terminate the conditional check early and/or cheaply. This isn't only the most efficient, but it can make it easier to debug as you're not having to step through as much code when you exit early.

Cache results from an expensive algorithm or method that you might be calling more than once.

The compiler cannot optimize out multiple calls to the same function, unless those functions are purely constant expressions, so it's up to you to do it.

Bad: Here we're calling numCompletedNodes() more than once.
void displayCompletedNodeCount()
{
    // Let's pretend that 'numCompletedNodes()' has a big O of N*log(n) each time it's called.

    // If the player has completed any nodes in the map
    if ( numCompletedNodes() > 0 )
    {
        // We update the UI to display the number of completed nodes
        ui_interface->updateNodeCount( numCompletedNodes() );
    }
}

Better: We cache the number of nodes so we don't have to call the function each time we use it.
void displayCompletedNodeCount()
{
    // Let's pretend that 'numCompletedNodes()' has a big O of N*log(n) each time it's called.

    const unsigned int num_completed_nodes = numCompletedNodes();

    // If the player has completed any nodes in the map
    if ( num_completed_nodes > 0 )
    {
        // We update the UI to display the number of completed nodes
        ui_interface->updateNodeCount( num_completed_nodes );
    }
}

Best: If you really want to be more efficient, in this example it would probably be best to make sure that the numCompletedNodes() function is cheaper to call. Just from the example we can probably assume that it has to go through the map of nodes and calculate for each one if the player has completed it. That is probably information that could be stored on each node, which would bring the big O down to O(n). And you could go further and have a variable that keeps track of the completed node count as it changes, and then it would bring the cost down to O(1) (constant).

5 comments:

  1. In your second set, I am confused as to how the best is better than the better. Are!= calls more efficient that == calls? Is {map_nodes[i].getName() != name} more efficient/quicker than {map_nodes[i].getName() == name}?

    ReplyDelete
    Replies
    1. I guess from an assembly perspective a jne back to the beginning of the loop is more efficient. If I had taken a few more minutes to think about it....

      Delete
    2. Actually neither is really more efficient in that case, it's simply a case of keeping your code flatter (less nested) by continuing as soon as possible within for loops. The example in my post was pretty simplistic, but if you can imagine a case where there was a lot more logic and nesting within that if statement, the depth from nesting could get ugly. Using a continue to early-out when possible reduces the depth of the nesting by 1.

      Technically a != will usually perform a ! on a == operator, so == may be faster if it's the operators aren't efficient enough to inline. In practice a != will almost never be enough of a difference from == to matter as to which one you use.

      Delete
    3. Thanks for the reply. Most of my programming does not require much optimization so I haven't really thought about it that much. They are all general utilities and a 1s speed increase is not that important. For giggles I may go through some of my code and see if I can't improve some of my loops.

      This does give me some insight for when I am looking through others source code trying to see why/how they do something. When I see these techniques used I will better understand why they did it.

      Delete
  2. Thanks for you work it is very good!

    ReplyDelete