Comments – Introduction

All of us have written comments for our code. Many times we have relied on comments to try to make sense of the code we’re seeing. As Uncle Bob puts it in his book, “comments are, at best, a necessary evil” and shouldn’t be used carelessly.

Why we need comments

To put it simply, we write comments in our code when we are unable to properly express our intent only using code. This can be due to the programming language’s limitations or our own.

Think about it: if all your classes and variables had clear names, and all your function names expressed what they did without any doubt, then your code would be so easy to understand that comments would be redundant.

Other times we’ll need to write something that can’t be expressed in code, or text that isn’t code but has to be present in the file for other reasons (like legal details).

Writing Clean Code Is Better Than Writing Comments

Comments shouldn’t be used as a complement to bad code. If our code is complex, difficult to read and hard to understand, instead of spending time and energy writing comments, we should try to make our code cleaner.
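As a minimal sketch of this idea, consider a condition that would normally be explained with a comment. The Employee class, its properties and the eligibility rule are hypothetical and exist only for illustration. Moving the condition into a well-named method makes the comment unnecessary.

```csharp
using System;

public class Employee
{
    public bool IsHourly { get; set; }
    public int Age { get; set; }

    // Before: the call site needed a comment to explain the intent.
    //   // check whether the employee is eligible for full benefits
    //   if (employee.IsHourly && employee.Age > 65) { ... }

    // After: the method name carries the intent, so the comment is redundant.
    //   if (employee.IsEligibleForFullBenefits()) { ... }
    public bool IsEligibleForFullBenefits()
    {
        return IsHourly && Age > 65;
    }
}
```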

In future posts, we will see when adding comments to our code can be a good thing, and when it can be a bad thing.

Functions – The DRY Principle

A very important lesson that every programmer has to learn at some point is that duplication is bad. This is what the DRY Principle addresses.

What is duplication?

Simply put, duplication is having lines of code in your application that are repeated in multiple places. Whenever you copy and paste a snippet of code, you are very likely guilty of duplication.

Why duplication is bad

One of the problems that comes from having duplication in your code is that it gets bloated. Copying and pasting code can get out of hand very quickly.

The main reason to avoid duplication is that it makes maintaining your code very difficult. Whenever you need to modify your code, you have to track down all the places where that code has been duplicated and make the change in every copy. This is very time consuming and requires you to have a deep knowledge of the entire system. If you have a lot of duplication, you will probably miss one or more of the copies. This will cause bugs in your code that are very hard to track down.

The DRY Principle

DRY stands for “Don’t Repeat Yourself”.

The formal definition for the DRY principle is:

“Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”

Another way of explaining it is to say that your program shouldn’t have two or more blocks of code that do the same thing.

Functions help us keep our code DRY

Functions allow us to keep our code DRY because they encapsulate the logic in one place. When we need our code to do something more than once, we call the function instead of copying the code. This helps prevent code bloat.

DRY Code is easier to modify

Modifying the code also becomes a lot easier. If we want to change the behavior of a certain functionality in our program, we just need to modify the function that implements it. All the places in our program that call the function will automatically have been “updated”. There is no need to search all our code base to modify anything else.
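Here is a small sketch of what that looks like. The PriceCalculator class, its method names and the 16% tax rate are all hypothetical. The point is that the tax rule lives in a single function, so a change to the rule is made in exactly one place.

```csharp
using System;

public static class PriceCalculator
{
    // Hypothetical tax rate: the single place where this piece of knowledge lives.
    private const decimal TaxRate = 0.16m;

    // Every caller that needs a price with tax calls this function
    // instead of repeating "price + price * 0.16m" inline.
    public static decimal AddTax(decimal price)
    {
        return price + price * TaxRate;
    }

    public static decimal InvoiceTotal(decimal subtotal)
    {
        return AddTax(subtotal);
    }

    public static decimal CartItemPrice(decimal unitPrice, int quantity)
    {
        return AddTax(unitPrice * quantity);
    }
}
```

If the tax rate or the formula changes, only AddTax (or the constant) is modified, and both InvoiceTotal and CartItemPrice pick up the new behavior.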

 

Functions – Exceptions Instead Of Error Codes

Maybe at some point you’ve written a function that had several possible results, and depending on the result, you had to do different things or show different error messages. One thing you might be tempted to do is have the function return a number representing the result code, or a value from an enumeration. While this may be valid in certain cases, you should try to avoid it.

Why shouldn’t you use Error Codes?

It violates Command Query Separation

If you read the post about Command Query Separation, then you can see that this approach violates that principle.

It makes you write more code

If you call a function that returns error codes, then it is safe to assume that you will want to act according to the error code received. Usually this happens via a switch statement.

// things that happen before calling the function
int code = MyFunctionThatReturnsErrorCodes();
switch(code){
  case 0:
    //if 0 means success then we continue our function here
    break;
  case 1:
    //handle this error
    break;
  ...
  case 10:
    //handle this error
    break;
}
// things that happen after calling the function

This means that your function will have to handle every single possible error. If there are a lot of things that can go wrong, then your function will be large because of all the error handling code.

It makes modification difficult

What happens when your function gains a new possible error? Not only will you have to change the function (and the enumeration, if you are using one for the possible errors), you will also have to change all the code that calls that function so that it handles the new error code. That means adding an extra case to every switch statement like the one in the previous example. If the code that calls the function is in a different assembly, then those assemblies will have to be recompiled as well (and that can make you a lot of new enemies very quickly!).

Error handling is one thing

Many people would argue that error handling is one thing. That means that a function that handles its own errors is doing more than one thing. It would be better to have a separate function that handles the possible errors that could happen.

Use exceptions instead

A better approach is to have your Command functions not return an error code, but throw an exception when an error happens. When you call the function, assume that everything will work correctly, and use a try/catch statement to handle any errors outside of the logical flow of the function itself.

try{
  // things that happen before calling the function
  MyFunctionThatThrowsExceptions();
  // things that happen after calling the function
}
catch (Exception e)
{
  // handle possible errors here, preferably in a separate function
}

The code in the previous example is easier to read. You assume that your functions will work without errors and if an error happens, there is a separate piece of code that will handle that error. We are also respecting Command Query Separation, while having functions that do one thing.
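To make this more concrete, here is a hedged sketch with a specific exception type. Account, InsufficientFundsException and WithdrawalScreen are hypothetical names invented for this example. The command throws when it cannot do its job, and the caller keeps its error handling in a separate function.

```csharp
using System;

// Hypothetical exception type for one specific failure.
public class InsufficientFundsException : Exception
{
    public InsufficientFundsException(string message) : base(message) { }
}

public class Account
{
    public decimal Balance { get; private set; }

    public Account(decimal openingBalance)
    {
        Balance = openingBalance;
    }

    // A command: it changes state and returns nothing.
    // Failure is reported by throwing, not by a return code.
    public void Withdraw(decimal amount)
    {
        if (amount > Balance)
        {
            throw new InsufficientFundsException(
                "Cannot withdraw " + amount + " from a balance of " + Balance + ".");
        }
        Balance -= amount;
    }
}

public class WithdrawalScreen
{
    public string Message { get; private set; } = "";

    // The happy path reads top to bottom inside the try block.
    public void Withdraw(Account account, decimal amount)
    {
        try
        {
            account.Withdraw(amount);
            Message = "Withdrawal complete.";
        }
        catch (InsufficientFundsException e)
        {
            ShowError(e);
        }
    }

    // Error handling lives in its own function.
    private void ShowError(Exception e)
    {
        Message = "Error: " + e.Message;
    }
}
```

Adding a new kind of failure later means adding a new exception type and, only where it matters, a new catch block. Callers that don't care about it do not have to change.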

Functions – Command Query Separation

The idea of Command Query Separation is simple. You should have functions that perform actions that change the state of your system, and you should have functions that return (query) the current state of some part of the system, but your functions shouldn’t do both. That means you should have Commands and Queries, but should try to avoid Commands that are also Queries.

Commands

These are the functions that change the state of your system. They should not have a return value, since they are asking the system to do something.

void LoginUser(string username,string password);

Commands usually require you to be more careful, since calling them in the wrong order, or at the wrong time can result in an error.

Queries

Queries are the functions that you call when you need to know the state/value of something in your system.

User GetUser(string username);

Queries are functions that you should be able to call at any moment without any fear of them altering the state of your system or having side effects.

 

Writing your functions according to the principle of Command Query Separation should make your code easier to understand. A function without a return value is a command and should be called knowing that it will cause a state change, while a function with a return value is a query and will not.
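A small sketch of a class written this way might look like the following. ShoppingCart and its members are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public class ShoppingCart
{
    private readonly List<string> items = new List<string>();

    // Command: changes state, returns nothing.
    public void AddItem(string item)
    {
        items.Add(item);
    }

    // Query: returns state, changes nothing.
    public int ItemCount()
    {
        return items.Count;
    }

    // Query: safe to call any number of times.
    public bool Contains(string item)
    {
        return items.Contains(item);
    }
}
```

Calling ItemCount() or Contains() repeatedly always gives the same answer until a command such as AddItem() is called.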

 

There is also an article by Martin Fowler that gives a very simple and clear explanation if you want to learn some more.

Functions – Side Effects

What is a side effect?

According to Wikipedia, a function has a side effect when “it has an observable interaction with calling functions or the outside world”. Uncle Bob says that side effects are lies. A side effect happens when a function is supposed to do one thing, but it also does other things you don’t expect it to do. Examples of side effects include changing the value of class or static variables, or leaving objects in a different and unexpected state.

A simple example (modified from the Clean Code book) is the following function:

public bool CheckPassword(string userName, string password){
  User user = UserGateway.FindByName(userName);
  if(user != null)
  {
    if(user.Password == password)
    {
      Session.Initialize();
      return true;
    }
  }
  return false;
}

The side effect in the function is the call to the Session.Initialize() method. The function name says that it will check the password, but it doesn’t give any clue that it will initialize the user session if the password is correct. If someone called the method just to check the password, unnecessary user sessions could be started or valid user sessions could be overwritten.

We must always take care that our functions do not have side effects. Functions that do “extra” things you don’t expect them to do have been the source of many sleepless nights for programmers everywhere.
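One possible way to remove the side effect from CheckPassword is to split it into a query and a command. This is only a sketch: the Authenticator class and the method names IsPasswordCorrect and LogIn are assumptions, and UserGateway and Session are replaced by hypothetical in-memory stand-ins so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

public class User
{
    public string Name { get; set; }
    public string Password { get; set; }
}

// Hypothetical in-memory stand-in for the UserGateway used in the example above.
public static class UserGateway
{
    private static readonly Dictionary<string, User> users = new Dictionary<string, User>
    {
        { "alice", new User { Name = "alice", Password = "secret" } }
    };

    public static User FindByName(string userName)
    {
        User user;
        return users.TryGetValue(userName, out user) ? user : null;
    }
}

// Hypothetical stand-in for Session that only records whether it was started.
public static class Session
{
    public static bool IsInitialized { get; private set; }

    public static void Initialize()
    {
        IsInitialized = true;
    }
}

public static class Authenticator
{
    // Query: only answers the question its name asks.
    // (The plain-text comparison mirrors the original example.)
    public static bool IsPasswordCorrect(string userName, string password)
    {
        User user = UserGateway.FindByName(userName);
        return user != null && user.Password == password;
    }

    // Command: starting the session is now visible in the name.
    public static void LogIn(string userName, string password)
    {
        if (!IsPasswordCorrect(userName, password))
        {
            throw new UnauthorizedAccessException("Invalid user name or password.");
        }
        Session.Initialize();
    }
}
```

Code that only wants to verify a password calls IsPasswordCorrect and never touches the session. Code that wants a session calls LogIn and knows what it is asking for.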

 

Functions – Arguments

A very important thing you should keep in mind when writing functions is how many arguments each function should have. Just like shorter functions are easier to understand, functions with few arguments are easier to understand and use.

How many arguments should a function have?

The short answer for this question is that a function should have as few arguments as possible. You should aim for functions with no arguments, when possible. If not, then pass in one or two arguments. According to the Clean Code book, three arguments is getting to the limit, and you should “never” use more than that. While this might seem like a rule without any justification, it can help your code in the following ways:

It makes your functions easier to read

If a function has only a few arguments (or none at all), then it will be much easier to understand how those arguments are used by the function. If your functions are also small, then it will be that much easier to quickly read the code.

It helps make testing easier

When writing your unit tests, the fewer arguments a function has, the fewer argument combinations you have to write tests for.

What does it mean if my functions have more than three arguments?

In my personal experience, if your functions seem to have too many arguments, it can be because:

  • Your functions are doing more than one thing. You could probably split the function into smaller ones with fewer arguments.
  • Your function should really be a separate class.
  • You are passing values separately when you should be using a data structure.

Some examples

Below we can see some examples of valid cases where you can use one, two, or even three arguments.

One Argument

Usually a single argument is used when you want the function to check something about the argument, or you want the function to do something to the argument.

bool FileExists(string fileName);

StreamReader OpenFile(string fileName);

Boolean (flag) arguments

One thing you should try to avoid is passing flag arguments. This is a very clear indicator that you have a function that does two things. Instead of a function like

void DoThisOrThat(bool flag);

You should try to write

void DoThis();

void DoThat();
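As a slightly more concrete sketch, here is a hypothetical Greeter class (all names are invented for illustration). The flag version is shown only as a comment, and the two separate methods replace it.

```csharp
using System;

public class Greeter
{
    // With a flag, the call site reads Greet("Ana", true) and the reader
    // has to look up what "true" means:
    //   public string Greet(string name, bool formal)

    // Without the flag, each method does one thing and the call site
    // says which one it wants.
    public string GreetFormally(string name)
    {
        return "Good day, " + name + ".";
    }

    public string GreetCasually(string name)
    {
        return "Hi " + name + "!";
    }
}
```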

Two Arguments

When writing functions with two arguments, you should try to make sure that the relationship between the arguments is clear and that the arguments follow a logical order.

void MakeReservation(string customerName, DateTime reservationDate);

void UpdateUserPassword(int userId, string newPassword);

Three Arguments

While functions with three arguments should be avoided when possible, in some cases they are justified. Just be careful with the argument naming and ordering.

void TransferFunds(int originAccount, int destinationAccount, decimal amount);

In the previous example, switching the origin account and the destination account by mistake could get you in a lot of trouble!

Argument Objects

If your functions require more arguments, then it’s very likely that what you really need is an argument object. If we examine the TransferFunds function from the previous section, the arguments clearly are related and could be abstracted into an object. The function could be rewritten to be cleaner.

void TransferFunds(Transfer data);

Here we use a new class, Transfer, that has the account information and the transfer amount as fields.
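The original does not show the class itself, so the following is only a guess at what it could look like. The property names are taken from the three-argument version of TransferFunds, and the validation in the constructor is an added assumption.

```csharp
using System;

// Hypothetical argument object for TransferFunds.
public class Transfer
{
    public int OriginAccount { get; }
    public int DestinationAccount { get; }
    public decimal Amount { get; }

    public Transfer(int originAccount, int destinationAccount, decimal amount)
    {
        // Assumption: a transfer only makes sense for a positive amount.
        if (amount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(amount), "Amount must be positive.");
        }
        OriginAccount = originAccount;
        DestinationAccount = destinationAccount;
        Amount = amount;
    }
}
```

A side benefit is that the related values now travel together, and rules such as "the amount must be positive" have an obvious home. Callers can also build the object with named arguments, for example new Transfer(originAccount: 1001, destinationAccount: 2002, amount: 250m), which makes it harder to swap the two accounts by mistake.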

Argument Lists

Another way to pass multiple similar arguments is to store them in a list or array. This is necessary when you don’t know how many arguments there are.
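In C#, one way to do this is the params keyword, which lets the caller pass any number of values or an existing array. The Statistics class and the Sum method below are hypothetical names used only for this sketch.

```csharp
using System;

public static class Statistics
{
    // "params" lets callers write Sum(1m, 2m, 3m), Sum(), or Sum(existingArray).
    public static decimal Sum(params decimal[] values)
    {
        decimal total = 0m;
        foreach (decimal value in values)
        {
            total += value;
        }
        return total;
    }
}
```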

 

Functions – Signs that your functions are doing more than one thing

We’ve established that the key to having small functions is to make sure that they do one thing. A guideline to follow is that functions should remain on a single abstraction level. Now we will mention two easy-to-detect signs that our functions may be doing more than one thing.

Lots of indentation

Why do we indent code? When we use a tab or spaces to “push” some lines of code, we do it because we want to make it visually evident that those lines form a block of code that does something specific. In some cases it is perfectly valid to have some degree of indentation, but if you start noticing a lot of nested if-else statements, or loops that have other control structures within them, then you should consider extracting some functions. We will now look at an example with a single level of indentation:

public void ExampleFunction(int x, int y){
  if (x > y)
  {
    // some lines of code that do something
    ...
    ...
    ...
    ...
    ...
  }
  else
  {
    // some lines of code that do something else
    ...
    ...
    ...
    ...
    ...
  }
}

Maybe the instructions inside the if-else statements aren’t very difficult to understand, maybe there’s not a lot of lines of code, but you could argue that this function is doing more than one thing. Maybe it would be better to extract the code inside the control structures and have the code look like this:

public void ExampleFunction(int x, int y){
  if (x > y)
    DoSomething();
  else
    DoSomethingElse();
}

This function is clearly a lot easier to read than the previous one. The specific instructions that do something or do something else have been extracted, so we don’t have to look at them unless we need to.

Sections Inside Functions

This one is very easy to detect, because it shows up as large functions that have several blocks of code where each one does a single task. Take for example a function that opens a text file, processes the data and then writes the data to a new file.

public void ProcessDataFile(string sourceFile, string destinationFile){

  // some code that opens the source file
  ...
  ...
  ...
  // some code that processes the data from the file
  ...
  ...
  ...
  // some code that writes the data to the new file
  ...
  ...
  ...
}

This function has several clearly defined sections where each one does a different thing. This function could be rewritten the following way:

public void ProcessDataFile(string sourceFile, string destinationFile){
  var data = ReadDataFromSourceFile(sourceFile);
  ProcessData(data);
  WriteNewDataFile(destinationFile, data);
}

As you can see, a pattern emerges. We take a function and start extracting new functions from it, where each newly extracted function has a name that clearly states what it does, and it does only one thing.

Use your common sense

Does this mean that every single function has to be only a few lines of code? Is it forbidden to have nested control structures? No, that’s not what it means. Sometimes it will be valid for you to decide that you want to have a large function for some reason. Sometimes it will be the right choice to have a few nested control structures. The key is to not go to one extreme or the other. You shouldn’t have functions that are thousands of lines of code, nor should you try to make all your functions only one or two lines of code. Our goal is to make sure that our functions do one thing, and that can mean different things in different situations.

Functions – Levels of Abstraction

In the previous post, we talked about the importance of functions doing one thing. We briefly touched on something called abstraction levels. One of the simplest ways to be sure that your functions only do one thing is to make sure that they stay on one abstraction level.

Let’s try and explain this using a very simple example.

How to wash your hair

Every single day, millions of people around the world wash their hair in the shower. If you asked them the steps for washing their hair, they would likely give answers like the following:

“I just wash it”

“I first wash with shampoo and then rinse”

“When you’re in the shower and you want to wash your hair, first you have to make sure to wet your hair. Then you take the shampoo bottle and pour some shampoo onto your hands. Using your fingers make sure to lather the shampoo and distribute evenly across your hair. Wait a short time for the product to work, and then rinse your hair. Afterwards, take some conditioner and apply it to  your hair. Finally, rinse for a second time and you have finished washing your hair.”

Which description is the correct one? They all are. The difference between the answers is the level of detail they go into. This is what we mean by abstraction levels: the higher the level of abstraction, the fewer details we show. The more levels we go down, the more details we get.

If we tried to write code to represent this it would look like this:

public void WashHair();

This would be the highest abstraction level for the function that washes your hair. At this point we don’t know how hair actually gets washed, and maybe we don’t want to know. Maybe we’re responsible for writing a function for taking a shower. In that case, this is as far as we want to go. We don’t need to know how hair is washed, we only care that it happens.

Going deeper

Suppose we want to go down one level of abstraction and see how that happens:

public void WashHair(){
  WetHair();
  ApplyShampoo();
  Wait();
  Rinse();
  ApplyConditioner();
  Wait();
  Rinse();
}

Now, we have more information about how the function works. We can understand the steps taken, even though there are no comments and we’re not looking at the code that actually does the work.

Go only as deep as you have to

Finally, let’s go down one more level of abstraction and see how shampoo is applied:

private void ApplyShampoo(){
  Shampoo shampooDose = ShampooBottle.GetShampoo();
  Hair.ApplyShampoo(shampooDose);
  Hair.LatherShampoo();
}

Now, we finally get to see variables, classes and “lower level” code. We could go even deeper, but this level of detail is enough for us. We don’t need to know how the Shampoo class works (or maybe we can’t see the source code), we just trust that it does.

Dealing with changes

If at some point you had to change the way that hair is washed, then you would only have to go to the abstraction level where that task is performed and modify a very small amount of code. For example, if someone wanted to wash their hair without conditioner, you would only need to write the following function:

public void WashHairWithoutConditioner(){
  WetHair();
  ApplyShampoo();
  Wait();
  Rinse();
}

There was no need to look at hundreds of lines of code. Since each function does only one thing, and the names are very clear about what each function does, we only have to call the functions that execute the steps we need. Any other programmer that later looks at the new code will understand what the new function does. There will be no need for commenting the new function or going into a deeper abstraction level since the name is very clear, and the functions called are already known.

Writing small functions that deal with a single level of abstraction will help you write code that is cleaner, easier to understand and easier to modify. This can be a difficult change to make, since most of us are used to writing longer (often much longer) functions. However, it will be worth it.

 

Functions – Smaller Is Better

This will be the first post covering the topic of functions. So far, this has probably been the approach to clean code that has been the hardest for me to wrap my head around. Not because it’s difficult, but because it is opposite to the way that I have been writing functions for my entire coding career. In this post we will start to build the argument for small functions.

How most functions are written

Think about the last time you sat down in front of your computer to write a function. Hopefully, you dedicated some time beforehand to think about it, until you had a pretty good idea of how to write it. Maybe you felt brave that day and decided to just start coding and let the steps show themselves to you. In any case, you opened your text editor or IDE and started typing. After some time you sat back and marveled at your work. The code seemed clear and efficient, even though the logic was somewhat complex, and you did it in only 200 lines of code! You even commented the key parts of the code, in case someone had to modify it in the future. You considered it a job well done and moved on to the next task at hand.

Make your functions small! Smaller Than That!

Why would you want to make the function in the previous example smaller? How could you even make it smaller if you wanted to? One answer to the first question would be to make the code even easier to understand. The answer to the second question is to extract the different steps taken inside the function into separate functions until you can’t extract any more. In his video series, Uncle Bob refers to this as “extract ’til you drop”. He even goes as far as to say that functions should be no longer than 6 lines of code. The first time I saw the video, I thought he was crazy for saying that, but he actually managed to make a valid argument for his point of view. While I can’t say that all the functions I write now adhere to his guideline, they have certainly gotten a lot smaller.

How can we make the functions smaller? How do we know which parts to extract into which functions? Future posts will help you learn how to do this, but it boils down to making sure that each function does one thing.

Do One Thing

Functions should do one thing. They should do it well. They should do it only.

Writing functions that do one thing is the key to having small functions. The definition of doing one thing can sometimes be tricky. If you were writing a shopping cart application, someone could argue that placing the customer’s order is one thing. Someone else would say that it’s not one thing, because placing an order actually involves updating the customer’s records, reducing the inventory, notifying the shipping company that there’s a package, etc. So who’s right? Actually, they both are, but they are right at different abstraction levels.

We will continue this discussion in the next post, which addresses abstraction levels. Stay tuned!

Meaningful Names – Other Tips

To finish the meaningful names part of our journey to produce clean code, here are some other general tips mentioned in the book.

Use Pronounceable Names

Use names that you can pronounce. If you’ve followed the previous guidelines given, then most of your names should be clear and easy to say. Talking to others about a certain piece of code shouldn’t sound like a conversation in Klingon (unless that’s what you’re aiming for).

Use Searchable Names

What was the name you chose for the variable that had the account balance? Was it acctbal, acctblnc or actbal? If you choose meaningful names, then you can be sure that searching for the terms account or balance will probably point you in the right direction towards finding that variable.

Avoid Encodings

There is no need to encode type or scope information into the name we choose for a variable, class or method. Most of us now use IDEs or text editors that have some kind of IntelliSense. If we want to know more about something, it’s as easy as hovering the cursor over it.
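A tiny sketch of the difference, using a hypothetical Customer class. The encoded names appear only as comments for comparison.

```csharp
using System;

public class Customer
{
    // Encoded versions repeat the type in the name and go stale
    // if the type ever changes:
    //   public string StrName { get; set; }
    //   public int IAge { get; set; }

    // Plain names: the declaration and the editor already tell us the types.
    public string Name { get; set; }
    public int Age { get; set; }
}
```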

Class Names

Class Names should be nouns.

Method Names

Method names should be verbs.

Pick One Word per Concept

This is one that I fail at constantly. Choose one word to represent something and stick to it. Don’t use Update for some methods and Modify for others. The same goes for Remove/Delete, Read/Get, Set/Assign, etc.

Use Solution Domain Names / Use Problem Domain Names

When there is a common programming or computer science term for what you are doing, then you should use it. Terms like observer, queue, decorator, etc. are widely known and used by programmers and can make it very clear what you’re trying to do. When there is no programming term that fits the name you want to assign, then use problem domain terms. Using problem domain names is more common when you’re dealing with higher level concepts of your software, while programming terms become more common as you get into the deeper levels of your code.

Remember, names are important!

After reading through all these posts, hopefully I’ve managed to impress upon you the importance of dedicating some time to choosing the names you will use in your code. I know that in my case, it has saved me a lot of time in the long run. However, don’t be afraid of choosing a name and later finding out that it isn’t the best one. Sometimes you will only discover this as you’re coding, and you can always use a refactoring tool to change it to the new name.