natechoe.dev The blog Contact info Other links The github repo

A struct trilemma

A lot of times a structure will have two different properties that are very closely related to each other. For example, you might have a Rectangle class with a width, height, and area property. There are a bunch of ways of implementing this class, the most obvious one is probably something like this:

class Rectangle {
	private int width;
	private int height;
	private int area;

	public Rectangle(int w, int h) {
		width = w; height = h; area = width*height;
	}
	public int getWidth() {
		return width;
	}
	public int getHeight() {
		return height;
	}
	public int getArea() {
		return area;
	}
	public int setWidth(int w) {
		width = w;
		area = width*height;
	}
	public int setHeight(int h) {
		height = h;
		area = width*height;
	}
}

Of course, now we have some code duplication in that area calculation. If the area calculation was much more complicated than a single line of code you could imagine that this would create some very unclean code. We could fix that by getting rid of the area variable entirely and making the getArea function implicitly calculate the area, like this:

class Rectangle {
	private int width;
	private int height;

	public Rectangle(int w, int h) {
		width = w;
		height = h;
	}
	public int getWidth() {
		return width;
	}
	public int getHeight() {
		return height;
	}
	public int getArea() {
		return width*height;
	}
	public int setWidth(int w) {
		width = w;
	}
	public int setHeight(int h) {
		height = h;
	}
}

That's fine, but now our getArea function has grown from a single memory read to two memory reads and a multiplication.

We could maintain efficiency and establish an upper bound on our code duplication by moving the area calculation code to its own function, like this:

class Rectangle {
	private int width;
	private int height;
	private int area;

	public int getWidth() {
		return width;
	}
	public int getHeight() {
		return height;
	}
	public int getArea() {
		return area;
	}

	private void calculateArea() {
		area = width*height;
	}

	public int setWidth(int w) {
		width = w; calculateArea();
	}
	public int setHeight(int h) {
		height = h; calculateArea();
	}
}

Now, if the area calculation ever gets more complicated due to unforeseen requirements, we can just extend the calculateArea function to keep everything cached.

This code does make it a bit easier to introduce bugs, though. For example, I might forget to include the calculateArea function in one of my setters.

class Rectangle {
	private int width;
	private int height;
	private int area;

	public Rectangle(int w, int h) {
		width = w;
		height = h;
		calculateArea();
	}

	public int getWidth() {
		return width;
	}
	public int getHeight() {
		return height;
	}
	public int getArea() {
		return area;
	}

	private void calculateArea() {
		area = width*height;
	}

	public int setWidth(int w) {
		width = w;
		calculateArea();
	}
	public int setHeight(int h) {
		height = h;
		// I forgot to call calculateArea()!
	}
}

Later, I might write some code like this:

Rectangle r = new Rectangle(5, 4);
r.setWidth(6);
System.out.printf("Rectangle area: %d\n", r.getArea());

This seems to work, so I commit and don't realize that there's a bug in production code.

These problems really arise from the fact that my data is mutable. If I get rid of all of my setters, then this code becomes much simpler.

class Rectangle {
	private int width;
	private int height;
	private int area;

	public Rectangle(int w, int h) {
		width = w;
		height = h;
		area = width*height;
	}

	public int getWidth() {
		return width;
	}
	public int getHeight() {
		return height;
	}
	public int getArea() {
		return area;
	}
}

Now that my data is immutable, I don't have to worry about my width, area, and height becoming desynchronized. I just precalculate the area and forget about it.

This highlights a small trilemma when working with this sort of data. You can have fast code, clean code, and mutable data, but you can only pick two.

In most cases, you should sacrifice speed. The marginal cost of calculating your new data is almost always so small that it really doesn't matter.

If you really need to have fast, mutable data, you should probably write unit tests to make sure that you're not screwing anything up.

Getting rid of mutable data is just functional programming, and is probably the coolest solution.