Tuesday, December 25, 2018

Data Neutrality

Persisting data is a double-edged sword.  The benefit is that the data continues to exist after power is lost.  The cost is that some space is consumed.  Luckily we live in the "Space Age" where storage space for data is dirt cheap.  Nevertheless, persisting data is still a side effect; it is an action that changes state.  As such, there must exist (at least in principle) a dual action to undo the effect.

When I review code, I want to be aware of all its side effects, and I check whether code exists to undo them.  An application that automatically persists some data need not eventually delete that data automatically, but if it does not, then I want it to give the user the ability to delete it.

Consider collections of persisted data that can grow arbitrarily large.  A common example is an application's logs.  If an application eventually deletes all such data, either automatically or at the user's request, then I consider it "data neutral", which is a play on the phrase "carbon neutral".  In that case, only a constant amount of the application's data is never deleted.  Otherwise, the application is "data positive", which is a bad thing.  Eventually, such an application will consume all of its storage space, even if it began with lots of it.
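To make the automatic route concrete, here is a minimal sketch in Python of one way an application might keep its logs data neutral.  It assumes the logs are ordinary files in a single directory and that "oldest" means oldest modification time.  The names prune_oldest and max_total_bytes are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from pathlib import Path


def prune_oldest(directory: Path, max_total_bytes: int) -> list[Path]:
    """Delete the oldest files in `directory` so the rest fit within the cap.

    Hypothetical helper: returns the paths that were deleted.
    """
    # Sort newest first so that the most recent files are the ones kept.
    files = sorted(
        (p for p in directory.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )

    kept_bytes = 0
    over_cap = False
    deleted = []
    for path in files:
        size = path.stat().st_size
        if not over_cap and kept_bytes + size <= max_total_bytes:
            kept_bytes += size
        else:
            # Once the cap is reached, this file and every older one is deleted.
            over_cap = True
            path.unlink()
            deleted.append(path)
    return deleted
```

Calling something like prune_oldest(log_dir, 50_000_000) each time the application starts, or each time it opens a new log file, would bound the space its logs consume by a constant.  Offering the same action to the user as a "clear logs" command would cover the manual route.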

Don't let your applications be data positive.  Ensure that they are data neutral.