Digital Legacy

There’s a thought that’s occupied my mind lately – working in software, will I ever leave a lasting legacy? A central concept in the software world is building solutions that will inspire new and better solutions, and so the history of the earlier pieces is often lost.

I grew up being taught that anything you put online stays online. This was meant as a privacy safeguard more than anything else, but by the same reasoning I thought that my projects would also stand the test of time. Even if people forgot, the data would be there.

My first software project was an encyclopedia and skill calculator for an online game called Tibia. This was long before wikis were a thing, mind you. The game was still rather basic back then, and users often built tools one could use on the side to compensate for missing features. My inspiration to do this was a user called Loki X, who built a couple of tools to supplement the game. The only one I remember now is the friend list, which fetched a list of the players who are online from the game servers and displayed which of your friends were online. The tool became an integral part of playing the game, and I wanted to make a similar impact.

Today, when I query google for “Loki X” or “Loki tools”, nothing shows up in the results. My trigger for getting into software development has practically been purged from history. When it comes to the tool I made, it has suffered a similar fate. The web server I hosted it from is long gone, and the community site that also hosted it has disappeared as well. Luckily, it seems to live on in some people’s memories. Recently a thread discussing the game showed up on /v/, and I asked if anyone remembered Tomes of Knowledge. To my surprise, someone replied with a picture of their desktop, with my shitty old icon still sitting there!

Both Loki’s tools and my own application served as building blocks in one way or other, but as the game itself declined they became irrelevant. The only legacy I was left with was a simple piece of trivia that can’t even be verified anymore (as the web servers shut down, and I didn’t keep any offsite backups of the code).

This story is from when I was a teenager, long before I became a professional software developer. I’ve worked on a multitude of personal projects since, but none that have had the same reach as this one simple application made by a bored teenager. And even this application disappeared from history.

This post may sound more negative than I intended. It’s not all doom and gloom, as these thoughts have spurred me into finally setting up a proper github account and beginning to share my smaller projects with the world. I’ve kept thinking everything I make on my free time is insignificant, but if it can serve to inspire a better product (or even be integrated into something) then I can finally say I’ve done my part to serve human progress. And that would be something, wouldn’t it?

Identifying batches of consecutive dates in SQL

Disclaimer: This is a solution done in T-SQL aka Microsoft SQL. It may be possible to recreate in other database systems, but I haven’t explored that option yet.

This is a solution to a very specific problem I’ve encountered, which might prove useful to others. A while ago I started looking into the code of our solution for overdue payments (in regards to subcriptions, e.g. the customers are billed monthly but not always paying), which appeared to have some issues. I found an oddly constructed procedure where I couldn’t for the life of me understand why it would’ve been built that way. My first instinct was to try to build it from scratch and see what the major showstopper would be.

The biggest hurdle was identifying each specific cluster of undisrupted payments, as that’s not quite straightforward when looking at the data.

To take you through this process, let’s start off by defining the table we’re working with. I’ll exclude fields that aren’t relevant, keeping this explanation as simple as possible.

SubscriptionID (bigint – Identifier for the subscription whose payment is overdue)AmountDue (float/money)
AmountPaid (float/money)
DueDate (date)

This should be enough to get us going.
Now, on to the previously mentioned hurdle; How do we split the data into groups for each series of completed (or missing) payments?

Our first query gives a list of all the missing payments. This is where we start.

SELECT
i.SubscriptionID,
i.DueDate
FROM Invoices i
WHERE i.AmountPaid = 0 /* let's only look at the completely unpaid ones for now */
AND i.DueDate < GETDATE()

Now comes the clever part – a trick I came across somewhere on Stack Overflow. We want to group each series of recurring payments into its own group, and figure out how long the break lasted and how much was due. There isn’t any obvious variable to group this data by, but we can create one! We know the time series will be increasing by 1 month for each row, and so any deviation from that formula should mean a break in the pattern (and thus a new series).

You may be familiar with the MSSQL function ROW_NUMBER(), which can enumerate rows based on your given criteria. This will give a steadily rising number for the rows we’re already looking at. It won’t care whether there was a break in the dates or not, so we need something more. By calculating the number of months (for a monthly recurring payment – if the intervals are different then this method would need to be adjusted for that) that have passed since a given point in time, we’ll be closing in on our goal.

ROW_NUMBER() OVER( ORDER BY i.SubscriptionID, i.DueDate ) AS rn,
DATEDIFF(month, '1900-01-01' , i.DueDate) AS mnth

These two combined will give us what we need. The number of months minus row number will give the same result for each recurring record, giving us a variable we can start grouping them by!

SELECT
ROW_NUMBER() OVER( PARTITION BY SubscriptionID, Batch  ORDER BY DueDate) AS IntervalsMissed,
DueDate, SubscriptionID,
ROW_NUMBER() OVER (PARTITION BY SubscriptionID ORDER BY InvoiceStartDate DESC) AS rn -- the row with rn=1 would be the most recent

FROM (
SELECT
i.SubscriptionID,
i.DueDate,
DATEDIFF(month, '1900-01-01' , i.DueDate)- ROW_NUMBER() OVER( ORDER BY i.SubscriptionID, i.DueDate ) AS Batch
FROM Invoices i
WHERE i.AmountPaid = 0
AND i.DueDate < GETDATE()
) a

Here we go, the resulting data set gives us a list of every cluster of missed payments our customers have, complete with how many recurring months they missed their payments for each of them. In our case we use this to identify those who are currently in a miss-streak, and ignore the historical data. There’s quite the number of things one can do with such data, but that’s a topic for another day.

Until next time!

An Introduction

Hello, curious reader.
It’s been 5 years since I last kept a blog. My exit coincides with the beginning of my professional career, which I can assume was either due to too much free time during my studies or simply the realization that I actually had more to learn than to teach.
Now, some time later, I feel I’ve done a couple of things that may actually be worth sharing, so I’ll try to get this thing started again.

This blog will first and foremost reflect my professional life, covering everything from server maintenance and shell scripting to software development and interesting database solutions.
I currently work for a NGO where I work with the database and our core applications (and do some minor coding on the side whenever I see a way to improve workflows). I’ve also been part of a two-man team that built and deployed a data warehouse solution in several of our European country offices.

Why “Anticode”?
The name stems from one of my music projects, which I considered a vacation of sorts from my day job. The name stuck, and is markedly more unique than my real name (a quick google search of my name is unlikely to give any results pointing to me).

So there we go. Time to get this thing up and running.