Your Daily Geekery

modulo of negative numbers

ruby,java,c,python

The modulo operator returns the remainder of a division. In programming it's often used to restrict an index to the bounds of an array or length limited data structure.

values = [ 1, 2, 3 ]
value = [ index % values.length ]

Whether this really is the ultimate guard against an invalid index is revealed when trying negative numbers. What does -1 mod 13 really mean?

Turns out there is no exact definition. In fact it depends on the language you are using....

read more

master slave consistency

rails,mysql,scalability

Scaling out mysql (or any other database for that matter) means creating a master slave setup where the slaves replicate information from the master. This naturally creates replication lag. Depending on the load of the systems the slaves will be behind master anything between seconds or even minutes and more. This of course creates problems in the application tier. When data consistency in the user interaction is required the application needs to fall back to the master. When the user changes a setting the subsequent request should not show stale data from a slave that is not up to speed yet.

For Rails multiple master slave adapters are available that act...

read more

top-k with mapreduce

mapreduce

Providing time series data about plays/views/visits is a rather simple problem compared to providing top-k information for arbitrary time windows. Most research found in this area has a different focus of what the usual web startup needs. The following approach seems to work fairly well.

The Problem

For exact order information an ordered list with all counters has to be maintained.

track1, 1040 plays
track2, 100 plays
track3, 10 plays
...

Of course this does not scale if you add another dimension into the mix. Adding...

read more

time series data

mapreduce

Many sites provide statistics for their users contents. And many of them provide a graph of how many plays/views/visits the content got over time. This requires a quick aggregate of counts over a certain time window.

One way to provide this information in the short lifespan of a web request is to offload the aggregation from "on read" to "on write". Instead of summing the number of raw events "on read", the aggregates are kept up-to-date....

read more