Input/Output Completion Ports (IOCP) for fun and profit

I’ve written a number of C++/MFC programs using the overlapped I/O approach, and it has always been a real pain. Even MSDN doesn’t really seem to like it:

The problem with doing asynchronous I/O with OVERLAPPED function calls like ReadFileEx is that you have to be extremely careful with how you design your program to continue executing with a callback function or by waiting on an EVENT handle to get signaled. The callback function requires dealing with alertable states, which is really error-prone.¬† No, seriously, don’t do it because dealing with the re-entrancy can drive you nuts. Alertable wait states are terribly hard to reason about because it is the non-GUI analog to pumping messages (c.f. Raymond Chen).

So I recently started using IOCP for the same purposes, and I’m happy to report that it works much better! I guess that shouldn’t be a surprise because it’s been recommended by some excellent programmers. Anyway, here’s the rough idea of how I got it to work for me:

  1. Create a background thread with an IOCP handle.
  2. Open a file (or socket, serial port, etc.) with OpenFile and associate it with the IOCP.
  3. Initiate I/O using ReadFile (not ReadFileEx).
  4. Handle both synchronous and asynchronous completion of the I/O.

That last point was the hardest for me to realize that I absolutely have to deal with synchronous, immediate completion of the I/O (not just asynchronous completion from the IOCP). So you really do have to check the return value from ReadFile and also GetLastError() == ERROR_IO_PENDING.

IOCP is so good that it is the basis for C#’s multi-threading I/O. The designers of C# must not have even cared about the old ways to do I/O (e.g., waiting on events, alertable callbacks with APCs, OVERLAPPED callback functions, etc.).

Power digit sum

The Project Euler website appears to have died, so I’m writing a post as a tribute to its memory. An old but fun problem goes like this:

2^15 = 32768, and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.

What is the sum of the digits of the number 2^1000?

The simplest way to approach this problem is to brute force it in a programming language with big integer support, e.g., Python:

N=2**1000

mysum=0
while N>0:
    N, r = divmod(N,10)
    mysum += r

print mysum

RIP, Project Euler.

Happy Cube Numbers

 

Fun programming: this post is based on a certain Project named for a famous mathematician (Euler), but the exact problem is different than the programming question posted there because it is considered bad form to spoil the solutions. This text should be unattractive to Google searches and people looking to spoil the fun of figuring out a hard problem without help.

I’ll discuss a related problem which has different mathematical properties but the similar programming challenges. The original problem is better known as Happy Numbers. The modification for today’s discussion is a variation called¬†Happy Cube Numbers which has the following new properties: A positive integer i is represented in base-10 and its digits are individually summed and cubed to produce another integer j. The following definition captures this function:

let
\[
i=\sum_{k=1}^{\left\lceil \log_{10}i\right\rceil }10^{k-1}d_{k}
\]

and define
\[
f\left(i\right)=\sum_{k}d_{k}^{3}
\]

If f(i)=i, then i is a Happy Cube Number.

For example,

\[
i=371=10^{2}3+10^{1}7+10^{0}1
\]

Which is a Happy Cube Number because:

\[
f(371)=3^{3}+7^{3}+1^{3}=371
\]

Given this definition, we can already write some code and find some Happy Cube Numbers:

def sumCubedDigits(N):
    mysum=0L
    while N > 0:
        mysum += (N%10)**3
        N //= 10
    return mysum

This python code is very fast, O(log10(i)), and we can use it to exhaustively search for 5-digit Happy Cube Numbers:

print [i for i in range(100000L) if sumCubedDigits(i)==i]

Output:
[0, 1, 153, 370, 371, 407]

From these results, we might conjecture that there are no Happy Cube Numbers above 407. Indeed, this is the case because f(i) < i for most i. Consider all 5-digit integers:

\[
i=[d_{5}d_{4}d_{3}d_{2}d_{1}]_{10}=10^{4}d_{5}+10^{3}d_{4}+10^{2}d_{3}+10^{1}d_{2}+10^{0}d_{1}\leq10^{5}
\]

When running these integers through f:

\[
f\left(i\right)=d_{5}^{3}+d_{4}^{3}+d_{3}^{3}+d_{2}^{3}+d_{1}^{3}\leq5\cdot9^{3}
\]

The latter inequality follows because each of the 5 digits can be at most 9. So f(i) is much smaller than i for most integers, and f(i) cannot be equal to i for any number with 4 or more digits. Therefore, the only Happy Cube Numbers are [0, 1, 153, 370, 371, 407], as stated earlier.

There is one other interesting behavior to note: some infinitely repeating cycles of integers can occur in Happy Cube sequences. Consider:

print [int(sumCubedDigits(i)) for i in [133, 55, 250, 133]]

Output:

[55, 250, 133, 55]

Therefore, this cycle will repeat forever. How can other cycles be identified? Writing some code provides a relatively quick solution. For example, two-cycle pairs of integers can be identified by applying f() twice and excluding the trivial solutions (immediately repeating numbers).

print [(i,int(sumCubedDigits(i))) for i in range(10000L) if sumCubedDigits(sumCubedDigits(i))==i and sumCubedDigits(i)!=i ]

Output:

[(136, 244), (244, 136), (919, 1459), (1459, 919)]

Bigger integers can’t be part of a two-integer cycle because f() decreases too rapidly.

Applying similar logic, we can find three-integer cycles:

s3 = sumCubedDigits
print [(i,int(s3(i)),int(s3(s3(i)))) for i in range(10000L) if s3(s3(s3(i)))==i and s3(s3(i))!=i and s3(i)!=i ]

Note that I took advantage of python’s true object nature to rename the function to s3 to make the command above easier to read. Output:

[(55, 250, 133), (133, 55, 250), (160, 217, 352), (217, 352, 160), (250, 133, 55), (352, 160, 217)]

So if we remove the redundant entries, there are two cycles of length three: (55, 250, 133) and (160, 217, 352), two cycles of length two: (136, 244) and (919, 1459), and five immediately repeating positive integers: 1, 153, 370, 371, 407. Since every iteration of f will rapidly decrease large values of i, we can conjecture that all integers will terminate in one of the above 9 situations.

How do we know that cycles of length four or longer don’t exist? Exhaustively searching 4-digit integers with code similar to s3 above shows that such cycles don’t occur. A different way to think about it is that this entire exercise for 4-digit integers is trying to solve the following Diophantine equation:

\[
10^{3}d_{4}+10^{2}d_{3}+10d_{2}+d_{1}=d_{4}^{3}+d_{3}^{3}+d_{2}^{3}+d_{1}^{3}
\]

Unfortunately, Diophantine equations are very hard to solve. Brute force solutions are often the only ones available. So let’s get back to programming. Among all of the 4-digit integers, how many of them will reach each of the 9 situations? How can we efficiently find the final value or cycle from iterating this process on a given integer?

The first programming tool we’ll need for an efficient solution is a form of caching because there is no point in continuing to iterate when we’ve already found the final value or entered a cycle. Memoization solves the caching problem by avoiding a recalculation of an integer we’ve previously computed. Here’s a straightforward class decorator which takes advantage of the fact that Python objects are true classes:

class memoizer(object):
    '''Decorator. Caches results'''
    def __init__(self,fn):
        self.fn = fn
        self.cache = {}
    def __call__(self,*args):
        if args in self.cache:
            return self.cache[args]
        else:
            val = self.fn(*args)
            self.cache[args] = val
            return val

The next tool we need is some basic recursion and datatypes. We’ll (ab)use the python dictionary to get what we need:

# recurse until a known solution is reached
@memoizer
def IterateCubedDigits(N):
    return IterateCubedDigits(int(sumCubedDigits(N)))

knownValues = {
(1,): (1),
(55,): (55, 250, 133),
(133,): (55, 250, 133),
(136,): (136, 244),
(153,): (153),
(160,): (160, 217, 352),
(217,): (160, 217, 352),
(244,): (136, 244),
(250,): (55, 250, 133),
(352,): (160, 217, 352),
(370,): (370),
(371,): (371),
(407,): (407),
(919,): (919, 1459),
(1459,): (919, 1459)
}

IterateCubedDigits.cache = knownValues.copy()

By only storing the final result, we can avoid using too much memory. The cache uses O(N) entries for N integers, and lookup times will be log(N). That’s a good tradeoff because otherwise, we’d have to iterate many more times to find the final value of each integer.

Finally, we can ask where a given integer goes. E.g., 1337:

print IterateCubedDigits(1337)

Output: [371].

So what happens to the first 20 integers?

list1 = [(i,IterateCubedDigits(i)) for i in range(1,21)]
print '\n'.join(map(str, list1))

Output:

(1, 1)
(2, 371)
(3, 153)
(4, (55, 250, 133))
(5, 371)
(6, 153)
(7, 370)
(8, 371)
(9, 153)
(10, 1)
(11, 371)
(12, 153)
(13, (55, 250, 133))
(14, 371)
(15, 153)
(16, (160, 217, 352))
(17, 371)
(18, 153)
(19, 370)
(20, 371)

371 and 370 look like popular final values for this process. Let’s create a histogram out of the first thousand integers.

d={}
list1000 = [(i,IterateCubedDigits(i)) for i in range(1,1001)]
for (k1,v1) in list1000:
    if v1 in d:
        d[v1] = d[v1]+1
    else:
        d[v1] = 1

hist = sorted(d.items(), key=lambda x: x[1], reverse=True)
print '\n'.join(map(str,hist))

This code requires a little more explanation. Having generated an explicit list of the final values corresponding to each integer, a dictionary is constructed, and it counts the number of values in each of the final values (either a cycle or a single integer). We then sort it by a descending number of counts and print pairs consisting of the final value and the count.

Output:

(153, 333)
(371, 303)
(370, 174)
((55, 250, 133), 66)
((160, 217, 352), 51)
(407, 30)
((919, 1459), 24)
(1, 10)
((136, 244), 9)

It looks like my intuition was wrong, at least for the integers up to 1000. 153 is actually the most common final value (33.3%), as compared to 371 at 30.3% and 370 at 17.4%.

Going up to 6-digit numbers, those percentages are similar: 153 is 33.3%, 371 is 29.5%, and 370 is 18.1%. Surprisingly, the (136, 244) cycle is less popular than having 1 as a final value.

Output:

(153, 333333)
(371, 295281)
(370, 181041)
((55, 250, 133), 53370)
((160, 217, 352), 39195)
(407, 38052)
((919, 1459), 32274)
(1, 14953)
((136, 244), 12501)

The execution time of this algorithm was under 2 seconds for computing these values for the first million integers. A slower algorithm would surely fail under these conditions.

Conclusions: I explored an interesting number theory problem and produced some concise python code which solves the problem very quickly. Algorithmic and mathematical analyses were used to produce correct results with minimal computation.