# Archive for December, 2009

### The Sieve of Eratosthenes and F#

Posted by Scott Seely in Uncategorized on December 18, 2009

There is a problem on the Euler project, www.projecteuler.net, which asks to find the sum of all values under a given number. Problems on the Euler project have a range of solutions, where at least one solution has a runtime of under 1 minute. A popular, time efficient algorithm that finds all primes in a given range is the Sieve of Eratosthenes. The basic algorithm is:

- Create an array that contains all values from 2 up to the final maximum value
- Starting at 2, For each value in the array, mark all items that have indices of the current value that are multiples and NOT equivalent to the current index as 0 (eg. index mod [current value] == 0).
- March forward until you hit a non-zero value in the array, then mark all multiples as 0.
- Halt condition: stop when you cross the midpoint of the array (anything remaining that is non-zero is a prime since you already all multiples of 2).

In practice, this gives successive iterations of an array from 2 .. 10 as:

- [2, 3, 4, 5, 6, 7, 8, 9, 10]
- [2, 3, 0, 5, 0, 7, 0, 9, 0]
- [2, 3, 0, 5, 0, 7, 0, 0, 0]
- At this point, we reach 5, which is greater than the length of the array (9) divided by 2 (4.5). Any non-zero values in the array are prime.

In C#, one possible implementation is:

private static long SieveOfErastosthenes(long maxvalue) { long[] values = new long[maxvalue]; // Populate the list for (long i = 0; i < values.LongLength; ++i) { values[i] = i; } values[0] = 0; values[1] = 0; for (long i = 2; i < values.LongLength; ++i) { if (values[i] != 0) { for (long j = i * 2; j < values.LongLength; j += i) { values[j] = 0; } } } long retval = 0; foreach (long k in values) { retval += k; } return retval; }

Overall computational complexity is O(n). If you aren’t familiar with big O notation, the overall complexity is bound by the size of the dataset times a constant-in this case 3, since we iterate through the complete dataset 3 times:

- Populate the dataset
- Mark all non primes as 0
- Sum the primes

And yes, this works quite well, even for large values of n. My challenge was to get the sieve to work with performance equal to C#, only in F#. At the start of this, I had my first pass of the C# code running in ~600 ms for a value of 5 million. The challenge I set for myself was to achieve a similar timing in F#. The first few passes left me with runtimes of over 15 minutes (I killed the runs prior to completion). The reason why was simple: I wrote an O(n^3) algorithm. My challenge was to get this faster without using mutable values. My first pass looked like this:

let rec sieveofErastosthenes values filterout maxval = if (filterout >= (maxval/2)) then values else let shouldFilter = try let a = values |> List.find(fun x -> x = filterout) a <> 0 with | _ as ex -> false if (shouldFilter) then sieveofErastosthenes (values |> List.filter(fun x -> (x = filterout) || ((x % filterout) <> 0))) (filterout + 1) maxval else sieveofErastosthenes values (values |> List.find(fun x -> x > filterout)) maxval let max = 5000000let sumofprimes = sieveofErastosthenes [2 .. max] 2 max |> List.map( fun x -> int64 x) |> List.sum

This code is correct-pass in 10 for the max and the value returned is 17. But the answer just never returned for larger values within reasonable time. Next, I tried using a mutable type: System.Collections.Generic.List<T>. This got the time down to a measureable 4s. That’s still 6.5x longer than the C# version. That next attempt looks like this:

let sieveOfEratosthenes x = let retval = new List<int32>() retval.AddRange([0 .. x]) retval.Item(0) <- 0 retval.Item(1) <- 0 for i in 2 .. x do if retval.Item(i) <> 0 then for j in 2 * i .. i .. x do retval.Item(j) <- 0 retval.ToArray() let max = 5000000let sumofprimes = sieveOfEratosthenes max |> Seq.map( fun x -> int64 x) |> Seq.sum printfn "%An" sumofprimes

Notice that this latest iteration executes a lot less code and, frankly, looks pretty close to the C# code. Through some investigation, I found that the call to preload the array consumed 2.6 s. I then switched to a sequence and shrank the execution time to 2300 ms. I was closing in on my goal and learning some stuff about the performance of various F# constructs. Loading using a sequence changed the AddRange call to this:

`retval.AddRange([|0 .. x|])`

The reason this takes less time is simple. When allocating a List, F# allocates and initializes all values up front. The Seq is created with an iterator and only allocates enough space to store the iterator and not much else. So far, so good. At this point, I did a search on F# implementations of the Sieve of Eratosthenes and found that Chris Smith had one. His implementation keeps track of which numbers AREN’T prime. Performance on his algorithm hits 8600 ms. That wound up being a step in the wrong direction.

Next, I tried, explicitly initializing the list of integers and cut the time down to 1450 ms. The strategy was getting me closer to optimal timing and showed that, for 5 million int32s, I was filling in the list in 46 ms.

let sieveOfEratosthenes x = let retval = new List<int32>(x + 1) for i in 0 .. x do retval.Add(i) retval.Item(0) <- 0 retval.Item(1) <- 0 for i in 2 .. x do if retval.Item(i) <> 0 then for j in 2 * i .. i .. x do retval.Item(j) <- 0 retval |> Seq.filter(fun x -> x <> 0)

This was better, but I’d like to get closer to the C# version. The last thing I tried was moving to Array from System.Collections.Generic.List. This last change cut 250 ms, bringing my times down to 1200 ms. It looks like using lighter weight objects really will speed things up! That version is:

let sieveOfEratosthenes x = let retval = Array.zeroCreate(x + 1) for i in 0 .. x do retval.[i] <- i retval.[0] <- 0 retval.[1] <- 0 for i in 2 .. x do if (i % 2 <> 0) && retval.[i] <> 0 then for j in 2 * i .. i .. x do retval.[j] <- 0 retval |> Seq.filter(fun x -> x <> 0) let max = 5000000let sumofprimes = sieveOfEratosthenes max |> Seq.map( fun x -> int64 x) |> Seq.sum

### Greatest Prime Factor in F#

Posted by Scott Seely in Uncategorized on December 17, 2009

Today I was reading an article in The Onion, Conquerors You May Have Missed, and noticed that the number for the ant looked like it might be a big old prime, or at least have a large prime divisor. (For reference, the ant is # 43,168,974,563,247.) There are a number of algorithms for finding this answer, but my favorite is a little brute force algorithm that keeps dividing the big number by some other value until the two numbers are equal. Yes, I know I can short circuit a bunch of testing via a square root function, but BigInteger doesn’t have a sqrt function (yet.).

Anyhow, I cobbled this together and it worked right out of the gate:

let rec largestFactor x y = if (x = y) then y else let z = x % y if z = 0I then largestFactor (x/y) y else largestFactor x (y + 1I) let LargestPrime x = largestFactor x 2I let a = LargestPrime 43168974563247I printf "%A" a

Answer: 84,826,177

So, the number isn’t prime, but the largest prime factor is pretty big! And, writing up the code to figure out the answer itself was pretty simple.