On Algorithm Problems: Goldbach and Euler

Problem Name: Goldbach and Euler
UVa ID: 10311
Keywords: number theory, primes, sieve

When you come across a problem titled after two well-known, brilliant mathematicians, you know it’s going to be good :).

What the problem is asking for can be expressed as follows: given an integer n (which can go as high as 10⁸), find —if possible— two prime numbers p₁ and p₂ such that:

\(p_1 + p_2 = n\)
\(p_2 > p_1\)
\(p_2 - p_1\) is minimised

It is clear that for this problem it is necessary to find out which numbers are primes in the range \((0 : 10^8)\). Let’s focus on that first.

As you probably know, one good way to find prime numbers computationally is using the classic Sieve of Eratosthenes (or a similar but possibly faster alternative, such as the Sieve of Atkin). No matter what kind of “sieve” you use, you have to implement it in a way that allows you to store information about 10⁸ numbers, which is not a trivial job.

Let us generalise the problem and say that you want to find all prime numbers up to an integer N. If you write a regular implementation using a boolean (bool in C++ or similar) data type for each integer in the range [1:N] then you would end up using N bytes of memory, which for this problem would mean about 95MB —just “a tad” heavy.

Fortunately, there are ways to reduce the memory requirements with relatively few changes that are easy to code. The main idea is that a boolean value can be stored in a single bit, so you can store 8 boolean values in a single byte. In addition, consider that most prime numbers are odd; in fact, there is only one even prime number which is 2. With this in mind, you can further cut your memory requirements in half by simply not storing information for even numbers (you have to handle the case of 2 separately, of course).

So, with these adjustments, we can now implement a prime-finding sieve that requires only \(N \div (8 * 2)\) bytes. If we store this information in 4-byte integers, then we require only \(N \div 64\) integers. The following is a C++ implementation of this type of sieve, using the standard Eratosthenes algorithm, which has served me well for some time:

const int MAX = 100000000;  // 10^8
const int LMT =     10000;  // sqrt(MAX)

// array of data; each bit tells if a certain number is composite or not
int _c[(MAX>>6)+1];

// we use a STL vector to store the primes
vector<int> primes;

// possibly the most important piece for a bitwise sieve
// these are the macros that check and set a bit
#define IsComp(n)  (_c[n>>6]&(1<<((n>>1)&31)))
#define SetComp(n) _c[n>>6]|=(1<<((n>>1)&31))

void prime_sieve() {
    for (int i = 3; i <= LMT; i += 2)
        if (!IsComp(i))
            for (int j = i*i; j <= MAX; j += i+i)
                SetComp(j);

    primes.push_back(2);
    for (int i=3; i <= MAX; i += 2)
        if (!IsComp(i))
            primes.push_back(i);
}

bool is_prime(int n) {
    if (n < 2 || n % 2 == 0) return false;
    return ! IsComp(n);
}

As you can see, this implementation reserves about \(10^8 \div 64\) integers for the main array, which represents about 6MB of memory, a good deal better than the original 95MB. However, it also uses memory to store the actual prime numbers in a vector. Since there are about 5.8 million primes under 10⁸, then this would require an extra \(4 \times 5.8 \times 10^6\) bytes (about 22MB).

Having solved the problem of calculating and storing the primes, now let’s move on to the problem of finding \(p_1\) and \(p_2\). The first idea that comes to mind is something like simply trying to find \(p_1\) starting with \(n \div 2\) and going downwards, checking that \(p_1\) and \(p_2 = n-p_1\) are primes.

However, it’s easy to see that this simple idea would take too long in practise. There’s one thing that we could do that would improve significantly the complexity of the algorithm, which is not iterating through all integers, not even through odd integers only, but only over the prime numbers. Since we already have a vector full of primes below 10⁸, we can binary search the location of \(n \div 2\) and go down from it. That would make for a worse case of \(5.8 \times 10^6\) iterations instead of \(10^8 \div 2\).

We have improved things but, can we do even better? The discussion found in the problem statement about Goldbach’s conjecture and Euler’s ideas on the subject is very interesting on itself, but it also points to something that could be useful: an even number can always be expressed as the sum of two primes (well, it’s a conjecture, but it has been tested for a range of numbers much larger than \((0 : 10^8)\), so we can assume it’s true for our purposes). This means that we can be reasonably confident that the process of finding \(p_1\) will succeed for most even numbers, hopefully quickly enough that it doesn’t have to check millions of primes. Note that I said most even numbers; this is because we need two distinct primes \(p_1\) and \(p_2\), remember? For example, 6 can be expressed as the sum of two primes (3 + 3) but not as the sum of two distinct primes.

Okay, we have concluded that for even numbers our algorithm may not be so bad, but what about odd numbers? Well, it turns out that for odd numbers the algorithm can be \(O(1)\)! If \(n\) is odd, then one of the two primes \(p_1\) and \(p_2\) has to be even, while the other has to be odd (if you don’t see this clearly, think about how even numbers are integers of the form \(2k\) and odd numbers are integers of the form \(2k + 1\)). Do you see where this is going? If one of the primes has to be even, then that prime would have to be 2 (there are no other even primes). So, if \(n\) is odd we simply check if \(n-2\) is prime, and if it is, then we know that \(p_1=2\) and \(p_2=n-2\), otherwise there is no solution.

We have covered all possible cases now, and after testing it, it produces an answer reasonably fast. Prime numbers are lovely :).

On Algorithm Problems

Pages

Friday, 3 August 2012

Goldbach and Euler

1 comment: