July 16, 2011

Prime numbers from 0 to n

Programming challenges, like the ones found on Project Euler, provide a opportunity to learn algorithms and practice optimizing code. Problem 10 requires that you "Find the sum of all the primes below two million." I decided this was the perfect problem to tackle on a train ride home. The summation portion of the problem is trivial. The fun part was finding all of the primes in a particular range of numbers.

A prime number is a number which is only dividable by one and itself without producing a remainder. That sounds simple enough. I wrote a small program which accepts an integer as a command line argument and prints out all of the prime numbers between 0 and that number.

Case 1

#include <iostream>

include <stdlib .h>

bool isPrime(int num) { bool prime = true; if (num < 2) return false; for (int i = 2; i < num; ++i) { if (num % i == 0) prime = false; } return prime; }

int main(int argc, char *argv[]) { if (argv[1] == NULL) return 0; const int TOP = atoi(argv[1]); std::cout << "Prime Numbers from 0 - " << TOP << ": "; for (int j = 0; j <= TOP; ++j) { if (isPrime(j)) std::cout << j << " "; } std::cout << std::endl; return 0; }

Case 1 was the slowest way that I could dream up of reaching a solution. How could I optimize it? When isPrime() is called, it divides by every single number before the function returns with either a boolean true or false. Once a number has been proven to be non-prime there is no need to continue calculating whether it is prime or not. I replaced the following code:

Case 1

bool isPrime(int num) {
  bool prime = true;
  if (num < 2)
    return false;
  for (int i = 2; i < num; ++i) {
    if (num % i == 0)
      prime = false;
  }
  return prime;
}

with:

Case 2

bool isPrime(int num) {
  if (num < 2)
    return false;
  for (int i = 2; i < num; ++i) {
    if (num % i == 0)
      return false;
  }
  return true;
}

The function is quite a bit faster now. The next optimization was a no-brainer. A number can't be divided by more than half of itself and not produce a remainder. I decided to divide the number of divisors in half by replacing:

Case 2

for (int i = 2; i < num; ++i) {

with:

Case 3

for (int i = 2; i < (num/2)+1; ++i) {

This produced yet another incremental speedup. I sat on the train and had a fuzzy theory pop up in my head. Can the upper limit of a divisor need to be greater than the square root of a number? It was worth a try:

Case 3

bool isPrime(int num) {
  if (num < 2)
    return false;
  for (int i = 2; i < (num/2)+1; ++i) {
    if (num % i == 0)
      return false;
  }
  return true;
}

Replaced with:

Case 4

#include <math .h>

bool isPrime(int num) { if (num < 2) return false; double maxDivisor = pow((double)num+1, 0.5); for (int i = 2; i <= (int)maxDivisor; ++i) { if (num % i == 0) return false; } return true; }

The speed increase that case 4 provided was substantial. By this point I was able to solve the problem in a trivial amount of time. This was when I decided that it would be fun to see how others have solved this problem. I searched around and discovered this guy names Eratosthenes (maybe you've heard of him). After marveling at the fact that Eratosthenes was able to solve this problem efficiently more than 2000 years ago I dug into learning about the Sieve of Eratosthenes.

Essentially, Eratosthenes was able to efficiently calculate a series of prime numbers by using previous primes in order to help calculate future primes by eliminate non-primes. I then remembered learning about it in high school but it was long gone by the time I reached this problem.

Case 5

#include <iostream>

include <stdlib .h>

int main(int argc, char *argv[]) { if (argv[1] == NULL) return 0; const int TOP = atoi(argv[1]); std::cout < < "Prime Numbers from 0 - " << TOP << ": ";

bool* sieve = new bool[TOP+1]; std::fill_n(sieve, TOP+1, 1); sieve[0] = 0; sieve[1] = 0; for (int i = 2; i <= TOP; ++i) { if (sieve[i] == 1) { std::cout << i << " "; for (int j = i; j <= TOP; j+=i) { sieve[j] = 0; } } } delete[] sieve; return 0; }

With larger inputs, this was the most optimal way to compute a range of primes that I wrote up. It uses more memory than the previous methods. There are further optimized algorithms which do things like remove memory for even numbers(which are never prime except for 2). I would have dug deeper and attempted the Sieve of Atkin but that was out of the scope of my train ride.

Performance

Performance