For Fun

MTSkull
Centurion · Joined Mar 25, 2003 · Messages: 151 · Location: Boulder, Colorado
I like to try programming challenges to keep my skills up and my mind sharp. One I tried recently was coding a prime number generator in C#. It works well; timing it with a basic Environment.TickCount timer, it generates primes to 1,000 in about 16 milliseconds, to 10,000 in 125 milliseconds, and to 100,000 in 10,047 milliseconds (on my slow machine). Since the algorithm uses two nested for loops, the execution time grows roughly quadratically with the input, if I remember my algorithm theory right. So here is what I came up with as the fastest method I could devise. Any tips to make it faster? I tried it up to 1,000,000 and broke execution since I got tired of waiting.

C#:
        static void Main(string[] args)
        {
            //Console Shell to mess around in

            Program pMain = new Program();

            //Basic: Count File lines in a given text file at the root of c
            //pMain.File_LineCount(args);

            //Basic: Permute a string (not working correctly)
            //pMain.Permutations(pMain, args);

            //Int: Dec to Bin(string)
            //pMain.DecToBinary();

            //Int: Primes
            pMain.Primes(pMain, args);

        }

        #region Primes
        public void Primes(Program pMain, string[] args)
        {
            string UserInput = "";
            Int64 Num = 0;
            string Result = "";
            int Start = 0;
            int Stop = 0;
            

            while (UserInput.ToUpper() != "Q")
            {
                Console.WriteLine();
                Console.WriteLine("Input a number to find Primes or Q to quit: ");
                UserInput = Console.ReadLine();

                if (UserInput.ToUpper() != "Q")
                {
                    Result = "";    //reset so results do not accumulate across runs
                    Start = Environment.TickCount;
                    Num = Convert.ToInt64(UserInput);
                    for (Int64 x = 2; x <= Num; x++)
                    {
                        if (TestBrute(x) == false)
                        {
                            Result = Result + "," + x.ToString();
                        }
                    }//end for
                    Stop = Environment.TickCount;
                }//end user input if
                Console.WriteLine(Result);
                Console.WriteLine(Convert.ToString(Stop-Start));
            }//end while
        }
        public bool TestBrute(Int64 Num)
        {
            //returns true if Num is composite, false if it is prime
            Int64 Half = Num / 2;

            if (Num < 2)
                return true;

            if (Num == 2 || Num == 3)
                return false; 
            for (Int64 x = 2; x <= Half; x++)
            {
                if (Num % x == 0)
                    return true;
            }
            return false;
        }

        #endregion
 
Dude...that's awesome. :cool: I used to mess with this kind of stuff all the time.

Here are some different ideas I've come up with over the years that you may want to try out.

1. Your for loops should be iterating with i += 2 instead of i++. Think about it: other than 2, you'll never get an even prime number. Go ahead and skip all the even numbers.

2. Rather than going up to half of the target number, you only need to go up to the square root of the target number. You know right off the bat that if the floor of the square root equals the square root, the number is not prime (because it has an integer square root as a factor). But even if the square root is not an integer you still don't know yet; the square root is simply as high as you have to check. Here's a quick sketch of why: suppose x is not prime, so x = i * j for some integers i and j with 1 < i <= j. If both i and j were greater than sqrt(x), then i * j would be greater than x, a contradiction. So at least one factor i must satisfy i <= sqrt(x), and that is the one your loop will find. Try this out on x = 105 as a trivial example: 105 = 3 * 35, and 3 < sqrt(105) ≈ 10.25. It has to do with prime factorizations and the fact that at least one factor must be no greater than the square root of a number if that number is non-prime.

3. Which brings me to my last algorithm change, again related to prime factorization. Basically, if you were to save every prime number that you found as you go through the list, you can save amazing time by first iterating through that list and checking that a number does reduce to a prime factorization. If it doesn't, you can probably say that the number is a new prime, but I don't have a proof handy. You could even apply suggestion 2 above to suggestion 3 for an even greater reduction...

You might be able to make further optimizations in how the code is written, but I'm pretty sure the compiler does a good job of loop unrolling and in-lining functions for you. It might be something to check after you're satisfied with your algorithmic changes.
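Putting suggestions 1 through 3 together, a minimal sketch might look like this (the class and method names are just for illustration, not anyone's actual code):

```csharp
using System;
using System.Collections.Generic;

class PrimeSketch
{
    // Build all primes up to max by trial-dividing each odd candidate
    // only by the primes already found, stopping at its square root.
    public static List<int> PrimesUpTo(int max)
    {
        var primes = new List<int>();
        if (max >= 2) primes.Add(2);

        for (int n = 3; n <= max; n += 2)       // skip even numbers (suggestion 1)
        {
            bool isPrime = true;
            foreach (int p in primes)
            {
                if (p * p > n) break;           // stop at sqrt(n) (suggestion 2)
                if (n % p == 0)                 // divide by known primes only (suggestion 3)
                {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) primes.Add(n);
        }
        return primes;
    }
}
```

Comparing `p * p > n` avoids calling Math.Sqrt in the inner loop while testing the same bound.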
 
Those are some pretty good suggestions. I would never have come up with the square root one. But I'm going to elaborate on #1 and throw another idea out there.

When dealing with numbers higher than four, instead of incrementing by two each time, you could increment by four, then two, then four again, then two again, and so on. The reasoning is similar to mskeel's. You can count by twos because every other number, being a multiple of two, is not prime. If you apply the same reasoning to multiples of three, you will see that of every six sequential numbers, only two are potential candidates for prime numbers (the first and the fifth, if you start counting at 1). You could expand the idea even further to multiples of five, but at that point you need to do a lot of copy-and-paste coding or use a clever algorithm to determine how much to increment the counter by each time.
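The 2-4-2-4 stepping pattern described above can be generated without copy-and-paste by flipping the step between 2 and 4 each iteration (step = 6 - step). A rough sketch of enumerating candidates this way, with illustrative names (note that candidates are not necessarily prime, e.g. 25 is included):

```csharp
using System;
using System.Collections.Generic;

class WheelSketch
{
    // Candidates above 3 are the numbers of the form 6k - 1 and 6k + 1:
    // 5, 7, 11, 13, 17, 19, 23, 25, ... generated by alternating steps of 2 and 4.
    // Assumes max >= 3.
    public static List<long> Candidates(long max)
    {
        var result = new List<long> { 2, 3 };
        long step = 2;                              // 5 -> 7 is +2, 7 -> 11 is +4, ...
        for (long n = 5; n <= max; n += step, step = 6 - step)
        {
            result.Add(n);
        }
        return result;
    }
}
```

Each candidate still has to be tested for primality; the wheel just cuts the number of candidates to one third of all integers.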

My other suggestion would be to use Int32s instead of Int64s when dealing with numbers less than two billion. They perform much faster on a 32-bit computer.
 
Great suggestions.

I had tried looking for multiples of two, three, and five, but using the counter to skip these numbers is a killer idea. I am burning to play with these suggestions now; unfortunately, all of the free time I had been using for this just dried up. The cool thing about code is that even when I am screwing around with stuff like this, it still looks like I am working :) Give me a day or two and I will let you know how it goes.

Another thing I found interesting: I ran the executable on several different computers of varying processor speeds, motherboards, and memory configurations, and the elapsed time counter was not very different from machine to machine, from my slow crappy work computer to my killer number cruncher at home. Does this have to do with the .NET overhead requirements?

Thanks
Brian
 
.NET applications are JIT-compiled. Only the smallest portion of time is spent compiling code for a CPU-intensive and relatively simple operation like this; the code runs as good ol' x86 with very little interaction with the CLR from start to finish. One thing to take into consideration is that when your code is JIT-compiled, it is optimized for the computer on which it is running.

Also, certain bottlenecks that are often encountered in computing do not become an issue here. There is no disk, Internet, or special hardware I/O going on, and no graphics rendering. So your work computer's poopy graphics card, slow hard drive, and crappy network aren't going to slow you down one bit. In fact, with code this simple and so few variables you won't be dealing with many page faults, and there will be very few CPU branch mispredictions. In other words, all the things that speed up your nift-o-tastic gaming PC become irrelevant.
 
I spent this morning on the roof messing with antennas and when I came down I had 20 minutes to kill before lunch so I messed with some of the suggestions. I set both for loops up to skip even numbers. This worked great and as expected cut the processing time roughly in half.

Switching to Int32s gained another 20% speed boost as well.

Will try some of the more creative techniques soon.

Marble-Eater: That makes sense when you think about it. There is not a whole lot going on in the code.

Thanks
MT
 
Prime proof!

mskeel said:
3. Which brings me to my last algorithm change, again related to prime factorization. Basically, if you were to save every prime number that you found as you go through the list, you can save amazing time by first iterating through that list and checking that a number does reduce to a prime factorization. If it doesn't, you can probably say that the number is a new prime, but I don't have a proof handy. You could even apply suggestion 2 above to suggestion 3 for an even greater reduction...

Layman's proof:

Let's call the number you are testing T and the largest prime you have so far P (such that T > P). Now, assuming your algorithm has not missed any primes up to P:

Take any composite number Q0. Either Q0 is divisible by a prime outright, or it factors as Q0 = Q1 * Q2 with both factors smaller than Q0. Substitute Q1 and Q2 for Q0 and repeat. Since the factors shrink at every step, eventually every branch of the factorization bottoms out at a prime. So every composite number has at least one prime factor, and by the square-root argument earlier in the thread, that factor can be chosen to be at most sqrt(Q0). Therefore, to test whether T is prime, you need only try dividing it by the known primes up to sqrt(T), because every non-prime number is a product of prime factors alone.



Wikipedia said:
The fundamental theorem of arithmetic states that every positive integer larger than 1 can be written as a product of one or more primes in a unique way, i.e. unique except for the order. Primes are thus the "basic building blocks" of the natural numbers.

I believe mskeel's suggestion will grant you the greatest speed gains when working with larger primes.


I also read this on Wikipedia and thought it interesting:
Wikipedia said:
If p is a prime number greater than 6 then p mod 6 is either 1 or 5 and p mod 30 is 1, 7, 11, 13, 17, 19, 23 or 29

No proof is given there, but it is true (it is the same reasoning as the skip-multiples-of-two-and-three idea above), and it provides a very quick way of ruling out non-primes.
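As a sketch of how that fact could be used as a cheap pre-filter (not a full primality test, since composites like 25 also pass it; names are illustrative):

```csharp
using System;

class Mod6Filter
{
    // Quick rejection test: every prime p > 3 satisfies p % 6 == 1 or p % 6 == 5.
    // Passing the filter does NOT prove primality (25 % 6 == 1, but 25 = 5 * 5);
    // failing it proves compositeness for any number above 3.
    public static bool MightBePrime(long n)
    {
        if (n < 2) return false;
        if (n == 2 || n == 3) return true;
        long r = n % 6;
        return r == 1 || r == 5;
    }
}
```

A candidate that survives this filter would still go through trial division or a sieve; the filter just rejects two thirds of all numbers with a single modulus.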

Good luck :cool:
 
I threw this together, let me know how it compares...

Code:
DateTime start_time, end_time;
long num_primes, loop_max, test;
int table_idx;
bool is_prime;

ResetPrimes();
primesGrid.DataSource = null;
start_time = DateTime.Now;

try
{
	num_primes = Convert.ToInt64(txtNumPrimes.Text);
	for (int i=3; i<=num_primes; i+=2)
	{
		loop_max = (long)Math.Sqrt(i);
		table_idx = 0;
		test = Convert.ToInt64(prime_tbl.Rows[table_idx][0].ToString());
		is_prime = true;
		while (test <= loop_max)
		{
			if (i % test == 0)
			{
				// Not a prime
				is_prime = false;
				break;
			}
			test = Convert.ToInt64(prime_tbl.Rows[++table_idx][0].ToString());
		}

		if (is_prime)
		{
			prime_tbl.Rows.Add(new object[] { i } );
		}

	}
}
catch (Exception ex)
{
	MessageBox.Show("Exception!!\n" + ex.Message + "\nCancelling operation.", "Cancelling Operation", MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
	return;
}

end_time = DateTime.Now;
TimeSpan ts = end_time - start_time;
primesGrid.DataSource = prime_tbl;
MessageBox.Show(string.Format("Operation took: {0:00}:{1:00}:{2:00}:{3:0000} to complete.", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds), "Operation Finished", MessageBoxButtons.OK, MessageBoxIcon.Information);

I have a text box (txtNumPrimes) where you enter the upper limit for prime generation (despite the name, it is a limit, not a count of primes), and the results land in the prime_tbl DataTable (which is bound to primesGrid.DataSource for viewing the primes). More optimization can be done with the outer loop increment, but this is okay. It runs slower because of the Windows form and the DataTable, but I only had a half hour to play with this.
 
Wow

Score so far

My original algorithm: 13.2 minutes for primes to 1 million
Loops incrementing with "x += 2": 7.6 minutes for primes to 1 million
Int64s changed to Int32s: 6.1 minutes for primes to 1 million
Test max limited to Sqrt(numTested): 2.1 minutes for primes to 1 million

I checked the results to 1000 using an internet source http://primes.utm.edu/lists/small/1000.txt and assumed everything after that was okay.

Wow, thanks for the great hints. I do not see how it can get much faster.

MT
 
Use StringBuilder!

For primes to one million:

  1. MTSkull's original algorithm: 11m 53s
  2. Both loops using += 2: 6m 21s (53.5%)
  3. Int32s instead of Int64s: 5m 7s (43%)
  4. Testing up to Sqrt(num): 1m 5s (9%)
  5. StringBuilder instead of string: < 1s (< 1.5%)
  6. Testing against previous primes only
  7. Removing all string work, using a list only

The last algorithm, where primes are stored only in the list, not in a string, was able to compute primes up to one hundred million in less than four minutes! 5,761,455 primes were found! Randomly selecting ten numbers from the results showed that they were all primes.

Notice the huge gains from using a StringBuilder instead of a string. That highlights a significant bottleneck in the code.
 
I don't think that time spent constructing strings should be included in the measured processing time. That is a whole different task with a whole different optimization process. During calculation, data should be stored in something like an array of ints or a List<int>. If you include string construction in the loop, the results become distorted, and as you optimize it becomes difficult to tell how much each change helped which part (calculation or presentation) and where you could optimize further.

A big benefit to separating the two tasks is that the prime calculation code will be more centralized and cut down on things like page faults and branch mis-predictions. In other words, just separating calculation and presentation, without otherwise optimizing, could give you a small boost.
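A minimal sketch of that separation, with illustrative names: the calculation loop touches only a List<int>, and the output string is built once afterwards with a StringBuilder:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class SeparateConcerns
{
    // Calculation: store primes in a list; no string work inside the loop.
    public static List<int> FindPrimes(int max)
    {
        var primes = new List<int>();
        for (int n = 2; n <= max; n++)
        {
            bool isPrime = true;
            for (int d = 2; d * d <= n; d++)
            {
                if (n % d == 0) { isPrime = false; break; }
            }
            if (isPrime) primes.Add(n);
        }
        return primes;
    }

    // Presentation: build the comma-separated output once, after timing is done.
    public static string Format(List<int> primes)
    {
        var sb = new StringBuilder();
        foreach (int p in primes)
        {
            if (sb.Length > 0) sb.Append(',');
            sb.Append(p);
        }
        return sb.ToString();
    }
}
```

Timing only FindPrimes then measures the algorithm itself, while Format can be timed and optimized separately.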
 
Holy crap! I tried implementing it with a StringCollection and it knocked the time right down: 1 million in less than 900 milliseconds. Then I tried using an ArrayList of ints and shaved another 100 ms off the time.

That's why I post here. I have learned more about the finer workings of .NET on this little project than I would have thought possible. If you look at the original algorithm, I placed the start and stop time captures before and after the for loop, to capture that time only. So any gains are measured directly on the algorithm and how it stores its data. This is a great way to measure different techniques and see which is truly the best.

MT
 
Adapted from here. It computes the primes up to one million in under 200 milliseconds (up to 100 million in less than 1 minute) on my machine. This is the classic Sieve of Eratosthenes; the speedup comes from removing the modulus operator entirely. It uses a trick similar to the one marble_eater proposed earlier. This is pretty clever; I don't know that I would have come up with this one on my own.

C#:
        void findPrimes(int max)
        {
            int[] primes = new int[max + 1];

            //initialize the list of candidate primes
            for (int i = 0; i < max + 1; i++)
            {
                primes[i] = 1;
            }
            primes[0] = 0;
            primes[1] = 0;

            //mark the non-primes
            int currentFactor = 2;
            while (currentFactor * currentFactor <= max)
            {
                //mark all multiples of this factor as non-prime
                int mark = currentFactor + currentFactor;
                while (mark <= max)
                {
                    primes[mark] = 0;
                    mark += currentFactor;
                }

                // advance the current factor to the next prime
                currentFactor++;
                while (primes[currentFactor] == 0)
                {
                    currentFactor++;
                }
            }

            //Print out the list.
            for (int i = 0; i < primes.Length; i++)
            {
                if (primes[i] == 1)
                {
                    this.listBox1.Items.Add(i.ToString());
                }
            }
        }
 
Size vs speed

That is interesting, mskeel, although it has a major drawback: it requires an int array as large as the input domain. To find primes up to 1 billion, say, you'd need approximately 4 GB of memory! Using bytes would reduce the memory cost four-fold but decrease the speed. Using bits would reduce it thirty-two-fold (to 'just' 134 MB) but at a much greater loss of speed.

Yet another size vs speed tradeoff.
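For reference, a sketch of the bit-per-candidate variant using System.Collections.BitArray (class and method names are illustrative); it trades some speed for the roughly 32-fold memory reduction described above:

```csharp
using System;
using System.Collections;

class BitSieve
{
    // Sieve of Eratosthenes over a BitArray: one bit per candidate instead of
    // one 32-bit int, so primes up to 1 billion fit in about 134 MB.
    public static int CountPrimes(int max)
    {
        var isComposite = new BitArray(max + 1);    // all bits start false
        int count = 0;
        for (int n = 2; n <= max; n++)
        {
            if (isComposite[n]) continue;           // n is prime
            count++;
            // mark the multiples of n, starting at n*n (smaller ones already marked)
            for (long m = (long)n * n; m <= max; m += n)
            {
                isComposite[(int)m] = true;
            }
        }
        return count;
    }
}
```

Starting the marking loop at n*n (with a long to avoid overflow) is a small extra optimization over marking from 2n.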
 
Scalability is an issue. I'm pretty sure you could mitigate a lot of the reduction in speed of moving to a bit array with pointer arithmetic and bitwise operations...but that would be at the sacrifice of readability/maintainability. So many trade-offs to choose from. :(

If we're looking for big prime numbers I think this algorithm is out, but I just thought I'd toss it out there because it was different than anything that's come up so far and was interesting. You'll definitely run out of memory with this algorithm long before you hit the unsigned integer max.
 