If you do a test like that, be sure to run it many times under different circumstances, because many factors cause a program's memory usage to fluctuate. The garbage collector in the CLR won't collect the same objects at the same times on every run, and memory allocation won't be exactly the same when the state of the rest of the computer's memory isn't exactly the same. You'll also notice that the memory usage of a program with a single form drops dramatically (by an order of magnitude) when the form is minimized, because Windows trims the process's working set when its window is minimized. Now suppose you open Task Manager to view the memory usage of method A, then close method A and run method B, and this time unwittingly minimize its window to get to the already-open Task Manager. It would certainly seem as though method B consumes one eighth as much memory as method A, which is obviously not the case.
If the difference is pronounced in all circumstances, I would say the statistics speak for themselves, but in general things such as the order in which the programs are run, or even slightly different contexts in which a program is running, can make a difference. Keep those uncontrolled variables to a minimum.
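To make the repeat-and-vary advice concrete, here is a minimal Python sketch of a test harness. It assumes you can wrap each variant in a callable that returns one memory reading (for example, by launching the program and reading its memory usage; that measurement function is hypothetical and not shown). It runs each method many times, shuffles the run order every round so neither method always goes first, and reports mean, standard deviation, min, and max. The names run_trials and consistently_larger are made up for illustration, and the readings in the demo are simulated, not real data.

```python
import random
import statistics
from typing import Callable, Dict, List


def run_trials(methods: Dict[str, Callable[[], float]],
               trials: int = 20, seed: int = 0) -> Dict[str, Dict[str, float]]:
    """Run every method `trials` times, shuffling the run order each round,
    and summarize the readings per method."""
    rng = random.Random(seed)
    samples: Dict[str, List[float]] = {name: [] for name in methods}
    names = list(methods)
    for _ in range(trials):
        rng.shuffle(names)  # vary which method runs first in each round
        for name in names:
            samples[name].append(methods[name]())
    summary: Dict[str, Dict[str, float]] = {}
    for name, values in samples.items():
        summary[name] = {
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


def consistently_larger(summary: Dict[str, Dict[str, float]], a: str, b: str) -> bool:
    """True only if every reading of `a` exceeded every reading of `b`."""
    return summary[a]["min"] > summary[b]["max"]


if __name__ == "__main__":
    noise = random.Random(42)
    # Simulated stand-ins for real measurements (hypothetical numbers):
    # a real version would launch one variant of the app and read its memory.
    methods = {
        "A": lambda: 30.0 + noise.gauss(0, 2),
        "B": lambda: 25.0 + noise.gauss(0, 2),
    }
    result = run_trials(methods, trials=30)
    for name, stats in result.items():
        print(name, {k: round(v, 2) for k, v in stats.items()})
    print("A consistently larger than B:", consistently_larger(result, "A", "B"))
```

consistently_larger is a deliberately strict reading of "the difference is pronounced in all circumstances": it only returns True when the two sets of readings don't overlap at all. It doesn't replace keeping the test conditions (window state, other running programs) the same between runs.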
Besides all that, I can venture a guess about which of the two will consume more resources, but a guess is all I can offer. Using only button-buttons, each button will require more Windows resources: each one needs its own window, its own device context, yadda yadda yadda, whereas each toolbar-button requires more managed resources. Assuming that the Windows resources outweigh the managed resources, I would say that button-buttons use more memory than toolbar-buttons. Again, that's just a guess.