It sounds like everyone's using Mongrel clusters in production for Rails apps these days, with one Mongrel per CPU (see Alexey Kovyrin's benchmarks for a bunch of typical configurations), but this seems a little limiting to me. Each Mongrel only runs one Rails request at a time, so if your controllers aren't optimized to suck 100% CPU all the time, your CPUs will be underused. It basically means you can't do any web service calls while servicing a request, or you'll block other clients.
IMHO Apache gets it right - "preforking" a bunch of processes, which are then used to serve requests. If a lot of requests come in, it forks off a few more processes to handle them, then reaps them as the load dies down. This means it can handle many simultaneous connections, without reserving much more memory than necessary. [ Of course, this usually means that your system will die when you get lots of connections unless you've set MaxClients conservatively, but at least you'll have some spare RAM for the disk cache when load is light ;-) ]
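To make the idea concrete, here's a rough sketch of the core of a preforking server in plain Ruby. It's a fixed-size pool only: it doesn't grow or shrink with load the way Apache does, and the `prefork` helper and its arguments are names I made up for illustration, not anything from Apache, Mongrel or Rails.

```ruby
require "socket"

# Hypothetical helper: fork `count` workers that all accept on the same
# listening socket. Returns the worker pids so the parent can signal
# and reap them later.
def prefork(server, count, &handler)
  Array.new(count) do
    fork do
      loop do
        client = server.accept   # the kernel hands each connection to one waiting worker
        handler.call(client)     # a slow handler only ties up this one worker
        client.close
      end
    end
  end
end
```

You'd use it by opening a `TCPServer` in the parent, calling something like `prefork(server, 8) { |client| ... }`, and then sitting in `Process.waitall`. The socket has to be opened before forking so that every child inherits it.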
Hongli Lai has written a script to prefork a bunch of Rails processes, and has a good analysis of the other point of preforking: your server processes get to share memory, which can be fairly significant in the case of an interpreter running something with a decent-sized library.
However, he found out a couple of months later that Ruby's garbage collector ends up marking all pages as dirty, which breaks copy-on-write (each child process will end up copying everything eventually, whether modified or not), and that it wasn't particularly easy to fix that problem. He's since done a lot of hacking on Ruby's allocator/GC and it seems like things are moving along nicely. Looking forward to seeing the next post in the series.
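For anyone who hasn't played with fork, here's a small sketch of the semantics involved. The parent loads some data once, the child inherits it without reloading, and a write in the child stays private to the child. `LIBRARY` and `child_view` are made-up names, and this only shows what the program sees; it says nothing about how many physical pages actually stay shared, which is exactly the part the GC interferes with.

```ruby
# Stand-in for a big library loaded once in the parent, before forking.
LIBRARY = Array.new(100_000) { |i| "entry #{i}" }

# Hypothetical helper: fork a child, let it modify the inherited data,
# and report back what the child saw through a pipe.
def child_view
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    LIBRARY[0] = "changed in child"   # a write in the child; the parent's copy is untouched
    writer.puts LIBRARY.size, LIBRARY[0]
    writer.close
  end
  writer.close
  size, first = reader.read.split("\n")
  reader.close
  Process.wait(pid)
  [size.to_i, first]
end
```

Calling `child_view` gives back the child's view of the data, while `LIBRARY[0]` in the parent is still the original entry afterwards.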
Anyway, back to the original point -- where's the flexible preforking server for Rails? Is that what you end up with if you use FastCGI under Apache? Or does the (apparently quite large) memory footprint of a set of Mongrel/Rails processes make it prohibitively expensive to run more than four at a time?
Along those lines, TextDrive suggests that RAM usage of 30MB per FastCGI dispatcher is typical, and it can get up to 70MB+ if you use RMagick. MethodMissing.com (Lourens Naude) suggests more like 100-130MB.
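To put those numbers in perspective, here's some back-of-the-envelope arithmetic. The per-process figures are the ones quoted above; the 1GB box and the 256MB held back for the OS and disk cache are purely my own assumptions, and `workers_that_fit` is a made-up helper.

```ruby
# Hypothetical helper: how many processes of a given size fit in RAM,
# after holding back `reserve_mb` for everything else on the box.
def workers_that_fit(ram_mb, per_process_mb, reserve_mb: 256)
  [(ram_mb - reserve_mb) / per_process_mb, 0].max   # integer division rounds down
end

workers_that_fit(1024, 30)    # => 25  (typical FastCGI dispatcher)
workers_that_fit(1024, 70)    # => 10  (with RMagick)
workers_that_fit(1024, 130)   # => 5   (upper end of the 100-130MB estimate)
```

So on those assumptions the difference between the estimates is the difference between a comfortable pool and barely more than one process per CPU.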