Table 31.2 shows the relative performance of the different synchronization primitives under conditions of no lock contention.
Note that the size and performance differences between a mutex and a recursive mutex are negligible, so you should choose a non-recursive mutex over a recursive one only if you have a very small critical region that is accessed repeatedly in a tight loop. Similarly, monitors perform about as well as mutexes, although they do incur a size penalty.
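The following is a minimal sketch of that trade-off, not code from this library. It assumes the standard C++ types `std::recursive_mutex` and `std::mutex` as stand-ins for the recursive and non-recursive mutexes discussed here, and the names `Counter`, `addTwice`, and `sumWithPlainMutex` are hypothetical. The recursive mutex lets one locking member function call another, while the non-recursive mutex is reserved for a tiny critical region entered repeatedly in a loop.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical counter guarded by a recursive mutex (assumption: the
// standard library type stands in for the recursive mutex in the text).
class Counter {
public:
    void add(int n) {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        value_ += n;
    }

    void addTwice(int n) {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        // add() locks mutex_ again on the same thread. That is legal for
        // a recursive mutex; with std::mutex it would be undefined behavior.
        add(n);
        add(n);
    }

    int value() const {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::recursive_mutex mutex_;
    int value_ = 0;
};

// A very small critical region entered repeatedly in a tight loop: the one
// case in which the text suggests a non-recursive mutex.
long sumWithPlainMutex(int iterations) {
    std::mutex m;
    long total = 0;
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(m);
        total += i;
    }
    return total;
}
```

For example, calling `addTwice(3)` on a fresh `Counter` leaves its value at 6, even though the call acquires the same mutex three times on one thread.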
Read-write locks incur a substantial cost in both size and speed (especially on Windows), so you should use a read-write lock only if you can truly exploit the additional parallelism afforded by multiple readers and if a substantial amount of work is performed inside the critical region.
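As an illustration of the reader/writer pattern, here is a sketch that assumes C++17 and uses `std::shared_mutex` as a stand-in for the read-write lock described above. The class `Registry` and its members are hypothetical. Readers take a shared lock and may run concurrently, while a writer takes an exclusive lock. The map lookup here only stands in for the substantial read-side work that would justify the extra cost in practice.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly registry (assumption: std::shared_mutex plays
// the role of the read-write lock discussed in the text).
class Registry {
public:
    // Writer: exclusive lock, blocks all readers and other writers.
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        entries_[key] = value;
    }

    // Reader: shared lock, so several readers can hold it at once.
    // Returns false if the key is absent and leaves 'out' untouched.
    bool lookup(const std::string& key, int& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> entries_;
};
```

If the read path is as short as this lookup, a plain mutex is likely the better choice, for the reasons given above.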