[Editor: A short answer was provided by UBC cosmologist Douglas Scott, as follows.
The simplest explanation is this: if the source is exactly behind a circular lens, then you get an Einstein ring. But if the gravitational lens is elliptical (often a good approximation for a galaxy in projection), then there are four equivalent ways for the light rays to bend around the lens.]
Dr. Unruh answers: For cases with matter (not black holes) forming the lens, there is always an odd number of images. The lensing is a continuous function from the object sphere to the image sphere, which may be multivalued. That is, you can consider the object sphere as a crumpled, folded version of the image. But this means that each fold can only introduce two extra images at a time. So in theory the total number of images (including the original) should always be odd. However, this does not take into account either absorption or amplification. The path of light for one of the images can go through a region dense with matter, which absorbs the light. Furthermore, the lensing can increase the intensity of an image, and thus some of the images can be much brighter than others.
If the images are very near the Einstein ring (for a spherical mass distribution, this is the ring seen when the source is directly behind the lensing matter and all the light deflected by the lens converges back on the observer), they will be very bright. Thus I believe that with Einstein's Cross the four images are very near the "Einstein ring" and thus very bright. That there are four rather than two is because of the lack of spherical symmetry of the lens, meaning you do not get a ring, but rather four bright images. The fifth (central) image is due to light going directly through the lensing galaxy, is not amplified, and is also probably absorbed.
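The brightness argument can be made concrete with a small calculation. This is a sketch under assumptions, not the actual lens model of Einstein's Cross: it takes a circularly symmetric cored lens with deflection alpha(x) = theta_e^2 * x / (x^2 + core^2) (a hypothetical choice, as are the parameter values) and uses the standard result that for a circularly symmetric lens the magnification of an image at radius x is 1 / |(1 - alpha/x)(1 - d alpha/dx)|.

```python
# Assumed toy lens (not from the answer above):
#   alpha(x) = theta_e^2 * x / (x^2 + core^2)
# For a circularly symmetric lens the unsigned magnification is
#   mu = 1 / |(1 - alpha/x) * (1 - d(alpha)/dx)|.

def magnification(x, theta_e=1.0, core=0.2):
    """Unsigned magnification of an image at angular radius x."""
    denom = x**2 + core**2
    alpha_over_x = theta_e**2 / denom                     # tangential stretching term
    dalpha_dx = theta_e**2 * (core**2 - x**2) / denom**2  # radial stretching term
    det = (1.0 - alpha_over_x) * (1.0 - dalpha_dx)
    return 1.0 / abs(det)


if __name__ == "__main__":
    # In this model the Einstein ring sits at sqrt(theta_e^2 - core^2), about 0.98.
    for x in (0.0, 1.0, 3.0):
        print(f"image radius {x}: magnification {magnification(x):.4g}")
```

In this toy model an image just outside the ring (radius 1.0) is magnified more than tenfold, an image far from the ring (radius 3.0) is barely magnified, and an image through the centre of the lens (radius 0.0) is demagnified by a factor of several hundred. That is consistent with four bright images near the ring and a faint central one, before any absorption is considered.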
The mass in a spiral galaxy is not in the arms but in the dark matter, which forms a roughly spherical (or elliptical) distribution around the galaxy. That is, the visible matter (the stuff that shines) is a bad tracer of the mass, which is what causes the lensing. Now if the dark matter halo is roughly elliptical, you can easily get four images, instead of something like the "Einstein ring". Also, the impact parameter of the light forming the images is quite a bit outside the luminous part of the galaxy.
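How an elliptical lens turns a ring into four images can be sketched with a simple model. The code below assumes a cored elliptical lensing potential psi(x, y) = theta_e * sqrt(core^2 + (1 - eps) x^2 + (1 + eps) y^2) with the source exactly behind the lens centre; the potential, the parameter values and the function names are all illustrative assumptions, not a fit to the real galaxy. For a nonzero ellipticity eps, setting the image position equal to the gradient of psi has solutions only at the centre and on the two symmetry axes, which gives a central image plus two pairs of images.

```python
import math

# Assumed toy model (not from the answers above): cored elliptical potential
#   psi(x, y) = theta_e * sqrt(core^2 + (1 - eps) x^2 + (1 + eps) y^2)
# with the source exactly behind the lens centre (beta = 0).

def deflection(x, y, theta_e=1.0, core=0.1, eps=0.2):
    """Gradient of psi, i.e. the deflection angle at image position (x, y)."""
    r = math.sqrt(core**2 + (1 - eps) * x**2 + (1 + eps) * y**2)
    return theta_e * (1 - eps) * x / r, theta_e * (1 + eps) * y / r


def images_for_centred_source(theta_e=1.0, core=0.1, eps=0.2):
    """Image positions for beta = 0 and eps != 0, from (x, y) = deflection(x, y)."""
    images = [(0.0, 0.0)]  # central image straight through the lens
    # On the x axis the condition reduces to x^2 = theta_e^2 * q - core^2 / q
    # with q = 1 - eps; on the y axis the same holds with q = 1 + eps.
    x_sq = theta_e**2 * (1 - eps) - core**2 / (1 - eps)
    y_sq = theta_e**2 * (1 + eps) - core**2 / (1 + eps)
    if x_sq > 0:
        images += [(math.sqrt(x_sq), 0.0), (-math.sqrt(x_sq), 0.0)]
    if y_sq > 0:
        images += [(0.0, math.sqrt(y_sq)), (0.0, -math.sqrt(y_sq))]
    return images


if __name__ == "__main__":
    for x, y in images_for_centred_source():
        ax, ay = deflection(x, y)
        # Lens equation with beta = 0: the residual should vanish at each image.
        print(f"image at ({x:+.4f}, {y:+.4f}), residual ({x - ax:.1e}, {y - ay:.1e})")
```

With the assumed values this yields five positions: one at the centre and four arranged in a cross. In the limit eps = 0 the two axis conditions coincide and every direction satisfies the lens equation, which is the full Einstein ring rather than four separate images.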
[Editor: There's an animation on Wikipedia that graphically depicts the Einstein ring lensing effect.
As Jay Reynolds Freeman said on the amateur astronomy website Adventures in Deep Space: "Yet what a thing to see, even if only at the limit of visibility. The quasar whose image forms the cross is some eight billion light years away. The photons I saw left their source long before our solar system was formed, probably long before most of its heavy atoms were even synthesized. When they originated, everything we now see about us was for the most part primordial star-stuff, hydrogen atoms formed in the Big Bang, awaiting nucleosynthesis in suns now long dead, followed by redispersal into the void as planetary nebulae or supernova remnants. One day a few of them, now mere animate debris left over from the condensation of a younger and smaller star, would look back into the abyss, seeking a far-off, distorted glimpse of what the cosmos was like when they were young, and wondering how it would all turn out when they were old. Such is Einstein's Cross." From http://www.astronomy-mall.com/Adventures.In.Deep.Space/crossobsrpt.htm]