Stacking for Image Quality
What is Image Quality?
Image quality can be a hard commodity to pin down: we often recognise it when presented with a photograph, and we are familiar with the misfeatures that detract from it, yet a precise definition remains elusive.
However, I'm going to propose a working definition: image quality is a balance in which the pixels of an image (individually and collectively) derive directly from rays of light travelling from subject to sensor; the output resembles what the eye might see; and the image is sharp (where the photographer wants it to be sharp) at its normal viewing distance and size.
Any camera can respond only to a finite dynamic range in the scene, be it slide film with about 5 stops' range, a digital SLR with 8-10 stops, or black-and-white negative film with a potential 14 stops. By way of contrast, the typical human eye takes in about the same 5 stops in any one "exposure" but varies the pupil diameter over a total range spanning about 22 stops. The consequence of trying to fit too large a contrast ratio into a sensor that can only represent a smaller range is that the shadows submerge to black and/or the highlights blow to solid white. These are failures insofar as the eye does not see that clipping.
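To put numbers on those ranges: each stop doubles the light, so n stops of dynamic range correspond to a contrast ratio of 2^n : 1. A minimal Python sketch, using the media and stop-counts quoted above:

```python
# Each stop doubles the light, so n stops span a 2**n : 1 contrast ratio.
def contrast_ratio(stops):
    return 2 ** stops

for medium, stops in [("slide film", 5), ("digital SLR", 10), ("B&W negative", 14)]:
    print(f"{medium}: {stops} stops = {contrast_ratio(stops)}:1 contrast")
```

So the gap between slide film and black-and-white negative is not 9 "units" but a factor of 512 in light-handling capacity.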
One of the major detracting features of digital photography is sensor noise. On the one hand, as the ISO rating increases, fewer photons are required to effect the same change in a photosite's reading; on the other hand, the camera body and the sensor itself contribute thermal noise, occurring at random locations but at a mostly constant rate. Effectively, the error might be ±10 in 100,000 at base ISO but ±10 in 1,000 several stops up the scale; that increasing proportion of error is why higher-ISO images are noisier.
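That arithmetic can be sketched as follows; the photon counts are purely illustrative, assuming each stop of ISO halves the light gathered while the thermal-noise contribution stays roughly constant:

```python
base_photons = 100_000  # photons at base ISO (illustrative figure)
noise = 10              # roughly constant thermal-noise contribution (illustrative)

for stops in range(0, 7):
    photons = base_photons // (2 ** stops)  # each stop of ISO halves the light gathered
    print(f"+{stops} stops: error fraction = {noise / photons:.5f}")
```

The error fraction doubles with every stop, which is why the shadows - where the signal is weakest - suffer first.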
What is Digital Imaging?
Digital Imaging refers to a number of post-processing techniques for optimising one's image, either to overcome equipment limitations or to permit preferable shooting settings in the first place, normally starting from multiple source images and merging them, one way or another, into an output photograph.
The techniques presented here are neither tricks nor "effects", but tools in a photographer's toolbox, to be understood and deployed on their various merits as the situation determines. Certainly they are not remedial tricks for post-hoc repair; the decision to use any one of them is made before pressing the shutter-button, which is why I present each as a pair of sections: how to shoot the frames at source, then how to process them in Photoshop.
High Dynamic Range (HDR)
HDR has been a hot topic for a few years, so let's set one thing straight: HDR is, by definition, the process of using multiple images of varying exposure, spanning a dynamic range greater than the sensor's own, to produce an image conveying detail across that expanded range. It is emphatically not "a look" - that resides solely in the photographer's choices for various parameters; nor is it possible to do "HDR" from a single image.
When out in the field in high-contrast light beyond one's control, place the camera on a tripod, aim it at a scene with no movement, then vary the shutter-speed whilst holding the ISO, aperture and focal distance fixed. Although two images suffice, most cameras have an automatic bracketing mode that makes the process easier, allowing 3, 5 or 7 frames at ±1EV intervals.
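For a manual bracket, the shutter times are just the base exposure multiplied by powers of two. A sketch, assuming a hypothetical metered exposure of 1/60s and a 5-frame bracket at ±1EV steps:

```python
from fractions import Fraction

base = Fraction(1, 60)  # hypothetical metered exposure of 1/60s
# each +1EV doubles the shutter time; each -1EV halves it
bracket = [base * Fraction(2) ** ev for ev in (-2, -1, 0, 1, 2)]
print([str(t) for t in bracket])  # → ['1/240', '1/120', '1/60', '1/30', '1/15']
```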
In Adobe Photoshop (CS3 and above), you simply open the bracketed RAW images together and it will merge them, presenting you with a tonemapping wizard at the end of the process.
Stacking
Above, it was explained how sensor noise detracts from image quality, particularly in the shadows.
Stacking has its origins in astrophotography. The fundamental principle is that, by averaging photos of the same subject layered on top of each other, every doubling of the number of frames effectively halves the equivalent ISO (in terms of noise).
A side-effect is that the exposure-time is the sum of that of all the individual exposures. Hence, if you have a subject moving in the frame between source images, then its contribution to the final image in each location is much diluted.
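The noise behaviour is easy to verify by simulation. A minimal sketch, assuming Gaussian per-frame noise (all the figures are illustrative):

```python
import random

random.seed(0)
TRUE_VALUE = 100.0
SIGMA = 10.0  # per-frame noise level (illustrative)

def noisy_frame(n_pixels=10_000):
    # one frame: the true pixel value plus random sensor noise
    return [TRUE_VALUE + random.gauss(0, SIGMA) for _ in range(n_pixels)]

def stack(frames):
    # pixel-by-pixel arithmetic mean, as layer-stacking produces
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

def residual_std(pixels):
    # how far the stacked result still strays from the true value
    return (sum((p - TRUE_VALUE) ** 2 for p in pixels) / len(pixels)) ** 0.5

for n in (1, 2, 4, 8):
    stacked = stack([noisy_frame() for _ in range(n)])
    print(f"{n} frame(s): residual noise ~ {residual_std(stacked):.2f}")
```

Averaging n frames divides the random noise by √n, so four frames halve it - matching the claim that each doubling of the frame count buys one stop of effective ISO.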
A further corollary is that, if one wishes to synthesize the effect of a very long exposure beyond the capability of one's filters (for example, making a daylight scene take 20 seconds), one can take many identical photos over a longer time interval and average them together.
Stacking requires that the image not move, so a tripod is a prerequisite. It is easiest performed with an intervalometer, the same as used for timelapse photography: set it to make many photos a fixed interval apart, all at the same exposure.
Simply open all the files as one layer each, then set each layer's opacity to 100/N%, where N is its position counting up from the bottom - background = 100%, layer 1 = 50%, layer 2 = 33%, layer 3 = 25%, etc - then flatten. This gives the arithmetic average (mean), all images equally weighted.
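It is worth convincing yourself that this opacity recipe really does produce an equal-weight mean. A sketch tracking a single pixel value through the stack (the layer values are arbitrary examples):

```python
def composite(layers):
    # bottom layer at 100% opacity; layer k (1-based from the bottom)
    # composited at opacity 1/k on top of the running result
    result = layers[0]
    for k, layer in enumerate(layers[1:], start=2):
        alpha = 1 / k
        result = result * (1 - alpha) + layer * alpha
    return result

values = [10.0, 20.0, 30.0, 40.0]  # one pixel as seen in four layers
print(composite(values))           # ≈ 25.0, the plain arithmetic mean
print(sum(values) / len(values))   # 25.0
```

Each successive layer at opacity 1/k dilutes the running result by exactly the right amount, so every source frame ends up contributing equally.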
Focus-Stacking
Typically a lens is at its optimum sharpness within a stop either side of a sweetspot in its aperture range - for example, a semi-wide prime designed for 35mm or APS-C formats might be sharpest at f/8, a zoom at f/11. At wider apertures, the depth of field narrows and residual design aberrations show; at narrower apertures, diffraction starts to detract as well.
However, some subjects - most notably extreme closeup / macro work - require lots of depth of field, either beyond the lens's ability altogether or too far away from the optimum aperture.
Focus-stacking requires the camera to be mounted on a tripod; one makes multiple images at the same exposure settings, simply varying the focus distance from near to far. (The more frames, the smoother the results might be.)
Simply load all the frames into layers, leaving the opacities alone. Using the auto-blend-layers menu option, select `Stack Images'; Photoshop will mask each layer to select only what it considers the "best" (sharpest) pixels.
Super-Resolution
Sometimes, the desired level of fine detail is not attainable in one frame; for example, distant detail (e.g. trees on a far hillside) might be obscured by lens diffraction. Super-resolution is the process of extracting more accurate pixels, whether keeping the image at the sensor's native size or creating a larger image in order to demonstrate the detail.
Some methods endeavour to enhance fine detail by enlarging a single frame. Here, however, we take multiple source photos instead and rely on sub-pixel realignment to make a larger, sharper image than any one of the sources.
The distinctive requirement is small spatial offsets between the images whilst keeping the exposure the same - think of making a panorama with 99% overlap between frames. Perhaps ironically, this is easiest achieved without a tripod: the theory is that a hand-holdable shutter speed (e.g. 1/100s on a normal focal-length lens) will deliver a sharp image, but in the time between frames shot in burst-mode (say 1/5s, depending on camera), your hands will have moved by a small random amount - probably equivalent to a non-integer number of pixels after conversion. Simply shooting an image twice in burst mode gives enough data to make a significantly larger image; four or eight images may be plenty.
First, open Adobe Camera Raw and check the `workflow' parameters below the large image preview. Typically we use ProPhotoRGB mode, 300ppi, no sharpening, and the image-size is the native size of the sensor. Now increase that image-size to the largest it will allow and press `done' to leave ACR.
By doing this, we have made a larger canvas from the outset, so that when the various images provide accurate data for different pixels, there is space for them to fit in between the pixels calculated by interpolation - i.e., once multiplied up, sub-pixel alignment differences become pixel-sized alignment differences.
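The arithmetic of that "multiplying up" is trivial but worth stating. A sketch with an assumed half-pixel camera shake and a 2x upscale in ACR:

```python
native_offset_px = 0.5  # assumed sub-pixel misalignment between two hand-held frames
upscale = 2             # image-size roughly doubled in the raw converter (assumption)
print(native_offset_px * upscale)  # → 1.0: a whole-pixel offset auto-align can use
```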
Select all the layers and use the menu option to auto-align them. Then, as with stacking above, set the layer opacities to 100/N% where N is the layer number - background 100%, first layer 50%, next layer 33%, etc - and flatten them.
Looking at the above techniques, we observe that some address issues in the "dimension" of dynamic range (HDR and stacking) while others concern themselves with spatial dimensions - within the width and height of an image (super-resolution), or depth into a scene (focus-stacking). So the question arises: is it possible to combine aspects of all the above into a workflow for general photography?
Enfuse
Enfuse is an open-source (free) application for merging photographs, with interesting options for specifying bias towards various criteria:
- with no options, just a list of input files, the images will be averaged together - ideal for stacking
- when an exposure weight is specified, enfuse biases itself toward pixels closest to a midtone - frequently called "poor man's HDR", although as HDR algorithms go, it has the advantages of being simple and honest, not introducing error-values into adjacent pixels to compensate for local contrast
- when a contrast weight is specified, enfuse biases toward pixels from an image with the highest local contrast - because sharpness is contrast at small scales, this is effectively selecting the sharpest pixels - ideal for focus-stacking
- there is also an option of weighting toward pixels of higher entropy; this is similar to contrast-based weighting, except it seems less prone to exacerbating image noise.
Additionally, enfuse normally chooses pixels from all images but weights toward those that best fulfil the specified criteria; with the `hard mask' option enabled, however, it selects each pixel from only the one winning image. This is particularly useful with focus-stacking, which tends to come out soft otherwise.
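The difference between the default soft mask and the hard mask can be sketched for a single pixel position; the weights below are illustrative stand-ins for whatever criterion score (contrast, entropy, etc.) enfuse computes:

```python
# one pixel position as seen in three frames: (value, criterion weight) pairs
pixels = [(120, 0.1), (180, 0.7), (150, 0.2)]

# default (soft mask): a weighted average over all frames
soft = sum(v * w for v, w in pixels) / sum(w for _, w in pixels)

# hard mask: the single highest-weighted frame wins outright
hard = max(pixels, key=lambda vw: vw[1])[0]

print(soft)  # a blend of all three values
print(hard)  # 180 - only the winning frame contributes
```

With the soft mask, every frame leaks into the result, which averages away noise but also averages away sharpness; the hard mask keeps the winning frame's pixel intact, which is why it suits focus-stacking.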
When out in the field, working with landscape and closeup subjects, a reasonable starting point is simply to take every photo twice, using burst mode. As long as each image is less than 50% noise, when combined, either two pixels will overlap and reduce noise, or they will not quite overlap and can be used to increase resolution.
A refinement is to take as many frames as required to reduce the effective ISO to a desirable level - for example, at ISO 800, four frames would get the noise down to a more manageable ISO 200. There is no requirement for the number of source frames to be a power of two - it is just that the effective ISO reduction is easier to calculate when it is.
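The effective-ISO arithmetic is simple division, on the assumption stated above that averaging n frames is equivalent to one frame at ISO/n:

```python
def effective_iso(iso, n_frames):
    # assumption from the text: n averaged frames ~ one frame at iso / n
    return iso / n_frames

print(effective_iso(800, 4))  # → 200.0
print(effective_iso(800, 3))  # a non-power-of-two count works too, just less tidily
```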
If you process the results in Photoshop, follow the recipe for super-resolution, above: use a pre-upsized workflow in ACR and auto-align before you average the images.
The open-source utility nona (shipping as part of Hugin) will automatically align images, producing both an HDR image and a set of corresponding realigned image files.
Take the latter and run them through enfuse:

enfuse --exposure-weight=1 --entropy-weight=0.85 --contrast-weight=0 --saturation-weight=0
For most images, this will choose the sharpest pixels, giving lots of fine detail (super-resolution); additionally, the bias toward exposure means the same commandline will handle HDR sources and, even if the inputs have the same exposure, will choose lower highlights and brighter shadow values - effectively gently rolling-off the response curve in a manner similar to film.
For those with Linux or Unix-like environments, the following script allows easy automation, leaving you with both an HDR image direct from nona, and a set of realigned TIFF images:
#!/bin/zsh
base="align_${1:r}"
echo "Aligning"
align_image_stack -a "temp_${base}_" -o "$base-hdr" $*
echo "enfuse - biased"
enfuse -o "fused_$base.tiff" "temp_${base}_"*.tif \
    --saturation-weight=0 --entropy-weight=0.85 \
    --contrast-weight=0 --exposure-weight=1
exiftool -overwrite_original_in_place -tagsfromfile $1 "fused_$base.tiff"
echo "cleanups"
rm "temp_${base}_"*.tif
Techniques that rely on realigning multiple images share one flaw: pixels around the edges are not defined in all sources. If you use Photoshop to average a layer-stack, this becomes particularly obvious where wedge-shaped areas turn partly transparent. Enfuse, by contrast, does the best it can with however many images have defined pixels at each location; there may be areas of lower quality around the edges, but the chances are you won't notice where they fall.
In either case, losing a border of a few pixels is a small price to pay for having been able to upscale the vast bulk of the image.
A Worked Example
I walked through the woodlands of Glenashdale on Arran, at the end of which I was confronted with this quite impressive waterfall (of which this is only the top third). Naturally, I made two exposures in quick succession using burst-mode. The camera's default settings were f/8, 1/60s and ISO 400.
In this example, both images were converted from RAW to TIFF using Adobe Camera RAW with no sharpening. These two TIFF files were then aligned with nona and stacked with the command
enfuse \
    --saturation-weight=0.2 --entropy-weight=0.9 \
    --contrast-weight=0.05 --exposure-weight=1
As expected, the enfused version shows much less shadow noise (equivalent to ISO 200) and the flowing water has bulked-out (equivalent to 1/30 of a second):
Additionally, the entropy bias has helped retain sharpness in areas of fine detail - compare the fronds on the fern in this pair: