I wanted to extract each pixel's values so that I could use them for locating simple objects in an image. Every image is made up of pixels, and when these values are extracted using Python, four values are obtained for each pixel (R, G, B, A). This is called the RGBA colour space, comprising the Red, Green, and Blue channels and an Alpha (transparency) value respectively.
In Python we can use a library called PIL (Python Imaging Library). The modules in this library are used for image processing and support many file formats like PNG, JPG, BMP, and GIF. It comes with a large number of functions that can be used to open images, extract data, change properties, create new images, and much more.
PIL comes pre-installed with Python 2.7 on Ubuntu, but on Windows it has to be installed manually. For either operating system, a build for Python 2.7 or later can be downloaded from here, and a build for Python 3 can be downloaded from here.
Each pixel value can be extracted and stored in a list. Though the IDLE shell can be used for this, extracting the values there can take a long time, so it is recommended to run the extraction from the command-line interface. The procedure for extraction is sketched below.
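A minimal sketch of the extraction, assuming Pillow (the maintained fork of PIL) and a hypothetical file named example.png:

```python
from PIL import Image  # Pillow, the maintained fork of PIL

# open the image and make sure it is in the RGBA colour space
image = Image.open("example.png").convert("RGBA")

# getdata() yields the pixels row by row as (R, G, B, A) tuples
pixels = list(image.getdata())
print(pixels[0])  # e.g. (255, 255, 255, 255) for an opaque white pixel
```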
One aim of what follows is to teach you to recognise morphometric problems (those dealing with the number, size, or shape of the objects in an image).
As computer systems have become faster and more powerful, and cameras and other imaging systems have become commonplace in many other areas of life, the need has grown for researchers to be able to process and analyse image data. Considering the large volumes of data that can be involved - high-resolution images that take up a lot of disk space/virtual memory, and/or collections of many images that must be processed together - and the time-consuming and error-prone nature of manual processing, it can be advantageous or even necessary for this processing and analysis to be automated as a computer program.
This lesson introduces an open source toolkit for processing image data: the Python programming language and the scikit-image (skimage) library. With careful experimental design, Python code can be a powerful instrument in answering many different kinds of questions.
Uses of Image Processing in Research
Automated processing can be used to analyse many different properties of an image, including the distribution and change in colours in the image, the number, size, position, orientation, and shape of objects in the image, and even - when combined with machine learning techniques for object recognition - the type of objects in the image.
Some examples of image processing methods applied in research include:
With this lesson, we aim to provide a thorough grounding in the fundamental concepts and skills of working with image data in Python. Most of the examples used in this lesson focus on one particular class of image processing technique, morphometrics, but what you will learn can be used to solve a much wider range of problems.
Morphometrics involves counting the number of objects in an image, analyzing the size of the objects, or analyzing the shape of the objects. For example, we might be interested in automatically counting the number of bacterial colonies growing in a Petri dish, as shown in this image:
We could use image processing to find the colonies, count them, and then highlight their locations on the original image, resulting in an image like this:
As we move through this workshop, we will learn image analysis methods useful for many different scientific problems. These will be linked together and applied to a real problem in the final end-of-workshop capstone challenge.
Let’s get started, by learning some basics about how images are represented and stored digitally.
The images we see on hard copy, view with our electronic devices, or process with our programs are represented and stored in the computer as numeric abstractions, approximations of what we see with our eyes in the real world. Before we begin to learn how to process images with Python programs, we need to spend some time understanding how these abstractions work.
It is important to realise that images are stored as rectangular arrays of hundreds, thousands, or millions of discrete “picture elements,” otherwise known as pixels. Each pixel can be thought of as a single square point of coloured light.
For example, consider this image of a maize seedling, with a square area designated by a red box:
Now, if we zoomed in close enough to see the pixels in the red box, we would see something like this:
Note that each square in the enlarged image area - each pixel - is all one colour, but that each pixel can have a different colour from its neighbors. Viewed from a distance, these pixels seem to blend together to form the image we see.
Working with Pixels
As noted, in practice, real world images will typically be made up of a vast number of pixels, and each of these pixels will be one of potentially millions of colours. While we will deal with pictures of such complexity shortly, let’s start our exploration with 15 pixels in a 5 X 3 matrix with 2 colours and work our way up to that complexity.
First, the necessary imports:
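A sketch of the imports the examples below rely on (the aliases follow common convention; numpy is included because it is used shortly):

```python
import imageio.v3 as iio          # reading and writing images
import matplotlib.pyplot as plt   # displaying images
import numpy as np                # manipulating pixel arrays
```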
Now that we have our libraries loaded, we will run a Jupyter Magic Command that will ensure our images display in our Jupyter document with pixel information that will help us more efficiently run commands later in the session.
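Assuming a Jupyter environment, the magic command in question might look like this (the exact backend is an assumption):

```python
%matplotlib widget   # interactive figures with pixel coordinates under the cursor
```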
With that taken care of, let's load our image data from disk using the imread function from the imageio.v3 module and display it using the imshow function from the matplotlib.pyplot module.

imageio is a Python library for reading and writing image data. imageio.v3 specifies that we want to use version 3 of imageio. This version has the benefit of supporting nD (multidimensional) image data natively (think of volumes, movies).
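A sketch of the load-and-display step (the data/eight.tif path and the eight variable name are assumptions):

```python
eight = iio.imread(uri="data/eight.tif")
plt.imshow(eight)
```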
You might be thinking, "That does look vaguely like an eight, and I see two colours, but how can that be only 15 pixels?" The display of the eight you see does use a lot more screen pixels to display our eight so large, but that does not mean there is information for all those screen pixels in the file. All those extra pixels are a consequence of our viewer creating additional pixels through interpolation. It could have displayed it as a tiny image using only 15 screen pixels if the viewer had been designed differently.
While many image file formats contain descriptive metadata that can be essential, the bulk of a picture file is just arrays of numeric information that, when interpreted according to a certain rule set, become recognizable as an image to us. Our image of an eight is no exception, and iio.imread stored that image data in an array of arrays, making a 5 x 3 matrix of 15 pixels. We can demonstrate that by calling on the shape property of our image variable and see the matrix by printing our image variable to the screen.
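Continuing the sketch above:

```python
print(eight.shape)  # (5, 3)
print(eight)        # the pixel values as a 5 x 3 array
```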
Thus if we have tools that will allow us to manipulate these arrays of numbers, we can manipulate the image. The NumPy library can be particularly useful here, so let's try that out using NumPy array slicing. Notice that the default behavior of the imshow function appended row and column numbers that will be helpful to us as we try to address individual or groups of pixels. First let's load another copy of our eight, and then make it look like a zero.
To make it look like a zero, we need to change the number underlying the centremost pixel to be 1. With the help of those row and column headers, at this small scale we can determine the centre pixel is in row labeled 2 and column labeled 1. Using array slicing, we can then address and assign a new value to that position.
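A sketch of that edit (paths and names as assumed above; the value 1.0 matches the two-colour 0/1 data):

```python
zero = iio.imread(uri="data/eight.tif")
zero[2, 1] = 1.0   # row 2, column 1: the centremost pixel
plt.imshow(zero)
```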
Up to now, we only had a 2 colour matrix, but we can have more if we use other numbers or fractions. One common way is to use the numbers between 0 and 255 to allow for 256 different colours or 256 different levels of grey. Let’s try that out.
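One way to produce such a matrix, as a sketch (the exact values are assumptions; the idea is simply to introduce a third value between 0 and 255):

```python
three_colours = iio.imread(uri="data/eight.tif")

# scale the 0/1 data so the bright pixels become 128 ...
three_colours = three_colours * 128

# ... and push one row all the way to 255, giving three distinct values
three_colours[2, :] = 255.0
plt.imshow(three_colours)
```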
We now have 3 colours, but are they the three colours you expected? They all appear to be on a continuum of dark purple on the low end and yellow on the high end. This is a consequence of the default colour map (cmap) in this library. You can think of a colour map as an association or mapping of numbers to a specific colour. However, the goal here is not to have one number for every possible colour, but rather to have a continuum of colours that demonstrate relative intensity. In our specific case here for example, 255 or the highest intensity is mapped to yellow, and 0 or the lowest intensity is mapped to a dark purple. The best colour map for your data will vary and there are many options built in, but this default selection was not arbitrary. A lot of science went into making this the default due to its robustness when it comes to how the human mind interprets relative colour values, grey-scale printability, and colour-blind friendliness (You can read more about this default colour map in a Matplotlib tutorial and an explanatory article by the authors). Thus it is a good place to start, and you should change it only with purpose and forethought. For now, let’s see how you can do that using an alternative map you have likely seen before where it will be even easier to see it as a mapped continuum of intensities: greyscale.
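Switching to the greyscale colour map is a one-argument change:

```python
plt.imshow(three_colours, cmap=plt.cm.gray)
```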
Above we have exactly the same underlying data matrix, but in greyscale. Zero maps to black, 255 maps to white, and 128 maps to medium grey. Here we only have a single channel in the data and use a greyscale colour map to represent the luminance, or intensity, of the data; correspondingly, this channel is referred to as the luminance channel.
Even More Colours
This is all well and good at this scale, but what happens when we instead have a picture of a natural landscape that contains millions of colours. Having a one to one mapping of number to colour like this would be inefficient and make adjustments and building tools to do so very difficult. Rather than larger numbers, the solution is to have more numbers in more dimensions. Storing the numbers in a multi-dimensional matrix where each colour or property like transparency is associated with its own dimension allows for individual contributions to a pixel to be adjusted independently. This ability to manipulate properties of groups of pixels separately will be key to certain techniques explored in later chapters of this lesson. To get started let’s see an example of how different dimensions of information combine to produce a set of pixels using a 4 X 4 matrix with 3 dimensions for the colours red, green, and blue. Rather than loading it from a file, we will generate this example using numpy.
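A sketch of generating such an image with numpy (the seed and value range are assumptions; seeding makes the "random" image reproducible):

```python
pseudorandomizer = np.random.default_rng(24)
checkerboard = pseudorandomizer.integers(0, 255, size=(4, 4, 3), dtype="uint8")
plt.imshow(checkerboard)
```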
Previously we had one number being mapped to one colour or intensity. Now we are combining the effect of 3 numbers to arrive at a single colour value. Let’s see an example of that using the blue square at the end of the second row, which has the index [1, 3].
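Indexing the array with [1, 3] gives the three channel values for that pixel:

```python
print(checkerboard[1, 3])  # array([  7,   1, 110]), per the text below
```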
This outputs array([  7,   1, 110]). The integers in order represent Red, Green, and Blue. Looking at the 3 values, and knowing how they map, can help us understand why it is blue. If we divide each value by 255, the maximum, we can determine how much it is contributing relative to its maximum potential. Effectively, the red is at 7/255, or about 2.7 percent of its potential; the green is at 1/255, or 0.4 percent; and the blue is at 110/255, or 43.1 percent of its potential. So when you mix those three intensities of colour, blue is winning by a wide margin, but the red and green still contribute to make it a slightly different shade of blue than (0, 0, 110) would be on its own.
These colours mapped to dimensions of the matrix may be referred to as channels. It may be helpful to display each of these channels independently, to help us understand what is happening. We can do that by multiplying our image array representation with a 1d matrix that has a one for the channel we want to keep and zeros for the rest.
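A sketch of isolating each channel by element-wise multiplication (broadcasting the 1d mask across every pixel):

```python
red_channel = checkerboard * [1, 0, 0]     # keep red, zero out green and blue
green_channel = checkerboard * [0, 1, 0]
blue_channel = checkerboard * [0, 0, 1]

for channel in (red_channel, green_channel, blue_channel):
    plt.figure()
    plt.imshow(channel)
```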
If we look at the upper [1, 3] square in all three figures, we can see each of those colour contributions in action. Notice that there are several squares in the blue figure that look even more intensely blue than square [1, 3]. When all three channels are combined though, the blue light of those squares is being diluted by the relative strength of red and green being mixed in with them.
24-bit RGB Colour
This last colour model we used, known as the RGB (Red, Green, Blue) model, is the most common.
As we saw, the RGB model is an additive colour model, which means that the primary colours are mixed together to form other colours. Most frequently, the amount of the primary colour added is represented as an integer in the closed range [0, 255], as seen in the example. Therefore, there are 256 discrete amounts of each primary colour that can be added to produce another colour. The number of discrete amounts of each colour, 256, corresponds to the number of bits used to hold the colour channel value, which is eight (2⁸ = 256). Since we have three channels with 8 bits for each (8 + 8 + 8 = 24), this is called 24-bit colour depth.
Any particular colour in the RGB model can be expressed by a triplet of integers in [0, 255], representing the red, green, and blue channels, respectively. A larger number in a channel means that more of that primary colour is present.
After completing the previous challenge, we can look at some further examples of 24-bit RGB colours, in a visual way. The image in the next challenge shows some colour names, their 24-bit RGB triplet values, and the colour itself.
Although 24-bit colour depth is common, there are other options. We might have 8-bit colour (3 bits for red and green, but only 2 for blue, providing 8 × 8 × 4 = 256 colours) or 16-bit colour (4 bits for red, green, and blue, plus 4 more for transparency, providing 16 × 16 × 16 = 4096 colours), for example. There are colour depths with more than eight bits per channel, but as the human eye can only discern approximately 10 million different colours, these are not often used.
If you are using an older or inexpensive laptop screen or LCD monitor to view images, it may only support 18-bit colour, capable of displaying 64 × 64 × 64 = 262,144 colours. 24-bit colour images will be converted in some manner to 18-bit, and thus the colour quality you see will not match what is actually in the image.
We can combine our coordinate system with the 24-bit RGB colour model to gain a conceptual understanding of the images we will be working with. An image is a rectangular array of pixels, each with its own coordinate. Each pixel in the image is a square point of coloured light, where the colour is specified by a 24-bit RGB triplet. Such an image is an example of raster graphics.
Although the images we will manipulate in our programs are conceptualised as rectangular arrays of RGB triplets, they are not necessarily created, stored, or transmitted in that format. There are several image formats we might encounter, and we should know the basics of at least a few of them. Some formats we might encounter, and their file extensions, are shown in this table:

Format                                    Extension
Device-Independent Bitmap (BMP)           .bmp
Joint Photographic Experts Group (JPEG)   .jpg or .jpeg
Tagged Image File Format (TIFF)           .tif or .tiff
The file format that comes closest to our preceding conceptualisation of images is the Device-Independent Bitmap, or BMP, file format. BMP files store raster graphics images as long sequences of binary-encoded numbers that specify the colour of each pixel in the image. Since computer files are one-dimensional structures, the pixel colours are stored one row at a time. That is, the first row of pixels (those with y-coordinate 0) are stored first, followed by the second row (those with y-coordinate 1), and so on. Depending on how it was created, a BMP image might have 8-bit, 16-bit, or 24-bit colour depth.
24-bit BMP images have a relatively simple file format, can be viewed and loaded across a wide variety of operating systems, and have high quality. However, BMP images are not compressed, resulting in very large file sizes for any useful image resolutions.
The idea of image compression is important to us for two reasons: first, compressed images have smaller file sizes, and are therefore easier to store and transmit; and second, compressed images may not have as much detail as their uncompressed counterparts, and so our programs may not be able to detect some important aspect if we are working with compressed images. Since compression is important to us, we should take a brief detour and discuss the concept.
Before discussing additional formats, familiarity with image compression will be helpful. Let’s delve into that subject with a challenge. For this challenge, you will need to know about bits / bytes and how those are used to express computer storage capacities. If you already know, you can skip to the challenge below.
Since image files can be very large, various compression schemes exist for saving (approximately) the same information while using less space. These compression techniques can be categorised as lossless or lossy.
In lossless image compression, we apply some algorithm (i.e., a computerised procedure) to the image, resulting in a file that is significantly smaller than the uncompressed BMP file equivalent would be. Then, when we wish to load and view or process the image, our program reads the compressed file, and reverses the compression process, resulting in an image that is identical to the original. Nothing is lost in the process – hence the term “lossless.”
The general idea of lossless compression is to somehow detect long patterns of bytes in a file that are repeated over and over, and then assign a smaller bit pattern to represent the longer sample. Then, the compressed file is made up of the smaller patterns, rather than the larger ones, thus reducing the number of bytes required to save the file. The compressed file also contains a table of the substituted patterns and the originals, so when the file is decompressed it can be made identical to the original before compression.
To provide you with a concrete example, consider the 71.5 MB white BMP image discussed above. When put through the zip compression utility on Microsoft Windows, the resulting .zip file is only 72 KB in size! That is, the .zip version of the image is three orders of magnitude smaller than the original, and it can be decompressed into a file that is byte-for-byte the same as the original. Since the original is so repetitious - simply the same colour triplet repeated 25,000,000 times - the compression algorithm can dramatically reduce the size of the file.
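The uncompressed size is simple arithmetic; a quick check, assuming the image is 5,000 × 5,000 pixels (consistent with the 25,000,000 triplets mentioned above):

```python
pixels = 5_000 * 5_000        # 25,000,000 pixels
size_bytes = pixels * 3       # 3 bytes per pixel at 24-bit depth
print(size_bytes)             # 75000000
print(size_bytes / 1024**2)   # about 71.5 (MiB)
```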
If you work with .zip or .gz archives, you are dealing with lossless compression.
Lossy compression takes the original image and discards some of the detail in it, resulting in a smaller file format. The goal is to only throw away detail that someone viewing the image would not notice. Many lossy compression schemes have adjustable levels of compression, so that the image creator can choose the amount of detail that is lost. The more detail that is sacrificed, the smaller the image files will be - but of course, the detail and richness of the image will be lower as well.
This is probably fine for images that are shown on Web pages or printed off on 4 × 6 photo paper, but may or may not be fine for scientific work. You will have to decide whether the loss of image quality and detail are important to your work, versus the space savings afforded by a lossy compression format.
It is important to understand that once an image is saved in a lossy compression format, the lost detail is just that - lost. I.e., unlike lossless formats, given an image saved in a lossy format, there is no way to reconstruct the original image in a byte-by-byte manner.
JPEG images are perhaps the most commonly encountered digital images today. JPEG uses lossy compression, and the degree of compression can be tuned to your liking. It supports 24-bit colour depth, and since the format is so widely used, JPEG images can be viewed and manipulated easily on all computing platforms.
Here is an example showing how JPEG compression might impact image quality. Consider this image of several maize seedlings (scaled down here from 11,339 × 11,336 pixels in order to fit the display).
Now, let us zoom in and look at a small section of the label in the original, first in the uncompressed format:
Here is the same area of the image, but in JPEG format. We used a fairly aggressive compression parameter to make the JPEG, in order to illustrate the problems you might encounter with the format.
The JPEG image is of clearly inferior quality. It has less colour variation and noticeable pixelation. Quality differences become even more marked when one examines the colour histograms for each image. A histogram shows how often each colour value appears in an image. The histograms for the uncompressed (left) and compressed (right) images are shown below:
We learn how to make histograms such as these later on in the workshop. The differences in the colour histograms are even more apparent than in the images themselves; clearly the colours in the JPEG image are different from the uncompressed version.
If the quality settings for your JPEG images are high (and the compression rate therefore relatively low), the images may be of sufficient quality for your work. It all depends on how much quality you need, and what restrictions you have on image storage space. Another consideration may be where the images are stored. For example, if your images are stored in the cloud and therefore must be downloaded to your system before you use them, you may wish to use a compressed image format to speed up file transfer time.
PNG images are well suited for storing diagrams. The format uses lossless compression and is hence often used in web applications for non-photographic images. It is able to store RGB and plain luminance (single channel, without an associated colour) data, among others. Image data is stored row-wise, and then, per row, a simple filter, like taking the difference of adjacent pixels, can be applied to increase the compressibility of the data. The filtered data is then compressed in the next step and written out to disk.
TIFF images are popular with publishers, graphics designers, and photographers. TIFF images can be uncompressed, or compressed using either lossless or lossy compression schemes, depending on the settings used, and so TIFF images seem to have the benefits of both the BMP and JPEG formats. The main disadvantage of TIFF images (other than the size of images in the uncompressed version of the format) is that they are not universally readable by image viewing and manipulation software.
JPEG and TIFF images support the inclusion of metadata in images. Metadata is textual information that is contained within an image file. Metadata holds information about the image itself, such as when the image was captured, where it was captured, what type of camera was used and with what settings, etc. We normally don't see this metadata when we view an image, but we can view it independently if we wish to. The important thing to be aware of at this stage is that you cannot rely on the metadata of an image being fully preserved when you use software to process that image. The image reader/writer library that we use throughout this lesson, imageio, includes metadata when saving new images but may fail to keep certain metadata fields. In any case, remember: if metadata is important to you, take precautions to always preserve the original files.
Summary of image formats used in this lesson
The following table summarises the characteristics of the BMP, JPEG, PNG, and TIFF image formats:

Format  Compression               Metadata  Advantages                                               Disadvantages
BMP     None                      None      Universally viewable, high quality                       Large file sizes
JPEG    Lossy                     Yes       Universally viewable, smaller file size                  Detail may be lost
PNG     Lossless                  Yes       Universally viewable, open standard, smaller file size   Metadata less flexible than TIFF, RGB only
TIFF    None, lossy, or lossless  Yes       High quality or smaller file size                        Not universally viewable
Working with skimage
We have covered much of how images are represented in computer software. In this episode we will learn some more methods for accessing and changing digital images.
Reading, displaying, and saving images
Imageio provides intuitive functions for reading and writing (saving) images. All of the popular image formats, such as BMP, PNG, JPEG, and TIFF are supported, along with several more esoteric formats. Check the Supported Formats docs for a list of all formats. Matplotlib provides a large collection of plotting utilities.
Let us examine a simple Python program to load, display, and save an image to a different format. Here are the first few lines:
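A sketch of those first lines (the chair.jpg name comes from the text; the data/ directory is an assumption):

```python
import imageio.v3 as iio

chair = iio.imread(uri="data/chair.jpg")
```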
First, we import the v3 module of imageio (imageio.v3) so we can read and write images. Then, we use the iio.imread function to read a JPEG image entitled chair.jpg. imageio reads the image, converts it from JPEG into a NumPy array, and returns the array; we save the array in a variable named chair.
Next, we will do something with the image:
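A sketch of the display step:

```python
import matplotlib.pyplot as plt

plt.figure()        # start a fresh figure
plt.imshow(chair)   # draw the image into it
```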
Once we have the image in the program, we first call plt.figure() so that we will have a fresh figure with a set of axes independent from our previous calls. Next we call plt.imshow() in order to display the image.
Now, we will save the image in another format:
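A sketch of the save step:

```python
iio.imwrite(uri="data/chair.tif", image=chair)
```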
The final statement in the program, iio.imwrite(uri="data/chair.tif", image=chair), writes the image to a file named chair.tif in the data directory. The imwrite function automatically determines the type of the file, based on the file extension we provide. In this case, the .tif extension causes the image to be saved as a TIFF.
In the Image Basics episode, we individually manipulated the colours of pixels by changing the numbers stored in the image’s NumPy array. Let’s apply the principles learned there along with some new principles to a real world example.
Suppose we are interested in this maize root cluster image. We want to be able to focus our program’s attention on the roots themselves, while ignoring the black background.
Since the image is stored as an array of numbers, we can simply look through the array for pixel colour values that are less than some threshold value. This process is called thresholding, and we will see more powerful methods to perform the thresholding task in the Thresholding episode. Here, though, we will look at a simple and elegant NumPy method for thresholding. Let us develop a program that keeps only the pixel colour values in an image that have value greater than or equal to 128. This will keep the pixels that are brighter than half of “full brightness”, i.e., pixels that do not belong to the black background. We will start by reading the image and displaying it.
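A sketch of reading and displaying the image (the data/maize-root-cluster.jpg path is an assumption):

```python
maize_roots = iio.imread(uri="data/maize-root-cluster.jpg")
plt.imshow(maize_roots)
```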
Now we can threshold the image and display the result.
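The whole threshold-and-display step is two lines:

```python
maize_roots[maize_roots < 128] = 0   # zero out every channel value below 128
plt.imshow(maize_roots)
```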
The NumPy command to ignore all low-intensity pixels is maize_roots[maize_roots < 128] = 0. Every pixel colour value in the whole 3-dimensional array with a value less than 128 is set to zero. In this case, the result is an image in which the extraneous background detail has been removed.
Converting colour images to grayscale
It is often easier to work with grayscale images, which have a single channel, instead of colour images, which have three channels. skimage offers the function skimage.color.rgb2gray() to achieve this. This function adds up the three colour channels in a way that matches human colour perception. It returns a grayscale image with floating-point values in the range from 0 to 1. We can use the function skimage.util.img_as_ubyte() in order to convert it back to the original data type and the data range back to 0 to 255. Note that it is often better to use image values represented by floating-point values, because using floating point numbers is numerically more stable. We can also load colour images as grayscale directly by passing the argument mode="L" to iio.imread().
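A sketch of both routes to a grayscale image:

```python
import skimage.color

# route 1: convert a colour image after loading
chair = iio.imread(uri="data/chair.jpg")
gray_chair = skimage.color.rgb2gray(chair)   # float values in [0, 1]

# route 2: let the reader convert (mode="L" is forwarded to the pillow backend)
gray_chair_direct = iio.imread(uri="data/chair.jpg", mode="L")
```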
Access via slicing
As noted in the previous lesson, skimage images are stored as NumPy arrays, so we can use array slicing to select rectangular areas of an image. Then, we can save the selection as a new image, change the pixels in the image, and so on. It is important to remember that coordinates are specified in (ry, cx) order and that colour values are specified in (r, g, b) order when doing these manipulations.
Consider this image of a whiteboard, and suppose that we want to create a sub-image with just the portion that says “odd + even = odd,” along with the red box that is drawn around the words.
Using the same display technique we have used throughout this course, we can determine the coordinates of the corners of the area we wish to extract by hovering the mouse near the points of interest and noting the coordinates. If we do that, we might settle on a rectangular area with an upper-left coordinate of (135, 60) and a lower-right coordinate of (480, 150), as shown in this version of the whiteboard picture:
Note that the coordinates in the preceding image are specified in (cx, ry) order. Now if our entire whiteboard image is stored as a skimage image named board, we can create a new image of the selected region with a statement like this:
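The slice, built from the corner coordinates above (the board name is an assumption):

```python
clipped_board = board[60:151, 135:481, :]
```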
Our array slicing specifies the range of y-coordinates or rows first, 60:151, and then the range of x-coordinates or columns, 135:481. Note we go one beyond the maximum value in each dimension, so that the entire desired area is selected. The third part of the slice, :, indicates that we want all three colour channels in our new image.
A script to create the subimage would start by loading the image:
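A sketch of the loading step (the data/board.jpg path is an assumption):

```python
board = iio.imread(uri="data/board.jpg")
```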
Then we use array slicing to create a new image with our selected area and then display the new image.
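Then the slice-and-display step:

```python
clipped_board = board[60:151, 135:481, :]
plt.figure()
plt.imshow(clipped_board)
```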
We can also change the values in an image, as shown next.
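A sketch (the sampled location and the replaced region are taken from the explanation that follows):

```python
color = board[330, 90]            # sample the colour at (ry=330, cx=90)
board[330:350, 90:300] = color    # paint a rectangle with that colour
plt.imshow(board)
```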
First, we sample a single pixel's colour at a particular location of the image, saving it in a variable named color; this creates a small NumPy array holding the red, green, and blue colour values for the pixel located at (ry = 330, cx = 90). Then, with a command like board[330:350, 90:300] = color, we modify the image in the specified area. From a NumPy perspective, this changes all the pixel values within that range to the array saved in the color variable. In this case, the command "erases" that area of the whiteboard, replacing the words with a beige colour, as shown in the final image produced by the program:
Drawing and Bitwise Operations
The next series of episodes covers a basic toolkit of skimage operators. With these tools, we will be able to create programs to perform simple analyses of images based on changes in colour or shape.
Drawing on images
Often we wish to select only a portion of an image to analyze, and ignore the rest. Creating a rectangular sub-image with slicing, as we did in the Image Representation in skimage episode is one option for simple cases. Another option is to create another special image, of the same size as the original, with white pixels indicating the region to save and black pixels everywhere else. Such an image is called a mask. In preparing a mask, we sometimes need to be able to draw a shape - a circle or a rectangle, say - on a black image. skimage provides tools to do that.
Consider this image of maize seedlings:
Now, suppose we want to analyze only the area of the image containing the roots themselves; we do not care to look at the kernels, or anything else about the plants. Further, we wish to exclude the frame of the container holding the seedlings as well. Hovering over the image with our mouse could tell us that the upper-left coordinate of the sub-area we are interested in is (44, 357), while the lower-right coordinate is (720, 740). These coordinates are shown in (x, y) order.
A Python program to create a mask to select only that area of the image would start with a now-familiar section of code to open and display the original image:
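The opening section, as a sketch (the data/maize-seedlings.tif path is an assumption):

```python
import imageio.v3 as iio
import matplotlib.pyplot as plt
import numpy as np
import skimage.draw

maize_seedlings = iio.imread(uri="data/maize-seedlings.tif")
plt.imshow(maize_seedlings)
```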
As before, we first import iio, the v3 module of imageio. We also import the NumPy library, which we need to create the initial mask image. Then, we import the draw submodule of skimage. We load and display the initial image in the same way we have done before.
NumPy allows indexing of images/arrays with "boolean" arrays of the same size. Indexing with a boolean array is also called mask indexing. The "pixels" in such a mask array can only take two values: True or False. When indexing an image with such a mask, only pixel values at positions where the mask is True are accessed. But first, we need to generate a mask array of the same size as the image. Luckily, the NumPy library provides a function to create just such an array. The next section of code shows how:
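The mask creation, as a sketch:

```python
mask = np.ones(shape=maize_seedlings.shape[0:2], dtype="bool")
```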
The first argument to the np.ones function is the shape of the original image, so that our mask will be exactly the same size as the original. Notice that we have only used the first two indices of our shape; we omitted the channel dimension. Indexing with such a mask will change all channel values simultaneously. The second argument, dtype="bool", indicates that the elements in the array should be booleans - i.e., values are either True or False. Thus, even though we use np.ones to create the mask, its pixel values are in fact not 1.0, but rather True. You could check this, e.g., by print(mask[0, 0]).
Next, we draw a filled rectangle on the mask:
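A sketch, using the rectangle corners given earlier in (ry, cx) order:

```python
rr, cc = skimage.draw.rectangle(start=(357, 44), end=(740, 720))
mask[rr, cc] = False   # the region we want to keep becomes False
plt.imshow(mask, cmap="gray")
```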
Here is what our constructed mask looks like:
The parameters of the skimage.draw.rectangle() function are the coordinates of the upper-left (start) and lower-right (end) corners of a rectangle in (ry, cx) order. The function returns the rectangle as row (rr) and column (cc) coordinate arrays.
All that remains is the task of modifying the image using our mask in such a way that the areas with True pixels in the mask are not shown in the image any more.
Now we can write a Python program to use a mask to retain only the portions of our maize roots image that actually contains the seedling roots. We load the original image and create the mask in the same way as before:
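Repeating the familiar steps as a sketch:

```python
maize_seedlings = iio.imread(uri="data/maize-seedlings.tif")

mask = np.ones(shape=maize_seedlings.shape[0:2], dtype="bool")
rr, cc = skimage.draw.rectangle(start=(357, 44), end=(740, 720))
mask[rr, cc] = False
```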
Then, we use NumPy mask indexing to remove the portions of the image where the mask is True, and display the masked image:
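In code:

```python
maize_seedlings[mask] = 0   # blank out everything outside the rectangle
plt.imshow(maize_seedlings)
```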
The resulting masked image should look like this:
In this episode, we will learn how to use skimage functions to create and display histograms for images.
Introduction to Histograms
As it pertains to images, a histogram is a graphical representation showing how frequently various colour values occur in the image. We saw in the Image Basics episode that we could use a histogram to visualise the differences in uncompressed and compressed image formats. If your project involves detecting colour changes between images, histograms will prove to be very useful, and histograms are also quite handy as a preparatory step before performing thresholding.
We will start with grayscale images, and then move on to colour images. We will use this image of a plant seedling as an example:
Here we load the image in grayscale instead of full colour, and display it:
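A sketch of the grayscale load (the data/plant-seedling.jpg path is an assumption):

```python
plant_seedling = iio.imread(uri="data/plant-seedling.jpg", mode="L")
plt.imshow(plant_seedling, cmap="gray")
```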
Again, we use the iio.imread function to load our image. The first argument to iio.imread is the filename of the image. The second argument, mode="L", defines the type and depth of a pixel in the image (e.g., an 8-bit pixel has a range of 0-255). This argument is forwarded to the pillow backend, for which mode "L" means 8-bit pixels and single-channel (i.e., grayscale). pillow is a Python imaging library; which backend is used by iio.imread may be specified (to use pillow, you would pass this argument: plugin="pillow"); if unspecified, iio.imread determines the backend to use based on the image type.
Then, we convert the grayscale image of integer dtype, with 0-255 range, into a floating-point one with 0-1 range, by calling the function skimage.util.img_as_float. We will keep working with images in the value range 0 to 1 in this lesson.
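In code:

```python
import skimage.util

plant_seedling = skimage.util.img_as_float(plant_seedling)
```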
We now use the function np.histogram to compute the histogram of our image which, after all, is a NumPy array:
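The call, as a sketch:

```python
histogram, bin_edges = np.histogram(plant_seedling, bins=256, range=(0, 1))
```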
The parameter bins determines the number of "bins" to use for the histogram. We pass in 256 because we want to see the pixel count for each of the 256 possible values in the grayscale image. The parameter range is the range of values each of the pixels in the image can have. Here, we pass 0 and 1, which is the value range of our input image after transforming it to grayscale.
The first output of the np.histogram function is a one-dimensional NumPy array, with 256 rows and one column, representing the number of pixels with the intensity value corresponding to the index. I.e., the first number in the array is the number of pixels found with intensity value 0, and the final number in the array is the number of pixels found with intensity value 255. The second output of np.histogram is an array with the bin edges, with one column and 257 rows (one more than the histogram itself). There are no gaps between the bins, which means that the end of the first bin is the start of the second, and so on. For the last bin, the array also has to contain the stop, so it has one more element than the histogram.
Next, we turn our attention to displaying the histogram, by taking advantage of the plotting facilities of the matplotlib.pyplot module. We create the plot with plt.figure(), then label the figure and the coordinate axes with the plt.title(), plt.xlabel(), and plt.ylabel() functions. The last step in the preparation of the figure is to set the limits on the values on the x-axis with the plt.xlim() function call.
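A sketch of the figure preparation (the label strings are assumptions):

```python
plt.figure()
plt.title("Grayscale Histogram")
plt.xlabel("grayscale value")
plt.ylabel("pixel count")
plt.xlim([0.0, 1.0])
```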
Finally, we create the histogram plot itself with plt.plot(bin_edges[0:-1], histogram). We use the left bin edges as x-positions for the histogram values by indexing the bin_edges array to ignore the last value (the right edge of the last bin). When we run the program on this image of a plant seedling, it produces this histogram:
We can also create histograms for full colour images, in addition to grayscale histograms. We have seen colour histograms before, in the Image Basics episode. A program to create colour histograms starts in a familiar way:
We read the original image, now in full colour, and display it.
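Reading and displaying the colour version, as a sketch:

```python
plant_seedling = iio.imread(uri="data/plant-seedling.jpg")
plt.imshow(plant_seedling)
```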
Next, we create the histogram, by calling the np.histogram function three times, once for each of the channels. We obtain the individual channels by slicing the image along the last axis. For example, we can obtain the red colour channel by calling r_chan = plant_seedling[:, :, 0].
We will draw the histogram line for each channel in a different colour, and so we create a tuple of the colours to use for the three lines with the colors = ("red", "green", "blue") line of code. Then, we limit the range of the x-axis with the plt.xlim() function call.
Next, we use the for control structure to iterate through the three channels, plotting an appropriately-coloured histogram line for each. This may be new Python syntax for you, so we will take a moment to discuss what is happening in the for statement. The Python built-in enumerate function takes a list and returns an iterator of tuples, where the first element of the tuple is the index and the second element is the element of the list.
In our colour histogram program, we are using a tuple, (channel_id, color), as the for variable. The first time through the loop, the channel_id variable takes the value 0, referring to the position of the red colour channel, and the color variable contains the string "red". The second time through the loop the values are the green channel's index, 1, and the colour "green"; the third time they are the blue channel's index, 2, and the colour "blue".
Inside the for loop, our code looks much like it did for the grayscale example. We calculate the histogram for the current channel with the np.histogram function call, and then add a histogram line of the correct colour to the plot with the plt.plot function call. Note the use of our loop variables, channel_id and color.
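Putting the loop together, as a sketch (axis labels omitted for brevity; the 0-256 range assumes an 8-bit colour image):

```python
colors = ("red", "green", "blue")

plt.figure()
plt.xlim([0, 256])
for channel_id, color in enumerate(colors):
    histogram, bin_edges = np.histogram(
        plant_seedling[:, :, channel_id], bins=256, range=(0, 256)
    )
    plt.plot(bin_edges[0:-1], histogram, color=color)
```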
Finally we label our axes and display the histogram, shown here:
In this episode, we will learn how to use skimage functions to blur images.
When processing an image, we are often interested in identifying objects represented within it so that we can perform some further analysis of these objects e.g. by counting them, measuring their sizes, etc. An important concept associated with the identification of objects in an image is that of edges: the lines that represent a transition from one group of similar pixels in the image to another different group. One example of an edge is the pixels that represent the boundaries of an object in an image, where the background of the image ends and the object begins.
When we blur an image, we make the colour transition from one side of an edge in the image to another smooth rather than sudden. The effect is to average out rapid changes in pixel intensity. A blur is a very common operation we need to perform before other tasks such as thresholding. There are several different blurring functions in the skimage.filters module, so we will focus on just one here, the Gaussian blur.
Consider this image of a cat, in particular the area of the image outlined by the white square.
Now, zoom in on the area of the cat’s eye, as shown in the left-hand image below. When we apply a filter, we consider each pixel in the image, one at a time. In this example, the pixel we are currently working on is highlighted in red, as shown in the right-hand image.
When we apply a filter, we consider rectangular groups of pixels surrounding each pixel in the image, in turn. The kernel is another group of pixels (a separate matrix / small image), of the same dimensions as the rectangular group of pixels in the image, that moves along with the pixel being worked on by the filter. The width and height of the kernel must be an odd number, so that the pixel being worked on is always in its centre. In the example shown above, the kernel is square, with a dimension of seven pixels.
To apply the kernel to the current pixel, an average of the colour values of the pixels surrounding it is calculated, weighted by the values in the kernel. In a Gaussian blur, the pixels nearest the centre of the kernel are given more weight than those far away from the centre. The rate at which this weight diminishes is determined by a Gaussian function, hence the name Gaussian blur.
A Gaussian function maps random variables into a normal distribution or "bell curve".
The shape of the function is described by a mean value μ, and a variance value σ². The mean determines the central point of the bell curve on the x axis, and the variance describes the spread of the curve.
In fact, when using Gaussian functions in Gaussian blurring, we use a 2D Gaussian function to account for X and Y dimensions, but the same rules apply. The mean μ is always 0, and represents the middle of the 2D kernel. Increasing values of σ² in either dimension increases the amount of blurring in that dimension.
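In symbols (a standard form of the 2D Gaussian, not taken verbatim from the lesson), with the mean at the kernel centre:

```latex
G(x, y) = \frac{1}{2\pi\sigma_x \sigma_y}
          \exp\left( -\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2} \right)
```

With σx = σy this reduces to the isotropic case used when a single sigma is given.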
The averaging is done on a channel-by-channel basis, and the average channel values become the new value for the pixel in the filtered image. Larger kernels have more values factored into the average, and this implies that a larger kernel will blur the image more than a smaller kernel.
To get an idea of how this works, consider this plot of the two-dimensional Gaussian function:
Imagine that plot laid over the kernel for the Gaussian blur filter. The height of the plot corresponds to the weight given to the underlying pixel in the kernel. I.e., the pixels close to the centre become more important to the filtered pixel colour than the pixels close to the outer limits of the kernel. The shape of the Gaussian function is controlled via its standard deviation, or sigma. A large sigma value results in a flatter shape, while a smaller sigma value results in a more pronounced peak. The mathematics involved in the Gaussian blur filter are not quite that simple, but this explanation gives you the basic idea.
To illustrate the blur process, consider the blue channel colour values from the seven-by-seven region of the cat image above:
The filter is going to determine the new blue channel value for the centre pixel – the one that currently has the value 86. The filter calculates a weighted average of all the blue channel values in the kernel giving higher weight to the pixels near the centre of the kernel.
This weighted average, the sum of the multiplications, becomes the new value for the centre pixel (3, 3). The same process would be used to determine the green and red channel values, and then the kernel would be moved over to apply the filter to the next pixel in the image.
This animation shows how the blur kernel moves along in the original image in order to calculate the colour channel values for the blurred image.
skimage has built-in functions to perform blurring for us, so we do not have to perform all of these mathematical operations ourselves. Let’s work through an example of blurring an image with the skimage Gaussian blur function.
First, we load the image, and display it:
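A sketch of the load-and-display step (the data/gaussian-original.png path is an assumption):

```python
image = iio.imread(uri="data/gaussian-original.png")
plt.imshow(image)
```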
Next, we apply the gaussian blur:
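A sketch of the blur (channel_axis=-1 is the modern scikit-image spelling for "the last axis holds the colour channels"; older releases used multichannel=True):

```python
import skimage.filters

sigma = 3.0
blurred = skimage.filters.gaussian(
    image,
    sigma=(sigma, sigma),   # sigma in the ry- and cx-directions
    truncate=3.5,           # kernel radius in units of sigma
    channel_axis=-1,        # treat the image as multichannel colour
)
```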
The first two parameters to skimage.filters.gaussian are the image to blur, image, and a tuple defining the sigma to use in the ry- and cx-directions, (sigma, sigma). The third parameter, truncate, gives the radius of the kernel in terms of sigmas. A Gaussian function is defined from -infinity to +infinity, but our kernel (which must have a finite, smaller size) can only approximate the real function. Therefore, we must choose a certain distance from the centre of the function where we stop this approximation, and set the final size of our kernel. In the above example, we set truncate to 3.5, which means the kernel size will be 2 * sigma * 3.5. For example, for a sigma of 1.0 the resulting kernel size would be 7, while for a sigma of 2.0 the kernel size would be 14. The default value for truncate in scikit-image is 4.0. The last parameter to skimage.filters.gaussian tells skimage to interpret our image, which has three dimensions, as a multichannel colour image.
Finally, we display the blurred image:
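In code:

```python
plt.imshow(blurred)
```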
Other methods of blurring
The Gaussian blur is a way to apply a low-pass filter in skimage. It is often used to remove Gaussian (i.e., random) noise from the image. For other kinds of noise, e.g. "salt and pepper", a median filter is typically used. See the skimage.filters documentation for a list of available filters.
In this episode, we will learn how to use skimage functions to apply thresholding to an image. Thresholding is a type of image segmentation, where we change the pixels of an image to make the image easier to analyze. In thresholding, we convert an image from colour or grayscale into a binary image, i.e., one that is simply black and white. Most frequently, we use thresholding as a way to select areas of interest of an image, while ignoring the parts we are not concerned with. We have already done some simple thresholding, in the “Manipulating pixels” section of the Image Representation in skimage episode. In that case, we used a simple NumPy array manipulation to separate the pixels belonging to the root system of a plant from the black background. In this episode, we will learn how to use skimage functions to perform thresholding. Then, we will use the masks returned by these functions to select the parts of an image we are interested in.
Consider the image shapes-01.jpg, with a series of crudely cut shapes set against a white background. Now suppose we want to select only the shapes from the image. In other words, we want to leave the pixels belonging to the shapes "on," while turning the rest of the pixels "off," by setting their colour channel values to zeros. The skimage library has several different methods of thresholding. We will start with the simplest version, which involves an important step of human input. Specifically, in this simple, fixed-level thresholding, we have to provide a threshold value t.
The process works like this. First, we will load the original image, convert it to grayscale, and de-noise it as in the Blurring Images episode.
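A sketch of those steps (path and sigma are assumptions):

```python
shapes = iio.imread(uri="data/shapes-01.jpg")
gray_shapes = skimage.color.rgb2gray(shapes)
blurred_shapes = skimage.filters.gaussian(gray_shapes, sigma=1.0)
```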
Next, we would like to apply the threshold t such that pixels with grayscale values on one side of t will be turned "on", while pixels with grayscale values on the other side will be turned "off". How might we do that? Remember that grayscale images contain pixel values in the range from 0 to 1, so we are looking for a threshold t in the closed range [0.0, 1.0]. We see in the image that the geometric shapes are "darker" than the white background, but there is also some light gray noise on the background. One way to determine a "good" value for t is to look at the grayscale histogram of the image and try to identify what grayscale ranges correspond to the shapes in the image or the background.
The histogram for the shapes image shown above can be produced as in the Creating Histograms episode.
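In code:

```python
histogram, bin_edges = np.histogram(blurred_shapes, bins=256, range=(0.0, 1.0))

plt.figure()
plt.plot(bin_edges[0:-1], histogram)
```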
Since the image has a white background, most of the pixels in the image are white. This corresponds nicely to what we see in the histogram: there is a peak near the value of 1.0. If we want to select the shapes and not the background, we want to turn off the white background pixels, while leaving the pixels for the shapes turned on. So, we should choose a value of t somewhere before the large peak and turn pixels above that value "off". Let us choose t = 0.8.
To apply the threshold t, we can use the NumPy comparison operators to create a mask. Here, we want to turn "on" all pixels which have values smaller than the threshold, so we use the less operator < to compare the blurred image to the threshold t. The operator returns a mask, which we capture in the variable binary_mask. It has only one channel, and each of its values is either 0 or 1. The binary mask created by the thresholding operation can be shown with plt.imshow, where the False entries are shown as black pixels (0-valued) and the True entries are shown as white pixels (1-valued).
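As a sketch:

```python
t = 0.8
binary_mask = blurred_shapes < t

plt.imshow(binary_mask, cmap="gray")
```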
You can see that the areas where the shapes were in the original area are now white, while the rest of the mask image is black.
We can now apply the binary_mask to the original coloured image as we have learned in the Drawing and Bitwise Operations episode. What we are left with is only the coloured shapes from the original.
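A sketch of the selection (the ~ operator inverts the boolean mask):

```python
selection = shapes.copy()
selection[~binary_mask] = 0   # turn off everything outside the mask
plt.imshow(selection)
```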
The downside of the simple thresholding technique is that we have to make an educated guess about the threshold t by inspecting the histogram. There are also automatic thresholding methods that can determine the threshold for us. One such method is Otsu's method. It is particularly useful for situations where the grayscale histogram of an image has two peaks that correspond to background and objects of interest.
Consider the image maize-root-cluster.jpg of a maize root system, which we have seen before in the Image Representation in skimage episode.
We use Gaussian blur with a sigma of 1.0 to denoise the root image. Let us look at the grayscale histogram of the denoised image.
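A sketch of the denoise-and-histogram steps:

```python
maize_roots = iio.imread(uri="data/maize-root-cluster.jpg")
gray_roots = skimage.color.rgb2gray(maize_roots)
blurred_roots = skimage.filters.gaussian(gray_roots, sigma=1.0)

histogram, bin_edges = np.histogram(blurred_roots, bins=256, range=(0.0, 1.0))
plt.plot(bin_edges[0:-1], histogram)
```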
The histogram has a significant peak around 0.2, and a second, smaller peak very near 1.0. Thus, this image is a good candidate for thresholding with Otsu's method. The mathematical details of how this works are complicated (see the scikit-image documentation if you are interested), but the outcome is that Otsu's method finds a threshold value between the two peaks of a grayscale histogram.
The skimage.filters.threshold_otsu() function can be used to determine the threshold automatically via Otsu's method. Then NumPy comparison operators can be used to apply it as before. Here are the Python commands to determine the threshold t with Otsu's method:
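As a sketch:

```python
t = skimage.filters.threshold_otsu(blurred_roots)
print(t)   # 0.42 for this image, per the text below
```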
For this root image and a Gaussian blur with the chosen sigma of 1.0, the computed threshold value is 0.42. Now we can create a binary mask with the comparison operator >. As we have seen before, pixels above the threshold value will be turned on, those below the threshold will be turned off:
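In code:

```python
binary_mask = blurred_roots > t
```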
Finally, we use the mask to select the foreground:
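A sketch of the selection step:

```python
selection = maize_roots.copy()
selection[~binary_mask] = 0
plt.imshow(selection)
```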
Application: measuring root mass
Let us now turn to an application where we can apply thresholding and other techniques we have learned to this point. Consider these four maize root system images, which you can find in the files
Suppose we are interested in the amount of plant material in each image, and in particular how that amount changes from image to image. Perhaps the images represent the growth of the plant over time, or perhaps the images show four different maize varieties at the same phase of their growth. The question we would like to answer is, “how much root mass is in each image?”
We will first construct a Python program to measure this value for a single image. Our strategy will be this: read the image as grayscale, blur it, create a binary mask with an automatically determined threshold, and compute the ratio of mask pixels to total pixels.
Our intent is to perform these steps and produce the numeric result - a measure of the root mass in the image - without human intervention. Implementing the steps within a Python function will enable us to call this function for different images.
Here is a Python function that implements this root-mass-measuring strategy. Since the function is intended to produce numeric output without human interaction, it does not display any of the images. Almost all of the commands should be familiar, and in fact, it may seem simpler than the code we have worked on thus far, because we are not displaying any of the images.
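A sketch of such a function, consistent with the description in the next paragraphs (variable names are assumptions):

```python
import imageio.v3 as iio
import numpy as np
import skimage.filters
import skimage.util


def measure_root_mass(filename, sigma=1.0):
    # read the image as grayscale directly
    root_image = iio.imread(uri=filename, mode="L")
    root_image = skimage.util.img_as_float(root_image)

    # denoise before thresholding
    blurred_image = skimage.filters.gaussian(root_image, sigma=sigma)

    # create a binary mask with Otsu's method
    t = skimage.filters.threshold_otsu(blurred_image)
    binary_mask = blurred_image > t

    # ratio of foreground pixels to all pixels
    root_pixels = np.count_nonzero(binary_mask)
    w = binary_mask.shape[1]
    h = binary_mask.shape[0]
    density = root_pixels / (w * h)
    return density
```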
The function begins with reading the original image from the file filename. We use iio.imread with the optional argument mode="L" to automatically convert it to grayscale. Next, the grayscale image is blurred with a Gaussian filter with the value of sigma that is passed to the function. Then we determine the threshold t with Otsu's method and create a binary mask just as we did in the previous section. Up to this point, everything should be familiar.
The final part of the function determines the root mass ratio in the image. Recall that in the binary_mask, every pixel has either a value of zero (black/background) or one (white/foreground). We want to count the number of white pixels, which can be accomplished with a call to the NumPy function np.count_nonzero. Then we determine the width and height of the image by using the elements of binary_mask.shape (that is, the dimensions of the NumPy array that stores the image). Finally, the density ratio is calculated by dividing the number of white pixels by the total number of pixels w * h in the image. The function then returns the root density of the image.
We can call this function with any filename and provide a sigma value for the blurring. If no sigma value is provided, the default value 1.0 will be used. For example, for the file trial-016.jpg and a sigma value of 1.5, we would call the function like this:
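In code (the data/ prefix is an assumption):

```python
measure_root_mass(filename="data/trial-016.jpg", sigma=1.5)
```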
Now we can use the function to process the series of four images shown above. In a real-world scientific situation, there might be dozens, hundreds, or even thousands of images to process. To save us the tedium of calling the function for each image by hand, we can write a loop that processes all files automatically. The following code block assumes that the files are located in the same directory and the filenames all start with the trial- prefix and end with the .jpg suffix.
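A sketch of such a loop, using the standard-library glob module to match the trial-*.jpg pattern:

```python
import glob

all_files = glob.glob("data/trial-*.jpg")
for filename in sorted(all_files):
    density = measure_root_mass(filename=filename, sigma=1.5)
    # print one line per image: the filename and its root density
    print(filename, density, sep=",")
```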
Connected Component Analysis
In the Thresholding episode we have covered dividing an image into foreground and background pixels. In the shapes example image, we considered the coloured shapes as foreground objects on a white background.
In thresholding we went from the original image to this version:
Here, we created a mask that only highlights the parts of the image that we find interesting, the objects. All objects have a pixel value of True while the background pixels are False.

By looking at the mask image, one can count the objects that are present in the image (7). But how did we actually do that? How did we decide which lump of pixels constitutes a single object?
In order to decide which pixels belong to the same object, one can exploit their neighborhood: pixels that are directly next to each other and belong to the foreground class can be considered to belong to the same object.
Let's discuss the concept of pixel neighborhoods in more detail. Consider the following mask "image" with 8 rows and 8 columns. For the purpose of illustration, the digit 0 is used to represent background pixels, and the letter X is used to represent object pixels (foreground).
The pixels are organised in a rectangular grid. In order to understand pixel neighborhoods we will introduce the concept of "jumps" between pixels. The jumps follow two rules: the first rule is that one jump is only allowed along the column or the row; diagonal jumps are not allowed. So, from a centre pixel, denoted with o, only the pixels indicated with a 1 are reachable:
The pixels on the diagonal (from o) are not reachable with a single jump, which is denoted by the -. The pixels reachable with a single jump form the 1-jump neighborhood.
The second rule states that in a sequence of jumps, one may only jump in the row and column direction once - i.e., the jumps have to be orthogonal. An example of a sequence of orthogonal jumps is shown below. Starting from o, the first jump goes along the row to the right. The second jump then goes along the column direction up. After this, the sequence cannot be continued, as a jump has already been made in both the row and column directions.
All pixels reachable with one or two jumps form the 2-jump neighborhood. The grid below illustrates the pixels reachable from the centre pixel o with a single jump, highlighted with a 1, and the pixels reachable with 2 jumps with a 2.
We want to revisit our example image mask from above and apply the two different neighborhood rules. With single-jump connectivity for each pixel, we get two resulting objects, highlighted in the image with A's and B's. In the 1-jump version, only pixels that have direct neighbors along rows or columns are considered connected; diagonal connections are not included in the 1-jump neighborhood. With two jumps, however, we get only a single object, A, because pixels are also considered connected along the diagonals.
Connected Component Analysis
In order to find the objects in an image, we want to employ an operation that is called Connected Component Analysis (CCA). This operation takes a binary image as an input. Usually, the False value in this image is associated with background pixels, and the True value indicates foreground, or object, pixels. Such an image can be produced, e.g., with thresholding. Given a thresholded image, the connected component analysis produces a new labeled image with integer pixel values. Pixels with the same value belong to the same object. skimage provides connected component analysis in the function skimage.measure.label(). Let us add this function to the already familiar steps of thresholding an image. Here we define a reusable Python function connected_components:
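A sketch of such a function, consistent with the explanation that follows (the default parameter values are assumptions):

```python
import imageio.v3 as iio
import skimage.color
import skimage.filters
import skimage.measure


def connected_components(filename, sigma=1.0, t=0.5, connectivity=2):
    # load the image and convert to grayscale
    image = iio.imread(uri=filename)
    gray_image = skimage.color.rgb2gray(image)

    # denoise the image and threshold it (dark shapes on a light background)
    blurred_image = skimage.filters.gaussian(gray_image, sigma=sigma)
    binary_mask = blurred_image < t

    # perform connected component analysis
    labeled_image, count = skimage.measure.label(
        binary_mask, connectivity=connectivity, return_num=True
    )
    return labeled_image, count
```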
Note the new import of skimage.measure in order to use the skimage.measure.label function that performs the CCA. The first four lines of code are familiar from the Thresholding episode.
Then we call the skimage.measure.label function. This function has one positional argument where we pass the binary_mask, i.e., the binary image to work on. With the optional argument connectivity, we specify the neighborhood in units of orthogonal jumps. For example, by setting connectivity=2 we will consider the 2-jump neighborhood introduced above. The function returns a labeled_image where each pixel has a unique value corresponding to the object it belongs to. In addition, we pass the optional parameter return_num=True to return the maximum label index as count.
We can call the above function connected_components and display the labeled image like so:
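A sketch of the call (the file path and the sigma/t values are assumptions, chosen to match the object count discussed below):

```python
labeled_image, count = connected_components(
    filename="data/shapes-01.jpg", sigma=2.0, t=0.9, connectivity=2
)

plt.imshow(labeled_image)
print("Found", count, "objects in the image.")
```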
We can use the function skimage.color.label2rgb() to convert the colours in the image (recall that we already used the skimage.color.rgb2gray function to convert to grayscale). With skimage.color.label2rgb(), all objects are coloured according to a list of colours that can be customised. We can use the following commands to convert and show the image:
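In code:

```python
colored_label_image = skimage.color.label2rgb(labeled_image, bg_label=0)
plt.imshow(colored_label_image)
```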
You might wonder why the connected component analysis with sigma=2.0 and t=0.9 finds 11 objects, whereas we would expect only 7 objects. Where are the four additional objects? With a bit of detective work, we can spot some small objects in the image, for example, near the left border.
For us it is clear that these small spots are artifacts and not objects we are interested in. But how can we tell the computer? One way to calibrate the algorithm is to adjust the parameters for blurring (sigma) and thresholding (t), but you may have noticed during the above exercise that it is quite hard to find a combination that produces the right output number. In some cases, background noise gets picked up as an object. And with other parameters, some of the foreground objects get broken up or disappear completely. Therefore, we need other criteria to describe desired properties of the objects that are found.
Morphometrics - Describe object features with numbers
Morphometrics is concerned with the quantitative analysis of objects and considers properties such as size and shape. For the example of the images with the shapes, our intuition tells us that the objects should be of a certain size or area. So we could use a minimum area as a criterion for when an object should be detected. To apply such a criterion, we need a way to calculate the area of objects found by connected components. Recall how we determined the root mass in the Thresholding episode by counting the pixels in the binary mask. But here we want to calculate the area of several objects in the labeled image. The skimage library provides the function skimage.measure.regionprops to measure the properties of labeled regions. It returns a list of RegionProperties that describe each connected region in the image. The properties can be accessed using the attributes of the RegionProperties data type. Here we will use the area property, the number of pixels belonging to each region. You can explore the skimage documentation to learn about other properties available.
We can get a list of areas of the labeled objects as follows:
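A sketch of that computation:

```python
# compute the features of each labeled region ...
object_features = skimage.measure.regionprops(labeled_image)

# ... and pull out the area (pixel count) of each object
object_areas = [objf["area"] for objf in object_features]
object_areas
```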
This will produce the output
In this episode, we will provide a final challenge for you to attempt, based on all the skills you have acquired so far. This challenge will be related to the shape of objects in images (morphometrics).
Morphometrics: Bacteria Colony Counting
As mentioned in the workshop introduction, your morphometric challenge is to determine how many bacteria colonies are in each of these images:
The image files can be found at