Tuesday, May 5, 2009

Refining the process

So far, when I discriminate cloudy areas from usable data in a Landsat scene, I simply ignore every pixel with a value above 100 in band 3. This removes most of the unusable cloud pixels while leaving good pixels intact. That said, inspection of the band 5 histogram has inspired me to try to discriminate shadows in a similar way to how I use band 3 to get rid of clouds.


Histogram of band 5 of a close up of a Landsat scene

Close up of Landsat scene (pixels in red have a value below 30)

As can be seen from the close up above and its histogram, pixels with a value under 30 are almost exclusively cloud shadows. The exceptions are shadows caused by abrupt topography, and water bodies.

Comparing an image where I discriminate only pixels with a value above 100 in band 3 to an image where I discriminate both those pixels and pixels with a value under 30 in band 5 gives the following results:


(If you ignore the stripes, green are pixels that should
be good data, yellow are pixels that have a value above
100 in band 3 [clouds] and red are pixels that have
a value under 30 in band 5 [cloud shadows, abrupt
topography shadows and water bodies])

Close-up of above scene

Further testing needs to be done to determine if I should use this thresholding method on band 5 and/or if the threshold value should be changed.
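
For reference, here is what the two thresholds amount to in code: a minimal numpy sketch with synthetic arrays standing in for the real bands, using the values discussed above (100 on band 3, 30 on band 5):

```python
import numpy as np

# Synthetic stand-ins for the Landsat bands; in practice these would be the
# band 3 and band 5 rasters read from file (e.g. with GDAL).
rng = np.random.default_rng(0)
band3 = rng.integers(0, 256, size=(512, 512))
band5 = rng.integers(0, 256, size=(512, 512))

cloud_mask = band3 > 100    # likely clouds (yellow in the figures above)
shadow_mask = band5 < 30    # likely cloud shadows, terrain shadows and water (red)

# Pixels kept as usable data (green in the figures above).
usable = ~(cloud_mask | shadow_mask)
print(f"{usable.mean():.1%} of pixels retained as usable data")
```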

On a different note, I’ve observed that certain bands seem to be far better suited to the method I’m developing to create cloud free composites:

Close-up of Landsat cloud-free image
composite made from 12 images - Band 1

Close-up of Landsat cloud-free image
composite made from 12 images – Band 2

Close-up of Landsat cloud-free image
composite made from 12 images – Band 3

Close-up of Landsat cloud-free image
composite made from 12 images – Band 4

Close-up of Landsat cloud-free image
composite made from 12 images – Band 5

Close-up of Landsat cloud-free image
composite made from 12 images – Band 7

Close-up of Landsat cloud-free image
composite made from 12 images – Band 8

Bands 4, 5 and 8 are far less impaired by unwanted artifacts than the other bands.

Monday, February 2, 2009

Basic statistics for image composition

In an attempt to get rid of the pesky clouds that plague equatorial regions, namely Borneo, I’ve tested different approaches. For any given Landsat scene, I have access to images taken every couple of weeks. Now some of them are only slightly obscured by clouds.


Image least affected by cloud cover out of seven taken
over a period of four months (RGB composite)


False color composite of same image

Unfortunately, we’re usually not so lucky, and it’s not unusual for a Landsat scene taken over tropical regions to look more like this one.


Same scene as above…

Also, every Landsat scene taken since May 2003 has data gaps because of the SLC-off mode the satellite has been operating in.

The gaps and thick clouds are easily identified and removed, but filling those removed areas with data from other dates can prove challenging, as I shall illustrate.

For my first attempt, I used a cloud detection algorithm implemented as a GRASS add-on module (i.landsat.acca). I then combined the seven images I had at hand, minus their clouds and stripes, to create a new composite image.


Composite image made from seven images
with clouds removed with the i.landsat.acca
GRASS add-on module

As can be seen, there is a lot of data missing in the resulting image. As a matter of fact, this image is more fragmented than the least cloudy image of the seven used to create the composite (see the first and second images of this post).

This led me to investigate how I could reassemble my seven images without using this module. I gave all the missing values (thick cloud obstruction and stripes) the value 255; none of the unsaturated pixels comes anywhere near this value. I then created a new image, pixel by pixel, using the minimum value for a given pixel out of the seven images. My premise was that the darkest pixel is the one least affected by haze or partial cloud obstruction (which make a pixel pale, and pale pixels have higher values). I obtained this resulting image:


Composite image obtained by using the minimal
value for each pixel out of seven images

The problem with this approach is that cloud shadows tend to be very dark, which explains the numerous dark patches that can be seen in the image above. Unfortunately, when using the exact opposite approach, the resulting image includes lots of haze and outer edges of clouds:


Close-up of a composite image obtained
by using the maximal value for each
pixel out of seven images

What seems most promising is to use the median value of all the pixels that are neither clouds nor stripes. Because GRASS has trouble dealing with null values in pixel-by-pixel analysis, the images had to be exported into the statistical software R to extract the median values, then reimported into GRASS for further analysis.


Image composed of the median value
for each pixel (RGB composite)


Same image (false colors composite)


Close-up of the same image (RGB composite)


Same close-up (false colors composite)
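
For reference, here is a minimal numpy sketch of the three per-pixel compositing rules discussed above (minimum, maximum, and median over non-masked pixels). The arrays are synthetic, the 255 sentinel is the one described above, and NaN-aware reductions stand in for the null handling that was actually done in R:

```python
import numpy as np

# Seven co-registered images stacked along axis 0 (synthetic stand-ins here).
# Clouds and SLC-off stripes have already been set to the sentinel value 255,
# which no unsaturated pixel comes close to.
rng = np.random.default_rng(0)
scenes = rng.integers(0, 200, size=(7, 512, 512)).astype(float)
scenes[0, 100:150, :] = 255          # pretend stripe in the first image

# Minimum composite: 255 loses to any real value, so a plain minimum works,
# but it favours dark cloud shadows (the problem described above).
min_composite = scenes.min(axis=0)

# For the maximum and the median, the sentinels must be excluded first,
# otherwise 255 would dominate; NaN-aware reductions do this cleanly.
scenes[scenes == 255] = np.nan
max_composite = np.nanmax(scenes, axis=0)        # keeps haze and cloud edges
median_composite = np.nanmedian(scenes, axis=0)  # the most promising option

# Pixels that are masked in all seven images stay NaN and remain data gaps.
```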

Now I need to see how useful this image can be to produce classifications or band ratios such as NDVIs.
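
As a reminder of what such a band ratio involves, here is a minimal NDVI sketch, assuming band 3 (red) and band 4 (near infrared) of the composite as inputs; the arrays are synthetic:

```python
import numpy as np

# Red (band 3) and near-infrared (band 4) layers of the composite;
# synthetic stand-ins here.
rng = np.random.default_rng(0)
red = rng.integers(1, 200, size=(512, 512)).astype(float)
nir = rng.integers(1, 200, size=(512, 512)).astype(float)

# NDVI = (NIR - red) / (NIR + red), in [-1, 1]; dense vegetation scores high.
ndvi = (nir - red) / (nir + red)
```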

Tuesday, October 7, 2008

Classifications

It’s been a while since I’ve talked about image classification, but ultimately, it’s the main issue that must be tackled in the scope of this project. The entire Landsat archive is soon to be available online for free. In the meantime, I’ve done a couple of tests on some data I already have at hand.

I’ve chosen a subset of a circa 2000 Landsat 7 tile. I chose this tile because it has little cloud cover and it displays a few land use classes (water, forest, and converted land such as fields, small cultivated parcels and buildings).

Landsat subset in natural colors
with a mask over clouded areas

Pseudo color composite of the same subset

At first glance, we can discern water, forest, vast fields and heterogeneous areas with small cultivated parcels, buildings and patches of trees.

I did two unsupervised classifications where I asked the classifier to sort my image into 15 classes and to only consider areas of more than 25 contiguous pixels.

Maximum likelihood classification
performed by GRASS

Unsupervised classification by Feature Analyst

My impression is that the unsupervised classification from Feature Analyst is far less cluttered than the one performed by GRASS’s maximum likelihood classifier. Also, it seems that Feature Analyst managed to put more pixels in the prominent classes, whereas GRASS created numerous classes for any single prominent class as identified by Feature Analyst.
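
For what it’s worth, an unsupervised classification of this kind boils down to clustering the pixel spectra. Below is a minimal sketch using scikit-learn’s k-means as a stand-in for the classifiers above (15 classes as requested; the 25-contiguous-pixel constraint would be a post-processing step, not shown):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stack the Landsat bands into an (n_pixels, n_bands) table (synthetic here).
rows, cols, n_bands = 512, 512, 6
rng = np.random.default_rng(0)
pixels = rng.random((rows * cols, n_bands))

# Unsupervised classification into 15 spectral classes.
labels = KMeans(n_clusters=15, n_init=10, random_state=0).fit_predict(pixels)
class_map = labels.reshape(rows, cols)
# A sieve or majority filter would then remove patches smaller than 25 pixels.
```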

Now for the supervised classifications. I used the same training map for all of them and created three classes: one is water, another forest, and the last one converted land (fields, small parcels, buildings…).

Training map for supervised classification

I did three types of classification with GRASS: a maximum likelihood classification based on the signatures of my training sites; a second maximum likelihood classification, but with signatures generated by a clustering algorithm that uses my training sites as input; and finally a contextual image classification that uses sequential maximum a posteriori (SMAP) estimation.

Maximum likelihood classification

Maximum likelihood classification
with clustering algorithm

SMAP classification

All three classifications gave similar results, but ultimately the SMAP classification seems to give me the best results.
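
For reference, the plain maximum likelihood case amounts to fitting one Gaussian model per class to the training pixels and assigning each pixel to its most probable class. Below is a minimal sketch using scikit-learn’s QuadraticDiscriminantAnalysis with equal priors as a stand-in (this is not GRASS’s classifier, and the arrays and labels are synthetic):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Spectral values of the training pixels and their labels
# (0 = water, 1 = forest, 2 = converted land); synthetic stand-ins here.
rng = np.random.default_rng(0)
X_train = rng.random((300, 6))
y_train = rng.integers(0, 3, 300)

# One Gaussian (mean vector + covariance matrix) per class; with equal priors
# this behaves as a plain maximum likelihood classifier.
clf = QuadraticDiscriminantAnalysis(priors=np.full(3, 1 / 3)).fit(X_train, y_train)

# Classify every pixel of the scene as its most probable class.
scene_pixels = rng.random((512 * 512, 6))
class_map = clf.predict(scene_pixels).reshape(512, 512)
```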

I also used Feature Analyst, with the same training map, to do a supervised classification.

Feature Analyst supervised classification

The results are uncluttered and impressive. I can only assume that by tweaking the classification options of Feature Analyst and by adding a few classes, I can further improve the results.

This said, I wondered if I could slightly refine the SMAP classification by aggregating stray pixels. Here's what I get:

Refined SMAP classification

In the end, the SMAP classification and the Feature Analyst classification are somewhat comparable. This leaves me guessing... which approach should I adopt? I guess further testing is in order.
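
For reference, the stray-pixel aggregation mentioned above amounts to a majority (mode) filter over each pixel's neighbourhood. Here is a minimal sketch using scipy as a stand-in; it is not the exact GRASS operation I used:

```python
import numpy as np
from scipy import ndimage

def majority(values):
    # values is the flattened 3x3 neighbourhood; return its most common class.
    return np.bincount(values.astype(int)).argmax()

# Integer class raster from the SMAP classification (synthetic stand-in here).
rng = np.random.default_rng(0)
class_map = rng.integers(0, 3, size=(256, 256))

# Replace each pixel by the majority class of its 3x3 neighbourhood,
# which absorbs isolated stray pixels into the surrounding class.
refined = ndimage.generic_filter(class_map, majority, size=3)
```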

Thursday, September 4, 2008

Elevation

Here are a couple of elevation images for Borneo. No specialized software is required in order to view them.

Low resolution DEM
Full resolution DEM

50m isoline map
50m isoline map (brown & green)
10m isoline map

I've created them with data from the Shuttle Radar Topography Mission.
The raw data can be downloaded here.
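
For what it’s worth, isoline maps like these can be sketched directly from the elevation grid. Below is a minimal matplotlib example with a synthetic surface standing in for the SRTM data (the output file name is illustrative); it is not how the maps above were actually produced, just an illustration of the idea:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the SRTM elevation grid (values in metres).
x, y = np.meshgrid(np.linspace(0, 10, 400), np.linspace(0, 10, 400))
dem = 1500 * np.exp(-((x - 5) ** 2 + (y - 5) ** 2) / 8)

# 50 m isolines, as in the maps above; use a 10 m step for the finer map.
fig, ax = plt.subplots()
levels = np.arange(0, dem.max() + 50, 50)
contours = ax.contour(dem, levels=levels, colors="brown")
ax.clabel(contours, fmt="%d m")
ax.set_axis_off()
fig.savefig("isolines_50m.png", dpi=150)
```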

Tuesday, September 2, 2008

Clouds some more & principal components analysis

It’s not that I particularly like to ramble on and on about the same things, but this is going to be, in part at least, yet another post about clouds. I don’t want to repeat myself too much, but some of the nice datasets we have come from the Landsat missions. Unfortunately, these images are plagued with an abundance of clouds. Nevertheless, their high resolution (both spatial and spectral) still makes those Landsat datasets valuable assets. I’ve started to experiment with how these clouds, and their shadows, can be masked. I found a methodology online that I’ve tried on a Landsat tile from 2000, and the results seem very conclusive. I basically used the same technique as Martinuzzi, Sebastián; Gould, William A.; Ramos González, Olga M. used to mask clouds in their publication Creating cloud-free Landsat ETM+ data sets in tropical landscapes: cloud and cloud-shadow removal.

The main steps are as follows (a rough code sketch of these steps follows the list):
  • Create two masks over potentially cloudy pixels. Pixels with values above 120 in band 1 and pixels with values between 102 and 128 in band 6.1 are considered as such.
  • Create a 3 pixel buffer around those masks to take into account pixels that are only partly cloudy.
  • The intersection of those two should cover all the clouds.
  • Now, to deal with the shadows, we need to relocate that mask over the shadows. This is done by finding a cloud with peculiar features over lowland (where clouds are usually furthest from the ground) and measuring the offset of the shadows in relation to their respective clouds.
  • To take into account the influence of the topography, add a 10 pixel buffer to this relocated mask.
  • Now a mask with potential cloud shadows is created by considering all the pixels in band 4 with values between 17 and 66.
  • A 3 pixel buffer must be added. The shadows identified this way also include shadows of buildings and topography, so we need to create a mask that is the intersection of this mask and the offset cloud mask we previously created.
  • I added a 6 pixel buffer to that mask.
  • So now we have a mask for the clouds and one for their shadows. By joining them, we get our final cloud and cloud shadow mask.
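
And here is that rough sketch: a numpy/scipy translation with synthetic bands, using the thresholds and buffer sizes listed above. The shadow offset values are purely illustrative and would in practice come from the manual measurement described above:

```python
import numpy as np
from scipy import ndimage

def buffer(mask, pixels):
    """Grow a boolean mask outward by the given number of pixels."""
    return ndimage.binary_dilation(mask, iterations=pixels)

# Landsat bands resampled to a common grid (synthetic stand-ins here).
rng = np.random.default_rng(0)
band1, band61, band4 = (rng.integers(0, 256, size=(512, 512)) for _ in range(3))

# 1. Two candidate cloud masks, each buffered by 3 pixels, then intersected.
clouds = buffer(band1 > 120, 3) & buffer((band61 > 102) & (band61 < 128), 3)

# 2. Shift the cloud mask by the cloud-to-shadow offset measured on a
#    reference cloud over lowland (dx, dy in pixels: illustrative values),
#    then buffer by 10 pixels to account for topography.
dx, dy = 12, -8
shifted_clouds = buffer(np.roll(np.roll(clouds, dy, axis=0), dx, axis=1), 10)

# 3. Dark pixels in band 4 are potential shadows; buffer by 3 pixels, keep only
#    those that fall under the shifted cloud mask, then buffer by 6 pixels.
dark = buffer((band4 > 17) & (band4 < 66), 3)
shadows = buffer(dark & shifted_clouds, 6)

# 4. Final mask: clouds plus their shadows.
cloud_and_shadow_mask = clouds | shadows
```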

Here is what it looks like:

Landsat 2000 color composite without the mask

And now with the mask applied

So the clouds and their shadows are fully covered, but my methodology still needs tweaking. Indeed, this final mask covers way more ground than needed. The thresholds for obscured pixels must be adjusted, the size of my buffers must be fine-tuned, and maybe the translation offset as well. Also, my results would be more precise if I resampled band 6.1 so that its pixel size matches that of the other bands. Furthermore, prior to all the previously described steps, it would be useful to do an atmospheric correction of the datasets.

Ideally, if we’re able to get our hands on additional Landsat snapshots, we could create composite images of our own that are a lot less affected by clouds.

With my new mask at hand, I proceeded to do a principal component analysis of a Landsat 2000 tile. Without the mask, the statistical abundance of clouds and shadows in my image would undermine the principal component analysis. Here’s what it looks like:

Color composite of the three principal
components of a Landsat 2000 image

As you can see, the different types of fields and vegetation are clearly differentiated…
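
For reference, here is a minimal numpy sketch of a masked principal component analysis of this kind. It is not the exact GRASS workflow I used; the arrays are synthetic and the mask plays the role of the cloud and shadow mask described above:

```python
import numpy as np

# Landsat bands stacked as (rows, cols, n_bands) plus the cloud/shadow mask
# (True = obscured); synthetic stand-ins here.
rng = np.random.default_rng(0)
bands = rng.random((512, 512, 6))
mask = rng.random((512, 512)) < 0.2

# Only unmasked pixels contribute to the statistics, so clouds and shadows
# no longer dominate the covariance structure.
pixels = bands[~mask]                       # (n_clear_pixels, n_bands)
mean = pixels.mean(axis=0)
cov = np.cov((pixels - mean).T)
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]            # components by decreasing variance
components = eigvec[:, order[:3]]           # the first three principal components

# Project the whole scene (masked pixels included, purely for display).
pc_image = (bands - mean) @ components      # (512, 512, 3) for an RGB composite
```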

Wednesday, July 30, 2008

Ah those clouds...

At first glance, it seems the LANDSAT data is the most usable set of information to generate land use maps. Both the 1990 set and the 2000 set cover the whole island, there are a lot of spectral bands to generate signatures from, the 28.5m resolution seems more than adequate for our purpose, and there is more than enough literature to prove that LANDSAT data can be used to produce land use maps. Unfortunately, as was mentioned in the previous post, Borneo is plagued with clouds. So we set out on a quest to diversify our datasets. This ongoing quest has taken us all over the web, and I made a table to summarize our findings. There is quite a bit of usable stuff out there, but only trials and testing will reveal its true value.

One promising purveyor of such data is the MODIS project. They make a whole bunch of datasets available for free online. Their data is acquired by two satellites, Aqua and Terra, which together manage to image the entire Earth every one to two days. They publish data numerous times every year for a period spanning from the year 2000 to the present. The datasets produced are not very affected by cloud cover because they are composites of a stack of images taken over several days: for each pixel of the final image, only the “best” pixels are taken into account. The downside with the MODIS data is that the resolution is much coarser than for the LANDSAT data; some of the higher resolution datasets they produce have a 250m resolution.

One interesting dataset is the MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V005. Some of the promising layers are the EVI, the NDVI, and the visible and infrared bands. At 250m, the resolution might be coarse. On the other hand, I generated an NDVI from the 2000 LANDSAT images and it really seems to show the differentiation between the cultivated areas, the forested land and the clearcut areas. My hope is that decent results can be achieved with the MODIS data.

NDVI generated from the 2000 LANDSAT data

NDVI from MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V005
for the same area

Another promising dataset is the MODIS/Terra Vegetation Cover Conversion 96-Day L3 Global 250m SIN Grid. Each dataset highlights the areas where tree cover has changed over the past year. I’m tempted to use several of these images (one dataset is produced every three months) to create an animation showing how the forest is being transformed through time.
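
If I go down that road, the animation itself should be straightforward. Here is a minimal matplotlib sketch with synthetic rasters standing in for the quarterly change layers (the output file name is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# One change raster per quarter (synthetic stand-ins for the 96-day layers).
rng = np.random.default_rng(0)
frames = [rng.random((100, 100)) for _ in range(8)]

fig, ax = plt.subplots()
image = ax.imshow(frames[0])
ax.set_axis_off()

def update(i):
    # Swap in the raster for quarter i.
    image.set_data(frames[i])
    ax.set_title(f"quarter {i + 1}")
    return [image]

anim = FuncAnimation(fig, update, frames=len(frames), interval=500)
anim.save("forest_change.gif", writer="pillow")  # needs the pillow package
```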


Having talked with Jeff and Rodolphe today, we’re starting to envision the big picture of what is going to be produced from all this. It’s still early to have anything close to a clear perspective on the final product, but having a coarse estimation of what’s going on with the Borneo forests is definitely a priority. After that, if we can refine some areas of interest, or areas where we have better data, even better. On another note, using the DEM to figure out (or at least confirm the general assumption) that people are willing to plant palms in more remote areas over time is something that would be worth doing.

Wednesday, July 2, 2008

Pelleter des nuages (shoveling clouds)

An important issue to be tackled concerning the mapping of Borneo is what to do with the clouds. Borneo is largely covered by rainforest, so there are quite a lot of them obstructing the view. So what can be done? I’ll just freestyle a couple of options here (not all of them feasible or practical). Ideally, it would be great to erase all of them. This said, because some of these clouds are quite thick, I’m doubtful even a sophisticated ATCOR algorithm would see through them. Also, I don’t believe it would be relevant to use such an elaborate algorithm here; for one thing, we don’t have the information required to use ATCOR reliably. We still need to take atmospheric interference into account, though, and a more basic approach, using a SMAC-like algorithm for instance, would be better suited (e.g. Production of CORINE2000 Land cover data using calibrated LANDSAT 7 ETM satellite image mosaics and digital maps in Finland).

Another way of getting rid of clouds is by comparing multiple images pixel by pixel and keeping the “best” ones. The challenge then becomes finding numerous images of the same areas, ideally taken over a relatively short time frame. In the end, we might have to completely give up on certain areas and concentrate on what we can work with.

On another somewhat related note, I’ve found a new source of data that gives us all the bands of the LANDSAT images. Also, the metadata provided tells us when each image was taken, so we can better take into account both the angle of the satellite and that of the sun, something which will turn out quite useful for enhancing our images, especially in conjunction with the DEM.