Studying Convolutional Neural Networks and their Potential Uses in Refugee Camp Settings
What's a convolutional neural network?
The short answer is this:
A "CNN" is basically a specific type of algorithm, albeit an incredibly complex one, that improves itself by learning from its mistakes. CNNs are designed to detect and label all manner of shapes and features found in images or even videos. That sounds pretty simple for us humans, but from a computer-vision standpoint things are a lot more complicated. The important part is that its structure is loosely modelled on the way neurons connect in the human brain, which is why it's called a neural network. After all, the human brain is the greatest super-computer we know of.
Why do we use them?
Because they are among the most widely used and best-performing methods there are (at the moment) for analysing large amounts of image data! Deep learning in general is renowned for processing large amounts of "big data" and decoding the complexity of the problems we are trying to analyse.
Nowadays, the words "artificial intelligence" or "deep learning" are well-known and might even be part of our household vocabulary. How could they not be, with all the headlines about exciting new technological advances?
All of this is hardly surprising, considering the amount of research and investment that has flowed into this field in the past decade. Likewise, the sheer number of new applications that have sprung up is testimony to AI's innovative potential.
At the University of Salzburg I have been able to learn about current research on CNNs:
Dwelling Detection and Extraction in Refugee Camps
Coming from a humanitarian setting in which I have spent many (many, many) hours manually piecing together a GIS working-tool for the WASH and Shelter group, this research naturally captured my curiosity. Perhaps the thought of finding a way of not having to manually delineate an entire camp also played a role.
Diving into the literature and case studies, I have been astonished by the ever-increasing degree of automation and accuracy with which a CNN can extract individual tents in a camp. And it doesn't even stop there. Today, CNNs are also being combined with other well-established methods in remote sensing that help refine the outcomes even more.
The usefulness of such a technology to various humanitarian actors on the ground seems obvious. Any organisation managing a camp with many thousands of residents could use it to extract tents and format them as shapefiles to build a GIS working tool. Information could be stored on such a GIS platform, and it could be used to share information and encourage better coordination. And that's just one possible use case. Especially for large camps, this is a tool that could be extremely cost-effective.
I thought I would like to use my e-portfolio to display some of the published research from the "GeoHum" research lab here. If it motivates me, perhaps it also motivates someone else. See below:
Dwelling Extraction - Before & After
How does it work?
That's exactly what I wanted to know too. The image below shows the CNN structure that was used. It has all the elements of a typical CNN: 3 convolution layers, a max pooling layer and a fully connected layer. For more information on CNN architecture here are some good websites and videos that helped me understand it:
Below is a (very simple and incomplete) description of the process:
Step 1: Training through Backpropagation
So the prerequisite is training samples - and LOTS of them (thousands and thousands). They need to represent the features you're interested in (= different types of tents). Ideally, they show the tents in all the shapes, forms and variations in which you can expect to find them! There are actually some methods that help multiply the number of samples you have (e.g. by rotating samples). And they need to be labelled, otherwise a CNN cannot be trained.
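To make the "rotating samples" idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration: the tiny 3x3 patch, the function names and the label are made up for this example, and real workflows would use an image library and much larger patches.

```python
def rotate90(patch):
    """Rotate a 2D patch (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def augment_with_rotations(patch, label):
    """Return the labelled patch plus its 90, 180 and 270 degree rotations."""
    samples = []
    current = [list(row) for row in patch]
    for _ in range(4):
        samples.append((current, label))  # the label stays the same
        current = rotate90(current)
    return samples


# Hypothetical 3x3 image patch; 1 marks "tent" pixels.
patch = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
augmented = augment_with_rotations(patch, "Tent Type1")
print(len(augmented))  # one labelled sample has become four
```

A tent looks like the same tent type from any direction, which is why the label can simply be copied to every rotated version.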
So training batches of 5000 samples were fed into the network. The CNN does its convolving, pooling, connecting and predicting... and gives you a final prediction of what it believes it sees (e.g. Tent Type1). This is then evaluated against the "ground truth" that comes with each sample - remember they're all labelled, so you know what the prediction should have been.
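For the curious, the "convolving, pooling, connecting and predicting" can be sketched in plain Python. This is a toy under several assumptions of mine, not the network from the research: a single-channel 10x10 input, random untrained 3x3 kernels, a ReLU after each convolution, a softmax at the end and two made-up classes. It only mirrors the order of layers (three convolutions, one max pooling, one fully connected).

```python
import math
import random


def conv2d(image, kernel):
    """Slide a square kernel over a square image (no padding), then apply ReLU."""
    k = len(kernel)
    out_size = len(image) - k + 1
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(max(0.0, total))  # ReLU: negative responses become 0
        out.append(row)
    return out


def max_pool(image, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """One score per class: weighted sum of all features plus a bias."""
    return [
        sum(f * w for f, w in zip(features, class_weights)) + b
        for class_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image, kernels, fc_weights, fc_biases):
    x = image
    for kernel in kernels:  # three convolution layers: 10x10 -> 8x8 -> 6x6 -> 4x4
        x = conv2d(x, kernel)
    x = max_pool(x)  # one max pooling layer: 4x4 -> 2x2
    features = [value for row in x for value in row]  # flatten to 4 numbers
    return softmax(fully_connected(features, fc_weights, fc_biases))


random.seed(0)
classes = ["Tent Type1", "No tent"]  # hypothetical class names
image = [[random.random() for _ in range(10)] for _ in range(10)]
kernels = [
    [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    for _ in range(3)
]
fc_weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in classes]
fc_biases = [0.0 for _ in classes]

probabilities = predict(image, kernels, fc_weights, fc_biases)
prediction = classes[probabilities.index(max(probabilities))]
print(prediction, probabilities)
```

Because the kernels and weights here are random, the prediction is meaningless. Training is what turns those random numbers into useful ones.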
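With the resulting error value, the CNN can adjust its parameters to improve its predictions with the next training batch. This is the backpropagation part: the error is passed backwards through the layers so that each parameter is nudged in the direction that reduces it. It's an iterative and supervised training process that continues until the error values are deemed acceptable.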
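The loop of predicting, comparing with the label, adjusting and repeating can be sketched with a much simpler stand-in. The example below trains a single artificial neuron rather than a full CNN, so there is only one layer to pass the error back through. The two features per sample, the learning rate and the error threshold are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, learning_rate=0.5, acceptable_error=0.1, max_epochs=5000):
    """Repeat predict -> compare with label -> adjust until the error is acceptable."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    error = float("inf")
    for epoch in range(max_epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        error = 0.0
        for features, label in samples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = sigmoid(z)
            # Cross-entropy error between the prediction and the ground truth.
            error -= label * math.log(prediction + 1e-12)
            error -= (1 - label) * math.log(1 - prediction + 1e-12)
            # Gradient of that error with respect to each parameter.
            delta = prediction - label
            for i, x in enumerate(features):
                grad_w[i] += delta * x
            grad_b += delta
        error /= len(samples)
        if error <= acceptable_error:
            break
        # Adjust the parameters a small step against the gradient.
        weights = [
            w - learning_rate * g / len(samples)
            for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias, error, epoch + 1


# Hypothetical labelled samples: two made-up features per patch, label 1 = tent.
samples = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.7], 1),
    ([0.2, 0.1], 0), ([0.1, 0.3], 0), ([0.3, 0.2], 0),
]
weights, bias, error, epochs = train(samples)
print(f"stopped after {epochs} epochs with error {error:.3f}")
```

In a real CNN the same idea applies to every kernel value and every connection weight, which is why so many labelled samples are needed.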
Step 2: Testing
This is when the CNN is tested on a completely new dataset - usually a part of the same image it was trained on but hasn't seen yet. With the results of that an accuracy assessment can be done to evaluate the overall performance of the CNN. Usually, the values that are checked are:
precision = how many of the identified features were correct
recall = how many of the actual features were found (i.e. how few were missed)
"F1" = an overall score that combines precision and recall (their harmonic mean)
It doesn't stop there - more research is under way
Current research is combining CNNs with other methods to harness the strengths of different approaches to dwelling extraction. One recent example combined CNNs with an Object-Based Image Analysis (OBIA) method. The CNN was first used to create predictions, and these were then used as an input for a typical knowledge-based image analysis workflow in which the image is segmented and classified according to expert knowledge in an iterative process. The results are promising.
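To give a rough feel for how CNN predictions can feed into a rule-based step, here is a toy sketch. It is not the published workflow: the probability map, the segment map, the thresholds and the size rule are all invented for illustration, and a real OBIA workflow uses proper segmentation software and far richer expert rules.

```python
from collections import defaultdict


def classify_segments(probability_map, segment_map,
                      min_mean_prob=0.5, min_area=2, max_area=6):
    """Label each segment using CNN probabilities plus a simple size rule."""
    pixels = defaultdict(list)
    for prob_row, seg_row in zip(probability_map, segment_map):
        for prob, segment_id in zip(prob_row, seg_row):
            pixels[segment_id].append(prob)
    labels = {}
    for segment_id, probs in pixels.items():
        mean_prob = sum(probs) / len(probs)
        # Hypothetical "expert knowledge": dwellings have a plausible size.
        plausible_size = min_area <= len(probs) <= max_area
        is_dwelling = mean_prob >= min_mean_prob and plausible_size
        labels[segment_id] = "dwelling" if is_dwelling else "other"
    return labels


# Hypothetical CNN output: per-pixel probability of "dwelling".
probability_map = [
    [0.9, 0.8, 0.1, 0.1],
    [0.9, 0.7, 0.1, 0.2],
    [0.1, 0.1, 0.1, 0.9],
]
# Hypothetical segmentation of the same image: each number is a segment id.
segment_map = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [2, 2, 2, 3],
]
print(classify_segments(probability_map, segment_map))
# prints: {1: 'dwelling', 2: 'other', 3: 'other'}
```

Segment 3 shows the point of combining the two: the CNN is confident about that single pixel, but the size rule rejects it as too small to be a dwelling.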
This is what that workflow looked like: