Krakatoa, or, a Systeme for thee Discovery of Volcanoes that do Erupte

Using a quirk in NASA's FIRMS system to build a volcano detection machine

Before I get into today’s post, I want to shout out a really cool tutorial that Oliver Burdekin, the force behind BurdGIS, did to build off my last piece about oil spills in Nigeria. He (correctly) noted that my method did not go as far as it could have with spatial analysis, so he showed how you could import the code into Google Earth Engine and QGIS to perform additional analysis. If you’re into GIS or remote sensing, give it a watch!

Last week, I was reading through the VIIRS satellite active fire product user guide (just a normal thing to do) and I found something interesting. The data is broken into two products: near-real time data and standard data. Near-real time (or NRT) data is processed and uploaded to an online fire map and data repository within a few hours of an image being taken. It includes fewer fields than the standard data, which is processed to scientific specifications and uploaded several months after an image is taken.

According to the user guide, one of the fields included in the standard data, but not in the NRT data, is “thermal anomaly type” - basically a single digit code that indicates what type of fire NASA’s algorithms detected in the image. The type codes are as follows:

  • 0 = presumed vegetation fire (wildfires, grassfires, crop burning, etc)

  • 1 = active volcano (pretty self explanatory)

  • 2 = other static land source (refineries, power plants, etc)

  • 3 = offshore detection (ship exhaust, offshore refinery flaring, etc)
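For anyone following along in Python, the four codes above fit in a small lookup table. This is only an illustrative sketch: the names `ANOMALY_TYPES` and `describe_type` are made up for this example and are not part of any FIRMS tooling.

```python
# Lookup table for the "thermal anomaly type" codes listed above.
# ANOMALY_TYPES and describe_type are hypothetical names for illustration.
ANOMALY_TYPES = {
    0: "presumed vegetation fire",
    1: "active volcano",
    2: "other static land source",
    3: "offshore detection",
}


def describe_type(code):
    """Return a readable label for a type code, or 'unknown' if unrecognized."""
    return ANOMALY_TYPES.get(int(code), "unknown")


print(describe_type(1))  # active volcano
```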

The code that piqued my interest is type 1 - active volcano. It got me thinking… since that type is only available in the standard data months after an image was taken (and not available in the NRT data at all), is there a way to map historical volcano detections from the standard data to the NRT data? In other words, could I build a quick and dirty technique to detect volcanoes in close to real time?

As it turns out, with a little Python and the data provided by FIRMS, it’s not too difficult.

To be clear, there are already some tools to do this, such as the University of Hawaii’s MODVOLC platform, which uses the lower-resolution MODIS instrument suite to detect volcanoes in near-real time. However, the site’s search function is tricky and there’s no “recent” section, meaning it’s difficult to find a complete list of up-to-date volcano eruptions. Plus, MODIS’s lower resolution means it may miss some smaller flaring or venting.

To build a near-real time volcano detection machine, the first thing I had to do was find and download the relevant data. Since I didn’t want to use all of my computer’s memory, I decided to just focus on a single country. I chose Indonesia because it has some of the world’s most active volcanoes, as well as a track record of recent, destructive eruptions. (If you want to replicate this yourself, you could really choose any country or location - Alaska or Iceland were my second and third choices.)

The data is available from three different instrument suites (two VIIRS and one MODIS) on the FIRMS website under the “Text Files” tab at the bottom of the page. I inspected each of the South East Asia 24 hour files and, with blessings to whichever government functionary set this page up, found that the file names are unchanged day-over-day and located in the same spot on the website:

That makes it incredibly easy to scrape and download files from the site, which I did here, after importing the requisite Python packages:

(Note: I am still a huge Python noob so if there’s an easier way to do this step (a loop?) or any step below, reply or comment!)
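On the loop question: one possible way to write the download step is sketched below, using only the standard library. The URLs are deliberately fake placeholders (swap in the real 24 hour links from the FIRMS “Text Files” tab), and the file names and the `download_daily_files` helper are assumptions for this example rather than the code used in this post.

```python
import urllib.request
from pathlib import Path

# Placeholder URLs: replace with the three real 24 hour CSV links from FIRMS.
# The local file names on the left are hypothetical as well.
DAILY_FILES = {
    "viirs_snpp_24h.csv": "https://firms.example.invalid/viirs_snpp_24h.csv",
    "viirs_noaa20_24h.csv": "https://firms.example.invalid/viirs_noaa20_24h.csv",
    "modis_24h.csv": "https://firms.example.invalid/modis_24h.csv",
}


def download_daily_files(files, dest_dir=".", fetch=urllib.request.urlretrieve):
    """Download each URL into dest_dir under a fixed name, overwriting old copies."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for filename, url in files.items():
        target = dest / filename
        fetch(url, str(target))  # same name every run, so yesterday's file is replaced
        saved.append(target)
    return saved


# Usage, once real URLs are filled in:
# download_daily_files(DAILY_FILES, dest_dir=".")
```

Because the loop runs over a dictionary, adding a fourth file later only means adding one more entry.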

I confirmed that the scrape worked and that the daily files were downloaded to my working directory:

We truly love to see it.

However, these files were useless, for my purposes at least, without knowing where the volcanoes within them were located. The next step, then, was to figure out where volcanoes in Indonesia had erupted in the past.

To build a historical baseline of Indonesia’s volcanoes, I first downloaded the FIRMS archived, standard processed data from the VIIRS instrument suite on the SNPP satellite for Indonesia between 2012 and 2019. Note that this baseline will only include volcanoes detected by remote sensing between those years - but I don’t think this is a huge issue because, as I said, I’m just looking for a quick and dirty method, nothing too scientific. Plus, a quick Wikipedia search shows there have been plenty of volcano eruptions in Indonesia since 2012.

I read in and stitched together the eight yearly files. A basic statistics pull shows the composite file is absolutely enormous - after all, this file contains a line for every single fire detected in Indonesia by NASA between 2012 and 2019.
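A rough sketch of the stitching step, assuming pandas. The two tiny in-memory “files” below are made-up stand-ins for the yearly archives, the column names are assumed to follow the FIRMS CSV layout, and `stitch_archives` is a hypothetical helper name.

```python
import io

import pandas as pd


def stitch_archives(sources):
    """Read each yearly CSV (path or file-like object) and stack them into one table."""
    frames = [pd.read_csv(src) for src in sources]
    return pd.concat(frames, ignore_index=True)


# Made-up rows standing in for two yearly archive files (not real detections).
year_a = io.StringIO("latitude,longitude,acq_date,type\n-7.5412,110.4457,2012-03-01,1\n")
year_b = io.StringIO("latitude,longitude,acq_date,type\n-0.5021,101.4478,2013-09-14,0\n")

archive = stitch_archives([year_a, year_b])
print(archive.shape)  # (2, 4): two rows, four columns
```

With the real archives, the list would hold eight file paths instead of two in-memory strings, and `archive.describe()` gives a quick statistics pull.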

Next up is some basic data cleaning. I needed to get rid of anything that wasn’t a volcano eruption, i.e. anything that didn’t match fire type 1. I also wanted to round the latitude and longitude columns to two decimal places. Doing so will allow me to broaden the aperture around each volcano. Since 0.01 degrees works out to about 1.11 kilometers at the equator, I’ll essentially be able to filter the incoming files for fires within roughly 1.11 kilometers of each previously-detected fire (but more on that later…).
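The 1.11 kilometer figure can be checked with a few lines of arithmetic. This sketch assumes the commonly quoted value of about 111.32 kilometers per degree at the equator, and `cell_size_km` is a hypothetical helper. It also shows that the east-west size of a cell shrinks away from the equator, which barely matters for Indonesia but would for Alaska or Iceland.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator (assumption)


def cell_size_km(decimals, latitude_deg=0.0):
    """Approximate north-south and east-west size of one rounding cell, in km."""
    step = 10 ** -decimals  # two decimal places -> 0.01 degrees
    north_south = step * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = cell_size_km(2)
print(f"{ns:.2f} km x {ew:.2f} km")  # 1.11 km x 1.11 km
```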

Last, I needed to combine the rounded latitude and longitude columns and add them into a single “coords” column, which will be the index off which I’ll filter incoming fire detections.
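The cleaning steps described above might look something like the following in pandas. This is a sketch under assumptions, not the exact code behind this post: the `latitude`, `longitude`, and `type` column names are assumed to match the FIRMS CSV layout, the sample rows are invented, and `build_volcano_master` is a hypothetical name.

```python
import pandas as pd


def build_volcano_master(archive, decimals=2):
    """Keep type 1 rows, round the coordinates, and add a combined 'coords' key."""
    volcanoes = archive[archive["type"] == 1].copy()
    volcanoes["latitude"] = volcanoes["latitude"].round(decimals)
    volcanoes["longitude"] = volcanoes["longitude"].round(decimals)
    # A string key such as "-7.54,110.45" that later files can be matched against.
    volcanoes["coords"] = (
        volcanoes["latitude"].astype(str) + "," + volcanoes["longitude"].astype(str)
    )
    return volcanoes.reset_index(drop=True)


# Invented sample rows: two volcano detections and one vegetation fire.
archive = pd.DataFrame(
    {
        "latitude": [-7.5412, -0.5021, -8.2719],
        "longitude": [110.4457, 101.4478, 123.5051],
        "type": [1, 0, 1],
    }
)

master = build_volcano_master(archive)
print(master["coords"].tolist())
```

The same rounding has to be applied to the daily files later, otherwise the keys will never line up.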

With all the data cleaning done, I saved the new volcanoes master file to my project directory.

Next comes the fun part: seeing if we can actually use the volcanoes master file to detect active volcanoes in each near-real time set of fire data.

I began by importing the four relevant files: the volcanoes master file described above, as well as the three daily fires files scraped from the FIRMS website.

I concatenated the three daily fires files into one file to make it easier to work with, because one file is, uh, three times easier to manipulate than three files. I also cleaned the data in the same way as the volcano master data by rounding and combining the latitude and longitude columns into a third column, again called “coords”.
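One way the concatenation could be sketched in pandas is shown below. It tags every row with the file it came from, which is an extra step not described above but handy because the VIIRS and MODIS files report confidence on different scales. The column names, sample rows, and the `load_daily` and `source_file` names are all assumptions for illustration.

```python
import io

import pandas as pd


def load_daily(sources):
    """Read each daily CSV, tag its rows with a source label, and stack them."""
    frames = []
    for label, src in sources.items():
        frame = pd.read_csv(src)
        frame["source_file"] = label  # remember which instrument a row came from
        frames.append(frame)
    daily = pd.concat(frames, ignore_index=True)
    # Same two-decimal key as the volcanoes master file.
    daily["coords"] = (
        daily["latitude"].round(2).astype(str)
        + ","
        + daily["longitude"].round(2).astype(str)
    )
    return daily


# Invented one-row stand-ins for two of the daily files.
viirs = io.StringIO("latitude,longitude,confidence,frp\n-7.5412,110.4457,nominal,4.2\n")
modis = io.StringIO("latitude,longitude,confidence,frp\n-8.2719,123.5051,71,11.7\n")

daily = load_daily({"viirs_snpp": viirs, "modis": modis})
print(daily[["coords", "confidence", "source_file"]])
```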

Next, I converted the “coords” column in the volcanoes master file to a list and filtered the daily fires file down to rows whose “coords” value appears in that master list. If all goes according to plan, this step will be where the magic happens. It should return a short list of every coordinate pair at which a fire was detected in the previous 24 hours within roughly 1.11 kilometers of a previously-detected volcano in Indonesia.
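The matching step itself can be sketched with pandas’ `isin`, assuming both tables already carry a “coords” column built the same way. The rows here are invented and `match_known_volcanoes` is a hypothetical name.

```python
import pandas as pd


def match_known_volcanoes(daily, master):
    """Return the daily detections whose coords key appears in the master list."""
    known_coords = master["coords"].tolist()
    return daily[daily["coords"].isin(known_coords)]


# Invented data: two known volcano cells and three fresh detections.
master = pd.DataFrame({"coords": ["-7.54,110.45", "-8.27,123.51"]})
daily = pd.DataFrame(
    {
        "coords": ["-7.54,110.45", "-2.1,113.92", "-8.27,123.51"],
        "frp": [4.2, 9.8, 11.7],
    }
)

hits = match_known_volcanoes(daily, master)
print(hits)
```

Only the first and third rows survive. The middle detection falls in a cell where the archive never recorded a volcano, so it is treated as an ordinary fire.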

But, before getting there, I wanted to make the final product look prettier. To do so, I sorted the output by acquisition date and time so the volcano detection list would show the most recent detections at the bottom of the list. I also dropped duplicate detection rows and a bunch of irrelevant columns.
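A possible version of that tidy-up, again assuming pandas. The rows are invented, the choice of which columns to keep is a guess at what counts as relevant, and `tidy_detections` is a hypothetical name.

```python
import pandas as pd


def tidy_detections(hits, keep=("coords", "acq_date", "acq_time", "confidence", "frp")):
    """Drop duplicate rows, sort oldest to newest, and keep only the listed columns."""
    return (
        hits.drop_duplicates()
        # ISO dates and zero-padded HH:MM strings sort correctly as plain text.
        .sort_values(["acq_date", "acq_time"])
        .loc[:, list(keep)]
        .reset_index(drop=True)
    )


# Invented detections: the first and third rows are exact duplicates.
hits = pd.DataFrame(
    {
        "coords": ["-7.54,110.45", "-8.27,123.51", "-7.54,110.45"],
        "acq_date": ["2020-01-15", "2020-01-14", "2020-01-15"],
        "acq_time": ["18:06", "05:42", "18:06"],
        "confidence": ["nominal", "high", "nominal"],
        "frp": [4.2, 11.7, 4.2],
        "scan": [0.4, 0.5, 0.4],
    }
)

report = tidy_detections(hits)
print(report)
```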

After all that….would my code work….? Yes!!

There it is - a brief, succinct list of all the volcanoes detected in Indonesia in the past 24 hours. To validate the output, I took a few of the coordinate pairs and searched them on Google Earth to make sure that whatever was there was at, or at least near, a volcano.

Sure looks like those coordinates correspond to Lewotolo (top/oldest line in the output) and Mount Merapi (bottom/most recent line in the output) to me.

To double validate it, I checked Twitter to see if anyone was talking about eruptions at either volcano. Sure enough, users had posted about both Lewotolo and Mount Merapi erupting between February 25th and 27th:

Even better, this system downloads and re-saves the files under the same naming convention and file path every time the code is run, removing the need for me to delete and re-save files for the code to work. For example, I ran the same code 15 minutes after running it the first time and, without me doing anything different, it detected a new volcano:

For any aspiring volcanologists out there, the new coordinates correspond to an eruption at Mount Sinabung. According to local news, the volcano erupted on the 26th, but apparently it’s still erupting enough on the 27th to be detected by satellite.

Three brief notes about the output table. First, the “confidence” column refers to how certain NASA’s algorithms are that a fire was actually detected at that location. The VIIRS systems rank confidence on a low, nominal, and high scale, while the MODIS system ranks confidence from 0, meaning no confidence, to 100, meaning virtual certainty.

Second, the “frp” column refers to the fire’s radiative power, measured in megawatts. The higher the frp, the more heat energy the fire is radiating, which generally means a hotter or larger fire.

Third, FIRMS defaults the “acq_time” column to UTC, which is why it may look a little weird - after all, 18:06 hadn’t happened in my time zone by the time I ran the code. But don’t worry - the code will still detect near-real time volcanoes regardless of what time zone you’re in.
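If the UTC times are confusing, they can be shifted to local time with the standard library. This is a small sketch: the date is an arbitrary example, the UTC-5 offset is just a sample value, the `HH:MM` time format is assumed from the 18:06 example above, and `acquisition_to_local` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def acquisition_to_local(acq_date, acq_time, utc_offset_hours):
    """Combine a date and an HH:MM time given in UTC, then shift to a fixed offset."""
    utc_moment = datetime.strptime(
        f"{acq_date} {acq_time}", "%Y-%m-%d %H:%M"
    ).replace(tzinfo=timezone.utc)
    return utc_moment.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Arbitrary example values, not taken from the output table.
local = acquisition_to_local("2020-01-15", "18:06", utc_offset_hours=-5)
print(local.isoformat())  # 2020-01-15T13:06:00-05:00
```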

Although this machine is cool and all, there are a bunch of fun things I still want to do with the data - perhaps map the volcanoes by frp? Or turn the near-real time data into a Twitter bot? Add in other countries besides Indonesia? Either way, more to come for the mighty volcano detection machine.

Last, although I don’t have a GitHub account or anything, if you reply to this post, I’d be happy to get you the code I used for this. As I said above, if you have any thoughts, suggestions, or edits to the code I’ve shown here, I’m certainly all ears. Happy volcano hunting!