Resurrecting a project that almost made me insane
Greetings, old subscribers and new! Line of Actual Control is fresh off being featured on Substack’s Discover page last week, so I offer a sincere welcome to everyone who found the blog through that. If you like today’s post, feel free to share or subscribe at the bottom. Now onto the post.
Today, I am proud to introduce @volcano_bot!
It scrapes data from NASA’s FIRMS site to find new volcano eruptions in Indonesia and tweet them out in close to real time. Follow it, mock it, retweet it, or DM me with suggestions and technical support for it (Lord knows I need it...).
Or simply say “thank you, volcano_bot, for your tweets and for not crashing”
“Longtime” readers of this blog may remember my post from early March about using NASA’s FIRMS fire data and Python to build a system that detects volcanoes in Indonesia in as close to real time as possible.
I don’t want to rehash the whole piece, but the gist is that I used NASA’s archived “thermal anomaly” (i.e. fires) satellite data to figure out where in the world volcanoes are located and then used NASA’s near-real time fires data to flag fires in those locations (which, 99.99% of the time, are actively erupting volcanoes).
While that system was all well and good, it wasn’t exactly user friendly. It spat out tables of data that were chunky, non-chronological, and just not very pretty to look at.
To paraphrase the Simpsons: “Smithers, are they booing me [because this table looks so bad]?”
You also needed the code itself to make it work. Since I wrote it in a Jupyter notebook, there was no way (as far as I knew) to keep it running, and therefore detecting volcanoes, once I shut my computer down or closed the notebook. To get the full value of the method, I needed a way to provide a permanent, running tally of the volcanoes detected in Indonesia.
Enter the Twitter Bot
I love a good Twitter bot: AggiBot, Dictator Alert, Editing the Gray Lady - I love them all. Twitter’s open API and public data sets let even coding neophytes build bots like these that tweet rich, unique information on a regular schedule. And, as it turns out, a cloud-native Twitter bot is a perfect way to track and publish the location of volcanoes in near-real time.
There were only two problems: I had never built a Twitter bot before, nor had I ever programmed on the cloud.
But with a few how-to guides and a lot of trial-and-error, neither obstacle was insurmountable.
Before embarking on my journey to the cloud, I realized that a lot of my code from February was far more complicated than it needed to be. To take just one example, I initially spent hours trying to scrape the FIRMS website and save the results to a csv file. In reality, all I needed to do was use one of the most basic Pandas/Python commands out there: pd.read_csv()
In doing so, I went from this…
After a few other similar edits, I had my code condensed and simplified enough to bring to the cloud.
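As a rough sketch of that pd.read_csv() shortcut, the snippet below reads a FIRMS-style CSV straight into a DataFrame. The URL is a placeholder rather than the real FIRMS address, and the column names and sample rows are made up for illustration, so treat this as an example under those assumptions and not as the bot's actual code.

```python
import io

import pandas as pd

# Placeholder, NOT the real FIRMS download address.
FIRMS_CSV_URL = "https://example.com/firms/latest_24h.csv"

# Tiny stand-in for the downloaded file so the sketch runs offline.
# Column names and values are illustrative assumptions.
SAMPLE_CSV = """latitude,longitude,acq_date,acq_time
-8.108,112.922,2021-01-01,0615
-2.500,110.000,2021-01-01,0620
"""


def load_fires(source):
    """Read fire detections from a URL, file path, or file-like object."""
    # Reading acq_time as text keeps leading zeros ("0615") intact.
    return pd.read_csv(source, dtype={"acq_time": str})


fires = load_fires(io.StringIO(SAMPLE_CSV))
print(fires.shape)
```

Because pd.read_csv() accepts a URL as well as a file, swapping io.StringIO(SAMPLE_CSV) for a real CSV address should be all it takes to pull live data.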
Enter the Cloud
I evaluated a few different cloud options but eventually settled on Google Cloud because it offered tons of how-to guides, allowed for the easy import of packages required by the code, and, perhaps most importantly, came at the attractive price point of “free”.
I’m not sure how computer science professors describe the cloud, but to me, it’s basically like if your home machine was a million times more powerful, always turned on, included a mind-boggling amount of storage, and was located in a server bank in the kind of wide open spaces that also host the US National Security Agency’s nerve center. Cloud computing can be accessed on your home machine through online services provided by Google, Amazon, or other tech companies.
For my purposes, I would need three Google Cloud tools: storage, a function, and a scheduler.
First (and most straightforward) is storage. To figure out where historic volcano eruptions have occurred, we need to reference a 45,000-plus line file containing the location of every volcano eruption detected in Indonesia since 2012. I uploaded that file to a Google Cloud Storage bucket without any issues:
Second (and least straightforward) is the function - this is where the magic happens. After setting some basic parameters for my Google Cloud Function (naming conventions, environment variables, figuring out what the heck a requirements.txt file was, etc.), I copied my Python 3.7 code from my Jupyter notebook and pasted it into the function.
First, it imports all the required packages, sets environment variables, and authorizes Tweepy, which is the Python library that will actually let volcano_bot tweet.
(for the record, these screenshots don’t show the whole code - if you want the whole thing, just let me know and I’ll send it to you)
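For anyone curious what that setup step can look like, here is a minimal sketch of reading Twitter credentials from environment variables and handing them to Tweepy. The environment variable names are hypothetical, and the Tweepy calls assume the OAuth 1 style interface (OAuthHandler, set_access_token, API), so this is one plausible arrangement, not necessarily the one volcano_bot uses.

```python
import os

# Hypothetical variable names; use whatever you set in the Cloud Function.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Collect the four credentials, failing loudly if any is missing."""
    missing = [name for name in ENV_NAMES.values() if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}


def build_api(creds):
    """Return an authorized Tweepy client for posting tweets."""
    import tweepy  # imported here so load_credentials works without Tweepy

    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    return tweepy.API(auth)
```

With the variables set, something like api = build_api(load_credentials()) followed by api.update_status(status="hello") would post a tweet. Keeping the keys in environment variables means they never have to appear in the code itself.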
Next, it scrapes the FIRMS site for the most recent fire detections, imports the volcanoes database from the storage bucket I mentioned above (which is the bit in the red box below), and cleans the data so all of the files are in the same format.
Third, it filters the near-real time fires data for locations in which volcanoes had historically been observed, saves that filtered data in a new table, cleans and organizes it, plugs the locations into a common tweet format (“Volcano detected at [time in UTC] on [date] at lat/long [location in latitude, longitude format]”) and finally tweets it out for the whole world to see.
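The filter-and-format step can be sketched roughly as follows. I am assuming here that a fire "matches" a volcano when the two share the same latitude and longitude after rounding to two decimal places, and that the tables use columns named latitude, longitude, acq_date, and acq_time. Both are assumptions for illustration, and the sample rows are made up.

```python
import pandas as pd


def add_grid_key(df, decimals=2):
    """Add integer lat/long keys so nearby coordinates compare exactly."""
    scale = 10 ** decimals
    out = df.copy()
    out["lat_key"] = (out["latitude"] * scale).round().astype(int)
    out["lon_key"] = (out["longitude"] * scale).round().astype(int)
    return out


def find_volcano_fires(fires, volcanoes):
    """Keep only fires whose rounded location appears in the volcano table."""
    known = add_grid_key(volcanoes)[["lat_key", "lon_key"]].drop_duplicates()
    matched = add_grid_key(fires).merge(known, on=["lat_key", "lon_key"], how="inner")
    return matched.sort_values(["acq_date", "acq_time"]).reset_index(drop=True)


def format_tweet(row):
    """Fill the tweet template from one detection row."""
    time_utc = f"{row['acq_time'][:2]}:{row['acq_time'][2:]}"
    return (
        f"Volcano detected at {time_utc} UTC on {row['acq_date']} "
        f"at lat/long {row['latitude']:.2f}, {row['longitude']:.2f}"
    )


# Made-up rows for illustration only.
volcanoes = pd.DataFrame({"latitude": [-8.108], "longitude": [112.922]})
fires = pd.DataFrame(
    {
        "latitude": [-8.112, -2.500],
        "longitude": [112.918, 110.000],
        "acq_date": ["2021-01-01", "2021-01-01"],
        "acq_time": ["0615", "0620"],
    }
)

hits = find_volcano_fires(fires, volcanoes)
tweets = [format_tweet(row) for _, row in hits.iterrows()]
print(tweets)
```

Only the first sample fire lands on the known volcano location, so only one tweet is built. Converting coordinates to integer keys before merging sidesteps the usual headaches of comparing floating point numbers directly.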
After all that, would my code work?
Nope! Here is but a sampling of the errors my code threw in the many, many times I tried to get it to work:
I kid you not, the code once threw nearly 2,000 errors in a single run. Truly impressive stuff. But perseverance paid off! After not one, not two, but, uh, thirty tries, my code successfully deployed in the cloud and ran without errors.
I wasn’t joking - it literally took 30 tries.
The logs finally showed a 200 status code, indicating that the code ran fine.
I manically checked the bot’s Twitter account to confirm that no, my eyes were not deceiving me, and yes, the bot had worked as intended.
Following that absolute trial, the final step was a breeze. The last thing I needed to do was use Google Cloud Scheduler to schedule when the bot would run.
I quickly wrote a job that would trigger the bot to check for new Indonesian volcano eruptions every 15 minutes.
The satellites NASA uses only pass over Indonesia two or three times a day, meaning that even though the scheduler will check for new volcanoes every 15 minutes, the bot will only tweet when the satellites detect new volcanoes after each overpass.
Since Twitter doesn’t allow bots to tweet the same tweet twice in a row, the code will technically crash most times it runs. This is because, until the next satellite overpass, each 15-minute run will only detect volcanoes that the bot has already tweeted about (which would result in a duplicate tweet). But that’s fine by me, as long as the scheduler re-tries the code every 15 minutes until it detects new eruptions.
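If the crashing ever becomes a nuisance, one alternative (not what volcano_bot does today) would be to remember which tweets have already gone out and skip them. The sketch below uses a hypothetical post callable as a stand-in for the real Twitter call and an in-memory set for the history. In a real Cloud Function that set would have to be saved somewhere between runs, such as a file in the storage bucket, which is an assumption I have not tested.

```python
def post_new_tweets(messages, already_sent, post):
    """Post each message not seen before and return the ones actually posted.

    `post` is any callable that sends one tweet (a hypothetical stand-in
    for the real Twitter call); `already_sent` is a set updated in place.
    """
    posted = []
    for message in messages:
        if message in already_sent:
            continue  # would be rejected as a duplicate, so skip it
        post(message)
        already_sent.add(message)
        posted.append(message)
    return posted


# Example run with a fake "post" function that just records its input.
outbox = []
seen = {"Volcano detected at 06:15 UTC on 2021-01-01 at lat/long -8.11, 112.92"}
new = post_new_tweets(
    [
        "Volcano detected at 06:15 UTC on 2021-01-01 at lat/long -8.11, 112.92",
        "Volcano detected at 18:40 UTC on 2021-01-01 at lat/long -8.11, 112.92",
    ],
    seen,
    outbox.append,
)
print(new)
```

Here the 06:15 message is skipped because it is already in the history, and only the 18:40 message is posted.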
To close things up, let’s take a look at a volcano the bot detected here at -8.11, 112.92.
A quick social media search confirms that Mt. Sinabung, the volcano located there, had indeed erupted on April 20:
So there you have it - a happy little Twitter bot hanging out in the cloud, tweeting about volcanoes in Indonesia. As always if you liked, hated, or have feedback for this piece, don’t hesitate to comment below, share it with your friends, or reach out (to me, not to volcano_bot) on Twitter!