The Image Recognition Files: Part 2
Highlights from today’s post:
Revisiting my old friend: the warship image recognition model
Some basic mistakes I made in machine learning
The first version of a Twitter bot I made to tweet when it “sees” a naval vessel on a port webcam
But first! Need to convert geo coordinates into useful location information?
Also, a brief programming note: I know I missed last week’s edition and I may miss another edition or two over the next few weeks. Work is pretty busy for me now, I’m working on a few side projects (that hopefully I’ll be able to share soon!), and I’m also doing some traveling. So thanks as always for your patience.
Before we start, let me say that it gives me great pleasure to introduce…Shipspotter 1.0:
In case you haven’t read the first part of this piece, I recommend checking it out here. To recap, I trained a machine learning model to detect when naval vessels sail past port webcams. I also asked for your feedback on where you wanted to see these predictions (a website, Twitter bot, etc.) and which areas of the world they should focus on.
However, taking the mini-model running on my local machine to a more robust model running permanently, cheaply, and in the cloud was not the easiest task. Let’s dive in.
Enter the Cloud
After my previous post, we were left with a warship prediction model that was trained on @WarshipCam’s tweets, could generalize predictions pretty well, and was not location specific. I thought it would be a relatively simple matter of taking that model file, putting it into Google Cloud Storage, [insert witchcraft here], and voila, I would have a happy little Twitter bot making predictions in the cloud. To illustrate why that was clearly not the case, let’s examine my requirements. My solution had to:
Focus on webcams in Western Europe that were not videos. Still images only!
[obviously] Give correct predictions of warship/not warship
Tweet those predictions without running afoul of any of Twitter’s bot limits
Run without any intervention whatsoever by me
Be cheap (I was not going to spend more than a few bucks per year on it)
So let’s take those in order. First: the webcams.
Part of the reason this post was delayed is that I made an absolutely tragic coding error that I noticed way too late. My first Twitter bot used Selenium - essentially a Python package that allows you to (among many other things) take screenshots of webpages. While that worked fine and dandy on my local machine, I soon found out it wouldn’t work at all in Google Cloud[1].
This bug matters because it meant I would be unable to make predictions on screenshots of moving images[2]. Instead, I would now have to make predictions on port webcams consisting of jpeg images that refresh every few seconds[3]. But I’ll get to that in a moment.
I settled on three locations in Western Europe: Rostock, Germany; Kiel, Germany; and Nuuk, Greenland. Each hosts plenty of these jpeg-based webcams and is located near shipping or naval facilities.
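The post doesn’t show the scraping code, but polling one of these refreshing jpeg webcams needs nothing beyond the standard library. Here’s a minimal sketch - the URL is hypothetical, and the cache-busting `ts` parameter is an assumption about how such cameras serve fresh frames:

```python
import time
import urllib.request

def snapshot_url(base_url: str, ts: int) -> str:
    """Append a cache-busting timestamp so each poll fetches a fresh frame."""
    sep = "&" if "?" in base_url else "?"
    return f"{base_url}{sep}ts={ts}"

def fetch_frame(base_url: str) -> bytes:
    """Download the webcam's current jpeg frame as raw bytes."""
    with urllib.request.urlopen(snapshot_url(base_url, int(time.time()))) as resp:
        return resp.read()

# Example loop (hypothetical URL), polling every 30 seconds:
# while True:
#     frame = fetch_frame("https://example-port-cam.de/current.jpg")
#     time.sleep(30)
```

In practice you’d also want a timeout and some error handling, since port webcams go offline fairly often.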
Next — the predictions. Getting correct predictions from the model proved to be (and remains) the most difficult part of this project. The more I thought about it, the more I realized this project would be much more like anomaly detection than traditional classification. After all, 99.9% of objects that pass in front of these webcams won’t be naval vessels. How do you train a model to flag that rare 0.1%, avoid mislabeling civilian ships as naval vessels, and miss as few real warships as possible?
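The post doesn’t say whether the training loss itself was adjusted for this imbalance, but a standard complement to duplicating rare examples is inverse-frequency class weighting. A toy sketch of what those weights look like for a 999:1 split:

```python
def class_weights(counts: dict) -> dict:
    """Weight each class inversely to its frequency so the rare class
    (here, warships) contributes more to the training loss."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# With 999 civilian frames for every 1 warship frame:
weights = class_weights({"civilian": 999, "warship": 1})
# weights["warship"] == 500.0, versus roughly 0.5 for "civilian"
```

Most frameworks accept a dictionary like this directly (for example as a per-class loss weight), so the rare class isn’t drowned out during training.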
A machine learning rule of thumb is that if a human can’t decide within a second or two which category an image belongs to, an ML model won’t be able to either. And, much to my chagrin, it turns out there are lots of things that look just enough like warships to fool the model. Even for a human, the task can be difficult: can you pick out which of the ships in the images below is a naval vessel, which is a ferry, which is a sailboat, and which is a fishing boat?[4]
With that challenge in mind, I used three strategies to boost the model’s accuracy. First, I actually made three models - one for each location - each trained on naval and civilian vessels spotted in that area’s webcams. This tactic helps each model learn which civilian vessels it is most likely to “see” in its area and exclude them from naval predictions. It also keeps the model from confusing civilian vessels or extraneous objects in one location with a warship in another. That kind of misclassification happened far more often than you might think, particularly with cruise ships and ferries.
The second strategy involved introducing copies of images of naval vessels spotted in each location[5] to help the model “learn” what naval vessels actually look like. I am, once again, deeply indebted to @WarshipCam for collecting, tweeting, and allowing me to use photos of naval vessels he spotted in each of the three webcam locations to train these models.
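This copy-introducing tactic is the “oversampling” mentioned in the footnote. As a toy stdlib sketch (the filenames are hypothetical), it amounts to duplicating minority-class samples until the two classes are balanced:

```python
import random

def oversample(minority: list, majority: list, seed: int = 0) -> list:
    """Duplicate minority-class samples (with replacement) until the two
    classes are the same size, then return the combined training set."""
    rng = random.Random(seed)
    copies = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + copies

# One warship image versus three civilian images:
balanced = oversample(
    ["warship_1.jpg"],
    ["ferry_1.jpg", "ferry_2.jpg", "ferry_3.jpg"],
)
# balanced has 6 entries, with "warship_1.jpg" appearing 3 times
```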
Finally, as mentioned in the first post, I also significantly augmented each warship image by tilting it slightly, flipping it horizontally, altering the colors a little, and so on. These augmentations help the model generalize: a warship photographed from two different angles, for instance, should still be recognized as a warship.
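In a real training pipeline these transforms typically come from a library such as torchvision or Keras preprocessing layers; purely as a toy illustration, here are two of them (horizontal flip and brightness jitter) applied to a tiny grayscale “image” represented as nested lists:

```python
def hflip(image):
    """Mirror each row, as if the ship were sailing the other way."""
    return [row[::-1] for row in image]

def brighten(image, factor):
    """Scale pixel intensities, clamped to the 0-255 range."""
    return [[min(255, int(px * factor)) for px in row] for row in image]

# A tiny 2x3 grayscale "image" and two augmented variants:
img = [[10, 20, 30],
       [40, 50, 60]]
flipped = hflip(img)           # [[30, 20, 10], [60, 50, 40]]
brighter = brighten(img, 1.5)  # [[15, 30, 45], [60, 75, 90]]
```

Each augmented copy is a “new” training example, which is why a handful of warship photos can go a surprisingly long way.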
Next up is the Twitter bot itself. While it was kind of a pain to create (I had to have a pretty lengthy email exchange with the good people who approve Twitter bot accounts), I finally ended up getting it running. There was also the whole authorization issue, which, while it briefly drove me mad, ended up getting happily resolved:
After a bit of tinkering, I got the bot set up on my local machine and tweeting correct-ish predictions. But the fact that it was tweeting at all allowed me to move onto the next step: hosting the model in the cloud.
To do so, I explored a few Google products. I chose Google because it has a solid free tier, great documentation, lots of tutorials for newbies like me, and plenty of options for hosting ML models.
I first looked at Vertex AI, Google’s purpose-built machine learning platform. While it has a lot of neat and out-of-the-box ML features, I soon realized it would be way too expensive to host a model on Vertex AI and get predictions from it. Hosting just one of my models - sans predictions - on Vertex AI for only two days cost me $12! That would work out to thousands of dollars per year. Highway robbery for someone like me!
So I started casting around for cheaper options and eventually landed on what’s known as serverless architecture. This would make my process slightly more complicated and error-prone but would be cheap as dirt. Here’s how it looks in practice:
I uploaded my three locally trained models to a Google Cloud Storage bucket and wrote one Google Cloud Function (essentially a snippet of code that runs on a trigger) for each location. Each function scrapes the most recent jpeg images from its location’s webcams, loads the relevant model from Cloud Storage to ask whether an image contains a warship, and, if so, tweets the image out. I also wrote three separate Cloud Scheduler tasks to run each workflow every few minutes during daylight hours in each location.
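The post doesn’t include the function code, so the following is a hypothetical sketch of how one location’s workflow might be wired together. `load_model`, `fetch_frame`, and `post_tweet` are stand-ins for the real TensorFlow/HTTP/Twitter calls, the bucket path and webcam URL are made up, and the 0.90 confidence threshold is an assumption:

```python
THRESHOLD = 0.90  # assumed cutoff; the post doesn't state one

def should_tweet(confidence: float, threshold: float = THRESHOLD) -> bool:
    """Only tweet predictions above a confidence cutoff to reduce false positives."""
    return confidence >= threshold

def check_webcam(event, context):
    """Entry point a Cloud Scheduler job would invoke every few minutes."""
    model = load_model("gs://example-bucket/kiel_model.h5")            # hypothetical path
    frame = fetch_frame("https://example-port-cam.de/current.jpg")     # hypothetical URL
    confidence = model.predict(frame)
    if should_tweet(confidence):
        post_tweet(frame, confidence)
```

Keeping the threshold check in its own small function makes it easy to tune as false-positive feedback comes in.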
In contrast to Vertex AI, the serverless solution has cost me exactly $1.35 over the past month of beta testing. This includes hosting the models, running the functions hundreds of times per day, and scheduling each task.
While the model isn’t perfect - in fact, it still makes a fair number of incorrect predictions - it does regularly and correctly identify warships in webcams.
So let’s look at some ships.
The model’s very first correct prediction was especially encouraging because it shows a civilian vessel and a naval vessel in the same frame near Kiel, Germany. The model was able to pick out the naval vessel without getting confused by the civilian one (you have to click on the tweet to see the naval vessel at the far left).
A few days later, it also picked out a much clearer image of the German Navy minesweeper Siegburg near Kiel:
And between April 25 and April 26, it caught a few good shots of the Danish Navy patrol vessel HDMS Lauge Koch docked in Nuuk, Greenland.
If you go digging through the account, you’ll find a few other interesting spots (more German naval vessels, a police or coast guard boat, etc).
But as I’m sure you can tell, this is only the first version of the bot. I’d love some feedback from you all on a couple of things. First: is the confidence percentage helpful or distracting? I think it gives a good indication of what’s going right (or wrong!) with the model’s predictions, but not everyone may feel the same.
Is the bot tweeting too much? Too little? If the former, what would be a good cadence? If the latter, are there other jpeg-based webcams I should take a look at - either in Europe or elsewhere?
Finally, if you’re a Python guru, is there a cloud-based way to use Selenium (or a similar package) to grab images of video webcams? Doing so would allow me to dramatically expand the number and range of port webcams I’m currently scraping.
Naturally, I’ll also keep tweaking and refining the model itself so that the false positives gradually diminish. If you have any feedback at all, hit my Twitter DMs, email me at lineofactualcontrol @ protonmail dot com, or just drop a comment on this post.
In the meantime, feel free to follow, like, retweet, or cyberbully Shipspotter 1.0!
[1] See here for all the nerds (myself now included) who have been mad about this for the past three years.
[4] Left to right: ferry, sailboat, fishing boat, naval vessel, assorted other sailboats
[5] Google tells me this tool is known as “oversampling”