Good Hits and Splashes
A data-driven look at the failures of “the most precise air campaign in history”
Highlights from today’s post:
A closer look at the New York Times’ data on civilian casualties from US airstrikes in Iraq and Syria
How open-source methods can enhance good old-fashioned shoe-leather reporting
Data scraping, visualization, and analysis showing the errant airstrikes that targeted Mosul, Iraq and Raqqa, Syria
On December 18, the New York Times published a landmark investigation into civilian casualties caused by US airstrikes during the war against ISIS in Syria and Iraq[1]. The story outlines how the military’s processes for approving, launching, and investigating airstrikes were nowhere near robust enough to prevent innocent people from getting hurt or killed.
“The promise was a war waged by all-seeing drones and precision bombs. The documents show flawed intelligence, faulty targeting, years of civilian deaths — and scant accountability”
-The New York Times’ Civilian Casualty Files
Among the most compelling details in the story are hundreds of US Department of Defense internal reports and press releases published in the wake of airstrikes that allegedly harmed civilians. In a move of genuine transparency, the team of reporters who worked on the story made all of these documents and highlights from them — like the strike’s date, location, and casualties — public.
As strong as the Times’ story is, it’s a bit difficult to make sense of any trends in the data. There aren’t any charts to summarize significant details or graphs to visually represent important themes. To remedy that, I scraped the airstrike data presented in the Times’ tables and chopped it up to see what I could find.
Scraping The Site
I wrote a tweet thread on this the other day, but the bottom line is the New York Times’ site is not particularly easy to scrape.
Modern web pages are built from text, images, and other pieces of HTML content wrapped in various “tags”. Images are represented by the <img> tag, section headers by heading tags such as <h1> and <h2>, text by the <p> (paragraph) tag, and so on.
Tables found on websites are often structured using <table> tags. However, the Times’ airstrike tables are not organized by <table> tags[2]. Instead, their content is organized by <div> tags, which, on the Times’ page, simply separate pieces of <p> tagged text from other pieces of <p> tagged text.
In other words, each individual data point has no defined, tabular relationship with any other row, column, or data point in the rest of the table. By contrast, here’s a (simplified) diagram showing what that same table organized under a <table> tag would look like.
With a <table> tag, scraping is relatively simple. A programmer can scrape all content that falls under the <table> tag as one unit with the pandas function pd.read_html() and call it a day[3]. With the <div> and <p> tags on the Times’ site, however, it’s a different matter. The programmer instead has to explicitly tell the scraper where to find the content (<div> tags) they’re interested in, what it looks like, and where to put it[4]. For data like the Times’ table, finding the correct <div> tags to scrape can be a tall order since there can be dozens - if not hundreds - of tags per web page.
I wrote the code below to square that particular circle. It essentially reads the page at the Times’ URL, organizes its content by HTML tag, selects the text under the appropriate <div> tags, and exports that text to a pandas DataFrame.
Running the code produces a clear, organized table that I opened in Google Sheets for further analysis.
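The original script isn't reproduced in this text, but the general approach described above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the code used for this post: the sample markup, the "strike-row" class name, the column names, and the helper names (DivRowParser, scrape_rows) are all hypothetical, and the Times' real page is structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a <div>-based "table".
# The class name and the placeholder cell values are assumptions for illustration.
SAMPLE_HTML = """
<div class="intro"><p>Text that is not part of the table</p></div>
<div class="strike-row"><p>date-1</p><p>place-1</p><p>count-1</p></div>
<div class="strike-row"><p>date-2</p><p>place-2</p><p>count-2</p></div>
"""


class DivRowParser(HTMLParser):
    """Collect the <p> text inside every <div> that carries a given class."""

    def __init__(self, row_class):
        super().__init__()
        self.row_class = row_class
        self.rows = []          # one list of cell strings per matching <div>
        self._current = None    # cells of the row being read, or None
        self._depth = 0         # <div> nesting depth inside the current row
        self._in_p = False      # True while inside a <p> that belongs to a row

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self._current is None and self.row_class in classes:
                self._current = []      # a new row starts here
                self._depth = 1
            elif self._current is not None:
                self._depth += 1        # nested <div> inside a row
        elif tag == "p" and self._current is not None:
            self._in_p = True
            self._current.append("")    # start a new cell

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
        elif tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:        # the row's own <div> just closed
                self.rows.append([cell.strip() for cell in self._current])
                self._current = None

    def handle_data(self, data):
        if self._in_p and self._current is not None:
            self._current[-1] += data   # accumulate text for the current cell


def scrape_rows(html, row_class, columns):
    """Return one dict per matching <div>, pairing each <p> with a column name."""
    parser = DivRowParser(row_class)
    parser.feed(html)
    return [dict(zip(columns, row)) for row in parser.rows]


records = scrape_rows(SAMPLE_HTML, "strike-row", ["date", "location", "casualties"])
for record in records:
    print(record)
```

The key difference from the <table> case is visible in scrape_rows: the column names have to be supplied by hand, because nothing in the markup says which <p> is a date and which is a location. A list of dicts like records can then be passed to pandas.DataFrame(records) to get a table for further analysis.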
Strike Data
The US Department of Defense divides reports of civilian casualties into those it deems credible and those it deems non-credible. Both types are presented in the Times’ report and I scraped both data sets into the Google Sheet.
Here is every single one of the DOD’s credible reports of civilian casualties during the four-and-a-half years covered by the Times’ investigation.
2017 was clearly the deadliest year of the air war for civilians, which corresponds with the coalition’s dual assaults on the ISIS-held cities of Mosul, Iraq, and Raqqa, Syria. The DOD assessed that, on average, about five civilians were killed and three were injured per report, although there were several outlying instances in which 40, 70, or even up to 105 civilians were killed in strikes.
Put differently, the year between mid-October 2016 and mid-October 2017 was so hazardous for civilians in Iraq and Syria that there were, on average, more than three separate reports of civilians hurt or killed in airstrikes per day[5].
But where were these errant airstrikes actually occurring? In short, in Mosul, Iraq and Raqqa, Syria. These two cities experienced more airstrikes and civilian casualties than all remaining locations with civilian casualty allegations combined. The DOD assessed that 988 civilians were killed or injured in Mosul and Raqqa, while 762 civilians were killed or injured in the remaining 78 locations targeted by airstrikes that, according to the DOD, credibly caused civilian casualties.
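The split between the two cities and everywhere else can be checked with a few lines of arithmetic. The sketch below uses only the DOD-assessed credible figures quoted in this post; the variable names are illustrative.

```python
# DOD-assessed credible civilian casualties (killed or injured), as quoted above.
mosul_raqqa = 988        # Mosul and Raqqa combined
elsewhere = 762          # the remaining locations combined
other_locations = 78     # number of remaining locations

total = mosul_raqqa + elsewhere
share = mosul_raqqa / total
per_other_location = elsewhere / other_locations

print(f"Total credible casualties: {total}")                     # 1750
print(f"Share in Mosul and Raqqa: {share:.1%}")                  # 56.5%
print(f"Average per remaining location: {per_other_location:.1f}")  # 9.8
```

In other words, two cities account for a little over half of all credibly assessed casualties, while the other 78 locations average fewer than ten each.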
Below are two maps showing the intense concentration of airstrikes in the two cities. The red circles in the top map are sized by the overall number of civilians hurt and killed (assessed as credible by the DOD) in each location. The bottom map shows the raw number of airstrikes (both credible and non-credible) that allegedly harmed civilians in each location.
Note the differences in the two maps. Far more allegations of civilian casualties were made in rural areas (i.e. the second map, showing plenty of red dots outside major urban centers) than the DOD found credible (i.e. the first map, showing the DOD found civilian casualties from airstrikes were mainly concentrated in Mosul, Raqqa, and a string of villages in southwest Deir Ez Zor).
Does this mean the DOD was miscounting or ignoring civilian casualties from airstrikes that weren’t witnessed by hundreds of people in cities? No, but the discrepancy between rural and urban civilian casualties is a trend that merits further study.
By either metric, innocent people in Mosul and Raqqa were targeted by an astonishing amount of indiscriminate aerial death and destruction.
Of course, there are confounding factors at play. For one, airstrikes in cities are inherently more visible than those in rural areas. If more people witness an airstrike - especially one that kills civilians - it is more likely to be reported and therefore investigated by the DOD.
Additionally, these airstrikes are only those that the DOD itself has assessed as having harmed civilians. Other observers, like the Times and the British NGO Airwars, have compiled datasets showing considerably more civilians were harmed than the DOD claims. Since the DOD is, presumably, interested in protecting both its reputation and the narrative that the counter-ISIS campaign was, according to Barack Obama, “the most precise air campaign in history”, the DOD’s figures should be seen as the absolute lowest estimate of civilian casualties in Iraq and Syria.
Finally, these figures only include US airstrikes and leave out those conducted by coalition partners like the UK, Netherlands, and Belgium. While the US is responsible for the bulk of the airstrikes against ISIS (and the bulk of the resulting civilian casualties), other coalition members occasionally launched incredibly deadly airstrikes as well.
While the coalition’s military campaign must be commended for destroying ISIS, killing its leaders, and freeing millions from the group’s reign of terror, the same forces must also be held accountable for the astonishing number of innocent people who lost their lives during the course of the operation.
The Times’ investigation into the failures of the US-led air war in Syria and Iraq bolsters the causes of transparency and accountability in military operations. Hopefully this blog post served to enhance their methods and illustrate wider trends represented by the Times’ compelling accounts of people hurt and killed in the effort to topple ISIS.
[2] For further reading, there’s a great Stack Exchange thread on why <table> tags seem to be disappearing. There are a few interesting possibilities - one of which is that non-table-tagged content is easier to display on smartphones, tablets, and smaller screens!
[3] Which is, in fact, exactly what I did for this piece about Shell’s oil spill data.
[4] I am deeply indebted to @RichterRatio on Twitter for the double checks and keen editorial eye in this section. Follow them for a good (and educational!) time.
[5] These dates correspond to the beginning of the Battle of Mosul in October 2016 and the end of the Battle of Raqqa in October 2017.