In this post we’ll delve into the data science portion of “Ethics in Data Science”. We’ll look at our current system of self-regulation, possible models of external regulation, and the implications for ethical data scientists. This is the second in a series. If you haven’t already, catch up and read part one.
Uncle Bob’s talk
Robert Cecil Martin aka Uncle Bob gave a great talk in May 2016 titled “The Future of Programming.”
One of the main stories he shares in his talk is that of the Volkswagen scandal. For those who don’t remember, in 2015 the public found out that Volkswagen had added illegal software to its cars. The ‘cheat device’ was used to change the behavior of the car when it underwent emissions tests. This way the car was able to pass as a ‘clean air’ emissions diesel vehicle, when in fact it was not.
Subsequently Michael Horn, CEO of VW’s American division, testified at a congressional hearing in October 2015 and deflected the issue. As this was first coming to light, Horn said “This was not a corporate decision, from my point of view, and to my best knowledge today…This was a couple of software engineers who put this in for whatever reasons.”
They Probably Knew
Martin brought this statement up in “The Future of Programming.” He said:
“The weird thing is, it was software developers who wrote that code. It was us. Some programmers wrote cheating code. Do you think they knew? I think they probably knew.”
Martin was right that they “probably knew.” His talk happened around May 2016. Four months later, Ars Technica published an article about an engineer at Volkswagen who pleaded guilty in the emissions scandal.
As part of his plea bargain and agreement to cooperate with authorities, the truth was disclosed. The use of a defeat device to beat emissions tests was in the final calibration and refinement stages in 2008. Subsequently it began to show up in 2009 model VW diesels. Therefore it was likely part of the conversation when the project began in 2006. In short, this scandal was probably created and hidden for ten years.
Data Scientists as Ethical Gatekeepers
In his talk, Martin said “software developers wrote cheating code.” It wasn’t just data scientists and software developers who knew about this cheat. Far more people were involved in the scandal than just a couple of software developers, as Horn insinuated.
Someone had to write the code. People had to take the vehicles out on the road and test drive them to make sure the test mode and the normal mode both worked properly. There were even bugs where a car in normal use would get “stuck” in the maintenance mode, and a patch had to be pushed to fix it. A lot of people were involved in this.
Despite the many players involved, responsibility for the tech that made the cheat possible falls squarely on data scientists. The cheat code was written by a data scientist, or made possible by one.
Martin would have been more accurate to say “data scientists wrote cheating code” or “software developers wrote cheating code using data science.” That is actually more precise, and even more concerning than a mere software issue.
Data Scientists’ Scope and View
This Venn Diagram is derived from a similar one Drew Conway posted online in 2013. It describes what makes a data scientist (or a big data engineer). Only “Software Engineering” is highlighted because that was the message of Uncle Bob’s talk.
This is a data science issue. The reason can be found in an article the Washington Post published to explain the tech behind the scandal.
By measuring how long the engine was running, the vehicle’s speeds, and even seemingly esoteric factors, such as the position of the steering wheel and barometric pressure, Volkswagen vehicles could understand they were being tested and so adjusted their engine performance to pass the tests, according to the EPA.
Volkswagen was a Data Science Scandal
That’s why I think of the Volkswagen incident as a data science scandal. The individuals involved had domain expertise in the auto industry. They had the skills to write the code. Additionally, they would need to use math and statistics to develop the model to detect the difference between a pattern that resembled normal use and that of a vehicle being tested.
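To make concrete how little it takes to tell those two patterns apart, here is a minimal sketch in Python. It is not Volkswagen’s actual logic, which as far as I know has not been published in this form. The `Sample` fields, the function name, and both thresholds are hypothetical, and the sketch uses only two of the signals the Washington Post mentions (speed and steering wheel position).

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Sample:
    """One telemetry reading. Both fields are hypothetical, for illustration."""
    speed_kmh: float
    steering_deg: float


def looks_like_lab_test(trace, min_speed_kmh=10.0, max_steering_spread_deg=1.0):
    """Return True when the car is moving but the steering wheel barely moves.

    On a test rig the wheels spin while the steering stays centered; on a
    real road the steering angle varies. The thresholds are made-up values.
    """
    moving = [s for s in trace if s.speed_kmh >= min_speed_kmh]
    if len(moving) < 2:
        return False  # not enough data to say anything
    # Population standard deviation of steering angle while moving.
    spread = pstdev([s.steering_deg for s in moving])
    return spread <= max_steering_spread_deg


# Two invented traces: one with a nearly still steering wheel, one without.
lab = [Sample(50.0, 0.0), Sample(60.0, 0.2), Sample(55.0, -0.1)]
road = [Sample(50.0, -12.0), Sample(60.0, 8.0), Sample(55.0, 25.0)]

print(looks_like_lab_test(lab))   # True
print(looks_like_lab_test(road))  # False
```

A real system would presumably combine many more signals and tune its thresholds against data, which is exactly the math-and-statistics work that makes this a data science problem rather than a plain coding one.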
According to a Business Insider article, Uncle Bob ended his speech with a message:
Martin finished with a fire-and-brimstone call to action in which he warned that one day, some software developer will do something that will cause a disaster that kills tens of thousands of people.
…what developers really need is an organization that governs and regulates their profession like other industries have…
The Myth of Self Regulation
In other words, Uncle Bob argued that programmers need to self-regulate, before someone else does it for us. Perhaps self-regulation will be more effective than outside regulation. Regulation in general is necessary, but not sufficient. Three reasons for this:
- The industries used in these examples were already heavily regulated. In fact the scandal was Volkswagen’s attempt to circumvent current regulations! The addition of more regulation, even in an adjacent sector like technology, wouldn’t curtail this.
- Self-regulation was already attempted at the company level, through the code of conduct, but I’ll point out why that is not useful.
- The software technology industry already has self-regulating organizations for our profession with ethical standards and codes of conduct.
Enron vs Volkswagen
Two companies most famous for ethical violations, in industries that are heavily regulated, also had printed and published codes of conduct prior to their scandals.
“Civilization depends on us,” said Martin. “Civilization doesn’t know that yet.”
That’s why the call to action at the end of the speech is that we in the software community self-regulate. As in other professions, such as medicine or law, there should be rules and guidelines for what we can and cannot do.
The problem is this scandal occurred in an industry already heavily regulated. In fact Volkswagen’s entire scandal may not have taken place if they hadn’t felt the need to cheat against current regulations! True, the regulation was how Volkswagen was caught, but it also created the reason for the cheat in the first place.
I read an article about the Enron scandal by Michael Miller, published on bizjournals.com. The article is titled “Enron’s ethics code read like fiction”:
Imagine finding a copy of the Titanic’s “Safety at Sea” manual. It would be a sadly fascinating, ironic, morbid read.
When the invaluable gang at TheSmokingGun.com posted a copy of Enron’s official Code of Ethics, I assumed it would be a three-page document, including the cover and a blank page for notes. But the July 2000 booklet is nearly 65 pages of take-the-high-road legalese that must have made employees feel they were working for the Vatican or some other equally pure and clean organization.
That’s kind of what it felt like to me when I read the 2010–2015 version of the Volkswagen Code of Ethics. Filings with the SEC confirm Volkswagen had a code of ethics since at least February 2010. A detail I found funny is how immediately after the scandal Volkswagen published a new code of ethics, and the front cover showed the date 2016 and the words “first edition.”
Ignore the Rules
Now that we’ve ruled out “code of conduct” publications, I’d like to share a story in this Business Insider article about a young programmer who landed a job to build a website for a pharmaceutical company:
… [The programmer] was duped into helping the company skirt drug advertising laws in order to persuade young women to take a particular drug.
He later found out the drug was known to worsen depression and at least one young woman committed suicide while on it. He found out his sister was taking the drug and warned her off it.
Decades later, he still feels guilty about it, he told Business Insider. And he was inspired to write the post after he viewed a talk by Robert Martin, called “The Future of Programming.”
Here is yet another example of an industry that is heavily regulated, the pharmaceutical industry. From an individual’s perspective a code of ethics published by your employer is not enough. A company that has a code of ethics published does not protect you as a worker. To work in an industry that is heavily regulated is not enough either.
I think using these examples as a call to arms for self-regulation, or for outside regulatory agencies to come down on the software industry, is problematic. Like automotive and finance, the pharmaceutical industry is already heavily regulated. Adding more regulation on top of that may be necessary, but it is not sufficient.
In the next post I’ll share a better explanation, and go into more detail of what we data scientists can try to do as individuals.
“Ethics: A Data Scientist’s Perspective” is based on a talk given at the Ethics in Data Science meetup in June 2017 in San Francisco. This series is a version of that talk.
You can view the original slide deck on SlideShare here.
Mike Zawitkowski is a full stack data scientist. He has worked with big data and machine learning problems since 1999—before “big data” and “machine learning” were trendy.