The Toxic Practice of Pragmatic Data Science

Disclaimer: The following article is an original work by the author. The opinions expressed are his own and are not endorsed by his employer or any of the organizations that he’s a part of.

The Opening Quote:

“To name a thing is to acknowledge its existence as separate from everything else that has a name; to confer upon it the dignity of autonomy while at the same time affirming its belonging with the rest of the namable world; to transform its strangeness into familiarity, which is the root of empathy.” – Maria Popova

The Scenario

You’re a data scientist, data analyst, business intelligence analyst, or someone with another title who is skilled in the data crunching department. You’re given a project whose answer will point to favoring certain business/organizational decisions over others. For example, it could be anything from reviewing previous years’ performance for notable growth or decline to forecasting future trends.

Now you, intrepid data person, have done the best you can in choosing the best methodology for the problem. You clearly stated your hypotheses up front, explained why certain variables were operationalized, and once the results were in, carefully stated the ways one could interpret them. That included explaining why you interpreted the results the way you did, based on your hypotheses and previous research, and stating the biases that could have led you to interpret them incorrectly.

You give that gold-star report to the folks who commissioned the research, hoping it will be of use in their decision-making but knowing that other variables could lead them to decide against the report’s suggestions (which is absolutely fine). The ideal for the data scientist is not to be the one with the “right answer” who treats any decision counter to the suggestions as wrong. The ideal is to always present something helpful to the decision makers by trying to capture the realities of the research question with the best methods one can, and to let them know what could be biased or missing from the picture.

But alas, that is the ideal scenario and not necessarily the common one. The “toxic” situation I’m writing about today is one where either of the following scenarios happens after sharing the report:

Scenario 1:

The folks who commissioned the report are delighted that the results point to the decision they really wanted to go with. They make sure to include excerpts of the report in further meetings with their bosses/stakeholders, and parts of it may even make it into marketing/advertising materials.

Scenario 2:

The folks who commissioned the report are disappointed that the results don’t really support the decision they are going to go with. It would’ve been nice to have some “data science” to add to the supporting materials for the proposal they are moving forward with (since many people trust what data scientists say). They may ask the data scientist whether there is another analysis or dataset that could be checked to see if it produces results they could “use” (in which case they get the Scenario 1 they want). If not, they thank you for your time and move forward without any data science to punch up the proposal.

What to Call This Phenomenon?

After coming up empty searching for an existing name for this phenomenon (if anybody finds something I missed, I’ll gladly use that term over what I propose) and vacillating over other candidate names (e.g., defensive data science, post-hoc data science, egotistical data science), I’m going with “pragmatic data science.” The name fits because the practice uses data science as a tool for the goals one wants (usually whatever would help the person professionally or most please certain stakeholders), with secondary or no regard for getting at the “truth”.

With pragmatic data science, the results are only as helpful as they are useful for getting what one wants. There is no decision that one needs help in making; the data science research exists purely to produce tidbits that can be used when pitching the decision. In this scenario, the data scientist is at best a member of the marketing team, providing some good stats to make the already-made decision look like a great idea. At worst, the data scientist will cast aside “trying to capture the realities of the research question with the best methods” in favor of doing whatever they need to do with the data to make their bosses happy and keep their job. This toxicity is unfortunately all too common, but I think that because there hasn’t been a common name for it, it’s been hard to bring it up and call it out.

Now That We Have a Name for It…

Earlier this year I experienced how much giving something a name can help in getting something done about it. I wrote on “the cyber binary”, which is the implicit assumption that cyber scenarios should be set up as an attack/defend dichotomy (an assumption that, I argue in the paper, absolutely breaks down in cyber influence operations). When I brought this concept up with cyber professionals, they all knew exactly what I was talking about, but when I asked whether they knew a common name for the assumption, nobody knew of one. Since the concept gained a working name, there have been healthy conversations amongst cyber and cognitive security professionals on when and where the cyber binary assumption is appropriate.

One of my main goals for my contributions to the Average Geniuses site will be to encapsulate some of these “nameless invisible” phenomena that are (unfortunately) very familiar once they’re brought up. The hope is that, with a name, phenomena like pragmatic data science can be better identified and called out for the toxic practices they are and, with their existence acknowledged, we can get to the root of empathy that the Opening Quote mentions.