I was reading a recent post on machine learning from one of my favourite technical writers (Julia Evans) and was so inspired to comment that I signed up for Disqus under my own name for a change 🙂
However, it turns out that the post was closed for comments, so I thought I’d write up my thoughts here. I’m pretty busy today, so I’m just going to paste the comment as a quote rather than edit it into a more ‘first-person’ writing style.
This is a pretty interesting topic overall – I have largely avoided machine learning because of a lot of concerns about how it’s used (and how its results are assumed to be ‘normal’ or ‘correct’), but recently decided that that’s the wrong approach. Instead, I’m going to learn more about it so I’m in a better position to critique how it’s used and point out implicit assumptions and biases. To that end, I’ve signed up for the Stanford course that just started.
Even in the first lesson of that course, I saw some interesting examples that made my skin crawl. I think the biggest issue I have with it (and I am JUST starting to learn) is that there’s an implicit assumption that all relevant information is externally observable and that conclusions drawn from objectively measurable data/behaviour will be correct. I’m fine with that when it involves some kinds of events, but I get very uncomfortable when we’re applying it to humans. So much of human motivation is invisible/intuitive that leaning so heavily on machine learning (which necessarily relies on events that can be observed by others & fed into algorithms) leads to things like, as you say, the Target pregnancy issue. There are many other detrimental ‘positive feedback’ effects of reinforcing conclusions based on visibly available data – gender-segregated toys is a primary example. “65% of people buying for girls bought X, so when someone is shopping for a girl, we’ll suggest X. Look! Now 75% of people shopping for girls bought X – we were right! MOAR X for the girls!” [eventually, the ‘for girls’ section is nothing but X in various sizes, shapes, and colours (all colours a variant shade of pink)]
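To make that loop concrete, here’s a minimal sketch of the dynamic (hypothetical Python – the 65% starting share comes from the example above; the boost parameter and everything else are invented for illustration). The recommender promotes whatever the current majority bought, which shifts the very data it ‘learns’ from next round:

```python
import random

def simulate_recommender_loop(initial_share=0.65, boost=0.3,
                              rounds=10, shoppers=10_000):
    """Toy model: each round, the store recommends X to everyone
    shopping 'for girls'; the nudge shifts some purchases toward X,
    and next round's nudge is based on the newly observed share."""
    share = initial_share
    history = [share]
    for _ in range(rounds):
        bought_x = 0
        for _ in range(shoppers):
            # Base preference, plus a nudge proportional to how hard X
            # is being pushed (which tracks last round's observed share).
            p = share + boost * share * (1 - share)
            if random.random() < p:
                bought_x += 1
        share = bought_x / shoppers  # the new 'objective' data
        history.append(share)
    return history

for i, s in enumerate(simulate_recommender_loop()):
    print(f"round {i}: {s:.0%} of 'for girls' purchases are X")
```

Run it a few times: the share only ever drifts upward, because part of the ‘evidence’ the system measures is its own output.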
Another ML issue that came up for discussion when I was working at Amazon was: some people consider LGBTQ topics to be inappropriate for some readers, so even if someone had actively searched for LGBTQ fiction, the recommendation engine was instructed to NOT suggest those titles. That has the effect of erasing representation for marginalized people and increasing isolation among those who are already isolated. In fact, one could argue that one of the things that ML does best is erase the margins (obviously, depending on how it’s implemented, but in the rush to solve all problems with it, these types of questions seem to be ignored a lot).
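In code terms, the policy amounted to something like this blanket post-filter (a hypothetical Python sketch, not the actual system – all names and structures are invented):

```python
# Hypothetical sketch of the kind of blanket post-filter described above.
SUPPRESSED_CATEGORIES = {"lgbtq_fiction"}

def filter_recommendations(candidates, user_history):
    visible = []
    for item in candidates:
        if item["category"] in SUPPRESSED_CATEGORIES:
            # Note: user_history is never consulted here – the user's
            # own explicit searches can't override the blanket policy.
            continue
        visible.append(item)
    return visible

candidates = [
    {"title": "Novel A", "category": "lgbtq_fiction"},
    {"title": "Novel B", "category": "mystery"},
]
history = ["search: lgbtq fiction"]
print(filter_recommendations(candidates, history))  # Novel A never appears
```

The erasure is structural: the filter runs after the ranking, so no amount of demonstrated interest can surface the suppressed category.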
I mentioned positive feedback loops before. The analogy I have in my head is: ML-type algorithms (unless you build in randomness & jitter) amplify differences until you end up with a square wave – flat, with only two polar-opposite values. Negative feedback loops, by contrast, lead to dynamic yet stable equilibria.
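Here’s a minimal sketch of that analogy (hypothetical Python; the gain and noise parameters are illustrative assumptions, not anything from a real ML system):

```python
import random

def step(x, gain, noise=0.0):
    """One update of a toy feedback system on a score clamped to [-1, 1]."""
    x = x + gain * x + random.uniform(-noise, noise)
    return max(-1.0, min(1.0, x))  # saturates, like a clipped amplifier

def run(gain, noise, steps=40, x=0.05):
    for _ in range(steps):
        x = step(x, gain, noise)
    return x

# Positive feedback: a tiny initial difference gets amplified until the
# signal pins itself flat at an extreme (the 'square wave' flat-top).
print(run(gain=+0.3, noise=0.0))   # -> 1.0

# Negative feedback with jitter: the signal keeps getting perturbed but
# settles into a dynamic, stable band around zero.
print(run(gain=-0.3, noise=0.1))   # -> a small value near 0
```

The sign of the gain is the whole story: amplify what you observe and you saturate at an extreme; damp it and you hover around an equilibrium.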
I mean, it’s obviously here to stay, and it clearly has some very significant beneficial use cases. But there are a lot of ethical questions that aren’t getting much attention, and I’d love to see more focus on those than on the technical ones. Thank you for mentioning them 🙂
The more time I spend in this industry, the more I believe that the one Computational Ethics course I took back in my CS degree wasn’t nearly enough, and that we could really use a much broader conversation in that area. [To that end, I’ve also signed up for some philosophy courses to go with the ML one ;)]