On Machine Learning and the Ethics of Computation

I was reading a recent post on machine learning from one of my favourite technical writers (Julia Evans) and was inspired enough to comment that I signed up for Disqus under my own name for a change 🙂
However, it turns out that the post was closed for comments, so I thought that I’d write up my thoughts here. I’m pretty busy today so I’m just going to paste it as a quote rather than edit it into a more ‘first-person’ writing style.

This is a pretty interesting topic overall – I have largely avoided machine learning b/c of a lot of concerns about how it’s used (and its results assumed to be ‘normal’ or ‘correct’) but recently decided that that’s the wrong approach. Instead, I’m going to learn more about it so I’m in a better position to critique how it’s used and point out implicit assumptions and biases. To that end, I’ve signed up for the Stanford course that just started.

Even in the first lesson in that course, I saw some interesting examples that made my skin crawl. I think the biggest issue I have with it (and I am JUST starting to learn) is that there’s an implicit assumption that all relevant information is externally observable and that conclusions drawn from objectively measurable data/behaviour will be correct. I’m fine with that when it involves some kinds of events, but I get very uncomfortable when we’re applying it to humans. So much of human motivation is invisible/intuitive that leaning so heavily on machine learning (which necessarily relies on events that can be observed by others & fed into algorithms) leads to things like, as you say, the Target pregnancy issue. There are many other detrimental ‘positive feedback’ effects of reinforcing/strengthening conclusions based on visibly available data – gender-segregated toys are a prime example. “65% of people buying for girls bought X, so when someone is shopping for a girl, we’ll suggest X. Look! Now 75% of people shopping for girls bought X – we were right! MOAR X for the girls!” [eventually, the ‘for girls’ section is nothing but X in various sizes, shapes, and colours (all colours a variant shade of pink)]

Another ML issue that came up for discussion when I was working at Amazon was: some people consider LGBTQ topics to be inappropriate for some readers, so even if someone had actively searched for LGBTQ fiction, the recommendation engine was instructed to NOT suggest those titles. That has the effect of erasing representation for marginalized people and increasing isolation among those who are already isolated. In fact, one could argue that one of the things that ML does best is erase the margins (obviously, depending on how it’s implemented, but in the rush to solve all problems with it, these types of questions seem to be ignored a lot).

I mentioned positive feedback loops before. The analogy I have in my head is: ML type algorithms (unless you build in randomness & jitter) amplify differences until you end up with a square wave – flat and of two polar opposite values. Negative feedback loops lead to dynamic yet stable equilibriums.
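To make the square-wave analogy concrete, here is a toy sketch (not a model of any real recommender). It assumes a made-up update rule: each round, the share of shoppers buying X moves away from 50% under positive feedback, or back toward 50% under negative feedback. The function name, the gain of 0.5, and the 50% midpoint are all invented for illustration.

```shell
# Toy feedback loop. All numbers and names here are illustrative assumptions.
# $1 = mode (positive|negative), $2 = starting share of buyers (0..1), $3 = rounds
feedback() {
  awk -v mode="$1" -v x="$2" -v rounds="$3" 'BEGIN {
    gain = 0.5
    sign = (mode == "positive") ? 1 : -1
    for (i = 0; i < rounds; i++) {
      # Push the share away from (positive) or toward (negative) the midpoint
      x = x + sign * gain * (x - 0.5)
      # A share cannot leave the 0..1 range
      if (x > 1) x = 1
      if (x < 0) x = 0
    }
    printf "%.2f\n", x
  }'
}

feedback positive 0.65 10   # starts slightly above half
feedback positive 0.35 10   # starts slightly below half
feedback negative 0.65 10   # same start as the first, opposite feedback
```

With positive feedback the two starting points end up pinned at opposite extremes, while the negative-feedback run settles back near the middle.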

I mean, it’s obviously here to stay, and it clearly has some very significant beneficial use cases. But there are a lot of ethical questions that aren’t getting a lot of attention and I’d love to see more focus on that over the technical ones. Thank you for mentioning them 🙂

The more time I spend in this industry, the more I believe that the one Computational Ethics course I took back in my CS degree wasn’t nearly enough, and that we could really use a much broader conversation in that area. [To that end, I’ve also signed up for some philosophy courses to go with the ML one ;)]

Building SelfControl from [REAL] scratch

I’m easily distracted by brightly coloured shiny things. That’s great when scuba diving in the tropics, but not so great when working on a pretty Retina Display Mac with all sorts of bouncing icons, infinite browser tabs, etc. So I use the tools available to remove obstacles in my way. Or indeed, the ones NOT in my way but off to the side of where I want to be going, and so very tantalizing. One such tool is the open source Mac app SelfControl.

The basic functionality of SelfControl is that you set up either a blacklist of sites you don’t want to allow yourself to access for some amount of time or you go hardcore and set up a whitelist of sites that you WILL allow and ban the rest of the world as unacceptable distractions. [Aside: an anti-spam/anti-virus company I used to work for preferred the terms ‘blocklist’ and ‘allowlist’ instead of ‘blacklist’ and ‘whitelist’ for a variety of reasons, some cultural, and where I need to use them, I’ll be using those terms as well.]

With SelfControl, you decide how long you want to focus for, set the slider for that amount of time, and press ‘Start’.

The way it blocks sites is by modifying the Mac’s hosts file (and firewall) so it needs to use admin privileges, which is why you have to enter your password. For many, that’s a decent way of it asking “Are you sure?” because the average user isn’t going to know how to undo the changes manually – that’s part of why it’s effective.
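For a feel of what hosts-file blocking looks like, here is a minimal sketch. It is not SelfControl’s actual code: the helper name is made up, and pointing blocked domains at 0.0.0.0 is my assumption about a typical approach. It only prints the lines; it never touches /etc/hosts.

```shell
# Hypothetical helper: print one hosts-file entry per blocked domain.
# Mapping a name to 0.0.0.0 makes lookups for it resolve to nowhere useful.
blocklist_to_hosts() {
  for domain in "$@"; do
    printf '0.0.0.0\t%s\n' "$domain"
  done
}

# Example domains only; this writes to the terminal, not to /etc/hosts.
blocklist_to_hosts example.com www.example.com
```

Actually appending lines like these to /etc/hosts requires root, which is the reason for the password prompt.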

And that’s a perfectly helpful use case: person says “I need to focus for 2 hours straight, and I’m my own worst enemy, so block distractions RIGHT NOW”. But it’s not the one I’m most interested in, personally.

You see, when there’s something that I’m particularly avoiding starting (usually writing), I won’t necessarily even get to the point of starting the app. There are different tiers of self control and what I’d like to do is set myself a regular schedule with blocks at certain times of day. There’s an argument to be made that if I can’t even boot the app & click the button, I have bigger problems to sort out, but if there’s a way to make the process more structured and automatic, I’d prefer that. And I’ve heard from others that they feel the same way – they think scheduling would be useful.

There are a number of things that I’ll need to work through to get that going (not least of which is figuring out how to automate the privilege escalation on a scheduled basis – maybe cron?) but the first hurdle I had was getting the app to build at all.
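On the ‘maybe cron?’ idea, here is a rough sketch of what a scheduled entry could look like. The wrapper script path is entirely hypothetical (SelfControl doesn’t ship one that I know of), and the 9:00 weekday schedule is just an example. Jobs in root’s crontab run as root, which is one possible way around the password prompt.

```shell
# Hypothetical wrapper script that would start a block; it does not exist yet.
FOCUS_SCRIPT="/usr/local/bin/start-focus-block.sh"

# crontab fields: minute hour day-of-month month day-of-week command
# This example means 09:00, Monday to Friday.
CRON_LINE="0 9 * * 1-5 $FOCUS_SCRIPT"

# Print the line rather than installing it.
echo "$CRON_LINE"

# To install it for root, paste the printed line into the editor opened by:
#   sudo crontab -e
```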

I had tried to do this about 8 months ago, with NO success. I haven’t been a Mac developer at all and until fairly recently, I hadn’t been a developer for over 10 years. A lot has changed, y’all! And one of the biggest changes has been the burgeoning mass of package managers to simplify installation of apps, libraries, etc. Although I use Homebrew to install apps on my Mac, I hadn’t heard of CocoaPods and didn’t know there was a required step to run ‘pod install’ to get the required prereq libraries installed in the build directory. [It’s useful to run your build instructions past true newbies to find the steps that are SO familiar/basic that it doesn’t occur to you to write them down].

At the Recurse Center, I learned about a lot more package managers, and that there’s at least one for every platform. I already knew about npm & gem, but not pip, and definitely not pod. So I realized that there was a missing step in the SelfControl instructions – one that would be so automatic for Mac app developers that they wouldn’t consider it missing, but for someone who wanted to start their Mac OSS development with SelfControl, it was pretty crucial. Now, I don’t want to suggest that I don’t know how to search for solutions to issues, nor that you don’t. But when you run up against something where you don’t have a reference point for what’s missing, the amount of the unknown is completely unbounded. You have no idea how far away the finish line is, and if your drive to do this is hobbyist-level, you may bail, like I did last summer.

So, armed with this new knowledge, I tried again to build SelfControl from scratch. I got a bunch of failures (including the promised code-signing ones) but some of them are due to a recent Ruby change that apparently breaks CocoaPods. This post is getting long, so here’s a link to how I got through ’em. It’s ALSO long but that’s largely due to a bucketload of screenshots.

Steps to build SelfControl from scratch with no prior OSX dev experience

Preamble

This post documents the steps required to be able to clone and build SelfControl from absolute scratch – as in, you haven’t done any Mac development to speak of before at all *as of Feb 12, 2016*. I assume you’re running El Capitan.

The only prerequisite I’m going to assume you have is Xcode because:

  1. I really don’t want to try to fully uninstall it from my machine for the purposes of this walkthrough and
  2. if you haven’t got it fully installed, the steps to getting that going are trivial (if you try to run something and it says you need to install more components, install those components. If it says you need to accept the command-line license, page through the full license text in the Terminal and type ‘agree’, or whatever it asks you to type – I don’t recall the exact text)

(Edited: the above is not quite true. While writing the below I realized that I’m also assuming you are using Homebrew. Look, I’ve tried MacPorts, and I’ve tried Homebrew, and although there are a lot of pluses about MacPorts, the developer zeitgeist seems to be around Homebrew. It’s just easier, ok?)

Other than that, I’m assuming you don’t have any extra libraries or tools installed. The reason I’ve listed an exact date above is that I think that part of why this didn’t work for me but did for others is because of a recent change to Ruby that broke CocoaPods. So others couldn’t help me because either they weren’t using that version of Ruby or they’d already gotten their pods successfully installed; the SelfControl build wasn’t failing for them.

Steps

We’ll start with the official build instructions & go from there. I may submit a pull request to update some of the steps – step 2 is definitely wrong once you install the CocoaPods.

  1. Clone the repository
  2. Open SelfControl.xcodeproj in Xcode
  3. Switch the Scheme selector (upper-left-hand corner) to SelfControl — not Distribution
  4. Build!

1. Clone the repository

This is completely correct. However, if your goal is to contribute to SelfControl, you probably want to fork the repo and then clone your fork instead (after all, why are you building from scratch if you aren’t planning to contribute?). But for this walkthrough, you can just clone the official repo if you want. Go to the directory where you want to do your build and run:

git clone https://github.com/SelfControlApp/selfcontrol.git

If you forked the repo, you’d just replace it with your copy, e.g.

git clone https://github.com/karamcnair/selfcontrol.git

That’s what I did. Here’s what you’ll have in the directory after the clone.

 

[Screenshot: git_clone]

2. Open SelfControl.xcodeproj in Xcode

Nope. I’m not sure if the instructions are just out of date or whether there are multiple paths to building this project with CocoaPods but if you follow the instructions directly and try to build the .xcodeproj file you’ll get missing dependencies because the CocoaPod libraries aren’t there.

See?

So we’re going to take a detour from the official instructions at this point because this is what threw me off last summer and where we’re going off-road in our quest to get this working.

Let’s install CocoaPods

The way we do that is to use ‘gem’, the Ruby package manager. The process should be:

  1. gem install cocoapods 
  2. pod install ***

WARNING: CocoaPods is currently (remember, this post is as of 2016-02-12) in beta for a new version with a totally different Podfile format. At first I accepted their suggestion to upgrade to the beta version, ended up rewriting the file, and it STILL didn’t work. Do not accept the beta version at this point. That’s not what the problem is.

Note the error we’re getting here: Undefined method ‘to_ary’ ? ¯\_(ツ)_/¯

And at the end of the output we see this:

Something that took me longer than I’m happy with to notice:

/usr/local/lib/ruby/gems/2.3.0/gems

[But why would I even care? It knows what it’s doing, right?]

So what is the problem? How come everyone else building SelfControl doesn’t have this error? Well, it turns out that we’re too up to date, my friend! Our systems are too pristine, too fresh! We have the newest Ruby, the version that ships with OSX, and it’s version 2.3.0 (in my case) and not version 2.2. And there’s a problem with 2.3.x, as far as CocoaPods is concerned.

[Screenshot: github_writeup]

(Look familiar?)

Let’s see if that’s the problem. What Ruby do we have now?
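One way to check from the Terminal is sketched below. The helper function is made up for this walkthrough; the only real commands used are ruby itself and the shell’s own command -v. It reports which Ruby is first on your PATH and flags the 2.3 series.

```shell
# Hypothetical helper: succeeds if the version string is in the 2.3 series.
is_ruby_2_3() {
  case "$1" in
    2.3|2.3.*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v ruby >/dev/null 2>&1; then
  # Ask Ruby for its own version and show where it lives
  current="$(ruby -e 'print RUBY_VERSION')"
  echo "Ruby on PATH: $current ($(command -v ruby))"
  if is_ruby_2_3 "$current"; then
    echo "That is the 2.3 series CocoaPods was choking on."
  fi
else
  echo "No ruby found on PATH."
fi
```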

Yup. That’s probably it. So how do we fix that? This helpful person has an answer for us:

 

[Screenshot: pod_solution]

To do that, we use these instructions to install rvm (Ruby Version Manager) and tell OSX which version of Ruby we want to use. But before we do that, let’s get rid of our current (bad) install of CocoaPods.

I know I could have just had you start by using rvm to pick Ruby 2.2 so that you wouldn’t hit the failing pod install but there are two reasons I didn’t:

  1. there are probably a non-zero number of people out there who have the same problem with the bad CocoaPods install, and these steps show how to recover from that setup, and
  2. it’s useful to document thought processes of how to debug & fix borked systems

gem uninstall cocoapods

Then install RVM as per the instructions:
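In case the linked instructions move, here is a sketch of a typical RVM setup. It assumes RVM’s standard installer script and a single-user install under your home directory, and it may differ in detail from what those instructions say. I’ve wrapped it in a function (the name is mine) so nothing gets downloaded until you call it yourself, from bash.

```shell
# Hypothetical wrapper; nothing runs until you call install_ruby_2_2_with_rvm.
install_ruby_2_2_with_rvm() {
  # RVM's standard installer (needs network access)
  \curl -sSL https://get.rvm.io | bash -s stable || return 1
  # Load RVM into the current shell (default path for a single-user install)
  . "$HOME/.rvm/scripts/rvm"
  # Install the 2.2 series and make it the default Ruby
  rvm install 2.2 && rvm use 2.2 --default
  # Confirm which Ruby is now first on the PATH
  ruby --version
}
```

After it finishes, the version printed at the end should start with 2.2 rather than 2.3.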

Huzzah! We have the right Ruby! Let’s see if we can fix the Pod Problem!

Let’s REALLY install CocoaPods

So

gem install cocoapods

followed by

pod install

Hey! Look at that! It worked.

So here’s where something weird happens that I don’t want to take the time to fully replicate on a freshly installed machine: the first time I got this working, THIS was the output from the ‘pod install’ command:

Note that it says to use the .xcworkspace file, not the .xcodeproj file. And it’s telling the truth. My directory has a SelfControl.xcworkspace file in it after ‘pod install’ but it didn’t tell me to use it this time. But if I don’t, and use the SelfControl.xcodeproj file, here’s what Xcode complains about:

See? The pod dependencies aren’t built. So we use the Workspace instead:
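If you’d rather do this from the Terminal, here is a small sketch. The helper name is made up; it just checks whether ‘pod install’ has created the workspace and reports which file to open. The actual open command is left commented out so the snippet is harmless to run anywhere.

```shell
# Hypothetical helper: print which file to open for a given checkout directory.
pick_xcode_entry() {
  dir="${1:-.}"
  if [ -d "$dir/SelfControl.xcworkspace" ]; then
    # pod install has run, so the workspace (project + Pods) is the one to use
    echo "$dir/SelfControl.xcworkspace"
  else
    # No workspace yet; this is the file that builds without the pods
    echo "$dir/SelfControl.xcodeproj"
  fi
}

entry="$(pick_xcode_entry .)"
echo "Would open: $entry"
# On a Mac, uncomment the next line to open it in Xcode:
# open "$entry"
```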

Cool, cool. Now we’re actually getting somewhere. The pods are building and so is SelfControl! But. Here come the promised code signing issues.

 

[Screenshot: codesign1]

OK, this post is too long now. And the code signing issues are ones that are likely common to multiple projects that have nothing to do with CocoaPods, so I’ll be adding a follow-on post just to walk through the OSX Code Signing traps!