Algorithmic Bias

Good Q&A with Cynthia Dwork on algorithmic bias.

Posted on August 14, 2015 at 8:20 AM • 26 Comments

Comments

Clive Robinson • August 14, 2015 8:54 AM

@ Bruce,

It's a NY Times "pay walled" article.

Some of us have ethical and moral objections to handing over anything that will bring profit to it and its parent organization, which shows every sign of being a criminal enterprise.

Paul • August 14, 2015 8:59 AM

@Clive
Google the headline and click the link to the site - you can usually then read the article for free. This works with most newspaper paywalls.

Buck • August 14, 2015 9:55 AM

You can clear your cookies first, but you must accept the cookie sent from NYT or you'll be redirected to a paywall screen.

Me • August 14, 2015 10:30 AM

@ianf

This is where CookieMonster comes in handy: either preventing the setting of said cookie or removing it when the browser closes is a great option.

I almost never allow persistent cookies any more.

Carl 'SAI' Mitchell • August 14, 2015 10:46 AM

@ianf, @me

Or Self-Destructing Cookies. It clears all cookies set by a site when you close the browser tab by default, though this can be overridden.

edge • August 14, 2015 10:59 AM

It seems that the simple 2-second answer is to just remove the variables that you want the algorithm to be blind to. The algorithms can't make judgments on inputs that it doesn't have access to.

(I suppose that there's some complication in the fact that the remaining variables may not be completely independent from the ones you want to hide (e.g. zip code may be correlated to wealth or race).)
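
The complication in that parenthetical can be made concrete. What follows is only a minimal sketch on synthetic data: the groups `A` and `B`, the two zip codes, and the 90% correlation between them are all made-up assumptions, not measurements. It shows a decision rule that never sees `group` and still treats the two groups very differently, because it keys on `zip_code`.

```python
import random

def make_population(n=10_000, seed=0):
    """Synthetic people. Group membership is hidden from the decision rule,
    but zip_code agrees with it 90% of the time (an assumed correlation)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        home = "11111" if group == "A" else "22222"
        other = "22222" if group == "A" else "11111"
        zip_code = home if rng.random() < 0.9 else other
        people.append({"group": group, "zip_code": zip_code})
    return people

def blind_rule(person):
    """Decision rule that never reads 'group', only the zip code."""
    return person["zip_code"] == "11111"

def approval_rate(people, group):
    """Share of a group's members that the blind rule approves."""
    members = [p for p in people if p["group"] == group]
    return sum(blind_rule(p) for p in members) / len(members)

people = make_population()
rate_a = approval_rate(people, "A")
rate_b = approval_rate(people, "B")
print(f"approval rate A: {rate_a:.2f}, B: {rate_b:.2f}")
```

Removing the `group` column changes nothing here, since the rule never used it in the first place.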

MrC • August 14, 2015 11:12 AM

@Clive:

In addition to providing some security and privacy protection against the background radiation of the web, NoScript + Self Destructing Cookies incidentally bypass NYT's paywall.

Iondream • August 14, 2015 11:18 AM

Just block JavaScript for the site, and you can read it for free. You can do it in Chrome by clicking the site icon and going to "site settings".

JD • August 14, 2015 11:24 AM

The number of times that the word "Fair[ness]" was used makes me want to hurl. Not everything can or should be fair. As pointed out in the jobs example, there are plenty of reasons why an ad may have been shown to men more frequently; some of them are bad, most aren't. We shouldn't assume that just because something /could/ have been caused by a bad choice that it WAS a bad choice and attack it. And frankly, if Google wants to discriminate in their ads for jobs or anything else, I really just don't care - they are a private business, and discriminating ads aren't something we should regulate. If you are going to regulate it, then you better start regulating radio ads too, which overtly discriminate in some way. (A great example: on a classic hip-hop station here in Houston there is an attorney who advertises and admits that he only runs ads on stations that play the kind of music he likes, and wants to serve those "men that still get their fade trimmed in a barbershop", with other overt hints at targeting only black customers. I think that is insulting to those he is advertising to, but I think he should be free to do so.)

If Monster or Dice were only showing high-paying job postings to women, then that would be an issue - which is really my point - not everything needs to be "fair", not everyone gets a freaking trophy just for showing up.

OF COURSE data are influenced by the inputs, and if you are not careful, your biases will affect the inputs, but this is literally not new news. This has been known within the scientific method for hundreds of years - generally that's why we cite "controlled variables", and why sometimes experiments conclude completely wrong answers, because of bias that wasn't seen going in. Guess what, this is EXACTLY why you can get 2 different polls on the same issue to conclude conflicting answers - because of how you phrase the question (bias) and whom / how you target those you poll (bias).


IMHO, there are /some/ interesting things about bias in algorithms, but it's not the panic-level issue the interviewer seems to portray.

Peter Pearson • August 14, 2015 11:31 AM

The only concrete example Ms. Dwork gives of an unbiased algorithm failing to be unbiased is the case of the unbiased algorithm being trained with biased data. I'd suggest that to most people, "algorithm" refers to the trained algorithm; but in any event, you can avoid this confusion by insisting that both the algorithm and any training data be unbiased. Ms. Dwork's rush toward a solution based on government experts is unseemly.

rgaff • August 14, 2015 11:55 AM

You know, guys, even just visiting the site is a kind of "support" in that you're supporting them with your eyeballs. So if you have an ethical and moral objection, you might consider staying away, regardless of whether the paywall can be defeated or not.

A Mora • August 14, 2015 2:00 PM

Edge, while what you suggest (removing variables) may work in some cases, in more complicated systems that can backfire. The author gave one example of this in her article, about admissions.

The trick is that fairness isn't just about blind or indiscriminate application of arbitrary rules. First you have to figure out what your objectives are, and what in your data set reflects that. In some cases accurately assessing someone's performance requires context. For example, if you chart workplace productivity, it drops like a rock around both religious and national holidays.

As an example, suppose I am granting a monthly bonus to the salesmen at my software company. The US division's lead salesman was up 2% over last month and the Chinese division's was down 2%. A blind system might just make a decision based on that, and miss the fact that the Chinese team was fighting a major headwind because it was Chinese New Year and the whole country shut down. By making the decision culturally aware, you see that there was a huge flurry of activity in the previous month, followed by a drop as the whole country shut down. In this case, a salesman who was down only 2% month on month would be off the chart if you graphed year on year. The same would have happened in the US, just a few months earlier.

So like so many things, you need to carefully review results to make sure they are accurate, and watch out for the law of unintended consequences.

Systems that are totally blind tend to hammer on some of the least deserving people. As an example, college financial aid in the US is based on an "Expected Family Contribution", and applying for aid requires submitting not only your own but also your parents' financial information. In years past this would deny or delay access to aid for students who were estranged and supporting themselves. So "need"-based aid was not delivered to those most in need. The system was designed to catch cheaters, and performed well on average, but failed to be "Fair" as a result.

Oh the joys of math in complex systems. In some ways I'd rather be assigned to crack AES256 messages by pen and paper than to be the one to design a "Fair" admissions algorithm. One problem is known and very hard. One is unknown and possibly impossible.

tyr • August 14, 2015 4:23 PM


You won't find much blind faith here in the
infallibility of computer outputs. People are
burned far too often to be gullible about it.

That does not extend past the boundaries of
the comp inner circle into the outside world.
Pareto tried to put sociology on a scientific
basis with limited success because of his way
of describing what he saw in human behaviors.
One thing he did notice was that a society
able to allow circulation (upward mobility
is the current buzzword) of the talented had
a much better outcome in the long run. Using
bias to reinforce the status quo is a really
bad idea no matter where it comes from. If a
programmer carries this bias in then you wind
up wasting the most valuable of resources a
society possesses without knowing why society
is failing around you. Notice that this is not
some sappy idea of levelling the playing field
so that retards can feel good about their lack
of accomplishments. It's about making sure there
is a path for the excellent to rise to useful
positions when they are capable of it. The
shameful example of women in the sciences is
clearly a result of bias being applied against
people who were head and shoulders above their
esteemed colleagues who used them to reap the
fame of the women's efforts. The results have
been to set human civilization back and hold
it back to pander to inferior male egos.

I'm glad somebody has had the gall to point
out that assumptions of what we are doing with
comp algorithms need to be checked now and
then to make sure they are doing what is intended
instead of just reinforcing retardation.

@Clive
That men who stare comment was about the BBC4
documentary not the hollywood fluff film.

Gerard van Vooren • August 14, 2015 4:46 PM

@ tyr,

Your line width is getting smaller and smaller. There is no need to break the lines.

(offtopic note)

LessThanObvious • August 14, 2015 5:11 PM

I continue to see algorithmic decision making as a very scary trend. There is so much reliance on what the computer decides. If we have to tweak the algorithm to protect minorities and women, then aren't there likely a large number of other corner cases where the algorithm fails the fairness test on an individual basis? When it comes to things that affect people's lives, like lending, hiring and school admission, I really recoil at the idea of such arbitrary selectors being used. Machine learning algorithms, it seems, often use criteria that are merely "good enough" correlations.

Ray Dillinger • August 14, 2015 5:55 PM

Way back in the wayback, in school, I created an expert system that was supposed to simulate rational executive decision-making. One that would decide everything based on merit and legitimate business objectives. Completely fair, right? And so it was in the beginning.

But when its available information included the fact that women in general faced pay discrimination, it immediately lowered the salaries of all its female "employees" by the same percentage - on the grounds that they would accept less pay because they weren't getting a better offer elsewhere. Perfectly rational. Perfectly unbiased. Perfectly logical response to human bigotry. And a big honking signal that mere rationality won't solve the problem.

This sort of thing continues the trend.... Obviously ethnic names elicit different clickbait ads based on the stereotypes about their ethnicity, not because the algorithm is bigoted, but because people are. Those are the clickbait ads that generate the biggest revenue streams.

As long as people are bigoted, rational responses to (or exploitation of) people will remain bigoted.

Mike Barno • August 14, 2015 6:31 PM

@ edge


> (I suppose that there's some complication in the fact that the remaining variables may not be completely independent from the ones you want to hide (e.g. zip code may be correlated to wealth or race).)

In the USA, ZIP Code is strongly correlated with both wealth and race. The whole PRIZM segmentation system (from NPDC, and then from Claritas, before Nielsen wiped the Claritas name) was based on this originally -- not race as a selector, but income as one of a few selectors for all the demographics (including wealth and race) that went with Census household data, and state and local data. It got more granular when they went to the ZIP+4 level. Later, PRIZM became based on direct household-level data from data brokers, with ZIP-based coding used only as a backup. But selecting ZIPs with highest and lowest median household wealth shows that highs are far far above lows. Selecting ZIPs by proportion of race shows the whitest and blackest and most-Asian ZIPs have vastly different proportions.

So if you run a pilot program in Arlington Heights, Virginia, and expect its results to be the same in Ferguson, Missouri, you might let incorrect assumptions cause a bad decision.

More broadly:
If you overly rely on "hiding" variables, either by refusing to consider them where attention might be needed or by using a simple adjustment that assumes "all else being equal" on not-always-valid assumptions, then your model will draw conclusions that follow from your presumptions. If you decide married people score 50 favorable points and unmarried people living together score 10, your algorithm will skew toward more-religious people, toward older people, toward more-traditional communities, differently than if all people sharing households got 30 points regardless of marital status.

I see concerns of biased algorithms all the time in matters that have become political footballs. Climate-change study, of course, before it ever gets to the stage of assigning human causes. Anything to do with Obamacare, especially Medicaid expansion. Economic impact studies when the developer might be bribing the mayor and council or begging for tax breaks. Government agency studies purporting to show their program's effectiveness, with a funding vote coming up soon.

As others have noted, it gets a bunch more complex when you have a "learning" system. Depending on what methodology and data are used for training, a system can learn that previously existing biases are the norm, the baseline. If that gets treated as "these are the profitable customers we want to keep supporting", the system becomes an excuse justifying continued discrimination. And once a system is built and running, even its managers might not realize what assumptions got trained into it. The ordinary citizen, whose loan wasn't approved or whose taxes were audited or whose communications were surveilled, would never know.

deLaBoetie • August 15, 2015 7:30 AM

This focus on bias is - I think - a small part of something that worries me far more, which is false positives. The data mining approach, even with careful "selectors", is demonstrably disastrous from the point of view of both the probability and the scale of those false positives. This has been confirmed by the agencies themselves.
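
The scale problem can be illustrated with base-rate arithmetic. This is a minimal sketch only; the prevalence and error rates below are invented for illustration and are not figures from any agency. Even a selector that is right 99% of the time produces almost nothing but false positives when the thing it looks for is rare.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability that a flagged person is a true positive (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Hypothetical numbers: 1 real target per million people, a selector that
# catches 99% of real targets and wrongly flags 1% of everyone else.
ppv = positive_predictive_value(
    prevalence=1e-6, sensitivity=0.99, false_positive_rate=0.01
)
print(f"share of flags that are real: {ppv:.6f}")
```

Under these assumed numbers, roughly one flag in ten thousand points at a real target; everyone else flagged is an innocent false positive.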

And the issue with that, leaving aside the wasted time of the agencies, is that I believe those false positives from indiscriminate selectors (whether biased or not), will effectively result in automated targeting. It will have the effect of being targeted, because before you know it, as an innocent false positive, you will be attacked with algorithmic scripts, either by auto-infecting your systems, or putting you on no-fly lists etc. with no human involvement at all, and no effective remedy or compensation. Because it's technically possible, and easier to do the scatter-gun approach, and because they do not bear the costs, experience has shown that this is exactly what will happen.

From the agencies' point of view, as well as the haystack problem of false positives, the bias issue will inevitably result in false negatives too - my guess is that many "baddies" would slip through in exactly the same way as they would with biased border control inspections.

winter • August 15, 2015 8:46 AM

Isn't this just another example of machines amplifying human behavior?

In this case, they will amplify all our biases.

Bruce Schneier • August 15, 2015 2:48 PM

"It's a NY Times "pay walled" article."

Not on my computer or network. Odd.

For those without access • August 15, 2015 4:01 PM

Algorithms and Bias: Q. and A. With Cynthia Dwork

Algorithms have become one of the most powerful arbiters in our lives. They make decisions about the news we read, the jobs we get, the people we meet, the schools we attend and the ads we see.

Yet there is growing evidence that algorithms and other types of software can discriminate. The people who write them incorporate their biases, and algorithms often learn from human behavior, so they reflect the biases we hold. For instance, research has shown that ad-targeting algorithms have shown ads for high-paying jobs to men but not women, and ads for high-interest loans to people in low-income neighborhoods.

Cynthia Dwork, a computer scientist at Microsoft Research in Silicon Valley, is one of the leading thinkers on these issues. In an Upshot interview, which has been edited, she discussed how algorithms learn to discriminate, who’s responsible when they do, and the trade-offs between fairness and privacy.

Q: Some people have argued that algorithms eliminate discrimination because they make decisions based on data, free of human bias. Others say algorithms reflect and perpetuate human biases. What do you think?

A: Algorithms do not automatically eliminate bias. Suppose a university, with admission and rejection records dating back for decades and faced with growing numbers of applicants, decides to use a machine learning algorithm that, using the historical records, identifies candidates who are more likely to be admitted. Historical biases in the training data will be learned by the algorithm, and past discrimination will lead to future discrimination.
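
A toy version of that admissions scenario shows the mechanism. It is a sketch under loud assumptions: the records, the group labels, and the frequency-counting "model" are all hypothetical stand-ins, far simpler than any real machine learning system. Two applicants with the same score band get different recommendations purely because past decisions differed.

```python
from collections import defaultdict

# Hypothetical historical records: (score_band, group, admitted).
# Equally strong applicants were admitted at very different rates.
history = (
    [("high", "majority", True)] * 90 + [("high", "majority", False)] * 10
    + [("high", "minority", True)] * 30 + [("high", "minority", False)] * 70
)

def train(records):
    """'Learn' the historical admission rate for each (score, group) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [admitted, total]
    for score, group, admitted in records:
        counts[(score, group)][0] += int(admitted)
        counts[(score, group)][1] += 1
    return {cell: adm / tot for cell, (adm, tot) in counts.items()}

def predict(model, score, group, threshold=0.5):
    """Recommend admission when the learned rate exceeds the threshold."""
    return model[(score, group)] > threshold

model = train(history)
print(predict(model, "high", "majority"))  # learned from a 90% admit rate
print(predict(model, "high", "minority"))  # learned from a 30% admit rate
```

Nothing in `train` is prejudiced; it reproduces whatever pattern the historical labels contain.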

Q: Are there examples of that happening?

A: A famous example of a system that has wrestled with bias is the resident matching program that matches graduating medical students with residency programs at hospitals. The matching could be slanted to maximize the happiness of the residency programs, or to maximize the happiness of the medical students. Prior to 1997, the match was mostly about the happiness of the programs.

This changed in 1997 in response to “a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the expense of applicants, and whether applicants could ‘game the system,’ ” according to a paper by Alvin Roth and Elliott Peranson published in The American Economic Review.
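
How a match can be "slanted" toward one side is easiest to see in the textbook one-to-one deferred acceptance procedure (Gale-Shapley), where the side that proposes ends up with its best stable outcome. The sketch below is that simplified textbook procedure, not the actual residency matching algorithm described by Roth and Peranson; the names `s1`, `s2`, `h1`, `h2` and their preferences are made up, and complete preference lists with equal numbers on both sides are assumed.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """One-to-one deferred acceptance.
    Returns a dict mapping each proposer to its matched receiver.
    Assumes complete preference lists and equally sized sides."""
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    held = {}  # receiver -> proposer it is tentatively holding
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = held.get(r)
        if current is None:
            held[r] = p                 # receiver was unmatched: hold p
        elif rank[r][p] < rank[r][current]:
            held[r] = p                 # receiver trades up, frees the old one
            free.append(current)
        else:
            free.append(p)              # rejected: p proposes again later
    return {p: r for r, p in held.items()}

# Hypothetical preferences with two different stable matchings.
students = {"s1": ["h1", "h2"], "s2": ["h2", "h1"]}
programs = {"h1": ["s2", "s1"], "h2": ["s1", "s2"]}

student_optimal = deferred_acceptance(students, programs)
program_optimal = deferred_acceptance(programs, students)
print(student_optimal)
print(program_optimal)
```

With these preferences, each student gets a first choice when students propose, and each program gets its first choice when programs propose. Both results are stable, so the choice of who proposes is a design decision about whose happiness to favor.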

Q: You have studied both privacy and algorithm design, and co-wrote a paper, “Fairness Through Awareness,” that came to some surprising conclusions about discriminatory algorithms and people’s privacy. Could you summarize those?

A: “Fairness Through Awareness” makes the observation that sometimes, in order to be fair, it is important to make use of sensitive information while carrying out the classification task. This may be a little counterintuitive: The instinct might be to hide information that could be the basis of discrimination.

Q: What’s an example?

A: Suppose we have a minority group in which bright students are steered toward studying math, and suppose that in the majority group bright students are steered instead toward finance. An easy way to find good students is to look for students studying finance, and if the minority is small, this simple classification scheme could find most of the bright students.

But not only is it unfair to the bright students in the minority group, it is also low utility. Now, for the purposes of finding bright students, cultural awareness tells us that “minority+math” is similar to “majority+finance.” A classification algorithm that has this sort of cultural awareness is both more fair and more useful.
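
The math-versus-finance example can be worked through with made-up numbers. In this sketch every count, the `other` field, and the two rule functions are hypothetical; it only illustrates the argument in the answer above. It compares a rule that looks at field of study alone with one that also uses group membership.

```python
# Hypothetical students: (group, field, is_bright), repeated by count.
students = (
    [("majority", "finance", True)] * 300
    + [("majority", "math", False)] * 100
    + [("majority", "other", False)] * 500
    + [("minority", "math", True)] * 30
    + [("minority", "finance", False)] * 10
    + [("minority", "other", False)] * 60
)

def blind_rule(group, field):
    """Ignores group: 'bright students study finance'."""
    return field == "finance"

def aware_rule(group, field):
    """Uses group to decide which field signals a bright student."""
    return (group, field) in {("majority", "finance"), ("minority", "math")}

def recall(rule, group=None):
    """Share of bright students (optionally within one group) the rule finds."""
    bright = [(g, f) for g, f, b in students if b and group in (None, g)]
    return sum(rule(g, f) for g, f in bright) / len(bright)

print(f"blind: overall {recall(blind_rule):.2f}, "
      f"minority {recall(blind_rule, 'minority'):.2f}")
print(f"aware: overall {recall(aware_rule):.2f}, "
      f"minority {recall(aware_rule, 'minority'):.2f}")
```

The blind rule finds most bright students overall because the minority is small, yet it finds none of the bright minority students. The aware rule finds all of them.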

Fairness means that similar people are treated similarly. A true understanding of who should be considered similar for a particular classification task requires knowledge of sensitive attributes, and removing those attributes from consideration can introduce unfairness and harm utility.
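
One way to write "similar people are treated similarly" formally is as a Lipschitz-style condition, which is the general shape of the definition in the "Fairness Through Awareness" paper. The notation here is an assumption made for illustration: $M$ maps an individual to a distribution over outcomes, $d$ is a task-specific measure of how similar two individuals are, and $D$ measures how different two outcome distributions are.

```latex
D\bigl(M(x), M(y)\bigr) \le d(x, y) \quad \text{for all individuals } x, y
```

Read this way, the hard part is choosing $d$: deciding that "minority+math" and "majority+finance" are close requires knowing the sensitive attribute.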

Q: How could the university create a fairer algorithm? Would it mean more human involvement in the work that software does, collecting more personal data from students or taking a different approach when the algorithm is being created?

A: It would require serious thought about who should be treated similarly to whom. I don’t know of any magic bullets, and it is a fascinating question whether it is possible to use techniques from machine learning to help figure this out. There is some preliminary work on this problem, but this direction of research is still in its infancy.

Q: Another recent example of the problem came from Carnegie Mellon University, where researchers found that Google’s advertising system showed an ad for a career coaching service for “$200k+” executive jobs to men much more often than to women. What did that study tell us about these issues?

A: The paper is very thought-provoking. The examples described in the paper raise questions about how things are done in practice. I am currently collaborating with the authors and others to consider the differing legal implications of several ways in which an advertising system could give rise to these behaviors.

Q: What are some of the ways it could have happened? It seems that the advertiser could have targeted men, or the algorithm determined that men were more likely to click on the ad.

A: Here is a different plausible explanation: It may be that there is more competition to advertise to women, and the ad was being outbid when the web surfer was female.
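
That explanation can be sketched with a toy auction. Everything here is an illustrative assumption: the bids are invented, and a simple "highest bid wins" rule stands in for whatever a real ad exchange does. The advertiser bids the same amount for every viewer, yet its ad is shown to men far more often because rivals bid more for women.

```python
def impression_share(our_bid, competitor_bids):
    """Fraction of auctions won when the highest bid takes the impression."""
    wins = sum(our_bid > max(rivals) for rivals in competitor_bids)
    return wins / len(competitor_bids)

OUR_BID = 1.00  # same bid for every viewer: gender-neutral by construction

# Hypothetical rival bids, one inner list per auction.
# Rivals are assumed to bid more when the viewer is a woman.
auctions_men = [[0.50, 0.80], [0.90, 0.70], [1.20, 0.40], [0.60, 0.30]]
auctions_women = [[1.10, 0.80], [1.50, 0.90], [0.95, 0.70], [1.30, 1.20]]

share_men = impression_share(OUR_BID, auctions_men)
share_women = impression_share(OUR_BID, auctions_women)
print(f"shown to men: {share_men:.2f}, shown to women: {share_women:.2f}")
```

The skew in who sees the ad comes entirely from the competing bids, not from any targeting choice by the advertiser.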

Q: The law protects certain groups from discrimination. Is it possible to teach an algorithm to do the same?

A: This is a relatively new problem area in computer science, and there are grounds for optimism — for example, resources from the Fairness, Accountability and Transparency in Machine Learning workshop, which considers the role that machines play in consequential decisions in areas like employment, health care and policing. This is an exciting and valuable area for research.

Q: Whose responsibility is it to ensure that algorithms or software are not discriminatory?

A: This is better answered by an ethicist. I’m interested in how theoretical computer science and other disciplines can contribute to an understanding of what might be viable options.

The goal of my work is to put fairness on a firm mathematical foundation, but even I have just begun to scratch the surface. This entails finding a mathematically rigorous definition of fairness and developing computational methods — algorithms — that guarantee fairness.

Q: In your paper on fairness, you wrote that ideally a regulatory body or civil rights organization would impose rules governing these issues. The tech world is notoriously resistant to regulation, but do you believe it might be necessary to ensure fairness in algorithms?

A: Yes, just as regulation currently plays a role in certain contexts, such as advertising jobs and extending credit.

Q: Should computer science education include lessons on how to be aware of these issues and the various approaches to addressing them?

A: Absolutely! First, students should learn that design choices in algorithms embody value judgments and therefore bias the way systems operate. They should also learn that these things are subtle: For example, designing an algorithm for targeted advertising that is gender-neutral is more complicated than simply ensuring that gender is ignored. They need to understand that classification rules obtained by machine learning are not immune from bias, especially when historical data incorporates bias. Techniques for addressing these kinds of issues should be quickly incorporated into curricula as they are developed.

futureskynetdeveloper • August 16, 2015 9:00 PM

I do not think biases are a particularly bad thing. The developer is only creating the algorithm to the client's specification; in the end, the developer is usually just doing it for the monetary compensation, not to please the inner workings of society.

The client decides to hire the developer to create the biased algorithm, generally to meet a specific business need such as the one @JD pointed out. In that case the attorney is probably sitting on a gold mine by focusing on just black males who like rap music from that area; data probably shows there is a lot of demand in that area for that service from that group of people.

I like to judge algorithms for their effectiveness: how well they met the scope of the project. Even if the scope of the project was deciding "which babies should be thrown out the garbage chute at a hospital and which babies should be delivered to the nursery", there still would be a final grade on how well that algorithm met its objectives.

spectacular • August 16, 2015 9:50 PM

@ futureskynetdev, those without access

>>Algorithms have become one of the most powerful arbiters in our lives.

When an algorithm is used in decisions at scale, it is shaping, imho. As all things are relative, algorithms can predict, as well as shape. Tweaking is imperative, as it requires a feedback process. Thus everything has a cyclical tendency.

Nate • August 17, 2015 8:25 AM

Edge wrote:

> It seems that the simple 2-second answer is to just remove the variables that you want the
> algorithm to be blind to. The algorithms can't make judgments on inputs that it doesn't have
> access to.

As you (and others) point out, it's very difficult to control for undesired factors in real world data.

Something that hasn't been mentioned by others is that, with modern information technology, it can be quite easy to construct a 'stealth selector' that deliberately creates a bias in a selection, but uses sanitized criteria. So, for example, if a credit bureau wanted to build a 'Jim Crow' credit rating while retaining deniability about racial bias, they could probably do so with little trouble using the data they collect.



Schneier on Security is a personal website. Opinions expressed are not necessarily those of IBM Resilient.