Balkinization  

Sunday, May 06, 2007

The Sexual Politics of Google

JB

(via digg.com)

When you type "she invented" into Google, it returns: Did you mean: "he invented"

The same, by the way, applies to Google searches for "she created," "she succeeded," and "she led." Lest you think that this happens for all active verbs, Google does not make the same suggestion for "she followed" or "she failed."

Comments:

That's not true of Yahoo!:

"she invented"

"she created"

"she succeeded"

Except that when you type in "she led"

It returns: Did you mean: "shelled" above the requested search results.
 

Also:

She chose
She studied
She earned
She granted
She directed
She suggested
She built
She engineered
She calculated
She envisioned
She imagined
She snarled
She guarded

It looks like Google suggests the "he" alternative whenever it has about twice as many results.
 

Google probably has some algorithm that decides whether to show a suggestion based on the actual number of search results for particular terms. In other words, so what? There are more web pages out there with the words "he led" (150,000,000 results) than "she led" (69,400,000 results). Is that Google's fault? I don't think this has anything to do with "the sexual politics of Google," although you might argue that it reflects deeper trends in society at large.
 

Google is not to blame. The suggestion of "he invented" as an alternative to "she invented" is not a suggestion by the people of Google. It is a reflection of the data that Google has collected through various methods.

Google has crawled many websites and in doing so has saved billions of different word combinations. It also receives millions of search queries a day, saving each one to a database. When I type "she created" into the Google search box, Google accesses this database. If a similar word or word combination appears many times more often than the one that I have typed (say, for instance, "she created" appears once for every 100,000 times that "he created" appears), then Google will kindly offer it as a suggestion.
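To make the idea concrete, here is a rough sketch of a count-based rule of this kind. It is not Google's actual algorithm, which is not public. The function name `suggest` and the threshold of 2.0 are assumptions for illustration (the threshold borrows the "about twice as many" guess made earlier in this thread), and the counts are the "he led" and "she led" figures quoted above.

```python
# Result counts quoted earlier in this thread; everything else is hypothetical.
RESULT_COUNTS = {"he led": 150_000_000, "she led": 69_400_000}

# Assumed cutoff: offer the alternative when it has at least twice the results.
RATIO_THRESHOLD = 2.0


def suggest(query, alternative, counts=RESULT_COUNTS, threshold=RATIO_THRESHOLD):
    """Return the alternative if it is sufficiently more common, else None."""
    query_count = counts.get(query, 0)
    alt_count = counts.get(alternative, 0)
    # max(..., 1) avoids treating an unseen query as a zero denominator.
    if alt_count > 0 and alt_count >= threshold * max(query_count, 1):
        return alternative
    return None


print(suggest("she led", "he led"))
print(suggest("he led", "she led"))
```

Under these assumptions the rule offers "he led" to someone who typed "she led", but never the reverse, without knowing anything about what the words mean.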

As Peter J. Spano (an intern for the Department of Homeland Security "cyber-research" division) puts it, "Google would be doing its users a disservice if it didn't suggest a similar possibility that would return a result set about six times greater, especially with an edit distance of only one."
In fact, he says, "'Google' itself is completely blind to how its system will react to every input. Because the site is dynamically growing constantly, Google's search engine returns what is considered the most accurate result set based on a mathematical analysis of data previously indexed." So, for instance, if Google's database has 100,000 occurrences of the word "thier" and only 10 occurrences of the word "their", and you then query the word "their", it will offer "thier" as a suggestion.
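For anyone unfamiliar with the term, "edit distance" here most likely means Levenshtein distance: the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another. The sketch below is a standard textbook implementation, not anything taken from Google; the function name `edit_distance` is my own.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("she invented", "he invented"))
print(edit_distance("she led", "shelled"))
```

Both pairs come out at a distance of one: dropping the "s" turns "she invented" into "he invented", and replacing the space with an "l" turns "she led" into the "shelled" that Yahoo offered.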



When we search the Internet using Google, blog, or create an internet photo album, we are creating Google's data. Google then uses this data to offer us suggestions like "he invented" as an alternative for the original query "she invented". Since it would take a tremendous amount of resources for Google to deviously crawl only websites in the "woman haters club", we must conclude that it is not Google that is sexist.


Instead, it could be a reflection of the average internet user and our present views. This would make Google's suggestions an interesting tool for social experiments. I said "could" because Google's data set most likely includes the mass of historical data retrieved from university libraries. Google has been scanning books from these well-known libraries for quite some time and offers a search service for these books known as Google Books. Since it is a well-known fact that women have had a late start at being documented in history, this could be why Google offers these specific suggestions.

Rich's comment could actually lend support to this explanation if Google and Yahoo use a similar method for offering their suggestions. If Google's searches include the data from the scanned university research books, and Yahoo's do not, then Yahoo would be giving us suggestions that are more in tune with current sentiment, which would mean today's internet users are not that sexist.
 

Not only that, if you do an image search for "she created", it shows a lot of pictures of women with dresses, whereas "he created" shows sports scenes and engineering diagrams. Even worse, "he invented" displays an electric chair and a model car, while "she invented" shows Hepburn inventing a new look and Suzy Parker inventing modern fashion photography.

Furthermore, a local search for pastries has "Mara's", "Claire's" and "Tartine" (a sexist French slur) in the top 10, while only showing one male name, "Bob's Donuts." (Talk about type-casting!)
 

There is a sample spelling corrector here, written in 21 lines of Python and well explained: http://www.norvig.com/spell-correct.html

It is important to keep in mind that "s/he invented" is a very weird search, and Google is not querying some database on a hard disk every time you do a search but finding something relevant stored in memory, so you get results related to recent searches (including diggers comparing hit counts). If you google "he invented" you get millions of hits about Gore; if you search for "she invented -gore -internet -digg -google", the "he" suggestion goes away. The memory use question is relevant too, because brivtennay spoors immediately yields a suggestion while Jack Balkun yields none, even though a program much simpler than Google's, like the one in the sample, would resolve it with a standard dictionary, at least to Balkan, which points here. (I bet your readers have above average spelling anyway.)
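For the curious, here is a stripped-down sketch in the spirit of the linked sample, not a copy of it. It generates every string one edit away from the input and keeps the most frequent one found in a dictionary. The `WORD_COUNTS` table is a made-up toy dictionary, and the function names `edits1` and `correct` are just labels for this illustration.

```python
import string

# Hypothetical toy dictionary with invented counts; a real one would be far larger.
WORD_COUNTS = {"balkan": 500, "balkin": 20, "their": 1000, "bulk": 300}


def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the word itself if known, else the most common known word one edit away."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    # Fall back to the input unchanged when nothing in the dictionary is close.
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


print(correct("Balkun"))
print(correct("thier"))
```

With this toy dictionary, "Balkun" is one replacement away from both "balkan" and "balkin", and the more frequent "balkan" wins, which is the behavior described above.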
 

This is a quote from Google explaining their suggestions:

"Google uses spell checking software to check queries against the most common spelling of each word. When we calculate a greater number of relevant search results with an alternative spelling, you'll see "Did you mean: (more common spelling)" at the top of your search results page.

For example, a search for [ foot ball ] returns approximately 17 million search results. At the top of the search results page, you'll see "Did you mean: football." Clicking on "football," a more common spelling of the game's name, yields a search with over a hundred million results."


Hits are accumulated through the crawling of websites. As I mentioned, Google has sort of "crawled" research books, so they may be part of the group of words that gives us "suggestions".

A cool tool that's on topic is Google Trends. You can input a given search such as "he invented" and see how many searches were conducted with those words over time and by region. You can also pit two or more different searches against each other, say "he invented" vs. "she invented", and see which one wins. Pretty neat feature, check it out!
 
