November 15, 2012
The paper uses clustering analysis, social networks, and classification in search of a deeper understanding of the ways language varies with gender. We find a range of gendered styles and interests among Twitter users; some of these styles mirror the aggregate language-gender statistics, while others contradict them. Next, we investigate individuals whose language defies gender expectations. We find that such individuals have social networks that include significantly more individuals from the other gender, and that in general, ego-network gender homophily is correlated with the use of same-gender language markers.
Also, David has wrapped up a nice release of the data, if you want to play.
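If you do want to play, here is a minimal sketch of one way to measure the ego-network gender homophily mentioned above: the fraction of a user's labeled contacts who share that user's gender label. The paper's exact definition may differ. The function name `ego_homophily`, the data layout, and the toy users below are all hypothetical and are not taken from the released data.

```python
from typing import Dict, List, Optional


def ego_homophily(
    ego: str,
    friends: Dict[str, List[str]],
    gender: Dict[str, str],
) -> Optional[float]:
    """Fraction of ego's gender-labeled contacts who share ego's label.

    Returns None when ego has no label or no labeled contacts,
    since the ratio is undefined in that case.
    """
    if ego not in gender:
        return None
    # Keep only contacts for whom a gender label is available.
    labeled = [f for f in friends.get(ego, []) if f in gender]
    if not labeled:
        return None
    same = sum(1 for f in labeled if gender[f] == gender[ego])
    return same / len(labeled)


# Toy, made-up network (assumption: an adjacency list keyed by user name).
toy_friends = {"ana": ["bo", "cy", "di"], "bo": ["ana"], "cy": []}
toy_gender = {"ana": "F", "bo": "M", "cy": "F", "di": "F"}

scores = {user: ego_homophily(user, toy_friends, toy_gender) for user in toy_friends}
```

With a per-user score like this in hand, one could compare it against each user's rate of same-gender language markers, for example with a rank correlation.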