Weak links and some new ideas about regression model visualization and net leverage


Eric Gilbert and Karrie Karahalios have a paper on tie strength, distinguishing between strong and weak ties in social networks, published at CHI (the ACM Conference on Human Factors in Computing Systems). Eric is one of the recipients of the 2009 Google fellowships. There are some neat ideas in it:

Presenting the distributions of predictors
predictors.png

Pretty, informative and compact.

Distribution of outcomes
outcomes.png

Not sure the median is particularly interesting.

Graphical model summary
model summary.png

They describe it as:

The predictive power of the seven tie strength dimensions. [...] A dimension's weight is computed by summing the absolute values of the coefficients belonging to it. The diagram also lists the top three predictive variables for each dimension. [...]
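
To make the quoted aggregation concrete, here is a minimal sketch of summing absolute coefficients within each dimension and listing the largest ones. The dimension names, variable names and coefficient values are hypothetical placeholders, not numbers from the paper, and the helper functions are my own illustration rather than the authors' code.

```python
from collections import defaultdict

# Hypothetical (dimension, variable, coefficient) triples; NOT from the paper.
coefficients = [
    ("dimension_a", "var_1", -0.5),
    ("dimension_a", "var_2", 0.25),
    ("dimension_b", "var_3", 0.125),
    ("dimension_b", "var_4", -0.125),
    ("dimension_b", "var_5", 0.0625),
]


def dimension_weights(coefs):
    """Sum the absolute values of the coefficients within each dimension."""
    totals = defaultdict(float)
    for dimension, _variable, beta in coefs:
        totals[dimension] += abs(beta)
    return dict(totals)


def top_variables(coefs, k=3):
    """For each dimension, return the k variables with the largest |beta|."""
    by_dimension = defaultdict(list)
    for dimension, variable, beta in coefs:
        by_dimension[dimension].append((variable, beta))
    return {
        dimension: sorted(pairs, key=lambda vb: abs(vb[1]), reverse=True)[:k]
        for dimension, pairs in by_dimension.items()
    }


weights = dimension_weights(coefficients)
for dimension, weight in sorted(weights.items(), key=lambda dw: -dw[1]):
    print(dimension, weight, top_variables(coefficients)[dimension])
```

Note that the sign of each coefficient is discarded before summing, so two coefficients that offset each other still add to the dimension's weight.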

While the aggregation of coefficients in the same category is nice, there are some problems with summing betas together. Rarely occurring values with huge betas are often an artifact of overfitting rather than of informativeness, and betas for continuous predictors are strongly affected by scale: re-expressing a predictor in weeks instead of days multiplies its beta by seven. Consider these betas:

Days since last communication -0.76
Days since first communication 0.755
Intimacy × Structural 0.4
Wall words exchanged 0.299

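So, the top two predictors are probably correlated, and their betas are nearly equal in magnitude and opposite in sign. The two effects largely cancel (the signed sum is only -0.005), yet the absolute values add up to 1.515, resulting in runaway absolute betas.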
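
A toy illustration of how this can happen, using made-up data rather than anything from the paper: x2 is x1 plus a small wiggle, and y is x1 plus a multiple of the same wiggle. The tiny least-squares solver below is only a sketch written for this example. Fitting both predictors should give betas close to -4 and 5, while fitting x1 alone gives a slope close to 1.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Return [intercept, beta_1, ...] from the normal equations."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: x2 tracks x1 almost perfectly, y is x1 plus 5x the wiggle.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
wiggle = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
x2 = [a + w for a, w in zip(x1, wiggle)]
y = [a + 5 * w for a, w in zip(x1, wiggle)]

intercept, b1, b2 = ols([x1, x2], y)   # both correlated predictors
_, b1_alone = ols([x1], y)             # x1 on its own

print("joint betas:", b1, b2, "sum of |beta|:", abs(b1) + abs(b2))
print("x1 alone:", b1_alone)
```

Here y equals -4·x1 + 5·x2 by construction, so the joint fit recovers large opposite betas even though y barely differs from x1. Summing their absolute values would credit this pair with far more weight than either predictor carries alone.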

I suggested the concept of net leverage a few years ago, in a natural language setting with a binary outcome, as an attempt to improve the presentation of feature importance in regression models, and the topic is worth revisiting.

1 Comment

I really like this paper, but I'm a bit surprised that they considered "How helpful if looking for a job?" as a dependent variable intended to proxy for tie strength, since the literature is pretty clear that looking for a job is best aided by people not of high tie strength. This is partly evidenced by how weakly that question correlated with the other proxies. I haven't finished the paper yet, so there might be some more discussion. I'd be interested in seeing whether their results improve when the job question is left out.
