May 17, 2007

Two-dimensional machine learning conference space

Janez Demsar has shown me a chart of machine learning conferences and their similarity. To compare the venues, he used the Jaccard distance, which is based on the proportion of authors who publish in both venues (the intersection of the two author sets) relative to those who publish in either (the union of the two author sets). He then used multidimensional scaling (MDS) to embed the venues in 2D space. Line thickness indicates proximity. Journals are shown in red, conferences in blue. He acquired the data from DBLP.
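To make the two steps concrete, here is a minimal sketch in Python. It is not Janez's script: the author sets are invented toy data, the function names are hypothetical, and classical (Torgerson) MDS is used only because it fits in a few lines of NumPy. The post does not say which MDS variant produced the chart.

```python
import numpy as np

def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of author names."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS: embed a distance matrix into `dims` dimensions."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get an inner-product (Gram) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]
    # Jaccard distances need not be Euclidean, so clip negative eigenvalues.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Hypothetical toy data: venue -> set of authors (not taken from DBLP).
authors = {
    "ICML": {"ann", "bob", "cat", "dan"},
    "ECML": {"bob", "cat", "dan", "eve"},
    "KDD": {"dan", "eve", "fay"},
    "ICANN": {"gus", "hal"},
}
venues = sorted(authors)
dist = np.array([[jaccard_distance(authors[u], authors[v]) for v in venues]
                 for u in venues])
coords = classical_mds(dist)  # one (x, y) row per venue, in `venues` order
```

Venues with no shared authors (here the made-up ICANN set) get distance 1 to everything else, which is why sparsely connected areas drift to the edge of such a map.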

mds-graph.png

We can see data mining (KDD/PKDD) toward the bottom and AI at the top, with machine learning (ICML/ECML/ML/JMLR) in the middle, largely separating the two. To the left there are special areas, such as neural networks (ICANN/NN) or medical applications (ARTMED/AIME). Do not, however, interpret these areas as marginal: it's just that the lens was centered on the highly connected conferences to the right of the diagram.

There are a few challenges with analyzing such proximity data statistically. First, the authorship data should be controlled for year; otherwise long-running conferences will appear detached from the base. Second, when there is not much data, the similarity itself is uncertain. To handle this we first need a probabilistic stress function (an uncertain distance can be stretched or shrunk more than a certain one). Finally, the nonconvexity of MDS can be remedied with good priors. One might also debate the pros and cons of using similarity functions on the original features, or whether to generate the original features directly from the latent variables.
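One way to read "probabilistic stress" is as a Gaussian negative log-likelihood, in which each target distance carries its own standard deviation. The sketch below is an assumption about what such a function could look like, not the formulation the post has in mind; `weighted_stress` and `sigma` are hypothetical names.

```python
import numpy as np

def weighted_stress(coords, target, sigma):
    """Sum over pairs of 0.5 * ((embedded - target) / sigma)^2.

    coords: (n, d) embedded points
    target: (n, n) observed distances
    sigma:  (n, n) assumed standard deviation of each observed distance
    """
    diff = coords[:, None, :] - coords[None, :, :]
    embedded = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(coords), k=1)  # count each pair once
    residual = (embedded[upper] - target[upper]) / sigma[upper]
    return 0.5 * float((residual ** 2).sum())

# Three points whose embedding misses one target distance.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = np.ones((3, 3)) - np.eye(3)   # every pair should be at distance 1
certain = np.ones((3, 3))              # all distances equally trusted
uncertain = certain.copy()
uncertain[1, 2] = uncertain[2, 1] = 2.0  # the mismatched pair is less trusted

stress_certain = weighted_stress(points, target, certain)
stress_uncertain = weighted_stress(points, target, uncertain)
```

With the larger `sigma` on the mismatched pair, the same embedding incurs a smaller penalty, which is the "stretched or shrunk more" behavior described above.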

Also see Map of Science and Scientometrics.

Posted by Aleks at May 17, 2007 12:19 PM


Comments

"Finally, the nonconvexity of MDS can be remedied with good priors"

Expand, please!

Posted by: BrendanH at May 17, 2007 4:03 PM.

where are NIPS and STOC?

Posted by: bob at May 17, 2007 10:44 PM.

Emmm... I can't find NIPS...

Posted by: Yee Whye Teh at May 18, 2007 6:28 AM.

BrendanH: I'll post something in the coming days that shows the basic idea.

Bob, Yee: Janez says NIPS will be included in version 2.0. The fact that NIPS is always on the west coast makes it rather unfriendly to Europeans.

Posted by: Aleks [TypeKey Profile Page] at May 18, 2007 10:29 AM.

Is there an easy way to send Janez other conferences to add in 2.0? I'm missing the three big ones I attend almost every year: GECCO, CEC and AAMAS.

Posted by: Bill Tozier at May 23, 2007 3:56 PM.

You don't need to, I can just include these conferences in the list and run the whole thing again. I'm gonna do it, I promise, but I just won't have any time for the next two weeks or so.


I can also put the script into a readable form and publish it for anyone to patch it at will. :)

Posted by: Janez [TypeKey Profile Page] at May 25, 2007 6:27 PM.
