Thursday, May 8, 2008

Random Surfer Model revisited

I have been reading a lot of algorithm based on the random surfer model. I rechecked the original simrank paper from Glen Jeh and Jeniffer Widom @stanford. It's a very simple and basic idea of using the graph structure to identify similarity's between graph nodes. It can be define the probability by which 2 nodes would meet if they keep choosing an edge to walk from current node at random.The "random surfer" model was popularized by google pagerank, this paper here uses it for a two node similliarity problem.

Surfing is kool by me, I only fear flying you Fooool !!

The problem with this approach I think are
1) Graph Structure: The graph need to be structured very strongly for simrank to give any meaningful result. The unpopular or nodes with low fanout would be categorized similar to many non-worthy nodes due to some common neighbor values.
2) I think the simrank would suffer in a diverse graph scenario where both dense and sparse sub-graphs are part of a big graph.

The good part about simrank are:
1) The approach is context-free and can be applied to different domains
2) simrank is iterative , can be parallelized easily and is guaranteed to converge (with cutoff thresholds).

Similar interesting problems:
1) Given a Graph G, and two nodes (s,d) find the neighbors of 's' who have maximum influence on 'd'.
2) Finding communities in graph.
3) sub-graph matching ?? This problem would be hard to model using random surfer. would be interesting to do that though.

Alright !! Need to go and convert the van for assault !!

No comments: