I have been thinking about starting a technical blog for a while, so without further ado, here is my first post.
I have been "trying" to read a lot of papers lately. I feel I am not doing justice to all the papers I have collected, and just thinking about what printing them costs the company makes me feel really guilty.
I went back and read a few articles on how to actually read research papers. One good article is by Philip Fong:
Link.
I will use his method to read a few of the papers suggested in a really remarkable post at
http://glinden.blogspot.com/2008/05/random-walks-of-click-graph.html
I have chosen the paper by Hector Garcia-Molina et al., "Simrank++: Query Rewriting through Link Analysis of the Click Graph".
Problem Space
---------------
The paper addresses the query rewriting problem: given a new query 'q', find a set 'Q' of similar queries with known clicks, using historical query-click data.
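To make the problem statement concrete, here is a minimal sketch of the data involved. The log format (query, ad, clicks), the function names, and the "share at least one clicked ad" baseline are all my own assumptions for illustration, not something taken from the paper.

```python
from collections import defaultdict


def build_click_graph(log_rows):
    """Aggregate (query, ad, clicks) rows into a bipartite click graph.

    Returns a dict mapping each query to {ad: total clicks}.
    The row format is a hypothetical one chosen for this sketch.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for query, ad, clicks in log_rows:
        graph[query][ad] += clicks
    return {query: dict(ads) for query, ads in graph.items()}


def candidate_rewrites(graph, query):
    """Naive baseline: every other query sharing at least one clicked ad."""
    ads = set(graph.get(query, {}))
    return sorted(
        other for other, other_ads in graph.items()
        if other != query and ads & set(other_ads)
    )


rows = [
    ("camera", "hp.com", 3),
    ("camera", "hp.com", 2),
    ("digital camera", "hp.com", 1),
    ("flower", "teleflora.com", 4),
]
graph = build_click_graph(rows)
print(candidate_rewrites(graph, "camera"))
```

A real rewriter would rank these candidates by a similarity score instead of treating every shared ad equally, which is where Simrank comes in.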
The problem space makes sense for search marketing folks, who want to find correlations between different query/keyword terms to maximize their CTR yield.
Can the problem be extended to search queries in general? I mean finding relations such as the query "football" and extending the search to "soccer", etc.
Importance of Problem space
----------------------------
A good solution here can increase CTR for search ads and can mean direct money :)
Paper Type
----------
This paper improves an existing algorithm, Simrank.
Background
------------
Simrank :: Simrank is an approach to finding similarity between objects. It is a domain-independent approach based on graph structure, which uses the similar/common-friends principle to measure similarity between objects. Excerpt from the
original Simrank paper: "two objects are similar if they are related to similar objects", e.g. on the web, two pages with common hyperlinks, or papers with shared references. [I will be analyzing this paper next]
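The "related to similar objects" idea can be sketched as an iterative computation on a bipartite query-ad graph. This is a minimal, unoptimized sketch under my own assumptions: a single decay constant `c` for both sides of the graph, a fixed number of iterations, and made-up example data. The function names are hypothetical.

```python
from itertools import product


def _update(nodes, neighbours, other_sim, c):
    """One Simrank step for one side of the bipartite graph."""
    new = {}
    for x, y in product(nodes, repeat=2):
        if x == y:
            new[(x, y)] = 1.0  # every node is fully similar to itself
        elif not neighbours[x] or not neighbours[y]:
            new[(x, y)] = 0.0  # no neighbours, so no evidence of similarity
        else:
            # Average similarity over all pairs of neighbours, decayed by c.
            total = sum(other_sim[(i, j)]
                        for i in neighbours[x] for j in neighbours[y])
            new[(x, y)] = c * total / (len(neighbours[x]) * len(neighbours[y]))
    return new


def simrank_bipartite(clicks, c=0.8, iterations=5):
    """clicks maps query -> set of clicked ads. Returns query-pair scores."""
    queries = sorted(clicks)
    ads = sorted({ad for clicked in clicks.values() for ad in clicked})
    shown_for = {ad: {q for q in queries if ad in clicks[q]} for ad in ads}
    q_sim = {(x, y): float(x == y) for x, y in product(queries, repeat=2)}
    a_sim = {(x, y): float(x == y) for x, y in product(ads, repeat=2)}
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's scores.
        q_sim, a_sim = (_update(queries, clicks, a_sim, c),
                        _update(ads, shown_for, q_sim, c))
    return q_sim


clicks = {
    "camera": {"hp.com", "bestbuy.com"},
    "digital camera": {"hp.com", "bestbuy.com"},
    "flower": {"teleflora.com"},
}
scores = simrank_bipartite(clicks, iterations=2)
print(scores[("camera", "digital camera")])
```

Queries in disconnected parts of the graph, like "flower" here, keep a score of zero no matter how many iterations are run.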
Contributions
----------------
1) Application of the Simrank algorithm to the sponsored search space
2) Improvements to the existing Simrank algorithm
3) Experimental evaluation on real query-click data from Yahoo
Simrank improvements
--------------------------------
1) Evidence based: The authors argue, citing cases of complete bipartite graphs, that in some cases Simrank does not give the optimal pair, and that a new "evidence factor" can improve the score.
2) Weighted edge traversal: The authors then refine Simrank further so that the random surfer takes the edge weights of the click graph into account, on top of the evidence factor from step 1.
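A rough sketch of the evidence idea, assuming the geometric form of the evidence factor as I remember it from the paper (a sum of 1/2^i over the number of common neighbours). Treat the exact formula, and the function names, as my assumptions, not a quote from the authors.

```python
def evidence(ads_a, ads_b):
    """Evidence that two queries are related, from their common ads.

    Assumed form: sum of 1/2^i for i = 1..n, where n is the number of
    ads the two queries share. It grows with n and stays below 1.
    """
    common = len(set(ads_a) & set(ads_b))
    return sum(0.5 ** i for i in range(1, common + 1))


def evidence_adjusted(score, ads_a, ads_b):
    """Scale a plain similarity score by the evidence factor."""
    return evidence(ads_a, ads_b) * score


one_common = evidence({"hp.com"}, {"hp.com"})
two_common = evidence({"hp.com", "bestbuy.com"}, {"hp.com", "bestbuy.com"})
print(one_common, two_common)
```

The point of the factor is that a pair of queries sharing more ads gets a larger multiplier, so a pair with only a single common ad is no longer favored just because its neighbourhoods are small.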
Results
--------
1) Coverage improvement: same as Simrank
2) Precision improvement: 5% improvement ??
3) Rewriting depth improvement: 5% over Simrank
4) Correctness: 30-40% improvement over Simrank. +2
Conclusions
-------------
Query rewrite effectiveness: The query rewrites are evaluated using two major methodologies:
1) Comparison with manual scores from experts: not based on real click data, and subject to personal bias.
2) Checking rewrite quality by removing some links from the click graph.
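The second methodology can be sketched as a hold-out test: remove some query-ad links, recompute rewrites on the pruned graph, and check whether sensible rewrites still come out. I have not reproduced the paper's exact protocol here. The helper names are hypothetical, and Jaccard overlap is only a stand-in for whichever similarity measure is being tested.

```python
def remove_edges(graph, removed):
    """Copy of graph (query -> set of ads) without the given (query, ad) pairs."""
    removed = set(removed)
    return {
        query: {ad for ad in ads if (query, ad) not in removed}
        for query, ads in graph.items()
    }


def jaccard(ads_a, ads_b):
    """Stand-in similarity: overlap of the two ad sets."""
    union = ads_a | ads_b
    return len(ads_a & ads_b) / len(union) if union else 0.0


def top_rewrite(graph, query, similarity=jaccard):
    """Most similar other query, or None if nothing scores above zero."""
    scored = [
        (similarity(graph[query], ads), other)
        for other, ads in graph.items() if other != query
    ]
    best_score, best_query = max(scored, default=(0.0, None))
    return best_query if best_score > 0 else None


graph = {
    "camera": {"hp.com", "bestbuy.com"},
    "digital camera": {"hp.com", "bestbuy.com"},
    "flower": {"teleflora.com"},
}
pruned = remove_edges(graph, [("camera", "hp.com")])
print(top_rewrite(pruned, "camera"))
```

If the top rewrite survives the removal of a link, that is some evidence the method is not just memorizing direct co-clicks.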
Checking query rewrites looks like a difficult problem. I would like to see someone run A/B tests and build a real quality test suite.