We want to promote new collaborations among potential participants of the Workshop on Open Web Search. We therefore maintain the public, non-binding pre-registration list below. It collects ideas for components together with the potential participants who have expressed interest in working on them, so that participants can coordinate and avoid overlapping work. If you have additional ideas for components that you would like to see in the list, or if you want to express interest in working on a component, please send us a short message or post in the forum below. We will incorporate all suggestions and remove forum responses as soon as they are included in the list, to keep the forum concise. We hope the pre-registration also provides collaboration ideas for the ECIR’24 Collab-a-thon.
Pre-Registration for Document Processors
Document Processor | Description | Expressed Interest to Work on This |
---|---|---|
Corpus Graph | Calculate similarity scores between documents in the corpus to | |
DeepCT Document Reduction | Calculate the importance of terms in their context to remove unimportant terms. | |
Document Expansion with DocT5Query | ||
Genre Classification | Detect the aim of a web page, e.g., is the goal of a page to educate visitors or to sell something? | Sebastian Schmidt et al. |
Keyphrase Extraction | Reduce the length of a document drastically by extracting only the top keyphrases. | Already Implemented as Baseline |
Medical Document Classification | Binary classification to detect medical content. | Ferdinand Schlatt et al. |
SEO detection from ad blockers | Use XPath and Regex rules of ad blockers to extract some SEO features for ranking/re-ranking or removal of ads from a web page | |
Spam Classification | Is a given web page a spam page that aims to deliver unwanted payload? | Mohammed Al-Maamari et al. |
TextStat Document Analysis | Calculate readability scores, etc., with the textstat library. | |
… |
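
To make the Corpus Graph idea from the table above more concrete, the following is a minimal sketch that scores document pairs and keeps the top-k neighbours per document. The choice of cosine similarity over raw term frequencies, the whitespace tokenization, and the names `tf_vector`, `cosine`, and `corpus_graph` are assumptions made for illustration only; the list does not prescribe a similarity measure or an interface.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map lowercased whitespace tokens to raw term frequencies (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def corpus_graph(docs, k=2):
    """Map each document ID to its k most similar other documents with scores."""
    vectors = {doc_id: tf_vector(text) for doc_id, text in docs.items()}
    graph = {}
    for doc_id, vec in vectors.items():
        scored = [
            (other, cosine(vec, vectors[other]))
            for other in vectors
            if other != doc_id
        ]
        # Highest similarity first; ties are broken by document ID.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        graph[doc_id] = scored[:k]
    return graph


# Hypothetical toy corpus, not taken from any workshop dataset.
docs = {
    "d1": "the nile is a river in africa",
    "d2": "the source of the nile river",
    "d3": "hubble telescope achievements",
}
graph = corpus_graph(docs, k=1)
print(graph)
```

A pairwise loop like this is quadratic in the corpus size, so a component for a real web corpus would likely need an index or approximate nearest-neighbour search instead.
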
Pre-Registration for Query Processors
Query Processor | Description | Expressed Interest to Work on This |
---|---|---|
Archive Query Log Expansion | Given a query, get k highly similar queries with SERPs/linked documents from the Archive Query Log | Heinrich Reimer et al. |
Entity Linking (precision-oriented) | Given a query like source of the nile, link to the most prominent entities, e.g., the Nile as river. | |
Entity Linking (recall-oriented) | Given a query like source of the nile, link to all possible interpretations, e.g., the board game and the river. | Marcel Gohsen et al. |
Long Query Reduction | Given a long/verbose query, remove unimportant terms. | |
Spelling Correction | Correct the spelling of queries (maybe especially interesting for dense retrievers, e.g., T5 tokenizes obama into 4 tokens, but Obama into a single token). | |
RM3 Expansion on the AQL | Use the top-ranked documents from the Archive Query Log as relevance feedback to RM3/Bo1/etc. | Justin Löscher et al. |
Query Intent Prediction | Is a query informational, navigational, or transactional? | |
Query Performance Prediction | Given a list of queries, order them by their predicted effectiveness. | |
Query Segmentation | Detect query terms that belong together, e.g., segment the query hubble telescope achievements into ['hubble telescope', 'achievements'] | Already Implemented as Baseline |
Query Variants with ChatGPT | ||
… |
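
As an illustration of the Query Segmentation row above, here is a minimal sketch that reproduces the hubble telescope example with a greedy longest-match strategy. The phrase list `known_phrases`, the function name `segment`, and the `max_len` parameter are hypothetical; this is not the method used by the existing baseline, which the list does not describe.

```python
def segment(query, known_phrases, max_len=3):
    """Greedy left-to-right segmentation that prefers the longest known phrase."""
    terms = query.split()
    segments = []
    i = 0
    while i < len(terms):
        # Try the longest candidate span first; a single term always matches.
        for n in range(min(max_len, len(terms) - i), 0, -1):
            candidate = " ".join(terms[i:i + n])
            if n == 1 or candidate in known_phrases:
                segments.append(candidate)
                i += n
                break
    return segments


# Hypothetical phrase list; a real component might derive it from n-gram
# frequencies or from titles in a knowledge base.
known_phrases = {"hubble telescope"}
print(segment("hubble telescope achievements", known_phrases))
```

Greedy matching is only one option. Approaches that score all possible segmentations of a query can make different decisions when candidate phrases overlap.
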
Pre-Registration for Full-Rank/Re-Rank approaches
Approach | Description | Expressed Interest to Work on This |
---|---|---|
Axiomatic re-ranking | | Heinrich Reimer et al. |
Splade | ||
… |