We intend to promote new collaborations among potential participants of the Workshop on Open Web Search. To this end, we maintain the public, non-binding pre-registration list below. It collects ideas for components together with the potential participants who have expressed interest in working on them, so that participants can coordinate and avoid duplicating each other's work. If you have ideas for additional components or want to express interest in working on a component, please send us a short message or reply in this thread. We will incorporate all suggestions and remove the corresponding replies once they are included, to keep the thread concise. We hope the pre-registration also provides ideas for collaborations at the ECIR’24 Collab-a-thon.


Pre-Registration for Document Processors

| Document Processor | Description | Expressed Interest to Work on This |
|---|---|---|
| Corpus Graph | Calculate similarity scores between documents in the corpus to improve recall. | Sean MacAvaney et al. |
| DeepCT Document Reduction | Calculate the importance of terms in their context to remove unimportant terms. | Sean MacAvaney et al. |
| Document Expansion with DocT5Query-- | | Sean MacAvaney et al. |
| Genre Classification | Detect the purpose of a web page, e.g., does the page aim to educate visitors or to sell something? | Sebastian Schmidt et al. |
| Keyphrase Extraction | Drastically reduce the length of a document by keeping only its top keyphrases. | Already implemented as baseline |
| Medical Document Classification | Binary classification to detect medical content. | Ferdinand Schlatt et al. |
| SEO Detection from Ad Blockers | Use the XPath and regex rules of ad blockers such as Adblock Plus (ABP) to extract SEO features for ranking/re-ranking, or to remove ads from a web page. | |
| Spam Classification | Is a given web page a spam page that aims to deliver an unwanted payload? | Mohammed Al-Maamari et al. |
| TextStat Document Analysis | Calculate readability scores and other text statistics with the textstat library. | |
| Comparative Stance Detection | Given a comparative query with two comparison objects and a sentence, identify whether the sentence takes a stance for either of the two objects or takes no stance. | Alexander Bondarenko et al. |
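The Corpus Graph entry is described in a single line. Roughly, a corpus graph links each document to its k most similar documents, so that a re-ranker can also score the neighbors of top-ranked documents and recover relevant documents the first stage missed. Below is a minimal, hypothetical sketch: `build_corpus_graph` and `cosine` are illustrative names, and bag-of-words cosine similarity is an assumption standing in for whatever similarity the actual component uses (e.g., BM25 or dense embeddings).

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_corpus_graph(docs, k=2):
    """Map each doc id to the ids of its k most similar other documents.

    docs: dict of doc_id -> list of tokens (hypothetical input format).
    """
    vectors = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    graph = {}
    for doc_id, vec in vectors.items():
        # score every other document against the current one
        scored = [
            (cosine(vec, other_vec), other_id)
            for other_id, other_vec in vectors.items()
            if other_id != doc_id
        ]
        # highest similarity first; keep the top k neighbors
        scored.sort(reverse=True)
        graph[doc_id] = [other_id for _, other_id in scored[:k]]
    return graph


docs = {
    "a": ["nile", "river", "egypt"],
    "b": ["nile", "river", "source"],
    "c": ["board", "game", "rules"],
}
print(build_corpus_graph(docs, k=1))
```

The all-pairs loop is quadratic in the corpus size, so a web-scale component would need an approximate nearest-neighbor index instead.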

Pre-Registration for Query Processors

| Query Processor | Description | Expressed Interest to Work on This |
|---|---|---|
| Archive Query Log Expansion | Given a query, retrieve the k most similar queries, together with their SERPs/linked documents, from the Archive Query Log. | Heinrich Reimer et al. |
| Detection of Comparative Queries | Given a query, classify whether it is comparative (e.g., "should I buy a PlayStation or an Xbox?") or not. | Alexander Bondarenko et al. |
| Entity Linking (precision-oriented) | Given a query like "source of the nile", link it to the most prominent entities, e.g., the Nile as a river. | |
| Entity Linking (recall-oriented) | Given a query like "source of the nile", link it to all possible interpretations, e.g., both the board game and the river. | Marcel Gohsen et al. |
| Long Query Reduction | Given a long/verbose query, remove unimportant terms. | |
| Spelling Correction | Correct the spelling of queries (possibly especially interesting for dense retrievers; e.g., T5 tokenizes "obama" into 4 tokens but "Obama" into a single token). | Gustav Lahmann et al. |
| RM3 Expansion on the AQL | Use the top-ranked documents from the Archive Query Log (AQL) as relevance feedback for RM3/Bo1/etc. | Justin Löscher et al. |
| Query Intent Prediction | Is a query informational, navigational, or transactional? | Daria Alexander et al. |
| Query Performance Prediction | Given a list of queries, order them by their predicted effectiveness. | |
| Query Segmentation | Detect query terms that belong together, e.g., segment the query "hubble telescope achievements" into ['hubble telescope', 'achievements']. | Already implemented as baseline |
| Query Variants with ChatGPT | | |
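For the RM3 Expansion on the AQL entry, the core of RM3 is an interpolation between the original query model and a relevance model estimated from feedback documents. A minimal sketch, assuming the feedback documents (e.g., top-ranked documents linked from the Archive Query Log) are already tokenized and come with positive retrieval scores that stand in for P(Q|d); `rm3_expand` and its parameters are hypothetical and not the API of any toolkit. Toolkits such as PyTerrier and Anserini ship their own RM3 implementations.

```python
from collections import Counter


def rm3_expand(query, feedback_docs, fb_terms=10, orig_weight=0.5):
    """Return an expanded query as a dict {term: weight}.

    query: list of query terms.
    feedback_docs: list of (tokens, score) pairs; scores are assumed to be
        positive and to stand in for P(Q|d) (an assumption of this sketch).
    fb_terms: number of expansion terms kept from the relevance model.
    orig_weight: interpolation weight (lambda) of the original query model.
    """
    # RM1: P(w|R) is proportional to sum over d of P(w|d) * P(Q|d)
    total_score = sum(score for _, score in feedback_docs)
    relevance_model = Counter()
    for tokens, score in feedback_docs:
        if not tokens:
            continue
        doc_len = len(tokens)
        for term, tf in Counter(tokens).items():
            relevance_model[term] += (tf / doc_len) * (score / total_score)

    # keep only the top fb_terms expansion terms and renormalize them
    top = relevance_model.most_common(fb_terms)
    norm = sum(weight for _, weight in top)
    expansion = {term: weight / norm for term, weight in top}

    # maximum-likelihood estimate of the original query model P(w|Q)
    q_model = {t: c / len(query) for t, c in Counter(query).items()}

    # RM3: lambda * P(w|Q) + (1 - lambda) * P(w|R)
    return {
        t: orig_weight * q_model.get(t, 0.0)
        + (1 - orig_weight) * expansion.get(t, 0.0)
        for t in set(q_model) | set(expansion)
    }


feedback = [
    (["nile", "river", "lake", "victoria"], 2.0),
    (["nile", "river", "egypt"], 1.0),
]
expanded = rm3_expand(["nile", "source"], feedback)
print(sorted(expanded.items(), key=lambda item: -item[1]))
```

Setting `orig_weight` closer to 1 keeps the expanded query closer to the original one, which limits query drift.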

Pre-Registration for Full-Rank/Re-Rank Approaches (i.e., Query-Document Processors)

| Approach | Description | Expressed Interest to Work on This |
|---|---|---|
| Axiomatic re-ranking | | Heinrich Reimer et al. |
| Query-dependent document summarization | | |
| SPLADE | | Thibault Formal et al. |
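For the Axiomatic re-ranking entry, one simple variant re-orders an existing ranking according to pairwise preferences derived from retrieval axioms. The sketch below uses a single axiom, TFC1 (among documents of similar length, prefer the one with more occurrences of query terms), and orders documents by how many pairwise preferences they win, breaking ties with the original score. The 10% length tolerance, the win-count aggregation, and the function names are assumptions for illustration; frameworks such as ir_axioms offer many more axioms and aggregation strategies.

```python
from itertools import combinations


def tfc1_preference(query, doc_a, doc_b, length_tolerance=0.1):
    """Return 1 if TFC1 prefers doc_a, -1 if it prefers doc_b, else 0.

    query, doc_a, doc_b: lists of tokens.
    length_tolerance: assumed relative length difference under which two
        documents count as "equally long".
    """
    len_a, len_b = len(doc_a), len(doc_b)
    # TFC1 only applies to documents of (roughly) equal length
    if abs(len_a - len_b) > length_tolerance * max(len_a, len_b):
        return 0
    # total number of query term occurrences in each document
    tf_a = sum(doc_a.count(term) for term in query)
    tf_b = sum(doc_b.count(term) for term in query)
    if tf_a > tf_b:
        return 1
    if tf_a < tf_b:
        return -1
    return 0


def axiomatic_rerank(query, ranking):
    """Re-rank (doc_id, tokens, score) tuples by the number of TFC1 wins.

    Ties in the win count are broken by the original retrieval score.
    """
    wins = {doc_id: 0 for doc_id, _, _ in ranking}
    for (id_a, tok_a, _), (id_b, tok_b, _) in combinations(ranking, 2):
        preference = tfc1_preference(query, tok_a, tok_b)
        if preference > 0:
            wins[id_a] += 1
        elif preference < 0:
            wins[id_b] += 1
    return sorted(ranking, key=lambda doc: (wins[doc[0]], doc[2]), reverse=True)


ranking = [
    ("d1", ["the", "river", "flows", "north"], 3.0),
    ("d2", ["source", "of", "the", "nile"], 2.0),
    ("d3", ["nile", "nile", "source", "lake"], 1.0),
]
print([doc_id for doc_id, _, _ in axiomatic_rerank(["nile", "source"], ranking)])
```

Counting wins is the simplest way to aggregate pairwise preferences; it needs all document pairs, so it is best applied to a short top-k list from a first-stage ranker.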