Today's pick in Interpretability & Analysis of LMs: SyntaxShap: Syntax-aware Explainability Method for Text Generation by @kamara000, R. Sevastjanova and M. El-Assady
Most model-agnostic post-hoc interpretability methods used in NLP today were originally ported from tabular/CV domains with next to no adjustment for the intrinsic properties of textual inputs.
In this work, the authors propose SyntaxSHAP, an adaptation of the Shapley value approach in which the coalitions used to compute marginal contributions to importance scores are constrained by the syntax of the explained sentence. The resulting tree-based coalitions do not satisfy the efficiency axiom of Shapley values, but they preserve the symmetry, nullity and additivity axioms.
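For readers less familiar with the baseline being adapted, here is a minimal sketch of the standard, unconstrained Shapley value computation over the tokens of a sentence. It is not the SyntaxSHAP algorithm: the syntax-based restriction on coalitions is not reproduced here. The names `shapley_values` and `toy_value`, the example sentence and the scoring rule are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    `players` must contain distinct items; `value` maps a frozenset of
    players to a number. The cost is exponential in len(players).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical stand-in for a model score on a masked input."""
    score = 0.0
    if "not" in coalition:
        score += 1.0
    if "good" in coalition:
        score += 2.0
    if "not" in coalition and "good" in coalition:
        score += 1.0  # interaction term between the two tokens
    return score


tokens = ["the", "film", "is", "not", "good"]
phi = shapley_values(tokens, toy_value)

# Efficiency axiom: attributions sum to v(all tokens) - v(empty set)
total_attribution = sum(phi.values())
full_minus_empty = toy_value(frozenset(tokens)) - toy_value(frozenset())
```

In this toy setting the interaction term is split evenly between "not" and "good", the remaining tokens receive zero attribution (nullity), and `total_attribution` matches `full_minus_empty` up to floating-point error. That last equality is the efficiency property which, per the post, no longer holds once coalitions are restricted to those allowed by the syntax tree.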
SyntaxSHAP is compared to other model-agnostic approaches on small (GPT-2 117M) and large (Mistral 7B) LMs, showing it produces explanations that are more faithful to model predictions and more semantically meaningful than other common methods, while also being more efficient than the base SHAP method.
Today's pick in Interpretability & Analysis of LMs: LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools by @qiaw99, @tanikina, @nfel et al.
The authors introduce LLMCheckup, a conversational interface that connects an LLM to several interpretability tools (feature attribution methods, similarity, counterfactual/rationale generation), allowing users to ask about LLM predictions in natural language. By consolidating these methods in a single chat interface, it simplifies future investigations into natural language explanations.