Accepted Papers
The proceedings of NAACL 2025 are now available on the ACL Anthology.
Main Conference - Long Papers
- 
    Effective Skill Unlearning through Intervention and Abstention 
 Yongce Li, Chung-En Sun, Tsui-Wei Weng
- 
    Examining and Adapting Time for Multilingual Classification via Mixture of Temporal Experts 
 Weisi Liu, Guangzeng Han, Xiaolei Huang
- 
    CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts 
 Zhenpeng Su, Xing W, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding
- 
    Can LLMs Convert Graphs to Text-Attributed Graphs? 
 Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye
- 
    Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages 
 Jannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller
- 
    ParaICL: Towards Parallel In-Context Learning 
 Xingxuan Li, Xuan-Phi Nguyen, Shafiq Joty, Lidong Bing
- 
    Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation 
 Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung
- 
    The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding 
 Mo Yu, Lemao Liu, Junjie Wu, Tsz Ting Chung, Shunchi Zhang, Jiangnan Li, Dit-Yan Yeung, Jie Zhou
- 
    Have LLMs Reopened the Pandora’s Box of AI-Generated Fake News? 
 Xinyu Wang, Wenbo Zhang, Sai Koneru, Hangzhi Guo, Bonam Mingole, S. Shyam Sundar, Sarah Rajtmajer, Amulya Yadav
- 
    Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech 
 Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, Soroosh Mariooryad, Matt Shannon, Julian Salazar, David Teh-Hwa Kao
- 
    Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models 
 Varun Gumma, Pranjal A Chitale, Kalika Bali
- 
    Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs 
 Shuyang Yu, Runxue Bao, Parminder Bhatia, Taha Kass-Hout, Jiayu Zhou, Cao Xiao
- 
    Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling 
 Guangya Wan, Yuqi Wu, Jie Chen, Sheng Li
- 
    Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use 
 Mohit Chandra, Siddharth Sriraman, Gaurav Verma, Harneet Singh Khanuja, Jose Suarez Campayo, Zihang Li, Michael L. Birnbaum, Munmun De Choudhury
- 
    Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages 
 Hoang H Nguyen, Khyati Mahajan, Vikas Yadav, Julian Salazar, Philip S. Yu, Masoud Hashemi, Rishabh Maheshwary
- 
    How to Align Multiple Signed Language Corpora for Better Sign-to-Sign Translations? 
 Mert Inan, Yang Zhong, Vidya Ganesh, Malihe Alikhani
- 
    Sneaking Syntax into Transformer Language Models with Tree Regularization 
 Ananjan Nandi, Christopher D Manning, Shikhar Murty
- 
    ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding 
 Kimihiro Hasegawa, Wiradee Imrattanatrai, Zhi-Qi Cheng, Masaki Asada, Susan Holm, Yuran Wang, Ken Fukuda, Teruko Mitamura
- 
    Communication Makes Perfect: Persuasion Dataset Construction via Multi-LLM Communication 
 Weicheng Ma, Hefan Zhang, Ivory Yang, Shiyu Ji, Joice Chen, Farnoosh Hashemi, Shubham Mohole, Ethan Gearey, Michael Macy, Saeed Hassanpour, Soroush Vosoughi
- 
    Model Surgery: Modulating LLM’s Behavior Via Simple Parameter Editing 
 Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang
- 
    ReGLA: Refining Gated Linear Attention 
 Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Boxing Chen, Philippe Langlais
- 
    A Distributional Perspective on Word Learning in Neural Language Models 
 Filippo Ficarra, Ryan Cotterell, Alex Warstadt
- 
    Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions 
 Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze
- 
    Superlatives in Context: Modeling the Implicit Semantics of Superlatives 
 Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty
- 
    Can Unconfident LLM Annotations Be Used for Confident Conclusions? 
 Kristina Gligoric, Tijana Zrnic, Cinoo Lee, Emmanuel Candes, Dan Jurafsky
- 
    Simulating Classroom Education with LLM-Empowered Agents 
 Zheyuan Zhang, Daniel Zhang-Li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhanxin Hao, Jianxiao Jiang, Jie Cao, Huiqin LIU, Zhiyuan Liu, Lei Hou, Juanzi Li
- 
    Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models 
 Soham Poddar, Paramita Koley, Janardan Misra, Niloy Ganguly, Saptarshi Ghosh
- 
    CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds 
 Lei Wang, Jianxun Lian, Yi Huang, Yanqi Dai, Haoxuan Li, Xu Chen, Xing Xie, Ji-Rong Wen
- 
    ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations 
 Yichuan Li, Xinyang Zhang, Chenwei Zhang, Mao Li, Tianyi Liu, Pei Chen, Yifan Gao, Kyumin Lee, Kaize Ding, Zhengyang Wang, Zhihan Zhang, Jingbo Shang, Xian Li, Trishul Chilimbi
- 
    In-Context Learning with Long-Context Models: An In-Depth Exploration 
 Amanda Bertsch, Maor Ivgi, Emily Xiao, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig
- 
    Decoding Hate: Exploring Language Models’ Reactions to Hate Speech 
 Paloma Piot, Javier Parapar
- 
    ACCESS: A Benchmark for Abstract Causal Event Discovery and Reasoning 
 Vy Vo, Lizhen Qu, Tao Feng, Yuncheng Hua, Xiaoxi Kang, Songhai Fan, Tim Dwyer, Lay-Ki Soon, Gholamreza Haffari
- 
    Arabic Dataset for LLM Safeguard Evaluation 
 Yasser Ashraf, Yuxia Wang, Bin Gu, Preslav Nakov, Timothy Baldwin
- 
    Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction 
 William P Hogan, Jingbo Shang
- 
    Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning 
 Sungjin Park, Xiao Liu, Yeyun Gong, Edward Choi
- 
    Is a Peeled Apple Still Red? Evaluating LLMs’ Ability for Conceptual Combination with Property Type 
 Seokwon Song, Taehyun Lee, Jaewoo Ahn, Jae Hyuk Sung, Gunhee Kim
- 
    ImgTrojan: Jailbreaking Vision-Language Models with ONE Image 
 Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong
- 
    Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors 
 Georgios Chochlakis, Alexandros Potamianos, Kristina Lerman, Shrikanth Narayanan
- 
    SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection 
 Sindhu Padakandla, Sadbhavana Babar, Rathod Darshan D, Manohar Kaul
- 
    Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies 
 Zirui Song, Guangxian Ouyang, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, Fu Yujie, Zeyu Zhang, Shiyu Jiang, Miao Fang, Ling Chen, Xiuying Chen
- 
    VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation 
 Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A. Rossi, Dinesh Manocha
- 
    MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration 
 David Wan, Justin Chen, Elias Stengel-Eskin, Mohit Bansal
- 
    Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbations 
 Yinghan Zhou, Juan Wen, Wanli Peng, Xue Yiming, ZiWei Zhang, Wu Zhengxian
- 
    AID: Adaptive Integration of Detectors for Safe AI with Language Models 
 Xinran Wang, Enmao Diao, Qi Le, Jie Ding, Ali Anwar
- 
    FactTrack: Time-Aware World State Tracking in Story Outlines 
 Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Dan Klein
- 
    DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models 
 Suyoung Bae, YunSeok Choi, Jee-Hyong Lee
- 
    DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning 
 Xinyu Tang, Xiaolei Wang, Xin Zhao, Ji-Rong Wen
- 
    DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images 
 Sami Baral, Li Lucy, Ryan Knight, Alice Ng, Luca Soldaini, Neil Heffernan, Kyle Lo
- 
    High-Dimension Human Value Representation in Large Language Models 
 Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung
- 
    Beemo: Benchmark of Expert-edited Machine-generated Outputs 
 Ekaterina Artemova, Jason S Lucas, Saranya Venkatraman, Jooyoung Lee, Sergei Tilga, Adaku Uchendu, Vladislav Mikhailov
- 
    Self-Generated Critiques Boost Reward Modeling for Language Models 
 Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou
- 
    Multi$^3$Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision–Language Models 
 Minh Duc Bui, Katharina von der Wense, Anne Lauscher
- 
    MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation 
 Langlin Huang, Mengyu Bu, Yang Feng
- 
    FiNE: Filtering and Improving Noisy Data Elaborately with Large Language Models 
 Junliang He, Ziyue Fan, Shaohui Kuang, Li Xiaoqing, Kai Song, Yaqian Zhou, Xipeng Qiu
- 
    Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors 
 Kaushal Kumar Maurya, KV Aditya Srivatsa, Kseniia Petukhova, Ekaterina Kochmar
- 
    An LLM-Based Approach for Insight Generation in Data Analysis 
 Alberto Sánchez Pérez, Alaa Boukhary, Paolo Papotti, Luis Castejón Lozano, Adam Elwood
- 
    A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding 
 Abdulfattah Safa, Gözde Gül Şahin
- 
    The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism 
 Yifan Song, Guoyin Wang, Sujian Li, Bill Yuchen Lin
- 
    Language Models Predict Empathy Gaps Between Social In-groups and Out-groups 
 Yu Hou, Hal Daumé III, Rachel Rudinger
- 
    What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length 
 Lindia Tjuatja, Graham Neubig, Tal Linzen, Sophie Hao
- 
    AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging 
 Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing
- 
    Language Models Can Infer Action Semantics for Symbolic Planners from Environment Feedback 
 Wang Bill Zhu, Ishika Singh, Robin Jia, Jesse Thomason
- 
    WHoW: A Cross-domain Approach for Analysing Conversation Moderation 
 Ming-Bin Chen, Lea Frermann, Jey Han Lau
- 
    MATO: A Model-Agnostic Training Optimization for Aspect Sentiment Triplet Extraction 
 Shaopeng Tang, Lin Li, Xiaohui Tao, Leqi Zhong, Qing Xie
- 
    What the #?*!: Disentangling Hate Across Target Identities 
 Yiping Jin, Leo Wanner, Aneesh Moideen Koya
- 
    CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs 
 Amey Hengle, Aswini Kumar Padhi, Anil Bandhakavi, Tanmoy Chakraborty
- 
    CoME: An Unlearning-based Approach to Conflict-free Model Editing 
 Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
- 
    Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment 
 Yuu Jinnai, Tetsuro Morimura, Kaito Ariu, Kenshi Abe
- 
    Graph Neural Network Enhanced Retrieval for Question Answering of Large Language Models 
 Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, Rui Wang
- 
    Fine-Grained Transfer Learning for Harmful Content Detection through Label-Specific Soft Prompt Tuning 
 Faeze Ghorbanpour, Viktor Hangya, Alexander Fraser
- 
    Towards Quantifying Commonsense Reasoning with Mechanistic Insights 
 Abhinav Joshi, Areeb Ahmad, Divyaksh Shukla, Ashutosh Modi
- 
    Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs? 
 So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal
- 
    K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected Compressor 
 Jeonghun Cho, Gary Lee
- 
    LLMs as Meta-Reviewers’ Assistants: A Case Study 
 Eftekhar Hossain, Sanjeev Kumar Sinha, Naman Bansal, R. Alexander Knipper, Souvika Sarkar, John Salvador, Yash Mahajan, Sri Ram Pavan Kumar Guttikonda, Mousumi Akter, Md. Mahadi Hassan, Matthew Freestone, Matthew C. Williams Jr., Dongji Feng, Santu Karmaker
- 
    Understanding Figurative Meaning through Explainable Visual Entailment 
 Arkadiy Saakyan, Shreyas Kulkarni, Tuhin Chakrabarty, Smaranda Muresan
- 
    FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data 
 Deren Lei, Yaxi Li, Siyao Li, Mengya Hu, Rui Xu, Ken Archer, Mingyu Wang, Emily Ching, Alex Deng
- 
    IHEval: Evaluating Language Models on Following the Instruction Hierarchy 
 Zhihan Zhang, Shiyang Li, Zixuan Zhang, Xin Liu, Haoming Jiang, Xianfeng Tang, Yifan Gao, Zheng Li, Haodong Wang, Zhaoxuan Tan, Yichuan Li, Qingyu Yin, Bing Yin, Meng Jiang
- 
    Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with Decontextualization 
 Jin Zhao, Jingxuan Tu, Bingyang Ye, Xinrui Hu, Nianwen Xue, James Pustejovsky
- 
    Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping 
 Ryan Li, Yanzhe Zhang, Diyi Yang
- 
    StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples 
 Ajay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch
- 
    KS-Lottery: Finding Certified Lottery Tickets for Multilingual Transfer in Large Language Models 
 Fei Yuan, Chang Ma, Shuai Yuan, Qiushi Sun, Lei Li
- 
    Open-World Evaluation for Retrieving Diverse Perspectives 
 Hung-Ting Chen, Eunsol Choi
- 
    Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness 
 Yoo Yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, Jordan Lee Boyd-Graber
- 
    Benchmarking Language Model Creativity: A Case Study on Code Generation 
 Yining Lu, Dixuan Wang, Tianjian Li, Dongwei Jiang, Sanjeev Khudanpur, Meng Jiang, Daniel Khashabi
- 
    Exploring Safety-Utility Trade-Offs in Personalized Language Models 
 Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi
- 
    EmoCharacter: Evaluating the Emotional Fidelity of Role-Playing Agents in Dialogues 
 Qiming Feng, Qiujie Xie, Xiaolong Wang, Qingqiu Li, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao
- 
    ToW: Thoughts of Words Improve Reasoning in Large Language Models 
 Zhikun Xu, Ming Shen, Jacob Dineen, Zhaonan Li, Xiao Ye, Shijie Lu, Aswin RRV, Chitta Baral, Ben Zhou
- 
    Improving Retrospective Language Agents via Joint Policy Gradient Optimization 
 Xueyang Feng, Bo Lan, Quanyu Dai, Lei Wang, Jiakai Tang, Xu Chen, Zhenhua Dong, Ji-Rong Wen
- 
    Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements 
 Antonia Karamolegkou, Sandrine Schiller Hansen, Ariadni Christopoulou, Filippos Stamatiou, Anne Lauscher, Anders Søgaard
- 
    Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown Questions 
 Hongru WANG, Boyang XUE, Baohang Zhou, Tianhua Zhang, Cunxiang Wang, Huimin WANG, Guanhua Chen, Kam-Fai Wong
- 
    MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning 
 Hanqing Wang, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen
- 
    Improving Data Annotation for Low-Resource Relation Extraction with Logical Rule-Augmented Collaborative Language Models 
 Xiyang Liu, Chunming Hu, Richong Zhang, Junfan Chen, Baowen Xu
- 
    Matina: A Large-Scale 73B Token Persian Text Corpus 
 Sara Bourbour Hosseinbeigi, Heshaam Faili, Fatemeh Taherinezhad, Hamed Baghbani, Fatemeh Nadi, Mostafa Amiri
- 
    CORRECT: Context- and Reference-Augmented Reasoning and Prompting for Fact-Checking 
 Delvin Ce Zhang, Dongwon Lee
- 
    Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench 
 Zheyuan Liu, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng, Yongle Yuan, Meng Jiang
- 
    Rationale-Guided Retrieval Augmented Generation for Medical Question Answering 
 Jiwoong Sohn, Yein Park, Chanwoong Yoon, Sihyeon Park, Hyeon Hwang, Mujeen Sung, Hyunjae Kim, Jaewoo Kang
- 
    Stronger Models are Not Always Stronger Teachers for Instruction Tuning 
 Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
- 
    MAD Speech: Measures of Acoustic Diversity of Speech 
 Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov
- 
    EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models 
 Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang
- 
    DIRAS: Efficient LLM Annotation of Document Relevance for Retrieval Augmented Generation 
 Jingwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold
- 
    SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture 
 Jiayi Han, Liang Du, Hongwei Du, Xiangguo Zhou, Yiwen Wu, Yuanfang Zhang, Weibo Zheng, Donghong Han
- 
    SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA 
 Venktesh V, Mandeep Rathee, Avishek Anand
- 
    Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training 
 Deven Mahesh Mistry, Anooshka Bajaj, Yash Aggarwal, Sahaj Singh Maini, Zoran Tiganj
- 
    A Survey of QUD Models for Discourse Processing 
 Yingxue Fu
- 
    Not All Models Are Created Equal: Differences in which Surprisal Predicts Reading Time by Speaker First Language 
 Shannon Clark, Daniela Teodorescu, Lin Chen, Gaisha Oralova, Charles Perfetti, Alona Fyshe, Carrie Demmans Epp
- 
    Adapting Sentence-level Automatic Metrics for Document-level Simplification Evaluation 
 Mounica Maddela, Fernando Alva-Manchego
- 
    Towards Automatic Evaluation for Image Transcreation 
 Simran Khanuja, Vivek Iyer, Xiaoyu He, Graham Neubig
- 
    Substance Beats Style: Why Beginning Students Fail to Code with LLMs 
 Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson
- 
    Follow the Beaten Path: The Role of Route Patterns on Vision-Language Navigation Agents Generalization Abilities 
 Kourosh T Baghaei, Dieter Pfoser, Antonios Anastasopoulos
- 
    Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering 
 Yeonjun In, Sungchul Kim, Ryan A. Rossi, Mehrab Tanjim, Tong Yu, Ritwik Sinha, Chanyoung Park
- 
    Efficient Prompting for Continual Adaptation to Missing Modalities 
 Zirun Guo, Shulei Wang, Wang Lin, Weicai Yan, Yangyang Wu, Tao Jin
- 
    How to Make the Most of LLMs’ Grammatical Knowledge for Acceptability Judgments 
 Yusuke Ide, Yuto Nishida, Justin Vasselli, Miyu Oba, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
- 
    TurkingBench: A Challenge Benchmark for Web Agents 
 Kevin Xu, Yeganeh Kordi, Tanay Nayak, Adi Asija, Yizhong Wang, Kate Sanders, Adam Byerly, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi
- 
    The State and Fate of Summarization Datasets: A Survey 
 Noam Dahan, Gabriel Stanovsky
- 
    Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models 
 Chaoqun Liu, Wenxuan Zhang, Yiran Zhao, Anh Tuan Luu, Lidong Bing
- 
    Pay More Attention to Images: Numerous Images-Oriented Multimodal Summarization 
 Min Xiao, Junnan Zhu, Feifei Zhai, Chengqing Zong, Yu Zhou
- 
    S$^2$-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency 
 Yuting Zeng, Weizhe Huang, Lei Jiang, Tongxuan Liu, XiTai Jin, Chen Tianying Tiana, Jing Li, Xiaohua Xu
- 
    No Simple Answer to Data Complexity: An Examination of Instance-Level Complexity Metrics for Classification Tasks 
 Ryan A. Cook, John P. Lalor, Ahmed Abbasi
- 
    Anticipating Future with Large Language Model for Simultaneous Machine Translation 
 Siqi Ouyang, Oleksii Hrinchuk, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Lei Li, Boris Ginsburg
- 
    LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweigh the Costs? 
 Jan Cegin, Jakub Simko, Peter Brusilovsky
- 
    SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation 
 Yixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, Andy D. Pimentel, Anuj Pathania
- 
    Language Models Largely Exhibit Human-like Constituent Ordering Preferences 
 Ada Tur, Gaurav Kamath, Siva Reddy
- 
    Fine-Tuned LLMs are “Time Capsules” for Tracking Societal Bias Through Books 
 Sangmitra Madhusudan, Robert Morabito, Skye Reid, Nikta Gohari Sadr, Ali Emami
- 
    Enhancing Language Model Hypernetworks with Restart: A Study on Optimization 
 Yihan Zhang, Jie Fu, Rongrong Ji, Jie Chen
- 
    Private Synthetic Text Generation with Diffusion Models 
 Sebastian Ochs, Ivan Habernal
- 
    HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem 
 Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh
- 
    PicPersona-TOD: A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona 
 Jihyun Lee, Yejin Jeon, Seungyeon Seo, Gary Lee
- 
    When2Call: When (not) to Call Tools 
 Hayley Ross, Ameya Sunil Mahabaleshwarkar, Yoshi Suhara
- 
    LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices 
 Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee
- 
    MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections 
 Nishant Balepur, Alexa Siu, Nedim Lipka, Franck Dernoncourt, Tong Sun, Jordan Lee Boyd-Graber, Puneet Mathur
- 
    Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework 
 Reza Averly, Xia Ning
- 
    Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction 
 Dongming Sheng, Kexin Han, Hao Li, Yan Zhang, Yucheng Huang, Jun Lang, Wenqiang Liu
- 
    SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based Agent 
 Keyeun Lee, Seo Hyeong Kim, Seolhee Lee, Jinsu Eun, Yena Ko, Hayeon Jeon, Esther Hehsun Kim, Seonghye Cho, Soeun Yang, Eun-mee Kim, Hajin Lim
- 
    Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities 
 Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao
- 
    CORG: Generating Answers from Complex, Interrelated Contexts 
 Hyunji Lee, Franck Dernoncourt, Trung Bui, Seunghyun Yoon
- 
    CAMIEval: Enhancing NLG Evaluation through Multidimensional Comparative Instruction-Following Analysis 
 Ziyue Fan, Junliang He, Li Xiaoqing, Shaohui Kuang, Kai Song, Yaqian Zhou, Xipeng Qiu
- 
    RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement 
 Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Xin Zhao, Yang Song, Tao Zhang
- 
    An Efficient Gloss-Free Sign Language Translation Using Spatial Configurations and Motion Dynamics with LLMs 
 Eui Jun Hwang, Sukmin Cho, Junmyeong Lee, Jong C. Park
- 
    Pula: Training Large Language Models for Setswana 
 Nathan Brown, Vukosi Marivate
- 
    QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models 
 Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang
- 
    FLEX: Expert-level False-Less EXecution Metric for Text-to-SQL Benchmark 
 Heegyu Kim, Jeon Taeyang, SeungHwan Choi, Seungtaek Choi, Hyunsouk Cho
- 
    Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models 
 Joan Nwatu, Oana Ignat, Rada Mihalcea
- 
    Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization 
 Yang Zhong, Diane J. Litman
- 
    Representing Rule-based Chatbots with Transformers 
 Dan Friedman, Abhishek Panigrahi, Danqi Chen
- 
    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models 
 Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, Seonghyeon Ye, Bill Yuchen Lin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo
- 
    Preference Consistency Matters: Enhancing Preference Learning in Language Models with Automated Self-Curation of Training Corpora 
 JoonHo Lee, JuYoun Son, Juree Seok, Wooseok Jang, Yeong-Dae Kwon
- 
    Are explicit belief representations necessary? A comparison between Large Language Models and Bayesian probabilistic models 
 Dingyi Pan, Ben Bergen
- 
    Assessing the State of the Art in Scene Segmentation 
 Albin Zehe, Elisabeth Fischer, Andreas Hotho
- 
    Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets 
 Tianjian Li, Haoran Xu, Weiting Tan, Kenton Murray, Daniel Khashabi
- 
    Efficient One-shot Compression via Low-Rank Local Feature Distillation 
 Yaya SY, Christophe Cerisara, Irina Illina
- 
    The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals 
 Xiaofeng Wu, Karl Stratos, Wei Xu
- 
    Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate 
 Xiaomeng Jin, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong
- 
    SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains 
 Ran Xu, Hui Liu, Sreyashi Nag, Zhenwei DAI, Yaochen Xie, Xianfeng Tang, Chen Luo, Yang Li, Joyce C. Ho, Carl Yang, Qi He
- 
    Markov Chain of Thought for Efficient Mathematical Reasoning 
 Wen Yang, Minpeng Liao, Kai Fan
- 
    JAWAHER: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking 
 Samar Mohamed Magdy, Sang Yun Kwon, Fakhraddin Alwajih, Safaa Taher Abdelfadil, Shady Shehata, Muhammad Abdul-Mageed
- 
    Functional Lexicon in Subword Tokenization 
 Zachary William Hopton, Yves Scherrer, Tanja Samardzic
- 
    Exploring the Cost-Effectiveness of Perspective Taking in Crowdsourcing Subjective Assessment: A Case Study of Toxicity Detection 
 Xiaoni Duan, Zhuoyan Li, Chien-Ju Ho, Ming Yin
- 
    SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model 
 Jiayang Yu, Yihang Zhang, Bin Wang, Peiqin Lin, YongKang Liu, Shi Feng
- 
    Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models 
 Hyegang Son, Yonglak Son, Changhoon Kim, Young Geun Kim
- 
    Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation 
 Cong-Duy T Nguyen, Xiaobao Wu, Thong Thanh Nguyen, Shuai Zhao, Khoi M. Le, Nguyen Viet Anh, Feng Yichao, Anh Tuan Luu
- 
    Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision 
 Zhouhang Xie, Tushar Khot, Bhavana Dalvi Mishra, Harshit Surana, Julian McAuley, Peter Clark, Bodhisattwa Prasad Majumder
- 
    MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News Media 
 Muhammad Arslan Manzoor, Ruihong Zeng, Dilshod Azizov, Preslav Nakov, Shangsong Liang
- 
    Extracting and Understanding the Superficial Knowledge in Alignment 
 Runjin Chen, Gabriel Jacob Perin, Xuxi Chen, Xilun Chen, Yan Han, Nina S. T. Hirata, Junyuan Hong, Bhavya Kailkhura
- 
    ReasVQA: Advancing VideoQA with Imperfect Reasoning Process 
 Jianxin Liang, Xiaojun Meng, Huishuai Zhang, Yueqian Wang, Jiansheng Wei, Dongyan Zhao
- 
    FedSpaLLM: Federated Pruning of Large Language Models 
 Guangji Bai, Yijiang Li, Zilinghan Li, Liang Zhao, Kibaek Kim
- 
    A Survey of NLP Progress in Sino-Tibetan Low-Resource Languages 
 Shuheng Liu, Michael Best
- 
    Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling 
 Yiwen Ding, Zhiheng Xi, Wei He, Lizhuoyuan, Yitao Zhai, Shi Xiaowei, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang
- 
    DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models 
 Jianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing, Xingwei Qu, Xiao Zhang, Pei Wang, Yanan Wu, Jihao Gu, Yangguang Li, Jianke Zhu
- 
    PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines 
 Reya Vir, Shreya Shankar, Harrison Chase, William Hinthorn, Aditya Parameswaran
- 
    CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells 
 Atharva Naik, Marcus Alenius, Daniel Fried, Carolyn Rose
- 
    Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL 
 Bingfeng Chen, Shaobin Shi, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao
- 
    Stronger Universal and Transferable Attacks by Suppressing Refusals 
 David Huang, Avidan Shah, Alexandre Araujo, David Wagner, Chawin Sitawarin
- 
    You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL 
 Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng
- 
    Rethinking the Role of LLMs for Document-level Relation Extraction: a Refiner with Task Distribution and Probability Fusion 
 Fu Zhang, Xinlong Jin, Jingwei Cheng, Hongsen Yu, Huangming Xu
- 
    LBC: Language-Based-Classifier for Out-Of-Variable Generalization 
 Kangjun Noh, Baekryun Seong, Hoyoon Byun, Youngjun Choi, Sungjin Song, Kyungwoo Song
- 
    ComPO: Community Preferences for Language Model Personalization 
 Sachin Kumar, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi
- 
    Grammar Control in Dialogue Response Generation for Language Learning Chatbots 
 Dominik Glandorf, Peng Cui, Detmar Meurers, Mrinmaya Sachan
- 
    A Template Is All You Meme 
 Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych
- 
    SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals 
 Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang
- 
    ITALIC: An Italian Culture-Aware Natural Language Benchmark 
 Andrea Seveso, Daniele Potertì, Edoardo Federici, Mario Mezzanzanica, Fabio Mercorio
- 
    Detect, Disambiguate, and Translate: On-Demand Visual Reasoning for Multimodal Machine Translation with Large Vision-Language Models 
 Danyang Liu, Fanjie Kong, Xiaohang Sun, Dhruva Patil, Avijit Vajpayee, Zhu Liu, Vimal Bhat, Najmeh Sadoughi
- 
    Investigating Human Values in Online Communities 
 Nadav Borenstein, Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein
- 
    AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction 
 Jie Feng, Yuwei Du, Jie Zhao, Yong Li
- 
    On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs 
 Nitay Calderon, Roi Reichart
- 
    LLM The Genius Paradox: A Linguistic and Math Expert’s Struggle with Simple Word-based Counting Problems 
 Nan Xu, Xuezhe Ma
- 
    Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective 
 Shenglai Zeng, Jiankun Zhang, Bingheng Li, Yuping Lin, Tianqi Zheng, Dante Everaert, Hanqing Lu, Hui Liu, Hui Liu, Yue Xing, Monica Xiao Cheng, Jiliang Tang
- 
    Generative Prompt Internalization 
 Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo
- 
    ScreenQA: Large-Scale Question-Answer Pairs Over Mobile App Screenshots 
 Yu-Chung Hsiao, Fedir Zubach, Gilles Baechler, Srinivas Sunkara, Victor Carbune, Jason Lin, Maria Wang, Yun Zhu, Jindong Chen
- 
    xLAM: A Family of Large Action Models to Empower AI Agent Systems 
 Jianguo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Quoc Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Manoj Awalgaonkar, Rithesh R N, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong
- 
    Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design 
 Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
- 
    ALiiCE: Evaluating Positional Fine-grained Citation Generation 
 Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng
- 
    LLM-Supported Natural Language to Bash Translation 
 Finnian Westenfelder, Erik Hemberg, Stephen Moskal, Una-May O’Reilly, Silviu Chiricescu
- 
    PICLe: Pseudo-annotations for In-Context Learning in Low-Resource Named Entity Detection 
 Sepideh Mamooler, Syrielle Montariol, Alexander Mathis, Antoine Bosselut
- 
    Fine-Tuning Large Language Models with Sequential Instructions 
 Hanxu Hu, Simon Yu, Pinzhen Chen, Edoardo Ponti
- 
    Token-based Decision Criteria Are Suboptimal in In-context Learning 
 Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
- 
    From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks 
 Xiaofeng Zhang, Yihao Quan, Chen Shen, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, Chaochen Gu, Hao Tang, Jieping Ye
- 
    One Unified Model for Diverse Tasks: Emotion Cause Analysis via Self-Promote Cognitive Structure Modeling 
 Zhaoxin Yu, Xinglin Xiao, Wenji Mao
- 
    Through the Lens of History: Methods for Analyzing Temporal Variation in Content and Framing of State-run Chinese Newspapers 
 Shijia Liu, David A. Smith
- 
    A Systematic Examination of Preference Learning through the Lens of Instruction-Following 
 Joongwon Kim, Anirudh Goyal, Aston Zhang, Bo Xiong, Rui Hou, Melanie Kambadur, Dhruv Mahajan, Hannaneh Hajishirzi, Liang Tan
- 
    Mutual-pairing Data Augmentation for Fewshot Continual Relation Extraction 
 Nguyen Hoang Anh, Quyen Tran, Thanh Xuan Nguyen, Nguyen Thi Ngoc Diep, Linh Ngo Van, Thien Huu Nguyen, Trung Le
- 
    Coverage-based Fairness in Multi-document Summarization 
 Haoyuan Li, Yusen Zhang, Rui Zhang, Snigdha Chaturvedi
- 
    Guiding Medical Vision-Language Models with Diverse Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations 
 Kangyu Zhu, Ziyuan Qin, Huahui Yi, Zekun Jiang, Qicheng Lao, Shaoting Zhang, Kang Li
- 
    CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts 
 Malvina Nikandrou, Georgios Pantazopoulos, Nikolas Vitsakis, Ioannis Konstas, Alessandro Suglia
- 
    A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation 
 Bairu Hou, Yang Zhang, Jacob Andreas, Shiyu Chang
- 
    The Impact of Domain-Specific Terminology on Machine Translation for Finance in European Languages 
 Arturo Oncevay, Charese Smiley, Xiaomo Liu
- 
    Exploring the Potential of Large Language Models for Heterophilic Graphs 
 Yuxia Wu, Shujie Li, Yuan Fang, Chuan Shi
- 
    A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models 
 Xiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen
- 
    Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion 
 Ziyao Xu, Houfeng Wang
- 
    Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning 
 Gaurav Arora, Srujana Merugu, shreya jain, Vaibhav Saxena
- 
    ACCORD: Closing the Commonsense Measurability Gap 
 François Roewer-Després, Jinyue Feng, Zining Zhu, Frank Rudzicz
- 
    DenseSSM: State Space Models with Dense Hidden Connection for Efficient Large Language Models 
 Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
- 
    Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset 
 Khaoula Chehbouni, Jonathan Colaço Carr, Yash More, Jackie CK Cheung, Golnoosh Farnadi
- 
    A Mixed-Language Multi-Document News Summarization Dataset and a Graphs-Based Extract-Generate Model 
 Shengxiang Gao, Fang Nan, Yongbing Zhang, Yuxin Huang, Kaiwen Tan, Zhengtao Yu
- 
    Ihquin tlahtouah in Tetelahtzincocah: An annotated, multi-purpose audio and text corpus of Western Sierra Puebla Nahuatl 
 Robert Pugh, Cheyenne Wing, María Ximena Juárez Huerta, Angeles Márquez Hernandez, Francis M. Tyers
- 
    Knowledge Graph Guided Evaluation of Abstention Techniques 
 Kinshuk Vasisht, Navreet Kaur, Danish Pruthi
- 
    A Novel Computational Modeling Foundation for Automatic Coherence Assessment 
 Aviya Maimon
- 
    Query-focused Referentiability Learning for Zero-shot Retrieval 
 Jaeyoung Kim, Dohyeon Lee, seung-won hwang
- 
    MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps 
 Xiongtao Zhou, Jie He, Lanyu Chen, jingyu li, Haojing Chen, Victor Gutierrez Basulto, Jeff Z. Pan, Hanjie Chen
- 
    LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs 
 Do Xuan Long, Ngoc-Hai Nguyen, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan
- 
    PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries 
 Mingwen Dong, Nischal Ashok Kumar, Yiqun Hu, Anuj Chauhan, Chung-Wei Hang, Shuaichen Chang, Lin Pan, Wuwei Lan, Henghui Zhu, Jiarong Jiang, Patrick Ng, Zhiguo Wang
- 
    Option Symbol Matters: Investigating and Mitigating Multiple-Choice Option Symbol Bias of Large Language Models 
 Zhen Yang, Ping Jian, Chengzhi Li
- 
    IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models 
 David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba Oluwadara Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Ijeoma Chukwuneke, Happy Buzaaba, Blessing Kudzaishe Sibanda, Godson Koffi KALIPE, Jonathan Mukiibi, Salomon KABONGO KABENAMUALU, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Salomey Osei, Shamsuddeen Hassan Muhammad, Sokhar Samb, Tadesse Kebede Guge, Tombekai Vangoni Sherman, Pontus Stenetorp
- 
    PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles 
 Siyan Li, Vethavikashini Chithrra Raghuram, Omar Khattab, Julia Hirschberg, Zhou Yu
- 
    Tonguescape: Exploring Language Models Understanding of Vowel Articulation 
 Haruki Sakajo, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
- 
    Exploring Large Language Models for Effective Rumor Detection on Social Media 
 Yirong Zeng, Xiao Ding, Bibo Cai, Ting Liu, Bing Qin
- 
    Instantly Learning Preference Alignment via In-context DPO 
 Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang
- 
    MASTER: A Multi-Agent System with LLM Specialized MCTS 
 BINGZHENG GAN, Yufan Zhao, Tianyi Zhang, Jing Huang, LI YUSU, Shu Xian Teo, Changwang Zhang, Wei Shi
- 
    Unfamiliar Finetuning Examples Control How Language Models Hallucinate 
 Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine
- 
    Multimodal Cognitive Reframing Therapy via Multi-hop Psychotherapeutic Reasoning 
 Subin Kim, Hoonrae Kim, Heejin Do, Gary Lee
- 
    PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models 
 Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
- 
    Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection 
 Maximilian Spliethöver, Tim Knebler, Fabian Fumagalli, Maximilian Muschalik, Barbara Hammer, Eyke Hüllermeier, Henning Wachsmuth
- 
    Evaluating and Improving Graph to Text Generation with Large Language Models 
 Jie He, Yijun Yang, Wanqiu Long, Deyi Xiong, Victor Gutierrez Basulto, Jeff Z. Pan
- 
    Analyzing (In)Abilities of SAEs via Formal Languages 
 Abhinav Menon, Manish Shrivastava, David Krueger, Ekdeep Singh Lubana
- 
    PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection 
 Jooyoung Lee, Toshini Agrawal, Adaku Uchendu, Thai Le, Jinghui Chen, Dongwon Lee
- 
    What We Talk About When We Talk About LMs: Implicit Paradigm Shifts and the Ship of Language Models 
 Shengqi Zhu, Jeffrey Rzeszotarski
- 
    Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion 
 Jacob K Christopher, Brian R. Bartoldson, Tal Ben-Nun, Michael Cardei, Bhavya Kailkhura, Ferdinando Fioretto
- 
    The Impact of Inference Acceleration on Bias of LLMs 
 Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar
- 
    Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval 
 Yu Xia, Junda Wu, Sungchul Kim, Tong Yu, Ryan A. Rossi, Haoliang Wang, Julian McAuley
- 
    Evaluating Input Feature Explanations through a Unified Diagnostic Evaluation Framework 
 Jingyi Sun, Pepa Atanasova, Isabelle Augenstein
- 
    THREAD: Thinking Deeper with Recursive Spawning 
 Philip Schroeder, Nathaniel W. Morgan, Hongyin Luo, James R. Glass
- 
    Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward 
 Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander G Hauptmann, Yonatan Bisk, Yiming Yang
- 
    Characterizing the Role of Similarity in the Property Inferences of Language Models 
 Juan Diego Rodriguez, Aaron Mueller, Kanishka Misra
- 
    TurtleBench: A Visual Programming Benchmark in Turtle Geometry 
 Sina Rismanchian, Yasaman Razeghi, Sameer Singh, Shayan Doroudi
- 
    SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning 
 Jinu Lee, Wonseok Hwang
- 
    BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment 
 Sizhe Wang, Yongqi Tong, Hengyuan Zhang, Dawei Li, Xin Zhang, Tianlong Chen
- 
    PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics 
 Daniil Larionov, Steffen Eger
- 
    Not all Hallucinations are Good to Throw Away When it Comes to Legal Abstractive Summarization 
 Nihed Bendahman, Karen Pinel-Sauvagnat, Gilles Hubert, Mokhtar Boumedyen BILLAMI
- 
    From Evidence to Belief: A Bayesian Epistemology Approach to Language Models 
 Minsu Kim, Sangryul Kim, James Thorne
- 
    MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation 
 Satya Krishna Gorti, Ilan Gofman, Zhaoyan Liu, Jiapeng Wu, Noël Vouitsis, Guangwei Yu, Jesse C. Cresswell, Rasa Hosseinzadeh
- 
    AlgoPuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Algorithmic Multimodal Puzzles 
 Deepanway Ghosal, Vernon Toh, Yew Ken Chia, Soujanya Poria
- 
    Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond 
 Mardhiyah Sanni, Tassallah Abdullahi, Devendra Deepak Kayande, Emmanuel Ayodele, Naome A Etori, Michael Samwel Mollel, Moshood O. Yekini, Chibuzor Okocha, Lukman Enegi Ismaila, Folafunmi Omofoye, Boluwatife A. Adewale, Tobi Olatunji
- 
    UFO: A UI-Focused Agent for Windows OS Interaction 
 Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
- 
    Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations 
 Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul Röttger, Daniel Hershcovich
- 
    Wav2Prompt: End-to-End Speech Prompt Learning and Task-based Fine-tuning for Text-based LLMs 
 Keqi Deng, Guangzhi Sun, Phil Woodland
- 
    Evaluating the Prompt Steerability of Large Language Models 
 Erik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Kush R. Varshney, Eitan Farchi, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu, Prasanna Sattigeri
- 
    Style Transfer with Multi-iteration Preference Optimization 
 Shuai Liu, Jonathan May
- 
    Smurfs: Multi-Agent System using Context-Efficient DFSDT for Tool Planning 
 Junzhi Chen, Juhao Liang, Benyou Wang
- 
    SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning 
 Magdalena Wysocka, Danilo Carvalho, Oskar Wysocki, Marco Valentino, Andre Freitas
- 
    REFFLY: Melody-Constrained Lyrics Editing Model 
 Songyan Zhao, Bingxuan Li, Yufei Tian, Nanyun Peng
- 
    Analyzing and Improving Coherence of Large Language Models in Question Answering 
 Ivano Lauriola, Stefano Campese, Alessandro Moschitti
- 
    Forest for the Trees: Overarching Prompting Evokes High-Level Reasoning in Large Language Models 
 Haoran Liao, Shaohua Hu, Zhihao Zhu, Hao HE, Yaohui Jin
- 
    Soft Prompting for Unlearning in Large Language Models 
 Karuna Bhaila, Minh-Hao Van, Xintao Wu
- 
    AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents 
 Zhe Su, Xuhui Zhou, Sanketh Rangreji, Anubha Kabra, Julia Mendelsohn, Faeze Brahman, Maarten Sap
- 
    Self-Harmonized Chain of Thought 
 Ziqi Jin, Wei Lu
- 
    AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge 
 Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
- 
    Disentangling language change: sparse autoencoders quantify the semantic evolution of indigeneity in French 
 Jacob A. Matthews, Laurent Dubreuil, Imane Terhmina, Yunci Sun, Matthew Wilkens, Marten Van Schijndel
- 
    Large Language Models Are Cross-Lingual Knowledge-Free Reasoners 
 Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang
- 
    Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data 
 Jonas Golde, Patrick Haller, Max Ploner, Fabio Barth, Nicolaas Paul Jedema, Alan Akbik
- 
    E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic Expressions 
 Hongbo Zheng, Suyuan Wang, Neeraj Gangwar, Nickvash Kani
- 
    PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification 
 Ashish Seth, Ramaneswaran Selvakumar, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
- 
    Empowering Retrieval-based Conversational Recommendation with Contrasting User Preferences 
 Heejin Kook, Junyoung Kim, Seongmin Park, Jongwuk Lee
- 
    CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories 
 Yijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang
- 
    CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases 
 Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Qizhe Shieh, Wenmeng Zhou
- 
    Embedding derived animacy rankings offer insights into the sources of grammatical animacy 
 Vivian G. Li
- 
    Handling Missing Entities in Zero-Shot Named Entity Recognition: Integrated Recall and Retrieval Augmentation 
 Ruichu Cai, Junhao Lu, Zhongjie Chen, Boyan Xu, Zhifeng Hao
- 
    Prototype Conditioned Generative Replay for Continual Learning in NLP 
 Xi Chen, Min Zeng
- 
    TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection 
 Shengmin Piao, Sanghyun Park
- 
    Self-Training Meets Consistency: Improving LLMs’ Reasoning with Consistency-Driven Rationale Evaluation 
 Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak
- 
    SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters 
 Yan Yang, Zeguan Xiao, Xin Lu, Hongru WANG, Xuetao Wei, Hailiang Huang, Guanhua Chen, Yun Chen
- 
    Are We Done with MMLU? 
 Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini
- 
    UOREX: Towards Uncertainty-Aware Open Relation Extraction 
 Rebii Jamal, Mounir OUREKOUCH, Mohammed Erradi
- 
    VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models 
 Zejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Xuanjing Huang, zhongyu wei
- 
    On Positional Bias of Faithfulness for Long-form Summarization 
 David Wan, Jesse Vig, Mohit Bansal, Shafiq Joty
- 
    CAST: Corpus-Aware Self-similarity Enhanced Topic modelling 
 Yanan Ma, Chenghao Xiao, Chenhan Yuan, Sabine N van der Veer, Lamiece Hassan, Chenghua Lin, Goran Nenadic
- 
    VisCGEC: Benchmarking the Visual Chinese Grammatical Error Correction 
 Xiaoman Wang, DAN YUAN, Xin Liu, Yike Zhao, Xiaoxiao Zhang, Xizhi Chen, Yunshi Lan
- 
    FactEval: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language Models 
 Mamta Mamta, Oana Cocarascu
- 
    Patent-CR: A Dataset for Patent Claim Revision 
 Lekang Jiang, Pascal A. Scherz, Stefan Goetz
- 
    Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages 
 Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael Littman, Stephen Bach
- 
    Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation 
 Liwen Sun, James Jialun Zhao, Wenjing Han, Chenyan Xiong
- 
    Mixture of Multimodal Adapters for Sentiment Analysis 
 Kezhou Chen, Shuo Wang, Huixia Ben, Shengeng Tang, Yanbin Hao
- 
    Benchmarking Failures in Tool-Augmented Language Models 
 Eduardo Treviño, Hugo Contant, James Ngai, Graham Neubig, Zora Zhiruo Wang
- 
    Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering 
 Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru WANG, Xuanli He, Kam-Fai Wong, Pasquale Minervini
- 
    It Is Not Only the Negative that Deserves Attention! Understanding, Generation & Evaluation of (Positive) Moderation 
 Iman Jundi, Eva Maria Vecchi, Carlotta Quensel, Neele Falk, Gabriella Lapesa
- 
    DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback 
 Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
- 
    Measuring memorization in language models via probabilistic extraction 
 Jamie Hayes, Marika Swanberg, Ilia Shumailov, Itay Yona, Harsh Chaudhari, A. Feder Cooper, Christopher A. Choquette-Choo, Katherine Lee, Milad Nasr
- 
    AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails 
 Shaona Ghosh, Prasoon Varshney, Makesh Narsimhan Sreedhar, Aishwarya Padmakumar, Traian Rebedea, Jibin Rajan Varghese, Christopher Parisien
- 
    FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing 
 James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
- 
    Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions 
 Moran Yanuka, Assaf Ben-Kish, Yonatan Bitton, Idan Szpektor, Raja Giryes
- 
    Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory 
 Haoran Li, Wei Fan, Yulin Chen, Cheng Jiayang, Tianshu Chu, Xuebing Zhou, Peizhao Hu, Yangqiu Song
- 
    NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals 
 Neha Srikanth, Rachel Rudinger
- 
    Evaluating and Mitigating Object Hallucination in Large Vision-Language Models: Can They Still See Removed Objects? 
 Yixiao He, Haifeng Sun, Pengfei Ren, Jingyu Wang, Huazheng Wang, Qi Qi, Zirui Zhuang, Jing Wang
- 
    PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from related Example Banks 
 Soumya Suvra Ghosal, Soumyabrata Pal, Koyel Mukherjee, Dinesh Manocha
- 
    From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection 
 Rupeng Zhang, Haowei Wang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang
- 
    The Russian-focused embedders’ exploration: ruMTEB benchmark and Russian embedding model design 
 Artem Snegirev, Maria Tikhonova, Maksimova Anna, Alena Fenogenova, Aleksandr Abramov
- 
    SANDWiCH: Semantical Analysis of Neighbours for Disambiguating Words in Context ad Hoc 
 Daniel Guzman Olivares, Lara Quijano, Federico Liberatore
- 
    Self-calibration for Language Model Quantization and Pruning 
 Miles Williams, George Chrysostomou, Nikolaos Aletras
- 
    SafetyQuizzer: Timely and Dynamic Evaluation on the Safety of LLMs 
 Zhichao Shi, Shaoling Jing, Yi Cheng, Hao Zhang, Yuanzhuo Wang, Jie Zhang, Huawei Shen, Xueqi Cheng
- 
    Temporal-Aware Soft Prompt Tuning for Automatic Text Dating 
 Hai Wang, Yuzhi Liang, Han Ren
- 
    LiPO: Listwise Preference Optimization through Learning-to-Rank 
 Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J Liu, Xuanhui Wang
- 
    Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models 
 Abhilasha Ravichander, Jillian Fisher, Taylor Sorensen, Ximing Lu, Maria Antoniak, Bill Yuchen Lin, Niloofar Mireshghallah, Chandra Bhagavatula, Yejin Choi
- 
    Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement 
 Qianyue Wang, Jinwu Hu, Zhengping Li, Yufeng Wang, daiyuan li, Yu Hu, Mingkui Tan
- 
    ReIFE: Re-evaluating Instruction-Following Evaluation 
 Yixin Liu, Kejian Shi, Alexander Fabbri, Yilun Zhao, PeiFeng Wang, Chien-Sheng Wu, Shafiq Joty, Arman Cohan
- 
    Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation 
 Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li
- 
    HISTOIRESMORALES: A French Dataset for Assessing Moral Alignment 
 Thibaud Leteno, Irina Proskurina, Antoine Gourru, Julien Velcin, Charlotte Laclau, Guillaume Metzler, Christophe Gravier
- 
    See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias 
 Junehyoung Kwon, MiHyeon Kim, Eunju Lee, Juhwan Choi, YoungBin Kim
- 
    The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning 
 Longju Bai, Angana Borah, Oana Ignat, Rada Mihalcea
- 
    CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments 
 Kung-Hsiang Huang, Akshara Prabhakar, Sidharth Dhawan, Yixin Mao, Huan Wang, Silvio Savarese, Caiming Xiong, Philippe Laban, Chien-Sheng Wu
- 
    DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization 
 Yasir Khan, Xinlei Wu, Sangpil Youm, Justin Ho, Aryaan Mehboob Shaikh, Jairo Garciga, Rohan Sharma, Bonnie J Dorr
- 
    Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis 
 Pamela D Riviere, Anne L. Beatty-Martínez, Sean Trott
- 
    Conformalized Answer Set Prediction for Knowledge Graph Embedding 
 Yuqicheng Zhu, Nico Potyka, Jiarong Pan, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab
- 
    A Multi-modal Large Language Model with Graph-of-Thought for Effective Recommendation 
 Zixuan Yi, Iadh Ounis
- 
    Meta-Cultural Competence: Climbing the Right Hill of Cultural Awareness 
 Sougata Saha, Saurabh Kumar Pandey, Monojit Choudhury
- 
    ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation 
 Qinzhuo Wu, Wei Liu, Jian Luan, Bin Wang
- 
    On the Role of Speech Data in Reducing Toxicity Detection Bias 
 Samuel Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà
- 
    CodeSCM: Causal Analysis for Multi-Modal Code Generation 
 Mukur Gupta, Noopur Bhatt, Suman Jana
- 
    On the Impact of Fine-Tuning on Chain-of-Thought Reasoning 
 Elita Lobo, Chirag Agarwal, Himabindu Lakkaraju
- 
    ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors 
 Qinchan Li, Sophie Hao
- 
    Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data 
 Haonan Wang, Minbin Huang, Runhui Huang, Lanqing HONG, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi
- 
    Fighting Spurious Correlations in Text Classification via a Causal Learning Perspective 
 Yuqing Zhou, Ziwei Zhu
- 
    Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation 
 Jaechang Kim, Jinmin Goh, Inseok Hwang, Jaewoong Cho, Jungseul Ok
- 
    CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models 
 Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao
- 
    Reliability of Topic Modeling 
 Kayla Schroeder, Zach Wood-Doughty
- 
    TRANSIENTTABLES: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables 
 Abhilash Shankarampeta, Harsh Mahajan, Tushar Kataria, Dan Roth, Vivek Gupta
- 
    On the Analysis and Distillation of Emergent Outlier Properties in Pre-trained Language Models 
 Tianyang Zhao, Kunwar Yashraj Singh, srikar appalaraju, Peng Tang, Ying Nian Wu, Li Erran Li
- 
    Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization 
 Zilu Tang, Rajen Chatterjee, Sarthak Garg
- 
    ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models 
 Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, Sung Ju Hwang
- 
    Label Drop for Multi-Aspect Relation Modeling in Universal Information Extraction 
 Lu Yang, Jiajia Li, En Ci, Lefei Zhang, Zuchao Li, Ping Wang
- 
    Fine-grained Fallacy Detection with Human Label Variation 
 Alan Ramponi, Agnese Daffara, Sara Tonelli
- 
    Soft Language Prompts for Language Transfer 
 Ivan Vykopal, Simon Ostermann, Marian Simko
- 
    Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph Representation 
 Hoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui
- 
    Soft Syntactic Reinforcement for Neural Event Extraction 
 Anran Hao, Jian Su, Shuo Sun, Teo Yong Sen
- 
    UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models 
 Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić
- 
    Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning Attack 
 Cheng Wang, Yiwei Wang, Yujun Cai, Bryan Hooi
- 
    Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation 
 Mingqi Gao, Xinyu Hu, Li Lin, Xiaojun Wan
- 
    World Models with Hints of Large Language Models for Goal Achieving 
 Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu
- 
    Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations 
 Ziqiao Ma, Zekun Wang, Joyce Chai
- 
    Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment 
 Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David R Mortensen
- 
    Diversity Helps Jailbreak Large Language Models 
 Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao
- 
    NAT: Enhancing Agent Tuning with Negative Samples 
 Renxi Wang, Xudong Han, Yixuan Zhang, Timothy Baldwin, Haonan Li
- 
    AutoEval-ToD: Automated Evaluation of Task-oriented Dialog Systems 
 Arihant Jain, Purav Aggarwal, Rishav Sahay, Chaosheng Dong, Anoop Saladi
- 
    Learning to Summarize from LLM-generated Feedback 
 Hwanjun Song, Taewon Yun, Yuho Lee, Jihwan Oh, Gihun Lee, Jason Cai, Hang Su
- 
    FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation 
 Garrett Tanzer
- 
    Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models 
 Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty
- 
    Prompt Compression for Large Language Models: A Survey 
 Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier
- 
    Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models 
 Juan Pablo Munoz, Jinjie Yuan, Nilesh Jain
- 
    Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation 
 Tianyu Liu, Jirui Qi, Paul He, Arianna Bisazza, Mrinmaya Sachan, Ryan Cotterell
- 
    Has this Fact been Edited? Detecting Knowledge Edits in Language Models 
 Paul Youssef, Zhixue Zhao, Christin Seifert, Jörg Schlötterer
- 
    DPL: Diverse Preference Learning Without A Reference Model 
 Abhijnan Nath, Andrey Volozin, Saumajit Saha, Albert Aristotle Nanda, Galina Grunin, Rahul Bhotika, Nikhil Krishnaswamy
- 
    LegalViz: Legal Text Visualization by Text To Diagram Generation 
 Eri Onami, Taiki Miyanishi, Koki Maeda, Shuhei Kurita
- 
    Dynamic Fisher-weighted Model Merging via Bayesian Optimization 
 Sanwoo Lee, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Yunfang Wu
- 
    From Distributional to Overton Pluralism: Investigating Large Language Model Alignment 
 Thom Lake, Eunsol Choi, Greg Durrett
- 
    InfoPO: On Mutual Information Maximization for Large Language Model Alignment 
 Teng Xiao, Zhen Ge, sujay sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, qingjun cui, Trishul Chilimbi
- 
    WebQuality: A Large-scale Multi-modal Web Page Quality Assessment Dataset with Multiple Scoring Dimensions 
 Tao Zhang, Yige Wang, ZhuHangyu, Li Xin, CHEN XIANG, Tian Hua Zhou, Jin Ma
- 
    A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization 
 Haoxin Liu, Chenghao Liu, B. Aditya Prakash
- 
    Aligning Sentence Simplification with ESL Learner’s Proficiency for Language Acquisition 
 Guanlin Li, Yuki Arase, Noel Crespi
- 
    Revisiting Early Detection of Sexual Predators via Turn-level Optimization 
 JinMyeong AN, Sangwon Ryu, Heejin Do, Yunsu Kim, Jungseul Ok, Gary Lee
- 
    Knowledge Graph-Guided Retrieval Augmented Generation 
 Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, Wei Hu
- 
    Towards Rationality in Language and Multimodal Agents: A Survey 
 Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Yuan Yuan, Zhuoqun Hao, Xinyi Bai, Weijie J Su, Camillo Jose Taylor, Tanwi Mallick
- 
    A Bayesian Optimization Approach to Machine Translation Reranking 
 Julius Cheng, Maike Züfle, Vilém Zouhar, Andreas Vlachos
- 
    ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages 
 Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee
- 
    Using Text-Based Causal Inference to Disentangle Factors Influencing Online Review Ratings 
 Linsen Li, Aron Culotta, Nicholas Mattei
- 
    EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms 
 Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang
- 
    Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? 
 Xuan He, Da Yin, Nanyun Peng
- 
    MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation 
 Junqing He, Liang Zhu, Rui Wang, Xi Wang, Gholamreza Haffari, Jiaxing Zhang
- 
    EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics 
 Chenwei Wan, Matthieu Labeau, Chloé Clavel
- 
    HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing 
 Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong
- 
    ConMeC: A Dataset for Metonymy Resolution with Common Nouns 
 Saptarshi Ghosh, Tianyu Jiang
- 
    VividMed: Vision Language Model with Versatile Visual Grounding for Medicine 
 Lingxiao Luo, Bingda Tang, Xuanzhong Chen, Rong Han, Ting Chen
- 
    Decoding Speculative Decoding 
 Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
- 
    Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models 
 Nikhil Sharma, Kenton Murray, Ziang Xiao
- 
    LongLeader: A Comprehensive Leaderboard for Large Language Models in Long-context Scenarios 
 Pei Chen, Hongye Jin, Cheng-Che Lee, Rulin Shao, Jingfeng Yang, Mingyu Zhao, Zhaoyu Zhang, Qin Lu, Kaiwen Men, Ning Xie, Huasheng Li, Bing Yin, Han Li, Lingyun Wang
- 
    CausalEval: Towards Better Causal Reasoning in Language Models 
 Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan
- 
    CAVE: Controllable Authorship Verification Explanations 
 Sahana Ramnath, Kartik Pandey, Elizabeth Boschee, Xiang Ren
- 
    StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion 
 Yinghao Aaron Li, Xilin Jiang, Cong Han, Nima Mesgarani
- 
    FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models 
 Xin Guo, Haotian Xia, Zhaowei Liu, Hanyang Cao, Zhi Yang, Zhiqiang Liu, Sizhe Wang, Jinyi Niu, Chuqi Wang, Yanhui Wang, Xiaolong Liang, Xiaoming Huang, Bing Zhu, Zhongyu Wei, Yun Chen, Weining Shen, Liwen Zhang
- 
    JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation 
 Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa
- 
    Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning 
 Rujing Yao, Yang Wu, Chenghao Wang, Jingwei Xiong, Fang Wang, Xiaozhong Liu
- 
    Waste Not, Want Not; Recycled Gumbel Noise Improves Consistency in Natural Language Generation 
 Damien De Mijolla, Hannan Saddiq, Kim Moore
- 
    Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study 
 Menglong Cui, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang
- 
    Sparser Mixture-of-Adapters with Cross-Layer Generalization 
 Ziyue Li, Tianyi Zhou
- 
    How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs 
 Ran Zhang, Wei Zhao, Steffen Eger
- 
    Script-Agnosticism and its Impact on Language Identification for Dravidian Languages 
 Milind Agarwal, Joshua Otten, Antonios Anastasopoulos
- 
    What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation 
 Michal Golovanevsky, William Rudman, Vedant Palit, Carsten Eickhoff, Ritambhara Singh
- 
    BEMEAE: Moving Beyond Exact Span Match for Event Argument Extraction 
 Enfa Fane, Md Nayem Uddin, Oghenevovwe Ikumariegbe, Daniyal Kashif, Eduardo Blanco, Steven Corman
- 
    uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes 
 Abdul Waheed, Karima Kadaoui, Bhiksha Raj, Muhammad Abdul-Mageed
- 
    LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue 
 Sangyeop Kim, Sohhyung Park, Jaewon Jung, Jinseok Kim, Sungzoon Cho
- 
    M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models 
 Rishabh Maheshwary, Vikas Yadav, Hoang H Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan
- 
    Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque 
 Ander Corral, Ixak Sarasua Antero, Xabier Saralegi
- 
    Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks 
 Justin Zhao, Flor Miriam Plaza-del-Arco, Amanda Cercas Curry
- 
    On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena 
 Tarek Naous, Wei Xu
- 
    Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models 
 Sehun Lee, Kang-wook Kim, Gunhee Kim
- 
    Language Models are Crossword Solvers 
 Soumadeep Saha, Sutanoya Chakraborty, Saptarshi Saha, Utpal Garain
- 
    A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Case Study of Supplementary Adverbs 
 Zhu Liu, Cunliang Kong, Ying Liu, Maosong Sun
- 
    Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage 
 Kaige Xie, Philippe Laban, Prafulla Kumar Choubey, Caiming Xiong, Chien-Sheng Wu
- 
    Reading between the Lines: Can LLMs Identify Cross-Cultural Communication Gaps? 
 Sougata Saha, Saurabh Kumar Pandey, Harshit Gupta, Monojit Choudhury
- 
    Enhancing Discriminative Representation in Similar Relation Clusters for Few-Shot Continual Relation Extraction 
 Anh Duc Le, Nam Le Hai, Thanh Xuan Nguyen, Linh Ngo Van, Nguyen Thi Ngoc Diep, Sang Dinh, Thien Huu Nguyen
- 
    My LLM might Mimic AAE - But When Should It? 
 Sandra Camille Sandoval, Christabel Acquaye, Kwesi Adu Cobbina, Mohammad Nayeem Teli, Hal Daumé III
- 
    Single Ground Truth Is Not Enough: Adding Flexibility to Aspect-Based Sentiment Analysis Evaluation 
 Soyoung Yang, Hojun Cho, Jiyoung Lee, Sohee Yoon, Edward Choi, Jaegul Choo, Won Ik Cho
- 
    Differentially Private Learning Needs Better Model Initialization and Self-Distillation 
 Ivoline C. Ngong, Joseph Near, Niloofar Mireshghallah
- 
    Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice 
 Sunny Rai, Khushang Zaveri, Shreya Havaldar, Soumna Nema, Lyle Ungar, Sharath Chandra Guntuku
- 
    Reverse Thinking Makes LLMs Stronger Reasoners 
 Justin Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
- 
    One fish, two fish, but not the whole sea: Alignment reduces language models’ conceptual diversity 
 Sonia Krishna Murthy, Tomer Ullman, Jennifer Hu
- 
    Bayelemabaga: Creating Resources for Bambara NLP 
 Allahsera Auguste Tapo, Kevin Assogba, Christopher M Homan, M. Mustafa Rafique, Marcos Zampieri
- 
    COVE: COntext and VEracity prediction for out-of-context images 
 Jonathan Tonglet, Gabriel Thiem, Iryna Gurevych
- 
    Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph Completion 
 Muzhi Li, Cehao Yang, Chengjin Xu, Xuhui Jiang, Yiyan Qi, Jian Guo, Ho-fung Leung, Irwin King
- 
    mHumanEval - A Multilingual Benchmark to Evaluate Large Language Models for Code Generation 
 Md Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri
- 
    KMMLU: Measuring Massive Multitask Language Understanding in Korean 
 Guijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella Biderman
- 
    Making Language Models Robust Against Negation 
 MohammadHossein Rezaei, Eduardo Blanco
- 
    Harnessing and Evaluating the Intrinsic Extrapolation Ability of Large Language Models for Vehicle Trajectory Prediction 
 Jiawei Liu, Yanjiao Liu, Xun Gong, Tingting Wang, Hong Chen, Yunfeng Hu
- 
    Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction 
 Junlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, Kezhi Mao
- 
    Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools 
 Yilun Hao, Yongchao Chen, Yang Zhang, Chuchu Fan
- 
    Analyzing Memorization in Large Language Models through the Lens of Model Attribution 
 Tarun Ram Menta, Susmit Agrawal, Chirag Agarwal
- 
    AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMs 
 Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D Phan, Le Chen, Mihai Capotă, Theodore L. Willke, Nesreen K. Ahmed, Ali Jannesari
- 
    How to Make LLMs Forget: On Reversing In-Context Knowledge Edits 
 Paul Youssef, Zhixue Zhao, Jörg Schlötterer, Christin Seifert
- 
    TCProF: Time-Complexity Prediction SSL Framework 
 Joonghyuk Hahn, Hyeseon Ahn, Jungin Kim, Soohan Lim, Yo-Sub Han
- 
    Language Models can Categorize System Inputs for Performance Analysis 
 Dominic Sobhani, Ruiqi Zhong, Edison Marrese-Taylor, Keisuke Sakaguchi, Yutaka Matsuo
- 
    Efficient and Effective Prompt Tuning via Prompt Decomposition and Compressed Outer Product 
 Pengxiang Lan, Haoyu Xu, Enneng Yang, Yuliang Liang, Guibing Guo, Jianzhe Zhao, Xingwei Wang
- 
    Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering 
 Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang
- 
    KODIS: A Multicultural Dispute Resolution Dialogue Corpus 
 James Anthony Hale, Sushrita Rakshit, Kushal Chawla, Jeanne M Brett, Jonathan Gratch
- 
    Commonality and Individuality! Integrating Humor Commonality with Speaker Individuality for Humor Recognition 
 Haohao Zhu, Xiaokun Zhang, Zeyuan Zeng, Junyu Lu, Zewen Bai, Liang Yang, Hongfei Lin
- 
    AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages 
 Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, David Ifeoluwa Adelani, Ibrahim Said Ahmad, Saminu Mohammad Aliyu, Paul Röttger, Abigail Oppong, Andiswa Bukula, Chiamaka Ijeoma Chukwuneke, Ebrahim Chekol Jibril, Elyas Abdi ISMAIL, Esubalew Alemneh, Hagos Tesfahun Gebremichael, Lukman Jibril Aliyu, Meriem Beloucif, Oumaima Hourrane, Rooweither Mabuya, Salomey Osei, Samuel Rutunda, Tadesse Destaw Belay, Tadesse Kebede Guge, Tesfa Tegegne Asfaw, Lilian Diana Awuor Wanzare, Nelson Odhiambo Onyango, Seid Muhie Yimam, Nedjma Ousidhoum
- 
    The Plagiarism Singularity Conjecture 
 Sriram Ranga, Rui Mao, Erik Cambria, Anupam Chattopadhyay
- 
    CoRAC: Integrating Selective API Document Retrieval with Question Semantic Intent for Code Question Answering 
 YunSeok Choi, CheolWon Na, Jee-Hyong Lee
- 
    GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing 
 Jinhao Duan, Xinyu Zhao, Zhuoxuan Zhang, Eunhye Grace Ko, Lily Boddy, Chenan Wang, Tianhao Li, Alexander Rasgon, Junyuan Hong, Min Kyung Lee, Chenxi Yuan, Qi Long, Ying Ding, Tianlong Chen, Kaidi Xu
- 
    $B^4$: A Black-Box Scrubbing Attack on LLM Watermarks 
 Baizhou Huang, Xiao Pu, Xiaojun Wan
- 
    WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching 
 Tianze Luo, Xingchen Miao, Wenbo Duan
- 
    Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring 
 Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, Libo Qin, Xiaoming Shi, Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che
- 
    Vision-Language Models Can Self-Improve Reasoning via Reflection 
 Kanzhi Cheng, Li YanTao, Fangzhi Xu, Jianbing Zhang, Hao Zhou, Yang Liu
- 
    Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference 
 Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum
- 
    Sharpness-Aware Minimization for Topic Models with High-Quality Document Representations 
 Tung Nguyen, Tue Le, Hoang Tran Vuong, Quang Duc Nguyen, Duc Anh Nguyen, Linh Ngo Van, Sang Dinh, Thien Huu Nguyen
- 
    DCE-LLM: Dead Code Elimination with Large Language Models 
 Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu
- 
    Active Few-Shot Learning for Text Classification 
 Saeed Ahmadnia, Arash Yousefi Jordehi, Mahsa Hosseini Khasheh Heyran, Seyed Abolghasem Mirroshandel, Owen Rambow, Cornelia Caragea
- 
    ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis 
 Zezhong WANG, Xingshan Zeng, Weiwen Liu, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
- 
    FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions 
 Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini
- 
    Mitigating Hallucinations in Multi-modal Large Language Models via Image Token Attention-Guided Decoding 
 Xinhao Xu, Hui Chen, Mengyao Lyu, Sicheng Zhao, Yizhe Xiong, Zijia Lin, Jungong Han, Guiguang Ding
- 
    Cascading Large Language Models for Salient Event Graph Generation 
 Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He
- 
    Fingerspelling within Sign Language Translation 
 Garrett Tanzer
- 
    MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools 
 Nishant Subramani, Jason Eisner, Justin Svegliato, Benjamin Van Durme, Yu Su, Sam Thomson
- 
    Few-shot Personalization of LLMs with Mis-aligned Responses 
 Jaehyung Kim, Yiming Yang
- 
    Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement 
 Suchae Jeong, Inseong Choi, Youngsik Yun, Jihie Kim
- 
    GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models 
 Harsh Kohli, Sachin Kumar, Huan Sun
- 
    Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration 
 Ang Li, Jingqian Zhao, Bin Liang, Lin Gui, Hui Wang, Xi Zeng, Xingwei Liang, Kam-Fai Wong, Ruifeng Xu
- 
    Little Giants: Synthesizing High-Quality Embedding Data at Scale 
 Haonan Chen, Liang Wang, Nan Yang, Yutao Zhu, Ziliang Zhao, Furu Wei, Zhicheng Dou
- 
    CluSanT: Differentially Private and Semantically Coherent Text Sanitization 
 Ahmed Musa Awon, Yun Lu, Shera Potka, Alex Thomo
- 
    Towards Operationalizing Right to Data Protection 
 Abhinav Java, Simra Shahid, Chirag Agarwal
- 
    In-Context Learning (and Unlearning) of Length Biases 
 Stephanie Schoch, Yangfeng Ji
- 
    SLM-Mod: Small Language Models Surpass LLMs at Content Moderation 
 Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha
- 
    NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models 
 Abhinav Sukumar Rao, Akhila Yerukola, Vishwa Shah, Katharina Reinecke, Maarten Sap
- 
    Hybrid Graphs for Table-and-Text based Question Answering using LLMs 
 Ankush Agarwal, Chaitanya Devaguptapu, Ganesh S
- 
    Entropy-Based Decoding for Retrieval-Augmented Large Language Models 
 Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King
- 
    Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison 
 Tsz Kin Lam, Marco Gaido, Sara Papi, Luisa Bentivogli, Barry Haddow
- 
    A Data-Driven Method for Analyzing and Quantifying Lyrics-Dance Motion Relationships 
 Kento Watanabe, Masataka Goto
- 
    Is In-Context Learning a Type of Error-Driven Learning? Evidence from the Inverse Frequency Effect in Structural Priming 
 Zhenghao Zhou, Robert Frank, R. Thomas McCoy
- 
    Exploiting Edited Large Language Models as General Scientific Optimizers 
 Qitan Lv, Tianyu Liu, Hong Wang
- 
    REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance 
 Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Nouha Dziri, Dan Jurafsky, Maarten Sap
- 
    Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5 
 Arkadeep Acharya, Rudra Murthy, Vishwajeet Kumar, Jaydeep Sen
- 
    MoDification: Mixture of Depths Made Easy 
 Chen Zhang, Meizhi Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, Chengqiang Lu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang, Dawei Song
- 
    Revealing the Barriers of Language Agents in Planning 
 Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang, Yikai Zhang, Lei Li, Yanghua Xiao
- 
    PeerQA: A Scientific Question Answering Dataset from Peer Reviews 
 Tim Baumgärtner, Ted Briscoe, Iryna Gurevych
- 
    Reversed Attention: On The Gradient Descent Of Attention Layers In GPT 
 Shahar Katz, Lior Wolf
- 
    MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems 
 Zifeng Zhu, Mengzhao Jia, Zhihan Zhang, Lang Li, Meng Jiang
- 
    Self-Pluralising Culture Alignment for Large Language Models 
 Shaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong
- 
    Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator 
 Chengyuan Liu, Shihang Wang, Lizhi Qing, Jun Lin, Ji Zhang, Fei Wu, Kun Kuang
- 
    Reward-Guided Tree Search for Inference Time Alignment of Large Language Models 
 Chia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
- 
    An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues 
 Rena Wei Gao, Xuetong Wu, Carsten Roever, Jing Wu, Long Lv, Jingxuan Wu, Jey Han Lau
- 
    Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training 
 Yuchen Zhuang, Jingfeng Yang, Haoming Jiang, Xin Liu, Kewei Cheng, Sanket Lokegaonkar, Yifan Gao, Qing Ping, Tianyi Liu, Binxuan Huang, Zheng Li, Zhengyang Wang, Pei Chen, Ruijie Wang, Rongzhi Zhang, Nasser Zalmout, Priyanka Nigam, Bing Yin, Chao Zhang
- 
    Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense 
 Yang Ouyang, Hengrui Gu, Shuhang Lin, Wenyue Hua, Jie Peng, Bhavya Kailkhura, Meijun Gao, Tianlong Chen, Kaixiong Zhou
- 
    Towards Efficient and Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss 
 Fu-An Chao, Berlin Chen
- 
    Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models 
 Tongxuan Liu, Wenjiang Xu, Weizhe Huang, Yuting Zeng, Jiaxing Wang, Xingyu Wang, Hailong Yang, Jing Li
- 
    Automatically Discovering How Misogyny is Framed on Social Media 
 Rakshitha Rao Ailneni, Sanda M. Harabagiu
- 
    Leveraging LLM For Synchronizing Information Across Multilingual Tables 
 Siddharth Khincha, Tushar Kataria, Ankita Anand, Dan Roth, Vivek Gupta
- 
    Rethinking Word Similarity: Semantic Similarity through Classification Confusion 
 Kaitlyn Zhou, Haishan Gao, Sarah Li Chen, Dan Edelstein, Dan Jurafsky, Chen Shani
- 
    Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models 
 Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang
- 
    Balancing Forget Quality and Model Utility: A Reverse KL-Divergence Knowledge Distillation Approach for Better Unlearning in LLMs 
 Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin
- 
    UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers 
 Dehai Min, Zhiyang Xu, Guilin Qi, Lifu Huang, Chenyu You
- 
    CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy 
 Mian Zhang, Xianjun Yang, Xinlu Zhang, Travis Labrum, Jamie C. Chiu, Shaun M. Eack, Fei Fang, William Yang Wang, Zhiyu Chen
- 
    Hello Again! LLM-powered Personalized Agent for Long-term Dialogue 
 Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua
- 
    MAPWise: Evaluating Vision-Language Models for Advanced Map Queries 
 Srija Mukhopadhyay, Abhishek Rajgaria, Prerana Khatiwada, Manish Shrivastava, Dan Roth, Vivek Gupta
- 
    Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment 
 Sangwon Yu, Jongyoon Song, Bongkyu Hwang, Hoyoung Kang, Sooah Cho, Junhwa Choi, Seongho Joe, Taehee Lee, Youngjune Gwon, Sungroh Yoon
- 
    Analyzing the Inner Workings of Transformers in Compositional Generalization 
 Ryoma Kumon, Hitomi Yanaka
- 
    IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval 
 Tingyu Song, Guo Gan, Mingsheng Shang, Yilun Zhao
- 
    Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance 
 Borui Xu, Yao Chen, Zeyi Wen, Weiguo Liu, Bingsheng He
- 
    JRE-L: Journalist, Reader, and Editor LLMs in the Loop for Science Journalism for the General Audience 
 Gongyao Jiang, Xinran Shi, Qiong Luo
- 
    Large Language Models for Persian $\leftrightarrow$ English Idiom Translation 
 Sara Rezaeimanesh, Faezeh Hosseini, Yadollah Yaghoobzadeh
- 
    K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning 
 Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei
- 
    LLM-Human Pipeline for Cultural Grounding of Conversations 
 Rajkumar Pujari, Dan Goldwasser
- 
    SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data 
 Suyoung Bae, YunSeok Choi, Hyojun Kim, Jee-Hyong Lee
- 
    SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models 
 Margaret Mitchell, Giuseppe Attanasio, Ioana Baldini, Miruna Clinciu, Jordan Clive, Pieter Delobelle, Manan Dey, Sil Hamilton, Timm Dill, Jad Doughman, Ritam Dutt, Avijit Ghosh, Jessica Zosa Forde, Carolin Holtermann, Lucie-Aimée Kaffee, Tanmay Laud, Anne Lauscher, Roberto L Lopez-Davila, Maraim Masoud, Nikita Nangia, Anaelia Ovalle, Giada Pistilli, Dragomir Radev, Beatrice Savoldi, Vipul Raheja, Jeremy Qin, Esther Ploeger, Arjun Subramonian, Kaustubh Dhole, Kaiser Sun, Amirbek Djanibekov, Jonibek Mansurov, Kayo Yin, Emilio Villa Cueva, Sagnik Mukherjee, Jerry Huang, Xudong Shen, Jay Gala, Hamdan Al-Ali, Tair Djanibekov, Nurdaulet Mukhituly, Shangrui Nie, Shanya Sharma, Karolina Stanczak, Eliza Szczechla, Tiago Timponi Torrent, Deepak Tunuguntla, Marcelo Viridiano, Oskar van der Wal, Adina Yakefu, Aurélie Névéol, Mike Zhang, Sydney Zink, Zeerak Talat
- 
    Learning vs Retrieval: The Role of In-Context Examples in Regression with Large Language Models 
 Aliakbar Nafar, K. Brent Venable, Parisa Kordjamshidi
- 
    Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic Parsing 
 Mayank Kothyari, Sunita Sarawagi, Soumen Chakrabarti, Gaurav Arora, Srujana Merugu
- 
    KMI: A Dataset of Korean Motivational Interviewing Dialogues for Psychotherapy 
 Hyunjong Kim, Suyeon Lee, Yeongjae Cho, Eunseo Ryu, Yohan Jo, Suran Seong, Sungzoon Cho
- 
    MILU: A Multi-task Indic Language Understanding Benchmark 
 Sshubam Verma, Mohammed Safi Ur Rahman Khan, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen
- 
    SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression 
 Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang
- 
    SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search 
 Hanwen Du, Bo Peng, Xia Ning
- 
    Navigating the Cultural Kaleidoscope: A Hitchhiker’s Guide to Sensitivity in Large Language Models 
 Somnath Banerjee, Sayan Layek, Hari Shrawgi, Rajarshi Mandal, Avik Halder, Shanu Kumar, Sagnik Basu, Parag Agrawal, Rima Hazra, Animesh Mukherjee
- 
    Benchmarking Distributional Alignment of Large Language Models 
 Nicole Meister, Carlos Guestrin, Tatsunori Hashimoto
- 
    Eliciting Critical Reasoning in Retrieval-Augmented Generation via Contrastive Explanations 
 Leonardo Ranaldi, Marco Valentino, Andre Freitas
- 
    Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs 
 Jiancheng Dong, Lei Jiang, Wei Jin, Lu Cheng
- 
    GLiREL - Generalist Model for Zero-Shot Relation Extraction 
 Jack Boylan, Chris Hokamp, Demian Gholipour Ghalandari
- 
    $C^2$: Scalable Auto-Feedback for LLM-based Chart Generation 
 Woosung Koh, Jang Han Yoon, MinHyung Lee, Youngjin Song, Jaegwan Cho, Jaehyun Kang, Taehyeon Kim, Se-Young Yun, Youngjae Yu, Bongshin Lee
- 
    Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding 
 Junyi Ye, Ankan Dash, Wenpeng Yin, Guiling Wang
- 
    Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction 
 Liping Liu, Chunhong Zhang, Likang Wu, Chuang Zhao, Zheng Hu, Ming He, Jianping Fan
- 
    MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback 
 Zonghai Yao, Aditya Parashar, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Zhichao Yang, Hong Yu
- 
    AI-Assisted Human Evaluation of Machine Translation 
 Vilém Zouhar, Tom Kocmi, Mrinmaya Sachan
- 
    Investigating Hallucinations in Simultaneous Machine Translation: Knowledge Distillation Solution and Components Analysis 
 Donglei Yu, Xiaomian Kang, Yuchen Liu, Feifei Zhai, Nanchang Cheng, Yu Zhou, Chengqing Zong
- 
    Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss 
 Kunal Dahiya, Diego Ortego, David Jimenez-Cabello
- 
    MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs 
 Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu
- 
    RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models 
 Bang An, Shiyue Zhang, Mark Dredze
- 
    RAP: A Metric for Balancing Repetition and Performance in Open-Source Large Language Models 
 Donghao Huang, Thanh-Son Nguyen, Fiona Liausvia, Zhaoxia WANG
- 
    Learning to Substitute Words with Model-based Score Ranking 
 Hongye Liu, Ricardo Henao
- 
    IMRRF: Integrating Multi-Source Retrieval and Redundancy Filtering for LLM-based Fake News Detection 
 Dayang Li, Fanxiao Li, Bingbing Song, Li Tang, Wei Zhou
- 
    SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation 
 Saurabh Kumar Pandey, Sachin Vashistha, DEBRUP DAS, Somak Aditya, Monojit Choudhury
- 
    Goal-Conditioned DPO: Prioritizing Safety in Misaligned Instructions 
 Joo Bon Maeng, Seongmin Lee, Seokin Seo, Kee-Eung Kim
- 
    Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders 
 Kshitish Ghate, Isaac Slaughter, Kyra Wilson, Mona T. Diab, Aylin Caliskan
- 
    EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction 
 Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Kan Ren, Dongsheng Li, Deqing Yang
- 
    A Logical Fallacy-Informed Framework for Argument Generation 
 Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings
- 
    tRAG: Term-level Retrieval-Augmented Generation for Domain-Adaptive Retrieval 
 Dohyeon Lee, Jongyoon Kim, Jihyuk Kim, seung-won hwang, Joonsuk Park
- 
    Evaluating Evidence Attribution in Generated Fact Checking Explanations 
 Rui Xing, Timothy Baldwin, Jey Han Lau
- 
    Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals 
 Phillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko
- 
    Evaluating Morphological Compositional Generalization in Large Language Models 
 Mete Ismayilzada, Defne Circi, Jonne Sälevä, Hale Sirin, Abdullatif Köksal, Bhuwan Dhingra, Antoine Bosselut, Duygu Ataman, Lonneke van der Plas
- 
    Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models 
 Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith
- 
    Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training 
 Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, Maosong Sun
- 
    CogLM: Tracking Cognitive Development of Large Language Models 
 Xinglin Wang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Boyuan Pan, Heda Wang, Yao Hu, Kan Li
- 
    Automatic Input Rewriting Improves Translation with Large Language Models 
 Dayeon Ki, Marine Carpuat
- 
    Typographic Attacks in a Multi-Image Setting 
 Xiaomeng Wang, Zhengyu Zhao, Martha Larson
- 
    AnaScore: Understanding Semantic Parallelism in Proportional Analogies 
 Liyan Wang, Haotong Wang, Yves Lepage
- 
    ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs 
 Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, Santu Rana
- 
    Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts 
 Tingchen Fu, Yupeng Hou, Julian McAuley, Rui Yan
- 
    Cross-Lingual and Cross-Cultural Variation in Image Descriptions 
 Uri Berger, Edoardo Ponti
- 
    AudioBench: A Universal Benchmark for Audio Large Language Models 
 Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen
- 
    Evaluating Defeasible Reasoning in LLMs with DEFREASING 
 Emily Allaway, Kathleen McKeown
- 
    Generating Complex Question Decompositions in the Face of Distribution Shifts 
 Kelvin Han, Claire Gardent
- 
    Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation 
 Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu
- 
    ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation 
 Yanzhou Pan, Huawei Lin, Yide Ran, Jiamin Chen, Xiaodong Yu, Weijie Zhao, Denghui Zhang, Zhaozhuo Xu
- 
    Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection 
 Koji Inoue, Divesh Lala, Gabriel Skantze, Tatsuya Kawahara
- 
    Grounding Fallacies Misrepresenting Scientific Publications in Evidence 
 Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych
- 
    Towards a Perspectivist Turn in Argument Quality Assessment 
 Julia Romberg, Maximilian Maurer, Henning Wachsmuth, Gabriella Lapesa
- 
    EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs 
 Sam Lin, Wenyue Hua, Zhenting Wang, Mingyu Jin, Lizhou Fan, Yongfeng Zhang
- 
    Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models 
 Michael Toker, Ido Galil, Hadas Orgad, Rinon Gal, Yoad Tewel, Gal Chechik, Yonatan Belinkov
- 
    Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge 
 Li Zhou, Taelin Karidi, Wanlong Liu, Nicolas Garneau, Yong Cao, Wenyu Chen, Haizhou Li, Daniel Hershcovich
- 
    From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning 
 Nan Xu, Fei Wang, Sheng Zhang, Hoifung Poon, Muhao Chen
- 
    Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction 
 Wei Li, Wen Luo, Guangyue Peng, Houfeng Wang
- 
    Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts 
 Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng
- 
    Parameter-free and Accessible Prompt Learning to Enhance Adversarial Robustness for Pre-trained Vision-Language Models 
 Xingran Zhou, Kun Yang, Changtao Miao, Bingyu Hu, Zhuoer Xu, Shiwen Cui, Changhua Meng, Dan Hong
- 
    MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria 
 Wentao Ge, Shunian Chen, Hardy Chen, Nuo Chen, Junying Chen, Zhihong Chen, Wenya Xie, Shuo Yan, Chenghao Zhu, Ziyue Lin, Dingjie Song, Xidong Wang, Anningzhe Gao, Zhang Zhiyi, Jianquan Li, Xiang Wan, Benyou Wang
- 
    WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines 
 Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, WANG YUTONG, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, Anirban Das, Ashmari Pramodya, Aulia Adila, Bryan Wilie, Candy Olivia Mawalim, CHENG Ching Lam, Daud Abolade, Emmanuele Chersoni, Enrico Santus, Fariz Ikhwantri, Garry Kuwanto, Hanyang Zhao, Haryo Akbarianto Wibowo, Holy Lovenia, Jan Christian Blaise Cruz, Jan Wira Gotama Putra, Junho Myung, Lucky Susanto, Maria Angelica Riera Machin, Marina Zhukova, Michael Anugraha, Muhammad Farid Adilazuarda, Natasha Christabelle Santosa, Peerat Limkonchotiwat, Raj Dabre, Rio Alexander Audino, Samuel Cahyawijaya, Shi-Xiong Zhang, Stephanie Yulia Salim, Yi Zhou, Yinxuan Gui, David Ifeoluwa Adelani, En-Shiun Annie Lee, Shogo Okada, Ayu Purwarianti, Alham Fikri Aji, Taro Watanabe, Derry Tanti Wijaya, Alice Oh, Chong-Wah Ngo
- 
    Is Your LLM Outdated? A Deep Look at Temporal Generalization 
 Chenghao Zhu, Nuo Chen, Yufei Gao, Yunyi Zhang, Prayag Tiwari, Benyou Wang
- 
    ProSE: Diffusion Priors for Speech Enhancement 
 Sonal Kumar, Sreyan Ghosh, Utkarsh Tyagi, Anton Jeran Ratnarajah, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha
- 
    Where is the answer? An empirical study of positional bias for parametric knowledge extraction in language model 
 Kuniaki Saito, Chen-Yu Lee, Kihyuk Sohn, Yoshitaka Ushiku
- 
    SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators 
 Daniil Moskovskiy, Nikita Sushko, Sergey Pletenev, Elena Tutubalina, Alexander Panchenko
- 
    AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising 
 Peinan Zhang, Yusuke Sakai, Masato Mita, Hiroki Ouchi, Taro Watanabe
- 
    LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search 
 Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong LI, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou
- 
    Can Large Language Models Invent Algorithms to Improve Themselves? 
 Yoichi Ishibashi, Taro Yano, Masafumi Oyamada
- 
    Generating Diverse Hypotheses for Inductive Reasoning 
 Kang-il Lee, Hyukhun Koh, Dongryeol Lee, Seunghyun Yoon, Minsung Kim, Kyomin Jung
- 
    LLaSA: Large Language and Structured Data Assistant 
 Yao Xu, Shizhu He, Jiabei Chen, Zeng Xiangrong, Bingning Wang, Guang Liu, Jun Zhao, Kang Liu
- 
    How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? 
 Kenza Benkirane, Jackie Kay, Maria Perez-Ortiz
- 
    Legal Judgment Prediction based on Knowledge-enhanced Multi-Task and Multi-Label Text Classification 
 Ang Li, Yiquan Wu, Ming Cai, Adam Jatowt, Xiang Zhou, Weiming Lu, Changlong Sun, Fei Wu, Kun Kuang
- 
    Mastering the Craft of Data Synthesis for CodeLLMs 
 Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Duc Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li
- 
    Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities 
 Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman
- 
    Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios 
 Bryan Chen Zhengyu Tan, Roy Ka-Wei Lee
- 
    Continual Learning in Multilingual Sign Language Translation 
 Shakib Yazdani, Josef van Genabith, Cristina España-Bonet
- 
    Racing Thoughts: Explaining Contextualization Errors in Large Language Models 
 Michael A. Lepori, Michael Curtis Mozer, Asma Ghandeharioun
- 
    CVE-Bench: Benchmarking LLM-based Software Engineering Agent’s Ability to Repair Real-World CVE Vulnerabilities 
 Peiran Wang, Xiaogeng Liu, Chaowei Xiao
- 
    Constrained Decoding with Speculative Lookaheads 
 Nishanth Sridhar Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah
- 
    Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning 
 Yilun Zhao, Guo Gan, Chen Zhao, Arman Cohan
- 
    Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language 
 Amalie Brogaard Pauli, Isabelle Augenstein, Ira Assent
- 
    Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models 
 Artem Vazhentsev, Lyudmila Rvanova, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov
- 
    What Did I Do Wrong? Quantifying LLMs’ Sensitivity and Consistency to Prompt Engineering 
 Federico Errica, Davide Sanvito, Giuseppe Siracusano, Roberto Bifulco
- 
    Towards Lifelong Dialogue Agents via Timeline-based Memory Management 
 Kai Tzu-iunn Ong, Namyoung Kim, Minju Gwak, Hyungjoo Chae, Taeyoon Kwon, Yohan Jo, seung-won hwang, Dongha Lee, Jinyoung Yeo
- 
    Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation 
 Satyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, Manaal Faruqui
- 
    Mitigating Heterogeneity among Factor Tensors via Lie Group Manifolds for Tensor Decomposition Based Temporal Knowledge Graph Embedding 
 Jiang Li, Xiangdong Su, Guanglai Gao
- 
    VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning 
 Yifan Peng, Krishna C Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg
- 
    Transferable Post-training via Inverse Value Learning 
 Xinyu Lu, Xueru Wen, Yaojie Lu, Bowen Yu, Hongyu Lin, Haiyang Yu, Le Sun, Xianpei Han, Yongbin Li
- 
    MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems 
 Nandan Thakur, Suleman Kazi, Ge Luo, Jimmy Lin, Amin Ahmad
- 
    Identifying Emerging Concepts in Large Corpora 
 Sibo Ma, Julian Nyarko
- 
    CompAct: Compressed Activations for Memory-Efficient LLM Training 
 Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki, Assaf Schuster
- 
    Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models 
 Ziche Liu, Rui Ke, Yajiao LIU, Feng Jiang, Haizhou Li
- 
    ConQRet: A New Benchmark for Fine-Grained Automatic Evaluation of Retrieval Augmented Computational Argumentation 
 Kaustubh Dhole, Kai Shu, Eugene Agichtein
- 
    HARP: Hesitation-Aware Reframing in Transformer Inference Pass 
 Romain Storaï, seung-won hwang
- 
    CultureInstruct: Curating Multi-Cultural Instructions at Scale 
 Viet Thanh Pham, Zhuang Li, Lizhen Qu, Gholamreza Haffari
- 
    MatViX: Multimodal Information Extraction from Visually Rich Articles 
 Ghazal Khalighinejad, Sharon Scott, Ollie Liu, Kelly L. Anderson, Rickard Stureborg, Aman Tyagi, Bhuwan Dhingra
- 
    MixLLM: Dynamic Routing in Mixed Large Language Models 
 Xinyuan Wang, Yanchi Liu, Wei Cheng, Xujiang Zhao, Zhengzhang Chen, Wenchao Yu, Yanjie Fu, Haifeng Chen
- 
    PORT: Preference Optimization on Reasoning Traces 
 Salem Lahlou, Abdalgader Abubaker, Hakim Hacid
- 
    WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness 
 Baizhou Huang, Xiaojun Wan
- 
    Probe-Free Low-Rank Activation Intervention 
 Chonghe Jiang, Bao Nguyen, Anthony Man-Cho So, Viet Anh Nguyen
- 
    Multi-Conditional Ranking with Large Language Models 
 Pouya Pezeshkpour, Estevam Hruschka
- 
    ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage 
 Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang
- 
    LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs 
 Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour
- 
    LLM-Based Explicit Models of Opponents for Multi-Agent Games 
 XiaoPeng Yu, Wanpeng Zhang, Zongqing Lu
- 
    Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance? 
 Qisheng Hu, Quanyu Long, Wenya Wang
- 
    Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs 
 Anirudh Phukan, Divyansh, Harshit Kumar Morj, Vaishnavi, Apoorv Saxena, Koustava Goswami
- 
    Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation 
 Mahnaz Koupaee, Jake W. Vincent, Saab Mansour, Igor Shalyminov, Han He, Hwanjun Song, Raphael Shu, Jianfeng He, Yi Nian, Amy Wing-mei Wong, Kyu J. Han, Hang Su
- 
    On the Vulnerability of Text Sanitization 
 Meng Tong, Kejiang Chen, Xiaojian Yuan, Jiayang Liu, Weiming Zhang, Nenghai Yu, Jie Zhang
- 
    Logit Separability-Driven Samples and Multiple Class-Related Words Selection for Advancing In-Context Learning 
 Zixiao Zhu, Zijian Feng, Hanzhang Zhou, Junlang Qian, Kezhi Mao
- 
    GloCOM: A Short Text Neural Topic Model via Global Clustering Context 
 Quang Duc Nguyen, Tung Nguyen, Duc Anh Nguyen, Linh Ngo Van, Sang Dinh, Thien Huu Nguyen
- 
    MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference 
 Zhongwei Wan, Hui Shen, Xin Wang, Che Liu, Zheda Mai, Mi Zhang
- 
    The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units 
 Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf
- 
    DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models 
 Yimu Wang, Shuai Yuan, Bo Xue, Xiangru Jian, Wei Pang, Mushi Wang, Ning Yu
- 
    A Unified Supervised and Unsupervised Dialogue Topic Segmentation Framework Based on Utterance Pair Modeling 
 Shihao YANG, Ziyi Zhang, Yue Jiang, Chunsheng Qin, Shuhua Liu
- 
    Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data 
 Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi
- 
    VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models 
 Ming Cheng, Jiaying Gong, Chenhan Yuan, William A Ingram, Edward Fox, Hoda Eldardiry
- 
    ALTER: Augmentation for Large-Table-Based Reasoning 
 Han Zhang, Yuheng Ma, Hanfang Yang
- 
    CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models 
 Jierui Li, Hung Le, Yingbo Zhou, Caiming Xiong, silvio savarese, Doyen Sahoo
- 
    Causally Modeling the Linguistic and Social Factors that Predict Email Response 
 Yinuo Xu, Hong Chen, Sushrita Rakshit, Aparna Ananthasubramaniam, Omkar Yadav, Mingqian Zheng, Michael Jiang, Lechen Zhang, Bowen Yi, Kenan Alkiek, Abraham Israeli, Bangzhao Shu, Hua Shen, Jiaxin Pei, Haotian Zhang, Miriam Schirmer, David Jurgens
- 
    DTELS: Towards Dynamic Granularity of Timeline Summarization 
 Chenlong Zhang, Tong Zhou, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao
- 
    MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling 
 Yakun Zhu, Shaohang Wei, Xu Wang, KUI XUE, Shaoting Zhang, Xiaofan Zhang
- 
    Multilingual Reasoning via Self-training 
 Leonardo Ranaldi, Giulia Pucci
- 
    AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios 
 Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang, zhongyu wei
- 
    Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models 
 Michael Hanna, Aaron Mueller
- 
    DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition 
 Wonjun Lee, Solee Im, Heejin Do, Yunsu Kim, Jungseul Ok, Gary Lee
- 
    LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation 
 Sachit Kuhar, Wasi Uddin Ahmad, Zijian Wang, Nihal Jain, Haifeng Qian, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras
- 
    Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models 
 Lovish Madaan, David Esiobu, Pontus Stenetorp, Barbara Plank, Dieuwke Hupkes
- 
    LLM4DistReconfig: A Fine-tuned Large Language Model for Power Distribution Network Reconfiguration 
 Panayiotis Christou, Md. Zahidul Islam, Yuzhang Lin, Jingwei Xiong
- 
    AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence 
 Minbeom Kim, Hwanhee Lee, Joonsuk Park, Hwaran Lee, Kyomin Jung
- 
    Understanding LLMs’ Fluid Intelligence Deficiency: An Analysis of the ARC Task 
 Junjie Wu, Mo Yu, Lemao Liu, Dit-Yan Yeung, Jie Zhou
- 
    Few-Shot Natural Language to First-Order Logic Translation via Code Generation 
 Junnan Liu
- 
    Teaching Models to Balance Resisting and Accepting Persuasion 
 Elias Stengel-Eskin, Peter Hase, Mohit Bansal
- 
    PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization 
 Jiayi Wu, Hengyi Cai, Lingyong Yan, Hao Sun, Xiang Li, Shuaiqiang Wang, Dawei Yin, Ming Gao
- 
    SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models 
 Carter Teplica, Yixin Liu, Arman Cohan, Tim G. J. Rudner
- 
    H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables 
 Nikhil Abhyankar, Vivek Gupta, Dan Roth, Chandan K. Reddy
- 
    Improving Model Evaluation using SMART Filtering of Benchmark Datasets 
 Vipul Gupta, Candace Ross, David Pantoja, Rebecca J. Passonneau, Megan Ung, Adina Williams
- 
    LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs 
 Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo
- 
    Palette of Language Models: A Solver for Controlled Text Generation 
 ZHE YANG, Yi Huang, Yaqin Chen, Xiaoting Wu, Junlan Feng, Chao Deng
- 
    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation 
 Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui GUO, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang
- 
    PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian 
 Erfan Moosavi Monazzah, Vahid Rahimzadeh, Yadollah Yaghoobzadeh, Azadeh Shakery, Mohammad Taher Pilehvar
- 
    Audio Is the Achilles’ Heel: Red Teaming Audio Large Multimodal Models 
 Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari
- 
    Main Predicate and Their Arguments as Explanation Signals For Intent Classification 
 Sameer Pimparkhede, Pushpak Bhattacharyya
- 
    A Grounded Typology of Word Classes 
 Coleman Haley, Sharon Goldwater, Edoardo Ponti
Main Conference - Short Papers
- 
    A Layered Debating Multi-Agent System for Similar Disease Diagnosis 
 Yutian Zhao, Huimin WANG, Yefeng Zheng, Xian Wu
- 
    How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations 
 Hyunji Lee, Danni Liu, Supriti Sinhamahapatra, Jan Niehues
- 
    Do Audio-Language Models Understand Linguistic Variations? 
 Ramaneswaran Selvakumar, Sonal Kumar, Hemant Kumar Giri, Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha
- 
    Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction 
 Kaiqiao Han, Tianqing Fang, Zhaowei Wang, Yangqiu Song, Mark Steedman
- 
    Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm 
 Vasudha Varadarajan, Syeda Mahwish, Xiaoran Liu, Julia Buffolino, Christian Luhmann, Ryan L. Boyd, H. Schwartz
- 
    Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data 
 Laura Biester
- 
    Defense against Prompt Injection Attacks via Mixture of Encodings 
 Ruiyi Zhang, David Sullivan, Kyle Jackson, Pengtao Xie, Mei Chen
- 
    AMPS: ASR with Multimodal Paraphrase Supervision 
 Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay, Preethi Jyothi
- 
    Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject 
 Zenghao Duan, Wenbin Duan, Zhiyi yin, Yinghan Shen, Shaoling Jing, Jie Zhang, Huawei Shen, Xueqi Cheng
- 
    Evaluating Multimodal Generative AI with Korean Educational Standards 
 Sanghee Park, Geewook Kim
- 
    Context-Efficient Retrieval with Factual Decomposition 
 Yanhong Li, David Yunis, David McAllester, Jiawei Zhou
- 
    Examining Spanish Counseling with MIDAS: a Motivational Interviewing Dataset in Spanish 
 Aylin Ece Gunal, Bowen Yi, John D. Piette, Rada Mihalcea, Veronica Perez-Rosas
- 
    ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges 
 Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma
- 
    Towards Federated Low-Rank Adaptation of Language Models with Rank Heterogeneity 
 Yuji Byun, Jaeho Lee
- 
    Preserving Multilingual Quality While Tuning Query Encoder on English Only 
 Oleg Vasilyev, Randy Sawaya, John Bohannon
- 
    Reverse Modeling in Large Language Models 
 Sicheng Yu, Xu Yuanchen, Cunxiao Du, Yanying Zhou, Minghui Qiu, Qianru Sun, Hao Zhang, Jiawei Wu
- 
    FLIQA-AD: a Fusion Model with Large Language Model for Better Diagnose and MMSE Prediction of Alzheimer’s Disease 
 Junhao Chen, Zhiyuan Ding, Yan Liu, Xiangzhu Zeng, Ling Wang
- 
    Automatic Evaluation of Healthcare LLMs Beyond Question-Answering 
 Anna Arias-Duart, Pablo Agustin Martin-Torres, Daniel Hinjos, Pablo Bernabeu-Perez, Lucia Urcelay Ganzabal, Marta Gonzalez Mallo, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Sergio Alvarez-Napagao, Dario Garcia-Gasulla
- 
    GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities 
 Usman Naseem, Shuvam Shiwakoti, Siddhant Bikram Shah, Surendrabikram Thapa, Qi Zhang
- 
    Repetition Neurons: How Do Language Models Produce Repetitions? 
 Tatsuya Hiraoka, Kentaro Inui
- 
    Complete Chess Games Enable LLM Become A Chess Master 
 Yinqi Zhang, Xintian Han, Haolong Li, Kedi Chen, Shaohui Lin
- 
    The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces 
 Ahmed Oumar El-Shangiti, Tatsuya Hiraoka, Hilal AlQuabeh, Benjamin Heinzerling, Kentaro Inui
- 
    Task-driven Layerwise Additive Activation Intervention 
 Hieu Trung Nguyen, Bao Nguyen, Binh Nguyen, Viet Anh Nguyen
- 
    Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning 
 Juraj Vladika, Ivana Hacajova, Florian Matthes
- 
    Developing multilingual speech synthesis system for Ojibwe, Mi’kmaq, and Maliseet 
 Shenran Wang, Changbing Yang, Michael l parkhill, Chad Quinn, Christopher Hammerly, Jian Zhu
- 
    CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation 
 Youngwon Lee, seung-won hwang, Daniel F Campos, Filip Graliński, Zhewei Yao, Yuxiong He
- 
    Interpret and Control Dense Retrieval with Sparse Latent Features 
 Hao Kang, Tevin Wang, Chenyan Xiong
- 
    Identifying Power Relations in Conversations using Multi-Agent Social Reasoning 
 Zhaoqing Wu, Dan Goldwasser, Maria Leonor Pacheco, Leora Morgenstern
- 
    Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement 
 Nicolas Floquet, Joseph Le Roux, Nadi Tomeh, Thierry Charnois
- 
    MixRevDetect: Towards Detecting AI-Generated Content in Hybrid Peer Reviews 
 Sandeep Kumar, Samarth Garg, Sagnik Sengupta, Tirthankar Ghosal, Asif Ekbal
- 
    Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs 
 Huaman Sun, Jiaxin Pei, Minje Choi, David Jurgens
- 
    Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes 
 Isabel O. Gallegos, Ryan Aponte, Ryan A. Rossi, Joe Barrow, Mehrab Tanjim, Tong Yu, Hanieh Deilamsalehy, Ruiyi Zhang, Sungchul Kim, Franck Dernoncourt, Nedim Lipka, Deonna Owens, Jiuxiang Gu
- 
    Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework 
 Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav, Masoud Hashemi, Sathwik Tejaswi Madhusudhan
- 
    IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs 
 Kawshik Manikantan, Makarand Tapaswi, Vineet Gandhi, Shubham Toshniwal
- 
    Local Prompt Optimization 
 Yash Jain, Vishal Chowdhary
- 
    Evaluating LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3 
 Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara
- 
    Great Memory, Shallow Reasoning: Limits of kNN-LMs 
 Shangyi Geng, Wenting Zhao, Alexander M Rush
- 
    Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches 
 Adithya Pratapa, Teruko Mitamura
- 
    Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models 
 Gleb Kuzmin, Neemesh Yadav, Ivan Smirnov, Timothy Baldwin, Artem Shelmanov
- 
    CoRAG: Collaborative Retrieval-Augmented Generation 
 Aashiq Muhamed, Mona T. Diab, Virginia Smith
- 
    Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models 
 Sangmin Woo, Kang Zhou, Yun Zhou, Shuai Wang, Sheng Guan, Haibo Ding, Lin Lee Cheong
- 
    Explore the Reasoning Capability of LLMs in the Chess Testbed 
 Shu Wang, Lei Ji, Renxi Wang, Wenxiao Zhao, Haokun Liu, Yifan Hou, Ying Nian Wu
- 
    A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference 
 You Wu, Haoyi Wu, Kewei Tu
- 
    Cross-lingual Transfer of Reward Models in Multilingual Alignment 
 Jiwoo Hong, Noah Lee, Rodrigo Martínez-Castaño, César Rodríguez, James Thorne
- 
    Watching the AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers 
 Akshit Achara, Anshuman Chhabra
- 
    AlignFreeze: Navigating the Impact of Realignment on the Layers of Multilingual Models Across Diverse Languages 
 Steve Bakos, David Guzmán, Riddhi More, Kelly Chutong Li, Félix Gaschi, En-Shiun Annie Lee
- 
    RuleR: Improving LLM Controllability by Rule-based Data Recycling 
 Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou
- 
    EqualizeIR: Mitigating Linguistic Biases in Retrieval Models 
 Jiali Cheng, Hadi Amiri
- 
    Improving Vietnamese-English Cross-Lingual Retrieval for Legal and General Domains 
 Toan Ngoc Nguyen, Nam Le Hai, Nguyen Doan Hieu, Dai An Nguyen, Linh Ngo Van, Thien Huu Nguyen, Sang Dinh
- 
    ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots 
 Shani Goren, Oren Kalinsky, Tomer Stav, Yuri Rapoport, Yaron Fairstein, Ram Yazdi, Nachshon Cohen, Alexander Libov, Guy Kushilevitz
- 
    STAR: Spectral Truncation and Rescale for Model Merging 
 Yu-Ang Lee, Ching-Yun Ko, Tejaswini Pedapati, I-Hsin Chung, Mi-Yen Yeh, Pin-Yu Chen
- 
    DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph 
 Maitreya Prafulla Chitale, Uday Bindal, Rajakrishnan P Rajkumar, Rahul Mishra
- 
    LLM2: Let Large Language Models Harness System 2 Reasoning 
 Cheng Yang, Chufan Shi, Siheng Li, Bo Shui, Yujiu Yang, Wai Lam
- 
    Transform Retrieval for Textual Entailment in RAG 
 Quan Guo, Xin Liang
- 
    Don’t Touch My Diacritics 
 Kyle Gorman, Yuval Pinter
- 
    Beyond Literal Token Overlap: Token Alignability for Multilinguality 
 Katharina Hämmerl, Tomasz Limisiewicz, Jindřich Libovický, Alexander Fraser
- 
    FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs 
 Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad
- 
    STRUX: An LLM for Decision-Making with Structured Explanations 
 Yiming Lu, Yebowen Hu, Hassan Foroosh, Wei Jin, Fei Liu
- 
    Language Models “Grok” to Copy 
 Ang Lv, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Rui Yan
- 
    Computational Discovery of Chiasmus in Ancient Religious Text 
 Hope McGovern, Hale Sirin, Tom Lippincott
- 
    Cross-Lingual Transfer Learning for Speech Translation 
 Rao Ma, Mengjie Qian, Yassir Fathullah, Siyuan Tang, Mark Gales, Kate Knill
- 
    GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization 
 Margarita Bugueño, Hazem Abou Hamdan, Gerard de Melo
- 
    PROM: Pivoted and Regulated Optimization for Multilingual Instruction Learning 
 Jaeseong Lee, seung-won hwang, Hojin Lee, Yunju Bak, Changmin Lee
- 
    A Fair Comparison without Translationese: English vs. Target-language Instructions for Multilingual LLMs 
 Taisei Enomoto, Hwichan Kim, Zhousi Chen, Mamoru Komachi
- 
    Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing 
 Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya
- 
    Pretrained Image-Text Models are Secretly Video Captioners 
 Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi
- 
    Personalized Help for Optimizing Low-Skilled Users’ Strategy 
 Feng Gu, Wichayaporn Wongkamjan, Jordan Lee Boyd-Graber, Jonathan K. Kummerfeld, Denis Peskoff, Jonathan May
- 
    STEP: Staged Parameter-Efficient Pre-training for Large Language Models 
 Kazuki Yano, Takumi Ito, Jun Suzuki
- 
    Leveraging Moment Injection for Enhanced Semi-supervised Natural Language Inference with Large Language Models 
 Seo Yeon Park
- 
    Predicting the Target Word of Game-playing Conversations using a Low-Rank Dialect Adapter for Decoder Models 
 Dipankar Srirag, Aditya Joshi, Jacob Eisenstein
- 
    Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts 
 Kun Qian, Maximillian Chen, Siyan Li, Arpit Sharma, Zhou Yu
- 
    DART: An AIGT Detector using AMR of Rephrased Text 
 Hyeonchu Park, Byungjun Kim, Bugeun Kim
- 
    Using Contextually Aligned Online Reviews to Measure LLMs’ Performance Disparities Across Language Varieties 
 Zixin Tang, Chieh-Yang Huang, TSUNG-CHI LI, Ho Yin Sam Ng, Hen-Hsen Huang, Ting-Hao Kenneth Huang
- 
    Is It Navajo? Accurate Language Detection for Endangered Athabaskan Languages 
 Ivory Yang, Weicheng Ma, Chunhui Zhang, Soroush Vosoughi
- 
    Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces 
 Hope McGovern, Hale Sirin, Tom Lippincott
- 
    Language Models Encode Numbers Using Digit Representations in Base 10 
 Amit Arnold Levy, Mor Geva
- 
    Alligators All Around: Mitigating Lexical Confusion in Low-resource Machine Translation 
 Elizabeth Nielsen, Isaac Rayburn Caswell, Jiaming Luo, Colin Cherry
- 
    Taxi1500: A Dataset for Multilingual Text Classification in 1500 Languages 
 Chunlan Ma, Ayyoob Imani, Haotian Ye, Renhao Pei, Ehsaneddin Asgari, Hinrich Schuetze
- 
    Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? 
 Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Lee Boyd-Graber, Rachel Rudinger
- 
    Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction 
 Xi Chen, Mao Mao, Shuo Li, Haotian Shangguan
- 
    kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech 
 Karl El Hajal, Ajinkya Kulkarni, Enno Hermann, Mathew Magimai Doss
Findings Papers
- 
    Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks 
 Samuele Poppi, Zheng Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi
- 
    HALLUCANA: Fixing LLM Hallucination with A Canary Lookahead 
 Tianyi Li, Erenay Dayanik, Shubhi Tyagi, Andrea Pierleoni
- 
    ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval 
 Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani, Adam Jatowt
- 
    Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation 
 Jiwon Jeong, Hyeju Jang, Hogun Park
- 
    Time-aware ReAct Agent for Temporal Knowledge Graph Question Answering 
 Qianyi Hu, Xinhui Tu, guo cong, Shunping Zhang
- 
    CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models 
 Shangda Wu, Yashan Wang, Ruibin Yuan, Guo Zhancheng, Xu Tan, Ge Zhang, Monan Zhou, Jing Chen, Xuefeng Mu, Yuejie Gao, Yuanliang Dong, Jiafeng Liu, Xiaobing Li, Feng Yu, Maosong Sun
- 
    PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction 
 Hammad Ayyubi, Xuande Feng, Junzhang Liu, Xudong Lin, Zhecan Wang, Shih-Fu Chang
- 
    LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content 
 Mohamed Bayan Kmainasi, Ali Ezzat Shahroor, Maram Hasanain, Sahinur Rahman Laskar, Naeemul Hassan, Firoj Alam
- 
    Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems 
 Đorđe Klisura, Anthony Rios
- 
    Position Really Matters: Towards a Holistic Approach for Prompt Tuning 
 Xianjun Yang, Wei Cheng, Xujiang Zhao, Wenchao Yu, Linda Ruth Petzold, Haifeng Chen
- 
    Improving Reward Models with Synthetic Critiques 
 Zihuiwen Ye, Fraser David Greenlee, Max Bartolo, Phil Blunsom, Jon Ander Campos, Matthias Gallé
- 
    Aligning Black-box Language Models with Human Judgments 
 Gerrit J.J. Van den Burg, Gen Suzuki, Wei Liu, Murat Sensoy
- 
    GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models 
 Aditya Sharma, Aman Dalmia, Mehran Kazemi, Amal Zouaq, Christopher Pal
- 
    Alleviating Hallucinations of Large Language Models through Induced Hallucinations 
 Yue Zhang, Leyang Cui, V. W., Shuming Shi
- 
    EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild 
 Junhyeok Kim, Min Soo Kim, Jiwan Chung, Jungbin Cho, Jisoo Kim, Sungwoong Kim, Gyeongbo Sim, Youngjae Yu
- 
    Attention Tracker: Detecting Prompt Injection Attacks in LLMs 
 Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen
- 
    Language Modeling with Editable External Knowledge 
 Belinda Z. Li, Emmy Liu, Alexis Ross, Abbas Zeitoun, Graham Neubig, Jacob Andreas
- 
    Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy 
 Athiya Deviyani, Fernando Diaz
- 
    Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models 
 Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh
- 
    MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning 
 Xujia Wang, Haiyan Zhao, Shuo Wang, Hanqing Wang, Zhiyuan Liu
- 
    QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models 
 Changhai Zhou, Yuhua Zhou, Yibin Wang, Shijie Han, Qian Qiao, Hongguang Li
- 
    Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations 
 Wenjie Jacky Mo, Jiashu Xu, Qin Liu, Jiongxiao Wang, Jun Yan, Hadi Askari, Chaowei Xiao, Muhao Chen
- 
    WaterSeeker: Pioneering Efficient Detection of Watermarked Segments in Large Documents 
 Leyi Pan, Aiwei Liu, Yijian LU, Zitian Gao, Yichen Di, Lijie Wen, Irwin King, Philip S. Yu
- 
    Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation 
 Sirui Xia, Xintao Wang, Jiaqing Liang, Yifei Zhang, Weikang Zhou, Jiaji Deng, Fei Yu, Yanghua Xiao
- 
    FIRE: Fact-checking with Iterative Retrieval and Verification 
 Zhuohan Xie, Rui Xing, Yuxia Wang, Jiahui Geng, Hasan Iqbal, Dhruv Sahnan, Iryna Gurevych, Preslav Nakov
- 
    Advancing Persian LLM Evaluation 
 Sara Bourbour Hosseinbeigi, Behnam Rohani, Mostafa Masoudi, Mehrnoush Shamsfard, Zahra Saaberi, Mostafa Karimi Manesh, Mohammad Amin Abbasi
- 
    Beyond English: The Impact of Prompt Translation Strategies across Languages and Tasks in Multilingual LLMs 
 Itai Mondshine, Tzuf Paz-Argaman, Reut Tsarfaty
- 
    Huatuo-26M, a Large-scale Chinese Medical QA Dataset 
 Xidong Wang, Jianquan Li, Shunian Chen, Yuxuan Zhu, Xiangbo Wu, Zhiyi Zhang, Xiaolong Xu, Junying Chen, Jie Fu, Xiang Wan, Anningzhe Gao, Benyou Wang
- 
    Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales 
 Zhen Qian, Xiuzhen Zhang, Xiaofei Xu, Feng Xia
- 
    Causal Inference with Large Language Model: A Survey 
 Jing Ma
- 
    CoPERLex: Content Planning with Event-based Representations for Legal Case Summarization 
 Santosh T.Y.S.S, Youssef Farag, Matthias Grabmair
- 
    MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty 
 Yongjin Yang, Haneul Yoo, Hwaran Lee
- 
    Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding 
 Xingjian Diao, Chunhui Zhang, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui
- 
    Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering 
 Wei Zhou, Mohsen Mesgar, Annemarie Friedrich, Heike Adel
- 
    2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision 
 Shilong Li, Yancheng He, Hui Huang, Xingyuan Bu, Jiaheng Liu, Hangyu Guo, Weixun Wang, Jihao Gu, Wenbo Su, Bo Zheng
- 
    Broadening Applications: Grounding LLM Development in Potential User Needs 
 Kaitlyn Zhou, Kristina Gligoric, Myra Cheng, Vyoma Raman, Boluwatife Aminu, Caeley Woo, Michael Brockman, Dan Jurafsky
- 
    PairScale: Analyzing Attitude Change with Pairwise Comparisons 
 Rupak Sarkar, Patrick Y. Wu, Kristina Miler, Alexander Miserlis Hoyle, Philip Resnik
- 
    DiscoverGPT: Multi-task Fine-tuning Large Language Model for Related Table Discovery 
 Xuming Hu, Xiao Qin, Chuan Lei, Asterios Katsifodimos, Zhengyuan Shen, Balasubramaniam Srinivasan, Huzefa Rangwala
- 
    Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents 
 Qiusi Zhan, Richard Fang, Henil Shalin Panchal, Daniel Kang
- 
    LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages 
 Xuhan Huang, Qingning Shen, Yan Hu, Anningzhe Gao, Benyou Wang
- 
    Guideline Compliance in Task-Oriented Dialogue: The Chained Prior Approach 
 Xiangyu Wen, Jianyuan Zhong, Zhijian Xu, Qiang Xu
- 
    SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials 
 Wonjoong Kim, Sangwu Park, Yeonjun In, Seokwon Han, Chanyoung Park
- 
    Robust Bias Detection in MLMs and its Application to Human Trait Ratings 
 Ingroj Shrestha, Louis Tay, Padmini Srinivasan
- 
    Enhancing Temporal Understanding in LLMs for Semi-structured Tables 
 Irwin Deng, Kushagra Dixit, Dan Roth, Vivek Gupta
- 
    LSDC: An Efficient and Effective Large-Scale Data Compression Method for Supervised Fine-tuning of Large Language Models 
 Zhaoguang Long, Yuhao Zhou, Shangqing Zhao, Yupei Ren, Li Cai, Chenghao Jia, Zhe Chen, Zhe Fang, Yuxiang Song, Man Lan
- 
    SOLID: Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking Dialogs 
 Arian Askari, Roxana Petcu, Chuan Meng, Mohammad Aliannejadi, Amin Abolghasemi, Evangelos Kanoulas, Suzan Verberne
- 
    Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-sample Aggregation on Large Language Models 
 Jishnu Ray Chowdhury, Jayanth Mohan, Tomas Malik, Cornelia Caragea
- 
    A Practical Method for Generating String Counterfactuals 
 Matan Avitan, Ryan Cotterell, Yoav Goldberg, Shauli Ravfogel
- 
    A Guide To Effectively Leveraging LLMs for Low-Resource Text Summarization: Data Augmentation and Semi-supervised Approaches 
 Gaurav Sahu, Olga Vechtomova, Issam H. Laradji
- 
    Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios 
 Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, Lin Hai, Hai-Tao Zheng
- 
    CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions 
 Mourad Heddaya, Kyle MacMillan, Hongyuan Mei, Chenhao Tan, Anup Malani
- 
    Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning 
 Jeffrey Olmo, Jared Wilson, Max Forsey, Bryce Hepner, Thomas Vincent Howe, David Wingate
- 
    The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models 
 Artem Kirsanov, Chi-Ning Chou, Kyunghyun Cho, SueYeon Chung
- 
    VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning 
 Cuong Le Chi, Chau Truong Vinh Hoang, Phan Nhật Huy, Dung D. Le, Tien N Nguyen, Nghi D. Q. Bui
- 
    Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In 
 Itay Nakash, George Kour, Guy Uziel, Ateret Anaby Tavor
- 
    DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility 
 Yifan Liu, Yu Fang, Zhouhan Lin
- 
    Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding 
 Kyungmin Min, Minbeom Kim, Kang-il Lee, Dongryeol Lee, Kyomin Jung
- 
    TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning 
 Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan
- 
    Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG 
 Kushagra Bhushan, Yatin Nandwani, Dinesh Khandelwal, Sonam Gupta, Gaurav Pandey, Dinesh Raghu, Sachindra Joshi
- 
    Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions 
 Yujuan Fu, Ozlem Uzuner, Meliha Yetisgen, Fei Xia
- 
    What Is Missing in Multilingual Visual Reasoning and How to Fix It 
 Yueqi Song, Simran Khanuja, Graham Neubig
- 
    GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings 
 Raghuveer Thirukovalluru, Bhuwan Dhingra
- 
    MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU 
 Yan Li, So-Eon Kim, Seong-Bae Park, Caren Han
- 
    Using Linguistic Entrainment to Evaluate Large Language Models for Use in Cognitive Behavioral Therapy 
 Mina Kian, Kaleen Shrestha, Katrin Fischer, Xiaoyuan Zhu, Jonathan Ong, Aryan Trehan, Jessica Wang, Gloria Chang, Séb Arnold, Maja Mataric
- 
    When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models 
 Shufan Chen, He Zheng, Lei Cui
- 
    WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response 
 Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen
- 
    FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models 
 Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
- 
    How Do Large Language Models Perform in Dynamical System Modeling 
 Xiao Luo, Binqi Chen, Haixin Wang, Zhiping Xiao, Ming Zhang, Yizhou Sun
- 
    Prototype Tuning: A Meta-Learning Approach for Few-Shot Document-Level Relation Extraction with Large Language Models 
 Dinghao Pan, Yuanyuan Sun, Bo Xu, Jiru Li, Zhihao Yang, Ling Luo, Hongfei Lin, Jian Wang
- 
    SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation 
 Qilong Wu, Xiaoneng Xiang, Huang Hejia, Xuan Wang, Yeo Wei Jie, Ranjan Satapathy, Ricardo Shirota Filho, Bharadwaj Veeravalli
- 
    Improving Pre-trained Language Models with Knowledge Enhancement and Filtering Framework 
 Qi Zhao, Qi Song, Tian Xie, Haiyue Zhang, Hongyu Yang, Xiangyang Li
- 
    FIDELITY: Fine-grained Interpretable Distillation for Effective Language Insights and Topic Yielding 
 Divyansh Singh, Brodie Mather, Demi Zhang, Patrick Lehman, Justin Ho, Bonnie J Dorr
- 
    MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts 
 Lin Ning, Harsh Lara, Meiqi Guo, Abhinav Rastogi
- 
    Chasing Random: Instruction Selection Strategies Fail to Generalize 
 Harshita Diddee, Daphne Ippolito
- 
    On the Role of Key Phrases in Argument Mining 
 Nilmadhab Das, Vijaya V Saradhi, Ashish Anand
- 
    Concise and Organized Perception Facilitates Reasoning in Large Language Models 
 Junjie Liu, Shaotian Yan, Chen Shen, Zhengdong Xiao, Liang Xie, Wenxiao Wang, Jieping Ye
- 
    Marrying LLMs with Dynamic Forecasting: A Graph Mixture-of-expert Perspective 
 Dapeng Jiang, Xiao Luo
- 
    Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media 
 Owen Cook, Charlie Grimshaw, Ben Peng Wu, Sophie Dillon, Jack Hicks, Luke Jones, Thomas Smith, Matyas Szert, Xingyi Song
- 
    Jailbreaking with Universal Multi-Prompts 
 Yu-Ling Hsu, Hsuan Su, Shang-Tse Chen
- 
    What can Large Language Models Capture about Code Functional Equivalence? 
 Nickil Maveli, Antonio Vergari, Shay B Cohen
- 
    RankAdaptor: Hierarchical Rank Allocation for Efficient Fine-Tuning Pruned LLMs via Performance Model 
 Changhai Zhou, Shijie Han, Lining Yang, Yuhua Zhou, Xu Cheng, Yibin Wang, Hongguang Li
- 
    Task-wrapped Continual Learning in Task-Oriented Dialogue Systems 
 Min Zeng, Haiqin Yang, Xi Chen, Yike Guo
- 
    As easy as PIE: understanding when pruning causes language models to disagree 
 Pietro Tropeano, Maria Maistro, Tuukka Ruotsalo, Christina Lioma
- 
    Inference Scaling for Bridging Retrieval and Augmented Generation 
 Youngwon Lee, seung-won hwang, Daniel F Campos, Filip Graliński, Zhewei Yao, Yuxiong He
- 
    MultiCAT: Multimodal Communication Annotations for Teams 
 Adarsh Pyarelal, John M Culnan, Ayesha Qamar, Meghavarshini Krishnaswamy, Yuwei Wang, Cheonkam Jeong, Chen Chen, Md Messal Monem Miah, Shahriar Hormozi, Jonathan Tong, Ruihong Huang
- 
    Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models 
 Jiatao Li, Xinyu Hu, Xunjian Yin, Xiaojun Wan
- 
    ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities 
 Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Haoping Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang
- 
    Vulnerability of Large Language Models to Output Prefix Jailbreaks: Impact of Positions on Safety 
 Yiwei Wang, Muhao Chen, Nanyun Peng, Kai-Wei Chang
- 
    PEMV: Improving Spatial Distribution for Emotion Recognition in Conversations Using Proximal Emotion Mean Vectors 
 Chen Lin, Fei Li, Donghong Ji, Chong Teng
- 
    The Role of Prosody in Spoken Question Answering 
 Jie Chi, Maureen de Seyssel, Natalie Schluter
- 
    Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs) 
 Abhijit Mishra, Shreya Shukla, Jose Torres, Jacek Gwizdka, Shounak Roychowdhury
- 
    GPT-NER: Named Entity Recognition via Large Language Models 
 Shuhe Wang, Xiaofei Sun, Xiaoya Li, Rongbin Ouyang, Fei Wu, Tianwei Zhang, Jiwei Li, Guoyin Wang, Chen Guo
- 
    Media of Langue: Exploring Word Translation Network 
 Goki Muramoto, Atsuki Sato, Takayoshi Koyama
- 
    BanTH: A Multi-label Hate Speech Detection Dataset for Transliterated Bangla 
 Fabiha Haider, Fariha Tanjim Shifat, Md Farhan Ishmam, Md Sakib Ul Rahman Sourove, Deeparghya Dutta Barua, Md Fahim, Md Farhad Alam Bhuiyan
- 
    Discrete Diffusion Language Model for Efficient Text Summarization 
 Do Huu Dat, Duc Anh Do, Anh Tuan Luu, Wray Buntine
- 
    Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics 
 Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu
- 
    Enhancing Text-to-SQL with Question Classification and Multi-Agent Collaboration 
 Zhihui Shao, Shubin Cai, Rongsheng Lin, Zhong Ming
- 
    Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting 
 Mohamed Salim AISSI, Clément ROMAC, Thomas Carta, sylvain lamprier, Pierre-Yves Oudeyer, Olivier Sigaud, Laure Soulier, Nicolas THOME
- 
    How much do contextualized representations encode long-range context? 
 Simeng Sun, Cheng-Ping Hsieh
- 
    Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks 
 Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe
- 
    Tomato, Tomahto, Tomate: Do Multilingual Language Models Understand Based on Subword-Level Semantic Concepts? 
 Crystina Zhang, Jing Lu, Vinh Q. Tran, Tal Schuster, Donald Metzler, Jimmy Lin
- 
    Taxonomy and Analysis of Sensitive User Queries in Generative AI Search System 
 Hwiyeol Jo, Taiwoo Park, Hyunwoo Lee, Nayoung Choi, Changbong Kim, Ohjoon kwon, Donghyeon Jeon, Eui Hyeon Lee, Kyoungho Shin, Lim Sun Suk, Kyungmi KIM, LEE JIHYE, Sun Kim
- 
    Data-Efficiently Learn Large Language Model for Universal 3D Scene Perception 
 Zehan Wang, Haifeng Huang, Yang Zhao, Ziang Zhang, Tao Jin, Zhou Zhao
- 
    MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time 
 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu
- 
    AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts 
 Soichiro Murakami, Peinan Zhang, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
- 
    Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval 
 Ingeol Baek, Hwan Chang, ByeongJeong Kim, Jimin Lee, Hwanhee Lee
- 
    Avoiding Copyright Infringement via Large Language Model Unlearning 
 Guangyao Dou, Zheyuan Liu, Qing Lyu, Kaize Ding, Eric Wong
- 
    Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs 
 David Ifeoluwa Adelani, A. Seza Doğruöz, Iyanuoluwa Shode, Anuoluwapo Aremu
- 
    DynClean: Training Dynamics-based Label Cleaning for Distantly-Supervised Named Entity Recognition 
 Qi Zhang, Huitong Pan, Zhijia Chen, Longin Jan Latecki, Cornelia Caragea, Eduard Dragut
- 
    Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models 
 Zaifu Zhan, Rui Zhang
- 
    Biases in Opinion Dynamics in Multi-Agent Systems of Large Language Models: A Case Study on Funding Allocation 
 Pedro Cisneros-Velarde
- 
    Atoxia: Red-teaming Large Language Models with Target Toxic Answers 
 Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao
- 
    From Curiosity to Clarity: Exploring the Impact of Consecutive Why-Questions 
 Geonyeong Son, Jaeyoung Lee, Misuk Kim
- 
    An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them 
 Creston Brooks, Johannes Haubold, Charlie Cowen-Breen, Jay White, Desmond DeVaul, Frederick Riemenschneider, Karthik R Narasimhan, Barbara Graziosi
- 
    LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models 
 Yizheng Sun, Yanze Xin, Hao Li, Jingyuan Sun, Chenghua Lin, Riza Batista-Navarro
- 
    BanNERD: A Benchmark Dataset and Context-Driven Approach for Bangla Named Entity Recognition 
 Md. Motahar Mahtab, Faisal Ahamed Khan, Md. Ekramul Islam, Md. Shahad Mahmud Chowdhury, Labib Imam Chowdhury, Sadia Afrin, Hazrat Ali, Mohammad Mamun Or Rashid, Nabeel Mohammed, Mohammad Ruhul Amin
- 
    FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking 
 Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Charese Smiley
- 
    Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs 
 Duygu Nur Yaldiz, Yavuz Faruk Bakman, Baturalp Buyukates, Chenyang Tao, Anil Ramakrishna, Dimitrios Dimitriadis, Jieyu Zhao, Salman Avestimehr
- 
    Pairwise Prompt-Based Tuning with Parameter Efficient Fast Adaptation for Generalized Zero-Shot Intent Detection 
 Xiaotong Zhang, Qianru Zhou, Han Liu, Hong Yu
- 
    Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation 
 Palaash Goel, Dushyant Singh Chauhan, Md Shad Akhtar
- 
    GAIfE: Using GenAI to Improve Literacy in Low-resourced Settings 
 Allahsera Auguste Tapo, Nouhoum COULIBALY, Seydou DIALLO, Sebastien Diarra, Christopher M Homan, Mamadou K. KEITA, Michael Leventhal
- 
    Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks 
 Gagan Bhatia, El Moatez Billah Nagoudi, Abdellah EL MEKKI, Fakhraddin Alwajih, Muhammad Abdul-Mageed
- 
    “Women do not have heart attacks!” Gender Biases in Automatically Generated Clinical Cases in French 
 Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol
- 
    FeRG-LLM: Feature Engineering by Reason Generation Large Language Models 
 Jeonghyun Ko, Gyeongyun Park, Donghoon Lee, Kyunam Lee
- 
    Improving Consistency in LLM Inference using Probabilistic Tokenization 
 Ashutosh Sathe, Divyanshu Aggarwal, Sunayana Sitaram
- 
    Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs 
 Maxim Ifergan, Omri Abend, Idan Szpektor, Leshem Choshen, Roee Aharoni
- 
    NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models 
 Pranshu Pandya, Vatsal Gupta, Agney S Talwarr, Tushar Kataria, Dan Roth, Vivek Gupta
- 
    Infogent: An Agent-Based Framework for Web Information Aggregation 
 Revanth Gangi Reddy, Sagnik Mukherjee, Jeonghwan Kim, Zhenhailong Wang, Dilek Hakkani-Tür, Heng Ji
- 
    Adaptive Retrieval-Augmented Generation for Conversational Systems 
 Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz
- 
    Evaluating Cultural and Social Awareness of LLM Web Agents 
 Haoyi Qiu, Alexander Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, Chien-Sheng Wu
- 
    Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue 
 Jieming Cao, Chen Huang, Yanan Zhang, Ruibo Deng, Jincheng Zhang, Wenqiang Lei
- 
    CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmented Generation 
 Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou
- 
    MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains 
 Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Simon Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang
- 
    RATSD: Retrieval Augmented Truthfulness Stance Detection from Social Media Posts Toward Factual Claims 
 Zhengyuan Zhu, Zeyu Zhang, Haiqi Zhang, Chengkai Li
- 
    LITERA: An LLM Based Approach to Latin-to-English Translation 
 Paul Rosu
- 
    Exploring Backward Reasoning in Large Language Models 
 Leonardo Ranaldi, Giulia Pucci
- 
    Investigating the Zone of Proximal Development of Language Models for In-Context Learning 
 Peng Cui, Mrinmaya Sachan
- 
    Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning 
 Hyundong Justin Cho, Karishma Sharma, Nicolaas Paul Jedema, Leonardo F. R. Ribeiro, Jonathan May, Alessandro Moschitti
- 
    How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models 
 Jiyue Jiang, Pengan CHEN, Liheng Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu
- 
    Hierarchical Speculative Decoding with Dynamic Window 
 Shensian Syu, Hung-yi Lee
- 
    ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models 
 David Anugraha, Genta Indra Winata, Chenyue Li, Patrick Amadeus Irawan, En-Shiun Annie Lee
- 
    On the Impacts of Contexts on Repository-Level Code Generation 
 Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui
- 
    AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization via Multi-LLMs 
 Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, ZHAOXIA YIN, Hang Su
- 
    Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning 
 Lin Zhang, Lijie Hu, Di Wang
- 
    Accounting for Sycophancy in Language Model Uncertainty Estimation 
 Anthony Sicilia, Mert Inan, Malihe Alikhani
- 
    Data-centric NLP Backdoor Defense from the Lens of Memorization 
 Zhenting Wang, Zhizhi Wang, Mingyu Jin, Mengnan Du, Juan Zhai, Shiqing Ma
- 
    Unsupervised Sentence Representation Learning with Syntactically Aligned Negative Samples 
 Zhilan Wang, Zekai Zhi, Rize Jin, Kehui Song, He Wang, Da-Jung Cho
- 
    UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models 
 Yuzhe YANG, Yifei Zhang, Yan Hu, Yilin GUO, Ruoli Gan, Yueru He, Mingcong Lei, Xiao Zhang, Haining Wang, Qianqian Xie, Jimin Huang, Honghai Yu, Benyou Wang
- 
    Can Large Language Models Generate High-quality Patent Claims? 
 Lekang Jiang, Caiqi Zhang, Pascal A. Scherz, Stefan Goetz
- 
    Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models 
 Srishti Yadav, Zhi Zhang, Daniel Hershcovich, Ekaterina Shutova
- 
    MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG 
 Pingyu Wu, Daiheng Gao, Jing Tang, Huimin Chen, Wenbo Zhou, Weiming Zhang, Nenghai Yu
- 
    How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? 
 Sergey Pletenev, Maria Marina, Daniil Moskovskiy, Vasily Konovalov, Pavel Braslavski, Alexander Panchenko, Mikhail Salnikov
- 
    Investigating the Transferability of Code Repair for Low-Resource Programming Languages 
 Kyle Wong, Alfonso Amayuelas, Liangming Pan, William Yang Wang
- 
    Dis2Dis: Explaining Ambiguity in Fact-Checking 
 Ieva Staliunaite, Andreas Vlachos
- 
    Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models 
 Aashiq Muhamed, Mona T. Diab, Virginia Smith
- 
    Dynamic Feature Fusion for Sign Language Translation Using HyperNetworks 
 Ruiquan Zhang, Rui Zhao, Zhicong Wu, Liang Zhang, Haoqi Zhang, Yidong Chen
- 
    COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis 
 Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu
- 
    LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression 
 Souvik Kundu, Anahita Bhiwandiwalla, Sungduk Yu, Phillip Howard, Tiep Le, Sharath Nittur Sridhar, David Cobbley, Hao Kang, Vasudev Lal
- 
    DSQG-Syn: Synthesizing High-quality Data for Text-to-SQL Parsing by Domain Specific Question Generation 
 Shaoming Duan, Youxuan Wu, Chuanyi Liu, Yuhao Zhang, Zirui Wang, Peiyi Han, Shengyuan Yu, Liang Yan, yingwei liang
- 
    Extracting Military Event Temporal Relations via Relative Event Time Prediction and Virtual Adversarial Training 
 Jie Gong, qiwang hu
- 
    CDB: A Unified Framework for Hope Speech Detection Through Counterfactual, Desire and Belief 
 Tulio Ferreira Leite da Silva, Gonzalo Freijedo Aduna, Farah Benamara, Alda Mari, Zongmin Li, Li Yue, Jian Su
- 
    SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine 
 Xiaochen Wang, Junqing He, Liang Chen, Gholamreza Haffari, Yiru Wang, Zhe Yang, Xiangdi Meng, Kunhao Pan, Zhifang Sui
- 
    Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs 
 Michael JQ Zhang, Eunsol Choi
- 
    Evaluating Vision-Language Models for Emotion Recognition 
 Sree Bhattacharyya, James Z. Wang
- 
    Multi-Condition Guided Diffusion Network for Multimodal Emotion Recognition in Conversation 
 Wenjin Tian, Xianying Huang, Shihao Zou
- 
    Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis 
 Yiyi Chen, Qiongxiu Li, Russa Biswas, Johannes Bjerva
- 
    PolyJoin: Semantic Multi-key Joinable Table Search in Data Lakes 
 Xuming Hu, Chuan Lei, Xiao Qin, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala
- 
    ImaRA: An Imaginative Frame Augmented Method for Low-Resource Multimodal Metaphor Detection and Explanation 
 Yuan Tian, Minzheng Wang, Nan Xu, Wenji Mao
- 
    Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs 
 Sen Yang, Xin Li, Leyang Cui, Lidong Bing, Wai Lam
- 
    Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains 
 Katerina Korre, Arianna Muti, Federico Ruggeri, Alberto Barrón-Cedeño
- 
    Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors 
 Mayank Nagda, Phil Ostheimer, Sophie Fellenz
- 
    When natural language is not enough: The limits of in-context learning demonstrations in multilingual reasoning 
 Leonardo Ranaldi, Barry Haddow, Alexandra Birch
- 
    ThoughtSculpt: Reasoning with Intermediate Revision and Search 
 Yizhou Chi, Kevin Yang, Dan Klein
- 
    Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias 
 Yuen Chen, Vethavikashini Chithrra Raghuram, Justus Mattern, Rada Mihalcea, Zhijing Jin
- 
    Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection 
 Yassine El Kheir, Younes Samih, Suraj Maharjan, Tim Polzehl, Sebastian Möller
- 
    SimSMoE: Toward Efficient Training Mixture of Experts via Solving Representational Collapse 
 Giang Do, Hung Le, Truyen Tran
- 
    ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding 
 Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Yonas Chanie, Bontu Fufa Balcha, Negasi Haile Abadi, Henok Biadglign Ademtew, Mulubrhan Abebe Nerea, Debela Desalegn Yadeta, Derartu Dagne Geremew, Assefa Atsbiha Tesfu, Philipp Slusallek, Thamar Solorio, Dietrich Klakow
- 
    Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference 
 Mingqi Gao, Yixin Liu, Xinyu Hu, Xiaojun Wan, Jonathan Bragg, Arman Cohan
- 
    A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models 
 Peiqin Lin, Andre Martins, Hinrich Schuetze
- 
    Adaptive Parameter Compression for Language Models 
 Jeremias Bohn, Frederic Mrozinski, Georg Groh
- 
    Unlocking the Planning Capabilities of Large Language Models with Maximum Diversity Fine-tuning 
 Wenjun Li, Changyu Chen, Pradeep Varakantham
- 
    GRAG: Graph Retrieval-Augmented Generation 
 Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao
- 
    Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning 
 Ahmed Elshabrawy, Yongxin Huang, Iryna Gurevych, Alham Fikri Aji
- 
    Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey 
 Ruiyao Xu, Kaize Ding
- 
    Claim-Guided Textual Backdoor Attack for Practical Applications 
 Minkyoo Song, Hanna Kim, Jaehan Kim, Youngjin Jin, Seungwon Shin
- 
    Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference 
 Go Kamoda, Benjamin Heinzerling, Tatsuro Inaba, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui
- 
    Exploring Large Language Models for Hate Speech Detection in Rioplatense Spanish 
 Juan Manuel Pérez, Paula Miguel, Viviana Cotik
- 
    Preserving Zero-shot Capability in Supervised Fine-tuning for Multi-label Text Classification 
 Si-An Chen, Hsuan-Tien Lin, Chih-Jen Lin
- 
    CodeRAG-Bench: Can Retrieval Augment Code Generation? 
 Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried
- 
    MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding 
 Zayd Muhammad Kawakibi Zuhri, Muhammad Farid Adilazuarda, Ayu Purwarianti, Alham Fikri Aji
- 
    Neuro-symbolic Training for Reasoning over Spatial Language 
 Tanawan Premsri, Parisa Kordjamshidi
- 
    Ask Optimal Questions: Aligning Large Language Models with Retriever’s Preference in Conversation 
 Chanwoong Yoon, Gangwoo Kim, Byeongguk Jeon, Sungdong Kim, Yohan Jo, Jaewoo Kang
- 
    Identifying and Mitigating Social Bias Knowledge in Language Models 
 Ruizhe Chen, Yichen Li, Jianfei Yang, YANG FENG, Joey Tianyi Zhou, Jian Wu, Zuozhu Liu
- 
    Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding 
 Sukmin Cho, Sangjin Choi, Taeho Hwang, Jeongyeon Seo, Soyeong Jeong, Huije Lee, Hoyun Song, Jong C. Park, Youngjin Kwon
- 
    RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation 
 Viacheslav Vasilev, Julia Agafonova, Nikolai Gerasimenko, Alexander Kapitanov, Polina Mikhailova, Evelina Mironova, Denis Dimitrov
- 
    Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence 
 Qianren Mao, Weifeng Jiang, Junnan Liu, Chenghua Lin, Qian Li, Xianqing Wen, Jianxin Li, Jinhu Lu
- 
    MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents 
 Wanqi Yang, Yanda Li, Meng Fang, Ling Chen
- 
    GraphICL: Unlocking Graph Learning Potential in LLMs through Structured Prompt Design 
 Yuanfu Sun, Zhengnan Ma, Yi Fang, Jing Ma, Qiaoyu Tan
- 
    Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation 
 CheolWon Na, YunSeok Choi, Jee-Hyong Lee
- 
    Unsupervised Speech-text word-level alignment with Dynamic Programming 
 Tianshu Yu, Zihan Gong, Minghuan Tan, Guhong Chen, Min Yang
- 
    Semi-supervised Fine-tuning for Large Language Models 
 Junyu Luo, Xiao Luo, Xiusi Chen, Zhiping Xiao, Wei Ju, Ming Zhang
- 
    An Optimizable Suffix Is Worth A Thousand Templates: Efficient Black-box Jailbreaking without Affirmative Phrases via LLM as Optimizer 
 Weipeng Jiang, Zhenting Wang, Juan Zhai, Shiqing Ma, Zhengyu Zhao, Chao Shen
- 
    From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models 
 Harsh Nishant Lalai, Aashish Anantha Ramakrishnan, Raj Sanjay Shah, Dongwon Lee
- 
    KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents 
 Yuqi Zhu, Shuofei Qiao, Yixin Ou, Shumin Deng, Shiwei Lyu, YUE SHEN, Lei Liang, Jinjie GU, Huajun Chen, Ningyu Zhang
- 
    LOFT: Scalable and More Realistic Long-Context Evaluation 
 Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Séb Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu
- 
    Prompt-Guided Selective Masking Loss for Context-Aware Emotive Text-to-Speech 
 Yejin Jeon, Youngjae Kim, Jihyun Lee, Gary Lee
- 
    On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems 
 Juraj Vladika, Florian Matthes
- 
    Enhancing Adversarial Transferability in Visual-Language Pre-training Models via Local Shuffle and Sample-based Attack 
 Xin Liu, Aoyang Zhou, Kun He
- 
    MojoBench: Language Modeling and Benchmarks for Mojo 
 Md Nishat Raihan, Joanna C. S. Santos, Marcos Zampieri
- 
    Decoding Fatphobia: Examining Anti-Fat and Pro-Thin Bias in AI-Generated Images 
 Jane Warren, Gary M. Weiss, Fernando Martinez, Annika Guo, Yijun Zhao
- 
    DialogGen: Multi-modal Interactive Dialogue System with Multi-turn Text-Image Generation 
 Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu
- 
    A Practical Examination of AI-Generated Text Detectors for Large Language Models 
 Brian Tufts, Xuandong Zhao, Lei Li
- 
    Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs 
 Yiyang Luo, Ke Lin, Chao Gu, Jiahui Hou, Lijie Wen, Luo ping
- 
    GrEmLIn: A Repository of Green Baseline Embeddings for 87 Low-Resource Languages Injected with Multilingual Graph Knowledge 
 Daniil Gurgurov, Rishu Kumar, Simon Ostermann
- 
    Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context 
 Sangwon Yu, Ik-hwan Kim, Jongyoon Song, Saehyung Lee, Junsung Park, Sungroh Yoon
- 
    Aligning to Constraints for Data-Efficient Language Model Customization 
 Fei Wang, Chao Shang, Shuai Wang, Sarthak Jain, Qiang Ning, Bonan Min, Vittorio Castelli, Yassine Benajiba, Dan Roth
- 
    DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization 
 Pucheng Dang, Xing Hu, Dong Li, Rui Zhang, Qi Guo, Kaidi Xu
- 
    VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs 
 Hanan Gani, Rohit Bharadwaj, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan
- 
    Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models 
 Shintaro Ozaki, Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
- 
    Personalize Your LLM: Fake it then Align it 
 Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala
- 
    LLMs for Extremely Low-Resource Finno-Ugric Languages 
 Taido Purason, Hele-Andra Kuulmets, Mark Fishel
- 
    TESTEVAL: Benchmarking Large Language Models for Test Case Generation 
 Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, LINGMING ZHANG, An Ran Chen, Lei Ma
- 
    Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Senses 
 Samuel Cahyawijaya, Ruochen Zhang, Jan Christian Blaise Cruz, Holy Lovenia, Elisa Gilbert, Hiroki Nomoto, Alham Fikri Aji
- 
    Exploring Hybrid Sampling Inference for Aspect-based Sentiment Analysis 
 Xiaoyi Bao, Minjie Qiang, Jinghang Gu, Zhongqing Wang, Chu-Ren Huang
- 
    Augmented Adversarial Trigger Learning 
 Zhe Wang, Yanjun Qi
- 
    Open Domain Question Answering with Conflicting Contexts 
 Siyi Liu, Qiang Ning, Kishaloy Halder, Zheng Qi, Wei Xiao, Phu Mon Htut, Yi Zhang, Neha Anna John, Bonan Min, Yassine Benajiba, Dan Roth
- 
    OLMES: A Standard for Language Model Evaluations 
 Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi
- 
    Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model 
 Jiali Chen, Xusen Hei, Yuqi Xue, Zihan Wu, Jiayuan Xie, Yi Cai
- 
    Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax 
 Iuliia Zaitova, Vitalii Hirak, Badr M. Abdullah, Dietrich Klakow, Bernd Möbius, Tania Avgustinova
- 
    Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey 
 Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang
- 
    On Reference (In-)Determinacy in Natural Language Inference 
 Sihao Chen, Chaitanya Malaviya, Alex Fabrikant, Hagai Taitelbaum, Tal Schuster, Senaka Buthpitiya, Dan Roth
- 
    FaithfulPersona: Balancing Faithfulness and Personalization in Code Explanations through Self-Critique 
 Zhuang Luo, Yichuan Li, Zexing Xu, Kyumin Lee, S. Rasoul Etesami
- 
    Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs 
 Abdellah El Mekki, Muhammad Abdul-Mageed
- 
    Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation 
 Chenyu Wang, Weichao Zhou, Shantanu Ghosh, Kayhan Batmanghelich, Wenchao Li
- 
    Automatic Annotation Augmentation Boosts Translation between Molecules and Natural Language 
 Zhiqiang Zhong, Simon Sataa-Yu Larsen, Haoyu Guo, Tao Tang, Kuangyu Zhou, Davide Mottin
- 
    LogRules: Enhancing Log Analysis Capability of Large Language Models through Rules 
 Xin Huang, Ting Zhang, Wen Zhao
- 
    KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus 
 Xiaoming Shi, Zeming Liu, Yiming Lei, Chenkai Zhang, Haitao Leng, Chuan Wang, Qingjie Liu, Wanxiang Che, Yunhong Wang
- 
    ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification 
 Yaswanth M, Vaibhav Singh, Ayush Maheshwari, Amrith Krishna, Ganesh Ramakrishnan
- 
    BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks 
 Hanyong Lee, Chaelyn Lee, Yongjae Lee, Jaesung Lee
- 
    Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization 
 Yen-Ju Lu, Ting-Yao Hu, Hema Swetha Koppula, Hadi Pouransari, Jen-Hao Rick Chang, Yin Xia, Xiang Kong, Qi Zhu, Xiaoming Simon Wang, Oncel Tuzel, Raviteja Vemulapalli
- 
    Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling 
 Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang
- 
    GPT-4V Cannot Generate Radiology Reports Yet 
 Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan
- 
    Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations 
 Hao Yang, Hongyuan Lu, Xinhua Zeng, Yang Liu, Xiang Zhang, Haoran Yang, Yumeng Zhang, Shan Huang, Yiran Wei, Wai Lam
- 
    CAPE: A Chinese Dataset for Appraisal-based Emotional Generation in Large Language Models 
 June M. Liu, He Cao, Renliang Sun, Rui Wang, Yu Li, Jiaxing Zhang
- 
    Towards Zero-Shot Multimodal Machine Translation 
 Matthieu Futeral, Cordelia Schmid, Benoît Sagot, Rachel Bawden
- 
    Aligning to What? Limits to RLHF Based Alignment 
 Logan Barnhart, Reza Akbarian Bafghi, Stephen Becker, Maziar Raissi
- 
    CausalGraph2LLM: Evaluating LLMs for Causal Queries 
 Ivaxi Sheth, Bahare Fatemi, Mario Fritz
- 
    RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering 
 Yang Bai, Christan Grant, Daisy Zhe Wang
- 
    On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation 
 Xiaonan Jing, Srinivas Billa, Danny Godbout
- 
    SEP-MLDC: A Simple and Effective Paradigm for Multi-Label Document Classification 
 Han Liu, Shuqin Li, Xiaotong Zhang, Yuanyuan Wang, Feng Zhang, Hongyang Chen, Hong Yu
- 
    A Closer Look into Mixture-of-Experts in Large Language Models 
 Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu
- 
    Do Large Language Models Align with Core Mental Health Counseling Competencies? 
 Viet Cuong Nguyen, Mohammad Taher, Dongwan Hong, Vinicius Konkolics Possobom, Vibha Thirunellayi Gopalakrishnan, Ekta Raj, Zihang Li, Heather J. Soled, Michael L. Birnbaum, Srijan Kumar, Munmun De Choudhury
- 
    Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification 
 Vishnu Kabir Chhabra, Ding Zhu, Mohammad Mahdi Khalili
- 
    Long-Tail Crisis in Nearest Neighbor Language Models 
 Yuto Nishida, Makoto Morishita, Hiroyuki Deguchi, Hidetaka Kamigaito, Taro Watanabe
- 
    Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias 
 Andres Algaba, Carmen Mazijn, Vincent Holst, Floriano Tori, Sylvia Wenmackers, Vincent Ginis
- 
    BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression 
 Yuankai Li, Jia-Chen Gu, Di Wu, Kai-Wei Chang, Nanyun Peng
- 
    TrendSim: Simulating Trending Topics in Social Media Under Poisoning Attacks with LLM-based Multi-agent System 
 Zeyu Zhang, Jianxun Lian, Chen Ma, Yaning Qu, Ye Luo, Lei Wang, Rui Li, Xu Chen, Yankai Lin, Le Wu, Xing Xie, Ji-Rong Wen
- 
    VLind-Bench: Measuring Language Priors in Large Vision-Language Models 
 Kang-il Lee, Minbeom Kim, Seunghyun Yoon, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung
- 
    Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge 
 Lian Remme, Kevin Tang
- 
    From Argumentation to Deliberation: Perspectivized Stance Vectors for Fine-grained (Dis)agreement Analysis 
 Moritz Plenz, Philipp Heinisch, Janosch Gehring, Philipp Cimiano, Anette Frank
- 
    Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies 
 Yingqiang Gao, Lukas Fischer, Alexa Lintner, Sarah Ebling
- 
    UCL-Bench: A Chinese User-Centric Legal Benchmark for Large Language Models 
 Ruoli Gan, Duanyu Feng, Chen Zhang, Zhihang Lin, Haochen Jia, Hao Wang, Zhenyang Cai, Lei Cui, Qianqian Xie, Jimin Huang, Benyou Wang
- 
    GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets 
 Qiming Wu, Zichen Chen, Will Corcoran, Misha Sra, Ambuj Singh
- 
    UniRAG: Universal Retrieval Augmentation for Large Vision Language Models 
 Sahel Sharifymoghaddam, Shivani Upadhyay, Wenhu Chen, Jimmy Lin
- 
    DDGIP: Radiology Report Generation Through Disease Description Graph and Informed Prompting 
 Chentao Huang, Guangli Li, Xinjiong Zhou, Yafeng Ren, Hongbin Zhang
- 
    Linguistically Grounded Analysis of Language Models using Shapley Head Values 
 Marcell Fekete, Johannes Bjerva
- 
    Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances 
 Zehui Wu, Ziwei Gong, Lin Ai, Pengyuan Shi, Kaan Donbekci, Julia Hirschberg
- 
    A Context-Aware Contrastive Learning Framework for Hateful Meme Detection and Segmentation 
 Xuanyu Su, Yansong Li, Diana Inkpen, Nathalie Japkowicz
- 
    Syntriever: How to Train Your Retriever with Synthetic Data from LLMs 
 Minsang Kim, Seung Jun Baek
- 
    Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning 
 Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li
- 
    Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation 
 Luca Moroni, Giovanni Puccetti, Pere-Lluís Huguet Cabot, Andrei Stefan Bejgu, Alessio Miaschi, Edoardo Barba, Felice Dell’Orletta, Andrea Esuli, Roberto Navigli
- 
    LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models 
 Zhenyue Qin, Yu Yin, Dylan Campbell, Xuansheng Wu, Ke Zou, Ninghao Liu, Yih Chung Tham, Xiuzhen Zhang, Qingyu Chen
- 
    ConShift: Sense-based Language Variation Analysis using Flexible Alignment 
 Clare Arrington, Mauricio Gruppi, Sibel Adali
- 
    OpenBioNER: Lightweight Open-Domain Biomedical Named Entity Recognition Through Entity Type Description 
 Alessio Cocchieri, Giacomo Frisoni, Marcos Martínez Galindo, Gianluca Moro, Giuseppe Tagliavini, Francesco Candoli
- 
    Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step 
 Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
- 
    MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing 
 Vlad Andrei Negru, Robert Vacareanu, Camelia Lemnaru, Mihai Surdeanu, Rodica Potolea
- 
    InstructAny2Pix: Image Editing with Multi-Modal Prompts 
 Shufan Li, Harkanwar Singh, Aditya Grover
- 
    LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain 
 Joel Niklaus, Lucia Zheng, Arya D. McCarthy, Christopher Hahn, Brian M Rosen, Peter Henderson, Daniel E. Ho, Garrett Honke, Percy Liang, Christopher D Manning
- 
    Where is this coming from? Making groundedness count in the evaluation of Document VQA models 
 Armineh Nourbakhsh, Siddharth Parekh, Pranav Shetty, Zhao Jin, Sameena Shah, Carolyn Rose
- 
    Data Poisoning for In-context Learning 
 Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, Jiliang Tang
- 
    Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models 
 Jialiang Wu, Yi Shen, Sijia Liu, Yi Tang, Sen Song, Xiaoyi Wang, Longjun Cai
- 
    Transformer-based Causal Language Models Perform Clustering 
 Xinbo Wu, Lav R. Varshney
- 
    Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description 
 Mahshid Dehghani, Amirahmad Shafiee, Ali Shafiei, Neda Fallah, Farahmand Alizadeh, Mohammad Mehdi Gholinejad, Hamid Behroozi, Jafar Habibi, Ehsaneddin Asgari
- 
    Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models 
 Yuyi Huang, Runzhe Zhan, Derek F. Wong, Lidia S. Chao, Ailin Tao
- 
    MedEureka: A Medical Domain Benchmark for Multi-Granularity and Multi-Data-Type Embedding-Based Retrieval 
 Yongqi Fan, Nan Wang, Kui Xue, Jingping Liu, Tong Ruan
- 
    CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis 
 Saranya Venkatraman, Nafis Irtiza Tripto, Dongwon Lee
- 
    SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia 
 Chaoqun Liu, Wenxuan Zhang, Jiahao Ying, Mahani Aljunied, Anh Tuan Luu, Lidong Bing
- 
    MRE-MI: A Multi-image Dataset for Multimodal Relation Extraction in Social Media Posts 
 Shizhou Huang, Bo Xu, Changqun Li, Yang Yu, Xin Alex Lin
- 
    GuideQ: Framework for Guided Questioning for progressive informational collection and classification 
 Priya Mishra, Suraj Racha, Kaustubh Ponkshe, Adit Akarsh, Ganesh Ramakrishnan
- 
    DHP Benchmark: Are LLMs Good NLG Evaluators? 
 Yicheng Wang, Jiayi Yuan, Yu-Neng Chuang, Zhuoer Wang, Yingchi Liu, Mark Cusick, Param Kulkarni, Zhengping Ji, Yasser Ibrahim, Xia Hu
- 
    LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy 
 Zhiwen Ruan, Yixia Li, He Zhu, Longyue Wang, Weihua Luo, Kaifu Zhang, Yun Chen, Guanhua Chen
- 
    Let Modalities Teach Each Other: Modal-Collaborative Knowledge Extraction and Fusion for Multimodal Knowledge Graph Completion 
 Guoliang Zhu, Tao Ren, Dandan Wang, Jun Hu
- 
    Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm 
 Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin
- 
    Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents 
 Shrinidhi Kumbhar, Venkatesh Mishra, Kevin Coutinho, Divij Handa, Ashif Iquebal, Chitta Baral
- 
    On the Impact of Noise in Differentially Private Text Rewriting 
 Stephen Meisenbacher, Maulik Chevli, Florian Matthes
- 
    Multi-Stage LLM Fine-Tuning with a Continual Learning Setting 
 Changhao Guan, Chao Huang, Hongliang Li, You Li, Ning Cheng, Zihe Liu, Yufeng Chen, Jinan Xu, Jian Liu
- 
    Learning to Search Effective Example Sequences for In-Context Learning 
 Xiang Gao, Ankita Sinha, Kamalika Das
- 
    Representation-to-Creativity (R2C): Automated Holistic Scoring Model for Essay Creativity 
 Deokgi Kim, Joonyoung Jo, Byung-Won On, Ingyu Lee
- 
    BioEL: A Comprehensive Python Package for Biomedical Entity Linking 
 Prasanth Bathala, Christophe Ye, Batuhan Nursal, Shubham Lohiya, David Kartchner, Cassie S. Mitchell
- 
    Understanding Reference Policies in Direct Preference Optimization 
 Yixin Liu, Pengfei Liu, Arman Cohan
- 
    Dynamic Guided and Domain Applicable Safeguards for Enhanced Security in Large Language Models 
 Weidi Luo, He Cao, Zijing Liu, Yu Wang, Aidan Wong, Bin Feng, Yuan Yao, Yu Li
- 
    MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens 
 Yongqi Fan, Hongli Sun, Kui Xue, Xiaofan Zhang, Shaoting Zhang, Tong Ruan
- 
    Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning 
 Venkatesh Mishra, Bimsara Pathiraja, Mihir Parmar, Sat Chidananda, Jayanth Srinivasa, Gaowen Liu, Ali Payani, Chitta Baral
- 
    Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning 
 Joy Crosbie, Ekaterina Shutova
- 
    NOTA: Multimodal Music Notation Understanding for Visual Large Language Model 
 Mingni Tang, Jiajia Li, Lu Yang, Zhiqiang Zhang, Jinhao Tian, Zuchao Li, Lefei Zhang, Ping Wang
- 
    MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows 
 Xingjian Zhang, Yutong Xie, Jin Huang, Jinge Ma, Zhaoying Pan, Qijia Liu, Ziyang Xiong, Tolga Ergen, Dongsub Shim, Honglak Lee, Qiaozhu Mei
- 
    COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning 
 Yuelin Bai, Xeron Du, Yiming Liang, Leo Jin, Junting Zhou, Ziqiang Liu, Feiteng Fang, Mingshan Chang, Tianyu Zheng, Xincheng Zhang, Nuo Ma, Zekun Moore Wang, Ruibin Yuan, Haihong Wu, Hongquan Lin, Wenhao Huang, Jiajun Zhang, Chenghua Lin, Jie Fu, Min Yang, Shiwen Ni, Ge Zhang
- 
    LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education 
 Iain Weissburg, Sathvika Anand, Sharon Levy, Haewon Jeong
- 
    ResoFilter: Fine-grained Synthetic Data Filtering for Large Language Models through Data-Parameter Resonance Analysis 
 Zeao Tu, Xiangdi Meng, Yu He, Zihan Yao, Tianyu Qi, Jun Liu, Ming Li
- 
    Text Annotation via Inductive Coding: Comparing Human Experts to LLMs in Qualitative Data Analysis 
 Angelina Parfenova, Andreas Marfurt, Jürgen Pfeffer, Alexander Denzler
- 
    Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection 
 Myrthe Reuver, Indira Sen, Matteo Melis, Gabriella Lapesa
- 
    FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG 
 Xinping Zhao, Yan Zhong, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Dongfang Li, Baotian Hu, Min Zhang
- 
    GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation 
 Runchuan Zhu, Xinke Jiang, Jiang Wu, Zhipeng Ma, Jiahe Song, Fengshuo Bai, Dahua Lin, Lijun Wu, Conghui He
- 
    LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models 
 Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu
- 
    Evaluating Numeracy of Language Models as a Natural Language Inference Task 
 Rahmad Mahendra, Damiano Spina, Lawrence Cavedon, Karin Verspoor
- 
    PRDetect: Perturbation-Robust LLM-generated Text Detection Based on Syntax Tree 
 Xiang Li, Zhiyi Yin, Hexiang Tan, Shaoling Jing, Du Su, Yi Cheng, Huawei Shen, Fei Sun
- 
    Self-Training Large Language Models for Tool-Use Without Demonstrations 
 Ne Luo, Aryo Pradipta Gema, Xuanli He, Emile van Krieken, Pietro Lesci, Pasquale Minervini
- 
    Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages 
 Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya
- 
    Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction 
 Kritarth Prasad, Mohammadi Zaki, Pratik Rakesh Singh, Pankaj Wasnik
- 
    INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages 
 Abhishek Kumar Singh, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen, Ashish Mittal, Ganesh Ramakrishnan
- 
    CollagePrompt: A Benchmark for Budget-Friendly Visual Recognition with GPT-4V 
 Siyu Xu, Yunke Wang, Daochang Liu, Bo Du, Chang Xu
- 
    CodeSim: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging 
 Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez
- 
    Multimodal Generation with Consistency Transferring 
 Junxiang Qiu, Jinda Lu, Shuo Wang
- 
    An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning 
 Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly
- 
    People will agree what I think: Investigating LLM’s False Consensus Effect 
 Junhyuk Choi, Yeseon Hong, Bugeun Kim
- 
    Entity Pair-guided Relation Summarization and Retrieval in LLMs for Document-level Relation Extraction 
 Fu Zhang, Hongsen Yu, Jingwei Cheng, Huangming Xu
- 
    How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise in Machine Translation 
 Yan Meng, Di Wu, Christof Monz
- 
    Token Weighting for Long-Range Language Modeling 
 Falko Helm, Nico Daheim, Iryna Gurevych
- 
    Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction 
 Shengbin Yue, Ting Huang, Zheng Jia, Siyuan Wang, Shujun Liu, Yun Song, Xuanjing Huang, Zhongyu Wei
- 
    Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models 
 Atharva Mehta, Shivam Chauhan, Amirbek Djanibekov, Atharva Kulkarni, Gus Xia, Monojit Choudhury
- 
    Hard Emotion Test Evaluation Sets for Language Models 
 Tiberiu Sosea, Cornelia Caragea
- 
    RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process 
 Peiran Wang, Xiaogeng Liu, Chaowei Xiao
- 
    Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum 
 Ryan Soh-Eun Shim, Barbara Plank
- 
    Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation 
 Yu Wang, Jiaxin Zhang, Xiang Gao, Wendi Cui, Peng Li, Kamalika Das
- 
    RewardBench: Evaluating Reward Models for Language Modeling 
 Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James Validad Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
- 
    LLM-Generated Passphrases That Are Secure and Easy to Remember 
 Jie S. Li, Jonas Geiping, Micah Goldblum, Aniruddha Saha, Tom Goldstein
- 
    Analysis of LLM as a grammatical feature tagger for African American English 
 Rahul Porwal, Alice Rozet, Jotsna Gowda, Pryce Houck, Kevin Tang, Sarah Moeller
- 
    Constraining Sequential Model Editing with Editing Anchor Compression 
 Hao-Xiang Xu, Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu
- 
    Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation 
 Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee
- 
    Rationale Behind Essay Scores: Enhancing S-LLM’s Multi-Trait Essay Scoring with Rationale Generated by LLMs 
 SeongYeub Chu, Jong Woo Kim, Bryan Wong, Mun Yong Yi
- 
    Synonym-unaware Fast Adversarial Training against Textual Adversarial Attacks 
 Yichen Yang, Xin Liu, Kun He
- 
    A Comprehensive Survey of Contemporary Arabic Sentiment Analysis: Methods, Challenges, and Future Directions 
 Zhiqiang Shi, Ruchit Agrawal
- 
    MAiDE-up: Multilingual Deception Detection of AI-generated Hotel Reviews 
 Oana Ignat, Xiaomeng Xu, Rada Mihalcea
- 
    From Lazy to Prolific: Tackling Missing Labels in Open Vocabulary Extreme Classification by Positive-Unlabeled Sequence Learning 
 Ranran Haoran Zhang, Bensu Uçar, Soumik Dey, Hansi Wu, Binbin Li, Rui Zhang
- 
    LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification 
 Shubham Kumar Nigam, Tanmay Dubey, Govind Sharma, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya
- 
    Using Review Combination and Pseudo-Tokens for Aspect Sentiment Quad Prediction 
 Jiazhou Chen, Xu Jia, RuiQiang Guo
- 
    Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy 
 Tunazzina Islam, Dan Goldwasser
- 
    Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization 
 Weiqi Wu, Shen Huang, Yong Jiang, Pengjun Xie, Fei Huang, Hai Zhao
- 
    SynGhost: Invisible and Universal Task-agnostic Backdoor Attack via Syntactic Transfer 
 Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Zhuosheng Zhang, Gongshen Liu
- 
    HEISIR: Hierarchical Expansion of Inverted Semantic Indexing for Training-free Retrieval of Conversational Data using LLMs 
 Sangyeop Kim, Hangyeul Lee, Yohan Lee
- 
    Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement 
 Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Taha Kass-Hout, Furong Huang, Cao Xiao
- 
    TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data 
 Jipeng Zhang, Yaxuan Qin, Renjie Pi, Weizhong Zhang, Rui Pan, Tong Zhang
- 
    From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization 
 Catarina G Belém, Pouya Pezeshkpour, Hayate Iso, Seiji Maekawa, Nikita Bhutani, Estevam Hruschka
- 
    DisComp: A Two-Stage Prompt Optimization Framework Combining Task-Agnostic and Task-Aware Compression 
 Liu Quancai, Haihui Fan, Jinchao Zhang, lixiangfang, Lichuanrong, Bo Li
- 
    MoLA: MoE LoRA with Layer-wise Expert Allocation 
 Chongyang Gao, Kezhen Chen, Jinmeng Rao, Ruibo Liu, Baochen Sun, Yawen Zhang, Daiyi Peng, Xiaoyuan Guo, VS Subrahmanian
- 
    SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models 
 Jahyun Koo, Yerin Hwang, Yongil Kim, Taegwan Kang, Hyunkyung Bae, Kyomin Jung
- 
    UNLEARN: Efficient Removal of Knowledge in Large Language Models 
 Tyler Lizzo, Larry Heck
- 
    RetrieverGuard: Empowering Information Retrieval to Combat LLM-Generated Misinformation 
 Chuwen Chen, Shuai Zhang
- 
    ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning 
 Millennium Bismay, Xiangjue Dong, James Caverlee
- 
    TagGen: Enforcing Syntactic Structures with Tag-Based Control 
 Vicky Xefteri, Afra Amini, Tim Vieira, Ryan Cotterell
- 
    Uncertainty Quantification for Clinical Outcome Predictions with (Large) Language Models 
 Zizhang Chen, Peizhao Li, Xiaomeng Dong, Pengyu Hong
- 
    SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis 
 Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Wang Changxin, Zhifeng Gao, Hongshuai Wang, Li Yongge, Mujie Lin, Shuwen Yang, Jiankun Wang, Mingjun Xu, Jin Huang, Xi Fang, Jiaxi Zhuang, Yuqi Yin, Yaqi Li, Changhong Chen, Zheng Cheng, Zifeng Zhao, Linfeng Zhang, Guolin Ke
- 
    PLD+: Accelerating LLM Inference by Leveraging Language Model Artifacts 
 Shwetha Somasundaram, Anirudh Phukan, Apoorv Saxena
- 
    A Federated Framework for LLM-based Recommendation 
 Jujia Zhao, Wenjie Wang, Chen Xu, See-Kiong Ng, Tat-Seng Chua
- 
    “All that Glitters”: Techniques for Evaluations with Unreliable Model and Human Annotations 
 Michael Hardy
- 
    Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models 
 Hongbang Yuan, Yubo Chen, Pengfei Cao, Zhuoran Jin, Kang Liu
- 
    Language-based Valence and Arousal Expressions between the United States and China: a Cross-Cultural Examination 
 Young Min Cho, Dandan Pang, Stuti Thapa, Garrick Sherman, Lyle Ungar, Louis Tay, Sharath Chandra Guntuku
- 
    Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Models 
 Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang
- 
    DiPT: Enhancing LLM Reasoning through Diversified Perspective-Taking 
 Hoang Anh Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia
- 
    DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications 
 Sathya Krishnan Suresh, Wu Mengjun, Tushar Pranav, EngSiong Chng
- 
    Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models 
 Jiachen Ma, Yijiang Li, Zhiqing Xiao, Anda Cao, Jie Zhang, Chao Ye, Junbo Zhao
- 
    LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models 
 Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang
- 
    SFMSS: Service Flow aware Medical Scenario Simulation for Conversational Data Generation 
 Zhijie Bao, Qingyun Liu, Xuanjing Huang, Zhongyu Wei
- 
    DomainSum: A Hierarchical Benchmark for Fine-Grained Domain Shift in Abstractive Text Summarization 
 Haohan Yuan, Haopeng Zhang
- 
    On the Feasibility of In-Context Probing for Data Attribution 
 Cathy Jiao, Weizhen Gao, Aditi Raghunathan, Chenyan Xiong
- 
    Modeling the Differential Prevalence of Online Supportive Interactions in Private Instant Messages of Adolescents 
 Ondrej Sotolar, Michał Tkaczyk, Jaromír Plhák, David Smahel
- 
    CLERC: A Dataset for U.S. Legal Case Retrieval and Retrieval-Augmented Analysis Generation 
 Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme
- 
    Can I Introduce My Boyfriend to My Grandmother? Evaluating Large Language Models Capabilities on Iranian Social Norm Classification 
 Hamidreza Saffari, Mohammadamin Shafiei, Donya Rooein, Francesco Pierri, Debora Nozza
- 
    DOLFIN - Document-Level Financial Test-Set for Machine Translation 
 Mariam Nakhle, Marco Dinarelli, Raheel Qader, Emmanuelle Esperança-Rodier, Hervé Blanchon
- 
    Dynamic Strategy Planning for Efficient Question Answering with Large Language Models 
 Tanmay Parekh, Pradyot Prakash, Alexander Radovic, Akshay Shekher, Denis Savenkov
- 
    Enhancing the Prototype Network with Local-to-Global Optimization for Few-Shot Relation Extraction 
 Hui Sun, Rongxin Chen
- 
    UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model 
 Zhaowei Li, Wei Wang, YiQing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang
- 
    From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs 
 Navya Jain, Zekun Wu, Cristian Enrique Munoz Villalobos, Airlie Hilliard, Xin Guan, Adriano Koshiyama, Emre Kazim, Philip Colin Treleaven
- 
    SimulBench: Evaluating Language Models with Creative Simulation Tasks 
 Qi Jia, Xiang Yue, Tuney Zheng, Jie Huang, Bill Yuchen Lin
- 
    Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs 
 Aarón Galiano-Jiménez, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena
- 
    PREMISE: Matching-based Prediction for Accurate Review Recommendation 
 Wei Han, Hui Chen, Soujanya Poria
- 
    LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers 
 Anton Razzhigaev, Matvey Mikhalchuk, Temurbek Rahmatullaev, Elizaveta Goncharova, Polina Druzhinina, Ivan Oseledets, Andrey Kuznetsov
- 
    Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots 
 Chengyue Wu, Zhixuan Liang, Yixiao Ge, Qiushan Guo, Zeyu Lu, Jiahao Wang, Ying Shan, Ping Luo
- 
    How to Talk to Language Models: Serialization Strategies for Structured Entity Matching 
 Haoteng Yin, Jinha Kim, Prashant Mathur, Krishanu Sarker, Vidit Bansal
- 
    Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency 
 Leonidas Gee, Milan Gritta, Gerasimos Lampouras, Ignacio Iacobacci
- 
    Overcoming both Domain Shift and Label Shift for Referring Video Segmentation 
 Hai Huang, Sashuai Zhou, Yan Xia
- 
    Verifiable Format Control for Large Language Model Generations 
 Zhaoyang Wang, Jinqi Jiang, Huichi Zhou, Wenhao Zheng, Xuchao Zhang, Chetan Bansal, Huaxiu Yao
- 
    The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection 
 Tomáš Horych, Christoph Mandl, Terry Ruas, Andre Greiner-Petter, Bela Gipp, Akiko Aizawa, Timo Spinde
- 
    Can LLMs Learn Macroeconomic Narratives from Social Media? 
 Almog Gueta, Amir Feder, Zorik Gekhman, Ariel Goldstein, Roi Reichart
- 
    MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation 
 Chanhee Park, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
- 
    A Large-Scale Benchmark for Vietnamese Sentence Paraphrases 
 Sang Quang Nguyen, Kiet Van Nguyen
- 
    Omni-Chart-600K: A Comprehensive Dataset of Chart Types for Chart Understanding 
 Shulei Wang, Shuai Yang, Wang Lin, Zirun Guo, Sihang Cai, Hai Huang, Ye Wang, Jingyuan Chen, Tao Jin
- 
    Rejected Dialects: Biases Against African American Language in Reward Models 
 Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, Maarten Sap
- 
    ChatCRS: Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems 
 Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li
- 
    Echoes of Discord: Forecasting Hater Reactions to Counterspeech 
 Xiaoying Song, Sharon Lisseth Perez, Xinchen Yu, Eduardo Blanco, Lingzi Hong
- 
    The American Sign Language Knowledge Graph: Infusing ASL Models with Linguistic Knowledge 
 Lee Kezar, Nidhi Munikote, Zian Zeng, Zed Sehyr, Naomi Caselli, Jesse Thomason
- 
    Evaluation of Multilingual Image Captioning: How far can we get with CLIP models? 
 Goncalo Emanuel Cavaco Gomes, Chrysoula Zerva, Bruno Martins
- 
    Multilingual Blending: Large Language Model Safety Alignment Evaluation with Language Mixture 
 Jiayang Song, Yuheng Huang, Zhehua Zhou, Lei Ma
- 
    GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration 
 Ziwen Li, Xiang Chen, Youngseung Jeon
- 
    Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data 
 Juanhui Li, Sreyashi Nag, Hui Liu, Xianfeng Tang, Sheikh Muhammad Sarwar, Limeng Cui, Hansu Gu, Suhang Wang, Qi He, Jiliang Tang
- 
    Large-Scale Corpus Construction and Retrieval-Augmented Generation for Ancient Chinese Poetry: New Method and Data Insights 
 Yang Liu, Lan Lan, Jiahuan Cao, Hiuyi Cheng, Kai Ding, Lianwen Jin
- 
    Rethinking Smoothness for Fast and Adaptable Entity Alignment Decoding 
 Yuanyi Wang, Han Li, Haifeng Sun, Lei Zhang, Bo He, Wei Tang, Tianhao Yan, Qi Qi, Jingyu Wang
- 
    In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation 
 Armel Randy Zebaze, Benoît Sagot, Rachel Bawden
- 
    Joint Learning Event-Specific Probe and Argument Library with Differential Optimization for Document-Level Multi-Event Extraction 
 Jianpeng Hu, Chao Xue, Chunqing Yu, JiaCheng Xu, Chengxiang Tan
- 
    Zero-Shot Strategies for Length-Controllable Summarization 
 Fabian Retkowski, Alexander Waibel
- 
    Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models 
 Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee, Chanjun Park, Heuiseok Lim
- 
    MedThink: A Rationale-Guided Framework for Explaining Medical Visual Question Answering 
 Xiaotang Gai, Chenyi Zhou, Jiaxiang Liu, YANG FENG, Jian Wu, Zuozhu Liu
- 
    Can’t Hide Behind the API: Stealing Black-Box Commercial Embedding Models 
 Manveer Singh Tamber, Jasper Xian, Jimmy Lin
- 
    Tackling Social Bias against the Poor: a Dataset and a Taxonomy on Aporophobia 
 Georgina Curto, Svetlana Kiritchenko, Muhammad Hammad Fahim Siddiqui, Isar Nejadgholi, Kathleen C. Fraser
- 
    Human and LLM-Based Resume Matching: An Observational Study 
 Swanand Vaishampayan, Hunter Leary, Yoseph Berhanu Alebachew, Louis Hickman, Brent A. Stevenor, Weston Beck, Chris Brown
- 
    A Survey to Recent Progress Towards Understanding In-Context Learning 
 Haitao Mao, Guangliang Liu, Yao Ma, Rongrong Wang, Kristen Johnson, Jiliang Tang
- 
    M-IFEval: Multilingual Instruction-Following Evaluation 
 Antoine Dussolle, A. Cardeña, Shota Sato, Peter Devine
- 
    Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing 
 Hadi Askari, Anshuman Chhabra, Muhao Chen, Prasant Mohapatra
- 
    XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples 
 Peiqin Lin, Andre Martins, Hinrich Schuetze
- 
    Continuous Speech Tokenizer in Text To Speech 
 Yixing Li, Ruobing Xie, Xingwu Sun, Yu Cheng, Zhanhui Kang
- 
    Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models 
 Sonam Gupta, Yatin Nandwani, Asaf Yehudai, Dinesh Khandelwal, Dinesh Raghu, Sachindra Joshi
- 
    On Using Arabic Language Dialects in Recommendation Systems 
 Abdulla Alshabanah, Murali Annavaram
- 
    $\mathcal{S}^2$IT: Stepwise Syntax Integration Tuning for Large Language Models in Aspect Sentiment Quad Prediction 
 Bingfeng Chen, Chenjie Qiu, Yifeng Xie, Boyan Xu, Ruichu Cai, Zhifeng Hao
- 
    SEEval: Advancing LLM Text Evaluation Efficiency and Accuracy through Self-Explanation Prompting 
 Meng-Chen Wu, Md Mosharaf Hossain, Tess Wood, Shayan Ali Akbar, Si-Chi Chin, Erwin Cornejo
- 
    Unified Automated Essay Scoring and Grammatical Error Correction 
 SeungWoo Song, Junghun Yuk, ChangSu Choi, HanGyeol Yoo, HyeonSeok Lim, KyungTae Lim, Jungyeul Park
- 
    Towards Long Context Hallucination Detection 
 Siyi Liu, Kishaloy Halder, Zheng Qi, Wei Xiao, Nikolaos Pappas, Phu Mon Htut, Neha Anna John, Yassine Benajiba, Dan Roth
- 
    TabComp: A Dataset for Visual Table Reading Comprehension 
 Somraj Gautam, Abhishek Bhandari, Gaurav Harit
- 
    Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving 
 Botao Yu, Frazier N. Baker, Ziru Chen, Garrett Herb, Boyu Gou, Daniel Adu-Ampratwum, Xia Ning, Huan Sun
- 
    TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement 
 Zhaopeng Feng, Yan Zhang, Hao Li, Bei Wu, Jiayu Liao, Wenqiang Liu, Jun Lang, YANG FENG, Jian Wu, Zuozhu Liu
- 
    Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning 
 Jean Vassoyan, Nathanaël Beau, Roman Plaud
- 
    The Power of Bullet Lists: A Simple Yet Effective Prompting Approach to Enhancing Spatial Reasoning in Large Language Models 
 Ikhyun Cho, Changyeon Park, Julia Hockenmaier
- 
    CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering 
 Yumeng Wang, Zhiyuan Fan, Qingyun Wang, Yi R. Fung, Heng Ji
- 
    Challenges in Trustworthy Human Evaluation of Chatbots 
 Wenting Zhao, Alexander M Rush, Tanya Goyal
- 
    How Inclusively do LMs Perceive Social and Moral Norms? 
 Michael Galarnyk, Agam Shah, Dipanwita Guhathakurta, Poojitha Nandigam, Sudheer Chava
- 
    AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation 
 Vaishnavi Pulavarthi, Deeksha Nandal, Soham Dan, Debjit Pal
- 
    Chain-of-Rank: Enhancing Large Language Models for Domain-Specific RAG in Edge Device 
 Juntae Lee, Jihwan Bang, Kyuhong Shim, Seunghan Yang, Simyung Chang
- 
    ToVo: Toxicity Taxonomy via Voting 
 Tinh Son Luong, Thanh-Thien Le, Thang Viet Doan, Linh Ngo Van, Thien Huu Nguyen, Nguyen Thi Ngoc Diep
- 
    Understanding the Role of Mental Models in User Interaction with an Adaptive Dialog Agent 
 Lindsey Morgan Vanderlyn, Dirk Väth, Thang Vu
- 
    Beyond Excess and Deficiency: Adaptive Length Bias Mitigation in Reward Models for RLHF 
 Yuyan Bu, Liangyu Huo, Yi Jing, Qing Yang
- 
    Flaming-hot Initiation with Regular Execution Sampling for Large Language Models 
 Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan
- 
    BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting 
 Mohammad Jahid Ibna Basher, Md Kowsher, Md Saiful Islam, Rabindra Nath Nandi, Nusrat Jahan Prottasha, Mehadi Hasan Menon, Tareq Al Muntasir, Shammur Absar Chowdhury, Firoj Alam, Niloofar Yousefi, Ozlem Garibay
- 
    Keep Guessing? When Considering Inference Scaling, Mind the Baselines 
 Gal Yona, Or Honovich, Omer Levy, Roee Aharoni
- 
    Synthetic Audio Helps for Cognitive State Tasks 
 Adil Soubki, John Murzaku, Peter Zeng, Owen Rambow
- 
    Sequence-level Large Language Model Training with Contrastive Preference Optimization 
 Zhili Feng, Dhananjay Ram, Cole Hawkins, Aditya Rawal, Jinman Zhao, Sheng Zha
- 
    FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval 
 Jinlin Wang, Suyuchen Wang, Ziwen Xia, Sirui Hong, Yun Zhu, Bang Liu, Chenglin Wu
- 
    SAFR: Neuron Redistribution for Interpretability 
 Ruidi Chang, Chunyuan Deng, Hanjie Chen
- 
    Is Semantic Chunking Worth the Computational Cost? 
 Renyi Qu, Ruixuan Tu, Forrest Sheng Bao
- 
    TeCoFeS: Text Column Featurization using Semantic Analysis 
 Ananya Singha, Mukul Singh, Ashish Tiwari, Sumit Gulwani, Vu Le, Chris Parnin
- 
    kNN For Whisper And Its Effect On Bias And Speaker Adaptation 
 Maya K. Nachesa, Vlad Niculae
- 
    Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks 
 Nikita Soni, Pranav Chitale, Khushboo Singh, Niranjan Balasubramanian, H. Schwartz
- 
    LeCoPCR: Legal Concept-guided Prior Case Retrieval for European Court of Human Rights cases 
 Santosh T.Y.S.S, Isaac Misael Olguín Nolasco, Matthias Grabmair
- 
    Optimizing Hidden Markov Language Models: An Empirical Study of Reparameterization and Initialization Techniques 
 Ivan Lee, Taylor Berg-Kirkpatrick
- 
    CAMEL-Bench: A Comprehensive Arabic LMM Benchmark 
 Sara Ghaboura, Ahmed Heakl, Omkar Chakradhar Thawakar, Ali Husain Salem Abdulla Alharthi, Ines Riahi, Abduljalil Radman, Jorma Laaksonen, Fahad Shahbaz Khan, Salman Khan, Rao Muhammad Anwer
- 
    Seeds of Discourse: A Multilingual Corpus of Direct Quotations from African Media on Agricultural Biotechnologies 
 Patricia Chiril, Trevor Spreadbury, Joeva Sean Rock, Brian Dowd-Uribe, David Uminsky
- 
    Towards Prompt Generalization: Grammar-aware Cross-Prompt Automated Essay Scoring 
 Heejin Do, Taehee Park, Sangwon Ryu, Gary Lee
- 
    An empirical study of validating synthetic data for formula generation 
 Usneek Singh, José Cambronero, Sumit Gulwani, Aditya Kanade, Anirudh Khatry, Vu Le, Mukul Singh, Gust Verbruggen
- 
    CA*: Addressing Evaluation Pitfalls in Computation-Aware Latency for Simultaneous Speech Translation 
 Xi Xu, Wenda Xu, Siqi Ouyang, Lei Li
- 
    Are Large Language Models Effective in Clinical Trial Design? A Study on Baseline Feature Generation 
 Nafis Neehal, Bowen Wang, Shayom Debopadhaya, Corey Curran, Keerthiram Murugesan, Soham Dan, Vibha Anand, Kristin Bennett
- 
    AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora 
 Aleksandr Fedchin, Isabel Cooperman, Pramit Chaudhuri, Joseph P. Dexter
- 
    Gender Bias in Instruction-Guided Speech Synthesis Models 
 Chun-Yi Kuan, Hung-yi Lee
- 
    Richer Output for Richer Countries: Uncovering Geographical Disparities in Generated Stories and Travel Recommendations 
 Kirti Bhagat, Kinshuk Vasisht, Danish Pruthi
- 
    MMLF: Multi-query Multi-passage Late Fusion Retrieval 
 Yuan-Ching Kuo, Yi Yu, Chih-Ming Chen, Chuan-Ju Wang
- 
    Advocating Character Error Rate for Multilingual ASR Evaluation 
 Thennal D K, Jesin James, DEEPA PADMINI GOPINATH, MUHAMMED ASHRAF K
- 
    Evaluating the Performance of Large Language Models via Debates 
 Behrad Moniri, Hamed Hassani, Edgar Dobriban
- 
    RELexED: Retrieval-Enhanced Legal Summarization with Exemplar Diversity 
 Santosh T.Y.S.S, Chen Jia, Patrick Goroncy, Matthias Grabmair
- 
    Demystifying the Power of Large Language Models in Graph Generation 
 Yu Wang, Ryan A. Rossi, Namyong Park, Nesreen K. Ahmed, Danai Koutra, Franck Dernoncourt, Tyler Derr
- 
    Meta-Reasoning Improves Tool Use in Large Language Models 
 Lisa Alazraki, Marek Rei
- 
    WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation 
 João Matos, Shan Chen, Siena Kathleen V. Placino, Yingya Li, Juan Carlos Climent Pardo, Daphna Idan, Takeshi Tohyama, David Restrepo, Luis Filipe Nakayama, José María Millet Pascual-Leone, Guergana K Savova, Hugo Aerts, Leo Anthony Celi, An-Kwok Ian Wong, Danielle Bitterman, Jack Gallifant
- 
    QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums 
 Varun Nagaraj Rao, Eesha Agarwal, Samantha Dalal, Dana Calacci, Andrés Monroy-Hernández
- 
    A Practical Analysis of Human Alignment with *PO 
 Kian Ahrabian, Xihui Lin, Barun Patra, Vishrav Chaudhary, Alon Benhaim, Jay Pujara, Xia Song
- 
    Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting 
 Jiarui Wu, Zhuo Liu, Hangfeng He
- 
    Considering Length Diversity in Retrieval-Augmented Summarization 
 Juseon-Do, Jaesung Hwang, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura
- 
    Lessons from a User Experience Evaluation of NLP Interfaces 
 Eduardo Calò, Lydia Penkert, Saad Mahamood
- 
    Can GPT-4 Sway Experts’ Investment Decisions? 
 Takehiro Takayanagi, Hiroya Takamura, Kiyoshi Izumi, Chung-Chi Chen
- 
    Adapting LLM Agents with Universal Communication Feedback 
 Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, yelong shen
- 
    On Localizing and Deleting Toxic Memories in Large Language Models 
 Anubrata Das, Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, Rahul Gupta
- 
    Lost in the Distance: Large Language Models Struggle to Capture Long-Distance Relational Knowledge 
 Meiyun Wang, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo
Industry Track Papers
- 
    Challenges and Remedies of Domain-Specific Classifiers as LLM Guardrails: Self-Harm as a Case Study
 Bing Zhang, Guang-Jie Ren
- 
    Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
 Somnath Banerjee, Avik Halder, Rajarshi Mandal, Sayan Layek, Ian Soboroff, Rima Hazra, Animesh Mukherjee
- 
    Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models
 Yukyung Lee, Soonwon Ka, Bokyung Son, Pilsung Kang, Jaewook Kang
- 
    RAD-Bench: Evaluating Large Language Models’ Capabilities in Retrieval Augmented Dialogues
 Tzu-Lin Kuo, FengTing Liao, Mu-Wei Hsieh, Fu-Chieh Chang, Po-Chun Hsu, Da-shan Shiu
- 
    eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables
 Luis Antonio Gutierrez Guanilo, Mir Tafseer Nayeem, CRISTIAN JOSE LOPEZ DEL ALAMO, Davood Rafiei
- 
    Zero-Shot ATC Coding with Large Language Models for Clinical Assessments
 Zijian Chen, John-Michael Gamble, Jimmy Lin
- 
    MonoTODia: Translating Monologue Requests to Task-Oriented Dialogues
 Sebastian Steindl, Ulrich Schäfer, Bernd Ludwig
- 
    Towards Reliable and Practical Phishing Detection
 Hyowon Cho, Minjoon Seo
- 
    Pisets: A Robust Speech Recognition System for Lectures and Interviews
 Ivan Bondarenko, Daniil Grebenkin, Oleg Sedukhin, Mikhail Klementev, Derunets Roman, Lyudmila Budneva
- 
    Chinese Morph Resolution in E-commerce Live Streaming Scenarios
 jiahao zhu, Jipeng Qiang, Ran Bai, Chenyu Liu, Xiaoye Ouyang
- 
    Protein2Text: Resampling Mechanism to Translate Protein Sequences into Human-Interpretable Text
 Ala Jararweh, Oladimeji Macaulay, David Arredondo, Yue Hu, Luis E Tafoya, Kushal Virupakshappa, Avinash Sahu
- 
    VIT-Pro: Visual Instruction Tuning for Product Images
 Vishnu Prabhakaran, Purav Aggarwal, Vishruit Kulshreshtha, Arunita Das, Sahini Venkata Sitaram Sruti, Anoop Saladi
- 
    MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering
 Vinay Kumar Verma, Shreyas Sunil Kulkarni, Happy Mittal, Deepak Gupta
- 
    Finding-Centric Structuring of Japanese Radiology Reports and Analysis of Performance Gaps for Multiple Facilities
 Yuki Tagawa, Yohei Momoki, Norihisa Nakano, Ryota Ozaki, Motoki Taniguchi, Masatoshi Hori, Noriyuki Tomiyama
- 
    SuperRAG: Beyond RAG with Layout-Aware Graph Modeling
 Chening Yang, Duy-Khanh Vu, Minh-Tien Nguyen, Xuan-Quang Nguyen, Linh Nguyen, Hung Le
- 
    Goal-Driven Data Story, Narrations and Explanations
 Aniya Aggarwal, Ankush Gupta, Shivangi Bithel, Arvind Agarwal
- 
    Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings
 Xuanqing Liu, Luyang Kong, Wei Niu, Afshin Khashei, Belinda Zeng, Steve Johnson, Jon Jay, Davor Golac, Matt Pope
- 
    Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs
 Cong Duy Vu Hoang, Gioacchino Tangari, Clemence Lanfranchi, Dalu Guo, Paul Cayet, Steve Siu, Don Dharmasiri, Yuan-Fang Li, Long Duong, Damien Hilloulin, Rhicheek Patra, Sungpack Hong, Hassan Chafi
- 
    SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use
 Hitesh Laxmichand Patel, Amit Agarwal, Arion Das, Bhargava Kumar, Srikant Panda, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae
- 
    TaeBench: Improving Quality of Toxic Adversarial Examples
 Jennifer Zhu, Dmitriy Bespalov, Liwen You, Ninad Kulkarni, Yanjun Qi
- 
    LLM Safety for Children
 Prasanjit Rath, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat
- 
    How LLMs React to Industrial Spatio-Temporal Data? Assessing Hallucination with a Novel Traffic Incident Benchmark Dataset
 Qiang Li, Mingkun Tan, Xun Zhao, Dan Zhang, Daoan Zhang, Shengzhao Lei, Anderson S. Chu, Lujun Li, Porawit Kamnoedboon
- 
    RevieWeaver: Weaving Together Review Insights by Leveraging LLMs and Semantic Similarity
 Jiban Adhikary, Mohammad Alqudah, Arun Udayashankar
- 
    SwissADT: An Audio Description Translation System for Swiss Languages
 Lukas Fischer, Yingqiang Gao, Alexa Lintner, Annette Rios, Sarah Ebling
- 
    Exploring Straightforward Methods for Automatic Conversational Red-Teaming
 George Kour, Naama Zwerdling, Marcel Zalmanovici, Ateret Anaby Tavor, Ora Nova Fandina, Eitan Farchi
- 
    Chatbot Arena Estimate: towards a generalized performance benchmark for LLM capabilities
 Lucas Spangher, Tianle Li, William F. Arnold, Nick Masiewicki, Xerxes Dotiwalla, Rama Kumar Pasumarthi, Peter Grabowski, Eugene Ie, Daniel Gruhl
- 
    Can Post-Training Quantization Benefit from an Additional QLoRA Integration?
 Xiliang Zhu, Elena Khasanova, Cheng Chen
- 
    Visual Zero-Shot E-Commerce Product Attribute Value Extraction
 Jiaying Gong, Ming Cheng, Hongda Shen, Pierre-Yves Vandenbussche, Janet Jenq, Hoda Eldardiry
- 
    Natural Language Processing for Human Resources: A Survey
 Naoki Otani, Nikita Bhutani, Estevam Hruschka
- 
    Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting
 Emmanuel Aboah Boateng, Cassiano O Becker, Nabiha Asghar, Kabir Walia, Ashwin Srinivasan, Ehi Nosakhare, Soundararajan Srinivasan, Victor Dibia
- 
    Predicting ICU Length of Stay for Patients using Latent Categorization of Health Conditions
 Tirthankar Dasgupta, Manjira Sinha, Sudeshna Jana
- 
    Developing a Reliable, Fast, General-Purpose Hallucination Detection and Mitigation Service
 Song Wang, Xun Wang, Jie Mei, Yujia Xie, Si-Qing Chen, Wayne Xiong
- 
    Mitigating Bias in Item Retrieval for Enhancing Exam Assembly in Vocational Education Services
 Alonso Palomino, Andreas Fischer, David Buschhüter, Roland Roller, Niels Pinkwart, Benjamin Paassen
- 
    RTSM: Knowledge Distillation with Diverse Signals for Efficient Real-Time Semantic Matching in E-Commerce
 Sanjay Agrawal, Vivek Sembium
- 
    Enhancing Function-Calling Capabilities in LLMs: Strategies for Prompt Formats, Data Integration, and Multilingual Translation
 Yi-Chang Chen, Po-Chun Hsu, Chan-Jan Hsu, Da-shan Shiu
- 
    Efficient Continual Pre-training of LLMs for Low-resource Languages
 Arijit Nag, Soumen Chakrabarti, Animesh Mukherjee, Niloy Ganguly
- 
    Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard
 Chanjun Park, Hyeonwoo Kim
- 
    Implementing Retrieval Augmented Generation Technique on Unstructured and Structured Data Sources in a Call Center of a Large Financial Institution
 Syed Shariyar Murtaza, Yifan Nie, Elias Avan, Utkarsh Soni, Wanyu Liao, Adam Carnegie, Cyril John Mathias, Junlin Jiang, Eugene Wen
- 
    HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications
 Rishi Kalra, Zekun Wu, Ayesha Gulley, Airlie Hilliard, Xin Guan, Adriano Koshiyama, Philip Colin Treleaven
- 
    Schema and Natural Language Aware In-Context Learning for Improved GraphQL Query Generation
 Nitin Gupta, MANISH KESARWANI, Sambit Ghosh, Sameep Mehta, Carlos Eberhardt, Dan Debrunner
- 
    Medical Spoken Named Entity Recognition
 Khai Le-Duc, David Thulke, Hung-Phong Tran, Long Vo-Dang, Khai-Nguyen Nguyen, Truong-Son Hy, Ralf Schlüter
- 
    An Efficient Context-Dependent Memory Framework for LLM-Centric Agents
 Pengyu Gao, Jinming Zhao, Xinyue Chen, Long Yilin
- 
    Improved Near-Duplicate Detection for Aggregated and Paywalled News-Feeds
 Siddharth Tumre, Sangameshwar Patil, Alok Kumar
- 
    Conflict and Overlap Classification in Construction Standards Using a Large Language Model
 Seong-Jin Park, Youn-Gyu Jin, Hyun-Young Moon, Choi Bong-Hyuck, Lee Seung Hwan, Ohjoon Kwon, Kang-Min Kim
- 
    Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs
 Hyeonwoo Kim, Dahyun Kim, Jihoo Kim, Sukyung Lee, Yungi Kim, Chanjun Park
- 
    CuriousLLM: Elevating Multi-Document Question Answering with LLM-Enhanced Knowledge Graph Reasoning
 Zukang Yang, Zixuan Zhu, Jennifer Zhu
- 
    PLEX: Adaptive Parameter-Efficient Fine-Tuning for Code LLMs using Lottery-Tickets
 Jaeseong Lee, Hojae Han, Jongyoon Kim, seung-won hwang, Naun Kang, KyungJun An, Sungho Jang
- 
    Query Variant Detection Using Retriever as Environment
 Minji Seo, Youngwon Lee, seung-won hwang, Seoho Song, Hee-Cheol Seo, Young-In Song
- 
    CONSTRUCTA: Automating Commercial Construction Schedules in Fabrication Facilities with Large Language Models
 Yifan Zhang, Xue Yang
- 
    Towards Reliable Agents: Benchmarking Customized LLM-Based Retrieval-Augmented Generation Frameworks with Deployment Validation
 Kevin Shukang Wang, Karel Joshua Harjono, Ramon Lawrence
- 
    Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
 Arvind Krishna Sridhar, Yinyi Guo, Erik Visser
- 
    Evaluating the Performance of RAG Methods for Conversational AI in the Airport Domain
 Yuyang Li, PJM Kerbusch, RHR Pruim, Tobias Käfer
- 
    QSpell 250K: A Large-Scale, Practical Dataset for Chinese Search Query Spell Correction
 Dezhi Ye, Haomei Jia, Junwei Hu, Tian Bowen, Jie Liu, Haijin Liang, Jin Ma, Wenmin Wang
- 
    Breaking Down Power Barriers in On-Device Streaming ASR: Insights and Solutions
 Yang Li, Yuan Shangguan, Yuhao Wang, Liangzhen Lai, Ernie Chang, Changsheng Zhao, Yangyang Shi, Vikas Chandra
- 
    CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents
 Jeiyoon Park, Chanjun Park, Heuiseok Lim
- 
    DSRAG: A Double-Stream Retrieval-Augmented Generation Framework for Countless Intent Detection
 Pei Guo, Enjie Liu, Ruichao Zhong, Mochi Gao, Yunzhi Tan, Bo Hu, Zang Li
- 
    Does Self-Attention Need Separate Weights in Transformers?
 Md Kowsher, Nusrat Jahan Prottasha, Chun-Nam Yu, Ozlem Garibay, Niloofar Yousefi
- 
    Evaluating Large Language Models with Enterprise Benchmarks
 Bing Zhang, Mikio Takeuchi, Ryo Kawahara, Shubhi Asthana, Maruf Hossain, Guang-Jie Ren, Kate Soule, Yifan Mai, Yada Zhu
- 
    MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics
 Haoan Jin, Jiacheng Shi, Hanhui Xu, Kenny Q. Zhu, Mengyue Wu
- 
    TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice
 Aman Goel, Xian Wu, Zhe Wang, Dmitriy Bespalov, Yanjun Qi
- 
    From Generating Answers to Building Explanations: Integrating Multi-Round RAG and Causal Modeling for Scientific QA
 Victor Barres, Clifton James McFate, Aditya Kalyanpur, Kailash Karthik Saravanakumar, Lori Moon, Natnael Seifu, Abraham Bautista-Castillo
- 
    RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy
 Akshay Jagatap, Srujana Merugu, Prakash Mandayam Comar
- 
    Octopus: On-device language model for function calling of software APIs
 Wei Chen, Zhiyuan Li, Mingyuan MA
- 
    SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
 Grigor Nalbandyan, Rima Shahbazyan, Evelina Bakhturina
- 
    MedCodER: A Generative AI Assistant for Medical Coding
 Krishanu Das Baksi, Elijah Soba, John J Higgins, Ravi Saini, Jaden Wood, Jane Cook, Jack I Scott, Nirmala Pudota, Tim Weninger, Edward Bowen, Sanmitra Bhattacharya
- 
    CodeGenWrangler: Data Wrangling task automation using Code-Generating Models
 Ashlesha Akella, Abhijit Manatkar, Krishnasuri Narayanam, Sameep Mehta
- 
    QueryShield: A Platform to Mitigate Enterprise Data Leakage in Queries to External LLMs
 Nitin Ramrakhiyani, Delton Myalil, Sachin Pawar, Manoj Apte, RAJAN M A, Divyesh Saglani, Imtiyazuddin Shaik
- 
    Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education
 Hayate Iso, Pouya Pezeshkpour, Nikita Bhutani, Estevam Hruschka
- 
    Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia
 Fajri Koto
- 
    MoFE: Mixture of Frozen Experts Architecture
 Jean Seo, Jaeyoon Kim, Hyopil Shin
- 
    Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation
 gao yu zhu, Wei Shao, xichou zhu, Lei Yu, Jiafeng Guo, Xueqi Cheng
- 
    A Diverse and Effective Retrieval-Based Debt Collection System with Expert Knowledge
 Jiaming Luo, Weiyi Luo, Guoqing Sun, Mengchen ZHU, Haifeng Tang, Kenny Q. Zhu, Mengyue Wu
- 
    CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search
 Kaixin Wu, Yixin Ji, Zeyuan Chen, Qiang Wang, Cunxiang Wang, Hong Liu, Baijun Ji, Xu Jia, Zhongyi Liu, Jinjie GU, Yuan Zhou, Linjian Mo
- 
    Break-Ideate-Generate (BrIdGe): Moving beyond Translations for Localization using LLMs
 Swapnil Gupta, Lucas Pereira Carlini, Prateek Sircar, Deepak Gupta
- 
    Granite Guardian: Comprehensive LLM Safeguarding
 Inkit Padhi, Manish Nagireddy, Giandomenico Cornacchia, Subhajit Chaudhury, Tejaswini Pedapati, Pierre Dognin, Keerthiram Murugesan, Erik Miehling, Martín Santillán Cooper, Kieran Fraser, Giulio Zizzo, Muhammad Zaid Hameed, Mark Purcell, Michael Desmond, Qian Pan, Inge Vejsbjerg, Elizabeth M. Daly, Michael Hind, Werner Geyer, Ambrish Rawat, Kush R. Varshney, Prasanna Sattigeri
- 
    WorkTeam: Constructing Workflows from Natural Language with Multi-Agents
 Hanchao Liu, Rongjun Li, Weimin Xiong, Ziyu Zhou, Wei Peng
- 
    Search Query Embeddings via User-behavior-driven Contrastive Learning
 Sosuke Nishikawa, Jun Hirako, Nobuhiro Kaji, Koki Watanabe, Hiroki Asano, Souta Yamashiro, Shumpei Sano
- 
    AutoKB: Automated Creation of Structured Knowledge Bases for Domain-Specific Support
 Rishav Sahay, Arihant Jain, Purav Aggarwal, Anoop Saladi
- 
    Dialogue Language Model with Large-Scale Persona Data Engineering
 Mengze Hong, Chen Jason Zhang, Chaotao Chen, Rongzhong Lian, Di Jiang
- 
    FinLLM-B: When Large Language Models Meet Financial Breakout Trading
 Kang Zhang, Osamu Yoshie, Lichao Sun, Weiran Huang