Accepted Papers
Main Conference - Long Papers
Effective Skill Unlearning through Intervention and Abstention
Yongce Li, Chung-En Sun, Tsui-Wei Weng -
Examining and Adapting Time for Multilingual Classification via Mixture of Temporal Experts
Weisi Liu, Guangzeng Han, Xiaolei Huang -
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su, Xing W, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding -
Can LLMs Convert Graphs to Text-Attributed Graphs?
Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye -
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
Jannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller -
ParaICL: Towards Parallel In-Context Learning
Xingxuan Li, Xuan-Phi Nguyen, Shafiq Joty, Lidong Bing -
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung -
The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding
Mo Yu, Lemao Liu, Junjie Wu, Tsz Ting Chung, Shunchi Zhang, Jiangnan Li, Dit-Yan Yeung, Jie Zhou -
Have LLMs Reopened the Pandora’s Box of AI-Generated Fake News?
Xinyu Wang, Wenbo Zhang, Sai Koneru, Hangzhi Guo, Bonam Mingole, S. Shyam Sundar, Sarah Rajtmajer, Amulya Yadav -
Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, Soroosh Mariooryad, Matt Shannon, Julian Salazar, David Teh-Hwa Kao -
Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models
Varun Gumma, Pranjal A Chitale, Kalika Bali -
Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs
Shuyang Yu, Runxue Bao, Parminder Bhatia, Taha Kass-Hout, Jiayu Zhou, Cao Xiao -
Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
Guangya Wan, Yuqi Wu, Jie Chen, Sheng Li -
Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
Mohit Chandra, Siddharth Sriraman, Gaurav Verma, Harneet Singh Khanuja, Jose Suarez Campayo, Zihang Li, Michael L. Birnbaum, Munmun De Choudhury -
Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages
Hoang H Nguyen, Khyati Mahajan, Vikas Yadav, Julian Salazar, Philip S. Yu, Masoud Hashemi, Rishabh Maheshwary -
How to Align Multiple Signed Language Corpora for Better Sign-to-Sign Translations?
Mert Inan, Yang Zhong, Vidya Ganesh, Malihe Alikhani -
Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi, Christopher D Manning, Shikhar Murty -
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
Kimihiro Hasegawa, Wiradee Imrattanatrai, Zhi-Qi Cheng, Masaki Asada, Susan Holm, Yuran Wang, Ken Fukuda, Teruko Mitamura -
Communication Makes Perfect: Persuasion Dataset Construction via Multi-LLM Communication
Weicheng Ma, Hefan Zhang, Ivory Yang, Shiyu Ji, Joice Chen, Farnoosh Hashemi, Shubham Mohole, Ethan Gearey, Michael Macy, Saeed Hassanpour, Soroush Vosoughi -
Model Surgery: Modulating LLM’s Behavior Via Simple Parameter Editing
Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang -
ReGLA: Refining Gated Linear Attention
Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Boxing Chen, Philippe Langlais -
A Distributional Perspective on Word Learning in Neural Language Models
Filippo Ficarra, Ryan Cotterell, Alex Warstadt -
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze -
Superlatives in Context: Modeling the Implicit Semantics of Superlatives
Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty -
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligoric, Tijana Zrnic, Cinoo Lee, Emmanuel Candes, Dan Jurafsky -
Simulating Classroom Education with LLM-Empowered Agents
Zheyuan Zhang, Daniel Zhang-Li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhanxin Hao, Jianxiao Jiang, Jie Cao, Huiqin LIU, Zhiyuan Liu, Lei Hou, Juanzi Li -
Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models
Soham Poddar, Paramita Koley, Janardan Misra, Niloy Ganguly, Saptarshi Ghosh -
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
Lei Wang, Jianxun Lian, Yi Huang, Yanqi Dai, Haoxuan Li, Xu Chen, Xing Xie, Ji-Rong Wen -
ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations
Yichuan Li, Xinyang Zhang, Chenwei Zhang, Mao Li, Tianyi Liu, Pei Chen, Yifan Gao, Kyumin Lee, Kaize Ding, Zhengyang Wang, Zhihan Zhang, Jingbo Shang, Xian Li, Trishul Chilimbi -
In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch, Maor Ivgi, Emily Xiao, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig -
Decoding Hate: Exploring Language Models’ Reactions to Hate Speech
Paloma Piot, Javier Parapar -
ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning
Vy Vo, Lizhen Qu, Tao Feng, YUNCHENG HUA, Xiaoxi Kang, Songhai Fan, Tim Dwyer, Lay-Ki Soon, Gholamreza Haffari -
Arabic Dataset for LLM Safeguard Evaluation
Yasser Ashraf, Yuxia Wang, Bin Gu, Preslav Nakov, Timothy Baldwin -
Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction
William P Hogan, Jingbo Shang -
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Sungjin Park, Xiao Liu, Yeyun Gong, Edward Choi -
Is a Peeled Apple Still Red? Evaluating LLMs’ Ability for Conceptual Combination with Property Type
Seokwon Song, Taehyun Lee, Jaewoo Ahn, Jae Hyuk Sung, Gunhee Kim -
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong -
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors
Georgios Chochlakis, Alexandros Potamianos, Kristina Lerman, Shrikanth Narayanan -
SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection
Sindhu Padakandla, Sadbhavana Babar, Rathod Darshan D, Manohar Kaul -
Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Zirui Song, Guangxian Ouyang, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, fu yujie, Zeyu Zhang, Shiyu Jiang, Miao Fang, Ling Chen, Xiuying Chen -
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A. Rossi, Dinesh Manocha -
MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration
David Wan, Justin Chen, Elias Stengel-Eskin, Mohit Bansal -
Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbations
Yinghan Zhou, Juan Wen, Wanli Peng, Xue yiming, ZiWei Zhang, Wu Zhengxian -
AID: Adaptive Integration of Detectors for Safe AI with Language Models
Xinran Wang, Enmao Diao, Qi Le, Jie Ding, Ali Anwar -
FactTrack: Time-Aware World State Tracking in Story Outlines
Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Dan Klein -
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
Suyoung Bae, YunSeok Choi, Jee-Hyong Lee -
DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning
Xinyu Tang, Xiaolei Wang, Xin Zhao, Ji-Rong Wen -
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
Sami Baral, Li Lucy, Ryan Knight, Alice Ng, Luca Soldaini, Neil Heffernan, Kyle Lo -
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung -
Beemo: Benchmark of Expert-edited Machine-generated Outputs
Ekaterina Artemova, Jason S Lucas, Saranya Venkatraman, Jooyoung Lee, Sergei Tilga, Adaku Uchendu, Vladislav Mikhailov -
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou -
Multi³Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision–Language Models
Minh Duc Bui, Katharina von der Wense, Anne Lauscher -
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang, Mengyu Bu, Yang Feng -
FiNE: Filtering and Improving Noisy Data Elaborately with Large Language Models
Junliang He, Ziyue Fan, Shaohui Kuang, Li Xiaoqing, Kai Song, Yaqian Zhou, Xipeng Qiu -
Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
Kaushal Kumar Maurya, KV Aditya Srivatsa, Kseniia Petukhova, Ekaterina Kochmar -
An LLM-Based Approach for Insight Generation in Data Analysis
Alberto Sánchez Pérez, Alaa Boukhary, Paolo Papotti, Luis Castejón Lozano, Adam Elwood -
A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding
Abdulfattah Safa, Gözde Gül Şahin -
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
Yifan Song, Guoyin Wang, Sujian Li, Bill Yuchen Lin -
Language Models Predict Empathy Gaps Between Social In-groups and Out-groups
Yu Hou, Hal Daumé III, Rachel Rudinger -
What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length
Lindia Tjuatja, Graham Neubig, Tal Linzen, Sophie Hao -
AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging
Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing -
Language Models Can Infer Action Semantics for Symbolic Planners from Environment Feedback
Wang Bill Zhu, Ishika Singh, Robin Jia, Jesse Thomason -
WHoW: A Cross-domain Approach for Analysing Conversation Moderation
Ming-Bin Chen, Lea Frermann, Jey Han Lau -
MATO: A Model-Agnostic Training Optimization for Aspect Sentiment Triplet Extraction
Shaopeng Tang, Lin Li, Xiaohui Tao, Leqi Zhong, Qing Xie -
What the #?*!: Disentangling Hate Across Target Identities
Yiping Jin, Leo Wanner, Aneesh Moideen Koya -
CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs
Amey Hengle, Aswini Kumar Padhi, Anil Bandhakavi, Tanmoy Chakraborty -
CoME: An Unlearning-based Approach to Conflict-free Model Editing
Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim -
Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai, Tetsuro Morimura, Kaito Ariu, Kenshi Abe -
Graph Neural Network Enhanced Retrieval for Question Answering of Large Language Models
Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, Rui Wang -
Fine-Grained Transfer Learning for Harmful Content Detection through Label-Specific Soft Prompt Tuning
Faeze Ghorbanpour, Viktor Hangya, Alexander Fraser -
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Abhinav Joshi, Areeb Ahmad, Divyaksh Shukla, Ashutosh Modi -
Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal -
K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected Compressor
Jeonghun Cho, Gary Lee -
LLMs as Meta-Reviewers’ Assistants: A Case Study
Eftekhar Hossain, Sanjeev Kumar Sinha, Naman Bansal, R. Alexander Knipper, Souvika Sarkar, John Salvador, Yash mahajan, Sri Ram Pavan Kumar Guttikonda, Mousumi Akter, Md. Mahadi Hassan, Matthew Freestone, Matthew C. Williams Jr., Dongji Feng, Santu Karmaker -
Understanding Figurative Meaning through Explainable Visual Entailment
Arkadiy Saakyan, Shreyas Kulkarni, Tuhin Chakrabarty, Smaranda Muresan -
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data
Deren Lei, Yaxi Li, Siyao Li, Mengya Hu, Rui Xu, Ken Archer, Mingyu Wang, Emily Ching, Alex Deng -
IHEval: Evaluating Language Models on Following the Instruction Hierarchy
Zhihan Zhang, Shiyang Li, Zixuan Zhang, Xin Liu, Haoming Jiang, Xianfeng Tang, Yifan Gao, Zheng Li, Haodong Wang, Zhaoxuan Tan, Yichuan Li, Qingyu Yin, Bing Yin, Meng Jiang -
Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with Decontextualization
Jin Zhao, Jingxuan Tu, Bingyang Ye, Xinrui Hu, Nianwen Xue, James Pustejovsky -
Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping
Ryan Li, Yanzhe Zhang, Diyi Yang -
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
Ajay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch -
KS-Lottery: Finding Certified Lottery Tickets for Multilingual Transfer in Large Language Models
Fei Yuan, Chang Ma, Shuai Yuan, Qiushi Sun, Lei Li -
Open-World Evaluation for Retrieving Diverse Perspectives
Hung-Ting Chen, Eunsol Choi -
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness
Yoo Yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, Jordan Lee Boyd-Graber -
Benchmarking Language Model Creativity: A Case Study on Code Generation
Yining Lu, Dixuan Wang, Tianjian Li, Dongwei Jiang, Sanjeev Khudanpur, Meng Jiang, Daniel Khashabi -
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi -
EmoCharacter: Evaluating the Emotional Fidelity of Role-Playing Agents in Dialogues
Qiming Feng, Qiujie Xie, Xiaolong Wang, Qingqiu Li, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao -
ToW: Thoughts of Words Improve Reasoning in Large Language Models
Zhikun Xu, Ming Shen, Jacob Dineen, Zhaonan Li, Xiao Ye, Shijie Lu, Aswin RRV, Chitta Baral, Ben Zhou -
Improving Retrospective Language Agents via Joint Policy Gradient Optimization
Xueyang Feng, bo lan, Quanyu Dai, Lei Wang, Jiakai Tang, Xu Chen, Zhenhua Dong, Ji-Rong Wen -
Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements
Antonia Karamolegkou, Sandrine Schiller Hansen, Ariadni Christopoulou, Filippos Stamatiou, Anne Lauscher, Anders Søgaard -
Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown Questions
Hongru WANG, Boyang XUE, Baohang Zhou, Tianhua Zhang, Cunxiang Wang, Huimin WANG, Guanhua Chen, Kam-Fai Wong -
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
Hanqing Wang, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen -
Improving Data Annotation for Low-Resource Relation Extraction with Logical Rule-Augmented Collaborative Language Models
Xiyang Liu, Chunming Hu, Richong Zhang, Junfan Chen, Baowen Xu -
Matina: A Large-Scale 73B Token Persian Text Corpus
Sara Bourbour Hosseinbeigi, Heshaam Faili, Fatemeh Taherinezhad, Hamed Baghbani, Fatemeh Nadi, Mostafa Amiri -
CORRECT: Context- and Reference-Augmented Reasoning and Prompting for Fact-Checking
Delvin Ce Zhang, Dongwon Lee -
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
Zheyuan Liu, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng, Yongle Yuan, Meng Jiang -
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
Jiwoong Sohn, Yein Park, Chanwoong Yoon, Sihyeon Park, Hyeon Hwang, Mujeen Sung, Hyunjae Kim, Jaewoo Kang -
Stronger Models are Not Always Stronger Teachers for Instruction Tuning
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran -
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov -
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang -
DIRAS: Efficient LLM Annotation of Document Relevance for Retrieval Augmented Generation
Jingwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold -
SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture
Jiayi Han, Liang Du, Hongwei Du, Xiangguo Zhou, Yiwen Wu, Yuanfang Zhang, Weibo Zheng, Donghong Han -
SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA
Venktesh V, Mandeep Rathee, Avishek Anand -
Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training
Deven Mahesh Mistry, Anooshka Bajaj, Yash Aggarwal, Sahaj Singh Maini, Zoran Tiganj -
A Survey of QUD Models for Discourse Processing
Yingxue Fu -
Not All Models Are Created Equal: Differences in which Surprisal Predicts Reading Time by Speaker First Language
Shannon Clark, Daniela Teodorescu, Lin Chen, Gaisha Oralova, Charles Perfetti, Alona Fyshe, Carrie Demmans Epp -
Adapting Sentence-level Automatic Metrics for Document-level Simplification Evaluation
Mounica Maddela, Fernando Alva-Manchego -
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja, Vivek Iyer, Xiaoyu He, Graham Neubig -
Substance Beats Style: Why Beginning Students Fail to Code with LLMs
Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson -
Follow the Beaten Path: The Role of Route Patterns on Vision-Language Navigation Agents Generalization Abilities
Kourosh T Baghaei, Dieter Pfoser, Antonios Anastasopoulos -
Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering
Yeonjun In, Sungchul Kim, Ryan A. Rossi, Mehrab Tanjim, Tong Yu, Ritwik Sinha, Chanyoung Park -
Efficient Prompting for Continual Adaptation to Missing Modalities
Zirun Guo, Shulei Wang, Wang Lin, Weicai Yan, Yangyang Wu, Tao Jin -
How to Make the Most of LLMs’ Grammatical Knowledge for Acceptability Judgments
Yusuke Ide, Yuto Nishida, Justin Vasselli, Miyu Oba, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe -
TurkingBench: A Challenge Benchmark for Web Agents
Kevin Xu, Yeganeh Kordi, Tanay Nayak, Adi Asija, Yizhong Wang, Kate Sanders, Adam Byerly, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi -
The State and Fate of Summarization Datasets: A Survey
Noam Dahan, Gabriel Stanovsky -
Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models
Chaoqun Liu, Wenxuan Zhang, Yiran Zhao, Anh Tuan Luu, Lidong Bing -
Pay More Attention to Images: Numerous Images-Oriented Multimodal Summarization
Min Xiao, Junnan Zhu, Feifei Zhai, Chengqing Zong, Yu Zhou -
S²-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency
Yuting Zeng, Weizhe Huang, Lei Jiang, Tongxuan Liu, XiTai Jin, Chen Tianying Tiana, Jing Li, Xiaohua Xu -
No Simple Answer to Data Complexity: An Examination of Instance-Level Complexity Metrics for Classification Tasks
Ryan A. Cook, John P. Lalor, Ahmed Abbasi -
Anticipating Future with Large Language Model for Simultaneous Machine Translation
Siqi Ouyang, Oleksii Hrinchuk, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Lei Li, Boris Ginsburg -
LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?
Jan Cegin, Jakub Simko, Peter Brusilovsky -
SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation
Yixian Shen, Qi Bi, JIA-HONG HUANG, Hongyi Zhu, Andy D. Pimentel, Anuj Pathania -
Language Models Largely Exhibit Human-like Constituent Ordering Preferences
Ada Tur, Gaurav Kamath, Siva Reddy -
Fine-Tuned LLMs are “Time Capsules” for Tracking Societal Bias Through Books
Sangmitra Madhusudan, Robert Morabito, Skye Reid, Nikta Gohari Sadr, Ali Emami -
Enhancing Language Model Hypernetworks with Restart: A Study on Optimization
Yihan Zhang, Jie Fu, Rongrong Ji, Jie Chen -
Private Synthetic Text Generation with Diffusion Models
Sebastian Ochs, Ivan Habernal -
HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh -
PicPersona-TOD : A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona
Jihyun Lee, Yejin Jeon, Seungyeon Seo, Gary Lee -
When2Call: When (not) to Call Tools
Hayley Ross, Ameya Sunil Mahabaleshwarkar, Yoshi Suhara -
LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee -
MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections
Nishant Balepur, Alexa Siu, Nedim Lipka, Franck Dernoncourt, Tong Sun, Jordan Lee Boyd-Graber, Puneet Mathur -
Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework
Reza Averly, Xia Ning -
Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction
Dongming Sheng, Kexin Han, Hao Li, Yan Zhang, Yucheng Huang, Jun Lang, Wenqiang Liu -
SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based Agent
Keyeun Lee, Seo Hyeong Kim, Seolhee Lee, Jinsu Eun, Yena Ko, Hayeon Jeon, Esther Hehsun Kim, Seonghye Cho, Soeun Yang, Eun-mee Kim, Hajin Lim -
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao -
CORG: Generating Answers from Complex, Interrelated Contexts
Hyunji Lee, Franck Dernoncourt, Trung Bui, Seunghyun Yoon -
CAMIEval: Enhancing NLG Evaluation through Multidimensional Comparative Instruction-Following Analysis
Ziyue Fan, Junliang He, Li Xiaoqing, Shaohui Kuang, Kai Song, Yaqian Zhou, Xipeng Qiu -
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Xin Zhao, Yang Song, Tao Zhang -
An Efficient Gloss-Free Sign Language Translation Using Spatial Configurations and Motion Dynamics with LLMs
Eui Jun Hwang, Sukmin Cho, Junmyeong Lee, Jong C. Park -
Pula: Training Large Language Models for Setswana
Nathan Brown, Vukosi Marivate -
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models
Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang -
FLEX: Expert-level False-Less EXecution Metric for Text-to-SQL Benchmark
Heegyu Kim, Jeon taeyang, SeungHwan Choi, Seungtaek Choi, Hyunsouk Cho -
Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models
Joan Nwatu, Oana Ignat, Rada Mihalcea -
Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization
Yang Zhong, Diane J. Litman -
Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen -
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, Seonghyeon Ye, Bill Yuchen Lin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo -
Preference Consistency Matters: Enhancing Preference Learning in Language Models with Automated Self-Curation of Training Corpora
JoonHo Lee, JuYoun Son, Juree Seok, Wooseok Jang, Yeong-Dae Kwon -
Are explicit belief representations necessary? A comparison between Large Language Models and Bayesian probabilistic models
Dingyi Pan, Ben Bergen -
Assessing the State of the Art in Scene Segmentation
Albin Zehe, Elisabeth Fischer, Andreas Hotho -
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Tianjian Li, Haoran Xu, Weiting Tan, Kenton Murray, Daniel Khashabi -
Efficient One-shot Compression via Low-Rank Local Feature Distillation
Yaya SY, Christophe Cerisara, Irina Illina -
The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals
Xiaofeng Wu, Karl Stratos, Wei Xu -
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Xiaomeng Jin, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong -
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
Ran Xu, Hui Liu, Sreyashi Nag, Zhenwei DAI, Yaochen Xie, Xianfeng Tang, Chen Luo, Yang Li, Joyce C. Ho, Carl Yang, Qi He -
Markov Chain of Thought for Efficient Mathematical Reasoning
Wen Yang, Minpeng Liao, Kai Fan -
JAWAHER: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking
Samar Mohamed Magdy, Sang Yun Kwon, Fakhraddin Alwajih, Safaa Taher Abdelfadil, Shady Shehata, Muhammad Abdul-Mageed -
Functional Lexicon in Subword Tokenization
Zachary William Hopton, Yves Scherrer, Tanja Samardzic -
Exploring the Cost-Effectiveness of Perspective Taking in Crowdsourcing Subjective Assessment: A Case Study of Toxicity Detection
Xiaoni Duan, Zhuoyan Li, Chien-Ju Ho, Ming Yin -
SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model
Jiayang Yu, Yihang Zhang, Bin Wang, Peiqin Lin, YongKang Liu, Shi Feng -
Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models
Hyegang Son, Yonglak Son, Changhoon Kim, Young Geun Kim -
Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation
Cong-Duy T Nguyen, Xiaobao Wu, Thong Thanh Nguyen, Shuai Zhao, Khoi M. Le, Nguyen Viet Anh, Feng Yichao, Anh Tuan Luu -
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision
Zhouhang Xie, Tushar Khot, Bhavana Dalvi Mishra, Harshit Surana, Julian McAuley, Peter Clark, Bodhisattwa Prasad Majumder -
MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News Media
Muhammad Arslan Manzoor, Ruihong Zeng, Dilshod Azizov, Preslav Nakov, Shangsong Liang -
Extracting and Understanding the Superficial Knowledge in Alignment
Runjin Chen, Gabriel Jacob Perin, Xuxi Chen, Xilun Chen, Yan Han, Nina S. T. Hirata, Junyuan Hong, Bhavya Kailkhura -
ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
Jianxin Liang, Xiaojun Meng, Huishuai Zhang, Yueqian Wang, Jiansheng Wei, Dongyan Zhao -
FedSpaLLM: Federated Pruning of Large Language Models
Guangji Bai, Yijiang Li, Zilinghan Li, Liang Zhao, Kibaek Kim -
A Survey of NLP Progress in Sino-Tibetan Low-Resource Languages
Shuheng Liu, Michael Best -
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding, Zhiheng Xi, Wei He, Lizhuoyuan, Yitao Zhai, Shi Xiaowei, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang -
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing, Xingwei Qu, Xiao Zhang, Pei Wang, Yanan Wu, Jihao Gu, Yangguang Li, Jianke Zhu -
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
Reya Vir, Shreya Shankar, Harrison Chase, William Hinthorn, Aditya Parameswaran -
CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells
Atharva Naik, Marcus Alenius, Daniel Fried, Carolyn Rose -
Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL
Bingfeng chen, Shaobin Shi, yongqi luo, Boyan Xu, Ruichu Cai, Zhifeng Hao -
Stronger Universal and Transferable Attacks by Suppressing Refusals
David Huang, Avidan Shah, Alexandre Araujo, David Wagner, Chawin Sitawarin -
You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL
Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng -
Rethinking the Role of LLMs for Document-level Relation Extraction: a Refiner with Task Distribution and Probability Fusion
Fu Zhang, Xinlong Jin, Jingwei Cheng, Hongsen Yu, Huangming Xu -
LBC: Language-Based-Classifier for Out-Of-Variable Generalization
Kangjun Noh, Baekryun Seong, Hoyoon Byun, Youngjun Choi, Sungjin Song, Kyungwoo Song -
ComPO: Community Preferences for Language Model Personalization
Sachin Kumar, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi -
Grammar Control in Dialogue Response Generation for Language Learning Chatbots
Dominik Glandorf, Peng Cui, Detmar Meurers, Mrinmaya Sachan -
A Template Is All You Meme
Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych -
SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals
Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang -
ITALIC: An Italian Culture-Aware Natural Language Benchmark
Andrea Seveso, Daniele Potertì, Edoardo Federici, Mario Mezzanzanica, Fabio Mercorio -
Detect, Disambiguate, and Translate: On-Demand Visual Reasoning for Multimodal Machine Translation with Large Vision-Language Models
Danyang Liu, Fanjie Kong, Xiaohang Sun, Dhruva Patil, Avijit Vajpayee, Zhu Liu, Vimal Bhat, Najmeh Sadoughi -
Investigating Human Values in Online Communities
Nadav Borenstein, Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein -
AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction
Jie Feng, Yuwei Du, Jie Zhao, Yong Li -
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart -
LLM The Genius Paradox: A Linguistic and Math Expert’s Struggle with Simple Word-based Counting Problems
Nan Xu, Xuezhe Ma -
Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective
Shenglai Zeng, Jiankun Zhang, Bingheng Li, Yuping Lin, Tianqi Zheng, Dante Everaert, Hanqing Lu, Hui Liu, Hui Liu, Yue Xing, Monica Xiao Cheng, Jiliang Tang -
Generative Prompt Internalization
Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo -
ScreenQA: Large-Scale Question-Answer Pairs Over Mobile App Screenshots
Yu-Chung Hsiao, Fedir Zubach, Gilles Baechler, Srinivas Sunkara, Victor Carbune, Jason Lin, Maria Wang, Yun Zhu, Jindong Chen -
xLAM: A Family of Large Action Models to Empower AI Agent Systems
Jianguo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Quoc Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Manoj Awalgaonkar, Rithesh R N, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong -
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen -
ALiiCE: Evaluating Positional Fine-grained Citation Generation
Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng -
LLM-Supported Natural Language to Bash Translation
Finnian Westenfelder, Erik Hemberg, Stephen Moskal, Una-May O’Reilly, Silviu Chiricescu -
PICLe: Pseudo-annotations for In-Context Learning in Low-Resource Named Entity Detection
Sepideh Mamooler, Syrielle Montariol, Alexander Mathis, Antoine Bosselut -
Fine-Tuning Large Language Models with Sequential Instructions
Hanxu Hu, Simon Yu, Pinzhen Chen, Edoardo Ponti -
Token-based Decision Criteria Are Suboptimal in In-context Learning
Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue -
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
Xiaofeng Zhang, Yihao Quan, Chen Shen, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, Chaochen Gu, Hao Tang, Jieping Ye -
One Unified Model for Diverse Tasks: Emotion Cause Analysis via Self-Promote Cognitive Structure Modeling
Zhaoxin Yu, Xinglin Xiao, Wenji Mao -
Through the Lens of History: Methods for Analyzing Temporal Variation in Content and Framing of State-run Chinese Newspapers
Shijia Liu, David A. Smith -
A Systematic Examination of Preference Learning through the Lens of Instruction-Following
Joongwon Kim, Anirudh Goyal, Aston Zhang, Bo Xiong, Rui Hou, Melanie Kambadur, Dhruv Mahajan, Hannaneh Hajishirzi, Liang Tan -
Mutual-pairing Data Augmentation for Fewshot Continual Relation Extraction
Nguyen Hoang Anh, Quyen Tran, Thanh Xuan Nguyen, Nguyen Thi Ngoc Diep, Linh Ngo Van, Thien Huu Nguyen, Trung Le -
Coverage-based Fairness in Multi-document Summarization
Haoyuan Li, Yusen Zhang, Rui Zhang, Snigdha Chaturvedi -
Guiding Medical Vision-Language Models with Diverse Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations
Kangyu Zhu, Ziyuan Qin, Huahui Yi, Zekun Jiang, Qicheng Lao, Shaoting Zhang, Kang Li -
CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts
Malvina Nikandrou, Georgios Pantazopoulos, Nikolas Vitsakis, Ioannis Konstas, Alessandro Suglia -
A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Bairu Hou, Yang Zhang, Jacob Andreas, Shiyu Chang -
The Impact of Domain-Specific Terminology on Machine Translation for Finance in European Languages
Arturo Oncevay, Charese Smiley, Xiaomo Liu -
Exploring the Potential of Large Language Models for Heterophilic Graphs
Yuxia Wu, Shujie Li, Yuan Fang, Chuan Shi -
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen -
Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion
Ziyao Xu, Houfeng Wang -
Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning
Gaurav Arora, Srujana Merugu, shreya jain, Vaibhav Saxena -
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després, Jinyue Feng, Zining Zhu, Frank Rudzicz -
DenseSSM: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang -
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni, Jonathan Colaço Carr, Yash More, Jackie CK Cheung, Golnoosh Farnadi -
A Mixed-Language Multi-Document News Summarization Dataset and a Graphs-Based Extract-Generate Model
Shengxiang Gao, Fang Nan, Yongbing Zhang, Yuxin Huang, Kaiwen Tan, Zhengtao Yu -
Ihquin tlahtouah in Tetelahtzincocah: An annotated, multi-purpose audio and text corpus of Western Sierra Puebla Nahuatl
Robert Pugh, Cheyenne Wing, María Ximena Juárez Huerta, Angeles Márquez Hernandez, Francis M. Tyers -
Knowledge Graph Guided Evaluation of Abstention Techniques
Kinshuk Vasisht, Navreet Kaur, Danish Pruthi -
A Novel Computational Modeling Foundation for Automatic Coherence Assessment
Aviya Maimon -
Query-focused Referentiability Learning for Zero-shot Retrieval
Jaeyoung Kim, Dohyeon Lee, seung-won hwang -
MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps
Xiongtao Zhou, Jie He, Lanyu Chen, jingyu li, Haojing Chen, Victor Gutierrez Basulto, Jeff Z. Pan, Hanjie Chen -
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Do Xuan Long, Ngoc-Hai Nguyen, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan -
PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries
Mingwen Dong, Nischal Ashok Kumar, Yiqun Hu, Anuj Chauhan, Chung-Wei Hang, Shuaichen Chang, Lin Pan, Wuwei Lan, Henghui Zhu, Jiarong Jiang, Patrick Ng, Zhiguo Wang -
Option Symbol Matters: Investigating and Mitigating Multiple-Choice Option Symbol Bias of Large Language Models
Zhen Yang, Ping Jian, Chengzhi Li -
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba Oluwadara Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Ijeoma Chukwuneke, Happy Buzaaba, Blessing Kudzaishe Sibanda, Godson Koffi KALIPE, Jonathan Mukiibi, Salomon KABONGO KABENAMUALU, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Salomey Osei, Shamsuddeen Hassan Muhammad, Sokhar Samb, Tadesse Kebede Guge, Tombekai Vangoni Sherman, Pontus Stenetorp -
PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles
Siyan Li, Vethavikashini Chithrra Raghuram, Omar Khattab, Julia Hirschberg, Zhou Yu -
Tonguescape: Exploring Language Models Understanding of Vowel Articulation
Haruki Sakajo, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe -
Exploring Large Language Models for Effective Rumor Detection on Social Media
Yirong Zeng, Xiao Ding, Bibo Cai, Ting Liu, Bing Qin -
Instantly Learning Preference Alignment via In-context DPO
Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang -
MASTER: A Multi-Agent System with LLM Specialized MCTS
BINGZHENG GAN, Yufan Zhao, Tianyi Zhang, Jing Huang, LI YUSU, Shu Xian Teo, Changwang Zhang, Wei Shi -
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine -
Multimodal Cognitive Reframing Therapy via Multi-hop Psychotherapeutic Reasoning
Subin Kim, Hoonrae Kim, Heejin Do, Gary Lee -
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang -
Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection
Maximilian Spliethöver, Tim Knebler, Fabian Fumagalli, Maximilian Muschalik, Barbara Hammer, Eyke Hüllermeier, Henning Wachsmuth -
Evaluating and Improving Graph to Text Generation with Large Language Models
Jie He, Yijun Yang, Wanqiu Long, Deyi Xiong, Victor Gutierrez Basulto, Jeff Z. Pan -
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon, Manish Shrivastava, David Krueger, Ekdeep Singh Lubana -
PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection
Jooyoung Lee, Toshini Agrawal, Adaku Uchendu, Thai Le, Jinghui Chen, Dongwon Lee -
What We Talk About When We Talk About LMs: Implicit Paradigm Shifts and the Ship of Language Models
Shengqi Zhu, Jeffrey Rzeszotarski -
Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
Jacob K Christopher, Brian R. Bartoldson, Tal Ben-Nun, Michael Cardei, Bhavya Kailkhura, Ferdinando Fioretto -
The Impact of Inference Acceleration on Bias of LLMs
Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar -
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval
Yu Xia, Junda Wu, Sungchul Kim, Tong Yu, Ryan A. Rossi, Haoliang Wang, Julian McAuley -
Evaluating Input Feature Explanations through a Unified Diagnostic Evaluation Framework
Jingyi Sun, Pepa Atanasova, Isabelle Augenstein -
THREAD: Thinking Deeper with Recursive Spawning
Philip Schroeder, Nathaniel W. Morgan, Hongyin Luo, James R. Glass -
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander G Hauptmann, Yonatan Bisk, Yiming Yang -
Characterizing the Role of Similarity in the Property Inferences of Language Models
Juan Diego Rodriguez, Aaron Mueller, Kanishka Misra -
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
Sina Rismanchian, Yasaman Razeghi, Sameer Singh, Shayan Doroudi -
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning
Jinu Lee, Wonseok Hwang -
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
Sizhe Wang, Yongqi Tong, Hengyuan Zhang, Dawei Li, Xin Zhang, Tianlong Chen -
PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics
Daniil Larionov, Steffen Eger -
Not all Hallucinations are Good to Throw Away When it Comes to Legal Abstractive Summarization
Nihed Bendahman, Karen Pinel-Sauvagnat, Gilles Hubert, Mokhtar Boumedyen BILLAMI -
From Evidence to Belief: A Bayesian Epistemology Approach to Language Models
Minsu Kim, Sangryul Kim, James Thorne -
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation
Satya Krishna Gorti, Ilan Gofman, Zhaoyan Liu, Jiapeng Wu, Noël Vouitsis, Guangwei Yu, Jesse C. Cresswell, Rasa Hosseinzadeh -
AlgoPuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Algorithmic Multimodal Puzzles
Deepanway Ghosal, Vernon Toh, Yew Ken Chia, Soujanya Poria -
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Mardhiyah Sanni, Tassallah Abdullahi, Devendra Deepak Kayande, Emmanuel Ayodele, Naome A Etori, Michael Samwel Mollel, Moshood O. Yekini, Chibuzor Okocha, Lukman Enegi Ismaila, Folafunmi Omofoye, Boluwatife A. Adewale, Tobi Olatunji -
UFO: A UI-Focused Agent for Windows OS Interaction
Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang -
Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul Röttger, Daniel Hershcovich -
Wav2Prompt: End-to-End Speech Prompt Learning and Task-based Fine-tuning for Text-based LLMs
Keqi Deng, Guangzhi Sun, Phil Woodland -
Evaluating the Prompt Steerability of Large Language Models
Erik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Kush R. Varshney, Eitan Farchi, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu, Prasanna Sattigeri -
Style Transfer with Multi-iteration Preference Optimization
Shuai Liu, Jonathan May -
Smurfs: Multi-Agent System using Context-Efficient DFSDT for Tool Planning
Junzhi Chen, Juhao Liang, Benyou Wang -
SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning
Magdalena Wysocka, Danilo Carvalho, Oskar Wysocki, Marco Valentino, Andre Freitas -
REFFLY: Melody-Constrained Lyrics Editing Model
Songyan Zhao, Bingxuan Li, Yufei Tian, Nanyun Peng -
Analyzing and Improving Coherence of Large Language Models in Question Answering
Ivano Lauriola, Stefano Campese, Alessandro Moschitti -
Forest for the Trees: Overarching Prompting Evokes High-Level Reasoning in Large Language Models
Haoran Liao, Shaohua Hu, Zhihao Zhu, Hao HE, Yaohui Jin -
Soft Prompting for Unlearning in Large Language Models
Karuna Bhaila, Minh-Hao Van, Xintao Wu -
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su, Xuhui Zhou, Sanketh Rangreji, Anubha Kabra, Julia Mendelsohn, Faeze Brahman, Maarten Sap -
Self-Harmonized Chain of Thought
Ziqi Jin, Wei Lu -
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge
Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal -
Disentangling language change: sparse autoencoders quantify the semantic evolution of indigeneity in French
Jacob A. Matthews, Laurent Dubreuil, Imane Terhmina, Yunci Sun, Matthew Wilkens, Marten Van Schijndel -
Large Language Models Are Cross-Lingual Knowledge-Free Reasoners
Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang -
Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data
Jonas Golde, Patrick Haller, Max Ploner, Fabio Barth, Nicolaas Paul Jedema, Alan Akbik -
E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic Expressions
Hongbo Zheng, Suyuan Wang, Neeraj Gangwar, Nickvash Kani -
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
Ashish Seth, Ramaneswaran Selvakumar, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha -
Empowering Retrieval-based Conversational Recommendation with Contrasting User Preferences
Heejin Kook, Junyoung Kim, Seongmin Park, Jongwuk Lee -
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
Yijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang -
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Qizhe Shieh, Wenmeng Zhou -
Embedding derived animacy rankings offer insights into the sources of grammatical animacy
Vivian G. Li -
Handling Missing Entities in Zero-Shot Named Entity Recognition: Integrated Recall and Retrieval Augmentation
Ruichu Cai, Junhao Lu, Zhongjie Chen, Boyan Xu, Zhifeng Hao -
Prototype Conditioned Generative Replay for Continual Learning in NLP
Xi Chen, Min Zeng -
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection
Shengmin Piao, Sanghyun Park -
Self-Training Meets Consistency: Improving LLMs’ Reasoning with Consistency-Driven Rationale Evaluation
Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak -
SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters
Yan Yang, Zeguan Xiao, Xin Lu, Hongru WANG, Xuetao Wei, Hailiang Huang, Guanhua Chen, Yun Chen -
Are We Done with MMLU?
Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini -
UOREX: Towards Uncertainty-Aware Open Relation Extraction
Rebii Jamal, Mounir OUREKOUCH, Mohammed Erradi -
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Xuanjing Huang, zhongyu wei -
On Positional Bias of Faithfulness for Long-form Summarization
David Wan, Jesse Vig, Mohit Bansal, Shafiq Joty -
CAST: Corpus-Aware Self-similarity Enhanced Topic modelling
Yanan Ma, Chenghao Xiao, Chenhan Yuan, Sabine N van der Veer, Lamiece Hassan, Chenghua Lin, Goran Nenadic -
VisCGEC: Benchmarking the Visual Chinese Grammatical Error Correction
Xiaoman Wang, DAN YUAN, Xin Liu, Yike Zhao, Xiaoxiao Zhang, Xizhi Chen, Yunshi Lan -
FactEval: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language Models
Mamta Mamta, Oana Cocarascu -
Patent-CR: A Dataset for Patent Claim Revision
Lekang Jiang, Pascal A. Scherz, Stefan Goetz -
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael Littman, Stephen Bach -
Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation
Liwen Sun, James Jialun Zhao, Wenjing Han, Chenyan Xiong -
Mixture of Multimodal Adapters for Sentiment Analysis
Kezhou Chen, Shuo Wang, Huixia Ben, Shengeng Tang, Yanbin Hao -
Benchmarking Failures in Tool-Augmented Language Models
Eduardo Treviño, Hugo Contant, James Ngai, Graham Neubig, Zora Zhiruo Wang -
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru WANG, Xuanli He, Kam-Fai Wong, Pasquale Minervini -
It Is Not Only the Negative that Deserves Attention! Understanding, Generation & Evaluation of (Positive) Moderation
Iman Jundi, Eva Maria Vecchi, Carlotta Quensel, Neele Falk, Gabriella Lapesa -
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian -
Measuring memorization in language models via probabilistic extraction
Jamie Hayes, Marika Swanberg, Ilia Shumailov, Itay Yona, Harsh Chaudhari, A. Feder Cooper, Christopher A. Choquette-Choo, Katherine Lee, Milad Nasr -
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh, Prasoon Varshney, Makesh Narsimhan Sreedhar, Aishwarya Padmakumar, Traian Rebedea, Jibin Rajan Varghese, Christopher Parisien -
FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing
James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu -
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
Moran Yanuka, Assaf Ben-Kish, Yonatan Bitton, Idan Szpektor, Raja Giryes -
Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory
Haoran Li, Wei Fan, Yulin Chen, Cheng Jiayang, Tianshu Chu, Xuebing Zhou, Peizhao Hu, Yangqiu Song -
NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals
Neha Srikanth, Rachel Rudinger -
Evaluating and Mitigating Object Hallucination in Large Vision-Language Models: Can They Still See Removed Objects?
Yixiao He, Haifeng Sun, Pengfei Ren, Jingyu Wang, Huazheng Wang, Qi Qi, Zirui Zhuang, Jing Wang -
PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from related Example Banks
Soumya Suvra Ghosal, Soumyabrata Pal, Koyel Mukherjee, Dinesh Manocha -
From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection
Rupeng Zhang, Haowei Wang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang -
The Russian-focused embedders’ exploration: ruMTEB benchmark and Russian embedding model design
Artem Snegirev, Maria Tikhonova, Maksimova Anna, Alena Fenogenova, Aleksandr Abramov -
SANDWiCH: Semantical Analysis of Neighbours for Disambiguating Words in Context ad Hoc
Daniel Guzman Olivares, Lara Quijano, Federico Liberatore -
Self-calibration for Language Model Quantization and Pruning
Miles Williams, George Chrysostomou, Nikolaos Aletras -
SafetyQuizzer: Timely and Dynamic Evaluation on the Safety of LLMs
Zhichao Shi, Shaoling Jing, Yi Cheng, Hao Zhang, Yuanzhuo Wang, Jie Zhang, Huawei Shen, Xueqi Cheng -
Temporal-Aware Soft Prompt Tuning for Automatic Text Dating
Hai Wang, Yuzhi Liang, Han Ren -
LiPO: Listwise Preference Optimization through Learning-to-Rank
Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J Liu, Xuanhui Wang -
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
Abhilasha Ravichander, Jillian Fisher, Taylor Sorensen, Ximing Lu, Maria Antoniak, Bill Yuchen Lin, Niloofar Mireshghallah, Chandra Bhagavatula, Yejin Choi -
Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement
Qianyue Wang, Jinwu Hu, Zhengping Li, Yufeng Wang, daiyuan li, Yu Hu, Mingkui Tan -
ReIFE: Re-evaluating Instruction-Following Evaluation
Yixin Liu, Kejian Shi, Alexander Fabbri, Yilun Zhao, PeiFeng Wang, Chien-Sheng Wu, Shafiq Joty, Arman Cohan -
Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation
Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li -
HISTOIRESMORALES: A French Dataset for Assessing Moral Alignment
Thibaud Leteno, Irina Proskurina, Antoine Gourru, Julien Velcin, Charlotte Laclau, Guillaume Metzler, Christophe Gravier -
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
Junehyoung Kwon, MiHyeon Kim, Eunju Lee, Juhwan Choi, YoungBin Kim -
The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning
Longju Bai, Angana Borah, Oana Ignat, Rada Mihalcea -
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang, Akshara Prabhakar, Sidharth Dhawan, Yixin Mao, Huan Wang, Silvio Savarese, Caiming Xiong, Philippe Laban, Chien-Sheng Wu -
DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization
Yasir Khan, Xinlei Wu, Sangpil Youm, Justin Ho, Aryaan Mehboob Shaikh, Jairo Garciga, Rohan Sharma, Bonnie J Dorr -
Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis
Pamela D Riviere, Anne L. Beatty-Martínez, Sean Trott -
Conformalized Answer Set Prediction for Knowledge Graph Embedding
Yuqicheng Zhu, Nico Potyka, Jiarong Pan, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab -
A Multi-modal Large Language Model with Graph-of-Thought for Effective Recommendation
Zixuan Yi, Iadh Ounis -
Meta-Cultural Competence: Climbing the Right Hill of Cultural Awareness
Sougata Saha, Saurabh Kumar Pandey, Monojit Choudhury -
ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation
Qinzhuo Wu, Wei Liu, Jian Luan, Bin Wang -
On the Role of Speech Data in Reducing Toxicity Detection Bias
Samuel Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà -
CodeSCM: Causal Analysis for Multi-Modal Code Generation
Mukur Gupta, Noopur Bhatt, Suman Jana -
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
Elita Lobo, Chirag Agarwal, Himabindu Lakkaraju -
ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors
Qinchan Li, Sophie Hao -
Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data
Haonan Wang, Minbin Huang, Runhui Huang, Lanqing HONG, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi -
Fighting Spurious Correlations in Text Classification via a Causal Learning Perspective
Yuqing Zhou, Ziwei Zhu -
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation
Jaechang Kim, Jinmin Goh, Inseok Hwang, Jaewoong Cho, Jungseul Ok -
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models
Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao -
Reliability of Topic Modeling
Kayla Schroeder, Zach Wood-Doughty -
TRANSIENTTABLES: Evaluating LLMs’ Reasoning on Temporally Evolving Semi-structured Tables
Abhilash Shankarampeta, Harsh Mahajan, Tushar Kataria, Dan Roth, Vivek Gupta -
On the Analysis and Distillation of Emergent Outlier Properties in Pre-trained Language Models
Tianyang Zhao, Kunwar Yashraj Singh, srikar appalaraju, Peng Tang, Ying Nian Wu, Li Erran Li -
Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization
Zilu Tang, Rajen Chatterjee, Sarthak Garg -
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, Sung Ju Hwang -
Label Drop for Multi-Aspect Relation Modeling in Universal Information Extraction
lu Yang, Jiajia Li, En Ci, Lefei Zhang, Zuchao Li, Ping Wang -
Fine-grained Fallacy Detection with Human Label Variation
Alan Ramponi, Agnese Daffara, Sara Tonelli -
Soft Language Prompts for Language Transfer
Ivan Vykopal, Simon Ostermann, Marian Simko -
Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph Representation
Hoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui -
Soft Syntactic Reinforcement for Neural Event Extraction
Anran Hao, Jian Su, Shuo Sun, Teo Yong Sen -
UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models
Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić -
Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning Attack
Cheng Wang, Yiwei Wang, Yujun Cai, Bryan Hooi -
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao, Xinyu Hu, Li Lin, Xiaojun Wan -
World Models with Hints of Large Language Models for Goal Achieving
Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu -
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma, Zekun Wang, Joyce Chai -
Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment
Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David R Mortensen -
Diversity Helps Jailbreak Large Language Models
Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao -
NAT: Enhancing Agent Tuning with Negative Samples
Renxi Wang, Xudong Han, Yixuan Zhang, Timothy Baldwin, Haonan Li -
AutoEval-ToD: Automated Evaluation of Task-oriented Dialog Systems
Arihant Jain, Purav Aggarwal, Rishav Sahay, Chaosheng Dong, Anoop Saladi -
Learning to Summarize from LLM-generated Feedback
Hwanjun Song, Taewon Yun, Yuho Lee, Jihwan Oh, Gihun Lee, Jason Cai, Hang Su -
FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation
Garrett Tanzer -
Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models
Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty -
Prompt Compression for Large Language Models: A Survey
Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier -
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
Juan Pablo Munoz, Jinjie Yuan, Nilesh Jain -
Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation
Tianyu Liu, Jirui Qi, Paul He, Arianna Bisazza, Mrinmaya Sachan, Ryan Cotterell -
Has this Fact been Edited? Detecting Knowledge Edits in Language Models
Paul Youssef, Zhixue Zhao, Christin Seifert, Jörg Schlötterer -
DPL: Diverse Preference Learning Without A Reference Model
Abhijnan Nath, Andrey Volozin, Saumajit Saha, Albert Aristotle Nanda, Galina Grunin, Rahul Bhotika, Nikhil Krishnaswamy -
LegalViz: Legal Text Visualization by Text To Diagram Generation
Eri Onami, Taiki Miyanishi, Koki Maeda, Shuhei Kurita -
Dynamic Fisher-weighted Model Merging via Bayesian Optimization
Sanwoo Lee, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Yunfang Wu -
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
Thom Lake, Eunsol Choi, Greg Durrett -
InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao, Zhen Ge, sujay sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, qingjun cui, Trishul Chilimbi -
WebQuality: A Large-scale Multi-modal Web Page Quality Assessment Dataset with Multiple Scoring Dimensions
Tao Zhang, Yige Wang, ZhuHangyu, Li Xin, CHEN XIANG, Tian Hua Zhou, Jin Ma -
A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization
Haoxin Liu, Chenghao Liu, B. Aditya Prakash -
Aligning Sentence Simplification with ESL Learner’s Proficiency for Language Acquisition
Guanlin Li, Yuki Arase, Noel Crespi -
Revisiting Early Detection of Sexual Predators via Turn-level Optimization
JinMyeong AN, Sangwon Ryu, Heejin Do, Yunsu Kim, Jungseul Ok, Gary Lee -
Knowledge Graph-Guided Retrieval Augmented Generation
Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, Wei Hu -
Towards Rationality in Language and Multimodal Agents: A Survey
Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Yuan Yuan, Zhuoqun Hao, Xinyi Bai, Weijie J Su, Camillo Jose Taylor, Tanwi Mallick -
A Bayesian Optimization Approach to Machine Translation Reranking
Julius Cheng, Maike Züfle, Vilém Zouhar, Andreas Vlachos -
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee -
Using Text-Based Causal Inference to Disentangle Factors Influencing Online Review Ratings
Linsen Li, Aron Culotta, Nicholas Mattei -
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang -
Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks?
Xuan He, Da Yin, Nanyun Peng -
MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation
Junqing He, Liang Zhu, Rui Wang, Xi Wang, Gholamreza Haffari, Jiaxing Zhang -
EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics
Chenwei Wan, Matthieu Labeau, Chloé Clavel -
HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong -
ConMeC: A Dataset for Metonymy Resolution with Common Nouns
Saptarshi Ghosh, Tianyu Jiang -
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
Lingxiao Luo, Bingda Tang, Xuanzhong Chen, Rong Han, Ting Chen -
Decoding Speculative Decoding
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman -
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
Nikhil Sharma, Kenton Murray, Ziang Xiao -
LongLeader: A Comprehensive Leaderboard for Large Language Models in Long-context Scenarios
Pei Chen, Hongye Jin, Cheng-Che Lee, Rulin Shao, Jingfeng Yang, Mingyu Zhao, Zhaoyu Zhang, Qin Lu, Kaiwen Men, Ning Xie, Huasheng Li, Bing Yin, Han Li, Lingyun Wang -
CausalEval: Towards Better Causal Reasoning in Language Models
Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan -
CAVE: Controllable Authorship Verification Explanations
Sahana Ramnath, Kartik Pandey, Elizabeth Boschee, Xiang Ren -
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li, Xilin Jiang, Cong Han, Nima Mesgarani -
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models
Xin Guo, Haotian Xia, Zhaowei Liu, Hanyang Cao, Zhi Yang, Zhiqiang Liu, Sizhe Wang, Jinyi Niu, Chuqi Wang, Yanhui Wang, Xiaolong Liang, Xiaoming Huang, Bing Zhu, zhongyu wei, Yun Chen, Weining Shen, Liwen Zhang -
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa -
Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning
Rujing Yao, Yang Wu, Chenghao Wang, Jingwei Xiong, Fang Wang, Xiaozhong Liu -
Waste Not, Want Not; Recycled Gumbel Noise Improves Consistency in Natural Language Generation
Damien De Mijolla, Hannan Saddiq, Kim Moore -
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
Menglong Cui, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang -
Sparser Mixture-of-Adapters with Cross-Layer Generalization
Ziyue Li, Tianyi Zhou -
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
ran zhang, Wei Zhao, Steffen Eger -
Script-Agnosticism and its Impact on Language Identification for Dravidian Languages
Milind Agarwal, Joshua Otten, Antonios Anastasopoulos -
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky, William Rudman, Vedant Palit, Carsten Eickhoff, Ritambhara Singh -
BEMEAE: Moving Beyond Exact Span Match for Event Argument Extraction
Enfa Fane, Md Nayem Uddin, Oghenevovwe Ikumariegbe, Daniyal Kashif, Eduardo Blanco, Steven Corman -
uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes
Abdul Waheed, Karima Kadaoui, Bhiksha Raj, Muhammad Abdul-Mageed -
LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue
Sangyeop Kim, Sohhyung Park, Jaewon Jung, Jinseok Kim, Sungzoon Cho -
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
Rishabh Maheshwary, Vikas Yadav, Hoang H Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan -
Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque
Ander Corral, Ixak Sarasua Antero, Xabier Saralegi -
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks
Justin Zhao, Flor Miriam Plaza-del-Arco, Amanda Cercas Curry -
On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena
Tarek Naous, Wei Xu -
Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models
Sehun Lee, Kang-wook Kim, Gunhee Kim -
Language Models are Crossword Solvers
Soumadeep Saha, Sutanoya Chakraborty, Saptarshi Saha, Utpal Garain -
A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Case Study of Supplementary Adverbs
Zhu Liu, Cunliang Kong, Ying Liu, Maosong Sun -
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage
Kaige Xie, Philippe Laban, Prafulla Kumar Choubey, Caiming Xiong, Chien-Sheng Wu -
Reading between the Lines: Can LLMs Identify Cross-Cultural Communication Gaps?
Sougata Saha, Saurabh Kumar Pandey, Harshit Gupta, Monojit Choudhury -
Enhancing Discriminative Representation in Similar Relation Clusters for Few-Shot Continual Relation Extraction
Anh Duc Le, Nam Le Hai, Thanh Xuan Nguyen, Linh Ngo Van, Nguyen Thi Ngoc Diep, Sang Dinh, Thien Huu Nguyen -
My LLM might Mimic AAE - But When Should It?
Sandra Camille Sandoval, Christabel Acquaye, Kwesi Adu Cobbina, Mohammad Nayeem Teli, Hal Daumé III -
Single Ground Truth Is Not Enough: Adding Flexibility to Aspect-Based Sentiment Analysis Evaluation
Soyoung Yang, Hojun Cho, Jiyoung Lee, Sohee Yoon, Edward Choi, Jaegul Choo, Won Ik Cho -
Differentially Private Learning Needs Better Model Initialization and Self-Distillation
Ivoline C. Ngong, Joseph Near, Niloofar Mireshghallah -
Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice
Sunny Rai, Khushang Zaveri, Shreya Havaldar, Soumna Nema, Lyle Ungar, Sharath Chandra Guntuku -
Reverse Thinking Makes LLMs Stronger Reasoners
Justin Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister -
One fish, two fish, but not the whole sea: Alignment reduces language models’ conceptual diversity
Sonia Krishna Murthy, Tomer Ullman, Jennifer Hu -
Bayelemabaga: Creating Resources for Bambara NLP
Allahsera Auguste Tapo, Kevin Assogba, Christopher M Homan, M. Mustafa Rafique, Marcos Zampieri -
COVE: COntext and VEracity prediction for out-of-context images
Jonathan Tonglet, Gabriel Thiem, Iryna Gurevych -
Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph Completion
Muzhi Li, Cehao Yang, Chengjin Xu, Xuhui Jiang, Yiyan Qi, Jian Guo, Ho-fung Leung, Irwin King -
mHumanEval - A Multilingual Benchmark to Evaluate Large Language Models for Code Generation
Md Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri -
KMMLU: Measuring Massive Multitask Language Understanding in Korean
Guijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella Biderman -
Making Language Models Robust Against Negation
MohammadHossein Rezaei, Eduardo Blanco -
Harnessing and Evaluating the Intrinsic Extrapolation Ability of Large Language Models for Vehicle Trajectory Prediction
Jiawei Liu, yanjiao liu, Xun Gong, Tingting Wang, Hong Chen, Yunfeng hu -
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, Kezhi Mao -
Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools
Yilun Hao, Yongchao Chen, Yang Zhang, Chuchu Fan -
Analyzing Memorization in Large Language Models through the Lens of Model Attribution
Tarun Ram Menta, Susmit Agrawal, Chirag Agarwal -
AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMs
Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D Phan, Le Chen, Mihai Capotă, Theodore L. Willke, Nesreen K. Ahmed, Ali Jannesari -
How to Make LLMs Forget: On Reversing In-Context Knowledge Edits
Paul Youssef, Zhixue Zhao, Jörg Schlötterer, Christin Seifert -
TCProF: Time-Complexity Prediction SSL Framework
Joonghyuk Hahn, Hyeseon Ahn, Jungin Kim, Soohan Lim, Yo-Sub Han -
Language Models can Categorize System Inputs for Performance Analysis
Dominic Sobhani, Ruiqi Zhong, Edison Marrese-Taylor, Keisuke Sakaguchi, Yutaka Matsuo -
Efficient and Effective Prompt Tuning via Prompt Decomposition and Compressed Outer Product
Pengxiang Lan, Haoyu Xu, Enneng Yang, Yuliang Liang, Guibing Guo, Jianzhe Zhao, Xingwei Wang -
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang -
KODIS: A Multicultural Dispute Resolution Dialogue Corpus
James Anthony Hale, Sushrita Rakshit, Kushal Chawla, Jeanne M Brett, Jonathan Gratch -
Commonality and Individuality! Integrating Humor Commonality with Speaker Individuality for Humor Recognition
Haohao Zhu, Xiaokun Zhang, Zeyuan Zeng, Junyu Lu, Zewen Bai, Liang Yang, Hongfei Lin -
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, David Ifeoluwa Adelani, Ibrahim Said Ahmad, Saminu Mohammad Aliyu, Paul Röttger, Abigail Oppong, Andiswa Bukula, Chiamaka Ijeoma Chukwuneke, Ebrahim Chekol Jibril, Elyas Abdi ISMAIL, Esubalew Alemneh, Hagos Tesfahun Gebremichael, Lukman Jibril Aliyu, Meriem Beloucif, Oumaima Hourrane, Rooweither Mabuya, Salomey Osei, Samuel Rutunda, Tadesse Destaw Belay, Tadesse Kebede Guge, Tesfa Tegegne Asfaw, Lilian Diana Awuor Wanzare, Nelson Odhiambo Onyango, Seid Muhie Yimam, Nedjma Ousidhoum -
The Plagiarism Singularity Conjecture
Sriram Ranga, Rui Mao, Erik Cambria, Anupam Chattopadhyay -
CoRAC: Integrating Selective API Document Retrieval with Question Semantic Intent for Code Question Answering
YunSeok Choi, CheolWon Na, Jee-Hyong Lee -
GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing
Jinhao Duan, Xinyu Zhao, Zhuoxuan Zhang, Eunhye Grace Ko, Lily Boddy, Chenan Wang, Tianhao Li, Alexander Rasgon, Junyuan Hong, Min Kyung Lee, Chenxi Yuan, Qi Long, Ying Ding, Tianlong Chen, Kaidi Xu -
B⁴: A Black-Box Scrubbing Attack on LLM Watermarks
Baizhou Huang, Xiao Pu, Xiaojun Wan -
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
Tianze Luo, Xingchen Miao, Wenbo Duan -
Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, Libo Qin, Xiaoming Shi, Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che -
Vision-Language Models Can Self-Improve Reasoning via Reflection
Kanzhi Cheng, Li YanTao, Fangzhi Xu, Jianbing Zhang, Hao Zhou, Yang Liu -
Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference
Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum -
Sharpness-Aware Minimization for Topic Models with High-Quality Document Representations
Tung Nguyen, Tue Le, Hoang Tran Vuong, Quang Duc Nguyen, Duc Anh Nguyen, Linh Ngo Van, Sang Dinh, Thien Huu Nguyen -
DCE-LLM: Dead Code Elimination with Large Language Models
Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu -
Active Few-Shot Learning for Text Classification
Saeed Ahmadnia, Arash Yousefi Jordehi, Mahsa Hosseini Khasheh Heyran, Seyed Abolghasem Mirroshandel, Owen Rambow, Cornelia Caragea -
ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis
Zezhong WANG, Xingshan Zeng, Weiwen Liu, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong -
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini -
Mitigating Hallucinations in Multi-modal Large Language Models via Image Token Attention-Guided Decoding
Xinhao Xu, Hui Chen, Mengyao Lyu, Sicheng Zhao, Yizhe Xiong, Zijia Lin, Jungong Han, Guiguang Ding -
Cascading Large Language Models for Salient Event Graph Generation
Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He -
Fingerspelling within Sign Language Translation
Garrett Tanzer -
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
Nishant Subramani, Jason Eisner, Justin Svegliato, Benjamin Van Durme, Yu Su, Sam Thomson -
Few-shot Personalization of LLMs with Mis-aligned Responses
Jaehyung Kim, Yiming Yang -
Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement
Suchae Jeong, Inseong Choi, Youngsik Yun, Jihie Kim -
GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models
Harsh Kohli, Sachin Kumar, Huan Sun -
Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration
Ang Li, Jingqian Zhao, Bin Liang, Lin Gui, Hui Wang, Xi Zeng, Xingwei Liang, Kam-Fai Wong, Ruifeng Xu -
Little Giants: Synthesizing High-Quality Embedding Data at Scale
Haonan Chen, Liang Wang, Nan Yang, Yutao Zhu, Ziliang Zhao, Furu Wei, Zhicheng Dou -
CluSanT: Differentially Private and Semantically Coherent Text Sanitization
Ahmed Musa Awon, Yun Lu, Shera Potka, Alex Thomo -
Towards Operationalizing Right to Data Protection
Abhinav Java, Simra Shahid, Chirag Agarwal -
In-Context Learning (and Unlearning) of Length Biases
Stephanie Schoch, Yangfeng Ji -
SLM-Mod: Small Language Models Surpass LLMs at Content Moderation
Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha -
NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models
Abhinav Sukumar Rao, Akhila Yerukola, Vishwa Shah, Katharina Reinecke, Maarten Sap -
Hybrid Graphs for Table-and-Text based Question Answering using LLMs
Ankush Agarwal, Chaitanya Devaguptapu, Ganesh S -
Entropy-Based Decoding for Retrieval-Augmented Large Language Models
Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King -
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam, Marco Gaido, Sara Papi, Luisa Bentivogli, Barry Haddow -
A Data-Driven Method for Analyzing and Quantifying Lyrics-Dance Motion Relationships
Kento Watanabe, Masataka Goto -
Is In-Context Learning a Type of Error-Driven Learning? Evidence from the Inverse Frequency Effect in Structural Priming
Zhenghao Zhou, Robert Frank, R. Thomas McCoy -
Exploiting Edited Large Language Models as General Scientific Optimizers
Qitan Lv, Tianyu Liu, Hong Wang -
REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Nouha Dziri, Dan Jurafsky, Maarten Sap -
Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5
Arkadeep Acharya, Rudra Murthy, vishwajeet kumar, Jaydeep Sen -
MoDification: Mixture of Depths Made Easy
Chen Zhang, Meizhi Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, Chengqiang Lu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang, Dawei Song -
Revealing the Barriers of Language Agents in Planning
Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang, Yikai Zhang, Lei Li, Yanghua Xiao -
PeerQA: A Scientific Question Answering Dataset from Peer Reviews
Tim Baumgärtner, Ted Briscoe, Iryna Gurevych -
Reversed Attention: On The Gradient Descent Of Attention Layers In GPT
Shahar Katz, Lior Wolf -
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems
Zifeng Zhu, Mengzhao Jia, Zhihan Zhang, Lang Li, Meng Jiang -
Self-Pluralising Culture Alignment for Large Language Models
Shaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong -
Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator
Chengyuan Liu, Shihang Wang, Lizhi Qing, Jun Lin, Ji Zhang, Fei Wu, Kun Kuang -
Reward-Guided Tree Search for Inference Time Alignment of Large Language Models
Chia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria -
An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues
Rena Wei Gao, Xuetong Wu, Carsten Roever, Jing Wu, Long Lv, Jingxuan Wu, Jey Han Lau -
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
Yuchen Zhuang, Jingfeng Yang, Haoming Jiang, Xin Liu, Kewei Cheng, Sanket Lokegaonkar, Yifan Gao, Qing Ping, Tianyi Liu, Binxuan Huang, Zheng Li, Zhengyang Wang, Pei Chen, Ruijie Wang, Rongzhi Zhang, Nasser Zalmout, Priyanka Nigam, Bing Yin, Chao Zhang -
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
Yang Ouyang, Hengrui Gu, Shuhang Lin, Wenyue Hua, Jie Peng, Bhavya Kailkhura, Meijun Gao, Tianlong Chen, Kaixiong Zhou -
Towards Efficient and Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss
Fu-An Chao, Berlin Chen -
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Tongxuan Liu, Wenjiang Xu, Weizhe Huang, Yuting Zeng, Jiaxing Wang, Xingyu Wang, Hailong Yang, Jing Li -
Automatically Discovering How Misogyny is Framed on Social Media
Rakshitha Rao Ailneni, Sanda M. Harabagiu -
Leveraging LLM For Synchronizing Information Across Multilingual Tables
Siddharth Khincha, Tushar Kataria, Ankita Anand, Dan Roth, Vivek Gupta -
Rethinking Word Similarity: Semantic Similarity through Classification Confusion
Kaitlyn Zhou, Haishan Gao, Sarah Li Chen, Dan Edelstein, Dan Jurafsky, Chen Shani -
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang -
Balancing Forget Quality and Model Utility: A Reverse KL-Divergence Knowledge Distillation Approach for Better Unlearning in LLMs
Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin -
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
Dehai Min, Zhiyang Xu, Guilin Qi, Lifu Huang, Chenyu You -
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy
Mian Zhang, Xianjun Yang, Xinlu Zhang, Travis Labrum, Jamie C. Chiu, Shaun M. Eack, Fei Fang, William Yang Wang, Zhiyu Chen -
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua -
MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
Srija Mukhopadhyay, Abhishek Rajgaria, Prerana Khatiwada, Manish Shrivastava, Dan Roth, Vivek Gupta -
Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment
Sangwon Yu, Jongyoon Song, Bongkyu Hwang, Hoyoung Kang, Sooah Cho, Junhwa Choi, Seongho Joe, Taehee Lee, Youngjune Gwon, Sungroh Yoon -
Analyzing the Inner Workings of Transformers in Compositional Generalization
Ryoma Kumon, Hitomi Yanaka -
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Tingyu Song, Guo Gan, Mingsheng Shang, Yilun Zhao -
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu, Yao Chen, Zeyi Wen, Weiguo Liu, Bingsheng He -
JRE-L: Journalist, Reader, and Editor LLMs in the Loop for Science Journalism for the General Audience
Gongyao Jiang, Xinran Shi, Qiong Luo -
Large Language Models for Persian ↔ English Idiom Translation
Sara Rezaeimanesh, Faezeh Hosseini, Yadollah Yaghoobzadeh -
K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning
Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei -
LLM-Human Pipeline for Cultural Grounding of Conversations
Rajkumar Pujari, Dan Goldwasser -
SALAD: Improving Robustness and Generalization through Contrastive Learning with Structure-Aware and LLM-Driven Augmented Data
Suyoung Bae, YunSeok Choi, Hyojun Kim, Jee-Hyong Lee -
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell, Giuseppe Attanasio, Ioana Baldini, Miruna Clinciu, Jordan Clive, Pieter Delobelle, Manan Dey, Sil Hamilton, Timm Dill, Jad Doughman, Ritam Dutt, Avijit Ghosh, Jessica Zosa Forde, Carolin Holtermann, Lucie-Aimée Kaffee, Tanmay Laud, Anne Lauscher, Roberto L Lopez-Davila, Maraim Masoud, Nikita Nangia, Anaelia Ovalle, Giada Pistilli, Dragomir Radev, Beatrice Savoldi, Vipul Raheja, Jeremy Qin, Esther Ploeger, Arjun Subramonian, Kaustubh Dhole, Kaiser Sun, Amirbek Djanibekov, Jonibek Mansurov, Kayo Yin, Emilio Villa Cueva, Sagnik Mukherjee, Jerry Huang, Xudong Shen, Jay Gala, Hamdan Al-Ali, Tair Djanibekov, Nurdaulet Mukhituly, Shangrui Nie, Shanya Sharma, Karolina Stanczak, Eliza Szczechla, Tiago Timponi Torrent, Deepak Tunuguntla, Marcelo Viridiano, Oskar van der Wal, Adina Yakefu, Aurélie Névéol, Mike Zhang, Sydney Zink, Zeerak Talat -
Learning vs Retrieval: The Role of In-Context Examples in Regression with Large Language Models
Aliakbar Nafar, K. Brent Venable, Parisa Kordjamshidi -
Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic Parsing
Mayank Kothyari, Sunita Sarawagi, Soumen Chakrabarti, Gaurav Arora, Srujana Merugu -
KMI: A Dataset of Korean Motivational Interviewing Dialogues for Psychotherapy
Hyunjong Kim, Suyeon Lee, Yeongjae Cho, Eunseo Ryu, Yohan Jo, Suran Seong, Sungzoon Cho -
MILU: A Multi-task Indic Language Understanding Benchmark
Sshubam Verma, Mohammed Safi Ur Rahman Khan, vishwajeet kumar, Rudra Murthy, Jaydeep Sen -
SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression
Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang -
SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search
Hanwen Du, Bo Peng, Xia Ning -
Navigating the Cultural Kaleidoscope: A Hitchhiker’s Guide to Sensitivity in Large Language Models
Somnath Banerjee, Sayan Layek, Hari Shrawgi, Rajarshi Mandal, Avik Halder, Shanu Kumar, Sagnik Basu, Parag Agrawal, Rima Hazra, Animesh Mukherjee -
Benchmarking Distributional Alignment of Large Language Models
Nicole Meister, Carlos Guestrin, Tatsunori Hashimoto -
Eliciting Critical Reasoning in Retrieval-Augmented Generation via Contrastive Explanations
Leonardo Ranaldi, Marco Valentino, Andre Freitas -
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs
Jiancheng Dong, Lei Jiang, Wei Jin, Lu Cheng -
GLiREL - Generalist Model for Zero-Shot Relation Extraction
Jack Boylan, Chris Hokamp, Demian Gholipour Ghalandari -
C²: Scalable Auto-Feedback for LLM-based Chart Generation
Woosung Koh, Jang Han Yoon, MinHyung Lee, Youngjin Song, Jaegwan Cho, Jaehyun Kang, Taehyeon Kim, Se-Young Yun, Youngjae Yu, Bongshin Lee -
Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding
Junyi Ye, Ankan Dash, Wenpeng Yin, Guiling Wang -
Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction
Liping Liu, Chunhong Zhang, Likang Wu, Chuang Zhao, Zheng Hu, Ming He, Jianping Fan -
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao, Aditya Parashar, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Zhichao Yang, hong yu -
AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar, Tom Kocmi, Mrinmaya Sachan -
Investigating Hallucinations in Simultaneous Machine Translation: Knowledge Distillation Solution and Components Analysis
Donglei Yu, Xiaomian Kang, Yuchen Liu, Feifei Zhai, Nanchang Cheng, Yu Zhou, Chengqing Zong -
Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss
Kunal Dahiya, Diego Ortego, David Jimenez-Cabello -
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu -
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An, Shiyue Zhang, Mark Dredze -
RAP: A Metric for Balancing Repetition and Performance in Open-Source Large Language Models
Donghao Huang, Thanh-Son Nguyen, Fiona Liausvia, Zhaoxia WANG -
Learning to Substitute Words with Model-based Score Ranking
Hongye Liu, Ricardo Henao -
IMRRF: Integrating Multi-Source Retrieval and Redundancy Filtering for LLM-based Fake News Detection
Dayang Li, Fanxiao Li, Bingbing Song, Li Tang, Wei Zhou -
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Saurabh Kumar Pandey, Sachin Vashistha, DEBRUP DAS, Somak Aditya, Monojit Choudhury -
Goal-Conditioned DPO: Prioritizing Safety in Misaligned Instructions
Joo Bon Maeng, Seongmin Lee, Seokin Seo, Kee-Eung Kim -
Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders
Kshitish Ghate, Isaac Slaughter, Kyra Wilson, Mona T. Diab, Aylin Caliskan -
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Kan Ren, Dongsheng Li, Deqing Yang -
A Logical Fallacy-Informed Framework for Argument Generation
Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings -
tRAG: Term-level Retrieval-Augmented Generation for Domain-Adaptive Retrieval
Dohyeon Lee, Jongyoon Kim, Jihyuk Kim, seung-won hwang, Joonsuk Park -
Evaluating Evidence Attribution in Generated Fact Checking Explanations
Rui Xing, Timothy Baldwin, Jey Han Lau -
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Phillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko -
Evaluating Morphological Compositional Generalization in Large Language Models
Mete Ismayilzada, Defne Circi, Jonne Sälevä, Hale Sirin, Abdullatif Köksal, Bhuwan Dhingra, Antoine Bosselut, Duygu Ataman, Lonneke van der Plas -
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith -
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, Maosong Sun -
CogLM: Tracking Cognitive Development of Large Language Models
Xinglin Wang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Boyuan Pan, Heda Wang, Yao Hu, Kan Li -
Automatic Input Rewriting Improves Translation with Large Language Models
Dayeon Ki, Marine Carpuat -
Typographic Attacks in a Multi-Image Setting
Xiaomeng Wang, Zhengyu Zhao, Martha Larson -
AnaScore: Understanding Semantic Parallelism in Proportional Analogies
Liyan Wang, Haotong Wang, Yves Lepage -
ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, Santu Rana -
Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts
Tingchen Fu, Yupeng Hou, Julian McAuley, Rui Yan -
Cross-Lingual and Cross-Cultural Variation in Image Descriptions
Uri Berger, Edoardo Ponti -
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen -
Evaluating Defeasible Reasoning in LLMs with DEFREASING
Emily Allaway, Kathleen McKeown -
Generating Complex Question Decompositions in the Face of Distribution Shifts
Kelvin Han, Claire Gardent -
Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation
Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu -
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation
Yanzhou Pan, Huawei Lin, Yide Ran, Jiamin Chen, Xiaodong Yu, Weijie Zhao, Denghui Zhang, Zhaozhuo Xu -
Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection
Koji Inoue, Divesh Lala, Gabriel Skantze, Tatsuya Kawahara -
Grounding Fallacies Misrepresenting Scientific Publications in Evidence
Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych -
Towards a Perspectivist Turn in Argument Quality Assessment
Julia Romberg, Maximilian Maurer, Henning Wachsmuth, Gabriella Lapesa -
EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs
Sam Lin, Wenyue Hua, Zhenting Wang, Mingyu Jin, Lizhou Fan, Yongfeng Zhang -
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Michael Toker, Ido Galil, Hadas Orgad, Rinon Gal, Yoad Tewel, Gal Chechik, Yonatan Belinkov -
Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge
Li Zhou, Taelin Karidi, Wanlong Liu, Nicolas Garneau, Yong Cao, Wenyu Chen, Haizhou Li, Daniel Hershcovich -
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu, Fei Wang, Sheng Zhang, Hoifung Poon, Muhao Chen -
Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction
Wei Li, Wen Luo, Guangyue Peng, Houfeng Wang -
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng -
Parameter-free and Accessible Prompt Learning to Enhance Adversarial Robustness for Pre-trained Vision-Language Models
Xingran Zhou, Kun Yang, Changtao Miao, Bingyu Hu, Zhuoer Xu, shiwen cui, Changhua Meng, Dan Hong -
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
Wentao Ge, Shunian Chen, Hardy Chen, Nuo Chen, Junying Chen, Zhihong Chen, Wenya Xie, Shuo Yan, Chenghao Zhu, Ziyue Lin, Dingjie Song, Xidong Wang, Anningzhe Gao, Zhang Zhiyi, Jianquan Li, Xiang Wan, Benyou Wang -
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, WANG YUTONG, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, Anirban Das, Ashmari Pramodya, Aulia Adila, Bryan Wilie, Candy Olivia Mawalim, CHENG Ching Lam, Daud Abolade, Emmanuele Chersoni, Enrico Santus, Fariz Ikhwantri, Garry Kuwanto, Hanyang Zhao, Haryo Akbarianto Wibowo, Holy Lovenia, Jan Christian Blaise Cruz, Jan Wira Gotama Putra, Junho Myung, Lucky Susanto, Maria Angelica Riera Machin, Marina Zhukova, Michael Anugraha, Muhammad Farid Adilazuarda, Natasha Christabelle Santosa, Peerat Limkonchotiwat, Raj Dabre, Rio Alexander Audino, Samuel Cahyawijaya, Shi-Xiong Zhang, Stephanie Yulia Salim, Yi Zhou, Yinxuan Gui, David Ifeoluwa Adelani, En-Shiun Annie Lee, Shogo Okada, Ayu Purwarianti, Alham Fikri Aji, Taro Watanabe, Derry Tanti Wijaya, Alice Oh, Chong-Wah Ngo -
Is Your LLM Outdated? A Deep Look at Temporal Generalization
Chenghao Zhu, Nuo Chen, Yufei Gao, Yunyi Zhang, Prayag Tiwari, Benyou Wang -
ProSE: Diffusion Priors for Speech Enhancement
Sonal Kumar, Sreyan Ghosh, Utkarsh Tyagi, Anton Jeran Ratnarajah, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha -
Where is the answer? An empirical study of positional bias for parametric knowledge extraction in language model
Kuniaki Saito, Chen-Yu Lee, Kihyuk Sohn, Yoshitaka Ushiku -
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
Daniil Moskovskiy, Nikita Sushko, Sergey Pletenev, Elena Tutubalina, Alexander Panchenko -
AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising
Peinan Zhang, Yusuke Sakai, Masato Mita, Hiroki Ouchi, Taro Watanabe -
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong LI, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou -
Can Large Language Models Invent Algorithms to Improve Themselves?
Yoichi Ishibashi, Taro Yano, Masafumi Oyamada -
Generating Diverse Hypotheses for Inductive Reasoning
Kang-il Lee, Hyukhun Koh, Dongryeol Lee, Seunghyun Yoon, Minsung Kim, Kyomin Jung -
LLaSA: Large Language and Structured Data Assistant
Yao Xu, Shizhu He, Jiabei Chen, ZengXiangrong, Bingning Wang, Guang Liu, Jun Zhao, Kang Liu -
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?
Kenza Benkirane, Jackie Kay, Maria Perez-Ortiz -
Legal Judgment Prediction based on Knowledge-enhanced Multi-Task and Multi-Label Text Classification
Ang Li, Yiquan Wu, Ming Cai, Adam Jatowt, Xiang Zhou, Weiming Lu, Changlong Sun, Fei Wu, Kun Kuang -
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Duc Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li -
Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities
Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman -
Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios
Bryan Chen Zhengyu Tan, Roy Ka-Wei Lee -
Continual Learning in Multilingual Sign Language Translation
Shakib Yazdani, Josef van Genabith, Cristina España-Bonet -
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori, Michael Curtis Mozer, Asma Ghandeharioun -
CVE-Bench: Benchmarking LLM-based Software Engineering Agent’s Ability to Repair Real-World CVE Vulnerabilities
Peiran Wang, Xiaogeng Liu, Chaowei Xiao -
Constrained Decoding with Speculative Lookaheads
Nishanth Sridhar Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah -
Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning
Yilun Zhao, Guo Gan, Chen Zhao, Arman Cohan -
Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language
Amalie Brogaard Pauli, Isabelle Augenstein, Ira Assent -
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models
Artem Vazhentsev, Lyudmila Rvanova, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov -
What Did I Do Wrong? Quantifying LLMs’ Sensitivity and Consistency to Prompt Engineering
Federico Errica, Davide Sanvito, Giuseppe Siracusano, Roberto Bifulco -
Towards Lifelong Dialogue Agents via Timeline-based Memory Management
Kai Tzu-iunn Ong, Namyoung Kim, Minju Gwak, Hyungjoo Chae, Taeyoon Kwon, Yohan Jo, seung-won hwang, Dongha Lee, Jinyoung Yeo -
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Satyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, Manaal Faruqui -
Mitigating Heterogeneity among Factor Tensors via Lie Group Manifolds for Tensor Decomposition Based Temporal Knowledge Graph Embedding
Jiang Li, Xiangdong Su, Guanglai Gao -
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng, Krishna C Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg -
Transferable Post-training via Inverse Value Learning
Xinyu Lu, Xueru Wen, Yaojie Lu, Bowen Yu, Hongyu Lin, Haiyang Yu, Le Sun, Xianpei Han, Yongbin Li -
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
Nandan Thakur, Suleman Kazi, Ge Luo, Jimmy Lin, Amin Ahmad -
Identifying Emerging Concepts in Large Corpora
Sibo Ma, Julian Nyarko -
CompAct: Compressed Activations for Memory-Efficient LLM Training
Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki, Assaf Schuster -
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models
Ziche Liu, Rui Ke, Yajiao LIU, Feng Jiang, Haizhou Li -
ConQRet: A New Benchmark for Fine-Grained Automatic Evaluation of Retrieval Augmented Computational Argumentation
Kaustubh Dhole, Kai Shu, Eugene Agichtein -
HARP: Hesitation-Aware Reframing in Transformer Inference Pass
Romain Storaï, seung-won hwang -
CultureInstruct: Curating Multi-Cultural Instructions at Scale
Viet Thanh Pham, Zhuang Li, Lizhen Qu, Gholamreza Haffari -
MatViX: Multimodal Information Extraction from Visually Rich Articles
Ghazal Khalighinejad, Sharon Scott, Ollie Liu, Kelly L. Anderson, Rickard Stureborg, Aman Tyagi, Bhuwan Dhingra -
MixLLM: Dynamic Routing in Mixed Large Language Models
Xinyuan Wang, Yanchi Liu, Wei Cheng, Xujiang Zhao, Zhengzhang Chen, Wenchao Yu, Yanjie Fu, Haifeng Chen -
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou, Abdalgader Abubaker, Hakim Hacid -
WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness
Baizhou Huang, Xiaojun Wan -
Probe-Free Low-Rank Activation Intervention
Chonghe Jiang, Bao Nguyen, Anthony Man-Cho So, Viet Anh Nguyen -
Multi-Conditional Ranking with Large Language Models
Pouya Pezeshkpour, Estevam Hruschka -
ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage
Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang -
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour -
LLM-Based Explicit Models of Opponents for Multi-Agent Games
XiaoPeng Yu, Wanpeng Zhang, Zongqing Lu -
Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?
Qisheng Hu, Quanyu Long, Wenya Wang -
Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs
Anirudh Phukan, Divyansh, Harshit Kumar Morj, Vaishnavi, Apoorv Saxena, Koustava Goswami -
Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation
Mahnaz Koupaee, Jake W. Vincent, Saab Mansour, Igor Shalyminov, Han He, Hwanjun Song, Raphael Shu, Jianfeng He, Yi Nian, Amy Wing-mei Wong, Kyu J. Han, Hang Su -
On the Vulnerability of Text Sanitization
Meng Tong, Kejiang Chen, Xiaojian Yuan, Jiayang Liu, Weiming Zhang, Nenghai Yu, Jie Zhang -
Logit Separability-Driven Samples and Multiple Class-Related Words Selection for Advancing In-Context Learning
Zixiao Zhu, Zijian Feng, Hanzhang Zhou, Junlang Qian, Kezhi Mao -
GloCOM: A Short Text Neural Topic Model via Global Clustering Context
Quang Duc Nguyen, Tung Nguyen, Duc Anh Nguyen, Linh Ngo Van, Sang Dinh, Thien Huu Nguyen -
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Zhongwei Wan, Hui Shen, Xin Wang, Che Liu, Zheda Mai, Mi Zhang -
The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units
Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf -
DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models
Yimu Wang, Shuai Yuan, Bo Xue, Xiangru Jian, Wei Pang, Mushi Wang, Ning Yu -
A Unified Supervised and Unsupervised Dialogue Topic Segmentation Framework Based on Utterance Pair Modeling
Shihao YANG, Ziyi Zhang, Yue Jiang, Chunsheng Qin, Shuhua Liu -
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi -
VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models
Ming Cheng, Jiaying Gong, Chenhan Yuan, William A Ingram, Edward Fox, Hoda Eldardiry -
ALTER: Augmentation for Large-Table-Based Reasoning
Han Zhang, Yuheng Ma, Hanfang Yang -
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models
Jierui Li, Hung Le, Yingbo Zhou, Caiming Xiong, silvio savarese, Doyen Sahoo -
Causally Modeling the Linguistic and Social Factors that Predict Email Response
Yinuo Xu, Hong Chen, Sushrita Rakshit, Aparna Ananthasubramaniam, Omkar Yadav, Mingqian Zheng, Michael Jiang, Lechen Zhang, Bowen Yi, Kenan Alkiek, Abraham Israeli, Bangzhao Shu, Hua Shen, Jiaxin Pei, Haotian Zhang, Miriam Schirmer, David Jurgens -
DTELS: Towards Dynamic Granularity of Timeline Summarization
Chenlong Zhang, Tong Zhou, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao -
MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling
Yakun Zhu, Shaohang Wei, Xu Wang, KUI XUE, Shaoting Zhang, Xiaofan Zhang -
Multilingual Reasoning via Self-training
Leonardo Ranaldi, Giulia Pucci -
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang, zhongyu wei -
Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models
Michael Hanna, Aaron Mueller -
DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition
Wonjun Lee, Solee Im, Heejin Do, Yunsu Kim, Jungseul Ok, Gary Lee -
LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation
Sachit Kuhar, Wasi Uddin Ahmad, Zijian Wang, Nihal Jain, Haifeng Qian, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras -
Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
Lovish Madaan, David Esiobu, Pontus Stenetorp, Barbara Plank, Dieuwke Hupkes -
LLM4DistReconfig: A Fine-tuned Large Language Model for Power Distribution Network Reconfiguration
Panayiotis Christou, Md. Zahidul Islam, Yuzhang Lin, Jingwei Xiong -
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim, Hwanhee Lee, Joonsuk Park, Hwaran Lee, Kyomin Jung -
Understanding LLMs’ Fluid Intelligence Deficiency: An Analysis of the ARC Task
Junjie Wu, Mo Yu, Lemao Liu, Dit-Yan Yeung, Jie Zhou -
Few-Shot Natural Language to First-Order Logic Translation via Code Generation
Junnan Liu -
Teaching Models to Balance Resisting and Accepting Persuasion
Elias Stengel-Eskin, Peter Hase, Mohit Bansal -
PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization
Jiayi Wu, Hengyi Cai, Lingyong Yan, Hao Sun, Xiang Li, Shuaiqiang Wang, Dawei Yin, Ming Gao -
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica, Yixin Liu, Arman Cohan, Tim G. J. Rudner -
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables
Nikhil Abhyankar, Vivek Gupta, Dan Roth, Chandan K. Reddy -
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta, Candace Ross, David Pantoja, Rebecca J. Passonneau, Megan Ung, Adina Williams -
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo -
Palette of Language Models: A Solver for Controlled Text Generation
ZHE YANG, Yi Huang, Yaqin Chen, Xiaoting Wu, Junlan Feng, Chao Deng -
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui GUO, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang -
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian
Erfan Moosavi Monazzah, Vahid Rahimzadeh, Yadollah Yaghoobzadeh, Azadeh Shakery, Mohammad Taher Pilehvar -
Audio Is the Achilles’ Heel: Red Teaming Audio Large Multimodal Models
Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari -
Main Predicate and Their Arguments as Explanation Signals For Intent Classification
Sameer Pimparkhede, Pushpak Bhattacharyya -
A Grounded Typology of Word Classes
Coleman Haley, Sharon Goldwater, Edoardo Ponti
Main Conference - Short Papers
-
A Layered Debating Multi-Agent System for Similar Disease Diagnosis
Yutian Zhao, Huimin WANG, Yefeng Zheng, Xian Wu -
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
Hyunji Lee, Danni Liu, Supriti Sinhamahapatra, Jan Niehues -
Do Audio-Language Models Understand Linguistic Variations?
Ramaneswaran Selvakumar, Sonal Kumar, Hemant Kumar Giri, Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha -
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
Kaiqiao Han, Tianqing Fang, Zhaowei Wang, Yangqiu Song, Mark Steedman -
Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm
Vasudha Varadarajan, Syeda Mahwish, Xiaoran Liu, Julia Buffolino, Christian Luhmann, Ryan L. Boyd, H. Schwartz -
Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data
Laura Biester -
Defense against Prompt Injection Attacks via Mixture of Encodings
Ruiyi Zhang, David Sullivan, Kyle Jackson, Pengtao Xie, Mei Chen -
AMPS: ASR with Multimodal Paraphrase Supervision
Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay, Preethi Jyothi -
Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject
Zenghao Duan, Wenbin Duan, Zhiyi yin, Yinghan Shen, Shaoling Jing, Jie Zhang, Huawei Shen, Xueqi Cheng -
Evaluating Multimodal Generative AI with Korean Educational Standards
Sanghee Park, Geewook Kim -
Context-Efficient Retrieval with Factual Decomposition
Yanhong Li, David Yunis, David McAllester, Jiawei Zhou -
Examining Spanish Counseling with MIDAS: a Motivational Interviewing Dataset in Spanish
Aylin Ece Gunal, Bowen Yi, John D. Piette, Rada Mihalcea, Veronica Perez-Rosas -
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma -
Towards Federated Low-Rank Adaptation of Language Models with Rank Heterogeneity
Yuji Byun, Jaeho Lee -
Preserving Multilingual Quality While Tuning Query Encoder on English Only
Oleg Vasilyev, Randy Sawaya, John Bohannon -
Reverse Modeling in Large Language Models
Sicheng Yu, Xu Yuanchen, Cunxiao Du, Yanying Zhou, Minghui Qiu, Qianru Sun, Hao Zhang, Jiawei Wu -
FLIQA-AD: a Fusion Model with Large Language Model for Better Diagnose and MMSE Prediction of Alzheimer’s Disease
Junhao Chen, Zhiyuan Ding, Yan Liu, Xiangzhu Zeng, Ling Wang -
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Anna Arias-Duart, Pablo Agustin Martin-Torres, Daniel Hinjos, Pablo Bernabeu-Perez, Lucia Urcelay Ganzabal, Marta Gonzalez Mallo, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Sergio Alvarez-Napagao, Dario Garcia-Gasulla -
GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities
Usman Naseem, Shuvam Shiwakoti, Siddhant Bikram Shah, Surendrabikram Thapa, Qi Zhang -
Repetition Neurons: How Do Language Models Produce Repetitions?
Tatsuya Hiraoka, Kentaro Inui -
Complete Chess Games Enable LLM Become A Chess Master
Yinqi Zhang, Xintian Han, Haolong Li, Kedi Chen, Shaohui Lin -
The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
Ahmed Oumar El-Shangiti, Tatsuya Hiraoka, Hilal AlQuabeh, Benjamin Heinzerling, Kentaro Inui -
Task-driven Layerwise Additive Activation Intervention
Hieu Trung Nguyen, Bao Nguyen, Binh Nguyen, Viet Anh Nguyen -
Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning
Juraj Vladika, Ivana Hacajova, Florian Matthes -
Developing multilingual speech synthesis system for Ojibwe, Mi’kmaq, and Maliseet
Shenran Wang, Changbing Yang, Michael l parkhill, Chad Quinn, Christopher Hammerly, Jian Zhu -
CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation
Youngwon Lee, seung-won hwang, Daniel F Campos, Filip Graliński, Zhewei Yao, Yuxiong He -
Interpret and Control Dense Retrieval with Sparse Latent Features
Hao Kang, Tevin Wang, Chenyan Xiong -
Identifying Power Relations in Conversations using Multi-Agent Social Reasoning
Zhaoqing Wu, Dan Goldwasser, Maria Leonor Pacheco, Leora Morgenstern -
Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement
Nicolas Floquet, Joseph Le Roux, Nadi Tomeh, Thierry Charnois -
MixRevDetect: Towards Detecting AI-Generated Content in Hybrid Peer Reviews
Sandeep Kumar, Samarth Garg, Sagnik Sengupta, Tirthankar Ghosal, Asif Ekbal -
Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs
Huaman Sun, Jiaxin Pei, Minje Choi, David Jurgens -
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes
Isabel O. Gallegos, Ryan Aponte, Ryan A. Rossi, Joe Barrow, Mehrab Tanjim, Tong Yu, Hanieh Deilamsalehy, Ruiyi Zhang, Sungchul Kim, Franck Dernoncourt, Nedim Lipka, Deonna Owens, Jiuxiang Gu -
Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav, Masoud Hashemi, Sathwik Tejaswi Madhusudhan -
IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs
Kawshik Manikantan, Makarand Tapaswi, Vineet Gandhi, Shubham Toshniwal -
Local Prompt Optimization
Yash Jain, Vishal Chowdhary -
Evaluating LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3
Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara -
Great Memory, Shallow Reasoning: Limits of kNN-LMs
Shangyi Geng, Wenting Zhao, Alexander M Rush -
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Adithya Pratapa, Teruko Mitamura -
Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models
Gleb Kuzmin, Neemesh Yadav, Ivan Smirnov, Timothy Baldwin, Artem Shelmanov -
CoRAG: Collaborative Retrieval-Augmented Generation
Aashiq Muhamed, Mona T. Diab, Virginia Smith -
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models
Sangmin Woo, Kang Zhou, Yun Zhou, Shuai Wang, Sheng Guan, Haibo Ding, Lin Lee Cheong -
Explore the Reasoning Capability of LLMs in the Chess Testbed
Shu Wang, Lei Ji, Renxi Wang, Wenxiao Zhao, Haokun Liu, Yifan Hou, Ying Nian Wu -
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu, Haoyi Wu, Kewei Tu -
Cross-lingual Transfer of Reward Models in Multilingual Alignment
Jiwoo Hong, Noah Lee, Rodrigo Martínez-Castaño, César Rodríguez, James Thorne -
Watching the AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers
Akshit Achara, Anshuman Chhabra -
AlignFreeze: Navigating the Impact of Realignment on the Layers of Multilingual Models Across Diverse Languages
Steve Bakos, David Guzmán, Riddhi More, Kelly Chutong Li, Félix Gaschi, En-Shiun Annie Lee -
RuleR: Improving LLM Controllability by Rule-based Data Recycling
Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou -
EqualizeIR: Mitigating Linguistic Biases in Retrieval Models
Jiali Cheng, Hadi Amiri -
Improving Vietnamese-English Cross-Lingual Retrieval for Legal and General Domains
Toan Ngoc Nguyen, Nam Le Hai, Nguyen Doan Hieu, Dai An Nguyen, Linh Ngo Van, Thien Huu Nguyen, Sang Dinh -
ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots
Shani Goren, Oren Kalinsky, Tomer Stav, Yuri Rapoport, Yaron Fairstein, Ram Yazdi, Nachshon Cohen, Alexander Libov, Guy Kushilevitz -
STAR: Spectral Truncation and Rescale for Model Merging
Yu-Ang Lee, Ching-Yun Ko, Tejaswini Pedapati, I-Hsin Chung, Mi-Yen Yeh, Pin-Yu Chen -
DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph
Maitreya Prafulla Chitale, Uday Bindal, Rajakrishnan P Rajkumar, Rahul Mishra -
LLM2: Let Large Language Models Harness System 2 Reasoning
Cheng Yang, Chufan Shi, Siheng Li, Bo Shui, Yujiu Yang, Wai Lam -
Transform Retrieval for Textual Entailment in RAG
Quan Guo, Xin Liang -
Don’t Touch My Diacritics
Kyle Gorman, Yuval Pinter -
Beyond Literal Token Overlap: Token Alignability for Multilinguality
Katharina Hämmerl, Tomasz Limisiewicz, Jindřich Libovický, Alexander Fraser -
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad -
STRUX: An LLM for Decision-Making with Structured Explanations
Yiming Lu, Yebowen Hu, Hassan Foroosh, Wei Jin, Fei Liu -
Language Models “Grok” to Copy
Ang Lv, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Rui Yan -
Computational Discovery of Chiasmus in Ancient Religious Text
Hope McGovern, Hale Sirin, Tom Lippincott -
Cross-Lingual Transfer Learning for Speech Translation
Rao Ma, Mengjie Qian, Yassir Fathullah, Siyuan Tang, Mark Gales, Kate Knill -
GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization
Margarita Bugueño, Hazem Abou Hamdan, Gerard de Melo -
PROM: Pivoted and Regulated Optimization for Multilingual Instruction Learning
Jaeseong Lee, seung-won hwang, Hojin Lee, Yunju Bak, Changmin Lee -
A Fair Comparison without Translationese: English vs. Target-language Instructions for Multilingual LLMs
Taisei Enomoto, Hwichan Kim, Zhousi Chen, Mamoru Komachi -
Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing
Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya -
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi -
Personalized Help for Optimizing Low-Skilled Users’ Strategy
Feng Gu, Wichayaporn Wongkamjan, Jordan Lee Boyd-Graber, Jonathan K. Kummerfeld, Denis Peskoff, Jonathan May -
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano, Takumi Ito, Jun Suzuki -
Leveraging Moment Injection for Enhanced Semi-supervised Natural Language Inference with Large Language Models
Seo Yeon Park -
Predicting the Target Word of Game-playing Conversations using a Low-Rank Dialect Adapter for Decoder Models
Dipankar Srirag, Aditya Joshi, Jacob Eisenstein -
Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts
Kun Qian, Maximillian Chen, Siyan Li, Arpit Sharma, Zhou Yu -
DART: An AIGT Detector using AMR of Rephrased Text
Hyeonchu Park, Byungjun Kim, Bugeun Kim -
Using Contextually Aligned Online Reviews to Measure LLMs’ Performance Disparities Across Language Varieties
Zixin Tang, Chieh-Yang Huang, TSUNG-CHI LI, Ho Yin Sam Ng, Hen-Hsen Huang, Ting-Hao Kenneth Huang -
Is It Navajo? Accurate Language Detection for Endangered Athabaskan Languages
Ivory Yang, Weicheng Ma, Chunhui Zhang, Soroush Vosoughi -
Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces
Hope McGovern, Hale Sirin, Tom Lippincott -
Language Models Encode Numbers Using Digit Representations in Base 10
Amit Arnold Levy, Mor Geva -
Alligators All Around: Mitigating Lexical Confusion in Low-resource Machine Translation
Elizabeth Nielsen, Isaac Rayburn Caswell, Jiaming Luo, Colin Cherry -
Taxi1500: A Dataset for Multilingual Text Classification in 1500 Languages
Chunlan Ma, Ayyoob Imani, Haotian Ye, Renhao Pei, Ehsaneddin Asgari, Hinrich Schuetze -
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Lee Boyd-Graber, Rachel Rudinger -
Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction
Xi Chen, Mao Mao, Shuo Li, Haotian Shangguan -
kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech
Karl El Hajal, Ajinkya Kulkarni, Enno Hermann, Mathew Magimai Doss
Findings Papers
-
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
Samuele Poppi, Zheng Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi -
HALLUCANA: Fixing LLM Hallucination with A Canary Lookahead
Tianyi Li, Erenay Dayanik, Shubhi Tyagi, Andrea Pierleoni -
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval
Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani, Adam Jatowt -
Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation
Jiwon Jeong, Hyeju Jang, Hogun Park -
Time-aware ReAct Agent for Temporal Knowledge Graph Question Answering
Qianyi Hu, Xinhui Tu, guo cong, Shunping Zhang -
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Shangda Wu, Yashan Wang, Ruibin Yuan, Guo Zhancheng, Xu Tan, Ge Zhang, Monan Zhou, Jing Chen, Xuefeng Mu, Yuejie Gao, Yuanliang Dong, Jiafeng Liu, Xiaobing Li, Feng Yu, Maosong Sun -
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Hammad Ayyubi, Xuande Feng, Junzhang Liu, Xudong Lin, Zhecan Wang, Shih-Fu Chang -
LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content
Mohamed Bayan Kmainasi, Ali Ezzat Shahroor, Maram Hasanain, Sahinur Rahman Laskar, Naeemul Hassan, Firoj Alam -
Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems
Đorđe Klisura, Anthony Rios -
Position Really Matters: Towards a Holistic Approach for Prompt Tuning
Xianjun Yang, Wei Cheng, Xujiang Zhao, Wenchao Yu, Linda Ruth Petzold, Haifeng Chen -
Improving Reward Models with Synthetic Critiques
Zihuiwen Ye, Fraser David Greenlee, Max Bartolo, Phil Blunsom, Jon Ander Campos, Matthias Gallé -
Aligning Black-box Language Models with Human Judgments
Gerrit J.J. Van den Burg, Gen Suzuki, Wei Liu, Murat Sensoy -
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
Aditya Sharma, Aman Dalmia, Mehran Kazemi, Amal Zouaq, Christopher Pal -
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Yue Zhang, Leyang Cui, V. W., Shuming Shi -
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
Junhyeok Kim, Min Soo Kim, Jiwan Chung, Jungbin Cho, Jisoo Kim, Sungwoong Kim, Gyeongbo Sim, Youngjae Yu -
Attention Tracker: Detecting Prompt Injection Attacks in LLMs
Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen -
Language Modeling with Editable External Knowledge
Belinda Z. Li, Emmy Liu, Alexis Ross, Abbas Zeitoun, Graham Neubig, Jacob Andreas -
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy
Athiya Deviyani, Fernando Diaz -
Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models
Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh -
MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning
Xujia Wang, Haiyan Zhao, Shuo Wang, Hanqing Wang, Zhiyuan Liu -
QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models
Changhai Zhou, Yuhua Zhou, Yibin Wang, Shijie Han, Qian Qiao, Hongguang Li -
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Wenjie Jacky Mo, Jiashu Xu, Qin Liu, Jiongxiao Wang, Jun Yan, Hadi Askari, Chaowei Xiao, Muhao Chen -
WaterSeeker: Pioneering Efficient Detection of Watermarked Segments in Large Documents
Leyi Pan, Aiwei Liu, Yijian LU, Zitian Gao, Yichen Di, Lijie Wen, Irwin King, Philip S. Yu -
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation
Sirui Xia, Xintao Wang, Jiaqing Liang, Yifei Zhang, Weikang Zhou, Jiaji Deng, Fei Yu, Yanghua Xiao -
FIRE: Fact-checking with Iterative Retrieval and Verification
Zhuohan Xie, Rui Xing, Yuxia Wang, Jiahui Geng, Hasan Iqbal, Dhruv Sahnan, Iryna Gurevych, Preslav Nakov -
Advancing Persian LLM Evaluation
Sara Bourbour Hosseinbeigi, Behnam Rohani, Mostafa Masoudi, Mehrnoush Shamsfard, Zahra Saaberi, Mostafa Karimi Manesh, Mohammad Amin Abbasi -
Beyond English: The Impact of Prompt Translation Strategies across Languages and Tasks in Multilingual LLMs
Itai Mondshine, Tzuf Paz-Argaman, Reut Tsarfaty -
Huatuo-26M, a Large-scale Chinese Medical QA Dataset
Xidong Wang, Jianquan Li, Shunian Chen, Yuxuan Zhu, Xiangbo Wu, Zhiyi Zhang, Xiaolong Xu, Junying Chen, Jie Fu, Xiang Wan, Anningzhe Gao, Benyou Wang -
Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales
Zhen Qian, Xiuzhen Zhang, Xiaofei Xu, Feng Xia -
Causal Inference with Large Language Model: A Survey
Jing Ma -
CoPERLex: Content Planning with Event-based Representations for Legal Case Summarization
Santosh T.Y.S.S, Youssef Farag, Matthias Grabmair -
MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty
Yongjin Yang, Haneul Yoo, Hwaran Lee -
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding
Xingjian Diao, Chunhui Zhang, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui -
Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering
Wei Zhou, Mohsen Mesgar, Annemarie Friedrich, Heike Adel -
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Shilong Li, Yancheng He, Hui Huang, Xingyuan Bu, Jiaheng Liu, Hangyu Guo, Weixun Wang, Jihao Gu, Wenbo Su, Bo Zheng -
Broadening Applications: Grounding LLM Development in Potential User Needs
Kaitlyn Zhou, Kristina Gligoric, Myra Cheng, Vyoma Raman, Boluwatife Aminu, Caeley Woo, Michael Brockman, Dan Jurafsky -
PairScale: Analyzing Attitude Change with Pairwise Comparisons
Rupak Sarkar, Patrick Y. Wu, Kristina Miler, Alexander Miserlis Hoyle, Philip Resnik -
DiscoverGPT: Multi-task Fine-tuning Large Language Model for Related Table Discovery
Xuming Hu, Xiao Qin, Chuan Lei, Asterios Katsifodimos, Zhengyuan Shen, Balasubramaniam Srinivasan, Huzefa Rangwala -
Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents
Qiusi Zhan, Richard Fang, Henil Shalin Panchal, Daniel Kang -
LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages
Xuhan Huang, Qingning Shen, Yan Hu, Anningzhe Gao, Benyou Wang -
Guideline Compliance in Task-Oriented Dialogue: The Chained Prior Approach
Xiangyu Wen, Jianyuan Zhong, Zhijian Xu, Qiang Xu -
SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
Wonjoong Kim, Sangwu Park, Yeonjun In, Seokwon Han, Chanyoung Park -
Robust Bias Detection in MLMs and its Application to Human Trait Ratings
Ingroj Shrestha, Louis Tay, Padmini Srinivasan -
Enhancing Temporal Understanding in LLMs for Semi-structured Tables
Irwin Deng, Kushagra Dixit, Dan Roth, Vivek Gupta -
LSDC: An Efficient and Effective Large-Scale Data Compression Method for Supervised Fine-tuning of Large Language Models
Zhaoguang Long, Yuhao Zhou, Shangqing Zhao, Yupei Ren, Li Cai, Chenghao Jia, Zhe Chen, Zhe Fang, Yuxiang Song, Man Lan -
SOLID: Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking Dialogs
Arian Askari, Roxana Petcu, Chuan Meng, Mohammad Aliannejadi, Amin Abolghasemi, Evangelos Kanoulas, Suzan Verberne -
Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-sample Aggregation on Large Language Models
Jishnu Ray Chowdhury, Jayanth Mohan, Tomas Malik, Cornelia Caragea -
A Practical Method for Generating String Counterfactuals
Matan Avitan, Ryan Cotterell, Yoav Goldberg, Shauli Ravfogel -
A Guide To Effectively Leveraging LLMs for Low-Resource Text Summarization: Data Augmentation and Semi-supervised Approaches
Gaurav Sahu, Olga Vechtomova, Issam H. Laradji -
Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios
Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, LinHai, Hai-Tao Zheng -
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions
Mourad Heddaya, Kyle MacMillan, Hongyuan Mei, Chenhao Tan, Anup Malani -
Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning
Jeffrey Olmo, Jared Wilson, Max Forsey, Bryce Hepner, Thomas Vincent Howe, David Wingate -
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models
Artem Kirsanov, Chi-Ning Chou, Kyunghyun Cho, SueYeon Chung -
VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
Cuong Le Chi, Chau Truong Vinh Hoang, Phan Nhật Huy, Dung D. Le, Tien N Nguyen, Nghi D. Q. Bui -
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In
Itay Nakash, George Kour, Guy Uziel, Ateret Anaby Tavor -
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu, Yu Fang, Zhouhan Lin -
Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
Kyungmin Min, Minbeom Kim, Kang-il Lee, Dongryeol Lee, Kyomin Jung -
TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning
Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan -
Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
Kushagra Bhushan, Yatin Nandwani, Dinesh Khandelwal, Sonam Gupta, Gaurav Pandey, Dinesh Raghu, Sachindra Joshi -
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu, Ozlem Uzuner, Meliha Yetisgen, Fei Xia -
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song, Simran Khanuja, Graham Neubig -
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings
Raghuveer Thirukovalluru, Bhuwan Dhingra -
MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU
Yan Li, So-Eon Kim, Seong-Bae Park, Caren Han -
Using Linguistic Entrainment to Evaluate Large Language Models for Use in Cognitive Behavioral Therapy
Mina Kian, Kaleen Shrestha, Katrin Fischer, Xiaoyuan Zhu, Jonathan Ong, Aryan Trehan, Jessica Wang, Gloria Chang, Séb Arnold, Maja Mataric -
When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models
Shufan Chen, He Zheng, Lei Cui -
WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response
Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen -
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim -
How Do Large Language Models Perform in Dynamical System Modeling
Xiao Luo, Binqi Chen, Haixin Wang, Zhiping Xiao, Ming Zhang, Yizhou Sun -
Prototype Tuning: A Meta-Learning Approach for Few-Shot Document-Level Relation Extraction with Large Language Models
Dinghao Pan, Yuanyuan Sun, Bo Xu, Jiru Li, Zhihao Yang, Ling Luo, Hongfei Lin, Jian Wang -
SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation
Qilong Wu, Xiaoneng Xiang, Huang Hejia, Xuan Wang, Yeo Wei Jie, Ranjan Satapathy, Ricardo Shirota Filho, Bharadwaj Veeravalli -
Improving Pre-trained Language Models with Knowledge Enhancement and Filtering Framework
Qi Zhao, Qi Song, Tian Xie, Haiyue Zhang, Hongyu Yang, Xiangyang Li -
FIDELITY: Fine-grained Interpretable Distillation for Effective Language Insights and Topic Yielding
Divyansh Singh, Brodie Mather, Demi Zhang, Patrick Lehman, Justin Ho, Bonnie J Dorr -
MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts
Lin Ning, Harsh Lara, Meiqi Guo, Abhinav Rastogi -
Chasing Random: Instruction Selection Strategies Fail to Generalize
Harshita Diddee, Daphne Ippolito -
On the Role of Key Phrases in Argument Mining
Nilmadhab Das, Vijaya V Saradhi, Ashish Anand -
Concise and Organized Perception Facilitates Reasoning in Large Language Models
Junjie Liu, Shaotian Yan, Chen Shen, Zhengdong Xiao, Liang Xie, Wenxiao Wang, Jieping Ye -
Marrying LLMs with Dynamic Forecasting: A Graph Mixture-of-expert Perspective
Dapeng Jiang, Xiao Luo -
Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media
Owen Cook, Charlie Grimshaw, Ben Peng Wu, Sophie Dillon, Jack Hicks, Luke Jones, Thomas Smith, Matyas Szert, Xingyi Song -
Jailbreaking with Universal Multi-Prompts
Yu-Ling Hsu, Hsuan Su, Shang-Tse Chen -
What can Large Language Models Capture about Code Functional Equivalence?
Nickil Maveli, Antonio Vergari, Shay B Cohen -
RankAdaptor: Hierarchical Rank Allocation for Efficient Fine-Tuning Pruned LLMs via Performance Model
Changhai Zhou, Shijie Han, Lining Yang, Yuhua Zhou, Xu Cheng, Yibin Wang, Hongguang Li -
Task-wrapped Continual Learning in Task-Oriented Dialogue Systems
Min Zeng, Haiqin Yang, Xi Chen, Yike Guo -
As easy as PIE: understanding when pruning causes language models to disagree
Pietro Tropeano, Maria Maistro, Tuukka Ruotsalo, Christina Lioma -
Inference Scaling for Bridging Retrieval and Augmented Generation
Youngwon Lee, seung-won hwang, Daniel F Campos, Filip Graliński, Zhewei Yao, Yuxiong He -
MultiCAT: Multimodal Communication Annotations for Teams
Adarsh Pyarelal, John M Culnan, Ayesha Qamar, Meghavarshini Krishnaswamy, Yuwei Wang, Cheonkam Jeong, Chen Chen, Md Messal Monem Miah, Shahriar Hormozi, Jonathan Tong, Ruihong Huang -
Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models
Jiatao Li, Xinyu Hu, Xunjian Yin, Xiaojun Wan -
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Haoping Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang -
Vulnerability of Large Language Models to Output Prefix Jailbreaks: Impact of Positions on Safety
Yiwei Wang, Muhao Chen, Nanyun Peng, Kai-Wei Chang -
PEMV: Improving Spatial Distribution for Emotion Recognition in Conversations Using Proximal Emotion Mean Vectors
Chen Lin, Fei Li, Donghong Ji, Chong Teng -
The Role of Prosody in Spoken Question Answering
Jie Chi, Maureen de Seyssel, Natalie Schluter -
Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)
Abhijit Mishra, Shreya Shukla, Jose Torres, Jacek Gwizdka, Shounak Roychowdhury -
GPT-NER: Named Entity Recognition via Large Language Models
Shuhe Wang, Xiaofei Sun, Xiaoya Li, Rongbin Ouyang, Fei Wu, Tianwei Zhang, Jiwei Li, Guoyin Wang, Chen Guo -
Media of Langue: Exploring Word Translation Network
Goki Muramoto, Atsuki Sato, Takayoshi Koyama -
BanTH: A Multi-label Hate Speech Detection Dataset for Transliterated Bangla
Fabiha Haider, Fariha Tanjim Shifat, Md Farhan Ishmam, Md Sakib Ul Rahman Sourove, Deeparghya Dutta Barua, Md Fahim, Md Farhad Alam Bhuiyan -
Discrete Diffusion Language Model for Efficient Text Summarization
Do Huu Dat, Duc Anh Do, Anh Tuan Luu, Wray Buntine -
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu -
Enhancing Text-to-SQL with Question Classification and Multi-Agent Collaboration
Zhihui Shao, Shubin Cai, Rongsheng Lin, Zhong Ming -
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
Mohamed Salim AISSI, Clément ROMAC, Thomas Carta, sylvain lamprier, Pierre-Yves Oudeyer, Olivier Sigaud, Laure Soulier, Nicolas THOME -
How much do contextualized representations encode long-range context?
Simeng Sun, Cheng-Ping Hsieh -
Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks
Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe -
Tomato, Tomahto, Tomate: Do Multilingual Language Models Understand Based on Subword-Level Semantic Concepts?
Crystina Zhang, Jing Lu, Vinh Q. Tran, Tal Schuster, Donald Metzler, Jimmy Lin -
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search System
Hwiyeol Jo, Taiwoo Park, Hyunwoo Lee, Nayoung Choi, Changbong Kim, Ohjoon kwon, Donghyeon Jeon, Eui Hyeon Lee, Kyoungho Shin, Lim Sun Suk, Kyungmi KIM, LEE JIHYE, Sun Kim -
Data-Efficiently Learn Large Language Model for Universal 3D Scene Perception
Zehan Wang, Haifeng Huang, Yang Zhao, Ziang Zhang, Tao Jin, Zhou Zhao -
MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time
Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu -
AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts
Soichiro Murakami, Peinan Zhang, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura -
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
Ingeol Baek, Hwan Chang, ByeongJeong Kim, Jimin Lee, Hwanhee Lee -
Avoiding Copyright Infringement via Large Language Model Unlearning
Guangyao Dou, Zheyuan Liu, Qing Lyu, Kaize Ding, Eric Wong -
Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs
David Ifeoluwa Adelani, A. Seza Doğruöz, Iyanuoluwa Shode, Anuoluwapo Aremu -
DynClean: Training Dynamics-based Label Cleaning for Distantly-Supervised Named Entity Recognition
Qi Zhang, Huitong Pan, Zhijia Chen, Longin Jan Latecki, Cornelia Caragea, Eduard Dragut -
Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models
Zaifu Zhan, Rui Zhang -
Biases in Opinion Dynamics in Multi-Agent Systems of Large Language Models: A Case Study on Funding Allocation
Pedro Cisneros-Velarde -
Atoxia: Red-teaming Large Language Models with Target Toxic Answers
Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao -
From Curiosity to Clarity : Exploring the Impact of Consecutive Why-Questions
Geonyeong Son, Jaeyoung Lee, Misuk Kim -
An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them
Creston Brooks, Johannes Haubold, Charlie Cowen-Breen, Jay White, Desmond DeVaul, Frederick Riemenschneider, Karthik R Narasimhan, Barbara Graziosi -
LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models
Yizheng Sun, Yanze Xin, Hao Li, Jingyuan Sun, Chenghua Lin, Riza Batista-Navarro -
BanNERD: A Benchmark Dataset and Context-Driven Approach for Bangla Named Entity Recognition
Md. Motahar Mahtab, Faisal Ahamed Khan, Md. Ekramul Islam, Md. Shahad Mahmud Chowdhury, Labib Imam Chowdhury, Sadia Afrin, Hazrat Ali, Mohammad Mamun Or Rashid, Nabeel Mohammed, Mohammad Ruhul Amin -
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Charese Smiley -
Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs
Duygu Nur Yaldiz, Yavuz Faruk Bakman, Baturalp Buyukates, Chenyang Tao, Anil Ramakrishna, Dimitrios Dimitriadis, Jieyu Zhao, Salman Avestimehr -
Pairwise Prompt-Based Tuning with Parameter Efficient Fast Adaptation for Generalized Zero-Shot Intent Detection
Xiaotong Zhang, Qianru Zhou, Han Liu, Hong Yu -
Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation
Palaash Goel, Dushyant Singh Chauhan, Md Shad Akhtar -
GAIfE: Using GenAI to Improve Literacy in Low-resourced Settings
Allahsera Auguste Tapo, Nouhoum COULIBALY, Seydou DIALLO, Sebastien Diarra, Christopher M Homan, Mamadou K. KEITA, Michael Leventhal -
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks
Gagan Bhatia, El Moatez Billah Nagoudi, Abdellah EL MEKKI, Fakhraddin Alwajih, Muhammad Abdul-Mageed -
“Women do not have heart attacks!” Gender Biases in Automatically Generated Clinical Cases in French
Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol -
FeRG-LLM : Feature Engineering by Reason Generation Large Language Models
Jeonghyun Ko, Gyeongyun Park, Donghoon Lee, Kyunam Lee -
Improving Consistency in LLM Inference using Probabilistic Tokenization
Ashutosh Sathe, Divyanshu Aggarwal, Sunayana Sitaram -
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs
Maxim Ifergan, Omri Abend, Idan Szpektor, Leshem Choshen, Roee Aharoni -
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya, Vatsal Gupta, Agney S Talwarr, Tushar Kataria, Dan Roth, Vivek Gupta -
Infogent: An Agent-Based Framework for Web Information Aggregation
Revanth Gangi Reddy, Sagnik Mukherjee, Jeonghwan Kim, Zhenhailong Wang, Dilek Hakkani-Tür, Heng Ji -
Adaptive Retrieval-Augmented Generation for Conversational Systems
Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz -
Evaluating Cultural and Social Awareness of LLM Web Agents
Haoyi Qiu, Alexander Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, Chien-Sheng Wu -
Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue
Jieming Cao, Chen Huang, Yanan Zhang, Ruibo Deng, Jincheng Zhang, Wenqiang Lei -
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmented Generation
Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou -
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Simon Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang -
RATSD: Retrieval Augmented Truthfulness Stance Detection from Social Media Posts Toward Factual Claims
Zhengyuan Zhu, Zeyu Zhang, Haiqi Zhang, Chengkai Li -
LITERA: An LLM Based Approach to Latin-to-English Translation
Paul Rosu -
Exploring Backward Reasoning in Large Language Models
Leonardo Ranaldi, Giulia Pucci -
Investigating the Zone of Proximal Development of Language Models for In-Context Learning
Peng Cui, Mrinmaya Sachan -
Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning
Hyundong Justin Cho, Karishma Sharma, Nicolaas Paul Jedema, Leonardo F. R. Ribeiro, Jonathan May, Alessandro Moschitti -
How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models
Jiyue Jiang, Pengan CHEN, Liheng Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu -
Hierarchical Speculative Decoding with Dynamic Window
Shensian Syu, Hung-yi Lee -
ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models
David Anugraha, Genta Indra Winata, Chenyue Li, Patrick Amadeus Irawan, En-Shiun Annie Lee -
On the Impacts of Contexts on Repository-Level Code Generation
Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui -
AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization via Multi-LLMs
Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, ZHAOXIA YIN, Hang Su -
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Lin Zhang, Lijie Hu, Di Wang -
Accounting for Sycophancy in Language Model Uncertainty Estimation
Anthony Sicilia, Mert Inan, Malihe Alikhani -
Data-centric NLP Backdoor Defense from the Lens of Memorization
Zhenting Wang, Zhizhi Wang, Mingyu Jin, Mengnan Du, Juan Zhai, Shiqing Ma -
Unsupervised Sentence Representation Learning with Syntactically Aligned Negative Samples
Zhilan Wang, Zekai Zhi, Rize Jin, Kehui Song, He Wang, Da-Jung Cho -
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Yuzhe YANG, Yifei Zhang, Yan Hu, Yilin GUO, Ruoli Gan, Yueru He, Mingcong Lei, Xiao Zhang, Haining Wang, Qianqian Xie, Jimin Huang, Honghai Yu, Benyou Wang -
Can Large Language Models Generate High-quality Patent Claims?
Lekang Jiang, Caiqi Zhang, Pascal A. Scherz, Stefan Goetz -
Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models
Srishti Yadav, Zhi Zhang, Daniel Hershcovich, Ekaterina Shutova -
MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG
Pingyu Wu, Daiheng Gao, Jing Tang, Huimin Chen, Wenbo Zhou, Weiming Zhang, Nenghai Yu -
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Sergey Pletenev, Maria Marina, Daniil Moskovskiy, Vasily Konovalov, Pavel Braslavski, Alexander Panchenko, Mikhail Salnikov -
Investigating the Transferability of Code Repair for Low-Resource Programming Languages
Kyle Wong, Alfonso Amayuelas, Liangming Pan, William Yang Wang -
Dis2Dis: Explaining Ambiguity in Fact-Checking
Ieva Staliunaite, Andreas Vlachos -
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
Aashiq Muhamed, Mona T. Diab, Virginia Smith -
Dynamic Feature Fusion for Sign Language Translation Using HyperNetworks
Ruiquan Zhang, Rui Zhao, Zhicong Wu, Liang Zhang, Haoqi Zhang, Yidong Chen -
COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis
Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu -
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
Souvik Kundu, Anahita Bhiwandiwalla, Sungduk Yu, Phillip Howard, Tiep Le, Sharath Nittur Sridhar, David Cobbley, Hao Kang, Vasudev Lal -
DSQG-Syn: Synthesizing High-quality Data for Text-to-SQL Parsing by Domain Specific Question Generation
Shaoming Duan, Youxuan Wu, Chuanyi Liu, Yuhao Zhang, Zirui Wang, Peiyi Han, Shengyuan Yu, Liang Yan, yingwei liang -
Extracting Military Event Temporal Relations via Relative Event Time Prediction and Virtual Adversarial Training
Jie Gong, qiwang hu -
CDB: A Unified Framework for Hope Speech Detection Through Counterfactual, Desire and Belief
Tulio Ferreira Leite da Silva, Gonzalo Freijedo Aduna, Farah Benamara, Alda Mari, Zongmin Li, Li Yue, Jian Su -
SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine
Xiaochen Wang, Junqing He, Liang Chen, Gholamreza Haffari, Yiru Wang, Zhe Yang, Xiangdi Meng, Kunhao Pan, Zhifang Sui -
Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs
Michael JQ Zhang, Eunsol Choi -
Evaluating Vision-Language Models for Emotion Recognition
Sree Bhattacharyya, James Z. Wang -
Multi-Condition Guided Diffusion Network for Multimodal Emotion Recognition in Conversation
Wenjin Tian, Xianying Huang, Shihao Zou -
Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis
Yiyi Chen, Qiongxiu Li, Russa Biswas, Johannes Bjerva -
PolyJoin: Semantic Multi-key Joinable Table Search in Data Lakes
Xuming Hu, Chuan Lei, Xiao Qin, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala -
ImaRA: An Imaginative Frame Augmented Method for Low-Resource Multimodal Metaphor Detection and Explanation
Yuan Tian, Minzheng Wang, Nan Xu, Wenji Mao -
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
Sen Yang, Xin Li, Leyang Cui, Lidong Bing, Wai Lam -
Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains
Katerina Korre, Arianna Muti, Federico Ruggeri, Alberto Barrón-Cedeño -
Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors
Mayank Nagda, Phil Ostheimer, Sophie Fellenz -
When natural language is not enough: The limits of in-context learning demonstrations in multilingual reasoning
Leonardo Ranaldi, Barry Haddow, Alexandra Birch -
ThoughtSculpt: Reasoning with Intermediate Revision and Search
Yizhou Chi, Kevin Yang, Dan Klein -
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias
Yuen Chen, Vethavikashini Chithrra Raghuram, Justus Mattern, Rada Mihalcea, Zhijing Jin -
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Yassine El Kheir, Younes Samih, Suraj Maharjan, Tim Polzehl, Sebastian Möller -
SimSMoE: Toward Efficient Training Mixture of Experts via Solving Representational Collapse
Giang Do, Hung Le, Truyen Tran -
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding
Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Yonas Chanie, Bontu Fufa Balcha, Negasi Haile Abadi, Henok Biadglign Ademtew, Mulubrhan Abebe Nerea, Debela Desalegn Yadeta, Derartu Dagne Geremew, Assefa Atsbiha Tesfu, Philipp Slusallek, Thamar Solorio, Dietrich Klakow -
Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
Mingqi Gao, Yixin Liu, Xinyu Hu, Xiaojun Wan, Jonathan Bragg, Arman Cohan -
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
Peiqin Lin, Andre Martins, Hinrich Schuetze -
Adaptive Parameter Compression for Language Models
Jeremias Bohn, Frederic Mrozinski, Georg Groh -
Unlocking the Planning Capabilities of Large Language Models with Maximum Diversity Fine-tuning
Wenjun Li, Changyu Chen, Pradeep Varakantham -
GRAG: Graph Retrieval-Augmented Generation
Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao -
Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning
Ahmed Elshabrawy, Yongxin Huang, Iryna Gurevych, Alham Fikri Aji -
Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey
Ruiyao Xu, Kaize Ding -
Claim-Guided Textual Backdoor Attack for Practical Applications
Minkyoo Song, Hanna Kim, Jaehan Kim, Youngjin Jin, Seungwon Shin -
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
Go Kamoda, Benjamin Heinzerling, Tatsuro Inaba, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui -
Exploring Large Language Models for Hate Speech Detection in Rioplatense Spanish
Juan Manuel Pérez, Paula Miguel, Viviana Cotik -
Preserving Zero-shot Capability in Supervised Fine-tuning for Multi-label Text Classification
Si-An Chen, Hsuan-Tien Lin, Chih-Jen Lin -
CodeRAG-Bench: Can Retrieval Augment Code Generation?
Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried -
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
Zayd Muhammad Kawakibi Zuhri, Muhammad Farid Adilazuarda, Ayu Purwarianti, Alham Fikri Aji -
Neuro-symbolic Training for Reasoning over Spatial Language
Tanawan Premsri, Parisa Kordjamshidi -
Ask Optimal Questions: Aligning Large Language Models with Retriever’s Preference in Conversation
Chanwoong Yoon, Gangwoo Kim, Byeongguk Jeon, Sungdong Kim, Yohan Jo, Jaewoo Kang -
Identifying and Mitigating Social Bias Knowledge in Language Models
Ruizhe Chen, Yichen Li, Jianfei Yang, YANG FENG, Joey Tianyi Zhou, Jian Wu, Zuozhu Liu -
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
Sukmin Cho, Sangjin Choi, Taeho Hwang, Jeongyeon Seo, Soyeong Jeong, Huije Lee, Hoyun Song, Jong C. Park, Youngjin Kwon -
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
Viacheslav Vasilev, Julia Agafonova, Nikolai Gerasimenko, Alexander Kapitanov, Polina Mikhailova, Evelina Mironova, Denis Dimitrov -
Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence
Qianren Mao, Weifeng Jiang, Junnan Liu, Chenghua Lin, Qian Li, Xianqing Wen, Jianxin Li, Jinhu Lu -
MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
Wanqi Yang, Yanda Li, Meng Fang, Ling Chen -
GraphICL: Unlocking Graph Learning Potential in LLMs through Structured Prompt Design
Yuanfu Sun, Zhengnan Ma, Yi Fang, Jing Ma, Qiaoyu Tan -
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
CheolWon Na, YunSeok Choi, Jee-Hyong Lee -
Unsupervised Speech-text word-level alignment with Dynamic Programming
Tianshu Yu, Zihan Gong, Minghuan Tan, Guhong Chen, Min Yang -
Semi-supervised Fine-tuning for Large Language Models
Junyu Luo, Xiao Luo, Xiusi Chen, Zhiping Xiao, Wei Ju, Ming Zhang -
An Optimizable Suffix Is Worth A Thousand Templates: Efficient Black-box Jailbreaking without Affirmative Phrases via LLM as Optimizer
Weipeng Jiang, Zhenting Wang, Juan Zhai, Shiqing Ma, Zhengyu Zhao, Chao Shen -
From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models
Harsh Nishant Lalai, Aashish Anantha Ramakrishnan, Raj Sanjay Shah, Dongwon Lee -
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Yuqi Zhu, Shuofei Qiao, Yixin Ou, Shumin Deng, Shiwei Lyu, YUE SHEN, Lei Liang, Jinjie GU, Huajun Chen, Ningyu Zhang -
LOFT: Scalable and More Realistic Long-Context Evaluation
Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Séb Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu -
Prompt-Guided Selective Masking Loss for Context-Aware Emotive Text-to-Speech
Yejin Jeon, Youngjae Kim, Jihyun Lee, Gary Lee -
On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems
Juraj Vladika, Florian Matthes -
Enhancing Adversarial Transferability in Visual-Language Pre-training Models via Local Shuffle and Sample-based Attack
Xin Liu, Aoyang Zhou, Kun He -
MojoBench: Language Modeling and Benchmarks for Mojo
Md Nishat Raihan, Joanna C. S. Santos, Marcos Zampieri -
Decoding Fatphobia: Examining Anti-Fat and Pro-Thin Bias in AI-Generated Images
Jane Warren, Gary M. Weiss, Fernando Martinez, Annika Guo, Yijun Zhao -
DialogGen: Multi-modal Interactive Dialogue System with Multi-turn Text-Image Generation
Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu -
A Practical Examination of AI-Generated Text Detectors for Large Language Models
Brian Tufts, Xuandong Zhao, Lei Li -
Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs
Yiyang Luo, Ke Lin, Chao Gu, Jiahui Hou, Lijie Wen, Luo ping -
GrEmLIn: A Repository of Green Baseline Embeddings for 87 Low-Resource Languages Injected with Multilingual Graph Knowledge
Daniil Gurgurov, Rishu Kumar, Simon Ostermann -
Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context
Sangwon Yu, Ik-hwan Kim, Jongyoon Song, Saehyung Lee, Junsung Park, Sungroh Yoon -
Aligning to Constraints for Data-Efficient Language Model Customization
Fei Wang, Chao Shang, Shuai Wang, Sarthak Jain, Qiang Ning, Bonan Min, Vittorio Castelli, Yassine Benajiba, Dan Roth -
DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization
Pucheng Dang, Xing Hu, Dong Li, Rui Zhang, Qi Guo, Kaidi Xu -
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
Hanan Gani, Rohit Bharadwaj, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan -
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
Shintaro Ozaki, Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe -
Personalize Your LLM: Fake it then Align it
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala -
LLMs for Extremely Low-Resource Finno-Ugric Languages
Taido Purason, Hele-Andra Kuulmets, Mark Fishel -
TESTEVAL: Benchmarking Large Language Models for Test Case Generation
Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, LINGMING ZHANG, An Ran Chen, Lei Ma -
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Senses
Samuel Cahyawijaya, Ruochen Zhang, Jan Christian Blaise Cruz, Holy Lovenia, Elisa Gilbert, Hiroki Nomoto, Alham Fikri Aji -
Exploring Hybrid Sampling Inference for Aspect-based Sentiment Analysis
Xiaoyi Bao, Minjie Qiang, Jinghang Gu, Zhongqing Wang, Chu-Ren Huang -
Augmented Adversarial Trigger Learning
Zhe Wang, Yanjun Qi -
Open Domain Question Answering with Conflicting Contexts
Siyi Liu, Qiang Ning, Kishaloy Halder, Zheng Qi, Wei Xiao, Phu Mon Htut, Yi Zhang, Neha Anna John, Bonan Min, Yassine Benajiba, Dan Roth -
OLMES: A Standard for Language Model Evaluations
Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi -
Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model
Jiali Chen, Xusen Hei, Yuqi Xue, Zihan Wu, Jiayuan Xie, Yi Cai -
Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax
Iuliia Zaitova, Vitalii Hirak, Badr M. Abdullah, Dietrich Klakow, Bernd Möbius, Tania Avgustinova -
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey
Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang -
On Reference (In-)Determinacy in Natural Language Inference
Sihao Chen, Chaitanya Malaviya, Alex Fabrikant, Hagai Taitelbaum, Tal Schuster, Senaka Buthpitiya, Dan Roth -
FaithfulPersona: Balancing Faithfulness and Personalization in Code Explanations through Self-Critique
Zhuang Luo, Yichuan Li, Zexing Xu, Kyumin Lee, S. Rasoul Etesami -
Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs
Abdellah EL MEKKI, Muhammad Abdul-Mageed -
Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation
Chenyu Wang, Weichao Zhou, Shantanu Ghosh, kayhan Batmanghelich, Wenchao Li -
Automatic Annotation Augmentation Boosts Translation between Molecules and Natural Language
Zhiqiang Zhong, Simon Sataa-Yu Larsen, Haoyu Guo, Tao Tang, Kuangyu Zhou, Davide Mottin -
LogRules: Enhancing Log Analysis Capability of Large Language Models through Rules
Xin Huang, Ting Zhang, Wen Zhao -
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus
Xiaoming Shi, Zeming Liu, Yiming Lei, Chenkai Zhang, Haitao Leng, Chuan Wang, Qingjie Liu, Wanxiang Che, Yunhong Wang -
ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification
Yaswanth M, Vaibhav Singh, Ayush Maheshwari, Amrith Krishna, Ganesh Ramakrishnan -
BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks
Hanyong Lee, Chaelyn Lee, Yongjae Lee, Jaesung Lee -
Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization
Yen-Ju Lu, Ting-Yao Hu, Hema Swetha Koppula, Hadi Pouransari, Jen-Hao Rick Chang, Yin Xia, Xiang Kong, Qi Zhu, Xiaoming Simon Wang, Oncel Tuzel, Raviteja Vemulapalli -
Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang -
GPT-4V Cannot Generate Radiology Reports Yet
Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan -
Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations
Hao Yang, Hongyuan Lu, Xinhua Zeng, Yang Liu, Xiang Zhang, HAORAN YANG, Yumeng Zhang, SHAN HUANG, YIRAN WEI, Wai Lam -
CAPE: A Chinese Dataset for Appraisal-based Emotional Generation in Large Language Models
June M. Liu, He CAO, Renliang Sun, Rui Wang, Yu Li, Jiaxing Zhang -
Towards Zero-Shot Multimodal Machine Translation
Matthieu Futeral, Cordelia Schmid, Benoît Sagot, Rachel Bawden -
Aligning to What? Limits to RLHF Based Alignment
Logan Barnhart, Reza Akbarian Bafghi, Stephen Becker, Maziar Raissi -
CausalGraph2LLM: Evaluating LLMs for Causal Queries
Ivaxi Sheth, Bahare Fatemi, Mario Fritz -
RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering
Yang Bai, Christan Grant, Daisy Zhe Wang -
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
Xiaonan Jing, Srinivas Billa, Danny Godbout -
SEP-MLDC: A Simple and Effective Paradigm for Multi-Label Document Classification
Han Liu, Shuqin Li, Xiaotong Zhang, Yuanyuan Wang, Feng Zhang, Hongyang Chen, Hong Yu -
A Closer Look into Mixture-of-Experts in Large Language Models
Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu -
Do Large Language Models Align with Core Mental Health Counseling Competencies?
Viet Cuong Nguyen, Mohammad Taher, Dongwan Hong, Vinicius Konkolics Possobom, Vibha Thirunellayi Gopalakrishnan, Ekta Raj, Zihang Li, Heather J. Soled, Michael L. Birnbaum, Srijan Kumar, Munmun De Choudhury -
Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification
Vishnu Kabir Chhabra, Ding Zhu, Mohammad Mahdi Khalili -
Long-Tail Crisis in Nearest Neighbor Language Models
Yuto Nishida, Makoto Morishita, Hiroyuki Deguchi, Hidetaka Kamigaito, Taro Watanabe -
Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias
Andres Algaba, Carmen Mazijn, Vincent Holst, Floriano Tori, Sylvia Wenmackers, Vincent Ginis -
BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression
Yuankai Li, Jia-Chen Gu, Di Wu, Kai-Wei Chang, Nanyun Peng -
TrendSim: Simulating Trending Topics in Social Media Under Poisoning Attacks with LLM-based Multi-agent System
Zeyu Zhang, Jianxun Lian, Chen Ma, Yaning Qu, Ye Luo, Lei Wang, Rui Li, Xu Chen, Yankai Lin, Le Wu, Xing Xie, Ji-Rong Wen -
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Kang-il Lee, Minbeom Kim, Seunghyun Yoon, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung -
Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge
Lian Remme, Kevin Tang -
From Argumentation to Deliberation: Perspectivized Stance Vectors for Fine-grained (Dis)agreement Analysis
Moritz Plenz, Philipp Heinisch, Janosch Gehring, Philipp Cimiano, Anette Frank -
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
Yingqiang Gao, Lukas Fischer, Alexa Lintner, Sarah Ebling -
UCL-Bench: A Chinese User-Centric Legal Benchmark for Large Language Models
Ruoli Gan, Duanyu Feng, Chen Zhang, Zhihang Lin, Haochen Jia, Hao Wang, Zhenyang Cai, Lei Cui, Qianqian Xie, Jimin Huang, Benyou Wang -
GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets
Qiming Wu, Zichen Chen, Will Corcoran, Misha Sra, Ambuj Singh -
UniRAG: Universal Retrieval Augmentation for Large Vision Language Models
Sahel Sharifymoghaddam, Shivani Upadhyay, Wenhu Chen, Jimmy Lin -
DDGIP: Radiology Report Generation Through Disease Description Graph and Informed Prompting
Chentao Huang, Guangli Li, Xinjiong Zhou, Yafeng Ren, Hongbin Zhang -
Linguistically Grounded Analysis of Language Models using Shapley Head Values
Marcell Fekete, Johannes Bjerva -
Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances
zehui wu, Ziwei Gong, Lin Ai, Pengyuan Shi, Kaan Donbekci, Julia Hirschberg -
A Context-Aware Contrastive Learning Framework for Hateful Meme Detection and Segmentation
Xuanyu Su, Yansong Li, Diana Inkpen, Nathalie Japkowicz -
Syntriever: How to Train Your Retriever with Synthetic Data from LLMs
Minsang Kim, Seung Jun Baek -
Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li -
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
Luca Moroni, Giovanni Puccetti, Pere-Lluís Huguet Cabot, Andrei Stefan Bejgu, Alessio Miaschi, Edoardo Barba, Felice Dell’Orletta, Andrea Esuli, Roberto Navigli -
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
Zhenyue Qin, Yu Yin, Dylan Campbell, Xuansheng Wu, Ke Zou, Ninghao Liu, Yih Chung Tham, Xiuzhen Zhang, Qingyu Chen -
ConShift: Sense-based Language Variation Analysis using Flexible Alignment
Clare Arrington, Mauricio Gruppi, Sibel Adali -
OpenBioNER: Lightweight Open-Domain Biomedical Named Entity Recognition Through Entity Type Description
Alessio Cocchieri, Giacomo Frisoni, Marcos Martínez Galindo, Gianluca Moro, Giuseppe Tagliavini, Francesco Candoli -
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong WANG, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong -
MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing
Vlad Andrei Negru, Robert Vacareanu, Camelia Lemnaru, Mihai Surdeanu, RODICA POTOLEA -
InstructAny2Pix: Image Editing with Multi-Modal Prompts
Shufan Li, Harkanwar Singh, Aditya Grover -
LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain
Joel Niklaus, Lucia Zheng, Arya D. McCarthy, Christopher Hahn, Brian M Rosen, Peter Henderson, Daniel E. Ho, Garrett Honke, Percy Liang, Christopher D Manning -
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
Armineh Nourbakhsh, Siddharth Parekh, Pranav Shetty, Zhao Jin, Sameena Shah, Carolyn Rose -
Data Poisoning for In-context Learning
Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, Jiliang Tang -
Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models
Jialiang Wu, Yi Shen, Sijia Liu, Yi Tang, Sen Song, Xiaoyi Wang, Longjun Cai -
Transformer-based Causal Language Models Perform Clustering
Xinbo Wu, Lav R. Varshney -
Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description
Mahshid Dehghani, Amirahmad Shafiee, Ali Shafiei, Neda Fallah, Farahmand Alizadeh, Mohammad Mehdi Gholinejad, Hamid Behroozi, Jafar Habibi, Ehsaneddin Asgari -
Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models
Yuyi Huang, Runzhe Zhan, Derek F. Wong, Lidia S. Chao, Ailin Tao -
MedEureka: A Medical Domain Benchmark for Multi-Granularity and Multi-Data-Type Embedding-Based Retrieval
Yongqi Fan, Nan Wang, KUI XUE, Jingping Liu, Tong Ruan -
CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis
Saranya Venkatraman, Nafis Irtiza Tripto, Dongwon Lee -
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
Chaoqun Liu, Wenxuan Zhang, Jiahao Ying, Mahani Aljunied, Anh Tuan Luu, Lidong Bing -
MRE-MI: A Multi-image Dataset for Multimodal Relation Extraction in Social Media Posts
Shizhou Huang, Bo Xu, Changqun Li, Yang Yu, Xin Alex Lin -
GuideQ: Framework for Guided Questioning for progressive informational collection and classification
PRIYA MISHRA, Suraj Racha, Kaustubh Ponkshe, Adit Akarsh, Ganesh Ramakrishnan -
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang, Jiayi Yuan, Yu-Neng Chuang, Zhuoer Wang, Yingchi Liu, Mark Cusick, Param Kulkarni, Zhengping Ji, Yasser Ibrahim, Xia Hu -
LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy
Zhiwen Ruan, Yixia Li, He Zhu, Longyue Wang, Weihua Luo, Kaifu Zhang, Yun Chen, Guanhua Chen -
Let Modalities Teach Each Other: Modal-Collaborative Knowledge Extraction and Fusion for Multimodal Knowledge Graph Completion
Guoliang Zhu, Tao Ren, Dandan Wang, JUN HU -
Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm
Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin -
Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents
Shrinidhi Kumbhar, Venkatesh Mishra, Kevin Coutinho, Divij Handa, Ashif Iquebal, Chitta Baral -
On the Impact of Noise in Differentially Private Text Rewriting
Stephen Meisenbacher, Maulik Chevli, Florian Matthes -
Multi-Stage LLM Fine-Tuning with a Continual Learning Setting
Changhao Guan, Chao Huang, Hongliang Li, You Li, Ning Cheng, Zihe Liu, Yufeng Chen, Jinan Xu, Jian Liu -
Learning to Search Effective Example Sequences for In-Context Learning
Xiang Gao, Ankita sinha, Kamalika Das -
Representation-to-Creativity (R2C): Automated Holistic Scoring Model for Essay Creativity
Deokgi Kim, Joonyoung Jo, Byung-Won On, Ingyu Lee -
BioEL: A Comprehensive Python Package for Biomedical Entity Linking
Prasanth Bathala, Christophe Ye, Batuhan Nursal, Shubham Lohiya, David Kartchner, Cassie S. Mitchell -
Understanding Reference Policies in Direct Preference Optimization
Yixin Liu, Pengfei Liu, Arman Cohan -
Dynamic Guided and Domain Applicable Safeguards for Enhanced Security in Large Language Models
Weidi Luo, He CAO, Zijing Liu, Yu Wang, Aidan Wong, Bin Feng, Yuan Yao, Yu Li -
MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens
Yongqi Fan, Hongli Sun, KUI XUE, Xiaofan Zhang, Shaoting Zhang, Tong Ruan -
Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning
Venkatesh Mishra, Bimsara Pathiraja, Mihir Parmar, Sat Chidananda, Jayanth Srinivasa, Gaowen Liu, Ali Payani, Chitta Baral -
Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning
Joy Crosbie, Ekaterina Shutova -
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang, Jiajia Li, lu Yang, Zhiqiang Zhang, Jinhao Tian, Zuchao Li, Lefei Zhang, Ping Wang -
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows
Xingjian Zhang, Yutong Xie, Jin Huang, Jinge Ma, Zhaoying Pan, Qijia Liu, Ziyang Xiong, Tolga Ergen, Dongsub Shim, Honglak Lee, Qiaozhu Mei -
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
yuelin bai, Xeron Du, Yiming Liang, Leo Jin, Junting Zhou, Ziqiang Liu, Feiteng Fang, Mingshan Chang, Tianyu Zheng, Xincheng Zhang, Nuo ma, Zekun Moore Wang, Ruibin Yuan, Haihong Wu, Hongquan Lin, Wenhao Huang, Jiajun Zhang, Chenghua Lin, Jie Fu, Min Yang, Shiwen Ni, Ge Zhang -
LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education
Iain Weissburg, Sathvika Anand, Sharon Levy, Haewon Jeong -
ResoFilter: Fine-grained Synthetic Data Filtering for Large Language Models through Data-Parameter Resonance Analysis
Zeao Tu, Xiangdi Meng, Yu He, Zihan Yao, Tianyu Qi, Jun Liu, Ming Li -
Text Annotation via Inductive Coding: Comparing Human Experts to LLMs in Qualitative Data Analysis
Angelina Parfenova, Andreas Marfurt, Jürgen Pfeffer, Alexander Denzler -
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection
Myrthe Reuver, Indira Sen, Matteo Melis, Gabriella Lapesa -
FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG
Xinping Zhao, Yan Zhong, Zetian Sun, Xinshuo Hu, zhenyu liu, Dongfang Li, Baotian Hu, Min zhang -
GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation
Runchuan Zhu, Xinke Jiang, Jiang Wu, Zhipeng ma, Jiahe Song, Fengshuo Bai, Dahua Lin, Lijun Wu, Conghui He -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu -
Evaluating Numeracy of Language Models as a Natural Language Inference Task
Rahmad Mahendra, Damiano Spina, Lawrence Cavedon, Karin Verspoor -
PRDetect: Perturbation-Robust LLM-generated Text Detection Based on Syntax Tree
Xiang Li, Zhiyi yin, Hexiang Tan, Shaoling Jing, Du Su, Yi Cheng, Huawei Shen, Fei Sun -
Self-Training Large Language Models for Tool-Use Without Demonstrations
Ne Luo, Aryo Pradipta Gema, Xuanli He, Emile van Krieken, Pietro Lesci, Pasquale Minervini -
Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages
Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya -
Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction
Kritarth Prasad, Mohammadi Zaki, Pratik Rakesh Singh, Pankaj Wasnik -
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages
Abhishek Kumar Singh, vishwajeet kumar, Rudra Murthy, Jaydeep Sen, Ashish Mittal, Ganesh Ramakrishnan -
CollagePrompt: A Benchmark for Budget-Friendly Visual Recognition with GPT-4V
Siyu Xu, Yunke Wang, Daochang Liu, Bo Du, Chang Xu -
CodeSim: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez -
Multimodal Generation with Consistency Transferring
Junxiang Qiu, Jinda Lu, Shuo Wang -
An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning
Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly -
People will agree what I think: Investigating LLM’s False Consensus Effect
Junhyuk Choi, Yeseon Hong, Bugeun Kim -
Entity Pair-guided Relation Summarization and Retrieval in LLMs for Document-level Relation Extraction
Fu Zhang, Hongsen Yu, Jingwei Cheng, Huangming Xu -
How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise in Machine Translation
Yan Meng, Di Wu, Christof Monz -
Token Weighting for Long-Range Language Modeling
Falko Helm, Nico Daheim, Iryna Gurevych -
Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
Shengbin Yue, Ting Huang, Zheng Jia, Siyuan Wang, Shujun Liu, Yun Song, Xuanjing Huang, zhongyu wei -
Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
Atharva Mehta, Shivam Chauhan, Amirbek Djanibekov, Atharva Kulkarni, Gus Xia, Monojit Choudhury -
Hard Emotion Test Evaluation Sets for Language Models
Tiberiu Sosea, Cornelia Caragea -
RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process
Peiran Wang, Xiaogeng Liu, Chaowei Xiao -
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
Ryan Soh-Eun Shim, Barbara Plank -
Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation
Yu Wang, Jiaxin Zhang, Xiang Gao, Wendi Cui, Peng Li, Kamalika Das -
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James Validad Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi -
LLM-Generated Passphrases That Are Secure and Easy to Remember
Jie S. Li, Jonas Geiping, Micah Goldblum, Aniruddha Saha, Tom Goldstein -
Analysis of LLM as a grammatical feature tagger for African American English
Rahul Porwal, Alice Rozet, Jotsna Gowda, Pryce Houck, Kevin Tang, Sarah Moeller -
Constraining Sequential Model Editing with Editing Anchor Compression
Hao-Xiang Xu, Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu -
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee -
Rationale Behind Essay Scores: Enhancing S-LLM’s Multi-Trait Essay Scoring with Rationale Generated by LLMs
SeongYeub Chu, Jong woo kim, Bryan Wong, Mun Yong Yi -
Synonym-unaware Fast Adversarial Training against Textual Adversarial Attacks
Yichen Yang, Xin Liu, Kun He -
A Comprehensive Survey of Contemporary Arabic Sentiment Analysis: Methods, Challenges, and Future Directions
Zhiqiang Shi, Ruchit Agrawal -
MAiDE-up: Multilingual Deception Detection of AI-generated Hotel Reviews
Oana Ignat, Xiaomeng Xu, Rada Mihalcea -
From Lazy to Prolific: Tackling Missing Labels in Open Vocabulary Extreme Classification by Positive-Unlabeled Sequence Learning
Ranran Haoran Zhang, Bensu Uçar, Soumik Dey, Hansi Wu, Binbin Li, Rui Zhang -
LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification
Shubham Kumar Nigam, Tanmay Dubey, Govind Sharma, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya -
Using Review Combination and Pseudo-Tokens for Aspect Sentiment Quad Prediction
Jiazhou Chen, Xu Jia, RuiQiang Guo -
Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy
Tunazzina Islam, Dan Goldwasser -
Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization
Weiqi Wu, Shen Huang, Yong Jiang, Pengjun Xie, Fei Huang, hai zhao -
SynGhost: Invisible and Universal Task-agnostic Backdoor Attack via Syntactic Transfer
Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Zhuosheng Zhang, Gongshen Liu -
HEISIR: Hierarchical Expansion of Inverted Semantic Indexing for Training-free Retrieval of Conversational Data using LLMs
Sangyeop Kim, Hangyeul Lee, Yohan Lee -
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Taha Kass-Hout, Furong Huang, Cao Xiao -
TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data
Jipeng Zhang, Yaxuan Qin, Renjie Pi, WEIZHONG ZHANG, Rui Pan, Tong Zhang -
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G Belém, Pouya Pezeshkpour, Hayate Iso, Seiji Maekawa, Nikita Bhutani, Estevam Hruschka -
DisComp: A Two-Stage Prompt Optimization Framework Combining Task-Agnostic and Task-Aware Compression
Liu quancai, Haihui Fan, Jinchao Zhang, lixiangfang, Lichuanrong, Bo Li -
MoLA: MoE LoRA with Layer-wise Expert Allocation
Chongyang Gao, Kezhen Chen, Jinmeng Rao, Ruibo Liu, Baochen Sun, Yawen Zhang, Daiyi Peng, Xiaoyuan Guo, VS Subrahmanian -
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo, Yerin Hwang, Yongil Kim, Taegwan Kang, Hyunkyung Bae, Kyomin Jung -
UNLEARN Efficient Removal of Knowledge in Large Language Models
Tyler Lizzo, Larry Heck -
RetrieverGuard: Empowering Information Retrieval to Combat LLM-Generated Misinformation
Chuwen Chen, Shuai Zhang -
ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning
Millennium Bismay, Xiangjue Dong, James Caverlee -
TagGen: Enforcing Syntactic Structures with Tag-Based Control
Vicky Xefteri, Afra Amini, Tim Vieira, Ryan Cotterell -
Uncertainty Quantification for Clinical Outcome Predictions with (Large) Language Models
Zizhang Chen, Peizhao Li, Xiaomeng Dong, Pengyu Hong -
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Wang Changxin, Zhifeng Gao, Hongshuai Wang, Li Yongge, Mujie Lin, Shuwen Yang, Jiankun Wang, Mingjun Xu, Jin Huang, Xi Fang, Jiaxi Zhuang, Yuqi Yin, Yaqi Li, changhong chen, Zheng Cheng, Zifeng Zhao, Linfeng Zhang, Guolin Ke -
PLD+: Accelerating LLM Inference by Leveraging Language Model Artifacts
Shwetha Somasundaram, Anirudh Phukan, Apoorv Saxena -
A Federated Framework for LLM-based Recommendation
Jujia Zhao, Wenjie Wang, Chen Xu, See-Kiong Ng, Tat-Seng Chua -
“All that Glitters”: Techniques for Evaluations with Unreliable Model and Human Annotations
Michael Hardy -
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
Hongbang Yuan, Yubo Chen, Pengfei Cao, Zhuoran Jin, Kang Liu -
Language-based Valence and Arousal Expressions between the United States and China: a Cross-Cultural Examination
Young Min Cho, Dandan Pang, Stuti Thapa, Garrick Sherman, Lyle Ungar, Louis Tay, Sharath Chandra Guntuku -
Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Models
Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang -
DiPT: Enhancing LLM Reasoning through Diversified Perspective-Taking
Hoang Anh Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia -
DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
Sathya Krishnan Suresh, Wu Mengjun, Tushar Pranav, EngSiong Chng -
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
Jiachen Ma, Yijiang Li, Zhiqing Xiao, Anda Cao, Jie Zhang, Chao Ye, Junbo Zhao -
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models
Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang -
SFMSS: Service Flow aware Medical Scenario Simulation for Conversational Data Generation
Zhijie Bao, Qingyun Liu, Xuanjing Huang, zhongyu wei -
DomainSum: A Hierarchical Benchmark for Fine-Grained Domain Shift in Abstractive Text Summarization
Haohan Yuan, Haopeng Zhang -
On the Feasibility of In-Context Probing for Data Attribution
Cathy Jiao, Weizhen Gao, Aditi Raghunathan, Chenyan Xiong -
Modeling the Differential Prevalence of Online Supportive Interactions in Private Instant Messages of Adolescents
Ondrej Sotolar, Michał Tkaczyk, Jaromír Plhák, David Smahel -
CLERC: A Dataset for U. S. Legal Case Retrieval and Retrieval-Augmented Analysis Generation
Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme -
Can I Introduce My Boyfriend to My Grandmother? Evaluating Large Language Models Capabilities on Iranian Social Norm Classification
Hamidreza Saffari, Mohammadamin Shafiei, Donya Rooein, Francesco Pierri, Debora Nozza -
DOLFIN - Document-Level Financial Test-Set for Machine Translation
Mariam Nakhle, Marco Dinarelli, Raheel Qader, Emmanuelle Esperança-Rodier, Hervé Blanchon -
Dynamic Strategy Planning for Efficient Question Answering with Large Language Models
Tanmay Parekh, Pradyot Prakash, Alexander Radovic, Akshay Shekher, Denis Savenkov -
Enhancing the Prototype Network with Local-to-Global Optimization for Few-Shot Relation Extraction
Hui Sun, Rongxin Chen -
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
Zhaowei Li, Wei Wang, YiQing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang -
From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs
Navya Jain, Zekun Wu, CRISTIAN ENRIQUE MUNOZ VILLALOBOS, Airlie Hilliard, Xin Guan, Adriano Koshiyama, Emre Kazim, Philip Colin Treleaven -
SimulBench: Evaluating Language Models with Creative Simulation Tasks
Qi Jia, Xiang Yue, Tuney Zheng, Jie Huang, Bill Yuchen Lin -
Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs
Aarón Galiano-Jiménez, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena -
PREMISE: Matching-based Prediction for Accurate Review Recommendation
Wei Han, Hui Chen, Soujanya Poria -
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton Razzhigaev, Matvey Mikhalchuk, Temurbek Rahmatullaev, Elizaveta Goncharova, Polina Druzhinina, Ivan Oseledets, Andrey Kuznetsov -
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Chengyue Wu, Zhixuan Liang, Yixiao Ge, Qiushan Guo, Zeyu Lu, Jiahao Wang, Ying Shan, Ping Luo -
How to Talk to Language Models: Serialization Strategies for Structured Entity Matching
Haoteng Yin, Jinha Kim, Prashant Mathur, Krishanu Sarker, Vidit Bansal -
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
Leonidas Gee, Milan Gritta, Gerasimos Lampouras, Ignacio Iacobacci -
Overcoming both Domain Shift and Label Shift for Referring Video Segmentation
Hai Huang, Sashuai zhou, Yan Xia -
Verifiable Format Control for Large Language Model Generations
Zhaoyang Wang, Jinqi Jiang, Huichi Zhou, Wenhao Zheng, Xuchao Zhang, Chetan Bansal, Huaxiu Yao -
The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection
Tomáš Horych, Christoph Mandl, Terry Ruas, Andre Greiner-Petter, Bela Gipp, Akiko Aizawa, Timo Spinde -
Can LLMs Learn Macroeconomic Narratives from Social Media?
Almog Gueta, Amir Feder, Zorik Gekhman, Ariel Goldstein, Roi Reichart -
MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation
Chanhee Park, Hyeonseok Moon, Chanjun Park, Heuiseok Lim -
A Large-Scale Benchmark for Vietnamese Sentence Paraphrases
Sang Quang Nguyen, Kiet Van Nguyen -
Omni-Chart-600K: A Comprehensive Dataset of Chart Types for Chart Understanding
Shulei Wang, Shuai Yang, Wang Lin, Zirun Guo, Sihang Cai, Hai Huang, Ye Wang, Jingyuan Chen, Tao Jin -
Rejected Dialects: Biases Against African American Language in Reward Models
Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, Maarten Sap -
ChatCRS: Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems
Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li -
Echoes of Discord: Forecasting Hater Reactions to Counterspeech
Xiaoying Song, Sharon Lisseth Perez, Xinchen Yu, Eduardo Blanco, Lingzi Hong -
The American Sign Language Knowledge Graph: Infusing ASL Models with Linguistic Knowledge
Lee Kezar, Nidhi Munikote, Zian Zeng, Zed Sehyr, Naomi Caselli, Jesse Thomason -
Evaluation of Multilingual Image Captioning: How far can we get with CLIP models?
Goncalo Emanuel Cavaco Gomes, Chrysoula Zerva, Bruno Martins -
Multilingual Blending: Large Language Model Safety Alignment Evaluation with Language Mixture
Jiayang Song, Yuheng Huang, Zhehua Zhou, Lei Ma -
GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration
Ziwen Li, Xiang Chen, Youngseung Jeon -
Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data
Juanhui Li, Sreyashi Nag, Hui Liu, Xianfeng Tang, Sheikh Muhammad Sarwar, Limeng Cui, Hansu Gu, Suhang Wang, Qi He, Jiliang Tang -
Large-Scale Corpus Construction and Retrieval-Augmented Generation for Ancient Chinese Poetry: New Method and Data Insights
Yang Liu, Lan Lan, Jiahuan Cao, Hiuyi Cheng, Kai Ding, Lianwen Jin -
Rethinking Smoothness for Fast and Adaptable Entity Alignment Decoding
Yuanyi Wang, Han Li, Haifeng Sun, Lei Zhang, Bo He, Wei Tang, Tianhao Yan, Qi Qi, Jingyu Wang -
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation
Armel Randy Zebaze, Benoît Sagot, Rachel Bawden -
Joint Learning Event-Specific Probe and Argument Library with Differential Optimization for Document-Level Multi-Event Extraction
Jianpeng Hu, Chao Xue, Chunqing Yu, JiaCheng Xu, Chengxiang Tan -
Zero-Shot Strategies for Length-Controllable Summarization
Fabian Retkowski, Alexander Waibel -
Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models
Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee, Chanjun Park, Heuiseok Lim -
MedThink: A Rationale-Guided Framework for Explaining Medical Visual Question Answering
Xiaotang Gai, Chenyi Zhou, Jiaxiang Liu, YANG FENG, Jian Wu, Zuozhu Liu -
Can’t Hide Behind the API: Stealing Black-Box Commercial Embedding Models
Manveer Singh Tamber, Jasper Xian, Jimmy Lin -
Tackling Social Bias against the Poor: a Dataset and a Taxonomy on Aporophobia
Georgina Curto, Svetlana Kiritchenko, Muhammad Hammad Fahim Siddiqui, Isar Nejadgholi, Kathleen C. Fraser -
Human and LLM-Based Resume Matching: An Observational Study
Swanand Vaishampayan, Hunter Leary, Yoseph Berhanu Alebachew, Louis Hickman, Brent A. Stevenor, Weston Beck, Chris Brown -
A Survey to Recent Progress Towards Understanding In-Context Learning
Haitao Mao, Guangliang Liu, Yao Ma, Rongrong Wang, Kristen Johnson, Jiliang Tang -
M-IFEval: Multilingual Instruction-Following Evaluation
Antoine Dussolle, A. Cardeña, Shota Sato, Peter Devine -
Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing
Hadi Askari, Anshuman Chhabra, Muhao Chen, Prasant Mohapatra -
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Peiqin Lin, Andre Martins, Hinrich Schuetze -
Continuous Speech Tokenizer in Text To Speech
Yixing Li, Ruobing Xie, Xingwu Sun, Yu Cheng, Zhanhui Kang -
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
Sonam Gupta, Yatin Nandwani, Asaf Yehudai, Dinesh Khandelwal, Dinesh Raghu, Sachindra Joshi -
On Using Arabic Language Dialects in Recommendation Systems
Abdulla Alshabanah, Murali Annavaram -
$\mathcal{S}^2$IT: Stepwise Syntax Integration Tuning for Large Language Models in Aspect Sentiment Quad Prediction
Bingfeng chen, Chenjie Qiu, Yifeng Xie, Boyan Xu, Ruichu Cai, Zhifeng Hao -
SEEval: Advancing LLM Text Evaluation Efficiency and Accuracy through Self-Explanation Prompting
Meng-Chen Wu, Md Mosharaf Hossain, Tess Wood, Shayan Ali Akbar, Si-Chi Chin, Erwin Cornejo -
Unified Automated Essay Scoring and Grammatical Error Correction
SeungWoo Song, Junghun Yuk, ChangSu Choi, HanGyeol Yoo, HyeonSeok Lim, KyungTae Lim, Jungyeul Park -
Towards Long Context Hallucination Detection
Siyi Liu, Kishaloy Halder, Zheng Qi, Wei Xiao, Nikolaos Pappas, Phu Mon Htut, Neha Anna John, Yassine Benajiba, Dan Roth -
TabComp: A Dataset for Visual Table Reading Comprehension
Somraj Gautam, Abhishek Bhandari, Gaurav Harit -
Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving
Botao Yu, Frazier N. Baker, Ziru Chen, Garrett Herb, Boyu Gou, Daniel Adu-Ampratwum, Xia Ning, Huan Sun -
TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement
Zhaopeng Feng, Yan Zhang, Hao Li, Bei Wu, Jiayu Liao, Wenqiang Liu, Jun Lang, YANG FENG, Jian Wu, Zuozhu Liu -
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
Jean Vassoyan, Nathanaël Beau, Roman Plaud -
The Power of Bullet Lists: A Simple Yet Effective Prompting Approach to Enhancing Spatial Reasoning in Large Language Models
Ikhyun Cho, Changyeon Park, Julia Hockenmaier -
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering
Yumeng Wang, Zhiyuan Fan, Qingyun Wang, Yi R. Fung, Heng Ji -
Challenges in Trustworthy Human Evaluation of Chatbots
Wenting Zhao, Alexander M Rush, Tanya Goyal -
How Inclusively do LMs Perceive Social and Moral Norms?
Michael Galarnyk, Agam Shah, Dipanwita Guhathakurta, Poojitha Nandigam, Sudheer Chava -
AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation
Vaishnavi Pulavarthi, Deeksha Nandal, Soham Dan, Debjit Pal -
Chain-of-Rank: Enhancing Large Language Models for Domain-Specific RAG in Edge Device
Juntae Lee, Jihwan Bang, Kyuhong Shim, Seunghan Yang, Simyung Chang -
ToVo: Toxicity Taxonomy via Voting
Tinh Son Luong, Thanh-Thien Le, Thang Viet Doan, Linh Ngo Van, Thien Huu Nguyen, Nguyen Thi Ngoc Diep -
Understanding the Role of Mental Models in User Interaction with an Adaptive Dialog Agent
Lindsey Morgan Vanderlyn, Dirk Väth, Thang Vu -
Beyond Excess and Deficiency: Adaptive Length Bias Mitigation in Reward Models for RLHF
Yuyan Bu, Liangyu Huo, Yi Jing, Qing Yang -
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan -
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
Mohammad Jahid Ibna Basher, Md Kowsher, Md Saiful Islam, Rabindra Nath Nandi, Nusrat Jahan Prottasha, Mehadi Hasan Menon, Tareq Al Muntasir, Shammur Absar Chowdhury, Firoj Alam, Niloofar Yousefi, Ozlem Garibay -
Keep Guessing? When Considering Inference Scaling, Mind the Baselines
Gal Yona, Or Honovich, Omer Levy, Roee Aharoni -
Synthetic Audio Helps for Cognitive State Tasks
Adil Soubki, John Murzaku, Peter Zeng, Owen Rambow -
Sequence-level Large Language Model Training with Contrastive Preference Optimization
Zhili Feng, Dhananjay Ram, Cole Hawkins, Aditya Rawal, Jinman Zhao, Sheng Zha -
FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
Jinlin Wang, Suyuchen Wang, Ziwen Xia, Sirui Hong, Yun Zhu, Bang Liu, Chenglin Wu -
SAFR: Neuron Redistribution for Interpretability
Ruidi Chang, Chunyuan Deng, Hanjie Chen -
Is Semantic Chunking Worth the Computational Cost?
Renyi Qu, Ruixuan Tu, Forrest Sheng Bao -
TeCoFeS: Text Column Featurization using Semantic Analysis
Ananya Singha, Mukul Singh, Ashish Tiwari, Sumit Gulwani, Vu Le, Chris Parnin -
kNN For Whisper And Its Effect On Bias And Speaker Adaptation
Maya K. Nachesa, Vlad Niculae -
Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks
Nikita Soni, Pranav Chitale, Khushboo Singh, Niranjan Balasubramanian, H. Schwartz -
LeCoPCR: Legal Concept-guided Prior Case Retrieval for European Court of Human Rights cases
Santosh T.Y.S.S, Isaac Misael Olguín Nolasco, Matthias Grabmair -
Optimizing Hidden Markov Language Models: An Empirical Study of Reparameterization and Initialization Techniques
Ivan Lee, Taylor Berg-Kirkpatrick -
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
Sara Ghaboura, Ahmed Heakl, Omkar Chakradhar Thawakar, Ali Husain Salem Abdulla Alharthi, Ines Riahi, Abduljalil Radman, Jorma Laaksonen, Fahad Shahbaz Khan, Salman Khan, Rao Muhammad Anwer -
Seeds of Discourse: A Multilingual Corpus of Direct Quotations from African Media on Agricultural Biotechnologies
Patricia Chiril, Trevor Spreadbury, Joeva Sean Rock, Brian Dowd-Uribe, David Uminsky -
Towards Prompt Generalization: Grammar-aware Cross-Prompt Automated Essay Scoring
Heejin Do, Taehee park, Sangwon Ryu, Gary Lee -
An empirical study of validating synthetic data for formula generation
Usneek Singh, José Cambronero, Sumit Gulwani, Aditya Kanade, Anirudh Khatry, Vu Le, Mukul Singh, Gust Verbruggen -
CA*: Addressing Evaluation Pitfalls in Computation-Aware Latency for Simultaneous Speech Translation
Xi Xu, Wenda Xu, Siqi Ouyang, Lei Li -
Are Large Language Models Effective in Clinical Trial Design? A Study on Baseline Feature Generation
Nafis Neehal, Bowen Wang, Shayom Debopadhaya, Corey Curran, Keerthiram Murugesan, Soham Dan, Vibha Anand, Kristin Bennett -
AcrosticSleuth: Probabilistic Identification and Ranking of Acrostics in Multilingual Corpora
Aleksandr Fedchin, Isabel Cooperman, Pramit Chaudhuri, Joseph P. Dexter -
Gender Bias in Instruction-Guided Speech Synthesis Models
Chun-Yi Kuan, Hung-yi Lee -
Richer Output for Richer Countries: Uncovering Geographical Disparities in Generated Stories and Travel Recommendations
Kirti Bhagat, Kinshuk Vasisht, Danish Pruthi -
MMLF: Multi-query Multi-passage Late Fusion Retrieval
Yuan-Ching Kuo, Yi Yu, Chih-Ming Chen, Chuan-Ju Wang -
Advocating Character Error Rate for Multilingual ASR Evaluation
Thennal D K, Jesin James, Deepa Padmini Gopinath, Muhammed Ashraf K -
Evaluating the Performance of Large Language Models via Debates
Behrad Moniri, Hamed Hassani, Edgar Dobriban -
RELexED: Retrieval-Enhanced Legal Summarization with Exemplar Diversity
Santosh T.Y.S.S, Chen Jia, Patrick Goroncy, Matthias Grabmair -
Demystifying the Power of Large Language Models in Graph Generation
Yu Wang, Ryan A. Rossi, Namyong Park, Nesreen K. Ahmed, Danai Koutra, Franck Dernoncourt, Tyler Derr -
Meta-Reasoning Improves Tool Use in Large Language Models
Lisa Alazraki, Marek Rei -
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
João Matos, Shan Chen, Siena Kathleen V. Placino, Yingya Li, Juan Carlos Climent Pardo, Daphna Idan, Takeshi Tohyama, David Restrepo, Luis Filipe Nakayama, José María Millet Pascual-Leone, Guergana K Savova, Hugo Aerts, Leo Anthony Celi, An-Kwok Ian Wong, Danielle Bitterman, Jack Gallifant -
QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums
Varun Nagaraj Rao, Eesha Agarwal, Samantha Dalal, Dana Calacci, Andrés Monroy-Hernández -
A Practical Analysis of Human Alignment with *PO
Kian Ahrabian, Xihui Lin, Barun Patra, Vishrav Chaudhary, Alon Benhaim, Jay Pujara, Xia Song -
Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting
Jiarui Wu, Zhuo Liu, Hangfeng He -
Considering Length Diversity in Retrieval-Augmented Summarization
Juseon-Do, Jaesung Hwang, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura -
Lessons from a User Experience Evaluation of NLP Interfaces
Eduardo Calò, Lydia Penkert, Saad Mahamood -
Can GPT-4 Sway Experts’ Investment Decisions?
Takehiro Takayanagi, Hiroya Takamura, Kiyoshi Izumi, Chung-Chi Chen -
Adapting LLM Agents with Universal Communication Feedback
Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen -
On Localizing and Deleting Toxic Memories in Large Language Models
Anubrata Das, Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, Rahul Gupta -
Lost in the Distance: Large Language Models Struggle to Capture Long-Distance Relational Knowledge
Meiyun Wang, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo