Automated surveillance of arXiv for my core research tracks.
2026-04-10 | Shukang Yin, Sirui Zhao, Hanchao Wang...
Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models (Video LLMs). This work revisits and advances the two predominant token-pruning paradigms: attention-based selection and similarity-based clustering. Our study reveals two critical limitations in existing...
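The two paradigms named in this abstract can be illustrated with a minimal sketch — this is not the paper's method, just a generic rendering of attention-based top-k selection and greedy similarity-based clustering; all function names, the `keep_ratio`, and the `threshold` value are illustrative assumptions:

```python
import numpy as np

def attention_prune(tokens, attn_scores, keep_ratio=0.25):
    """Attention-based selection: keep the top-k tokens by attention score."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(attn_scores)[-k:]       # indices of the k highest scores
    return [tokens[i] for i in sorted(keep)]  # preserve original token order

def similarity_cluster(tokens, feats, threshold=0.9):
    """Similarity-based clustering (greedy): drop tokens too similar to ones kept."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize
    kept, kept_feats = [], []
    for tok, f in zip(tokens, feats):
        # keep a token only if its cosine similarity to every kept token is below threshold
        if all(float(f @ g) < threshold for g in kept_feats):
            kept.append(tok)
            kept_feats.append(f)
    return kept
```

Both reduce the visual token count fed to the LLM; the abstract's point is that each paradigm has limitations the paper revisits.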
2026-04-10 | Hadas Orgad, Boyi Wei, Kaden Zheng, M...
Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routinely bypass them, and fine-tuning on narrow domains can induce "emergent misalignment" that generalizes broadly. Whether this brittleness reflects a fundamental lack...
2026-04-10 | Yu Liu, Kun Peng, Wenxiao Zhang, Fang...
Retrieval Augmented Generation (RAG) systems deployed across organizational boundaries face fundamental tensions between security, accuracy, and efficiency. Current encryption methods expose plaintext during decryption, while federated architectures prevent resource integration and incur substantial overhead. We introduce Trans-RAG, implementing a novel...
2026-04-10 | Guanyu Zhou, Yida Yin, Wenhao Chai, S...
Vision-language models (VLMs) still struggle with visual perception tasks such as spatial understanding and viewpoint recognition. One plausible contributing factor is that natural image datasets provide limited supervision for low-level visual skills. This motivates a practical question: can targeted synthetic...
2026-04-10 | Wenyi Xiao, Xinchi Xu, Leilei Gan
Large Vision Language Models (LVLMs) achieve strong multimodal reasoning but frequently exhibit hallucinations and incorrect responses with high certainty, which hinders their usage in high-stakes domains. Existing verbalized confidence calibration methods, largely developed for text-only LLMs, typically optimize a single...
2026-04-10 | Ahmed Nusayer Ashik, Shaowei Wang, Ts...
The rapid evolution of software libraries creates a significant challenge for Large Language Models (LLMs), whose static parametric knowledge often becomes stale post-training. While retrieval-augmented generation (RAG) is commonly used to provide up-to-date API specifications, "context-memory conflict" arises when external...
2026-04-10 | Xinyu Wang, Sai Koneru, Wenbo Zhang, ...
Recent advances in large language models (LLMs) have enabled the large-scale generation of highly fluent and deceptive news-like content. While prior work has often treated fake news detection as a binary classification problem, modern fake news increasingly arises through human-AI...
2026-04-10 | Wending Yan, Rongkai Zhang, Kaihua Ta...
Universal image restoration (UIR) aims to recover clean images from diverse and unknown degradations using a unified model. Existing UIR methods primarily focus on pixel reconstruction and often lack explicit diagnostic reasoning over degradation composition, severity, and scene semantics prior...
2026-04-10 | Gonzalo Ballestero, Hadi Hosseini, Sa...
AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. We distinguish primary algorithmic monoculture (baseline action similarity) from strategic algorithmic monoculture, whereby agents adjust similarity in response to incentives. We implement a simple experimental design...
2026-04-10 | Hippolyte Gisserot-Boukhlef, Nicolas ...
Accurate evaluation is central to the large language model (LLM) ecosystem, guiding model selection and downstream adoption across diverse use cases. In practice, however, evaluating generative outputs typically relies on rigid lexical methods to extract and assess answers, which can...
2026-04-10 | Kyle Whitecross, Negin Rahimi
We propose RecaLLM, a set of reasoning language models post-trained to make effective use of long-context information. In-context retrieval, which identifies relevant evidence from context, and reasoning are deeply intertwined: retrieval supports reasoning, while reasoning often determines what must be...
2026-04-10 | Om Solanki, Lopamudra Praharaj, Deept...
Large Language Models (LLMs) offer a promising interface for intent-driven control of autonomous cyber-physical systems, but their direct use in mission-critical Internet of Battlefield Things (IoBT) environments raises significant safety, reliability, and policy-compliance concerns. This paper presents a Policy-Aware Large...
2026-04-10 | Nilanjan Sinhababu, Soumedhik Bharati...
Large Language Models (LLMs) have been widely used in reranking. Computational overhead and large context lengths remain challenging for LLM rerankers. Efficient reranking usually involves selecting a subset of the ranked list from the first stage, known as...
2026-04-10 | Vishnu Murali, Anmol Gulati, Elias Lu...
Translating natural language into Jira Query Language (JQL) requires resolving ambiguous field references, instance-specific categorical values, and complex Boolean predicates. Single-pass LLMs cannot discover which categorical values (e.g., component names or fix versions) actually exist in a given Jira instance,...
2026-04-10 | Sathvik Nair, Colin Phillips
Under the lens of Marr's levels of analysis, we critique and extend two claims about language models (LMs) and language processing: first, that predicting upcoming linguistic information based on context is central to language processing, and second, that many advances...
2026-04-10 | Chenchen Zhang
Reinforcement learning (RL) for large language models (LLMs) increasingly relies on sparse, outcome-level rewards, yet determining which actions within a long trajectory caused the outcome remains difficult. This credit assignment (CA) problem manifests in two regimes: reasoning RL, where...
2026-04-10 | Weiyang Guo, Zesheng Shi, Liye Zhao, ...
While Large Language Models (LLMs) have demonstrated significant potential in Tool-Integrated Reasoning (TIR), existing training paradigms face significant limitations: Zero-RL suffers from inefficient exploration and mode degradation due to a lack of prior guidance, while SFT-then-RL is limited by high...
2026-04-10 | Elena Eleftheriou, George Pallis, Mar...
Large Language Models (LLMs) are widely used by students, yet their tendency to provide fast and complete answers may discourage reflection and foster overconfidence. We examined how alternative LLM interaction designs support deeper thinking without excessively increasing cognitive burden. We...
2026-04-10 | Jingyu Zhang, Tianjian Li, William Ju...
Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The...
2026-04-10 | Dasen Dai, Shuoqi Li, Ronghao Chen, H...
UI-to-Code generation requires vision-language models (VLMs) to produce thousands of tokens of structured HTML/CSS from a single screenshot, making visual token efficiency critical. Existing compression methods either select tokens at inference time using task-agnostic heuristics, or zero out low-attention features...
2026-01-07 | Ruiyi Guo, Bodong Zhang
Artificial intelligence governance exhibits a striking paradox: while major jurisdictions converge rhetorically around concepts such as safety, risk, and accountability, their regulatory frameworks remain fundamentally divergent and mutually unintelligible. This paper argues that this fragmentation cannot be explained solely by...
2026-01-07 | Tom Deckenbrunnen, Alessio Buscemi, M...
The EU AI Act adopts a horizontal and adaptive approach to govern AI technologies characterised by rapid development and unpredictable emerging capabilities. To maintain relevance, the Act embeds provisions for regulatory learning. However, these provisions operate within a complex network...
2026-01-06 | Simon Chesterman
This essay examines the evolving concept of legal personality through the lens of recent developments in artificial intelligence and the possible emergence of superintelligence. Legal systems have long been open to extending personhood to non-human entities, most prominently corporations, for...
2026-01-03 | Wenbo Wu, George Konstantinidis
Trust and Reputation Management Systems (TRMSs) are critical for the modern web, yet their reliance on subjective user ratings or narrow Quality of Service (QoS) metrics lacks objective grounding. Concurrently, while regulatory frameworks like GDPR and HIPAA provide objective behavioral...
2025-12-29 | Jake Hartnell, Eugenio Battaglia
Current DAO governance praxis limits organizational expressivity and reduces complex organizational decisions to token-weighted voting due to on-chain computational limits. This paper proposes verifiable off-chain computation (leveraging Verifiable Services, TEEs, and ZK proofs) as a framework to transcend these constraints...
2025-12-26 | Sunil Arora, John Hastings
Natural Language Processing (NLP) systems are increasingly used in sensitive domains such as healthcare, finance, and government, where they handle large volumes of personal and regulated data. However, these systems introduce distinct risks related to security, privacy, and regulatory compliance...
2025-12-22 | Shaun Khoo, Jessica Foo, Roy Ka-Wei Lee
Agentic AI systems present both significant opportunities and novel risks due to their capacity for autonomous action, encompassing tasks such as code execution, internet interaction, and file modification. This poses considerable challenges for effective organizational governance, particularly in comprehensively identifying,...
2025-12-21 | Yang Ni, Tong Yang
Large Language Models (LLMs) and AI chatbots are increasingly used for emotional and mental health support due to their low cost, immediacy, and accessibility. However, when safety guardrails are triggered, conversations may be abruptly terminated, introducing a distinct form of...
2025-12-20 | Wei Meng
This paper tackles practical challenges in governing child-centered artificial intelligence: policy texts state principles and requirements but often lack reproducible evidence anchors, explicit causal pathways, executable governance toolchains, and computable audit metrics. We propose Graph-GAP, a methodology that decomposes...
2025-12-19 | Lucia Velasco, Charles Martinet, Henr...
This policy memo examines the evolution of the international AI Summit series, initiated at Bletchley Park in 2023 and continued through Seoul in 2024 and Paris in 2025, as a forum for cooperation on the governance of advanced artificial intelligence....
2025-12-19 | Robin Schimmelpfennig, Mark Díaz, Vin...
Over a billion users across the globe interact with AI systems engineered with increasing sophistication to mimic human traits. This shift has triggered urgent debate regarding anthropomorphism, the attribution of human characteristics to synthetic agents, and its potential to induce...
2025-12-18 | Otman A. Basir
Artificial intelligence systems are increasingly deployed in domains that shape human behaviour, institutional decision-making, and societal outcomes. Existing responsible AI and governance efforts provide important normative principles but often lack enforceable engineering mechanisms that operate throughout the system lifecycle. This...
2025-12-18 | A. Talha Yalta, A. Yasemin Yalta
Growing concerns about fairness, privacy, robustness, and transparency have made it a central expectation of AI governance that automated decisions be explainable by institutions and intelligible to affected parties. We introduce the Smart Data Portfolio (SDP) framework, which treats data...
2025-12-17 | Atte Ojanen, Johannes Anttila, Thilo ...
The rapid advancements in artificial intelligence (AI) present unique challenges for policymakers that seek to govern the technology. In this context, the Delphi method has become an established way to identify consensus and disagreement on emerging technological issues among experts...
2025-12-16 | Francesca Gomez, Adam Buick, Leah Fer...
Frontier AI developers operate at the intersection of rapid technical progress, extreme risk exposure, and growing regulatory scrutiny. While a range of external evaluations and safety frameworks have emerged, comparatively little attention has been paid to how internal organizational assurance...