Research Dashboard

Automated surveillance of arXiv for my core research tracks.

1. Kinetic AI Risk

Scope: Intersection of Large Language Models (LLMs) and ICS/SCADA.

AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs

2026-01-09 | Chengming Cui, Tianxin Wei, Ziyi Chen...

Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical way to combine these capabilities without retraining. However, existing ensemble approaches suffer from fundamental limitations. Most rely...

Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks

2026-01-09 | Elias Lumer, Faheem Nizar, Akshaya Ja...

Recent advancements in Large Language Model (LLM) agents have enabled complex multi-turn agentic tasks requiring extensive tool calling, where conversations can span dozens of API calls with increasingly large context windows. However, although major LLM providers offer prompt caching to...
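The core incentive behind prompt caching can be illustrated with a toy model: a provider reuses computation for the longest previously-seen prompt prefix, so an agent that keeps its system prompt and tool schemas byte-stable and only appends new turns stays cache-friendly. This exact-string version is a simplification; real provider caches operate on token blocks and have their own eviction rules.

```python
# Toy sketch of prefix caching for long-horizon agentic conversations.
# Keeping the stable prefix (system prompt + tool schemas) unchanged and
# appending new turns at the end preserves cache hits across turns.

class PrefixCache:
    def __init__(self):
        self.prefixes = []  # previously submitted prompts

    def longest_cached_prefix(self, prompt):
        """Length of the longest stored prompt that prefixes `prompt`."""
        hits = [len(p) for p in self.prefixes if prompt.startswith(p)]
        return max(hits, default=0)

    def submit(self, prompt):
        reused = self.longest_cached_prefix(prompt)
        self.prefixes.append(prompt)
        return reused  # characters served from cache

cache = PrefixCache()
stable = "SYSTEM: you are a coding agent\nTOOLS: search, edit\n"
cache.submit(stable + "USER: run the tests\n")
# Appending a turn keeps the earlier prompt reusable as a cached prefix:
hit = cache.submit(stable + "USER: run the tests\nTOOL: 3 failures\n")
print(hit)
```

By contrast, editing anything inside the stable prefix (say, rotating a timestamp in the system prompt) would "break the cache" and force recomputation from the first changed byte.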

Open-Vocabulary 3D Instruction Ambiguity Detection

2026-01-09 | Jiayu Ding, Haoran Tang, Ge Li

In safety-critical domains, linguistic ambiguity can have severe consequences; a vague command like "Pass me the vial" in a surgical setting could lead to catastrophic errors. Yet, most embodied AI research overlooks this, assuming instructions are clear and focusing on...

Distilling Feedback into Memory-as-a-Tool

2026-01-09 | Víctor Gallego

We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, through a file-based memory system and agent-controlled tool calls. We evaluate this method on the Rubric Feedback Bench, a novel dataset for...
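The general pattern of "memory as a tool" can be sketched in a few lines: a critique produced during one episode is distilled into a guideline the agent can store and retrieve via tool calls later, so the reasoning cost is paid once. The in-memory store and method names below are simplifications I've invented; the paper describes a file-based system.

```python
# Toy sketch of memory-as-a-tool: transient feedback becomes a
# retrievable guideline via agent-controlled save/recall tool calls.

class MemoryTool:
    def __init__(self):
        self.guidelines = {}

    def save(self, topic, guideline):
        """Store a distilled guideline under a topic key."""
        self.guidelines.setdefault(topic, []).append(guideline)

    def recall(self, topic):
        """Retrieve all guidelines previously saved for a topic."""
        return self.guidelines.get(topic, [])

memory = MemoryTool()
# A one-off critique from an earlier episode, now reusable:
memory.save("citations", "Always quote the source line when citing.")
print(memory.recall("citations"))
```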

Global Optimization for Combinatorial Geometry Problems

2026-01-09 | Timo Berthold, Dominik Kamp, Gioni Me...

Recent progress in LLM-driven algorithm discovery, exemplified by DeepMind's AlphaEvolve, has produced new best-known solutions for a range of hard geometric and combinatorial problems. This raises a natural question: to what extent can modern off-the-shelf global optimization solvers match such...

Can We Predict Before Executing Machine Learning Agents?

2026-01-09 | Jingsheng Zheng, Jintian Zhang, Yujie...

Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Execution Bottleneck, as hypothesis evaluation relies strictly on expensive physical execution. To bypass these physical constraints, we internalize...

Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

2026-01-09 | Haoming Xu, Ningyuan Zhao, Yunzhi Yao...

As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show...
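The intuition behind consistency-under-perturbation is easy to demonstrate: instead of scoring a single answer, probe paraphrases of the same question and measure how often the answer survives the rephrasing. The stand-in "model" below is a deliberately brittle toy I've written for illustration, not anything from the paper.

```python
# Toy sketch of neighborhood consistency: query paraphrases of one
# question and report the majority answer plus its agreement rate.
from collections import Counter

def neighborhood_consistency(model, question, paraphrases):
    answers = [model(q) for q in [question] + paraphrases]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)

# A brittle stand-in model whose answer flips under rephrasing:
def model(q):
    return "Paris" if "capital" in q else "Lyon"

answer, score = neighborhood_consistency(
    model,
    "What is the capital of France?",
    ["Which city is France's capital?", "Name France's seat of government."],
)
print(answer, round(score, 2))  # majority answer, agreement 2/3
```

A point-wise check on the original question alone would report full confidence in "Paris" and never surface the brittleness.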

Can AI mediation improve democratic deliberation?

2026-01-09 | Michael Henry Tessler, Georgina Evans...

The strength of democracy lies in the free and equal exchange of diverse viewpoints. Living up to this ideal at scale faces inherent tensions: broad participation, meaningful deliberation, and political equality often trade off with one another (Fishkin, 2011). We...

HAPS: Hierarchical LLM Routing with Joint Architecture and Parameter Search

2026-01-09 | Zihang Tian, Rui Li, Jingsen Zhang, X...

Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks. However, existing approaches typically focus on selecting LLM architectures while overlooking parameter settings, which are critical for task performance. In this paper, we...
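The distinction the abstract draws, routing over parameters as well as architectures, can be sketched with a trivial rule-based router that returns both a model choice and its decoding settings per task. The model names, task types, and temperatures below are placeholders, not the paper's search space.

```python
# Generic sketch of joint architecture + parameter routing: each task
# type maps to a model AND its decoding parameters, not a model alone.

ROUTES = {
    "code": {"model": "coder-7b", "temperature": 0.0},
    "math": {"model": "reasoner-70b", "temperature": 0.2},
    "chat": {"model": "chat-8b", "temperature": 0.7},
}

def route(task_type):
    """Return the (model, parameters) choice for a task type."""
    return ROUTES.get(task_type, ROUTES["chat"])  # default to chat

print(route("code"))
```

The paper's contribution is presumably searching this joint space automatically and hierarchically; the table above only fixes the shape of what is being searched.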

TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents

2026-01-09 | Dawei Wang, Chengming Zhou, Di Zhao, ...

Recent breakthroughs in Large Language Models (LLMs) have positioned them as a promising paradigm for agents, with long-term planning and decision-making emerging as core general-purpose capabilities for adapting to diverse scenarios and tasks. Real-time strategy (RTS) games serve as an...

Cybersecurity AI: A Game-Theoretic AI for Guiding Attack and Defense

2026-01-09 | Víctor Mayoral-Vilches, María Sanz-Gó...

AI-driven penetration testing now executes thousands of actions per hour but still lacks the strategic intuition humans apply in competitive security. To build cybersecurity superintelligence (Cybersecurity AI exceeding the best human capability), such strategic intuition must be embedded into agentic reasoning processes...

Continual-learning for Modelling Low-Resource Languages from Large Language Models

2026-01-09 | Santosh Srinath K, Mudit Somani, Varu...

Building a language model for a multilingual scenario raises several challenges, chief among them catastrophic forgetting. For example, small language models (SLMs) built for low-resource languages by adapting large language models (LLMs) pose the challenge of...

IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck

2026-01-09 | Huilin Deng, Hongchen Luo, Yue Zhu, L...

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Model (LLM) reasoning have been hindered by a persistent challenge: exploration collapse. The semantic homogeneity of random rollouts often traps models in narrow, over-optimized behaviors. While existing methods...

CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning

2026-01-09 | Alexandra Dragomir, Florin Brad, Radu...

Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference optimization, but they leave a key aspect largely underexplored: the order in which data samples are given...

Left, Right, or Center? Evaluating LLM Framing in News Classification and Generation

2026-01-09 | Molly Kennedy, Ali Parker, Yihong Liu...

Large Language Model (LLM) based summarization and text generation are increasingly used for producing and rewriting text, raising concerns about political framing in journalism where subtle wording choices can shape interpretation. Across nine state-of-the-art LLMs, we study political framing by...

2. GRC Engineering & AI Governance

Scope: AI Governance, Policy as Code, and Compliance Engineering.

From Abstract Threats to Institutional Realities: A Comparative Semantic Network Analysis of AI Securitisation in the US, EU, and China

2026-01-07 | Ruiyi Guo, Bodong Zhang

Artificial intelligence governance exhibits a striking paradox: while major jurisdictions converge rhetorically around concepts such as safety, risk, and accountability, their regulatory frameworks remain fundamentally divergent and mutually unintelligible. This paper argues that this fragmentation cannot be explained solely by...

From Slaves to Synths? Superintelligence and the Evolution of Legal Personality

2026-01-06 | Simon Chesterman

This essay examines the evolving concept of legal personality through the lens of recent developments in artificial intelligence and the possible emergence of superintelligence. Legal systems have long been open to extending personhood to non-human entities, most prominently corporations, for...

Compliance as a Trust Metric

2026-01-03 | Wenbo Wu, George Konstantinidis

Trust and Reputation Management Systems (TRMSs) are critical for the modern web, yet their reliance on subjective user ratings or narrow Quality of Service (QoS) metrics lacks objective grounding. Concurrently, while regulatory frameworks like GDPR and HIPAA provide objective behavioral...
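The abstract's central move, grounding trust in objective compliance rather than subjective ratings, reduces to a simple pattern: enumerate checkable regulatory obligations and score a service by the weighted fraction it satisfies. The rule names and weights below are invented for illustration and are not from the paper.

```python
# Hedged sketch of compliance-as-trust: a trust score derived from
# objective, checkable compliance rules instead of user ratings.

CHECKS = {
    "encrypts_data_at_rest": 0.4,        # e.g. a HIPAA-style safeguard
    "honours_deletion_requests": 0.4,    # e.g. a GDPR-style right
    "publishes_breach_notices": 0.2,
}

def compliance_trust(observed):
    """Weighted fraction of compliance checks a service passes."""
    return sum(w for rule, w in CHECKS.items() if observed.get(rule))

print(compliance_trust({
    "encrypts_data_at_rest": True,
    "honours_deletion_requests": True,
    "publishes_breach_notices": False,
}))  # → 0.8
```

A real TRMS would feed these scores back into reputation aggregation; the point of the sketch is only that the inputs are behaviorally verifiable rather than opinion-based.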

Verifiable Off-Chain Governance

2025-12-29 | Jake Hartnell, Eugenio Battaglia

Current DAO governance praxis limits organizational expressivity and reduces complex organizational decisions to token-weighted voting due to on-chain computational limits. This paper proposes verifiable off-chain computation (leveraging Verifiable Services, TEEs, and ZK proofs) as a framework to transcend these constraints...

With Great Capabilities Come Great Responsibilities: Introducing the Agentic Risk & Capability Framework for Governing Agentic AI Systems

2025-12-22 | Shaun Khoo, Jessica Foo, Roy Ka-Wei Lee

Agentic AI systems present both significant opportunities and novel risks due to their capacity for autonomous action, encompassing tasks such as code execution, internet interaction, and file modification. This poses considerable challenges for effective organizational governance, particularly in comprehensively identifying,...

Computable Gap Assessment of Artificial Intelligence Governance in Children's Centres: Evidence-Mechanism-Governance-Indicator Modelling of UNICEF's Guidance on AI and Children 3.0 Based on the Graph-GAP Framework

2025-12-20 | Wei Meng

This paper tackles practical challenges in governing child-centred artificial intelligence: policy texts state principles and requirements but often lack reproducible evidence anchors, explicit causal pathways, executable governance toolchains, and computable audit metrics. We propose Graph-GAP, a methodology that decomposes...

The Future of the AI Summit Series

2025-12-19 | Lucia Velasco, Charles Martinet, Henr...

This policy memo examines the evolution of the international AI Summit series, initiated at Bletchley Park in 2023 and continued through Seoul in 2024 and Paris in 2025, as a forum for cooperation on the governance of advanced artificial intelligence....

Smart Data Portfolios: A Quantitative Framework for Input Governance in AI

2025-12-18 | A. Talha Yalta, A. Yasemin Yalta

Growing concerns about fairness, privacy, robustness, and transparency have made it a central expectation of AI governance that automated decisions be explainable by institutions and intelligible to affected parties. We introduce the Smart Data Portfolio (SDP) framework, which treats data...

How frontier AI companies could implement an internal audit function

2025-12-16 | Francesca Gomez, Adam Buick, Leah Fer...

Frontier AI developers operate at the intersection of rapid technical progress, extreme risk exposure, and growing regulatory scrutiny. While a range of external evaluations and safety frameworks have emerged, comparatively little attention has been paid to how internal organizational assurance...