
PaperNest — Academic Paper Aggregator

Searching across arXiv, Semantic Scholar, OpenAlex, CORE, CrossRef, DBLP, SciELO, BDTD, Oasisbr & RCAAP


Discover

Explore trending and recent academic papers

Trending Papers

Cancer statistics, 2025

OpenAlex
2025 · 1727 citations

Rebecca L. Siegel, Tyler B. Kratzer, Angela N. Giaquinto, Hyuna Sung, Ahmedin Jemal

Each year, the American Cancer Society estimates the numbers of new cancer cases and deaths in the United States and compiles the most recent data on population-based cancer occurrence and outcomes using incidence data collected by central cancer registries (through 2021) and mortality data collected by the National Center for Health Statistics (through 2022). In 2025, 2,041,910 new cancer cases and 618,120 cancer deaths are projected to occur in the United States. The cancer mortality rate continued to decline through 2022, averting nearly 4.5 million deaths since 1991 because of smoking reductions, earlier detection for some cancers, and improved treatment. Yet alarming disparities persist; Native American people bear the highest cancer mortality, including rates that are two to three times those in White people for kidney, liver, stomach, and cervical cancers. Similarly, Black people have two-fold higher mortality than White people for prostate, stomach, and uterine corpus cancers. Overall cancer incidence has generally declined in men but has risen in women, narrowing the male-to-female rate ratio (RR) from a peak of 1.6 (95% confidence interval, 1.57-1.61) in 1992 to 1.1 (95% confidence interval, 1.12-1.12) in 2021. However, rates in women aged 50-64 years have already surpassed those in men (832.5 vs. 830.6 per 100,000), and younger women (younger than 50 years) have an 82% higher incidence rate than their male counterparts (141.1 vs. 77.4 per 100,000), up from 51% in 2002. Notably, lung cancer incidence in women surpassed that in men among people younger than 65 years in 2021 (15.7 vs. 15.4 per 100,000; RR, 0.98, p = 0.03). In summary, cancer mortality continues to decline, but future gains are threatened by rampant racial inequalities and a growing burden of disease in middle-aged and young adults, especially women. Continued progress will require investment in cancer prevention and access to equitable treatment, especially for Native American and Black individuals.
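The rate ratios quoted in the abstract follow directly from the per-100,000 incidence rates it reports; a quick arithmetic check (all rates taken from the abstract):

```python
# Incidence rate ratios implied by the per-100,000 rates quoted in the abstract.
def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two incidence rates (both per 100,000 population)."""
    return rate_a / rate_b

# Women vs. men under age 50: 141.1 vs. 77.4 per 100,000.
rr_under_50 = rate_ratio(141.1, 77.4)
print(f"{rr_under_50:.2f}")   # 1.82, i.e. an 82% higher rate in women

# Lung cancer, men vs. women under age 65: 15.4 vs. 15.7 per 100,000.
rr_lung = rate_ratio(15.4, 15.7)
print(f"{rr_lung:.2f}")       # 0.98
```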

Unification of Dimensional Depth and Scanning Parameter Via DDP&DUP

OpenAlex
2026 · 1576 citations

Edward Witten

This thesis is the first of the trilogy thesis project "Spectrum of Dimension". I addressed classical dimension as a continuous entity and devised two operators, DDP and DUP, to deal with the continuity. Handling the singularity problem through distribution theory, I managed to prove that these operators are isomorphic to integro-differential operators. The following problem was the justification of the synthesis of dimension, also expressed as the 'Conservation of Geometric Measure'. Via Fubini's Theorem, it is possible to treat dimension not only as discrete but as a continuum. This theory is supported by the duality theorem of Edward Witten, concluding that reducing dimension via the Scanning Parameter does not destroy information but merely transforms its state of existence, from "Geometric Coexistence" to "Temporal Sequentiality".

Materials Today: Proceedings

OpenAlex
2025 · 1386 citations

Thy Truc Doan

Deliberative Democracy or Agonistic Pluralism?

OpenAlex
2026 · 1298 citations

Chantal Mouffe

As testified by the increasing success of the extreme right in several countries, western societies are witnessing a growing disaffection with democratic institutions. Such a disaffection may have serious consequences for the future of democracy. Unfortunately, liberal democratic societies are ill-prepared to confront the present challenge, since they are unable to grasp its nature. One of the main reasons for this inability lies in the type of political theory currently in vogue, dominated as it is by an individualistic, universalistic, and rationalistic framework. Such a framework erases the dimension of the political and impedes envisaging in an adequate manner the nature of a pluralistic democratic public sphere.

The Influence of Ectotrophic Mycorrhizal Fungi on the Resistance of Pine Roots to Pathogenic Infections. I. Antagonism of Mycorrhizal Fungi to Root Pathogenic Fungi and Soil Bacteria

OpenAlex
2025 · 1133 citations

Donald H. Marx

Antagonism of ectotrophic mycorrhizal fungi to Phytophthora cinnamomi, other root pathogenic fungi, and soil bacteria was examined. In agar plate tests, Laccaria laccata, Lactarius deliciosus, Leucopaxillus cerealis var. piceina, Pisolithus tinctorius, and Suillus luteus inhibited growth of nearly half of the 48 different fungal root pathogens. Leucopaxillus cerealis var. piceina inhibited 92% of the test pathogens. Differences in sensitivity of several isolates of P. cinnamomi to inhibitions by this symbiont were not found. Culture filtrates of L. cerealis var. piceina were inhibitory also to growth of P. cinnamomi and soil bacteria. Zoospore germination was inhibited completely in filtrates of this symbiont. Maximum antibiotic production occurred during the rapid growth phase in liquid culture. Length of culture incubation and temperature strongly influenced production of inhibitory substances by L. cerealis var. piceina in liquid culture. It grew best from 10 to 20 C, whereas P. tinctorius grew best from 30 to 35 C in liquid culture.

An Introduction to Input/Output Automata

OpenAlex
2026 · 1020 citations

Nancy Lynch, Marc R. Tuttle

We describe the input/output automaton model, a model for concurrent and distributed discrete event systems. We define the model, illustrate the model with several examples concerning vending machines and a leader election algorithm, and survey the ways in which the model has been used.
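The abstract mentions vending machines among its examples; a toy sketch of the I/O automaton idea (the class, actions, and state names here are illustrative, not from the paper) distinguishes environment-controlled input actions, which must always be enabled, from locally controlled output actions:

```python
class VendingMachine:
    """Toy I/O automaton: inputs come from the environment, outputs from the machine."""
    INPUTS = {"insert_coin", "press_button"}
    OUTPUTS = {"dispense"}

    def __init__(self):
        self.paid = False       # state: a coin has been inserted
        self.pending = False    # state: a dispense is owed

    def input_action(self, action: str) -> None:
        # Input actions must always be enabled (input-enabledness of I/O automata);
        # unexpected inputs simply leave the state unchanged.
        if action == "insert_coin":
            self.paid = True
        elif action == "press_button" and self.paid:
            self.pending = True

    def enabled_outputs(self) -> list:
        # Output actions are locally controlled: enabled only in certain states.
        return ["dispense"] if self.pending else []

    def output_action(self, action: str) -> None:
        assert action in self.enabled_outputs()
        self.paid = self.pending = False

m = VendingMachine()
m.input_action("press_button")   # ignored: no coin inserted yet
m.input_action("insert_coin")
m.input_action("press_button")
print(m.enabled_outputs())       # ['dispense']
```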

Arts of the Contact Zone

OpenAlex
2025 · 1003 citations

Mary Louise Pratt

Whenever the subject of literacy comes up, what often pops first into my mind is a conversation I overheard eight years ago between my son Sam and his best friend, Willie, aged six and seven, respectively: “Why don’t you trade me Many Trails for Carl Yats . . . Yes, it’s . . . Ya-strum-scrum.” “That’s not how you say it, dummy, it’s Carl Yes . . . Yes . . . oh, I don’t know.” Sam and Willie had just discovered baseball cards. Many Trails was their decoding, with the help of first-grade English phonics, of the name Manny Trillo. The name they were quite rightly stumped on was Carl Yastremski. That was the first time I remembered seeing them put their incipient literacy to their own use, and I was of course thrilled.

BindingDB Entry 50022499: ZD1839 (Iressa): an orally active inhibitor of epidermal growth factor signaling with potential for cancer therapy.

OpenAlex
2025 · 933 citations

Alan E. Wakeling

The epidermal growth factor receptor (EGFR) is a promising target for anticancer therapy because of its role in tumor growth, metastasis and angiogenesis, and tumor resistance to chemotherapy and radiotherapy. We have developed a low-molecular-weight EGFR tyrosine kinase inhibitor (EGFR-TKI), ZD1839 (Iressa). ZD1839, a substituted anilinoquinazoline, is a potent EGFR-TKI (IC50 = 0.033 µM) that selectively inhibits EGF-stimulated tumor cell growth (IC50 = 0.054 µM) and that blocks EGF-stimulated EGFR autophosphorylation in tumor cells. In studies with mice bearing a range of human tumor-derived xenografts, ZD1839 given p.o. once a day inhibited tumor growth in a dose-dependent manner. The level of expression of EGFR did not determine xenograft tumor sensitivity to ZD1839. Long-term ZD1839 (>3 months) treatment of mice bearing A431 xenografts was well tolerated, and ZD1839 completely inhibited tumor growth and induced regression of established tumors. No drug-resistant tumors appeared during ZD1839 treatment, but some tumors regrew after drug withdrawal. These studies indicate the potential utility of ZD1839 in the treatment of many human tumors and indicate that continuous once-a-day p.o. dosing might be a suitable therapeutic regimen.

Multiscale structural similarity for image quality assessment

OpenAlex
2025 · 876 citations

Zhou Wang

The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.
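The multi-scale idea can be sketched as follows — a deliberately simplified version using global image statistics and uniform weights, not the paper's windowed, calibrated MS-SSIM (the function and weighting here are assumptions for illustration):

```python
import numpy as np

def _similarity(x, y, c=1e-3):
    # Global-statistics contrast/structure term (a simplification of windowed SSIM).
    cov = ((x - x.mean()) * (y - y.mean())).mean()
    return (2 * cov + c) / (x.var() + y.var() + c)

def ms_similarity_sketch(x, y, scales=3, weights=None):
    """Crude multi-scale similarity: a weighted sum of a similarity term over
    dyadic scales. A sketch of the multi-scale idea only, not calibrated MS-SSIM."""
    weights = weights or [1 / scales] * scales
    score = 0.0
    for wt in weights:
        score += wt * _similarity(x, y)
        # Downsample by 2 via 2x2 mean pooling for the next (coarser) scale.
        h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        y = y[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return score

img = np.random.default_rng(0).random((32, 32))
print(round(ms_similarity_sketch(img, img), 3))   # identical images score 1.0
```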

An Improved Grading System for Measuring Plant Diseases

OpenAlex
2025 · 870 citations

J. G. Horsfall, R. W. Barratt

Heretofore, in recording severity of plant diseases by grading, the grades have been assigned equal values in percentage. According to the Weber-Fechner law, the human eye distinguishes according to the logarithm of the light intensity. Hence, the grades should be based on equal ability to distinguish, not on equal disease. Below 50 percent, the eye sees the amount of diseased tissue. Above 50 percent, it sees the amount of disease-free tissue. A new scoring system is based on 50 percent as a midpoint. The grades differ by a factor of two in either direction as follows: 1 = 0, 2 = 0 to 3, 3 = 3 to 6, 4 = 6 to 12, 5 = 12 to 25, 6 = 25 to 50, 7 = 50 to 75, 8 = 75 to 87, 9 = 87 to 94, 10 = 94 to 97, 11 = 97 to 100, 12 = 100. Several plants (20 or more) at random are graded. Mean grade = sum of grade readings ÷ number of readings. A calibration curve is set up with grade numbers on the X-axis and percentage disease on a special semi-log Y-axis with one and one-half phases from either end up to 50 percent. The grid has the aspect of arithmetic-probability paper. This scheme has been very useful in fungicide research, varietal resistance, etc. It should be useful in plant disease surveying.
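The grading scheme described (the Horsfall–Barratt scale) is a direct table lookup; a small sketch, with the boundary convention (upper bounds inclusive) chosen here for illustration:

```python
# Horsfall–Barratt grades and their percent-diseased-tissue ranges; each range
# doubles (below 50%) or halves (above 50%) relative to its neighbor.
HB_RANGES = {
    1: (0, 0), 2: (0, 3), 3: (3, 6), 4: (6, 12), 5: (12, 25), 6: (25, 50),
    7: (50, 75), 8: (75, 87), 9: (87, 94), 10: (94, 97), 11: (97, 100), 12: (100, 100),
}

def grade(percent: float) -> int:
    """Map observed percent diseased tissue to its Horsfall–Barratt grade.
    Upper bounds are treated as inclusive here (an illustrative convention)."""
    if percent <= 0:
        return 1
    if percent >= 100:
        return 12
    for g, (lo, hi) in HB_RANGES.items():
        if lo < percent <= hi:
            return g

def mean_grade(readings):
    """Mean grade over the sampled plants (the abstract recommends 20 or more)."""
    return sum(readings) / len(readings)

print(grade(60))                    # 7
print(mean_grade([6, 7, 7, 8]))     # 7.0
```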

BindingDB Entry 50022536: Induction of apoptosis and cell cycle arrest by CP-358,774, an inhibitor of epidermal growth factor receptor tyrosine kinase.

OpenAlex
2025 · 844 citations

James D. Moyer

The epidermal growth factor receptor (EGFR) is overexpressed in a significant percentage of carcinomas and contributes to the malignant phenotype. CP-358,774 is a directly acting inhibitor of human EGFR tyrosine kinase with an IC50 of 2 nM and reduces EGFR autophosphorylation in intact tumor cells with an IC50 of 20 nM. This inhibition is selective for EGFR tyrosine kinase relative to other tyrosine kinases we have examined, both in assays of isolated kinases and whole cells. At doses of 100 mg/kg, CP-358,774 completely prevents EGF-induced autophosphorylation of EGFR in human HN5 tumors growing as xenografts in athymic mice and of the hepatic EGFR of the treated mice. CP-358,774 inhibits the proliferation of DiFi human colon tumor cells at submicromolar concentrations in cell culture and blocks cell cycle progression at the G1 phase. This inhibitor produces a marked accumulation of retinoblastoma protein in its underphosphorylated form and accumulation of p27KIP1 in DiFi cells, which may contribute to the cell cycle block. Inhibition of the EGFR also triggers apoptosis in these cells as determined by formation of DNA fragments and other criteria. These results indicate that CP-358,774 has potential for the treatment of tumors that are dependent on the EGFR pathway for proliferation or survival.

Journal of Experimental Psychology: Human Perception and Performance 2000–2005.

OpenAlex
2025 · 827 citations

David A. Rosenbaum

The present author was honored to serve as editor of Journal of Experimental Psychology: Human Perception and Performance (JEP:HPP) for the 2000-2005 volumes, carrying on the work of his predecessors. Along with the happiness and pride he felt during his time as editor, he also experienced disquiet. He captures the source of the unease with an anecdote from when he was an independent researcher. These comments are not the mournful expressions of an about-to-become dinosaur. Rather, they are motivated by the conviction that approaches which have proven useful should continue to be supported. Others have argued this point as well vis-à-vis the simultaneous pursuit of neural and behavioral science. Pursuing both paths is an imperative for the community at large. (PsycInfo Database Record (c) 2025 APA, all rights reserved).

Recent AI/ML Papers

Model Agreement via Anchoring

arXiv
2026

Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

Numerous lines of work aim to control $\textit{model disagreement}$ -- the extent to which two machine learning models disagree in their predictions. We adopt a simple and standard notion of model disagreement in real-valued prediction problems, namely the expected squared difference in predictions between two models trained on independent samples, without any coordination of the training processes. We would like to be able to drive disagreement to zero with some natural parameter(s) of the training procedure using analyses that can be applied to existing training methodologies. We develop a simple general technique for proving bounds on independent model disagreement based on $\textit{anchoring}$ to the average of two models within the analysis. We then apply this technique to prove disagreement bounds for four commonly used machine learning algorithms: (1) stacked aggregation over an arbitrary model class (where disagreement is driven to 0 with the number of models $k$ being stacked) (2) gradient boosting (where disagreement is driven to 0 with the number of iterations $k$) (3) neural network training with architecture search (where disagreement is driven to 0 with the size $n$ of the architecture being optimized over) and (4) regression tree training over all regression trees of fixed depth (where disagreement is driven to 0 with the depth $d$ of the tree architecture). For clarity, we work out our initial bounds in the setting of one-dimensional regression with squared error loss -- but then show that all of our results generalize to multi-dimensional regression with any strongly convex loss.
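The disagreement notion in the abstract — expected squared difference in predictions between two models trained on independent samples — can be estimated empirically; a minimal sketch using 1-D least-squares regression as the (stable) training procedure, with an illustrative population of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_on_fresh_sample(n=200):
    """Fit a 1-D least-squares line on an independent draw from the same population."""
    x = rng.uniform(-1, 1, n)
    y = 2.0 * x + rng.normal(0, 0.1, n)   # illustrative population: y ≈ 2x + noise
    slope, intercept = np.polyfit(x, y, 1)
    return lambda t: slope * t + intercept

# Two models trained on independent samples, with no coordination; estimate the
# expected squared difference in their predictions on a common evaluation grid.
f, g = train_on_fresh_sample(), train_on_fresh_sample()
grid = np.linspace(-1, 1, 1000)
disagreement = np.mean((f(grid) - g(grid)) ** 2)
print(f"{disagreement:.6f}")   # small: least squares is stable across samples
```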

SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

arXiv
2026

Vaibhav Agrawal, Rishubh Parihar, Pradhaan Bhat, Ravi Kiran Sarvadevabhatla, R. Venkatesh Babu

We identify occlusion reasoning as a fundamental yet overlooked aspect for 3D layout-conditioned generation. It is essential for synthesizing partially occluded objects with depth-consistent geometry and scale. While existing methods can generate realistic scenes that follow input layouts, they often fail to model precise inter-object occlusions. We propose SeeThrough3D, a model for 3D layout conditioned generation that explicitly models occlusions. We introduce an occlusion-aware 3D scene representation (OSCR), where objects are depicted as translucent 3D boxes placed within a virtual environment and rendered from desired camera viewpoint. The transparency encodes hidden object regions, enabling the model to reason about occlusions, while the rendered viewpoint provides explicit camera control during generation. We condition a pretrained flow based text-to-image image generation model by introducing a set of visual tokens derived from our rendered 3D representation. Furthermore, we apply masked self-attention to accurately bind each object bounding box to its corresponding textual description, enabling accurate generation of multiple objects without object attribute mixing. To train the model, we construct a synthetic dataset with diverse multi-object scenes with strong inter-object occlusions. SeeThrough3D generalizes effectively to unseen object categories and enables precise 3D layout control with realistic occlusions and consistent camera control.

A Dataset is Worth 1 MB

arXiv
2026

Elad Kimchi Shoshani, Leeyam Gabay, Yedid Hoshen

A dataset server must often distribute the same large payload to many clients, incurring massive communication costs. Since clients frequently operate on diverse hardware and software frameworks, transmitting a pre-trained model is often infeasible; instead, agents require raw data to train their own task-specific models locally. While dataset distillation attempts to compress training signals, current methods struggle to scale to high-resolution data and rarely achieve sufficiently small files. In this paper, we propose Pseudo-Labels as Data (PLADA), a method that completely eliminates pixel transmission. We assume agents are preloaded with a large, generic, unlabeled reference dataset (e.g., ImageNet-1K, ImageNet-21K) and communicate a new task by transmitting only the class labels for specific images. To address the distribution mismatch between the reference and target datasets, we introduce a pruning mechanism that filters the reference dataset to retain only the labels of the most semantically relevant images for the target task. This selection process simultaneously maximizes training efficiency and minimizes transmission payload. Experiments on 10 diverse datasets demonstrate that our approach can transfer task knowledge with a payload of less than 1 MB while retaining high classification accuracy, offering a promising solution for efficient dataset serving.
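The sub-1 MB payload claim is easy to sanity-check with back-of-the-envelope arithmetic; the wire format below (one fixed-width index/label pair per retained reference image) is an illustrative assumption, not necessarily the paper's encoding:

```python
# Illustrative payload accounting for label-only transmission: the server sends
# one (reference-image index, class label) pair per retained pseudo-labeled image.
INDEX_BYTES = 4   # e.g. a uint32 index into a reference set like ImageNet-21K
LABEL_BYTES = 2   # e.g. a uint16 class id

def payload_bytes(num_retained_images: int) -> int:
    return num_retained_images * (INDEX_BYTES + LABEL_BYTES)

# Even 150k retained pseudo-labels fit comfortably under 1 MB:
print(payload_bytes(150_000) / 1e6, "MB")   # 0.9 MB
```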

SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport

arXiv
2026

Simon Roschmann, Paul Krzakala, Sonia Mazelet, Quentin Bouniot, Zeynep Akata

The Platonic Representation Hypothesis posits that neural networks trained on different modalities converge toward a shared statistical model of the world. Recent work exploits this convergence by aligning frozen pretrained vision and language models with lightweight alignment layers, but typically relies on contrastive losses and millions of paired samples. In this work, we ask whether meaningful alignment can be achieved with substantially less supervision. We introduce a semi-supervised setting in which pretrained unimodal encoders are aligned using a small number of image-text pairs together with large amounts of unpaired data. To address this challenge, we propose SOTAlign, a two-stage framework that first recovers a coarse shared geometry from limited paired data using a linear teacher, then refines the alignment on unpaired samples via an optimal-transport-based divergence that transfers relational structure without overconstraining the target space. Unlike existing semi-supervised methods, SOTAlign effectively leverages unpaired images and text, learning robust joint embeddings across datasets and encoder pairs, and significantly outperforming supervised and semi-supervised baselines.
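The first stage — a linear teacher fit on a small paired subset — can be sketched as an ordinary least-squares map between frozen embedding spaces; the dimensions and noise-free data below are demo assumptions, and the OT-based refinement stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, n_pairs = 16, 12, 64   # illustrative embedding sizes and pair count

W_true = rng.normal(size=(d_img, d_txt))   # ground-truth alignment for the demo
X = rng.normal(size=(n_pairs, d_img))      # frozen image-encoder embeddings
Y = X @ W_true                             # matching text embeddings (noise-free demo)

# The "linear teacher": least-squares linear map from image space to text space,
# recovering a coarse shared geometry from the limited paired data.
W_hat, residuals, rank, sv = np.linalg.lstsq(X, Y, rcond=None)
print(f"max recovery error: {np.abs(W_hat - W_true).max():.1e}")
```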

FlashOptim: Optimizers for Memory Efficient Training

arXiv
2026

Jose Javier Gonzalez Ortiz, Abhay Gupta, Chris Renard, Davis Blalock

Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just the parameter itself, but also its gradient and one or more optimizer state variables. With each of these values typically requiring 4 bytes, training even a 7 billion parameter model can be impractical for researchers with less than 100GB of accelerator memory. We introduce FlashOptim, a suite of optimizations that reduces per-parameter memory by over 50% while preserving model quality and API compatibility. Our approach introduces two key techniques. First, we improve master weight splitting by finding and exploiting a tight bound on its quantization error. Second, we design companding functions that greatly reduce the error in 8-bit optimizer state quantization. Together with 16-bit gradients, these techniques reduce AdamW memory from 16 bytes to 7 bytes per parameter, or 5 bytes with gradient release. They also cut model checkpoint sizes by more than half. Experiments with FlashOptim applied to SGD, AdamW, and Lion show no measurable quality degradation on any task from a collection of standard vision and language benchmarks, including Llama-3.1-8B finetuning.
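The memory figures in the abstract are per-parameter byte counts times parameter count; the 16-byte baseline breakdown below is one common accounting consistent with the abstract's "4 bytes each" framing:

```python
# Accelerator memory implied by per-parameter byte counts (figures from the abstract).
def training_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1e9

BASELINE = 4 + 4 + 4 + 4   # fp32 weight + gradient + AdamW first/second moments

print(training_gb(7e9, BASELINE))   # 112.0 GB -- why a 7B model strains <100GB devices
print(training_gb(7e9, 7))          # 49.0 GB with FlashOptim (per the abstract)
print(training_gb(7e9, 5))          # 35.0 GB with gradient release (per the abstract)
```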

Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

arXiv
2026

Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis, Felix Zhou, Ziyu Zhu

Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse samples, roughly speaking, have ``low'' information, the mean cannot be uniquely recovered from observed samples (i.e., the problem is not identifiable). Recent work by Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21] established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists of only convex sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open: (1) When is the mean identifiable under convex partitions? (2) Is computationally efficient estimation possible under identifiability and convex partitions? This work resolves both questions. [...]
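The setting can be illustrated in one dimension, where rounding reveals only the unit interval containing each sample — a convex partition, so identifiability holds; the naive midpoint estimator below is a sketch of the setting, not the paper's algorithm (for unit bins and unit variance its bias happens to be negligible):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.3   # unknown to the learner in the actual problem

# Each Gaussian sample x is revealed only as the interval [floor(x), floor(x)+1).
x = rng.normal(true_mean, 1.0, 100_000)
lower = np.floor(x)                 # all the learner observes

# Midpoint estimator: replace each coarse observation by its interval's midpoint.
estimate = (lower + 0.5).mean()
print(round(estimate, 3))
```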

Differentiable Zero-One Loss via Hypersimplex Projections

arXiv
2026

Camilo Gomez, Pengyang Wang, Liansheng Tang

Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the (n,k)-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach achieves significant improvements in generalization under large-batch training by imposing geometric consistency constraints on the output logits, thereby narrowing the performance gap traditionally observed in large-batch training.
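The non-differentiability problem the abstract addresses can be seen with the standard sigmoid relaxation of the binary zero-one loss — a generic smooth surrogate, not the paper's hypersimplex-projection operator: as the temperature shrinks, the surrogate approaches the hard 0/1 step while remaining differentiable everywhere:

```python
import numpy as np

def soft_zero_one(margin: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Smooth surrogate for the binary zero-one loss.
    margin = y * score with y in {-1, +1}; loss -> 1 if misclassified, 0 if correct.
    As tau -> 0 this approaches the (non-differentiable) hard zero-one loss."""
    z = np.clip(margin / tau, -60.0, 60.0)   # avoid exp overflow
    return 1.0 / (1.0 + np.exp(z))

margins = np.array([-2.0, -0.01, 0.01, 2.0])
print(np.round(soft_zero_one(margins), 3))               # smooth near the boundary
print(np.round(soft_zero_one(margins, tau=0.001), 3))    # nearly the hard 0/1 loss
```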

Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset

arXiv
2026

Dany Haddad, Dan Bareket, Joseph Chee Chang, Jay DeYoung, Jena D. Hwang, Uri Katz, Mark Polak, Sangho Suh, Harshit Surana, Aryeh Tiktinsky, Shriya Atmakuri, Jonathan Bragg, Mike D'Arcy, Sergey Feldman, Amal Hassan-Ali, Rubén Lozano, Bodhisattwa Prasad Majumder, Charles McGrady, Amanpreet Singh, Brooke Vlahos, Yoav Goldberg, Doug Downey

AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these systems in real-world settings. We present and analyze the Asta Interaction Dataset, a large-scale resource comprising over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. Using this dataset, we characterize query patterns, engagement behaviors, and how usage evolves with experience. We find that users submit longer and more complex queries than in traditional search, and treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps. Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways. With experience, users issue more targeted queries and engage more deeply with supporting citations, although keyword-style queries persist even among experienced users. We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.

Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

arXiv
2026

Yuhao Liu, Salim Ullah, Akash Kumar

Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explored the quantization technologies in related lightweight accelerator designs to reduce hardware resource consumption. However, low precision leads to high accuracy loss in inference. Therefore, mixed-precision quantization becomes an alternative solution by applying different precision in different layers to trade off resource consumption and accuracy. Because regular designs for multiplication on hardware cannot support the precision reconfiguration for a multi-precision Quantized Neural Network (QNN) model in runtime, we propose a runtime reconfigurable multi-precision multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our work can achieve 1.3185 to 3.5671 times speedup in inferring mixed-precision models and has less critical path delay, supporting a higher clock frequency (250MHz).
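The arithmetic primitive such a bitwise design pipelines is shift-and-add multiplication over individual bit pairs — a software sketch of the idea (not the paper's hardware architecture); because the precision is just the loop bound, reconfiguring it at runtime is cheap:

```python
# Bit-serial (shift-and-add) multiplication: each 1-bit partial product is
# shifted into place and accumulated, one bit pair at a time.
def bitwise_mul(a: int, b: int, precision: int) -> int:
    """Multiply two unsigned `precision`-bit integers bitwise."""
    acc = 0
    for i in range(precision):          # bit i of a
        if (a >> i) & 1:
            for j in range(precision):  # bit j of b
                if (b >> j) & 1:
                    acc += 1 << (i + j) # shifted 1-bit partial product
    return acc

print(bitwise_mul(13, 11, 4))   # 143, same as 13 * 11
```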

Utilizing LLMs for Industrial Process Automation

arXiv
2026

Salim Fares

A growing number of publications have addressed best practices for using Large Language Models (LLMs) for software engineering in recent years. However, most of this work focuses on widely used general-purpose programming languages like Python, owing to the abundant training data their widespread usage provides. The utility of LLMs for software within the industrial process automation domain, with highly specialized languages that are typically only used in proprietary contexts, remains underexplored. This research aims to utilize and integrate LLMs in the industrial development process, solving real-life programming tasks (e.g., generating a movement routine for a robotic arm) and accelerating the development cycles of manufacturing systems.

Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks

arXiv
2026

Kunihiro Miyazaki, Takanobu Kawahara, Stephen Roberts, Stefan Zohren

The advancement of large language models (LLMs) has accelerated the development of autonomous financial trading systems. While mainstream approaches deploy multi-agent systems mimicking analyst and manager roles, they often rely on abstract instructions that overlook the intricacies of real-world workflows, which can lead to degraded inference performance and less transparent decision-making. Therefore, we propose a multi-agent LLM trading framework that explicitly decomposes investment analysis into fine-grained tasks, rather than providing coarse-grained instructions. We evaluate the proposed framework using Japanese stock data, including prices, financial statements, news, and macro information, under a leakage-controlled backtesting setting. Experimental results show that fine-grained task decomposition significantly improves risk-adjusted returns compared to conventional coarse-grained designs. Crucially, further analysis of intermediate agent outputs suggests that alignment between analytical outputs and downstream decision preferences is a critical driver of system performance. Moreover, we conduct standard portfolio optimization, exploiting low correlation with the stock index and the variance of each system's output. This approach achieves superior performance. These findings contribute to the design of agent structure and task configuration when applying LLM agents to trading systems in practical settings.

LLM Novice Uplift on Dual-Use, In Silico Biology Tasks

arXiv
2026

Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel Li, Aiden Kim, Yury Orlovskiy, Coleman Breen, Bryce Cai, Jasper Götting, Andrew Bo Liu, Samira Nedungadi, Paula Rodriguez, Yannis Yiming He, Mohamed Shaaban, Zifan Wang, Seth Donoughe, Julian Michael

Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only resources. This uncertainty is central to understanding both scientific acceleration and dual-use risk. We conducted a multi-model, multi-benchmark human uplift study comparing novices with LLM access versus internet-only access across eight biosecurity-relevant task sets. Participants worked on complex problems with ample time (up to 13 hours for the most involved tasks). We found that LLM access provided substantial uplift: novices with LLMs were 4.16 times more accurate than controls (95% CI [2.63, 6.87]). On four benchmarks with available expert baselines (internet-only), novices with LLMs outperformed experts on three of them. Perhaps surprisingly, standalone LLMs often exceeded LLM-assisted novices, indicating that users were not eliciting the strongest available contributions from the LLMs. Most participants (89.6%) reported little difficulty obtaining dual-use-relevant information despite safeguards. Overall, LLMs substantially uplift novices on biological tasks previously reserved for trained practitioners, underscoring the need for sustained, interactive uplift evaluations alongside traditional benchmarks.