Science
-
A dual perspective on how decoder-only Transformers and State Space Models can be viewed as two sides of the same coin in modeling sequential data, with implications for efficient algorithm design.
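To make the duality concrete, here is a minimal NumPy sketch (an illustrative reduction, not any specific paper's formulation): causal linear attention computed in parallel yields the same outputs as a state-space-style recurrence, which is where the efficiency implication comes from.

```python
# Minimal sketch: causal linear attention (parallel view) equals an
# SSM-style recurrence over an accumulated state (recurrent view).
import numpy as np

T, d = 6, 4                        # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel "attention" view: lower-triangular (causal) score matrix,
# no softmax, i.e. linear attention. Cost: O(T^2).
scores = np.tril(Q @ K.T)          # (T, T)
y_attn = scores @ V                # (T, d)

# Recurrent "state space" view: rank-1 state updates S_t = S_{t-1} + k_t v_t^T.
# Cost: O(1) state per step.
S = np.zeros((d, d))
y_ssm = np.empty((T, d))
for t in range(T):
    S += np.outer(K[t], V[t])      # state update
    y_ssm[t] = Q[t] @ S            # readout

assert np.allclose(y_attn, y_ssm)  # same outputs from both views
```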
-
An HMM-based learning framework that models sequential data while ensuring full interpretability and fast convergence.
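As a point of reference, the sketch below (illustrative NumPy with made-up parameters, not the framework itself) shows the classical HMM forward recursion such a framework builds on; every intermediate quantity has a direct probabilistic reading, which is the source of the interpretability claim.

```python
# Minimal sketch of the HMM forward algorithm on a toy 2-state model.
import numpy as np

A = np.array([[0.9, 0.1],          # state-transition probabilities
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],          # emission probabilities per state
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])          # initial state distribution
obs = [0, 1, 1, 0]                 # observed symbol indices

# Forward messages: alpha_t(i) = P(o_1..o_t, s_t = i).
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
print("sequence likelihood:", alpha.sum())
```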
-
An exploration of the role of positional encodings in Transformer language models, including their construction and impact on model performance.
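As one concrete construction, the sketch below implements the sinusoidal encodings of "Attention Is All You Need"; this is just one of the schemes such an exploration would cover (learned and rotary encodings being others).

```python
# Minimal sketch of sinusoidal positional encodings (even d_model assumed).
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1) positions
    i = np.arange(0, d_model, 2)[None, :]         # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dims: sine
    pe[:, 1::2] = np.cos(angles)                  # odd dims: cosine
    return pe                                     # added to token embeddings

print(sinusoidal_pe(4, 8).shape)                  # (4, 8)
```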
-
The existence and optimality properties of the critical points of linear neural networks with a mean-squared loss function under regularization.
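For illustration, under assumed notation (a two-layer linear network $Y \approx W_2 W_1 X$ with Frobenius-norm regularization, not necessarily the exact setting studied), the object in question is:

```latex
% Illustrative setup (notation assumed): regularized mean-squared loss
% of a two-layer linear network and its critical-point equations.
\[
  \mathcal{L}(W_1, W_2)
    = \tfrac{1}{2}\,\lVert W_2 W_1 X - Y \rVert_F^2
    + \tfrac{\lambda}{2}\!\left( \lVert W_1 \rVert_F^2 + \lVert W_2 \rVert_F^2 \right),
\]
whose critical points satisfy $\nabla_{W_1}\mathcal{L} = \nabla_{W_2}\mathcal{L} = 0$, i.e.
\[
  W_2^\top (W_2 W_1 X - Y)\, X^\top + \lambda W_1 = 0,
  \qquad
  (W_2 W_1 X - Y)\, X^\top W_1^\top + \lambda W_2 = 0 .
\]
```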