Science
-
A dual perspective on how decoder-only Transformers and State Space Models can be viewed as two sides of the same coin in modeling sequential data, with implications for efficient algorithm design.
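To make the duality concrete, here is a minimal NumPy sketch (an illustrative reduction, not any specific paper's formulation): causal linear attention computed in parallel yields the same outputs as a state-space-style recurrence, which is where the efficiency implication comes from.

```python
# Minimal sketch: causal linear attention (parallel view) equals an
# SSM-style recurrence over an accumulated state (recurrent view).
import numpy as np

T, d = 6, 4                        # sequence length, head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel "attention" view: lower-triangular (causal) score matrix,
# no softmax, i.e. linear attention. Cost: O(T^2).
scores = np.tril(Q @ K.T)          # (T, T)
y_attn = scores @ V                # (T, d)

# Recurrent "state space" view: rank-1 state updates S_t = S_{t-1} + k_t v_t^T.
# Cost: O(1) state per step.
S = np.zeros((d, d))
y_ssm = np.empty((T, d))
for t in range(T):
    S += np.outer(K[t], V[t])      # state update
    y_ssm[t] = Q[t] @ S            # readout

assert np.allclose(y_attn, y_ssm)  # same outputs from both views
```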
-
An HMM-based learning framework that models sequential data while ensuring full interpretability and fast convergence.
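As a point of reference, the sketch below (illustrative NumPy with made-up parameters, not the framework itself) shows the classical HMM forward recursion such a framework builds on; every intermediate quantity has a direct probabilistic reading, which is the source of the interpretability claim.

```python
# Minimal sketch of the HMM forward algorithm on a toy 2-state model.
import numpy as np

A = np.array([[0.9, 0.1],          # state-transition probabilities
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],          # emission probabilities per state
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])          # initial state distribution
obs = [0, 1, 1, 0]                 # observed symbol indices

# Forward messages: alpha_t(i) = P(o_1..o_t, s_t = i).
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
print("sequence likelihood:", alpha.sum())
```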
-
An exploration of the role of positional encodings in Transformer language models, including their construction and impact on model performance.
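As one concrete construction, the sketch below implements the sinusoidal encodings of "Attention Is All You Need"; this is just one of the schemes such an exploration would cover (learned and rotary encodings being others).

```python
# Minimal sketch of sinusoidal positional encodings (even d_model assumed).
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1) positions
    i = np.arange(0, d_model, 2)[None, :]         # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dims: sine
    pe[:, 1::2] = np.cos(angles)                  # odd dims: cosine
    return pe                                     # added to token embeddings

print(sinusoidal_pe(4, 8).shape)                  # (4, 8)
```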
-
The existence and optimality properties of the critical points of linear neural networks with a mean-squared loss function under regularization.
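For illustration, under assumed notation (a two-layer linear network $Y \approx W_2 W_1 X$ with Frobenius-norm regularization, not necessarily the exact setting studied), the object in question is:

```latex
% Illustrative setup (notation assumed): regularized mean-squared loss
% of a two-layer linear network and its critical-point equations.
\[
  \mathcal{L}(W_1, W_2)
    = \tfrac{1}{2}\,\lVert W_2 W_1 X - Y \rVert_F^2
    + \tfrac{\lambda}{2}\!\left( \lVert W_1 \rVert_F^2 + \lVert W_2 \rVert_F^2 \right),
\]
whose critical points satisfy $\nabla_{W_1}\mathcal{L} = \nabla_{W_2}\mathcal{L} = 0$, i.e.
\[
  W_2^\top (W_2 W_1 X - Y)\, X^\top + \lambda W_1 = 0,
  \qquad
  (W_2 W_1 X - Y)\, X^\top W_1^\top + \lambda W_2 = 0 .
\]
```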