Science

  1. A dual perspective under which decoder-only Transformers and State Space Models emerge as two views of the same sequence model, with implications for efficient algorithm design (a minimal sketch of the duality appears after this list).

  2. An HMM-based learning framework that models sequential data while ensuring full interpretability and fast convergence (see the forward-algorithm sketch after this list).

  3. An exploration of the role of positional encodings in Transformer language models, including their construction and their impact on model performance (one standard construction is sketched after this list).

  4. The existence and optimality properties of critical points of linear neural networks with the mean-squared loss under regularization (a worked one-layer instance follows the list).
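
For item 1, a minimal sketch assuming softmax-free (linear) causal attention as the point of contact: the parallel attention product and a recurrent state-space update compute identical outputs. All names and shapes below are illustrative, not drawn from any particular paper's code.

```python
# Transformer/SSM duality for *linear* causal attention: the masked
# parallel product and an O(T) recurrence give the same result.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                      # sequence length, head dimension
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Transformer view: causal (lower-triangular) attention applied in parallel.
A = np.tril(Q @ K.T)             # mask enforces causality
out_parallel = A @ V             # O(T^2 d) compute, fully parallel

# SSM view: the same map as a recurrence over a d x d state matrix.
S = np.zeros((d, d))             # running state: sum_{s<=t} k_s v_s^T
out_recurrent = np.empty_like(V)
for t in range(T):
    S = S + np.outer(K[t], V[t]) # state update (decay fixed to 1 here)
    out_recurrent[t] = Q[t] @ S  # readout: sum_{s<=t} (q_t . k_s) v_s

assert np.allclose(out_parallel, out_recurrent)
```

The parallel form is the efficient choice for training; the recurrent form gives constant-memory generation, which is the algorithmic payoff the duality points at.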
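
For item 2, a minimal sketch of the classical forward algorithm, the exact-inference routine behind HMM interpretability; the parameter values are made up for illustration.

```python
# HMM forward algorithm: computes P(observations | model) exactly.
import numpy as np

pi = np.array([0.6, 0.4])                # initial state distribution
A  = np.array([[0.7, 0.3],               # transition matrix A[i, j] = P(j | i)
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],               # emission probs B[state, symbol]
               [0.2, 0.8]])
obs = [0, 1, 0, 0]                       # observed symbol sequence

alpha = pi * B[:, obs[0]]                # alpha_1(i) = pi_i * b_i(o_1)
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]        # alpha_t = (alpha_{t-1} A) .* b(o_t)
likelihood = alpha.sum()                 # P(o_1..o_T | model)
print(f"P(obs) = {likelihood:.6f}")      # no scaling; fine for short sequences
```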
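
For item 3, a sketch of one standard construction, the sinusoidal positional encoding of Vaswani et al. (2017); learned, rotary, and other schemes exist but are not shown here.

```python
# Sinusoidal positional encoding, added to token embeddings before layer 1.
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same)."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]        # (1, d_model / 2)
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even dimensions
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```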
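
For item 4, a worked one-layer instance, an illustrative special case rather than the deep-network analysis the item refers to: with an ℓ2 penalty, the regularized mean-squared loss has a unique critical point in closed form.

```latex
% Regularized least squares: the unique critical point of the one-layer case.
\[
  L(W) \;=\; \|WX - Y\|_F^2 \;+\; \lambda \|W\|_F^2, \qquad \lambda > 0,
\]
\[
  \nabla_W L \;=\; 2\,(WX - Y)X^\top + 2\lambda W \;=\; 0
  \;\;\Longrightarrow\;\;
  W^\star \;=\; Y X^\top \left( X X^\top + \lambda I \right)^{-1}.
\]
```

For λ > 0 the loss is strictly convex, so this critical point is the unique global minimum; it is the deep case, with a product of weight matrices, where nontrivial saddle structure appears.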