Mathematical formula representation via tree embeddings

Zichao Wang, Andrew Lan, Richard Baraniuk

Research output: Contribution to journal › Conference article › peer-review


We propose a new framework for learning formula representations using tree embeddings to facilitate search and similar-content retrieval in textbooks containing mathematical (and possibly other types of) formulae. By representing each symbolic formula (such as a math equation) as an operator tree, we can explicitly capture its inherent structural and semantic properties. Our framework consists of a tree encoder that encodes a formula's operator tree into a vector and a tree decoder that generates a formula, in operator tree format, from a vector. To improve the quality of formula tree generation, we develop a novel tree beam search algorithm that is of independent scientific interest. We validate our framework on a formula reconstruction task and a similar-formula retrieval task, using a new real-world dataset of over 770k formulae collected online. Our experimental results show that our framework significantly outperforms various baselines.
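To make the operator-tree idea concrete, the following is a minimal sketch of how a formula string could be turned into an operator tree. It is an illustration only, not the authors' pipeline: it assumes the formula is written in Python expression syntax and borrows Python's standard `ast` module as the parser. The names `OP_SYMBOLS`, `to_operator_tree`, `parse_formula`, and `preorder` are hypothetical helpers introduced here.

```python
import ast

# Hypothetical mapping from Python AST operator classes to formula symbols.
OP_SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "^"}


def to_operator_tree(node):
    """Convert a Python expression AST into a nested (label, children) tuple."""
    if isinstance(node, ast.Expression):
        return to_operator_tree(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in OP_SYMBOLS:
        # Binary operators become internal nodes with two ordered children.
        children = [to_operator_tree(node.left), to_operator_tree(node.right)]
        return (OP_SYMBOLS[type(node.op)], children)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return ("neg", [to_operator_tree(node.operand)])
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # Named functions such as sqrt(b) become nodes labelled by the function.
        return (node.func.id, [to_operator_tree(arg) for arg in node.args])
    if isinstance(node, ast.Name):
        return (node.id, [])  # variable leaf
    if isinstance(node, ast.Constant):
        return (str(node.value), [])  # numeric leaf
    raise ValueError(f"unsupported expression node: {type(node).__name__}")


def parse_formula(text):
    """Parse a formula written in Python expression syntax into an operator tree."""
    return to_operator_tree(ast.parse(text, mode="eval"))


def preorder(tree):
    """Flatten an operator tree into a list of labels in pre-order."""
    label, children = tree
    labels = [label]
    for child in children:
        labels.extend(preorder(child))
    return labels


if __name__ == "__main__":
    tree = parse_formula("a * x ** 2 + sqrt(b)")
    print(tree)
    print(preorder(tree))
```

In this sketch operators and functions are internal nodes and variables and constants are leaves, so the nesting of the formula is explicit rather than implied by token order. A tree encoder of the kind the abstract describes would consume such a structure; the encoder, the decoder, and the tree beam search themselves are not sketched here because the abstract does not specify them.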

Original language: English (US)
Pages (from-to): 121-133
Number of pages: 13
Journal: CEUR Workshop Proceedings
State: Published - 2021
Event: 3rd International Workshop on Intelligent Textbooks, iTextbooks 2021 - Virtual, Online
Duration: Jun 15 2021 → …

ASJC Scopus subject areas

  • Computer Science (all)
