TY - JOUR

T1 - Mathematical formula representation via tree embeddings

AU - Wang, Zichao

AU - Lan, Andrew

AU - Baraniuk, Richard

N1 - Publisher Copyright:
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

PY - 2021

Y1 - 2021

N2 - We propose a new framework for learning formula representations using tree embeddings to facilitate search and similar content retrieval in textbooks containing mathematical (and possibly other types of) formulae. By representing each symbolic formula (such as a math equation) as an operator tree, we can explicitly capture its inherent structural and semantic properties. Our framework consists of a tree encoder that encodes the formula's operator tree into a vector and a tree decoder that generates a formula from a vector in operator tree format. To improve the quality of formula tree generation, we develop a novel tree beam search algorithm that is of independent scientific interest. We validate our framework on a formula reconstruction task and a similar formula retrieval task on a new real-world dataset of over 770k formulae collected online. Our experimental results show that our framework significantly outperforms various baselines.

AB - We propose a new framework for learning formula representations using tree embeddings to facilitate search and similar content retrieval in textbooks containing mathematical (and possibly other types of) formulae. By representing each symbolic formula (such as a math equation) as an operator tree, we can explicitly capture its inherent structural and semantic properties. Our framework consists of a tree encoder that encodes the formula's operator tree into a vector and a tree decoder that generates a formula from a vector in operator tree format. To improve the quality of formula tree generation, we develop a novel tree beam search algorithm that is of independent scientific interest. We validate our framework on a formula reconstruction task and a similar formula retrieval task on a new real-world dataset of over 770k formulae collected online. Our experimental results show that our framework significantly outperforms various baselines.

UR - http://www.scopus.com/inward/record.url?scp=85109647988&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85109647988&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85109647988

SN - 1613-0073

VL - 2895

SP - 121

EP - 133

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 3rd International Workshop on Intelligent Textbooks, iTextbooks 2021

Y2 - 15 June 2021

ER -