Parsing Algebraic Word Problems into Equations

Rik Koncel-Kedziorski, Hannaneh Hajishirzi, Ashish Sabharwal†, Oren Etzioni†, and Siena Dumas Ang
University of Washington, †Allen Institute for AI
{kedzior,hannaneh,sienaang}@uw.edu, {ashishs,orene}@allenai.org

Abstract

This paper formalizes the problem of solving multi-sentence algebraic word problems as that of generating and scoring equation trees. We use integer linear programming to generate equation trees and score their likelihood by learning local and global discriminative models. These models are trained on a small set of word problems and their answers, without any manual annotation, in order to choose the equation that best matches the problem text. We refer to the overall system as ALGES. We compare ALGES with previous work and show that it covers the full gamut of arithmetic operations whereas Hosseini et al. (2014) only handle addition and subtraction. In addition, ALGES overcomes the brittleness of the Kushman et al. (2014) approach on single-equation problems, yielding a 15% to 50% reduction in error.

[Figure 1: Example problem and solution. The word problem "Oceanside Bike Rental Shop charges 17 dollars plus 7 dollars an hour for renting a bike. Tom paid 80 dollars to rent a bike. How many hours did he pay to have the bike checked out?" is shown alongside the equation tree for 17 + (7 ∗ x) = 80, whose solution is 9.]

1 Introduction

Grade-school algebra word problems are brief narratives (see Figure 1). A typical problem first describes a partial world state consisting of characters, entities, and quantities. Next it updates the condition of an entity or explicates the relationship between entities. Finally, it poses a question about a quantity in the narrative.

An ordinary child has to learn the required algebra, but will easily grasp the narrative utilizing extensive world knowledge, a large vocabulary, word-sense disambiguation, coreference resolution, mastery of syntax, and the ability to combine individual sentences into a coherent mental model. In contrast, the challenge for an NLP system is to "make sense" of the narrative, which may refer to arbitrary activities like renting bikes, collecting coins, or eating cookies.

Previous work coped with the open-domain aspect of algebraic word problems by relying on deterministic state transitions based on verb categorization (Hosseini et al., 2014) or by learning templates that cover equations of particular forms (Kushman et al., 2014). We have discovered, however, that both approaches are brittle, particularly as training data is scarce in this domain and the space of equations grows exponentially with the number of quantities mentioned in the math problem.

We introduce ALGES,¹ which maps an unseen multi-sentence algebraic word problem into a set of possible equation trees. Figure 1 shows an equation tree alongside the word problem it represents. ALGES generates the space of trees via Integer Linear Programming (ILP), which allows it to constrain the space of trees to represent type-consistent algebraic equations satisfying as many desirable properties as possible.

¹The code and data are publicly available at https://gitlab.cs.washington.edu/ALGES/TACL2015.
ALGES learns to map spans of text to arithmetic operators, to combine them given the global context of the problem, and to choose the "best" tree corresponding to the problem. The training set for ALGES consists of unannotated algebraic word problems and their solutions; solving the equation represented by such a tree is trivial. ALGES is described in detail in Section 4.

ALGES is able to solve word problems with single-variable equations like the one in Figure 1. In contrast to Hosseini et al. (2014), ALGES covers +, −, ∗, and /. The work of Kushman et al. (2014) has broader scope, but we show that it relies heavily on overlap between training and test data. When that overlap is reduced, ALGES is 15% to 50% more accurate than this system.

Our contributions are as follows: (1) We formalize the problem of solving multi-sentence algebraic word problems as that of generating and ranking equation trees; (2) We show how to score the likelihood of equation trees by learning discriminative models trained from a small number of word problems and their solutions, without any manual annotation; and (3) We demonstrate empirically that ALGES has broader scope than the system of Hosseini et al. (2014), and overcomes the brittleness of the method of Kushman et al. (2014).

2 Previous Work

Our work is related to situated semantic interpretation, which aims to map natural language sentences to formal meaning representations (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Ge and Mooney, 2006; Kwiatkowski et al., 2010). More closely related is work on language grounding, whose goal is the interpretation of a sentence in the context of a world representation (Branavan et al., 2009; Liang et al., 2009; Chen et al., 2010; Bordes et al., 2010; Feng and Lapata, 2010; Hajishirzi et al., 2011; Matuszek et al., 2012; Hajishirzi et al., 2012; Artzi and Zettlemoyer, 2013; Koncel-Kedziorski et al., 2014; Yatskar et al., 2014; Seo et al., 2014; Hixon et al., 2015). However, while most previous work considered individual sentences in isolation, solving word problems often requires reasoning across the multi-sentence discourse of the problem text. Recent efforts in the math domain have studied number word problems (Shi et al., 2015), logic puzzle problems (Mitra and Baral, 2015), arithmetic word problems (Hosseini et al., 2014; Roy et al., 2015a), algebra word problems (Kushman et al., 2014; Zhou et al., 2015), and geometry word problems (Seo et al., 2015).

Roy et al. (2015b) introduce a system for reasoning about quantities, which they extend to arithmetic word problems that can be solved by choosing only two values from the text and applying an arithmetic operator. By comparison, our method learns to solve complex problems with many operands, where the space of possible solutions is larger.

Hosseini et al. (2014) solve elementary addition and subtraction problems by learning verb categories. They ground the problem text to a semantics of entities and containers, and decide whether quantities are increasing or decreasing in a container based upon the learned verb categories. While relying only on verb categories works well for + and −, modeling ∗ and / requires going beyond verbs. For instance, "Tina has 2 cats. John has 3 more cats than Tina. How many cats do they have together?" and "Tina has 2 cats. John has 3 times as many cats as Tina. How many cats do they have together?" have identical verbs, but the indicated operations (+ and ∗, respectively) are different. ALGES makes use of a richer semantic representation which facilitates deeper learning and a wider scope of application, solving problems involving the +, −, ∗, and / operators (see Table 6).

Kushman et al. (2014) introduce a general method for solving algebra problems. This work can align a word problem to a system of equations with one or two unknowns. They learn a mapping from word problems to equation templates using global and local features from the problem text.
However, the large space of equation templates makes it challenging for this model to learn to find the best equation directly, as a sufficiently similar template may not have been observed during training. Instead, our method maps word problems to equation trees, taking advantage of a richer representation of quantified nouns and their properties, as well as the recursive nature of equation trees. These allow ALGES to use a bottom-up approach to learn the correspondence between spans of text and arithmetic operators (corresponding to intermediate nodes in the tree). ALGES then scores equations using the global structure of the problem to produce the final result.

Our work is also related to research in using ILP to enforce global constraints in NLP applications (Roth and Yih, 2004). Most previous work (Srikumar and Roth, 2011; Goldwasser and Roth, 2011; Berant et al., 2014; Liu et al., 2015) utilizes ILP as an inference procedure to find the best global prediction over initially trained local classifiers. Similarly, we use ILP to enforce global and domain-specific constraints. We, however, use ILP to form candidate equations, which are then used to generate training data for our classifiers. Our work is also related to parser re-ranking (Collins, 2005; Ge and Mooney, 2005), where a re-ranker model attempts to improve the output of an existing probabilistic parser. Similarly, the global equation model designed in ALGES attempts to re-rank equations based on global problem structure.

[Figure 2: An overview of the process of learning for a word problem and its Qsets. The example problem "On Monday, 375 students went on a trip to the zoo. All 7 buses were filled and 4 students had to travel in cars. How many students were in each bus?" is (1) grounded into base Qsets (Section 5): (Qnt: 375, Ent: Student), (Qnt: x, Ent: Student, Ctr: Bus), (Qnt: 7, Ent: Bus), (Qnt: 4, Ent: Student); (2) ILP generates M equation trees T(w) (Section 6), of which the subset T_ℓ(w) yields the correct solution, e.g., 375 − (7 ∗ x) = 4, 375 = (7 ∗ x) + 4, and 375 = (x ∗ 7) + 4, versus incorrect trees in T(w) \ T_ℓ(w) such as 375 + (7 ∗ x) = 4, 375 = (7 / x) + 4, and 375 − (x + 7) = 4; (3) operator nodes in T_ℓ(w) become labeled training examples for the local model (Section 7.1), e.g., (7b, xs) → ∗ and (combine(7b, xs), 4s) → +; (4) problem–tree pairs become positive (from T_ℓ(w)) and negative (from T(w) \ T_ℓ(w)) examples for the global model (Section 7.2).]

3 Setup and Problem Definition

Given numeric quantities V and an unknown x whose value is the answer we seek, an equation over V and x is any valid mathematical expression formed by combining elements of V ∪ {x} using binary operators from O = {+, −, ∗, /, =} such that x appears exactly once. When each element of V appears at most once in the equation, it may naturally be represented as an equation tree where each operator is a node with edges to its two operands.² T denotes the set of all equation trees over V and x.

²Problems involving simultaneous equations require combining multiple equation trees, one per equation.

Problem Formulation. We address the problem of solving grade-school algebra word problems that map to single equations. Solving such a word problem w amounts to selecting an equation tree t representing the mathematical computation implicit in w. Figure 1 shows an example of w with quantities underlined, and the corresponding tree t. Formally, we use a joint probability distribution p(t, w) that defines how "well" an equation tree t ∈ T captures the mathematical computation expressed in w. Given a word problem w as input, our goal is to compute t̃ = argmax_{t∈T} p(t|w).

An exhaustive enumeration over T quickly becomes impractical as problem complexity increases and n = |V ∪ {x}| grows. Specifically, |T| > h(n) = n!(n−1)!(n−1)·2^(n−4), with h(4) = 432, h(6) > 1.7M, h(8) > 22B, etc. This vast search space makes it challenging for a discriminative model to learn to find t̃ directly, as a sufficiently similar tree may not have been observed during training. Instead, our method first generates syntactically valid equation trees, and then uses a bottom-up approach to score equations with a local model trained to map spans of text to math operators, and a global model trained for coherence of the entire equation with respect to the global problem text structure.
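To make the growth rate concrete, the lower bound h(n) above can be evaluated directly. The following minimal Python sketch (not part of the original paper) reproduces the reported values h(4) = 432, h(6) > 1.7M, and h(8) > 22B:

```python
from math import factorial

def h(n: int) -> int:
    """Lower bound on the number of equation trees over
    n = |V ∪ {x}| quantities: h(n) = n!(n-1)!(n-1)·2^(n-4)."""
    return factorial(n) * factorial(n - 1) * (n - 1) * 2 ** (n - 4)

for n in (4, 6, 8):
    print(n, h(n))
# 4 432
# 6 1728000        (> 1.7M)
# 8 22759833600    (> 22B)
```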
4 Overview of the Approach

Figure 2 gives an overview of our method, which is also detailed in Figure 3. In order to build equation trees, we use a compact representation for each node, called a Quantified Set or Qset, to model natural language text quantities and their properties (e.g., '375 students' in '7 buses'). Qsets are used for tracking and combining quantities when learning the correspondence between equation trees and text.

Definition 1. Given a math word problem w, let S be the set of all possible spans of text in w, φ denote the empty span, and S^φ = S ∪ {φ}. A Qset for w is either a base Qset or a compound Qset. A base Qset is a tuple (ent, qnt, adj, loc, vrb, syn, ctr) with:
• ent ∈ S: entity or quantity noun (e.g., 'student');
• qnt ∈ R ∪ {x}: number or quantity (e.g., 4 or x);
• adj ⊆ S^φ: adjectives for ent in w;
• loc ∈ S^φ: location of ent (e.g., 'in the drawer');
• vrb ∈ S^φ: governing verb for ent (e.g., 'fill');
• syn: syntactic and positional information for ent (e.g., 'buses' is in subject position);
• ctr ⊆ S^φ: containers of ent (e.g., 'Bus' is a container for the 'students' Qset).

A property being φ indicates that this optional property is unspecified. A compound Qset is formed by combining two Qsets with a non-equality binary operator, as discussed in Section 5.

Qsets can be further combined with the equality operator to yield a semantically augmented equation tree.³ The example in Figure 2 has four base Qsets extracted from the problem text. Each possible equation tree corresponds to a different recursive combination of these four Qsets.

³Inspired by Semantically Augmented Parse Trees (Ge and Mooney, 2005), adapted to equational logic.

Given w, ALGES first extracts a list of n base Qsets S = {s₁, ..., sₙ} (Section 5). It then uses an ILP-based optimization method to combine the extracted Qsets into a list of type-consistent candidate equation trees (Section 6). Finally, ALGES uses discriminative models to score each candidate equation, using both local and global features (Section 7).
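The Qset tuple of Definition 1 translates naturally into a record type. Below is a minimal Python sketch of a base Qset, assuming plain string spans for the text-valued fields; the class and its defaults are illustrative, not the paper's released implementation:

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Union

@dataclass
class Qset:
    """Base Qset (Definition 1): a quantified noun and its properties.
    Optional properties default to None/empty, mirroring the empty span φ."""
    ent: str                                      # entity noun, e.g., 'student'
    qnt: Union[float, str]                        # number, or 'x' for the unknown
    adj: Set[str] = field(default_factory=set)    # adjectives modifying ent
    loc: Optional[str] = None                     # location, e.g., 'in the drawer'
    vrb: Optional[str] = None                     # governing verb, e.g., 'fill'
    syn: Optional[str] = None                     # syntactic/positional info
    ctr: Set[str] = field(default_factory=set)    # containers of ent

# The four base Qsets grounded from the Figure 2 problem:
students_total = Qset(ent='student', qnt=375)
students_per_bus = Qset(ent='student', qnt='x', ctr={'bus'})
buses = Qset(ent='bus', qnt=7)
students_in_cars = Qset(ent='student', qnt=4)
```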
Specifically, the recursive nature of our representation allows us to decompose the likelihood function p(t, w) into local scoring functions for each internal node of t, followed by scoring the root node:

    p(t|w) ∝ ∏_{t_j ∈ t} L_local(t_j|w) × G_global(t|w)    (1)

where the local function L_local(t_j|w) scores the likelihood of the subtree t_j, modeling pairwise Qset relationships, while the global function G_global(t|w) scores the likelihood of the root of t, modeling the equation in its entirety.

Learning. ALGES learns in a weakly supervised fashion, using word problems w_i and only their correct answers ℓ_i (not the corresponding equation trees) as training data {(w_i, ℓ_i)}_{i∈{1,...,N}}. We ground each w_i into ordered Qsets and generate a list of type-consistent candidate training equations T_{ℓ_i} that yield the correct answer ℓ_i.

We build a local discriminative model L_local to score the likelihood that a math operator op ∈ O can correctly combine two Qsets s₁ and s₂, based on their semantics and intertextual relationships. For example, in Figure 2 this model learns that ∗ has a high likelihood score for '7 buses' and 'x students'.

[Figure 3: Overview of our method for solving algebraic word problems.]

Learning (word problems W, corresponding solutions L):
1. For every problem–solution pair (w_i, ℓ_i) with w_i ∈ W, ℓ_i ∈ L:
   (a) S ← Base Qsets obtained by Grounding text w_i and Reordering the resulting Qsets (Section 5)
   (b) T_i ← Top M type-consistent equation tree candidates generated by ILP(w_i) (Section 6)
   (c) T_{ℓ_i} ← Subset of T_i that yields the correct numerical solution ℓ_i
   (d) Add to Tr_local features ⟨s₁, s₂⟩ with label op for each operator op combining Qsets s₁, s₂ in trees in T_{ℓ_i}
   (e) Add to Tr_global features ⟨w_i, t⟩ labeled positive for each t ∈ T_{ℓ_i} and labeled negative for each t ∈ T_i \ T_{ℓ_i}
2. L_local ← Train a local Qset relationship model on Tr_local (Section 7.1)
3. G_global ← Train a global equation model on Tr_global (Section 7.2)
4. Output local and global models (L_local, G_global)

Inference (word problem w, local set relation model L_local, global equation model G_global):
1. S ← Base Qsets obtained by Grounding text w and Reordering the resulting Qsets (Section 5)
2. T ← Top M type-consistent equation tree candidates generated by ILP(w) (Section 6)
3. t* ← argmax_{t_i ∈ T} ∏_{t_j ∈ t_i} L_local(t_j|w) × G_global(t_i|w), scoring each tree t_i ∈ T based on Equation 1
4. ℓ ← Numeric solution to w obtained by solving equation tree t* for the unknown
5. Output (t*, ℓ)
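The inference procedure of Figure 3 can be summarized in code. Below is a minimal Python sketch of the control flow; the five helper functions and the tree's internal_nodes() method are hypothetical stand-ins for the components of Sections 5–7, not the paper's released implementation:

```python
from math import prod  # Python 3.8+

def infer(w, ground_and_reorder, ilp_candidates, local_score, global_score,
          solve_for_x, M=100):
    """Inference (Figure 3): select the best equation tree t* for word
    problem w via Equation 1, then solve t* for the unknown x."""
    qsets = ground_and_reorder(w)           # Section 5: base Qsets, reordered
    candidates = ilp_candidates(qsets, M)   # Section 6: top-M type-consistent trees

    def score(t):
        # Equation 1: product of local operator-node scores, times the
        # global coherence score of the entire equation.
        local = prod(local_score(t_j, w) for t_j in t.internal_nodes())
        return local * global_score(t, w)

    t_star = max(candidates, key=score)     # argmax over candidate trees
    return t_star, solve_for_x(t_star)      # (t*, numeric solution ℓ)
```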