RonkLogan

Calculate an Expression

Scene from Goodwill Hunting (1997)
Scene from Goodwill Hunting (1997).

If you’re like me, you find interview whiteboard questions a lot of fun! What? You… you don’t agree? Okay, well… I’m one of those weirdos who enjoys a good interview whiteboard question – even if I don’t get it right or [heaven forbid] not finish it in time. If it’s a particularly fun question, I’ll even work on it more for funsies after the interview is over. Parsing a string representing a mathematical expression and returning the results is one of those problems! For me, anyway.

If you’re lucky, your interviewer will trim down the scope considerably, at least at first. For instance, maybe they say you won’t have to deal with whitespace, the literal numbers will only be positive integers, or more importantly, maybe you only have to deal with a limited set of operators, like addition and multiplication. But be careful: that specific choice of operators was likely carefully chosen to surface some of the complications we’ll run into later.

I’ll go over a more fun, more complete solution in this blog post – certain parts may not be needed for an actual whiteboard interview question. But the main gist of the solution is that your code will need two parts: one to “scan” tokens out of the source text, and the other to “parse” the tokens into the expression results. For the parsing section, you will need two stacks: one for the values, and one for the operators.

The complication here is that mathematical expressions have what’s called “operator precedence.” These rules are what make all those dumb Facebook posts of “What’s the correct value?” get so many folks arguing over the right answer. You scan through the expression, and you do all the multiplication and division before you do the addition and subtraction. And before the multiply/divide steps, you compute the exponents. And before the exponents you do any unary negation operators. The expression “4+2*10” should return 24, not 60.

And to complicate things even further, most operators (like the binary operators of addition, subtraction, multiplication, and division) evaluate from left to right. But some operators, like the unary negate operator, evaluate right to left – the direction is called “operator associativity.” If the problem is limited to just a couple operators (like, say, addition and multiplication), then you don’t have to worry about it, because they’re both left to right. But as soon as you want to start making negative numbers available (and therefore the negation unary operator), associativity also needs to come into play.

I’m going to use JavaScript for my implementation, but if you are going to implement this in a type-safe language, such as C, then you should probably also worry about overflow errors, especially if the scope has been limited to just integers: maybe the input literal will be too big to fit into the integer type you chose, or maybe the results of a multiplication or addition operator overflows the type. Either way, you should have some sort of overflow check in your code – or at least mention it to see if the interviewer dismisses those concerns for the sake of brevity.

Okay then, let’s get to it!

Setup

So first off, I don’t want to pollute the global namespace, so I’m going to define a single global function named “evaluateExpression,” which takes the string and returns the numeric results of the expression evaluation (or throws an error). All of my helper functions, “enums,” data structures, etc, will be nested within it. That’s just my style choice, and your mileage my differ.

The first part sets up some state variables and the all-important pair of stacks for the values and the operators:

function evaluateExpression(text) {
  let index = 0;
  let unaryMode = true;

  const values = [];
  const operators = [];

That unaryMode variable is only needed if I intend to parse the negate operator (and its near-useless cousin, the unary-plus). The purpose here is that there are some places where a minus sign would be parsed as a unary operator, and some places where it would be a binary operator. This variable will be set to true when it should be interpreted as unary.

The next section sets up some reference data:

  const TokenType = {
    endOfString: 'EOS',
    error: 'ERR',
    operator: 'BINOP',
    constant: 'CONST',
    openParen: 'OPENP',
    closeParen: 'CLOSEP',
  };

  const Associativity = {
    leftToRight: (l, r) => l.prec >= r.prec,
    rightToLeft: (l, r) => l.prec > r.prec,
  }

Because JavaScript doesn’t have native enums, I’m just going to fake a TokenType enum by using an object literal. I’ll be able to use syntax like TokenType.constant in my code with this. The associativity literal is really only needed if the set of operators you are implementing has a mix of left-to-right and right-to-left, but it defines the two directions as functions that I can leverage later just like my fake enum. These functions will be passed two tokens (left and right) and will return a Boolean value indicating whether the left operator should be evaluated before the right operator. In both cases, if the left operator’s precedence is higher, it should be evaluated first. But in left-to-right, the left operator should also be evaluated first if the two precedence values are equal. In right-to-left, equal precedence should go to the right-side operator.

Next, I just define a quick helper function for popping a value off the values stack. In JavaScript, if you call pop on an empty array, it won’t error – it simply returns undefined. However, I want to throw an error if I try to pop from the values stack when it’s empty, so I just have this quick little helper function for doing just that:

  function popValue() {
    if (!values.length) { throw 'Invalid Expression'; }
    return values.pop();
  }

Next comes a complicated piece of reference data that defines all our supported operators. I don’t want to implement all the operator logic in my code as giant case-statements sprinkled throughout. Instead, I want to be able to look up my operator data using just the operator character itself. The one complication here is that there are two operators (+ and -) that can be either binary (addition/subtraction) or unary (same-sign/negate). Therefore, the key value to this object collection will be the operator character if its behavior is always the same, or the operator character prefixed with a “b” for its binary behavior or a “u” for its unary behavior.

The values stored in the map for each operator is itself an object literal with three properties:

  • prec is the precedence of the operator. I’ve picked an arbitrary spread of numeric values, but the main point here is that lower-precedence operators have lower values, and higher-precedence operators have higher values. You may choose whatever actual numeric values you wish, as long as they are in the proper higher/lower relation relative to each other.
  • assoc is the associativity of the operator. We will use the Associativity “enumeration” we defined above to declare it as either Associativity.leftToRight or Associativity.rightToLeft. This property is a function that will be called later.
  • eval is the evaluation method. It will be called whenever we want to evaluate the operator. Inside, it will pop off the required operand(s) from the values stack (using that popValue helper function), perform the desired action on it/them, then push the result back onto the values stack. For example, the eval method for the addition operator will pop off the right-hand operand, then the left-hand operand, add them together, and push the results. If you are using a type-safe language, or if the scope has been set to just integer values, these are some great methods to which to add overflow error handling.

I have chosen to implement the binary operators + (addition), - (subtraction), * (multiplication), / (division), % (remainder), and ^ (exponent). In addition, I’ve thrown in the unary +/- operators so I could have negative numbers and expressions:

  const opDefs = {
    'b+': {
      prec: 10,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l + r); },
    },
    'b-': {
      prec: 10,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l - r); },
    },
    '*': {
      prec: 20,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l * r); },
    },
    '/': {
      prec: 20,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l / r); },
    },
    '%': {
      prec: 20,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l % r); },
    },
    '^': {
      prec: 30,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(Math.pow(l, r)); },
    },
    'u+': {
      prec: 40,
      assoc: Associativity.rightToLeft,
      eval: () => { },
    },
    'u-': {
      prec: 40,
      assoc: Associativity.rightToLeft,
      eval: () => { values.push(-popValue()); },
    },
  };

You might notice the eval method for the unary-plus operator (“u+” key) doesn’t do anything. If we wanted to really implement it, the operator would pop the value off the values stack, then just push it back. Sort of a no-op operator, really. I’m not intending it to be an absolute-value operator; more like the JavaScript unary-plus operator.

Scanning

Okay, now we’re ready to look at how we pull the next token off the string. We define a function, nextToken, which starts at the current position in the string and returns a token for whatever it represents (and advancing the current position).

The first step – skipping whitespace – isn’t necessary if the scope assumes there are no whitespace characters. We’ll look at that helper function later:

  function nextToken() {
    skipWhitespace();

Then we check to see if we are at or beyond the length of the string. If we are, we return an end-of-string token (object literal with a type property of TokenType.endOfString). All tokens parsed from the string are objects with at least the type property set to a value in the TokenType enumeration:

    if (index >= text.length) {
      return { type: TokenType.endOfString };
    }

If we’re not at the end of the string, we pull the next character using the index value set up at the top of the evaluateExpression function:

    const ch = text[index];

Finally, we have a big case-statement. The first set of cases are for the operator characters. I suppose I could’ve had a loop before the case that looked at all the keys in the opDefs object to match the implemented operator characters, but for brevity and clarity, I’ve just duplicated them all within this switch:

    switch (ch) {
      case '+':
      case '-':
      case '*':
      case "/":
      case '%':
      case '^':
        ++index;
        const defOp = opDefs[ch] ?? opDefs[unaryMode ? `u${ch}` : `b${ch}`];
        if (!defOp) { throw 'Unexpected operator'; }
        unaryMode = true;
        return { type: TokenType.operator, operator: ch, ...defOp };

Because we’ve found a valid character, we first increment index past this character – don’t forget to do this or you’ll get into an infinite loop! Then we attempt to get the operator from the opDefs reference; if we can’t get it with just the operator character, we try again prefixing the operator with either a “b” or a “u” given the current unaryMode. We’ll throw an error if for some reason the operator definition can’t be found. After it gets the operator definition, we set the unaryMode to true for next time, because unary operators can appear after any operator, but a binary operator can’t. Finally, it returns the token for this operator: an object literal with the type property set to TokenType.operator, the operator character we found, and then we use the object spread operator (...) to merge the three properties from the operator definition object (prec, assoc, and eval) into the token object.

The next group of cases are for numeric literals. Once we find any of the ten digits or the decimal point, we set the unaryMode to false for next time (can’t have a unary operator following a numeric literal) and then we return the results of a function that scans out the full number literal token with its numeric value (we’ll see that function later). Don’t bother checking for the decimal point if the scope is such that you’re only dealing with integer values.

      case '.':
      case '0':
      case '1':
      case '2':
      case '3':
      case '4':
      case '5':
      case '6':
      case '7':
      case '8':
      case '9':
        unaryMode = false;
        return scanNumber();

The next two case statements deal with parentheses. The open parenthesis will advance the index [just say NO to infinite loops!], set the unaryMode flag to true for next time, then return a token for the open parenthesis. We will be stuffing this token onto the operators stack, so in addition to the type property, it will also include a precedence of 0 (or whatever constitutes the lowest value below all the other operators), and an evaluation function that throws an error. If we try to evaluate an open parenthesis token, that means we don’t have a corresponding close parenthesis!

      case '(':
        ++index;
        unaryMode = true;
        return {
          type: TokenType.openParen,
          prec: 0,
          eval: () => { throw 'Mismatched open parenthesis'; },
        };

The close parenthesis case advances past the scanned parenthesis, sets the unaryMode to false, and returns a token for the close. No other properties will be required for this one.

      case ')':
        ++index;
        unaryMode = false;
        return { type: TokenType.closeParen };

The last case statement is just the catch-all error condition: any other character than what we already are looking for, and we will return an error token with some clarifying information:

      default:
        return { type: TokenType.error, position: index + 1, message: 'Unexpected character' };
    }
  }

Now let’s look at these helper functions we’ve been calling. First is the function to skip whitespace, which itself uses a helper function to determine if a character is a whitespace. They’re pretty straight-forward:

  function skipWhitespace() {
    while (index < text.length && isWhitespace(text[index])) {
      ++index;
    }
  }

  function isWhitespace(ch) {
    return ch === ' ' || ch === '\t' || ch === '\r' || ch === '\n';
  }

Next we have our scanNumber function.

  function scanNumber() {
    let value = 0;
    let ch = text[index];
    while (index < text.length && '0' <= ch && ch <= '9') {
      value = value * 10 + (ch - '0');
      ch = text[++index];
    }

    if (ch === '.') {
      ch = text[++index];
      let decimal = 10;
      while (index < text.length && '0' <= ch && ch <= '9') {
        value += (ch - '0') / decimal;
        decimal *= 10;
        ch = text[++index];
      }
    }

    return { type: TokenType.constant, value }
  }

The first loop builds up the integer portion of the numeric literal. As long as we have a numeric digit, we are going to multiply the current value (which is initialized to zero) by 10, then add the new value (which is the character codepoint value less the character codepoint value of ‘0’). Be sure to put those parentheses around the (ch – ‘0’) part or you will get the wrong answer! Without the parentheses, JavaScript would interpret the plus-operator as a string concatenation because the type of ch will be string!

The if-statement is only necessary if you want to support floating-point numbers. If the next character is a period, we skip past it, set a decimal denominator to 10, and then start looping as long as we have numeric characters. For each number character we find, divide it by the decimal numerator and add it to the value; multiply that denominator by ten for each loop, and advance past the character we just processed.

The return value is a token object literal of type TokenType.constant, and a value property for the value we just scanned. If you are using a type-safe language, or if you are scoped to just integer literals, this function is a great place to have some overflow error handling when trying to convert the string literal into an actual numeric literal.

Parsing

The next helper function will be used by the parsing loop:

  function flushOperators(toParen) {
    while (operators.length) {
      const op = operators.pop();
      if (toParen && op.type === TokenType.openParen) { return; }
      op?.eval();
    }

    if (toParen) {
      throw 'Mismatched close parenthesis';
    }
  } 

This function will flush out the operators stack, evaluating each one. There are two situations in which this function will be called (as indicated by the argument):

  1. At the end of the parse loop, after we scanned the last token from the string and we want to calculate our final answer, and
  2. Whenever we scan a closing parenthesis character.

In the former case, we want to just pop off the top operator, call its eval method, and keep going until the stack is empty. This is where the open-parenthesis eval method comes into play: any mismatched open parentheses will be left on the stack and throw an error when we try to evaluate it.

In the latter case, we only want to evaluate operators until we find an open parenthesis on the stack (and pop it off without evaluating). A modicum of error handling in this case throws an error if we get to the end of the stack without finding any open parenthesis token; that would indicate we have a mismatched close parenthesis character in the expression.

Okay, now we get to the actual parsing loop of the function. We’re just going to loop forever (well, until we explicitly stop the loop or throw an error), grabbing the next token from the string at the start of each iteration:

  while (true) {
    const token = nextToken();

Then we have a series of if-statements that deal with the different token types we might get back. First, for an end-of-string token, we’re just going to break out of the loop:

    if (token.type === TokenType.endOfString) { break; }

If it’s an error token, we’re going to throw an error:

    if (token.type === TokenType.error) { 
      throw `${token.message ?? ''} at position ${token.position}`; 
    }

Those two if-statements terminate the loop if true, so the next batch are all one big if-else-statement. If the token is a constant, we’re just going to push its value onto the values stack:

    if (token.type === TokenType.constant) {
      values.push(token.value);
    }

If it’s an open-parenthesis, we’re going to push it onto the operators stack:

    else if (token.type === TokenType.openParen) {
      operators.push(token);
    }

If it’s a close-parenthesis, we’re going to flush the operator stack until we get to that corresponding open parenthesis (passing true for the toParen parameter):

    else if (token.type === TokenType.closeParen) {
      flushOperators(true);
    }

Now, the only token type we should have left is TokenType.operator. We have two branch possibilities here that deal with the relative precedence and associativity of operators.

    else if (token.type === TokenType.operator) {
      if (!operators.length || !token.assoc(operators[operators.length - 1], token)) {
        operators.push(token);
      }
      else {
        while (operators.length && token.assoc(operators[operators.length - 1], token)) {
          const op = operators.pop();
          op?.eval();
        }
        operators.push(token);
      }
    }

If the operators stack is empty, or if the current token’s associativity function returns false when passed the top of the stack for the left operator and the current token as the right operator, we will just push the current token onto the operators stack.

But if that associativity function returned true, then we need to pop operators off the operators stack and evaluate them until either the stack is empty or the current token’s associativity function returns false given the new top of the stack. After that loop is finished, push the current token onto the operators stack.

And of course, if I really messed up and there are more token types than I thought, there’s a catch-all error throw:

    else {
      throw `Unexpected token: ${JSON.stringify(token)}`;
    }
  }

After we’ve looped through all the tokens and successfully reached the end of the string, we need to flush out the operators stack in order to calculate and return our result.

  flushOperators(false);
  if (operators.length || values.length != 1) {
    throw 'Malformed expression';
  }

  return values.pop();
}

If the expression was properly formed and all goes well with the flush, the operators stack should now be empty and there should be a single value left in the values stack. If that’s not the case, throw an error. Otherwise, we just pop that final value off the values stack and return it. Pau!

Try It Out

Try it out here by entering a mathematical expression in the box and clicking the “Calculate” button. The expression can contain:

  • Numeric decimal literals
  • Negation and “same sign” unary operators (- or +)
  • Addition, subtraction, multiplication, division, remainder, and exponent binary operators (+, -, *, /, % or ^)
  • Matched grouping parentheses, ( and ).

Final Code

For reference, the full final code is listed below:

function evaluateExpression(text) {
  let index = 0;
  let unaryMode = true;

  const values = [];
  const operators = [];

  const TokenType = {
    endOfString: 'EOS',
    error: 'ERR',
    operator: 'BINOP',
    constant: 'CONST',
    openParen: 'OPENP',
    closeParen: 'CLOSEP',
  };

  const Associativity = {
    leftToRight: (l, r) => l.prec >= r.prec,
    rightToLeft: (l, r) => l.prec > r.prec,
  }

  function popValue() {
    if (!values.length) { throw 'Invalid Expression'; }
    return values.pop();
  }

  const opDefs = {
    'b+': {
      prec: 10,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l + r); },
    },
    'b-': {
      prec: 10,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l - r); },
    },
    '*': {
      prec: 20,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l * r); },
    },
    '/': {
      prec: 20,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l / r); },
    },
    '%': {
      prec: 20,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(l % r); },
    },
    '^': {
      prec: 30,
      assoc: Associativity.leftToRight,
      eval: () => { const r = popValue(), l = popValue(); values.push(Math.pow(l, r)); },
    },
    'u+': {
      prec: 40,
      assoc: Associativity.rightToLeft,
      eval: () => { },
    },
    'u-': {
      prec: 40,
      assoc: Associativity.rightToLeft,
      eval: () => { values.push(-popValue()); },
    },
  };

  function nextToken() {
    skipWhitespace();
    if (index >= text.length) {
      return { type: TokenType.endOfString };
    }

    const ch = text[index];
    switch (ch) {
      case '+':
      case '-':
      case '*':
      case "/":
      case '%':
      case '^':
        ++index;
        const defOp = opDefs[ch] ?? opDefs[unaryMode ? `u${ch}` : `b${ch}`];
        if (!defOp) { throw 'Unexpected operator'; }
        unaryMode = true;
        return { type: TokenType.operator, operator: ch, ...defOp };

      case '.':
      case '0':
      case '1':
      case '2':
      case '3':
      case '4':
      case '5':
      case '6':
      case '7':
      case '8':
      case '9':
        unaryMode = false;
        return scanNumber();

      case '(':
        ++index;
        unaryMode = true;
        return {
          type: TokenType.openParen,
          prec: 0,
          eval: () => { throw 'Mismatched open parenthesis'; },
        };
      case ')':
        ++index;
        unaryMode = false;
        return { type: TokenType.closeParen };

      default:
        return { type: TokenType.error, position: index + 1, message: 'Unexpected character' };
    }
  }

  function skipWhitespace() {
    while (index < text.length && isWhitespace(text[index])) {
      ++index;
    }
  }

  function isWhitespace(ch) {
    return ch === ' ' || ch === '\t' || ch === '\r' || ch === '\n';
  }


  function scanNumber() {
    let value = 0;
    let ch = text[index];
    while (index < text.length && '0' <= ch && ch <= '9') {
      value = value * 10 + (ch - '0');
      ch = text[++index];
    }

    if (ch === '.') {
      ch = text[++index];
      let decimal = 10;
      while (index < text.length && '0' <= ch && ch <= '9') {
        value += (ch - '0') / decimal;
        decimal *= 10;
        ch = text[++index];
      }
    }

    return { type: TokenType.constant, value }
  }

  function flushOperators(toParen) {
    while (operators.length) {
      const op = operators.pop();
      if (toParen && op.type === TokenType.openParen) { return; }
      op?.eval();
    }

    if (toParen) {
      throw 'Mismatched close parenthesis';
    }
  }

  while (true) {
    const token = nextToken();
    if (token.type === TokenType.endOfString) { break; }
    if (token.type === TokenType.error) {
      throw `${token.message ?? ''} at position ${token.position}`;
    }

    if (token.type === TokenType.constant) {
      values.push(token.value);
    }
    else if (token.type === TokenType.openParen) {
      operators.push(token);
    }
    else if (token.type === TokenType.closeParen) {
      flushOperators(true);
    }
    else if (token.type === TokenType.operator) {
      if (!operators.length || !token.assoc(operators[operators.length - 1], token)) {
        operators.push(token);
      }
      else {
        while (operators.length && token.assoc(operators[operators.length - 1], token)) {
          const op = operators.pop();
          op?.eval();
        }
        operators.push(token);
      }
    }
    else {
      throw `Unexpected token: ${JSON.stringify(token)}`;
    }
  }

  flushOperators(false);
  if (operators.length || values.length != 1) {
    throw 'Malformed expression';
  }

  return values.pop();
}