Converting hexadecimal


Martin McBride, 2018-05-15
Tags none
Categories none

Following on from the previous section on state machines, we will look at how we can check the syntax of a hex string such as "0xAA55", how to implement a state machine in code, and how to use it to convert a hex string to an integer.

Defining the syntax

The syntax of our hex number is as follows:

  • A 0 character, followed by
  • An x character, followed by
  • One or more pairs of hex characters (0-9 or A-F)

Examples of valid hex strings are 0xAB, 0x1234, 0x8BADF00D.

Our scheme doesn't allow an odd number of hex digits, so 0xC5C would be invalid as it has 3 digits. Note that most languages do allow an odd number of digits. We have added the requirements for an even number of digits to make the state machine a bit more interesting.

The state machine

Here is our state machine:

We have used names, rather than numbers, for the states, because we will also be creating code for this state machine. It is better to use meaningful names rather than "magic numbers" in code.

The transitions are labelled as follows:

  • 0 and x represent the individual characters
  • hex represents any single character is the set 0-9 or A-F
  • * represents any character that isn't matched by any other branch

Checking a valid string

Let's see how we could use the state machine to check a valid string 0x1234:

  1. We start in the initial state, start.
  2. The first symbol is a 0, which leads us to state waitx - in this state we are waiting for an x character.
  3. The next symbol is an x, which leads us to state digit1 - in this state we are waiting for the first hex digit.
  4. The next symbol is a 1, a valid hex digit, so we go to state odd - in this state we have an odd number of digits.
  5. The next symbol is a 2, a valid hex digit, so we go to state even - in this state we have an even number of digits. This is a valid end state, because the string so far (0x12) is a valid hex string.
  6. The next symbol 3 takes us back to the odd state. Then the symbol 4 takes us back to the even state, where the string ends.

The string is now finished, and the state machine is left is state even, which is the valid end state (marked in green).

Notice that if we add an extra character the state machine goes back to state odd (not a valid end state). If we add another character it goes to state even - this ensure that the string is only marked as valid if there are an even number of hex digits.

Invalid strings

Each state looks for specific characters, and will revert straight to the error state is an unexpected character is encountered.

So for example if we tried to decode 0123:

  1. We start in the initial state, start.
  2. The first symbol is a 0, which leads us to state waitx.
  3. The next symbol is a 1, but we expected an x. We go to state error.
  4. In error state, any input character leads back to error.

A state machine in code

To implement a state machine in code, first we define constants for our states. We could give our states number values, but we will use strings here because it will make it easier to follow when we print out the results:

START = 'start'
WAITX = 'waitx'
DIGIT1 = 'digit1'
ODD = 'odd'
EVEN = 'even'
ERROR = 'error'

Next we define a function next_state that accepts a current state and an input value, and returns a next state. Here is a start:

def next_state(state, value):
    if state == START:
        if value == '0':
            return WAITX
        else:
            return ERROR
    else:
        return ERROR

Inside the function we check the state. If it is START we check the value. If it is 0 we return a next state of WAITX. If the value is not 0, it is incorrect so we return a state of ERROR.

If the state is not START we return a next state of ERROR.

Of course, this function isn't complete. We need to add if clauses for the other states:

def next_state(state, value):
    if state == START:
        if value == '0':
            return WAITX
        else:
            return ERROR
    elif state == WAITX:
        if value == 'x':
            return DIGIT1
        else:
            return ERROR
    elif state == DIGIT1:
        if value in '0123456789ABCDEF':
            return ODD
        else:
            return ERROR
    elif state == ODD:
        if value in '0123456789ABCDEF':
            return EVEN
        else:
            return ERROR
    elif state == EVEN:
        if value in '0123456789ABCDEF':
            return ODD
        else:
            return ERROR
    else:
        return ERROR

The extra elif clauses deal with the other possible input states. For example, if the input state is WAITX we check for an x and return either DIGIT1 if it is, or ERROR if not.

For the DIGIT1 case, we need to check if value is any valid hex digit. We do this by checking if it is in the string 0123456789ABCDEF. Similar for ODD and EVEN.

One final point of note, we end with an if statement. This deals with the case of state being equal to ERROR or any other value that isn't a valid state. So if for some reason our code gets called with a nonsense string like XYZ we will just treat it as an error state (this is an example of defensive coding - make sure your code behaves sensibly for unexpected inputs).

The net thing we need to do is create a main loop that requests an input string, and feedd it into the state machine one character at a time, and prints the result:

s = input('Type in a hex value ')
state = START
for c in s:
    state = next_state(state, c)
    print(state)

if state == EVEN:
    print('VALID')
else:
    print('INVALID')

To help us understand what is going on, the loop prints out that state after each stage.

Here is the complete code:

START = 'start'
WAITX = 'waitx'
DIGIT1 = 'digit1'
ODD = 'odd'
EVEN = 'even'
ERROR = 'error'

def next_state(state, value):
    if state == START:
        if value == '0':
            return WAITX
        else:
            return ERROR
    elif state == WAITX:
        if value == 'x':
            return DIGIT1
        else:
            return ERROR
    elif state == DIGIT1:
        if value in '0123456789ABCDEF':
            return ODD
        else:
            return ERROR
    elif state == ODD:
        if value in '0123456789ABCDEF':
            return EVEN
        else:
            return ERROR
    elif state == EVEN:
        if value in '0123456789ABCDEF':
            return ODD
        else:
            return ERROR
    else:
        return ERROR

s = input('Type in a hex value ')
state = START
for c in s:
    state = next_state(state, c)
    print(state)

if state == EVEN:
    print('VALID')
else:
    print('INVALID')

Hex conversion

So far our code can only analyse a string and tell us if it is correct or not. But what if we wanted to use the state machine to convert a hex string into an integer value?

First lets look at the conversion. Imagine the string is 0x1234. We will be given the digits in order, from left to right (that is, the most significant number first). So assuming with have a variable total that starts off at 0.

  • The first digit is a 1, so our total is 0x1
  • The next digit is a 2, so our total is 0x12
  • The next digit is a 3, so our total is 0x123
  • The next digit is a 4, so our total is 0x1234

In other words, each time a new digit comes in, we shift the total value to the left by one digit and add the new number. Since hex is base 16, shifting the value left by one digit is the same as multiplying it by 16.

Here is a simple function that accumulates a digit into an existing total:

total = 0

def accumulate(c):
    global total
    v = '0123456789ABCDEF'.index(c)
    total = total*16 + v

We use the global keyword because total is a global variable.

To convert the character 0 to F into a value 0 to 15, we look up its index in the string 0123456789ABCDEF. Character 0 appears at position 0 so index returns 0. Character F is at position 15 so index returns 15.

Now we just need to call the accumulate function each time we pass through states DIGIT1, ODD or EVEN. Here is the modified full code:

START = 'start'
WAITX = 'waitx'
DIGIT1 = 'digit1'
ODD = 'odd'
EVEN = 'even'
ERROR = 'error'

total = 0

def accumulate(c):
    global total
    v = '0123456789ABCDEF'.index(c)
    total = total*16 + v

def next_state(state, value):
    if state == START:
        if value == '0':
            return WAITX
        else:
            return ERROR
    elif state == WAITX:
        if value == 'x':
            return DIGIT1
        else:
            return ERROR
    elif state == DIGIT1:
        if value in '0123456789ABCDEF':
            accumulate(value)
            return ODD
        else:
            return ERROR
    elif state == ODD:
        if value in '0123456789ABCDEF':
            accumulate(value)
            return EVEN
        else:
            return ERROR
    elif state == EVEN:
        if value in '0123456789ABCDEF':
            accumulate(value)
            return ODD
        else:
            return ERROR
    else:
        return ERROR

s = input('Type in a hex value ')
total = 0
state = START
for c in s:
    state = next_state(state, c)
    print(state)

if state == EVEN:
    print('VALID')
    print(total)
else:
    print('INVALID')

In the next section we will look at a non-regular language, and see why it is impossible to implement it as a state machine.

Copyright (c) Axlesoft Ltd 2021