Test Automation: Taking Command of the Command Line

By: Steve Smith

Tags:

  • qa
  • testing
  • automation

Examples from this blog post currently run on a front-end automation suite running Ruby 2.2.0. Jenkins is used as a CI server to run tests from.

Background

The automation test suite here at Venntro has been growing steadily over the past few years, and is now able to test most of the user-facing features on the platform. However, there are some features which have always been difficult for us to automate tests for.

Search and member feeds (which members we recommend to the user on different parts of the site), have been notoriously difficult because once you make changes to the data, they need to be reindexed by the Sphinx search server before changes are actually effected on the front-end. This runs every few minutes on the test server, but as any test automation specialist will tell you, a few seconds is too long for a test to be waiting, let alone a few minutes!

We needed to be able to force this index to be rebuilt, which required us to be able to access the “scary world” of the command line on the test environment host server.

Evolution

First Iteration: Basic Commands

It turns out Ruby has many ways to access the command line built-in, and the most simple way to do this was to just encase the command we wanted to run in backticks:

def reindex_all_sphinx_indices do
  `ssh subdomain.hostname.com 'sudo /usr/bin/indexer --rotate --all'`
end

One of the documented disadvantages of using this method is that it blocks processing until the command is complete. Thankfully, for a test suite, this actually becomes an advantage; we don’t actually want the test to continue running until the indexing is complete.

Second Iteration: Combatting Stalls

It didn’t take long using this method before tests started to get stuck at this point. Unfortunately, there are many things that can happen when using the command line to cause your test to stall. For example, our script uses a sudo command, and if the user running the test does not have sudo access for the action specified, the command line will prompt for a password. As this task is being run in the background by Ruby, you will never see this prompt.

To combat stalling we reused a tactic from our front-end tests, wrapping the call in a simple timeout:

def reindex_all_sphinx_indices do
  Timeout::timeout(30) do
    `ssh subdomain.hostname.com 'sudo /usr/bin/indexer --rotate --all'`
  end
end

So, if the call takes more than 30 seconds, the test will abort as a failure, and the cause for it taking so long can be investigated.

Third Iteration: Output

Debugging was the next issue to address. It was great that we now had the ability to run command line operations in our test suite, and a way to deal with the most crippling issue, stalling, but there were other issues that needed to be debugged. If a test failed for a missed expectation after appearing to have executed the command without fail, we had no way to see what had actually happened during the call.

The command to rotate all the Sphinx indices is an interesting one to inspect. It runs through each Sphinx index, performs the rotation, and moves on to the next regardless of whether it successfully performed the rotation or not. Even if a number of the rotations fail, and these failures are shown in the output to the user, once the command is completed, Ruby does not understand these fails. As far as it can tell, the command was completed successfully.

It turns out getting hold of this output is as easy as assigning it to a variable, so that’s what I did, and immediately output it using the puts method.

def reindex_all_sphinx_indices do
  Timeout::timeout(30) do
    output = `ssh subdomain.hostname.com 'sudo /usr/bin/indexer --rotate --all'`
    puts output
  end
end

Not long after implementing this functionality, our test environment was rebuilt on a new server. By seeing this output in our test automation console output, I could alert our Operations team to the fact that the Sphinx user being used did not have access to a specific database, which was causing the index rotation to quietly fail unbeknownst to anyone not closely monitoring the logs.

This worked for some specific scenarios, but one of the review comments I got from my colleague when I proposed it was:

Matt points out this will not work if the call times out

I tested this by reducing the timeout down to a few seconds, and sure enough found that when using this backticks methods, the output will only be written to the variable output on completion of the call. Even if the timeout does its job, and puts output is moved to a rescue block, the variable output will be empty.

It was time to dig deeper into Ruby’s array of command line tools.

Fourth Iteration: IO.pipes

After a bit of research into the challenge, I stumbled across an awesome StackOverflow answer, which gave a fantastic basis for what would eventually be the method we use to execute command line operations:

def exec_with_timeout(cmd, timeout = 30)
  rout, wout = IO.pipe
  rerr, werr = IO.pipe
  stdout, stderr = nil

  begin
    puts "Executing: #{cmd}"

    pid = Process.spawn(cmd, pgroup: true, :out => wout, :err => werr)

    Timeout.timeout(timeout) do
      Process.waitpid(pid)

      wout.close
      werr.close

      stdout = rout.readlines.join
      stderr = rerr.readlines.join
    end
  rescue Timeout::Error
    Process.kill('TERM', pid)
    Process.detach(pid)

    wout.close
    werr.close

    timedout = true
    stdout = rout.readlines.join
    stderr = rerr.readlines.join
  ensure
    wout.close unless wout.closed?
    werr.close unless werr.closed?

    rout.close
    rerr.close

    puts "OUTPUT: #{stdout}" if stdout.present?
    puts "ERRORS: #{stderr}" if stderr.present?
    raise "Execution timed out: #{cmd}" if timedout
  end
end

There’s quite a lot going on, so I’ll explain piece by piece.

The Setup and Begin block

rout, wout = IO.pipe
rerr, werr = IO.pipe
stdout, stderr = nil

begin
  puts "Executing: #{cmd}"

  pid = Process.spawn(cmd, pgroup: true, :out => wout, :err => werr)

  Timeout.timeout(timeout) do
    Process.waitpid(pid)

    wout.close
    werr.close

    stdout = rout.readlines.join
    stderr = rerr.readlines.join
  end

We start by creating two IO pipes. One will record STDOUT (standard output stream from the command), and one will record STDERR (standard error stream). We then set variables stdout and stderr to nil.

After printing out the command being executed, we spawn a new Ruby Process to run the command given. We make a new process group to run it using pgroup: true, and assign the write ends of our two new IO.pipes to record the output and error streams. We store the process number as variable pid.

Next, we start our timeout block and use Process.waitpid(pid) to wait for the process we spawned to stop. If the time taken to do this does not exceed the number of seconds specified (30 by default), we close the two write ends of the IO.pipes (we cannot read from the read ends if not), and then read the read ends of each pipe, assigning them to the specified variables to print out later.

The Rescue block

rescue Timeout::Error
  Process.kill('TERM', pid)
  Process.detach(pid)

  wout.close
  werr.close

  timedout = true
  stdout = rout.readlines.join
  stderr = rerr.readlines.join

If the Begin block does time out, the rescue block will clean up. Firstly, it will terminate and detach the process we started so it does not either become a zombie process or affect any other tests unexpectedly. Then a similar process of closing and reading the IO pipes is performed. We also set a local variable, timedout to true for use in error reporting later.

The Ensure block

ensure
  wout.close unless wout.closed?
  werr.close unless werr.closed?

  rout.close
  rerr.close

  puts "OUTPUT: #{stdout}" if stdout.present?
  puts "ERRORS: #{stderr}" if stderr.present?
  raise "Execution timed out: #{cmd}" if timedout
end

The ensure block is run no matter the outcome of the Begin or Rescue blocks, and simply closes any ends of IO pipes that may still be open, and then prints out all the output and all the errors to the console if they exist.

Finally, a raise is performed if the rescue block was run. This will immediately fail the test being run, with a very clear message; this particular command being run took an unexpectedly long time to complete. This, along with all the output and errors that were written right up to the time of failure or timeout should give all the information needed to analyse the output.

Hints & Tips

Access

When running commands on a remote host, whether you have access to run those commands is going to be an issue. You should investigate ssh access, tty, known hosts, and use of the sudo command; you will probably need to have a discussion with your Ops team to discuss which commands need running, by which users, and how to do that in a secure way. I found a useful way to ensure if your CI user has access to run a command was to ssh to the CI server as the user which runs automation (in our case, Jenkins uses the default user ‘jenkins’) and trying to run the command.

Test candidates

Any test which requires waiting for a cron job to do something is a prime candidate for accessing the command line and kicking off the process yourself, as cron jobs do not by default run at intervals smaller than 1 minute. I’ve also found that if you need to hit a url in order to trigger an action, but don’t want your test to navigate away from the page you’re currently on, using this method to run a curl command to hit the url works perfectly.

Tagging scenarios

We use Cucumber to organise and execute our test scenarios, which supports tagging. We tag scenarios depending on which command line operations are run during the test, for example, @ssh, so they can be run as a batch if required, or excluded from test runs if we are experiencing issues with ssh.

Conditionally printed output

If you don’t want all the output printed on every test run, with a bit of tweaking you could:

  • Only print the output if the call fails or the timeout is exceeded
  • Add an additional argument to the exec_with_timeout method to control whether output is printed or not
  • Create an environment variable to control whether output is printed or not

About the Author

Steve Smith