How CMIPS binary and source code works for CPU and RAM benchmarks

I’ve been requested to explain how CMIPS works. Here I explain the basic mechanics of the source code for the CPU and RAM speed benchmarks.

Cmips uses a lot of my knowledge on computers, architecture, virtualization and assembler to prevent the hypervisors from devising the results, and providing fake data.

So at the end the program is a very precise one, concentrating into doing its jobs the best way possible.

It uses a very small binary file and really few amount of RAM to prevent the Host hypervisor from improving or worse the pure results (some providers allow the tenants to use more total RAM than the host server actually have, as many times only a part of the RAM assigned to the instances is really used, and uses swap the same way a computer does if RAM is really used).

Basically it calculates the CPU speed, by doing simple calculations involving the hardware registers and the read and write access to memory speed.

For the writings to the memory only one byte is written, and different, to minimize the hardware and software caches optimizations.

The operations are the simplest, the most close to assembler basic functions.

Operations are:

Increase counter
Compare if greater
Assign var to 0
Read a byte from a position of memory (read a char)
Write a byte to a char variable

So there are no callings to the Operating System that can be tweaked by the Hypervisor / guest tools or containers.

Finally cmips launches 100 threads (void *t_calculations(void *param)) at the same time to stress all the cores available, and provide a real benchmark on the independent CPU power of the public instance (some host servers isolate or share resources more than others, so cmips claims all the resources to get the real picture of performance provided).

When we benchmark an instance, we block the firewall to prevent incoming petitions from wasting resources and we launch cmips several times, one time after the other, on the same instance to be sure that the results are consistent and reliable.

Netbeans is used as IDE for the cmips source code. (For my Linux C++ GUI apps I use Qt Creator)
That’s the basic code in C++

Using those libraries:

#include <cstdlib>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <fstream>
#include <sstream>
#include <cstring>
#include <sys/time.h>
#include <ctime>

using namespace std;

So we link the program with the standard Posix thread library:

-o ${CND_DISTDIR}/${CND_CONF}/${CND_PLATFORM}/cmips -lpthread

Some global variables:

typedef unsigned long long timestamp_t;

char s_cmips[50] = "CMIPS V.1.0.3 by Carles Mateo www.carlesmateo.com";
char s_tmp_copy[1];

int i_max_threads = 100;
int i_finished_threads = 0;

int i_loop1 = 0;
int i_loop_max = 32000;
int i_loop2 = 0;
int i_loop2_max = 32000;
int i_loop3 = 0;
int i_loop3_max = 10;

The core is this thread function:

void *t_calculations(void *param)
{

    // current date/time based on current system
    time_t now = time(0);
    int i_counter = 0;
    int i_counter_char = 0;

    // convert now to string form
    char* dt_now = ctime(&now);

    printf("Starting thread ");
    cout << dt_now << "\n";
    for (i_loop1 = 0; i_loop1<i_loop_max; i_loop1++)
    {
        for (i_loop2 = 0; i_loop2<i_loop2_max; i_loop2++) 
        {
            for (i_loop3 = 0; i_loop3<i_loop3_max; i_loop3++) {
                // Increment test
                i_counter++;
                // If test and assignement
                if (i_counter > 32000) {
                    i_counter = 0;
                }
                // Char test
                s_tmp_copy[0] = s_cmips[i_counter_char];

                i_counter_char++;
                if (i_counter_char > 49) {
                    i_counter_char = 0;
                }

            }
        }   
    }

    time_t now_end = time(0);

    // convert now to string form
    char* dt_now_end = ctime(&now_end);

    printf("End thread at ");
    cout << dt_now_end << "\n";

    i_finished_threads++;

    return NULL;
}

The timestamps is calculated:

static timestamp_t get_timestamp ()
{
  struct timeval now;
  gettimeofday (&now, NULL);
  return  now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}

After all the threads finish main calculates:

    // Process
    timestamp_t t1 = get_timestamp();

    double secs = (t1 - t0) / 1000000.0L;

    int cmips = (1 / secs) * 1000000;

CMIPS

Cloud Million Instructions Per Second – Measuring Cloud Performance

How CMIPS binary and source code works for CPU and RAM benchmarks

Leave a Reply Cancel reply