Drupal and the Holy Grail of Workflows

I’ve recently given myself the task of creating a mature workflow for a Drupal website. For the past few months, I have been working with a startup and creating their website from the ground up. We are using a traditional LAMP stack with Drupal running on top. Sure, the LAMP stack isn’t the “in thing” anymore, but it’s working very well for our purpose right now. That is, get up and running quickly and focus on developing the product.

At first, development was easy. Throw Drupal on an EC2 instance and jump in to writing code. But, of course, sooner than later real data starts being entered into the database. The website starts being used for demo purposes and, well, if a poorly written function blows away a database table, that’s a serious problem. So, we need a more mature workflow. Something that will allow for those poorly written functions to light the whole database on fire and not have it matter because it’s just a copy of the real database or if the unthinkable does happen, it’s easily recoverable. Unfortunately, the nature of Drupal doesn’t make this as easy as it should be.

Writing a Self-Mutating x86_64 C Program

“Why would you ever want to write a program that changes its code while it’s running? That’s a horrible idea!”

Yes, yes it is. So why do it? Because it’s a good learning experience, but most importantly, because I can.

Self-mutating programs aren’t useful for a whole lot. It makes for very difficult debugging, the program becomes hardware dependent, and the code is extremely tedious and confusing to read unless you are an expert assembly programmer. The only good use for self-mutating programs in the wild I know of is as a cloaking mechanism for malware. My goal is purely academic so I venture into nothing of the sort here.

Warning: This post is heavy on x86_64 assembly of which I am no expert. A fair amount of research went into writing this and it’s possible (almost expected) that mistakes were made. If you find one, please leave a comment or send an email so that it can be corrected.

The first step of writing a self-mutating program is being able to change the code at runtime. Programmers figured out long ago that this was a bad idea and since then protections have been added to prevent a program’s code from being changed at runtime. We first need to understand where the program’s instructions live when the program is being executed. When a program is to be executed, the loader will load the entire program into memory. The program then executes inside of a virtual address space that is managed by the kernel. This address space is broken up into different segments as illustrated below.

In this case, we’re only concerned with the text segment. This is where the instructions of the process are stored. Behind the address space are pages which are handled by the kernel. These pages map to the physical memory of the computer. The kernel controls permissions to each of these pages. By default, the text segment pages are set to read and execute. You may not write to them. In order to change the instructions at runtime, we’ll need to change the permissions of the text segment pages so that we write to them.

Changing the permissions of a page can be done with the mprotect() function. The only tricky part of mprotect() is that the pointer you give it must be aligned to a page boundary. Here is a function that given a pointer, moves the pointer to the page boundary and then changes that page to read, write, and execute permissions.

1
2
3
4
5
6
7
8
9
10
int change_page_permissions_of_address(void *addr) {
    int page_size = getpagesize();
    addr -= (unsigned long)addr % page_size;

    if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
        return -1;
    }

    return 0;
}
iCATA: Building a realtime bus locator app

For my last semester of college, I took a class on iOS development. The last assignment of the class was a five week project of our own choosing. My idea was to build a better bus locator app for the local bus service, CATA. The CATA available on the app store leaves a lot to be desired. Most notably, I want to see multiple bus routes on the map simultaneously. This is very useful for anyone that uses the buses to get around campus since there are four bus routes that go around campus. When you’re running for the bus, every second counts so it’s quite advantageous to be able to see all four campus routes on the same map at once.

The first challenge was getting access to the API that provides the bus location info. CATA provides a web-based bus locator at http://realtime.catabus.com/InfoPoint/, but this is quite basic; nothing more than the bus location and the data from the server is all XML (yuck). Fortunately, there is a new web-based bus locator at http://50.203.43.19/InfoPoint/. Besides the fact that it’s just an IP address, this page provides more information including the direction of the bus, how many people are on board, even the name of the driver and the format of the data from the server is JSON (yay!). But how to get URLs to get data from the API? This is actually quite easy with the Live HTTP Headers Firefox addon. Just refresh the page and look for RESTful URLs. For this project these turned out to be:

  • http://50.203.43.19/InfoPoint/rest/RouteDetails/Get/[route ID] for info about a specified route. This includes info about buses on the route, the coordinates of each stop on the route, and the filename of the KML file for the route.
  • http://50.203.43.19/InfoPoint/rest/StopDepartures/Get/[stop ID] for upcoming departures from the specified stop. This includes the route IDs of buses as well as their scheduled and expected times of arrival and departure in UNIX time.
  • http://50.203.43.19/InfoPoint/Resources/Traces/[KML filename]. The KML file is used for drawing the line of the route on a map. The filename is given in the route details JSON file.
  • There was also a resource for downloading all the available routes, but since these rarely change, I chose to keep a static copy of these in a plist distributed with the app so that it does not need to download them each time the app is started (like the existing CATA app does).
Working with binary data in C and OpenSSL

My post on how to do basic AES and RSA encryption has, for a while now, been one of the most popular posts on my blog, but I continually get questions about why people can’t print out the encrypted messages like a normal string or write them to a file using fprintf(). The short answer is that encrypted messages are binary data, not ASCII strings with a NUL terminator and thus, they can’t be treated as if they’re ASCII data with a NUL terminator. You might be saying, “but I want to send an encrypted message to my friend as ASCII!”.

Well, time for base64.

We can use base64 to encode our encrypted messages into ASCII strings and then back again to binary data for decryption. OpenSSL has a way of doing this for us:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
char* base64Encode(const unsigned char *message, const size_t length) {
    BIO *bio;
    BIO *b64;
    FILE* stream;

    int encodedSize = 4*ceil((double)length/3);
    char *buffer = (char*)malloc(encodedSize+1);
    if(buffer == NULL) {
        fprintf(stderr, "Failed to allocate memory\n");
        exit(1);
    }

    stream = fmemopen(buffer, encodedSize+1, "w");
    b64 = BIO_new(BIO_f_base64());
    bio = BIO_new_fp(stream, BIO_NOCLOSE);
    bio = BIO_push(b64, bio);
    BIO_set_flags(bio, BIO_FLAGS_BASE64_NO_NL);
    BIO_write(bio, message, length);
    (void)BIO_flush(bio);
    BIO_free_all(bio);
    fclose(stream);

    return buffer;
}
MITM Protection via the Socialist Millionaire Protocol (OTR-style)

Crypto disclaimer!  I am NOT a crypto expert. Don’t take the information here as 100% correct; you should verify it yourself. You are dangerously bad at crypto.


The problem

Man-in-the-middle attacks are a serious problem when designing any cryptographic protocol. Without using a PKI, a common solution is to provide users’ with the fingerprint of exchanged public keys which they should then verify with the other party via another secure channel to ensure there is no MITM. In practice, this is a very poor solution because most users will not check fingeprints and even if they do, they may only compare the first and last few digits of the fingerprint meaning an attacker only need create a public key with the same few first and last digits of the public key they are trying to impersonate.

The solution

There’s no good protection from MITM, but there is a way to exchange secrets without worrying about a MITM without using a PKI and without checking fingerprints. OTR (off-the-record) messaging utilizes the Socialist Millionaire Protocol. In a (very small) nutshell, SMP allows two parties to check if a secret they both hold are equal to one another without revealing the actual secret to one another (or anyone else). If the secrets are not equal, no other information is revealed except that the secrets are not equal. Because of this, a would-be MITM attacker cannot interfere with the SMP, except to make it fail, because the secret value is never exchanged by the two parties.

How does it work?

As usual, the Wikipedia article on SMP drowns the reader with difficult to read math and does a poor job explaining the basic principle behind SMP. Luckily, there are much better explanations out there. The actual implementation of it is, unfortunately, just as convoluted as the math. The full implementation details can be found in the OTR protocol 3 spec under the SMP section. Below is the basic implementation of the protocol as defined in OTR version 3: