Adding a Syscall to Linux 3.14

I’ve long had an interest in Linux, and by Linux I mean the actual Linux project, i.e. the kernel, not GNU/Linux, but getting into kernel development is an incredibly difficult task. Linux has millions of lines of code and is one of the largest software projects in the world. Not to mention that the Linux kernel mailing list can be an intimidating place. In all, it’s not something that you just jump into on a whim.

I’ve been using GNU/Linux for over six years now. I’ve become very comfortable with it and C. I’ve read kernel code in the past, but never written any. My goal was to dip my toe in and test the waters of writing some kernel code. I figured that a good way to do this was to try to add my own custom syscall to Linux. And to have some fun with it, I decided that this syscall would work like the setuid syscall except that it would change the uid of the calling process to 0 without any authentication checks. That’s right, this sucker completely subverts all security in the kernel and is essentially a rootkit. As usual, my goal here is purely academic, not malicious. Considering that employing this would mean replacing the kernel of a system, I’d hardly consider it a vulnerability. If you’re able to change the kernel of a system, all security has already gone out the window.

Note that at the end of this process, if you want to try it out, you’ll need to compile your own kernel. This isn’t a guide on how to compile the kernel so you’ll need to look up that process for yourself. However, if you’re on Ubuntu, the Ubuntu wiki has a pretty good guide.

That said, let’s dive in and see what files need to be modified. If you haven’t already, you’ll want to get a copy of the source with:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Specifically, I’m working off of commit a64f0f8c23740dc78c5f9aaee3904d0d3df4bfb5 so it may be helpful to run:

$ git checkout a64f0f8c23740dc78c5f9aaee3904d0d3df4bfb5

Linux is massive and I’m no magician so I needed a little help on where to start looking. A quick search revealed this guide: http://www.tldp.org/HOWTO/html_single/Implement-Sys-Call-Linux-2.6-i386/ which turned out to be a very good resource. The only problem is that it is slightly out of date, being written for Linux 2.6 and for the x86 architecture. Let’s see if we can make this work on the current version of Linux, 3.14 (at the time of this writing), and for x86_64.
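
For orientation, user space reaches every syscall through its number, and a newly added syscall has no libc wrapper, so it would have to be called that way. Here is a minimal sketch, assuming glibc’s syscall() function. The number of the custom syscall isn’t given here, so the sketch uses the existing SYS_getuid instead, and getuid_by_number is a made-up name for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke getuid by its syscall number rather than through the libc wrapper.
 * A custom syscall would be called the same way, with its own number. */
long getuid_by_number(void) {
    return syscall(SYS_getuid);
}
```

The return value should match what getuid() reports for the same process.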

Drupal and the Holy Grail of Workflows

I’ve recently given myself the task of creating a mature workflow for a Drupal website. For the past few months, I have been working with a startup and creating their website from the ground up. We are using a traditional LAMP stack with Drupal running on top. Sure, the LAMP stack isn’t the “in thing” anymore, but it’s working very well for our purpose right now. That is, get up and running quickly and focus on developing the product.

At first, development was easy. Throw Drupal on an EC2 instance and jump in to writing code. But, of course, sooner rather than later real data starts being entered into the database. The website starts being used for demo purposes and, well, if a poorly written function blows away a database table, that’s a serious problem. So, we need a more mature workflow. Something that will allow for those poorly written functions to light the whole database on fire and not have it matter, because it’s just a copy of the real database or, if the unthinkable does happen, it’s easily recoverable. Unfortunately, the nature of Drupal doesn’t make this as easy as it should be.

Writing a Self-Mutating x86_64 C Program

“Why would you ever want to write a program that changes its code while it’s running? That’s a horrible idea!”

Yes, yes it is. So why do it? Because it’s a good learning experience, but most importantly, because I can.

Self-mutating programs aren’t useful for a whole lot. They make for very difficult debugging, the program becomes hardware dependent, and the code is extremely tedious and confusing to read unless you are an expert assembly programmer. The only good use for self-mutating programs in the wild that I know of is as a cloaking mechanism for malware. My goal is purely academic so I venture into nothing of the sort here.

Warning: This post is heavy on x86_64 assembly of which I am no expert. A fair amount of research went into writing this and it’s possible (almost expected) that mistakes were made. If you find one, please leave a comment or send an email so that it can be corrected.

The first step of writing a self-mutating program is being able to change the code at runtime. Programmers figured out long ago that this was a bad idea and since then protections have been added to prevent a program’s code from being changed at runtime. We first need to understand where the program’s instructions live when the program is being executed. When a program is to be executed, the loader will load the entire program into memory. The program then executes inside of a virtual address space that is managed by the kernel. This address space is broken up into different segments as illustrated below.

In this case, we’re only concerned with the text segment. This is where the instructions of the process are stored. Behind the address space are pages which are handled by the kernel. These pages map to the physical memory of the computer. The kernel controls permissions to each of these pages. By default, the text segment pages are set to read and execute. You may not write to them. In order to change the instructions at runtime, we’ll need to change the permissions of the text segment pages so that we can write to them.

Changing the permissions of a page can be done with the mprotect() function. The only tricky part of mprotect() is that the pointer you give it must be aligned to a page boundary. Here is a function that, given a pointer, rounds it down to the page boundary and then changes that page’s permissions to read, write, and execute.

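#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Make the page containing addr readable, writable, and executable. */
int change_page_permissions_of_address(void *addr) {
    int page_size = getpagesize();
    /* mprotect() needs a page-aligned address, so round down to the page start.
     * Going through uintptr_t avoids arithmetic on a void pointer. */
    addr = (void *)((uintptr_t)addr - (uintptr_t)addr % (uintptr_t)page_size);

    if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
        return -1;
    }

    return 0;
}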
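
One thing worth keeping in mind is that a function’s instructions can straddle a page boundary, in which case changing a single page is not enough. Here is a rough sketch of how the alignment arithmetic extends to a byte range. The helper names page_floor, covering_length, and make_range_writable are made up for illustration, and it uses sysconf(_SC_PAGESIZE) as an assumed stand-in for getpagesize().

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of its page. */
static uintptr_t page_floor(uintptr_t addr, uintptr_t page_size) {
    return addr - addr % page_size;
}

/* Number of bytes, in whole pages, needed to cover [addr, addr + len). */
static size_t covering_length(uintptr_t addr, size_t len, uintptr_t page_size) {
    uintptr_t start = page_floor(addr, page_size);
    uintptr_t end = addr + len;
    uintptr_t pages = (end - start + page_size - 1) / page_size;
    return (size_t)(pages * page_size);
}

/* Hypothetical helper: make every page overlapping [addr, addr + len)
 * readable, writable, and executable. Returns 0 on success, -1 on error. */
int make_range_writable(void *addr, size_t len) {
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = page_floor((uintptr_t)addr, page_size);
    size_t span = covering_length((uintptr_t)addr, len, page_size);
    return mprotect((void *)start, span, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

With 4096-byte pages, a 10-byte range starting at offset 4090 touches two pages, so the length passed to mprotect() becomes 8192.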

iCATA: Building a realtime bus locator app

For my last semester of college, I took a class on iOS development. The last assignment of the class was a five week project of our own choosing. My idea was to build a better bus locator app for the local bus service, CATA. The CATA app available on the App Store leaves a lot to be desired. Most notably, I want to see multiple bus routes on the map simultaneously. This is very useful for anyone who uses the buses to get around campus since there are four bus routes that go around campus. When you’re running for the bus, every second counts so it’s quite advantageous to be able to see all four campus routes on the same map at once.

The first challenge was getting access to the API that provides the bus location info. CATA provides a web-based bus locator at http://realtime.catabus.com/InfoPoint/, but this is quite basic: it offers nothing more than the bus location, and the data from the server is all XML (yuck). Fortunately, there is a new web-based bus locator at http://50.203.43.19/InfoPoint/. Besides the fact that it’s just an IP address, this page provides more information, including the direction of the bus, how many people are on board, and even the name of the driver. Better yet, the format of the data from the server is JSON (yay!). But how do you find the URLs for getting data from the API? This is actually quite easy with the Live HTTP Headers Firefox addon. Just refresh the page and look for RESTful URLs. For this project these turned out to be:

  • http://50.203.43.19/InfoPoint/rest/RouteDetails/Get/[route ID] for info about a specified route. This includes info about buses on the route, the coordinates of each stop on the route, and the filename of the KML file for the route.
  • http://50.203.43.19/InfoPoint/rest/StopDepartures/Get/[stop ID] for upcoming departures from the specified stop. This includes the route IDs of buses as well as their scheduled and expected times of arrival and departure in UNIX time.
  • http://50.203.43.19/InfoPoint/Resources/Traces/[KML filename]. The KML file is used for drawing the line of the route on a map. The filename is given in the route details JSON file.
  • There was also a resource for downloading all the available routes, but since these rarely change, I chose to keep a static copy of these in a plist distributed with the app so that it does not need to download them each time the app is started (like the existing CATA app does).
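
As a rough illustration of how the three URL patterns above could be assembled, here is a small C sketch. The function names are made up, and treating the route and stop IDs as integers is an assumption, since the post doesn’t say what type they are. The server at that IP address may no longer be reachable.

```c
#include <stdio.h>
#include <string.h>

/* Base URL taken from the post. */
#define INFOPOINT_BASE "http://50.203.43.19/InfoPoint"

/* Each helper writes the URL into out and returns 0, or -1 if it doesn't fit. */
int route_details_url(char *out, size_t out_size, int route_id) {
    int n = snprintf(out, out_size, INFOPOINT_BASE "/rest/RouteDetails/Get/%d", route_id);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

int stop_departures_url(char *out, size_t out_size, int stop_id) {
    int n = snprintf(out, out_size, INFOPOINT_BASE "/rest/StopDepartures/Get/%d", stop_id);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* kml_filename comes from the route details JSON; reject anything with a
 * path separator so it can't point outside the Traces directory. */
int route_trace_url(char *out, size_t out_size, const char *kml_filename) {
    if (strchr(kml_filename, '/') != NULL) {
        return -1;
    }
    int n = snprintf(out, out_size, INFOPOINT_BASE "/Resources/Traces/%s", kml_filename);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The route details URL would be requested first, since its response supplies the KML filename needed for the trace URL.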

Working with binary data in C and OpenSSL

My post on how to do basic AES and RSA encryption has, for a while now, been one of the most popular posts on my blog, but I continually get questions about why people can’t print out the encrypted messages like a normal string or write them to a file using fprintf(). The short answer is that encrypted messages are binary data, not ASCII strings with a NUL terminator and thus, they can’t be treated as if they’re ASCII data with a NUL terminator. You might be saying, “but I want to send an encrypted message to my friend as ASCII!”.

Well, time for base64.

We can use base64 to encode our encrypted messages into ASCII strings and then back again to binary data for decryption. OpenSSL has a way of doing this for us:

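#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <openssl/bio.h>
#include <openssl/evp.h>

/* Encode length bytes of message as a NUL-terminated base64 string.
 * The caller must free() the returned buffer. */
char* base64Encode(const unsigned char *message, const size_t length) {
    BIO *bio;
    BIO *b64;
    FILE* stream;

    /* Every 3 input bytes become 4 output characters, padded up. */
    int encodedSize = 4*ceil((double)length/3);
    char *buffer = (char*)malloc(encodedSize+1);
    if(buffer == NULL) {
        fprintf(stderr, "Failed to allocate memory\n");
        exit(1);
    }

    /* Treat the buffer as a file so that the BIO chain can write into it. */
    stream = fmemopen(buffer, encodedSize+1, "w");
    b64 = BIO_new(BIO_f_base64());
    bio = BIO_new_fp(stream, BIO_NOCLOSE);
    bio = BIO_push(b64, bio);
    BIO_set_flags(bio, BIO_FLAGS_BASE64_NO_NL); /* no line breaks in the output */
    BIO_write(bio, message, (int)length);
    (void)BIO_flush(bio);
    BIO_free_all(bio);
    fclose(stream); /* BIO_NOCLOSE leaves closing the stream to us */

    return buffer;
}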
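
To make the two points above concrete, here is a small sketch in plain C with no OpenSSL calls. The helper names are made up for illustration. The first function computes the same encoded size as the formula in base64Encode using integer arithmetic only, and the second shows how many bytes a string-oriented function would see before hitting a zero byte in binary data.

```c
#include <stddef.h>
#include <string.h>

/* Base64 length for n input bytes, excluding the NUL terminator.
 * Integer equivalent of 4*ceil(n/3). */
size_t base64_encoded_size(size_t n) {
    return 4 * ((n + 2) / 3);
}

/* Number of leading bytes that come before the first zero byte, i.e. what
 * fprintf("%s") or strlen() would treat as the whole "string". */
size_t bytes_before_first_nul(const unsigned char *data, size_t len) {
    const unsigned char *nul = memchr(data, 0, len);
    return nul == NULL ? len : (size_t)(nul - data);
}
```

For a made-up four-byte ciphertext such as { 0x8f, 0x00, 0x41, 0x42 }, a string function would stop after the first byte even though the message is four bytes long, which is why the length has to be tracked separately until the data is base64 encoded.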