A year on Ubuntu

As I was closing tabs in Firefox, I noticed a had my Twitter profile opened. When I saw the page, I was reminded of the fact that I’ve been using Ubuntu for over a year now. I wish I could add “exclusively” to the end of that sentence, but alas, my PC at work has Windows on it.

I’ve gone trough some changes with Ubuntu in the year. I started with Xubuntu 10.04. In time, I moved away from XFCE to KDE then then quickly to Gnome. Somewhere in between I also upgraded to 10.10, but I’m not sure about the chronology. With Ubuntu 11.04 came Unity, which I have a fair chance for about 2 weeks. Not really liking it, I have Gnome 3 / Gnome Shell a try. Anybody who gave Gnome Shell a try knows its not anywhere near as customisable as Gnome 2 was. That’s okay they, ’cause it rocks as-is. :)

The power of streams

A while ago at work, a customer asked that we add an export of a specific table in his database to his website. It wasn’t a difficult export, just get every row in the database and output it in a format MS Excel understands. I was assigned the task, so I went at it.

I started to just get the data out of the database. That meant having something like this:

require 'php/bootstrap.php';
$rResultset = $oDatabase->query("SELECT * FROM mytable");

Simple enough, right? The next thing to do was decide what format the export should be in. All they said was a format MS Excel can understand, so the simples thing I could think of was the CSV format. And since I had recently used fgetcsv to read some stuff from a csv export, I took a wild guess and typed fputcsv in my editor.

So, I had to take the data I already got from the database, create a csv file, and send that to the user with the correct HTTP headers. Easy enough, that means this:

// Put the data in a file
$rHandle = fopen("export.csv", "w+");
while($aResult = $oDatabase->fetch($rResultset)) {
    fputcsv($rHandle, $aResult);
}
 
// Send the headers
header("Content-type: application/csv");
header("Content-Disposition: attachment; filename=mytable-export.csv");
header("Pragma: no-cache");
header("Expires: 0");
 
// Read the file, from the beginning
rewind($rHandle);
 while (($sOutput = fgets($rHandle, 4096)) !== false) {
        echo $sOutput;
 }

Okay, so that should work pretty well. Unless two people run the export at the same time. Better make the filename unique. Oh but then I’ll get huge load of files in my tmp directory, better delete the file afterwards.

Luckily, at this point, I hadn’t actually written any code, it was all just pseudo-code in my head. Writing to a file just so I can output it, and then delete that file… that’s stupid. (Not to mention a waist of IO, which is expensive enough as it is without me throwing it out the door like this.)

This meant I had a mission: I wanted to output the data right away, but using fputcsv since I don’t want to bother with all the csv stuff myself.

Enter fopen(“php://output”, “w”); which allows you to write to stdout (read: the webserver, read: the browser) as if it where a file. With that, my script became as simple as this:

// Send the headers
header("Content-type: application/csv");
header("Content-Disposition: attachment; filename=mytable-export.csv");
header("Pragma: no-cache");
header("Expires: 0");
 
$rHandle = fopen("php://output", "w");
while($aResult = $oDatabase->fetch($rResultset)) {
    fputcsv($rHandle, $aResult);
}

Rather simple if you ask me. And to think I considered messing with files.

After that, the script needed some tweaking: as first row, they wanted the column names. And “America style” csv (where the separator as a comma) doesn’t work here in Europe, since we use the comma for decimal numbers. Neither of which was a problem:

$bHeadersDone = false;
$rHandle = fopen("php://output", "w");
while($aResult = $oDatabase->fetch($rResultset)) {
     if(!$bHeadersDone) {
          $bHeadersDone = true;
          fputcsv($rHandle, array_keys($aResult), ";");
    }
    fputcsv($rHandle, $aResult);
}

Et voila, it does exactly what the customer wants, its just 29 lines long, and it learned me something new. :)

Stupid limitations

Today at work, I was tasked to move a website from an external hosting, to one of our servers. They gave me the FTP username and password, so I started to download the whole website. I then grabbed the database info from the configuration, and made an export of the database as well. Uploaded everything to our server, restored the database, and voila, almost everything worked.

The website in question is a news website. Its content is basically text+images. The text, obviously, was in the database. The images where, obviously, on the filesystem. So logic would suggest that transferring the files and database would do the trick, right? (Ignore the DNS for a moment, okay?)

And sure, normally it would. And all the files where there, except for the images. Taking a closer look, it turned out a part of the images (the first 1998 to be exact) where. in fact, there. Taking an even closer look made me notice that a part of the response text when doing an FTP LIST of the images directory was “TRUNCATING RESULT AT 2000 MATCHES”. Pretty fucking awsome, right, they limited their FTP LIST to 2000 items (‘.’ and ‘..’ being 1 and 2).

I called the support phonenumber, but they couldn’t help me – apparently increasing that 2000 would be bad for server performance. They suggested I grab 2000 files, delete them on the server and then get the next 2000. Nice idea, except that’d mean the website would be broken while I’m getting the files. (The actual switch will be monday/thuesday, so removing all but 2000 images now is not an option.)

Long live FTP access though – I wrote a small PHP script that would list (links to) all the files in the directory, install the firefox plugin ‘download them all’, and about two hours later, I had all the files sitting on my harddrive nicely. :) (And I’m sure the 14.000 HTTP requests where a much bigger hit on their server then allowing me to grab them trough FTP would have ever been.)

So, in the end, this was just a dumb limitation that (in this specific case) did more harm then good.

How keeping code in the right layer saved a website

At my job at e-sites, one of the projects I worked on was the new Sport 1 website. The website went live yesterday morning. About 20 minutes later, I was asked to look at why it was so slow.

For the website, we had a layered setup. In our ‘domain layer’, we had models and entities, where a model understood how to work with an entity. So, say I want all the news articles for a specific sport, I can do this:

$aNewsArticles = $oNewsModel->getForSport($iSportId);

Where $oNewsModel is an instance of our news model. After the call, $aNewsArticles would be an array, containing instances of our news entity class. Simple, right?

Well, apparently not. Since I setup the groundwork for the project and (basically) then went on vacation with little time to explain it to others, this is what I ran into:

Class NewsModel {
	public function getForSport($iSportId) {
	    $aArticles = query("SELECT newsId from news where sportId = %d", $iSportId);
	    $aArticleList = array()
	    foreach($aArticles AS $aArticle) {
	        $aArticleList[] = new NewsEntity($aArticle['newsId']);
	    }
	    return $aArticleList;
	}
}

Do you see where this is going? Yes, thats where

Class NewsEntity {
	public function __construct($iArticleId) {
	    $aData = query("SELECT * from news where newsId = %d", $iSportId);
	    $this->iId = $aData['newsId'];
	    $this->sTitle = $aData['newsTitle'];
	    /* ..... */
	}
}

Taking into account that on the frontpage we a bunch of lists with a total of 80 articles that where all retreived the same way, we where executing at least 80 queries to much. And all that, just because somebody put the query in the wrong place.

In the end, it was a fairly easy fix, but it shouldn’t have been done this way in the first place. Now, I understand how this would have happened. Somebody had to do the first thing with news and, in that particular case, knew the ID of the news-item, since it was part of the URL. Well, if you already know the ID, whats easier then this:

$oNewsItem = new NewsEntity($iId);

Yea, that must be the easiest way to make that work. Feed the ID into the entity object and have that get the rest. And then, when you need a bunch of those objects, well, just selecting the IDs of the items you need and creating new objects, which then get the rest of the data, thats pretty easy, too. To bad its also not so fun for the database server.

So now we know how it happened, how do we make sure it doesn’t happen again? Obviously, making sure everybody has the same idea of what should go where is a (good) start. Ideally, there would be time to write some documentation, and you’d be able to oversee (and participate in) the further development, meaning you can put a stop to something while its happening instead of after the website went live. But, like I said, I was going on vacation, so I did neither. :)