mPicasaIntegration is no longer available for download on my site. Björn Teichmann now has it, and you can download it from his site. Björn started with my fixed version, has added some new features, and plans to continue working on it. As I mentioned before, I made some fixes to it after its original author stopped working on it, but I decided not to do any further work on it, as its design is not something I want to stick with (it downloads the images from Picasa and stores them on your server; I’d rather just point to the images on Picasa, which lets me scale them arbitrarily and saves me from worrying about my disk quota). Since then I’ve started working on my own plugin, but I probably won’t finish it until sometime after I return to the US (i.e. sometime this summer).
I’ve finished my emergency surgery on Markus Steinhauer’s mPicasaIntegration plugin, so it can now read the new Picasa RSS feed. See my mPicasaIntegration WordPress plugin post for more information and a download link. It’s still 95% Markus’ code – most of my changes are in the RSS parsing, and I added some code to get the image dimensions and update the database, since that information is no longer in the RSS feed.
In my previous post, I experimented with putting my photos on Picasa. I’ve been posting my photos to my own site until now, but it’s a real chore. Also, I severely hacked my Coppermine installation to make it do some things I wanted, which has made it hard enough for me to upgrade that I haven’t done so. For a long time I was thinking about writing my own photo management application, but never had the time. Picasa’s come along and it does about 80% of the things I’ve been wanting, so I want to start putting my photos there. The trick is getting it to integrate with WordPress.
I looked for a plugin, and found folks discussing the mPicasaIntegration plugin. It’s a plugin that reads the Picasa RSS feed and stores data from it in custom tables, and then caches images as it needs them from Picasa. The only problem is that the plugin’s author and his website vanished from the web a few months ago, and I couldn’t find the code. So I was emailing people who I found were using it, and a kind person sent me a copy. He also informed me that it no longer worked. Google had significantly changed the Picasa RSS feed, which broke the plugin.
The time I normally set aside for blogging has been consumed the past few days by trying to fix it. I have it reading the RSS feed correctly again, but unfortunately some of the data the plugin relied on is no longer in the feed, so some of its features will need to be re-thought. In a few days I should have it working well enough that at least the key features function. Then I’ll post the code, as I know there are others who have been trying to track down this plugin as well (the author released it under a GNU license, so I don’t think he’ll mind).
It would have been much more difficult for me to figure out how to setup my Japanese keyboard without the help of the articles, blog posts, and forum posts that others wrote describing their experiences. I figured out a few things that no one else has written about, so the purpose of this post is to give something back to the community of folks who have also struggled with using Japanese in Windows.
I decided to try my luck using a 109 key Japanese keyboard with my English Windows laptop. I thought it might help my Japanese writing if I learned to use the direct Hiragana and Katakana input, instead of typing in Romaji and relying on MS Word to do the conversions for me. I succeeded in getting everything working, but it took some doing.
The place to start is the excellent article Windows XP Japanese Input. As thorough as that article is, it wasn’t quite enough to get my keyboard working correctly. So the next step is Cameron Beccario’s instructions for installing a Japanese keyboard. My keyboard is USB, but the only driver option available for a Japanese keyboard is PS/2. I picked that anyway and it’s working fine. But that only gets the driver in place – you still need to do some configuration work:
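The core of that configuration is a handful of registry overrides that tell Windows to treat the keyboard as a 106/109-key Japanese model. These are the values commonly published for XP and what I remember applying – verify them against Beccario’s page before trusting them, and reboot afterwards:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters]
"LayerDriver JPN"="kbd106.dll"
"OverrideKeyboardIdentifier"="PCAT_106KEY"
"OverrideKeyboardSubtype"=dword:00000002
"OverrideKeyboardType"=dword:00000007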
Some other things worth noting:
To the left of the spacebar, 無変換 (muhenkan) means no conversion from kana to kanji. To the right of the spacebar, 変換 (henkan) means conversion from kana to kanji. In Microsoft systems it converts the most recently input sequence of kana to the system’s first guess at a string of kanji/kana/etc. with the correct pronunciation and a guess at the meaning. Repeated keypresses change it to other possible guesses which are either less common or less recently used, depending on the situation. The shifted version of this key is 前候補 (zenkouho), which means “previous candidate” — “zen” means “previous”, while “kouho” means “candidate” (explanation courtesy of NIIBE Yutaka) — it rotates back to earlier guesses for kanji conversion. The alt version of this key is 全候補, also pronounced (zenkouho), which means “all candidates” — here, “zen” means “all” — it displays a menu of all known guesses. I never use the latter two functions of the key, because after pushing the henkan key about three times and not getting the desired guess, it displays a menu of all known guesses anyway.
Next on the right, ひらがな (hiragana) means that phonetic input uses one conventional Japanese phonetic alphabet, which of course can be converted to kanji by pressing the henkan key later. The shifted version is カタカナ (katakana) which means the other Japanese phonetic alphabet, and the alt version is ローマ字 (ro-maji) which means the Roman alphabet.
Near the upper left, 半/全 (han/zen) means switch between hankaku (half-size, the same size as an ASCII character) and zenkaku (full-size, since the amount of space occupied by a kanji is approximately a square, twice as fat as an ASCII character). It only affects katakana and a few other characters (for example there’s a full-width copy of each ASCII character in addition to the single-byte half-width encodings). The alt version of this is 漢字 (kanji) which actually causes typed Roman phonetic keys to be displayed as Japanese phonetic kana (either hiragana or katakana depending on one of the other keys described above) and doesn’t cause conversion to kanji.
I’m a fairly fast typist, and it’s taken about a week to retrain my fingers for some of the different key positions. The hardest thing to get used to is the teeny tiny space bar (it’s only about twice the width of a regular key). Some of the layout reminds me of my old Commodore 64 – double quote is Shift-2, @ has its own key, etc.
Yesterday was the 60th anniversary of the creation of ENIAC, the world’s first all-electronic computer, here at U Penn. An interview with Presper Eckert, one of its co-inventors, was recently published on the ComputerWorld site. I was fascinated by his description of the Harvard Mark 1, ENIAC’s mechanical predecessor:
It could solve linear differential equations, but only linear equations. It had a long framework divided into sections with a couple dozen shafts buried through it. You could put different gears on the shafts using screwdrivers and hammers and it had “integrators,” that gave [the] product of two shafts coming in on a third shaft coming out. By picking the right gear ratio you should get the right constants in the equation. We used published tables to pick the gear ratios to get whatever number you wanted. The limit on accuracy of this machine was the slippage of the mechanical wheels on the integrator.
And about ENIAC itself:
The ENIAC was the first electronic digital computer and could add those two 10-digit numbers in .00002 seconds — that’s 50,000 times faster than a human, 20,000 times faster than a calculator and 1,500 times faster than the Mark 1. For specialized scientific calculations it was even faster… ENIAC could do three-dimensional, second-order differential equations. We were calculating trajectory tables for the war effort. In those days the trajectory tables were calculated by hundreds of people operating desk calculators — people who were called computers. So the machine that does that work was called a computer… ENIAC had 18,000 vacuum tubes… The radio has only five or six tubes, and television sets have up to 30.
He also mentioned that back then Philadelphia was “Vacuum Tube Valley.” My neighbor, a man in his 70s, told me he used to work on re-entry systems in an office on Walnut St. I asked if he meant programs for people re-entering the work force. “No,” he said, “I worked for GE, designing re-entry systems for astronauts in spaceships.” It seems that little of this technological legacy remains here. Penn’s school of engineering isn’t what it used to be (Penn’s schools of business, architecture, communications, medicine, nursing and veterinary medicine are all top 5 schools, but engineering ranks 27th). And while there are Lockheed-Martin offices and pharmaceutical companies scattered around the tri-state area, and Drexel is a good engineering school, I don’t get any sense that the city of Philadelphia does anything to capitalize on its remaining engineering and technology assets.
If you need to make use of an external program from within a PHP script, then this essay is for you. My example script is for managing an sftp connection (using OpenSSH), but the principles can be applied to any interaction that requires communication between your script and an external process.
One of the Penn Medical School’s business partners recently stopped allowing ftp connections to their servers for retrieving data files. They required us to switch to sftp (secure ftp). Those providing services for transferring sensitive data files over the internet have been steadily moving from ftp to sftp over the past couple of years, and from what I can see, the pace is accelerating. This poses a programming challenge if you have scripts that automate your ftp needs, as they’ll need to be re-written for sftp. This is not a trivial undertaking, especially if you’re programming in PHP. You can’t just swap out your PHP ftp function calls with sftp equivalents. Actually, you can, but you probably don’t want to, as you would have to upgrade to PHP 5 (adoption of which has been very slow across the PHP community) and you would have to install the PECL/ssh2 library, which – as noted on php.net – currently has no stable version.
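For the curious, the PECL/ssh2 route would look roughly like the sketch below. This is hypothetical code based on the extension’s documented functions, not something we deployed – the server name, username, and key paths are placeholders:

<?php
// requires PHP 5 plus the (unstable) PECL/ssh2 extension
$conn = ssh2_connect('your_sftp_server', 22);
if ($conn === FALSE ||
    !ssh2_auth_pubkey_file($conn, 'your_username',
                           '/path/to/key.pub', '/path/to/key')) {
    die("connection or authentication failed\n");
}
$sftp = ssh2_sftp($conn);

// directory listings go through the ssh2.sftp:// stream wrapper
$dir = opendir("ssh2.sftp://$sftp/.");
while (($file = readdir($dir)) !== FALSE) {
    print "$file\n";
}
closedir($dir);
?>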
So we had to roll our own sftp solution, which required using PHP’s program execution functions. The php.net documentation is good on this topic, but much of it is fully comprehensible only if you already know what you’re doing (this isn’t a criticism – it’s a documentation site after all, not a tutorial site). This annotated sample script will help you get started if you’re new to PHP’s program execution functions.
#!/usr/local/bin/php
<?php
$keyPath = 'path/to/your/ssh_key';
$login = 'your_username';
$server = 'your_sftp_server';
$connectionString = "Connecting to $server...\n";

$childPipes = array(
    0 => array("pipe", "r"), // stdin is a pipe that the child will read from
    1 => array("pipe", "w"), // stdout is a pipe that the child will write to
    2 => array("pipe", "w"), // stderr is a pipe that the child will write to
);

# turning off password authentication will avoid getting a password prompt if
# the key fails for any reason
$connection = proc_open(
    "sftp -oPasswordAuthentication=no -oIdentityFile={$keyPath} {$login}@{$server}",
    $childPipes,
    $parentPipes);

if ($connection === FALSE) {
    print "Cannot connect to $server.\n";
    exit;
}
PHP’s proc_open is a fork by another name. The $childPipes array is for setting up the communication channels from the child process perspective, and proc_open will set $parentPipes to a corresponding set of communication channels from the parent process perspective. Looking at the definition of $childPipes, the logic may seem backwards at first, but it’s not. For example, the parent process will write to the child’s stdin (element 0), which means the child process is reading that channel.
In the user contributed notes on the php.net proc_open page, most folks write out stderr to a file. But for our sftp script we need to see what’s coming through on stderr, so we’re not directing it to a file.
For establishing the connection, we turn off password authentication, which means we won’t get a password prompt if the key authentication fails. This is important, since the script cannot see or respond to such a prompt (the prompt goes directly to the terminal, so you can’t see it on stdin or stdout; you could see it if you want to do TTY buffering, but let’s not go there…).
# The "connecting..." message is written to stderr. Make sure there's nothing # besides that in stderr before continuing. $error = readError($parentPipes, TRUE); sleep(3); $error .= readError($parentPipes); if ($error != $connectionString) { fclose($parentPipes[0]); fclose($parentPipes[1]); fclose($parentPipes[2]); $closeStatus = proc_close($connection); print $error; print "proc_close return value: $closeStatus\n"; exit; }
I don’t know if this is typical, but the sftp server we’re connecting to returns the “connecting…” welcome message on stderr (we’re reading stderr with a custom function named readError, which we’ll get to below). Having this message on stderr is problematic, since it’s not really an error message. An actual connection error, such as having a bad key, will come through on stderr after the “connecting…” message. This means we first look for the “connecting…” string (the TRUE argument to readError turns blocking on, so we’ll wait for it to appear – more on this below in the readError function), then sleep for a few seconds to see if anything else comes through on stderr, and finally analyze the accumulated string to see if it contained anything besides the “connecting…” message 🙁 . It’s an ugly solution, but dealing with stderr is difficult, since you never know when an error may or may not appear.
If we detect an error, we close the pipes before closing the connection. This is important for avoiding the possibility of a deadlock.
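Since that close-the-pipes-then-close-the-process sequence appears several times, you could fold it into a small helper. This is a refactoring sketch of the code above, not something the script as posted includes:

function closeConnection($parentPipes, $connection)
{
    // close all the pipes before calling proc_close, to avoid a deadlock
    fclose($parentPipes[0]);
    fclose($parentPipes[1]);
    fclose($parentPipes[2]);
    return proc_close($connection);
}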
# gets us past the first "sftp>" prompt
$output = readOut($parentPipes);
After logging in, we’ll get an “sftp>” prompt on stdout. We’ll read from stdout to get past this prompt, using the custom function readOut (which is defined below).
# Get the directory listing and print it
writeIn($parentPipes, "ls -l");
$output .= readOut($parentPipes);
$error = readError($parentPipes);

if (strlen($error)) {
    fclose($parentPipes[0]);
    fclose($parentPipes[1]);
    fclose($parentPipes[2]);
    $closeStatus = proc_close($connection);
    print $error;
    print "proc_close return value: $closeStatus\n";
    exit;
}

print $output;

# close the sftp connection
writeIn($parentPipes, "quit");
fclose($parentPipes[0]);
fclose($parentPipes[1]);
fclose($parentPipes[2]);
$closeStatus = proc_close($connection);
if ($closeStatus != 0) {
    print "proc_close return value: $closeStatus\n";
}
This code just demonstrates getting a directory listing (using the custom function writeIn), printing it, and then closing the connection. You can use this as a template for any sftp commands you want to run.
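For example, downloading a file would look something like this – a sketch that follows the same pattern using the script’s helper functions, with a placeholder filename:

# download a file, then make sure nothing came through on stderr
writeIn($parentPipes, "get remote_file.txt");
$output = readOut($parentPipes);
$error = readError($parentPipes);
if (strlen($error)) {
    print "get failed: $error";
}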
function readOut($pipes, $end = 'sftp> ', $length = 1024)
{
    $returnValue = '';
    stream_set_blocking($pipes[1], FALSE);
    while (!feof($pipes[1])) {
        $buffer = fgets($pipes[1], $length);
        $returnValue .= $buffer;
        // the prompt marks the end of the server's response
        if (substr_count($buffer, $end) > 0) {
            break;
        }
    }
    return $returnValue;
}
readOut loops over the stdout pipe until it sees an “sftp>” prompt, which is how we know that the server has finished writing to stdout. Note that we’ve turned blocking off with stream_set_blocking. This lets us define our own controls for reading from the stdout stream. In this case, we want readOut to return when the server has finished responding to a command. The best marker for that is the “sftp>” prompt that appears after it finishes processing a command, so we set the while loop to break when it sees the prompt.
function readError($pipes, $blocking = FALSE, $length = 1024)
{
    $returnValue = '';
    stream_set_blocking($pipes[2], $blocking);
    while (!feof($pipes[2])) {
        $buffer = fgets($pipes[2], $length);
        $returnValue .= $buffer;
        // non-blocking: an empty read means no (more) error text is waiting
        // blocking: stop once we've read a full line
        if ((!strlen($buffer) && $blocking === FALSE) ||
            ($blocking === TRUE && substr_count($buffer, "\n") > 0)) {
            break;
        }
    }
    return $returnValue;
}

function writeIn($pipes, $string)
{
    fwrite($pipes[0], $string . "\n");
}
?>
Reading from stderr is more complicated than reading from stdout, because 1. there is no equivalent of the “sftp>” prompt to let us know when the server is done writing to stderr, and 2. at any given time, there may or may not be an error. For most places in the script, we solve this problem by turning blocking off and reading until fgets returns an empty string – if no error text is waiting, readError simply returns an empty result right away.
The one situation when this approach doesn’t work is when we first log in, since the server writes to stderr before writing to stdout (as described above, it sends that “connecting…” message on stderr). In this case we turn blocking on, since we know the message is coming.
So far this script has been used with only one sftp server, so you may need to make some adjustments to make it work with your server (particularly with how it reads stderr when logging in). Also, I’d be interested in hearing from anyone who has a more elegant solution to handling the initial connection.
Stepping back from the specifics of sftp, the key thing to take away from this is that you will need to acquire a detailed knowledge of the behaviors of your external process so that your script can interact with it reliably. In particular, you need to test your handling of all the different kinds of errors the external process might throw at your script.
Last week was the culmination of my work so far here at Penn. I was hired to overhaul the Med School’s web-based admissions tools. Over the past year and a half I’ve written over 32,000 lines of code for this project. That means there are a lot of moving parts. The more moving parts you have, the more features you can offer. On the downside, every moving part you add introduces another possibility for something to go wrong. In a post about a year ago I explained the home-grown development tools we use for UI development and database access (since then we made the unfortunate choice of renaming our “LDL” database access tool to “the API”). With the admissions project I added to this toolset, introducing the concept of “data objects” (I called them that to distinguish them from the UI objects my coworkers were already familiar with). Here’s a presentation I made about a year ago if you want to know the gory details. But the basic point is that, to minimize the potential for chaos, confusion, and things generally going wrong when you have so much code, I went with an object-oriented design for managing and manipulating the data (done properly, this gives you clearly defined containers for your data and functionality, and provides a set of unambiguous “touch points” between all the moving parts). Last week we launched the new tools for the applicants for the 2006 class, and I’m told it’s been the smoothest launch since the Med School first moved the process online four years ago.
That might not be quite the achievement it sounds like, as they really had nowhere to go but up. That’s not the Med School’s fault though. When someone applies to medical school, they don’t apply directly to the school. They send their application to AMCAS (the American Medical College Application Service), and it’s up to AMCAS to get the application data to the schools where the applicants want to be considered. When AMCAS moved to doing this electronically several years ago, many of the med schools were nervous, so AMCAS tried to cajole them into feeling better about it with the Who Moved My Cheese? approach. Then when their new electronic system went live, it was a total disaster. Which goes to show that sometimes fear is a perfectly rational response to change. In the years since then they’ve improved their system, so there haven’t been any repeats of what happened the first time, but it takes time to rebuild trust after an experience like that. I was rewarded the other day with a t-shirt saying “AMCAS moved my cheese.” I’m amazed they’ve stuck with that slogan.
Fortunately we don’t use much Microsoft software at my job. But we do have one vendor-dependent application that requires us to use SQL Server. I needed to add a column to a table indicating when a record was modified. So I dutifully went to Microsoft’s MSDN site to learn how this is done in SQL Server. I came across the “timestamp” data type. “Hmmm,” I foolishly thought, “maybe this will help me with creating a time stamp.” But no, the documentation says: “The SQL Server timestamp data type has nothing to do with times or dates.” It’s actually a sequential record modification marker that’s useful in data recovery, but it has “…no relationship to time.”
I guess this is the kind of stuff people have to spend their time learning when they go for Microsoft Certification.
It’s really inefficient to try writing software in 20 minute chunks of time, on alternating days. You spend most of the 20 minutes just figuring out whatever it was you were working on last time (at least, that’s what you do if you’re going prematurely senile, like me). That’s how I’ve been proceeding on my photo management software, since 20 minutes is about all the time I can spare most days. At this point I have the database tables in place and I’ve started on the administrative interface. I’m realizing just how big of a project this is going to be – there’s a reason applications like Coppermine and Gallery consist of hundreds of files and thousands of lines of code! My application won’t be that big though. That’s because I’m not going to pack everything plus the kitchen sink into it – i.e. it won’t include emailing photos, photo comments, rating photos, etc. I like the WordPress model, where those sorts of things would be plug-ins you could add if you want, not part of the core program. Also, I’m trying to design it with the goal of maximizing code reuse, which will hopefully keep the codebase compact and robust.
Here’s the concept of my approach:
So that’s what I’m working on. The development is still very much in its early stages, and it’s going to be a long time before I have a working prototype, so if any of my geeky readers (and you know who you are) have any suggestions, please let me know!
I’ve mentioned before that I’m using Coppermine to manage the photos on my website. I was happy with it at first, but recently I’ve been finding it to be limiting and poorly written. The most frustrating aspect is that the UI and the application logic are not separated – I’ve had to hack the code all over the place to make simple design changes, which means it’ll be impossible for me to upgrade to a future version (at least, not without doing my hacks all over again). Another annoyance is that the only way to re-order photos in an album is to rename them: the order is determined alphabetically by filename. That’s…really lame.
Something I’ve been looking for, but haven’t found in any photo management software, is the separation of “slideshows” from “albums.” To me, an album is used to thematically organize photos, while a slideshow can be used for a one-off presentation of photos that might come from more than one album. I’d also like to be able to put pictures in more than one album. For example, Baby X will get his own photo album, just like Kai, but I’m sure we’ll have pictures of the two of them together that I’d like to make available in both albums. Lastly, I really like the “random photo” that I’ve got on the blog, but I have no ability to exclude photos from appearing in the random display, which is something I’d like to be able to do.
So I’ve started work on my own photo management software. My approach is a completely OO design. This allows me to rationalize the effort as an opportunity to start learning the new objects implementation in PHP 5, since we’re still using PHP 4 at my job. I don’t have a lot of time to work on it though, so we’ll see how far I get with it.
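To give a sense of direction, here’s a minimal sketch of the kind of PHP 5 class design I have in mind – illustrative names only, not actual code from the project. It reflects two of the requirements above: a photo can belong to more than one album, and individual photos can be excluded from the random display:

<?php
class Photo
{
    private $id;
    private $filename;
    private $excludeFromRandom;

    public function __construct($id, $filename, $excludeFromRandom = false)
    {
        $this->id = $id;
        $this->filename = $filename;
        $this->excludeFromRandom = $excludeFromRandom;
    }

    public function isRandomEligible()
    {
        return !$this->excludeFromRandom;
    }
}

class Album
{
    private $name;
    private $photos = array();  // the same Photo can be added to many Albums

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function addPhoto(Photo $photo)  // PHP 5 type hint
    {
        $this->photos[] = $photo;
    }
}
?>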