Making Things Work or Doing Them Right

When I’m trying to figure out how to get my CSS styles to render properly, there are usually a number of possible solutions that “work”. I sometimes have trouble deciding which one is best: the one with the most concise CSS? The one with the least entanglement with the HTML? It can be hard to know which way is “right”.

While I was working on the portfolio on this site, I had a problem like this. I wanted to float the screenshot to the left so the text would wrap around it, but I had to get the container for each screenshot/description pair to actually enclose them so its external spacing would line up properly. The usual way I knew to do it was to put another element in there that cleared the float, so the container would be forced down to where I wanted it. However, this seemed hackish and entangled the display with the content more than I like, so I looked for a better way.

I found a page on the subject of clearing floats and immediately I could see that the solution described there was the “right” way to do it. It’s simple, unentangling, and above all it works. This is one I’ll have to remember for next time.

Character Sets are Important™

(Note: since this article is about a character that shouldn’t have been able to appear on my screen, I’ve used that character several times to demonstrate.  If you can’t see it, it’s the trademark character, an elevated TM.)

A few days ago I implemented an “email this product to your friend” feature for my new employer Reusable Bags. It all went smoothly until I tested it with products like “ACME Bags™ Workhorse Style 1500”. The ™ in that name caused me endless problems, all related to one of the least known aspects of computing (at least for English speakers): character encoding.

I’ve read Joel Spolsky’s article on character encoding, so I knew just enough to identify that my problem had to do with that, but not enough to know how to fix it. I found out that on our website, where the ™ displays fine, the charset is “ISO-8859-1”, a.k.a. Latin1. This is used without problems all over the place. The curiosity here is that ™ is not in that charset. Somehow Firefox translated a sequence of bits from the web page into a character that shouldn’t even exist. I couldn’t wrap my head around that, so I kind of assumed that it was being expressed some other way I didn’t know about and kept going. In the emails I was sending, the character was displaying as a sequence of 3 unusual characters, meaning it was being interpreted wrong. The charset in the email was also Latin1, so I expected it to display the same way it did in the browser. Since it showed up as 3 chars, that reinforced my idea that it was being encoded in some other unusual way (with multiple bytes) and I kept looking.

I tried everything I could figure to try and make some headway on this bug. I used every English charset I could find everywhere to see if I was inputting the character in one set and interpreting it with another, but nothing worked. I would recount everything I tried, but there was so much I don’t remember it all. I spent probably half a day just switching charsets and retrying things.

Eventually we gave up on representing the character properly and just wanted to strip it out, so I threw in a “str_replace("™", "", $string)”. This didn’t work either! I could replace anything else in the string, but not that blasted ™! This problem was preposterous. There’s no way PHP isn’t recognizing this character. I wrote a testing script to verify the problem in absence of the rest of the page, and there it was recognized and replaced just fine. So what was the difference between the two scripts?

The difference was the source of the text being searched. In my testing script, I typed both the needle and the haystack. In the real page, the haystack came out of the database. I don’t think the database pays much attention to the character encoding, it just stores whatever sequence of bytes you enter. So the encoding used on that string depends on who entered it. Who did enter it? A Windows user. Therefore, the encoding was undoubtedly Windows-1252, which is one of the only encodings I found that includes the ™ character. If I had been smart about it earlier I would have realized that must be the case, because someone obviously entered the character and Windows-1252 is the only encoding that contains it in a way that’s easy to enter.

So how do I type that character in our code files that aren’t Windows-1252? Well, I know that in that encoding, ™ is represented by the number 153 (0x99). That means I can get php to give it to me with the call “chr(153)”. I put that into my str_replace call from earlier and it worked perfectly; it detected the ™ and stripped it out, no problem. Originally I was going to berate the PHP developers for assuming the Windows-1252 charset in the chr() function, but I subsequently realized that it doesn’t matter what little picture is associated with character #153 in any encoding; the binary is still the same.
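To make the needle/haystack mismatch concrete, here is a minimal sketch in javascript rather than PHP. It assumes the stored product name is Windows-1252, where ™ is the single byte 0x99 (153), and that the needle was typed into a UTF-8 source file, where the same character is three bytes. The names haystack, utf8Needle, containsBytes and stripByte are made up for illustration.

```javascript
// "Bags" followed by the single Windows-1252 byte for the trademark sign (0x99 = 153).
const haystack = Uint8Array.from([0x42, 0x61, 0x67, 0x73, 0x99]);

// The same character typed into a UTF-8 source file is three bytes: e2 84 a2.
const utf8Needle = new TextEncoder().encode("™");

// True when `needle` occurs somewhere in `bytes` (naive byte-by-byte scan).
function containsBytes(bytes, needle) {
  for (let i = 0; i + needle.length <= bytes.length; i++) {
    if (needle.every((b, j) => bytes[i + j] === b)) return true;
  }
  return false;
}

// Drop every occurrence of one byte value, similar in spirit to
// PHP's str_replace(chr(n), "", $string).
function stripByte(bytes, value) {
  return bytes.filter((b) => b !== value);
}

const cleaned = stripByte(haystack, 0x99);

console.log(utf8Needle.length);                       // 3
console.log(containsBytes(haystack, utf8Needle));     // false
console.log(new TextDecoder("utf-8").decode(cleaned)); // "Bags"
```

The search fails for the same reason the first str_replace did: both sides hold “the same character”, but not the same bytes. Matching on the raw byte value sidesteps the question of which encoding the source file uses.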

So the lesson here is to not assume something quasi-magical is happening when two facts seem to conflict, like when I assumed the ™ was encoded in some multi-byte extension to Latin1. It can’t be, that’s not possible. The only common encoding in the English world that includes it is Windows-1252, so that had to be what I was seeing, despite Firefox reporting otherwise. If I had realized and accepted that earlier I would have saved myself a lot of shotgun debugging. Why Firefox did that is a separate question that I don’t really care enough to answer, but IE does some auto-detecting of character encodings and displays whatever it thinks will work the best. Maybe Firefox did the same thing, ignoring the encoding specified in the document, and forgot to update the page info? That’s all I can figure.

JS equivalence operators: “Good enough for government work”

I was having some strange behavior with a javascript app I wrote. It’s an image thumbnailing interface that allows the user to zoom and drag an image around. When it loads, the image is scaled to be either as tall or as wide as the thumbnail size, and the other dimension is larger. The user can zoom in and out, but they can’t zoom it smaller than it starts, so no whitespace can appear. When a user zoomed in and then all the way out, the image would pop out of the frame a little bit and whitespace would appear at the bottom (this was an image that was as tall as the thumbnail size; I imagine the whitespace would be on the right if the image were as wide as the thumbnail size and taller). After tracing through the javascript for a while I realized the problem: javascript considers ('' == 0) to be true.

I have a function that repositions the image so when you zoom in/out it stays centered on the same point. I wanted to be able to call it to reposition for a move that only had a horizontal vector, so I made it check to make sure there was a value for each of the x and y coordinates before it tried moving the image on that vector. I passed in an empty string when I didn’t want to make a move on that vector. The problem came into play when I zoomed out to the max and the image’s position on the short dimension became 0. I wanted to move the image to 0 on that vector, but my test for no value was catching the 0 and calling it “nothing”, just like ''.

Once I tracked this down, the solution was simple. Just use the “really equal, I mean it for reals” operator; a.k.a. “===”.

if (left != '' || left === 0) { /* do stuff */ }

A more appropriate way to do this might be to have a real value like “nochange” mark when I don’t want to do anything with that vector, but I did this because I didn’t want to find all the places where I used '' and change them.
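Here is a small sketch of the difference between the two operators, plus a helper that treats 0 as a real coordinate. NO_CHANGE and hasCoordinate are hypothetical names, and the helper assumes coordinates are always either a number or the empty-string sentinel.

```javascript
// Sentinel meaning "leave this axis alone" (hypothetical name).
const NO_CHANGE = "";

// True when the caller supplied a real coordinate, including 0.
// Strict comparison means 0 is never mistaken for the sentinel.
function hasCoordinate(value) {
  return value !== NO_CHANGE;
}

console.log("" == 0);           // true: "" is coerced to the number 0
console.log("" === 0);          // false: different types, no coercion
console.log(hasCoordinate(0));  // true
console.log(hasCoordinate("")); // false
```

Under those assumptions a single strict comparison gives the same answer as the two-part test above, without relying on coercion at all.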

“bug” with onclick handlers in IE

I had an issue today with Internet Explorer. An object with an onClick handler worked fine in Firefox and Safari, but in IE the handler only fired every other click. In the course of debugging I discovered that if I clicked slowly, it worked on every click. I realized that this was because IE must be registering an onDblClick event instead of two onClick events. A little testing confirmed this. I searched to see if someone else had the same problem, and found this page. User jamescover had the same issue and found a solution: use the onMouseUp event to handle clicks instead of onClick. He also directed the focus in the onMouseDown event, but I found that part to be unnecessary in my application. A demo of his solution can be found here. I’ll reproduce the code in this post in case that page ever gets taken down:

<script type="text/javascript">
<!--

var x = 0;
function addX(){
document['oFrm']['num'].value = x;
x++;
}

var y = 0;
function addY(){
document['oFrm2']['num2'].value = y;
y++;
}

//-->
</script>
This one invokes the function <b>onclick</b>
<form name="oFrm">
<input type="text" name="num" size="5" />
<input type="button" value="add" onclick="addX();" />
</form>
This one focuses the text field <b>onmousedown</b>, then invokes the function <b>onmouseup</b>
<form name="oFrm2">
<input type="text" name="num2" size="5" />
<input type="button" value="add" onmousedown="this.focus();" onmouseup="addY();" />
</form>

A Better CAPTCHA

I don’t yet have any need to implement CAPTCHA myself, but if I did, it wouldn’t be your standard distorted and scribbled on text. It would be one of these:

Microsoft Asirra

With Asirra, to identify yourself as a human you have to identify a series of pictures as cats (excluding the dogs). It seems like a sound approach, but on its face it looks so nonsensical that I feel compelled to use it. Picture this internet argument: “Well you plainly have no idea what you’re talking about on this issue, so I won’t keep wasting my valuable time trying to fight with your stupidity! As soon as I click on these cats, you’ll never hear from me again!”

reCAPTCHA

This one is more serious, and has a purpose too. Instead of displaying random obscured characters, it displays a real scanned image of two words from an old book. One of these words has been identified and the other has not. The user types both words, the computer verifies the user’s humanity with the known word, and records what the human said the unknown word was. Through this process the un-digitized book becomes completely digitized. They’re turning CAPTCHA tests, a “lesser evil” annoyance, into something that’s actually good.
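The two-word check described above can be sketched roughly like this. It is only an illustration of the idea, not reCAPTCHA’s actual implementation: the guesses store, the challenge/response object shapes and the checkResponse name are all assumptions.

```javascript
// Tally of human guesses for each not-yet-digitized word image (hypothetical store).
const guesses = new Map();

// challenge: { knownAnswer, unknownId }, response: { known, unknown }
function checkResponse(challenge, response) {
  const normalize = (s) => s.trim().toLowerCase();

  // The known word is the control: get it wrong and the whole response is discarded.
  if (normalize(response.known) !== normalize(challenge.knownAnswer)) {
    return false;
  }

  // The user passed, so count their reading of the unknown word as one vote.
  const guess = normalize(response.unknown);
  const tally = guesses.get(challenge.unknownId) || new Map();
  tally.set(guess, (tally.get(guess) || 0) + 1);
  guesses.set(challenge.unknownId, tally);
  return true;
}
```

Once enough people agree on the same reading for a given image, that word could be treated as digitized and reused as a known word.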

Rot13 Utility

Rot13 is a common method for obfuscating text, often used to randomize passwords or to hide “spoilers” from online discussions. The tool I most commonly use to translate rot13’d text is http://www.rot13.com/, and that works well for translating long sections of ciphertext back into plaintext. However, often there is just one or a few words to be translated from plaintext to ciphertext, and I find the site to be too much overhead for the task.

That’s why I made a simple php script on my website to do my rot13 translations from now on. The key difference between mine and rot13.com is that the form on mine uses the GET method rather than POST. This allows me to make a firefox bookmark to translate text directly from the url bar. To do this, bookmark this url: http://timsaylor.com/tools/rot13.php?plaintext=%s. Then in the bookmark’s properties add a value to the keyword field. My keyword is “rot”, so now whenever I type “rot [text]” into my url bar, it sends that to my script and opens a page with the ciphertext.
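As a rough model of what the keyword bookmark does, the browser swaps the %s placeholder for the URL-escaped text typed after the keyword. The sketch below assumes encodeURIComponent-style escaping; the browser’s exact escaping rules may differ, and expandKeyword is a made-up name.

```javascript
// Replace the %s placeholder in a bookmark URL with the escaped text.
function expandKeyword(template, text) {
  const escaped = encodeURIComponent(text);
  // A replacer function avoids any special handling of "$" sequences.
  return template.replace("%s", () => escaped);
}

const template = "http://timsaylor.com/tools/rot13.php?plaintext=%s";
console.log(expandKeyword(template, "hello world"));
// http://timsaylor.com/tools/rot13.php?plaintext=hello%20world
```

This only works because the form uses GET, so the whole request fits in the url.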

It’s just a simple utility, and writing this blog post about it took longer than actually making the script itself. I just had to rot13 something today, though, and I remembered wishing that I could do it more simply. A quick search turned up this rot13 php function, which meant all the hard work was done. I just wrapped that up in an html form and put it online. The source is here.
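For reference, the transformation itself is tiny. My script is PHP; the sketch below is a javascript version of the same idea, not the function I actually found. It rotates ASCII letters by 13 places and leaves everything else alone.

```javascript
// Rotate each ASCII letter 13 places, preserving case; other characters pass through.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, (ch) => {
    const base = ch <= "Z" ? 65 : 97; // char code of "A" or "a"
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

console.log(rot13("Hello, World!"));        // Uryyb, Jbeyq!
console.log(rot13(rot13("Hello, World!"))); // Hello, World!
```

Because 13 is half of 26, the same function both encodes and decodes.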

Internet Famous

Today I was informed that a project that Dan and I collaborated on recently received a bit of attention on the internets. Our “In Case of Revolution Break Glass” box, pictured below.

Mask box

Dan had the idea, I did the woodwork, and Dan painted and lettered. One major reason that Dan wanted to make this was so he could post it on a “Show off stuff you’ve made” thread on Something Awful. From there it was cross posted to Digg (edit: and made the #1 spot apparently!), then on to this blog, which was Reddited, and then on to College Humor today. There were many more too, 3,580 hits on Google. The only credit we got was someone in the digg comments saying “This is from Something Awful, a goon did it” (half true), but that’s to be expected. I’m still pleased. And next time it’ll say “timsaylor.com” across the bottom. :-)

Using Screen for Unreliable Connections

The other day I was working in a coffee shop and their internet connection went down a couple times. Unfortunately, I was ssh’ed into another box where my work was. Fortunately, I was using screen. I figured my session would disconnect and be sitting there ready for me to reconnect when the link came back up. Sure enough, it was. Saved me a lot of hassle reopening my files and saving more frequently. Here’s the article that describes how to reconnect to a lost screen session after your ssh session times out (not that it’s that difficult, but I’m sure I’ll forget and have to reference this).

Proxying web.py through apache

I’ve been wanting to try web.py for a while. It seems like it’s easier to learn than Django for making python web applications. I made the hello world app quickly and simply, and decided I’d use it for a project I’m working on.

That’s when it turned south. I have to use php for another project I’m doing, and I use cgi-irc (in perl through CGI) on that server, so I didn’t want to re-setup all that stuff on lighttpd (the recommended method for running web.py). I tried running it through apache with cgi. After a few hours of that not working I switched to trying fastcgi on apache, also to no avail. I got fastcgi to the point where I was having an error that was common enough to be in an FAQ: I was getting 500 responses from apache because fastcgi/web.py wasn’t starting fast enough for the flup WSGI library to realize it, so it would start them over and over until it eventually gave up. I decided I just wanted to see the thing work at this point, so I installed lighttpd and went through the setup for that. When that method didn’t work I was fed up with their install instructions. I decided to try a method that John Quigley recommended. He said I should just run it with the internal http server that I’ve already used successfully and proxy the requests through apache to web.py. This might not achieve the goal of making the server large scale production ready, but I don’t really have to worry about that so much at this point. It does achieve my more important goal of making my php, cgi, and web.py applications all available through a single url and port, so I gave it a shot.

I looked up apache proxying and found it was done through mod_proxy, and each protocol you want to proxy has its own module as well. So I installed mod_proxy and mod_proxy_http and added the following line to my apache2.conf file:


ProxyPass /lifelog http://192.168.1.20:8888

192.168.1.20 being the IP address of the server in question. It would be better to get it resolving localhost so this line can be used on other servers, but I didn’t want to bother with it yet. I also had to change the proxy permission rules in proxy.conf to this:


ProxyRequests Off

<Proxy *>
AddDefaultCharset off
Order allow,deny
Allow from all
</Proxy>

This allows any user online to use my ProxyPass rule. Note that I left “ProxyRequests” set to “Off”. If I turned that on, I would be an open proxy that spammers and hackers could use to hide their identity during their nefarious behavior.

Now all that’s left is to start up my web.py server on port 8888 as specified in apache2.conf and whenever I go to /lifelog/* through apache, it’ll send the request to web.py.
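One consequence worth spelling out is what url the web.py server actually sees. The sketch below is a simplified model of how a ProxyPass prefix rule maps a request path to a backend url. It is not how apache is implemented and it ignores query strings and trailing-slash subtleties; PREFIX, BACKEND and proxyTarget are illustrative names that mirror the rule above.

```javascript
// Mirrors the rule: ProxyPass /lifelog http://192.168.1.20:8888
const PREFIX = "/lifelog";
const BACKEND = "http://192.168.1.20:8888";

// Returns the backend url for a request path, or null if the rule doesn't apply.
function proxyTarget(path) {
  // Match the prefix only on a whole path segment, so /lifelogs is left alone.
  if (path !== PREFIX && !path.startsWith(PREFIX + "/")) return null;
  // The prefix is stripped; the rest of the path is appended to the backend.
  return BACKEND + path.slice(PREFIX.length);
}

console.log(proxyTarget("/lifelog/entries")); // http://192.168.1.20:8888/entries
console.log(proxyTarget("/blog"));            // null
```

In this model the web.py app receives /entries rather than /lifelog/entries, which matters for any links it generates.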