inner join

Today I had a sql query joining a bunch of tables that looked to be hanging.  I ran it and went to lunch, and when I got back it still wasn’t done.  So I killed all my connection threads, restarted the server, tried to find every way I could to kill any errant table locks that were hanging around, but nothing would make it complete.  Finally after going through the syntax again I noticed that I left off the condition joining one of the tables, so it was joining 500,000 rows * 100,000 rows and it just hadn’t finished in the hour I left it to work.

I was more likely to have this problem because of the joining syntax I was taught:

select *
from table1, table2
where table1.id = table2.id

As you add more tables the where clause gets longer, and combined with the rest of the query’s constraints it gets easy to lose track of the ones that are necessary to do the joins properly. My boss never has this problem because he uses a different syntax:

select *
from table1
inner join table2 on table1.id = table2.id
where ...

This way, you can easily see what constraints are joining the tables together and would know if there was something missing. Also, this syntax is more like what you would use for left and right joins, so your queries will be more consistent. So from now on I’m going to use this syntax to try and avoid wasting time figuring out a problem that’s so easily preventable.

Making Things Work or Doing Them Right

When I’m trying to figure out how to get my CSS styles to render properly there are usually a number of possible solutions that “work”. I sometimes have trouble deciding which one is the best: the one with the most concise CSS? The one with the least entanglement with the HTML? It can be hard to know which way is “right”.

While I was working on the portfolio on this site, I had a problem like this. I wanted to float the screen shot to the left so the text would wrap around it, but I had to get the container for each screenshot/description to actually encapsulate them so its external spacing would line up properly. The usual way I knew to do it was to put another element in there that cleared the float so the container would be forced down where I want it. However this seems hackish and entangles the display with the content more than I like, so I looked for a better way.

I found a page on the subject of clearing floats and immediately I could see that the solution described there was the “right” way to do it. It’s simple, unentangling, and above all it works. This is one I’ll have to remember for next time.

Character Sets are Important™

(Note: since this article is about a character that shouldn’t have been able to appear on my screen, I’ve used that character several times to demonstrate.  If you can’t see it, it’s the trademark character, an elevated TM.)

A few days ago I implemented an “email this product to your friend” feature for my new employer Reusable Bags. It all went smoothly until I tested it with products like “ACME Bags™ Workhorse Style 1500″. The ™ in that name caused me endless problems, all related to one of the least known aspects of computing (at least for English speakers), character encoding.

I’ve read Joel Spolsky’s article on character encoding, so I know just enough to identify that my problem has to do with that, but not enough to know how to fix it. I find out that on our website, where the ™ displays fine, the charset is “ISO-8854-1″ a.k.a. Latin1. This is used without problems all over the place. The curiosity here is that ™ is not in that charset. Somehow Firefox translated a sequence of bits from the web page into a character that shouldn’t even exist. I couldn’t wrap my head around that, so I kind of assumed that it was expressing it some other way I didn’t know about and kept going. In the emails I was sending, the character was displaying as a sequence of 3 unusual characters, meaning it was being interpreted wrong. The charset in the email was Latin1 so that was what I would expect from the browser. Since it was 3 chars, that reinforced my idea that it was being encoded in some other unusual way (with multiple bytes) and I kept looking.

I tried everything I could figure to try and make some headway on this bug. I used every English charset I could find everywhere to see if I was inputting the character in one set and interpreting it with another, but nothing worked. I would recount everything I tried, but there was so much I don’t remember it all. I spent probably half a day just switching charsets and retrying things.

Eventually we gave up on representing the character properly and just wanted to strip it out, so I threw in a “str_replace(“™”, “”, $string)”. This didn’t work either! I could replace anything else in the string, but not that blasted ™! This problem was preposterous. There’s no way PHP isn’t recognizing this character. I wrote a testing script to verify the problem in absence of the rest of the page, and there it was recognized and replaced just fine. So what was the difference between the two scripts?

The difference was the source of the text being searched. In my testing script, I typed both the needle and the haystack. In the real page, the haystack came out of the database. I don’t think the database pays much attention to the character encoding, it just stores whatever sequence of bytes you enter. So the encoding used on that string depends on who entered it. Who did enter it? A Windows user. Therefore, the encoding was undoubtedly Windows-1252, which is one of the only encodings I found that includes the ™ character. If I had been smart about it earlier I would have realized that must be the case, because someone obviously entered the character and Windows-1252 is the only encoding that contains it in a way that’s easy to enter.

So how do I type that character in our code files that aren’t Windows-1252? Well I know that in that encoding, ™ is represented by the number 157. That means I can get php to give it to me with the call “chr(157)”. I put that into my str_replace call from earlier and it worked perfectly; detected the ™ and stripped it out no problem. Originally I was going to berate the PHP developers for assuming the Windows-1252 charset in the chr() function but I subsequently realized that it doesn’t matter what little picture is associated with character #157 in any encoding, the binary is still the same.

So the lesson here is to not assume something quasi-magical is happening when two facts seem to conflict, like when I assumed the ™ was encoded in some multi-byte extension to Latin1. It can’t be, that’s not possible. The only common encoding in the English world that includes it is Windows-1252, so that had to be what I was seeing, despite Firefox reporting otherwise. If I had realized and accepted that earlier I would have saved myself a lot of shotgun debugging. Why Firefox did that is a separate question that I don’t really care enough to answer, but IE does some auto-detecting of character encodings and displays whatever it thinks will work the best. Maybe Firefox did the same thing, ignoring the encoding specified in the document, and forgot to update the page info? That’s all I can figure.

JS equivalence operators: “Good enough for government work”

I was having some strange behavior with a javascript app I wrote. It’s an image thumbnailing interface that allowed the user to zoom and drag an image around. When it loads, the image is scaled to be either as tall or as wide as the thumbnail size, and the other dimension is larger. The user can zoom in and out, but they can’t zoom it smaller than it starts so no whitespace can appear. When a user zoomed in and then all the way out, the image would pop out of the frame a little bit and whitespace would appear at the bottom (this was an image that was as tall as the thumbnail size, I imagine the whitespace would be on the right if the image were as wide as the thumbnail size and taller). After tracing through the javascript for a while I realized the problem; javascript considers (” == 0) to be True.

I have a function that repositions the image so when you zoom in/out it stays centered on the same point. I wanted to be able to call it to reposition for a move that only had a horizontal vector, so I made it check to make sure there was a value for each of the x and y coordinates before it tried moving the image on that vector. I passed in an empty string when I didn’t want to make a move on that vector. The problem came in to play when I zoomed out to the max and the image’s position on the short dimension became 0. I want to move the image to 0 on that vector, but my test for no value was catching the 0 and calling it “nothing”, just like ”.

Once I tracked this down, the solution was simple. Just use the “really equal, I mean it for reals” operator; a.k.a. “===”.

if (left != ” || left === 0) { do stuff; }

A more appropriate way to do this might be to have a real value like “nochange” mark when I don’t want to do anything with that vector, but I did this because I didn’t want to find all the places where I used ” and change them.

“bug” with onclick handlers in IE

I had an issue today with Internet Explorer. An object with an onClick handler worked fine in Firefox and Safari, but in IE the handler only fired every other click. In the course of debugging I discovered that if I clicked slowly, it worked on every click. I realized that this was because IE must be registering an onDblClick event instead of two onClick events. A little testing confirmed this. I searched to see if someone else had the same problem, and found this page. User jamescover had the same issue and found a solution: use the onMouseUp event to handle clicks instead of onClick. He also directed the focus in the onMouseDown event, but I found that part to be unnecessary in my application. A demo of his solution can be found here. I’ll reproduce the code in this post in case that page ever gets taken down:

<script type="text/javascript">
<!--

var x = 0;
function addX(){
document['oFrm']['num'].value = x;
x++;
}

var y = 0;
function addY(){
document['oFrm2']['num2'].value = y;
y++;
}

//-->
</script>
This one invokes the function <b>onclick</b>
<form name="oFrm">
<input type="text" name="num" size="5" />
<input type="button" value="add" onclick="addX();" />
</form>
This one focuses the text field <b>onmousedown</b>, then invokes the function <b>onmouseup</b>
<form name="oFrm2">
<input type="text" name="num2" size="5" />
<input type="button" value="add" onmousedown="this.focus();" onmouseup="addY();" />
</form>

Why so many microsoft shops?

While looking for a job, I’ve been surprised by the number of places developing on a Microsoft platform. I didn’t know that it was so common, at least in Chicago. If you’re using a Microsoft OS, service, or language and it works for you then great, but aren’t there open source alternatives that are just as good and not as expensive? Especially MS SQL server; whenever I see that I can’t help but wonder why they aren’t using MySQL or Postgresql. Does anyone reading this work in the Microsoft world at work? What keeps your company there?