Monday, May 10, 2010

Intuition When Using Closures

I got into a debate with somebody in #javascript today regarding closures: how they work, how they should work, and whether they are intuitive. This person had run into a problem similar to the one I posted about earlier. In any case, as the discussion was coming to a close, I created some examples which somebody suggested I blog about. Let's start with a simple piece of code that more or less shows the original problem.

var i = 0; 
el.onclick = function() { print(i); }; 
i++;

In the above example what will be printed? This person argued that, intuitively, "0" should be printed. Of course the correct answer is that "1" will be printed. I can understand how this may be confusing to someone new to Javascript. However, take another example:

var i = 0;
var func = function() { i++; };
func();
print(i);

If I were to show this piece of code to someone, they would probably say that it makes sense. Of course "1" will be printed. It is intuitive (and correct). Following that line of reasoning:

var i = 0;
var func = function() { print(i); };
i++;
func();

This code also seems to make sense, especially in light of the previous example. The only real difference is that this example reads/prints the variable, and the previous one modified the variable. In both cases "1" will be printed.

Now we're back to the original code:

var i = 0; 
el.onclick = function() { print(i); }; 
i++;

Note that the only difference between this piece of code and the previous one is when the function is executed. In one, the function runs right away; in the other, it runs later as a click handler. In both cases i++ occurs first, and in both cases "1" is printed. The behavior is consistent, and it shows how closures can morph from something that "just makes sense" into something that isn't quite as intuitive.

The reason for this behavior is that a closure refers to the variable itself, not to a copy of its value at the time the function was created - even for primitives. So when the anonymous function finally runs, it reads the current value of i, which has already been incremented to 1.
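
If you actually want the handler to see the value i had when the handler was assigned, one option is to copy that value into a new scope by passing it into an immediately-called function. Here is a minimal sketch, reusing el and print from above:

var i = 0;
el.onclick = (function(captured) {
    // "captured" holds a copy of i's value at this moment
    return function() { print(captured); };
})(i);
i++;
// clicking el now prints "0"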

Thursday, April 22, 2010

Easy Breezy Javascript OOP

Javascript does not support classical OOP. Instead it supports prototypal inheritance. It seems that everybody still wants to use the classical OOP pattern though and I don't really blame them. I use it all the time in my own projects and I see it used in many of the libraries and frameworks out there. Some people think that classical OOP has no place in Javascript, but I and many others respectfully disagree.

Despite the numerous tutorials on the web, many people still ask what the proper way to do classical OOP is. I'm not sure if the issue is the quality of the tutorials or the myriad ways to simulate classes. In any case, I feel that there is room for one more quick tutorial. I do not claim this way to be the "proper" way, but I do feel that it is a "good" way. Many people and libraries/frameworks use this pattern or something very close to it.

Classes and the Constructor

Defining a class and its constructor function is easy. It looks like this:

// create a class called Animal
var Animal = function(name, age) { 
    // assign the passed-in arguments to instance members
    this.name = name; 
    this.age = age;
};

Using this class is also easy:

// create an instance and assign it to the pet variable
var pet = new Animal("Spot", 2);

Instance Methods

To add an instance method, you want to use the prototype property. It looks like this:

// create an instance method called "sayHello"
Animal.prototype.sayHello = function() {
    // access instance members using the "this" keyword
    return this.name + " doesn't speak.";
};

To use an instance method, you can simply call it on the instance:

// make the animal say hello
pet.sayHello(); // returns "Spot doesn't speak."

Static Methods

Static methods are attached directly to the class, like so:

// create a static method called "getName"
Animal.getName = function(animal) {
    // return the name of the passed-in animal
    return animal.name;
};

When calling a static method, you use the class itself:

// call the static method
Animal.getName(pet); // returns "Spot"

Subclassing

This is usually the sticking point, and it is where many implementations differ slightly (or not-so-slightly). As I said, I think the way presented here is a good one. It requires one little helper function.

var extend = function(subClass, parentClass) {
    // use a throwaway constructor so that building the prototype chain
    // does not run the parent constructor itself
    var tempFn = function() {};
    tempFn.prototype = parentClass.prototype;
    subClass.prototype = new tempFn();
    // repair the constructor property and keep a reference to the
    // superclass prototype for use in overridden methods
    subClass.prototype.constructor = subClass;
    subClass.superclass = parentClass.prototype;
};

The what, why and how are beyond the scope of this article, but I certainly encourage you to do deeper digging if you are so inclined. Using the function is rather easy though.

// create a new class called "Dog"
var Dog = function(name, age) {
    // call the constructor of the superclass
    Dog.superclass.constructor.call(this, name, age);
};

// explicitly make Dog subclass Animal
extend(Dog, Animal);

Instance Methods

To add new instance methods, just add them to the prototype of the subclass:

// create an instance method called "bark"
Dog.prototype.bark = function() {
    return "Woof!";
};

To override instance methods, and to call superclass methods, do this:

// override the instance method called "sayHello"
Dog.prototype.sayHello = function() {
    if (this.age < 1) {
        // call the super "sayHello" method
        return Dog.superclass.sayHello.call(this);
    }
    else {
        return this.bark();
    }
};

Using instance methods on the subclass works exactly like before:

var petDog = new Dog("Spot", 2);

// make the animal say hello
petDog.sayHello(); // returns "Woof!"
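
One nice side effect of wiring up the prototypes this way is that instanceof behaves as you would expect, and the age check in the overridden method falls back to the superclass:

petDog instanceof Dog;    // true
petDog instanceof Animal; // true

var puppy = new Dog("Rex", 0);
puppy.sayHello(); // returns "Rex doesn't speak."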

That's really about it. It's simple and doesn't include many things like interfaces or private members, but it works. And that's how you do easy breezy Javascript OOP.

Wednesday, April 14, 2010

Introducing Yabble!

A few months ago I started working with Node.js, and by extension, CommonJS. Since then I've become fascinated with the goals of CommonJS and wanted to get involved. CommonJS is mostly suited for the server-side environment, but it does not exclude browsers. There are even a number of frameworks/libraries out there using it in the browser space, such as SproutCore, Narwhal, and RequireJS. However, these don't really fulfill the need for a simple, barebones CommonJS module loader for the browser. SproutCore and Narwhal are great, and come with a ton of goodies, but oftentimes all of those goodies are not needed. RequireJS fits more closely with my goals, but still doesn't quite fill the niche that I'm after. Enter Yabble.

Yabble is a general purpose browser-side CommonJS module loader. The goals are to be small, flexible and useful. In numerous places where two approaches exist, both are made available instead of choosing one.

XHR vs script tags

Many argue that using script tags for dynamic Javascript injection is preferable to XHR. I tend to agree. Script tags are usually easier to debug and support fetching cross-domain. However, in the CommonJS world, that means the modules need to be wrapped in some boilerplate code (called a Transport). Since no server-side CommonJS module loader requires this wrapping, it creates a pretty big disconnect between modules meant for the server side and those meant for the browser. Ideally, a single module could be used in both environments. With XHR and eval(), it is possible to retrieve the raw module code and "wrap" it on-the-fly, although XHR-loaded code is harder to debug and cross-domain requests often aren't supported. So, Yabble supports both. The XHR and eval() method is the default, and I envision its use mostly for quick-and-dirty development efforts. The script tag method can always be used if the developer prefers, and when the application is deployed to production.
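
To make the disconnect concrete, here is roughly what wrapping means. This is only an illustration - loaderRegisterModule is a made-up name, not Yabble's actual API - but it shows why a script tag can't load a plain module as-is:

// math.js as written for the server side (no wrapping needed)
exports.add = function(a, b) { return a + b; };

// the same module wrapped in Transport-style boilerplate, so that loading
// it via a script tag doesn't run it immediately and the loader can inject
// require/exports/module (loaderRegisterModule is a hypothetical name)
loaderRegisterModule('math', function(require, exports, module) {
    exports.add = function(a, b) { return a + b; };
});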

Hand-wrapped vs auto-wrapped modules

It is possible for modules to be automatically wrapped in a Transport through a build process or some other means. However, some argue that hand-wrapping is the best way to go. I tend to disagree, but there is nothing stopping anyone from hand-wrapping modules for use with Yabble. For automatic wrapping, a build-time tool is included that analyzes modules and wraps them. Another option is to use a server-side script to wrap and serve modules on demand.

Individual file loading vs packaged file loading

This isn't really a debate. In general, individual file loading is used during development and for deployment a single, packaged file is created to reduce the number of requests. Yabble supports multiple modules being defined in a single file without any additional instructions. In addition, file packaging support for the tooling is on the roadmap.


It is my sincere hope that many find this project useful in creating well-structured web applications. The more people jumping on the CommonJS bandwagon, the better.

For more information and to get the code, visit the project page.
To see it running live, visit the unit tests.

Tuesday, April 6, 2010

Creating A Javascript Function Inside A Loop

I see the following question asked quite often in #javascript: "I loop over an array of elements and attach an event handler to each one. I pass along an index/variable for such & such reason. The problem is, when the event handler is executed, the index/variable is wrong!"

Example:

for (var i = 0, n = elements.length; i<n; i++) {
  var el = elements[i];
  el.addEventListener('click', function() {
    doSomethingWith(i, el); // i, el are not what you expect!
  }, false);
}

The reason is that every function created in the loop closes over the same variables. A var declaration is scoped to the enclosing function, not to the loop body, so there is only one i and one el shared by all of the handlers. By the time a click handler actually runs, the loop has long since finished and those variables hold their final values. For more reading, start with closures and maybe move on to the ECMA Specification.

The fix to this problem is not too difficult. There is a specific pattern that can be used to ensure that each iteration of the loop produces a function that captures the values for that iteration. I call this pattern a "generator function". It probably has a proper name, but I'm not aware of it. The basic idea is to define a function (the generator) which creates and returns functions with the proper variables defined, and then call that generator function for each iteration of the loop, passing in the appropriate values. A typical generator function looks like this:

(function(variable) {
  return function() {
    // do something with variable 
  }
})(value);

There are a few things to notice in this pattern. The first is that there are two function expressions: an outer one and an inner one. The outer one is the generator function, and the inner one is the function that contains your original code. The second is that the generator function expression is wrapped in parentheses and immediately called with an input of "value". This means that the inner function can use the identifier "variable" and it will refer to whatever value was passed in. The result of this whole shebang is a brand-new function which uses whichever values were passed in to the generator.

Applying this concept to our original problem, we come up with:

for (var i = 0, n = elements.length; i<n; i++) {
  var el = elements[i];
  el.addEventListener('click', (function(i, el) { 
    return function() {
      doSomethingWith(i, el);
    }
  })(i, el), false);
}

It's quite close to the original code, but with a little bit of wrapping. Note that this pattern can be useful for more than just attaching event handlers, although in most cases some form of loop is involved.
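
For example, the same wrapping fixes the classic setTimeout-in-a-loop surprise:

for (var i = 0; i < 3; i++) {
  setTimeout((function(i) {
    return function() {
      alert(i); // alerts 0, 1, 2 as expected instead of 3, 3, 3
    };
  })(i), i * 1000);
}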

Monday, March 29, 2010

Why I Don't Like The Private Access Specifier

I like the concept of encapsulation. Really. I do. Hiding implementation details behind an interface is top-notch design work. I'm all for it. As long as it's done with "protected" and not "private".

The problem I have with the private access specifier is that, in certain situations, it discourages or outright prevents code reuse. Let me explain: as is often the case, I find myself coding with certain libraries or frameworks. Just as often, they do not do everything exactly how I would like. Perhaps they are missing a feature, or maybe they have a small bug somewhere. No problem! They are coded with OOP principles, so if I want to add a feature, I'll just subclass it and do what I need to do, right? Wrong!

All too often the particular hook that I need to add my feature is marked private. And since it's marked private, at the discretion of the original author, I can't do what I need to do. Let me lay this out: I have access to the original source code in one form or another, I understand what the original source code is doing, I've identified exactly what I need to change, and yet I'm stuck. What are my options?

One is to change the original source code. However, this is not always possible. Take the .NET Framework for example. The framework comes in a "compiled" form, yet the source code is readable thanks to tools like Reflector. Even if I can change the source code, I don't always want to. Perhaps I want to be able to distribute my application without packaging my custom build of the library. In the case of Javascript, perhaps the library is coming from a CDN like Yahoo provides for YUI.

Another option is to use what hooks I do have available. For instance, if protected MemberA calls private MemberB, and I need to change MemberB, I can override MemberA, copying most of the code, and have it call my custom MemberC instead of the original MemberB. Sometimes this requires changing code many levels deep, and the final product is a copy/pasted mess of the original.

It seems to me that a balance could be struck between the concepts of encapsulation and the flexibility of OOP inheritance and code-reuse. This balance, I think, is called "protected". If somebody has the time and dedication to read and understand the implementation details of a class, I do not think they should be barred from subclassing and having full access to that implementation. I do not believe that it should be up to the original authors to decide what should and should not be changed; they do not have enough foresight. It is impossible to imagine every single feature that could potentially be added and create an extension point for each. Using protected means that normal users of the code still have encapsulation, and those that want or need to go the extra step have the flexibility to do so.

I don't recall a time when I ever thought to myself: "I sure am glad that member is private. That really saved me a lot of hassle." I know of numerous times when I've thought the opposite. Very dark thoughts indeed.

Saturday, March 20, 2010

A Really, Really Big HTML Table

I recently needed to create a scrollable HTML table which could handle a lot of rows quickly. My first target was about 3,000 rows, but as you will see, I managed to put together something that will "render" 1 million rows without breaking a sweat.

This problem has already been solved before. A typical approach for handling large amounts of data is what I like to broadly refer to as data partitioning. There are many ways to partition data, one of which is pagination. The most common example? Google. When you perform a Google search, even if there are a million results, only 10 are displayed at a time. This principle is also commonly applied to "table components" in many Javascript frameworks. For example, the YUI DataTable supports pagination.

One slant on pagination is the concept of making a scrollable table which "paginates" based on the data that is actually visible. Quite a few frameworks now support this concept, such as qooxdoo's virtual table. My goal was to create a table component which used this scrolling concept to allow for Really, Really Big tables. This article describes a generic method for creating such a table, as well as a proof-of-concept implementation.

The first step in the method is to assume a fixed row height. While this is not ideal, in most cases the row heights are fixed anyway, and it lets us do simple calculations. For instance, if we know the row height and we know the number of rows, then the scrollable height of the container should be (row height * number of rows). We also know that for a particular scroll position, rows i through n are visible, where i is (scroll position / row height) and n is ( (scroll position + visible area height) / row height). So now we have the math to set up a scrollbar with the correct height and to figure out which rows need to be rendered when the user scrolls.
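
In code, those calculations might look something like this (a sketch only; rowHeight, rowCount, and the scrollable bodyContainer element are assumed to be defined elsewhere):

// total height needed so the scrollbar represents every row
var totalHeight = rowHeight * rowCount;

// for the current scroll position, find the range of visible rows
var scrollTop = bodyContainer.scrollTop;        // pixels scrolled so far
var visibleHeight = bodyContainer.clientHeight; // height of the visible body area

var firstVisibleRow = Math.floor(scrollTop / rowHeight);
var lastVisibleRow = Math.min(rowCount - 1,
    Math.floor((scrollTop + visibleHeight) / rowHeight));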

The next part is structuring this in HTML. We want fixed headers and a scrollable body.


The blue outline is the visible header area. The red outline is the visible body area. Both the headers and the body extend outside of the visible region which creates scrollbars for the body (gray). The blue and red outlines are simply divs, each containing a table.

Next, we need to make the table body tall enough to force a vertical scrollbar. In addition to this, we need a way to very selectively "fill in" rows. One approach is to go ahead and create the table row elements (but not cells) for each row. This is surprisingly fast, but not fast enough (it will work for 3,000 rows, but not for 1,000,000). The solution I came up with is to create "filler" row elements to fill in the gaps between rendered rows.


The green areas are rendered rows. The yellow area is a filler row with an explicit height set. This "pushes down" the rows beneath it to their correct positions. When initially creating the table, one filler row is created for the entire height of all rows.

When the user scrolls (and on initial creation), the table determines which rows are visible based on the scroll position. Those rows are created if not already done. The filler rows are then created, removed, or updated as necessary to keep everything in the right spot. Maintaining the filler rows is the most complicated piece of the puzzle, but not overly difficult.
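
As a rough illustration, a simplified version that keeps just one filler row above and one below the rendered block might update them like this. The proof-of-concept manages fillers between arbitrary gaps, so treat this purely as a sketch; topFillerRow and bottomFillerRow are assumed to already exist in the table body, and firstVisibleRow/lastVisibleRow come from the earlier calculation:

// size the fillers so the rendered rows sit at their correct offsets
var topFillerHeight = firstVisibleRow * rowHeight;
var bottomFillerHeight = (rowCount - lastVisibleRow - 1) * rowHeight;

topFillerRow.style.height = topFillerHeight + 'px';
topFillerRow.style.display = topFillerHeight > 0 ? '' : 'none';

bottomFillerRow.style.height = bottomFillerHeight + 'px';
bottomFillerRow.style.display = bottomFillerHeight > 0 ? '' : 'none';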

The end result? For a table of 1 million rows, only the visible rows + 1 are actually created in the DOM. One nice thing about this approach is that rendering time is practically constant (not linear) with respect to the number of rows.

Click here for a proof-of-concept example with 1,000,000 rows

The source code for the proof-of-concept is available at http://github.com/jbrantly/bigtable. Note that it is not ready for production. Instead, it simply demonstrates that this technique is feasible and could be used to create a more full-featured component. Also note that many details crucial to the successful implementation of such a component were omitted or glossed over in either the article or the source code.