Monday, March 29, 2010

Why I Don't Like The Private Access Specifier

I like the concept of encapsulation. Really. I do. Hiding implementation details behind an interface is top-notch design work. I'm all for it. As long as it's done with "protected" and not "private".

The problem I have with the private access specifier is that, in certain situations, it discourages or outright prevents code reuse. Let me explain: as is often the case, I find myself coding with certain libraries or frameworks. Just as often, they do not do everything exactly how I would like. Perhaps they are missing a feature, or maybe they have a small bug somewhere. No problem! They are coded with OOP principles, so if I want to add a feature, I'll just subclass it and do what I need to do, right? Wrong!

All too often the particular hook that I need to add my feature is marked private. And since it's marked private, at the discretion of the original author, I can't do what I need to do. Let me lay this out: I have access to the original source code in one form or another, I understand what the original source code is doing, I've identified exactly what I need to change, and yet I'm stuck. What are my options?

One is to change the original source code. However, this is not always possible. Take the .NET Framework, for example. The framework comes in a "compiled" form, yet the source code is readable thanks to tools like Reflector. Even if I can change the source code, I don't always want to. Perhaps I want to be able to distribute my application without packaging a custom build of the library. In the case of Javascript, perhaps the library is coming from a CDN like the one Yahoo provides for YUI.

Another option is to use what hooks I do have available. For instance, if protected MemberA calls private MemberB, and I need to change MemberB, I can override MemberA, copying most of the code, and have it call my custom MemberC instead of the original MemberB. Sometimes this requires changing code many levels deep, and the final product is a copy/pasted mess of the original.
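To make this concrete, here is a rough sketch in TypeScript (the same shape applies in C# or Java); GridWidget, renderRow, and formatCell are hypothetical names, not from any real library:

```typescript
// Hypothetical third-party class: the hook I need is marked private.
class GridWidget {
  protected renderRow(data: string[]): string {
    // ...imagine a fair amount of layout logic here...
    return "<tr>" + data.map(d => this.formatCell(d)).join("") + "</tr>";
  }

  private formatCell(value: string): string {
    return "<td>" + value + "</td>"; // the one line I actually want to change
  }
}

// Because formatCell is private, the only way in is to override renderRow,
// copy its body, and point it at my own helper instead.
class MyGridWidget extends GridWidget {
  protected renderRow(data: string[]): string {
    // ...the same layout logic, copied and pasted...
    return "<tr>" + data.map(d => this.formatCellWithTooltip(d)).join("") + "</tr>";
  }

  private formatCellWithTooltip(value: string): string {
    return '<td title="' + value + '">' + value + "</td>";
  }
}
```

All I wanted to change was one line, yet I end up owning a copy of renderRow that will quietly drift out of date every time the library is upgraded.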

It seems to me that a balance could be struck between the concepts of encapsulation and the flexibility of OOP inheritance and code-reuse. This balance, I think, is called "protected". If somebody has the time and dedication to read and understand the implementation details of a class, I do not think they should be barred from subclassing and having full access to that implementation. I do not believe that it should be up to the original authors to decide what should and should not be changed; they do not have enough foresight. It is impossible to imagine every single feature that could potentially be added and create an extension point for each. Using protected means that normal users of the code still have encapsulation, and those that want or need to go the extra step have the flexibility to do so.
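Continuing the hypothetical sketch from above, if formatCell had been marked protected, the subclass shrinks to a single, honest override:

```typescript
// The same hypothetical widget with formatCell marked protected instead:
// the subclass overrides exactly the member it cares about and inherits the rest.
class FriendlyGridWidget {
  protected renderRow(data: string[]): string {
    return "<tr>" + data.map(d => this.formatCell(d)).join("") + "</tr>";
  }

  protected formatCell(value: string): string {
    return "<td>" + value + "</td>";
  }
}

class MyFriendlyGridWidget extends FriendlyGridWidget {
  protected formatCell(value: string): string {
    return '<td title="' + value + '">' + value + "</td>"; // no copy/paste required
  }
}
```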

I don't recall a time when I ever thought to myself: "I sure am glad that member is private. That really saved me a lot of hassle." I know of numerous times when I've thought the opposite. Very dark thoughts indeed.

Saturday, March 20, 2010

A Really, Really Big HTML Table

I recently needed to create a scrollable HTML table which could handle a lot of rows quickly. My first target was about 3,000 rows, but as you will see, I managed to put together something that will "render" 1 million rows without breaking a sweat.

This problem has been solved before. A typical approach for handling large amounts of data is what I like to broadly refer to as data partitioning. There are many ways to partition data, one of which is pagination. The most common example? Google. When you perform a Google search, even if there are a million results, only 10 are displayed at a time. The same principle is commonly applied to "table components" in many Javascript frameworks. For example, the YUI DataTable supports pagination.

One slant on pagination is the concept of making a scrollable table which somehow "paginates" on the data which is actually visible. Quite a few frameworks now support this concept, such as qooxdoo's virtual table. My goal was to create a table component which used this scrolling concept to allow for Really, Really Big tables. This article describes a generic method for creating such a table, as well as a proof-of-concept implementation.

The first step in the method is to assume a fixed row height. While this is not ideal, in most cases the row heights are fixed anyway, and it keeps the calculations simple. If we know the row height and the number of rows, then the scrollable height of the container should be (row height * number of rows). For a particular scroll position, rows i through n are visible, where i is floor(scroll position / row height) and n is floor((scroll position + visible area height) / row height). So now we have the math to set up a scrollbar with the correct height and to figure out which rows need to be rendered when the user scrolls.
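Here is that arithmetic as a small TypeScript sketch; the function and parameter names are mine, not taken from the proof-of-concept source:

```typescript
// Fixed-row-height math: total scrollable height, plus the range of row
// indexes visible at a given scroll position (integer division via floor).
function scrollableHeight(rowHeight: number, rowCount: number): number {
  return rowHeight * rowCount;
}

function visibleRange(scrollTop: number, viewportHeight: number,
                      rowHeight: number, rowCount: number): { first: number; last: number } {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.min(rowCount - 1,
                        Math.floor((scrollTop + viewportHeight) / rowHeight));
  return { first, last };
}

// Example: 20px rows, a 400px viewport, scrolled to 12345px in a table of
// 1,000,000 rows -> visibleRange(12345, 400, 20, 1000000) is rows 617..637.
```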

The next part is structuring this in HTML. We want fixed headers and a scrollable body.


The blue outline is the visible header area. The red outline is the visible body area. Both the headers and the body extend outside of the visible region which creates scrollbars for the body (gray). The blue and red outlines are simply divs, each containing a table.
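A rough TypeScript/DOM sketch of that structure follows; the sizes, styling, and element names are illustrative, not taken from the proof-of-concept:

```typescript
// Two divs, each wrapping its own table: the header div hides its overflow,
// the body div owns the scrollbars.
const headerDiv = document.createElement("div");
headerDiv.style.overflow = "hidden";
headerDiv.appendChild(document.createElement("table"));

const bodyDiv = document.createElement("div");
bodyDiv.style.overflow = "auto";   // the gray scrollbars in the diagram
bodyDiv.style.height = "400px";    // the fixed visible body area
bodyDiv.appendChild(document.createElement("table"));

// Keep the header columns lined up with the body when scrolling horizontally.
bodyDiv.addEventListener("scroll", function () {
  headerDiv.scrollLeft = bodyDiv.scrollLeft;
});

document.body.appendChild(headerDiv);
document.body.appendChild(bodyDiv);
```

Only the body div ever shows scrollbars; the header simply mirrors the body's horizontal scroll position so the columns stay aligned.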

Next, we need to make the table body tall enough to force a vertical scrollbar. In addition, we need a way to very selectively "fill in" rows. One approach is to go ahead and create the table row elements (but not cells) for each row. This is surprisingly fast, but not fast enough (it works for 3,000 rows, but not for 1,000,000). The solution I came up with is to create "filler" row elements to fill in the gaps between rendered rows.


The green areas are rendered rows. The yellow area is a filler row with an explicit height set. This "pushes down" the rows beneath it to their correct positions. When initially creating the table, one filler row is created for the entire height of all rows.
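In sketch form (again with hypothetical names), the initial setup looks something like this: a single filler row whose height is the combined height of every data row gives the body its full scrollable height before any real rows exist.

```typescript
// One filler row sized to the full height of all rows gives the body table
// (and therefore the scrollbar) the correct height before any real rows exist.
function makeFiller(height: number): HTMLTableRowElement {
  const tr = document.createElement("tr");
  const td = document.createElement("td");
  td.style.height = height + "px";
  td.style.padding = "0"; // keep the filler's height exact
  tr.appendChild(td);
  return tr;
}

function createInitialFiller(tbody: HTMLTableSectionElement,
                             rowHeight: number, rowCount: number): void {
  tbody.appendChild(makeFiller(rowHeight * rowCount));
}
```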

When the user scrolls (and on initial creation), the table determines which rows are visible based on the scroll position. Those rows are created if they have not been already. The filler rows are then created, removed, or updated as necessary to keep everything in the right spot. Maintaining the filler rows is the most complicated piece of the puzzle, but not overly difficult.
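Below is a deliberately simplified sketch of that scroll handler. It throws away the previous window and keeps exactly one filler above and one below the rendered rows, whereas the actual proof-of-concept keeps rows it has already created and maintains fillers between them; makeFiller is the helper from the previous sketch.

```typescript
// Simplified scroll handler: rebuild the visible window and keep exactly one
// filler above and one below it.
function renderWindow(tbody: HTMLTableSectionElement,
                      scrollTop: number, viewportHeight: number,
                      rowHeight: number, rowCount: number,
                      renderRow: (index: number) => HTMLTableRowElement): void {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.min(rowCount - 1,
                        Math.floor((scrollTop + viewportHeight) / rowHeight));

  // Throw away the previous window (a simplification the real code avoids).
  while (tbody.firstChild) {
    tbody.removeChild(tbody.firstChild);
  }

  // Top filler pushes the first visible row down to its correct offset.
  if (first > 0) {
    tbody.appendChild(makeFiller(first * rowHeight));
  }
  for (let i = first; i <= last; i++) {
    tbody.appendChild(renderRow(i));
  }
  // Bottom filler preserves the total scrollable height below the window.
  if (last < rowCount - 1) {
    tbody.appendChild(makeFiller((rowCount - 1 - last) * rowHeight));
  }
}
```

Hooking this up is then just a matter of calling it once after the table is built and again from the body div's scroll event.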

The end result? For a table of 1 million rows, only the visible rows + 1 are actually created in the DOM. One nice thing about this approach is that it scales in practically constant (not linear) time as the number of rows grows.

Click here for a proof-of-concept example with 1,000,000 rows

The source code for the proof-of-concept is available at http://github.com/jbrantly/bigtable. Note that it is not ready for production. Instead, it simply demonstrates that this technique is feasible and could be used to create a more full-featured component. Also note that many details crucial to the successful implementation of such a component were omitted or glossed over in either the article or the source code.