A blog about software development, primarily in Java and about web applications.

Wednesday, June 4, 2008

Sorting Alphabetically in Java

I've recently had to fix alphabetical sorting in two different Java-based web applications and know we have the problem in other places. It's a pain that this comes up all of the time and there's really nothing to prevent it in the future except programmer diligence. The basic problem is that Strings in Java are sorted lexicographically by default (usually using String's compareTo() method). This is almost never what you want shown in your UI, but when you are writing code it's easy to just call Collections.sort() and expect it to do what you want.
Once you know you have this problem, there's a few ways to do the sorting:

  • Do a case-insensitive sort on the Strings. This is very common and is basically a left-over pattern for handling this issue that shouldn't be done. You see this in Java code and JDBC/SQL code where a toUpperCase() or toLowerCase() function is called. A nuance of this is that it's technically random whether "de Lucca" or "De Lucca" comes first since when sorting they both look like "de lucca".
  • Use one of Java's language-aware sorting methods such as Arrays.sort(array, Collator.getInstance()). This puts lower case letters before upper case, but on a per-letter basis so that capital Z is not before lower case 'a'. See the sorting examples below.
  • Use a SQL Function to properly sort if your database supports it. For example, Oracle's NLSSORT() method. Note that you can't rely on SQL's order by clause by itself because it will also sort text values by their ascii code. And to make matters worse...some of our code did use the NLSSORT() function, but it put the results into a Java TreeSet which resorted them by their ascii code :(
  • Consistently use the natural ordering of Strings and get your users to start thinking in terms of ascii and unicode.



Here's some simple code to show the differences between the above techniques. The output is in the table below.

import java.text.*;
import java.util.*;

public class coll {
public static void main(String args[]) {

String [] a1 = { "def", "daf", "de la Sol", "De la Sol", "de Battista", "DeBatista", "de Lucca", "deBattista", "Deloitte", "Das", "Deep" };

String [] a2 = { "def", "daf", "de la Sol", "De la Sol", "de Battista", "DeBatista", "de Lucca", "deBattista", "Deloitte", "Das", "Deep" };

String [] a3 = { "def", "daf", "de la Sol", "De la Sol", "de Battista", "DeBatista", "de Lucca", "deBattista", "Deloitte", "Das", "Deep" };

Arrays.sort(a1);
Arrays.sort(a2, Collator.getInstance());
Arrays.sort(a3, new Comparator<String>() {
public int compare(String s1, String s2) {
if (s1 == null) {
if (s2 == null) return 0;
return 1;
} else if (s2 == null) {
return -1;
}
return s1.toUpperCase().compareTo(s2.toUpperCase());
}
});

for (int i=0; i < a1.length; i++) {
System.out.printf("%-15s %-15s %-15s\n", a1[i], a2[i], a3[i]);
}
}
}

Here's the example output:


Natural Order (by ASCII code)using a CollatorCase-insensitive Ordering
Dasdafdaf
De la SolDasDas
DeBatistaDeBatistade Battista
DeepdeBattistade la Sol
Deloittede BattistaDe la Sol
dafDeepde Lucca
de BattistadefDeBatista
de Luccade la SoldeBattista
de la SolDe la SolDeep
deBattistaDeloittedef
defde LuccaDeloitte

No comments: