Once you know you have this problem, there's a few ways to do the sorting:
- Do a case-insensitive sort on the Strings. This is very common and is basically a left-over pattern for handling this issue that shouldn't be done. You see this in Java code and JDBC/SQL code where a toUpperCase() or toLowerCase() function is called. A nuance of this is that it's technically random whether "de Lucca" or "De Lucca" comes first since when sorting they both look like "de lucca".
- Use one of Java's language-aware sorting methods such as Arrays.sort(array, Collator.getInstance()). This puts lower case letters before upper case, but on a per-letter basis so that capital Z is not before lower case 'a'. See the sorting examples below.
- Use a SQL Function to properly sort if your database supports it. For example, Oracle's NLSSORT() method. Note that you can't rely on SQL's order by clause by itself because it will also sort text values by their ascii code. And to make matters worse...some of our code did use the NLSSORT() function, but it put the results into a Java TreeSet which resorted them by their ascii code :(
- Consistently use the natural ordering of Strings and get your users to start thinking in terms of ascii and unicode.
Here's some simple code to show the differences between the above techniques. The output is in the table below.
import java.text.*;
import java.util.*;
public class coll {
public static void main(String args[]) {
String [] a1 = { "def", "daf", "de la Sol", "De la Sol", "de Battista", "DeBatista", "de Lucca", "deBattista", "Deloitte", "Das", "Deep" };
String [] a2 = { "def", "daf", "de la Sol", "De la Sol", "de Battista", "DeBatista", "de Lucca", "deBattista", "Deloitte", "Das", "Deep" };
String [] a3 = { "def", "daf", "de la Sol", "De la Sol", "de Battista", "DeBatista", "de Lucca", "deBattista", "Deloitte", "Das", "Deep" };
Arrays.sort(a1);
Arrays.sort(a2, Collator.getInstance());
Arrays.sort(a3, new Comparator<String>() {
public int compare(String s1, String s2) {
if (s1 == null) {
if (s2 == null) return 0;
return 1;
} else if (s2 == null) {
return -1;
}
return s1.toUpperCase().compareTo(s2.toUpperCase());
}
});
for (int i=0; i < a1.length; i++) {
System.out.printf("%-15s %-15s %-15s\n", a1[i], a2[i], a3[i]);
}
}
}
Here's the example output:
Natural Order (by ASCII code) | using a Collator | Case-insensitive Ordering |
---|---|---|
Das | daf | daf |
De la Sol | Das | Das |
DeBatista | DeBatista | de Battista |
Deep | deBattista | de la Sol |
Deloitte | de Battista | De la Sol |
daf | Deep | de Lucca |
de Battista | def | DeBatista |
de Lucca | de la Sol | deBattista |
de la Sol | De la Sol | Deep |
deBattista | Deloitte | def |
def | de Lucca | Deloitte |
No comments:
Post a Comment