I decided to share the method I came up with:
/**
 * Compares two strings, ignoring the case of ASCII characters. Non-ASCII
 * characters are compared case-sensitively. This is an attempt to mimic
 * glib's string utility function
 * <a href="http://developer.gnome.org/glib/2.28/glib-String-Utility-Functions.html#g-ascii-strcasecmp">g_ascii_strcasecmp ()</a>.
 *
 * This is a slightly modified version of the
 * java.lang.String.CASE_INSENSITIVE_ORDER.compare(String s1, String s2) method.
 *
 * @param str1 string to compare with str2
 * @param str2 string to compare with str1
 * @return 0 if the strings match, a negative value if str1 < str2, or a positive value if str1 > str2
 */
private static int compareToIgnoreCaseASCIIOnly(String str1, String str2) {
    int n1 = str1.length();
    int n2 = str2.length();
    int min = Math.min(n1, n2);
    for (int i = 0; i < min; i++) {
        char c1 = str1.charAt(i);
        char c2 = str2.charAt(i);
        if (c1 != c2) {
            if ((int) c1 > 127 || (int) c2 > 127) { // if non-ASCII char
                return c1 - c2;
            } else {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        return c1 - c2;
                    }
                }
            }
        }
    }
    // Equal over the common prefix: the shorter string sorts first.
    return n1 - n2;
}
Answer from bancer on Stack Overflow
I wouldn't use Collator, having read its Javadoc, because you have no control over how the strings get compared. You can pick the locale, but how that locale tells Collator how to compare strings is out of your hands.
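To make that concrete, here is a minimal sketch of the two knobs Collator does expose: the locale and the strength. Everything else, including how non-ASCII letters rank, comes from the locale's rules. The class and method names are hypothetical, and Locale.US is just an arbitrary choice for illustration.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class: shows that locale and strength are the only
// inputs you choose; the actual ordering rules come from the locale.
final class CollatorStrengthDemo {

    // Compares a and b with a Collator for Locale.US at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // PRIMARY strength ignores case differences.
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC"));
        // TERTIARY strength treats case as a difference.
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC"));
        // How an accented letter ranks is up to the locale's rules,
        // which is unlike an ASCII-only comparison.
        System.out.println(compareAt(Collator.PRIMARY, "e", "\u00e9"));
    }
}
```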
If you know that the characters in your strings are all ASCII, then I'd just use the String.compareTo() method, which sorts lexicographically by Unicode character value. For ASCII characters the Unicode value equals the ASCII value, so sorting lexicographically on Unicode values is the same as sorting lexicographically on ASCII values, which appears to be what g_ascii_strcasecmp does. And if you need case-insensitivity, you could use String.compareToIgnoreCase().
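A small sketch of the difference between the two methods, assuming nothing beyond the standard String API (the wrapper class and method names are hypothetical). It also shows that compareToIgnoreCase() folds the case of non-ASCII letters, so it only matches an ASCII-only comparison when the input really is all ASCII.

```java
// Hypothetical demo class wrapping the two standard String methods.
final class AsciiCompareDemo {

    // Plain lexicographic comparison by UTF-16 code unit value.
    static int caseSensitive(String a, String b) {
        return a.compareTo(b);
    }

    // Case-insensitive comparison; this folds non-ASCII letters too.
    static int caseInsensitive(String a, String b) {
        return a.compareToIgnoreCase(b);
    }

    public static void main(String[] args) {
        // 'a' (97) is greater than 'B' (66), so "apple" sorts after "Banana".
        System.out.println(caseSensitive("apple", "Banana") > 0);    // true
        // Ignoring case, "apple" sorts before "banana".
        System.out.println(caseInsensitive("apple", "Banana") < 0);  // true
        // U+00E9 (e acute) and U+00C9 (E acute) are folded together as well.
        System.out.println(caseInsensitive("\u00e9", "\u00c9") == 0); // true
    }
}
```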
As I noted in the comment, I think you'll need to write your own comparison function. You'll need to loop through the characters in the string, skipping over the ones that aren't in the ASCII range. So something like this, which is a simple, stupid implementation and needs to be beefed up to cover the corner cases I imagine g_ascii_strcasecmp handles:
import java.util.ArrayList;
import java.util.List;

public int compareStrings(String str) {
    List<Character> myAsciiChars = onlyAsciiChars(this.wordString);
    List<Character> theirAsciiChars = onlyAsciiChars(str);
    // Compare character by character over the common prefix first;
    // comparing sizes first would not give a lexicographic order.
    int min = Math.min(myAsciiChars.size(), theirAsciiChars.size());
    for (int i = 0; i < min; i++) {
        if (myAsciiChars.get(i) > theirAsciiChars.get(i)) {
            return 1;
        }
        else if (myAsciiChars.get(i) < theirAsciiChars.get(i)) {
            return -1;
        }
    }
    // Equal over the common prefix: the shorter sequence sorts first.
    return Integer.compare(myAsciiChars.size(), theirAsciiChars.size());
}

private final static char MAX_ASCII_VALUE = 127; // (Or 255 if using extended ASCII)

// Returns only the characters of s that fall in the ASCII range.
private List<Character> onlyAsciiChars(String s) {
    List<Character> asciiChars = new ArrayList<>();
    for (char c : s.toCharArray()) {
        if (c <= MAX_ASCII_VALUE) {
            asciiChars.add(c);
        }
    }
    return asciiChars;
}
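One way the filter-then-compare idea might be beefed up is sketched below as a self-contained class that also folds ASCII case. The names AsciiOnlyCompare, asciiFoldedKey and compareAsciiOnly are hypothetical. Note the design choice: this variant drops non-ASCII characters entirely, whereas the method at the top of the page compares them case-sensitively.

```java
// Hypothetical sketch: keep only ASCII characters, lowercase A-Z,
// then compare the resulting keys lexicographically.
final class AsciiOnlyCompare {

    private static final char MAX_ASCII_VALUE = 127;

    // Builds a comparison key: non-ASCII characters are skipped and
    // ASCII uppercase letters are mapped to lowercase.
    static String asciiFoldedKey(String s) {
        StringBuilder key = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > MAX_ASCII_VALUE) {
                continue; // skip anything outside the ASCII range
            }
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c + ('a' - 'A'));
            }
            key.append(c);
        }
        return key.toString();
    }

    // Negative, zero or positive, like String.compareTo().
    static int compareAsciiOnly(String a, String b) {
        return asciiFoldedKey(a).compareTo(asciiFoldedKey(b));
    }

    public static void main(String[] args) {
        // U+00E9 is skipped, so both keys are "hllo".
        System.out.println(compareAsciiOnly("h\u00e9llo", "HLLO")); // 0
        System.out.println(compareAsciiOnly("abc", "ABD") < 0);     // true
        System.out.println(compareAsciiOnly("ab", "abc") < 0);      // true
    }
}
```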
strcasecmp() : A Non-Standard Function?
Short answer: strcasecmp() is not in the C standard library, which makes it non-standard C.
strcasecmp() is defined in popular standards such as 4.4BSD and POSIX.1-2001.
The definition of case-less functions opens the door to the nit-picky details. These often involve the positive or negative result of case-less compares, not just the 0 or non-0 as used by OP. In particular:
In the POSIX locale, strcasecmp() and strncasecmp() shall behave as if the strings had been converted to lowercase and then a byte comparison performed. The results are unspecified in other locales.
The trouble arises with upper- and lowercase letters that do not have a 1-to-1 mapping. Consider a locale that has E, e and é but no É, where toupper('é') yields 'E'. Then, under "as if the strings had been converted to lowercase", 'E' has 2 candidate lowercase forms: 'e' and 'é'.
As a candidate portable solution consider one that round trips the letter (to upper then to lower) to cope with non 1-to-1 mappings:
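#include <ctype.h>

int SGA_stricmp(const char *a, const char *b) {
    int ca, cb;
    do {
        /* Read as unsigned char so the values are valid for toupper()/tolower(). */
        ca = *(const unsigned char *)a;
        cb = *(const unsigned char *)b;
        /* Round trip: to upper, then to lower. */
        ca = tolower(toupper(ca));
        cb = tolower(toupper(cb));
        a++;
        b++;
    } while (ca == cb && ca != '\0');
    return ca - cb;
}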
Alternate code is needed on the rare implementations where UCHAR_MAX > INT_MAX, since an unsigned char value may then not fit in an int.
If you do not want to round-trip the values use:
ca = tolower(ca);
cb = tolower(cb);
Detail: toupper() and tolower() are only defined for int values in the range of unsigned char, plus EOF. The cast to unsigned char is used because *a may have a negative value.
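The effect of the round trip can be illustrated in Java, the language used earlier on this page. This is an analogy, not a port: the C functions are locale-dependent, while Java's Character methods use Unicode mappings. The class and method names are hypothetical. The dotless i (U+0131) is a letter without a 1-to-1 mapping: it uppercases to 'I', which lowercases to the ordinary 'i'.

```java
// Hypothetical sketch contrasting a plain lowercase fold with the
// upper-then-lower round trip on a letter lacking a 1-to-1 mapping.
final class RoundTripFold {

    // Plain fold: lowercase only.
    static char lowerOnly(char c) {
        return Character.toLowerCase(c);
    }

    // Round trip: uppercase first, then lowercase.
    static char roundTrip(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    public static void main(String[] args) {
        char dotlessI = '\u0131';
        // Already lowercase, so the plain fold leaves it unchanged.
        System.out.println(lowerOnly(dotlessI) == dotlessI); // true
        // The round trip goes through 'I' and lands on 'i'.
        System.out.println(roundTrip(dotlessI) == 'i');      // true
        // Ordinary ASCII letters behave the same either way.
        System.out.println(roundTrip('A') == lowerOnly('A')); // true
    }
}
```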
strcasecmp is not in the C or C++ standard. It's defined by POSIX.1-2001 and 4.4BSD.
If your system is POSIX or BSD compliant, you'll have no problems. Otherwise, the function will be unavailable.