Hi, I've been trying to find a name for my method that returns the length of an object. A quick Google search showed that it's conventional to name getters getX(). However, when I looked at the first standard class that came to mind, String, I found that it has a length() method instead of the expected getLength(). Why is that? And how should I name a length method in my own class?
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#length--
The normal model of Java string length
String.length() is specified as returning the number of char values ("code units") in the String. That is the most generally useful definition of the length of a Java String; see below.
Your description¹ of the semantics of length based on the size of the backing array/array slice is incorrect. The fact that the value returned by length() is also the size of the backing array or array slice is merely an implementation detail of typical Java class libraries. String does not need to be implemented that way. Indeed, I think I've seen Java String implementations where it WASN'T implemented that way.
Alternative models of string length
To get the number of Unicode codepoints in a String use str.codePointCount(0, str.length()) -- see the javadoc.
To get the size (in bytes) of a String in a specific encoding (i.e. charset) use str.getBytes(charset).length².
To deal with locale-specific issues, you can use Normalizer to normalize the String to whatever form is most appropriate to your use-case, and then use codePointCount as above. But in some cases, even this won't work; e.g. the Hungarian letter counting rules which the Unicode standard apparently doesn't cater for.
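To make the difference between these models concrete, here is a small sketch. The class and method names (LengthModels, codeUnits, codePoints, utf8Bytes) are made up for illustration; the calls inside them are the ones mentioned above.

```java
import java.nio.charset.StandardCharsets;

final class LengthModels {
    // UTF-16 code units: this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points: a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Size in bytes once encoded as UTF-8 (the encoded copy is discarded).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16.
        String s = "a\uD83D\uDE00";
        System.out.println(codeUnits(s));  // 3
        System.out.println(codePoints(s)); // 2
        System.out.println(utf8Bytes(s));  // 5
    }
}
```

The same string has three different "lengths", and which one is right depends on what you are about to do with the number.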
Using String.length() is generally OK
The reason that most applications use String.length() is that most applications are not concerned with counting the number of characters in words, texts, etcetera in a human-centric way. For instance, if I do this:
String s = "hi mum how are you";
int pos = s.indexOf("mum");
String textAfterMum = s.substring(pos + "mum".length());
it really doesn't matter that "mum".length() is not returning code points or that it is not a linguistically correct character count. It is measuring the length of the string using the model that is appropriate to the task at hand. And it works.
Obviously, things get a bit more complicated when you do multilingual text analysis; e.g. searching for words. But even then, if you normalize your text and parameters before you start, you can safely code in terms of "code units" rather than "code points" most of the time; i.e. length() still works.
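As a rough illustration of why normalizing first helps, the sketch below composes a decomposed accented letter with java.text.Normalizer before measuring it. The class name NormalizeBeforeCounting and the helper nfc are hypothetical, and NFC is only one possible choice of form; pick whichever suits your use case.

```java
import java.text.Normalizer;

final class NormalizeBeforeCounting {
    // Compose base letters with their combining marks where a composed form exists.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";       // 'e' followed by a combining acute accent
        String composed = nfc(decomposed);   // the single code point U+00E9

        System.out.println(decomposed.length());          // 2
        System.out.println(composed.length());            // 1
        System.out.println(composed.equals("\u00E9"));    // true
    }
}
```

Once both the text and the search parameters are in the same normalized form, indexOf, substring and length() agree with each other again.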
1 - This description was on some versions of the question. See the edit history ... if you have sufficient rep points.
2 - Using str.getBytes(charset).length entails doing the encoding and throwing the result away. There is possibly a general way to do this without that copy. It would entail wrapping the String as a CharBuffer, creating a custom ByteBuffer with no backing to act as a byte counter, and then using CharsetEncoder.encode(...) to count the bytes. Note: I have not tried this, and I would not recommend trying unless you have clear evidence that getBytes(charset) is a significant performance bottleneck.
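For what it's worth, here is one way the idea in footnote 2 might be sketched. It is a variation rather than exactly what the footnote describes: instead of a custom ByteBuffer with no backing, it reuses a small scratch ByteBuffer and just adds up how many bytes the encoder writes into it. The names EncodedLength and encodedLength are hypothetical, the REPLACE error actions are an assumption chosen to mimic how getBytes(charset) handles bad input, and nothing here has been benchmarked.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodedLength {
    // Counts the bytes the string would occupy in the given charset
    // without materializing the whole encoded byte array.
    static long encodedLength(String s, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);             // a view over the String's chars
        ByteBuffer scratch = ByteBuffer.allocate(1024); // reused for every chunk
        long total = 0;
        CoderResult result;
        // Encode chunk by chunk; OVERFLOW only means the scratch buffer filled up.
        do {
            result = encoder.encode(in, scratch, true);
            total += scratch.position();
            scratch.clear();
        } while (result.isOverflow());
        // Some encoders emit trailing bytes on flush, so count those too.
        do {
            result = encoder.flush(scratch);
            total += scratch.position();
            scratch.clear();
        } while (result.isOverflow());
        return total;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("h\u00E9llo", StandardCharsets.UTF_8));      // 6
        System.out.println(encodedLength("h\u00E9llo", StandardCharsets.ISO_8859_1)); // 5
    }
}
```

This still does all of the encoding work; it only avoids allocating an array the size of the full result, so measure before assuming it is a win.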
java.text.BreakIterator is able to iterate over text and can report on "character", word, sentence and line boundaries.
Consider this code:
def length(text: String, locale: java.util.Locale = java.util.Locale.ENGLISH) = {
  val charIterator = java.text.BreakIterator.getCharacterInstance(locale)
  charIterator.setText(text)
  var result = 0
  // next() returns BreakIterator.DONE once the last boundary has been passed
  while (charIterator.next() != java.text.BreakIterator.DONE) result += 1
  result
}
Running it:
scala> val text = "Thîs lóo̰ks we̐ird!"
text: java.lang.String = Thîs lóo̰ks we̐ird!
scala> val length = length(text)
length: Int = 17
scala> val codepoints = text.codePointCount(0, text.length)
codepoints: Int = 21
With surrogate pairs:
scala> val parens = "\uDBFF\uDFFCsurpi\u0301se!\uDBFF\uDFFD"
parens: java.lang.String = surpíse!
scala> val length = length(parens)
length: Int = 10
scala> val codepoints = parens.codePointCount(0, parens.length)
codepoints: Int = 11
scala> val codeunits = parens.length
codeunits: Int = 13
This should do the job in most cases.
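Since the rest of this thread is about Java, here is roughly the same counting loop as plain Java. The class name GraphemeCount and the method count are made up for this sketch; the BreakIterator calls are the same ones used in the Scala version. Exact counts for more exotic sequences can depend on the JDK version, so only simple cases are shown.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class GraphemeCount {
    // Counts user-perceived "characters" as reported by the character BreakIterator.
    static int count(String text, Locale locale) {
        BreakIterator it = BreakIterator.getCharacterInstance(locale);
        it.setText(text);
        int n = 0;
        // Each successful next() moves past one character boundary.
        while (it.next() != BreakIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String accented = "e\u0301"; // 'e' plus a combining acute accent
        System.out.println(accented.length());                    // 2
        System.out.println(count(accented, Locale.ENGLISH));      // 1
        System.out.println(count("abc", Locale.ENGLISH));         // 3
    }
}
```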
In Java, there are functions that end with parentheses and there are some that don't, like x.length which is only used for arrays and x.length() only for strings. In Ruby, this is exactly the opposite. In JavaScript, x.length is used for both arrays and strings, so without parentheses, but then it's x.toUpperCase() which ends with parentheses...
And then in Python, it's len(x) (it could very well have been length(x)), where the variable is put inside the parenthesis as parameter to get the length of a string/array, but then it's x.upper() which again puts the variable outside of parentheses...
All of the aforementioned languages are object-oriented, but each has its own way of getting at the length, which is super confusing. Is there a logic to why it's x.function, x.function(), or function(x)?
They're two completely different things.
.length is a property on arrays. That isn't a method call.
.length() is a method call on String.
You're seeing both because first, you're iterating over the length of the array. The contents of the array are String, and you want to add up all of their lengths, so you then call the length on each individual String in the array.
.length is an array property. .length() is a method of the String class.
When you are looping, you are looping for the length of the array using people.length.
But when you are using people[i].length(), you are accessing the string at that position of the array, and getting the length of the string, therefore using the .length() method in the String class.
However, just to confuse you more, a String has traditionally been backed by an array of chars (a char[]), although that is an implementation detail. One could argue that .length should therefore work on it as well, but String is a class that exposes its length through a method rather than a field, which is why .length will not compile on a String. Empty parentheses show that it's a method call, and no parentheses show that it's a field access (for arrays, length behaves like a public final instance field, not a static variable).
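A minimal sketch of the loop being discussed, with both forms side by side. The people array comes from the question; the class name ArrayVsStringLength and the method totalChars are made up for illustration.

```java
final class ArrayVsStringLength {
    // Sums the lengths of all strings in the array.
    static int totalChars(String[] people) {
        int total = 0;
        for (int i = 0; i < people.length; i++) { // field access on the array
            total += people[i].length();          // method call on the String element
        }
        return total;
    }

    public static void main(String[] args) {
        String[] people = {"Ann", "Bob", "Chris"};
        System.out.println(people.length);      // 3 elements in the array
        System.out.println(totalChars(people)); // 3 + 3 + 5 = 11
    }
}
```

Swapping the two forms does not compile: people.length() fails because arrays have no such method, and people[i].length fails because String has no accessible field of that name.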