The pipe symbol is special in a regexp (it marks alternatives), so you need to escape it. Depending on the Java version you are using, this could well explain your unpredictable results.
class SplitDemo {
    public static void main(String[] args) {
        String temp = "0|0";
        // "\\|" in Java source becomes the regex \| : a literal pipe
        String[] splitString = temp.split("\\|");
        for (int i = 0; i < splitString.length; i++) {
            System.out.println("splitString[" + i + "] is " + splitString[i]);
        }
    }
}
This outputs:
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in Java source, you need two of them to get a single backslash into the regexp.
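If you would rather not count backslashes, java.util.regex.Pattern.quote(...) turns any string into a literal pattern. A minimal sketch; the class and helper names are illustrative only:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteSplit {
    // Pattern.quote wraps the delimiter in \Q...\E so every character in it is matched literally
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Same result as temp.split("\\|"), without hand-escaping the pipe
        System.out.println(Arrays.toString(splitLiteral("0|0", "|"))); // [0, 0]
    }
}
```

The same helper works for other regex metacharacters such as "." or "$", which would otherwise also need escaping.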
Answer from crazyscot on Stack Overflow
I still suggest using split(); it drops trailing empty tokens by default. If you want to get rid of non-numeric characters in the string and keep only pipes and numbers, you can then easily use split() to get what you want. Alternatively, you can pass multiple delimiters to split() in the form of a regex, and this should work:
String[] parts = yourString.split("[\\|\\s]+");
And with a regex Matcher:
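import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Digits followed by a pipe, whitespace (\s already covers \r and \n), or the end of the input
Pattern pattern = Pattern.compile("\\d+(?=[|\\s]|$)");
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
    System.out.println(matcher.group());
}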
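A self-contained sketch of both ideas. The class and method names are illustrative, and extractNumbers uses a plain \d+ pattern (a simplification of the lookahead version) that just collects every run of digits:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiDelimiter {
    // One or more pipes or whitespace characters act as a single delimiter
    static String[] splitOnPipesOrSpace(String input) {
        return input.split("[|\\s]+");
    }

    // Collect every run of digits, ignoring whatever separates them
    static List<String> extractNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPipesOrSpace("12|34  56\n78"))); // [12, 34, 56, 78]
        System.out.println(extractNumbers("a12|b34 x56"));                         // [12, 34, 56]
    }
}
```

One caveat: if the input starts with a delimiter, as in "|12|34", split("[|\\s]+") returns a leading empty string, because only a zero-width match at the very beginning is exempt from producing one.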
This behavior is explicitly documented in String.split(String regex):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
If you want those trailing empty strings included, you need to use String.split(String regex, int limit) with a negative value for the second parameter (limit):
String[] array = values.split("\\|", -1);
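To see the difference concretely, here is a small sketch comparing the two calls on a string with trailing delimiters (the helper names are illustrative):

```java
import java.util.Arrays;

class SplitLimit {
    // limit 0 (what the one-argument form uses): trailing empty strings are dropped
    static String[] withoutTrailing(String values) {
        return values.split("\\|");
    }

    // Negative limit: the pattern is applied as often as possible and trailing empties are kept
    static String[] withTrailing(String values) {
        return values.split("\\|", -1);
    }

    public static void main(String[] args) {
        String values = "a|b||";
        System.out.println(Arrays.toString(withoutTrailing(values))); // [a, b]
        System.out.println(Arrays.toString(withTrailing(values)));    // [a, b, , ]
    }
}
```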
Yes, use String.split() on each line as you read it from the file:
line.split("\\|");
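A minimal sketch of that read-and-split loop. readRecords is a hypothetical helper; it takes any Reader so that a StringReader can stand in for the file, and it passes -1 as the limit so trailing empty fields survive:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PipeFileReader {
    // Reads pipe-delimited records, splitting each line as soon as it is read
    static List<String[]> readRecords(Reader source) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // -1 keeps trailing empty fields such as the last one in "1|alice|"
                records.add(line.split("\\|", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // StringReader stands in for a reader opened on the real file
        List<String[]> records = readRecords(new StringReader("1|alice|\n2|bob|admin"));
        for (String[] fields : records) {
            System.out.println(fields.length + " fields: " + String.join(", ", fields));
        }
    }
}
```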
An alternative is to use String.split(...):
String s = "Hi farshad zeinali/ how are you?/i have a question!/can you help me?";
String[] ss = s.split("/");
for (int i = 0; i < ss.length; i++) {
    System.out.println(ss[i]);
}
I've written a quick and dirty benchmark test for this. It compares 7 different methods, some of which require specific knowledge of the data being split.
For basic general-purpose splitting, Guava's Splitter is 3.5x faster than String#split(), and I'd recommend using that. StringTokenizer is slightly faster than that, and splitting yourself with indexOf is twice as fast again.
For the code and more info see https://web.archive.org/web/20210613074234/http://demeranville.com/battle-of-the-tokenizers-delimited-text-parser-performance (the original link is dead and the site no longer appears to exist).
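The indexOf approach can be sketched as follows. This is not the benchmark's code (that lives behind the archived link); it is an illustrative splitter for a single literal delimiter character, and it keeps all empty tokens, including trailing ones that String.split(String) would drop:

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfSplitter {
    // Splits on one literal character with no regex machinery involved
    static List<String> split(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) != -1) {
            tokens.add(input.substring(start, end));
            start = end + 1; // continue just past the delimiter
        }
        tokens.add(input.substring(start)); // text after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,,c,", ',')); // [a, b, , c, ]
    }
}
```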
As @Tom writes, an indexOf-type approach is faster than String.split(), since the latter deals with regular expressions and carries a lot of extra overhead for them.
However, one algorithmic change might give you a much bigger speedup. Assuming this Comparator is going to be used to sort your ~100,000 Strings, don't write a Comparator<String>: in the course of the sort the same String will likely be compared multiple times, so you will split it multiple times, and so on.
Split all the Strings once into String[]s, and have a Comparator<String[]> sort the String[]s. Then, at the end, you can join them all back together.
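A minimal sketch of the split-once, sort, join-back approach. It assumes pipe-delimited records and a hypothetical field-by-field ordering (plain string comparison per field); substitute your real comparison logic:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PreSplitSort {
    // Hypothetical ordering: compare field by field as plain strings;
    // if all shared fields are equal, the record with fewer fields sorts first
    static final Comparator<String[]> FIELD_WISE = (a, b) -> {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    static List<String> sortRecords(List<String> lines) {
        // 1. Split every String exactly once
        List<String[]> split = new ArrayList<>(lines.size());
        for (String line : lines) {
            split.add(line.split("\\|", -1));
        }
        // 2. Sort the arrays; the comparator never re-splits anything
        split.sort(FIELD_WISE);
        // 3. Join the fields back into Strings at the end
        List<String> sorted = new ArrayList<>(split.size());
        for (String[] fields : split) {
            sorted.add(String.join("|", fields));
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("b|1");
        lines.add("a|2");
        lines.add("a|10");
        System.out.println(sortRecords(lines)); // [a|10, a|2, b|1]
    }
}
```

Because each record is split exactly once, the splitting cost grows with the number of Strings rather than with the number of comparisons the sort performs.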
Alternatively, you could use a Map to cache the String -> String[] mapping (or vice versa). Note that you are trading memory for speed, so hope you have lots of RAM. A rough sketch:
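import java.util.HashMap;
import java.util.Map;

// mySuperSplitter and compareAsArrays are placeholders for your own split and comparison logic
Map<String, String[]> cache = new HashMap<>();

int compare(String s1, String s2) {
    // Split each String only the first time it is seen
    String[] cached1 = cache.get(s1);
    if (cached1 == null) {
        cached1 = mySuperSplitter(s1);
        cache.put(s1, cached1);
    }
    String[] cached2 = cache.get(s2);
    if (cached2 == null) {
        cached2 = mySuperSplitter(s2);
        cache.put(s2, cached2);
    }
    return compareAsArrays(cached1, cached2); // real comparison done here
}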
You need a regular expression like "\\s+", which means: split wherever one or more whitespace characters occur. In Java:
// "\\s+" is a valid constant pattern, so no PatternSyntaxException handling is needed
String[] splitArray = input.split("\\s+");
For example, String[] result = "hi i'm paul".split("\\s+"); splits across one or more spaces.
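One detail worth knowing: if the input starts with whitespace, split("\\s+") returns a leading empty string, because a positive-width match at index 0 produces one. A small sketch (the words helper is illustrative) that trims first:

```java
import java.util.Arrays;

class WhitespaceSplit {
    // Trim before splitting so leading whitespace does not yield an empty first token
    static String[] words(String input) {
        String trimmed = input.trim();
        // "".split("\\s+") would return [""], so treat blank input as having no words
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("  hi i'm paul".split("\\s+"))); // [, hi, i'm, paul]
        System.out.println(Arrays.toString(words("  hi i'm paul ")));       // [hi, i'm, paul]
    }
}
```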
Or you could take a look at Apache Commons Lang StringUtils. Its StringUtils.split(String str) method splits a string using whitespace as the delimiter, and it also has other useful utility methods.