I'm using a Scanner with a custom delimiter and I've come across a strange behaviour I'd like to understand.
I'm using this program:
Scanner sc = new Scanner("Aller à : Navigation, rechercher");
sc.useDelimiter("\\s+|\\s*\\p{Punct}+\\s*");
String word = "";
while (sc.hasNext()) {
    word = sc.next();
    System.out.println(word);
}
The output is:
Aller
à

Navigation
rechercher
First, I don't understand why I'm getting a blank token. The documentation says:
Depending upon the type of delimiting pattern, empty tokens may be returned. For example, the pattern "\s+" will return no empty tokens since it matches multiple instances of the delimiter. The delimiting pattern "\s" could return empty tokens since it only passes one space at a time.
I'm using \\s+, so why does it return a blank token?
There is another thing I'd like to understand about the regex. If I change the delimiter to the "reversed" regex:
sc.useDelimiter("\\s*\\p{Punct}+\\s*|\\s+");
The output is correct and I get:
Aller
à
Navigation
rechercher
Why does it work this way?
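One way to investigate this is to list what each delimiter regex actually matches in the input. The sketch below is only an illustration: the class `DelimiterMatches` and the helper `delimiters` are hypothetical names, not part of the original program. It relies on two things: Java regex alternation tries its alternatives from left to right at each position, and Scanner returns an empty token whenever two delimiter matches are directly adjacent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterMatches {
    // Hypothetical helper: returns every match of the delimiter regex, in order,
    // wrapped in single quotes so that the whitespace is visible.
    static List<String> delimiters(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add("'" + m.group() + "'");
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Aller à : Navigation, rechercher";

        // \s+ is tried first, so the space before ':' is consumed on its own
        // and ": " becomes a second, adjacent delimiter.
        System.out.println(delimiters("\\s+|\\s*\\p{Punct}+\\s*", input));
        // prints [' ', ' ', ': ', ', ']

        // With the punctuation alternative first, " : " is matched as one delimiter.
        System.out.println(delimiters("\\s*\\p{Punct}+\\s*|\\s+", input));
        // prints [' ', ' : ', ', ']
    }
}
```

With the first regex, the two delimiters `' '` and `': '` sit next to each other after "à", which would account for the blank token. With the reversed regex there is nothing between delimiters except real words.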
EDIT:
With this input:
Scanner sc = new Scanner("(23 ou 24 minutes pour les épisodes avec introduction) (approx.)1");
sc.useDelimiter("\\s*\\p{Punct}+\\s*|\\s+"); //second regex
I still have a blank token between introduction and approx. Is it possible to avoid it?
"[\\s\\p{Punct}]+"? Or am I over-simplifying the problem? – Hovercraft Full Of Eels 37 mins ago
\\s+|\\p{Punct}+ (I started with this one, didn't mention it) was doing the same as your one but it's not, why? – alain.janinm 29 mins ago
\\s*\\p{Punct}+\\s*|\\s+ and \\s+|\\s*\\p{Punct}+\\s* – alain.janinm 27 mins ago
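To compare the regex from the EDIT with the character class suggested in the comments, a small sketch like the following can be used. The class `CharClassDelimiter` and the helper `tokens` are hypothetical names introduced for illustration. The idea is that a single character class repeated with `+` consumes any mix of whitespace and punctuation in one match, so two delimiters can never end up adjacent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CharClassDelimiter {
    // Hypothetical helper: collects every token the Scanner returns
    // for the given delimiter regex, including empty ones.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiterRegex);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "(23 ou 24 minutes pour les épisodes avec introduction) (approx.)1";

        // ") " is matched as one delimiter and "(" as the next one,
        // so an empty token appears between them.
        System.out.println(tokens(input, "\\s*\\p{Punct}+\\s*|\\s+"));

        // ") (" is matched as a single delimiter, so no empty token is produced.
        System.out.println(tokens(input, "[\\s\\p{Punct}]+"));
    }
}
```

Note that `\p{Punct}` only covers ASCII punctuation by default, so this sketch assumes the input contains no other punctuation characters that should act as separators.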