In this tutorial, I show some simple exercises that can be used for learning and understanding the String split() method and a few other methods.
- Understanding the two flavors of the split method
- Simplest thing first – Using the split() method with one character delimiter
- How about using another string as a delimiter?
- How about using multiple characters as a delimiter?
- How about using multiple words as a delimiter?
- Keeping a watch on a few special characters
- String Split using Java 1.1 Style – Using StringTokenizer class
Understanding the two flavors of the split method
The java.lang.String class contains two flavors of the split() method, which can be used to split a string. Here are the Javadoc definitions of these two methods:
public String[] split(String regex) – This method splits the string around matches of the given regular expression and returns an array of strings.
public String[] split(String regex, int limit) – This method splits the string around matches of the given regular expression and returns an array of strings. The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero, then the pattern will be applied at most n – 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive, then the pattern will be applied as many times as possible and the array can have any length. If n is zero, then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
So, for example, if you have the string ["This is a String"], then:
a) calling split(" ") (space as the delimiter) on this string will convert it to an array of four strings:
[0] This
[1] is
[2] a
[3] String
b) calling split(" ", 2) on this string will convert it to an array of two strings:
[0] This
[1] is a String
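The effect of the limit parameter is easier to see side by side. Here is a minimal sketch; the class name SplitLimitDemo and the helper describe are made up for this illustration, and the input "a,b,," is an assumed example chosen because it ends with empty fields.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Hypothetical helper: splits the input and renders the resulting array as text.
    static String describe(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        String str = "This is a String";

        // Positive limit: the pattern is applied at most limit - 1 times.
        System.out.println(describe(str, " ", 2)); // [This, is a String]
        System.out.println(describe(str, " ", 3)); // [This, is, a String]

        String trailing = "a,b,,"; // two empty fields at the end

        // Zero limit (same as split(regex)): trailing empty strings are discarded.
        System.out.println(describe(trailing, ",", 0));  // [a, b]

        // Negative limit: trailing empty strings are kept.
        System.out.println(describe(trailing, ",", -1)); // [a, b, , ]
    }
}
```

A negative limit is worth remembering when the input is something like a delimited record where empty trailing fields still matter.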
Simplest thing first – Using the split() method with one character delimiter
As you can see, the split method takes a regular expression, so we should be able to use one or more characters as a delimiter. To start with a simple example, I have chosen the comma "," as the delimiter for my sentence. So, for example, if you have the string ["This,is,a,String"], then a split using a comma as the delimiter should result in an array of four strings. See the example code below.
public static void singleCharacterDelimiterTest() {
    String str = "This,is,a,String";
    String delimiter = ",";
    String[] splitStrings = str.split(delimiter);
    for (int i = 0; i < splitStrings.length; i++) {
        System.out.println(splitStrings[i]);
    }
}
How about using another string as a delimiter?
Yes, you can always use a string as a delimiter. Remember that the delimiter is a regular expression, so any valid regular expression can be passed as the value of this parameter. So, for example, if you have the string ["ThisWORDisWORDaWORDString"], then a split using the word [WORD] as a delimiter is going to result in an array of four strings. See the example code below.
public static void singleWordDelimiterTest() {
    String str = "ThisWORDisWORDaWORDString";
    String delimiter = "WORD";
    String[] splitStrings = str.split(delimiter);
    for (int i = 0; i < splitStrings.length; i++) {
        // Print the index as well, so the output matches the listing that follows.
        System.out.println("[" + i + "] " + splitStrings[i]);
    }
}
The output of this method will be
[0] This
[1] is
[2] a
[3] String
How about using multiple characters as a delimiter?
There can be scenarios where you want to split the string on more than one delimiter character. So, for example, if you have the string ["This,is:a;String"], then a split using [comma or colon or semicolon] as the delimiter is required. You need not be an expert in regular expressions to do this; a simple use of the pipe (|) operator is sufficient. So the regular expression for these three delimiters will be [",|:|;"]. See the example code below.
public static void multiCharacterDelimiterTest() {
    String str = "This,is:a;String";
    String delimiter = ",|:|;";
    String[] splitStrings = str.split(delimiter);
    for (int i = 0; i < splitStrings.length; i++) {
        System.out.println("[" + i + "] " + splitStrings[i]);
    }
}
The output of this method will be
[0] This
[1] is
[2] a
[3] String
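For single-character delimiters, a character class is a common alternative to the pipe operator. The sketch below is only an illustration: the class and method names are invented here, and the second helper assumes you also want to swallow optional whitespace around each delimiter, which the original example does not do.

```java
import java.util.Arrays;

public class CharacterClassSplitDemo {

    // Same three delimiters as ",|:|;", written as a character class.
    static String[] splitOnPunctuation(String input) {
        return input.split("[,:;]");
    }

    // Assumption: whitespace around a delimiter should be treated as part of it.
    static String[] splitAndTrim(String input) {
        return input.split("\\s*[,:;]\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPunctuation("This,is:a;String")));
        // [This, is, a, String]

        System.out.println(Arrays.toString(splitAndTrim("This, is: a; String")));
        // [This, is, a, String]
    }
}
```

Note that splitAndTrim only removes whitespace next to a delimiter; leading or trailing whitespace of the whole input is left alone.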
How about using multiple words as a delimiter?
Let's assume a hypothetical scenario where you want to split the string on several different words. So, for example, if you have the string ["ThisWORD1isWORD2aWORD3String"], then a split using [WORD1 or WORD2 or WORD3] as the delimiter is required. Again, the pipe (|) operator is sufficient to deal with this. So the regular expression for these three delimiters will be ["WORD1|WORD2|WORD3"]. See the example code below.
public static void multiWordDelimiterTest() {
    String str = "ThisWORD1isWORD2aWORD3String";
    String delimiter = "WORD1|WORD2|WORD3";
    String[] splitStrings = str.split(delimiter);
    for (int i = 0; i < splitStrings.length; i++) {
        System.out.println("[" + i + "] " + splitStrings[i]);
    }
}
The output of this method will be
[0] This
[1] is
[2] a
[3] String
Keeping a watch on a few special characters
As we discussed in the previous scenarios, the delimiter is a regular expression, and therefore a few characters have special meanings. If any such character needs to be used as a delimiter, it must be escaped with a backslash in the regular expression. Inside a Java string literal the backslash itself must also be escaped, so it is written twice (\\). Some example characters are the pipe (|), dollar sign ($), dot (.) and caret (^). See the example code below.
public static void specialCharaterDelimiterTest() {
    String str = "This|is^a$String";
    // Two backslashes per character: one for the Java string literal, one for the regex engine.
    String delimiter = "\\||\\^|\\$";
    String[] splitStrings = str.split(delimiter);
    for (int i = 0; i < splitStrings.length; i++) {
        System.out.println("[" + i + "] " + splitStrings[i]);
    }
}
The output of this method will be
[0] This
[1] is
[2] a
[3] String
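If the delimiter is meant to be taken literally, one way to avoid escaping by hand is java.util.regex.Pattern.quote(), which turns any string into a literal pattern. This is a small sketch rather than part of the original examples; the class name QuotedDelimiterDemo and the helper splitLiteral are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedDelimiterDemo {

    // Hypothetical helper: treats the delimiter as plain text, not as a regular expression.
    static String[] splitLiteral(String input, String literalDelimiter) {
        return input.split(Pattern.quote(literalDelimiter));
    }

    public static void main(String[] args) {
        // The pipe would otherwise be interpreted as the regex "or" operator.
        System.out.println(Arrays.toString(splitLiteral("This|is|a|String", "|")));
        // [This, is, a, String]

        // The dot would otherwise match any character.
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));
        // [a, b, c]
    }
}
```

The trade-off is that a quoted delimiter can only be one fixed string; to split on several different special characters at once, the escaped alternation shown earlier is still needed.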
You can try many other regular expressions and explore the power of this method. Visit http://www.regular-expressions.info/ to get a more in-depth understanding of regular expressions.
String Split using Java 1.1 Style – Using StringTokenizer class
StringTokenizer is a legacy class which has been part of the JDK since version 1.1. The StringTokenizer class allows an application to break a string into tokens. The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis. StringTokenizer's nextToken() method can be used to produce the same tokens as the multi-character split example. See the example code below.
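// Requires: import java.util.StringTokenizer;
public static void splitByTokenizer() {
    // Each character in ",:;" is treated as a separate delimiter.
    StringTokenizer st = new StringTokenizer("This,is:a;String", ",:;");
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }
}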
The output of this method will be
This
is
a
String
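The two approaches do not always agree, though. One difference shows up when two delimiters sit next to each other: StringTokenizer skips the empty token, while split() returns it. The sketch below uses an assumed input with a doubled comma; the class name TokenizerVsSplitDemo and its helpers are invented for this comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplitDemo {

    // Collects all tokens produced by StringTokenizer into a list.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Splits on any of the same three delimiter characters using a regex.
    static List<String> split(String input) {
        return Arrays.asList(input.split("[,:;]"));
    }

    public static void main(String[] args) {
        String input = "This,,is:a;String"; // note the doubled comma

        System.out.println(tokenize(input, ",:;")); // [This, is, a, String]
        System.out.println(split(input));           // [This, , is, a, String]
    }
}
```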
The split() method was introduced in JDK version 1.4, and the use of StringTokenizer has been discouraged since then. Here is what Sun says in the JDK 1.6 Javadocs: "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."
Let me know your comments/feedback on this tutorial.
Hi,
You've a problem in your 3rd example, the output must be :
[0] This
[1] is
[2] a
[3] String
and not
[0] D
[1] sM
[2] sTh
[3] String
😉
@Baptiste Thanks for pointing that out. I have corrected it now. Let me know if you have any other comments?
Small remark: you're escaping your special characters (|, ^ and $) twice, effectively escaping the escape character.
BTW, this form doesn't allow cut/copy/paste, or using home/end/ctrl/shift/arrows… What's up with that? (Using latest Firefox.)
Excellent post for newbies.
Nice post…
@Anonymous –
1. The two escape characters are required here: one is for Java string escaping and the other is for the regex engine. If we use only one escape character, the Java program will not compile. Let me know if you need more details.
2. I have seen that problem too sometimes. I just tried searching on Google for this issue and found there is malware which affects Firefox. Please check this link for details, and do let me know if you still see the problem: Firefox Copy & Paste Bug
This comment is from IE6.0 : checking the copy paste bug on IE6.0. It seems to be working fine. No issues with copy/paste and home/end keys work perfectly alright. Need to check some other browsers too.
Looks like I am able to reproduce this issue in my Firefox 3.5.3. The issue seems to be reproducible only if I log out of my Google account and try to enter a comment. IE6.0 was looking fine without logging in to a Google account. Here are the issues:
1. Home key doesn't work.
2. Arrow keys don't work.
3. Right-click menu does not show any copy/paste options.
4. Cannot select all using "Ctrl+A".
Readers: Till I figure out the solution for this issue, please use an already logged-in Google account if you want to do copy/paste and other keyboard activities in the comment textarea.
This is a comment from Google Chrome 3.0.195.21. The comment form seems to be working fine. No issues even when not logged in to a Google account.
What happens if the delimiter is not found in the String to split?
For example:
String str = "Hello World";
String[] splitStrings = str.split("*");
What is going to be the value of splitStrings?
Thanks in advance.
@Anonymous – If the delimiter is not found then original String should be returned as it is.
So in case of your example the splitStrings array size will be 1 and splitStrings[0] will have value "Hello World".
Thanks for your answer, this is very useful.
Actually, when I tried my example above (posted on 10/31/2009), I found out that * and + are reserved characters for regex, and they need to be enclosed in [] for the method to work – otherwise the following exception is thrown:
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
You are right. Any regular expression special character needs to be escaped inside Java String. For example for a String
"This|is+a$String"
if we use the delimiter expression
"\\||+|\\$"
then it's going to throw the error below.
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 3
\||+|\$
   ^
at java.util.regex.Pattern.error(Unknown Source)
at java.util.regex.Pattern.sequence(Unknown Source)
at java.util.regex.Pattern.expr(Unknown Source)
at java.util.regex.Pattern.compile(Unknown Source)
at java.util.regex.Pattern.<init>(Unknown Source)
at java.util.regex.Pattern.compile(Unknown Source)
at java.lang.String.split(Unknown Source)
at java.lang.String.split(Unknown Source)
at RegExTest.specialCharaterDelimiterTest(RegExTest.java:59)
at RegExTest.main(RegExTest.java:53)
Instead, if we escape the plus (+) character using a double backslash like this:
"\\||\\+|\\$"
then proper result will be displayed.
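To tie the last few comments together, here is a minimal sketch of the three cases discussed: a delimiter that is not found, an unescaped metacharacter, and the escaped or bracketed form. The class name MissingDelimiterDemo and the helper trySplit are made up for this illustration, and the "Hello*World" input is an assumed example.

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

public class MissingDelimiterDemo {

    // Hypothetical helper: returns the split result as text, or notes that the regex was invalid.
    static String trySplit(String input, String regex) {
        try {
            return Arrays.toString(input.split(regex));
        } catch (PatternSyntaxException e) {
            return "PatternSyntaxException: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Delimiter not present: the array holds the original string only.
        System.out.println(trySplit("Hello World", ","));   // [Hello World]

        // A bare "*" is not a valid regular expression, so split() throws.
        System.out.println(trySplit("Hello World", "*"));

        // Escaping the metacharacter, or wrapping it in [], makes it a literal.
        System.out.println(trySplit("Hello*World", "\\*")); // [Hello, World]
        System.out.println(trySplit("Hello*World", "[*]")); // [Hello, World]
    }
}
```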