JavaScript Lesson 10. String Manipulation

Lesson objectives

A recap of strings

Just to recap something that you met in the first lesson: A string is a sequence of characters (letters, spaces and assorted punctuation and other symbols) which are put together end to end, to form what might be termed a "sentence". I put that last part in inverted commas as all the following are strings:

"To be or not to be, that is the question."
"x < y"
"#######"

Only one of those strings could reasonably be termed a sentence in the traditional sense of the word. You will also notice that the strings are enclosed within double quotation marks. This makes them string constants, in the same way that 5 or -6.7 are numeric constants. In fact, string constants can be enclosed within single quotation marks instead; JavaScript isn’t fussy. The examples above could just as easily have been written as follows:

'To be or not to be, that is the question.'
'x < y'
'#######'

However, you can’t mix the two – i.e. start a string constant with a double quotation mark and then end it with a single one or vice-versa. No, no! The following would be illegal:

"To be or not to be, that is the question.'
'x < y"
"#######'

As you found out in the first lesson, you can set variables to string constants using simple assignment:

var proverb = "Every mushroom cloud has a uranium lining";
name = "Henry Smith";

The fact that you can use either single quotation marks or double quotation marks is quite useful. It means that you can include one kind of quotation mark within a string, as long as you use the other kind to enclose it, without JavaScript getting confused:

name = 'Henry "Honest" Smith';

In this case, the double quotation marks form part of the string, so we had to use single quotation marks to define where the string constant started and finished. If we had used double quotation marks, then JavaScript would have assumed that the string finished after the word Henry:

name = "Henry "Honest" Smith";

In this case, JavaScript would read "Henry " as a complete string constant and then not know how to deal with the word Honest that follows it, so the program would stop with a syntax error. In the following example, the string must be enclosed within double quotation marks to stop JavaScript getting confused:

name = "Diane O'Brien";

In this case, using single quotation marks would lead JavaScript to treat 'Diane O' as the whole string constant and then not understand how to interpret the word Brien, which is again a syntax error:

name = 'Diane O'Brien';
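The rules above can be checked with a few legal assignments. This is a minimal sketch that uses console.log() in place of document.write() so that it can also run outside a web page (for example in Node.js); the variable names are just for illustration.

```javascript
// Both kinds of quotation mark produce exactly the same string
var a = "x < y";
var b = 'x < y';
console.log(a === b);          // true

// Double quotation marks inside a single-quoted string
var name1 = 'Henry "Honest" Smith';
console.log(name1);            // Henry "Honest" Smith

// An apostrophe (single quotation mark) inside a double-quoted string
var name2 = "Diane O'Brien";
console.log(name2);            // Diane O'Brien

// The enclosing quotation marks are not part of the string itself
console.log(a.length);         // 5
```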

Finding a character at a given position

String variables have a method called charAt() which returns the character at a given position in a string. The characters in the string are counted from 0 (in a similar way to elements of an array), so the first character is given by charAt(0), the second by charAt(1) etc. Of course, the number given to specify the character position can be a variable name or an expression:

var name = "Pete Smith";
var x = 2;
document.write(name.charAt(3 * x - 1));

In this example, 3 * x - 1 is equivalent to 3 * 2 - 1, i.e. 5, so the last instruction is equivalent to

document.write(name.charAt(5));

This would display the letter "S", which is character 5 of the string. If you ran the instruction

document.write(name.charAt(4));

in this case, the character displayed would be the space in the middle of the name, and you wouldn’t see anything on the screen. This can often lead to confusion: If you don’t see any output when a character is displayed on the screen, double check to make sure that the character you are trying to display isn’t a space character.
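The following sketch runs the charAt() examples with console.log() standing in for document.write(). Two things in it go slightly beyond the text above: wrapping the result in square brackets is just a trick to make an invisible space character show up, and name.length is the standard property giving the number of characters in a string.

```javascript
var name = "Pete Smith";
var x = 2;

// The position can be any expression: 3 * 2 - 1 evaluates to 5
console.log(name.charAt(3 * x - 1));        // S

// Position 4 holds the space; the brackets make it visible
console.log("[" + name.charAt(4) + "]");    // [ ]

// Positions run from 0 up to name.length - 1
console.log(name.charAt(0));                // P
console.log(name.charAt(name.length - 1));  // h

// A position past the end gives an empty string rather than an error
console.log(name.charAt(50) === "");        // true
```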

Extracting substrings from a string

What is a substring? It is a small string which is found inside another string. For instance, the strings "cat", "sat on t", "n the ma" and " " are all substrings of the string "The cat sat on the mat."

String variables have a method called substring() built into them which gives you a substring of the string whose method is called. It takes two parameters: the position of the first character you want, and the position one after the last character you want. This is best explained with an example:

var x = "My name is Richard Bowles";
alert(x.substring(11,18));

The substring that would be displayed by the alert() method is the word Richard. This is because character number 11 is the letter "R" (remember, the counting starts at 0) and character 18 is the space character just after the letter "d". It is important to remember that the second parameter is not the index of the last character you want displayed but one position after that.

If the second parameter is larger than the length of the string, then the substring returned is the rest of the string from the starting position onwards. For instance, if the alert() command were changed to

alert(x.substring(11,200));

then the substring displayed would be Richard Bowles. Clearly the string variable x doesn’t have 200 characters in it. The program has given you all the characters up to the end of the string and then stopped. The same thing happens if you miss out the second parameter altogether. Again, the substring returned is the entire string from the specified point onward (i.e. until the end of the string). If the command above were changed to

alert(x.substring(11));

then, again, the substring displayed would be Richard Bowles.
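Here are the three substring() calls side by side, using console.log() instead of alert() so the sketch can run outside a browser. The last line illustrates why the second parameter is "one past" the last character: the length of the result is then simply the difference between the two positions.

```javascript
var x = "My name is Richard Bowles";

// Start at position 11 ("R"), stop just before position 18 (the space)
var first = x.substring(11, 18);
console.log(first);                     // Richard

// An end position beyond the end of the string is cut back to the end
console.log(x.substring(11, 200));      // Richard Bowles

// Leaving out the second parameter means "to the end of the string"
console.log(x.substring(11));           // Richard Bowles

// The number of characters returned is end - start
console.log(first.length === 18 - 11);  // true
```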

Finding one substring inside another

There is a method built into string variables called indexOf() which is used to search for a substring within another string. It returns the character position of the first character of the substring to be found, or -1 if the substring is not present. Again, an example should clear this up:

var string1 = "This is an example of a string.";
document.write(string1.indexOf("example"));

In this case, the number displayed is 11, as the substring example does appear in the variable string1, and the first character of the substring (the "e" in "example") appears at character position 11 in the string. If I replaced the instruction with the following:

document.write(string1.indexOf("examples"));

then the number displayed would be -1, because the substring examples does not appear in the main string, even though the substring example does. The method as shown only finds the first occurrence of the substring. The search is case sensitive, i.e. it treats upper case and lower case letters differently. For example

document.write(string1.indexOf("Example"));

would display the result -1 as the substring Example is not found in the main string. Just as with the substring() method, there is an optional second parameter. In this case, it specifies the starting position at which to search. For instance, compare the following searches:

<script>
var saying = "Tony and Tony's cousin, also called Tony, are here.";
document.write(saying.indexOf("Tony") + "<p />");
document.write(saying.indexOf("Tony",6) + "<p />");
document.write(saying.indexOf("Tony",20) + "<p />");
document.write(saying.indexOf("Tony",11) + "<p />");
</script>

The first search finds the first occurrence of Tony in the string, which is clearly at position 0 as it is the first thing that appears in the string. The second search starts at character position 6, i.e. the letter n of and, and so it finds the first occurrence of Tony after that, i.e. at position 9. Similarly, the third search starts at position 20, so it finds the last occurrence of Tony, i.e. the one directly before the comma, at position 36.

The last search is interesting, as it starts at position 11, which is halfway through the second Tony. In this case, it will not find the Tony in Tony's, as it is starting the search in the middle of that word. Instead, it finds the last occurrence of Tony in the sentence (position 36 again), as that is the first complete occurrence of the name that it comes across. It is, after all, trying to find Tony in the string "ny's cousin, also called Tony, are here."
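The sketch below repeats those searches with console.log() in place of document.write(), and then uses the optional second parameter for a common job: visiting every occurrence in turn by starting each new search one position after the previous match. The positions array is only there to collect the results; it is an illustration, not part of the lesson's code.

```javascript
var saying = "Tony and Tony's cousin, also called Tony, are here.";

console.log(saying.indexOf("Tony"));      // 0
console.log(saying.indexOf("Tony", 6));   // 9
console.log(saying.indexOf("Tony", 20));  // 36
console.log(saying.indexOf("Tony", 11));  // 36

// The search is case sensitive, so "tony" is not found
console.log(saying.indexOf("tony"));      // -1

// Testing whether a substring is present at all
if (saying.indexOf("cousin") > -1) {
  console.log("cousin found at position " + saying.indexOf("cousin"));
}

// Find every occurrence: restart each search just after the last match
var positions = [];
var pos = saying.indexOf("Tony");
while (pos > -1) {
  positions.push(pos);
  pos = saying.indexOf("Tony", pos + 1);
}
console.log(positions.join(", "));        // 0, 9, 36
```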

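Curiously, string variables have another method built into them called search(). Used in its simple form it does almost the same job as indexOf(), although it doesn't allow you to specify a starting position. There is one catch: search() always treats its parameter as a regular expression (a special data type for describing patterns), so characters that have a special meaning in regular expressions, such as the full stop, don't behave as plain text. For searching for ordinary text, indexOf() is the safer choice; search() is more powerful when it is given a proper regular expression.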
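A small sketch of where search() and indexOf() agree and where they differ, with console.log() standing in for document.write(). It relies on the standard behaviour that a string passed to search() is turned into a regular expression, in which a full stop means "any character".

```javascript
var s = "Pete Smith.";

// For ordinary letters, search() and indexOf() give the same answer
console.log(s.indexOf("Smith"));  // 5
console.log(s.search("Smith"));   // 5

// Both return -1 when there is no match
console.log(s.search("xyz"));     // -1

// indexOf() looks for a real full stop...
console.log(s.indexOf("."));      // 10
// ...but search() treats "." as "any character" and matches position 0
console.log(s.search("."));       // 0
```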

The split() method

This method takes a string and creates an array of substrings from it. It splits the string (leaving the original intact, of course) every time it comes across a separator character specified by you. Perhaps an example will make it clear:

var text = "Hello#2#7.4#Goodbye";
var x = text.split("#");

The crucial line is the second one. It creates an array, called x, which contains four elements, namely x[0], which holds Hello, x[1], which holds 2, x[2], which holds 7.4 and x[3] which holds Goodbye. All four elements are strings, so x[1] holds the string "2" rather than the number 2. These are the four "words" present in the original string, except that the # character has been used to separate them rather than spaces. If we had given this instruction, instead:

var x = text.split(".");

then the array x would have had only two elements, x[0] which contains Hello#2#7 and x[1] which contains 4#Goodbye. Please note that the original string is completely unaltered (so you can use the split() method built into it as many times as you like) and also that the character used for splitting (the # in the first case and the full stop in the second case) does not appear in any of the strings produced. There is no reason why you can't use more than one character to act as the split:

textToSplit = "This_is_!_a_!_peculiar_string";
array2 = textToSplit.split("_!_");

This produces an array with elements array2[0] which is This_is, array2[1] which is a and array2[2] which is peculiar_string. You will notice that the individual underscore character, _, has not caused a split; only the complete pattern _!_ has.

One other thing that you have probably noticed is that you don't need to declare the array with the command new Array as you had to when you first met arrays. This is because the split() method creates a brand new array and hands it back, so the variable simply holds that array; there is nothing to set up in advance.
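The split() examples are gathered into one runnable sketch below, with console.log() in place of document.write(). It also shows that the pieces split() produces are always strings, even when they look like numbers: adding 1 to "2" with + joins them into "21". Number() is a standard built-in function for converting a string to a number; it is not covered in this lesson.

```javascript
var text = "Hello#2#7.4#Goodbye";

var x = text.split("#");
console.log(x.length);            // 4
console.log(x[1] + 1);            // 21  (string "2" joined with 1)
console.log(Number(x[1]) + 1);    // 3   (converted to a number first)

// Splitting on the full stop instead gives two pieces
var y = text.split(".");
console.log(y.length);            // 2
console.log(y[0]);                // Hello#2#7

// A separator of several characters is matched as a whole
var array2 = "This_is_!_a_!_peculiar_string".split("_!_");
console.log(array2.join(" | "));  // This_is | a | peculiar_string

// The original string is untouched
console.log(text);                // Hello#2#7.4#Goodbye
```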

The replace() method

This is another method built into strings. It produces a version of the string with the first occurrence of one specified substring replaced with another. Here's what I mean:

var s = "Tony was here. Not Bob, Tony's brother, but Tony himself.";
document.write(s.replace("Tony","Jim"));

The function call in this case returns a version of the sentence with the substring Tony replaced by Jim the first time it occurs:

Jim was here. Not Bob, Tony's brother, but Tony himself.

The original is not changed in any way; the altered version is simply a copy of the original with the replacement made. The method does not tell you where in the string the replacement occurred. If the substring to be replaced is not present, then the method produces a string which is identical to the original. Note also that the substring will be replaced even if it is part of a larger word:

s = "The Queen lives in Buckingham Palace.";
var s2 = s.replace("ham","pork");

This would set s2 to The Queen lives in Buckingpork Palace. If you want every occurrence of the substring replaced, then you need some sort of loop:

var s = "Tony was here. Not Bob, Tony's brother, but Tony himself.";
while (s.indexOf("Tony") > -1)
 s = s.replace("Tony","Jim");

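In this case, the string s is changed, as it is reassigned in the statement within the loop. The loop executes as long as the s.indexOf() method finds an occurrence of the substring, and that substring is immediately replaced! Be careful, though: if the replacement text itself contains the substring being searched for (replacing Tony with Tony Smith, for example), indexOf() will always find another occurrence and the loop will never end.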
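A loop-free way to replace every occurrence is sketched below. It uses split() together with the standard array method join(), which glues the elements of an array back together with a given string between them; splitting on the old text removes every copy of it, and joining with the new text puts the replacement where each copy used to be. The helper name replaceEvery is hypothetical. Newer JavaScript engines (ES2021 onwards) also provide a built-in replaceAll() method for much the same job.

```javascript
// Hypothetical helper: replace every occurrence of find with replacement.
// Assumes find is a non-empty string.
function replaceEvery(str, find, replacement) {
  return str.split(find).join(replacement);
}

var s = "Tony was here. Not Bob, Tony's brother, but Tony himself.";

// replace() on its own changes only the first occurrence
console.log(s.replace("Tony", "Jim"));
// Jim was here. Not Bob, Tony's brother, but Tony himself.

console.log(replaceEvery(s, "Tony", "Jim"));
// Jim was here. Not Bob, Jim's brother, but Jim himself.

// Safe even when the replacement contains the text being searched for,
// which would make a while loop around replace() run forever
console.log(replaceEvery(s, "Tony", "Tony Smith"));
// Tony Smith was here. Not Bob, Tony Smith's brother, but Tony Smith himself.
```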

The toLowerCase() and toUpperCase() methods

These methods are also part of string variables. They return copies of the string with the letter characters either all in upper case (capital letters) or all in lower case ("small" letters). Characters which don't represent letters of the alphabet are not changed. The original string is, of course, left unchanged:

s = "Here Are some random Characters: #123$%^";
s2 = s.toUpperCase();
var s3 = s.toLowerCase();

This program segment sets the variable s2 to

HERE ARE SOME RANDOM CHARACTERS: #123$%^

and the variable s3 to

here are some random characters: #123$%^

The variable s itself is not changed by either of these two method calls.
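Because indexOf() is case sensitive, the case-conversion methods are handy for case-insensitive searching and comparison: convert both strings to the same case first. A minimal sketch, with console.log() standing in for document.write(); sameIgnoringCase is a hypothetical helper name.

```javascript
var s = "Here Are some random Characters: #123$%^";
console.log(s.toUpperCase());  // HERE ARE SOME RANDOM CHARACTERS: #123$%^
console.log(s.toLowerCase());  // here are some random characters: #123$%^

// indexOf() is case sensitive, so "here" does not match "Here"...
console.log(s.indexOf("here"));                // -1
// ...but searching a lower-case copy does find it
console.log(s.toLowerCase().indexOf("here"));  // 0

// Hypothetical helper: compare two strings ignoring case
function sameIgnoringCase(a, b) {
  return a.toLowerCase() === b.toLowerCase();
}
console.log(sameIgnoringCase("Tony", "TONY")); // true
console.log(sameIgnoringCase("Tony", "Jim"));  // false
```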

The match() method

This isn't a particularly useful method when you use it with plain text. It only really becomes useful when we use it with those curious "regular expressions", which aren't explained here. However, it does have one use.

The match() method is used to find substrings within a main string, except that it produces an array of values, in much the same manner as split(). It produces an array with an element matching each substring found. Let's consider our example with Tony again:

var s = "Tony was here. Not Bob, Tony's brother, but Tony himself.";
var occur = s.match("Tony");

This creates an array called occur which contains one single element, matching the first occurrence of the word Tony in the string. The first (and only) element, occur[0] contains Tony.

In fact, you can use a regular expression (explained in a later chapter) to create an array containing all the occurrences of Tony in the string. Replace the match command with the following:

var occur = s.match(/Tony/g);

The use of / rather than quotation marks indicates that the match command is being passed a regular expression rather than a string, and the g means "global" i.e. find all the occurrences. This time the array occur has three elements, all of them containing Tony and corresponding to the three times it appears in the sentence.

Pretty useless, huh! Well, no. If you follow the code that you see above, with the following:

document.write(occur.length);

this will display the number 3, as that is the number of elements in the array. Effectively, this gives you a way of finding out how many times a substring occurs within the main string. One word of warning: if there are no matches at all, match() does not produce an empty array but the special value null, and trying to read the length of null causes an error, so check for null first. match() really comes into its own when it is used with regular expressions, which let you specify ambiguous substrings (i.e. finding all the words which start with "T" or all the words taking the form of a letter followed by two digits etc.)

Again, notice that there is no need to declare the array that is created (occur in this case) with new Array, as the match() method creates the array for you.
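The sketch below contrasts match() with and without the g flag and wraps the counting idea in a function that copes with the no-match case, where match() returns null instead of an array. It uses console.log() instead of document.write(); countMatches is a hypothetical helper, and it assumes the regular expression it is given has the g flag.

```javascript
var s = "Tony was here. Not Bob, Tony's brother, but Tony himself.";

// Without the g flag, only the first match is returned
var first = s.match("Tony");
console.log(first.length);   // 1
console.log(first[0]);       // Tony

// With the g flag, every match is returned
var occur = s.match(/Tony/g);
console.log(occur.length);   // 3

// Hypothetical helper: count matches of a global regular expression.
// match() gives null when nothing matches, so check before using length.
function countMatches(str, regex) {
  var found = str.match(regex);
  if (found === null) {
    return 0;
  }
  return found.length;
}
console.log(countMatches(s, /Tony/g));  // 3
console.log(countMatches(s, /Jim/g));   // 0
```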

If you want to count the number of occurrences of a substring within a main string (e.g. how many times Tony appears in a string), without having to use a regular expression, why not use the following?

document.write(s.split("Tony").length-1);

I will leave you to work out for yourself how it works.