String Functions
As the concatenation operator joins strings, the various string functions allow you to divide strings and manipulate what’s already in a string. This will allow you to
Separate a string of data into more workable pieces
Retrieve only a particular part of a string
Find the location of a substring you want to extract
Replace a substring with a different string
Extracting Substrings
Extracting substrings is simply a matter of knowing where within a string the information (another string) you want is located. Specifically, you have to know the index of the first character and the length of the string you want to extract.
For example, let’s assume you have a person’s Social Security number stored in a string and you want to use the last four digits of the number as the default PIN code.
Let’s assume your program (or a person) has already formatted the string so that it is simply a sequence of nine digits, without hyphens, spaces, or other characters separating the groups. Let’s say the Social Security number is 012-34-5678. The sequence is stored in a variable as follows:
$SSN = '012345678';
TIP
Notice that the string above must be within quotes or it will lose the initial zero. Although most numeric values may be easily converted back and forth between numbers and strings, this one would lose the zero as soon as it became a numeric type.
It is a good practice to enclose all numbers intended to be used as strings in quotes to denote them as strings and not numbers. Not doing so can not only yield strange results if the number begins with a zero, but it also makes your code somewhat obscure. Variables intended for use only as strings should be coded only as strings.
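To see the difference the quotes make, here is a small sketch. The variable names $asString and $asNumber are made up for this illustration.

```php
<?php
// Kept as a string, the leading zero survives.
$asString = '012345678';

// Converting the value to an integer drops the leading zero.
$asNumber = (int) $asString;

echo $asString . "\n"; // 012345678
echo $asNumber . "\n"; // 12345678
```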
Now, you want to retrieve the last four characters (in this case, digits) of a nine-character string. To do this, use the substr (substring) function. The syntax for substr follows:
string substr(string str, int start [, int length])
INTERPRETING SYNTAX GUIDES
The monospaced text you see just before this block is called a syntax guide. It’s a brief way of showing how a function is intended to be used that tells two important things about the function: what value is returned and what parameters it takes.
The function’s return value is given before the function name. In this case, it’s the first occurrence of string on that line.
After the function name, the parameters are given in parentheses, similarly to actually calling the function. However, the parameter types are given in addition to the typical parameter itself. Also, the parameters given here are italicized because they are symbolic names for what should be passed as that parameter.
Syntax guides can also tell you which parameters are optional. Optional parameters are enclosed in brackets so you’re aware of which parameters are optional and which aren’t.
str is the string you want to extract a substring from, start is the index of the first character to be extracted, and the optional parameter length is the length of the substring you wish to extract. If you leave length out, the substring returned will go all the way to the end of the string.
So, to get the last four characters of the Social Security number, use
$SSN_lastFour = substr($SSN, 5, 4);
The 5 here means the substring you get will start at the index position 5, which is the fourth character from the end of the string. The last parameter, 4, tells substr() to give us four characters—in this case, the last four. Figure 5.1 illustrates the extraction of the last four digits from the rest of the string.
Figure 5.1. The substring here is the last four characters of the nine-character string, starting at the character index 5 and continuing to the end.
NOTE
When a substring is extracted from a string, it is not removed, but rather only retrieved. For example, in the demonstration involving a Social Security number, $SSN will still be a nine-character string, and it will still be the same as it was before. You are not changing the string in any way; instead, you’re merely “taking a look” at what’s inside the string.
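A quick way to convince yourself of this is to print both values after the call. This is only a sketch that reuses the example values from above:

```php
<?php
$SSN = '012345678';
$SSN_lastFour = substr($SSN, 5, 4);

echo $SSN_lastFour . "\n"; // 5678
echo $SSN . "\n";          // 012345678 (the original is unchanged)
```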
The substr function is much more flexible than that, however. Let’s assume for a moment that you’re not sure if the Social Security number has its number groups separated by some character or not. Any of the following assignments could be true:
$SSN = '012345678';
or
$SSN = '012-34-5678';
or even
$SSN = '012.34.5678';
Independent of the rest of the string, if you know that the last four characters of the string are the last four digits of the number, you can retrieve the last four characters from the end.
Counting from the end is especially important in this case because you can’t be sure whether the string’s length will be 9 (just the nine digits) or 11 (the nine digits plus two separating characters). If you counted from the beginning of the string, you would then have the problem of figuring out what the starting position of the substring would be; it could be either 5 or 7. However, if you count from the end, the substring will always start 4 characters from the end.
To express this to substr, use a negative starting position. Doing so tells substr to count from the end of the string instead of from the beginning. However, unlike counting from the beginning of the string, when counting from the end, the first character is -1 (not 0 or -0).
The following statement retrieves the last four digits, regardless of the format of the string:
$SSN_lastFour = substr($SSN, -4);
Notice that the length parameter is omitted. Because you’re trying to retrieve everything up to the end of the string, it’s not necessary.
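As a rough check, you could run the same call against all three formats shown earlier. The $formats array and $results array are just helpers invented for this sketch.

```php
<?php
// The three possible formats of the same number.
$formats = array('012345678', '012-34-5678', '012.34.5678');

$results = array();
foreach ($formats as $SSN) {
    // A negative start counts from the end, so the format doesn't matter.
    $results[] = substr($SSN, -4);
    echo substr($SSN, -4) . "\n"; // 5678 each time
}
```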
Now, try using the length parameter. The length parameter determines how long the substring returned will be. For example, if length is specified as 2, the substring returned will be two characters long. The following example demonstrates this principle.
$str = 'abcdef';
echo substr($str, 0, 2); // outputs 'ab'
In this example, the substring begins at the very first position in the string, 0, and it’s 2 characters long. Thus, the substring returned is the first two characters of the string, ab.
The length parameter can also be negative. Like the start parameter, if the length parameter is negative, it means count from the end of the string. In this case, that many characters are left off the end of the string: the substring stops just before the character at that position, so the character at the position specified is not included in the substring. Again, -1 is the first character when counting from the end of the string.
Here’s an example:
$str = 'abcdef';
echo substr($str, 0, -2); // outputs 'abcd'
Now, instead of the length of the substring being 2, it’s however long it takes to stop 2 characters short of the end. The substring starts at the beginning (0), so everything from the first character up to, but not including, the character at position -2 (the e) is returned as the substring.
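The following sketch combines positive and negative values for start and length on the same string. The variable names are only for illustration; try changing the numbers to see how the result shifts.

```php
<?php
$str = 'abcdef';

$a = substr($str, 0, -2);  // from the start, stop 2 short of the end
$b = substr($str, 2, -1);  // from index 2, stop 1 short of the end
$c = substr($str, -3, -1); // from 3 before the end, stop 1 short of the end
$d = substr($str, -3, 2);  // from 3 before the end, take 2 characters

echo $a . "\n"; // abcd
echo $b . "\n"; // cde
echo $c . "\n"; // de
echo $d . "\n"; // de
```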
CAUTION
Because string index positions can be confusing, it’s a good idea to check the result of substr calls with several different strings to make sure it is doing what you want it to do. If it’s not, you can adjust the parameters you’re passing to it without too much of a hassle; if you continue without testing, you may later find that you have a hard time even figuring out where the problem is.
You may find that sometimes you need the length of a string. This is helpful if you want to get the last character of a string or check to make sure a string isn’t too long to fit somewhere (such as a particularly limited place on a Web page or in a size-limited database field).
To find the length of a string, use the strlen function, which has the following syntax:
int strlen(string str)
To find the length of a string $str, then, you would use
echo strlen($str);
The complete number of characters (including whitespace characters such as spaces and \n) is returned. Here’s an example:
$str = 'This is a string.';
$str2 = "Newlines!\nOne\nTwo";
echo strlen($str) . ', ' . strlen($str2);
The output of this code would be 17, 17. Remember that even though the second string appears longer, the \n sequence inside of double quotes is interpreted as only one character. Thus, the two strings are of equal length.
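Here is one way the two uses mentioned earlier (getting the last character and checking a size limit) might look. The 10-character limit and the variable names are assumptions chosen for this sketch.

```php
<?php
$str = 'This is a string.';

// The last character sits at index strlen($str) - 1.
$lastChar = substr($str, strlen($str) - 1);
echo $lastChar . "\n"; // .

// Cut the text down if it is longer than a hypothetical 10-character limit.
$maxLength = 10;
$short = $str;
if (strlen($short) > $maxLength) {
    $short = substr($short, 0, $maxLength);
}
echo strlen($short) . "\n"; // 10
```

Note that substr($str, -1) would retrieve the last character as well; the strlen version simply shows how the length relates to the index positions.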
Finding Substrings
If you already know where to find a substring within a string, things aren’t too difficult. However, it’s not always so easy; sometimes you only know where a substring is in relation to another string.
Let’s take a string representation of a number raised to a power as an example. To interpret such a string, you would have to break the string into two parts: the number and the power it’s supposed to be raised to.
Here’s an example string:
$numToPower = '20^2';
Keep in mind that this example should be allowed to change; although our example is 20^2, it could be 2^2, 3^5, or 5^10. Therefore, you have no idea where the caret is going to be and where either number will begin or end.
So, in order to extract the numbers as substrings, you first must determine the positions at which they start and end. You know that the first number will always start at 0, and you know that the last number will always go to the end. All you really have to figure out is where the first number ends and the second begins.
If you knew the position of the caret, you could determine the positions you needed: The first number would end 1 before the caret's position, and the second number would begin 1 after the caret's position. Now you need to find the caret's position.
To do this, you'll use the strpos function. The syntax for the strpos function is as follows:
int strpos(string str, string find [, int start])
str is the string to be searched, and find is the string to find. The optional start parameter is used to limit where strpos starts searching for find within str; for example, if you know there are three periods within a string, but want to find the second one, you can rule out the first one by specifying a start that is past it.
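For instance, finding the second period might look like the following sketch. The sample string and variable names are invented for the illustration. Notice that the position returned is still counted from the beginning of the string, not from start.

```php
<?php
$str = 'one.two.three.four';

// Find the first period, then search again starting just past it.
$first = strpos($str, '.');
$second = strpos($str, '.', $first + 1);

echo $first . "\n";  // 3
echo $second . "\n"; // 7
```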
Here's how strpos is used to find the caret in the preceding expression:
$caretPos = strpos($numToPower, '^');
Supposing $numToPower was 20^2, $caretPos would now be 2 (the caret's index within the string). See Figure 5.2 for a visual depiction of how strpos() arrives at this value.
Figure 5.2. The index position of the caret is what's returned by strpos('20^2', '^').
Now, to get the two numbers, it's only necessary to use $caretPos with the preceding assertions describing where you will find the beginnings and ends of the numbers in relation to the caret.
There is one complication, however. The substr function doesn't take a start and an end position, but rather a start and a length. To overcome this, you have to calculate the length for the first number. (The second number's length can be unspecified because it will end at the end of the string.)
The caret position is 2; this tells us that there are 2 characters before the caret: those at positions 0 and 1. If the caret were at 3, there would be 3 characters before it (those at 0, 1, and 2). At 4, there would be 4, and so on. Therefore, you can simply use the caret's position as the length of the substring for the first number.
The last number starts at whatever position immediately follows the caret, $caretPos + 1.
The following code extracts the two numbers from the string $numToPower:
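A minimal sketch of that extraction, following the reasoning above, is shown here. The variable names $num and $power are assumptions, and the check for false is an extra precaution: strpos returns false when the string isn't found, and because a valid position of 0 also looks false in a loose comparison, the check uses ===.

```php
<?php
$numToPower = '20^2';
$caretPos = strpos($numToPower, '^');

if ($caretPos === false) {
    // No caret at all, so there is nothing to split.
    echo "No caret found\n";
} else {
    // The caret's position doubles as the length of the first number.
    $num = substr($numToPower, 0, $caretPos);
    // The second number runs from just after the caret to the end.
    $power = substr($numToPower, $caretPos + 1);
    echo $num . ' to the power of ' . $power . "\n"; // 20 to the power of 2
}
```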
Performing Basic String Replacements
Another type of string manipulation is comparable to a word processor's Find and Replace utility. A string replacement occurs when a particular string is replaced with another string within a larger string. This is commonly used for
Removing possible occurrences of obscene words from publicly submitted text
Changing plain-text characters into HTML characters (such as regular newlines into <br /> tags)
Changing Windows return plus newline (\r\n) into Unix-formatted newlines (\n)
The function to perform simple string replacements with is str_replace. Here's the syntax:
string str_replace(string find, string replace, string str)
Where find is the string that should be found, replace is the string to replace all occurrences of find with, and str is the string to perform the replacements in.
NOTE
Notice that str_replace returns a string. The only way to get the result of the replacement is to store this return value (either to a new variable or even back to the original variable passed as str). The str_replace function does not modify str on its own.
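The sketch below stores the result in a new variable so you can compare it with the original. It uses the Windows-to-Unix newline conversion mentioned earlier; the sample text and variable names are made up for the illustration.

```php
<?php
$original = "line one\r\nline two";

// The result must be captured; $original itself is left alone.
$converted = str_replace("\r\n", "\n", $original);

echo strlen($original) . "\n";  // 18 (the \r is still there)
echo strlen($converted) . "\n"; // 17 (the \r is gone)
```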
The use of str_replace is pretty straightforward. Let's assume the string $text contains some text a user submitted that's going to be displayed on a Web site. If the user pressed Enter at any time while typing the text, \n or \r\n would have been inserted into the text. However, these characters are ignored when a browser is interpreting HTML. (You can break a line wherever you want in HTML and the file will be processed exactly the same way.) To get these line breaks to show up, you must replace the \n sequences with a <br /> tag. Here's how this could be done:
$text = str_replace("\n", '<br />', $text);
CAUTION
The difference between double quotes and single quotes is extremely important in this example. The newline (\n) passed to str_replace must be the same as the one in $text. Therefore, you must be sure to enclose the newline in double quotes. As with all other strings, enclosing it in single quotes keeps PHP from interpreting it as a newline and instead forces PHP to interpret it as a backslash followed by an 'n'.
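You can see the difference for yourself with a short sketch like this one. The sample text and variable names are assumptions for the illustration.

```php
<?php
// Double quotes: one newline character. Single quotes: a backslash and an n.
echo strlen("\n") . "\n"; // 1
echo strlen('\n') . "\n"; // 2

$text = "first\nsecond";

// Single quotes: nothing matches, so the text comes back unchanged.
$wrong = str_replace('\n', '<br />', $text);

// Double quotes: the real newline is found and replaced.
$right = str_replace("\n", '<br />', $text);

echo $right . "\n"; // first<br />second
```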
TIP
There is also a function that has been specifically created to handle this task called nl2br. For more information, check out the PHP manual, as specified in Appendix A, "Debugging and Error Handling."
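As a brief sketch of nl2br on current PHP versions (the sample text is made up): note that, unlike the str_replace approach, nl2br inserts the tag in front of each newline and leaves the newline itself in place.

```php
<?php
$text = "first\nsecond";

// nl2br keeps the newline and puts a <br /> tag before it.
$html = nl2br($text);

echo $html;
```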
NOTE
The str_replace function has one drawback: It's case-sensitive. If you want to find only the capitalized word Fred, then this is fine; however, if you want to find Fred, fred, and FRED, you'll need a case-insensitive alternative, such as the str_ireplace function (PHP 5 and later) or preg_replace with the i modifier. Regular-expression replacements are mentioned later in this chapter in "Replacements with Regular Expressions."
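The following sketch compares the case-sensitive replacement with two case-insensitive options, str_ireplace (available in PHP 5 and later) and preg_replace with the i modifier. The sample sentence and the replacement name Bob are invented for the illustration.

```php
<?php
$text = 'Fred, fred, and FRED';

// Case-sensitive: only the exact match "Fred" is replaced.
$sensitive = str_replace('Fred', 'Bob', $text);

// Case-insensitive: all three spellings are replaced.
$insensitive = str_ireplace('Fred', 'Bob', $text);
$withRegex = preg_replace('/fred/i', 'Bob', $text);

echo $sensitive . "\n";   // Bob, fred, and FRED
echo $insensitive . "\n"; // Bob, Bob, and Bob
echo $withRegex . "\n";   // Bob, Bob, and Bob
```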
The str_replace function can also perform multiple replacements at the same time. Any of the three parameters may be specified as arrays. The first parameter may be an array of several different substrings to find within the string. Once one is found, the corresponding element of the array passed as the second parameter is used as the replacement string. If the second parameter is a single string, that string is used for all of the replacements. The third parameter may be a single string or an array of strings; if it is an array, the replacements are performed on every string in it.
CAUTION
If the array for the second parameter has fewer elements than the one in the first parameter, empty strings will be used as the replacement strings for the missing elements. If you're replacing multiple strings with multiple values, be sure you have a value for each string you're replacing or you'll end up simply removing the strings without replacing them with anything.
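Here is a small sketch of that pitfall, using a made-up string and a replacement array that is one element short:

```php
<?php
$find = array('dog', 'cat', 'ferret');
$replace = array('wife', 'guy'); // no replacement supplied for 'ferret'

$text = 'dog, cat, ferret';

// 'ferret' is replaced with an empty string, so it simply disappears.
$result = str_replace($find, $replace, $text);

echo '[' . $result . ']' . "\n"; // [wife, guy, ]
```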
The following example demonstrates the replacement of the strings "dog", "cat", and "ferret" with the single word "animal":
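A minimal sketch of such a program might look like this. The input sentence is an assumption, reconstructed from the pattern of the sample output, and the variable names are hypothetical.

```php
<?php
// Assumed input sentence (not taken from an original listing).
$text = 'My dog knows a cat that knows the ferret that stole my keys.';

// Several strings to find, one replacement string for all of them.
$find = array('dog', 'cat', 'ferret');
$result = str_replace($find, 'animal', $text);

echo $result . "\n";
```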
Here's the output from this program:
My animal knows a animal that knows the animal that stole my keys.
Now that you've replaced several words with one, try replacing them so each word is replaced with a different word:
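One way to write this, again assuming the same input sentence and hypothetical variable names, is to pass an array for the second parameter as well, with one replacement for each string to find:

```php
<?php
// Assumed input sentence (not taken from an original listing).
$text = 'My dog knows a cat that knows the ferret that stole my keys.';

// Each element of $find is replaced by the element of $replace
// at the same position.
$find = array('dog', 'cat', 'ferret');
$replace = array('wife', 'guy', 'thief');
$result = str_replace($find, $replace, $text);

echo $result . "\n";
```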
And here's the output:
My wife knows a guy that knows the thief that stole my keys.