PHP Tutorials

String Functions

As the concatenation operator joins strings, the various string functions allow you to divide strings and manipulate what’s already in a string. This will allow you to

Separate a string of data into more workable pieces

Retrieve only a particular part of a string

Find the location of a substring you want to extract

Replace a substring with a different string

Extracting Substrings
Extracting substrings is simply a matter of knowing where within a string the information (another string) you want is located. Specifically, you have to know the index of the first character and the length of the string you want to extract.

For example, let’s assume you have a person’s Social Security number stored in a string and you want to use the last four digits of the number as the default PIN code.

Let’s assume your program (or a person) has already formatted the string such that it is simply a sequence of nine digits, without hyphens, spaces, or other characters separating them. Let’s say the Social Security number is 012-34-5678. The sequence is stored in a variable as follows:
$SSN = '012345678';
TIP

Notice that the string above must be within quotes or it will lose the initial zero. Although most numeric values may be easily converted back and forth between numbers and strings, this one would lose the zero as soon as it became a numeric type.

It is a good practice to enclose all numbers intended to be used as strings in quotes to denote them as strings and not numbers. Not doing so can not only yield strange results if the number begins with a zero, but it also makes your code somewhat obscure. Variables intended for use only as strings should be coded only as strings.

Now, you want to retrieve the last four characters (in this case, digits) of a nine-character string. To do this, use the substr (substring) function. The syntax for substr follows:

string substr(string str, int start [, int length])
INTERPRETING SYNTAX GUIDES
The monospaced text you see just before this block is called a syntax guide. It’s a brief way of showing how a function is intended to be used that tells two important things about the function: what value is returned and what parameters it takes.

The function’s return value is given before the function name. In this case, it’s the first occurrence of string on that line.

After the function name, the parameters are given in parentheses, similarly to actually calling the function. However, the parameter types are given in addition to the typical parameter itself. Also, the parameters given here are italicized because they are symbolic names for what should be passed as that parameter.

Syntax guides can also tell you which parameters are optional. Optional parameters are enclosed in brackets so you’re aware of which parameters are optional and which aren’t.

str is the string you want to extract a substring from, start is the index of the first character to be extracted, and the optional parameter length is the length of the substring you wish to extract. If you leave length out, the substring returned will go all the way to the end of the string.

So, to get the last four characters of the Social Security number, use
$SSN_lastFour = substr($SSN, 5, 4);
The 5 here means the substring you get will start at the index position 5, which is the fourth character from the end of the string. The last parameter, 4, tells substr() to give us four characters—in this case, the last four. Figure 5.1 illustrates the extraction of the last four digits from the rest of the string.

Figure 5.1. The substring here is the last four characters of the nine-character string, starting at the character index 5 and continuing to the end.

NOTE

When a substring is extracted from a string, it is not removed, but rather only retrieved. For example, in the demonstration involving a Social Security number, $SSN will still be a nine-character string, and it will still be the same as it was before. You are not changing the string in any way; instead, you’re merely “taking a look” at what’s inside the string.

The substr function is much more flexible than that, however. Let’s assume for a moment that you’re not sure if the Social Security number has its number groups separated by some character or not. Any of the following assignments could be true:



$SSN = '012345678';

or

$SSN = '012-34-5678';

or even

$SSN = '012.34.5678';

Independent of the rest of the string, if you know that the last four characters of the string are the last four digits of the number, you can retrieve the last four characters from the end.

Counting from the end is especially important in this case because you can’t be sure whether the string’s length will be 9 (just the nine digits) or 11 (the nine digits plus two separating characters). If you counted from the beginning of the string, you would then have the problem of figuring out what the starting position of the substring would be; it could be either 5 or 7. However, if you count from the end, the substring will always start 4 characters from the end.

To express this to substr, use a negative starting position. Doing so tells substr to count from the end of the string instead of from the beginning. However, unlike counting from the beginning of the string, when counting from the end, the first character is -1 (not 0 or -0).

The following statement retrieves the last four digits, regardless of the format of the string:

$SSN_lastFour = substr($SSN, -4);

Notice that the length parameter is omitted. Because you’re trying to retrieve everything up to the end of the string, it’s not necessary.
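As a quick check, the sketch below runs the same negative-start call against all three formats shown earlier. The helper name lastFour is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: returns the last four characters of a string.
function lastFour($ssn)
{
    // A negative start counts from the end; omitting length reads to the end.
    return substr($ssn, -4);
}

$formats = array('012345678', '012-34-5678', '012.34.5678');
foreach ($formats as $ssn) {
    echo $ssn . ' => ' . lastFour($ssn) . "\n"; // every line ends in 5678
}
```

Because the count starts from the end, the separators earlier in the string never affect the result.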

Now, try using the length parameter. The length parameter determines how long the substring returned will be. For example, if length is specified as 2, the substring returned will be two characters long. The following example demonstrates this principle.



$str = 'abcdef';
echo substr($str, 0, 2); // outputs 'ab'

In this example, the substring begins at the very first position in the string, 0, and it’s 2 characters long. Thus, the substring returned is the first two characters of the string, ab.

The length parameter can also be negative. Like the start parameter, a negative length counts from the end of the string: the substring stops that many characters before the end, so the last characters are left out. Again, -1 is the first character when counting from the end of the string.

Here’s an example:

$str = 'abcdef';
echo substr($str, 0, -2); // outputs 'abcd'

Now, instead of the substring being 2 characters long, it’s however long it takes to stop 2 characters short of the end. The substring starts at the beginning (0), so everything from the first character up to, but not including, the character at position -2 is returned as the substring.

CAUTION

Because string index positions can be confusing, it’s a good idea to check the result of substr calls with several different strings to make sure it is doing what you want it to do. If it’s not, you can adjust the parameters you’re passing to it without too much of a hassle; if you continue without testing, you may later find that you have a hard time even figuring out where the problem is.
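One way to follow this advice is to keep a few calls side by side and compare them with what you expect. The following is only a sketch; the variable names are made up for the illustration.

```php
<?php
$str = 'abcdef';

$firstTwo  = substr($str, 0, 2);   // 'ab'   : start at 0, take 2 characters
$lastTwo   = substr($str, -2);     // 'ef'   : start 2 from the end, run to the end
$dropLast2 = substr($str, 0, -2);  // 'abcd' : start at 0, stop 2 short of the end
$middle    = substr($str, 1, -1);  // 'bcde' : skip the first and last characters

echo "$firstTwo $lastTwo $dropLast2 $middle\n";
```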

You may find that sometimes you need the length of a string. This is helpful if you want to get the last character of a string or check to make sure a string isn’t too long to fit somewhere (such as a particularly limited place on a Web page or in a size-limited database field).

To find the length of a string, use the strlen function, which has the following syntax:

int strlen(string str)

To find the length of a string $str, then, you would use

echo strlen($str);

The complete number of characters (including whitespace characters such as spaces and \n) is returned. Here’s an example:



$str   =  'This is a string.';
$str2  =  "Newlines!\nOne\nTwo";
echo strlen($str) . ', ' . strlen($str2);

The output of this code would be 17, 17. Remember that even though the second string appears longer, the \n sequence inside of double quotes is interpreted as only one character. Thus, the two strings are of equal length.
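The two uses mentioned earlier, getting the last character and checking a size limit, might be sketched as follows. The limit of 10 is an arbitrary value chosen for the example, not something from a real database schema.

```php
<?php
$str = 'This is a string.';

// The last character sits at index strlen($str) - 1.
$lastChar = substr($str, strlen($str) - 1, 1);

// Hypothetical limit, such as the width of a database column.
$maxLength = 10;
if (strlen($str) > $maxLength) {
    // Keep only the first $maxLength characters.
    $str = substr($str, 0, $maxLength);
}

echo $lastChar . "\n";     // .
echo strlen($str) . "\n";  // 10
```

Note that strlen counts bytes, so text in a multibyte encoding such as UTF-8 may report a larger number than the characters you see.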

Finding Substrings
If you already know where to find a substring within a string, things aren’t too difficult. However, it’s not always so easy; sometimes you only know where a substring is in relation to another string.

Let’s take a string representation of a number raised to a power as an example. To interpret such a string, you would have to break the string into two parts: the number and the power it’s supposed to be raised to.

Here’s an example string:

$numToPower = '20^2';

Keep in mind that this example should be allowed to change; although our example is 20^2, it could be 2^2, 3^5, or 5^10. Therefore, you have no idea where the caret is going to be and where either number will begin or end.

So, in order to extract the numbers as substrings, you first must determine the positions at which they start and end. You know that the first number will always start at 0, and you know that the last number will always go to the end. All you really have to figure out is where the first number ends and the second begins.

If you knew the position of the caret, you could determine the positions you needed: The first number would end 1 before the caret's position, and the second number would begin 1 after the caret's position. Now you need to find the caret's position.

To do this, you'll use the strpos function. The syntax for the strpos function is as follows:

int strpos(string str, string find [, int start])

str is the string to be searched, and find is the string to find. The optional start parameter is used to limit where strpos starts searching for find within str; for example, if you know there are three periods within a string, but want to find the second one, you can rule out the first one by specifying a start that is past it.
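The start parameter is easiest to see with a string that contains the character more than once. This sketch uses a made-up version string; the variable names are assumptions.

```php
<?php
$version = '5.2.17';

$first = strpos($version, '.');               // 1
// Begin the second search one position past the first period.
$second = strpos($version, '.', $first + 1);  // 3

// strpos returns false when nothing is found, and 0 is a valid position,
// so compare with === instead of treating the result as a boolean.
$hasCaret = (strpos($version, '^') !== false);

echo "$first $second\n";
```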

Here's how strpos is used to find the caret in the preceding expression:

$caretPos = strpos($numToPower, '^');
Supposing $numToPower was 20^2, $caretPos would now be 2 (the caret's index within the string). See Figure 5.2 for a visual depiction of how strpos() arrives at this value.

Figure 5.2. The index position of the caret is what's returned by strpos('20^2', '^').

Now, to get the two numbers, it's only necessary to use $caretPos with the preceding assertions describing where you will find the beginnings and ends of the numbers in relation to the caret.

There is one complication, however. The substr function doesn't take a start and an end position, but rather a start and a length. To overcome this, you have to calculate the length for the first number. (The second number's length can be unspecified because it will end at the end of the string.)

The caret position is 2; this tells us that there are 2 characters before the caret: those at positions 0 and 1. If the caret were at 3, there would be 3 characters before it (those at 0, 1, and 2). At 4, there would be 4, and so on. Therefore, you can simply use the caret's position as the length of the substring for the first number.

The last number starts at whatever position immediately follows the caret, $caretPos + 1.

The following code extracts the two numbers from the string $numToPower:
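The listing itself is not reproduced here. A minimal sketch that follows the reasoning above might look like this; the variable names $num and $power are assumptions.

```php
<?php
$numToPower = '20^2';

$caretPos = strpos($numToPower, '^');         // 2

// Characters before the caret: start at 0, length equals the caret position.
$num = substr($numToPower, 0, $caretPos);     // '20'
// Characters after the caret: start one past it and run to the end.
$power = substr($numToPower, $caretPos + 1);  // '2'

echo "$num raised to the power $power is " . pow((int) $num, (int) $power) . "\n";
```

Changing $numToPower to '5^10' or '3^5' needs no other changes, because every position is derived from the caret.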

Performing Basic String Replacements
Another type of string manipulation is comparable to a word processor's Find and Replace utility. A string replacement occurs when a particular string is replaced with another string within a larger string. This is commonly used for

Removing possible occurrences of obscene words from publicly submitted text

Changing plain-text characters into HTML characters (such as regular newlines into <br /> tags)

Changing Windows return plus newline (\r\n) into Unix-formatted newlines (\n)

The function to perform simple string replacements with is str_replace. Here's the syntax:

string str_replace(string find, string replace, string str)

Where find is the string that should be found, replace is the string to replace all occurrences of find with, and str is the string to perform the replacements in.

NOTE

Notice that str_replace returns a string. The only way to get the result of the replacement is to store this return value (either to a new variable or even back to the original variable passed as str). The str_replace function does not modify str on its own.

The use of str_replace is pretty straightforward. Let's assume the string $text contains some text a user submitted that's going to be displayed on a Web site. If the user pressed Enter at any point while typing the text, a \n or \r\n was inserted into the text. However, these characters are ignored when a browser is interpreting HTML. (You can break a line wherever you want in HTML and the file will be processed exactly the same way.) To get these line breaks to show up, you must replace the \n sequences with a <br /> tag. Here's how this could be done:

$text = str_replace("\n", '<br />', $text);
CAUTION

The difference between double quotes and single quotes is extremely important in this example. The newline (\n) passed to str_replace must be the same as the one in $text. Therefore, you must be sure to enclose the newline in double quotes. As with all other strings, enclosing it in single quotes keeps PHP from interpreting it as a newline and instead forces PHP to interpret it as a backslash and an 'n'.

TIP

There is also a function that has been specifically created to handle this task called nl2br. For more information, check out the PHP manual, as specified in Appendix A, "Debugging and Error Handling."
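To compare the two approaches, here is a small sketch. The sample text is invented and deliberately mixes Windows and Unix line endings.

```php
<?php
// Simulated user input with one \r\n and one \n line break.
$text = "Line one\r\nLine two\nLine three";

// Normalize Windows line endings first so that only \n remains.
$text = str_replace("\r\n", "\n", $text);

// Manual replacement with str_replace.
$manual = str_replace("\n", '<br />', $text);

// nl2br inserts <br /> before each newline and keeps the newline itself.
$builtIn = nl2br($text);

echo $manual . "\n";
echo $builtIn . "\n";
```

The only difference in the results is that nl2br leaves the original newlines in place, which keeps the HTML source readable.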

NOTE

The str_replace function has one drawback: It's case-sensitive. If you want to find only the capitalized word Fred, then this is fine; however, if you want to find Fred, fred, and FRED, you'll need either str_ireplace (a case-insensitive counterpart available since PHP 5) or preg_replace with the i modifier, which is covered later in this chapter in "Replacements with Regular Expressions."

The str_replace function can also perform multiple replacements at the same time. Any of the three parameters may be specified as arrays. The first parameter may be an array of several different substrings to find within the string. Each one found is replaced with the corresponding element of the array passed as the second parameter. If the second parameter is a single string, that string is used for all of the replacements. The third parameter may also be an array, in which case the replacements are performed on each of its elements and an array of the results is returned.

CAUTION

If the array for the second parameter has fewer elements than the one in the first parameter, empty strings will be used as the replacement strings for the missing elements. If you're replacing multiple strings with multiple values, be sure you have a value for each string you're replacing or you'll end up simply removing the strings without replacing them with anything.

The following example demonstrates the replacement of the strings "dog", "cat", and "ferret" with the single word "animal":
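The original listing is missing at this point. The sketch below is one way such a program could look; the input sentence is an assumption, chosen so that the result agrees with the output quoted afterward.

```php
<?php
// Assumed input sentence (not taken from the original listing).
$text = 'My dog knows a mammal that knows the ferret that stole my keys.';

// Several strings to find, and a single string that replaces each of them.
$find = array('dog', 'cat', 'ferret');

$result = str_replace($find, 'animal', $text);
echo $result . "\n";
```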

Here's the output from this program:

My animal knows a mammal that knows the animal that stole my keys.

Now that you've replaced several words with one, try replacing them so each word is replaced with a different word:
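Again the listing is missing, so this is only a sketch. The input sentence and the pairing of each word with its replacement are assumptions that reproduce the output quoted afterward.

```php
<?php
// Assumed input sentence (not taken from the original listing).
$text = 'My dog knows a cat that knows the ferret that stole my keys.';

// Element n of $replace is used for element n of $find.
$find    = array('dog', 'cat', 'ferret');
$replace = array('wife', 'guy', 'thief');

$result = str_replace($find, $replace, $text);
echo $result . "\n";
```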

And here's the output:

My wife knows a guy that knows the thief that stole my keys.


Pattern Matching with Regular Expressions

Although basic string replacements are very effective in some cases, they are simply useless in others. For example, if you know exactly what you want to replace, such as the word “dog”, str_replace is fine. However, sometimes you only know how the word will appear in a file. Somehow, you have to “describe” what the word “looks like” so PHP can find it. Regular expressions are a way to write such a description using regular characters along with wildcards, which are characters that stand for some unknown character or group of characters.

A good example of this is an HTML anchor (<a>) tag. If you have a whole HTML page stored in a variable and want to find all of the links on the page, the functions you’ve learned so far would require that you develop a pretty complex algorithm for extracting this information. However, regular expressions allow you to specify that you know the string is something like this:
<a href="SOME_STRING">SOME_OTHER_STRING</a>

By doing so, you’ve eliminated most of the problem immediately. In addition to being able to find substrings like this, you can do replacements with them, or return the values you find (such as the values where SOME_STRING and SOME_OTHER_STRING appear). In this case, you would be able to parse the URL and text from HTML code (which could have been submitted by a visitor or retrieved from another Web site). However, since you don’t know what the actual text is that you’re looking for, str_replace doesn’t help any.

Pattern matching was created to accomplish this task. Pattern matching is the process of comparing one string (the string in which substrings are to be found) to another string that contains wildcard characters (the “description” of what the substring should “look like”). Wildcard characters are characters that represent one character or a set of characters. An example of a wildcard character is the asterisk; it is used on both Windows and Unix-based systems to indicate “any character(s).”

For PHP, the wildcards are used in regular expressions, a standard for how wildcards and other characters (collectively known as patterns) are written.

NOTE

In this section, we are discussing only PHP’s support for PCRE (Perl-compatible regular expressions). If you have experience with other regular expressions, you may find some of this to be a little different.

All of this new terminology at once is probably a bit confusing. The following example demonstrates a short pattern and the text it matches:

Pattern: “hello”
Matches: “hello”

As you can see, the pattern only matches one string: itself. This is very much like the behavior of str_replace; the only occurrences found are those that are exactly like the one being searched for.

This example can be expanded a little bit to make it more useful. For example, if you wanted to find the word “hello” anywhere in a sentence, you could use a wildcard to specify that it’s okay for “hello” to be bordered by any number of any characters.

The following example uses a regular expression function, preg_match, which is discussed later in this chapter, to determine whether the word “Hello” appears somewhere within a string:



<?php
/* ch05ex04.php - demonstrates simple use of regular expressions */

$string1 = 'Hello, this is string one.';
$string2 = 'This is string two.';

echo "String1 is: $string1\n";
if ( preg_match("/.*Hello.*/", $string1) )
{
    echo "I found 'Hello' in this string.\n\n";
}
else
{
    echo "I didn't find 'Hello' in this string.\n\n";
}

echo "String2 is: $string2\n";
if ( preg_match("/.*Hello.*/", $string2) )
{
    echo "I found 'Hello' in this string.\n\n";
}
else
{
    echo "I didn't find 'Hello' in this string.\n\n";
}

?>

The output of this program is

String1 is: Hello, this is string one.
I found 'Hello' in this string.

String2 is: This is string two.
I didn't find 'Hello' in this string.

Just as Windows and Unix-based systems use the asterisk to specify any character, regular expressions (sometimes referred to as regexps, which is pronounced “rej-exps”) use the period to indicate “any character.” This and other wildcards are known as qualifiers. Table 5.1 shows the qualifiers PHP recognizes in regular expressions:

Table 5.1. These Qualifiers Are Understood in PHP’s Regular Expressions

Qualifier  Meaning
.          Any character
^          The beginning of the string
$          The end of the string
[]         Used to specify character classes

All other characters are also considered to be qualifiers, but these are the special ones.

For example, to specify that a string may contain the word “Hello” followed by any three characters, I could use the expression “/Hello.../”. If I wanted to ensure that the matched text is the only text within the string we’re testing, I could anchor it to the beginning and end using the appropriate qualifiers; “/^Hello...$/” would do the trick.

The following program demonstrates using these two expressions:



<?php
/* ch05ex05.php - uses some more regular expressions */
$string1 = 'Hello---'; // This one matches both expressions
$string2 = 'Hi, Hello---'; // This one isn't at the beginning of the string
$string3 = 'Hello'; // This one doesn't have three characters after Hello

echo "String1 is: $string1\n";
if ( preg_match("/Hello.../", $string1) )
{
    echo "I found 'Hello...' in this string; checking to see if this is all that's in the string... ";

    if ( preg_match("/^Hello...$/", $string1) )
    {
        echo "it is.\n\n";
    }
    else
    {
        echo "it isn't.\n\n";
    }
}
else
{
    echo "I didn't find 'Hello...' in this string.\n\n";
}

echo "String2 is: $string2\n";
if ( preg_match("/Hello.../", $string2) )
{
    echo "I found 'Hello...' in this string; checking to see if this is all that's in the string... ";

    if ( preg_match("/^Hello...$/", $string2) )
    {
        echo "it is.\n\n";
    }
    else
    {
        echo "it isn't.\n\n";
    }
}
else
{
    echo "I didn't find 'Hello...' in this string.\n\n";
}

echo "String3 is: $string3\n";
if ( preg_match("/Hello.../", $string3) )
{
    echo "I found 'Hello...' in this string; checking to see if this is all that's in the string... ";

    if ( preg_match("/^Hello...$/", $string3) )
    {
        echo "it is.\n\n";
    }
    else
    {
        echo "it isn't.\n\n";
    }
}
else
{
    echo "I didn't find 'Hello...' in this string.\n\n";
}

?>


The output of this program is

String1 is: Hello---
I found 'Hello...' in this string; checking to see if this is all that's in the
string... it is.

String2 is: Hi, Hello---
I found 'Hello...' in this string; checking to see if this is all that's in the
string... it isn't.

String3 is: Hello
I didn't find 'Hello...' in this string.

The last qualifier on the list is the set of square brackets. These are used to define character classes, or certain groups of characters from which any one character may be used. For example, if you wanted to allow only a vowel to be picked, you might use the character class [aeiou], as in “b[aeiou]t”, which would match “bat”, “bet”, “bit”, “bot”, and “but”. Notice that only one character is allowed from the set.

You can also define character ranges within a character class using the hyphen. To match any alphanumeric character, this character class could be used: [a-zA-Z0-9].
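A short sketch can confirm how a character class behaves. The word list is invented for the example, and the ^ and $ qualifiers are added so that the whole word must fit the pattern.

```php
<?php
$words = array('bat', 'bit', 'brt', 'boot');

$matches = array();
foreach ($words as $word) {
    // Exactly one vowel is allowed between the b and the t.
    if (preg_match('/^b[aeiou]t$/', $word)) {
        $matches[] = $word;
    }
}

echo implode(', ', $matches) . "\n"; // bat, bit
```

"brt" fails because r is not in the class, and "boot" fails because the class stands for one character only.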

Unlike the asterisk on Windows and Unix, however, one dot allows for only one occurrence of a character. As you can see from the previous example, if you had an unknown or large number of wildcard characters to match, things could become quite confusing. Therefore, you have to specify how many of something you wish to allow. The following table shows the modifiers used to specify how many occurrences should be matched (known, therefore, as quantifiers):

Table 5.2. These Quantifiers Can Be Used to Specify How Many Occurrences of a Certain Character Are to Be Matched

Quantifier  Meaning
*           Any number of occurrences (zero or more)
+           At least one occurrence (one or more)
?           May or may not occur (zero or one)
{x}         Exactly x number of occurrences
{x,y}       At least x but not more than y occurrences
{x,}        At least x occurrences

To use a quantifier, place it directly after a qualifier. The earlier example “Hello...” could be reexpressed as “Hello.{3}”.
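The quantifiers can be compared by running each against the same set of strings. This is a sketch with invented test strings; each pattern is anchored so the whole string must fit.

```php
<?php
// Pattern => meaning of the quantifier applied to the letter b.
$patterns = array(
    '/^ab*c$/'     => 'zero or more',
    '/^ab+c$/'     => 'one or more',
    '/^ab?c$/'     => 'zero or one',
    '/^ab{2}c$/'   => 'exactly two',
    '/^ab{1,2}c$/' => 'one or two',
);
$subjects = array('ac', 'abc', 'abbc', 'abbbc');

foreach ($patterns as $pattern => $meaning) {
    $hits = array();
    foreach ($subjects as $subject) {
        if (preg_match($pattern, $subject)) {
            $hits[] = $subject;
        }
    }
    echo $pattern . ' (' . $meaning . '): ' . implode(', ', $hits) . "\n";
}
```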

NOTE

If you want to use an actual period, question mark, or so forth, precede it with two backslashes (\\).

Just as you must escape quotes within a string, you must escape the special characters in regexps to get their literal meaning. This would normally be done with a single backslash; however, because the regular expressions are being expressed in double-quoted strings, you have to make an exception. The backslash that really escapes the special character must itself be escaped. (PHP leaves a backslash alone when the character after it is not a recognized escape, so a single backslash before a period also reaches the regular expression intact; doubling it is simply the safer habit.)
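The effect of escaping is easy to check with a pattern that contains a period. This sketch uses an invented file name; preg_quote is a standard PHP function that adds the backslashes for you.

```php
<?php
// Unescaped, the period is a wildcard, so "indexXphp" also matches.
$loose = preg_match("/^index.php$/", 'indexXphp');

// Escaped, the period matches only a literal period.
$strictMiss = preg_match("/^index\\.php$/", 'indexXphp');
$strictHit  = preg_match("/^index\\.php$/", 'index.php');

// preg_quote() escapes the special characters in a string.
$quoted = preg_quote('3.5?');

echo "$loose $strictMiss $strictHit $quoted\n";
```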

Before you move on, let’s spend a little bit of time practicing and getting used to regular expressions:

“hello.*” matches “hello” and everything that follows it. The string may include much more text or it may terminate right after the “o”. Examples include “hello, this is regexps 101” and “hello”. (Without a leading ^, the match may begin anywhere in the string.)

“.*hello.*” matches any string with the word “hello” in it. It could be the word “hello” alone or any combination of things, as long as “hello” appears somewhere within, such as “Why, hello John!” and “hello”.

“^hello$” matches a string containing only the word “hello”. If other characters are present, the match fails.

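“[a-zA-Z0-9]+” matches one or more consecutive alphanumeric characters, such as “Smith150” or the “John” in “John Smith”. To insist that the whole string be alphanumeric, add the beginning and ending qualifiers: “^[a-zA-Z0-9]+$”.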

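“<a.+href[ ]*=[ ]*['\"]?.*['\"]?.*>” matches an HTML anchor tag. The initial <a.+href matches the start of the tag up to and including its href attribute name; [ ]*=[ ]* then matches the equals sign along with any spaces around it, and ['\"]? matches an opening quote of either kind, if there is one.

Notice that the double quote must be escaped with a backslash to keep from ending the double-quoted string that contains the expression. The href value itself is matched by the .* combination, and the opening quote, if present, is closed, followed by any other attributes and finally the end of the tag. This expression will become useful in demonstrating functions later in this chapter. Make sure you understand what each part of it does and why each character appears where it appears.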

This pattern is somewhat complex, so a more in-depth explanation of it is necessary. An anchor tag that it is designed to match might look like this:
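<a href="http://www.quepublishing.com">

The <a.+href portion matches <a href, [ ]*=[ ]* matches the equals sign, the first ['\"]? matches the opening quote, .* matches the URL, the second ['\"]? matches the closing quote, and the final .*> matches anything that remains up to and including the closing angle bracket (>).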

The asterisk is a particularly tricky quantifier; it is referred to as a greedy quantifier because it will match the biggest string it can. This can create problems. Consider the following example string:

<a href="SOME_STRING" class="SOME_CLASS">This is a test.</a>

Notice that the <a> tag isn’t just a simple two-component tag; instead, it has a third component for class. The regular expression formulated in the preceding examples will match more of this string than you really intend for it to match. Not only will it match the opening tag, but it will also match the text This is a test. and the closing tag, because at the end it is looking for the largest string of any characters before the last > character. That’s just about everything.

However, you can reverse the greediness of the expression by adding question marks after the asterisk quantifiers, like this:
<a.+href[ ]*=[ ]*['\"]?.*?['\"]?.*?>
Notice that the two .* sequences got the addition of a question mark; this will stop the asterisk from going for the biggest string it can find. Rather, it will go until it finds the string following it in the regular expression (>). Now, instead of going to the last tag, the expression will reach the closing angle bracket of the first tag and will stop evaluating that part of the expression. Thus, only the opening tag of the string is matched.
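The difference between the two versions can be seen by running both against the same string. In this sketch the markup, including the href and class values, is invented for the illustration.

```php
<?php
// Assumed sample markup; the attribute values are placeholders.
$html = '<a href="page.html" class="nav">This is a test.</a>';

$greedy = "/<a.+href[ ]*=[ ]*['\"]?.*['\"]?.*>/";
$lazy   = "/<a.+href[ ]*=[ ]*['\"]?.*?['\"]?.*?>/";

preg_match($greedy, $html, $greedyResult);
preg_match($lazy, $html, $lazyResult);

echo $greedyResult[0] . "\n"; // the whole string, through the final >
echo $lazyResult[0] . "\n";   // only the opening tag
```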

It’s also possible to let an expression match two (or more) completely different textual occurrences. In the next section of this chapter, for example, the goal is to match both the opening <a> tag and the closing </a> tag. To do this, the expression must be able to say “pick either one of these”. This is done by including an expression for both conditions in the expression and separating the two with a pipe (|). This is read in the expression as “or”; abc|def is the same as “match abc or def” in English.
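A minimal sketch of the pipe, using a simple pair of tags invented for the example:

```php
<?php
$tags = array('<b>', '</b>', '<i>');

$results = array();
foreach ($tags as $tag) {
    // Matches either the opening <b> tag or the closing </b> tag.
    // The forward slash is escaped because it is also the delimiter.
    $results[$tag] = preg_match('/<b>|<\/b>/', $tag);
    echo $tag . ': ' . $results[$tag] . "\n";
}
```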

Basic Pattern Matching
Now that you know the basics of pattern matching, here’s a chance to try them out. The first thing you should do is get acquainted with the preg_match function, which is the basic function for matching strings with regular expressions in PHP. It follows this syntax:
int preg_match(string expr, string str [, array result])

Here, expr is the regular expression, which must have a delimiter added to it. The easiest thing to do is add a forward slash to each end of the string, like this: “/hello/”. The slashes are a carryover from Perl that allows certain options to be added after the closing slash (most of which we won’t explore). str is the string being compared to the expression, and result, if specified, becomes an array holding the results of the match. This will be discussed in more detail soon. The function returns 1 if the expression matches and 0 if it does not.

For now, let’s stick with simply testing to see if an expression matches a string. At the beginning of the chapter, the idea of verifying that an email address looks valid was mentioned, so let’s use that example for now.

Before you look at any code, let’s decide what an email address should look like. The following example addresses are all valid email addresses you can use to follow along as the attributes of an email address are described:

example2001@example.com
example-email@example123.com
example.email@this-example.com
example_email@subdomain.example.com

First, you know an e-mail address has two basic parts of interest: that before the @ sign and that after it. (Of course, the @ sign itself must be present, too.) The part before the @ sign may consist of letters, numbers, periods, hyphens, and underscores. The part after the @ sign will be a domain (letters, numbers, hyphens, and periods) with any number of subdomains. For instance, a domain might be simply “example.com”, or it could be “mail.example.com”, or even “in.mail.example.com”.

Now let’s construct the expression you’ll use. The first part of the e-mail address can be expressed as this:

“[a-zA-Z0-9\.\-\_]+”

Notice that the backslashes keep the special characters from meaning anything other than their literal form. Actually, it isn’t necessary to escape the period (because it is always taken literally within brackets) or the underscore (because it has no special meaning in a regular expression), but doing so can’t hurt anything.

The other part of the address is the domain. The expression for that could be

“([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+”

The first part of this expression accounts for the domain and possible subdomains, while the latter half accounts for the top-level domain (such as .com, .org, or .net).

Now let’s put this together to verify an e-mail address. To do this, you’ll add the beginning and ending qualifiers. If you don’t, a string such as “ex:ample@example.com” will match even though it’s not a valid address: the portion ample@example.com matches, and nothing in the expression says that no other characters may be present. Adding the beginning and ending qualifiers prevents this. You’ll also have to add the slashes for delimiters on either end of the string. Here’s the resulting code:


$email = 'example-email@example-domain.com';
$validateEmail = "/^([a-zA-Z0-9\.\-\_]+)\@(([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+)$/";

echo (int) preg_match($validateEmail, $email); // echoes 1 for match, 0 for no match

You could insert this code into any program where you wanted to check an e-mail address for typos and it would work with very little modification.

There’s also the optional result parameter. If supplied, this parameter becomes an array containing the values of what the regular expression matched. For example, passing a result variable in the previous code would yield an array with element 0 being ‘example-email@example-domain.com’, 1 being ‘example-email’, and 2 being ‘example-domain.com’.

There are rules that dictate which elements of the array contain which matched strings. The first element (0) is always the value of the whole string that was matched. The strings under that (1, 2, 3, and so on) are numbered as the left parenthesis is encountered from left to right. Figure 5.3 illustrates the sequencing of the elements of the array containing the expression’s matches:
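The numbering can be checked directly. The sketch below uses the e-mail expression with ordinary parentheses around the repeated subdomain group, which is an assumption about the intended pattern; that inner group is the third left parenthesis, so it fills element 3.

```php
<?php
$email = 'example-email@example-domain.com';
$validateEmail = '/^([a-zA-Z0-9\.\-\_]+)\@(([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+)$/';

if (preg_match($validateEmail, $email, $result)) {
    echo $result[0] . "\n"; // the whole match
    echo $result[1] . "\n"; // first left parenthesis: the part before the @
    echo $result[2] . "\n"; // second left parenthesis: the whole domain
    echo $result[3] . "\n"; // third left parenthesis: last repetition of the inner group
}
```

A group that repeats keeps only the text from its last repetition, which here is "example-domain." with its trailing period.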

Figure 5.3. The elements of the result array will contain the different parts of this regular expression’s match results.

Replacements with Regular Expressions
Just as you can check to see if a string matches a pattern, you can perform replacements when strings match particular patterns. Replacements can be the same for all matches of a certain pattern, or they can be based upon what is matched.

The function you’re going to use to perform these replacements is preg_replace. This function uses the following syntax:

string preg_replace(string pattern, string replacement, string str [, int limit])

Here, pattern is the pattern to match, replacement is the string to replace each match with, str is the string in which the replacements are made, and the optional parameter limit is the maximum number of replacements to make.
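The effect of limit is easiest to see on a string with several matches. This is a sketch with an invented input string.

```php
<?php
$str = 'one two three four';

// No limit: every run of lowercase letters is replaced.
$all = preg_replace('/[a-z]+/', 'X', $str);

// Limit of 2: only the first two matches are replaced.
$firstTwo = preg_replace('/[a-z]+/', 'X', $str, 2);

echo $all . "\n";      // X X X X
echo $firstTwo . "\n"; // X X three four
```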

TIP

preg_replace is case sensitive by default (which means Jim and jim aren’t considered the same). To make a pattern case insensitive, add the i modifier (the “i” stands for “insensitive”) after the closing delimiter, as in /jim/i, so that Jim, jim, and JIM are all the same.
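Here is a small sketch of case-sensitive and case-insensitive replacement side by side, using the i pattern modifier placed after the closing delimiter. The sample sentence and variable names are invented for the example.

```php
<?php
$str = 'Jim, jim, and JIM walked in.';

// Case sensitive: only the lowercase "jim" matches.
$sensitive = preg_replace('/jim/', 'Bob', $str);

// The i modifier after the closing slash makes all three spellings match.
$insensitive = preg_replace('/jim/i', 'Bob', $str);

echo $sensitive, "\n";   // Jim, Bob, and JIM walked in.
echo $insensitive, "\n"; // Bob, Bob, and Bob walked in.
```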

Let’s try a replacement in which all of the links (<a> tags) in a string are replaced by the text “[Link]”. This requires that you go back to the href pattern you created before. The following code contains that expression:

$match = "<a.+href[ ]*=[ ]*['\"]?.*['\"]?.*>";

Although this matches the tag when the tag is the only thing in a variable, in a longer variable it’s too greedy: the expression can end up matching everything from the opening <a to the last > on the line. To stop this, turn off the greediness of the asterisk by following it with a question mark.
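The difference is easier to see on a smaller pattern. This sketch uses a simplified tag pattern (not the href pattern itself) and made-up variable names to compare the greedy asterisk with the non-greedy one.

```php
<?php
$str = 'A <b>bold</b> word.';

// Greedy: .* grabs as much as it can, so the match runs to the last >.
preg_match('/<.*>/', $str, $greedy);

// Non-greedy: .*? grabs as little as it can, so the match stops at the first >.
preg_match('/<.*?>/', $str, $lazy);

echo $greedy[0], "\n"; // <b>bold</b>
echo $lazy[0], "\n";   // <b>
```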

Another problem with the match string is that it only matches the opening tag and not the closing tag. We need to add a provision for it to match the closing tag, also. This is done with an “or” operator (|).

Here’s the code after those changes:
$match = "<a.+href[ ]*=[ ]*['\"]?.*?['\"]?.*?>|</a>";

From there, all we have to do is add the delimiter slashes and pass it to the function. In adding the delimiter slashes, you have to escape the forward slash in the closing link tag.

The following example completes the process:



<?php
/* ch05ex06.php – replaces all links in a page with [Link] */

$str = <<<END_OF_HTML
<a href="http://www.example.com">This</a> is a link.

If you want a <a href="www.example.com">link</a>, go here.
END_OF_HTML;

$match = "/<a.+href[ ]*=[ ]*['\"]?.*?['\"]?.*?>|<\/a>/i"; // case-insensitive

echo preg_replace($match, '[Link]', $str);

?>


The output for this segment is

[Link]This[Link] is a link.
If you want a [Link]link[Link], go here.

Some replacements with regular expressions are a little more complicated. For example, say you want to make all the e-mail addresses within a string clickable. To do this you need to find the e-mail addresses, then replace those with a string that includes the e-mail address you found both in the link and as the link text.

The first step in referencing matched text is to understand how parentheses affect it. Every set of parentheses in a regular expression marks a segment of the match that can be referenced later. If you don’t intend to reference a segment, it’s generally a good idea not to enclose it in parentheses unless you need them for grouping.

Now you need to be able to use the value matched by a particular set of parentheses. In the replacement string, each value is available as a dollar sign followed by an integer, numbered in sequence starting at one. The whole matched string is always $0, the first set of parentheses encountered from the left is $1, the second is $2, and so on.
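Before moving on to e-mail addresses, here is a minimal sketch of numbered references on a simpler case: rearranging the parts of a date. The date format, sample sentence, and variable names are assumptions made for this illustration.

```php
<?php
$str = 'The release date is 2004-07-15.';

// Three sets of parentheses: $1 is the year, $2 the month, $3 the day.
$pattern = '/([0-9]+)-([0-9]+)-([0-9]+)/';

// Single quotes keep PHP from treating $1, $2, and $3 as its own variables.
$result = preg_replace($pattern, '$2/$3/$1', $str);

echo $result, "\n"; // The release date is 07/15/2004.
```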

To match an e-mail address, use the expression




$matchEmail = '/[a-zA-Z0-9\.\-\_]+\@([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+/';

And to make it clickable, do a preg_replace like this:

$str = 'This is my email address: example@example.com. Try it!';
$matchEmail = '/[a-zA-Z0-9\.\-\_]+\@([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+/';


echo preg_replace($matchEmail, '<a href="mailto:$0">$0</a>', $str);



The preg_replace goes through the string and finds anything that looks like an e-mail address (as we’ve specified in the regular expression) and replaces it with a link, using the value found with the regular expression both as the link value and the link text. Here’s the HTML output:



This is my email address:
<a href="mailto:example@example.com">example@example.com</a>. Try it!

This covers the basic idea behind doing string replacements with references. Using references, you’re able to manipulate text to a virtually unlimited extent.

Filed under: Chapter 5 @ 7:02 pm