In this section, you'll practice
creating regular expressions that achieve desired search results. The answers
are embedded in the source code, so you can VIEW the source code if you get stumped.
However, only look after you've tried yourself!
Create an HTML document containing the following code:
<script language="JavaScript">
string = "\"The quick brown fox jumps over the lazy dog,\" she said. That's that: 3 more to go!";
retv = string.match(/a/g);
document.write(string,"<BR>");
document.write("returned values: ",retv,"<BR>");
</script>
This code returns the following; note that it located all the letters "a" in the search string:
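As a rough illustration of what match() hands back when the "g" flag is used, here is a minimal sketch. The sample string and variable names are hypothetical (chosen so the exercises below are not given away), and it uses console.log instead of document.write so that it also runs outside a browser:

```javascript
// Hypothetical sample string, not the one used in the exercises.
const sample = "banana bread";

// With the g (global) flag, match() returns an array holding every match.
const allMatches = sample.match(/n/g); // ["n", "n"]

// When nothing matches, match() returns null rather than an empty array.
const noMatches = sample.match(/z/g); // null

// document.write() converts the array to text, which joins the items with commas.
const shown = String(allMatches); // "n,n"

console.log(allMatches, noMatches, shown);
```

This is why the "returned values" line on the page shows the matches separated by commas.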
Now, you'll change the regular expression "/a/g" to achieve the following results:
Locate only the FIRST letter "a":
Answer:
Locate all letters "b":
Answer:
Locate all letter sequences that look like the letter "a" followed by any other character:
Answer:
Locate all letter sequences "az":
Answer:
Locate all letter sequences "ab": (Note: there aren't any!)
Answer:
Locate all letter sequences that look like the letters "at" followed by any other character:
Answer:
Locate all letter sequences that look like the letter "a" followed by any other character, followed by
the letter "y":
Answer:
Locate any of the lowercase vowels: a, e, i, o, u:
Answer:
Locate only the FIRST lowercase vowel that you come to:
Answer:
Locate all uppercase letters "T", and all lowercase letters "t":
Answer:
Locate two-character patterns that begin with either "T" or "t" and are followed by any other character:
Answer:
Locate three-character patterns that begin with either "T" or "t", are followed by any character,
and then followed by the letter "e":
Answer:
Locate three-character patterns: the first character can be any letter
between 'a' and 'g'; the second character can be anything; the third character is a letter
between 'h' and 'z':
Answer:
Locate three-character patterns: the first character can be any letter
between 'a' and 'g'; the second character can be any LETTER (uppercase or lowercase); the third character is a letter
between 'h' and 'z':
Answer:
Locate any three-character pattern that begins with ANY uppercase letter:
Answer:
Make an educated guess (a conjecture) about how you could locate a PERIOD ('.');
try it out!
Answer:
Locate some common punctuation symbols: the PERIOD (.), COMMA (,),
APOSTROPHE ('), DOUBLE-QUOTE ("), COLON (:), and EXCLAMATION POINT (!):
Answer:
Locate all numbers:
Answer:
Try to figure out how to locate spaces:
Answer:
Now, try to locate all three-letter WORDS. (Actually, only those with a space before and
a space after, for now; you can return them with the spaces before and after):
Answer:
Create an HTML document containing the following code:
<script language="JavaScript">
string = "\"The quick brown fox jumps over the lazy dog,\" she said. THAT'S THAT: 3 more to go!";
retv = string.match(/a/g);
document.write("string: ",string,"<BR>");
document.write("returned values: ",retv,"<BR>");
</script>
Locate the first letter that is NOT an 'a':
Hint: The negating character, the caret (^), MUST be used inside a character class, even if there's
only a single letter.
Answer:
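To make the hint concrete, here is a small sketch of a negated character class. The string and variable names are made up for illustration and are not part of the exercise:

```javascript
// Hypothetical string for illustration only.
const word = "xxyxz";

// [^x] means "any single character that is NOT an x".
// Without the g flag, only the first such character is matched.
const firstNonX = word.match(/[^x]/)[0]; // "y"

// With the g flag, every character that is not an x is returned.
const allNonX = word.match(/[^x]/g); // ["y", "z"]

console.log(firstNonX, allNonX);
```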
Locate ALL letters that are not 'a' (uppercase or lowercase):
Answer:
Locate ALL characters that are NOT letters of the alphabet (a-z or A-Z):
Answer:
Locate ALL characters that are NOT letters of the alphabet (uppercase or lowercase) or spaces:
(Make sure that you understand this answer!)
Answer:
Locate all patterns that begin with t (uppercase or lowercase), and have any number of alphabetic
letters following (a-z, A-Z):
Answer:
Try to locate all words (no punctuation) that begin with either 't', 'b' or 'd' (uppercase or lowercase):
Answer:
Locate all sentences (begin with uppercase letter, end with period, or exclamation point, or question
mark):
Answer:
A teacher claims that a student uses too many commas. Locate sentences without any commas! (Modify
your previous answer.)
Answer:
A teacher claims that a student uses too many colons. Locate sentences without any colons! (Modify
your previous answer.)
Answer:
A teacher claims that a student over-uses capitalization for emphasis. Locate all CAPITALIZED words (that have
more than one letter):
Answer:
Locate all words that begin with 't' or 'T' and have MORE THAN TWO letters. (This will be easier to do
after tomorrow's lesson.)
Answer:
Locate all words that begin with 't' or 'T' and have MORE THAN THREE letters. (This will be easier to do
after tomorrow's lesson.)
Answer:
Locate all five-letter words (no punctuation). (This will be easier after tomorrow's class.)
Answer:
Note that the previous sentence had no words longer than five characters. Try locating all five-letter
words the same way you did in the previous question, and see what happens:
Answer:
Try to fix the problem in the previous example! Try forcing a space
before the word, and either a space or punctuation after. Why don't we get the word 'house' returned? (Note:
some of these problems will be resolved when we study "Anchoring Patterns".)
Answer:
Match an 'h' followed by an optional 'i':
Answer:
Match an 'h' followed by one or more 'i':
Answer:
Match an 'h' followed by zero or more 'i':
Answer:
Match words beginning with 'f', followed by a vowel that is NOT an 'a':
Answer:
Match words beginning with 'f', followed by any letter that is NOT an 'a':
Answer:
Create an HTML document containing the following code:
<script language="JavaScript">
string = "different regular expressions will go here";
retv = string.match(/a/g);
document.write("string: ",string,"<BR>");
document.write("returned values: ",retv,"<BR>");
</script>
You'll be changing both the string = and the retv = lines
for each of these exercises.
PREDICT what will be returned by each of the following regular expressions.
Then, test your predictions! Make sure that you UNDERSTAND all the returned values.
string = "x1xx2xxx3xxxx4xxxxx5xxxxxx6xxxxxxx7xxxxxxxx8"
retv1 = /x/;
retv2 = /x/g;
retv3 = /x+/g;
retv4 = /x{2,4}/g;
retv5 = /x{2,4}[1-9]/g;
retv6 = /x{5,}/g;
retv7 = /x{5}/;
retv8 = /x{5}/g;
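If the brace multipliers are still unfamiliar, this sketch shows how they behave on a shorter, hypothetical string (not the one above, so your predictions are not spoiled). The variable names are invented for the example:

```javascript
// Hypothetical string: runs of 1, 2, 3 and 4 letters "a".
const runs = "a aa aaa aaaa";

// {2,3}: at least two and at most three; the match takes as many as it can.
const twoToThree = runs.match(/a{2,3}/g); // ["aa", "aaa", "aaa"]

// {3,}: three or more, with no upper limit.
const atLeastThree = runs.match(/a{3,}/g); // ["aaa", "aaaa"]

// {2}: exactly two; longer runs are cut into pieces of two.
const exactlyTwo = runs.match(/a{2}/g); // ["aa", "aa", "aa", "aa"]

console.log(twoToThree, atLeastThree, exactlyTwo);
```

Notice that the run of four letters yields only "aaa" for {2,3}: the single leftover "a" is too short to match again.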
Locate all four-character patterns that begin with 't' and end with 't'. (Use a general
multiplier for the middle part.)
Answer:
Locate all four-character patterns that begin with 't' and end with 't' or 'n'. (Use a general
multiplier for the middle part.)
Answer:
Locate all four-to-six character patterns that begin with 't' and end with a space. (Note:
the space counts as one of the characters.)
Answer:
Locate all four-to-six character patterns that begin with 'th' and end with a space:
Answer:
The next exercise will give you practice with the "leftmost is greediest" idea:
PREDICT what will be returned by each of the following. Then, TEST your predictions!
string = "a xxx c xxxxxxxx c xxx d";
retv1 = string.match(/a.*c/);
retv2 = string.match(/a.*c.*d/);
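After you have made your predictions, the following sketch may help you check your understanding of greediness. It uses a different, hypothetical string, and the variable names are invented for the example:

```javascript
// Hypothetical string with two possible stopping points for the match.
const text = "p--q--q";

// .* is greedy: it grabs as much as it can, so the match runs to the LAST q.
const greedy = text.match(/p.*q/)[0]; // "p--q--q"

// A negated character class cannot run past a q, so the match stops at the FIRST q.
const stopsEarly = text.match(/p[^q]*q/)[0]; // "p--q"

console.log(greedy, stopsEarly);
```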
USING VARIABLES IN REGULAR EXPRESSIONS:
Using variables in regular expressions is a bit tricky. Suppose you want to find the letter "C"
in a string. Using variables, you might try something like this:
str = "Carol";
ltr = "C";
retv = str.match(/ltr/);
document.write(retv);
Try it! Here's the result you'll get:
The reason this returns "null" is that /ltr/ looks for the literal letters
"ltr", NOT the contents of the variable "ltr"! And, there are no letters "ltr"
in the string "Carol"! So, HOW can we force JavaScript to treat the contents
of the variable "ltr" as a regular expression? By CREATING an actual
REGULAR EXPRESSION from the desired string with the RegExp() constructor, like this:
str = "Carol";
ltr = "C";
var ltrregexp = new RegExp(ltr);
retv = str.match(ltrregexp);
document.write(retv);
NOTE: give the RegExp(string_to_be_made_into_regular_expression) constructor the pattern text WITHOUT
slashes. For example, new RegExp("C") builds the same regular expression as the literal /C/.
When we try out this new code, we get what we want:
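The sketch below repeats the idea on a different, hypothetical string and also passes flags as a second argument to RegExp(), which is how a "g" can be added when the pattern comes from a variable. The variable names are invented for the example, and console.log stands in for document.write:

```javascript
// Hypothetical string and search letter.
const str = "Cucumber Crunch";
const ltr = "C";

// /ltr/ searches for the three literal letters l, t, r, so nothing is found.
const literalName = str.match(/ltr/); // null

// new RegExp(ltr) builds the pattern from the variable's contents, like /C/.
const firstOnly = str.match(new RegExp(ltr))[0]; // "C"

// The optional second argument supplies flags, so this behaves like /C/g.
const everyOne = str.match(new RegExp(ltr, "g")); // ["C", "C"]

console.log(literalName, firstOnly, everyOne);
```

One caution: if the variable holds a character that is special in regular expressions (a period, for instance), it keeps its special meaning inside RegExp().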
(10 pts) WRITE A FUNCTION that accepts four inputs: a string (str),
a beginning character (char1), an ending character (char2),
and a length (n). It should match the first pattern that has length n,
that begins with char1 and ends with char2.
For example, if the name of the function is matchthis, then the function call
matchthis("cat cut cot","c","t",3)
should return:
(5 pts) Study the class handout on RegExp.
Figure out how to revise the previous function so that it returns ALL matching patterns. For example,
the function call
Create an HTML document containing the following code:
<script language="JavaScript">
string = "different regular expressions will go here";
retv = string.match(/a/g);
document.write("string: ",string,"<BR>");
document.write("returned values: ",retv,"<BR>");
</script>
You'll be changing both the string = and the retv = lines
for each of these exercises.
Try to PREDICT what will be returned by each of the following regular expressions.
Then, test your predictions! Make sure that you UNDERSTAND all the returned values.
string = "fredxbarneyx fredxbarneyy fredtbarneyt fredtbarneyx";
retv1 = string.match(/fred(.)barney\1/);
retv2 = string.match(/fred.barney./);
Try to PREDICT what will be returned by each of the following regular expressions.
Then, test your predictions! Make sure that you UNDERSTAND all the returned values.
string = "axbycydx atbqcqdt axbycxdy";
retv1 = string.match(/a(.)b(.)c\2d\1/g);
retv2 = string.match(/a(.)b(.)c\1d\2/g);
Try to PREDICT what will be returned by each of the following regular expressions.
Then, test your predictions! Make sure that you UNDERSTAND all the returned values. (Be careful
on the second one! Remember the "greedy" nature of things! It may not return what you first think...)
string = "aFREDbFREDc aCarolbCarolcdefg abc aXXbXXXc aFREDbCarolcCarol abJULIAcJULIAdefg";
retv1 = string.match(/a(.*)b\1c/g);
retv2 = string.match(/a.*b(.*)c\1/g);
So, now we've learned that parentheses have special meaning in the language of regular expressions; they
are used to "remember" stuff. So, if you really WANT to match parentheses, you must put a backslash in
front of them! PREDICT what you'll get for each of these!
string= "hat sat bat cat hut cut but sit hit";
retv1 = string.match(/hat|bat/);
retv2 = string.match(/hat|bat/g);
retv3 = string.match(/.at|.ut/);
retv4 = string.match(/.at|.ut/g);
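Returning to the point that parentheses "remember" and must be backslashed to be matched literally, here is a minimal sketch of both uses. The string and variable names are hypothetical:

```javascript
// Hypothetical string containing real parentheses and a doubled letter.
const note = "call f(x) or ff";

// Plain parentheses remember what they matched; \1 asks for the same text again.
const grouped = note.match(/(f)\1/);
const wholeMatch = grouped[0]; // "ff"
const remembered = grouped[1]; // "f"

// Backslashed parentheses match actual ( and ) characters in the string.
const literal = note.match(/f\(x\)/)[0]; // "f(x)"

console.log(wholeMatch, remembered, literal);
```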
Create an HTML document containing the following code:
<script language="JavaScript">
string = "different regular expressions will go here";
retv = string.match(/a/g);
document.write("string: ",string,"<BR>");
document.write("returned values: ",retv,"<BR>");
</script>
You'll be changing both the string = and the retv = lines
for each of these exercises.
Be sure you can answer the questions:
"What do anchors allow you to ensure?"
"What are the four anchors? Briefly describe each."
Be sure you can answer the questions:
"What is a word boundary?"
"What characters match \w ?"
"What characters match \W ?"
We've studied two different uses for the "caret" ^ in regular expressions.
What are these two uses?
Predict what will be returned in each case below. In each case, be sure that
you can CIRCLE, in the string, exactly what part(s) are being returned. (Make sure
that you really understand retv7 and retv8! Here's the "greedy" stuff coming into play again!)
"Word boundary" is actually a tricky concept to conquer. Note that a
"word boundary" is NOT a character; it's a position BETWEEN two characters (a word
character, and a non-word character). For example, in the string "abc def", there
are TWO word boundaries (and a space) between the "abc" and the "def". (Hopefully,
this little discussion will help you with this exercise!) PREDICT what will be returned in each case below:
string = "abcdef abc def";
retv1 = string.match(/\b[a-zA-Z]+\b/g);
retv2 = string.match(/\b\w+\b/g);
retv3 = string.match(/abc\bdef/);
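One way to "see" word boundaries, once you have made your predictions, is to mark each one with a visible character. This sketch does that for the string "abc def" from the discussion above; the variable names and the choice of "|" as a marker are just for illustration:

```javascript
// The example string from the discussion of word boundaries.
const phrase = "abc def";

// \b matches a POSITION, not a character, so replacing it inserts a marker
// at each boundary without removing anything from the string.
const marked = phrase.replace(/\b/g, "|"); // "|abc| |def|"

// Count the markers: one before "abc", two around the space, one after "def".
const boundaryCount = marked.split("|").length - 1; // 4

console.log(marked, boundaryCount);
```

The two markers on either side of the space are the TWO word boundaries between "abc" and "def".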
Write code that will return all words (alphabetic only): thus, should return "kite,hat,great".
Write code that will return all "words" (alphabetic, numbers, and underscore): thus, should return "kite,crate1,_crate2,hat,great".
Write code that will return all "words" (alphabetic, numbers, but NO leading underscore): thus,
should return "kite,crate1,hat,great".