2-Regex

Exercises

Download the dictionary from benschmidt.org/words.txt and load it into a text editor.
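If you would rather run searches from a script than from a text editor, the same patterns work in Python's `re` module. This is a minimal sketch, assuming the dictionary holds one word per line. The short `WORDS` list and the `find_words` helper are hypothetical stand-ins for illustration, not part of the exercise.

```python
import re

# Hypothetical stand-in for the dictionary file; to use the real one,
# read words.txt and split its contents into lines instead.
WORDS = ["picalilli", "birkenstock", "regex", "illicit"]

def find_words(pattern, words):
    """Return the words that contain a match for the regex anywhere."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.search(w)]

# Five consecutive letters that are each "l" or "i", as in the example word.
print(find_words(r"[li]{5}", WORDS))
```

Using `search` rather than `match` matters here: `match` only looks at the start of each word, while `search` finds the pattern anywhere inside it, which is how a text editor's find box behaves.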

Searches

  1. The word “picalilli” contains five consecutive “l” or “i” letters. What word contains six consecutive “i” or “l” letters?

  2. What is the longest substring of your name whose letters all appear, in order, in a single dictionary word? For example, my name is “Ben Schmidt” and I can match the first five letters with the capitalized letters in the word BirkENStoCk. What is the regex for it?

  3. What dictionary words contain the same letter three times in a row?

  4. Besides the word found in question 1, are there any other words in the dictionary that contain a run of six consecutive letters made up of only two distinct letters? A run of seven?
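Questions about repeated letters usually call for a backreference, which matches whatever an earlier group captured. The sketch below shows the mechanism on doubled letters only, not the tripled letters question 3 asks about. The sample word list and the `doubled_letters` helper are hypothetical.

```python
import re

# Hypothetical sample words standing in for the dictionary.
SAMPLE = ["bookkeeper", "regex", "letter", "banana"]

# (.) captures any one character; \1 then requires that same character again.
DOUBLED = re.compile(r"(.)\1")

def doubled_letters(words):
    """Return the words that contain some character twice in a row."""
    return [w for w in words if DOUBLED.search(w)]

print(doubled_letters(SAMPLE))
```

Note that `banana` is not matched: its repeated letters are never adjacent, and `\1` must come directly after the captured character.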

Replacements

  1. Design a regex that replaces the text strings "NU" and "NEU" with the word “Northeastern.” For example, it would transform

    The NU huskies are competing in Thursday's game: email
    m.meehan@husky.neu.edu for more information.

    into

    The Northeastern huskies are competing in Thursday's game: email
    m.meehan@husky.Northeastern.edu for more information.

  2. Improve your regex so that it doesn’t replace strings that are part of longer words; for example, it should not replace “entrepreneur” with “entrepreNortheasternr”.

  3. Documents sometimes contain excessive whitespace (for instance, if you copy and paste from the Internet). Write a regex that reduces any run of spaces or line breaks down to a single space. For example, this text:

    Good day, everyone.
    1     4      6
    
    3   4       10
    Good night, ladies

    would be reduced to:

    Good day, everyone. 1 4 6 3 4 10 Good night, ladies

  4. Write a regex that changes the spelling of all words in a document so that they conform to the rule “i before e, except after c.”
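For readers working in a script rather than an editor's find-and-replace box, here is one possible sketch of the first three replacements using Python's `re.sub`. It is only one approach under stated assumptions: the `expand` and `squeeze` helper names are hypothetical, and matching case-insensitively is an assumption drawn from the example, where lowercase "neu" is replaced.

```python
import re

text = ("The NU huskies are competing in Thursday's game: email\n"
        "m.meehan@husky.neu.edu for more information.")

# \b marks a word boundary, so the "neu" inside "entrepreneur" is left alone.
# E? makes the "E" optional, covering both "NU" and "NEU".
# re.IGNORECASE is an assumption so that lowercase "neu" also matches.
NU_PATTERN = re.compile(r"\bNE?U\b", re.IGNORECASE)

def expand(s):
    """Replace standalone NU/NEU with Northeastern."""
    return NU_PATTERN.sub("Northeastern", s)

def squeeze(s):
    """Collapse each run of spaces, tabs, or line breaks to one space."""
    return re.sub(r"\s+", " ", s)

print(expand(text))
print(squeeze("1     4      6\n\n3   4       10"))
```

A side effect of the case-insensitive flag is that a standalone lowercase "nu" would also be replaced; dropping the flag and listing the accepted spellings explicitly is the stricter alternative.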

Concordances

An online version of the Bible is at dighist15.benschmidt.org/bible. It lets you filter and replace across the whole text at once. Because this can take some time to run, it initially shows results only for the book of Matthew.

  1. Create a regex that reduces the Bible to a concordance for the word “love” that shows three words before and three words after.

  2. Edit that regex so that it includes the book/line/verse number at the beginning.
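The idea behind a concordance pattern can be tried offline before using the site. The following is a rough sketch on a made-up sentence, assuming that a "word" is any run of non-whitespace characters. The sample text and the `concordance` helper are hypothetical, and the site's own filter-and-replace interface may behave differently.

```python
import re

# Hypothetical sample text, not taken from the site.
text = "one two three four love five six seven eight"

# (?:\S+\s+){0,3} takes up to three words before the keyword;
# (?:\s+\S+){0,3} takes up to three words after it.
# \b on both sides keeps "love" from matching inside "loved" or "glove".
CONTEXT = re.compile(r"(?:\S+\s+){0,3}\blove\b(?:\s+\S+){0,3}")

def concordance(s):
    """Return each occurrence of 'love' with up to three words either side."""
    return [m.group(0) for m in CONTEXT.finditer(s)]

print(concordance(text))
```

The `{0,3}` bounds, rather than a fixed `{3}`, let the pattern still match when the keyword sits fewer than three words from the start or end of the text.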