Changing Text with Sed

You have probably used grep to search for text in files, perhaps with complex regular expressions that find text meeting multiple requirements (e.g. “find words that start with a capital letter at the beginning of a line”). Many other tools use these regular expressions, and sed is one of them. Although you can also use sed to find text, much like grep, its main purpose is to modify text, and that is the focus of this guide. If you need a refresher on regular expressions, the guide to filtering data using AWK has lots of examples and explanations.

Working with Complex Text

Small changes to text are often quickest to make in the editor of your choice, but in some cases that is impractical. For example, consider the ASCII art/Markdown table below:

| city           | state      | area code |
|----------------|------------|-----------|
| New york       | New york   | 718/212   |
| San francisco  | California | 415       |
| Chicago        | Illinois   | 773/312   |
| Denver         | Colorado   | 720       |
| Honolulu       | Hawaii     | 808       |

This table has a very specific structure that makes changing it difficult in a text editor; swapping two columns, for example, is a tedious task. The same is true for many other text formats, such as CSV and TSV files. Ideally, a table like this would be generated automatically from a database, and swapping columns would be a matter of swapping two values in the process that generates it, but sometimes all you have is the output.
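If you want to try the commands in this guide yourself, you can save the table above as table.md in the current directory. All the examples assume this file name. The sketch below uses a shell here-document to do this. Note that it overwrites any existing table.md. Quoting the 'EOF' delimiter stops the shell from expanding anything inside the block:

```shell
#!/bin/sh
# Write the example table to table.md in the current directory.
# The quoted 'EOF' delimiter means the lines are written exactly as shown.
cat > table.md <<'EOF'
| city           | state      | area code |
|----------------|------------|-----------|
| New york       | New york   | 718/212   |
| San francisco  | California | 415       |
| Chicago        | Illinois   | 773/312   |
| Denver         | Colorado   | 720       |
| Honolulu       | Hawaii     | 808       |
EOF

# Show the result
cat table.md
```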

First, let’s have a look at what this table renders as in Markdown:

city	state	area code
New york	New york	718/212
San francisco	California	415
Chicago	Illinois	773/312
Denver	Colorado	720
Honolulu	Hawaii	808

Replacing Words

As a warm-up, let’s replace “Hawaii” with “Hawai’i”, assuming that our Markdown table is in a file called “table.md”:

sed -e "s/Hawaii /Hawai'i/" table.md

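This tells sed to apply the expression given after -e to the file table.md. The expression uses the s (substitute) command. It searches for Hawaii followed by a space and replaces the match with Hawai'i. Note the space at the end of the search text. Dropping it compensates for the added apostrophe, so the column keeps its width and the table stays aligned. By default, sed prints the modified text to standard output and leaves table.md itself unchanged. To change the file in place (without producing any output), give sed the -i flag. With the BSD sed on macOS and FreeBSD, -i expects a backup suffix as its argument. The form -i.bak works with both GNU and BSD sed and keeps a copy of the original file in table.md.bak.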
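Here is a minimal sketch of the difference between the default behaviour and -i. It assumes GNU sed. It works on a throwaway one-row file created with mktemp, so table.md is not touched. The variable names (demo, printed, unchanged, edited) are only for illustration:

```shell
#!/bin/sh
# Throwaway one-row file (the name "demo" is illustrative)
demo=$(mktemp)
printf '%s\n' '| Honolulu       | Hawaii     | 808       |' > "$demo"

# Default behaviour: the changed line goes to standard output...
printed=$(sed -e "s/Hawaii /Hawai'i/" "$demo")
# ...while the file on disk still contains "Hawaii"
unchanged=$(cat "$demo")

# With -i (GNU sed), the file itself is rewritten and nothing is printed
sed -i -e "s/Hawaii /Hawai'i/" "$demo"
edited=$(cat "$demo")

printf 'printed:   %s\nunchanged: %s\nedited:    %s\n' "$printed" "$unchanged" "$edited"
rm -f "$demo"
```

The edited line has exactly the same length as the original, which is why the table stays aligned.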

So far so good, but this is very straightforward and can be done just as easily with a text editor. Let’s move on to something more complex and realistic.

Changing Text

You may have noticed that the capitalisation in our table is inconsistent and sometimes wrong; for example, the “York” in “New York” should be capitalised. In this small example, it’s easily fixed with a series of replacement commands like the one above. If the file were much larger, with many more values, you wouldn’t want to keep iterating until you’ve fixed everything. Fortunately, you can use sed to capitalise each word, regardless of what it is:

sed -e "s/\b\([A-Za-z']\+\)/\u\1/g" table.md

Let’s take this apart. You are instructing sed to search for the regular expression \b\([A-Za-z']\+\) (the part after the s/ and before the second /). \b matches a word boundary. This is the zero-width position between a word character and a non-word character (or the start or end of the line). It marks where a word begins without consuming any characters itself. It is followed by \(, which opens a so-called capture group. This does not match anything on its own. Instead, it tells the regular expression engine to keep track of anything matched inside it for later reference (remember that you want to capitalise each word, regardless of what it is). Inside this group, you match one or more (\+) characters from the set A-Za-z'. That is, any upper- or lower-case letter, along with the apostrophe (to capture “Hawai’i”). Then you close the capture group with \). The backslashes in front of the parentheses and the + sign are required because sed uses basic regular expressions (BRE) by default. In a BRE, (, ) and + are ordinary characters, so you need \( and \) to form a group, and \+ is the GNU extension for “one or more”. The escapes have nothing to do with the shell: inside double quotes, the shell passes these backslashes through to sed unchanged.
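To confirm that the backslashes are a matter of regular-expression syntax rather than shell quoting, you can compare the command with its equivalent under sed's -E flag. This flag switches to extended regular expressions (ERE), in which (, ) and + are special without backslashes. This sketch assumes GNU sed and runs on a single sample row. The variable names are illustrative:

```shell
#!/bin/sh
# One sample row from table.md (hypothetical variable names throughout)
row='| New york       | New york   | 718/212   |'

# Basic regular expressions (sed's default): \( \) form a group, \+ means "one or more"
bre=$(printf '%s\n' "$row" | sed -e "s/\b\([A-Za-z']\+\)/\u\1/g")

# Extended regular expressions (-E): ( ) and + are special without backslashes
ere=$(printf '%s\n' "$row" | sed -E "s/\b([A-Za-z']+)/\u\1/g")

# Both lines read: | New York       | New York   | 718/212   |
printf '%s\n' "$bre" "$ere"
```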

Capture Groups

You have matched individual words; now you need to change them. This is the job of the part of the sed expression between the second forward slash and the final /g, which is \u\1. It instructs sed to take what’s contained in the first capture group (\1, a backreference to the word you matched before) and change its first letter to upper case (\u). That is, each matched word is replaced by a version of itself with the first letter in upper case. The g flag at the end of the sed expression tells sed to do this for every match on a line, not just the first one (g stands for “global”).
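The effect of the g flag is easy to see on a row whose first word is already capitalised. Below is a minimal sketch, assuming GNU sed. The variable names are illustrative:

```shell
#!/bin/sh
row='| San francisco  | California | 415       |'

# Without g, only the first match ("San") is replaced, so nothing visibly changes
first_only=$(printf '%s\n' "$row" | sed -e "s/\b\([A-Za-z']\+\)/\u\1/")

# With g, every word on the line is capitalised, including "francisco"
global=$(printf '%s\n' "$row" | sed -e "s/\b\([A-Za-z']\+\)/\u\1/g")

printf '%s\n' "$first_only" "$global"
```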

Capture groups are a very powerful concept. They let you “mark” arbitrary parts of the text matched by a regular expression for later use when assembling the replacement text. A regular expression can have any number of capture groups, and they can be nested arbitrarily. However, sed only provides the backreferences \1 to \9, so you can refer back to the first nine groups only.
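To illustrate nesting, the sketch below wraps a whole area-code pair such as 773/312 in an outer group and each number in an inner group. It assumes GNU sed, and the variable names are illustrative. Groups are numbered by the position of their opening parenthesis. So \1 is the outer group, \2 the first number and \3 the second:

```shell
#!/bin/sh
pair='773/312'

# \1 = whole pair (outer group), \2 = first number, \3 = second number.
# The / inside pattern and replacement is escaped because / is the s delimiter.
reordered=$(printf '%s\n' "$pair" | sed -e 's/\(\([0-9]\+\)\/\([0-9]\+\)\)/[\1] \3\/\2/')

# Prints: [773/312] 312/773
printf '%s\n' "$reordered"
```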

Note that the capitalisation command requires GNU sed. The \u escape for upper-casing letters is a GNU extension. Non-GNU versions of sed, such as those shipped with macOS and FreeBSD, do not support it. \b and \+ are GNU extensions as well and may also be unavailable there.
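If you are unsure which sed you have, you can probe for the \u extension directly instead of relying on version strings. Below is a sketch. The variable names probe and sed_has_u are illustrative. The exact output of a non-GNU sed for this input is not assumed, only that it differs from Hello:

```shell
#!/bin/sh
# GNU sed turns "hello" into "Hello"; a sed without the \u extension will not.
probe=$(printf 'hello\n' | sed -e 's/\(h\)/\u\1/' 2>/dev/null)

if [ "$probe" = "Hello" ]; then
    sed_has_u=yes
else
    sed_has_u=no
fi
printf 'sed_has_u=%s\n' "$sed_has_u"
```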

Our table now looks like this:

City	State	Area Code
New York	New York	718/212
San Francisco	California	415
Chicago	Illinois	773/312
Denver	Colorado	720
Honolulu	Hawai’i	808

Swapping Columns

Finally, let’s swap the first two columns of our table, which would be very tedious in a text editor. With sed and what you’ve learned so far, it’s straightforward:

sed -e "s/|\([^|]\+\)|\([^|]\+\)|/|\2|\1|/" table.md

You need two capture groups here, one for each column. To swap them, the replacement expression (between the second and third forward slash) outputs the second capture group followed by the first: \2 followed by \1, with | characters around and between them. The regular expression part has two identical capture groups, \([^|]\+\), likewise surrounded and separated by | characters. Each group matches one or more (\+) characters that are not |, because a | marks the start of the next column. The ^ at the start of the bracket expression [^|] negates it, so the expression matches any character except the ones listed. Other columns can be matched and swapped similarly.
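For example, swapping the second and third columns instead takes three groups. The first captures the leading | and the first column so that they can be written back unchanged. The ^ anchor makes the match start at the beginning of the line. Below is a sketch, assuming GNU sed and using two rows of the table as illustrative input:

```shell
#!/bin/sh
# Two rows from table.md (illustrative input)
rows='| city           | state      | area code |
| Chicago        | Illinois   | 773/312   |'

# \1 = leading "|" plus column 1 (kept as is), \2 = column 2, \3 = column 3
swapped=$(printf '%s\n' "$rows" | sed -e 's/^\(|[^|]\+\)|\([^|]\+\)|\([^|]\+\)|/\1|\3|\2|/')

printf '%s\n' "$swapped"
```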

And here is the final table:

State	City	Area Code
New York	New York	718/212
California	San Francisco	415
Illinois	Chicago	773/312
Colorado	Denver	720
Hawai’i	Honolulu	808
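sed applies each -e expression to every line in the order given. This means the three steps can also be chained into a single command. The sketch below assumes GNU sed. It first writes the original table into a temporary directory, so any table.md you already have is not touched. The variable names dir and final are illustrative. Its output is the Markdown source of the final table above:

```shell
#!/bin/sh
# Work in a temporary directory with a fresh copy of the original table
dir=$(mktemp -d)
cat > "$dir/table.md" <<'EOF'
| city           | state      | area code |
|----------------|------------|-----------|
| New york       | New york   | 718/212   |
| San francisco  | California | 415       |
| Chicago        | Illinois   | 773/312   |
| Denver         | Colorado   | 720       |
| Honolulu       | Hawaii     | 808       |
EOF

# 1) fix "Hawaii", 2) capitalise every word, 3) swap the first two columns
final=$(sed -e "s/Hawaii /Hawai'i/" \
            -e "s/\b\([A-Za-z']\+\)/\u\1/g" \
            -e "s/|\([^|]\+\)|\([^|]\+\)|/|\2|\1|/" "$dir/table.md")

printf '%s\n' "$final"
rm -rf "$dir"
```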

