Regular Expressions for Network Engineers
How many times have you worked on a task that involved either updating every instance of a piece of configuration or creating a new piece of configuration at multiple points on a network device? You have translated the requirements into functional syntax, a blueprint, for the specific hardware platform, and now it is time to implement it tens of times on the device. How do you go about it?
For small, non-routine one-off tasks, the quickest way may be to jump on the device and repeat the manual labor N times at different places with slight variations, where N is hopefully small enough to justify the manual approach. This may also be the route taken by a junior network engineer who does not yet know more efficient methods.
Generating that configuration with regular expressions is a type of automation, as we aim to reduce, if not eliminate, manual processes that are well defined and repeatable. Automation can go a long way: multiple devices or device groups, automated login to these, implementation of config, and finally verification of status and rollback if needed, all launched in order by a single orchestrating script, say an Ansible Playbook. Let’s keep that for some other day and talk about simple config generation for a single device that we can apply manually.
Ok, enough of the need for regular expressions (regex), let’s get started.
Regular Expressions
A regular expression defines a search pattern. This by itself is useful. However, combined with a “replace” pattern, it becomes even more useful. Let’s see some rules for the search pattern first.
METACHARACTER | DESCRIPTION | EXAMPLE |
---|---|---|
. | Match any single character | “a.c” matches aac, abc, acc, a1c, a2c and so on |
* | Match preceding character zero or more times | “ab*c” matches ac, abc, abbc, abbbc and so on |
+ | Match preceding character one or more times | “ab+c” matches abc, abbc, abbbc and so on |
[ ] | Match any of the characters between the brackets. Special ranges: [0-9] matches digits 0 to 9; [A-Z] matches uppercase A to Z; [a-z] matches lowercase a to z; \d is the same as [0-9]; \w is the same as [a-zA-Z0-9_] | “a[bcde]f” matches abf, acf, adf, aef only. “int gig[1-3]/0” matches “int gig1/0”, “int gig2/0”, “int gig3/0”. “int [a-z]+1/0” matches “int eth1/0”, “int gig1/0”, “int f1/0”. “int \w+/0” matches “int eth1/0”, “int gig1/0”, “int gig2/0” |
\ | Don’t interpret the following single character as a regex metacharacter | “ab\*c” matches ab*c only. The “*” is not interpreted as a metacharacter. |
^ | Match the start of line | “^ntp” matches “ntp server …”, “ntp peer …”, but not “!ntp …” |
$ | Match the end of line | “eth3/10$” matches “int eth3/10”, “ip radius source-interface eth3/10”, but not “interface ethernet3/10” |
| | Logical OR two conditions | “eth|gig” matches “int eth1/0”, “int gig1/0” but not “int f1/0” |
( ) | Define a subexpression. It can be recalled later using \1, \2 .. or $1, $2 .. (more on this later). | “int (eth|gig|fe)1/0” matches “int eth1/0”, “int gig1/0”, “int fe1/0”. “int (eth|gig)([1-2])+/0” matches “int eth1/0”, “int eth2/0”, “int gig1/0”, “int gig2/0” |
? | Match the preceding character or subexpression zero or one time. The point to remember is that “?” is greedy, i.e. if the optional part is present in the input, it is always included in the match; the “zero” case applies only when it is absent. | “abc(de)?f” matches both “abcdef” and “abcf”. If “abcdef” is the input string, the greedy match includes the “de”. |
While there are other regex metacharacters, the above should be enough for a start.
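If you would like to sanity-check the table outside a text editor, here is a minimal sketch using Python's built-in `re` module. The sample strings are taken from the table (with “ntp server” shortened for illustration), and the helper name `matches` is my own. Python's regex flavor is close to, but not identical to, the one in Notepad++, so treat this as a rough check only.

```python
import re

# (pattern, sample text, expected result) triples based on the table above.
checks = [
    (r"a.c", "a1c", True),
    (r"ab*c", "ac", True),
    (r"ab+c", "ac", False),            # "+" needs at least one "b"
    (r"int gig[1-3]/0", "int gig2/0", True),
    (r"ab\*c", "ab*c", True),          # escaped "*" is a literal asterisk
    (r"^ntp", "ntp server", True),
    (r"^ntp", "!ntp server", False),   # "ntp" is not at the start of the line
    (r"eth3/10$", "ip radius source-interface eth3/10", True),
    (r"int (eth|gig|fe)1/0", "int fe1/0", True),
    (r"abc(de)?f", "abcf", True),      # the optional group may be absent
]


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


results = [matches(pattern, text) for pattern, text, _ in checks]
for (pattern, text, expected), got in zip(checks, results):
    print(f"{pattern!r:25} {text!r:40} -> {got}")
```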
Before moving on to use the regex we just learned in actual config generation, let’s talk briefly about a “replace” pattern. Fixed replace strings are something you have used countless times in typical Find/Replace operations in text editors. But what if you need the replace pattern to contain a part of the text matched by the search pattern? For that, you need to “reference” the earlier match.
To recall part of a match later in the replace pattern, we use the “( )” metacharacter in the search pattern as described earlier. The contents of the first “(” bracket in the search pattern are available in the replace pattern as \1, the second “(” as \2, the third “(” as \3 and so on up to \9, which holds the contents of the 9th “(” in the search pattern. Some text editors, e.g. Atom, use $1, $2 instead of \1, \2.
Let’s see some examples.
- Example Text: interface gig1
- Search Pattern: interface ([a-z]+)(\d+)
\1 in the replace pattern will match contents of ([a-z]+) = gig
\2 in the replace pattern will match contents of (\d+) = 1
- Example Text: interface gig1
- Search Pattern: interface ([a-z]+(\d+))
\1 in the replace pattern will match contents of ([a-z]+(\d+)) = gig1
\2 in the replace pattern will match contents of (\d+) = 1
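The same two examples can be reproduced with Python's `re` module, which also uses the \1, \2 convention in replace patterns. This is only a sketch to show the group numbering; the variable names and the replacement text are made up for illustration.

```python
import re

text = "interface gig1"

# Two side-by-side groups: group 1 is the name, group 2 is the number.
flat = re.search(r"interface ([a-z]+)(\d+)", text)

# Nested groups are numbered by the position of their opening bracket,
# so the outer group is 1 and the inner group is 2.
nested = re.search(r"interface ([a-z]+(\d+))", text)

print(flat.group(1), flat.group(2))
print(nested.group(1), nested.group(2))

# Recalling the groups in a replace pattern (illustrative replacement text).
renamed = re.sub(
    r"interface ([a-z]+)(\d+)",
    r"\1 is the type, \2 is the number",
    text,
)
print(renamed)
```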
Having seen some basic regex patterns, now it’s time to see how it could be useful in generating device configurations.
For the purpose of this post, I am using Notepad++ to generate the configuration. It is a simple, straight-to-the-point, all-in-one text editor. On Apple Mac, I am using Atom, a modular and highly customizable editor, which can be daunting at first and takes some time to set up, install the right packages and get used to. With respect to regex usage in Atom, use the “$1” convention instead of “\1” in the replace pattern and “\n” instead of “\r\n”.
Generate Configuration from Tabular Data
Suppose the configuration data exists in a tabular form in an Excel CSV file or in a design document containing network parameters and config snippets. The objective is to generate platform specific configuration from this file to build the physical device.
!!!! CSV File Containing VLAN SVI data
vlan,name,desc,ip,mask,dhcp1,dhcp2
10,data,Data VLAN,10.10.10.0,255.255.255.0,10.1.1.1,10.1.1.2
20,voice,Voice VLAN,10.10.20.0,255.255.255.0,10.1.1.1,10.1.1.2
!!!! SVI Configuration Snippet
vlan <VLAN>
 name <NAME>

interface vlan<VLAN>
 description <vlan-desc>
 ip address <ip> <mask>
 ip helper-address <helper1>
 ip helper-address <helper2>
 no shutdown
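The regex search and replace patterns to generate the target config from the above are given below. Apply them to the data rows only, because the header row also matches the search pattern and would produce a bogus stanza.
- Search Pattern: ^(.+),(.+),(.+),(.+),(.+),(.+),(.+)$
- Replace Pattern: vlan \1\r\n name \2\r\n\r\ninterface vlan\1\r\n description \3\r\n ip address \4 \5\r\n ip helper-address \6\r\n ip helper-address \7\r\n no shutdown\r\n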
The resulting configuration looks like this. Imagine there were 50 or 100 VLANs: how long would that have taken by hand?
vlan 10
 name data

interface vlan10
 description Data VLAN
 ip address 10.10.10.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown

vlan 20
 name voice

interface vlan20
 description Voice VLAN
 ip address 10.10.20.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown
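For comparison, the same CSV-to-config conversion can be sketched in Python with `re.sub`. This is not the Notepad++ workflow itself, only an equivalent under a few assumptions: the header row has already been removed, line endings are “\n” rather than “\r\n”, and the variable names are my own.

```python
import re

# Data rows only; the header row is assumed to have been removed.
csv_rows = """\
10,data,Data VLAN,10.10.10.0,255.255.255.0,10.1.1.1,10.1.1.2
20,voice,Voice VLAN,10.10.20.0,255.255.255.0,10.1.1.1,10.1.1.2
"""

# Same search pattern as above: seven comma-separated fields per line.
search = r"^(.+),(.+),(.+),(.+),(.+),(.+),(.+)$"

# Same replace pattern as above, with "\n" instead of "\r\n".
replace = (
    r"vlan \1\n name \2\n\ninterface vlan\1\n description \3\n"
    r" ip address \4 \5\n ip helper-address \6\n"
    r" ip helper-address \7\n no shutdown\n"
)

# re.MULTILINE makes ^ and $ match at the start and end of every line.
config = re.sub(search, replace, csv_rows, flags=re.MULTILINE)
print(config)
```

Each row's own line ending is left in place after the replacement, which is what produces the blank line between the two stanzas.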
Generate Tabular Data from Configuration
This can be a basic documentation method or a way to migrate from one hardware/vendor platform to another. Using the device config from the last section, regex search and replace patterns to generate CSV formatted tabular data are given below:
- Search Pattern: vlan (\d+)\r\n name (\w+)\r\n\r\ninterface vlan\d+\r\n description (.+)\r\n ip address ([0-9.]+) ([0-9.]+)\r\n ip helper-address ([0-9.]+)\r\n ip helper-address ([0-9.]+)\r\n no shutdown\r\n
- Replace Pattern: \1,\2,\3,\4,\5,\6,\7
And the resulting output is the same as before:
10,data,Data VLAN,10.10.10.0,255.255.255.0,10.1.1.1,10.1.1.2
20,voice,Voice VLAN,10.10.20.0,255.255.255.0,10.1.1.1,10.1.1.2
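The reverse direction can be sketched the same way in Python. Again, this assumes “\n” line endings and a config laid out exactly as generated in the previous section (one leading space on sub-commands, a blank line after the “name” line and between stanzas); the variable names are illustrative.

```python
import re

config = """\
vlan 10
 name data

interface vlan10
 description Data VLAN
 ip address 10.10.10.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown

vlan 20
 name voice

interface vlan20
 description Voice VLAN
 ip address 10.10.20.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown
"""

# Same search pattern as above, with "\n" instead of "\r\n".
search = (
    r"vlan (\d+)\n name (\w+)\n\ninterface vlan\d+\n description (.+)\n"
    r" ip address ([0-9.]+) ([0-9.]+)\n ip helper-address ([0-9.]+)\n"
    r" ip helper-address ([0-9.]+)\n no shutdown\n"
)

# Each whole stanza collapses into one CSV row.
csv_rows = re.sub(search, r"\1,\2,\3,\4,\5,\6,\7", config)
print(csv_rows)
```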
Accounting for missing configuration
Sometimes translating configuration into tabular data may not readily work due to inconsistent configuration across different stanzas. As an example, suppose VLAN 20 is missing the “description” bit under the SVI config.
vlan 10
 name data

interface vlan10
 description Data VLAN
 ip address 10.10.10.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown

vlan 20
 name voice

interface vlan20
 ip address 10.10.20.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown
Our search pattern needs to account for both the presence and absence of the description field. That is the task for our zero or one match presence checking greedy friend, the “?”. Here is how it will look:
- Search Pattern: vlan (\d+)\r\n name (\w+)\r\n\r\ninterface vlan\d+\r\n( description (.+)\r\n)? ip address ([0-9.]+) ([0-9.]+)\r\n ip helper-address ([0-9.]+)\r\n ip helper-address ([0-9.]+)\r\n no shutdown\r\n
- Replace Pattern: \1,\2,\4,\5,\6,\7,\8
And it converts it to this as CSV:
10,data,Data VLAN,10.10.10.0,255.255.255.0,10.1.1.1,10.1.1.2
20,voice,,10.10.20.0,255.255.255.0,10.1.1.1,10.1.1.2
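Notice the 3rd field, description, is blank in the second record. In the search pattern, \3 is now the whole optional description line and \4 is the description text inside it, which is why the replace pattern skips \3 and uses \4 for this field. For VLAN 20 the optional group matched nothing, so \4 is empty, hence the blank.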
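Here is a hedged Python sketch of the same optional-group trick. It assumes Python 3.5 or later, where an unmatched group in a `re.sub` replace pattern is substituted with an empty string, plus “\n” line endings; the variable names are my own.

```python
import re

# VLAN 20 has no "description" line under its SVI.
config = """\
vlan 10
 name data

interface vlan10
 description Data VLAN
 ip address 10.10.10.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown

vlan 20
 name voice

interface vlan20
 ip address 10.10.20.0 255.255.255.0
 ip helper-address 10.1.1.1
 ip helper-address 10.1.1.2
 no shutdown
"""

# Group 3 is the whole optional description line, group 4 the text inside it.
search = (
    r"vlan (\d+)\n name (\w+)\n\ninterface vlan\d+\n( description (.+)\n)?"
    r" ip address ([0-9.]+) ([0-9.]+)\n ip helper-address ([0-9.]+)\n"
    r" ip helper-address ([0-9.]+)\n no shutdown\n"
)

# Group 3 is skipped in the replace pattern; an unmatched group 4 becomes "".
csv_rows = re.sub(search, r"\1,\2,\4,\5,\6,\7,\8", config)
print(csv_rows)
```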
Conclusion
Regex can be super useful when you know how to use it, and together with the Notepad++ editor, found on pretty much any IT desktop machine, it is a powerful data analysis combination for a quick fix. The usage remains almost the same outside the realm of text editors. Many Cisco IOS, IOS XE, NX-OS and other vendor platforms support regex search patterns in show commands on the CLI. The caveat is that this is often only basic regex support. If you have used the Linux “grep” or “egrep” command line tools, search patterns work there too.
If your config automation requires more than a few search/replace tasks or some non-trivial if-else checks, and it is a frequent activity, scripting the logic in a programming language such as Python, or better still an automation framework like Ansible, may save you heaps of time in future. Even in such cases, an on-the-fly Notepad++ and regex combination can act as a powerful proof of concept and verification tool.