This is HPR episode 1986 entitled Introduction to sed, Part 2. It is hosted by Dave Morriss and is about 61 minutes long. The summary is: some more about the GNU sed command.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15, that's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Hello everyone, this is Dave Morriss, and I've got episode 2 of my series on the sed command. In the last episode we had a look at sed at a fairly simple level: the sort of thing that I certainly learned when I first started using sed. I found it a bit confusing, I have to admit, so I just kept to the simple stuff. We looked at some of the command-line options and we started looking at regular expressions. We're going to look at both of these subjects in a bit more detail this time.

I think I said last time that it's GNU sed that we're looking at here; I probably did refer to it, I can't remember. It has quite a few extensions over the original version of sed, which complies with the POSIX standard. These extensions provide a fair number of extra features but, and here's the point of this, sed scripts that you write using them are not necessarily portable. So if you're moving to another Unix system or a BSD system or something like that, you might find that they don't work because of these extensions. Just bear that in mind if that's what you're likely to do.

There are a couple of new data files in this episode, which I've mentioned in the notes. We'll talk about them as we come to them.

So, looking at command-line options: we looked at the -e option to introduce expressions and the -f option for files, and we'll look at a few more today. There's quite a number of them actually, and I'm not planning to cover all of them in this series. I've referred you to the GNU sed manual for the whole list if you ever need them.

Let's start with -n. There are two other alternatives for this.
It's --quiet or --silent. Now, as you probably gathered last time, sed prints out the pattern space at the end of each cycle through the script. Remember we talked about this: a line is taken in by sed and stored in a place called the pattern space, then the script that you've defined is applied to that line, and then it's printed. The -n option and its variants disable this automatic printing, and sed only produces output when you tell it to explicitly. There's a flag to the s command which does this, the p flag, and there's also a p command, which we'll talk about in the next episode.

The next option is -i, also referred to as --in-place. This can be followed by a suffix, and what it does is make sed edit files in place. What we've been doing so far is to give sed a file, or feed it some sort of text on standard input, and then it works on it and puts the result out on its standard output channel. In no case have we actually seen it writing stuff back to the file. You can't just redirect the output back to the same file, because you can't do that sort of thing in Unix: the shell truncates the file before sed reads it, so you end up effectively deleting its contents.

So the -i option is for editing the file in place. If you provide a suffix after the -i (a usual one is .sav or .bak or something), then the original file is renamed by adding that suffix to the end of its name, and the edited file, the changed file, is given the original name. So when you go looking you see two files where there was one; the one with the suffix on the end is the original copy. If you don't give a suffix at all then the original file is replaced by the edited file, so you can't go back.

Now, by default sed treats all input files on the command line as a single stream of data. When the -i option is used the files are treated separately, so you can edit multiple files this way.
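As a rough sketch of the -n and -i behaviour just described: this assumes GNU sed, and the file name demo.txt, its contents and the .bak suffix are made up purely for illustration.

```shell
# Sketch only: assumes GNU sed. demo.txt and its contents are invented.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/demo.txt"

# -n turns off automatic printing; the p flag on s prints only the
# lines where a substitution was actually made.
printed=$(sed -n -e 's/two/2/p' "$tmpdir/demo.txt")
echo "$printed"        # prints: 2

# -i.bak edits the file in place and keeps the original as demo.txt.bak
sed -i.bak -e 's/one/1/' "$tmpdir/demo.txt"
edited=$(cat "$tmpdir/demo.txt")
backup=$(cat "$tmpdir/demo.txt.bak")
echo "$edited"         # the edited file: 1, two
echo "$backup"         # the untouched original: one, two
```

Leaving the suffix off (plain -i) would make the same edit but keep no backup copy.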
There's also a -s option, which we'll come on to in a little while, which also treats the files separately. There's another thing that I didn't know about until recently: if the suffix contains an asterisk, then the asterisk is replaced by the current file name. I've got an example later on in this episode, example one, which demonstrates how you can use that.

Moving on to the next option: --follow-symlinks, that's s-y-m-l-i-n-k-s. This option is relevant to the -i option, and it only matters on systems that know about symbolic links. That's all the Unix systems; whether there are any that don't, I don't know. Anyway, if it's specified and the file being edited is a symbolic link, the link will be followed and the actual file will be changed. If it's omitted, the default behaviour is not to follow symlinks: the link will be broken and the actual file will not be changed. So if you ran sed with -i but without --follow-symlinks, and the file you were trying to change was a symlink to the real file, you would find that you suddenly had a regular file in your directory with the name of the symlink, the symlink would be gone, and the new file would contain the edited text. It's an easy trap to fall into; I fell into it just today while putting together some examples for this series. So be aware of that one; it's potentially quite problematic. I mean, it doesn't cause any damage, but it messes things up a bit.

I mentioned the -s option; its alias is --separate. This is the thing that controls whether sed treats the input files on the command line as a single stream of data or as separate files. You need to put in a -s to get it to treat them as separate files.

The last one we'll look at today is -r, or in its full form --regexp-extended, that's r-e-g-e-x-p, hyphen, extended.
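Going back to the symlink trap mentioned above, here is a small sketch of it. It assumes GNU sed on a system with symbolic links; the names real.txt and link.txt are invented for the demonstration.

```shell
# Sketch only: assumes GNU sed. real.txt and link.txt are made-up names.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/real.txt"
ln -s "$tmpdir/real.txt" "$tmpdir/link.txt"

# Default behaviour: the symlink is replaced by a regular file holding
# the edited text, and the real file is left unchanged.
sed -i -e 's/hello/goodbye/' "$tmpdir/link.txt"
if [ -L "$tmpdir/link.txt" ]; then first="still a link"; else first="link broken"; fi
real_after_first=$(cat "$tmpdir/real.txt")
echo "$first / $real_after_first"      # link broken / hello

# Recreate the link and try again with --follow-symlinks: the link
# survives and the file it points to is the one that gets edited.
rm "$tmpdir/link.txt"
ln -s "$tmpdir/real.txt" "$tmpdir/link.txt"
sed -i --follow-symlinks -e 's/hello/goodbye/' "$tmpdir/link.txt"
if [ -L "$tmpdir/link.txt" ]; then second="still a link"; else second="link broken"; fi
real_after_second=$(cat "$tmpdir/real.txt")
echo "$second / $real_after_second"    # still a link / goodbye
```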
By default sed uses basic regular expressions, but this is a GNU extension which allows extended regular expressions, of the sort that are used by the egrep command. We'll be looking at this in a bit more detail in this episode. Standard sed uses backslashes to denote a number of the special characters in the regular expression, the so-called metacharacters, but in extended mode these backslashes are not required. If you use extended mode, though, the resulting regular expression is not portable.

So what I want to do today is to talk about the s command some more. That's the substitute command that we looked at last time. In order to look at this command in more detail we need to look further at regular expressions, and they can be a fair bit more complex than what we looked at in the last episode. There's a bunch of new metacharacters that we'll look at today, and all of them start with a backslash.

Now, just as an aside, regular expressions are used all over the place in Unix, in all sorts of tools and editors and that sort of thing. There's a variation between those that use metacharacters with a backslash in front of them and those that don't. This can be confusing, so it's a good idea to be aware of the differences between the tools and their needs in terms of regular expressions. They tend to use similar metacharacters, but there's some variability in whether they need a backslash in front of them or not.

What I've done in the notes is make a little table of the characters we're going to talk about today, and that's really for your reference. I've followed that with a section which goes into more detail about each one. I'll not read out the table because I don't think that's going to be very helpful, but it's there for your reference.

We'll dive straight in with the first of these metacharacters, which is \+. Now this is a modifier which means one or more of the preceding.
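To make the difference between basic and extended mode concrete, here is a minimal sketch using the \+ modifier just introduced. It assumes GNU sed; the test strings are invented.

```shell
# Sketch only: assumes GNU sed. The input strings are made up.

# Basic mode (the default): the modifier needs its backslash.
basic=$(echo "aaabc" | sed -e 's/a\+bc/X/')
echo "$basic"       # X

# Extended mode (-r): the same modifier is written without a backslash.
extended=$(echo "aaabc" | sed -r -e 's/a+bc/X/')
echo "$extended"    # X

# In basic mode an unescaped + is just a literal plus sign, so this
# matches the text "a+bc" rather than a run of a's.
literal=$(echo "a+bc aaabc" | sed -e 's/a+bc/X/')
echo "$literal"     # X aaabc
```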
So what you do is put it after an expression, or a character let's say, and it means one or more of that character or expression. Last time we had expressions like a*bc, meaning an a modified by a star, so zero to infinity of these characters, followed by a b and a c. If we change that to a\+bc then we're matching the sequence abc, then aabc with two a's, aaabc, and so on with as many a's as you wish. It does not match just bc, because you must have at least one a in that example. Some of the examples towards the end of this episode use this in a bit more detail. Now, this is a GNU sed extension.

The next one is \?. This is also similar to the asterisk, but it matches zero or one of the preceding expression. It modifies in the same sort of way. So suppose we were to use the expression s/a\?bc/def/, which is a substitution expression meaning substitute abc by def. Because the a is followed by \?, it can be omitted or there can be just one. So it'll match just bc or abc. And again, this is a GNU sed extension as well.

Then we get into a collection of regular expression modifiers which use braces, or curly brackets as I tend to call them. The first one is a modifier which says a fixed number of the preceding. So using \{ then a number then \}, we can specify a fixed number of the preceding expression. Using the well-worn abc example again (I won't read out the entire substitute command), if the regular expression is a\{3\}bc, then what that means is to match an a which must occur exactly three times. It's equivalent to typing aaabc. But there are times when you might want to specify a number of a particular character and it's more convenient not to type them all out.
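The \? and \{3\} modifiers described above can be tried out with something like the following. This is only a sketch, assuming GNU sed; the replacement text X and the input strings are arbitrary choices.

```shell
# Sketch only: assumes GNU sed. Inputs and the replacement X are made up.

# \? means zero or one of the preceding character
opt_one=$(echo "abc" | sed -e 's/a\?bc/def/')
opt_none=$(echo "bc" | sed -e 's/a\?bc/def/')
opt_two=$(echo "aabc" | sed -e 's/a\?bc/def/')
echo "$opt_one"     # def
echo "$opt_none"    # def
echo "$opt_two"     # adef  (only one a can be part of the match)

# \{3\} means exactly three of the preceding character
fixed_yes=$(echo "aaabc" | sed -e 's/a\{3\}bc/X/')
fixed_no=$(echo "aabc" | sed -e 's/a\{3\}bc/X/')
echo "$fixed_yes"   # X
echo "$fixed_no"    # aabc  (only two a's, so no match)
```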
In the example I've given here it's a bit of a fiddle to type it in that way, but I'm just making the point really.

The next one is a sort of upgrade from the previous one, where in the curly brackets you have a lower and an upper bound. So "between i and j of the preceding" is the way I've expressed it. In our example we've got a followed by \{1,5\} then bc, so a\{1,5\}bc. What that's saying is the a can occur between 1 and 5 times, so that matches abc, aabc, and so on. I have listed them all in the notes, but you won't want me to read them all out. Anyway: between 1 and 5 a's followed by bc.

The third variant of this particular thing is "i or more of the preceding", as I've expressed it in the title. This one consists of an open curly bracket with a backslash in front of it, followed by a number and then a comma, and then backslash close curly bracket. That means from that number to an infinite number of the preceding character or expression. My example regular expression is a\{1,\}bc. That matches abc, aabc and so on; there's no limit to the number of a characters. It's the same as a\+bc, the one we saw at the start of this list: from one to any number of a's. But of course, using this form the starting number can be something greater than one.

Now the next topic is not really a metacharacter, but it's a way of grouping the elements of a regular expression. All the examples we've worked with so far have applied their modifiers to a single character, but we can group characters, or indeed regular expressions, into more complex expressions. The way we do that is to use \( and \) to enclose them.
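The interval forms just described can be checked with a short loop like this one. Again it is only a sketch that assumes GNU sed, with invented input strings and X as an arbitrary replacement.

```shell
# Sketch only: assumes GNU sed. Inputs and the replacement X are made up.

# \{1,5\}: between one and five a's followed by bc
range=$(for s in bc abc aaaaabc aaaaaabc; do
    echo "$s" | sed -e 's/a\{1,5\}bc/X/'
done)
echo "$range"
# bc   (no a at all, so no match)
# X    (one a)
# X    (five a's)
# aX   (six a's: only five of them can be part of the match)

# \{2,\}: two or more a's followed by bc
open_no=$(echo "abc" | sed -e 's/a\{2,\}bc/X/')
open_yes=$(echo "aaaaaaaabc" | sed -e 's/a\{2,\}bc/X/')
echo "$open_no"     # abc
echo "$open_yes"    # X
```

Writing \{1,\} would behave the same as \+, as noted above.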
So going to the tried and tested abc thing (I'll give you the full expression this time), the expression is s/\(abc\)*def/ghi/. What that substitution is doing is wanting to match the sequence of characters def, or abcdef, or abcabcdef. What it's actually saying is that the string abc can occur zero times, one time, two times, et cetera, with multiple instances of abc in front of the def.

Now there's a further level of magic, if you like, associated with this grouping. As you write a regular expression with such groups in it, each group is numbered by sed; it just simply counts the occurrences of \(. This allows the various sub-expressions enclosed in this way to be referenced elsewhere in the expression, and we'll be looking at that shortly.

The next metacharacter is what I've referred to as alternatives. It's possible to build a regular expression with alternative sub-expressions, so that one or another of these sub-expressions is going to be matched, and you do that by using a backslash then the vertical bar, \|. Say, for example, you want to write a regular expression to match either the string Hello World or Goodbye World; you want to find those without an exclamation mark at the end and then add one. I've given a full command-line sequence here to demonstrate it. I've got an echo with the string in double quotes, "Hello World" with leading capitals, and then I pipe that into sed: echo "Hello World" | sed -e 's/Hello\|Goodbye World/&!/'. Now that would seem to be a reasonable way of solving this problem, but the answer you get back is Hello! World.
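Here is a small sketch of the two things described above: a starred group, and the ungrouped alternation that gives the surprising result. It assumes GNU sed; the def and ghi strings follow the spoken example as best it can be made out.

```shell
# Sketch only: assumes GNU sed.

# \(abc\)* means the whole group abc, zero or more times
group_zero=$(echo "def" | sed -e 's/\(abc\)*def/ghi/')
group_two=$(echo "abcabcdef" | sed -e 's/\(abc\)*def/ghi/')
echo "$group_zero"   # ghi
echo "$group_two"    # ghi

# Without grouping, \| splits the WHOLE expression in two: it means
# "Hello" or "Goodbye World", so only Hello is matched here and the
# exclamation mark lands straight after it.
ungrouped=$(echo "Hello World" | sed -e 's/Hello\|Goodbye World/&!/')
echo "$ungrouped"    # Hello! World
```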
If you then fed the same sed expression the string Goodbye World, which is my second example, then it puts the exclamation mark at the end, after World. So this might be unexpected if it was the first time you'd tried working with this sort of stuff. What's happened is that sed has just matched the Hello in the first part of the regular expression, so the replacement &! has just resulted in an exclamation mark being placed after this word. In the second case it's matched Goodbye World and the exclamation mark has been placed properly.

What we actually wanted to do was to match either Hello or Goodbye, followed by the word World, and that's done in my next example, which is echoing Hello World to a sed command containing an s command structured like this: s/\(Hello\|Goodbye\) World/&!/. We're grouping here: \( then Hello, then \|, that's the alternative symbol, then Goodbye, then \). So we've grouped Hello and Goodbye, with the \| between them, inside these parentheses. The close parenthesis is followed by a space and World, then we have a slash, an ampersand, an exclamation mark, a slash and the closing quote. That does put the exclamation mark at the end of the string, after Hello World, and if you feed it the string Goodbye World it works for that as well. So we've constrained what the two alternatives are by grouping them.

The number of alternatives can be more than two. I've done a further example which matches Farewell as well as Hello and Goodbye, and I've done that with another \| followed by Farewell, as you can see in the example. This alternative business is a GNU extension.

Next we'll look at the subject of greediness in the context of regular expressions. The way that sed and other tools that use regular expressions do their matching can sometimes be a little bit unexpected, and the
subject of so-called greediness is where more is matched than might be predicted. I've quoted what it says in the GNU manual. The quote is: "Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest."

So say, for example, we're trying to process the example file for this episode, which I've called sed_demo2.txt (that's the full text of the about page from the HPR website), and we're looking for a word starting with a capital H at the start of a line. You might think the regular expression ^H.\+ followed by a space would do it: that's a circumflex, or up arrow as I tend to call it, followed by a capital H, followed by a dot, followed by \+, meaning a line starting with a capital H and followed by some number of characters, one to many, followed by a space.

Now I've given an example of what happens if you do this. I've made the matching string be enclosed in square brackets, just so you can see where the match began and ended, and I've made it print out only the lines that match. I've used the -n option in the command-line options, and I've used the p flag before I've actually talked about it, but bear with me; it's difficult to know what order to introduce these things in.

Anyway, the command is something you could type at the command line: sed, space, -ne. Just to digress for a second: when you have single-character options to any Unix or Linux command you can concatenate them, so -n space -e can be concatenated to -ne. You can't do that if you're using the full form, the minus-minus-some-text things; you can only do it with the single-character ones. Anyway: -ne, space, open quote, s/^H.\+ / so that's the regular expression, then [&], an open square bracket, an ampersand and a close square bracket, which means whatever you matched, put it in square brackets,
then a slash, that's the end of the replacement, then p (the p on the end says print it), a closing quote, a space, and sed_demo2.txt. Put together, that's sed -ne 's/^H.\+ /[&]/p' sed_demo2.txt.

I won't read out all three lines that you get back, but the first line is: [Hacker Public Radio (HPR) is an Internet Radio show (podcast) that ]releases. So what's happened is the regular expression matcher has matched everything from the leading H to the last space on the line. It's gobbled up everything in the .\+; the matcher has said, well, that can match everything, including spaces, up to the last possible space on the line. That's what is referred to as greediness.

I've given an example of how you can limit this sort of behaviour. The essence of it is that you don't put .\+, meaning any character, one or more of them; instead of the dot you put [^ ] in the regular expression, that's open square bracket, circumflex, space, close square bracket. What that means is a "not space": the square brackets make a set or a list, and using a circumflex as the first character means everything but the character or characters in the list. What that means is, I want to have one or more not-spaces. So that would match the word Hacker on the first line, all of whose letters, H-a-c-k-e-r, are not spaces, but it won't match the space. In the example here, which is similar to the one before, it puts the square brackets in the result around the word Hacker and the space that follows it. So that has constrained what the regular expression matcher can do, and it's curbed its greediness.

Just as an aside, in the regular expression matchers of other languages, greediness can be controlled more explicitly, let's put it that way. I won't go into how, because that's really a massive digression. Maybe I should do a series on regular
expressions at some point, but we shall see.

The other element of the s command is the replacement part. Last time we saw the use of the ampersand, which is a way of signifying the whole of the text that matched the regular expression part of the command, and some of the examples we've just seen used that. We're going to look at a few more capabilities of the replacement part.

The first one is the back-reference. We were looking at grouping elements of the regular expression a bit earlier on, and I made reference to the fact that each of the groups is numbered. Well, we can refer to the groups with the sequence backslash followed by a number. The number is between one and nine; you can't reference more groups than that, but it's quite a useful feature.

My first example shows a whole command line where the string Hacker Public Radio is being echoed to a sed command, and the sed command consists of sed -e, then a quote, then s/\(.\+\). So that so far means one or more of any character, grouped together so we can refer back to it. That's followed by a space in the regular expression, then another one of the same grouped .\+ and a space, and then another one. So there are three groups, and I think you've probably twigged that this matches the three words in the string. The replacement part in this example consists of \3 \2 \1, so the whole thing is sed -e 's/\(.\+\) \(.\+\) \(.\+\)/\3 \2 \1/'. Now \3 refers to the third group, which is Radio, \2 refers to the second group, which is Public, and \1 refers to the first group, which is Hacker. So what gets returned is Radio Public Hacker.

One other aspect of back-references is that they can be used inside the regular expression itself. My next example shows echoing the string "Run Lola Run". Never seen that film; I
really must get around to seeing it sometime, it's supposed to be very good. That's piped into a sed command which consists of sed -e and then the same sequence of groups that match a word. Just to remind you, one of them is \(.\+\) followed by a space. So there's one of those followed by another one, and then, instead of having a third one, we simply refer to \1. What we're saying is that whatever matches the first word is to be used as the last one, because we've got a phrase that consists of the same word in positions one and three. Then we change their order: the replacement is \2 \1 \1, followed by the closing slash and closing quote, so the whole expression is 's/\(.\+\) \(.\+\) \1/\2 \1 \1/'. We end up with the string Lola Run Run. You could have grouped the \1 in the regular expression, and I show an example of how that's possible, but it makes no sense, since it achieves the same end result and it makes sed work harder to achieve it.

The other thing you can do in the replacement part of the s command is to manipulate the case of what you have selected through the regular expression. This is a GNU sed extension, and it allows you to change the case using the sequences \L (capital L), \l (lower-case l), \U (upper-case U), \u (lower-case u) and \E (capital E). So \L means turn the replacement to lower case until you find another one of these case-change sequences, like \U or \E. \E means stop changing case. Lower-case \l means just turn the next character to lower case. Capital \U and lower-case \u have a similar effect: they turn the replacement to upper case, either until a point to stop is found or just for the next character. And \E, as we've already seen, is the stop mark, to stop case conversion.

So what I've done here is to
revisit one of the examples we had before, where we echo the string Hacker Public Radio to sed and select out the three words. But then in the replacement part I've put \U\1, a space, \L\1, and then repeated the same sequence for \2 and \3. The result of that is to change the word Hacker to upper case and then to lower case, then Public the same, and Radio the same. My joke was that this is from Ken's script for the Community News, where he has a tendency to go "HACKER hacker PUBLIC public RADIO radio". So, feeble joke, but that's my trademark.

There is more that we can say about flags as well. We saw the g flag in the last episode, which makes the substitution keep repeating within each line, so it will iterate over every match that it can possibly find in that line. There are some more that you can use. I've not covered them all here because some of them are quite obscure, I reckon. I might squeeze them into a later episode, but really this is meant to be an introduction to sed; I don't want to go into every possible corner of it. I'm not even sure I'm equipped to do that, and you'd probably have turned off long before then.

Let's talk about one of these, which is a number flag. It's just a simple number, and what it does is apply the substitution only to that numbered match. My example is echoing the string "eeny, meeny, miny", all in lower case, to sed, and the command is sed -e 's/ny/\U&/2'. The words have all got ny at the end of them, so the regular expression is ny, and the replacement is \U, upper-case U that is, followed by an ampersand. What that's saying is: find an instance of lower-case ny, which is in each of the three words, and change it to its upper-case form; we're using the ampersand to mean the thing that was matched. But after the closing slash we put the number 2, and what that means is only do this for the second instance of ny. So the
result is "eeny, meeNY, miny", where the middle word is m-e-e in lower case followed by capital N, capital Y. That can be quite useful at times. I've certainly not used it myself; in fact, to be honest, I didn't even know it existed until I started preparing this show, but it's quite cool, I think.

The next flag is the p flag, which I've already made reference to. It makes the substitute command, the s command, print the pattern space, and it's normally used in conjunction with the -n command-line option, which we've already seen. My example is a sed command which uses -n, space, -e (I didn't join them together in this instance, just to prove that either is possible), and the substitute command in quotes is 's/Hacker /Hobby /p'. I'm applying this to the file sed_demo2.txt, which is just a file with more text than demo one. What it does is change the two instances of Hacker followed by a space in this file (it's part of Hacker Public Radio) to give Hobby Public Radio, and it prints just those two lines, as you can see.

So the point of p is that, if you've got the -n option, something gets printed only when a substitution is made. I should have said in the notes, but didn't, that if you use the p flag when you don't have the -n option then it just repeats the line: the line is printed by the auto-print method, which is how sed normally works, and then the p on the end causes it to be printed all over again. I can't see many instances where you'd want to do that; it's usually a mistake, I think. It certainly is in my case.

The final flag is the I flag. This is a GNU sed extension, and it causes the regular expression to be case-insensitive. I simply repeated the example we just had, except that in the regular expression I've used hacker in lower case, replaced that by Hobby in mixed case, and I put the flags i and p on
the end of the s expression. What that does is exactly the same as the previous one did, except that it's now case-insensitive when it's looking for the word hacker. So that demonstrates that particular point. Upper-case I and lower-case i have no separate significance; you can use either of them as the flag.

At this point I wanted to talk about some of the further extensions that GNU sed offers in terms of what you can put into regular expressions, and indeed in some cases you can put them in the replacement as well. It's got a way of referencing, or "producing" as I've said in the notes, some special characters. There are more than I'm talking about here; they're in a section of the manual with the same title as the section in these notes, and I've put a pointer to that section of the manual. I'm not going to cover them all, because I think they're probably too obscure for most purposes; I just want to point out that they do exist.

Anyway, let me talk about two of the special characters that you can use, and these are \n and \t. \n represents a newline; you can use it in a regular expression and you can use it in the replacement part. \t represents a tab, the so-called horizontal tab. There is a vertical tab, but that is so obscure I don't think anybody uses it for its original purpose. It was originally for line printers; as I recall, it made the printer skip several lines down the page. Wow, that's going a really long way back. There are others, hexadecimal sequences and so forth, but if you need them, go and look in the manual.

Then there are escapes which match what the manual calls a character class. They're only for use in regular expressions, but I thought I'd mention them because they're pretty useful for writing more general regular expressions. So \w, with a lower-case w, matches any word character. A word character is any letter or digit or the underscore character, so word in this context really
means a sort of identifier, as you'd have in a programming language, where you might call your variable ABC_1 or something. It's not really about English words, but it's still pretty powerful. \W, with a capital W, has the opposite effect: it matches any non-word character, so that would be anything which is not a letter or digit or the underscore character. That can be a useful shorthand as well.

Then we have a weird concept, if you've never come across it before: \b. This matches a word boundary. It doesn't actually match a character; it matches a sort of virtual position in the string. If the way it's written up is confusing, I've just copied the words straight out of the manual here: it matches if the character to the left is a word character and the character to the right is a non-word character. It also says "or vice versa", which means that if the character to the right is a word character and the character to the left is a non-word character, it matches that too. Basically, it matches the beginning and the end of a word.

There are alternatives to this, interestingly, and I found that these are not that well documented. They are \< and \>, backslash less-than and backslash greater-than, so that's a sort of bracketing thing. They also mean word boundaries, except that \< is used for the left boundary and \> is used for the right boundary. So if you're looking for an actual word, you can put those around it. I've got some examples a bit later on that use them.

The final one in this list is \B, with a capital B, and that matches everywhere but on a word boundary. That is, it matches if the character to the left and the character to the right are either both word characters or both non-word characters. Now I haven't really come up with a way of using this yet; maybe that's a challenge for you. I need to do some more investigation, but I've not really found it to be amazingly useful. I put it in just for completeness.

So the final bit of this episode, which I fear has got rather long, is a series of examples. I've tried to put a moderate number of examples into the notes so you've got something to refer to. Examples are one of the things that I find I learn better from than simply reading the manual, and otherwise all I'd be doing is reading you the manual. So I've tried to put some effort into making some usable examples for you.

Example one is the demonstration of the -i option. What I've got is a series of Bash commands which do various things, which I will skim through fairly quickly but try to explain for you. The first command in this group is a for loop, which says for f in {A..C}, that's curly bracket, capital A, dot dot, capital C, close curly bracket. If you've listened to my series on Bash hints and tips then you will know that that is a way of making a loop where the loop variable goes from the first to the last in this group, so it causes f to be set to capital A, capital B and capital C. After that we have a semicolon, then do echo $RANDOM. $RANDOM is a Bash variable, a magic thing: whenever you expand it, it returns a random number. The result of this echo is redirected to $f, and then we have a semicolon and done, so the whole line is for f in {A..C}; do echo $RANDOM > $f; done. What the loop is doing is creating three files called A, B and C and putting a random number in each.

The next line is a sed command: sed -i'saved_*.sav' -e 's/4/@/g' {A..C}. That's -i with a lower-case i, as you'll be aware, followed immediately by the quoted suffix saved_*.sav, then -e with the quoted expression s/4/@/g, then {A..C}. So what that is doing is telling sed to operate on
all three of the files, and it is to edit each one to replace any instances of the digit four by an at sign, just so you can see what happens; there's no other point to it really. The minus i setting, saved underscore asterisk dot sav, will be used to make the backups, and the backups will be named saved underscore A dot sav, saved underscore B dot sav, et cetera.

So the next command is cat, space, curly bracket, A dot dot C, close curly bracket, and that just lists the three files out, one after the other, so you see a list of three numbers. Each number contains an at sign because they've each had fours in them and they got changed. And then I also cat the files called saved underscore, curly bracket, capital A dot dot capital C, close curly bracket, dot sav, and those are the original files, which have been saved by virtue of using the minus i with an extension after it, and you can see there the original numbers with the fours intact. So, as always, fairly contrived, but hopefully it gets across the message of what you can do with this minus i option.

Example two. Now this is an instance of operating on the second example file that I've provided for this episode, and it's called sed underscore demo 3 dot txt. This contains some statistics that I pulled from the HPR site. You can do it yourself if you want to; it's called stats dot php, and I think I've referenced it in the links at the bottom. It contains various useful things like how long to the next free slot, how long the queue is, and various other things. So imagine we're trying to write a Bash script to parse it, and we're actually interested in the number of days to the next free slot; we want the answer to that in a variable. The line in question in this file consists of the string Days to next free slot, colon, and then the number, and on the day that I sampled it
the number was eight. Probably is today, actually, because the queue's going down, as it does. In the file there are two lines beginning with the word Days, and so we have to make sure that we get the right one.

So my example shows a variable DTNFS, which is days to next free slot in my mind, then equals, and then double quote. Remember that the double quote in Bash is the so-called soft quote: inside double quotes you can get command and variable substitution to go ahead. The so-called hard quotes, which are single quotes, don't allow this. Anyway, within the double quotes we have dollar, open bracket, then a sed command, close bracket, double quote, and this is a command substitution. So in these parentheses is a sed command, and the command is sed, space, minus n e, space, and then we have a substitute. The substitute is attempting to find the line that we're interested in, Days to next free slot, and it's going to pull out the number at the end.

Now, I've gone a little bit, can we call it overkill, over the top with my matching mechanism here, but it's really to demonstrate the sort of things that you can do in a regular expression. So the regular expression consists of a circumflex, meaning beginning of line, then the words Days, space, to, then a list. Now the list is in square brackets and it consists of a circumflex, colon, close square bracket. What that means is I'm looking for a not-colon, any character that is not a colon, and the list is followed by backslash plus, so we're looking for one or more not-colons. Then we follow that with the colon. So that's looking for a line beginning Days to, followed by some other stuff up to a colon. We're using this list business to prevent any potential greediness in the regular expression matcher. The colon's then followed by another list, which consists of a backslash t and a space. So we're saying here that we're looking for either a tab character, remember that backslash t is one of the specials that you
can use, meaning a tab, or a space, so we're looking for either a tab or a space. There's a backslash plus that follows it, so we're looking for one or more of these. That's because, when you look at the line, it looks like the colon's followed by a space, but you don't always know. It's quite hard to work out what a thing that looks like a space actually is; sometimes it can be a tab character, which is invisible of course. So it's a good idea, when you're matching spaces of this sort, to put in this sort of thing so that you're covered for whatever it actually is. It really is a tab that's in the file that you get back.

After that we have a group: backslash, open parenthesis, then in square brackets zero hyphen nine, close square bracket, backslash plus, backslash, close parenthesis. So that's a group which consists of the digits nought to nine, and we expect one to many digits. There will be no instances where there are no digits, but we could have, you know, double digits, even three digits, can you imagine that? So that's the end of the regular expression. There's a slash that follows it, and we simply replace the match with backslash one, which is a reference to that group, which is the number that we found. Close the replacement with a slash and follow that with a p. So that particular sed expression will look for that line, pull out the number and return it. Then the final bit in these parentheses that are the command substitution is the name of the file, which is sed underscore demo 3 dot txt.

So what that should do, then, is to run the sed to pick out the particular number and stick it into the variable DTNFS. There's an echo which follows it, and what the echo returns consists of the letters DTNFS, equals, so that the result you get will show that that's the variable we're looking at; you use that to debug the bit of scripting you've done so far. That's followed by dollar DTNFS, close quote, so you should
get back the string DTNFS equals and then 8. Okay, so like I say, that's a fragment of what you might be putting together in a Bash script. You might just type that on the command line to prove that what you're planning to do works, or you might put it in the script itself to prove that it does actually work before you move on to the next bit. One possibility is that, for some reason, maybe the file format has changed, the sed command doesn't match anything and DTNFS contains nothing. So that's something you should be considering when you're writing the Bash script, and you should check that and take appropriate action if you were doing that particular job.

Example three is a case where we're using the backslash n escape we came across earlier, and we're going to use it in the replacement part. Here we're simply looking for the string Hacker Public Radio. We've put Hacker Public Radio as the literal text in the regular expression part of the s command in the sed call, and we've made each word a group. The replacement part simply consists of backslash 1, backslash n, backslash 2, backslash n, backslash 3, backslash n. So the result of that will be that when the sed command runs and it finds Hacker Public Radio, it will write out the words Hacker, Public, Radio, each on a new line. We're running this against sed underscore demo 2 dot txt. In the example I pipe the result to a head command, head minus 4, so we just get the first four lines; otherwise you'd see the whole file. There are ways you can get sed to do this as well, actually, but we haven't got that far yet. That will be next week... next episode, I should say, not next week, I'm not going to be that quick. So that will do what I said it will do, and that will work fine. Backslash n is a nice way of representing a newline; alternative ways are a real pain.

But I did think, as I was doing this: oh, how would you write a bit of sed to join all the lines of the input file together?
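For anyone following along without the notes, here is a minimal sketch of example three as just described. The sample line is a made-up stand-in for sed_demo2.txt, whose contents aren't reproduced in this transcript, and the variable names sample and result are only for illustration. It assumes GNU sed, since backslash n in the replacement is one of the extensions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for one line of sed_demo2.txt (the real file is not shown here).
sample='Hacker Public Radio is a community podcast'

# Each word is captured as a group with \( \), then written back
# followed by \n, so each word ends up on its own line (GNU sed).
# head -4 keeps just the first four lines of the output.
result=$(printf '%s\n' "$sample" |
    sed -e 's/\(Hacker\) \(Public\) \(Radio\)/\1\n\2\n\3\n/' |
    head -4)

printf '%s\n' "$result"
```

With that sample, the output should be Hacker, Public and Radio on separate lines, followed by the rest of the original line. That leaves the question of joining lines, which comes next.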
You would think that it might be possible to do sed, space, minus e, space, open quote, s slash backslash n slash slash, close quote, and run that against sed underscore demo 2 dot txt. You'd think that sed would simply strip off all the line breaks, the newlines at the ends of all of the lines, and make it into one long line. That doesn't happen, and that's because sed grabs one line at a time and puts it into the pattern space that we mentioned in the last episode, and in doing so it removes the trailing newline. Then it applies the script that you've put together, on your command line or in a file or whatever, and having processed it, it will print it out and add a trailing newline. It won't print it out if you have a minus n option, of course, as we know, but otherwise it will add the trailing newline back again. So at the point at which the script runs against it, there are no newlines to strip. There are ways in which you can concatenate all the lines of the file, but we'll leave that to another episode, when we have learned about more commands within sed.

Now, we mentioned the minus r option, or regexp-extended in its full form, earlier on. If we were to rerun example three, the one where we turn Hacker Public Radio into each word on a separate line, then you would type that particular sed command, but we would use minus r: sed, space, minus r, space, minus e, space, open quote, s slash, then you would put open parenthesis, Hacker, close parenthesis, space, et cetera, et cetera. In other words, you don't need the backslashes in front of the parentheses. You do need backslashes inside the replacement, you need backslash 1, backslash n, et cetera, but you don't need them in the regular expression because we have switched to extended regular expression mode, which doesn't use the backslashes. It's a useful feature; it certainly saves some typing and makes the regular expression a lot more readable, but it's an extension and it's
not portable. So personally I don't use it, because mostly I don't want to be in a situation where I'm faced with a sed that doesn't have that capability and I forget how the hell to use it. So that's just my thinking, anyway.

Example five, then, the last one; we're nearly there. One of the things you're often called upon to do when you're processing text is to take in a string and remove leading and trailing spaces. Well, I was thinking about this and I rather naively wrote a bit of sed that didn't work, so I thought I would share it with you. The thing that didn't work was echo, open double quotes, then a bunch of spaces, Hello World, exclamation mark, a bunch of spaces, close double quotes, piped into sed, space, minus e, space, open quote, s, slash, circumflex (I was going to call it a caret), space, star. Okay, so that means zero or more spaces at the beginning of the line. Then we have a backslash vertical bar, which is the alternation operator, then after that we have space, star, dollar; space asterisk, whatever you want to call it, space asterisk dollar. So that means zero or more spaces at the end of the line. We close that regular expression with a slash, then the replacement part is nothing, and close quote.

So if you do that, what you get returned is the string Hello World with no spaces on the front of it, and you look at it and think, oh, that works, that's great, on to the next thing. But it doesn't actually work. If you do what I've shown in the second example, which is to add another s expression inside the quotes to the sed command, where, after this space-trimming business, it replaces the start of the line with a less than sign, then another one that replaces the end of the line with a greater than sign, so it puts these symbols as a sort of brackets around the string, you will see that you get a less than sign, Hello World, exclamation mark, a bunch of spaces and a greater than sign. So in other words it didn't remove the trailing spaces. So what's happened here
is that sed has spotted the leading spaces and has removed them, but then it stopped. Well, surprise surprise, that's because you didn't tell it to keep going. The answer is you simply add a g to the end of that first s command, the g flag, to tell it to keep going. So the final example is the same as the second one but with a g in it, and you will see that the result is Hello World with no spaces around it, enclosed in these less than and greater than signs just to prove that there are no spaces there. Sometimes, when you're experimenting with a regular expression or a bit of sed and you're not quite sure what it's doing, it's useful to do that type of thing just to prove to yourself that it works. Then you strip out that business of adding delimiters around the thing and say, right, that's finished, now I can move on to the next task. So hopefully that's useful.

Okay, well, that's it for this time. Sorry it got so long; it's hard to know where to stop, really. Next time we're going to be looking at more of the sed command set; we've only looked at one command so far. Hopefully we're not going to take quite so long, but I don't know whether I'm going to take as long to cover the next batch. Hopefully you'll be following along with me and you'll find it useful. Okay then, bye.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the Binary Revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.
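As a written footnote to example five, this is a rough sketch of the failing and the working trim commands side by side. It assumes GNU sed, since backslash vertical bar is an extension in basic regular expressions. The variable names are illustrative only, and the two marker substitutions use a vertical bar as the s delimiter rather than the slash read out in the episode, which is purely a notational choice.

```shell
#!/usr/bin/env bash
# Sample string with leading and trailing spaces, as in the episode.
input='   Hello World!   '

# Without the g flag: s stops after the first match, so only the
# leading spaces are removed. The < and > markers make this visible.
without_g=$(printf '%s\n' "$input" | sed -e 's/^ *\| *$//; s|^|<|; s|$|>|')

# With the g flag: every match is replaced, so the trailing spaces go too.
with_g=$(printf '%s\n' "$input" | sed -e 's/^ *\| *$//g; s|^|<|; s|$|>|')

printf '%s\n' "$without_g"
printf '%s\n' "$with_g"
```

The first line printed should still show the trailing spaces inside the markers, and the second should show none. Once the trimming is confirmed, the two marker substitutions can be dropped.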