Mass file conditional search and replace
Author | Content |
---|---|
techiem2 Apr 18, 2010 11:22 AM EDT |
So I'm playing with my new Entourage Edge (blog post coming sometime in the next few days probably), and I'm trying to get a good epub version of the Bible created, since the auto-generated one from Gutenberg has some issues.
I am working with an HTML file set and want to get headings in for each chapter start so I get a good contents table when Calibre generates the epub.
Each verse starts with <p>chapter:verse, such as <p>001:001
I would like to process each file to add a header line before the first verse of each chapter (a single file has all chapters of the book), such as:
<h1> Chapter 001 </h1>
<p>001:001
The filenames are all "book" followed by a two-digit number and .htm, such as book01.htm, book33.htm, etc.
I'm assuming I need some sort of awk/sed/etc. magic, but I don't have a clue what to do. :) Any help would be appreciated. :)
Thanks as always! Mark II
p.s. Is there a code type tag for the forum to make entering code segments easier, instead of having to use the HTML entities to get things to print instead of being parsed as tags? :) |
gus3 Apr 18, 2010 2:59 PM EDT |
Just as a point of curiosity, is there a reason you're not using something from the Sword Project, like Gnome-sword or JSword? |
techiem2 Apr 18, 2010 5:50 PM EDT |
1. There isn't an Android Sword client (though supposedly someone is working on one)
2. A normal app doesn't use the fancy e-ink display that I can read on and use for adding notes, etc. |
krisum Apr 19, 2010 3:33 AM EDT |
edit: removed the code segment till I figure out how to get it to display properly. I gave up trying to get it to display correctly here. See: http://pastebin.com/4hsu7qtP
edit: updated the link to the latest code, which corrects the missing file name argument to awk |
Sander_Marechal Apr 19, 2010 3:38 AM EDT |
@krisum: You'll want to escape your HTML tags. Replace < with &lt; and > with &gt;. Also note that you can use pre blocks here on LXer. |
krisum Apr 19, 2010 3:39 AM EDT |
Sander, yes, I have been trying to use those. It shows perfectly fine in preview but messes up in the final output every time.
> Also note that you can use pre blocks here on LXer.
How? |
krisum Apr 19, 2010 3:58 AM EDT |
@Sander It's twice now with the same results. I have been trying to use &lt; (for <) and &gt; (for >) to get the code to display properly, but to no avail. It looks fine with simple usage like here but messes up in code. Is there some way to display code with tags in a pre-formatted way?
edit: Tried your pre suggestion, but that is eating the tags; put the code in pastebin instead. Also, preview removes the &lt; codes, so they have to be typed again. |
techiem2 Apr 19, 2010 1:08 PM EDT |
Ok, I gave that a try and it's close, but it's matching every instance, not just the first match for a particular number (which I assume is probably the trickier part to get working), so I got the chapter tag before each verse. Thanks, almost there! I'm trying to understand exactly what the command string is doing, so I can get a better understanding of how to use awk for these weird tasks. :) |
TxtEdMacs Apr 19, 2010 1:27 PM EDT |
krisum, Go back to your text and I think you will find the codes gone after the preview. So you have to reinsert and then just submit hoping you don't miss any. YBT P.S. Of course this one is with serious tags. Just in case you found the tone ambiguous. |
techiem2 Apr 19, 2010 4:39 PM EDT |
Yeah, I noticed that too. You have to type the codes in then submit, or else preview turns them into the actual tags and breaks your post. |
Sander_Marechal Apr 19, 2010 5:08 PM EDT |
Yes, that's a known forum bug. |
krisum Apr 19, 2010 11:26 PM EDT |
> Go back to your text and I think you will find the codes gone after the preview. So you have to reinsert and then just submit hoping you don't miss any. I know and have tried many times now with the codes inserted properly after preview and also without preview. As I mentioned in the last post, it seems to work fine in the simple case (like in my last post) but not in the code. You can just try typing the code, changing to add the tags and then submit -- somehow it does not get the codes right, at least for me. |
krisum Apr 19, 2010 11:36 PM EDT |
@techiem2
> Ok, I gave that a try and it's close, but it's matching every instance, not just the first match for a particular number (which I assume is probably the trickier part to get working), so I got the chapter tag before each verse.
Sorry, I missed that only the first instance should be matched. It is simple to fix; try the new one here: http://pastebin.com/86MwgS03
> I'm trying to understand exactly what the command string is doing
Short explanation:
1. if ($0 ~ /<p>[ ]*[0-9]+:[0-9]+/): $0 is the whole line and "~" is the match operator. The right-hand side is the regular expression pattern, which matches the "p" tag, possibly followed by some spaces, followed by one or more digits, then a colon, and then again one or more digits.
2. match($0, /[0-9]+/): This finds the first run of one or more digits in the line that matched above, which is the chapter number. RSTART stores the start of the match and RLENGTH its length, so we get the chapter number using the substring function (substr($0, RSTART, RLENGTH)).
Edit: I don't know what is going on with the tags above. I cannot get even the tags in the above piece of code to display: <p> gets parsed as a "p" tag instead of displaying as < p >. It looks like extra spaces have to be added, as here, to get it to display properly. |
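For readers who cannot reach the pastebin links, the logic krisum describes can be sketched roughly as follows. This is a Python illustration and not the awk script from the pastebin; the function name `add_chapter_headings` and the sample verses are made up for the example. It uses the same pattern as step 1, takes the first run of digits as the chapter number as in step 2, and emits a heading only when that number changes.

```python
import re

# Same pattern as in step 1: a "p" tag, optional spaces, digits, a colon, digits.
# The first group captures the chapter number (step 2 does this with match/substr).
VERSE_RE = re.compile(r"<p>[ ]*([0-9]+):[0-9]+")


def add_chapter_headings(lines):
    """Return the lines with an <h1> heading before each chapter's first verse.

    Hypothetical helper for illustration; assumes the verses of a chapter
    are contiguous within the file.
    """
    out = []
    last_chapter = None  # chapter number of the most recent heading emitted
    for line in lines:
        m = VERSE_RE.search(line)  # unanchored search, like awk's "~"
        if m and m.group(1) != last_chapter:
            last_chapter = m.group(1)
            out.append(f"<h1>Chapter {last_chapter}</h1>")
        out.append(line)
    return out


sample = [
    "<p>001:001 first verse of chapter one",
    "<p>001:002 second verse of chapter one",
    "<p>002:001 first verse of chapter two",
]
result = add_chapter_headings(sample)
print("\n".join(result))
```

Remembering the last chapter seen is what keeps the heading from being repeated before every verse.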
gus3 Apr 20, 2010 1:01 AM EDT |
For funky tagging in comments, this is what I do:
1. Compose the comment, with HTML entities as needed, in a separate editor.
2. Copy/paste into the comment field, and click Preview.
3. If the preview shows something not right (content or rendering), repeat as necessary from the beginning.
4. Once the preview shows what you want, do one more copy/paste, then click Send.
And having to type &lt; to get < (and &amp;lt; to get &lt;, and on and on) is the LXer equivalent of the Leaning Toothpick Syndrome. |
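The entity layering gus3 mentions can be seen with a small Python sketch. It uses the standard library's `html.escape`; the snippet string is only an example, and nothing here is specific to LXer's parser.

```python
import html

snippet = "<p>001:001"

# Escaped once: what to type so the tag is shown instead of being parsed.
once = html.escape(snippet)

# Escaped twice: what to type so the entity itself is shown.
twice = html.escape(once)

print(once)
print(twice)
```

Composing the comment in a separate editor, as in step 1, means the escaped text survives even if the preview converts the entities back into tags.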
Sander_Marechal Apr 20, 2010 3:07 AM EDT |
If it were me I'd rip out the entire posts parser and replace it with Markdown. Works like a charm! |
TxtEdMacs Apr 20, 2010 7:59 AM EDT |
krisum, Sorry, it used to work for me; however, I have not posted code recently. In addition, I may have been confusing my experiences when posting articles on my site. I wonder if prefacing with an escape symbol might help, e.g. \
|
techiem2 Apr 20, 2010 12:54 PM EDT |
Ok, after mostly understanding the code, and re-evaluating the posts and results, I realized that the first one was almost perfect; it just needed a slight tweak. I suspect my not-the-clearest instructions are to blame. :P
I realized that the match simply needed to match all occurrences of verse 1: nnn:001
So I tweaked the first script's match for that: http://pastebin.com/iaH27yJu
Now it works great! Thanks for all the help! |
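The final approach, matching only verse 001 of each chapter and applying it to every book file, might look something like this in Python. It is a sketch and not the awk script from the pastebin. The function names, the separate output directory, and the UTF-8 encoding are assumptions made for the example; the bookNN.htm naming and the three-digit nnn:001 verse format come from the thread.

```python
import re
from pathlib import Path

# Verse 001 of any chapter: "p" tag, optional spaces, chapter digits, ":001".
# \b keeps a longer number that merely starts with 001 from matching.
FIRST_VERSE_RE = re.compile(r"<p>[ ]*([0-9]+):001\b")


def add_headings_to_text(text):
    """Insert an <h1> heading before every line that holds verse 001."""
    out = []
    for line in text.splitlines():
        m = FIRST_VERSE_RE.search(line)
        if m:
            out.append(f"<h1>Chapter {m.group(1)}</h1>")
        out.append(line)
    return "\n".join(out) + "\n"


def process_directory(src_dir, dest_dir):
    """Process every bookNN.htm in src_dir and write the results to dest_dir.

    Writing to a separate directory (an assumption of this sketch) leaves
    the original files untouched. Returns the list of files written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).glob("book[0-9][0-9].htm")):
        text = path.read_text(encoding="utf-8")  # encoding is an assumption
        target = dest / path.name
        target.write_text(add_headings_to_text(text), encoding="utf-8")
        written.append(target)
    return written
```

Calling `process_directory("html", "html_with_headings")` (both directory names hypothetical) would then produce one converted copy per book file.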
gus3 Apr 20, 2010 5:11 PM EDT |
Uh, yeah. Sure! Glad we could help.... |