Mass file conditional search and replace

Forum: Linux
Total Replies: 17
techiem2

Apr 18, 2010
11:22 AM EDT
So I'm playing with my new Entourage Edge (blog post coming sometime in the next few days probably), and I'm trying to get a good epub version of the Bible created, since the auto-generated one from Gutenberg has some issues. I am working with an HTML file set and want to add a heading at the start of each chapter so I get a good contents table when Calibre generates the epub.

Each verse starts with

<p>chapter:verse

such as <p>001:001

I would like to process each file to add a header line before the first verse of each chapter (a single file has all chapters of the book)

Such as:

<h1> Chapter 001 </h1> <p>001:001

The filenames are all of the form bookNN.htm, where NN is a two-digit number.

Such as book01.htm, book33.htm, etc.

I'm assuming I need some sort of awk/sed/etc. magic, but I don't have a clue what to do. :)

Any help would be appreciated. :)

Thanks as always!

Mark II

p.s. Is there a code type tag for the forum to make entering code segments easier instead of having to use the html tags to get things to print instead of being parsed as tags? :)
gus3

Apr 18, 2010
2:59 PM EDT
Just as a point of curiosity, is there a reason you're not using something from the Sword Project, like Gnome-sword or JSword?
techiem2

Apr 18, 2010
5:50 PM EDT
1. There isn't an Android Sword client (though supposedly someone is working on one).
2. A normal app doesn't use the fancy e-ink display that I can read on and use for adding notes, etc.

krisum

Apr 19, 2010
3:33 AM EDT
edit: removed the code segment till I figure out how to get it to display properly

I gave up trying to get it to display correctly here. See: http://pastebin.com/4hsu7qtP

edit: updated link to latest code that corrects missing file name to awk
Sander_Marechal

Apr 19, 2010
3:38 AM EDT
@krisum: You'll want to escape your HTML tags. Replace < with &lt; and > with &gt;. Also note that you can use pre blocks here on LXer.
krisum

Apr 19, 2010
3:39 AM EDT
Sander, yes I have been trying to use those. It shows perfectly fine in preview but messes up in the final output every time.

> Also note that you can use pre blocks here on LXer.

How?
krisum

Apr 19, 2010
3:58 AM EDT
@Sander

It's twice now with the same results. I have been trying to use &lt; (for <) and &gt; (for >) to get the code to display properly, but to no avail. It looks fine with simple usage like here but gets messed up in code. Is there some way to display code with tags in a pre-formatted way?

edit: Tried your pre suggestion, but that is eating the tags, so I put the code in pastebin instead. Also, preview removes the &lt; codes, so they have to be typed again.
techiem2

Apr 19, 2010
1:08 PM EDT
Ok, I gave that a try and it's close, but it's matching every instance, not just the first match for a particular number (which I assume is probably the trickier part to get working), so I got the chapter tag before each verse.

Thanks, almost there! I'm trying to understand exactly what the command string is doing, so I can get a better understanding of how to use awk for these weird tasks. :)
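
The "first match for a particular number" behavior described above can be sketched as follows. This is a rough Python illustration of the idea, not the awk script from the pastebin link; the function name, the header format, and the assumption that input lines carry no trailing newline are all hypothetical.

```python
import re

# Assumed verse-line shape from the original post: <p>chapter:verse, e.g. <p>001:001
VERSE_RE = re.compile(r"<p>\s*(\d+):(\d+)")

def add_chapter_headers(lines):
    """Yield the input lines, inserting a header line before the first
    verse seen for each new chapter number."""
    last_chapter = None
    for line in lines:
        m = VERSE_RE.search(line)
        # Emit a header only when the chapter number changes.
        if m and m.group(1) != last_chapter:
            last_chapter = m.group(1)
            yield f"<h1> Chapter {last_chapter} </h1>"
        yield line

if __name__ == "__main__":
    sample = ["<p>001:001 a", "<p>001:002 b", "<p>002:001 c"]
    print("\n".join(add_chapter_headers(sample)))
```

Remembering only the last chapter seen is enough here because the verses of a chapter are assumed to appear together in the file.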

TxtEdMacs

Apr 19, 2010
1:27 PM EDT
krisum,

Go back to your text and I think you will find the codes gone after the preview. So you have to reinsert and then just submit hoping you don't miss any.

YBT

P.S. Of course this one is with serious tags. Just in case you found the tone ambiguous.
techiem2

Apr 19, 2010
4:39 PM EDT
Yeah, I noticed that too. You have to type the codes in then submit, or else preview turns them into the actual tags and breaks your post.
Sander_Marechal

Apr 19, 2010
5:08 PM EDT
Yes, that's a known forum bug.
krisum

Apr 19, 2010
11:26 PM EDT
> Go back to your text and I think you will find the codes gone after the preview. So you have to reinsert and then just submit hoping you don't miss any.

I know and have tried many times now with the codes inserted properly after preview and also without preview. As I mentioned in the last post, it seems to work fine in the simple case (like in my last post) but not in the code. You can just try typing the code, changing to add the tags and then submit -- somehow it does not get the codes right, at least for me.
krisum

Apr 19, 2010
11:36 PM EDT
@techiem2

> Ok, I gave that a try and it's close, but it's matching every instance, not just the first match for a particular number (which I assume is probably the trickier part to get working), so I got the chapter tag before each verse.

Sorry, missed that only the first instance should be matched. It is simple to fix; try the new one here: http://pastebin.com/86MwgS03

> I'm trying to understand exactly what the command string is doing

Short explanation:

1. if ($0 ~ /<p>[ ]*[0-9]+:[0-9]+/): $0 is the whole line and "~" is the match operator. The right-hand side is the regular expression pattern, which says: the "p" tag, possibly followed by some spaces, followed by one or more digits, then a colon, and then again one or more digits.
2. match($0, /[0-9]+/): This finds the first run of one or more digits in the line that matched above, which is the chapter number. RSTART stores the start of the match and RLENGTH its length, so we get the chapter number using the substring function (substr($0, RSTART, RLENGTH)).

Edit: I don't know what is going on with the tags above. Cannot get even the tags in the above piece of code to display -- &lt;p&gt; gets parsed as "p" tag instead of displaying as < p >. It looks like extra spaces have to be added as here to get it to display properly.
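
For readers who cannot reach the pastebin, the two steps in the short explanation above can be mirrored in Python. This is only a sketch under the assumption that verse lines look like <p>001:001; the function name chapter_of is made up for illustration, and the rstart/rlength variables merely imitate awk's RSTART and RLENGTH.

```python
import re

LINE_RE = re.compile(r"<p>[ ]*[0-9]+:[0-9]+")  # same pattern as step 1
DIGITS_RE = re.compile(r"[0-9]+")              # same pattern as step 2

def chapter_of(line):
    """Return the chapter number as a string, or None if the line is
    not a verse line."""
    if not LINE_RE.search(line):
        return None
    # First run of digits anywhere in the line, like awk's match().
    m = DIGITS_RE.search(line)
    rstart = m.start() + 1         # awk's RSTART is 1-based
    rlength = m.end() - m.start()  # awk's RLENGTH
    # Equivalent of substr($0, RSTART, RLENGTH)
    return line[rstart - 1 : rstart - 1 + rlength]

if __name__ == "__main__":
    print(chapter_of("<p>001:001 In the beginning"))
```

As in the awk version, the digit search runs over the whole line, so it relies on the chapter number being the first digits that appear.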
gus3

Apr 20, 2010
1:01 AM EDT
For funky tagging in comments, this is what I do:

1. Compose the comment, with HTML entities as needed, in a separate editor.
2. Copy/paste into the comment field, and click Preview.
3. If the preview shows something not right (content or rendering), repeat as necessary from beginning.
4. Once the preview shows what you want, do one more copy/paste, then click Send.

And having to type &amp;lt; to get &lt; (and &amp;amp;lt; to get &amp;lt;, and on and on) is the LXer equivalent of the Leaning Toothpick Syndrome.
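
The layering of entities mentioned above can be reproduced with Python's standard html module. This sketch shows only generic HTML escaping applied repeatedly; how the LXer parser actually treats preview and submit is not known here, and escape_levels is a hypothetical helper name.

```python
import html

def escape_levels(text, levels):
    """Apply HTML escaping `levels` times, one pass per round of
    parsing the text is expected to survive."""
    for _ in range(levels):
        # quote=False leaves quotation marks alone; only &, < and > change.
        text = html.escape(text, quote=False)
    return text

if __name__ == "__main__":
    for n in range(1, 4):
        print(n, escape_levels("<p>", n))
```

Each extra pass turns every & into &amp;, which is why the strings grow so quickly.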
Sander_Marechal

Apr 20, 2010
3:07 AM EDT
If it were me I'd rip out the entire posts parser and replace it with Markdown. Works like a charm!
TxtEdMacs

Apr 20, 2010
7:59 AM EDT
krisum,

Sorry, it used to work for me; however, I have not posted code recently. In addition, I may have been confusing my experiences when posting articles on my site.

I wonder if prefacing with an escape symbol might help, e.g.

       \

Works on preview (actually using double reverse slashes) to show properly. Will test the submit now.

YBT (but serious here, this time ... )

Edit: just one slash survived, so you are correct that the submit differs from the preview mode.
techiem2

Apr 20, 2010
12:54 PM EDT
Ok, after mostly understanding the code and re-evaluating the posts and results, I realized that the first one was almost perfect; it just needed a slight tweak.

I suspect my not-the-clearest instructions are to blame. :P

I realized that the match needed to simply match all occurrences of verse 1:

nnn:001

So I tweaked the first script match for that:

http://pastebin.com/iaH27yJu

Now it works great!

Thanks for all the help!
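
The verse-001 match described above can also be sketched in Python for readers without access to the pastebin. This is not the awk script from the link; it assumes verse numbers are always three digits, that files are named like book01.htm, and that latin-1 is an acceptable encoding. The helper names and the ".new" output suffix are hypothetical.

```python
import glob
import re

# Assumed line shape from the thread: <p>chapter:verse, with verse 001
# marking the start of a chapter. The lookahead rejects a further digit.
VERSE_ONE_RE = re.compile(r"<p>[ ]*([0-9]+):001(?![0-9])")

def add_verse_one_headers(text):
    """Put '<h1> Chapter NNN </h1> ' in front of every nnn:001 verse."""
    return VERSE_ONE_RE.sub(
        lambda m: f"<h1> Chapter {m.group(1)} </h1> {m.group(0)}", text
    )

def convert_files(pattern="book[0-9][0-9].htm"):
    """Hypothetical helper: write a converted copy next to each file
    matching the pattern, leaving the original untouched."""
    for path in sorted(glob.glob(pattern)):
        # latin-1 is an assumption; it round-trips any byte values unchanged.
        with open(path, encoding="latin-1", newline="") as src:
            text = src.read()
        with open(path + ".new", "w", encoding="latin-1", newline="") as dst:
            dst.write(add_verse_one_headers(text))
```

Calling convert_files() in the directory that holds the book files would write book01.htm.new and so on, which can be inspected before replacing the originals.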
gus3

Apr 20, 2010
5:11 PM EDT
Uh, yeah. Sure! Glad we could help....
