Parsing filenames with Perl

Forum: Linux
Total Replies: 8
techiem2

Jan 18, 2008
12:23 PM EDT
I'm working on designing our digital sign system. I think I have the basics of how I'm gonna do it worked out (yes, I'll write it all up once I get it done). Basically, we'll have a directory that the person running things will put files (video/images) in. The system will process them into the proper video size and such. Then the person will edit the playlist and tell mplayer to reload it (probably going to use gmplayer in slave mode with a fifo file and a web interface for this).
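For the slave-mode part, here is a minimal sketch of pushing a command into mplayer's FIFO. It assumes mplayer was started with something like `mplayer -slave -input file=/tmp/mplayer.fifo` and that the FIFO was created beforehand with `mkfifo /tmp/mplayer.fifo`. The FIFO path and the helper name `send_mplayer_command` are hypothetical. `loadlist` is a command from MPlayer's slave-mode protocol, but check the slave-mode documentation for your version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write one slave-mode command line to mplayer's
# command FIFO. Opening a FIFO for writing blocks until a reader
# (the running mplayer) has the other end open.
sub send_mplayer_command {
    my ($fifo, $command) = @_;
    open(my $fh, '>>', $fifo) or die "Cannot open $fifo: $!\n";
    print {$fh} "$command\n";
    close($fh) or die "Cannot close $fifo: $!\n";
    return;
}

# Usage: script.pl /tmp/mplayer.fifo /path/to/playlist.txt
# Asks mplayer to load its playlist from the given file.
if (@ARGV == 2) {
    my ($fifo, $playlist) = @ARGV;
    send_mplayer_command($fifo, "loadlist $playlist");
}
```

A web interface would then only need to rewrite the playlist file and call something like this.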

So right now I'm trying to get my processing script set up.

Right now I want to set it up to just process a single file given via the command line so it's easy to test and all. Maybe later I'll do a full directory at once.

First I'm trying to get images processed to videos (I have the stuff for doing the conversions working). The idea will be (I think, suggestions welcome) to have them edit the image filenames to "+20+blah blah blah.jpg". This SHOULD be fairly easy to get them to do....

So basically, I'm trying to figure out how to have my script:

1. See if it's an image file, to do image processing (so check the extension against a list). If not, it should go on to the video processing section.
2. See if it has +somenumber+ at the beginning and put the number into a variable, the rest of the filename (including extension) into a variable, and the rest of the filename without extension into another variable (so 3 variables if it has a number).
3. If it has the number, process one way with the variable; if there's no number, process a different way. (Basically, the number is the duration of the video to create from the image, so if no duration is specified, it will use a default value we decide on.)

I think most of this should be fairly straightforward once I have the parsing working. I just can't figure out how to do the parsing in perl.

Any examples/good sites/etc someone could give me to help me get the parsing working?

Thanks guys/gals/ais/other! :)

Mark II

*EDIT* Actually, it would make more sense to put the duration into a variable, the extension into a variable (since we already checked it), and the rest of the filename without the extension into a variable. So for +20+Picture-of-Bob.jpg we would have:

$extension = jpg
$duration = 20
$filename = Picture-of-Bob

This would make it easier to define output filenames and such cleanly.
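A minimal sketch of steps 1 to 3 with the three variables from the edit above. The 10-second default, the image extension list, and the function name `parse_image_name` are assumptions for illustration, not part of the plan as posted. It expects a bare filename with no directory part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed default duration for images without a +N+ prefix.
my $DEFAULT_DURATION = 10;

# Assumed list of image extensions; anything else goes to video processing.
my $IMAGE_EXT = qr/^(?:jpe?g|png|gif)$/i;

# Split a bare filename such as "+20+Picture-of-Bob.jpg" into
# ($duration, $filename, $extension). $duration is undef when there is
# no +N+ prefix; an empty list is returned when there is no extension.
sub parse_image_name {
    my ($name) = @_;
    my ($duration, $rest) = (undef, $name);
    if ($name =~ /^\+(\d+)\+(.+)$/) {
        ($duration, $rest) = ($1, $2);
    }
    # Greedy (.+) keeps any inner dots in the name; the last dot
    # starts the extension.
    my ($filename, $extension) = $rest =~ /^(.+)\.([^.]+)$/ or return;
    return ($duration, $filename, $extension);
}

# One file from the command line, or a few sample names for testing.
for my $file (@ARGV ? @ARGV : ('+20+Picture-of-Bob.jpg', 'Logo.png', 'intro.mpg')) {
    my ($duration, $filename, $extension) = parse_image_name($file);
    if (defined $extension && $extension =~ $IMAGE_EXT) {
        # Step 3: fall back to the default when no duration was given.
        $duration = $DEFAULT_DURATION unless defined $duration;
        print "image: $filename.$extension for $duration\n";
    } else {
        print "video: $file\n";
    }
}
```

Returning the three values from one function keeps the output-filename logic separate from the parsing.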
tuxtom

Jan 18, 2008
12:50 PM EDT
You are referring to Perl Regular Expressions my friend, one of the most powerful tools in existence.

This should get you started:

http://perldoc.perl.org/perlre.html

http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
Sander_Marechal

Jan 18, 2008
1:42 PM EDT
Quoting: You are referring to Perl Regular Expressions my friend, one of the most powerful tools in existence.


Ah yes. The programming tool that looks and acts like line noise :-) Regular expressions are hard to understand and make it easy to shoot yourself in the foot, but they are oh so useful.

I don't know how PCREs (Perl Compatible Regular Expressions) are used in Perl, but in PHP it would look like this for the format "20-filename.ext":

preg_match('/^(\d+)-(.*)\.([a-z]{1,4})$/', $filename, $match);

For the filename "20-some-filename.whatever.jpg", this would result in $match being an array with four elements:

$match[0] = "20-some-filename.whatever.jpg";
$match[1] = "20";
$match[2] = "some-filename.whatever";
$match[3] = "jpg";

Or for the format "+20+filename.ext":

preg_match('/^\+(\d+)\+(.*)\.([a-z]{1,4})$/', $filename, $match);
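Since the question was about Perl, here is a sketch of how those two `preg_match()` calls might translate, with the backslashes in place. In list context a Perl match returns just the capture groups, so PHP's `$match[0]` (the whole match) is not in the returned list, and a failed match returns an empty list. The function names are hypothetical. Note that `[a-z]` will not match an upper-case extension such as `.JPG` unless you add the `/i` modifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "20-filename.ext" -> (20, "filename", "ext"); empty list if no match.
sub match_dash_format {
    my ($filename) = @_;
    return $filename =~ /^(\d+)-(.*)\.([a-z]{1,4})$/;
}

# "+20+filename.ext" -> (20, "filename", "ext"). Same idea, written
# with /x so each part of the pattern can carry a comment.
sub match_plus_format {
    my ($filename) = @_;
    return $filename =~ m{
        ^ \+ (\d+) \+       # leading +20+, captures the number
        (.*)                # the name; greedy, so it may contain dots
        \. ([a-z]{1,4})     # extension after the last dot
        $
    }x;
}

my ($duration, $name, $ext) = match_plus_format('+20+Picture-of-Bob.jpg');
print "duration=$duration name=$name ext=$ext\n";
```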

Question: Would it not make much more sense if you make the person or application write a simple configuration file instead of relying on filename conventions? Filename conventions always get broken in my experience. As a bonus it would mean that you don't have to save the same file multiple times under different names if you want the same bit included in multiple videos. You could do that using symlinks as well, but then you'd have to teach people about symlinks :-)

Example (XML):

<?xml version="1.0"?>
<video>
    <name>My awesome sign video</name>
    <author>Me!</author>
    <resolution>
        <width>400</width>
        <height>300</height>
    </resolution>
    <scene>
        <source>/home/sander/someimage.jpg</source>
        <duration>10</duration>
    </scene>
    <scene>
        <source>/mnt/fileserver/shared/videos/stock-video.mpg</source>
        <overlay>/home/clients/foobar/logo.gif</overlay>
    </scene>
    <scene>
        <source>/home/sander/another-image.png</source>
        <duration>20</duration>
    </scene>
</video>

Or, if you hate XML:

name = My Awesome Video
author = Me!
resolution = 400x300

[scene 1]
source = /home/sander/someimage.jpg
duration = 10

[scene 2]
source = /mnt/fileserver/shared/videos/stock-video.mpg
overlay = /home/clients/foobar/logo.gif

[scene 3]
source = /home/sander/another-image.png
duration = 20
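If you go the plain-text route, reading it back in Perl takes only a few lines. This is a sketch assuming one `key = value` pair per line, `[scene N]` headers on their own lines, and `#` for comments; the function name `read_video_config` and the returned data layout are made up for illustration. For anything fancier, a CPAN module such as Config::Tiny would be the usual choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the key = value format from an open filehandle. Top-level keys
# land in the returned hash; each [section] becomes a hash pushed onto
# $config->{scenes}, in file order, with its header text under 'section'.
sub read_video_config {
    my ($fh) = @_;
    my %config  = (scenes => []);
    my $current = \%config;                      # where pairs are stored
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;          # blank line or comment
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {   # e.g. [scene 1]
            $current = { section => $1 };
            push @{ $config{scenes} }, $current;
        }
        elsif ($line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/) {
            $current->{$1} = $2;
        }
        else {
            warn "Ignoring line $.: $line\n";
        }
    }
    return \%config;
}

# Usage: script.pl video.conf
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    my $config = read_video_config($fh);
    for my $scene (@{ $config->{scenes} }) {
        my $secs = defined $scene->{duration} ? $scene->{duration} : 'default';
        print "$scene->{section}: $scene->{source} ($secs)\n";
    }
}
```

Keeping scenes in an array rather than a hash preserves the order they appear in the file, which is what a playlist needs.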

Oh, and do yourself a favour: don't rely on the file extension to determine the file type. Linux has very good MIME type support, and most programming languages have functions or libraries to determine the MIME type of a file. In this regard, Linux is far superior to Windows, which is hopelessly lost without file extensions.
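A sketch of that in Perl, shelling out to the `file` utility rather than depending on a CPAN module. It assumes a `file` that understands `-b` and `--mime-type` (current GNU/Linux versions do; older ones may only offer `-i`). The helper names are hypothetical, and the exact MIME string reported for a given video container depends on the installed magic database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ask file(1) for the MIME type of $path instead of trusting the
# extension. Returns undef if file(1) printed nothing.
sub mime_type_of {
    my ($path) = @_;
    # List form of a piped open: no shell involved, so odd filenames are safe.
    open(my $fh, '-|', 'file', '-b', '--mime-type', '--', $path)
        or die "Cannot run file: $!\n";
    my $type = <$fh>;
    close($fh);
    return unless defined $type;
    chomp $type;
    return $type;
}

# Classify by the top-level MIME type: 'image', 'video' or 'other'.
sub media_kind {
    my ($mime) = @_;
    return 'other' unless defined $mime;
    return $mime =~ m{^(image|video)/} ? $1 : 'other';
}

for my $path (@ARGV) {
    my $mime = mime_type_of($path);
    printf "%s: %s (%s)\n", $path, (defined $mime ? $mime : 'unknown'), media_kind($mime);
}
```

Routing on the `image/` or `video/` prefix avoids maintaining a list of every extension people might use.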

Sander_Marechal

Jan 18, 2008
2:19 PM EDT
Oops. LXer ate my backslashes. Here are the regexes again:

preg_match('/^(\d+)-(.*)\.([a-z]{1,4})$/', $filename, $match);
preg_match('/^\+(\d+)\+(.*)\.([a-z]{1,4})$/', $filename, $match);
techiem2

Jan 18, 2008
2:52 PM EDT
hmm. Lots of stuff to look over and analyze...

I'm not making a single video out of the stuff; I'm making each image into its own video, then having them put the items in a playlist in the order they want them in, probably via an easy-to-use web interface. Keep in mind that content will be added and the playlist edited by an overworked office person most likely. The playlist will probably change somewhat daily.

And yeah...looks like I probably need to get working on my regex...

The MIME types idea could be good too... just check whether it's an image or a video instead of checking extensions and having to make sure every extension they end up using is in the list...
Sander_Marechal

Jan 18, 2008
3:16 PM EDT
Quoting: Keep in mind that content will be added and the playlist edited by an overworked office person most likely.


In that case, wouldn't a web interface be easier? A bit like YouTube: build a simple form that allows you to upload a file and set some basic settings (like duration). Process the image or video on the server and add it to the library of videos that the office drone can queue.

For extra efficiency you could generate an md5 hash of the video or image that was uploaded, "salt" it with the settings (like duration) and store that in a database together with the path to the processed video in the library. That way, when someone uploads an image or video that was already processed, you can skip processing and refer the user directly to the video that was processed earlier.
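A rough sketch of that cache key using the core Digest::MD5 module. Rather than literally salting, it hashes the file contents and then hashes that digest together with the settings, so the same image uploaded with a different duration gets a different key. The function name and the settings layout are assumptions, and the database lookup is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical cache key: MD5 of the file's bytes, then MD5 of that digest
# plus the processing settings. Settings are sorted by name so the key
# does not depend on the order they were passed in.
sub cache_key {
    my ($path, %settings) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    binmode($fh);
    my $file_digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    my $setting_str = join ';', map { "$_=$settings{$_}" } sort keys %settings;
    return md5_hex("$file_digest;$setting_str");
}

# Usage: script.pl upload.jpg 20
if (@ARGV) {
    my $duration = defined $ARGV[1] ? $ARGV[1] : 10;    # assumed default
    print cache_key($ARGV[0], duration => $duration), "\n";
}
```

The resulting 32-character hex string can serve as the database key that points at the already processed video.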
techiem2

Jan 18, 2008
3:59 PM EDT
hmm that's an idea too!

I got pcre matching file extensions and checking for the duration modifier at the beginning of the file. Nothing real fancy, but it works. :)

(heh...we have to escape our escapes so lxer forum won't eat them...)

if ($infile =~ m/(\.jpg|\.png)$/i) {
    if ($infile =~ m/^(\+\d+\+)/) {
        print "Infile is: " . $infile . " and has a duration specified\n";
    }
    else {
        print "Infile is: " . $infile . " and has no duration specified\n";
    }
}
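One thing to watch with the `^` anchor: it only matches when `$infile` is a bare name, so `./+20+Bob.jpg` or `/incoming/+20+Bob.jpg` would be reported as having no duration. Below is a sketch that strips the directory with the core File::Basename module first and captures the number at the same time; `split_infile` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Split a path such as "/incoming/+20+Picture-of-Bob.jpg" into
# ($duration, $name, $ext). The directory is discarded; $duration is
# undef when the name has no +N+ prefix, $ext is '' when there is none.
sub split_infile {
    my ($infile) = @_;
    # The suffix pattern is matched at the end of the name: the last dot
    # and everything after it.
    my ($name, $dir, $suffix) = fileparse($infile, qr/\.[^.]*/);
    my $duration;
    $duration = $1 if $name =~ s/^\+(\d+)\+//;
    (my $ext = $suffix) =~ s/^\.//;
    return ($duration, $name, $ext);
}

for my $infile (@ARGV) {
    my ($duration, $name, $ext) = split_infile($infile);
    next unless $ext =~ /^(?:jpg|png)$/i;    # same list as the check above
    if (defined $duration) {
        print "Infile is: $infile and has a duration of $duration specified\n";
    }
    else {
        print "Infile is: $infile and has no duration specified\n";
    }
}
```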
techiem2

Jan 22, 2008
1:01 PM EDT
Well, the script is looking good to me. I have it processing stuff how I want anyway. I have my current project status for the sign system as well as the current version of the script up on my wiki. I'm trying to keep the page up to date with my progress.

http://www.techiem2.net/cgi-bin/twiki/bin/view/Main/HowToLin...
Sander_Marechal

Jan 22, 2008
1:09 PM EDT
It looks quite interesting. Keep up the good work!
