Include and do Regex-replacements

Description

This is a hack built on top of the Include macro. Features:

  • Regular expression search
  • Regular expression replace
  • Can include URLs containing the logged-in user's name
  • Can skip inclusion for anonymous users

Regular expressions are very useful when you wish to extract data from, and/or change data inside, an included page.
The username feature is very useful for personalized content.

Note: You don't HAVE to use the regular expression fields. All other features still work without them.

Arguments:

  • arg 1: URL, either http://page.com or wiki:WikiPage (e.g. wiki:MyPage). Local files cannot be included [1].
  • arg 2: format, either "wiki" or "raw", where "wiki" applies WikiFormatting
  • arg 3+: "Control arguments". See Control Arguments below.
  • arg last: the regular expressions MUST be the last arguments

Only args 1 and 2 are required.

Regular expressions are written as follows:
Search:
'<expression>'
(Remember to define at least one group using parentheses.)

Replacement:
'<expression>'/'<replacement>'
You can use several expressions:
'<expression>'/'<replacement>','<expression>'/'<replacement>'
And search and replace combined:
'<expression>','<expression>'/'<replacement>'

You can also use capture groups. In Python these are referenced as \1, \2, etc. (where in e.g. Perl they are $1, $2).

The output of each expression is passed on to the next expression, from left to right.
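As a rough illustration of the left-to-right chaining, here is a minimal Python sketch using the standard `re` module. The helper name `chain_replace` and the sample text are hypothetical, not taken from the macro's source; the sketch only assumes that each '<expression>'/'<replacement>' pair behaves like `re.sub` with DOTALL enabled.

```python
import re


def chain_replace(text, pairs):
    """Apply (pattern, replacement) pairs in order (hypothetical helper)."""
    for pattern, replacement in pairs:
        # Each pair works on the output of the previous one.
        # re.DOTALL mirrors the macro's documented default.
        text = re.sub(pattern, replacement, text, flags=re.DOTALL)
    return text


result = chain_replace(
    "Version: 2.6.18",
    [
        # \1, \2 and \3 refer to the three capture groups.
        (r"Version: (\d+)\.(\d+)\.(\d+)", r"\1.\2 (patch \3)"),
        # Runs on "2.6 (patch 18)", the output of the first pair.
        (r"patch", "release"),
    ],
)
print(result)  # 2.6 (release 18)
```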

Control Arguments

Control arguments are placed AFTER the second argument (raw or wiki) and BEFORE the regular expressions. Multiple arguments are separated by commas.
Example:

RegexInclude(http://google.com,wiki,no_anon,use_vars=lower,'Expression'/'Replacement')
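One way the comma-separated control arguments could be interpreted is sketched below in Python. This is a hypothetical parser (`split_args` and `parse_control_args` are made-up names), not the macro's actual code; it assumes bare names act as flags, `name=value` pairs carry a value, and double quotes protect commas, matching the quoting rule described for match_seperator.

```python
def split_args(raw):
    """Split on commas that are not inside double quotes (hypothetical)."""
    parts, current, in_quotes = [], [], False
    for ch in raw:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_control_args(raw):
    """Map each control argument to True (flag) or to its value."""
    controls = {}
    for arg in split_args(raw):
        name, has_value, value = arg.partition("=")
        if not has_value:
            controls[name.strip()] = True
            continue
        value = value.strip()
        # Drop surrounding double quotes, e.g. match_seperator=","
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        controls[name.strip()] = value
    return controls


print(parse_control_args('no_anon,use_vars=lower,match_seperator=","'))
# {'no_anon': True, 'use_vars': 'lower', 'match_seperator': ','}
```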
use_vars
If defined, replace $USER with the username in the URL (NOT in the included text!).
use_vars accepts a sub-argument, used to change the case of the username. This is one of:
  • upper
  • lower
  • ucfirst
example: use_vars=upper

(note: ucfirst is "uppercase first letter only")
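The $USER expansion and the three case options could look roughly like this in Python. The helper `expand_user` is hypothetical; in particular, it assumes ucfirst behaves like Perl's ucfirst (only the first letter changes, the rest is left as-is), which the note above suggests but does not spell out.

```python
def expand_user(url, username, case=None):
    """Replace $USER in the URL only (hypothetical helper)."""
    transforms = {
        "upper": str.upper,
        "lower": str.lower,
        # Assumed: uppercase the first letter, leave the rest untouched.
        "ucfirst": lambda s: s[:1].upper() + s[1:],
    }
    if case is not None:
        username = transforms[case](username)
    return url.replace("$USER", username)


print(expand_user("wiki:StartPage_$USER", "DFaerch", "lower"))
# wiki:StartPage_dfaerch
print(expand_user("http://example.com/~$USER/", "dfaerch", "ucfirst"))
# http://example.com/~Dfaerch/
```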

no_anon
Do not include this page for anonymous users.
NOTE: this is NOT a security feature, as anyone can still read your source code.
Useful for not including unnecessary information for anonymous users.
match_seperator
Character or string that separates the matches returned by a search expression.
Use it like this:
  • match_seperator=Separate This
  • match_seperator="Separate This"
Note: To use a comma as the separator, it must be quoted (match_seperator=",").
no_dotall
Disable the DOTALL option for regular expressions.
By default DOTALL is enabled, which makes the dot (".") also match line breaks.
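A search expression combined with match_seperator and the DOTALL default might be modelled as follows. This Python sketch is an assumption about the behaviour, not the macro's code: `search_matches` is a made-up name, and it keeps only the first capture group of each match.

```python
import re


def search_matches(text, pattern, separator="", dotall=True):
    """Join the first capture group of every match (hypothetical)."""
    flags = re.DOTALL if dotall else 0  # no_dotall corresponds to dotall=False
    return separator.join(m.group(1) for m in re.finditer(pattern, text, flags))


page = "<b>one</b>\n<b>two\nlines</b>"
print(repr(search_matches(page, r"<b>(.*?)</b>", " | ")))
# 'one | two\nlines'  (with DOTALL, "." also matches the line break)
print(repr(search_matches(page, r"<b>(.*?)</b>", " | ", dotall=False)))
# 'one'  (without DOTALL, the second match cannot span the line break)
```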


  1. Well, they can if the file resides in /tmp/trac_include/ and the file name contains no slashes.
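The rule for local files can be expressed as a simple path check. The Python sketch below is hypothetical (the name `local_include_path` and the extra rejection of "." and ".." are assumptions, not documented behaviour); it only illustrates why forbidding slashes keeps includes inside /tmp/trac_include/.

```python
INCLUDE_DIR = "/tmp/trac_include/"


def local_include_path(name):
    """Map a bare file name to a path inside INCLUDE_DIR (hypothetical)."""
    # A name without slashes cannot point into another directory,
    # except for the special entries "." and "..", which are rejected too.
    if not name or "/" in name or name in (".", ".."):
        raise ValueError("invalid local include name: %r" % name)
    return INCLUDE_DIR + name


print(local_include_path("status.txt"))  # /tmp/trac_include/status.txt
```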

Bugs/Feature Requests

Existing bugs and feature requests for RegexIncludeMacro are here.

If you have any issues, create a new ticket.

Feel free to use the comment field at the bottom.

Download

Download the zipped source from here.

Source

You can check out RegexIncludeMacro from here using Subversion, or browse the source with Trac.

Examples

A simple example: replace all occurrences of the word "windows" (or "Windows") with the word "Linux".

[[RegexInclude(http://yourpage.tld,raw,'[Ww]indows'/'Linux')]]


The next example shows how to use the Control Argument use_vars to create a personalized start page on the wiki:

[[RegexInclude(wiki:StartPage_$USER,wiki,use_vars=lower)]]

In my case, it would include a wiki page called "StartPage_dfaerch" (always lowercasing the username). For anonymous users, it would include "StartPage_anonymous", unless "no_anon" is also defined.


This example shows the two graphs from kernel.org's front page. It uses a regular-expression search to capture the image URLs, and match_seperator to separate the two image URLs found with [[BR]], putting them on separate lines.

[[RegexInclude(http://kernel.org,wiki,match_seperator=" [[BR]]",'(http://www\d+.kernel.org/bw-zeus\d+\.png)')]]


The last example includes http://www.kernel.org/kdist/finger_banner, wiki-formats the wanted data and removes the unwanted data. In particular, I want to keep only the major version information, not patches or snapshots.

Note how I use two regular-expression replaces, separated by ','. The first expression formats the lines I want to keep; the second deletes the rest.

[[RegexInclude(http://www.kernel.org/kdist/finger_banner,wiki,'The latest ([\w.]+) version [^:]+:\s*([a-z0-9.-]+)'/' '''\1''' :: \n  ''\2'' ','\nThe[ a-zA-Z0-9.:-]+'/'\n')]]
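To see what the two replaces do, the Python sketch below applies the same two expressions with `re.sub`, in order. The three input lines are a made-up sample in the style of finger_banner, not the real file contents, and the `\n` in the replacement is assumed to become a line break, as it does with Python's `re.sub`.

```python
import re

# Illustrative sample only; the real finger_banner has more lines.
banner = (
    "The latest stable version of the Linux kernel is:   2.6.18\n"
    "The latest prepatch for the stable Linux kernel tree is:   2.6.19-rc1\n"
    "The latest 2.4 version of the Linux kernel is:   2.4.33.3\n"
)

expressions = [
    # 1st: turn "The latest X version ...: N" into a wiki definition-list item.
    (r"The latest ([\w.]+) version [^:]+:\s*([a-z0-9.-]+)",
     r" '''\1''' :: \n  ''\2'' "),
    # 2nd: delete every remaining line that still starts with "The".
    (r"\nThe[ a-zA-Z0-9.:-]+", "\n"),
]

result = banner
for pattern, replacement in expressions:
    result = re.sub(pattern, replacement, result, flags=re.DOTALL)

print(result)
```

The prepatch line does not match the first expression (there is no " version " after "prepatch"), so it still starts with "The" and is removed by the second one.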

The result is:

stable
2.6.18

2.4
2.4.33.3
2.2
2.2.26

Known limitations

  • Expressions or replacement strings cannot contain ',' (that is: tick-comma-tick), since that combination is used for splitting the expressions.
  • Expressions or replacement strings cannot contain '/' (that is: tick-slash-tick), since that combination is used for splitting the expression from the replacement string.

I hope these two combinations are unlikely enough that the limitation won't cause any trouble.

  • URLs cannot contain commas. If you need a comma, URL-encode it (e.g. /file?arg=1,2 becomes /file?arg=1%2c2).
  • match_seperator must be in quotes (") if you want to use commas.
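The two tick-based delimiters explain these limitations. Below is a hypothetical Python parser (`parse_expressions` is a made-up name, not the macro's code) that splits on ',' and then on '/', plus the standard-library `urllib.parse.quote` for encoding a comma in a URL. `quote` emits upper-case `%2C`, which is equivalent to `%2c`.

```python
from urllib.parse import quote


def parse_expressions(raw):
    """Split "'a'/'b','c'" into [("a", "b"), ("c", None)] (hypothetical).

    None marks a search expression (no replacement).
    """
    # Remove the outer ticks, then split on tick-comma-tick.
    items = raw.strip()[1:-1].split("','")
    parsed = []
    for item in items:
        # Tick-slash-tick separates an expression from its replacement.
        pattern, has_replacement, replacement = item.partition("'/'")
        parsed.append((pattern, replacement if has_replacement else None))
    return parsed


print(parse_expressions("'[Ww]indows'/'Linux'"))
# [('[Ww]indows', 'Linux')]
print(parse_expressions("'(a+)','x'/'y'"))
# [('(a+)', None), ('x', 'y')]
print(quote("/file?arg=1,2", safe="/?="))
# /file?arg=1%2C2
```

Any expression that itself contains ',' or '/' gets split in the wrong place, which is exactly the limitation described above.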

Recent Changes

[1357] by dfaerch on 10/09/06 11:53:17

RegexIncludeMacro:

Added regex match functionality + match_seperator

[1342] by dfaerch on 10/05/06 15:38:53

RegexIncludeMacro:

added Control Arguments & wiki-page inclusion

[1338] by dfaerch on 10/04/06 16:16:19

RegexIncludeMacro:

initial release

[1337] by dfaerch on 10/04/06 16:06:55

New hack RegexIncludeMacro, created by dfaerch

Author/Contributors

Author: dfaerch

Comments & Feedback

Any feedback is appreciated.
