MS Word Regex Pattern

A wee pattern to strip MS Word nonsense out of pasted-in text.

(<[\/]?span[a-zA-Z0-9\',\.\#\:\;\%\{\s="\-\(\)]*>| |data-leveltext="[]"[\s]?|data-font="[a-zA-Z]*"[\s]?|data-listid="[0-9]*"[\s]?|data-list-defn-props="{"[\s]?|data-aria-posinset="[0-9]"[\s]?|data-aria-level="[0-9]"[\s]?|class="[A-Za-z]*"[\s]?|style="[a-z0-9\-\:\;\.\%,\s]*"[\s]?|class="[A-Za-z0-9\s]*"[\s]?|·)

// Anti MS Word Pattern
$antiword = '/(<[\/]?span[a-zA-Z0-9\',\.\#\:\;\%\{\s="\-\(\)]*>| |data-leveltext="[]"[\s]?|data-font="[a-zA-Z]*"[\s]?|data-listid="[0-9]*"[\s]?|data-list-defn-props="{"[\s]?|data-aria-posinset="[0-9]"[\s]?|data-aria-level="[0-9]"[\s]?|class="[A-Za-z]*"[\s]?|style="[a-z0-9\-\:\;\s]*"[\s]?|class="[A-Za-z0-9\s]*"[\s]?|·)/i';

$text = preg_replace($antiword, "", $text);

' Anti MS Word Pattern (VB.NET)
DIM pattern AS string = "(<[\/]?span[a-zA-Z0-9\',\:\;\%\{\s=""\-\(\)]*>| |data-leveltext=""[]""[\s]?|data-font=""[a-zA-Z]*""[\s]?|data-listid=""[0-9]*""[\s]?|data-list-defn-props=""{""[\s]?|data-aria-posinset=""[0-9]""[\s]?|data-aria-level=""[0-9]""[\s]?|class=""[A-Za-z]*""[\s]?|style=""[a-z0-9\-\:\;\s]*""[\s]?|class=""[A-Za-z0-9\s]*""[\s]?|·)"
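' Requires: Imports System.Text.RegularExpressions
' IgnoreCase mirrors the /i flag used in the PHP version
Dim rx As New Regex(pattern, RegexOptions.IgnoreCase)

text = rx.Replace(text, "")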
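
To see what the pattern does to a real paste, here is a minimal sketch in PHP. It is only an illustration: it keeps just the span, class and style alternatives from the pattern above, and the function name strip_word_markup and the sample fragment are made up for the example.

```php
<?php
// Reduced, illustrative subset of the pattern above: only the <span>,
// class="" and style="" alternatives are kept.
function strip_word_markup(string $html): string
{
    $antiword = '/(<[\/]?span[a-zA-Z0-9\',\.\#\:\;\%\{\s="\-\(\)]*>'
              . '|class="[A-Za-z0-9\s]*"[\s]?'
              . '|style="[a-z0-9\-\:\;\.\%,\s]*"[\s]?)/i';

    return preg_replace($antiword, '', $html);
}

// Hypothetical fragment of the kind Word leaves behind in pasted text.
$pasted = '<p class="MsoNormal" style="margin: 0cm;">'
        . '<span lang="EN-GB">Hello</span> world</p>';

echo strip_word_markup($pasted), PHP_EOL; // <p >Hello world</p>
```

Note the stray space left inside the opening p tag: the pattern removes the attributes but not the space that preceded them, so a follow-up tidy pass may be worth adding.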

Search Engine Friendly URLs

I’ve been working on setting up some search-engine-friendly URLs on a PHP website.

Rather than have URLs that look like www.domain.co.uk/index.php?id=23, I wanted them to look like www.domain.co.uk/slugname, the way WordPress does.

To do this I used an .htaccess file:

RewriteEngine On
RewriteRule ^\/?services\/? index.php?cat=2 [NC]
RewriteRule ^\/?departments\/? index.php?cat=3 [NC]
RewriteRule ^\/?resources\/? index.php?cat=4 [NC]
RewriteRule ^\/?calendar\/? index.php?cat=5 [NC]
RewriteRule ^\/?college\/([a-z0-9\-\_]+)\/?$ index.php?cat=1&id=$1 [NC]
RewriteRule ^\/?college\/? index.php?cat=1 [NC]

Ref : https://httpd.apache.org/docs/current/mod/mod_rewrite.html

RewriteEngine On

This enables Apache’s mod_rewrite engine, which lets me rewrite URLs.

Category Rewrite Rule

RewriteRule ^\/?services\/? index.php?cat=2 [NC]

This line is a rewrite rule.
(Ref : https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule )

After the RewriteRule keyword, the first part is a regex pattern that is matched against the current URL path.
(Ref : https://www.rexegg.com/regex-quickstart.html )

^ – matches the start of the URL path
\/? – an optional leading / (the backslash escapes the slash)
services – followed by the literal string ‘services’
\/? – an optional trailing /

There is no $ at the end of this pattern, so anything that follows ‘services’ in the URL is still allowed.

Then the actual path is specified: index.php?cat=2 is the path that will actually load.

[NC] indicates that the rewrite rule is case insensitive (therefore it will match regardless of characters being uppercase or lowercase)
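
A quick way to check which URLs a pattern like this catches is to run the same regex through PHP’s preg_match. This is only a sketch: the function name matches_services and the sample paths are invented for the example, and the i modifier stands in for [NC].

```php
<?php
// Hypothetical helper: does the category pattern match this path?
// '#' is the delimiter; the trailing 'i' mimics the [NC] flag.
function matches_services(string $path): bool
{
    return preg_match('#^\/?services\/?#i', $path) === 1;
}

// In .htaccess, Apache strips the leading slash before matching,
// so 'services' is closest to what the rule actually sees.
$paths = ['services', '/services/', 'SERVICES', 'services-old', 'our-services'];

foreach ($paths as $path) {
    $target = matches_services($path) ? 'index.php?cat=2' : '(no match)';
    echo $path, ' => ', $target, PHP_EOL;
}
```

The first four paths match and only our-services does not. services-old matches because the pattern has no $ anchor, so adding one would be the way to make the rule stricter.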

Category Sub Items Rewrite Rule

RewriteRule ^\/?college\/([a-z0-9\-\_]+)\/?$ index.php?cat=1&id=$1 [NC]

I added a rewrite rule to handle sub items. This one captures an id from the pattern and adds it to the actual path, so if you typed in www.domain.com/college/item it would load www.domain.com/index.php?cat=1&id=item.

^\/?college\/([a-z0-9\-\_]+)\/?$

^ – matches the start of the URL path
\/? – an optional leading / (the backslash escapes the slash)
college\/ – followed by the literal string college/
([a-z0-9\-\_]+) – followed by a capture group of one or more characters from the set [ ] (a-z, 0-9, - or _); case doesn’t matter in this example because we are using NC at the end
\/? – an optional trailing /
$ – matches the end of the URL path, so nothing may follow the optional trailing /

index.php?cat=1&id=$1

The string captured by the group ( … ) is then inserted into the actual path using $1. If the pattern captured more than one group, you would number them $1, $2, $3, etc.
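
The capture-and-substitute step can be tried out in PHP, since preg_replace accepts the same $1 syntax in its replacement string. This is a rough sketch rather than what Apache does internally: rewrite_college is a made-up helper, and the i modifier again stands in for [NC].

```php
<?php
// Hypothetical helper that mimics the sub-item RewriteRule.
// Returns the rewritten path, or null when the rule would not apply.
function rewrite_college(string $path): ?string
{
    $pattern = '#^\/?college\/([a-z0-9\-\_]+)\/?$#i';

    if (preg_match($pattern, $path) !== 1) {
        return null;
    }

    // Single quotes keep $1 literal so preg_replace sees the back-reference.
    return preg_replace($pattern, 'index.php?cat=1&id=$1', $path);
}

var_dump(rewrite_college('college/item'));      // "index.php?cat=1&id=item"
var_dump(rewrite_college('college/Open-Day/')); // "index.php?cat=1&id=Open-Day"
var_dump(rewrite_college('college/a/b'));       // NULL
```

The last call returns NULL because the $ anchor refuses anything after the optional trailing slash, which is what leaves plain /college/ to the more general rule.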

Continued…

This is a fairly basic example. It works for what I need, but I may take it further in the future…

Useful Links