MS Word Regex Pattern

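A wee pattern to strip the MS Word nonsense out of pasted-in text: span tags, non-breaking spaces, Word's list data-* attributes, class and style attributes, and stray · bullet characters.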

(<[\/]?span[a-zA-Z0-9\',\.\#\:\;\%\{\s="\-\(\)]*>|&nbsp;|data-leveltext="[^"]*"[\s]?|data-font="[a-zA-Z]*"[\s]?|data-listid="[0-9]*"[\s]?|data-list-defn-props="[^"]*"[\s]?|data-aria-posinset="[0-9]+"[\s]?|data-aria-level="[0-9]+"[\s]?|class="[A-Za-z]*"[\s]?|style="[a-z0-9\-\:\;\.\%,\s]*"[\s]?|class="[A-Za-z0-9\s]*"[\s]?|·)

// Anti MS Word pattern (PHP; the /i flag makes it case-insensitive)
$antiword = '/(<[\/]?span[a-zA-Z0-9\',\.\#\:\;\%\{\s="\-\(\)]*>|&nbsp;|data-leveltext="[^"]*"[\s]?|data-font="[a-zA-Z]*"[\s]?|data-listid="[0-9]*"[\s]?|data-list-defn-props="[^"]*"[\s]?|data-aria-posinset="[0-9]+"[\s]?|data-aria-level="[0-9]+"[\s]?|class="[A-Za-z]*"[\s]?|style="[a-z0-9\-\:\;\.\%,\s]*"[\s]?|class="[A-Za-z0-9\s]*"[\s]?|·)/i';

$text = preg_replace($antiword, "", $text);

' Anti MS Word pattern (VB.NET; needs Imports System.Text.RegularExpressions at the top of the file)
Dim pattern As String = "(<[\/]?span[a-zA-Z0-9\',\.\#\:\;\%\{\s=""\-\(\)]*>|&nbsp;|data-leveltext=""[^""]*""[\s]?|data-font=""[a-zA-Z]*""[\s]?|data-listid=""[0-9]*""[\s]?|data-list-defn-props=""[^""]*""[\s]?|data-aria-posinset=""[0-9]+""[\s]?|data-aria-level=""[0-9]+""[\s]?|class=""[A-Za-z]*""[\s]?|style=""[a-z0-9\-\:\;\.\%,\s]*""[\s]?|class=""[A-Za-z0-9\s]*""[\s]?|·)"
' IgnoreCase matches the behaviour of the PHP /i flag
Dim antiWord As New Regex(pattern, RegexOptions.IgnoreCase)

text = antiWord.Replace(text, "")
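If it helps to see which bit does what, here is a minimal PHP sketch that builds the pattern from one commented alternative per line and runs it over a made-up fragment of Word-pasted HTML. The helper name strip_word_markup() and the sample markup are hypothetical; the alternatives assume &nbsp; as the non-breaking-space entry and use [^"]* and [0-9]+ for the list attribute values.

```php
<?php
// Build the anti-Word pattern from commented parts, then try it on a
// made-up fragment of Word-pasted HTML. strip_word_markup() is a
// hypothetical helper name, not part of any library.

$parts = [
    '<[\/]?span[a-zA-Z0-9\',\.\#\:\;\%\{\s="\-\(\)]*>', // opening and closing <span> tags
    '&nbsp;',                                          // non-breaking space entities
    'data-leveltext="[^"]*"[\s]?',                     // Word list attributes, from here...
    'data-font="[a-zA-Z]*"[\s]?',
    'data-listid="[0-9]*"[\s]?',
    'data-list-defn-props="[^"]*"[\s]?',
    'data-aria-posinset="[0-9]+"[\s]?',
    'data-aria-level="[0-9]+"[\s]?',                   // ...down to here
    'class="[A-Za-z]*"[\s]?',                          // class="MsoNormal" and friends
    'style="[a-z0-9\-\:\;\.\%,\s]*"[\s]?',             // inline style attributes
    'class="[A-Za-z0-9\s]*"[\s]?',                     // classes with digits or several names
    '·',                                               // leftover bullet characters
];

// Join the alternatives; the /i flag makes every one case-insensitive.
$antiword = '/(' . implode('|', $parts) . ')/i';

function strip_word_markup(string $html, string $pattern): string
{
    // preg_replace() returns null on a PCRE error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$pasted = '<p><span style="font-size:11pt;">Hello world</span>&nbsp;</p>'
        . '<p>·&nbsp;&nbsp;First point</p>';

echo strip_word_markup($pasted, $antiword), "\n";
// prints: <p>Hello world</p><p>First point</p>
```

Two side effects are worth knowing. Attributes are removed together with the whitespace after them but not the whitespace before, so <p class="MsoNormal"> comes out as <p > (browsers don't mind). And because every match is replaced with an empty string, an &nbsp; sitting between two words will run them together.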