Page no: P35n
This algorithm can be built as a finite state machine. More on PHP and state machines is available here.
Steps 1a-1e: Transform plural words into singular (Pro Version and economic blogs)
Apply plural rules first: Rules 1a to 1e
We want to eliminate plural words and keep only the singular forms. There are exceptions, which are listed here.
1a: Transform “ies” plural into “y” singular
Old State: “Before plural”
Condition: the word is in the “ies” list.
Rule: replace “ies” with “y”
New State: “Before stop words”
Example: abilities
Changed into: ability
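Rule 1a can be sketched in a few lines. This is only a minimal illustration in Python (not the PHP mentioned above); the name IES_WORDS and its entries are assumptions made for the example, since the real “ies” list is not reproduced on this page.

```python
# Hypothetical stand-in for the "ies" list; the real list is not shown on this page.
IES_WORDS = {"abilities", "economies", "policies"}


def singularize_ies(word: str) -> str:
    """Rule 1a: replace a trailing "ies" with "y", but only for listed words."""
    if word in IES_WORDS and word.endswith("ies"):
        return word[:-3] + "y"
    # Words that are not in the list are left untouched.
    return word


print(singularize_ies("abilities"))  # ability
print(singularize_ies("series"))     # series (not in the list, so unchanged)
```

Checking against a list rather than the bare suffix is what keeps words such as “series” from being turned into “sery”.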
1b: Transform “xes” plural into “x” singular
Old State: “Before plural”
Condition: the word is in the “xes” list.
Rule: remove “es” at the end
New State: “Before stop words”
Example: sexes
Changed into: sex
1c: Transform “ches” plural into “ch” singular
Condition: the word is in the “ches” list.
Rule: remove “es” at the end
Example: churches
Changed into: church
1d: Transform “shes” plural into “sh” singular
Condition: the word is in the “shes” list.
Rule: remove “es” at the end
Example: brushes
Changed into: brush
1e: Transform “sses” plural into “ss” singular
Condition: the word is in the “sses” list.
Rule: remove “es” at the end
Example: fitnesses
Changed into: fitness
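Rules 1a to 1e can be read as transitions of a small state machine. The following is a rough Python sketch, not the actual implementation: the names State, WORD_LISTS and apply_plural_rules are hypothetical, and the word lists are tiny placeholders for the real ones. It also assumes that a word matching no rule stays in the state “Before plural”, which this page does not specify.

```python
from enum import Enum


class State(Enum):
    BEFORE_PLURAL = "Before plural"
    BEFORE_STOP_WORDS = "Before stop words"


# Placeholder lists keyed by suffix; the real lists are maintained elsewhere.
WORD_LISTS = {
    "ies": {"abilities"},
    "xes": {"sexes", "taxes"},
    "ches": {"churches"},
    "shes": {"brushes"},
    "sses": {"fitnesses"},
}


def apply_plural_rules(word, state=State.BEFORE_PLURAL):
    """Apply rules 1a-1e once and return (new_word, new_state)."""
    if state is not State.BEFORE_PLURAL:
        return word, state
    for suffix, listed_words in WORD_LISTS.items():
        if word.endswith(suffix) and word in listed_words:
            if suffix == "ies":
                singular = word[:-3] + "y"   # rule 1a: "ies" -> "y"
            else:
                singular = word[:-2]         # rules 1b-1e: drop "es"
            return singular, State.BEFORE_STOP_WORDS
    # Assumption: no rule matched, so the word stays in the old state.
    return word, State.BEFORE_PLURAL


for w in ["abilities", "taxes", "brushes", "fitnesses", "boxes"]:
    print(w, "->", apply_plural_rules(w))
```

In this sketch “boxes” is left unchanged because it is not in the placeholder “xes” list.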
Now the important step:
Step 2: Remove “s”-Forms: plural, 3rd person and genitive (Pro Version and economic blogs)
This does three things in one!
Condition: the word ends with “s”
Exception: do nothing if the word ends in “ss”.
Rule: remove “s” at the end
a) This concerns verbs: 3rd person
Example: blows
Changed into: blow
b) This concerns nouns: we cut plural forms to singular
Example: the word “nouns”
Is changed into: “noun”
c) This concerns genitive:
Example: URL Obamas-health-care
Changed into: Obama-health-care
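Step 2 is a single suffix check. A minimal Python sketch, assuming URL parts are separated by hyphens; the function names strip_final_s and strip_final_s_in_slug are made up for this illustration.

```python
def strip_final_s(word: str) -> str:
    """Step 2: remove a trailing "s" unless the word ends in "ss"."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def strip_final_s_in_slug(slug: str) -> str:
    """Apply step 2 to every hyphen-separated part of a URL slug."""
    return "-".join(strip_final_s(part) for part in slug.split("-"))


print(strip_final_s("blows"))    # blow    (verb, 3rd person)
print(strip_final_s("nouns"))    # noun    (plural)
print(strip_final_s("fitness"))  # fitness (ends in "ss", unchanged)
print(strip_final_s_in_slug("Obamas-health-care"))  # Obama-health-care
```

The same check covers all three cases (3rd person, plural, genitive) because it only looks at the last letters of each word.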
Running Example
Old URL: https://austrian.economicblogs.org/max-keiser/2015/keiser-sinking-ship-survey-16th-economic-freedom-index/
New URL: https://austrian.economicblogs.org/max-keiser/2015/keiser-sinking-ship-survey-16th-economic-freedom-index/
Verbs ending in “s” are removed
will remove: shows, falls, index
and transform “ships” into “ship”
Currently, “Swiss national bank’s” gives the URL “Swiss-banks”.
Requirement: Remove ’s before applying the stop word algorithm.
Title: “The Fed’s Waterloo” should give
It is ready on European. I tested it. It works fine. Must be added on all blogs. Must run on
Bug: Does not work on European or is not installed on FeedWP (see screenshot from European).
Solution on Bug: Now it is installed on Syndication and works perfectly. See the problem screens.
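The requirement to remove ’s before the stop word step could look roughly like this. It is a Python sketch only; the name strip_possessive is hypothetical, and treating both the straight (') and the typographic (’) apostrophe as equivalent is an assumption of this example.

```python
import re

# Matches 's at the end of a word, with a straight or a typographic apostrophe.
POSSESSIVE = re.compile(r"['’]s\b")


def strip_possessive(title: str) -> str:
    """Remove the genitive 's from every word of a title."""
    return POSSESSIVE.sub("", title)


print(strip_possessive("The Fed’s Waterloo"))            # The Fed Waterloo
print(strip_possessive("Swiss national bank's policy"))  # Swiss national bank policy
```

Running this on the title before the stop word algorithm means the apostrophe is gone before the URL is built, so “bank’s” is reduced to “bank” rather than turned into “banks”.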
Nki:
Removing ’s: I can do that only for
— new posts
— when I restore the URL based on title
George:
— restore title inside Stop Words
-> But this might be complicated
We are restoring URL based on title anyway