Common and Not Common Words
Common words: About 3500 basic words make up most of the language. About 1200 are verbs, 1300 are nouns, and 1000 are others such as adjectives, adverbs or prepositions.
Not common words: about 150,000 words.
People mostly use nouns and not common words in Google searches.
We have two alternatives.
We opt for Alternative 1: we remove the stop words. For a discussion of the second alternative, “Only keep nouns”, see further below.
Usual stop Words: 857 words
Regular verbs: 636 words
Past form of regular verbs: 636 words
Irregular verbs: 280 words
3rd person of irregular verbs: 91 words
Prepositions: 70 words
Adjectives: 141 words
Adverbs: 496 words
Insignificant nouns: 857 nouns
Adjective and noun: 298 words
Common words except nouns: 1900 words
Total: 6262 words
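The total can be checked by adding up the individual list sizes given above. A minimal Python sketch (the dictionary keys are only labels chosen for this illustration):

```python
# Sizes of the individual stop-word lists, as stated above.
list_sizes = {
    "usual stop words": 857,
    "regular verbs": 636,
    "past forms of regular verbs": 636,
    "irregular verbs": 280,
    "3rd person of irregular verbs": 91,
    "prepositions": 70,
    "adjectives": 141,
    "adverbs": 496,
    "insignificant nouns": 857,
    "adjective and noun": 298,
    "common words except nouns": 1900,
}

total = sum(list_sizes.values())
print(total)  # 6262
```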
List of Stop Words (Light Version, Alternative 1)
The following are the lists of stop words. The idea of Alternative 1 is to delete all stop words.
As a result, only nouns and not common words are kept.
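As a rough illustration of deleting stop words, the sketch below filters the words of a title against a stop-word set. The function name remove_stop_words and the tiny STOP_WORDS sample are assumptions made for this example; the real lists are far longer.

```python
import re

# Tiny illustrative sample; the real lists contain thousands of entries.
STOP_WORDS = {"indeed", "a", "the", "them", "there", "very"}

def remove_stop_words(title, stop_words=STOP_WORDS):
    """Return the lowercased words of `title` that are not stop words."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [w for w in words if w not in stop_words]

print(remove_stop_words("Indeed, the castle there"))  # ['castle']
```

Any word that is not on a stop list, including every not common word, passes through unchanged.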
List 1) usual stop words (incl. numbers)
Examples: indeed, a, the, them, there, very
Remark: The list above also contains many prepositions and numbers.
List 2) 600 most common regular verbs
Examples: accept, behave
We will need to remove from this list the verbs that are also significant nouns, such as battle, bubble, cross, license, train, transport, shelter, water, watch, waste, perform, plan.
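Removing those words from the verb list amounts to a simple filter. The sketch below uses a hypothetical excerpt of List 2 and a subset of the nouns named above; both samples are assumptions for illustration only.

```python
# Hypothetical excerpt of the regular-verb list (List 2).
regular_verbs = ["accept", "behave", "battle", "train", "water", "plan"]

# Verbs that are also significant nouns (subset of the examples above).
significant_nouns = {"battle", "bubble", "train", "transport", "shelter",
                     "water", "watch", "waste", "plan"}

# Keep only the verbs that are safe to treat as stop words.
verb_stop_list = [v for v in regular_verbs if v not in significant_nouns]
print(verb_stop_list)  # ['accept', 'behave']
```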
List 3) Past forms of regular verbs
These are the regular verbs from List 2 with the suffix “ed”, or only “d” when the verb already ends in “e”. With the past forms there is no need to remove nouns.
Examples: accepted, behaved
This list must be created manually from List 2.
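The suffix rule described above can be sketched in a few lines to produce a first draft of the list. The function name past_form is an assumption for this example, and the rule is deliberately the simple one stated here, not a full English conjugator.

```python
def past_form(verb):
    """Apply the simple rule: add "d" if the verb ends in "e", else "ed"."""
    return verb + "d" if verb.endswith("e") else verb + "ed"

regular_verbs = ["accept", "behave"]
past_forms = [past_form(v) for v in regular_verbs]
print(past_forms)  # ['accepted', 'behaved']
```

The simple rule gets some verbs wrong: it produces “planed” for “plan” and “carryed” for “carry”, where the correct forms are “planned” and “carried”. A generated draft therefore still needs a manual check.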
List 4) Third person of irregular verbs
List 5) Most common irregular verbs
The present and past forms are included in this list.
Examples: awake, awoke, awoken, is, be, will
Remark: This list might be a bit short.
List 6) Prepositions
Examples: aboard, about, above
List 7a) 500 most common Adjectives
Examples: different, used, important
Many adjectives are also nouns, and some are also adverbs. This gives List 7b.
List 7b) Adjective & noun list
Again, some nouns are removed from this list.
List 7) 500 most common Adverbs
Examples: abruptly, decently, hopefully
List 8) Numbers from 0 to 12
Examples: one, two, three, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Remark: We do not delete numbers greater than 12.
We decided not to use this list because it can easily lead to duplicate posts.
Example: Episode1, Episode2
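The duplicate problem can be sketched as follows. The titles “Episode 1” and “Episode 2” (with a space before the number) and the function name strip_numbers are assumptions made for this illustration.

```python
# Number words and digits from 0 to 12 (illustrative subset of List 8).
NUMBER_WORDS = {"one", "two", "three"} | {str(n) for n in range(13)}

def strip_numbers(title):
    """Remove the number tokens of List 8 from a title."""
    words = title.lower().split()
    return " ".join(w for w in words if w not in NUMBER_WORDS)

titles = ["Episode 1", "Episode 2"]
reduced = [strip_numbers(t) for t in titles]
print(reduced)  # ['episode', 'episode']
```

Both titles collapse to the same result, which is the kind of duplicate the remark above warns about.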
Available only in the Pro version.
controversy, conversation, conversion, coordinator, cooperation, cop, contrast, correspondent, correlation, corridor, counselor, counsel, counterpart, country, county, courage, course, coverage, cream, creature, creation, credibility, credit, crew, crime, criteria, criticism, cross, crowd, cruise, culture, cure, curiosity, curve
List 10) 4000 common words (non-nouns)
List 11) Manual inclusion
Comparison of alternatives
Alternative 1: Keep all nouns. Define stop words and delete them. Stop Words will be all common words (except nouns).
Alternative 2: Delete all words except nouns
Alternative 2: Delete all words in the slug except nouns (not used)
In this alternative we keep only nouns and delete all other words. The test would be a positive test: there are no stop words, only “included words”, which are nouns. The difference from Alternative 1 is that not common words would also be deleted.
Searches are mostly made with nouns, but sometimes with “direction-indicating” verbs or adjectives. An example of a direction-indicating verb is “remove”:
Example: Search “Remove Stop Words Plugin”.
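The difference between the two alternatives can be sketched as a deny-list filter versus an allow-list filter. The function names and the small word sets below are hypothetical samples, and the sketch assumes that the noun list of Alternative 2 would contain only common nouns.

```python
def alternative_1(words, stop_words):
    """Deny-list: delete stop words, keep everything else."""
    return [w for w in words if w not in stop_words]

def alternative_2(words, nouns):
    """Allow-list: keep only words that are on the noun list."""
    return [w for w in words if w in nouns]

stop_words = {"the", "of", "very"}  # hypothetical sample
nouns = {"battle", "island"}        # hypothetical sample

words = "the battle of zanzibar".split()
print(alternative_1(words, stop_words))  # ['battle', 'zanzibar']
print(alternative_2(words, nouns))       # ['battle']
```

A not common word such as “zanzibar” survives the deny-list but is dropped by the allow-list, because it appears on neither list.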