Commit 2c279a2 by wzkariampuzha (1 parent: 4700524)
Upload usr/local/lib/nltk_data/stopwords/README
usr/local/lib/nltk_data/stopwords/README
ADDED
@@ -0,0 +1,32 @@
+Stopwords Corpus
+
+This corpus contains lists of stop words for several languages. These
+are high-frequency grammatical words which are usually ignored in text
+retrieval applications.
+
+They were obtained from:
+http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/
+
+The stop words for the Romanian language were obtained from:
+http://arlc.ro/resources/
+
+The English list has been augmented
+https://github.com/nltk/nltk_data/issues/22
+
+The German list has been corrected
+https://github.com/nltk/nltk_data/pull/49
+
+A Kazakh list has been added
+https://github.com/nltk/nltk_data/pull/52
+
+A Nepali list has been added
+https://github.com/nltk/nltk_data/pull/83
+
+An Azerbaijani list has been added
+https://github.com/nltk/nltk_data/pull/100
+
+A Greek list has been added
+https://github.com/nltk/nltk_data/pull/103
+
+An Indonesian list has been added
+https://github.com/nltk/nltk_data/pull/112
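The README says stop words are usually ignored in text retrieval applications. A minimal sketch of that filtering step follows. It assumes each language's list is a plain-text file with one word per line, which the README itself does not state. The helper names `load_stopwords` and `remove_stopwords` and the four-word sample list are hypothetical and exist only for illustration.

```python
from pathlib import Path
import tempfile


def load_stopwords(path):
    """Read a stop word list stored as plain text, one word per line (assumed layout)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stop word set (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


# Hypothetical miniature list standing in for a real corpus file.
sample_words = ["the", "is", "a", "of"]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "english"
    sample_path.write_text("\n".join(sample_words) + "\n", encoding="utf-8")
    stopwords = load_stopwords(sample_path)

tokens = "The corpus is a list of words".split()
filtered = remove_stopwords(tokens, stopwords)
print(filtered)  # ['corpus', 'list', 'words']
```

Loading the list into a set keeps each membership check constant-time, which matters when the same list is applied to many documents.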