	=====================================================
	Paice's evaluation statistics for stemming algorithms
	=====================================================

	Given a list of words with their real lemmas and their stems according to the stemming algorithm under evaluation,
	computes the Understemming Index (UI), Overstemming Index (OI), Stemming Weight (SW) and Error-Rate Relative to Truncation (ERRT).

	>>> from nltk.metrics import Paice


	-------------------------------------
	Understemming and Overstemming values
	-------------------------------------

	>>> lemmas = {'kneel': ['kneel', 'knelt'],
	... 'range': ['range', 'ranged'],
	... 'ring': ['ring', 'rang', 'rung']}
	>>> stems = {'kneel': ['kneel'],
	... 'knelt': ['knelt'],
	... 'rang': ['rang', 'range', 'ranged'],
	... 'ring': ['ring'],
	... 'rung': ['rung']}
	>>> p = Paice(lemmas, stems)
	>>> p.gumt, p.gdmt, p.gwmt, p.gdnt
	(4.0, 5.0, 2.0, 16.0)
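
The four totals above can be reproduced by counting word pairs. The following is a minimal sketch, assuming the pairwise definitions usually given for Paice's method; `paice_totals` is a hypothetical helper written for illustration and is not part of the NLTK API.

```python
def paice_totals(lemmas, stems):
    """Return (GUMT, GDMT, GWMT, GDNT) for lemma and stem groupings.

    Hypothetical helper: both arguments map a group label to the list
    of words in that group, as in the doctest above.
    """
    n = sum(len(words) for words in lemmas.values())  # total number of words
    gumt = gdmt = gwmt = gdnt = 0.0
    for lemma_words in lemmas.values():
        size = len(lemma_words)
        gdmt += size * (size - 1)  # ordered pairs that should be merged
        gdnt += size * (n - size)  # ordered pairs that should stay apart
        for stem_words in stems.values():
            shared = len(set(lemma_words) & set(stem_words))
            gumt += shared * (size - shared)             # same lemma, different stem
            gwmt += shared * (len(stem_words) - shared)  # same stem, different lemma
    # Every pair was counted in both orders, so halve the totals.
    return gumt / 2, gdmt / 2, gwmt / 2, gdnt / 2


lemmas = {'kneel': ['kneel', 'knelt'],
          'range': ['range', 'ranged'],
          'ring': ['ring', 'rang', 'rung']}
stems = {'kneel': ['kneel'],
         'knelt': ['knelt'],
         'rang': ['rang', 'range', 'ranged'],
         'ring': ['ring'],
         'rung': ['rung']}
totals = paice_totals(lemmas, stems)
```

For the data above, `totals` matches the tuple returned by `p.gumt, p.gdmt, p.gwmt, p.gdnt`.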

	>>> p.ui, p.oi, p.sw
	(0.8..., 0.125..., 0.15625...)
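
The three indices follow from the totals by simple ratios: UI = GUMT / GDMT, OI = GWMT / GDNT and SW = OI / UI. The sketch below illustrates this arithmetic with a hypothetical helper, `paice_indices`; it assumes non-zero denominators and does not try to mirror how NLTK treats the degenerate cases.

```python
def paice_indices(gumt, gdmt, gwmt, gdnt):
    """Return (UI, OI, SW) from the four global totals.

    Hypothetical helper; assumes gdmt, gdnt and the resulting UI are non-zero.
    """
    ui = gumt / gdmt  # share of desired merges the stemmer failed to make
    oi = gwmt / gdnt  # share of desired non-merges the stemmer wrongly merged
    sw = oi / ui      # stemming weight: higher means a heavier stemmer
    return ui, oi, sw


ui, oi, sw = paice_indices(4.0, 5.0, 2.0, 16.0)
```

With the totals from the example, this gives 4/5 for UI, 2/16 for OI and their ratio for SW, in line with the values shown above.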

	>>> p.errt
	1.0

	>>> [('{0:.3f}'.format(a), '{0:.3f}'.format(b)) for a, b in p.coords]
	[('0.000', '1.000'), ('0.000', '0.375'), ('0.600', '0.125'), ('0.800', '0.125')]
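
ERRT relates the stemmer's (UI, OI) point P to the truncation line formed by the coordinates above: if the ray from the origin O through P crosses that line at T, then ERRT = |OP| / |OT|. The following is a minimal geometric sketch under that definition; `errt_from_coords` is a hypothetical helper, and raising an error when the ray misses the line is an assumption of this sketch, not NLTK's documented behavior.

```python
def errt_from_coords(ui, oi, coords):
    """Return |OP| / |OT| for P = (ui, oi) and a truncation polyline.

    Hypothetical helper: coords is a list of (UI, OI) points joined by
    straight segments, in the order shown in the doctest above.
    """
    def cross(a, b):
        return a[0] * b[1] - a[1] * b[0]

    p = (ui, oi)
    eps = 1e-12
    for a, b in zip(coords, coords[1:]):
        d = (b[0] - a[0], b[1] - a[1])
        denom = cross(p, d)
        if abs(denom) < eps:
            continue  # the ray is parallel to this segment
        s = cross(a, d) / denom  # position on the ray: T = s * P
        u = cross(a, p) / denom  # position on the segment: T = A + u * D
        if s > 0 and -eps <= u <= 1 + eps:
            return 1.0 / s  # |OP| / |OT|, because |OT| = s * |OP|
    raise ValueError("ray through (ui, oi) does not cross the truncation line")


coords = [(0.0, 1.0), (0.0, 0.375), (0.6, 0.125), (0.8, 0.125)]
errt = errt_from_coords(0.8, 0.125, coords)
```

In the example, P coincides with the last truncation point, so T = P and the ratio is 1, which agrees with `p.errt`.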