Purifyr
| Purifyr | |
|---|---|
| Purifyr homepage | |
| Status: | stable |
| Last release: | 1.2.114 (2009/10/17) |
| License: | BSD |
| Language: | C, Python |
| Author(s): | Alex Dong, Zephyr Fang |
| Affiliation: | 2Zelex Software |
| Web resources | |
Purifyr is a web content extraction engine for semantic web and natural language processing (NLP) applications, written in C and Python. It automatically detects the semantic noise introduced not only by advertisements but also by navigational elements, related-article links, and tag clouds. The latest version claims to process 25 web pages per second with a cleaning ratio of up to 95%. Purifyr is trained on mainstream media web content in the 10 most popular languages.
A downstream semantic parser should see its performance and recall improve by about 35% to 120%, because the cleaned content is smaller and less noisy.
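
Purifyr's own detection method is not described on this page, and it is a trained engine rather than a fixed rule. Purely as an illustration of what removing navigational and advertising noise can look like, the following minimal sketch uses a generic link-density heuristic: blocks that are short or consist mostly of link text are discarded. The names `BlockCollector` and `extract_main_text` and the thresholds `max_link_density` and `min_chars` are assumptions made for this example and are not part of Purifyr.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries in this sketch (an assumption, not a standard list).
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "table", "ul", "ol", "h1", "h2", "h3"}
# Tags whose content is never visible text.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts the link characters in each."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (text, link_chars) tuples
        self._text = []        # text fragments of the block being built
        self._link_chars = 0   # non-whitespace characters seen inside <a>
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Close the current block, collapsing runs of whitespace.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len("".join(data.split()))

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, max_link_density=0.3, min_chars=40):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        total = len(text.replace(" ", ""))  # non-whitespace characters
        if total >= min_chars and link_chars / total <= max_link_density:
            kept.append(text)
    return "\n".join(kept)


page = (
    '<div><ul><li><a href="/">Home</a></li>'
    '<li><a href="/news">News</a></li></ul></div>'
    "<p>The city council approved the new transit budget on Tuesday "
    "after a long debate.</p>"
    '<div><a href="/ad">Buy cheap widgets now at our amazing online '
    "store today</a></div>"
)
print(extract_main_text(page))
```

On this sample page the navigation items and the advertisement link are dropped and only the article sentence is kept. A fixed threshold like this is far cruder than a trained model, so it should be read as a picture of the task rather than of how Purifyr solves it.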