Purifyr

From semanticweb.org
Purifyr
Purifyr homepage
Status: stable
Last release: 1.2.114 (2009/10/17)
License: BSD
Language: C, Python
Author(s): Alex Dong, Zephyr Fang
Affiliation: 2Zelex Software

Purifyr is a "web content extraction" engine for the semantic web and natural language processing (NLP), written in C and Python. It automatically detects semantic noise introduced not only by advertisements but also by navigational information, related-article links, and tag clouds. The latest version claims to process 25 web pages per second with a cleaning ratio of up to 95%. Purifyr is trained on mainstream media web content in the 10 most popular languages.

Because the cleaned content is smaller and less noisy, a downstream semantic parser is expected to see its performance and recall accuracy increase by about 35% to 120%.
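
This page does not describe how Purifyr works internally, so the following is only a rough illustration of the general idea of noise removal, not Purifyr's algorithm or API. It uses a simple link-density heuristic: blocks that are mostly link text (menus, related-article lists, tag clouds) or very short are dropped. All names (`BlockCollector`, `extract_main_text`, `cleaning_ratio`), the thresholds, and the definition of "cleaning ratio" as the fraction of visible text removed are assumptions made for this sketch.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries in this sketch (an assumption, not Purifyr's list).
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "td", "h1", "h2", "h3",
              "section", "article", "nav"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts the link text in each block."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (text, link_chars) tuples
        self._text = []        # raw text pieces of the current block
        self._link_chars = 0   # characters of the current block inside <a>
        self._in_link = 0      # nesting depth inside <a>
        self._skip = 0         # nesting depth inside <script>/<style>

    def _flush(self):
        # Close the current block, normalising whitespace.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(" ".join(data.split()))

    def close(self):
        super().close()
        self._flush()


def visible_blocks(html):
    """Return every visible text block of the page as (text, link_chars)."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


def extract_main_text(html, max_link_density=0.5, min_words=5):
    """Keep blocks that are long enough and not dominated by link text."""
    kept = [
        text for text, link_chars in visible_blocks(html)
        if len(text.split()) >= min_words
        and link_chars / len(text) <= max_link_density
    ]
    return "\n".join(kept)


def cleaning_ratio(html, cleaned):
    """Fraction of visible characters removed (assumed definition)."""
    original = "\n".join(text for text, _ in visible_blocks(html))
    if not original:
        return 0.0
    return 1.0 - len(cleaned) / len(original)
```

Calling `extract_main_text(page_html)` returns the retained blocks joined by newlines, and `cleaning_ratio(page_html, cleaned)` gives a number between 0 and 1 under the assumed definition. A trained system such as Purifyr presumably relies on far richer signals than link density alone.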
