Domain categorization is mostly used in parental control and data loss prevention (DLP) applications, where you need to know whether a domain as a whole is suitable for the audience in question.
Take parental control as an example. Any site whose theme is inappropriate, such as adult content, can be blocked outright. The problem starts when you are dealing with sites like Google, YouTube, CNN or Wikipedia. Each has its own categorization challenges, and handling them requires an engine that can go deeper than domain-level categorization:
- Google (or any search site) – serves dynamic content. Most search engines do have a safe search option, but it is mostly geared toward adult content. There are other types of content you might want to block, such as violence, profanity and online movies, and regular domain categorization can’t do that. You need to be able to classify search engine results. Furthermore, if you try to classify Google search results from a server, Google will block the queries after a few requests because it treats them as “bot” traffic. Our solution is to classify the keyword of the search itself, regardless of the results the search engine returns.
- CNN (or any site with multiple themes) – is a news site, but it has sections that discuss sexuality or celebrities, so domain-level categorization could end up blocking it as a whole. Then there are reference sites like Wikipedia, which have mixed content, and blocking the entire site can be problematic in school settings. That’s why you need an engine that can classify a single page inside the domain and filter only the right pages. If you browse Wikipedia and go to a page about sports, the classification reported would be: ‘reference, sports.’ If you go to a page about sexuality, the classification would be: ‘reference, adult or sexual education.’
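To make the keyword approach from the search-site case concrete, here is a minimal Python sketch. It is not our engine's implementation: the `KEYWORD_CATEGORIES` table, the `SEARCH_PARAMS` list and both function names are hypothetical, and a real classifier would rely on far more than a lookup table. The point is only that the search terms can be read from the URL and classified without ever fetching results from the search engine.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical keyword-to-category table, for illustration only.
KEYWORD_CATEGORIES = {
    "poker": "gambling",
    "casino": "gambling",
    "gore": "violence",
}

# Assumed names of the query parameter that carries the search terms.
SEARCH_PARAMS = ("q", "query")


def extract_search_terms(url):
    """Return the lowercased search terms found in a search URL, or an empty list."""
    query = parse_qs(urlparse(url).query)
    for name in SEARCH_PARAMS:
        if name in query:
            return query[name][0].lower().split()
    return []


def classify_search(url):
    """Classify a search by its keywords alone, without requesting any results."""
    terms = extract_search_terms(url)
    categories = {KEYWORD_CATEGORIES[t] for t in terms if t in KEYWORD_CATEGORIES}
    return sorted(categories)


print(classify_search("https://www.google.com/search?q=online+poker+tips"))
```

Because nothing is requested from the search engine, this approach is not affected by the “bot” traffic blocking described above.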
Our URL Classification service supports per-domain, per-page, per-keyword and per-search classification.
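As a rough illustration of how per-domain and per-page results can be combined, the Python sketch below looks up the domain first and then adds any page-specific categories. The two tables, the example paths and the `classify_url` function are assumptions made for this example, not the service's actual API or data.

```python
from urllib.parse import urlparse

# Hypothetical category tables, for illustration only.
DOMAIN_CATEGORIES = {
    "en.wikipedia.org": ["reference"],
    "www.cnn.com": ["news"],
}
PAGE_CATEGORIES = {
    ("en.wikipedia.org", "/wiki/Association_football"): ["sports"],
}


def classify_url(url):
    """Return the domain categories followed by any page-specific categories."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    domain_cats = DOMAIN_CATEGORIES.get(host, ["uncategorized"])
    page_cats = PAGE_CATEGORIES.get((host, parsed.path), [])
    # Keep the domain-level result and append page-level categories not already present.
    return domain_cats + [c for c in page_cats if c not in domain_cats]


print(classify_url("https://en.wikipedia.org/wiki/Association_football"))
print(classify_url("https://en.wikipedia.org/wiki/Main_Page"))
```

A page with no entry of its own falls back to the domain-level categories, so a filter can still make a decision for every URL.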