General URL Classification questions
What is URL Classification?
URL Categorization, or URL Classification, is the process of identifying the category that a domain or a URL belongs to. For example, the classification of the domain cnn.com is “news”.
Who needs URL Classification?
URL Classification is used by vendors of software such as parental control, DLP (Data Leakage Protection), and gateway protection. It can also be used to improve advertising performance: RTB (Real Time Bidding) advertising, brand safety, and segmentation of users’ web surfing habits for more accurate ad targeting.
Do you classify websites per URL?
There are two ways of classifying a URL:
- Provide the classification of the domain, regardless of the page’s content. For example, every page on cnn.com is classified as “news”.
- Classify the page itself and provide the classification specific to that page, regardless of the domain’s main category. For example, an article on cnn.com about finance is classified as “news,finance”.
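The difference between the two modes can be sketched in a few lines of Python. This is only an illustration: the lookup tables, the function names, and the example article URL are hypothetical, and the page-level table stands in for the real step of fetching and analyzing the page. The fallback to the domain category when a page is unknown is also an assumption.

```python
from typing import List
from urllib.parse import urlsplit

# Hypothetical lookup tables standing in for a real classification database.
DOMAIN_CATEGORIES = {"cnn.com": ["news"]}
PAGE_CATEGORIES = {"https://cnn.com/business/markets": ["news", "finance"]}


def classify_by_domain(url: str) -> List[str]:
    """Domain-level mode: every page inherits the domain's category."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return DOMAIN_CATEGORIES.get(host, ["unknown"])


def classify_by_page(url: str) -> List[str]:
    """Page-level mode: use the page's own category when one is known.

    Assumption: fall back to the domain category for pages not analyzed yet.
    """
    return PAGE_CATEGORIES.get(url, classify_by_domain(url))
```

With these tables, any cnn.com page resolves to “news” in domain mode, while the example finance article resolves to both “news” and “finance” in page mode.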
What kind of per URL classification do you provide?
We can provide both options. Classifying the content of a page costs more, because we must request each URL in real time and analyze every page separately.
Do you classify websites manually?
We classify the top sites manually, but our proprietary algorithms classify 99% of our database. This allows us to add new categories very quickly and let the algorithms update our database.
Can you provide me the raw data?
For an additional charge, we can provide the raw data with or without updates (see our Offline Category Database).
Do you have a free URL blacklist?
No, we only provide paid solutions. You can read more about them here.
Can I request a customized scan?
Can you add a custom category?
It’s possible, but we need to discuss it first to see what is required.
Can you provide a custom feature?
It’s possible, but we need to discuss it first to see what is required.
Which languages do you support?
We fully support English, and have partial support for: German, French, Spanish, Italian, Dutch, Japanese, Chinese, Arabic, Croatian, Czech, Finnish, Greek, Hebrew, Norwegian, Polish, Romanian, Portuguese, Russian, Swedish, Turkish, Yiddish, Bulgarian, Estonian, Filipino, Icelandic, Indonesian, Latvian, Lithuanian, Slovenian, Luxembourgish, Ukrainian, Amharic, Nepali, Korean, Vietnamese, Thai, Hindi, Tamil, Armenian, Bengali.
Partial support means we support the major categories needed for filtering unsafe content.
Do you have real time virus detection?
We don’t provide a security feed.
Allowed data usage
- The data we provide must be bundled within a product/service.
- The data can be used commercially.
- The data can’t be used to compete with us or to provide a standalone URL Classification service.
URL Classification API/Server questions
Do you classify web searches?
Yes. The server detects when a URL belongs to a search engine and contains a search query. It then extracts the keyword and classifies it against a database of over 20 million keywords and phrases in 20 languages.
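As a rough sketch of the extraction step described above, the Python below pulls the search phrase out of a search-engine URL using only the standard library. The table of search hosts and query parameters is a hypothetical, incomplete example, and the lowercasing is an assumed normalization; the server's real detection logic is not documented here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical, incomplete map of search-engine hosts to their query parameter.
SEARCH_PARAMS = {
    "www.google.com": "q",
    "www.bing.com": "q",
    "search.yahoo.com": "p",
}


def extract_search_phrase(url: str) -> Optional[str]:
    """Return the search phrase if the URL is a search query, else None."""
    parts = urlsplit(url)
    param = SEARCH_PARAMS.get((parts.hostname or "").lower())
    if param is None:
        return None  # not a known search engine
    values = parse_qs(parts.query).get(param)
    if not values:
        return None  # search engine URL, but no query in it
    # Assumed normalization before the keyword lookup.
    return values[0].strip().lower()
```

The returned phrase would then be looked up in the keyword database instead of classifying the search engine's own domain.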
Do you classify keywords or phrases?
Yes, we classify both. The server can accept a single keyword or a phrase for classification.
How do you handle subdomains for sites like wordpress.com?
We maintain a list of sites that provide custom subdomains. If a subdomain on one of those sites is new, the server classifies it in real time.
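One way to picture this is as choosing the key under which a URL is looked up. The sketch below is an assumption-laden illustration, not the server's logic: the platform list is hypothetical, and taking the last two labels as the site's domain is a simplification that is wrong for suffixes such as co.uk.

```python
from urllib.parse import urlsplit

# Hypothetical list of platforms that give each customer a custom subdomain.
SUBDOMAIN_HOSTS = {"wordpress.com", "blogspot.com"}


def classification_key(url: str) -> str:
    """Return the host name a URL should be classified under."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for i in range(1, len(labels)):
        parent = ".".join(labels[i:])
        if parent in SUBDOMAIN_HOSTS:
            # Keep exactly one label in front of the hosting platform,
            # so each customer subdomain gets its own classification.
            return ".".join(labels[i - 1:])
    # Simplification: treat the last two labels as the site's domain.
    return ".".join(labels[-2:])
```

Under these assumptions, myblog.wordpress.com is classified separately from wordpress.com, while edition.cnn.com falls back to cnn.com.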
How do you handle a new website that is not in your classification database?
Our servers classify new sites in real time and add them to our sites database, so you don’t have to wait for someone to review them. Most responses take about 10 seconds; slow sites can take up to 30 seconds.
We also automatically reclassify sites that we previously classified as Link farm, Empty, or Under construction, in case they have changed since the last crawl.
How do you handle Captcha protected sites?
We have a category just for those sites. It means our crawler was blocked and the classification is unknown. We review the list and manually classify the popular sites.
Do you classify images?
No, we don’t. We classify the whole website based on the text content of the site.
It’s important to understand that images appear in a context (HTML, search phrase), and the context is what determines the category.
What is your service coverage?
Our servers are located strategically around the globe to give good coverage for end users from different geographies. In cases where we have a client that has most of its users from one country, we may add a server specifically for that country.
The current server locations are:
- USA – Oregon
- USA – Ohio
- Australia – Brisbane
How often do you update your data?
We update our existing results every 90 days. New sites are checked in real time.
Who needs to manage the servers?
We manage our servers, which are accessed by all our clients. You can also request a dedicated server just for your clients. We can manage it for you, or you can manage it yourself.
Can I manage my own server?
Yes. For an additional charge, we can provide the server software and data.
Can you set your servers to be a subdomain in my domain?
It’s possible, but you’ll need to coordinate it with us.
What if we need to change a site classification?
If a site is miscategorized, the client (not the end user) can email us or use the API to request a change. The change will be live within two business days. In an emergency during business hours, an isolated change may be possible within an hour.
I want to know more about your API
For a general description of the API, see: Website category API
For the technical details of the API, see: Website category API reference
What’s the API latency?
If we already have the classification of the site, the latency is the time it takes to establish the connection and receive a reply. If we don’t have the classification, the latency is that same connection and reply time plus an extra 1-30 seconds while the server classifies the site.
What load can your servers handle?
We have two APIs, offered according to planned usage and technical needs:
- Per query API – 50 requests/second
- Per device API – 20,000 requests/second
Per user license
Do I need to provision a token for my end users?
No. We provide one token, so there’s no provisioning. We rely on the customer’s report of the number of end users served.
How do you count a user?
It depends on the business model:
- Users using the service for free – The number of unique users who accessed the service during a month.
- Paid subscription – A paying user is considered one user.
- Paid family subscription – Half the number of users paid for, so a family account of 5 users is considered 2.5 users.
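The counting rules above reduce to simple arithmetic. The helper below is a hypothetical sketch (the function name and the idea of summing all three models in one call are assumptions for illustration); only the half-weight for family seats and the example of 5 seats counting as 2.5 users come from the rules listed.

```python
def billable_users(free_monthly_unique: int, paid: int, family_seats: int) -> float:
    """Count users for licensing purposes.

    free_monthly_unique: unique free users who accessed the service this month
    paid: individual paying subscribers, each counted as one user
    family_seats: users covered by family subscriptions, counted at half weight
    """
    return free_monthly_unique + paid + family_seats / 2
```

For example, a single family account with 5 users and nothing else gives `billable_users(0, 0, 5)`, which is 2.5.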
Custom URL Classification data
We can offer custom data on top of our existing solution, such as custom scans, custom coverage, and more. Contact us for more details.
Offline domain database
How many domains are in your database?
We have two databases: one with 67 million unique domains, and another with 180 million unique domains. Our databases grow by approximately one million domains every 3 months.
You can see the statistics and categories breakdown here: Offline URL Database changelog.
How many URLs are in your offline database?
Our offline URL database contains 10 billion URLs.
What is the format of your database?
- The offline domain database is delivered as text files inside directories. Each directory represents a category, and each text file contains a list of the domains in cleartext.
- The offline URL database can be delivered as a proprietary solution, as MySQL, or as text files.
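Given the directory-per-category layout of the offline domain database, loading it into memory could look like the sketch below. The `.txt` extension, one domain per line, UTF-8 encoding, and the function name are all assumptions; check the delivered files before relying on them.

```python
from pathlib import Path
from typing import Dict, Set


def load_domain_db(root: str) -> Dict[str, Set[str]]:
    """Map each domain to the set of categories (directory names) listing it."""
    db: Dict[str, Set[str]] = {}
    for category_dir in Path(root).iterdir():
        if not category_dir.is_dir():
            continue
        # Assumption: domain lists are *.txt files, one domain per line.
        for list_file in category_dir.glob("*.txt"):
            with list_file.open(encoding="utf-8") as fh:
                for line in fh:
                    domain = line.strip().lower()
                    if domain:
                        db.setdefault(domain, set()).add(category_dir.name)
    return db
```

A lookup is then a dictionary access, for example `load_domain_db("/path/to/db").get("cnn.com")`. For databases with tens of millions of domains, an on-disk index may be more practical than a single in-memory dictionary.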
Are the files in the offline domain database encrypted or hashed?
No, the files are in cleartext format.
Can you convert your offline database to a custom format?
How offline is your offline database?
The database is 100% offline. We provide the raw data; there’s no SDK that communicates with our servers, and no component that calls home in any way.
How do you update the database?
We provide a download link every X months, as agreed upon in the commercial terms.
Can I purchase only a number of categories?
What if I need real time updates for new sites?
You can augment the offline database with our API, or if you have strict privacy requirements, you can deploy one of our servers in your datacenter.
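A minimal sketch of the "offline first, API for new sites" pattern follows. It does not use the real API: `fetch_remote` is a hypothetical callable that you would implement against the API reference (or against a server deployed in your own datacenter), and caching the answer in the local dictionary is an assumed design choice.

```python
from typing import Callable, Dict, List


def make_lookup(
    offline: Dict[str, List[str]],
    fetch_remote: Callable[[str], List[str]],
) -> Callable[[str], List[str]]:
    """Build a lookup that consults the offline data before calling out."""

    def lookup(domain: str) -> List[str]:
        key = domain.lower()
        if key not in offline:
            # Unknown site: ask the remote classifier once and cache the answer.
            offline[key] = fetch_remote(key)
        return offline[key]

    return lookup
```

Only domains missing from the offline database reach the remote classifier, so known sites never leave your network.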
What if I need updates for new sites, every X days?
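The same answer applies: you can augment the offline database with our API, or deploy one of our servers in your datacenter.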
What if I want updates for your database with my sites?
You can send us your list and we will add it to our crawl process.
Business related questions
Where are you located?
We are located in Israel.
Can I use your category list to show my clients, or put on my site?
Do you have any export restrictions?
You can’t provide our services in the following countries: Lebanon, Syria, Iran, North Korea.
How can I pay you?
You can pay using PayPal or direct wire transfer.
Clients from Israel need to add 17% VAT.
Do you own the intellectual property?
The solution was 100% developed by us, which means we can license the data, server, and database in various ways.
Can I buy the source code?
Can I license the software’s source code and be 100% independent of you?
Can I pay for a one time deal?
Yes, it’s possible to purchase a number of query credits without a subscription.
I need more information from you, do I need to sign an NDA?
No. As you can see, we are very transparent. The final contract contains a mutual confidentiality clause.
Do we need to sign a contract?
It depends on the deal. We can either sign a contract that we provide, or you can accept our online terms.
We are a big enterprise and we have our own contract, is it OK?
Most likely yes, but we decide on a per-contract basis.