Offline Category Database

What is included with our Offline Category Database?

One of the options for licensing our product is a locally hosted domain categorization database, which stores our entire domain classification data in text files.

There are three offline database versions:

  • Active database: about 2.5 GB, containing over 67 million domains.
  • Full database: about 14 GB, containing 270 million domains.
  • Deep scan URL database: classifications of the individual URLs inside a site (for example, a URL inside Wikipedia); about 4 TB, containing over 30 billion URLs.

A true Offline Category Database

Some solutions provide a local API that communicates with a cloud server to download dynamic data or updates. This can be the right solution for specific use cases, but it is not a true offline database.

Use cases of the Offline Category Database

Companies usually license it when they have privacy or SLA requirements that the regular classification API can't meet.

Example use cases:

Offline domain database for Parental Control URL Filtering

A parental control company needs to support content categorization on its endpoints but has strict SLA and privacy requirements and can’t connect to any third party.

The solution would be to deploy our Offline Category Server in the client’s own data center, so that the client controls the SLA and privacy.

Offline domain database for Endpoint URL Filtering

A company that manufactures endpoint devices (firewalls, smart routers) with zero-latency and strict privacy requirements can embed the offline domain database in its devices.

At an agreed-upon interval, the company can download an update from our servers and deploy it to its own servers, which means it has 100% control over SLA and privacy.

Offline URL database for Brand Safety

Using the offline database (domain or URL) for brand safety or real-time bidding (RTB) is not a typical scenario, but certain privacy regulations or a client’s SLA may mandate its use.

Advantages of using the Offline Category Database

  • No data is transmitted to our servers, not even for updates: it is 100% offline.
  • Data is provided in clear-text format for easy and fast integration.
  • The licensee is in full control of the SLA.
  • Supports high-volume workloads and is not limited by bandwidth or our servers’ speed.
  • Can be loaded into a database of your choice.
  • You receive updates every three months.

Disadvantages of using the Offline Category Database

  • Can’t categorize new sites (it’s possible to combine it with a locally hosted server, or to defer to the regular API on our servers for new sites if privacy compliance allows).
  • Suitable only for medium to large businesses because of the URL filter database price and the IT knowledge required to work with it.
  • No support for keyword classification (this can be deferred to a server as well).
  • Size: only the Offline Domain Database is small enough to be used on endpoints. The Offline URL Database can be placed on servers only.
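
The first limitation above, deferring to another service for sites missing from the offline data, can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's API: the `categorize` function, the in-memory `local_db` mapping, and the `fallback` callable are all hypothetical names, and the fallback is whatever the licensee chooses to plug in (a locally hosted server, or the regular API where privacy compliance allows).

```python
from typing import Callable, Mapping, Optional

# Hypothetical type for a secondary lookup (local server or regular API client).
Fallback = Callable[[str], Optional[str]]


def categorize(
    domain: str,
    local_db: Mapping[str, str],
    fallback: Optional[Fallback] = None,
) -> Optional[str]:
    """Return the category for a domain, preferring the offline data."""
    # Normalize: trim whitespace, lowercase, drop a trailing root dot.
    key = domain.strip().lower().rstrip(".")

    # 1. Offline lookup: no network traffic is involved.
    category = local_db.get(key)
    if category is not None:
        return category

    # 2. Unknown (for example, new) site: defer only if a fallback was supplied.
    if fallback is not None:
        return fallback(key)

    # 3. Strictly offline deployment: the site stays uncategorized.
    return None
```

Leaving `fallback` unset keeps the deployment fully offline; supplying one trades some of that isolation for coverage of new sites.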

Offline Domain Database structure

The Offline Domain Database is organized into 170 directories, one for each classification. Each directory contains a text file with the domains that are part of that classification. This structure allows for easy importing into any database, such as MySQL or PostgreSQL.

For example, under the directory “News” there will be a file called “domains”, and it will contain:

cnn.com
foxnews.com
reuters.com
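
As a rough illustration of importing this layout, the sketch below loads every category directory into SQLite using only the Python standard library. It assumes each directory holds a file named "domains" with one domain per line, as in the example above; the function name `load_domains`, the table name `domain_category`, and its schema are hypothetical choices, and the composite key is only a precaution in case a domain appears under more than one classification.

```python
import sqlite3
from pathlib import Path


def load_domains(root: Path, conn: sqlite3.Connection) -> int:
    """Import every <category>/domains file under root; return rows inserted."""
    # Hypothetical schema: one row per (domain, category) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domain_category ("
        "domain TEXT NOT NULL, category TEXT NOT NULL, "
        "PRIMARY KEY (domain, category))"
    )
    inserted = 0
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        domains_file = category_dir / "domains"
        if not domains_file.is_file():
            continue  # skip directories without a domain list
        with domains_file.open(encoding="utf-8") as fh:
            # Stream the file so large lists are not held in memory at once.
            rows = (
                (line.strip().lower(), category_dir.name)
                for line in fh
                if line.strip()
            )
            cur = conn.executemany(
                "INSERT OR IGNORE INTO domain_category VALUES (?, ?)", rows
            )
            inserted += cur.rowcount
    conn.commit()
    return inserted
```

A call such as `load_domains(Path("/data/offline-db"), sqlite3.connect("categories.db"))` would then populate the table, where both paths are placeholders. The same loop should carry over to MySQL or PostgreSQL by swapping the connection and the insert statement for that database's driver and bulk-load syntax.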

Offline Deep scan URL Database structure

The offline Deep scan URL database is quite large, at over 4 TB of data. We can provide it in various formats:

  • Raw data.
  • Inside a relational database such as MySQL, Redshift, or PostgreSQL.
  • In our own custom storage software.

Offline URL Database changelog

You can view the domain count and other changes on the Offline URL Database changelog page.