Offline Category Database

What is included with our Offline Category Database?

One licensing option for our product is a locally hosted domain categorization database, which stores our entire domain classification data in text files.

There are two versions:

  • Active database: about 2 GB, containing 65 million domains.
  • Full database: about 18 GB, containing 600 million domains.

We also have a separate database that contains 10 billion classified URLs (for example, a URL inside Wikipedia). That database is 4 TB in size, and we use a custom database for fast data retrieval.

A true Offline Category Database

Some solutions provide a local API that communicates with a cloud server to download dynamic data or updates. That can be the right solution for specific use cases, but it is not a true offline database.

Use cases of the Offline Category Database

Companies usually license it when they have privacy or SLA requirements that the regular classification API can't meet.

Example use cases:

Offline domain database for Parental Control URL Filtering

A parental control company needs to support content categorization for its endpoints, but has strict SLA and privacy requirements and can't connect to any third party.

The solution is to deploy our Offline Category Server in the client's own data centre, which means that the client controls the SLA and privacy.

Offline domain database for Endpoint URL Filtering

A company that manufactures endpoint devices (firewalls, smart routers) with zero-latency and strict privacy requirements can embed the offline domain database on its devices.

At an agreed-upon interval, they can download an update from our servers and deploy it to their own servers, which means they have 100% control over SLA and privacy.

Offline URL database for Brand Safety

Using the offline database (domain or URL) for brand safety or RTB is not a typical scenario, but certain privacy regulations or a client's SLA may mandate its use.

Advantages of using the Offline Category Database

  • No data is transmitted to our servers, not even for updates; it is 100% offline.
  • Data is provided in clear-text format for easy and fast integration.
  • The licensee is in full control of the SLA.
  • Supports high-volume workloads; throughput is not limited by bandwidth or the speed of our servers.
  • Can be loaded into a database of your choice.
  • You receive updates every three months.

Disadvantages of using the Offline Category Database

  • Can't categorize new sites (it can be combined with a locally hosted server or, if privacy compliance allows, new sites can be deferred to the regular API on our servers).
  • Suitable only for medium to large businesses because of the database price and the IT knowledge required to work with it.
  • No support for keyword classification (this can be deferred to a server as well).
  • Size: only the Offline Domain Database is small enough to be used on endpoints. The Offline URL Database can be placed on servers only.
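
The first limitation above (new sites) can be worked around by checking the local data first and deferring only on a miss. The following is a minimal sketch of that pattern, not our actual client code: the function `categorize`, the in-memory mapping `local`, and the `remote` callback are all hypothetical names, and `remote` stands in for whatever locally hosted server or regular API call your privacy rules permit.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical type: any function that takes a domain and returns its categories.
Classifier = Callable[[str], Sequence[str]]


def categorize(
    domain: str,
    local: Mapping[str, Sequence[str]],
    remote: Optional[Classifier] = None,
) -> Sequence[str]:
    """Return categories from the offline data, deferring to `remote` on a miss."""
    # Normalize so "Example.com." and "example.com" hit the same entry.
    key = domain.strip().lower().rstrip(".")
    hit = local.get(key)
    if hit:
        return hit
    # Only leave the offline data when a fallback was explicitly supplied.
    if remote is not None:
        return remote(key)
    return ()


if __name__ == "__main__":
    offline = {"example.com": ["News"]}
    print(categorize("Example.com", offline))
    print(categorize("new-site.test", offline))
```

Leaving `remote` unset keeps the lookup fully offline, so the fallback is an explicit opt-in rather than a default.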

Offline Domain Database structure

The offline domain database is organized into 170 directories, one for each classification. Each directory contains a text file listing the domains that belong to that classification. This structure allows for easy importing into any database, such as MySQL or PostgreSQL.
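
As a rough illustration of such an import, the sketch below walks the category directories and loads them into a single table. It makes several assumptions: each directory holds a file named `domains`, that file lists one domain per line (the per-line layout is a guess), and Python's built-in SQLite stands in for MySQL or PostgreSQL. The table name `domain_category` and the functions `import_categories` and `categories_for` are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Iterator, List, Tuple


def _rows(path: Path, category: str) -> Iterator[Tuple[str, str]]:
    """Stream (domain, category) pairs so large files are never held in memory."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            domain = line.strip().lower()
            if domain:  # skip blank lines
                yield domain, category


def import_categories(root: Path, conn: sqlite3.Connection) -> int:
    """Load every <root>/<category>/domains file and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domain_category ("
        "domain TEXT NOT NULL, category TEXT NOT NULL, "
        "PRIMARY KEY (domain, category))"
    )
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        domains_file = category_dir / "domains"  # assumed file name
        if domains_file.is_file():
            conn.executemany(
                "INSERT OR IGNORE INTO domain_category (domain, category) "
                "VALUES (?, ?)",
                _rows(domains_file, category_dir.name),
            )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM domain_category").fetchone()[0]


def categories_for(conn: sqlite3.Connection, domain: str) -> List[str]:
    """Return every category recorded for a domain, sorted by name."""
    cur = conn.execute(
        "SELECT category FROM domain_category WHERE domain = ? ORDER BY category",
        (domain.strip().lower(),),
    )
    return [row[0] for row in cur]
```

The composite primary key lets a domain appear under more than one classification while making a repeated import harmless.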

For example, the directory “News” contains a file called “domains”, which lists the domains classified as News.

Offline URL Database structure

The offline URL database is quite large, over 4 TB of data. We can provide it in various formats:

  • Raw data.
  • Inside a relational database such as MySQL, Redshift, or PostgreSQL.
  • In our own custom storage software.