Offline Category Database

One of the options for licensing our product is a locally hosted domain categorization database. It stores our entire domain classification data in text files, which take up about 1.5 gigabytes and contain 60 million domains.

We also have a separate database that contains 5 billion classified URLs (for example, a URL inside Wikipedia). That database is 4 TB in size, and we use a custom database for fast data retrieval.

A vital point to check before purchasing an offline database is whether it is truly offline.

Some solutions provide a local API that communicates with a server to download dynamic data. This can be the right solution for specific use cases, but it is not an offline database.

Companies usually license the offline database when they have privacy or SLA requirements that the regular classification API cannot meet.

Advantages of using the offline database are:

  •  No data is transmitted to our servers
  •  The licensee hosts the data, so it is in full control of its own SLA
  •  Supports high-volume workloads and is not limited by bandwidth or our servers’ speed
  •  Can be loaded to a database of your choice
  •  You receive updates every three months

Disadvantages:

  •  Can’t categorize new sites (this can be offset by combining the database with a locally hosted server or, if privacy compliance allows, by deferring to the regular API on our servers for new sites)
  •  Suitable only for medium to large businesses because of the database price and the IT knowledge required to work with it
  •  No support for keyword classification (this can be deferred to a server as well)
  •  Its size rules out use on endpoints and routers
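
The fallback mentioned in the first disadvantage can be sketched roughly as follows. This is only an illustration under stated assumptions, not the interface of any shipped software: the function name classify, the in-memory dictionary standing in for the offline data, and the fallback callable (which could wrap a locally hosted server or the regular API) are all hypothetical.

```python
def classify(domain, offline, fallback=None):
    """Return (category, source) for a domain.

    offline  -- hypothetical mapping of domain -> category, loaded from the
                offline database
    fallback -- optional hypothetical callable that classifies domains
                missing from the offline data (for example, a wrapper around
                a locally hosted server)
    """
    # Normalize so that "CNN.com." and "cnn.com" hit the same entry.
    key = domain.strip().rstrip(".").lower()

    category = offline.get(key)
    if category is not None:
        return category, "offline"

    # New or unknown site: only leave the local data if a fallback was
    # configured, which keeps strict privacy deployments fully offline.
    if fallback is None:
        return None, "unknown"
    return fallback(key), "fallback"
```

With no fallback configured, an unknown domain simply comes back as (None, "unknown") and nothing is transmitted anywhere, which matches the privacy-first use case.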

The offline domain database is organized as 160 directories, one for each classification. Each directory contains a text file listing the domains that belong to that classification. This structure makes it easy to import the data into any database, such as MySQL or PostgreSQL.

For example, the ‘News’ directory contains a file called “domains”, with entries such as:

cnn.com
foxnews.com
reuters.com
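
Importing this layout into a database could look something like the sketch below. It uses SQLite from the Python standard library purely for illustration. The table name domain_category, the function names, and the assumption that a domain may appear under more than one classification are ours for this example and are not part of the delivered data; the same idea should carry over to MySQL or PostgreSQL with their own drivers.

```python
import sqlite3
from pathlib import Path


def iter_domain_rows(root):
    """Yield (domain, category) pairs from <root>/<category>/domains files."""
    for category_dir in sorted(Path(root).iterdir()):
        domains_file = category_dir / "domains"
        if not domains_file.is_file():
            continue  # skip stray files and directories without a list
        with domains_file.open(encoding="utf-8") as handle:
            for line in handle:
                domain = line.strip().lower()
                if domain:
                    yield domain, category_dir.name


def import_into_sqlite(root, connection):
    """Load every domain list under root into one lookup table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS domain_category ("
        "domain TEXT NOT NULL, category TEXT NOT NULL, "
        "PRIMARY KEY (domain, category))"
    )
    with connection:  # one transaction for the whole import
        connection.executemany(
            "INSERT OR IGNORE INTO domain_category (domain, category) "
            "VALUES (?, ?)",
            iter_domain_rows(root),
        )


def categories_for(connection, domain):
    """Return the sorted list of categories stored for a domain."""
    rows = connection.execute(
        "SELECT category FROM domain_category WHERE domain = ? "
        "ORDER BY category",
        (domain.strip().lower(),),
    )
    return [row[0] for row in rows]
```

A typical use would be to call import_into_sqlite once after each quarterly update and then answer lookups with categories_for. Streaming the rows through a generator keeps memory use low even with tens of millions of domains.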

The offline URL database is delivered inside custom software that allows fast data retrieval.