A tool for data scraping and analysis of security certificates from Common Criteria and FIPS 140-2/3 frameworks.

Website Website PyPI DockerHub PyPI - Python Version Tests Codecov

Paper pre-prints#

Two papers related to this tool are accepted for publication. See the arXiv pre-prints below.



Use Docker with docker pull seccerts/sec-certs or just pip install -U sec-certs && python -m spacy download en_core_web_sm. For more elaborate description, see docs.


There are two main steps in exploring the world of security certificates:

  1. Data scraping and data processing all the certificates

  2. Exploring and analysing the processed data

For the first step, we currently provide CLI. For the second step, we provide simple API that can be used directly inside our Jupyter notebook or locally, together with a fully processed datasets that can be downloaded.

More elaborate usage is described in docs/quickstart. Also, see example notebooks either at GitHub or at docs. From docs, you can also run our notebooks in Binder.

Data scraping#

Run sec-certs cc all for Common Criteria processing, sec-certs fips all for FIPS 140 processing.

Data analysis#

Most probably, you don’t want to fully process the certification artifacts by yourself. Instead, you can use our results and explore them as a data structure. An example snippet follows. For more, see example notebooks. Tip: these can be run with Binder from our docs.

from sec_certs.dataset import CCDataset

dset = CCDataset.from_web_latest() # now you can inspect the object, certificates are held in dset.certs
df = dset.to_pandas()  # Or you can transform the object into Pandas dataframe
    './latest_cc_snapshot.json')  # You may want to store the snapshot as json, so that you don't have to download it again
dset = CCDataset.from_json('./latest_cc_snapshot.json')  # you can now load your stored dataset again

# Get certificates with some CVE
vulnerable_certs = [x for x in dset if x.heuristics.related_cves]
df_vulnerable = df.loc[~df.related_cves.isna()]

# Show CVE ids of some vulnerable certificate

# Get certificates from 2015 and newer
df_2015_and_newer = df.loc[df.year_from > 2014]

# Plot distribution of years of certification