Common Criteria example
This notebook illustrates the basic functionality of the CCDataset class, which holds the Common Criteria dataset, and of its sample class CCCertificate.
Note that there is a front end to this functionality at seccerts.org/cc. Before reinventing the wheel, it is a good idea to check our website. You may not need to run any code at all and can simply use the website instead.
from sec_certs.dataset import CCDataset
from sec_certs.sample import CCCertificate
import pandas as pd
Get fresh dataset snapshot from mirror
There is no need to fully process the dataset yourself unless you have modified the sec-certs code. You can simply fetch the processed version from the web.
Note, however, that you won’t be able to access the pdf and txt files of the certificates. You can only get the data that we extracted from them.
Running the whole pipeline gets you the pdf and txt data as well. You can see how to do that in the last cell of this notebook.
dset = CCDataset.from_web_latest()
print(len(dset)) # Print number of certificates in the dataset
Do some basic dataset serialization
The dataset can be saved to and loaded from json. It can also be converted into a pandas DataFrame.
# Dump dataset into json and load it back
dset.to_json("./cc_dset.json")
new_dset: CCDataset = CCDataset.from_json("./cc_dset.json")
assert dset == new_dset
# Turn dataset into Pandas DataFrame
df = dset.to_pandas()
Simple dataset manipulation
The certificates of the dataset are stored in a dictionary that maps each certificate’s primary key (we call it dgst) to the CCCertificate object. The primary key of a certificate is simply a hash of the attributes that make the certificate unique.
You can iterate over the dataset, which is handy when selecting a subset of certificates.
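The primary-key idea described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper name `sketch_dgst`, the choice of SHA-256, the truncation to 16 hex characters, and the example field values are all hypothetical and are not taken from the sec-certs implementation.

```python
import hashlib


def sketch_dgst(*identifying_fields: str) -> str:
    """Hypothetical digest: hash the joined fields and keep 16 hex characters."""
    joined = "|".join(identifying_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Hypothetical identifying attributes; the real field choice lives in sec-certs.
key_a = sketch_dgst("ICs, Smart Cards", "Example Product v1.0", "https://example.org/report-a.pdf")
key_b = sketch_dgst("ICs, Smart Cards", "Example Product v1.1", "https://example.org/report-b.pdf")

# A dictionary keyed by digest, mirroring how the dataset maps dgst to a certificate.
certs = {key_a: "certificate A", key_b: "certificate B"}
print(key_a, certs[key_a])
```

Because the key is derived from the identifying attributes, hashing the same attributes again yields the same key, so a certificate can be looked up without storing a separate identifier.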
# Iterate over certificates in dataset
for cert in dset:
    pass
# Get certificates produced by Infineon manufacturer
infineon_certs = [x for x in dset if "Infineon" in x.manufacturer]
df_infineon = df.loc[df.manufacturer.str.contains("Infineon", case=False)]
# Get certificates with some CVE
vulnerable_certs = [x for x in dset if x.heuristics.related_cves]
df_vulnerable = df.loc[~df.related_cves.isna()]
# Show CVE ids of some vulnerable certificate
print(f"{vulnerable_certs[0].heuristics.related_cves=}")
# Get certificates from 2015 and newer
df_2015_and_newer = df.loc[df.year_from > 2014]
# Plot distribution of years of certification
df.year_from.value_counts().sort_index().plot.line()
Dissect single certificate
The CCCertificate is basically a data structure that holds all the data we keep about a certificate. Other classes (CCDataset or members of the model package) are used to transform and process the certificates. You can see all of its attributes in the API docs.
# Select a certificate and print some attributes
cert: CCCertificate = dset["bad93fb821395db2"]
print(f"{cert.name=}")
print(f"{cert.heuristics.cpe_matches=}")
print(f"{cert.heuristics.report_references.directly_referencing=}")
# Select all certificates from a dataset for which we detect at least one vulnerability.
vulnerable_certs = [x for x in dset if x.heuristics.related_cves]
Serialize single certificate
Again, a certificate can be serialized to and deserialized from json. It is also possible to construct a pandas Series from a certificate, as shown below.
cert.to_json("./cert.json")
new_cert = CCCertificate.from_json("./cert.json")
assert cert == new_cert
# Serialize as Pandas series
ser = pd.Series(cert.pandas_tuple, index=cert.pandas_columns)
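The Series construction above pairs each value in `pandas_tuple` with the label at the same position in `pandas_columns`. A minimal sketch of that pairing without pandas follows; the column names and values are hypothetical stand-ins, not the actual columns that sec-certs exports.

```python
# Hypothetical stand-ins for cert.pandas_columns and cert.pandas_tuple.
columns = ("dgst", "name", "manufacturer")
values = ("0123456789abcdef", "Example Product", "Example Vendor")

# The two sequences must line up one-to-one for the labels to be meaningful.
if len(columns) != len(values):
    raise ValueError("columns and values must have the same length")

# Pair each value with its column label, as the Series index does.
record = dict(zip(columns, values))
print(record["name"])
```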
Assign dataset with CPE records and compute vulnerabilities
Note: The data is already computed on a dataset obtained with from_web_latest(); this is just for illustration.
Note: This will likely not run in Binder, as the corresponding CVEDataset and CPEDataset instances take a lot of memory.
# Automatically match CPEs and CVEs
dset.compute_cpe_heuristics()
dset.compute_related_cves()
Create new dataset and fully process it
The following piece of code roughly corresponds to the $ cc-certs all CLI command: it runs the full CC processing pipeline. This will create a folder in the current working directory where the outputs will be stored.
Warning: It is not a good idea to run this from a notebook. It may take several hours to finish. We recommend using from_web_latest() or turning this into a Python script.
dset = CCDataset()
dset.get_certs_from_web()
dset.process_auxillary_datasets()
dset.download_all_artifacts()
dset.convert_all_pdfs()
dset.analyze_certificates()
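Since the warning above suggests turning the pipeline into a Python script, here is a minimal sketch of a wrapper such a script might use. The names `PIPELINE_STEPS` and `run_pipeline` are hypothetical and not part of sec-certs; the step names are copied from the cell above, and the helper only assumes that the object passed in exposes them as zero-argument methods.

```python
import logging
import time

# Method names copied from the cell above, in the same order.
PIPELINE_STEPS = (
    "get_certs_from_web",
    "process_auxillary_datasets",
    "download_all_artifacts",
    "convert_all_pdfs",
    "analyze_certificates",
)


def run_pipeline(dset, steps=PIPELINE_STEPS):
    """Call each step on dset in order and return (step, seconds) pairs."""
    timings = []
    for name in steps:
        start = time.perf_counter()
        getattr(dset, name)()  # raises AttributeError if the step is missing
        elapsed = time.perf_counter() - start
        logging.info("%s finished in %.1f s", name, elapsed)
        timings.append((name, elapsed))
    return timings
```

In a script, calling `run_pipeline(CCDataset())` would execute the same calls as the cell above while logging how long each step took, which helps when a run lasts several hours.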