Model package

This package exposes models (mostly transformers and classifiers) that apply complex transformations. They are meant to be used by members of the Dataset package and are applied directly to members of the Sample class (or to built-in objects).


Examples related to this package can be found in the model notebook.


class sec_certs.model.CPEClassifier(match_threshold=80, n_max_matches=10, spacy_model_to_use='en_core_web_sm')

Class that can predict CPE matches for certificate instances. Adheres to the sklearn.base.BaseEstimator interface. The fit method is called on a list of CPEs and builds two look-up dictionaries; see the description of attributes.

fit(X, y=None)

Only creates look-up structures from the provided list of CPEs.

  • X (List[CPE]) – List of CPEs that can be matched with predict()

  • y (Optional[List[str]]) – Ignored; present only to adhere to the sklearn BaseEstimator interface. Defaults to None.

Return CPEClassifier:

Returns self to allow method chaining.


predict(X)

Predicts CPE URIs for a list of tuples (vendor, product name, identified versions in product name).


X (List[Tuple[str, str, str]]) – tuples (vendor, product name, identified versions in product name)

Return List[Optional[Set[str]]]:

List with one entry per input tuple: the set of CPE URIs that correspond to that input, or None if nothing was found.

predict_single_cert(vendor, product_name, versions, relax_version=False, relax_title=False)

Predicts a set of CPE URIs for the triplet (vendor, product_name, versions). The prediction is made as follows:

  1. Sanitize the vendor name and lemmatize the product name.

  2. Find vendors in the CPE dataset that are related to the certificate.

  3. Based on (vendors, versions), find all CPE items that are considered candidates for a match.

  4. Compute the string similarity between the candidate CPE matches and the certificate name.

  5. Evaluate the best string similarity; if it is above the threshold, declare it a match.

  6. If no CPE item is matched, try again but relax the version and check CPEs that don't have their version specified.

  7. Also search for 100% CPE matches on the item name instead of the title.

  • vendor (Optional[str]) – manufacturer of the certificate

  • product_name (str) – name of the certificate

  • versions (Set[str]) – Set of versions that appear in the certificate name

  • relax_version (bool) – See step 6 above. Defaults to False.

  • relax_title (bool) – See step 7 above. Defaults to False.

Return Optional[Set[str]]:

Set of matching CPE URIs, or None if no match is found.
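The matching steps described for predict_single_cert() can be illustrated with a minimal, self-contained sketch. It is not the sec_certs implementation: the CPE_ITEMS records, the similarity() helper and the match_cpes() function are hypothetical, lemmatization (step 1) and the item-name search (step 7) are left out, and the standard-library difflib stands in for whatever string-similarity measure the package actually uses.

```python
from difflib import SequenceMatcher
from typing import Optional, Set

# Hypothetical toy records; these are NOT sec_certs CPE objects.
CPE_ITEMS = [
    {"uri": "cpe:2.3:a:acme:secure_os:1.2:*:*:*:*:*:*:*",
     "vendor": "acme", "title": "acme secure os 1.2", "version": "1.2"},
    {"uri": "cpe:2.3:a:acme:secure_os:-:*:*:*:*:*:*:*",
     "vendor": "acme", "title": "acme secure os", "version": "-"},
    {"uri": "cpe:2.3:a:other:firewall:3.0:*:*:*:*:*:*:*",
     "vendor": "other", "title": "other firewall 3.0", "version": "3.0"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity on a 0-100 scale (assumed measure)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_cpes(vendor: Optional[str], product_name: str, versions: Set[str],
               match_threshold: float = 80,
               relax_version: bool = False) -> Optional[Set[str]]:
    # Steps 1-2: sanitize the vendor and keep only CPEs of that vendor.
    vendor_key = (vendor or "").strip().lower()
    candidates = [c for c in CPE_ITEMS if c["vendor"] == vendor_key]
    # Step 3: restrict candidates by version; the relaxed pass looks only at
    # CPEs whose version is unspecified ("-").
    if relax_version:
        candidates = [c for c in candidates if c["version"] == "-"]
    else:
        candidates = [c for c in candidates if c["version"] in versions]
    # Steps 4-5: keep candidates whose title is similar enough to the name.
    matches = {c["uri"] for c in candidates
               if similarity(c["title"], product_name) >= match_threshold}
    # Step 6: nothing found, so retry once with the version relaxed.
    if not matches and not relax_version:
        return match_cpes(vendor, product_name, versions,
                          match_threshold, relax_version=True)
    return matches or None
```

In this sketch, a certificate whose version has no CPE counterpart falls back to the version-less CPE of the same vendor, and an unknown vendor yields None.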


class sec_certs.model.SARTransformer

Class for transforming SARs defined in the st_keywords and report_keywords dictionaries into SAR objects. This class implements the sklearn transformer interface (sklearn.base.TransformerMixin), so fit_transform() can be called on it.


fit(certificates)

Only returns self; no fitting is needed.


certificates (Iterable[CCCertificate]) – Unused parameter

Return SARTransformer:

return self


transform(certificates)

A wrapper around transform_single_cert(), called on an iterable of CCCertificate objects.


certificates (Iterable[CCCertificate]) – Iterable of CCCertificate objects to perform the extraction on.

Return List[Optional[Set[SAR]]]:

Returns List of results from transform_single_cert().


transform_single_cert(cert)

Given a CCCertificate, transforms SAR keywords extracted from txt files into a set of SAR objects. Also handles extraction of the correct SAR levels, duplicates, and filtering. Uses three sources: CSV scan, security target, and certification report. The caller should ensure that the certificates have their keywords extracted.


cert (CCCertificate) – Certificate to extract SARs from

Return Optional[Set[SAR]]:

Set of SARs, None if none were identified.
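As a rough illustration of turning extracted keywords into SAR objects, the sketch below parses strings such as "ALC_FLR.2" into a family and a level, drops anything that does not look like a SAR, and collapses duplicates. The SAR named tuple, the regular expression and the rule of keeping the highest level per family are all assumptions made for this example; the source does not state how sec_certs resolves levels or merges its three sources.

```python
import re
from typing import Iterable, NamedTuple, Optional, Set


class SAR(NamedTuple):
    """Hypothetical stand-in for the sec_certs SAR object."""
    family: str
    level: int


# Assumed keyword shape: three-letter class, three-letter family, numeric level.
SAR_PATTERN = re.compile(r"^([A-Z]{3}_[A-Z]{3})\.(\d+)$")


def keywords_to_sars(keywords: Iterable[str]) -> Optional[Set[SAR]]:
    best = {}
    for keyword in keywords:
        match = SAR_PATTERN.match(keyword.strip())
        if match is None:
            continue  # filtering: not a SAR-shaped keyword
        family, level = match.group(1), int(match.group(2))
        # De-duplication rule (an assumption): keep the highest level seen.
        best[family] = max(level, best.get(family, 0))
    # An empty set is falsy, so None is returned when nothing was identified.
    return {SAR(family, level) for family, level in best.items()} or None
```

Returning None rather than an empty set mirrors the documented return type Optional[Set[SAR]].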


class sec_certs.model.ReferenceFinder

Assigns to each certificate instance its references to other certificate instances. Adheres to the sklearn BaseEstimator interface. The fit method is called on a dictionary of certificates, builds a hashmap of references, and assigns references for each certificate in the dictionary.

property duplicates

Get the duplicates in the fitted dataset.

Return IDMapping:

Mapping of certificate ID to digests that share it.

fit(certificates, id_func, ref_lookup_func)

Builds a list of references and assigns references for each certificate instance.

  • certificates (Certificates) – dictionary of certificates with hashes as keys

  • id_func (IDLookupFunc) – lookup function for cert id

  • ref_lookup_func (ReferenceLookupFunc) – lookup for references

predict(dgst_list, keep_unknowns=True)

Get the references for a list of certificate digests.

  • dgst_list – List of certificate digests.

  • keep_unknowns – Whether to keep references to and from unknown certificate IDs

Return Dict[str, References]:

Dict mapping each certificate digest to its References object.

predict_single_cert(dgst, keep_unknowns=True)

Get the references object for specified certificate digest.

  • dgst – certificate digest

  • keep_unknowns – Whether to keep references to unknown certificate IDs

Return References:

References object

property unknown_references

Get the unknown references in the fitted dataset (to unknown certificate IDs, not in the dataset during fit).
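The bookkeeping behind duplicates and unknown_references can be sketched with plain dictionaries. This is an illustration only: build_reference_index() is a hypothetical helper, the toy certificates are dicts rather than certificate objects, and the call shapes assumed for id_func (certificate to ID) and ref_lookup_func (certificate to a set of referenced IDs) are guesses based on the parameter descriptions of fit().

```python
from collections import defaultdict


def build_reference_index(certificates, id_func, ref_lookup_func):
    """Return (duplicates, unknown_references) for a digest -> certificate dict."""
    # Map every certificate ID to the digests that carry it.
    id_to_digests = defaultdict(set)
    for dgst, cert in certificates.items():
        id_to_digests[id_func(cert)].add(dgst)

    # IDs shared by more than one digest are duplicates.
    duplicates = {cert_id: digests
                  for cert_id, digests in id_to_digests.items()
                  if len(digests) > 1}

    # References to IDs that were not in the dataset during fit are unknown.
    unknown_references = {}
    for dgst, cert in certificates.items():
        missing = {ref for ref in ref_lookup_func(cert)
                   if ref not in id_to_digests}
        if missing:
            unknown_references[dgst] = missing
    return duplicates, unknown_references


# Hypothetical toy dataset keyed by digest.
certs = {
    "d1": {"id": "CERT-A", "refs": {"CERT-B"}},
    "d2": {"id": "CERT-B", "refs": {"CERT-X"}},
    "d3": {"id": "CERT-B", "refs": set()},
}
duplicates, unknown_references = build_reference_index(
    certs, lambda cert: cert["id"], lambda cert: cert["refs"]
)
```

Here "CERT-B" is reported as a duplicate because two digests share it, and "d2" has an unknown reference because "CERT-X" is not in the dataset.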


class sec_certs.model.TransitiveVulnerabilityFinder(id_func)

Assigns to each certificate instance the vulnerabilities caused by references among certificate instances. Adheres to the sklearn BaseEstimator interface.

fit(certificates, ref_func)

Assigns to each certificate the vulnerabilities caused by references among certificates.


certificates (Certificates) – Dictionary of certificates keyed by digest

Return Vulnerabilities:

Dictionary of vulnerabilities of certificate instances


predict(dgst_list)

Returns vulnerabilities for a list of certificate digests.


dgst_list (List[str]) – list of certificate digests

Return Dict[str, TransitiveCVE]:

Dictionary of TransitiveCVE objects for specified certificate digests


predict_single_cert(dgst)

Returns vulnerabilities for a single certificate digest.


dgst (str) – Digest of certificate

Return TransitiveCVE:

TransitiveCVE object of certificate
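One way to picture vulnerabilities "caused by references" is a graph walk: a certificate inherits the vulnerabilities of every certificate it references, directly or indirectly. The sketch below makes that assumption explicit; the direction of propagation, the transitive_cves() helper, and the toy references and own_cves mappings (with placeholder vulnerability names) are hypothetical and not taken from sec_certs.

```python
from typing import Dict, Set


def transitive_cves(dgst: str,
                    references: Dict[str, Set[str]],
                    own_cves: Dict[str, Set[str]]) -> Set[str]:
    """Collect vulnerabilities of all certificates reachable from dgst."""
    seen = set()
    collected = set()
    stack = list(references.get(dgst, ()))
    while stack:
        current = stack.pop()
        # Skip already visited nodes and the start node (guards against cycles).
        if current in seen or current == dgst:
            continue
        seen.add(current)
        collected |= own_cves.get(current, set())
        stack.extend(references.get(current, ()))
    return collected


# Hypothetical toy data: d1 -> d2 -> d3 -> d1 forms a reference cycle.
references = {"d1": {"d2"}, "d2": {"d3"}, "d3": {"d1"}}
own_cves = {
    "d1": {"VULN-EXAMPLE-1"},
    "d2": {"VULN-EXAMPLE-2"},
    "d3": {"VULN-EXAMPLE-3"},
}
result = transitive_cves("d1", references, own_cves)
```

The visited set keeps the walk finite even when certificates reference each other in a cycle, and a certificate's own vulnerabilities are not counted as transitive ones.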