Sample package#
This package holds the data objects of primary interest (Common Criteria and FIPS certificates) and auxiliary objects such as CPE and CVE. The objects mostly hold data and support serialization, but can also perform some basic transformations.
The examples related to this package can be found in the common criteria notebook and the fips notebook.
- class sec_certs.sample.CCCertificate(status, category, name, manufacturer, scheme, security_level, not_valid_before, not_valid_after, report_link, st_link, cert_link, manufacturer_web, protection_profile_links, maintenance_updates, state, pdf_data, heuristics)#
Data structure for a Common Criteria certificate. Contains several inner classes that layer the data logic. Can be serialized to and from json (ComplexSerializableType) or pandas (PandasSerializableType). It is the basic element of CCDataset. The functionality mostly covers holding data and the transformations that the certificate can handle itself. The CCDataset class then orchestrates this functionality.
- class Heuristics(extracted_versions=None, cpe_matches=None, verified_cpe_matches=None, related_cves=None, cert_lab=None, cert_id=None, prev_certificates=None, next_certificates=None, st_references=<factory>, report_references=<factory>, annotated_references=None, extracted_sars=None, direct_transitive_cves=None, indirect_transitive_cves=None, scheme_data=None, protection_profiles=None, eal=None)#
Class for various heuristics related to CCCertificate
- class InternalState(report=<factory>, st=<factory>, cert=<factory>)#
Holds the internal state of the certificate, i.e., whether the downloads and conversions of individual components succeeded. Also holds information about errors and paths to the files.
- class MaintenanceReport(maintenance_date, maintenance_title, maintenance_report_link, maintenance_st_link)#
Object for holding maintenance reports.
- class PdfData(report_metadata=None, st_metadata=None, cert_metadata=None, report_frontpage=None, st_frontpage=None, cert_frontpage=None, report_keywords=None, st_keywords=None, cert_keywords=None, report_filename=None, st_filename=None, cert_filename=None)#
Class that holds data extracted from pdf files.
- property cert_lab#
Returns labs for which certificate data was parsed.
- filename_cert_id(scheme)#
Get cert_id candidates from the matches in the report filename and cert filename.
- frontpage_cert_id(scheme)#
Get cert_id candidate from the frontpage of the report.
- keywords_cert_id(scheme)#
Get cert_id candidates from the keywords matches in the report and cert.
- metadata_cert_id(scheme)#
Get cert_id candidates from the report metadata.
- property actual_sars#
Computes actual SARs. First, SARs implied by EAL are computed. Then, these are augmented with heuristically extracted SARs.
- Return Optional[Set[SAR]]:
Set of actual SARs of a certificate, None if empty
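The two-step computation described above (SARs implied by the EAL, then augmented with extracted SARs) can be sketched as follows. This is a minimal illustration, not the library's implementation: the `SAR` dataclass, the `EAL_IMPLIED` table (deliberately incomplete), and the rule that the highest level wins per family are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class SAR:
    """Hypothetical stand-in: a SAR family and its level, e.g. ALC_FLR.2."""
    family: str
    level: int


# Illustrative and incomplete table of SARs implied by an EAL (assumption).
EAL_IMPLIED = {
    "EAL1": {SAR("ADV_FSP", 1), SAR("AVA_VAN", 1)},
    "EAL2": {SAR("ADV_FSP", 2), SAR("AVA_VAN", 2), SAR("ALC_DEL", 1)},
}


def actual_sars(eal: Optional[str], extracted: Optional[Set[SAR]]) -> Optional[Set[SAR]]:
    """Merge EAL-implied SARs with extracted ones; None if nothing is known."""
    by_family = {}
    for sar in EAL_IMPLIED.get(eal or "", set()) | (extracted or set()):
        current = by_family.get(sar.family)
        # Assumed augmentation rule: keep the highest level seen per family.
        if current is None or sar.level > current.level:
            by_family[sar.family] = sar
    return set(by_family.values()) or None


result = actual_sars("EAL2", {SAR("ALC_FLR", 2), SAR("AVA_VAN", 3)})
```

Returning `None` for an empty result mirrors the documented return contract above.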
- compute_heuristics_cert_id()#
Compute the heuristics cert_id of this cert, using several methods.
The candidate cert_ids are extracted from the frontpage, PDF metadata, filename, and keywords matches.
Finally, the cert_id is canonicalized.
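A rough sketch of this flow, under stated assumptions: each source (frontpage, metadata, filename, keywords) yields a set of candidate strings, the candidate named by the most sources wins, and the winner is canonicalized last. The voting rule, the tie-break, and the `canonicalize` normalization are hypothetical; the documentation above only states which sources are used and that canonicalization comes at the end. The IDs are made up.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional


def canonicalize(cert_id: str) -> str:
    # Hypothetical normalization: trim, upper-case, join whitespace runs with dashes.
    return "-".join(cert_id.strip().upper().split())


def pick_cert_id(candidate_sets: Iterable[set[str]]) -> Optional[str]:
    """Pick the candidate proposed by the most sources, then canonicalize it."""
    votes = Counter()
    for candidates in candidate_sets:
        votes.update(candidates)  # one vote per source per candidate
    if not votes:
        return None
    # Deterministic tie-break: higher count first, then lexicographic order.
    best, _ = max(votes.items(), key=lambda item: (item[1], item[0]))
    return canonicalize(best)


sources = [
    {"bsi-dsz-cc-1234-2020"},                          # frontpage
    {"bsi-dsz-cc-1234-2020"},                          # PDF metadata
    {"1234"},                                          # filename
    {"bsi-dsz-cc-1234-2020", "bsi-dsz-cc-0999-2019"},  # keywords
]
cert_id = pick_cert_id(sources)
```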
- compute_heuristics_cert_lab()#
Fills in the heuristically obtained evaluation laboratory into attribute in heuristics class.
- compute_heuristics_cert_versions(cert_ids)#
Fills in the previous and next certificate versions based on the cert ID.
- compute_heuristics_version()#
Fills in the heuristically obtained version of certified product into attribute in heuristics class.
- static convert_cert_pdf(cert)#
Converts the pdf certificate to txt, given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to convert the certificate for
- Return CCCertificate:
the modified certificate with updated state
- static convert_report_pdf(cert)#
Converts the pdf certification report to txt, given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to convert the pdf report for
- Return CCCertificate:
the modified certificate with updated state
- static convert_st_pdf(cert)#
Converts the pdf security target to txt, given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to convert the pdf security target for
- Return CCCertificate:
the modified certificate with updated state
- property dgst#
Computes the primary key of the sample using the first 16 bytes of the SHA-256 digest.
- static download_pdf_cert(cert)#
Downloads pdf of the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to download the pdf of
- Return CCCertificate:
returns the modified certificate with updated state
- static download_pdf_report(cert)#
Downloads pdf of certification report given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to download the pdf report for
- Return CCCertificate:
returns the modified certificate with updated state
- static download_pdf_st(cert)#
Downloads pdf of security target given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to download the pdf security target for
- Return CCCertificate:
returns the modified certificate with updated state
- static extract_cert_pdf_keywords(cert)#
Matches regular expressions in the txt obtained from the certificate and extracts the matches into an attribute. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – certificate to extract the keywords for.
- Return CCCertificate:
the modified certificate with extracted keywords.
- static extract_cert_pdf_metadata(cert)#
Extracts metadata from certificate pdf given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to extract the metadata for.
- Return CCCertificate:
the modified certificate with updated state
- static extract_report_pdf_frontpage(cert)#
Extracts data from certification report pdf frontpage given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to extract the frontpage data for.
- Return CCCertificate:
the modified certificate with updated state
- static extract_report_pdf_keywords(cert)#
Matches regular expressions in the txt obtained from the certification report and extracts the matches into an attribute. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – certificate to extract the keywords for.
- Return CCCertificate:
the modified certificate with extracted keywords.
- static extract_report_pdf_metadata(cert)#
Extracts metadata from certification report pdf given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to extract the metadata for.
- Return CCCertificate:
the modified certificate with updated state
- static extract_st_pdf_keywords(cert)#
Matches regular expressions in the txt obtained from the security target and extracts the matches into an attribute. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – certificate to extract the keywords for.
- Return CCCertificate:
the modified certificate with extracted keywords.
- static extract_st_pdf_metadata(cert)#
Extracts metadata from security target pdf given the certificate. Staticmethod to allow for parallelization.
- Parameters:
cert (CCCertificate) – cert to extract the metadata for.
- Return CCCertificate:
the modified certificate with updated state
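The download, convert, and extract steps above share one shape: a static method that takes a certificate and returns the modified certificate. The sketch below shows why that shape suits parallel execution. `FakeCert` is a hypothetical stand-in (it performs no download), and `process_all` is an illustrative helper, not part of the package.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeCert:
    """Hypothetical stand-in for a certificate with an internal state."""
    name: str
    state: dict = field(default_factory=dict)

    @staticmethod
    def download_pdf_report(cert: "FakeCert") -> "FakeCert":
        # A real step would fetch a file; here we only record success.
        cert.state["report_download_ok"] = True
        return cert


def process_all(certs: List[FakeCert], step: Callable[[FakeCert], FakeCert],
                max_workers: int = 4) -> List[FakeCert]:
    """Apply one step to every certificate and collect the returned objects."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `certs`.
        return list(pool.map(step, certs))


certs = process_all([FakeCert("a"), FakeCert("b")], FakeCert.download_pdf_report)
```

Collecting the returned objects, rather than relying on in-place mutation, matters when a process pool is used instead of threads: changes made inside a worker process are not visible in the parent, so the updated certificate has to travel back as the return value.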
- classmethod from_dict(dct)#
Deserializes dictionary into CCCertificate
- classmethod from_html_row(row, status, category)#
Creates a CC sample from html row of webpage.
- merge(other, other_source=None)#
Merges this sample with another CC sample, assuming the two come from different sources, e.g., csv and html. The html source is assumed to have better protection profiles, so they overwrite the csv info. For the other values, sanity checks are made.
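One possible reading of this merge policy is sketched below. `Sample` is a hypothetical three-field stand-in, the value `"html"` for `other_source` is an assumption, the URL is a placeholder, and the "sanity check" is reduced to a logged warning on conflicting values.

```python
import logging
from dataclasses import dataclass, fields
from typing import Optional, Set

logger = logging.getLogger(__name__)


@dataclass
class Sample:
    """Hypothetical stand-in with a few of the certificate's fields."""
    name: str
    report_link: Optional[str] = None
    protection_profile_links: Optional[Set[str]] = None

    def merge(self, other: "Sample", other_source: Optional[str] = None) -> None:
        for f in fields(self):
            mine, theirs = getattr(self, f.name), getattr(other, f.name)
            if f.name == "protection_profile_links" and other_source == "html":
                # Assumed rule from the text: html protection profiles win.
                setattr(self, f.name, theirs)
            elif mine is None:
                # Fill gaps from the other source.
                setattr(self, f.name, theirs)
            elif theirs is not None and mine != theirs:
                # Stand-in sanity check: report the conflict, keep our value.
                logger.warning("Conflict in %s: %r vs %r", f.name, mine, theirs)


csv_cert = Sample("Product X", protection_profile_links={"pp-from-csv"})
html_cert = Sample("Product X", report_link="https://example.com/report.pdf",
                   protection_profile_links={"pp-from-html"})
csv_cert.merge(html_cert, other_source="html")
```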
- property pandas_tuple#
Returns tuple of attributes meant for pandas serialization
- set_local_paths(report_pdf_dir, st_pdf_dir, cert_pdf_dir, report_txt_dir, st_txt_dir, cert_txt_dir)#
Sets paths to files given the requested directories
- Parameters:
report_pdf_dir (Optional[Union[str, Path]]) – Directory where pdf reports shall be stored
st_pdf_dir (Optional[Union[str, Path]]) – Directory where pdf security targets shall be stored
cert_pdf_dir (Optional[Union[str, Path]]) – Directory where pdf certificates shall be stored
report_txt_dir (Optional[Union[str, Path]]) – Directory where txt reports shall be stored
st_txt_dir (Optional[Union[str, Path]]) – Directory where txt security targets shall be stored
cert_txt_dir (Optional[Union[str, Path]]) – Directory where txt certificates shall be stored
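As an illustration of how such paths might be derived, the sketch below builds one file path per directory. Naming the files after the certificate digest is an assumption made for this example, and `local_path` is a hypothetical helper. The documentation above only states that paths are set from the given directories.

```python
from pathlib import Path
from typing import Optional, Union


def local_path(directory: Optional[Union[str, Path]], dgst: str,
               suffix: str) -> Optional[Path]:
    """Return <directory>/<dgst><suffix>, or None when no directory is given."""
    if directory is None:
        return None
    return Path(directory) / f"{dgst}{suffix}"


report_pdf = local_path("reports/pdf", "0123456789abcdef", ".pdf")
report_txt = local_path(Path("reports/txt"), "0123456789abcdef", ".txt")
st_pdf = local_path(None, "0123456789abcdef", ".pdf")
```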
- class sec_certs.sample.ProtectionProfile(web_data, pdf_data=None, heuristics=None, state=None)#
- class Heuristics#
- class InternalState(pp=<factory>, report=<factory>)#
Class to hold internal state for each of the documents.
- class PdfData(report_metadata=None, pp_metadata=None, report_keywords=None, pp_keywords=None, report_filename=None, pp_filename=None)#
Class to hold data related to PDF and txt files related to protection profiles.
- class WebData(category, status, is_collaborative, name, version, security_level, not_valid_before, not_valid_after, report_link, pp_link, scheme, maintenances)#
Class to hold metadata about protection profiles found on
- classmethod from_html_row(row, status, category, is_collaborative)#
Given bs4 tag of html row (fetched from cc portal), will build the object.
- static convert_pp_pdf(cert)#
Converts the actual protection profile from pdf to txt.
- static convert_report_pdf(cert)#
Converts certification reports from pdf to txt.
- property dgst#
Digest of the protection profile, formed as the first 16 bytes of the category|name|version fields from the WebData object.
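A short digest of this kind can be sketched with `hashlib`. Two assumptions are made here and are not confirmed by the text above: the joined `category|name|version` string is hashed with SHA-256 (as the CCCertificate digest is), and the truncation is applied to the hexadecimal form, giving 16 characters. The field values are invented for illustration.

```python
import hashlib


def short_digest(category: str, name: str, version: str) -> str:
    """Hash the joined fields and keep a 16-character hex prefix (assumption)."""
    key = "|".join((category, name, version))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


d1 = short_digest("Smart cards", "Example PP", "1.0")
d2 = short_digest("Smart cards", "Example PP", "1.1")
```

Because the digest depends only on the three fields, the same protection profile receives the same key on every run, which is what makes it usable as a primary key.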
- static download_pdf_pp(cert)#
Downloads actual pdf of the given protection profile.
- static download_pdf_report(cert)#
Downloads pdf of certification report for the given protection profile.
- static extract_pp_pdf_keywords(cert)#
Extracts keywords using regexes from the actual protection profile.
- static extract_pp_pdf_metadata(cert)#
Extracts various pdf metadata from the actual protection profile.
- static extract_report_pdf_keywords(cert)#
Extracts keywords using regexes from the certification report.
- static extract_report_pdf_metadata(cert)#
Extracts various pdf metadata from the certification report.
- classmethod from_html_row(row, status, category, is_collaborative)#
Builds a ProtectionProfile object from html row obtained from cc portal html source.
- set_local_paths(report_pdf_dir, pp_pdf_dir, report_txt_dir, pp_txt_dir)#
Adjusts local paths for various files.
- class sec_certs.sample.FIPSCertificate(cert_id, web_data=None, pdf_data=None, heuristics=None, state=None)#
Data structure for a FIPS 140 certificate. Contains several inner classes that layer the data logic. Can be serialized to and from json (ComplexSerializableType). It is the basic element of FIPSDataset. The functionality mostly covers holding data and the transformations that the certificate can handle itself. The FIPSDataset class then orchestrates this functionality.
- class Heuristics(algorithms=<factory>, extracted_versions=<factory>, cpe_matches=None, verified_cpe_matches=None, related_cves=None, policy_prunned_references=<factory>, module_prunned_references=<factory>, policy_processed_references=<factory>, module_processed_references=<factory>, direct_transitive_cves=None, indirect_transitive_cves=None)#
Data structure that holds data obtained by processing the certificate and applying various heuristics.
- property algorithm_numbers#
Returns numbers of algorithms
- class InternalState(module_download_ok=False, policy_download_ok=False, policy_convert_garbage=False, policy_convert_ok=False, module_extract_ok=False, policy_extract_ok=False, policy_pdf_hash=None, policy_txt_hash=None)#
Holds state of the FIPSCertificate
- class PdfData(keywords=<factory>, policy_metadata=<factory>)#
Data structure that holds data obtained from scanning pdf files (or their converted txt documents).
- property certlike_algorithm_numbers#
Returns numbers of certificates from keywords["fips_certlike"]["Certlike"]
- class ValidationHistoryEntry(date: 'date', validation_type: "Literal['initial', 'update']", lab: 'str')#
- class WebData(module_name=None, validation_history=None, vendor_url=None, vendor=None, certificate_pdf_url=None, module_type=None, standard=None, status=None, level=None, caveat=None, exceptions=None, embodiment=None, description=None, tested_conf=None, hw_versions=None, fw_versions=None, sw_versions=None, mentioned_certs=None, historical_reason=None, date_sunset=None, revoked_reason=None, revoked_link=None)#
Data structure for data obtained from scanning certificate webpage at
- compute_heuristics_version()#
Heuristically computes the version of the product.
- static convert_policy_pdf(cert)#
Converts the policy pdf to txt.
- property dgst#
Returns primary key of the certificate, its id.
- static extract_policy_pdf_keywords(cert)#
Extracts keywords from the policy document.
- static extract_policy_pdf_metadata(cert)#
Extract the PDF metadata from the security policy.
- static get_algorithms_from_policy_tables(cert)#
Retrieves IDs of algorithms from tables inside security policy pdfs. External library is used to handle this.
- prune_referenced_cert_ids()#
This method goes through all IDs (numbers) that correspond to FIPS certificates and are stored in pdf_data.keywords or web_data.mentioned_certs. It prunes these IDs and fills the attributes heuristics.module_prunned_references and heuristics.policy_prunned_references. These attributes are then processed further and Reference objects are created from them.
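The pruning criteria are not spelled out above. As a sketch under loudly stated assumptions, the example below drops the certificate's own ID and any candidate that coincides with a known algorithm number, on the guess that such numbers are easily mistaken for certificate IDs. `prune_ids` and all values are hypothetical.

```python
from __future__ import annotations


def prune_ids(candidates: set[str], own_id: str,
              algorithm_numbers: set[str]) -> set[str]:
    """Keep candidate IDs that are neither self-references nor algorithm numbers."""
    # Both exclusion rules are assumptions made for this illustration.
    return {c for c in candidates if c != own_id and c not in algorithm_numbers}


policy_candidates = {"1234", "2345", "3456", "4567"}
pruned = prune_ids(policy_candidates, own_id="1234", algorithm_numbers={"3456"})
```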