# MAGIC Correlations and Categories

The Cythereal MAGIC Report provides correlations between malware binaries and a categorization of the type of malware. The correlations are based on similarity both between the binaries themselves and between their unpacked versions. These relationships are described in MAGIC Correlations. The categorizations are described in MAGIC Categories.

## API Endpoints

Full reference documentation: https://api.magic.cythereal.com/docs

GET /magic/{api_key}/{file_hash}

Retrieve the MAGIC correlations for a binary.

Parameters:

• api_key (string) – API key for accessing the service.
• file_hash (string) – A cryptographic hash of a file. Supported hashes: SHA1.

Query Parameters:

• details (boolean) – If false, omits the match details. Defaults to true.
• threshold (number) – Only similarity matches with a value at or above the threshold are considered. MUST be in the range [0, 1]. Defaults to 0.7.

Status Codes:

• 200 OK – Request received.

GET /categories/{api_key}/{file_hash}

Retrieve the MAGIC categories for a binary. **Alpha Level Feature**

Parameters:

• api_key (string) – API key for accessing the service.
• file_hash (string) – A cryptographic hash of a file. Supported hashes: SHA1.

Status Codes:

• 200 OK – Request received.

PUT /categories/{api_key}/{file_hash}

Save category for a binary. **Alpha Level Feature**

Parameters:

• api_key (string) – API key for accessing the service.
• file_hash (string) – A cryptographic hash of a file. Supported hashes: SHA1.

Status Codes:

• 200 OK – Request received.
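
The URL layout of the correlations endpoint can also be assembled programmatically. The following is a minimal Python sketch and not part of the MAGIC client: the helper name `magic_url` and the `BASE_URL` constant are illustrative (the host is the one used in the curl examples in this document), and sending the `details` boolean as lowercase `true`/`false` is an assumption about how the service parses it.

```python
from urllib.parse import quote, urlencode

# Host taken from the curl examples in this document.
BASE_URL = "https://api.magic.cythereal.com"


def magic_url(api_key, file_hash, details=None, threshold=None):
    """Build the URL for GET /magic/{api_key}/{file_hash} (hypothetical helper)."""
    query = {}
    if details is not None:
        # Assumption: the boolean is sent as lowercase "true"/"false".
        query["details"] = "true" if details else "false"
    if threshold is not None:
        # The endpoint requires the threshold to lie in [0, 1].
        if not 0 <= threshold <= 1:
            raise ValueError("threshold must be in the range [0, 1]")
        query["threshold"] = threshold
    url = f"{BASE_URL}/magic/{quote(api_key, safe='')}/{quote(file_hash, safe='')}"
    # Omitted parameters fall back to the server defaults.
    return f"{url}?{urlencode(query)}" if query else url
```

Validating the threshold on the client side avoids a round trip for a request that the documented range already rules out.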

## CLI Commands

Retrieving a MAGIC report is not currently supported by the MAGIC client. Instead, normal utilities such as wget or curl can be used to access the API Endpoints.

Example:

# Retrieve the correlations report
curl 'https://api.magic.cythereal.com/magic/(api_key)/(sha1)' > magic_correlation.json

# Retrieve the correlations report with additional details
curl 'https://api.magic.cythereal.com/magic/(api_key)/(sha1)?details=yes' > magic_correlation_with_details.json

# Retrieve the categories report
curl 'https://api.magic.cythereal.com/categories/(api_key)/(sha1)' > magic_categories.json

# Save a category for a binary
curl -X PUT 'https://api.magic.cythereal.com/categories/(api_key)/(sha1)' -d 'category=(category)'
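
For scripts that prefer Python over curl, the PUT call above might be prepared as in the following sketch. It uses only the standard library; the function name `build_category_request` is hypothetical, and the form-encoded `category=...` body simply mirrors the `-d` argument of the curl command rather than any separately documented request format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_category_request(api_key, sha1, category):
    """Prepare (but do not send) the PUT /categories request (hypothetical helper)."""
    url = f"https://api.magic.cythereal.com/categories/{api_key}/{sha1}"
    # Mirrors `curl -d 'category=...'`: a form-encoded request body.
    body = urlencode({"category": category}).encode("ascii")
    return Request(url, data=body, method="PUT")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Keeping construction separate from sending makes the request easy to inspect before it leaves the machine.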


## MAGIC Correlations

An example MAGIC Correlations Report is given below. The fields are summarized in Fields Summary with pointers to detailed descriptions given as needed. The MAGIC correlations are in the answer field. The remaining top-level fields are the fields from the API Response Protocol.

{
  "status": "success",
  "statuscode": 0,
  "vb_version": "VB-0.4",
  "message": "MAGIC report found",
  "hash": "2d9035b27ce5c1cce2fb8432c76e315a1a2a8de0",
  "answer": {
    "version": "1.0.0",
    "query": {
      "sha1": "2d9035b27ce5c1cce2fb8432c76e315a1a2a8de0"
    },
    "summary": {
      "similar_packer_similar_payload": [
        "01ab20c6bb66f5e393f53dd7f54bd2d829745a6e"
      ],
      "similar_packer_different_payload": [
        "008019b0ebf88cbcc4feab0347e3e823fdb1a372",
        "fcf5100bc222fc21b7326bb9a99d1aeac83a0f3e"
      ],
      "different_packer_similar_payload": [
        "67120136e48d1307470e22c3a26d3a94bb843d36"
      ],
      "weak_similar": [
      ]
    },
    "details": [
      {
        "match_type": "similar_packer_similar_payload",
        "sha1": "01ab20c6bb66f5e393f53dd7f54bd2d829745a6e",
        "match_subtypes": [
          {
            "match_subtype": "matches_payload",
            "similarity": 1
          },
          {
            "match_subtype": "matches_original",
            "similarity": 1
          }
        ],
        "max_similarity": 1
      },
      {
        "match_type": "similar_packer_different_payload",
        "sha1": "008019b0ebf88cbcc4feab0347e3e823fdb1a372",
        "match_subtypes": [
          {
            "match_subtype": "matches_original",
            "similarity": 0.8868
          }
        ],
        "max_similarity": 0.8868
      },
      {
        "match_type": "similar_packer_different_payload",
        "sha1": "fcf5100bc222fc21b7326bb9a99d1aeac83a0f3e",
        "match_subtypes": [
          {
            "match_subtype": "matches_original",
            "similarity": 0.8868
          }
        ],
        "max_similarity": 0.8868
      },
      {
        "match_type": "different_packer_similar_payload",
        "sha1": "67120136e48d1307470e22c3a26d3a94bb843d36",
        "match_subtypes": [
          {
            "match_subtype": "matches_payload",
            "similarity": 1
          }
        ],
        "max_similarity": 1
      },
      {
        "match_type": "weak_similar",
        "match_subtypes": [
          {
            "match_subtype": "matches_weak_down",
            "similarity": 0.8877
          },
          {
            "match_subtype": "matches_original",
            "similarity": 0.8877
          }
        ],
        "max_similarity": 0.8877
      }
    ]
  }
}


### Fields Summary

The query field records information about the binary this report is for. Currently, it contains only the query.sha1 field, which records the SHA1 of the binary the MAGIC report is for.

The version field records the MAGIC report version. This is distinct from the API version.

The details field records details about the binaries matched with the query binary. This field is further described in Match Details.

The summary field contains the SHA1 values of the binaries matched with the query binary, i.e., the SHA1 values from details, grouped by their MAGIC Match Classification. The classification can be similar_packer_similar_payload, similar_packer_different_payload, different_packer_similar_payload, or weak_similar. These classifications are described in Match Classifications.
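
Because the summary is described as the SHA1 values from details grouped by classification, it can in principle be rebuilt from the details list. The sketch below illustrates that relationship in Python; the function name `summarize` is hypothetical, and the code assumes every details entry carries both a `match_type` and a `sha1` field.

```python
from collections import defaultdict


def summarize(details):
    """Group matched SHA1 values by their match_type classification."""
    summary = defaultdict(list)
    for match in details:
        # Each classification maps to the list of SHA1s that fall under it.
        summary[match["match_type"]].append(match["sha1"])
    return dict(summary)
```

Unlike the report's own summary, this sketch only returns keys for classifications that actually occur in the input.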

### Match Classifications

There are four types of matches between binaries that Cythereal MAGIC reports. To understand the difference between the four types, it helps to know how Cythereal MAGIC compares malware binaries.

Cythereal MAGIC compares malware binaries along two dimensions: a) comparing the original (i.e., packed) malware binary and b) comparing the payload resulting from unpacking the malware.

These two dimensions of similarity give four possibilities:

• Similar Packer Similar Payload: Two malware binaries have “strong” similarity if their original (i.e., packed) binaries are similar AND their payloads (i.e., unpacked) binaries are also similar. This is the strongest form of similarity.
• Similar Packer Different Payload: When the original/packed binaries are similar, but the payloads are not.
• Different Packer Similar Payload: When the original/packed binaries are not similar, but the payloads are similar.
• Weak Similar: When a packed form of a malware is similar to the payload of another malware.

The Weak Similar relation is named as such because we do not expect similarities between packed and unpacked binaries. Packing a binary generally results in a packed binary that shares little resemblance with the original, unpacked binary. Since we expect malware to be packed and that the unpacked binary will be dissimilar from the packed version, we separate out cases in which this expectation is violated. This can happen for a number of reasons such as a malware not being packed (thus resulting in an unpacked version identical to the packed version), malware defeating the unpackers (again resulting in identical packed and unpacked versions), or an unpacked file being misidentified as packed.
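
The two comparison dimensions can be read as a small decision table. The following Python sketch is illustrative only: the function `classify` is hypothetical, it takes the outcome of the two comparisons as booleans, and it deliberately leaves out Weak Similar, which compares a packed binary against the payload of another binary and therefore does not fit this two-way grid.

```python
def classify(packed_similar, payload_similar):
    """Map the two similarity dimensions to a match classification name.

    Returns None when neither dimension matches. Weak Similar is not
    covered here because it crosses the packed and unpacked forms.
    """
    if packed_similar and payload_similar:
        return "similar_packer_similar_payload"
    if packed_similar:
        return "similar_packer_different_payload"
    if payload_similar:
        return "different_packer_similar_payload"
    return None
```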

### Match Details

The details of each match reported in the summary field of the MAGIC correlations report are present in the details field. This field is included by default, but can be omitted by setting the details parameter to no in the MAGIC correlations API call.

The details field from the example in MAGIC Correlations is given below.

{
  "details": [
    {
      "match_type": "similar_packer_similar_payload",
      "sha1": "01ab20c6bb66f5e393f53dd7f54bd2d829745a6e",
      "match_subtypes": [
        {
          "match_subtype": "matches_payload",
          "similarity": 1
        },
        {
          "match_subtype": "matches_original",
          "similarity": 1
        }
      ],
      "max_similarity": 1
    },
    {
      "match_type": "similar_packer_different_payload",
      "sha1": "008019b0ebf88cbcc4feab0347e3e823fdb1a372",
      "match_subtypes": [
        {
          "match_subtype": "matches_original",
          "similarity": 0.8868
        }
      ],
      "max_similarity": 0.8868
    },
    {
      "match_type": "similar_packer_different_payload",
      "sha1": "fcf5100bc222fc21b7326bb9a99d1aeac83a0f3e",
      "match_subtypes": [
        {
          "match_subtype": "matches_original",
          "similarity": 0.8868
        }
      ],
      "max_similarity": 0.8868
    },
    {
      "match_type": "different_packer_similar_payload",
      "sha1": "67120136e48d1307470e22c3a26d3a94bb843d36",
      "match_subtypes": [
        {
          "match_subtype": "matches_payload",
          "similarity": 1
        }
      ],
      "max_similarity": 1
    },
    {
      "match_type": "weak_similar",
      "match_subtypes": [
        {
          "match_subtype": "matches_weak_down",
          "similarity": 0.8877
        },
        {
          "match_subtype": "matches_original",
          "similarity": 0.8877
        }
      ],
      "max_similarity": 0.8877
    }
  ]
}


The details field contains a list with a Match Details object for every binary that matched the query binary. Each Match Details object contains four fields: sha1, match_type, match_subtypes, and max_similarity. The sha1 field contains the SHA1 of the matched binary, and the match_type field contains the MAGIC Match Classification for the match. The match_subtypes and max_similarity fields are described below.

The match_subtypes field contains a list of objects with the fields match_subtype and similarity. As previously mentioned, MAGIC uses similarities between both the original (packed) binaries and the unpacked versions of these binaries (the payloads). The match_subtypes list contains an entry for every similarity relationship between these original and unpacked binaries. The match_subtype field indicates the type of relationship (see below for the various types) and the similarity field gives the actual similarity score. The score is a value between 0 and 1, with larger values indicating greater similarity.

The max_similarity field lists the maximum similarity value from the list in match_subtypes.
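
As a quick consistency check, max_similarity can be recomputed from the match_subtypes list. The Python sketch below shows this under the assumption that each Match Details object has a non-empty `match_subtypes` list; the helper name `max_similarity` is illustrative.

```python
def max_similarity(match):
    """Return the largest similarity score listed in match_subtypes."""
    return max(subtype["similarity"] for subtype in match["match_subtypes"])
```

For the weak_similar entry in the example, both subtypes score 0.8877, so the recomputed value agrees with the reported max_similarity.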

The figure below illustrates the possible match subtypes that can be present in match_subtypes. In this figure A and B are malware and a and b are their respective unpacked versions. The malware A is the query binary.

The following match subtypes are possible. Note that every relationship is reported as one between A and B, even when it was found between their unpacked versions. For example, a similar to b results in the relationship A matches_payload B.

A similar_to B => matches_original
a similar_to b => matches_payload
A similar_to b => matches_weak_down
a similar_to B => matches_weak_up
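
The naming scheme can be expressed as a lookup keyed on which form of each binary took part in the comparison. This is a minimal Python sketch with a hypothetical function name; it covers the relationships between A and B only and does not model the matches_self case.

```python
def match_subtype(query_packed, match_packed):
    """Name the subtype for a similarity between one form of A and one form of B.

    query_packed: True for A (original), False for a (unpacked payload).
    match_packed: True for B (original), False for b (unpacked payload).
    """
    table = {
        (True, True): "matches_original",    # A similar_to B
        (False, False): "matches_payload",   # a similar_to b
        (True, False): "matches_weak_down",  # A similar_to b
        (False, True): "matches_weak_up",    # a similar_to B
    }
    return table[(query_packed, match_packed)]
```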


In addition, if an original binary has been unpacked to two similar but distinct unpacked versions, it can result in a matches_self class.

Similarity between two binaries, whether packed or unpacked, is determined by the percentage of their MAGIC Genome that is shared. The MAGIC Genome is described in MAGIC Genomic Features.

## MAGIC Categories

Warning

MAGIC Report Categories is currently an alpha level feature and under active development. It may change without notice in backwards incompatible ways. It is not recommended for production environments.

The MAGIC Categories report assigns to the query binary a number of categories and associated category scores. A category describes the type of malware the binary is, such as ransomware, rogue antivirus, etc. The category score for a given category provides a “confidence level” for that category: the higher the score, the higher the confidence that the malware belongs to that category.

An example MAGIC Categories report is given below:

{
  "ground_truth": {
    "total_categories": 3,
    "categories": [
      {"name": "cryptolocker", "score": 15},
      {"name": "ransomware", "score": 13}
    ]
  },
  "categorization_result": {
    "total_categories": 5,
    "categories": [
      {"name": "cryptolocker", "score": 24},
      {"name": "ransomware", "score": 11},
      {"name": "zeus", "score": 15}
    ]
  }
}


The report consists of a single dictionary with two keys: ground_truth and categorization_result. The ground_truth entry contains categories directly assigned to the file, while categorization_result contains categories assigned to the file on the basis of its MAGIC Correlations. If a binary is not assigned to any categories, this dictionary will be empty. Each element of the categories list is a dictionary containing the keys name and score. The name key provides the category name, and the score key provides the category score for that category. The higher the category score, the higher the confidence that the malware belongs to that category. In the example, the score for the ransomware category in categorization_result is 11.
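
A consumer of this report will often want just the most likely category. The following Python sketch picks the highest-scoring entry; the function name `top_category` is hypothetical, and because the feature is alpha level the field names it relies on may change.

```python
def top_category(report, section="categorization_result"):
    """Return (name, score) of the highest-scoring category, or None if absent."""
    # An uncategorized binary yields an empty dictionary, so default defensively.
    categories = report.get(section, {}).get("categories", [])
    if not categories:
        return None
    best = max(categories, key=lambda c: c["score"])
    return best["name"], best["score"]
```

Passing `section="ground_truth"` selects from the directly assigned categories instead of those derived from correlations.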