*Result*: MalVis: Large-Scale Bytecode Visualization Framework for Explainable Android Malware Detection

Title:
MalVis: Large-Scale Bytecode Visualization Framework for Explainable Android Malware Detection
Source:
Journal of Cybersecurity and Privacy ; Volume 5 ; Issue 4 ; Pages: 109
Publisher Information:
Multidisciplinary Digital Publishing Institute
Publication Year:
2025
Collection:
MDPI Open Access Publishing
Document Type:
*Academic Journal* text
File Description:
application/pdf
Language:
English
Relation:
Security Engineering & Applications; https://dx.doi.org/10.3390/jcp5040109
DOI:
10.3390/jcp5040109
Accession Number:
edsbas.BA0B8038
Database:
BASE

*Further Information*

*As technology advances, developers continually create innovative solutions to enhance smartphone security. However, the rapid spread of Android malware poses significant threats to devices and sensitive data. The open-source nature of the Android Operating System (OS) and the availability of its Software Development Kit (SDK) are major contributors to this alarming growth. Conventional malware detection methods, such as signature-based, static, and dynamic analysis, struggle to detect malware that uses obfuscation techniques, including encryption, packing, and compression. Although several deep learning (DL) visualization techniques have been developed for malware detection, they often fail to accurately identify the critical malicious features of malware. This research introduces MalVis, a unified visualization framework that integrates entropy and N-gram analysis to emphasize meaningful structural and anomalous operational patterns within the malware bytecode. By addressing significant limitations of existing visualization methods, such as insufficient feature representation, limited interpretability, small dataset sizes, and restricted data access, MalVis delivers enhanced detection capabilities, particularly for obfuscated and previously unseen (zero-day) malware. The framework leverages the MalVis dataset introduced in this work, a publicly available large-scale dataset comprising more than 1.3 million visual representations across nine malware classes and one benign class. A comprehensive comparative evaluation was performed against existing state-of-the-art visualization techniques using leading convolutional neural network (CNN) architectures: MobileNet-V2, DenseNet201, ResNet50, VGG16, and Inception-V3. To further boost classification performance and mitigate overfitting, the outputs of these models were combined using eight distinct ensemble strategies.
To address the issue of imbalanced class distribution in the multiclass dataset, we employed an undersampling technique to ensure balanced learning across all ...*
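
The abstract describes combining entropy and N-gram analysis to turn bytecode into an image. The following is a minimal illustrative sketch of that general idea, not the MalVis implementation: the function names, the 16-byte window, the choice of bigrams, and the assignment of features to the R, G, and B channels are all assumptions made here for illustration.

```python
import math
from collections import Counter


def window_entropy(window: bytes) -> float:
    """Shannon entropy of a byte window, in bits per byte (0.0 to 8.0)."""
    if not window:
        return 0.0
    total = len(window)
    counts = Counter(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bytes_to_pixels(data: bytes, window: int = 16) -> list[tuple[int, int, int]]:
    """Map each byte to one (R, G, B) pixel (hypothetical channel layout).

    R: the raw byte value
    G: entropy of the window around the byte, scaled to 0..255
    B: frequency of the bigram starting at the byte, relative to the
       most common bigram in the input, scaled to 0..255
    """
    # Count every overlapping 2-byte sequence (bigram) in the input.
    bigrams = Counter(data[i:i + 2] for i in range(len(data) - 1))
    max_bigram = max(bigrams.values(), default=1)

    pixels = []
    for i, value in enumerate(data):
        start = max(0, i - window // 2)
        chunk = data[start:start + window]
        green = round(window_entropy(chunk) / 8.0 * 255)
        # The final byte starts no bigram, so its blue channel is 0.
        blue = round(bigrams.get(data[i:i + 2], 0) / max_bigram * 255)
        pixels.append((value, green, blue))
    return pixels
```

In a sketch like this, packed or encrypted regions tend to show high values in the entropy channel, while repetitive instruction sequences show high values in the bigram channel. The pixel list could then be reshaped into a fixed-width image before being passed to a CNN.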