This project implements a Network Intrusion Detection System using unsupervised machine learning techniques like Isolation Forests and Autoencoders to detect anomalous patterns in network traffic. These anomalies could signal security breaches, malware, or system malfunctions.
The goal is to identify potential threats or unusual behavior in network traffic without requiring labeled data, using the KDD Cup 1999 dataset for training and evaluation.
- Isolation Forest: Tree-based algorithm that isolates anomalies rather than profiling normal instances.
- Autoencoder (Keras): Deep neural network trained to reconstruct input; anomalies are identified based on high reconstruction error.
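
As a rough illustration of the Isolation Forest idea, the sketch below fits scikit-learn's `IsolationForest` on synthetic data. The data, the `contamination` value, and the variable names are assumptions made for this example and are not taken from the project code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for traffic features (not real KDD data):
# 500 "normal" points near the origin plus 10 injected outliers far away.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
outliers = rng.normal(loc=8.0, scale=0.5, size=(10, 4))
X = np.vstack([normal, outliers])

# contamination is the expected fraction of anomalies (assumed value).
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower score = more anomalous

n_flagged = int((labels == -1).sum())
print(f"flagged {n_flagged} of {len(X)} points")
print(f"mean score, normal rows:   {scores[:500].mean():.3f}")
print(f"mean score, injected rows: {scores[500:].mean():.3f}")
```

Points that can be separated from the rest with few random splits receive lower scores, which is why the injected rows should score lower on average than the bulk of the data.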
- KDD Cup 1999 Dataset (10% subset)
  - Contains simulated network connections labeled as either normal or one of several attack types.
- Preprocessing of categorical and numerical network features
- Feature normalization using `StandardScaler`
- Dual anomaly detection pipelines:
  - `IsolationForest` with contamination tuning
  - Deep `Autoencoder` using reconstruction error
- Export of labeled anomalies to CSV for analysis
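
The preprocessing and Isolation Forest steps listed above could be wired together roughly as follows. This is a minimal sketch, not the project's actual script: the tiny data frame is invented (it only borrows a few KDD Cup 1999 column names), and the `contamination` value and the choice of one-hot encoding for categorical features are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented rows using a handful of KDD Cup 1999 column names.
df = pd.DataFrame({
    "duration": [0, 0, 2, 0, 1, 0, 0, 5000],
    "protocol_type": ["tcp", "tcp", "udp", "icmp", "tcp", "udp", "tcp", "tcp"],
    "service": ["http", "http", "domain_u", "ecr_i",
                "smtp", "domain_u", "http", "private"],
    "src_bytes": [181, 239, 45, 1032, 1200, 44, 235, 900000],
    "dst_bytes": [5450, 486, 45, 0, 330, 80, 1337, 0],
})

categorical = ["protocol_type", "service"]
numerical = ["duration", "src_bytes", "dst_bytes"]

# One-hot encode categorical columns (assumed encoding) and standardize
# numerical ones; sparse_threshold=0 forces a dense output matrix.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numerical),
    ],
    sparse_threshold=0,
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    # contamination=0.125 is an assumed value; tune it for real data.
    ("detector", IsolationForest(contamination=0.125, random_state=0)),
])

# -1 = anomaly, 1 = normal
df["anomaly_isolation_forest"] = pipeline.fit_predict(df[categorical + numerical])

# Serialized to a string here; df.to_csv("anomaly_results_kdd.csv", index=False)
# would write the file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Keeping the scaler and encoder inside the pipeline means the same transformations are applied automatically whenever the fitted pipeline scores new rows.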
`anomaly_results_kdd.csv` contains:
- `anomaly_isolation_forest`: `-1` = anomaly, `1` = normal
- `anomaly_autoencoder`: `1` = anomaly, `0` = normal
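
Because the two columns use different label conventions, it can help to map them onto a common scale before comparing detectors. The sketch below does this with the standard library only; the inline rows are invented stand-ins for the real file, and `to_binary` is a hypothetical helper name.

```python
import csv
import io

# Invented stand-in for the contents of anomaly_results_kdd.csv.
sample = """anomaly_isolation_forest,anomaly_autoencoder
1,0
-1,1
1,1
-1,0
1,0
"""

def to_binary(if_label: int) -> int:
    """Map the Isolation Forest convention (-1/1) to the autoencoder one (1/0)."""
    return 1 if if_label == -1 else 0

rows = list(csv.DictReader(io.StringIO(sample)))

both = 0    # rows flagged by both detectors
either = 0  # rows flagged by at least one detector
for row in rows:
    iso = to_binary(int(row["anomaly_isolation_forest"]))
    ae = int(row["anomaly_autoencoder"])
    both += int(iso == 1 and ae == 1)
    either += int(iso == 1 or ae == 1)

print(f"flagged by both: {both}, flagged by either: {either}, total rows: {len(rows)}")
```

To run this against the real export, replace `io.StringIO(sample)` with an open file handle for `anomaly_results_kdd.csv`.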
- Real-time network traffic analysis
- Anomaly visualization using PCA/t-SNE
- Performance comparison with supervised classifiers (e.g., SVM, Random Forest)
- Containerization (Docker) for deployment
pip install pandas scikit-learn tensorflow keras