This repository contains comprehensive implementations of advanced deep learning concepts and models as part of the Neural Networks and Deep Learning course assignments. Each assignment demonstrates practical applications of cutting-edge deep learning techniques with detailed mathematical formulations, architectural designs, and performance evaluations.
The repository is organized into 7 main assignment folders with descriptive names, each containing:
- `code/`: Notebooks or subfolders with code implementations
- `description/`: Assignment specifications
- `paper/`: Research papers and references
- `report/`: Detailed analysis and results
- `README.md`: Comprehensive technical documentation

At the repository root:

- `NNDL_Slides/`: Course slides covering theoretical foundations and advanced topics
- `python_files/`: Standalone Python implementations of key assignments for easy execution (mirroring the notebook structure)
- `LICENSE`: MIT License file
- `README.md`: This main documentation file
Fundamental Neural Network Concepts
This assignment covers essential neural network principles including architecture, forward/backward propagation, activation functions, and optimization algorithms.
Contents:
- Custom neural network implementation from scratch
- Comparison with deep learning frameworks
- Hyperparameter analysis and convergence studies
Key Results:
- Demonstrated understanding of gradient descent and backpropagation
- Analysis of activation functions and their impact on learning
- Performance evaluation on benchmark datasets
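For reference, the from-scratch training loop follows the standard forward/backward pattern sketched below (a minimal NumPy illustration with toy data and shapes, not the assignment's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 64 samples, 10 features, with a learnable binary target.
X = rng.normal(size=(64, 10))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

# Two-layer network parameters.
W1 = rng.normal(scale=0.1, size=(10, 16)); b1 = np.zeros((1, 16))
W2 = rng.normal(scale=0.1, size=(16, 1));  b2 = np.zeros((1, 1))
lr = 0.5

for step in range(500):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    p = sigmoid(h @ W2 + b2)                  # predicted probabilities
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # Backward pass: chain rule applied to the binary cross-entropy loss.
    dz2 = (p - y) / len(X)                    # gradient w.r.t. output logits
    dW2 = h.T @ dz2; db2 = dz2.sum(0, keepdims=True)
    dh = dz2 @ W2.T
    dz1 = dh * (1.0 - h ** 2)                 # tanh derivative
    dW1 = X.T @ dz1; db1 = dz1.sum(0, keepdims=True)

    # Plain gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```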
Medical Image Classification with Deep CNNs
This project implements a comprehensive COVID-19 detection system using chest X-ray images. The implementation explores multiple CNN architectures and transfer learning approaches to address the critical challenge of automated COVID-19 diagnosis.
Key Features:
- Custom CNN Architecture: 6 convolutional blocks with batch normalization and dropout for robust feature extraction
- Transfer Learning: Fine-tuning of VGG16 and MobileNetV2 pretrained on ImageNet
- Data Augmentation: Extensive augmentation pipeline including rotation, flipping, scaling, and brightness adjustments
- Medical Imaging Pipeline: Proper preprocessing for chest X-ray images with intensity normalization
Technical Details:
- Architecture: Conv blocks (64→128→256→512 channels) + Global Average Pooling + Dense layers
- Loss Function: Binary cross-entropy with class weights for imbalanced data
- Optimization: Adam optimizer with learning rate scheduling
- Regularization: Dropout (0.5) and L2 weight decay (1e-4)
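A minimal sketch of the transfer-learning setup described above (the layer index, class-weight ratio, and scheduler settings are illustrative assumptions, not the assignment's exact values):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load VGG16 with ImageNet weights (torchvision >= 0.13; older versions
# use models.vgg16(pretrained=True) instead).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional backbone; fine-tune only the classifier head.
for p in model.features.parameters():
    p.requires_grad = False

# Replace the 1000-way ImageNet head with a single logit for binary output.
model.classifier[6] = nn.Linear(4096, 1)

# pos_weight counteracts class imbalance (the 3.0 ratio is a placeholder).
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-4,
    weight_decay=1e-4,  # L2 weight decay as listed above
)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=2  # simple LR scheduling
)
```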
Results & Analysis:
- VGG16 Fine-tuned: 92.1% accuracy, 0.91 AUC-ROC
- MobileNetV2: 89.3% accuracy, 0.88 AUC-ROC
- Custom CNN: 87.6% accuracy, 0.86 AUC-ROC
- Clinical Relevance: Demonstrates practical applicability in medical diagnosis
Multi-Class Vehicle Classification System
This assignment implements a robust vehicle classification system exploring both end-to-end CNN training and traditional machine learning approaches on CNN-extracted features.
Key Features:
- Dual Approach: Pure CNN classification vs. CNN feature extraction + SVM
- Architecture Comparison: Custom CNN vs. VGG16 vs. AlexNet backbones
- Feature Engineering: Comprehensive feature extraction from multiple CNN layers
- Ensemble Methods: Combining multiple classifiers for improved performance
Technical Details:
- CNN Feature Extraction: Features from conv5 layer (512×7×7 → 25088 features)
- SVM Classification: RBF kernel with grid search hyperparameter optimization
- Data Pipeline: Vehicle dataset preprocessing with normalization and augmentation
- Evaluation: 5-fold cross-validation with detailed per-class metrics
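The feature-extraction + SVM pipeline roughly follows the sketch below (`X_train`/`y_train` and the hyperparameter grid are illustrative assumptions):

```python
import torch
from torchvision import models
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Frozen VGG16 convolutional stack as the feature extractor.
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

@torch.no_grad()
def extract_features(images):             # images: (N, 3, 224, 224) tensor
    fmap = backbone(images)               # conv5 output: (N, 512, 7, 7)
    return fmap.flatten(1).cpu().numpy()  # flattened to (N, 25088)

# X_train / y_train are assumed: preprocessed vehicle images and labels.
features = extract_features(X_train)
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "gamma": ["scale", 1e-3, 1e-4]},
    cv=5,                                 # 5-fold CV as in the evaluation
)
grid.fit(features, y_train)
print(grid.best_params_, grid.best_score_)
```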
Results & Analysis:
- VGG16 + SVM: 89.2% accuracy, superior generalization
- AlexNet + SVM: 87.1% accuracy, faster inference
- End-to-end CNN: 85.4% accuracy, single-model simplicity
- Key Insight: Feature extraction approach provides better generalization than end-to-end training
Real-Time Semantic Segmentation with Efficient CNNs
This project implements Fast-SCNN, a lightweight CNN architecture designed for real-time semantic segmentation on mobile and embedded devices.
Key Features:
- Efficient Architecture: Depthwise separable convolutions for computational efficiency
- Multi-Scale Processing: Pyramid pooling module for global context
- Real-Time Performance: Optimized for mobile deployment
- Urban Scene Understanding: Segmentation of roads, buildings, vehicles, pedestrians
Technical Details:
- Learning to Downsample: Initial downsampling module with skip connections
- Global Feature Extractor: Pyramid pooling with multiple kernel sizes (1×1, 2×2, 3×3, 6×6)
- Feature Fusion: Concatenation of multi-scale features with channel attention
- Loss Function: Cross-entropy with class balancing for imbalanced segmentation
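The efficiency gain comes from factorizing each 3×3 convolution into a depthwise and a pointwise step, as in this minimal sketch:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (spatial, per channel) + 1x1 pointwise conv (channel mix)."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```

A standard 3×3 convolution uses 9·C_in·C_out weights; the factorized form uses 9·C_in + C_in·C_out, roughly an 8–9× reduction at typical channel widths.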
Results & Analysis:
- Mean IoU (mIoU): 0.62 across all classes
- Model Size: 1.2M parameters (vs. 50M+ for standard segmentation models)
- Inference Speed: 30+ FPS on mobile GPUs
- Memory Efficiency: ~50MB runtime footprint, suitable for edge deployment
Arbitrary-Oriented Object Detection
This assignment implements Oriented R-CNN for detecting objects with arbitrary orientations, crucial for applications like aerial imagery analysis and document layout detection.
Key Features:
- Oriented Anchors: 5-parameter anchor representation (x, y, w, h, θ)
- Rotated ROI Align: Rotation-aware feature extraction
- Geometric Transformations: Proper handling of oriented bounding boxes
- IoU Computation: Specialized intersection-over-union for rotated rectangles
Technical Details:
- Region Proposal Network (RPN): Oriented anchor generation and classification
- Rotated RoI Align: Bilinear sampling with rotation compensation
- Bounding Box Regression: 5-parameter regression (dx, dy, dw, dh, dθ)
- Orientation Encoding: Angle representation and normalization
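Computing IoU for rotated rectangles reduces to polygon intersection; below is a hedged sketch using `shapely` (an assumption for illustration — the assignment may implement this routine differently):

```python
import math
from shapely.geometry import Polygon

def obb_to_polygon(x, y, w, h, theta):
    """Corners of a box centered at (x, y) with size (w, h), rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in [(-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2)]:
        corners.append((x + dx * c - dy * s, y + dx * s + dy * c))
    return Polygon(corners)

def rotated_iou(box_a, box_b):
    pa, pb = obb_to_polygon(*box_a), obb_to_polygon(*box_b)
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter + 1e-9)

# Example: identical boxes rotated 45 degrees apart overlap only partially.
print(rotated_iou((0, 0, 4, 2, 0.0), (0, 0, 4, 2, math.pi / 4)))
```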
Results & Analysis:
- Detection Accuracy: Superior performance on oriented objects vs. axis-aligned methods
- Geometric Precision: Accurate localization of rotated objects
- Robustness: Handles various orientations and aspect ratios
- Application: Ship detection in satellite imagery, text detection in documents
Attention-Based Image Captioning
This project implements an encoder-decoder architecture with attention mechanisms for generating natural language descriptions from images.
Key Features:
- Visual Encoder: ResNet-based CNN for image feature extraction
- Attention Decoder: LSTM/GRU with Bahdanau attention
- Sequence Generation: Autoregressive text generation with beam search
- Multimodal Alignment: Attention visualization for interpretability
Technical Details:
- Encoder: ResNet-50 → Adaptive pooling → 2048-dim features
- Attention Mechanism: Bahdanau attention with MLP scoring
- Decoder: 512-dim LSTM with attention context concatenation
- Training: Teacher forcing with scheduled sampling
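The additive (Bahdanau) attention with MLP scoring can be summarized as follows (dimensions match the figures above; the module itself is an illustrative sketch):

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    def __init__(self, enc_dim=2048, dec_dim=512, attn_dim=512):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim)   # project image features
        self.W_dec = nn.Linear(dec_dim, attn_dim)   # project decoder state
        self.v = nn.Linear(attn_dim, 1)             # MLP scoring head

    def forward(self, features, hidden):
        # features: (B, num_regions, enc_dim); hidden: (B, dec_dim)
        scores = self.v(torch.tanh(
            self.W_enc(features) + self.W_dec(hidden).unsqueeze(1)
        ))                                          # (B, num_regions, 1)
        alpha = torch.softmax(scores, dim=1)        # attention weights
        context = (alpha * features).sum(dim=1)     # (B, enc_dim) context vector
        return context, alpha.squeeze(-1)
```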
Results & Analysis:
- BLEU-1 Score: 0.72 (unigram overlap)
- BLEU-4 Score: 0.18 (4-gram overlap)
- Attention Maps: Clear focus on relevant image regions
- Semantic Quality: Generated captions capture main objects and actions
Uncertainty-Aware Time Series Forecasting
This assignment implements RNN-based models for time series prediction with uncertainty quantification using Monte Carlo dropout.
Key Features:
- Bidirectional RNNs: LSTM and GRU variants for sequence modeling
- Uncertainty Estimation: Monte Carlo dropout for prediction confidence
- Temporal Dependencies: Capturing long-range patterns in sequential data
- Robust Forecasting: Handling noisy and irregular time series
Technical Details:
- Architecture: Bidirectional LSTM/GRU with multiple layers
- Uncertainty Quantification: MC Dropout with 50 forward passes
- Loss Function: Gaussian negative log-likelihood (maximum-likelihood training)
- Regularization: Dropout, recurrent dropout, and L2 regularization
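MC Dropout inference amounts to sampling the network with dropout left on, as in this sketch (assuming dropout is the model's only train-mode stochastic layer):

```python
import torch

def mc_dropout_predict(model, x, n_samples=50):
    # train() keeps dropout active at inference time (the MC Dropout trick).
    model.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # point forecast, uncertainty

# A rough 95% prediction interval is then mean +/- 1.96 * std.
```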
Results & Analysis:
- R² Score: 0.85 on test data
- Uncertainty Calibration: Well-calibrated prediction intervals
- Robustness: Handles missing data and outliers effectively
- Interpretability: Attention weights show temporal focus regions
Vision Transformer for Image Classification
This project implements Vision Transformer (ViT) from scratch and compares its performance with traditional CNNs on image classification tasks.
Key Features:
- Patch Embedding: Image divided into fixed-size patches (16×16)
- Self-Attention: Multi-head attention for global context modeling
- Position Encoding: Learnable positional embeddings
- Class Token: Special token for classification
Technical Details:
- Patch Size: 16×16 pixels → 768-dim embeddings
- Transformer Blocks: 12 layers, 12 attention heads, 768-dim model
- Pre-training: Optional initialization with ImageNet-pretrained weights
- Fine-tuning: End-to-end training on target datasets
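The ViT front end (patch embedding, class token, positional embeddings) can be sketched as follows, using the 16×16 / 768-dim configuration above:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=768):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2          # 14 * 14 = 196
        # A strided convolution implements non-overlapping patch projection.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

    def forward(self, x):                                    # x: (B, 3, H, W)
        x = self.proj(x).flatten(2).transpose(1, 2)          # (B, 196, 768)
        cls = self.cls_token.expand(x.size(0), -1, -1)       # prepend class token
        return torch.cat([cls, x], dim=1) + self.pos_embed   # (B, 197, 768)
```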
Results & Analysis:
- Accuracy: 88.2% on CIFAR-10 (comparable to ResNet-50)
- Computational Cost: Higher training cost but better scaling
- Attention Patterns: Global receptive field captures long-range dependencies
- Data Efficiency: Benefits from larger datasets more than CNNs
Adversarial Attacks on Multimodal Models
This assignment explores adversarial vulnerabilities in CLIP (Contrastive Language-Image Pretraining) and implements various defense mechanisms.
Key Features:
- Multimodal Attacks: Perturbing images while preserving semantic meaning
- Defense Strategies: LoRA fine-tuning, TeCoA loss, Visual Prompt Tuning
- Robust Evaluation: Comprehensive clean vs. adversarial performance analysis
- Parameter Efficiency: Low-rank adaptation for practical deployment
Technical Details:
- CLIP Architecture: Vision Transformer + Text Transformer
- Attack Methods: FGSM, PGD with ε-constraints
- Defense Techniques: Test-time classifier alignment, prompt tuning
- Evaluation: Robustness metrics across multiple attack strengths
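As an illustration of the attack setup, here is a generic FGSM step (the loss function and pixel range are assumptions; the actual CLIP attack code may differ):

```python
import torch

def fgsm_attack(image, target, model, loss_fn, eps=8 / 255):
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), target)   # e.g., CLIP's image-text contrastive loss
    loss.backward()
    # One signed-gradient step that maximizes the loss, clamped to valid pixels.
    adv = image + eps * image.grad.sign()
    return adv.clamp(0, 1).detach()
```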
Results & Analysis:
- Clean Accuracy: 65.2% zero-shot performance
- Adversarial Drop: 20.1% accuracy loss under attack
- Defense Improvement: TeCoA achieves 62.1% robust accuracy
- Parameter Efficiency: LoRA uses only 0.8M trainable parameters
GAN-Based Unsupervised Domain Adaptation
This project implements CycleGAN for domain adaptation, enabling models trained on one domain to perform well on related but different domains.
Key Features:
- Cycle Consistency: Bidirectional mapping between domains
- Domain Confusion: Adversarial alignment of feature distributions
- Unsupervised Learning: No target domain labels required
- Style Transfer: Realistic transformation of visual appearance
Technical Details:
- Generator Networks: U-Net style with residual blocks
- Discriminator Networks: Patch-based discrimination
- Loss Components: Adversarial loss + cycle consistency + identity loss
- Training Strategy: Alternating optimization with careful loss balancing
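The combined generator objective can be sketched as below (function and weight names are illustrative assumptions, using the common least-squares adversarial variant):

```python
import torch
import torch.nn.functional as F

def generator_loss(G, F_inv, D_X, D_Y, real_x, real_y,
                   lambda_cyc=10.0, lambda_id=5.0):
    fake_y, fake_x = G(real_x), F_inv(real_y)

    # Adversarial terms: fool both discriminators (least-squares GAN loss).
    pred_y, pred_x = D_Y(fake_y), D_X(fake_x)
    adv = F.mse_loss(pred_y, torch.ones_like(pred_y)) + \
          F.mse_loss(pred_x, torch.ones_like(pred_x))

    # Cycle consistency: X -> Y -> X and Y -> X -> Y must reconstruct.
    cyc = F.l1_loss(F_inv(fake_y), real_x) + F.l1_loss(G(fake_x), real_y)

    # Identity terms: a target-domain input should pass through unchanged.
    idt = F.l1_loss(G(real_y), real_y) + F.l1_loss(F_inv(real_x), real_x)

    return adv + lambda_cyc * cyc + lambda_id * idt
```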
Results & Analysis:
- Target Accuracy: 87.6% on MNIST-M (vs. 75.6% without adaptation)
- Domain Gap Reduction: 58% improvement over source-only performance
- Generated Quality: FID score of 38.7 indicates realistic samples
- Feature Alignment: t-SNE visualization shows domain-invariant representations
Variational Autoencoder for Anomaly Detection
This assignment implements VAE for generative modeling and demonstrates its application in unsupervised anomaly detection for medical imaging.
Key Features:
- Probabilistic Encoding: Amortized variational inference
- Reparameterization Trick: Enables gradient-based optimization
- Anomaly Scoring: Reconstruction error as anomaly indicator
- Medical Application: Polyp detection in gastrointestinal endoscopy
Technical Details:
- Encoder: CNN-based recognition network (μ, log σ²)
- Decoder: Transpose CNN for image reconstruction
- ELBO Loss: Reconstruction + KL divergence regularization
- β-VAE Variant: Tunable regularization strength
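The two key pieces, the reparameterization trick and the β-weighted ELBO, look roughly like this (a minimal sketch, not the assignment's exact code):

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps the sampling step differentiable.
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
    recon = F.mse_loss(recon_x, x, reduction="sum")   # reconstruction term
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl                          # beta > 1 gives beta-VAE
```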
Results & Analysis:
- Reconstruction Quality: PSNR 28.5 dB, SSIM 0.89 on normal images
- Anomaly Detection: AUC 0.90, outperforming simpler reconstruction-based baselines
- Latent Space: Well-structured manifold for interpolation
- Medical Utility: Reliable polyp detection with low false positive rate
Comparative Adversarial Analysis: CNNs vs. ViTs
This extra assignment provides a comprehensive comparison of adversarial vulnerabilities between convolutional and transformer-based vision models.
Key Features:
- Architecture Comparison: ResNet-50 vs. ViT-Base side-by-side analysis
- Attack Suite: FGSM, PGD, CW attacks with multiple strengths
- Defense Evaluation: Adversarial training and input preprocessing
- Robustness Metrics: Detailed analysis of clean vs. robust performance
Technical Details:
- CNN Model: ResNet-50 with 25M parameters
- ViT Model: 12-layer transformer with 86M parameters
- Attack Implementation: Torchattacks library with custom modifications
- Defense Methods: Adversarial training with PGD-based augmentation
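With the Torchattacks library mentioned above, the robustness evaluation reduces to a loop like the following (ε and step counts are illustrative, and `model`/`test_loader` are assumed defined):

```python
import torchattacks

# PGD attack from Torchattacks; the same loop works for FGSM and CW.
attack = torchattacks.PGD(model, eps=8 / 255, alpha=2 / 255, steps=10)

correct_clean = correct_adv = total = 0
for images, labels in test_loader:
    adv_images = attack(images, labels)      # crafted adversarial batch
    correct_clean += (model(images).argmax(1) == labels).sum().item()
    correct_adv += (model(adv_images).argmax(1) == labels).sum().item()
    total += labels.size(0)

print(f"clean: {correct_clean / total:.3f}  robust: {correct_adv / total:.3f}")
```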
Results & Analysis:
- Clean Performance: ViT 84.7% vs. ResNet 76.2% accuracy
- Adversarial Robustness: ViT 57.4% vs. ResNet 52.1% under strong attacks
- Attack Transferability: High transfer rate between architectures
- Computational Trade-offs: ViT requires more compute but offers better robustness
Multilingual Image Captioning in Persian
This advanced project extends image captioning to Persian language, addressing the challenges of right-to-left script and low-resource language processing.
Key Features:
- Persian NLP Pipeline: Hazm library for tokenization and normalization
- Multilingual Attention: Multi-head attention for cross-modal alignment
- RTL Text Handling: Proper bidirectional text processing
- Cultural Adaptation: Persian-specific caption generation
Technical Details:
- Text Processing: Persian normalization, word tokenization, vocabulary building
- Model Architecture: Transformer-based encoder-decoder with Persian embeddings
- Beam Search: Multilingual beam search with Persian language model
- Evaluation: BLEU scores adapted for Persian morphological complexity
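A brief sketch of the Hazm-based preprocessing step (the vocabulary-building details and `train_captions` are assumptions):

```python
from collections import Counter
from hazm import Normalizer, word_tokenize

normalizer = Normalizer()

def preprocess(caption):
    # Normalize Persian text (character forms, spacing), then tokenize.
    return word_tokenize(normalizer.normalize(caption))

# Build a vocabulary from tokenized training captions, reserving special tokens.
counter = Counter(tok for cap in train_captions for tok in preprocess(cap))
vocab = {tok: i + 4 for i, (tok, _) in enumerate(counter.most_common())}
vocab.update({"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3})
```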
Results & Analysis:
- BLEU-4 Score: 0.195 (competitive for low-resource language)
- Persian Fluency: Natural Persian sentence generation
- Cultural Relevance: Captions reflect Persian linguistic and cultural context
- Multilingual Capability: Framework extensible to other RTL languages
- Deep Learning Frameworks: PyTorch, TensorFlow/Keras
- Computer Vision: OpenCV, PIL, torchvision
- Natural Language Processing: Hazm (Persian), NLTK
- Data Science: NumPy, Pandas, Scikit-learn
- Visualization: Matplotlib, Seaborn
- Experiment Tracking: Weights & Biases, TensorBoard
- Convolutional Networks: CNNs, ResNets, EfficientNets
- Recurrent Networks: LSTMs, GRUs, attention mechanisms
- Transformers: Self-attention, multi-head attention, position encoding
- Generative Models: GANs, VAEs, flow-based models
- Supervised Learning: Classification, regression, object detection
- Unsupervised Learning: Autoencoders, generative modeling
- Self-Supervised Learning: Contrastive learning (CLIP)
- Adversarial Learning: Attacks, defenses, robust training
- Transfer Learning: Pretrained models, fine-tuning
- Regularization: Dropout, batch normalization, weight decay
- Optimization: Adam, SGD, learning rate scheduling
- Data Augmentation: Geometric transforms, color jittering
- Ensemble Methods: Model averaging, bagging
- Classification: Accuracy, precision, recall, F1-score, AUC-ROC
- Detection/Segmentation: IoU, mAP, precision-recall curves
- Generation: BLEU, ROUGE, METEOR, FID, IS
- Time Series: MAE, RMSE, R², uncertainty metrics
- Prerequisites: Python 3.8+, PyTorch 1.9+, CUDA-compatible GPU
- Installation: `pip install -r requirements.txt` (if available)
- Navigation: Each assignment folder is self-contained with descriptive names:
  - `CA1_Neural_Networks_Basics/`
  - `CA2_CNN_Applications/`
  - `CA3_Object_Detection/`
  - `CA4_Sequence_Modeling/`
  - `CA5_Vision_Transformers/`
  - `CA6_Generative_Models/`
  - `CA7_Advanced_Topics/`
- Execution:
  - Run Jupyter notebooks in `code/` directories or subfolders
  - Alternatively, execute Python scripts in `python_files/` for standalone implementations
- Documentation: Refer to individual README.md files for detailed guides
This repository serves as a comprehensive resource for:
- Students: Practical implementations of deep learning concepts
- Researchers: Benchmarking and extending state-of-the-art methods
- Practitioners: Production-ready code for real-world applications
- Educators: Teaching materials with detailed explanations
Each implementation includes mathematical derivations, architectural decisions, hyperparameter tuning, and performance analysis, providing a complete learning experience from theory to practice.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this repository for academic purposes, please cite the relevant assignments and provide appropriate attribution to the original authors and datasets used.
Course: Neural Networks and Deep Learning
Institution: University of Tehran
Date: September 2025