Skip to content

This repository contains comprehensive implementations of advanced deep learning concepts and models as part of the Neural Networks and Deep Learning course assignments. Each assignment demonstrates practical applications of cutting-edge deep learning techniques with detailed mathematical formulations, architectural designs

License

Notifications You must be signed in to change notification settings

tahamajs/DeepLearning_Asssisments

Repository files navigation

Deep Learning Assignments Repository

This repository contains comprehensive implementations of advanced deep learning concepts and models as part of the Neural Networks and Deep Learning course assignments. Each assignment demonstrates practical applications of cutting-edge deep learning techniques with detailed mathematical formulations, architectural designs, and performance evaluations.

Repository Structure

The repository is organized into 7 main assignment folders with descriptive names, each containing:

  • code/ or subfolders with code implementations
  • description/: Assignment specifications
  • paper/: Research papers and references
  • report/: Detailed analysis and results
  • README.md: Comprehensive technical documentation

Additional Resources

  • NNDL_Slides/: Course slides covering theoretical foundations and advanced topics
  • python_files/: Standalone Python implementations of key assignments for easy execution (mirroring the notebook structure)
  • LICENSE: MIT License file
  • README.md: This main documentation file

Assignments Overview

CA1: Neural Networks Basics

Fundamental Neural Network Concepts

This assignment covers essential neural network principles including architecture, forward/backward propagation, activation functions, and optimization algorithms.

Contents:

  • Custom neural network implementation from scratch
  • Comparison with deep learning frameworks
  • Hyperparameter analysis and convergence studies

Key Results:

  • Demonstrated understanding of gradient descent and backpropagation
  • Analysis of activation functions and their impact on learning
  • Performance evaluation on benchmark datasets

CA2: CNN Applications

Covid_Detection

Medical Image Classification with Deep CNNs

This project implements a comprehensive COVID-19 detection system using chest X-ray images. The implementation explores multiple CNN architectures and transfer learning approaches to address the critical challenge of automated COVID-19 diagnosis.

Key Features:

  • Custom CNN Architecture: 6 convolutional blocks with batch normalization and dropout for robust feature extraction
  • Transfer Learning: Fine-tuning of VGG16 and MobileNetV2 pretrained on ImageNet
  • Data Augmentation: Extensive augmentation pipeline including rotation, flipping, scaling, and brightness adjustments
  • Medical Imaging Pipeline: Proper preprocessing for chest X-ray images with intensity normalization

Technical Details:

  • Architecture: Conv blocks (64→128→256→512 channels) + Global Average Pooling + Dense layers
  • Loss Function: Binary cross-entropy with class weights for imbalanced data
  • Optimization: Adam optimizer with learning rate scheduling
  • Regularization: Dropout (0.5) and L2 weight decay (1e-4)

Results & Analysis:

  • VGG16 Fine-tuned: 92.1% accuracy, 0.91 AUC-ROC
  • MobileNetV2: 89.3% accuracy, 0.88 AUC-ROC
  • Custom CNN: 87.6% accuracy, 0.86 AUC-ROC
  • Clinical Relevance: Demonstrates practical applicability in medical diagnosis

Vehicle_Classification

Multi-Class Vehicle Classification System

This assignment implements a robust vehicle classification system exploring both end-to-end CNN training and traditional machine learning approaches on CNN-extracted features.

Key Features:

  • Dual Approach: Pure CNN classification vs. CNN feature extraction + SVM
  • Architecture Comparison: Custom CNN vs. VGG16 vs. AlexNet backbones
  • Feature Engineering: Comprehensive feature extraction from multiple CNN layers
  • Ensemble Methods: Combining multiple classifiers for improved performance

Technical Details:

  • CNN Feature Extraction: Features from conv5 layer (512×7×7 → 25088 features)
  • SVM Classification: RBF kernel with grid search hyperparameter optimization
  • Data Pipeline: Vehicle dataset preprocessing with normalization and augmentation
  • Evaluation: 5-fold cross-validation with detailed per-class metrics

Results & Analysis:

  • VGG16 + SVM: 89.2% accuracy, superior generalization
  • AlexNet + SVM: 87.1% accuracy, faster inference
  • End-to-end CNN: 85.4% accuracy, single-model simplicity
  • Key Insight: Feature extraction approach provides better generalization than end-to-end training

CA3: Object Detection

Fast_SCNN

Real-Time Semantic Segmentation with Efficient CNNs

This project implements Fast-SCNN, a lightweight CNN architecture designed for real-time semantic segmentation on mobile and embedded devices.

Key Features:

  • Efficient Architecture: Depthwise separable convolutions for computational efficiency
  • Multi-Scale Processing: Pyramid pooling module for global context
  • Real-Time Performance: Optimized for mobile deployment
  • Urban Scene Understanding: Segmentation of roads, buildings, vehicles, pedestrians

Technical Details:

  • Learning to Downsample: Initial downsampling module with skip connections
  • Global Feature Extractor: Pyramid pooling with multiple kernel sizes (1×1, 2×2, 3×3, 6×6)
  • Feature Fusion: Concatenation of multi-scale features with channel attention
  • Loss Function: Cross-entropy with class balancing for imbalanced segmentation

Results & Analysis:

  • IoU Score: 0.62 average across all classes
  • Model Size: 1.2M parameters (vs. 50M+ for standard segmentation models)
  • Inference Speed: 30+ FPS on mobile GPUs
  • Memory Efficiency: 50MB model size suitable for edge deployment

Oriented_RCNN

Arbitrary-Oriented Object Detection

This assignment implements Oriented R-CNN for detecting objects with arbitrary orientations, crucial for applications like aerial imagery analysis and document layout detection.

Key Features:

  • Oriented Anchors: 5-parameter anchor representation (x, y, w, h, θ)
  • Rotated ROI Align: Rotation-aware feature extraction
  • Geometric Transformations: Proper handling of oriented bounding boxes
  • IoU Computation: Specialized intersection-over-union for rotated rectangles

Technical Details:

  • Region Proposal Network (RPN): Oriented anchor generation and classification
  • Rotated RoI Align: Bilinear sampling with rotation compensation
  • Bounding Box Regression: 5-parameter regression (dx, dy, dw, dh, dθ)
  • Orientation Encoding: Angle representation and normalization

Results & Analysis:

  • Detection Accuracy: Superior performance on oriented objects vs. axis-aligned methods
  • Geometric Precision: Accurate localization of rotated objects
  • Robustness: Handles various orientations and aspect ratios
  • Application: Ship detection in satellite imagery, text detection in documents

CA4: Sequence Modeling

Image_Captioning

Attention-Based Image Captioning

This project implements an encoder-decoder architecture with attention mechanisms for generating natural language descriptions from images.

Key Features:

  • Visual Encoder: ResNet-based CNN for image feature extraction
  • Attention Decoder: LSTM/GRU with Bahdanau attention
  • Sequence Generation: Autoregressive text generation with beam search
  • Multimodal Alignment: Attention visualization for interpretability

Technical Details:

  • Encoder: ResNet-50 → Adaptive pooling → 2048-dim features
  • Attention Mechanism: Bahdanau attention with MLP scoring
  • Decoder: 512-dim LSTM with attention context concatenation
  • Training: Teacher forcing with scheduled sampling

Results & Analysis:

  • BLEU-1 Score: 0.72 (unigram overlap)
  • BLEU-4 Score: 0.18 (4-gram overlap)
  • Attention Maps: Clear focus on relevant image regions
  • Semantic Quality: Generated captions capture main objects and actions

Time_Series_Prediction

Uncertainty-Aware Time Series Forecasting

This assignment implements RNN-based models for time series prediction with uncertainty quantification using Monte Carlo dropout.

Key Features:

  • Bidirectional RNNs: LSTM and GRU variants for sequence modeling
  • Uncertainty Estimation: Monte Carlo dropout for prediction confidence
  • Temporal Dependencies: Capturing long-range patterns in sequential data
  • Robust Forecasting: Handling noisy and irregular time series

Technical Details:

  • Architecture: Bidirectional LSTM/GRU with multiple layers
  • Uncertainty Quantification: MC Dropout with 50 forward passes
  • Loss Function: Maximum likelihood estimation with Gaussian likelihood
  • Regularization: Dropout, recurrent dropout, and L2 regularization

Results & Analysis:

  • R² Score: 0.85 on test data
  • Uncertainty Calibration: Well-calibrated prediction intervals
  • Robustness: Handles missing data and outliers effectively
  • Interpretability: Attention weights show temporal focus regions

CA5: Vision Transformers

VIT_Classification

Vision Transformer for Image Classification

This project implements Vision Transformer (ViT) from scratch and compares its performance with traditional CNNs on image classification tasks.

Key Features:

  • Patch Embedding: Image divided into fixed-size patches (16×16)
  • Self-Attention: Multi-head attention for global context modeling
  • Position Encoding: Learnable positional embeddings
  • Class Token: Special token for classification

Technical Details:

  • Patch Size: 16×16 pixels → 768-dim embeddings
  • Transformer Blocks: 12 layers, 12 attention heads, 768-dim model
  • Pre-training: Optional initialization with ImageNet-pretrained weights
  • Fine-tuning: End-to-end training on target datasets

Results & Analysis:

  • Accuracy: 88.2% on CIFAR-10 (comparable to ResNet-50)
  • Computational Cost: Higher training cost but better scaling
  • Attention Patterns: Global receptive field captures long-range dependencies
  • Data Efficiency: Benefits from larger datasets more than CNNs

CLIP_Adversarial_Attack

Adversarial Attacks on Multimodal Models

This assignment explores adversarial vulnerabilities in CLIP (Contrastive Language-Image Pretraining) and implements various defense mechanisms.

Key Features:

  • Multimodal Attacks: Perturbing images while preserving semantic meaning
  • Defense Strategies: LoRA fine-tuning, TeCoA loss, Visual Prompt Tuning
  • Robust Evaluation: Comprehensive clean vs. adversarial performance analysis
  • Parameter Efficiency: Low-rank adaptation for practical deployment

Technical Details:

  • CLIP Architecture: Vision Transformer + Text Transformer
  • Attack Methods: FGSM, PGD with ε-constraints
  • Defense Techniques: Test-time classifier alignment, prompt tuning
  • Evaluation: Robustness metrics across multiple attack strengths

Results & Analysis:

  • Clean Accuracy: 65.2% zero-shot performance
  • Adversarial Drop: 20.1% accuracy loss under attack
  • Defense Improvement: TeCoA achieves 62.1% robust accuracy
  • Parameter Efficiency: LoRA uses only 0.8M trainable parameters

CA6: Generative Models

Unsupervised_Domain_Adaptation_GAN

GAN-Based Unsupervised Domain Adaptation

This project implements CycleGAN for domain adaptation, enabling models trained on one domain to perform well on related but different domains.

Key Features:

  • Cycle Consistency: Bidirectional mapping between domains
  • Domain Confusion: Adversarial alignment of feature distributions
  • Unsupervised Learning: No target domain labels required
  • Style Transfer: Realistic transformation of visual appearance

Technical Details:

  • Generator Networks: U-Net style with residual blocks
  • Discriminator Networks: Patch-based discrimination
  • Loss Components: Adversarial loss + cycle consistency + identity loss
  • Training Strategy: Alternating optimization with careful loss balancing

Results & Analysis:

  • Target Accuracy: 87.6% on MNIST-M (vs. 75.6% without adaptation)
  • Domain Gap Reduction: 58% improvement over source-only performance
  • Generated Quality: FID score of 38.7 indicates realistic samples
  • Feature Alignment: t-SNE visualization shows domain-invariant representations

VAE

Variational Autoencoder for Anomaly Detection

This assignment implements VAE for generative modeling and demonstrates its application in unsupervised anomaly detection for medical imaging.

Key Features:

  • Probabilistic Encoding: Amortized variational inference
  • Reparameterization Trick: Enables gradient-based optimization
  • Anomaly Scoring: Reconstruction error as anomaly indicator
  • Medical Application: Polyp detection in gastrointestinal endoscopy

Technical Details:

  • Encoder: CNN-based recognition network (μ, log σ²)
  • Decoder: Transpose CNN for image reconstruction
  • ELBO Loss: Reconstruction + KL divergence regularization
  • β-VAE Variant: Tunable regularization strength

Results & Analysis:

  • Reconstruction Quality: PSNR 28.5dB, SSIM 0.89 on normal images
  • Anomaly Detection: AUC 0.90, superior to reconstruction-based methods
  • Latent Space: Well-structured manifold for interpolation
  • Medical Utility: Reliable polyp detection with low false positive rate

CA7: Advanced Topics

CNN_VIT_Adversarial_Attack

Comparative Adversarial Analysis: CNNs vs. ViTs

This extra assignment provides a comprehensive comparison of adversarial vulnerabilities between convolutional and transformer-based vision models.

Key Features:

  • Architecture Comparison: ResNet-50 vs. ViT-Base side-by-side analysis
  • Attack Suite: FGSM, PGD, CW attacks with multiple strengths
  • Defense Evaluation: Adversarial training and input preprocessing
  • Robustness Metrics: Detailed analysis of clean vs. robust performance

Technical Details:

  • CNN Model: ResNet-50 with 25M parameters
  • ViT Model: 12-layer transformer with 86M parameters
  • Attack Implementation: Torchattacks library with custom modifications
  • Defense Methods: Adversarial training with PGD-based augmentation

Results & Analysis:

  • Clean Performance: ViT 84.7% vs. ResNet 76.2% accuracy
  • Adversarial Robustness: ViT 57.4% vs. ResNet 52.1% under strong attacks
  • Attack Transferability: High transfer rate between architectures
  • Computational Trade-offs: ViT requires more compute but offers better robustness

Image_Captioning

Multilingual Image Captioning in Persian

This advanced project extends image captioning to Persian language, addressing the challenges of right-to-left script and low-resource language processing.

Key Features:

  • Persian NLP Pipeline: Hazm library for tokenization and normalization
  • Multilingual Attention: Multi-head attention for cross-modal alignment
  • RTL Text Handling: Proper bidirectional text processing
  • Cultural Adaptation: Persian-specific caption generation

Technical Details:

  • Text Processing: Persian normalization, word tokenization, vocabulary building
  • Model Architecture: Transformer-based encoder-decoder with Persian embeddings
  • Beam Search: Multilingual beam search with Persian language model
  • Evaluation: BLEU scores adapted for Persian morphological complexity

Results & Analysis:

  • BLEU-4 Score: 0.195 (competitive for low-resource language)
  • Persian Fluency: Natural Persian sentence generation
  • Cultural Relevance: Captions reflect Persian linguistic and cultural context
  • Multilingual Capability: Framework extensible to other RTL languages

Key Technologies and Frameworks

  • Deep Learning Frameworks: PyTorch, TensorFlow/Keras
  • Computer Vision: OpenCV, PIL, torchvision
  • Natural Language Processing: Hazm (Persian), NLTK
  • Data Science: NumPy, Pandas, Scikit-learn
  • Visualization: Matplotlib, Seaborn
  • Experiment Tracking: Weights & Biases, TensorBoard

Core Concepts Demonstrated

Neural Network Architectures

  • Convolutional Networks: CNNs, ResNets, EfficientNets
  • Recurrent Networks: LSTMs, GRUs, attention mechanisms
  • Transformers: Self-attention, multi-head attention, position encoding
  • Generative Models: GANs, VAEs, flow-based models

Learning Paradigms

  • Supervised Learning: Classification, regression, object detection
  • Unsupervised Learning: Autoencoders, generative modeling
  • Self-Supervised Learning: Contrastive learning (CLIP)
  • Adversarial Learning: Attacks, defenses, robust training

Advanced Techniques

  • Transfer Learning: Pretrained models, fine-tuning
  • Regularization: Dropout, batch normalization, weight decay
  • Optimization: Adam, SGD, learning rate scheduling
  • Data Augmentation: Geometric transforms, color jittering
  • Ensemble Methods: Model averaging, bagging

Evaluation Metrics

  • Classification: Accuracy, precision, recall, F1-score, AUC-ROC
  • Detection/Segmentation: IoU, mAP, precision-recall curves
  • Generation: BLEU, ROUGE, METEOR, FID, IS
  • Time Series: MAE, RMSE, R², uncertainty metrics

Getting Started

  1. Prerequisites: Python 3.8+, PyTorch 1.9+, CUDA-compatible GPU
  2. Installation: pip install -r requirements.txt (if available)
  3. Navigation: Each assignment folder is self-contained with descriptive names:
    • CA1_Neural_Networks_Basics/
    • CA2_CNN_Applications/
    • CA3_Object_Detection/
    • CA4_Sequence_Modeling/
    • CA5_Vision_Transformers/
    • CA6_Generative_Models/
    • CA7_Advanced_Topics/
  4. Execution:
    • Run Jupyter notebooks in code/ directories or subfolders
    • Alternatively, execute Python scripts in python_files/ for standalone implementations
  5. Documentation: Refer to individual README.md files for detailed guides

Educational Value

This repository serves as a comprehensive resource for:

  • Students: Practical implementations of deep learning concepts
  • Researchers: Benchmarking and extending state-of-the-art methods
  • Practitioners: Production-ready code for real-world applications
  • Educators: Teaching materials with detailed explanations

Each implementation includes mathematical derivations, architectural decisions, hyperparameter tuning, and performance analysis, providing a complete learning experience from theory to practice.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this repository for academic purposes, please cite the relevant assignments and provide appropriate attribution to the original authors and datasets used.


Course: Neural Networks and Deep Learning Institution: University of Tehran Date: September 2025

About

This repository contains comprehensive implementations of advanced deep learning concepts and models as part of the Neural Networks and Deep Learning course assignments. Each assignment demonstrates practical applications of cutting-edge deep learning techniques with detailed mathematical formulations, architectural designs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published