
Glacier Data ETL Pipeline Using Databricks

  • Extracted glacier data from the datahub.io website and stored it in DBFS.
  • Applied transformations to split the full dataset by year.
  • Wrote the transformed data to two separate files and loaded them into DBFS.
  • Implemented a data pipeline that splits the data by century and saves each part under a file name giving the start and end years of that century.
  • Because the data is organized and stored by year range, an analysis can load only the file for the period it needs, which reduces analysis time.
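
The century split described above can be sketched in plain Python. This is a minimal illustration, not the project's Databricks code: the `Year` column name, the `glaciers_<start>_<end>.csv` naming pattern, the 1900-1999 style century boundaries, and the sample rows are all assumptions made for the example, and reading from or writing to DBFS is left out.

```python
import csv
import io
from collections import defaultdict


def century_bounds(year: int) -> tuple[int, int]:
    """Return the (start, end) years of the century containing `year`.

    Assumption: centuries run 1900-1999, 2000-2099, and so on.
    """
    start = (year // 100) * 100
    return start, start + 99


def split_by_century(csv_text: str, year_column: str = "Year") -> dict[str, str]:
    """Group CSV rows by century and return {file_name: csv_text}.

    `year_column` and the file-name pattern are hypothetical choices.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []

    # Bucket each row under the (start, end) pair of its century.
    groups = defaultdict(list)
    for row in reader:
        groups[century_bounds(int(row[year_column]))].append(row)

    # Serialize each bucket back to CSV, keyed by a name carrying the bounds.
    outputs = {}
    for (start, end), rows in sorted(groups.items()):
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[f"glaciers_{start}_{end}.csv"] = buffer.getvalue()
    return outputs


# Made-up sample rows for illustration only, not real measurements.
sample = "Year,Mass Balance\n1998,-1.5\n1999,-2.0\n2000,-2.4\n2001,-3.1\n"
files = split_by_century(sample)
for name in files:
    print(name)
```

Keeping the start and end years in the file name means a later analysis can pick the right file from its name alone, without opening any of the files first.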

Visit project

Copyright © 2024 manpreetsinghsandhu.com - All Rights Reserved.
