As ever more data is stored and evolves over time, the challenge is how to actually use that data and extract meaningful insight from it. Azure Databricks is a fast, easy, and collaborative Apache Spark-based big data analytics service designed for data science and data engineering.
In this course you will learn how to take unstructured data and turn it into meaningful insight. We kick off with Azure Databricks itself, looking at how it evolved and why another big data solution was needed. Further along, we work with smaller datasets and mount Azure Blob Storage to store the processed data. You will also learn about FileStore, where we process CSV data at production scale. Finally, we touch on security and how to enforce cluster policies for different teams: we create cluster policies from a JSON template and apply them to existing and new clusters. We also integrate a Git repository with Databricks to support continuous integration and delivery, and eventually set up CI/CD for our Spark application using Azure DevOps.
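The cluster policies mentioned above are defined as JSON documents of attribute rules. As a rough sketch only (the specific attribute names, runtime version, VM sizes, and limits below are illustrative assumptions, not taken from the course), a policy constraining a team's clusters might look like:

```json
{
  "spark_version": {
    "type": "fixed",
    "value": "13.3.x-scala2.12"
  },
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_DS3_v2", "Standard_DS4_v2"]
  },
  "num_workers": {
    "type": "range",
    "maxValue": 8,
    "defaultValue": 2
  },
  "autotermination_minutes": {
    "type": "range",
    "minValue": 10,
    "maxValue": 120,
    "defaultValue": 30
  }
}
```

Each key names a cluster attribute: a "fixed" rule locks a value, "allowlist" restricts the choices, and "range" bounds a numeric setting. Once the policy is applied, clusters created under it must satisfy these rules.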
By the end of this course, you'll have mastered deploying Azure Databricks and be well equipped to read and transform data, and to load the transformed data into a sink.