3/18/2021 Airflow Context Variables
Note: we have yet to test this setup on a Linux or Windows machine, since all of us here use macOS.

Six months ago (when the latest Airflow was 1.10.4), I was tasked with setting up a local Airflow development environment. I did a bit of research and was not quite satisfied with the solutions out there.
Some were running Airflow on their local machine in a manner similar to the instructions in Airflow's Quick Start. But even then, I could not find a setup that sufficiently met the criteria below.

We wanted a local development environment that:

- Is lightweight
- Closely mimics the production environment (we run Airflow in our Kubernetes cluster, like the rest of our microservices)
- Is easy and fast to start up
- Is easy to work with
- Promotes collaboration

Eventually, we built our own complete setup, with the IDE interacting seamlessly with our Docker environment, as well as some basic DAG validation tests.

In this article, I am going to share a setup that has worked pretty well for my team here at Ninja Van. I am also going to lay out the thoughts and processes that went into setting up the local development environment. I highly recommend looking at those tools for a better developer experience.

Pre-requisites

I shall assume that:

- You have a basic working knowledge of docker and docker-compose. To install them, refer to their official installation guides.
- You have a basic working knowledge of Airflow and its components: Airflow DB, Scheduler, Webserver, Connections, Variables, DAGs, etc.

Now, let us get started.

First off, let us see how Airflow's out-of-the-box setup fares against the criteria mentioned in the Introduction.

Airflow out-of-the-box setup: good for playing around

The Quick Start guide tells us what we need to do to get started. Once those steps are run:

- A default airflow.cfg is generated and placed at AIRFLOW_HOME.
- A sqlite database, airflow.db, is initialised and placed at AIRFLOW_HOME.

Note that using the default sqlite database restricts the choice of Airflow executor to SequentialExecutor. The implication is that there will not be any concurrent task execution. This is different from our actual production environment, where tasks are executed concurrently. Thus, if we stick to SequentialExecutor, we might run into problems that only emerge in the production environment.
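To make the sqlite and SequentialExecutor constraint concrete, here is a minimal sketch of a check that flags an airflow.cfg pairing a sqlite backend with a concurrent executor. It assumes the Airflow 1.10.x layout, where both `executor` and `sql_alchemy_conn` live under the `[core]` section. The helper names (`read_core_settings`, `is_consistent`) and the sample config are hypothetical, made up for illustration; this is not part of Airflow itself.

```python
import configparser

# Assumption: per the constraint described above, only SequentialExecutor
# is treated as compatible with a sqlite backend.
SQLITE_SAFE_EXECUTORS = {"SequentialExecutor"}


def read_core_settings(cfg_text):
    """Return (executor, sql_alchemy_conn) from airflow.cfg text (1.10.x layout)."""
    # interpolation=None: config values may contain literal '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    core = parser["core"]
    executor = core.get("executor", "SequentialExecutor")
    conn = core.get("sql_alchemy_conn", "")
    return executor, conn


def is_consistent(executor, sql_alchemy_conn):
    """Return False when a concurrent executor is paired with a sqlite backend."""
    uses_sqlite = sql_alchemy_conn.lower().startswith("sqlite")
    return (not uses_sqlite) or executor in SQLITE_SAFE_EXECUTORS


# Hypothetical config: LocalExecutor on top of the default sqlite database.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
sql_alchemy_conn = sqlite:////tmp/airflow/airflow.db
"""

if __name__ == "__main__":
    executor, conn = read_core_settings(SAMPLE_CFG)
    print(executor, conn, is_consistent(executor, conn))
```

A check like this could sit alongside basic DAG validation tests, so that a misconfigured local environment is caught before any task runs.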
The Airflow CLI doesn't depend on the Webserver or Scheduler, but it does depend on the DB backend.

The out-of-the-box setup is good enough if you just want a taste of Airflow. But we think it is not suitable as a development environment because:

- It does not closely resemble our production environment, given the SequentialExecutor constraint.
- It is not easy or fast to set up, with the multiple commands needed to run the whole suite of Airflow components.

Moving away from SequentialExecutor would incur an additional step to start up a local RDBMS. It is because of those drawbacks that we decided to build our own development environment.
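To illustrate the "multiple commands" drawback, the sketch below lists the commands that, to my understanding, the Airflow 1.10.x Quick Start asks you to run. It only builds and prints them and does not execute anything. The function names and the default port 8080 are assumptions for illustration; note that the webserver and scheduler are both long-running, so in practice each needs its own terminal.

```python
import shlex


def quick_start_commands(port=8080):
    """Return the Airflow 1.10.x Quick Start commands as argv lists, in order."""
    return [
        # Initialise the metadata DB (sqlite by default).
        ["airflow", "initdb"],
        # Long-running process: needs its own terminal.
        ["airflow", "webserver", "-p", str(port)],
        # Long-running process: needs its own terminal.
        ["airflow", "scheduler"],
    ]


def render(commands):
    """Format argv lists as shell lines, for printing or copy-pasting."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]


if __name__ == "__main__":
    for line in render(quick_start_commands()):
        print(line)
```

Three separate commands, two of which block a terminal, is manageable for a first look at Airflow but adds friction when the environment has to be brought up many times a day.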