One of the biggest strengths of Apache Airflow is its ability to scale to meet the changing demands of your organization. Airflow exposes a number of parameters that are closely related to DAG and task-level performance. To make the most of Airflow, there are a few key settings that you should consider modifying as you scale up your data pipelines.

In this guide, you'll learn about the key parameters that you can use to modify Airflow performance. You'll also learn how your choice of executor can impact scaling and how best to respond to common scaling issues.

This guide references the parameters available in Airflow version 2.0 and later. If you're using an earlier version of Airflow, some of the parameter names might be different.

To get the most out of this guide, you should have an understanding of Airflow executors. See Airflow executors explained.

Airflow has many parameters that impact its performance. Tuning these settings can affect DAG parsing and task scheduling performance, parallelism in your Airflow environment, and more. The reason Airflow allows so many adjustments is that, as an agnostic orchestrator, Airflow is used for a wide variety of use cases.

Airflow admins or DevOps engineers might tune scaling parameters at the environment level to ensure that their supporting infrastructure isn't overstressed, while DAG authors might tune scaling parameters at the DAG or task level to ensure that their pipelines don't overwhelm external systems. Knowing the requirements of your use case before scaling Airflow will help you choose which parameters to modify.

## Environment-level settings

Environment-level settings are those that impact your entire Airflow environment (all DAGs). They all have default values that can be overridden by setting the appropriate environment variable or modifying your `airflow.cfg` file. For more information, see Setting Configuration Options in the Apache Airflow documentation. Generally, all default values can be found in the Airflow Configuration Reference. To check current values for an existing Airflow environment, go to **Admin** > **Configurations** in the Airflow UI. If you're running Airflow on Astronomer, you should modify these parameters with Astronomer environment variables. For more information, see Environment Variables on Astronomer.

You should modify environment-level settings if you want to tune performance across all of the DAGs in your Airflow environment. This is particularly relevant if you want your DAGs to run well on your supporting infrastructure.

### Core settings

Core settings control the number of processes running concurrently and how long processes run across an entire Airflow environment. The associated environment variables for all parameters in this section are formatted as `AIRFLOW__CORE__PARAMETER_NAME`.

- **Parallelism:** The maximum number of tasks that can run concurrently on each scheduler within a single Airflow environment. For example, if this setting is set to 32, and there are two schedulers, then no more than 64 tasks can be in a running or queued state at once across all DAGs.
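To illustrate the naming convention above, here is a minimal sketch of overriding `parallelism`, which lives in the `[core]` section of `airflow.cfg`. The value 32 mirrors the example in the text and is not a recommendation for any particular deployment:

```shell
# Environment-variable override: double underscores separate the
# AIRFLOW prefix, the config section ([core]), and the parameter name.
export AIRFLOW__CORE__PARALLELISM=32

# The equivalent airflow.cfg entry would be:
#   [core]
#   parallelism = 32

# With two schedulers, at most 2 * 32 = 64 tasks can be in a running
# or queued state at once across all DAGs.
echo "$AIRFLOW__CORE__PARALLELISM"
```

Environment variables take precedence over values in `airflow.cfg`, which makes them convenient for tuning a deployment without editing files inside the container or machine running Airflow.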