Introduction This document outlines a disaggregation methodology for converting regional commodity flows into counts of trucks traveling between arbitrary points. The results of this methodology may be used in a simulation-based transport model or aggregated for use in a more traditional travel demand model.
Purpose and Need Freight flows are an important consideration in statewide and large urban travel demand models, though data to create them can be difficult to obtain. The primary dataset for freight flows in the United States is the Freight Analysis Framework (FAF) developed by Oak Ridge National Laboratory on behalf of the Federal Highway Administration. The FAF is derived from the Commodity Flow Survey and contains modeled freight flows in dollars and tons for 43 commodities between approximately 130 regions. The principal challenge in using the FAF for transportation planning at any meaningful scale is that these regions are too large to adequately assign flows to highway links and thereby study policy or infrastructure alternatives. Practitioners and academics have developed many different methods for disaggregating these region-to-region flows into county-to-county or region-to-region flows. All methods have strengths and weaknesses; this new method is designed with a few specific goals in mind: • Simulation-ready: Outputs of the framework are discrete, individual plans that can be simulated with a transport engine such as MATSim. • Uniform: A single methodology that can be applied with minimal modification to study freight flows in any state or multi-state region. • Simplicity: Design the framework around a few basic steps. If a recommended improvement to the framework adds substantial complexity or requires substantial effort but produces only marginal benefit, it should be avoided. • Transparency: The disaggregation code is written clearly in free software (R and Python), and is available in a public GitHub repository. Datasets that the simulation uses are clearly referenced or linked to, and are free to the general public. Constructive comment and contributions are sought and welcome from all.1 The design paradigm for this new disaggregation methodology involves first discretizing the regional dollar and ton commodity flows into an integer number of trucks carrying the goods, and then selecting an origin and destination for them to travel between. The remainder of this document discusses these steps in turn. 1 Ideally,
with a pull request on GitHub.
Limitations Adopting a simulation framework provides many advantages in terms of writing clear and intuitive computer code, incorporating activity-based decision structures, and preparing outputs for advanced assignment methodologies. Nevertheless, This path required some design decisions that are based in judgement but not on data, and may need to be revisited in the future. Flows less than a quarter of a truck load are discarded, and all other flows are rounded up to a whole number. It would be equally justified to round all flows down to a whole number, or consolidate multiple small flows into a single truck configuration. We expect the consequences of this assumption are minimal, though we have not performed exhaustive sensitivity tests. Empty trucks travel between FAF regions based on the FAF traffic analysis methodology; more advanced methodologies such has matrix balancing are not used in an effort to be simple and transparent, though they may be more theoretically justified. Most travel demand models rely on the concept of an “average day”, where infrequent origin-destination flows may be represented by decimal values less than one. For example, imagine a zone pair with 100 trucks traveling between them each year. In an aggregate travel demand model, this would be represented as 100/365.25 = 0.2737851 trucks per day. In the simulation by contrast, this is represented as a random probability where each of the 100 trucks would have a 1/365.25 = 0.0027379 probability of running on a given day; in some simulated days none of these trucks would be modeled, but in some days one or more trucks could be. Figure ?? shows the frequency of different truck values at these probabilities in 10,000 repeated simulations. A consequence of this is the possibility of outlying simulations, and the need to test outputs for sensitivity to random variation. In principal major highway link volumes should be robust to such variation, but smaller projects could be highly sensitive.
Discretization The documentation for FAF version 3 included a methodology to convert freight flows to a number of trucks.2 There are three types of factors included with this methodology: 1. Allocation factors that determine the truck configuration (single-unit, combination-semi, etc.) based on the distance traveled (Table 3.3). 2 FAF3 Freight Traffic Analysis, http://faf.ornl.gov/fafweb/Data/Freight_Traffic_Analysis/ faf_fta.pdf
Figure 1: 2. Empty truck factors that determine the percent of empty trucks traveling between two FAF regions based on the configuration and the whether the shipment is domestic or a land-border shipment (Table 3.5).3 3. Payload equivalency factors that determine how much of each commodity a truck of a given configuration and vehicle type can carry (Appendix A). This method is based on 2007 commodity flow survey data processed by Oak Ridge National Laboratory; when an updated methodology based on the 2012 commodity flow survey becomes available the framework will be updated. Applying these factors to the FAF data results in a decimal number of trucks of each configuration – empty and full – traveling between FAF regions in a calendar year. The simulation framework used in the selection step requires a discrete number of trucks; as a consequence, we filter out flows less than one quarter of a truck load and round up to the nearest whole number. 3 NAFTA allows trucks to carry goods across borders, but not necessarily return full; hence, international shipments have a greater percentage of empty trucks.
Selection The heart of the disaggregation framework is a method by which trucks “select” their origins or destinations based on a size term. The (python) code for a truck to select its origin or destination county is: def pick_county(dict_table, sctg, zone): """ :param sctg: the commodity code for the truck's cargo :param dict_table: the appropriate lookup table :return: the O or D county FIPS code """ try: a = dict_table[zone][sctg] except KeyError: print "Key not found for zone ", zone, " and commodity ", sctg sys.exit() else: county = np.random.choice( dict_table[zone][sctg].keys(), p=dict_table[zone][sctg].values() ) return county The dict_table is a dictionary object that returns the selection probability of each county in a zone based on the commodity the truck is carrying. The random.choice function from the numpy package selects a random county based on the probabilities in the table. The rest of this section provides the provenance of all the size terms used by the framework.
Production-side The production-side size term is intended to represent the probability of a truck carrying a commodity selecting an origin county from within the given origin FAF region based on the workers in each county producing that commodity. This probability a function of two ratios: 1. the production of a commodity made by each industry to the total production of the in all industries (this is known as the make coefficient), and 2. the proportion of the producing industries’ employment located in a county relative to all employees of the relevant industries in the given FAF region. The primary assumption used in this calculation is that employees in an industry with multiple products are equally efficient at producing a dollar unit of each of 4
product. This assumption is directly related to an efficient markets hypothesis,4 but will not likely hold in all real-world situations. The commodity flow survey Public Use Micro data Sample (PUMS) file contains cleaned and anonymized responses to the 2012 CFS. The data include the industry5 of the firm making the shipment, the commodity shipped, and the shipment’s monetary value. From this, an analyst can calculate the inferred probability of workers in a particular industry shipping a dollar’s value of a commodity, or the first ratio in the size term. The values of these make coefficients by industry are represented in the figure below. The second ratio in the size term comes from the 2012 county business patterns file. This data contains the count of employees in each industry in each county. These calculations are straightforward, with the understanding that missing values are given the median of the stated range. For instance, if the CBP data is missing a count in a county for an industry and the missing code indicates between zero and 19 employees exist, the program treats this as if there were ten employees.
Figure 2: 4 If employees were more efficient at producing one good, the firm would maximize profits by producing only that good, or the price of the other goods would rise to match labor costs with demand. 5 Given in either 2- or 4- digit NAICS codes depending on the industry.
Attraction-side The CFS is a survey of shippers, and contains no data on the shipments received. Calculating the size term on the attraction side is therefore slightly more difficult. But as before, the size term is a function of two ratios: 1. The proportion of a commodity used by a specific industry relative to all industries using that commodity, and 2. The proportion of industry employment in a specific county relative to the total industry employment in the FAF region. The bureau of economic analysis produces input-output tables for industry-toindustry exchanges at a national level. The program reads one of these files, “Use of Commodities by Industries, Before Redefinitions (Producers’ Prices)”.6 This table contains the total dollar value of all goods and services that industry B buys from industry A. We assume that industry B buys commodities from industry A in proportion to industry A’s production of each commodity, a ratio we can calculate from the make coefficients above.7 The resulting ratios are known as use coefficients, and are shown graphically in the figure below.
Figure 3: 6 The 7 This
code reads the relevant table for 2012 directly from the BEA API. assumption has less theoretical support than the efficient production hypothesis
The second part of the size term is the proportion of use industry employment in each county of a FAF region, calculated as in the production size terms. The one alteration required is the addition of consumers to the model. Goods shipped directly to consumers use the ratio of all employment in all industries in a county to total FAF region employment. Employment is likely a better proxy for total purchasing power than raw population, and is available directly in the CBP dataset, reducing the need for an additional data set.
Disaggregation to TAZ For implementations where the code disaggregates to a set of traffic analysis zones instead of counties, the truck first picks a county, and then selects among the zones with centroids in that county. If a zone covers more than one county — as in external zones in a statewide model — then trucks bound for those counties simply select the same zone. For implementations where the county-to-county flows are further disaggregated to zone-to-zone flows, there are two additional sets of coefficients that relate zone-level employment by category (industrial, retail, etc.) to the likelihood that a truck carrying a good will be produced in or attracted to a particular zone. These coefficients are shown in Figure ??; the size term for a zone a in county A is the proportional regression quantity βXa b∈A βXb
where β are the coefficients on the make or use end of the trip and Xa are the zone’s socioeconomic variables. The coefficients are taken from the North Carolina statewide model. ## Joining by: c("Industry", "commodity") The regression coefficients are reasonable, with most trucks originating in zones with industrial employment and destined for zones with retail or industrial employment.
Imports and Exports The framework is only constructed to handle freight truck flows; intermodal, rail, sea, and air shipments are all filtered out. However imports that arrive in the United States by any mode may be shipped to their final destination by truck; the inverse pattern is allowed for exports. For this reason, the framework includes size tables for air, water, and highway imports and exports. The location of airports, seaports, and highway border crossings is available in the National Transport Atlas Database assembled by the Bureau of Transportation 7
Figure 4: Statistics. This database is compiled from information provided to BTS by the Federal Aviation Administration, the U.S. Army Corps of Engineers, and the Customs and Border Patrol, respectively. The airports database has the location of every airport in United States and its territories. Not all of these airports handle freight, or indeed are equipped to do so. The size term for airports is based on the 2006-2010 International Air Passenger and Freight Statistics Report published by the U.S. Department of Transportation.8 The seaports database includes the total volume (weight) of imports and exports handled in 2012. Because these numbers are comparable for the vast majority of seaports, we do not create separate tables for imports and exports. The size term is the sum of import and export volume handled by a seaport divided among all the ports in a FAF region. The highway border crossings database includes the number of trucks using each crossing in 2012. The data does not discriminate between inbound and outbound trucks, so we do not create separate import and export tables. The size term is the number of trucks using a border crossing divided among all the crossings in a FAF region. 8 https://www.transportation.gov/sites/dot.dev/files/docs/2006-2010%20Departures.txt
Trucks to and from Alaska The Alaskan highway network is not included in most disaggregation models. Trucks traveling from Alaska to the lower 48 states by highway through Canada enter via one of two crossings: trucks bound for Arizona, California, Idaho, Oregon, Nevada, Utah, and Washington enter via I-5 in Washington; trucks bound for the rest of the country enter via I-15 in Montana. The inverse corresponding relationship holds for trucks traveling from the lower 48 to Alaska.