Exploring Mexican Firm-Level Data

Leonardo Iacovone∗

April 12, 2008

Abstract

This paper discusses the firm-level data collected by INEGI and their content. It also documents how the different surveys can be merged and their main limitations.1

Keywords: Mexico, Plant-level data, INEGI, microdata

1 Introduction

The past ten years have witnessed an explosion in the number of studies using micro-level data. This has been made possible by an unprecedented increase in the availability of firm- and household-level longitudinal datasets. Mexico has a long tradition of collecting firm-level data, and this paper describes some of the principal firm-level surveys produced by INEGI.2 For each survey we discuss its content, its sampling methodology, and the principal problems encountered when linking them. We also explain the procedures used for cleaning and deflating the data. The main objective of this paper is to serve as background to the rest of the thesis, as the data described here are used in all the subsequent empirical analysis. We also hope that it will enhance knowledge of the Mexican sources of information available for further research and raise awareness of the remarkable work done by INEGI.

∗ University of Sussex and World Bank. Arts Building E, Falmer, Brighton BN1 9SN, United Kingdom. Email: [email protected] and [email protected].
1 The authors are grateful to Gerardo Leyva and Abigail Durán for granting access to INEGI data at the offices of INEGI in Aguascalientes under the commitment of complying with the confidentiality requirements set by the laws of Mexico. We would like to thank all the INEGI employees who helped during the work at Aguascalientes, and express our special gratitude to Alejandro Cano and Gabriel Romero, whose patience and camaraderie supported us during this work. We also thank Araceli Martinez, Armando Arellanes, Ramon Sanchez, Otoniel Soto, Candido Aguilar and Adriana Ramirez for their valuable comments and help. Leonardo Iacovone gratefully acknowledges the financial support of the ESRC and LENTISCO, and thanks Alan Winters, Gustavo Crespi and Sherman Robinson for their guidance and support.
2 Mexican authorities are committed to developing sound and complete statistical information that is useful for analysing social and economic dynamics and for evaluating the impact of specific policies. INEGI, Instituto Nacional de Estadistica Geografia y Informatica, is the institution responsible for fulfilling this task. INEGI’s work provides an impressive amount of high-quality data.

During several months spent at INEGI in Aguascalientes, I had the chance to work directly with INEGI analysts, interview a number of INEGI’s experts, consult internal methodologies for collecting, processing and revising the data, and analyse the data directly. Some of the material reviewed is published by INEGI and available online.3 However, a large part of it is not processed for public use and is accessible only at INEGI’s premises in Aguascalientes (INEGI 2002a, INEGI 2002b). INEGI does a remarkable amount of work, with a large number of analysts involved in processing the information and monitoring its quality. Working with the data alongside these analysts ensured that I could use the information adequately and understand its limitations. This document is a distilled result of that work and of the work of INEGI’s analysts, on which we relied heavily.

The paper is divided into two sections, each one covering a different survey.

2 EIA

The Annual Industrial Survey (EIA4 henceforth) is the main, and oldest, survey covering the manufacturing sector.5 It started in 1963, when it included only 622 plants spread over 29 classes of activity.6 Coverage was increased in 1976, 1987 and 1994, when the classes of activity covered were expanded to 57, 129 and 205 and the number of plants surveyed to 1,338, 3,218 and 6,867 respectively (see Table 1). Normally the expansion of the coverage coincided with a new industrial census, at which point an updated and more complete “picture” of the manufacturing sector was developed. This allowed “new” activities and firms to be included and those that were no longer important to be excluded. As the development of the Mexican economy has naturally led to a diversification of the “manufacturing activities” in Mexico, the coverage of the EIA has also expanded over time. For this reason, after the latest industrial census, carried out in 2003, the latest EIA saw a further expansion of the classes of activity covered to 231.7

Two distinct reasons drive the expansion in the number of “clases” covered. The first is the updating of the industrial classification system, with new classes of activity introduced and previously existing groups split into more detailed classes. The second, already mentioned above, is the diversification of the economy and the emergence of new manufacturing activities that were previously not important enough to justify their inclusion in the sample.

3 See INEGI website (http://www.inegi.gob.mx).
4 Encuesta Industrial Anual (or EIA).
5 It is important to note that the “maquiladoras” are excluded from the EIA; their information is collected by a specific, separate survey (i.e. Encuesta de la Industria Maquiladora de Exportación).
6 A class of activity is the most disaggregated level of industrial classification and is defined at 6 digits.

Table 1: EIA’s historical evolution (Source: INEGI)

Year   N. of manufacturing activities covered   N. of firms covered   Based on Industrial Census
1962   29                                       622                   1961
1975   57                                       1,338                 1976
1986   129                                      3,218                 1986
1993   205                                      6,867                 1993

The unit of observation is the plant, described as “the manufacturing establishment where the production takes place”. Each plant is classified in its class of activity on the basis of its principal product (at the 6-digit level of the CMAP8).

2.1 The “old EIA”

The old EIA covers the period 1984-1994. It encompasses 129 classes of activity9 and 3,300 plants. It is a balanced panel, exiting firms having been excluded from the sample, and the questionnaire was kept constant over the entire period.

7 When I carried out my “fieldwork” at INEGI in Aguascalientes, during the second semester of 2005, the latest year for which the EIA information had been processed was 2002. The “new new EIA”, spanning 231 classes, started in 2003 and is therefore not analysed here.
8 Mexican System of Classification for Productive Activities.
9 The system of classification is the CMAE75 (Clasificación Mexicana de Actividades Economicas).

The variables captured by the old EIA, as in many other industrial surveys, include the various inputs used by manufacturing plants (labour split into white and blue collars, raw materials, intermediate inputs, energy consumption, industrial services and maquila services, non-industrial services such as technology transfers) and the principal output indicators (value of production, value of sales, inventory, revenues derived from industrial services such as maquila and from non-industrial services such as technology transfers). The old EIA also captures the initial capital stock at book value (divided into machinery, buildings, land, and transport equipment), the depreciation of the capital stock during the year (also divided by type of fixed asset), and investments in new and used assets.

The principal difference between the “old” and “new” EIA is the absence of “trade related” variables in the former. In particular, the “old EIA” contains no information on the quantity of imported intermediate inputs or fixed assets. However, for certain years the World Bank financed a special effort to collect export information for the firms covered by the EIA. For this reason, we have information on the value of exports for the period 1986-1990.

The sampling method is “deterministic” and aims at capturing the “most representative” classes of activities and the larger establishments, while the 1986 Industrial Census was used as sampling frame.10

When we tried to link the “old EIA” with the “new EIA”, only a subset of the plants proved to be linkable. Out of the 3,300 plants in the “old EIA” and about 6,800 plants in the “new EIA”, only 2,300 can be followed across the two surveys.11 We perform our analysis using only the “new EIA” for four reasons. First, our analysis focuses mostly on trade related variables. Second, if we were to use the linked dataset, we would be starting our sample with only one third of the plants captured by the “new EIA”. Third, disaggregated tariff data, i.e. at six digits, are available for the period 1993-2002, whereas for the previous period we could only obtain tariffs at 2 or 3 digits of disaggregation. Finally, the “old EIA” contains no detailed product-level information, which is crucial in our analysis.

10 See section 2.2.1 for more details on the sampling methods.
11 The reasons for this mismatch could be various: (a) differences in the coverage of the two surveys, (b) problems with the use of the plant identifier.

2.2 The “new EIA”

The “new EIA” started in 1993 (coinciding with a new economic census) and represented a very important improvement over the “old EIA” in both quantitative and qualitative terms. The system of classification used for the “new EIA” is the CMAP94 (Clasificación Mexicana de Actividades y Productos).12 The first digit indicates the sector (the EIA only includes firms in sector “3”, the manufacturing sector). The first two digits indicate the “división” (sub-sector or division), e.g. “31” indicates “food products, beverages and tobacco”. The first four digits identify the “rama” (branch) of activity. Finally, the six-digit CMAP code indicates the “clase” of activity (e.g. “311203” indicates “preparation of condensed, evaporated and powdered milk”). The EIA uses the CMAP at 6 digits, which allows us to identify activities at a high degree of disaggregation.
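The nesting of the CMAP levels can be illustrated with a short sketch. The function and field names below are hypothetical and chosen only for illustration; the only information taken from the text is that the sector, división, rama and clase correspond to the first one, two, four and six digits of the code.

```python
from typing import NamedTuple


class CmapCode(NamedTuple):
    sector: str    # first digit, e.g. "3" for manufacturing
    division: str  # first two digits ("división")
    rama: str      # first four digits
    clase: str     # full six-digit class of activity


def split_cmap(code: str) -> CmapCode:
    """Split a six-digit CMAP class code into its nested levels."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a six-digit CMAP code, got {code!r}")
    return CmapCode(code[:1], code[:2], code[:4], code)


# The example class quoted in the text.
print(split_cmap("311203"))
```

Keeping the codes as strings rather than integers avoids losing leading digits when files are merged and makes prefix-based aggregation (for example, by rama) a simple slice.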

2.2.1 Coverage and sampling structure

As previously mentioned, the new EIA spans the period 1993-2002 and is very similar to the old EIA in terms of sampling methodology, which we describe here in detail. First, INEGI selected the manufacturing activities13 (clases) to be included, in the following way:

1. Based on the industrial census of 1993, the classes of activity are ranked in decreasing order of their total value of production, measured at the “factory gate” price.
2. The most important activities, which jointly represent 85 percent of total manufacturing output, are selected.
3. Finally, some other classes of special interest for defining the national accounts are added, even if their contribution to industrial output does not justify their inclusion.

12 This system of classification can be harmonised with international systems such as the SITC, regional systems such as SCIAN, and other Mexican systems adopted in the past such as the SCNM. For this purpose INEGI has developed appropriate tables of concordance.
13 Each manufacturing activity is captured by a specific six-digit code.

Second, INEGI proceeds to the selection of the plants within each of the chosen activity classes:

1. Plants are ranked in decreasing order of their total production value, measured at the “factory gate” price.
2. Plants are added to the sample until the selected plants cover approximately 85 percent of the class’s output value.
3. All plants with 100 or more employees are included automatically, even if the 85 percent threshold has already been reached.14
4. For the highly disaggregated classes,15 whenever the normal sampling procedure implies that more than 120 plants need to be surveyed to reach the 85 percent threshold, the number of plants surveyed is capped at 120. As a result, for “highly disaggregated” sectors the actual coverage is about 60 percent of the total manufacturing output of the class.
5. For the highly concentrated classes,16 where the 85 percent threshold is reached by covering fewer than 15 plants, all the plants are included.

As already mentioned, the new EIA covers 205 of the 309 6-digit classes of the CMAP 1994. It covers all 50 “ramas” and all 9 divisions (sub-sectors) of the CMAP 1994. The number of firms covered in 1993 is 6,861, and it decreases over time because of attrition. Furthermore, it is important to note that entering firms are captured in a non-systematic way (see section 2.4.3 for more details on entry).
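The plant-selection steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not INEGI’s procedure: the names `Plant` and `select_plants` are hypothetical, and the sketch assumes that the cap of 120 applies to the plants picked through the 85 percent rule while plants with 100 or more employees are always added, which the text does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plant:
    plant_id: str
    output: float   # production value at factory-gate prices
    employees: int


def select_plants(plants, share=0.85, big=100, cap=120, few=15):
    """Deterministic selection of plants within one class of activity."""
    # Step 1: rank plants by production value, largest first.
    ranked = sorted(plants, key=lambda p: p.output, reverse=True)
    target = share * sum(p.output for p in ranked)
    # Step 2: add plants until cumulative output reaches the target share.
    chosen, running = [], 0.0
    for p in ranked:
        if running >= target:
            break
        chosen.append(p)
        running += p.output
    # Step 5: highly concentrated class, so every plant is surveyed.
    if len(chosen) < few:
        return ranked
    # Step 4: highly disaggregated class, so threshold picks are capped.
    chosen = chosen[:cap]
    # Step 3: plants with `big` or more employees are always included
    # (assumption: this holds even when the cap binds).
    picked = {p.plant_id for p in chosen}
    chosen += [p for p in ranked
               if p.employees >= big and p.plant_id not in picked]
    return chosen
```

Because the rule is deterministic and size-based, the resulting sample is a cut-off sample rather than a random one, which is why the text stresses that the EIA is skewed towards larger plants.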

As a consequence of this sampling method, the EIA is clearly skewed towards larger firms. While the 1993 census covered 106,748 plants, the plants covered by the EIA amount to only 6.5 percent of that number. Nevertheless, they represent about 85 percent of total Mexican industrial output. As reported in Table 2, the average plant in the EIA has 188 employees, and about a quarter of the plants surveyed are large, with 423 employees on average.

14 This means that the EIA is in reality a census for plants with more than 100 employees.
15 These are classes of activities characterised by small plants and a large number of manufacturing establishments (e.g. textile, footwear, etc.).
16 These are classes of activities characterised by a small number of large plants, in other words sectors where industrial concentration is higher (e.g. chemical, machineries, etc.).

Table 2: Average Size and Stratification

Stratum   Mean No. Employees   Median No. Employees   No. Plants
Small     50                   48                     3354
Medium    160                  152                    1908
Large     423                  541                    1463
All       188                  101                    6725

Notes:
- Based on EIA 1993.
- Based on INEGI stratification, small plants have more than 15 and fewer than 100 employees, medium plants have more than 100 and fewer than 250 employees, and large plants have more than 250 employees.
- We exclude “extreme observations” and missing values.

Once a firm is included in the sample, it is classified on the basis of its principal product. Each establishment then fills in one questionnaire. In a few cases, however, the owner of a plant also owns other plants producing the same product. In this case, the owner can ask to aggregate the information of the different plants into a single questionnaire, “as if” they were one large plant.17 Since 1997, INEGI has identified those plants that concentrate the information of multiple establishments. There may be “concentradoras” in the years before 1997 that remain in the sample but that we are unable to identify. The same applies to “concentradas” plants before 1997. However, these are in principle easier to identify, because they would appear with all-zero values while still being maintained in the sample. We identify both the “concentradoras” and the “concentradas” plants and are able to exclude them in our robustness checks.

2.3 Content

The EIA, like other industrial surveys, contains variables related to labour force, inputs and costs, investment, output and revenues. In detail:

• Labour force related variables
  – Total number of workers
  – Total wages and, separately, total social contributions paid
  – Total hours worked
• Costs related variables
  – Costs of intermediate goods and materials, split between domestic and imported
  – Costs of packaging
  – Costs of fuel and lubricants
  – Costs of industrial services, including maintenance and repair, as well as “maquila services”18
  – Costs of non-industrial services, such as commissions paid to retailers and merchants, transport and distribution costs
  – Expenditures for technology transfers
  – Marketing and advertising costs
  – R&D expenditures
  – Energy expenditures and quantity of energy consumed
• Revenues related variables
  – Domestic and export sales
  – Total production value evaluated at average gate prices
  – Revenues from “maquila services”
  – Revenues from maintenance and repair services
  – Revenues from technology transfers
• Inventories of intermediate goods, raw materials, finished and semi-finished products
• Book value of fixed assets, and investments split into different categories
  – Machinery acquired domestically and imported, split between new and second-hand
  – Buildings
  – Transport equipment
  – Others (i.e. office equipment)
  – Equipment for reducing and controlling pollution
  – Land

17 These plants are known as “concentradoras”, while the sub-units aggregated into them are known as “concentradas”.
18 Maquila services are a special type of sub-contracting service in which the sub-contracted firm receives all the inputs and materials to be processed.

2.4 Additional Information

2.4.1 Ownership

It is important to note that while all the other variables are obtained annually, the information on foreign ownership was collected only in 1994,19 with the Industrial Census, and subsequently dropped from the questionnaire. For 1993, we therefore have the ownership share and the nationality of the plants’ owners.

Table 3: Firms Ownership

Ownership              Number   Percentage
Domestic               5,979    87.3
Foreign Participated   317      4.6
Foreign Owned          553      8.1

Notes: Foreign participation is defined for plants with a foreign share smaller than or equal to 50 percent. Foreign ownership is defined for plants with a foreign share larger than 50 percent.

Unfortunately, because of its questionnaire design, the EIA does not allow us to identify plants that are part of a multi-plant complex, as there is no question concerning ownership.20

19 The information is collected in 1994 but refers to 1993.
20 However, a “special project” carried out in 2005 by the department administering the Monthly Industrial Survey (EIM henceforth) identified those plants that were part of multi-plant firms (see section 3.3.1 for more details).
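The ownership categories defined in the notes to Table 3 can be written as a simple rule. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a plant with a zero foreign share is the one counted as “Domestic”, which the notes imply but do not state.

```python
def ownership_category(foreign_share: float) -> str:
    """Classify a plant by its foreign ownership share (in percent)."""
    if not 0 <= foreign_share <= 100:
        raise ValueError("foreign share must lie between 0 and 100")
    if foreign_share == 0:
        return "Domestic"  # assumption: no foreign capital at all
    if foreign_share <= 50:
        return "Foreign Participated"
    return "Foreign Owned"


# Counts reported in Table 3 and the percentages they imply.
counts = {"Domestic": 5979, "Foreign Participated": 317, "Foreign Owned": 553}
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(shares)
```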

2.4.2 Capital stock

Obtaining a correct measure of the capital stock is especially important, both when we focus on labour productivity and need to control for the capital intensity of a plant and when we estimate total factor productivity. To obtain such a measure we proceeded in the following way.21 First, from the 1994 Industrial Census22 we obtained the value of the capital stock at its replacement value.23 However, the matching between the EIA and the Census is imperfect, and we were unable to match about 14 percent of the plants in the EIA. For these plants we used the book value of their capital stock.

An alternative method explored to evaluate the initial capital stock is described in appendix A. When comparing the capital stock estimated using this method with the book value capital stock, we observe a correlation of 0.92. The distributions of the two capital stocks are presented in Figure 1, where K2 is the estimated capital stock and K3 is the capital stock at book value. Because of the similarity between the two series, and because previous papers opted for the capital stock calculated at book value, we used the book value capital stock (Verhoogen 2008, Lopez-Cordova 2003).

A question worth asking is whether the subset of plants for which the initial capital stock at replacement value is missing is a random subset. We can try to answer this question by comparing the capital stock, evaluated at book value, for two groups of plants: those for which we do not have the replacement value (dashed line) and those for which we do (continuous line), as in Figure 2. The plants for which the replacement value is missing appear to have a smaller initial capital stock.24

21 In my work I have tried to improve on previous work using the Mexican data, which either used only the plants for which the replacement value of the initial capital stock is available or simply used the initial capital stock at book value for all plants. A second improvement with respect to previous studies is the use of specific capital asset deflators.
22 In this case too, the information is collected in 1994 but refers to 1993.
23 While in the EIA plants are asked to indicate the historical value, or book value, of their capital stock, in the census the question explicitly asks for the market replacement value of the capital stock.
24 Unfortunately, our discussions and interviews with INEGI’s experts did not reveal a reason for this, as INEGI officers insisted that the missing links between the 1993 industrial census and the EIA can be considered random.

[Figure 1: Distribution of estimated capital stock and book value capital stock. Kernel densities of lnK2 and lnK3.]

[Figure 2: Distribution of capital stock at its book value. Horizontal axis: log of capital stock in 1993. Series: book value for plants with missing replacement value; book value for plants with existing replacement value.]

Once we have obtained the initial value of the capital stock for all plants, we can calculate the value of the capital stock in the following years using the perpetual inventory method formula

$$k_{t+1} = k_t (1 - \delta) + I_t, \qquad t \in [1993, 2001] \tag{1}$$

where k_t is the capital stock and I_t the investment in year t. The depreciation rates, δ, are set equal to the mean of the depreciation band offered by the fiscal authorities (see Table 4).

Table 4: Depreciation Rates

Type of Fixed Assets            Fiscal Depreciation Band   Applied Depreciation Rate
Machineries and equipment       5-15%                      10%
Buildings                       3-8%                       5.5%
Transport equipment             15-25%                     20%
Office equipment and others     7-35%                      21%
Anti-Contamination equipment    5-50%                      27.5%
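As a minimal sketch of how equation (1) can be applied asset by asset, the snippet below iterates the perpetual inventory recursion using the applied rates from Table 4. The dictionary keys, the function name and the example figures are hypothetical; only the recursion and the depreciation rates come from the text.

```python
# Applied annual depreciation rates from Table 4, expressed as fractions.
DEPRECIATION = {
    "machinery": 0.10,
    "buildings": 0.055,
    "transport": 0.20,
    "office": 0.21,
    "anti_contamination": 0.275,
}


def capital_path(k_initial, investment, delta):
    """Return [k_0, k_1, ...] from k_{t+1} = k_t * (1 - delta) + I_t.

    `investment` lists I_t for consecutive years, starting with the
    year of the initial stock (1993 in the text).
    """
    path = [k_initial]
    for inv in investment:
        path.append(path[-1] * (1.0 - delta) + inv)
    return path


# Made-up numbers: a machinery stock of 100 with investments of 10 and 0.
print(capital_path(100.0, [10.0, 0.0], DEPRECIATION["machinery"]))
```

Running the recursion separately for each asset type and summing the results keeps the asset-specific depreciation rates and deflators from being averaged away.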

Finally, we need to deflate these nominal values and transform them into constant 1994 pesos. For this purpose, we applied asset-specific deflators obtained from Banco de Mexico for each of the five types of fixed assets: machinery, buildings, office equipment, transportation equipment, and land.

2.4.3 Entry and Exit

INEGI tries to refresh the EIA sample by including newly created firms. However, the identification of these firms, the new entries into the EIA panel, is not done in a systematic way. One specific department within INEGI is in charge of “updating” the sample, and this has traditionally been done by relying mostly on local and national media. Also, whenever the number of firms included in a class of activity shrank below 8, this department actively looked for newly established firms in order to satisfy the “confidentiality” requirements of the survey. Unfortunately, there is no formal agreement between INEGI and other Mexican institutions maintaining an updated administrative register of existing firms, which would allow a continuous refreshing of the sample and also give a picture of new “entries”.25

A unique and interesting feature of the new EIA is the way it captures exits. Plants are not eliminated from the sample at the precise moment when they become inactive, but are kept as “suspended” for two years in a type of “stand-by mode”.26 After two years of “suspension”, the plant exits the sample and the causes of this exit are recorded in detail.27 This information on exit is kept separately, and we merged it with the main panel.

25 Since 2005, INEGI has started to make use of the administrative registries of the Ministry of Finance to refresh the sampling framework and try to better capture entry. However, this new method is applied only to the “new new EIA” survey starting in 2003.
26 The rationale is that the suspension could just be temporary.
27 The causes behind an exit can be: merger, switching of class of activity, change of activity, change of trade name, disappeared, information reported by another plant, duplicated, administrative merger, strike, liquidation, export maquila, domestic maquila, bankruptcy, unwilling to provide information, accident, suspension of operations.

2.5 Data Management

2.5.1 Linking multiple waves and building a panel

Each plant surveyed by the EIA was assigned in 1993 an identifier composed of its 6-digit class of activity and an additional 4-digit code (“folio”). Jointly, these two codes allow us to uniquely identify each plant and follow it over time. We build the panel using this 10-digit unique plant identifier.

Whenever a plant closes down, its identifier disappears and is not used again. Analogously, whenever a new plant is included in the sample, it is assigned the corresponding 6-digit class code, based on what it produces, and a new 4-digit “folio”.

2.5.2 Deflating Variables

All variables reported in the EIA are in current nominal values, so it was necessary to transform them into constant real values. To do this appropriately we used different deflators and transformed all nominal values into constant 1994 pesos.28 Domestic sales were deflated using the producer price index at 6 digits provided by Banco de Mexico.29 Similarly, net inventories and maquila revenues were deflated using the same producer price index at 6 digits. Export sales were deflated using the export-producer index at 2 digits provided by Banco de Mexico. Labour costs were deflated using the consumer price index provided by Banco de Mexico with base year 1994.30 Domestic intermediate inputs were deflated using the 4-digit intermediate inputs price index published by Banco de Mexico. To deflate imported intermediate inputs we used the US intermediate inputs price deflator for exported non-agricultural supplies and materials (excluding fuels and building materials), adjusted for exchange rate fluctuations.31

28 Most price deflators, except the ones for fixed assets, can be directly downloaded from Banco de Mexico (www.banxico.org.mx).
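The deflation step can be sketched as follows. All names and index values here are made up for illustration, and the exchange-rate adjustment is assumed to be a simple multiplication of the dollar index by the peso-per-dollar rate, which the text does not specify.

```python
def to_constant_pesos(nominal, index, year, base_year=1994):
    """Express a nominal value in constant base-year pesos."""
    return nominal * index[base_year] / index[year]


def peso_import_index(usd_index, pesos_per_dollar):
    """Convert a dollar-denominated price index into a peso index.

    Assumption: the adjustment multiplies the dollar index by the
    exchange rate of the same year.
    """
    return {year: usd_index[year] * pesos_per_dollar[year]
            for year in usd_index}


# Hypothetical producer price index for one 6-digit class.
ppi_class = {1994: 100.0, 1995: 135.0}
print(to_constant_pesos(270.0, ppi_class, 1995))
```

Because the function rebases on the fly, the same call works whether the published index has 1994 or any other year as its base, provided the 1994 value is present.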

2.5.3 Data cleaning

We already mentioned that the EIA does not include the “export maquiladoras”.32 Nevertheless, we want to address the possibility that some maquiladoras have been included in the sample “by mistake”. To this end we defined as a maquiladora any plant that exports all its production and imports all its inputs. We identify and exclude from the panel all the plants that appear as “potential maquiladoras”, even for just one year. In the end, only 15 plants are identified as “potential maquiladoras”.
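The maquiladora filter can be sketched as below. The field names are hypothetical, and the sketch assumes that “exports all its production” can be checked by comparing export sales with total sales; the exact variables used are not stated in the text.

```python
def potential_maquiladoras(rows):
    """Return ids of plants that, in at least one year, export all
    their sales and import all their intermediate inputs."""
    flagged = set()
    for r in rows:
        exports_all = (r["total_sales"] > 0
                       and r["export_sales"] == r["total_sales"])
        imports_all = (r["total_inputs"] > 0
                       and r["imported_inputs"] == r["total_inputs"])
        if exports_all and imports_all:
            flagged.add(r["plant_id"])
    return flagged


def drop_plants(rows, flagged):
    """Remove every plant-year belonging to a flagged plant."""
    return [r for r in rows if r["plant_id"] not in flagged]
```

Dropping all years of a flagged plant, rather than only the offending year, mirrors the “even for just one year” rule and keeps the remaining panel free of partial histories.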

As previously explained, some firms are included in the sample as “entry”. However, this is not done in a systematic way. For this reason we identify these plants in order to be able to exclude them when evaluating the robustness of our results.

Another issue to be resolved is the presence of extreme values. The common solution to this problem is some type of trimming. The two most common options in the literature, discussed by Angrist and Krueger (1999), are winsorizing and truncating. Winsorizing consists of setting the observations in the tails, for instance below the 5th and above the 95th percentile, equal to the value of the observation at the 5th and 95th percentile respectively. Truncating consists of eliminating the observations in the tails altogether. As a general rule of thumb, Angrist and Krueger (1999) suggest that winsorizing should be preferred when the extreme values are exaggerated versions of the true values, but the true values still lie in the tails, whilst truncating should be used when extreme values are pure mistakes that do not bear any resemblance to the true values.

During the period spent at INEGI, I had the chance to discuss these issues with those responsible for the survey, as well as with the analysts in charge of ensuring the reliability of the data. Based on these discussions, the evaluation of the internal data revision process, and the direct analysis of the data, we concluded that the information collected by INEGI could be considered reliable. However, because we were concerned that extreme values might be due to mistakes in collecting or inputting the data, we identified as “potential extreme values” the top and bottom 1 percent of the observations. These observations were flagged and excluded during the robustness checks.

29 Banco de Mexico classifies economic activities using the CMAE, the Mexican classification of economic activities. It was therefore necessary first to match the 6-digit CMAE clases with the respective CMAP clases; this was possible because INEGI has developed an appropriate conversion table.
30 Ideally we would have preferred to use wholesale prices, because this would avoid incorporating into the deflators issues related to imperfect competition and the market power of the retail sector, but we were unable to obtain this price index.
31 This can be downloaded from the Bureau of Labor Statistics.
32 These are firms that benefit from a special system of tax exemptions because they import most of their intermediate inputs and export most of their output.
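The difference between the two trimming options can be made concrete with a small sketch. The helper names are hypothetical, and the nearest-rank percentile definition is an assumption chosen for simplicity; statistical packages often use interpolated percentiles instead.

```python
import math


def percentile(sorted_vals, q):
    """Nearest-rank percentile (integer q in 1..100) of an ascending list."""
    rank = max(1, math.ceil(q * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def winsorize(values, lower=5, upper=95):
    """Pull observations in the tails back to the cut-off values."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo), hi) for v in values]


def truncate(values, lower=5, upper=95):
    """Drop observations in the tails altogether."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [v for v in values if lo <= v <= hi]
```

Winsorizing keeps the sample size unchanged, whereas truncating shrinks it; this is the practical reason the choice between them depends on whether the tail observations carry any information about the true values.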

Finally, in order to confirm that there were no mistakes in the data we ran a set of identity checks to confirm that:

• the value of total sales is equal to the sum of domestic plus export sales;
• the value of total intermediates is equal to the sum of domestic plus imported intermediates; and
• the value of total costs is equal to the sum of the individual costs.

Whenever any of these identity checks failed we discussed the problem with INEGI’s analysts and, when they could not provide a solution or an explanation, we flagged the observation in order to exclude it during our robustness checks.33

33 A remarkable characteristic of the way the EIA is administered is that each analyst is allocated a certain number of plants (on average 150) to follow up, and every year is in charge of analysing the responses of the same plants. Whenever the responses appear out of line with previous years or with what would be reasonable, the analyst calls the plant to confirm the results. Indeed, in one case, when running some checks on the data, we found that the employment of a plant had dropped by more than 50 percent from one year to the next and thought this was a mistake. However, when we were able to meet the responsible analyst, he explained to me that this was not a mistake but was due to a strike that had paralysed the plant for more than six months.
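The three identity checks can be sketched as a single function that returns the names of the identities a record fails. The field names are hypothetical, and the small relative tolerance is an assumption added to absorb rounding; the text itself speaks of exact equality.

```python
import math


def failed_identities(row, rel_tol=1e-6):
    """Return the names of the accounting identities that `row` violates."""
    checks = {
        "sales": (row["total_sales"],
                  row["domestic_sales"] + row["export_sales"]),
        "intermediates": (row["total_intermediates"],
                          row["domestic_intermediates"]
                          + row["imported_intermediates"]),
        "costs": (row["total_costs"], sum(row["cost_items"])),
    }
    return [name for name, (total, parts) in checks.items()
            if not math.isclose(total, parts, rel_tol=rel_tol)]
```

Returning the list of failed checks, rather than a single boolean, makes it easy to flag an observation for the robustness checks while recording which identity prompted the follow-up.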

3 EIM

The EIM34 is a monthly survey that is collected by INEGI to monitor short-term trends. Traditionally, the survey has been run in parallel with the EIA and covers the same plants. The principal differences with EIA are its periodicity and the variables collected. Also, within INEGI two different departments are responsible for collecting, processing and analysing the two surveys.

3.1 Historical Evolution and Coverage

Similar to the EIA, the Mexican authorities started collecting monthly industrial data in 1964. However, the number of manufacturing activities covered was initially limited. This was the case until 1987, when the EIM was expanded to cover 129 “clases” and an initial sample of 3,218 firms. It was expanded even further in 1994, with an initial sample35 of 6,884 firms covering 205 “clases”. The number of firms decreases over time because of attrition. For details see Table 5.

Table 5: Number of firms in the EIM 1994-2002

Year              1994    1995    1996    1997    1998    1999    2000    2001    2002
Number of firms   6,711   6,683   6,608   6,350   6,008   5,753   5,551   5,378   5,173

There are some cases of mismatch between the EIA and the EIM regarding the number of plants covered, because of the timing of these two surveys (see section 3.4.1 for more details).

As with the EIA, INEGI runs a number of filters on the EIM to check the data obtained from respondents. Each analyst is responsible for analysing the same plants (about 150) every month and, whenever the responses fall outside the “expected ranges”, calls the respondent to double-check the information. In certain cases, when errors and inconsistencies are discovered after some delay, the information provided for previous months is revised and updated.

34 Encuesta Industrial Mensual.

35 As for the EIA, the sample base is provided by the 1993 industrial census and the sample covers about 85 percent of the included industrial activities. For more detail about the sampling process see section 2.2.1, as the sample structure of the two surveys is identical.


3.2 Content

The EIM contains fundamentally two groups of variables: labour-force related and output related variables. In detail:

• Labour-force related variables
  – Total number of workers, broken down into blue collars (“obreros”) and white collars (“empleados”)
  – Total wages, net of social contributions, with the same breakdown as the labour force
  – Social contributions paid to workers, with the same breakdown as the labour force
  – Total hours worked, with the same breakdown as the labour force
• Output related variables
  – Revenues from maquila services
  – Revenues from “maintenance and assistance” services
  – Total production
  – Net sales
  – Export sales
• Installed capacity usage

It is important to make two remarks with respect to the variables capturing production, sales and exports. First, the plants are asked to report both values and quantities, so an implicit average unit price can be calculated. Second, for these variables the plant is requested to distinguish each of its products, so the variables are reported product by product. In 1993, INEGI defined a list of products for each 6-digit class36 from which the plant can choose. If a product is not in the list, it is recorded as “other non-generic products” or “residues and sub-products”. The weight of these two residual categories is negligible for most firms (less than 2 percent on average). In the first two columns of Table 6 we show the average and the median weights of these residuals for sold products, exported products and produced products. Even for plants with a relatively high share of “residual products” this is not a major problem: for plants at the 90th and 95th percentiles the residuals are never above 8 percent of total output. Only for plants above the 99th percentile does this appear to be a serious issue, because the weight of their residual products is equal to about one third of their output. These plants are identified and excluded in our robustness checks.

Table 6: Weight of residual products

                    Mean   Median   90th Pctile   95th Pctile   99th Pctile
Sold Products       1%     0        2%            8%            27%
Exported Products   1%     0        0             6%            33%
Produced Products   1%     0        2%            8%            27%

Source: EIM, INEGI

36 This list was developed based on the census and previous surveys.
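As an illustration of the two remarks on product-level reporting, the following minimal Python sketch computes an implicit unit price from a reported value and quantity, and the share of a plant's output falling into the two residual categories. The record layout (`product` and `value` keys) is a hypothetical choice made for the example, not the EIM's actual file structure.

```python
# Labels of the two residual categories named in the text.
RESIDUAL_CATEGORIES = {"other non-generic products", "residues and sub-products"}


def implicit_unit_price(value, quantity):
    """Average unit price implied by a reported value and quantity."""
    if quantity is None or quantity <= 0:
        return None  # undefined when no positive quantity is reported
    return value / quantity


def residual_share(product_lines):
    """Share of a plant's total value recorded under the residual categories.

    `product_lines` is a list of dicts with hypothetical keys
    'product' and 'value', one per product reported by the plant.
    """
    total = sum(line["value"] for line in product_lines)
    if total == 0:
        return None
    residual = sum(line["value"] for line in product_lines
                   if line["product"] in RESIDUAL_CATEGORIES)
    return residual / total
```

Plants whose residual share is very high (the text singles out those above the 99th percentile) could then be identified by comparing this share across plants.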

3.3 Additional Information

3.3.1 Ownership

Normally, the EIM questionnaire does not contain any information regarding ownership. However, during 2005-2006 a special module was run in which plants were asked whether they were part of a multi-plant complex or a single plant. This information can be linked to the main EIM variables using the plant identifiers. We do so for 2003, and the number of multi-plant firms by sector is detailed in Table 7: there are 458 multi-plant firms with an average of three plants per firm, and 3,791 single-plant firms.

Table 7: Multiple- and single-plant firms

                     No. of Firms   No. of Plants   Average No. of Plants
Multiplant Firms     458            1,245           3
Single-Plant Firms   3,791          3,791           1

Notes:
- Based on 2003 data
- Based on trimmed data


3.4 Data Management

3.4.1 Panel Creation

In the EIM, as in the yearly industrial survey (EIA), a firm can be tracked over time using a unique plant identifier. This is built in the same way as for the EIA, and actually coincides with it (see section 2.5.1 for details). Based on these identifiers a panel using the individual monthly EIM can be built.

Having built the panel, we dropped the observations belonging to the “residual categories” (see section 3.2). The result is a panel with 187,533 observations, with multiple observations per plant because plants normally produce multiple products. The number of plants and products present in the panel is reported in the table.

We then annualised the information provided by the EIM in order to link this panel with the EIA panel. The two surveys can be linked using the same plant identifier. However, the matching between them is imperfect because of their timing: the information for the EIA is collected between April and July of the following year, while the information for the EIM is collected during the following month.37 The resulting panel, which merges the information from the EIA and the EIM, is our main dataset; Table 8 reports the number of plants and products present in this panel.
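The annualisation and linking step can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how each variable is annualised, so the sketch simply sums monthly flows (stock variables such as employment would presumably need averaging instead), and the record keys (`plant_id`, `product`, `year`, `sales_value`, `export_value`) are hypothetical.

```python
from collections import defaultdict


def annualise(monthly_records, fields=("sales_value", "export_value")):
    """Sum monthly flows into one record per (plant_id, product, year).

    Summation is an assumption suitable for flow variables only.
    """
    totals = defaultdict(lambda: dict.fromkeys(fields, 0.0))
    for rec in monthly_records:
        key = (rec["plant_id"], rec["product"], rec["year"])
        for field in fields:
            totals[key][field] += rec[field]
    return dict(totals)


def link_to_eia(annual_eim, eia_records):
    """Attach EIA plant-year records to annualised EIM rows.

    Returns (matched_rows, unmatched_keys); unmatched keys are EIM
    plant-product-years with no EIA counterpart for that plant and year.
    """
    eia_index = {(rec["plant_id"], rec["year"]): rec for rec in eia_records}
    matched, unmatched = [], []
    for (plant_id, product, year), values in annual_eim.items():
        eia = eia_index.get((plant_id, year))
        if eia is None:
            unmatched.append((plant_id, product, year))
        else:
            matched.append({"plant_id": plant_id, "product": product,
                            "year": year, **values, "eia": eia})
    return matched, unmatched
```

Keeping the unmatched keys separate makes it easy to count how many plant-years are lost to the timing mismatch between the two surveys.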

3.4.2 Data Cleaning

As described for the EIA, we apply an analogous trimming to the main variables of interest obtained from the EIM:

• Domestic unit values, equal to the ratio of domestic sales revenues to domestic quantities sold
• Export unit values, equal to the ratio of export sales revenues to export quantities sold
• Domestic sales, export sales and total sales per product
• Number of employees and wages paid to blue collars
• Number of employees and wages paid to white collars

We flag the trimmed observations and exclude them in our robustness checks. In the case of product-level unit values, because we are particularly concerned with noise and errors at such a level of disaggregation, we also flag those cases where the yearly increase is larger than 300 percent or the yearly decrease is larger than 65 percent (basically, a rise or a fall in the unit value by a factor of roughly three).

Table 8: Number of plants and products - merged EIA and EIM panel

        No. of Plants       No. of Products
Year    All     Exporting   Sold     Exported
1994    6,299   1,586       19,314   2,857
1995    6,070   1,880       19,284   3,526
1996    5,786   2,061       18,229   3,989
1997    5,572   2,161       17,325   4,186
1998    5,400   2,106       16,761   4,269
1999    5,255   1,967       16,226   3,962
2000    5,118   1,914       15,522   3,796
2001    4,952   1,780       14,924   3,555
2002    4,782   1,696       14,404   3,357

37 For example, a plant may operate until March 2000 and then close, in which case it would be captured by the EIM during all of 1999 and the first two months of 2000 but would appear in neither the EIA-1999 nor the EIA-2000, as the information for these two is collected between April and July of 2000 and 2001 respectively.
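The additional flag on product-level unit values can be illustrated with a short Python sketch. It reads the thresholds literally, as a year-on-year change above +300 percent or below -65 percent; that literal reading, the restriction to consecutive years, and the input layout (a mapping from year to unit value for one plant-product pair) are assumptions made for the example.

```python
def flag_unit_value_jumps(unit_values, max_increase=3.00, max_decrease=0.65):
    """Return the set of years whose unit value changed too sharply.

    `unit_values` maps year -> unit value for one plant-product pair.
    A year is flagged when the change relative to the previous year
    exceeds +300 percent or falls below -65 percent.
    """
    flagged = set()
    years = sorted(unit_values)
    for prev_year, year in zip(years, years[1:]):
        if year - prev_year != 1:
            continue  # only compare consecutive years
        prev, curr = unit_values[prev_year], unit_values[year]
        if prev is None or curr is None or prev <= 0:
            continue  # change is undefined without a positive base value
        change = (curr - prev) / prev
        if change > max_increase or change < -max_decrease:
            flagged.add(year)
    return flagged
```

Flagged plant-product-years would then be excluded in robustness checks rather than dropped from the panel.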

4 Conclusion

In this paper I have described the sources of information that will be used in the subsequent empirical analysis. As emerges from this description, the Mexican plant-level data provide an extremely rich dataset, with detailed information not only at the plant level but also at the product level.


References

Angrist, J. D., and A. B. Krueger (1999): “Empirical Strategies in Labor Economics,” in Handbook of Labor Economics, vol. 3A, pp. 1277–1366. Elsevier Science, North-Holland, New York and Oxford.

INEGI (2002a): Síntesis Metodológica de la Encuesta Industrial Anual.

INEGI (2002b): Síntesis Metodológica de la Encuesta Industrial Mensual.

Lopez-Cordova, E. (2003): “NAFTA and Manufacturing Productivity in Mexico,” Economia: Journal of the Latin American and Caribbean Economic Association, 4(1), 55–88.

Verhoogen, E. A. (2008): “Trade, Quality Upgrading and Wage Inequality in the Mexican Manufacturing Sector,” Quarterly Journal of Economics, 123(2).


A Methodology to calculate the initial capital stock using the perpetual inventory method

An alternative method for calculating the capital stock, in the absence of the initial capital stock, exploits the perpetual inventory method (henceforth PIM).

Based on the PIM, the capital stock at time t is equal to

$$K_t = I_t + (1 - \delta) K_{t-1} \tag{2}$$

Consequently, in order to calculate the capital stock at time $t$ we need three variables: $K_{t-1}$, $\delta$ (the depreciation rate), and $I_t$. Normally $I_t$ is reported in the survey, while $\delta$ is a parameter that is given exogenously.

Complications arise in obtaining $K_{t-1}$. In the case of the EIA, we acquire this variable from the Industrial Census for most of the plants.
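When an initial capital stock is available, the PIM recursion of equation (2) can be rolled forward year by year. The following is a minimal Python sketch; the function name and the example figures are illustrative assumptions, not values from the surveys.

```python
def capital_stock_series(initial_stock, investments, delta):
    """Roll the capital stock forward with K_t = I_t + (1 - delta) * K_{t-1}.

    `initial_stock` is the stock in the year before the first investment
    figure; `investments` lists I_t for each subsequent year in order.
    """
    stocks = []
    previous = initial_stock
    for investment in investments:
        previous = investment + (1.0 - delta) * previous
        stocks.append(previous)
    return stocks


# Illustrative figures only: initial stock of 100 and 50 percent depreciation.
example = capital_stock_series(100.0, [10.0, 20.0], delta=0.5)
```

With these illustrative inputs the stock is 10 + 0.5 × 100 = 60 after the first year and 20 + 0.5 × 60 = 50 after the second.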

When there is a large number of firms for which the capital stock is never reported, it must be imputed in some manner. One possible methodology is the following:

1. Calculate investment at the sectoral level38 using the aggregate investment series for all the available years (in our case, the period 1993-2002).

2. On the basis of this investment series, obtain the initial sectoral capital stock as

$$K_{0j} = \frac{I_{0j}}{\delta + g_j} \tag{3}$$

where $j$ is the 4-digit industry, $0$ is the initial year for which we have the investment, and $g_j$ is the growth rate of the capital stock over the entire period 1993-2002:

$$g_j = \frac{1}{t} \cdot \frac{I_{tj} - I_{0j}}{I_{0j}} \tag{4}$$

3. We obtain $g_j$ from the regression

$$\ln I_{ijt} = \alpha + g_j \cdot t \tag{5}$$

where $\ln I_{ijt}$ is the log of the investment of plant $i$ belonging to sector $j$.

4. Once we have estimated the initial capital stock for each sector, we need to assign the appropriate capital stock to each individual plant $i$ in sector $j$. We do so by applying an appropriate weight, which in most cases is either the electricity or the intermediate inputs consumed. Because in the EIA we observe more missing values for the electricity variable, we opted for the value of intermediate inputs consumed ($m_{0i}$):

$$K_{0ij} = K_{0j} \cdot w_{0i}, \quad \text{where} \quad w_{0i} = \frac{m_{0i}}{\sum_{p \in j} m_{0p}} \tag{6}$$

38 At 4 digits.
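The imputation steps of this appendix can be sketched in Python as below. This is an illustration under assumptions: it implements equations (3), (4) and (6) directly, leaves out the regression-based estimate of the growth rate in equation (5), and uses hypothetical function names and plant identifiers.

```python
def average_growth(investment_first, investment_last, periods):
    """Equation (4): g_j = (1 / t) * (I_tj - I_0j) / I_0j."""
    return (investment_last - investment_first) / investment_first / periods


def initial_sector_stock(initial_investment, delta, growth):
    """Equation (3): K_0j = I_0j / (delta + g_j)."""
    if delta + growth <= 0:
        raise ValueError("delta + growth must be positive")
    return initial_investment / (delta + growth)


def allocate_to_plants(sector_stock, intermediates_by_plant):
    """Equation (6): K_0ij = K_0j * m_0i / sum over plants p in j of m_0p.

    `intermediates_by_plant` maps a plant identifier to its intermediate
    inputs consumed in the initial year.
    """
    total = sum(intermediates_by_plant.values())
    if total <= 0:
        raise ValueError("sector has no recorded intermediate inputs")
    return {plant: sector_stock * m / total
            for plant, m in intermediates_by_plant.items()}
```

By construction the plant-level stocks returned by `allocate_to_plants` add up to the sectoral stock, so the imputation only redistributes the sector total across plants in proportion to their intermediate input use.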
