# More about econometric modeling and some practical first steps

In 1932, Alfred Cowles, a businessman and economist, founded the Cowles Commission for Research in Economics in Colorado Springs. The Commission was dedicated to linking economic theory to mathematics and statistics. Today it operates from Yale University as the Cowles Foundation for Research in Economics (there is a fascinating history at http://cowles.econ.yale.edu/about/index.htm). Over the years, research members of the Commission contributed to economics in many areas, most notably in the creation and consolidation of two fields: general equilibrium and econometrics. Several members have won Nobel prizes for work done while at the Cowles Commission.

The Cowles Commission defined econometrics as “a branch of economics in which economic theory and statistical methods are fused in the analysis of numerical and institutional data”.

In econometric models, economic theory is used to develop mathematical links between a set of endogenous variables and a set of explanatory variables, with the aim of understanding the relationships between them. Economic systems are typically complex, and while some of the explanatory variables are observable, others are not. Economic theory alone is usually not enough to characterize the relationships between all these variables, so the econometrician has to add statistical assumptions about the unobservable variables and their relationship with the observable ones.

A key issue, and typically the first step in any analysis, is making sure we understand the dependent variable(s) well. This is where institutional data is essential. In many cases, a combination of formal and informal institutional information shapes the entire analysis.

For this reason, we normally spend a substantial amount of time with our clients before we begin any modeling. We take the time to understand how data is recorded during the regular course of business, and to consider what happens under various scenarios. We pay particular attention to irregular data points, special time periods, and important events. We make sure we understand the conventions and decisions that determine whether the data is observable at all and, when it is, why it takes one value rather than another.

Often, we hear concerns about the non-normal distribution of the dependent variable in an econometric/regression model. It may come as a surprise to some, but a regression model makes no assumptions about the unconditional distribution of the dependent variable.

The distributional assumptions (normality and homogeneity of variance) in a linear regression model are about the distribution of the dependent variable given the independent variables. What this means is that you have to take out the effect of the independent variables before you examine the distribution of the dependent variable.

Now, once the effect of the independent variables has been taken out of the dependent variable, what remains is the error term, and the residuals are its estimate. In practice, there is no need to be concerned about the distribution of the dependent variable itself: estimate your model, save the residuals, and then check their distribution.
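As a minimal sketch of this workflow, the Python example below simulates hypothetical data in which a skewed regressor makes the dependent variable strongly skewed even though the errors are normal. The variable names, coefficients, and sample size are illustrative assumptions, not taken from any real engagement. It fits a one-regressor least-squares line using only the standard library, saves the residuals, and compares their skewness with that of the dependent variable.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def fit_ols(x, y):
    """One-regressor ordinary least squares; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rng = random.Random(0)
n = 5000

# Hypothetical data: a right-skewed regressor and normally distributed errors.
x = [rng.expovariate(1.0) for _ in range(n)]
errors = [rng.gauss(0.0, 0.5) for _ in range(n)]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, errors)]

# Estimate the model, save the residuals, then check their distribution.
intercept, slope = fit_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

y_skew = skewness(y)
resid_skew = skewness(residuals)
print(f"skewness of dependent variable: {y_skew:.2f}")
print(f"skewness of residuals:          {resid_skew:.2f}")
```

With data built this way, the dependent variable shows clear right skew while the residual skewness stays close to zero. In real work, a histogram or Q-Q plot of the saved residuals would serve the same purpose as the skewness comparison used here.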

That said, it is still important to examine the distribution of the dependent variable. While you cannot always tell from the distribution of the dependent variable what the distribution of the residuals is, you can tell what the distribution of the residuals is not.
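One way to see this, sketched under assumed data rather than drawn from a real case: if the dependent variable can only be 0 or 1, the residuals from a linear fit can take only two values at each value of the regressor, so they cannot follow a continuous normal distribution. The example below uses a hypothetical regressor with three possible values and counts the distinct residuals.

```python
import random

rng = random.Random(1)
n = 1000

# Hypothetical data: x takes three values, and y is a 0/1 outcome whose
# probability of being 1 rises with x.
x = [rng.choice([0, 1, 2]) for _ in range(n)]
y = [1 if rng.random() < 0.2 + 0.3 * xi else 0 for xi in x]

# One-regressor ordinary least squares via the closed-form formulas.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Each residual is determined by the (x, y) pair, and there are at most
# 3 * 2 = 6 such pairs, so there are at most six distinct residual values.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
distinct_residuals = sorted(set(residuals))
print(f"{n} observations, {len(distinct_residuals)} distinct residual values")
```

Knowing only that the dependent variable is binary is enough to rule out normal residuals before any model is estimated, which is the sense in which its distribution tells you what the residual distribution is not.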

The importance of understanding everything about the dependent variable cannot be emphasized enough. It is a crucial step that affects both the choice of modeling technique and the interpretation of results. Some of the issues that need to be sorted out relate to the exact definition of the dependent variable: What exactly does it measure? What are the relevant units of measurement? Are there any limitations on the range of the dependent variable? Answering such questions will take you a long way down the modeling road.