Department of Statistics The Wharton School University of Pennsylvania
Credit Risk Project, Installment 4 REQUIREMENTS For the fourth and final installment of the project, your learning team will build a multiple regression model using a more extensive data table provided by the lender. This data table has both more rows (more cases) and more columns (variables). The data used in the first three installment is a subset of this larger table (the first 628 rows and initial columns). Your data includes cases that are missing the value of the variable amount repaid at six months. Part of this installment requires that you predict the PRSM for these cases. You will find it useful in your analysis to refer back to the information provided in the Credit Risk Project Introduction. After completing your technical analysis, you are to submit a report presenting your findings. Your report must consist of two parts. As in the third installment, submit your answers in a single PDF file at http://apps.wharton.upenn.edu/prism/. You must also submit predictions for the incomplete cases (as done in Installment 3). Part 1 of your report should be a one-page executive summary of your results written in non-technical language. Your summary should present your conclusions and be free of jargon, figures, graphs and charts. Describe your results, not the process that you engaged in to obtain them. (Do not exceed one page. You can use a font as small as 12 point and use single spacing if you choose.) Part 2 of your report should be a technical summary written for a statistical expert. The technical summary must not exceed 5 pages, including any pertinent JMP output. JMP output must be important to the narrative of the report and be clearly labeled and explained. Narrative portions should be typed using 12-point font and single spacing. This description should contain sufficient detail that justifies the structure of your regression model. The presentation should be able to convince the expert that you have conducted a competent analysis of the data. Part 3 of your analysis is to predict the status of a new, independent collection of loans. Your report is to be submitted as a single PDF document. The PDF file must have a separate cover page that identifies the team (e.g., A-1) and lists the member of the team who are participating in the project. Number the subsequent pages and format them to have 1-inch margins all around.
MODELING PRSM The objective for this installment is to build a multiple regression model that the lender can use to identify merchants that are most likely to be “on track” six months into the loan. A borrower is on track if the amount repaid at six months is close to half of the total amount to be repaid, and hence PRSM should be close to 1 for these. As you have seen in your analysis in Installment 3, the lender currently has many borrowers whose payments lag behind the target at six months. For these merchants, PRSM < 1. The lender would like to do a better job in identifying these problematic loans sooner. Doing better might mean avoiding making loans to some merchants or changing the terms of the loan to include a higher interest payment that would offset the delayed payments. Discussions with the lender have suggested several subtle issues that should be addressed when modeling these data. These comments from the lender may suggest variables that will be useful to you in modeling the performance of loans. (a) Merchants that seek loans from your client often operate in distressed neighborhoods. The lender believes that improvements to the economic condition of the neighborhood (income, jobs, housing, etc) provide an environment in which the merchant will more easily be able to keep up with the target payment stream. The lender suspects that the effects of improvements in the local economic situation are most pronounced in very distressed areas. (b) There is a sense that merchants who are able to pay their employees well could be more stable and a better credit risk. (c) The lender feels that some independent service organizations (ISOs) provide much better (or much worse) customers than others. The lender would very much like some evidence to either support or contradict this suspicion regarding differences among ISOs. Are there particularly good or bad ISOs among those represented in your data? (d) There is a strong belief that overly aggressive commissions could be a red flag because the independent sales reps may have a propensity to push such loans, an example of the principal-agent problem. (e) Location, location, location may be true about loans as well as real estate. It is possible that some areas experienced the same lax underwriting standards for these loans that were common place in the housing market and it is even possible that some ISO’s may be fraudulent at a local level. (f) As the merchant self-reports their credit card cash flow there is the potential for dishonesty. Fortunately, a validated monthly credit card cash flow is also available for comparison. The lender has heard anecdotally that such dishonesty may be associated with bad loan performance but there is considerable internal debate as to whether over or understating credit card receipts is more indicative of weak loan performance. (g) Lenders commonly make use of information from a credit bureau, such as the FICO score, as a means to judge the ability of a borrower to repay a loan. This
information is believed to be particularly useful when dealing with new loans but may not be so useful for repeat loans. (h) It is far from certain but there are rumors that certain types of businesses as captured by their SIC code may already be suffering from huge garnishes from their credit card receipts. Consequently the receipts reported for these businesses may be a major overstatement of the merchant’s ability to pay back a loan. (i) The ability of a merchant to have previously obtained credit and paid it off may be insightful as a predictor of performance, as could the keeping of current accounts in a satisfied state. (j) Past performance is often a good indicator of future success (or failure). It is hard to believe that delinquencies and legal proceedings against the merchant would not in some way be associated with loan performance. When you discuss the estimated coefficients in your regression, you need to describe their meaning, their uncertainty (in the technical summary), and relate them to the range of values of the associated terms. For example, if local employment rates affect PRSM, it would be necessary to know not only the effect of local employment, but also the range of employment rates seen in these data. You must report only one regression model and use it throughout your report.
JMP ANALYSIS In your work in JMP, here are a few recommendations that will produce results that resemble those that appear in the course notes and in class.
Where necessary, construct variables for your model as additional columns using JMP’s formula editor. In the “Fit Model” window, disable “Center Polynomials” by clicking the top left red button of the; the check mark next to “Center Polynomials” should disappear. Removing the check will simplify the appearance of the resulting equation. In the report window use indicator parameterization estimates (red button > Estimates > Indicator Parameterization Estimates) to mimic the effects produced in class. If you would like to look at basic statistics of a variable broken down by, for example, ISO, use: Distribution > Y: variable, By: ISO. If you would like to color a plot by, for example, ISO, use: Rows > Color or Mark by Column…
EXECUTIVE SUMMARY Write your executive summary in a language that conveys to the Chief Risk Officer of the lender what she needs to know in concise language. Provide pertinent advice. When writing the executive summary, follow these guidelines:
a) List all factors that substantially affect the performance measure PRSM. Omit minor factors or mention them only briefly. (That is, do not belabor factors that are statistically significant in your regression but contribute only very little to the estimation of PRSM.) b) Introduce a baseline scenario such as a loan of $20,000 at a 12.5% repayment percentage with a $3,000 payment to the ISO to a merchant that operates a restaurant in a location with population 30,000 in a zip code on the East Coast. The business generates about $15,000 in monthly transactions, has FICO score equal to 600, and has been in business for 15 years. It is quite possible that in your model not all these baseline variables are significant. If so, create a relevant baseline for your chosen model. c) Describe all major risk factors that indicate greater or less risk with regard to this baseline (no confidence intervals). d) If it helps, pick levels for a variable, such as a 500 or 600 FICO score, to exemplify the discount or premium induced by a change in the explanatory variable. e) Use conveniently rounded numbers. In writing your executive summary, please be aware that the Chief Risk Officer intends to share it with two new members of the Board of Directors. These new board members come from commercial banking which has a tradition of assetbased lending rather than lending based on attributes like the past borrowing characteristics of the customer. They are not familiar with regression analysis and are not aware that such models can be used to model and predict risk. Thus it will be most valuable if you frame your findings in a way that these readers will be able to appreciate.
TECHNICAL SUMMARY In a technical summary, you write in a manner intended to speak to your peers. You show them that you performed a thorough analysis and that you interpreted the results competently. This should not be a step-by-step chronology of what you did in JMP, but a summary of the most important steps in logical order. Even in a technical summary it is not of interest to hear, for example, how you used JMP’s formula editor to implement necessary transformations of the data; it is simply assumed that you know how to carry out the necessary work with available software. The narrative in this portion should explain what values were used in the executive summary and how they were rounded. In addition, it should explain what contributions to the model were neglected because their effect on PRSM is too small. JMP output must be important to the narrative of the technical report and be clearly labeled and explained. Do not include graphs unless mentioned in the narrative and their relevance explained.
The technical summary should show and explain the fitted model, term-by-term and estimate-by-estimate. It should mention model diagnostics that were performed and their outcomes, possibly accompanied by plots. Report any loans that you may have excluded and indicate why you did so.
PREDICTIONS This task is similar to that in Question 4c of Installment 3, only for this installment, you have the ability to use a more predictive multiple regression model. As in the previous installment, the data set you received for the fourth installment includes cases that are missing the value of the amount repaid at six months. You are to predict the value of PRSM for these cases. You should create these predictions in JMP and then save the predicted values and identifying case numbers to an Excel file. When you create the out-of-sample predictions for the PRSM scores, be aware that the process that was responsible for the zero PRSM scores seen in-sample, is expected to continue during the out-of-sample period. The format for submitting these predictions is precisely that used in Installment #3. You will submit your predictions by uploading a Microsoft Excel file to the PRiSM web site located at http://apps.wharton.upenn.edu/prism/. The Excel file must have two columns. Columns are named in the first row. In the first row of column A, enter the label “CaseNumber”; Label the second column “Prediction” (without the quotation marks). You can download a sample prediction file from the PRiSM website and cut and paste your own answers into the Excel file. If you are unable to create a file in the required format, see a TA before attempting to submit your predictions.