Statistical Analysis for Dataset

Closed - This job posting has been filled and work has been completed.

Job Description

I am looking for an experienced statistician to help analyze a loan
portfolio. The dataset has close to 75K loan records with 42
attributes per loan. My interest is in a subset of the full dataset:
the loans that are 31 or more days past due or in default and were
originated since 2011 (5K records). So that this subset can be analyzed
in context, I will provide the full dataset. In essence, I am trying to
understand the characteristics of the loans that are currently
problematic in order to avoid similar loans in the future.

To accomplish this goal, I am trying to identify the
combination of variables (loan attributes) that are statistically
significant predictors of future loan defaults. Ideally, I would also
like to understand the relative weight of each factor in determining
defaults.

A couple of comments:
-I am not interested at this point in analysis regarding
profitability or return; therefore, staying away from risk-reward
analytics is preferable;
-I am particularly interested in identifying loans that default within
the first 6 months, 12 months and 18 months.
-Finally, some loan attributes are discrete (state of residency, loan
purpose), while others are continuous variables
(debt-to-income ratio, revolving line utilization). If it
simplifies the analysis, I am fine with creating ranges for the
continuous variables.
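
As an illustration only, the ranges and default windows mentioned above could be sketched in Python as follows. The cut points, labels, and function names are hypothetical and are not taken from the dataset; the analyst would choose ranges based on the actual distributions.

```python
from bisect import bisect_right

# Hypothetical cut points for debt-to-income ratio (percent); not from the dataset.
DTI_EDGES = [10, 20, 30, 40]
DTI_LABELS = ["<10", "10-20", "20-30", "30-40", "40+"]


def bin_value(value, edges=DTI_EDGES, labels=DTI_LABELS):
    """Map a continuous value to the label of the range it falls in.

    Ranges are closed on the left: a value equal to an edge goes into
    the higher range (10 falls in "10-20").
    """
    return labels[bisect_right(edges, value)]


def default_window(months_to_default):
    """Label how soon after origination a loan defaulted.

    months_to_default is None for loans that have not defaulted.
    """
    if months_to_default is None:
        return "no default"
    for limit in (6, 12, 18):
        if months_to_default <= limit:
            return f"within {limit} months"
    return "after 18 months"
```

Each loan then carries a discrete range label and a window label, so continuous and discrete attributes can be tabulated the same way.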

I welcome your input on what the analysis should deliver, but at
a minimum, here is what I want to receive:
-a descriptive analysis of the loans that i) were issued starting in
2011 and ii) are currently in a past due status by 31 days or more, or
in default
-a list of the top 5 to 10 variables that are most relevant in
determining whether a loan will default (for example, State, % Credit
Line utilization, # of inquiries, FICO)
-a combination of factors that, after hypothesis testing,
could be used to infer default with a high degree of probability. I
suspect there would be various combinations (e.g., Texas resident with
FICO of less than 700, or % Credit utilization xx with loan values
over $20K).
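
One possible way to test whether a given combination of factors is associated with default is a Pearson chi-square test on a 2x2 table (loans inside vs. outside the segment, defaulted vs. not). The sketch below is a minimal illustration under that assumption. The function name and the counts are made up, and the analyst may well prefer a different method, such as logistic regression.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for a 2x2 table of loan counts.

    a: in segment, defaulted        b: in segment, not defaulted
    c: outside segment, defaulted   d: outside segment, not defaulted
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one loan")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts for illustration only: 30 of 100 loans in the segment
# defaulted, versus 50 of 400 loans outside it.
stat, p = chi_square_2x2(30, 70, 50, 350)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

When many combinations are tested this way, some will look significant by chance, so a multiple-comparison correction or a hold-out sample would be advisable.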

I would be happy to discuss in more detail.
Please include an estimate of your price for this project.

---
Skills: analysis