Predicting Tax Lien Volume, Part 1

We finally had some time to start on our project of predicting next month’s tax lien volume.  Predicting the next value in a time series is often difficult, but we figured we would give it a shot and see what happened.

The data for this analysis was extracted from our internal database of federal tax liens, which goes back more than 20 years.  We cleaned up the data, ran some SQL queries, and generated our data set of total federal liens filed monthly between January 1990 and May 2014. This gave us 293 months of data to feed into our models.  We also added some attributes we thought might be predictive of lien volume, such as working days in a month, unemployment rate, and 3 moving averages. Each attribute had to be something we would actually know at prediction time.  For example, we had to use last month’s unemployment rate to predict this month’s lien volume.  Why?  Because we don’t know the unemployment rate until it is released by the government a few days into the following month.  Below is the metadata for the table used in our analysis.  TotalLiens is a ‘label’, meaning that it is the value we are trying to predict.
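
For readers who like to see this in code, here is a minimal Python sketch of how lagged and moving-average attributes like these could be built.  It is not our actual pipeline (that was SQL, RapidMiner, and Excel).  The function name, the column names other than the label, the single 3-month window, and the numbers are all assumptions made up for illustration.

```python
from statistics import mean

def build_features(liens, unemployment, window=3):
    """Build one row per month from two aligned monthly series.

    liens[i] and unemployment[i] are the values for month i. The row for
    month i only uses information known before month i ends: last month's
    unemployment rate and a moving average of the prior months' liens.
    """
    rows = []
    for i in range(window, len(liens)):
        rows.append({
            "month_index": i,
            "prev_unemployment": unemployment[i - 1],       # lagged one month
            "liens_moving_avg": mean(liens[i - window:i]),  # prior months only
            "total_liens": liens[i],                        # the label
        })
    return rows

# Made-up numbers, for illustration only.
liens = [100, 110, 120, 130, 140]
unemployment = [5.0, 5.1, 5.2, 5.3, 5.4]
rows = build_features(liens, unemployment)
```

The first few months are dropped because they do not have enough history to fill the moving average.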



Using RapidMiner, Excel, and a little R, we whipped up two models.  We won’t bore you with all the details, but we split the data into training and testing sets, trained our models with cross validation, reapplied the models to the training set, and finally applied them to the testing set to see how they performed out of sample.  This is not perfect data mining protocol, but it gives a decent sense of how we can use different types of models on this data.  Here is our process in RapidMiner to train and test the two models simultaneously.
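
To make the split-train-test idea concrete, here is a rough Python sketch.  It is a stand-in, not our RapidMiner process: it assumes the split is chronological (most recent months held out), it skips the cross validation step, and it fits a one-attribute least squares line rather than our full models.  The function names and the data are made up.

```python
def chronological_split(rows, test_fraction=0.2):
    """Hold out the most recent rows as the test set (no shuffling)."""
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up (attribute, lien count) pairs in time order, for illustration only.
rows = [(x, 2 * x + 1) for x in range(1, 11)]
train_rows, test_rows = chronological_split(rows)

# Fit on the training months only, then predict the held-out months.
slope, intercept = fit_line([x for x, _ in train_rows],
                            [y for _, y in train_rows])
predictions = [slope * x + intercept for x, _ in test_rows]
```

The key point is that the model never sees the held-out months while it is being fit, so the predictions on those months are a fair out of sample test.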


The 2 models tested were a classic Linear Regression and a Neural Net. Below are the out of sample results for the two approaches compared to actual federal lien volume.

For both models, the out of sample prediction was off by about 1,600 liens or 14% per month.  That’s not terrible; however, these results are nothing to write home about.  The models may improve significantly if we retrain them every month, to predict the following month.  That would be interesting to test.
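
If you want to score your own predictions the same way, the sketch below shows one common way to get an "off by N liens" and an "off by X%" figure: mean absolute error and mean absolute percentage error.  We are assuming those two measures here for illustration.  The numbers are made up and are not our results.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same units as the data (liens)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """Average size of the miss as a percentage of the actual value."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)

# Made-up monthly lien counts, for illustration only.
actual = [10_000, 12_000]
predicted = [11_000, 10_800]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
```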

Next time we will explore this problem in a slightly different way.  While time series prediction can be difficult, we can always reframe this as a classification problem.  In other words, instead of predicting the number of liens next month, maybe we should try to predict whether the number of liens will go UP or DOWN.  That might prove more useful.  If we know there will be more liens filed next month and liens are correlated to revenue then we know revenue will increase.  In the real world it’s more complex, but this gives us a sense of what is possible with predictive analytics.
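
As a small preview, turning a series of monthly counts into UP/DOWN labels could look like the Python sketch below.  The function name is made up, and treating a month with no change as DOWN is an arbitrary choice for illustration.

```python
def direction_labels(volumes):
    """Label each month UP or DOWN relative to the month before it.

    The first month has no previous month, so it gets no label.
    A month with no change is labeled DOWN (an arbitrary assumption).
    """
    labels = []
    for previous, current in zip(volumes, volumes[1:]):
        labels.append("UP" if current > previous else "DOWN")
    return labels

# Made-up monthly lien counts, for illustration only.
labels = direction_labels([100, 120, 90, 90])
```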

What’s New at Extrakt?

ExtraktData has spent the past several months listening to our customers, and based on your feedback we’ve made some changes.  Due to popular request, we now offer the option to purchase our tax lien data online, and we also sell aged leads.

Customers can choose from four different subscription plans for our fresh tax lien data.  We sell federal and state tax liens filed against businesses only.  As always, phone numbers are included.  Let us know if you use the data for mailers and we can include a specially formatted mailer-friendly data file.  All of our data contracts are month to month so you are not locked into a long term commitment and unused credits do roll over from month to month. 

We also now offer aged federal tax liens.  Aged data includes addresses and totals, but no phone numbers.  You can buy them quarterly or as a bulk download.  We have federal tax lien data going back to 1990.  Such large quantities of data are not only good for sales leads, but are great for running your own analytics.

We hope that you find these changes helpful.  You can download sample data from our Tax Liens page or visit our Pricing page to place an order. Contact Us if you have any questions or suggestions.  We love feedback!

Adventures in Predictive Analytics

In my last blog, Eliminating the Guessing Game of Yesterday, I shared my recent obsession for business intelligence and predictive analytics. ExtraktData’s technologist and I have been experimenting with our data a lot since then.  Our main goal is to develop a model that can predict the number of federal tax lien filings for the coming month.  That means in any given month we can make an educated guess on how many liens the IRS will file the following month.  For us and many of our clients, sales revenue is highly correlated to lien volume.  Predicting lien volume helps predict sales revenue and that’s quite valuable to our planning process. The prediction models are still in the works so I do not have any results to report, but rather some insights into the building process.

The Data Set

The data set for these models consists of federal tax liens filed against businesses between 1990 and 2012.  Federal holidays and weekends were removed to give an accurate count of how many working days each month the IRS employees had to file the liens. We purposefully held back 2013 data, so we could test what the model predicted versus what actually happened in 2013.  As we develop our prediction model we will be testing numerous hypotheses.


  1. Each holiday has a greater impact than just the loss of a workday because people often take extra time off during those time periods.
  2. December and the summer months are generally vacation months.  Therefore, there may be seasonality based on vacation time.
  3. A relationship may exist between the date a business has to file their annual or quarterly taxes and the volume of liens filed.
  4. Number of workdays in a month affects lien volume. Everything at the IRS isn’t automated (the IRS is not staffed by robots) so fewer workdays equals fewer liens filed and vice versa.
  5. Because of the IRS Fresh Start initiative (implemented at the beginning of 2012), looking only at historical data for liens above $10K is more predictive than looking at all the data.
  6. The above hypotheses combined with other factors (discussed in previous blogs), such as economic cycles and unemployment rate, will allow us to predict future lien volume.
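
The working-day count mentioned above (weekends and federal holidays removed) can be sketched in a few lines of Python.  This is only an illustration: the function name is made up, and the holiday dates must be supplied by the caller since the code does not know the federal holiday calendar.

```python
import calendar
from datetime import date

def working_days(year, month, holidays=()):
    """Count Monday-to-Friday days in a month, skipping the given holidays."""
    holidays = set(holidays)
    days_in_month = calendar.monthrange(year, month)[1]
    count = 0
    for day in range(1, days_in_month + 1):
        current = date(year, month, day)
        # weekday() is 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holidays:
            count += 1
    return count

# January 2013, with New Year's Day and Martin Luther King Jr. Day removed.
january_holidays = [date(2013, 1, 1), date(2013, 1, 21)]
january_working_days = working_days(2013, 1, january_holidays)
```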

Our experimentations are definitely a process, but one we are very much enjoying.  What’s the point in having all this data if you don’t explore the possibilities?  So keep following our geeky adventures and let’s see what we find.  We’re not promising miracles, but we will share all the same.

Eliminating the Guessing Game of Yesterday

Companies are collecting mountains of data every day.  Typically, that data just sits in a warehouse while the enormous task of analyzing it and implementing the findings goes undone.

A recent business intelligence (BI) class has retooled the way I view data.  Until I took the class, I was only concerned with collecting data and shaping it into a nice, neat, little package to sell to customers.  I am just now diving into the world of BI and predictive analytics, and I’m not looking back.  I had heard the unavoidable buzzwords for years, but never paid attention until now.

BI is shifting.  It’s moved from being something that was nice for companies to have to a necessity for competing in the marketplace.  Businesses can make smart decisions based on data, eliminating the guessing game of yesterday.

With the constant influx of data firms collect daily, using BI to process that data can increase efficiency and effectiveness and help them better understand customers’ wants and needs.  Companies use analytics to save money on marketing by targeting customers and potential customers with advertisements that lead to revenue.  The advertisements they send are “smart” and therefore have higher success rates than advertisements not based on analytics.  For example, Target uses predictive analytics to figure out when women are pregnant and estimate their due date to within a small range. Is this creepy or amazing?  My inner data geek seems to win out on this debate.

After sharing my excitement and passion (my boss calls it an obsession) over the electrifying world of BI with my company’s founder and technologist, we decided to take a closer look at our own data.  We have data going back over 23 years and collect more every day.  That data has to be useful, right?

Over the next few weeks, we’re going to start experimenting.  Maybe we can predict next month’s volume of tax lien filings?  I’m not looking for miracles, but I will be blogging about the results here.  If you like data, keep reading to follow our geeky adventures – the good, the bad, and the ugly!

Connecting to your Sales Leads

Accurate phone numbers can be hard to come by.  Phone number databases sold by the big boys are expensive and outdated.  And searching the web yourself can be time consuming and tedious.  Append rates and accuracy vary widely.  Is it worth all the effort? From my experience, taking more time, finding more hits, and suffering a few more wrong numbers can lead to gold.  If it’s difficult to find a number then that means your competitors are having the same problem. Going that extra mile to append could be your key to success.

So where are your best bets to find business numbers?  From an append rate standpoint, company websites provide the highest percentage of found numbers, followed by search engines and business phone sites.  The lowest append rate usually comes from phone number databases at about 30%.  In general, I believe a hybrid approach using multiple sources and looser (but reasonable) matching criteria is vital.  An increase of a few percentage points in append rate could make a huge difference in your bottom line.  Again, the more difficult it is to connect with a sales lead, the higher the probability of a sale.

But let me guess, you send mailers instead of calling leads. Well, the same problem exists here.  Addresses provided with sales leads can be notoriously inaccurate, from misspellings to simply being the wrong address.  Companies often put the address of their accountant, lawyer, or some other person on their formation documents and vital records. The address on the company’s website or one of the business sites is more likely to get your mailer where it needs to go.  Again a hybrid approach rules the day, and sending a couple of extra mailers to multiple addresses can really be worth it.

Ironically, all of this gets easier when you make a lot of calls or send a boatload of mailers.  Statistics are your friend and simply tracking phone disconnects and returned mailers can quickly help you refine the accuracy of your system.  Our mantra is append more, analyze, and refine. At the end of the day, more connections mean more sales.  It’s a competitive industry and the strategies that focus on the hard to find phone numbers and addresses often give companies the edge they need to win more clients.
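
Tracking does not have to be fancy.  Here is a minimal Python sketch that tallies the share of disconnected numbers for each source a number came from.  The source names, the outcome labels, and the call log are all made up for illustration.

```python
from collections import defaultdict

def disconnect_rate_by_source(call_log):
    """Return the fraction of calls that hit a disconnected number, per source.

    call_log is a list of (source, outcome) pairs.
    """
    totals = defaultdict(int)
    disconnects = defaultdict(int)
    for source, outcome in call_log:
        totals[source] += 1
        if outcome == "disconnected":
            disconnects[source] += 1
    return {source: disconnects[source] / totals[source] for source in totals}

# Made-up call outcomes, for illustration only.
call_log = [
    ("company_website", "connected"),
    ("company_website", "connected"),
    ("company_website", "disconnected"),
    ("company_website", "connected"),
    ("phone_database", "disconnected"),
    ("phone_database", "connected"),
]
rates = disconnect_rate_by_source(call_log)
```

The same tally works for returned mailers: swap the outcome label and log one row per mailer sent.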

WHAT YOU NEED TO KNOW: States with the Highest Total ($) Owed

One thing I have noticed is that lien buyers are concerned with total ($) owed on each lien.  That is to say our customers only want leads with totals above a specific amount, usually $10,000.  One might assume that states such as California and New York would have the highest totals.  Debtors in these bigger, wealthier states would, of course, owe more money on average, but surprisingly that is not the case.  In fact, Louisiana and Arkansas have the highest median lien totals.  And Delaware, due to the propensity for larger companies to incorporate there, as well as Nevada, perhaps due to the abundance of casinos, have the highest average lien totals.  Below are 3 graphs depicting the median and average amounts on Federal liens filed between 1990 and 2013.

Liens filed in Louisiana and Nevada are nearly tied for having the highest median total, both just over $15.5K per lien.










Since a lot of larger companies are incorporated in Delaware, it is not shocking that Delaware towers above the rest with an average lien amount of just under $400K. Average lien totals are somewhat skewed since a handful of very high total liens can inflate the average. However, I think our hypotheses regarding Delaware and Nevada are interesting enough to share.
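
To see how a single very large lien can pull the average away from the median, here is a tiny Python illustration.  The lien totals are made up and are not from our data.

```python
from statistics import mean, median

# Five ordinary liens and one very large one (made-up dollar amounts).
lien_totals = [12_000, 14_000, 15_000, 16_000, 18_000, 2_000_000]

median_total = median(lien_totals)   # middle of the sorted values
average_total = mean(lien_totals)    # pulled up sharply by the one large lien
```

Here the median lands at $15,500 while the average is over $345,000, even though five of the six liens are under $20,000.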









Massachusetts liens have the lowest total out of any state, followed closely by Rhode Island.

The state of Florida is the second largest in terms of number of tax liens filed and one of the largest in terms of population; however, it is in the bottom three for lien amount.








The volume of tax liens filed is correlated to a state’s population, but a state’s population is clearly not correlated with lien totals. So if lien total is important to you, now you know which states to ask for from your vendor.  However, due to the IRS’s Fresh Start program (see this blog post) and other factors, we believe prioritizing high total liens is much less important than other approaches.  Fresh leads extracted directly from the source and predictive analytics to identify which liens have the highest probability to close are much more important.  With appropriate data and analytics, companies are adapting and finding valuable insights in their data which is translating into real revenue.  Work smarter, not harder!

WHAT YOU NEED TO KNOW: And The State With the Mostess Is…


California!  14% of all tax liens are filed in California, followed by Florida, and then Texas.  Are you surprised? I mean California does have the highest population of any state, a whopping 37,253,956 people.  California, Florida, Texas, New York, and Illinois are the top 5 states by population.  There is a reason why all of your tax lien leads seem to come from the same select few states.  The pie chart on the left represents the top 9 states, which make up more than half of all tax liens filed in the United States.


Even more interesting is the fact that while these states may produce the greatest volume of tax liens, not a single one of them is found in the top 5 for highest median or average lien total.   Tax lien filings are correlated to a state’s population, with some notable exceptions… More on this next week.

WHAT YOU NEED TO KNOW: Seasonal Fluctuations in Lien Volume

It never fails.  Every September I get a call from a client saying “Hey, why is federal lien volume so low?” and then every May I get a call saying “Hey, we are swamped, why is federal lien volume so high?”  And every year I explain this is partially due to seasonal fluctuations in lien volume. Most consumers of federal tax lien data are aware of major lien disruptions such as the October 2013 federal shutdown or public holidays, but they are less familiar with normal seasonal variations in lien volume.  Below is a chart aggregating the monthly variation of lien volume over the past 4 years.  Note that October is skewed significantly downward due to the 60% drop in lien volume during the October 2013 government shutdown.

[Chart: average lien volume by month]

Between 2010 and 2013, the IRS has filed an average of 16,600 federal tax liens per month.  On average they filed 15% more in May (18,900) and 18% fewer in September (13,600).  This swing in lien filings can have significant business implications when you rely on federal tax lien filings for sales leads.  Fewer leads typically means less revenue so predicting lien volume is quite helpful in estimating cash flow. So in the end, my time spent with clients discussing seasonality is well spent.  Developing models to predict lien volume turns out to be a great planning tool.  Let’s just hope there are no more government shutdowns on the horizon.
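
If you want to run the same comparison on your own lead counts, the arithmetic is a one-liner.  The Python sketch below plugs in the rounded monthly averages quoted above.  The function name is made up.

```python
def percent_vs_average(month_average, overall_average):
    """Percent above (+) or below (-) the overall monthly average."""
    return 100 * (month_average - overall_average) / overall_average

overall = 16_600  # average liens per month, 2010 through 2013 (rounded)
may_change = percent_vs_average(18_900, overall)
september_change = percent_vs_average(13_600, overall)
```

With these rounded figures, May comes out at roughly 14% above average and September at roughly 18% below.  The small gap from the 15% quoted for May is presumably because the percentages in this post were worked out from unrounded counts.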

WHAT YOU NEED TO KNOW: Tax Liens are Counter Cyclical to the Economy

Over the last few months I performed an in-depth analysis of IRS data on federal tax lien filings from 1990 through 2013.  This data mining exercise yielded some very interesting results and revealed some of the drivers behind the volume of federal tax lien filings.  The analysis was so helpful to my company that I decided to share it and so part 2 of my blog series “What You Need to Know” continues.


The volume of federal tax lien filings is generally counter cyclical to the economy.  When the economy is doing well, lien filing volumes decrease.  The time series shows us that the number of liens filed declined significantly from 1990-2000 and this trend started to repeat itself again with a steady decrease beginning in 2010.  Both of these time periods fall during economic growth and bull markets.  However, a bull market alone wasn’t a strong enough correlation since it did not apply in the mid 2000’s. I searched on for other relationships.









It turns out that unemployment rate is one factor associated with economic growth that is also highly correlated with the number of tax liens filed.  On the right is a chart depicting the relationship between federal tax liens and unemployment rate.  The lower the unemployment rate the fewer tax liens filed and the higher the unemployment rate the more tax liens filed.









Looking at the data another way, the chart to the left shows average unemployment rates from 2010-2013 and the total federal liens filed from 2010-2013 by month.

Notice the decrease in both? Changes in unemployment rate tend to be correlated more strongly than market trends, so watch unemployment as a tool to help you anticipate fluctuations in lien volume.  There were over 100,000 fewer tax liens filed in 2013 than in 2010.  This current growth cycle is forecasted by some to last another 3 to 5 years.  In this environment, lien volumes will probably continue to decrease.  Fresh leads and good analytics are vital if your company depends on tax lien filings for success.
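
If you want to check this kind of relationship in your own numbers, a Pearson correlation coefficient is a common starting point.  Below is a minimal Python sketch.  We are assuming Pearson correlation for illustration, and the figures are made up rather than taken from our analysis.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: +1 moves together, -1 moves opposite, 0 unrelated."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Made-up yearly figures, for illustration only.
unemployment_rate = [9.6, 8.9, 8.1, 7.4]
liens_filed = [300, 260, 200, 170]
correlation = pearson(unemployment_rate, liens_filed)
```

A value close to +1 would mean the two series tend to fall and rise together, which is the pattern described above.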

WHAT YOU NEED TO KNOW: IRS Fresh Start Initiative

Over the last couple of years, the pace of federal tax lien filings has decreased.  There are many reasons for this, but one in particular to be aware of is the Internal Revenue Service’s Fresh Start initiative.

Fresh Start is intended to make it easier for individuals and businesses who are struggling to pay their taxes.  It significantly increases the federal tax lien filing threshold from $5,000 to $10,000. This is one of several new changes the IRS made to their lien process that went into effect in February of 2011.

The increase of the threshold and the healthy state of our economy mean fewer potential customers.  The total number of federal business tax liens filed fell from about 256,000 in 2011 to 171,000 in 2012, and fell again to 150,000 in 2013.

For those of you who rely heavily on tax liens for sales leads, there are a couple of considerations:

  1. The number of federal tax liens filed is declining which means you need to make the most out of the leads you have.  Finding a vendor with fresh leads is vital.  Also, using data analytics to predict the leads most likely to close is key to helping focus your sales efforts.
  2. You can now assume that most federal tax liens filed are for amounts $10,000 or greater.  So don’t ignore federal tax liens without totals from your vendor.  They may be a good source of fresh leads.


The IRS lien process has changed.  Fewer liens are filed but the average totals are greater.  There are many ways to thrive in this environment with fresh leads and good analytics.  And keep in mind that tax liens are counter cyclical to the economy so when the market falls, lien filing volumes will rise again!