Predicting discrete events == expenses

This question is 50% not about Julia.

When I try to predict how much money I will spend in the future, I sum expenses per month for multiple months and base my stats on that set of monthly expenses. But it feels like I’m losing data by summing per month. Surely there is a better way to model these discrete expense events (amount of money and perhaps category or even name) in time (occurring randomly or periodically relative to the month, the year, or something else)?

I’m sure what I’m asking here is pretty basic for some of you, but I have no idea what to use or how to use it… What tools do you use for similar stuff?



Not an expert on this… but I would probably use something like Google Sheets or Excel for this, and then break expenses up by:

  • Recurring fixed price (cable bill, rent/mortgage, etc.)
  • Recurring, variable price, required (phone bill, electric bill, etc.), probably broken into two parts: fixed monthly cost and usage cost.
  • Required but somewhat discretionary (food expenses, fuel expenses). (They are required, but you could reduce them by using less or buying cheaper.)
  • Emergency/Unexpected (heater breaks, car breaks, something breaks)
  • Fully discretionary (Entertainment)

Summing your expenses like that over a few months would give you an idea of how much each area varies; then you could look at what you can do to reduce each category.
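To make the "how much does each area vary" part concrete, here is a minimal Python sketch. The transactions, category names, and the helper names `monthly_totals` and `category_spread` are all made up for illustration; a spreadsheet pivot table would do the same job.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical transactions: (ISO date, category, amount)
transactions = [
    ("2023-01-03", "rent", 1000.0),
    ("2023-01-15", "food", 220.0),
    ("2023-01-28", "food", 180.0),
    ("2023-02-03", "rent", 1000.0),
    ("2023-02-14", "food", 310.0),
    ("2023-03-03", "rent", 1000.0),
    ("2023-03-09", "food", 250.0),
    ("2023-03-20", "entertainment", 90.0),
]

def monthly_totals(rows):
    """Sum amounts per (category, 'YYYY-MM') pair."""
    totals = defaultdict(float)
    for day, category, amount in rows:
        totals[(category, day[:7])] += amount
    return totals

def category_spread(rows):
    """Return {category: (mean, population std)} of the monthly totals.

    A month in which a category has no transactions counts as zero.
    """
    totals = monthly_totals(rows)
    months = sorted({day[:7] for day, _, _ in rows})
    categories = sorted({category for _, category, _ in rows})
    spread = {}
    for category in categories:
        series = [totals.get((category, m), 0.0) for m in months]
        spread[category] = (mean(series), pstdev(series))
    return spread

spread = category_spread(transactions)
for category, (avg, sd) in spread.items():
    print(f"{category:15s} mean={avg:8.2f} std={sd:8.2f}")
```

A category with a large standard deviation relative to its mean is the kind of area worth a closer look.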

Not sure where taxes should sit, maybe Emergency/Unexpected because they happen once a year and usually you are budgeting monthly…although if you usually get a refund, then I guess it’s not an expense…

I’ve had this question on my back-burner, too. My biggest issue with established budgeting tools I’ve used is that many rely on binning by month, but that sucks for all the reasons you list and more. I’d love to have a tool that semi-automatically amortizes recurring or otherwise known transactions.

I have a crazy google sheet that takes in a table of raw auto-categorized transactions (from Mint, which I want to drop, but still find it useful for this export even though its auto-categorizations suck) and uses that to re-construct a running history of expenses and net worth. I subset by category to see trends and outliers… it’s rough but I appreciate having the historical view, and that helps me identify categories that are ripe for optimization. I’ve also found it essential to do some sort of smoothing in order to actually see reasonable things behind the high-frequency transactional data — and kinda-sorta do a bit of this amortization — but it’s rough and can obscure things. It’d take some doing to detangle my PII from the sheet to share, but I probably could do it if anyone is super curious.

I’ve wanted to move this into a Julia Dashboards.jl project as a way to experiment with the package and make it easier to iterate on. I’ve also made no attempt at projection within Google Sheets (nor would I want to), but it’s something that’d be much more interesting to tackle within Julia itself.


Just thinking out loud here… You don’t have to lose all the information by binning over months: you could form all possible month-long intervals and learn a model that predicts not fixed monthly sums but month-long interval sums. It could take as inputs the last couple of such intervals, plus perhaps a number of week-long intervals, plus the same from one year back. It would be like a hierarchical AR model, but with the learned coefficients individually convolved with \dfrac{1}{n}\sum_{i=0}^{n-1} z^{-i}, where n is the length of the interval. This polynomial could potentially be extended and tapered towards the edges to account for weird stuff like different months having different lengths etc.
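A minimal Python sketch of the "all possible month-long intervals" idea, under some assumptions: a month is approximated as a fixed 30-day window, the transactions are invented, and `daily_series` and `sliding_sums` are hypothetical helper names. Each sliding sum is n times the output of the moving-average filter written above, so this only builds the model inputs; it does not fit the AR model itself.

```python
from datetime import date

def daily_series(transactions, start, end):
    """Total spending per calendar day from start to end, inclusive."""
    n_days = (end - start).days + 1
    series = [0.0] * n_days
    for day, amount in transactions:
        series[(day - start).days] += amount
    return series

def sliding_sums(series, window):
    """Sum over every contiguous run of `window` days, stepping one day at a time."""
    if window > len(series):
        return []
    current = sum(series[:window])
    sums = [current]
    for i in range(window, len(series)):
        # Slide the window: add the day entering, drop the day leaving.
        current += series[i] - series[i - window]
        sums.append(current)
    return sums

# Hypothetical data: (date, amount)
start = date(2023, 1, 1)
transactions = [
    (date(2023, 1, 1), 1000.0),   # rent
    (date(2023, 1, 10), 50.0),
    (date(2023, 1, 31), 1000.0),  # rent paid a day early
    (date(2023, 2, 20), 75.0),
    (date(2023, 3, 1), 1000.0),   # rent
]
series = daily_series(transactions, start, date(2023, 3, 31))
windows = sliding_sums(series, 30)
print(len(windows), "overlapping 30-day sums from", len(series), "days")
```

Three calendar months give three binned data points, but 61 overlapping interval sums. They are strongly correlated with each other, so they are not 61 independent observations.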

A robust strategy is probably something that exploits the extreme structure in spending habits and identifies and accounts for obviously recurring entries, like rent and salary etc.
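One naive way to pick out the obviously recurring entries, sketched in Python. Everything here is an assumption for illustration: the payee names, the thresholds (`min_count`, `gap_range`, `amount_tol`), and the rule itself, which calls a payee recurring if consecutive payments are roughly a month apart and the amounts barely change.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def find_recurring(transactions, min_count=3, gap_range=(27, 32), amount_tol=0.05):
    """Return {payee: mean amount} for payees that look monthly and stable."""
    by_payee = defaultdict(list)
    for day, payee, amount in transactions:
        by_payee[payee].append((day, amount))
    recurring = {}
    for payee, entries in by_payee.items():
        if len(entries) < min_count:
            continue
        entries.sort()
        days = [d for d, _ in entries]
        amounts = [a for _, a in entries]
        # Gaps in days between consecutive payments to the same payee.
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        avg = mean(amounts)
        regular = all(gap_range[0] <= g <= gap_range[1] for g in gaps)
        stable = all(abs(a - avg) <= amount_tol * abs(avg) for a in amounts)
        if regular and stable:
            recurring[payee] = avg
    return recurring

# Hypothetical data: (date, payee, amount)
transactions = [
    (date(2023, 1, 1), "Landlord", 1000.0),
    (date(2023, 2, 1), "Landlord", 1000.0),
    (date(2023, 3, 1), "Landlord", 1000.0),
    (date(2023, 4, 1), "Landlord", 1000.0),
    (date(2023, 1, 5), "Grocer", 62.0),
    (date(2023, 1, 19), "Grocer", 48.5),
    (date(2023, 2, 2), "Grocer", 71.2),
    (date(2023, 3, 30), "Grocer", 55.0),
    (date(2023, 2, 11), "Plumber", 300.0),
]
recurring = find_recurring(transactions)
print(recurring)
```

Entries flagged this way can be subtracted out and projected forward as known amounts, leaving only the irregular remainder to model.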

You could consider a data generating process which is a sum of

  1. periodic expenses x_i by category i, with x_i having some distribution that you can estimate using multilevel methods (across categories). Accounting for seasonal variation with dummies will probably take you a long way.
  2. non-regular events in categories j occurring at rate \lambda_j (Poisson process), each requiring an expense y_j from some distribution (again, multilevel methods will help you estimate this).

This could be a fun exercise in Bayesian estimation. Start simple and expand. Do predictive posterior checks (but I assume you would need years of data for this to be meaningful).
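To get a feel for the two-part process before estimating anything, it can simply be simulated. This Python sketch makes several assumptions that are not part of the description above: periodic amounts are Gaussian (clipped at zero), irregular amounts are exponential, "monthly" is approximated as every 30 days, and all parameter values and the name `simulate_year` are invented.

```python
import random

def simulate_year(periodic, irregular, days=365, seed=0):
    """Simulate expense events as (day, category, amount) tuples.

    periodic:  {category: (period_days, mean, sd)}, one amount x_i per period.
    irregular: {category: (rate_per_day, mean_amount)}, Poisson arrivals at
               rate lambda_j, each with an amount y_j.
    """
    rng = random.Random(seed)
    events = []
    for category, (period, mu, sd) in periodic.items():
        for day in range(0, days, period):
            events.append((day, category, max(0.0, rng.gauss(mu, sd))))
    for category, (rate, mean_amount) in irregular.items():
        # A Poisson process has exponentially distributed waiting times.
        t = rng.expovariate(rate)
        while t < days:
            events.append((int(t), category, rng.expovariate(1.0 / mean_amount)))
            t += rng.expovariate(rate)
    events.sort()
    return events

# Hypothetical parameters
periodic = {"rent": (30, 1000.0, 0.0), "electricity": (30, 80.0, 15.0)}
irregular = {"car repair": (1 / 120, 400.0), "appliance": (1 / 365, 600.0)}

events = simulate_year(periodic, irregular)
total = sum(amount for _, _, amount in events)
print(f"{len(events)} events, total spending {total:.2f}")
```

Running it with different seeds shows how much yearly totals can move even when the parameters are known exactly, which is the variation a fitted model would still have to live with.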

That said, strictly speaking not all “expenses” are exogenous (you make consumption decisions), and durable good investments can also be shifted in time endogenously, so they are notoriously fickle in the data and hard to model.

@pixel27, what you’re describing is what I’ve been doing or should aspire to. But that’s still clumping discrete events into months.

@mbauman, I loved Mint when I lived in the US, but here in Sweden there’s no such service. Good categorization can be a great tool, but really hard.

@baggepinnen and @Tamas_Papp, yea, figures you’d come with something waaaaaaaaaaaaaay above my head. But cool! I’m kind of shocked this is not a well-established field with finished methodologies and tools. I was imagining something more akin to an FFT: just as we can describe an image with the most dominant frequencies, we should be able to describe expenditures…? Expenses that are random and low can be ignored or approximated by a constant. Expenses that are regular or high need to be modeled.

This is a surprisingly messy problem. The noise driving the system has difficult distributions (it’s not really a friendly Gaussian), and months have different lengths, causing the phase to jump a bit every now and then. FFT for spectral estimation in the standard form (periodogram etc.) makes too many assumptions that are not valid in this context.

Just to add my experience from manufacturing: In production you have messy events like “a tool breaks”, a defective item from casting, material changes, a machine breaks down … which all go into your output. You cannot predict or model it.

But you can try to differentiate “special causes” like the above from “common causes” like normal fluctuations in your parameters. You try to get things “under control” by doing statistics, then fitting a Gaussian to your “normal” (common cause) variation and setting your control limits, e.g. to μ ± 3σ. Thus you get a signal for everything outside the control limits, and you treat it as a “special cause” event with appropriate actions like mitigating, sorting out …
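Applied to expenses, the μ ± 3σ rule could look like this Python sketch. The monthly figures and the names `control_limits` and `special_causes` are invented, and σ is estimated here with the plain sample standard deviation of a baseline period, which is a simplification of how control charts are usually set up in practice.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits mu - k*sigma and mu + k*sigma from baseline data."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma

def special_causes(values, limits):
    """Return (index, value) for every value outside the control limits."""
    lower, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical monthly food spending during a "normal" baseline period
baseline = [410.0, 395.0, 420.0, 405.0, 390.0, 415.0, 400.0, 405.0]
limits = control_limits(baseline)

# New months to check against the limits
new_months = [402.0, 398.0, 950.0, 411.0]
flagged = special_causes(new_months, limits)
print("limits:", limits)
print("special causes:", flagged)
```

Only the 950 month is signalled; the rest is treated as common-cause variation and left alone.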

This time-proven approach was introduced a long time ago by Shewhart and then developed and spread by Deming. It is based on Percy Bridgman’s operationalism and C. I. Lewis’s conceptual pragmatism.

Thus you better not try to model everything – otherwise you get lost – and better try to get an operational handle on it.


I am not surprised about this. For your personal finance planning, the benefit (whatever it is) is probably not worth the effort. And most people have no systematic records (though statistics can be extracted from bank statements).

For businesses with a lot of small cash flow items, the central limit theorem probably kicks in (at least as a heuristic), so using some rules of thumb is usually sufficient.

I would treat this problem as a fun opportunity for learning some intermediate and advanced statistics, not as something with a useful outcome. After a lot of modelling, you would learn that stochastic processes are (drumroll) stochastic and the variation is large. Planning is probably more useful for personal finance than modelling.


I completely agree with you (and implement that). The funny thing is that while I was always interested in prediction for my own finances (but again mostly use planning), now that I’m doing my relative’s finances I feel like I need to reach out for something more robust than monthly binning.

Once again, I must point out that this forum is insane (thanks to people like you responding here). I feel like I can come and ask whatever question and I’ll get a truly high-quality response, regardless of how much it has to do with Julia (note my disclaimer at the beginning).

Thanks again!!!


I do that with ledger already. There is a feature called effective dates, and the docs have this example:

2008/10/16 * (2090) Bountiful Blessings Farm
    Expenses:Food:Groceries                  $ 37.50  ; [=2008/10/01]
    Expenses:Food:Groceries                  $ 37.50  ; [=2008/11/01]
    Expenses:Food:Groceries                  $ 37.50  ; [=2008/12/01]
    Expenses:Food:Groceries                  $ 37.50  ; [=2009/01/01]
    Expenses:Food:Groceries                  $ 37.50  ; [=2009/02/01]
    Expenses:Food:Groceries                  $ 37.50  ; [=2009/03/01]

So this means that on the date above (2008/10/16) a payment was made to a food co-op, but this payment is for a service that covers several months. If you look at the monthly expenses for these months, it will only show the partial amount.
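The same spreading can be done outside ledger too. This Python sketch is not how ledger implements effective dates, just the arithmetic the example expresses: one payment split evenly over several first-of-month dates. Amounts are in integer cents to avoid rounding drift, and `amortize` and `first_of_months` are made-up helper names.

```python
from datetime import date

def first_of_months(start, count):
    """Return `count` first-of-month dates, beginning with the month of `start`."""
    out = []
    year, month = start.year, start.month
    for _ in range(count):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out

def amortize(total_cents, effective_dates):
    """Split total_cents evenly over effective_dates.

    Any leftover cents go to the earliest dates, so the parts always sum
    back to the original total.
    """
    share, remainder = divmod(total_cents, len(effective_dates))
    return [
        (d, share + (1 if i < remainder else 0))
        for i, d in enumerate(effective_dates)
    ]

# The payment from the example: made on 2008/10/16, covering six months.
dates = first_of_months(date(2008, 10, 16), 6)
parts = amortize(22500, dates)
for d, cents in parts:
    print(d.isoformat(), f"$ {cents / 100:.2f}")
```

Each of the six months from 2008/10 to 2009/03 gets $ 37.50, matching the postings above.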

I’ve thought a lot about this as well. I think the approach you take depends on what your goals are. For example, categorization (gas, grocery, etc) is not high on my priority list. I would care more about forecasting cash flows. Therefore, I would tend to think more of the characteristics of the transaction in fixed recurring, variable recurring, and everything else.

My best guess is that for most people, recurring inflows/outflows probably constitute the bulk of things anyway. So there it is probably less about modeling (perhaps could get fancy with things like seasonal energy usage?) and more just about proper record keeping.

I used ledger a lot. It was awesome. My main conclusions were:
You either have live feedback from what you spend to how much you have left of what you’ve budgeted for each micro category (like standing there contemplating whether you should buy this cake or not, checking how much money you have left for cakes this month, and making a decision). This could simply be approximated with a wallet with cash separated into pockets, one for each category.
Or you simply put away a bulk of money you aim to save at the beginning of each month, and if you end your month on a plus you save more the next. You keep increasing the amount of money you save each month until you end a month on a negative. Then you know you’re good. This can vary of course. I forgot to mention the buffer: you have a buffer with a bunch of money that you use if your checking account goes empty. The next month you fill it up again.

The second approach is super simple and achieves maximal saving with least work. You just need to be sensible and not waste too much. Seems to have worked for me.
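For what it’s worth, the second approach fits in a few lines of Python. This is a rough sketch with invented numbers: it assumes a fixed income, raises the savings target by a fixed `step` after every month that ends on a plus, and ignores the buffer; `ramp_savings` is a hypothetical name.

```python
def ramp_savings(income, monthly_spending, start=0.0, step=100.0):
    """Return the highest savings target that still left the month on a plus.

    The target is set aside at the start of each month and raised by `step`
    after every month that ends non-negative. The search stops at the first
    month that ends negative. Returns None if even the first month fails.
    """
    target = start
    last_ok = None
    for spent in monthly_spending:
        leftover = income - target - spent
        if leftover < 0:
            break
        last_ok = target
        target += step
    return last_ok

# Hypothetical income and monthly spending
spending = [2500.0, 2600.0, 2550.0, 2500.0, 2650.0]
best = ramp_savings(3000.0, spending)
print("sustainable monthly savings:", best)
```

With these numbers the fifth month ends negative at a target of 400, so 300 is the amount that worked.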
