python – Derek Harlan

How the Tool Works

The logic is simple:

Load the GL file

Identify accrual‑like entries (based on keywords)

Extract a date from the description

Normalize it into a real period end date

Compare it to the current period

Flag anything older than 60–90 days

This is the kind of problem where regex shines — descriptions are messy, but they follow patterns.
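As a rough illustration of the steps above, here is a minimal sketch of the parse, normalize, and compare logic. It is not the tool's actual code: the `MM/YYYY` pattern, the function names, the 60 and 90 day cutoffs, and the plain calendar-day age calculation are all assumptions made for the example, and the sample description is hypothetical.

```python
import calendar
import re
from datetime import date

# Assumed pattern: matches "MM/YYYY" such as "09/2025".
PERIOD_RE = re.compile(r"\b(0?[1-9]|1[0-2])/(\d{4})\b")

# Assumed cutoffs; the post only says "60–90 days".
WARNING_DAYS = 60
STALE_DAYS = 90


def parse_period_end(description):
    """Return the last day of the MM/YYYY period found in the text, or None."""
    match = PERIOD_RE.search(description)
    if match is None:
        return None
    month, year = int(match.group(1)), int(match.group(2))
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def classify_age(period_end, current):
    """Return (age_days, status) using a plain calendar-day difference."""
    age_days = (current - period_end).days
    if age_days > STALE_DAYS:
        status = "stale"
    elif age_days > WARNING_DAYS:
        status = "warning"
    else:
        status = "ok"
    return age_days, status


# Hypothetical description, checked against a December 2025 period end.
period_end = parse_period_end("09/2025 Landscaping Accrual")
print(period_end, classify_age(period_end, date(2025, 12, 31)))
```

A real GL will need more patterns than this one (month names, two-digit years, and so on), which is why the parsing is kept separate from the age check.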

Running the Tool

From the parent directory of the project:

python -m StaleAccrualChecker.main gl.csv --current 2025-12-31

This prints a table of stale accruals, including:

property number

description

amount

parsed period

age in days

status

Example:

property_number	description	amount	parsed_period	age_days	status
1001	04/2025 Window Cleaning Accrual	4500	2025‑04‑30	210	stale
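For readers curious how a command like the one above might be wired up, here is a minimal sketch of an argument parser that accepts the same shape of input. The function name `build_parser` and the argument name `gl_path` are assumptions for illustration; the real `main.py` may be organized differently.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser for: <gl_path> --current YYYY-MM-DD (assumed layout)."""
    parser = argparse.ArgumentParser(
        description="Flag stale accruals in a GL export."
    )
    parser.add_argument("gl_path", help="path to the GL CSV file")
    parser.add_argument(
        "--current",
        required=True,
        type=date.fromisoformat,  # turns "2025-12-31" into a date object
        help="current period end date, e.g. 2025-12-31",
    )
    return parser


# Parse the same arguments shown in the command above.
args = build_parser().parse_args(["gl.csv", "--current", "2025-12-31"])
print(args.gl_path, args.current)
```

Converting `--current` to a date at the parsing step means a mistyped date is rejected before any GL data is loaded.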

Closing

If you’ve ever stared at a GL description trying to guess how old an accrual really is… this tool is for you. It’s small, practical, and easy to extend — and it’s the perfect starting point for building a real accounting automation toolkit in Python.

Do you want to learn how to automate your financial workpapers? Do you want to build MicroApps like this one to bolster your own automation toolkit? Do you want to get started with VBA?

With all the push for AI in the workplace, I was curious whether I could build a small machine learning program that does something useful for accountants and financial reviewers. Account miscoding came to mind, so I wrote a program that predicts the account coding of a payable, which can help a reviewer determine whether the month’s payables are coded correctly. I made a mock general ledger in Excel and trained a small machine learning model to predict which account each transaction should belong to. If the model’s prediction doesn’t match the coded account, that’s a potential miscoding.

A screenshot from Excel showing a mock general ledger for training data

This model was built in Python 3.9 using pandas, scikit-learn, and a couple of other modules that help transform the Excel file into data the model can work with.

What is Machine Learning

Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed, by identifying patterns in data through algorithms trained on examples like labeled images or past transactions.

It works by fitting a model to example data and then using that model to make predictions on new data. You see this every day in movie recommendations on streaming services, spam filtering in email, and fraud detection in banking.

Machine learning is basically pattern recognition.

If your GL says:

“Office Depot – pens” → Office Supplies
“Printer ink” → Office Supplies
“AWS cloud hosting” → IT Expense
“Uber to client site” → Travel
“Lunch with client” → Meals

…then the model learns those patterns.

Give it a new transaction like:

“AWS EC2 monthly bill – Amazon Web Services – $260”

and it will say:

“This looks like IT Expense.”

If the GL has it coded as something else, that’s a red flag.

Here’s the Python code:

#! python3
# GL Classifier Bot

import pandas as pd
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_excel(r"C:\GL_Training_Data.xlsx")

# Load a lightweight embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Combine text fields into one string
df["text"] = df["Description"].astype(str) + " " + df["Vendor"].astype(str)

# Convert text to embeddings
embeddings = model.encode(df["text"].tolist())
X = np.hstack([embeddings, df[["Amount"]].values])
y = df["Account"]

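# Hold out 25% of the rows to check the model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A fixed random_state keeps the results repeatable between runs
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Accuracy on the held-out rows
print("Test accuracy:", clf.score(X_test, y_test))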

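def predict_account(description, vendor, amount):
    """Predict the GL account for a single transaction."""
    # Build the same "description + vendor" text used for training
    text = description + " " + vendor
    emb = model.encode([text])
    # Append the amount as the final feature column, matching X
    X_new = np.hstack([emb, [[amount]]])
    return clf.predict(X_new)[0]

# Example: classify one new transaction
print(predict_account("AWS EC2 monthly bill", "Amazon Web Services", 260.00))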

The last line feeds in a transaction, and the output is its predicted account coding.

A simple walkthrough

Here’s how the application works:

1. I loaded my GL from Excel.

Just a simple table with:

Date
Description
Vendor
Amount
Account

2. I converted the text into “meaning vectors.”

This sounds fancy, but it’s basically a tool that turns text like:

“Uber to client site”

into a list of numbers that represent its meaning.
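To make the idea concrete, here is a small sketch of how two lists of numbers can be compared for closeness using cosine similarity. The three-number vectors and the "Taxi to airport" description are made up for illustration; real embeddings from all-MiniLM-L6-v2 have 384 numbers each and come from the model, not from hand-typed values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-number vectors for illustration only; not real model output.
toy_vectors = {
    "Uber to client site": [0.9, 0.1, 0.0],
    "Taxi to airport": [0.8, 0.2, 0.1],
    "Printer ink": [0.0, 0.1, 0.9],
}

# Compare every description against the Uber trip.
trip = toy_vectors["Uber to client site"]
for text, vector in toy_vectors.items():
    print(f"{text}: {cosine_similarity(trip, vector):.2f}")
```

In this toy setup the two travel descriptions score close to each other and the printer ink scores far away, which is the property that lets a classifier group similar transactions.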

3. I trained a small model to learn the patterns.

It looks at:

the meaning of the description
the vendor
the amount
the account it was coded to

…and learns how they relate.

4. I asked it to predict the account for new transactions.

If the prediction doesn’t match the coded account, I flag it.

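That’s it. No training a deep network from scratch, no giant datasets, no complicated math; the only deep learning involved is the pretrained embedding model, used off the shelf.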
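The flagging step in item 4 could be sketched like this. The function name `flag_miscodings`, the stand-in `demo_predict`, and the two sample rows are hypothetical; in practice the `predict` argument would be the trained `predict_account` function from the script above, which takes the same three arguments.

```python
def flag_miscodings(rows, predict):
    """Return rows whose coded account differs from the predicted account."""
    flagged = []
    for row in rows:
        predicted = predict(row["Description"], row["Vendor"], row["Amount"])
        if predicted != row["Account"]:
            # Keep the original row and record what the model expected.
            flagged.append({**row, "Predicted": predicted})
    return flagged


# Stand-in predictor so the sketch runs without a trained model.
def demo_predict(description, vendor, amount):
    return "IT Expense" if "AWS" in description else "Office Supplies"


# Hypothetical ledger rows using the same columns as the training data.
ledger = [
    {"Description": "AWS EC2 monthly bill", "Vendor": "Amazon Web Services",
     "Amount": 260.00, "Account": "Travel"},
    {"Description": "Printer ink", "Vendor": "Office Depot",
     "Amount": 45.00, "Account": "Office Supplies"},
]

for row in flag_miscodings(ledger, demo_predict):
    print(row["Description"], "| coded:", row["Account"],
          "| predicted:", row["Predicted"])
```

Passing the predictor in as an argument keeps the comparison logic independent of the model, so the same loop works if the classifier is later swapped out.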

The output with a box drawn around the predicted account

What surprised me

When I only had 8 rows of data, the model predicted everything as “Office Supplies.”
Once I added a few more rows — it suddenly started predicting correctly.

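That’s the appeal of small ML models built on pretrained embeddings: the embedding model already captures what the text means, so the classifier needs only a few examples per account to start learning.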

Why this matters for accountants

This tiny example shows how machine learning can help with:

miscoded expense detection
reclass suggestions
anomaly spotting
cleaning up messy GLs
speeding up month‑end review

And you can build the whole thing in under an hour.

The takeaway

Machine learning doesn’t have to be intimidating.
You can start with:

a tiny GL
a few lines of Python
and a simple idea:
“Does this transaction look like the account it was coded to?”

That’s enough to demonstrate the concept — and enough to spark ideas for real‑world automation. This could easily be expanded to use a more complicated general ledger and to automatically search a general ledger under review for correct coding.

Do you want to learn how to automate your financial workpapers? Do you want to get started with VBA?

My book, Beginning Microsoft Excel VBA Programming for Accountants, has many examples like this to teach you to use Excel to maximize your productivity! It’s available on Amazon, Apple iBooks and other eBook retailers!

parser.py	Extracts dates from messy GL descriptions using regex patterns. This is where the “intelligence” of the tool lives.
detector.py	Applies the stale‑accrual logic. It takes the parsed dates, compares them to the current period, and assigns a status like ok, warning, or stale.
utils.py	Handles loading the GL file into a DataFrame. This will become reusable across future tools.
main.py	The command‑line interface. This is the file you run to execute the tool.
__init__.py	An empty file that tells Python, “This folder is a package.” It marks the folder as a regular package, which keeps the imports between these files predictable.

This structure might look like overkill for a tiny script, but it sets me up to reuse pieces of this project in future micro‑apps, and it teaches beginners how real Python projects are organized.

Tag: python

Stale Accrual Detector – CRE MicroApps vol. 1

A Simple Python Machine Learning Script to help Accountants with Account Miscoding
