A Simple Python Machine Learning Script to help Accountants with Account Miscoding

With all the push for AI in the workplace I was curious if I could build a small program using machine learning to perform some useful function for accountants and financial reviewers – as account miscoding came to mind I wrote a program that would predict the account coding of a payable, this could aid a reviewer in determining if the months payable are coded correctly. I made a mock general ledger in Excel and trained a small machine learning model to predict which account each transaction should belong to. If the model’s prediction doesn’t match the coded account, that’s a potential miscoding.

A screenshot from Excel showing a mock general ledger for training data

This model was built in Python 3.9 using Pandas, scikit learn and a couple other modules to support transforming the Excel file into a model the program can work with. 

What is Machine Learning

Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed, by identifying patterns in data through algorithms trained on examples like labeled images or past transactions. 

It works by feeding data into models, allowing predictions on new data. You see this everyday in recommendations for movies on streaming services, filtering spam emails, or detecting fraud in banking. 

Machine learning is basically pattern recognition.

If your GL says:

  • “Office Depot – pens” → Office Supplies
  • “Printer ink” → Office Supplies
  • “AWS cloud hosting” → IT Expense
  • “Uber to client site” → Travel
  • “Lunch with client” → Meals

…then the model learns those patterns.

Give it a new transaction like:

“AWS EC2 monthly bill – Amazon Web Services – $260”

and it will say:

“This looks like IT Expense.”

If the GL has it coded as something else, that’s a red flag.

Here’s the Python code:

#! python3
#GL Classifer Bot

import pandas as pd
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_excel(r"C:\GL_Training_Data.xlsx")

# Load a lightweight embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Combine text fields into one string
df["text"] = df["Description"].astype(str) + " " + df["Vendor"].astype(str)

# Convert text to embeddings
embeddings = model.encode(df["text"].tolist())
X = np.hstack([embeddings, df[["Amount"]].values])
y = df["Account"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = RandomForestClassifier()
clf.fit(X_train, y_train)

def predict_account(description, vendor, amount):
    text = description + " " + vendor
    emb = model.encode([text])
    X_new = np.hstack([emb, [[amount]]])
    return clf.predict(X_new)[0]

print(predict_account("AWS EC2 monthly bill", "Amazon Web Services", 260.00))

the last line feeds in a transaction and the output is it’s expected coding.

A simple walkthrough

Here’s the workflow of how the application works:

1. I loaded my GL from Excel.

Just a simple table with:

  • Date
  • Description
  • Vendor
  • Amount
  • Account

2. I converted the text into “meaning vectors.”

This sounds fancy, but it’s basically a tool that turns text like:

“Uber to client site”

into a list of numbers that represent its meaning.

3. I trained a small model to learn the patterns.

It looks at:

  • the meaning of the description
  • the vendor
  • the amount
  • the account it was coded to

…and learns how they relate.

4. I asked it to predict the account for new transactions.

If the prediction doesn’t match the coded account, I flag it.

That’s it. No deep learning, no giant datasets, no complicated math.

The output with a box drawn around the predicted account

What surprised me

When I only had 8 rows of data, the model predicted everything as “Office Supplies.”
Once I added a few more rows — it suddenly started predicting correctly.

That’s the magic of small ML models:  they don’t need much to start learning.

Why this matters for accountants

This tiny example shows how machine learning can help with:

  • miscoded expense detection
  • reclass suggestions
  • anomaly spotting
  • cleaning up messy GLs
  • speeding up month‑end review

And you can build the whole thing in under an hour.

The takeaway

Machine learning doesn’t have to be intimidating.
You can start with:

  • a tiny GL
  • a few lines of Python
  • and a simple idea:
    “Does this transaction look like the account it was coded to?”

That’s enough to demonstrate the concept — and enough to spark ideas for real‑world automation. This could easily be expanded to use a more complicated general ledger and to automatically search a general ledger under review for correct coding.

Do you want to learn how to automate your financial workpapers? Do you want to get started with VBA?

My book, Beginning Microsoft Excel VBA Programming for Accountants has many examples like this to teach you to use Excel to maximize your productivity! It’s available on Amazon, Apple iBooks and other eBook retailers!