How to troubleshoot any Artificial Intelligence or Machine Learning system ever. No exceptions, throw out the black box. Stop sounding confused and scaring people.

This is not a tutorial. It is commentary, and maybe a little informative. So, in summary:

  • Statistical models are used to build an analysis and decision-making framework when we don’t have a complete understanding of how a thing works

  • Artificial Intelligence is just an automated version of statistical model building

  • Tune and test your models and get a confidence level

  • If it was perfect data and a perfect model, you wouldn’t need statistics or AI...GET OVER IT OR QUIT SCIENCING

  • Don’t leave your models in training mode when you are using them for important decision making or classification tasks…

    • Doing this is like cooking but not measuring anything or tracking what you threw in the pot.

    • Use a mature model so that you don’t have to log a copy of the model after every iteration/evaluation. Save that for while you are doing development and building the model.

  • Add logging to your system so you can troubleshoot

    • Inputs

    • Outputs

    • The model has all the in-between variables and their respective weights

    • If you truly need an adaptive model, log a copy of the model. Go with God on this one though, and if you think the data logging is bad, try logging every micro change in weight to a neural net as it tunes…. But let me clear something up. YOU DON’T NEED AN ADAPTIVE MODEL. Just delete all of your training copies and settle on the best one, based on your solid understanding of statistical probability and the risk associated with (and accepted for) that model.

    • Or don’t and just call it a black box, destroy your own credibility, and scare the shit out of flat Earthers.
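Here is a minimal sketch of the logging bullets above, assuming a fixed model that can be identified by a version string. The names `predict_and_log`, `MODEL_VERSION`, and the toy weights are made up for illustration; they are not from any real system.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model_audit")

# Hypothetical fixed model: the weights do not change in production,
# so logging the version is enough to know which weights were used.
MODEL_VERSION = "weather-v1"
WEIGHTS = {"monday_temp": 0.2, "tuesday_temp": 0.8}


def predict(inputs):
    """Weighted sum of the inputs using the fixed weights."""
    return sum(WEIGHTS[name] * value for name, value in inputs.items())


def predict_and_log(inputs):
    """Run the model and record inputs, output, and model version."""
    output = predict(inputs)
    record = {"model": MODEL_VERSION, "inputs": inputs, "output": output}
    log.info(json.dumps(record, sort_keys=True))
    return record
```

With the inputs, the output, and the model version on one log line, you can replay any decision later against the exact same weights.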

First, you need a fundamental understanding of model building in general in order to troubleshoot an AI system. This stuff isn’t easy and this article isn’t comprehensive, but this will give you a high-level how-to.

1. What is a statistical model? (If you don’t care or feel this isn’t important, you can skip straight to an article on a different website because I don’t have time for you)

A model is just a function for taking data/inputs and outputting a useful bit of information. A math equation. (Monday temperature * 0.2) + (Tuesday temperature * 0.8) = Wednesday temperature. Don’t use that model; it’s total crap.

Sometimes you multiply the variables by each other, sometimes you pull in a constant like pi, or the heat loss ratio of electricity going over a wire per mile, etc. but at the end of the day it’s just data coming in, going through a calculation, and then producing an output.
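To make "data in, calculation, output" concrete, here is the toy weather equation as a function. The function name `wednesday_forecast` is my own label for it, and it is still a total crap model.

```python
def wednesday_forecast(monday_temp, tuesday_temp):
    """The toy model from the text: 20% Monday plus 80% Tuesday."""
    return (monday_temp * 0.2) + (tuesday_temp * 0.8)


# Made-up temperatures, just to show data in and a number out.
forecast = wednesday_forecast(50.0, 60.0)
print(forecast)
```

The two constants are the whole model. Everything else in model building is about picking better constants and a better equation.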

Some models are good and perfect predictors, e.g. (Math test score 1 + Math test score 2 + Math test score 3) / 3 = my final math grade score. If I have all three of those variables, I guarantee you I know my final math grade. Some are not quite so clear, or are debatable, but they work: Fuzzy tail + long ears + buck teeth = llama (obviously)

Others are really shitty.. like my weather model above, or most of the ones coming out of political analytics systems. Truman won, for the record... and if you are going to print a bunch of newspapers, you should really be more confident in your model...

For perfect models, standard computer programming already has that solved. Just write a function with a loop and some conditional statements.
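A sketch of what that looks like for the math grade example, assuming the final grade really is defined as the plain average of the test scores. The function name `final_math_grade` is hypothetical.

```python
def final_math_grade(scores):
    """Plain average of the test scores.

    A perfect predictor only under the assumption that the final
    grade is defined as exactly this average.
    """
    if not scores:
        raise ValueError("need at least one test score")
    total = 0.0
    for score in scores:
        total += score
    return total / len(scores)
```

A loop, a conditional, no statistics. If every model were like this, nobody would need AI.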

For imperfect models you have to lean on statistics and probabilistic tools. There are literally hundreds of tools in a statistician’s toolbox for manipulating, transforming, and tuning data into a usable form so that you can then generate a mathematical process to make a decision based on that data. They follow the rigor of the scientific method: they get tested, cross tested, back tested, front tested, etc. until a level of confidence is reached, and then the model is used for decision making. THIS IS CALLED THE SCIENTIFIC METHOD AND YOU TRUST EXPERTS THAT BASE THEIR KNOWLEDGE ON IT ALL THE TIME. It’s how we got people on the moon, it’s how all our drugs get past the FDA, it’s how we test scientific hypotheses… it’s all statistics and probability of model completeness.

And it’s how we build AI systems..

What are the steps to generating an AI system?

I’m sure people will rip me and this process apart, but at least I’m putting it out there, you armchair-sitting Monday morning quarterbacks.

  1. Identify what you are trying to accomplish

    1. I want to predict

    2. I want to identify

    3. I want to coordinate

    4. others I'm sure.

  2. Get relevant data - You rarely have all of the data to make a complete equation… that’s why we use statistics, and sometimes it won’t work because you don’t have good data or you can’t get it into the format you need in order to apply the methods.

    1. What sensory data do I have available for my model to use?

      1. time series data

        1. Useful for everything

      2. static records

        1. Only useful for classification tasks (imo)

  3. Define your inputs –

    1. Age, weight, gender, blood type, intelligence measure, race, religion, type of baby lotion preferred

    2. Price of tea in China in year 1, 200, 2010, 2011, 2012

    3. Close of business day customer counts

    4. Price of the S&P 500

    5. Consumer confidence rating of baby oil brands

  4. Define your outputs – If you don’t have a goal, what are you doing here man?

    1. Age of Keith Aumiller on the day after tomorrow.

    2. Price of Apple stock in two weeks

    3. Turn left or right at the stop sign

    4. Say the word “Tubular” or “Awesome”

  5. Define your methodology

    1. Logistic regression is perfect for this.. (Then don’t use Machine learning)

    2. Neural nets seem cool. (Yea, but do you really want to spend 15 hours training a neural net to predict my age the day after tomorrow by using the consumer confidence of baby oil?)

    3. Decision tree. (is this really an AI thing? I mean isn’t this just the 20 questions game I played as a kid)

    4. SVM (Just draw a line through the scatterplot and call it good)

    5. whatever.

  6. Train your model

    1. Once you have done all your pre-work, you have a structured, relevant data set, and you have picked the appropriate tool for the job, you kick off the training algorithm where the real work gets done and you get a trained model as an output.

  7. Test your model

    1. Use statistical model testing processes on your Machine learning AI system to see if it works and does what you think it does.

    2. It’s the scientific process. Cross Validate your model! Does it do what you want it to do? Is it a good indicator of what you want it to be? Did you really just prove that your data is crap and isn’t useful for doing the job you are trying to do? That is common. Don’t force a model out of crap data.

    3. YOU NOW HAVE A BLACK BOX. It is your responsibility as a scientist and expert in machine learning to turn it into an understood model by reverse engineering the weights on all of the variables, from output back to input, including every intervening variable that is used as a holding place for a calculation….

    4. Or realize it doesn’t really matter and that if it works with 99.999X% probability, then why worry about it. Most experts can’t explain why they come to a conclusion, why should your models? The Drug companies don’t need to explain the 17k people that die from pharmaceutical opioid Analgesics every year, so why are you freaking out that I can’t tell you what layer 7 node 8 of my 30 layer Neural net represents “in the real world”.

  8. Use your model

    1. This is an important differentiation. THE MODEL IS NOT CHANGING AT THIS POINT…. don’t let a model that is self tuning run in a system unless you want your teenage twitter bot to turn into an evil nazi bitch very quickly….

    2. Use a fixed model that isn’t continuing to change while in production. If you find a problem with the current one, continue to tune it and train it, and then push it to general use only after it reaches an acceptable level of confidence.
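The train and test steps above can be sketched in plain Python. This is a minimal k-fold cross validation around a least-squares line fit, not a recommendation of that model. The data is made up (y = 3x + 2 plus noise), and `fit_line` and `k_fold_mae` are names I invented for the sketch.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mae(xs, ys, k=5, seed=0):
    """Mean absolute error on each held-out fold.

    The model is refit for every fold and never sees the points
    it is scored on.
    """
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        fold_error = sum(abs(ys[i] - (slope * xs[i] + intercept))
                         for i in held_out) / len(held_out)
        errors.append(fold_error)
    return errors


# Made-up data: y = 3x + 2 plus a little uniform noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [3.0 * x + 2.0 + rng.uniform(-1.0, 1.0) for x in xs]
fold_errors = k_fold_mae(xs, ys)
print(fold_errors)
```

If the per-fold errors are small and similar to each other, the model is probably doing what you think it does. If they are huge or all over the place, you may have just proved your data is crap.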

Where is the black box and how do I get rid of it?

Mostly in your own head. Just because you don’t understand something doesn’t mean it isn’t understandable. If you take the time and work through the model, you can find out how much weight each of the variables has and how they interact to come up with any possible output. If you have a fixed model (the weights aren’t still being tuned in training mode), and you know the inputs, and you know what the output was….. DO THE MATH, trace through the model and see where it failed/how it came to that solution. If it was wrong, it is most likely because you have just hit a scenario that your model hasn’t experienced before, or one it wasn’t penalized appropriately for.
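"Do the math" looks like this for the easy case, a tiny linear model with fixed weights. The weights, the bias, and the variable names are all made up for illustration. A deep neural net needs the same arithmetic repeated layer by layer, which is more work but not more magic.

```python
# Hypothetical fixed weights for a tiny linear model (made up for illustration).
WEIGHTS = {"age": 0.5, "weight_kg": -0.1, "customer_count": 0.02}
BIAS = 1.0


def trace_prediction(inputs):
    """Return the output plus each variable's contribution to it."""
    contributions = {name: WEIGHTS[name] * inputs[name] for name in WEIGHTS}
    output = BIAS + sum(contributions.values())
    return output, contributions


output, parts = trace_prediction(
    {"age": 40.0, "weight_kg": 80.0, "customer_count": 150.0}
)

# Sort by absolute size so the variables that drove the answer come first.
for name, value in sorted(parts.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>15}: {value:+.2f}")
print(f"{'bias':>15}: {BIAS:+.2f}")
print(f"{'output':>15}: {output:+.2f}")
```

Because the weights are fixed, the same logged inputs always produce the same breakdown, so a bad output can be pinned on specific variables instead of on "the black box".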

But Keith, if you only run a finalized model then if it makes mistakes it’ll make the same mistake every time when in that scenario….. Yeah, it will. But you know what it won’t do? Turn into a crazy nazi sex bitch when I intended it to be a nice teenage girl bot.

If you leave an AI system in training mode, it will eventually turn into ‘not the model’. If you don’t have good constraints around it you are setting yourself up for failure. And if you have good constraints around it you know how it works. IT WORKS WITHIN THE PARAMETERS YOU SET SILLY!

Don’t generate unconstrained models, you goofball. Or if you do, use them as exploratory tools, and then switch them out of training mode when you are using them for actual production systems. There they will solve whatever problem you are trying to solve X percent of the times you use them, because again, they aren’t perfect. But neither are human experts.
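One way to sketch "out of training mode" in code is to make the production model object refuse to have its weights reassigned. This uses Python's frozen dataclasses. The class name `FrozenModel` and the weights are hypothetical, and this guards against accidents, not against someone determined to get around it.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenModel:
    """A production model whose weights cannot be reassigned."""
    version: str
    weights: tuple  # a tuple, so individual weights cannot be edited in place

    def predict(self, inputs):
        return sum(w * x for w, x in zip(self.weights, inputs))


model = FrozenModel(version="v1", weights=(0.2, 0.8))

try:
    model.weights = (0.5, 0.5)  # an accidental "training step" in production
    tampered = True
except FrozenInstanceError:
    tampered = False

# Retraining happens offline and produces a new, separately versioned model.
retrained = replace(model, version="v2", weights=(0.3, 0.7))
```

The production copy stays exactly what you tested. Tuning produces a new version that has to earn its way into production through the same testing.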

The good news is that if an AI system makes a mistake, we can issue an immediate death sentence via the delete key and just generate a new one...or put it back into training mode and tune it some more.

5. Thanks Keith, I now realize that mostly my problem with AI is that I am scared of new things and buzzwords that make things sound mysterious. I’ll trust the scientific method and statistical analysis from now on.
