The post Is It Possible to Predict The Lottery? appeared first on Code Highlights.

After learning about Symbolic Regression with Genetic Programming, one of the first things I thought about applying it to was the lottery. I thought, if all the algorithm needed was some data points in a time series, that would fit the lottery problem perfectly. It turns out, the whole randomness thing put a bit of a kink in my plan.

Obviously, if the lottery could be predicted accurately and consistently, it wouldn’t be around for long as the states would lose money. The national lotteries (in the United States) use a mechanical system, which makes them fairly random, and they use a format that makes the odds extremely hard to win. (As of this writing, the odds of winning the Powerball jackpot are 1 in 292,201,338.)

So then, why should I bother trying to predict it? Well, I’m not going after Powerball or Mega Millions. They are really random, and the odds are too great to eke out any kind of meaningful advantage (more on that later). Instead, I focus on the smaller state lotteries that are computer generated. Yes, if you didn’t know, many small lotteries don’t use a mechanical setup that spins the balls around. They use a computer program to randomly pick the numbers. The problem with this, if you know about software and computers, is that you can’t achieve true randomness with a computer. The best you can get is pseudo-randomness.

Now, many will argue that the pseudo-randomness is “good enough”, and they might be right, but when the odds are so much better than Powerball’s, that, I believe, might be enough of an edge to find a pattern or strategy to predict a drawing.

Before I get into my initial research and findings, let’s go over some “lotto picking theory”.

People actually spend a lot of time trying to find patterns in the lottery, or anything, really, that would allow them to predict a number or two. What has emerged is some really interesting theory on how to analyze lottery numbers.

I live in Washington state, so let’s use the Washington Lotto, which uses a Pick 6-49 format (pick 6 numbers from 1-49).

The drawing from Monday, November 14, 2016 was: **09-19-31-32-43-46**

If I asked you, what kind of properties could you extract from this set of numbers, what would you come up with? Here’s a few:

- Sum of numbers
- Width of numbers
- Odds / Evens
- Has consecutive numbers
- Area of the Convex Hull created if the numbers were placed onto 2D coordinates (for 49 numbers, a 7×7 2D grid works well).

Using just these five, we get the following:

[table id=2 /]

If you applied this to all drawings in the past three months, you’d get a time series of data points, which is the input model for symbolic regression.

Note that the Area of Convex Hull is an idea I came up with during my testing; I was not able to find anyone, anywhere, doing this, so I presume it to be a novel approach.
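As a sketch, the five properties above can be computed like this. The grid mapping for the hull (number 1 at the top-left, filling row by row across a 7×7 grid) is my own assumption; the original grid convention isn’t specified:

```python
def draw_properties(draw, grid_width=7):
    """Compute the five properties for a sorted lotto draw."""
    sum_of_line = sum(draw)
    width = max(draw) - min(draw)
    odds = sum(1 for n in draw if n % 2 == 1)
    evens = len(draw) - odds
    has_consecutive = any(b - a == 1 for a, b in zip(draw, draw[1:]))

    # Map each number onto the grid: 1 -> (0,0), 2 -> (1,0), ... 49 -> (6,6).
    points = [((n - 1) % grid_width, (n - 1) // grid_width) for n in draw]
    return sum_of_line, width, odds, evens, has_consecutive, convex_hull_area(points)

def convex_hull_area(points):
    """Andrew's monotone chain hull, then the shoelace formula for area."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    area = sum(hull[i][0] * hull[(i + 1) % len(hull)][1] -
               hull[(i + 1) % len(hull)][0] * hull[i][1] for i in range(len(hull)))
    return abs(area) / 2.0

print(draw_properties([9, 19, 31, 32, 43, 46]))
```

For the 11/14/2016 draw this gives a sum of 180, a width of 37, 4 odds / 2 evens, and a consecutive pair (31, 32); the hull area depends on the grid convention chosen.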

So, I ran my symbolic regression tool on a bunch of time series data using properties derived from my draw analysis. I used more properties than I showed above, but you get the idea. I tweaked my tool to add a prediction tester, or what is more commonly known as out-of-sample testing. Alongside the symbolic regression outputs, I used a quick random pick generator as a benchmark.

So, did symbolic regression produce functions that predicted the next data point? In short, the results did not perform much better than the random pick generator (only ~1% better).

Determined to find an answer, I looked at the lotto draw history. I wrote out the numbers by hand. I started drawing circles and lines, and shapes everywhere, trying to extrapolate some kind of pattern. It always looked like I was close, that there was something, but nothing definitive. I had definitely veered off from letting Genetic Programming find the solution to taking over and doing it myself.

I eventually moved down to the Match 4 lotto. Match 4 is a pick 4-24 format (pick 4 numbers from 1-24), with jackpot odds of 1 in 10,626. This is even better than the Washington Lotto, though the jackpot prize is $10,000 instead of $1,000,000 (base). But you have the chance to win Match 4 every day, as opposed to only 3 times a week for the Lotto.

Here I thought that, with Match 4’s much better odds, maybe the Genetic Programming algorithm had a better chance to find the function on my limited hardware (I’m running an old Athlon X6 1055T, 8.0 GB RAM). The results were the same; symbolic regression predicted no better than random guessing.

So again, I played with the numbers by hand, but then I looked at a different draw property, something I hadn’t focused on before. It’s called Hot Numbers.

The critics out there are now really shaking their heads and throwing up their hands, thinking I’m a lost cause at this point, but bear with me.

When I say hot numbers, in general, it’s the set of numbers that have been drawn in the past *n* drawings, where *n* can be anywhere from 2 to 10. The numbers that are left are considered Cold Numbers.
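As a sketch (my own illustrative helper, not the exact code used for the analysis), the hot and cold sets over the past *n* drawings:

```python
def hot_and_cold(draw_history, n, all_numbers):
    """Hot numbers: every number drawn in the past n drawings.
    Cold numbers: everything else in the game's number pool.
    draw_history is ordered oldest -> newest."""
    hot = set()
    for draw in draw_history[-n:]:
        hot.update(draw)
    cold = set(all_numbers) - hot
    return hot, cold

# Example with made-up Match 4 draws (pick 4 from 1-24):
history = [[1, 5, 9, 20], [2, 5, 17, 23], [9, 11, 17, 24]]
hot, cold = hot_and_cold(history, n=2, all_numbers=range(1, 25))
print(sorted(hot))   # numbers seen in the last two draws
```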

For Match 4, I experimented with *n* to see what was optimal for my new strategy. Ideally, the set of hot numbers is much smaller than the set of cold numbers, and computer generated lotteries tend to draw hot numbers more often than mechanical ones do (e.g. Powerball and Mega Millions). This appeared to be the slight advantage I was looking for, so using just the Hot Number property, I analyzed over a year of drawings to see how many had 0, 1, 2, 3, or 4 hot numbers, and so on. This was the result:

Unique Numbers:

Start Date: 11/21/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 6, Window Count: 70, Avg: 16.3428571428571, Max: 20, Min: 12, All: 24, {UQ/ALL}: .681
Start Date: 11/21/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 5, Window Count: 71, Avg: 14.7887323943662, Max: 18, Min: 12, All: 20, {UQ/ALL}: .7395
Start Date: 11/21/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 4, Window Count: 72, Avg: 12.8055555555556, Max: 16, Min: 10, All: 16, {UQ/ALL}: .8000
Start Date: 11/21/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 3, Window Count: 73, Avg: 10.4246575342466, Max: 12, Min: 7, All: 12, {UQ/ALL}: .868
Start Date: 11/21/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 2, Window Count: 74, Avg: 7.51351351351351, Max: 8, Min: 5, All: 8, {UQ/ALL}: .938

The data above is the result of creating a sliding window (“Window Size”) of size *n* and then determining the Max, Min, and Average of Hot Numbers. At a Window Size of 6, we see that all 24 numbers of the Match 4 set have shown up as a hot number at least once during this time frame, and the average size of the hot number set was ~16. Meaning, if I were to buy tickets for every combination of hot numbers: nCr => C(n,r) => C(16,4) = 1820 tickets. At $2 per ticket, that’s a bit too much for me to spend on a daily basis. What about *n* = 2? Even with an average of 8 hot numbers, C(8,4) = 70, so 70 * $2 = $140 a day, which is still pricey.
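The ticket-count arithmetic above is just combinations; a quick check:

```python
from math import comb

TICKET_PRICE = 2

for hot_set_size in (16, 8):
    tickets = comb(hot_set_size, 4)   # C(n, 4): all 4-number tickets from the hot set
    print(hot_set_size, tickets, tickets * TICKET_PRICE)
# C(16, 4) = 1820 tickets -> $3,640/day; C(8, 4) = 70 tickets -> $140/day
```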

The next step was to find the optimal value for *n* with respect to the average Hot Number set size.

Hot Numbers:

Start Date: 1/1/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 5, Window Count: 395, Avg: 2.39240506329114, Max: 4, Min: 0 | (Cnt:Tot:%) => (0:7:0.01772152), (1:61:0.1544304), (2:140:0.3544304), (3:144:0.364557), (4:43:0.1088608)
Start Date: 1/1/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 4, Window Count: 396, Avg: 2.02272727272727, Max: 4, Min: 0 | (Cnt:Tot:%) => (0:22:0.05555556), (1:90:0.2272727), (2:166:0.4191919), (3:93:0.2348485), (4:25:0.06313131)
Start Date: 1/1/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 3, Window Count: 397, Avg: 1.63476070528967, Max: 4, Min: 0 | (Cnt:Tot:%) => (0:46:0.115869), (1:126:0.3173803), (2:160:0.4030227), (3:57:0.1435768), (4:8:0.02015113)
Start Date: 1/1/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 2, Window Count: 398, Avg: 1.14321608040201, Max: 4, Min: 0 | (Cnt:Tot:%) => (0:98:0.2462312), (1:175:0.4396985), (2:96:0.241206), (3:28:0.07035176), (4:1:0.002512563)
Start Date: 1/1/2015 12:00:00 AM, End Date: 2/4/2016 12:00:00 AM, Window Size: 1, Window Count: 399, Avg: 0.606516290726817, Max: 3, Min: 0 | (Cnt:Tot:%) => (0:195:0.4887218), (1:168:0.4210526), (2:34:0.08521304), (3:2:0.005012531), (4:0:0)

A description of the report above:

- Window Size: This is *n*.
- Window Count: The number of drawings that occurred from Start Date to End Date.
- Avg: The average number of Hot Numbers for each drawing.
- Max: The maximum number of Hot Numbers to show up for any drawing.
- Min: The minimum number of Hot Numbers to show up for any drawing.
- (Cnt:Tot:%): This is a stat of each type of drawing based on how many hot numbers it had. So for a Window Size of 2 and drawings with 1 Hot Number, there were 175 such drawings, which represented 43.96% of all the drawings in the time frame: (Cnt:Tot:%) => (1:175:0.43969…)
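The (Cnt:Tot:%) stats can be reproduced with a sliding window; a sketch, assuming draws are ordered oldest to newest:

```python
from collections import Counter

def hot_count_distribution(draws, window_size):
    """For each drawing, count how many of its numbers were 'hot'
    (drawn within the previous window_size drawings), then tally
    how often each hot-count occurred: {hot_count: (count, fraction)}."""
    counts = Counter()
    for i in range(window_size, len(draws)):
        hot = set().union(*draws[i - window_size:i])
        counts[len(hot & set(draws[i]))] += 1
    total = sum(counts.values())
    return {k: (counts[k], counts[k] / total) for k in sorted(counts)}

# Tiny made-up example: 4-number draws from a small pool
draws = [[1, 2, 3, 4], [3, 4, 5, 6], [1, 5, 7, 8], [2, 6, 9, 10]]
print(hot_count_distribution(draws, window_size=2))
```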

Based on this data, I zeroed in on a Window Size (*n*) of 4 because there were 25 occurrences of all 4 drawn numbers being hot numbers. The average set size for *n* = 4 was about 13; that’s C(13,4) = 715 tickets * $2/ticket = $1,430. But if I could predict when those 25 four-Hot-Number drawings would occur, then it wouldn’t matter ($10,000 > $1,430).

So I charted the 4 Hot Number occurrences to try and find a pattern. There seemed to be a pairing pattern: when one 4 Hot Number draw occurred, a second one occurred within the next few days. This wasn’t consistent enough to be a predictor, though. Then there was something odd about the Sum of Line property in comparison to the previous draw. Many of the times that a 4 Hot Number draw occurred, its Sum of Line was greater than the one before it. This wasn’t 100% of the time, but it happened often enough that I thought I could test a strategy around taking the 4 Hot Number draws and filtering them down by their Sum of Line to reduce the daily costs.

At this point, my goal changed from trying to predict a jackpot, to going after profit. Could I come up with a strategy for daily play that minimized my costs, but came away with enough jackpots to turn a profit over time?

To cut this short and skip more details: I did come up with a strategy that simulated a profit from 2013 to 2016. Prior to 2013, there was a distinct change in my simulation results that always produced a loss. My only conclusion was that the lottery’s number generator changed sometime in 2013, making this strategy start working from 2013 onward.

The strategy went like this:

- Get the set of hot numbers with *n* = 4.
- Generate all ticket combinations from the set of hot numbers derived in (1).
- Find the Sum of Line value for each ticket.
- Find the Area of the Convex Hull for each ticket.
- Filter all tickets from (2) such that: Sum of Line (Min, Max) => (32, Prior Sum), and Area (Min, Max) => (2, 3.5)
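A sketch of that filter. The helper and parameter names are mine, and `area_of` stands in for the per-ticket convex-hull computation, since the exact grid mapping for 24 numbers isn’t given:

```python
from itertools import combinations

def candidate_tickets(hot_numbers, prior_sum, area_of,
                      sum_min=32, area_min=2.0, area_max=3.5):
    """Steps 2-5: generate all 4-number tickets from the hot set, then keep
    only those whose Sum of Line falls in [sum_min, prior_sum] and whose
    convex-hull area (via the caller-supplied area_of) falls in
    [area_min, area_max]."""
    tickets = []
    for ticket in combinations(sorted(hot_numbers), 4):
        line_sum = sum(ticket)
        if sum_min <= line_sum <= prior_sum and area_min <= area_of(ticket) <= area_max:
            tickets.append(ticket)
    return tickets

# Toy example: pretend every ticket has hull area 3.0
picks = candidate_tickets({3, 8, 11, 14, 20, 22}, prior_sum=50, area_of=lambda t: 3.0)
print(len(picks), picks[:3])
```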

Applying this strategy to Match 4 over the time frame 1/15/2015 – 2/5/2016 got the following results:

Total Winnings: $68,326
Total Costs: $51,478
Profit: $16,848

I tweaked the parameters and got even greater profit, but with higher costs, the best being $53,128 in profit at a cost of $94,286. I also ran this on other time frames, such as all of 2015, 2014, 2013, and 2012.

Since the strategy didn’t work prior to 2013, I decided that if the generator had changed before, it could change again, so I didn’t bother trying it. Also, having to spend $50,000 – $100,000 a year to make a profit seemed a bit crazy to me, and I wasn’t going to test it with real money to find out if it would work.

Now, if I could figure this out for the lottery, predicting the stock market should be a lot easier, right?


The post Symbolic Regression With Genetic Programming appeared first on Code Highlights.

Enter Genetic Programming.

With Genetic Programming, we can do Symbolic Regression, which is regression without a predefined model: it discovers the model from the data during the analysis. Compared to symbolic regression, the classic regression techniques are more like optimizations, since they are just trying to find the best coefficients for variables that are predetermined.

Using Genetic Programming, we first determine what mathematical functions we want to use, (e.g. +, -, *, /, sin(), cos(), ln(), etc.). Then, we determine the terminals, which are our variables and constants, (e.g. x, e, PI, {1..5}). These serve as our building blocks for constructing the GP parse trees. The Genetic Programming algorithm will discover the function that best fits the data using an error function such as root mean square as part of the fitness function.
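To make that concrete, here’s a minimal sketch of an individual as a nested parse tree, with Koza-style protected division and a root-mean-square-error fitness. The representation is illustrative, not my library’s actual one:

```python
import math

def protected_div(a, b):
    """Koza-style protected division: return 1 when dividing by zero."""
    return a / b if b != 0 else 1.0

FUNCTIONS = {'add': lambda a, b: a + b, 'sub': lambda a, b: a - b,
             'mul': lambda a, b: a * b, 'div': protected_div,
             'sin': math.sin, 'cos': math.cos}

def evaluate(tree, x):
    """A tree is either the terminal 'X', a constant, or (op, child, ...)."""
    if tree == 'X':
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, *children = tree
    return FUNCTIONS[op](*(evaluate(c, x) for c in children))

def rms_error(tree, cases):
    """Fitness: root mean square error over the fitness cases."""
    return math.sqrt(sum((evaluate(tree, x) - y) ** 2 for x, y in cases) / len(cases))

# The individual mul(X, add(X, 1)) encodes x*(x+1) = x^2 + x
tree = ('mul', 'X', ('add', 'X', 1))
cases = [(x, x * x + x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
print(rms_error(tree, cases))   # 0.0 -- a perfect fit on these cases
```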

When I was first teaching myself Genetic Programming, the symbolic regression problem is what attracted me the most and got me to understand how GP worked. Koza’s book, “Genetic Programming: On the Programming of Computers by Means of Natural Selection”, was a great first book and is widely referenced on this subject.

Let’s look at an example of symbolic regression. I’m going to use my Genetic Programming library and a symbolic regression app I created on top of it.

The function we’ll try to solve for will be the one from Koza’s first book on GP:

x^4 + x^3 + x^2 + x

For the parameters I’ll use the following:

- Function Set: +,-,*,/,sin,cos,ln
- Terminal Set: X
- Generations: 150
- Population Size: 500
- Crossover rate: 90%
- Mutation rate: 10%
- Reproduction rate: 10%
- Fitness cases: 20 data points from the target function, evenly sampled from the interval [-1,1].
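The 20 fitness cases can be generated like so (evenly spaced sampling over [-1, 1] is stated above; the helper itself is mine):

```python
def target(x):
    """Koza's first symbolic regression target function."""
    return x**4 + x**3 + x**2 + x

# 20 evenly spaced sample points across [-1, 1]
xs = [-1.0 + 2.0 * i / 19 for i in range(20)]
fitness_cases = [(x, target(x)) for x in xs]
print(fitness_cases[0], fitness_cases[-1])
```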

Here’s a log output of the first generation:

New Best Individual. Gen (0) : add (add (mul (X, mul (X, X)), X), div (Sin (sub (X, X)), div (Ln (X), div (X, X) ))) Fitness:(Hits:Trials) : (4:20), Adjusted Fitness: 0.663302986469959, Standard Fitness: 0.50760666

This actually simplifies to (X^3) + X, since sin(X − X) = sin(0) = 0, which zeroes out the entire div(…) term.

At generation 39, we get our best of run individual, hitting all 20 fitness cases with a very small overall error.

Best Individual. Gen (39) : add (add (add (mul (Sin (Sin (Ln (X))), mul (X, X)), X), div (Sin (sub (X, X)), div (X, sub (mul (X, mul (Sin (X), mul (X, X))), div (div (add (Sin (X), mul (X, Exp (X))), div (Cos (X), Exp (X))), Sin (Cos (Exp (X)))))))), mul (X, mul (add (mul (Sin (Ln (X)), mul (X, X)), X), X))) Fitness:(Hits:Trials) : (20:20), Adjusted Fitness: 0.999934421216811, Standard Fitness: 6.55830840476038E-05

Is this a perfect solution? No, but it’s pretty close, and we didn’t tell the algorithm what the function looked like beforehand. We fed it only some sample data of the function’s output, and it determined on its own a function that approximately matches that data.

If we ran the algorithm a few times, it is possible for it to find the 100% correct solution, but since the algorithm uses randomness, the ideal solution is never guaranteed (and that’s OK!).

What if we had data with no known source function? Having an approximate solution at 99%+ correctness sounds great.


The post Will Robots Take Over My Job? appeared first on Code Highlights.

Today, we have digital assistants (Siri, Cortana, Google Now, Amazon Echo), driver-assist systems (Adaptive Cruise Control, Blind Spot Monitors), search engines (Google, Bing), and video game opponents, among others.

There are even less noticeable uses of machine learning that people benefit from everyday. When you shop online at big retailers such as Amazon and Wal-Mart, they have recommendations of products for you to buy. This is done through machine learning.

Spam filters in your email inbox are driven by machine learning. In the early days of email, it was much easier to spot spam and phishing emails, but as spam filters progressed, so did the sophistication of malicious emails, and the volume sent. Automation was the only practical way to handle the problem. Now, top free email providers like Google and Microsoft are trying to improve the email experience by determining which email is important to you and filing away the “noise” in another folder.

I’ve shown a steady progression of Machine Learning and A.I. over time. In the present (2016), we’ve reached a point where A.I. can start taking over some of the more menial or repetitive tasks in people’s jobs. Let’s look at some examples:

- Legal Research – Lawyers often need to do research on the law to help them with their work, such as finding a legal precedent that can be used in trial. To a computer, the law is just words, and combined with A.I., it can be interpreted for meaning and comparison. IBM’s Watson, just this year, offered legal research as a service.
- Taxi Driver / Chauffeur – The biggest news in A.I. in the last couple years is self-driving cars. Google made headlines with its self-driving cars, and now many other companies are joining the race to commercialize the technology. There are self-driving cars on the road right now, but they still have operators behind the wheel just in case.
- Secretaries / Personal Assistants – The digital personal assistants are the first step in automating the tasks that personal assistants do.
- Factory Workers – This has been happening for decades now. If you look back at the factories for the first cars, hundreds of people were standing shoulder to shoulder, each person handling one small, specific task. Over time, robots have replaced them, with people overseeing the process. There are still many tasks that don’t have a robotic solution yet, so people are still needed, but it’s only a matter of time until a robot is made for them.

In the short term, we’re going to see a continuation of specialized A.I. systems making their way into the workplace in the form of specialized software running on a computer, like what we see today. Think of a doctor consulting a computer with its own reasoning to get a second opinion on a diagnosis for one of their patients. The doctor could compare the intelligent system’s diagnosis with his or her own before giving a final report to the patient.

A good reason why A.I. will remain a companion instead of replacing the person is hardware. You can’t ask the computer to walk down the hallway and grab a cup of coffee for you. Only recently have we seen robotic servers in select restaurants in Asia, and they are basically line-following robots with a camera and microphone. We are a long way from the likes of Data from Star Trek: TNG.

Still, some jobs can be replaced in the short term, such as taxi drivers, because the physical challenge is not a fundamental leap: we already have cars, we just need to retrofit them with more advanced sensors and computers with smart A.I. We don’t need a robot to sit in the driver seat.

If your job does not require being a people person or building anything in the physical realm (e.g. construction worker), it is also a prime target for replacement, as the other major hurdle for A.I. is social interaction. Yes, there are chat bots today, but I’m talking about face-to-face contact, such as what therapists and doctors provide. This is not easily replaceable, and even if the technology existed to build a fully human-like robot, if the patient knew, they might still feel uncomfortable trusting it. There is something to be said for the human touch.

Eventually, in the distant future, we will have very human-like machines (think AMC’s Humans, as opposed to the Terminator), and the ramifications of a generally intelligent machine will be a fundamental change in our culture and economy. Would you trust a machine to be your police officer or politician, even if it could do the job without bias or error?

How would people make money to live if all jobs could be done by a machine? That is worthy of another post, but the concept of universal basic income would have to be part of the solution.


The post Genetic Programming: The Automatic Invention Machine appeared first on Code Highlights.

Genetic programming isn’t talked about as much as other artificial intelligence techniques for a few reasons, and I’ll give my thoughts on that later, but it is very useful at tackling search and optimization problems from wide angles. The algorithm will come up with solutions that you may never have thought of, and that’s what’s most exciting about it.

Before I give an example of what Genetic Programming can do, let’s first cover some basics on how it works.

- Create an initial population of individuals.
- Evaluate the fitness of each individual with a numerical value (whether higher is better or worse depends on the problem being solved).
- “Evolve” the next generation of individuals by selecting individuals of the current population by fitness (there are many types of fitness-based selection methods) and applying the genetic operations to them.
- The resulting individual(s) are then put into the next generation.
- When the next generation reaches the desired population size, it replaces the current population.
- Repeat 2. to 5. until an individual with the desired fitness value is found, or some other exit criterion is met (max generations, elapsed time, fitness convergence in the population, etc.).
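The steps above can be sketched as a generic loop. Selection, crossover, and mutation are left as pluggable functions since their details vary by problem; the toy usage below (evolving integers toward a target value) is purely illustrative:

```python
import random

def evolve(init_population, fitness, select, crossover, mutate,
           pop_size=500, max_generations=50, crossover_rate=0.9):
    """Generic generational loop following steps 1-6."""
    population = init_population(pop_size)                        # step 1
    for gen in range(max_generations):
        scored = [(fitness(ind), ind) for ind in population]      # step 2
        best_fit, best = max(scored, key=lambda t: t[0])
        if best_fit == 1.0:                                       # exit criterion
            return best
        next_gen = []
        while len(next_gen) < pop_size:                           # steps 3-5
            if random.random() < crossover_rate:
                child = crossover(select(scored), select(scored))
            else:
                child = mutate(select(scored))
            next_gen.append(child)
        population = next_gen                                     # step 6: repeat
    return max((fitness(ind), ind) for ind in population)[1]

# Toy usage: "individuals" are integers, evolved toward the value 42
best = evolve(
    init_population=lambda n: [random.randint(0, 100) for _ in range(n)],
    fitness=lambda x: 1.0 / (1.0 + abs(x - 42)),
    select=lambda scored: max(random.sample(scored, 3), key=lambda t: t[0])[1],
    crossover=lambda a, b: (a + b) // 2,
    mutate=lambda a: a + random.randint(-2, 2),
)
print(best)
```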

As you can see from step 6, this runs in a loop, searching for the best solution. You could parallelize this algorithm very easily, either by splitting up the fitness evaluations or by splitting the population into demes (islands).

Some definitions from the steps above:

*Population*: The set of individuals the algorithm processes.

*Individual*: A single solution to the problem, and is typically in the form of a tree data structure.

*Generation*: A single iteration of the algorithm. This involves: {Fitness evaluation -> Selection -> Breeding}.

*Fitness*: Like in nature, individuals are more or less “fit” based on their environment. In Genetic Programming, the problem and its fitness function determine how fit an individual is, which is usually some quantitative value used for comparison purposes.

In basic Genetic Programming, individuals are expressed as Parse Trees. This allows the individual to really evolve each generation without being constrained by a fixed shape or size, like an Array a la Genetic Algorithms. The individual’s tree will change to meet the fitness pressure of the algorithm.

More advanced techniques will add more functionality to the tree, such as ADFs (Automatically Defined Functions), or, with architecture-altering routines, additional trees and data structures alongside the core tree (also known as the result-producing branch).

There are many types of problems that Genetic Programming can be applied to, but it really shines with search and optimization problems. Problems that are not well understood, or where no good solution is known to exist, are good examples of where GP has been applied and has succeeded. Because GP is based on randomness, the algorithm will find solutions that a human may never have thought of.

Here are some examples of problems that Genetic Programming can be used for:

- Image processing
- Financial modeling
- Time series and regression analysis
- Industrial Process Control
- Computational Chemistry and drug modeling

John Koza listed in his books on Genetic Programming several examples of human-competitive results produced by Genetic Programming. I’ll refrain from listing them all, but a few examples are:

- Creation of a sorting network for seven items using only 16 steps.
- Creation of a competitive soccer-playing program for the RoboCup 1997 competition.
- An evolved antenna that was deployed on NASA’s Space Technology 5 mission.

I have written my own Genetic Programming library and applied it to a few problems such as: n-bit multiplexer, n-bit even/odd parity, linear regression, stock trading strategies. I will likely post future articles that go deeper on some of these problems, but for now, I’ll show the n-bit parity problem.

A Boolean parity function is a function that returns 1 if the input vector has an odd number of 1’s; this makes it, more specifically, an “odd” parity function. If it returned 1 for an even number of 1’s in the input vector, it would be an “even” parity function. See Wikipedia for more details: https://en.wikipedia.org/wiki/Parity_function

I chose 6 bits for the input vector and “odd” as the type of parity, so I’ll evolve a 6-bit odd parity function. The types of logic gates I’ll use are: AND, OR, NOT, NAND, XOR. I threw in NAND and XOR even though they’re not really needed when you have the first three, to show that Genetic Programming can evolve a solution with the mix.
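The 64 fitness cases for the 6-bit odd parity problem are just the truth table; a sketch:

```python
from itertools import product

def odd_parity(bits):
    """Return 1 if the input vector contains an odd number of 1s."""
    return sum(bits) % 2

# All 2^6 = 64 fitness cases for the 6-bit problem (D0..D5 -> expected output)
fitness_cases = [(bits, odd_parity(bits)) for bits in product((0, 1), repeat=6)]
print(len(fitness_cases))                      # 64
print(fitness_cases[0], fitness_cases[-1])
```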

Here’s a screen clipping of my console output.

In the output above, you’ll see two different solutions shown. The top solution appears after the text “New Best Individual. Gen(35)”. The program found a new best solution, based on fitness, at the then-current generation 35. D0..D5 are the 6 input bits.

After the individual’s expression tree is printed, it shows “Fitness:(Hits:Trials) : (63:64), Adjusted Fitness: 0.5, Standard Fitness: 1”. What this means is: there are 64 total fitness cases for the 6-bit parity problem, and this individual passed 63 of them. Standard fitness is often the error, so 64 − 63 = 1. Adjusted Fitness is a simple formula based on standard fitness:

f(x) = 1 / (1 + g(x)), where f(x) is the Adjusted Fitness and g(x) is the Standardized Fitness, so 1 / (1 + 1) = 0.5.

The output goes on to say “Ideal individual found”. This is what I like to see: a 100% fit solution. The solution passed all 64 fitness cases, and based on the expression tree, it used all five of the logic gates I allowed it to use.

I recall from my college days that you could build any Boolean logic circuit out of only NAND gates. Let’s see if GP can find a solution:

I let the program run to 500 generations, but it only found an approximate solution (61 out of 64 test cases). I’m sure if I let it run longer it would have found a 100% correct solution, but as you can see from the clip above, using only NAND gates makes the solution much larger and more difficult to find.

Let me know what types of problems you would like to see tackled with Genetic Programming in the comments below.

http://www.genetic-programming.com/GPEM2010article.pdf

http://dces.essex.ac.uk/staff/rpoli/gp-field-guide/A_Field_Guide_to_Genetic_Programming.pdf


The post Welcome To Code Jostle! appeared first on Code Highlights.

- A.I and more specifically machine learning. I am very interested in machine learning. It’s a broad field, but I would focus more on Evolutionary Programming (Genetic Programming & Genetic Algorithms), Neural Networks, and statistics as it applies to machine learning.

- .Net and Cloud Development. I do this for a living as a Microsoft Software Engineer, and I love what I do. This is also still broad, but I’ll be thinking about how I can focus it more.

It’s likely I’ll mingle the two, as utilizing the cloud to run your services is the trend in the industry. I’d love to share my experiences as a developer on the inside, helping to make the cloud better, and on the outside, using it for my own projects.

