Posts

  • Playing a toy poker game with Reinforcement Learning

    Reinforcement learning (RL) has had some high-profile successes lately, e.g. AlphaGo, but the basic ideas are fairly straightforward. Let’s try RL on our favorite toy problem: the heads-up no limit shove/fold game. This is a pedagogical post rather than a research write-up, so we’ll develop all of the ideas (and code!) more or less from scratch. Follow along in a Python3 Jupyter notebook! A minimal sketch of the kind of learning loop involved appears after this excerpt.

    (more)
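
    As a taste of what the post builds toward, here is a minimal, self-contained sketch of tabular action-value learning applied to the small blind’s shove/fold decision. The hand list and payoff model below are toy assumptions made up for this illustration, not the notebook’s actual code.

    ```python
    import random
    from collections import defaultdict

    ACTIONS = ("shove", "fold")
    HANDS = ["AA", "KQs", "72o"]  # tiny stand-in for all 169 hand classes

    def sample_payoff(hand, action):
        # Stand-in environment: the real game would deal out a shove/call
        # and return the small blind's chip result. Numbers are made up.
        if action == "fold":
            return -0.5  # forfeit the small blind
        edge = {"AA": 2.0, "KQs": 0.3, "72o": -0.8}[hand]
        return random.gauss(edge, 2.0)

    # Epsilon-greedy tabular learning of Q(hand, action).
    Q = defaultdict(float)
    alpha, epsilon = 0.02, 0.1
    for _ in range(100000):
        hand = random.choice(HANDS)
        if random.random() < epsilon:
            action = random.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[(hand, a)])  # exploit
        reward = sample_payoff(hand, action)
        Q[(hand, action)] += alpha * (reward - Q[(hand, action)])

    for hand in HANDS:
        print(hand, max(ACTIONS, key=lambda a: Q[(hand, a)]))
    ```
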
  • Games, Strategies, and GTO Strategies

    This is Part 1 of 6 of an adaptation of my chapter “Game Theory Optimal Strategies: What Are They Good For?” from Excelling at No-Limit Hold’em, edited by Jonathan Little.

    Much of the reason I wrote Expert Heads Up NLHE was to explain the ideas of game theory, poorly understood in the community at the time, to the average poker player. Heads up no limit (HUNL) is personally my game of choice, so it made sense to use it as the primary example. However, HUNL is something of a simple case, and there’s a bit more to be said about how game theory applies to other games. In this chapter, I’ll give a quick introduction to game theory as it applies to a variety of common poker formats. We’ll see when it’s useful and, more importantly, when it’s not: when it’s appropriate to use game theory-inspired strategies, and when they just can’t really guide our play. I promise to cover a practical skill or two as well.

    (more)
  • Solving the Shove/fold Game with TensorFlow

    Google recently open-sourced TensorFlow (website, whitepaper), a software package primarily meant for training neural networks. However, neural nets come in all shapes and sizes, so TF is fairly general. Essentially, you can write down some expression in terms of vectors, matrices, and other tensors, and then tell TF to minimize it.

    I ran through a couple of their very well-written tutorials and then decided to try it out on one of my standard toy problems: the HUNL shove/fold game. A minimal example of the define-and-minimize workflow appears after this excerpt.

    (more)
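
    Here is a minimal, hedged example of the workflow described above, written against the TF 1.x-style API that was current at the time: write down an expression, then tell TF to minimize it. The toy objective is mine for illustration; the post itself tackles the shove/fold game.

    ```python
    import tensorflow as tf  # TF 1.x-style API

    # Write down an expression to minimize: (x - 3)^2.
    x = tf.Variable(0.0)
    loss = tf.square(x - 3.0)

    # Ask TF for a gradient-descent step on that expression.
    train_step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(train_step)
        print(sess.run(x))  # approaches 3.0
    ```
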
  • Running it up, Part 3

    We doubled up twice – time for round 3!

    (more)
  • Running it up, Part 2

    We doubled the roll once; can we do it again?

    (more)
  • Running it up, Part 1

    Tonight, Carbon was down, but Black Chip Poker gave me a few dollars to play with, so I’m going to try to run it up.

    (more)
  • Value categories in C++11

    One of the most important additions to C++ in the C++11 standard was the introduction of movable types. This feature has consequences for many common programming tasks such as assigning variables and passing arguments to or returning objects from a function. Move semantics are a bit subtle, and when reading documentation, it helps to understand some vocabulary: value categories.

    (more)
  • EDVis v1.1

    Changes from v1.0 to v1.1:

    - Control fractions of individual hand combos
    - View and set fractions of hands of a particular suit
    - Account for card removal effects when drawing the distributions

    (more)
  • Debugging

    Pretty much any nontrivial piece of software will have bugs during development. Fixing bugs is thus an unavoidable part of programming, and it’s important that all programmers have some skill at the task. I recently made a video series about developing some poker-related software. The focus was on the problem domain, but much of the audience was new to programming, and I didn’t talk much about what to do when things don’t go perfectly, i.e. when there are bugs. So, this post is a quick intro to debugging methodology in general, though I have my poker audience in mind.

    (more)
  • Arbitrarily-deep nested loops

    I finished a first pass at my lattice regression library over the weekend. The idea is pretty straightforward: there’s some function we want to model, and it’s unknown, but we have a bunch of observations of inputs and corresponding outputs. So, we throw down a lattice (i.e. a regularly-spaced grid) of points over the space of inputs, and we use the data to “learn” values of the function at the lattice points. Then, we can discard the training data but still predict new values of the function by interpolating between the values at the lattice points. For more details, see, e.g., this paper.

    Code-wise, one challenge of the project was representing and dealing with the lattice. For example, suppose the function we want to model has 4 inputs. Then, our learned values on a grid over the space of inputs might naturally be stored in something like a 4-D array. A sketch of how to visit every point of such a lattice without hard-coding the loop depth appears after this excerpt.

    (more)
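
    To make the “arbitrarily-deep nested loops” problem concrete, here is one standard way to visit every point of such a lattice without hard-coding the loop depth, using itertools.product. The sizes and fill values are placeholders, not the library’s actual representation.

    ```python
    import itertools

    import numpy as np

    # Placeholder lattice: 5 points per axis over a 4-D input space.
    sizes = (5, 5, 5, 5)
    values = np.zeros(sizes)  # learned function values at lattice points

    # One loop stands in for four hard-coded nested loops, and it works
    # unchanged for any number of dimensions.
    for idx in itertools.product(*(range(n) for n in sizes)):
        values[idx] = float(sum(idx))  # stand-in for a "learned" value
    ```
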
  • Eclipse: Computing Git status for repository LatticeRegression

    I’ve gotten a lot of value out of Eclipse CDT over the years, but I wish it were less buggy. And the UI could be better. Anyway, today I open my laptop (on an airplane), start a new C++ project, and soon notice (thanks to a battery indicator reading under 2 hours of time left) that Eclipse is using 350% of my CPU. I check the Progress tab and see that Eclipse is “Computing Git status for repository LatticeRegression”.

    (more)
  • Job fairs and SWE interviews

    I have a different perspective on the job search thing now that I’ve successfully done it once and seen things from the other side. I manned a booth at my alma mater’s job fair recently and didn’t think most students asked the right questions. Ideally, almost all of a job fair conversation should consist of the student telling me what he’s good at and passionate about in as straightforward a way as possible (there’s no need to be modest or subtle). Reading resumes is mind-numbing work; if I have to use a lot of imagination to see you as a successful candidate, you’re likely to be disappointed. If you do ask questions and I do the talking, you might as well take the opportunity to get the most valuable information you can.

    (more)

subscribe via RSS