Tuesday, December 17, 2013

Nent

Nent (Source)

Nent is an asteroids-esque game with upgrades and metaball fluid effects. It is also my submission to Ludum Dare 28, a 48-hour game jam competition for which the theme was 'You only get one'. This is my first time participating in #LD48, and it has been quite the experience. To get a sense of how it felt, here is my GitHub punch-card showing my commits over the 48-hour period:
The most difficult part of creating the game was not coding, but actually coming up with a decent idea in the first place. The theme ('You only get one') is pretty vague and open to interpretation, and there were many ways to go about it (which makes coming up with an idea that much harder). In the end, after thinking about it for over 2 hours, I finally decided to just make something that looked decent and worry about mechanics and theme later.

Before I get into some of the interesting technical bits, I want to go over the tools that I used for creating this game. Instead of my normal web IDE, I opted to use LightTable which greatly increased coding productivity through its JavaScript injection magic. For image editing (the power-ups) I used GIMP which is my main tool for any sort of image manipulation/creation. And lastly I used LMMS (Linux MultiMedia Studio), a free music composition tool, to create the music for the game. This was my first real attempt at creating digital music and I found it both enlightening and horribly difficult. For anyone looking to try out LMMS I recommend this video tutorial to get started.

Now I'm going to go ahead and dive right into the most interesting aspect of the game: the liquid effect when objects collide. This is done using the same technique as metaballs. Metaballs (demo) work by combining gradients and then sampling the result. For example, here is what the game looks like before processing:
Ignoring the background color for a second, the gradients are created by changing the alpha values (opacity) of the color as they spread out. This is done using HTML5 canvas gradients:
var grad = ctx.createRadialGradient(x, y, 1, x, y, size)
grad.addColorStop(0, color.alpha(0).rgbaString())
grad.addColorStop(0.4, color.alpha(1).rgbaString())
grad.addColorStop(1, color.alpha(0).rgbaString())
ctx.fillStyle = grad

Now, we iterate over every pixel and determine if the alpha value for that pixel is above a 'threshold'. Remember that when we have overlapping gradients, their alpha values will sum. This gives the following effect:
However, what we soon notice is that the CPU is getting pegged at 100% (one core). This is because the loop scales with the number of pixels, so as the canvas gets larger each pass takes proportionally longer. Here is the original code (used in the demo):
var imageData = ctx.getImageData(0, 0, width, height);
var pix = imageData.data;

for (var i = 0, n = pix.length; i < n; i += 4) {
  if (pix[i+3] < threshold) {
    pix[i+3] /= 6;
    if (pix[i+3] > threshold/4) {
      pix[i+3] = 0;
    }
  }
}
ctx.putImageData(imageData, 0, 0);

As pix.length increases, it takes much longer to get through the loop, eventually reaching the point where the computation no longer fits in the 16ms budget required for 60 FPS. Luckily, I came up with a solution. If you remember my slideshow project, where I animated 1.75 million particles in real-time, I was able to leverage GPU shaders to greatly improve rendering performance. I am going to do the same here, using a library called Glsl.js (https://github.com/gre/glsl.js). This library greatly simplifies the process of using GLSL (OpenGL Shading Language) shaders and applying them to the canvas that I am already using (no need to rewrite the drawing code in WebGL).

GAME.glsl = Glsl({
  canvas: GAME.outCanv,
  fragment: $('#fragment').text(),
  variables: {
    canv: GAME.canv
  },
  update: function(time, delta) {
    animate(time)
    this.sync('canv')
  }
})

And now the shader code, which replaces the 'for' loop over the pixels:
<script id="fragment" type="x-shader/x-fragment">
    precision mediump float;
    uniform vec2 resolution;
    uniform sampler2D canv;

    void main (void) {
      vec2 p = gl_FragCoord.xy / resolution.xy;
      vec4 col = texture2D(canv, p);
      if(col.a < 0.85) {
          col.a /= 4.0;
          if(col.a > threshold/4.0) {
            col.a = 0.0;
          }
      }
      gl_FragColor = col;

    }
</script>

Now let's see what it looks like:
Oh no, that doesn't look right. Something odd is going on here. I'll go ahead and skip my adventure into blending land and get right to the solution. The canvas, by default, uses composite blending, which means it absorbs any colors underneath the canvas into the final canvas colors. Our web page has a background with alpha 1.0, which causes every pixel to register in our metaball filter. To avoid this, we must modify the Glsl.js library and add the following piece of code to change the default blending behavior (line 602, in the load() function):
gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);
gl.enable(gl.BLEND);

And that's it! Now our game will render properly.
In addition to utilizing shaders, this game also uses fixed-interval time-step physics, which you can read more about in my post on The Pond.

Looking to optimize the game further (for mobile devices, for example), there is still a pretty large bottleneck: the canvas-to-WebGL texture conversion each frame.
This could be solved by moving the entire game drawing process into the GPU shaders and forgoing the canvas altogether. However, the task is non-trivial, and I did not have enough time during the 48 hours to attempt it.
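
To see why it's expensive, consider what has to happen every frame: the 2D canvas pixels must be re-uploaded to the GPU as a texture before the shader can sample them. Roughly speaking, the sync boils down to something like this (an illustrative sketch of a canvas-to-texture upload, not Glsl.js's actual code; `canvTexture` is a placeholder name):

// re-upload the 2D canvas as a WebGL texture (once per frame)
gl.bindTexture(gl.TEXTURE_2D, canvTexture)
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, GAME.canv)

Shuffling width × height × 4 bytes across to the GPU every frame is the cost that a pure-shader pipeline would avoid.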

Participating in Ludum Dare was awesome, and I look forward to doing it again soon.

Tuesday, October 15, 2013

The Pond

Update:
Read my guest post on Mozilla Hacks

Check out The Pond (Source - GPL) on:

Google Play | Chrome Web Store | Amazon App Store | Firefox Marketplace

also on clay.io, FB (http), Pokki

Tools used: LightTable IDE, CocoonJS

The Pond - because quality trumps quantity.

Before I go into some of the highlights of The Pond in code, I would like to give credit to both LightTable and CocoonJS for making the experience so enjoyable and awesome.

LightTable is a JavaScript IDE which started out as a concept video and turned into a successful Kickstarter. Its most amazing feature is code injection: as you edit JavaScript, you can update the JavaScript interpreted by V8 in the browser in real-time (video), which is an extremely powerful tool when developing a game, where full page reloads can be cumbersome.

CocoonJS is a platform for deploying HTML5 canvas games to mobile. It implements the JavaScript canvas API in OpenGL for faster performance on mobile, and it also makes deploying the application to many platforms extremely simple (just tick a few check-boxes and you get a compiled version of the app for 4 different platforms).

One of the most important features of The Pond is that it's completely fluid and dynamic. No matter what screen size, be it a full screen desktop or low resolution mobile, The Pond dynamically adjusts itself flawlessly. Additionally, on mobile it will automatically reduce render quality if it detects a lag in frame rate. It is able to accomplish these things in part because it uses a proper game loop (from this amazing article on game loops):

var MS_PER_UPDATE = 18
var lag = 0.0
var previousTime = 0.0

// main game loop
function draw(time) {
  requestAnimFrame(draw)
  lag += time - previousTime
  previousTime = time

  var MAX_CYCLES = 18
  while(lag >= MS_PER_UPDATE && MAX_CYCLES) {
   
    // user input, movement, and animation calculations
    physics()
    lag -= MS_PER_UPDATE
    MAX_CYCLES--
  }

  // if we exhausted our cycles, the client must be lagging
  if(MAX_CYCLES === 0) {

    // adaptive quality
    lowerQuality()
  }
  
  // if we've fallen hopelessly behind (75+ updates), jump ahead
  if(lag/MS_PER_UPDATE > 75) {
    lag = 0.0
  }

  // draw to canvas
  paint()
}

The key to this game loop is that it uses a fixed-interval physics time-step and renders to the screen whenever possible. All physics calculations run at a constant, predictable rate, so a fast computer sees the same physics as a slower computer (or, in this case, a mobile device). Then, once the physics has caught up (one update every MS_PER_UPDATE milliseconds), the actual screen drawing is done. A slower computer will simply have a lower frame rate for paint updates (30fps vs 60fps, for example), which makes sense: the game should lag gracefully rather than break due to physics time-step differences.

Debug Mode
One of the most difficult challenges with this project was dealing with collisions. Unlike other games where a box or a circle can model an object mostly accurately, I needed a way to detect collisions between two irregular objects efficiently. Originally I thought about color-based pixel testing (really slow), and also doing Bézier curve collision calculations (extremely difficult and computationally expensive). What I ended up doing was hacking it: I fit 6 circles inside each body and run circle-to-circle tests between them to approximate whole-body collision.


Fish.prototype.collide = function (fish) {

  // there are 6 circles that make up the collision box of each fish
  // check if they collide
  var c1, c2
  for (var i=-1, l = this.circles.length; ++i<l;) {
    c1 = this.circles[i]

    for (var j=-1, n = fish.circles.length; ++j < n;) {
      c2 = fish.circles[j]

      // check if they touch
      if(distance(c1, c2) <= c2.r + c1.r) {
        return true
      }
    }
  }
  return false
}
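
The distance() helper used above isn't shown in the snippet; it's just the straight-line distance between two circle centers (a minimal sketch, assuming each circle is an {x, y, r} object):

// Euclidean distance between two circle centers
function distance(a, b) {
  var dx = a.x - b.x
  var dy = a.y - b.y
  return Math.sqrt(dx * dx + dy * dy)
}

A common micro-optimization is to compare squared distances and skip the Math.sqrt, but with at most 36 circle pairs per check it hardly matters here.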

Another challenge I faced was dealing with rendering performance on mobile. The most expensive part of the whole painting operation (which was the bottleneck) was drawing the curves for each fish every frame. Normally, most applications use a sprite sheet to handle character animation (see Senshi); however, The Pond has many dynamic elements (color, and shape based on rotation speed) which make using a sprite sheet extremely difficult. So instead of using a sprite sheet, I draw each fish as a combination of Bézier curves.

Fish.prototype.drawBody = function() {
  var fish = this
  var size = this.size
  var ctx = this.ctx
  var curv = this.curv
  var o = this.oscillation
  ctx.strokeStyle = fish.bodyOutline
  ctx.lineWidth = 4

  for(var i = -1; i < 2; i+=2){
    var start = {
      x: size,
      y: 0
    }
    var c1 = {
      x: size * (14/15),
      y: i*size + size/30*o + curv/3
    }
    var c2 = {
      x: -size/2,
      y: i*size + size/30*o + curv/2
    }
    var end = {
      x: -size*2,
      y: i*size/3 + size/15*o + curv
    }
    ctx.moveTo(start.x, start.y)
    ctx.bezierCurveTo(c1.x, c1.y, c2.x, c2.y, end.x, end.y)
    var c3 = {
      x: -size * 2.5,
      y: i*size/6 + size/10*o + curv
    }
    var c4 = {
      x: -size*3,
      y: i*size/4 - size/15*o + curv/2
    }
    var end2 = {
      x: -size*3,
      y: -size/15*o + curv/3
    }
    ctx.bezierCurveTo(c3.x, c3.y, c4.x, c4.y, end2.x, end2.y)
  }
  ctx.stroke()

}

Now, this code could be optimized slightly by removing the object literals being created ({} allocates a new object each time); however, based on testing, the biggest performance culprit is the bezierCurveTo() call, and having clean code takes priority over a micro-optimization. `this.oscillation` is based on a sine wave, and `this.curv` is based on the distance to the current rotation target. Overall, I was quite pleased with the rendering performance of the app. For more details, check out the commit log on GitHub, where you can find the commits that made the biggest performance improvements.
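
For context, here is a rough sketch of how those two values could be driven (illustrative only; the method name and constants are assumptions, and the real update happens in the physics step):

Fish.prototype.updateWiggle = function (time) {
  // swings smoothly between -1 and 1, giving the tail its wave motion
  this.oscillation = Math.sin(time / 100)
  // bends the body toward where the fish is turning, in proportion to
  // how far it still has to rotate
  this.curv = (this.targetDir - this.dir) * this.size / 4
}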

Lastly, I had a good bit of trouble figuring out how to get the fish to turn toward a target direction in the most efficient manner (take the shortest path around the unit circle). I eventually came up with this:

function directionTowards(a, b) {
  return Math.atan2(a.y-b.y, a.x-b.x)
}    
    
var dir = this.dir
var targetDir = directionTowards(MOUSE, this)
var arcSpeed = this.arcSpeed

// should it turn left or right? (based on shortest distance)
var arcDirection = 1

// if the arc distance is greater than 180deg, turn the other way
if(Math.abs(dir-targetDir)>Math.PI) {
  arcDirection = -1
}

// prevent over-turning 
var moveDistance = Math.min(arcSpeed, Math.abs(dir-targetDir))

// do the actual rotation
this.dir += (dir > targetDir ? -1 : 1) * arcDirection * moveDistance

// normalize dir within range( -PI < dir < PI )
if(this.dir>Math.PI) {
  this.dir = this.dir - Math.PI*2
} else if(this.dir<-Math.PI) {
  this.dir = this.dir + Math.PI*2
}

Bonus: I tried to integrate a metaballs effect into the game, but it didn't work out. Check out this jsfiddle for a great example of metaballs (blog post).

Thursday, September 26, 2013

Retin.us - Naive Bayes Machine Learning Basics


Retin.us - My Personal RSS Reader (source)
It's been a while since my last update on Retinus (original post). Not much has changed since then; the codebase has been quite robust and hasn't needed any maintenance. However, I did promise in the previous post that I would go over how I built it, so instead I figured this would be a good time to go over a feature I added recently: a predicted article interest notification system using Naive Bayes classification.

Here is a close-up of what it looks like:
Green means that I will probably be interested in the article,
and red means that I will probably not be interested. Simple.

I actually had a similar system in the form of a chrome extension for Google Reader (back when that was still a thing). The way it works is that it analyzes patterns in my history: specifically, it looks at the title, summary, and target links of articles which I have clicked on in the past (indicating interest), and it compares new articles to them for similarity. If there seems to be a correlation between the current article and what I clicked in the past, the classifier (Naive Bayes) returns a positive result, and I mark the article as interesting.

Now, before I go on, I should note that this feature is currently turned on for everyone, however it classifies based only on my history. The reasons for this are:
  1. I am the only heavy user of my reader, which means I am the only one with enough click history to make accurate predictions for.
  2. Computing and storing predictions is expensive (storage, memory, and computation), and I simply don't have the resources to maintain that kind of service for everyone (pro version?).
Anyway, let's begin with an introduction to Naive Bayes classification and go over how it basically works. I recommend you read other sources as well (and watch this video), but I will try to explain it from a high-level programmer's perspective.

Naive Bayes works by assuming that all 'features' are independent (that's the Naive part). A 'feature' is an input. For example, let's say I have a dog with brown fur and floppy ears. I then have two features I can use to compare him to other dogs: fur color (1) and ear floppiness (2). I can measure how floppy his ears are (and get a float/int) or I can measure whether his ears are floppy (and get a boolean) (same for fur). However, we have to assume that these are independent. The fact that the dog is brown does not affect whether his ears will be floppy (remember, Naive). It may be that the features really are related (brown dogs have floppy ears); however, we ignore this in order to make the computation reasonable (it's a CPU-intensive computation).

Now, let's see how we can extract 'features' from text. Let's take the following string:
The dog sat down and the dog was happy.
Alright, we can now break the string down into individual words (each word is a feature); this is where you tokenize words and can do things like remove punctuation and add word combinations.
{
  'the': 2,
  'dog': 2,
  'sat': 1,
  'down': 1,
  'and': 1,
  'was': 1,
  'happy': 1
}
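
A minimal sketch of that counting step (illustrative only; a real tokenizer, like the one in 'natural', also handles stemming and smarter splitting):

function countTokens(text) {
  // lowercase, strip punctuation, split on whitespace
  var words = text.toLowerCase().replace(/[^a-z0-9\s]/g, '').split(/\s+/)
  var counts = {}
  words.forEach(function (word) {
    if (!word) return
    counts[word] = (counts[word] || 0) + 1
  })
  return counts
}

countTokens('The dog sat down and the dog was happy.')
// => { the: 2, dog: 2, sat: 1, down: 1, and: 1, was: 1, happy: 1 }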

Now, let's add another sentence to compare it to, and also label our two sentences differently.
// dog sentences
'dog': {
  'the': 2, 'dog':2, 'sat':1, 'down':1, 'and':1, 'was':1, 'happy':1
},
// cat sentences: the cats like scratching 
'cat': {
  'the': 1, 'cats': 1, 'like': 1, 'scratching': 1
}

Ok, now we can try to classify a new sentence: 'the animal lay down at my feet'
{
  'the':1,'animal':1,'lay':1,'down':1,'at':1,'my':1,'feet':1
}
// dog-ness score
2 + 1 = 3 ('the'*2, 'down')
// cat-ness score
1 = 1 ('the'*1)

And we can then see that the sentence is more 'dog-like' than 'cat-like' because its score is higher when compared against the dog sentence history. Now, obviously we haven't taken into account the total number of sentences per class, or normalization, or other things which would give us better results, but you get the idea. Here is a proper implementation of what I just described in ~100 lines of JavaScript; that bit of code comes straight out of the library I used to do the classifications.
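
For the curious, here is a rough sketch of what folding in those missing pieces (class priors and smoothing) looks like; this is illustrative only, not the library's code:

// score how well `tokens` fits a class, given that class's word counts and
// sentence count (log-space so tiny probabilities don't underflow)
function scoreClass(tokens, wordCounts, classSentences, totalSentences, vocabSize) {
  var totalWords = 0
  for (var w in wordCounts) totalWords += wordCounts[w]

  // prior: how common this class is overall
  var score = Math.log(classSentences / totalSentences)

  // likelihood of each token, with add-one (Laplace) smoothing so unseen
  // words don't zero out the whole score
  for (var i = 0; i < tokens.length; i++) {
    var count = wordCounts[tokens[i]] || 0
    score += Math.log((count + 1) / (totalWords + vocabSize))
  }
  return score
}

// classify by comparing scores, e.g. scoreClass(tokens, dogCounts, ...) vs
// scoreClass(tokens, catCounts, ...) and picking the larger one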

While I think it's important to understand generally what's going on in an algorithm like this, I don't find it so interesting to re-write what others have already done (as it can get quite complicated, especially around how you group words together, and stemming). So I opted to use the 'natural' language processing library, which has a built-in Naive Bayes classifier.

In addition to the Bayes classifier, it also has a Logistic Regression classifier, which I looked at, and I also found the SVM library svmjs, which I tried as well (it didn't really work out). I'll explain both real quick.

The Logistic Regression classifier was horribly slow compared to the Bayes classifier, because logistic regression does not assume that the 'features' (words) are independent of each other. The results were also not that much better than the Naive Bayes, so I decided not to use it. The way this type of classifier works (~150 lines source) is, well, I'm not really sure. I think about it like a super multi-dimensional graph, where each word lives in N dimensional space and input documents create lines in this space, and then to classify we create a line out of our input document and compare it to other lines in the space (I hope that made sense).


In addition to the logistic regression, I tried to use a support vector machine (SVM) to combine the results of several classifiers into a new, more accurate classifier. Besides the fact that the SVM library I chose didn't even work correctly (probably my fault), the SVM led to results that were sub-optimal for my particular problem (more on this in a sec). The way the SVM works is that it takes input vectors (arrays) of features and then tries to fit new vectors into one of two groups (binary classification) by comparing them to the vectors in its training history.

Like so:
// train
[1,1,1,0,0,0]  1
[1,0,1,0,0,0]  1
[0,0,0,1,1,1] -1
[0,0,1,1,0,1] -1
        
// test
[0,1,1,0,0,0] == 1
[0,0,0,0,1,1] == -1
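
For reference, using svmjs looks roughly like this (based on my reading of its README; treat the exact API as an assumption):

// train a binary classifier on feature vectors labeled 1 / -1
var svm = new svmjs.SVM()
svm.train([[1, 1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 0],
           [0, 0, 0, 1, 1, 1],
           [0, 0, 1, 1, 0, 1]], [1, 1, -1, -1])

// predict labels for new feature vectors; expected: [1, -1]
var labels = svm.predict([[0, 1, 1, 0, 0, 0],
                          [0, 0, 0, 0, 1, 1]])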

Now, as I mentioned before, I have 3 separate feature sets for RSS feed items: title, summary, and link.
My final solution was this:
var url = require('url')         // Node's built-in url module
var natural = require('natural')

var classifier1 = new natural.BayesClassifier() // summary
var classifier2 = new natural.BayesClassifier() // title
var classifier3 = new natural.BayesClassifier() // links

// in a loop
  // train link history based on hostname
  var hostname = url.parse(item.link).hostname.replace(/\./g, '_')
  classifier3.addDocument([hostname], isInteresting)

  // bayes text classifiers
  classifier1.addDocument(item.summary.toLowerCase(), isInteresting)
  classifier2.addDocument(item.title.toLowerCase(), isInteresting)

// after the loop, train each classifier on its documents
classifier1.train()
classifier2.train()
classifier3.train()

// test
var tag1 = classifier1.classify(docs[i].feedItem.summary.toLowerCase())
var tag2 = classifier2.classify(docs[i].feedItem.title.toLowerCase())
var tag3 = classifier3.classify(hostname)

result = (tag1 === 'true' && tag3 === 'true') || tag2 === 'true'

Which gave me ~60% accuracy, with ~20% false negatives (I had ~4000 documents in my training set, and I separated 200 extra documents for testing, meaning that the classifiers were not trained on the 200 selected documents meant to check their accuracy). I care a lot about false negatives because a false negative means that an article was marked as uninteresting, when in fact I would have actually been interested in reading it. When I applied the SVM or combination Bayes classifier:
var classifierSum = new natural.BayesClassifier() // total

// in a loop
  var tag1 = classifier1.classify(item.summary.toLowerCase())
  var tag2 = classifier2.classify(item.title.toLowerCase())
  var tag3 = classifier3.classify(hostname)

  classifierSum.addDocument([tag1, tag2, tag3], isInteresting)

// train the combined classifier on the individual classifiers' outputs
classifierSum.train()

// test
var tag1 = classifier1.classify(item.summary.toLowerCase())
var tag2 = classifier2.classify(item.title.toLowerCase())
var tag3 = classifier3.classify(hostname)

result = classifierSum.classify([tag1, tag2, tag3])

I ended up with 90% accuracy, but also 100% false negatives (it marked everything as uninteresting). 90% is technically better than 60%, but having so many false negatives was not optimal for my problem.

That about covers the machine learning aspect of Retin.us so far. Over the next few weeks I will be assessing its accuracy, and I may eventually use it to pre-filter out uninteresting articles so as to save me time. As for the rest of the software stack, it basically uses Sails.js (home page), MongoDB, and Backbone (the Sails version is quite outdated, but the Backbone code is decent to learn from).

Sunday, September 8, 2013

戦士 - Senshi (an MMO Battle-Royale inspired game)

senshi.zolmeister.com
A real-time MMO Battle-Royale inspired game, with permadeath (names can never be reused).
Source: github.com/Zolmeister/senshi - Sprite Attribution: Ghosa
Audio editor: senshi.zolmeister.com/audio.html

This game is my submission to js13kgames, a contest similar to js1k, except you get 13KB (for the client, and also the server), and you get to zip it (!). After competing in js1k, I was amazed at how much I could get out of 13KB, especially due to the zipping. In fact, the source in the zip files is left unminified, because using a minifier actually increased the file size. In the end: client - 11.3KB, server - 4.1KB.

Making the game was fairly straightforward, except perhaps for the audio and the real-time Socket.io code, both of which I will explain in detail.

The game engine runs completely server-side, with the client used solely for rendering and sending keystrokes. This is necessary to prevent cheating, as the client can never be trusted. The ideal solution (which I didn't get around to implementing) is to have the client and the server both run physics engines simultaneously, with updates from the server forcing the client to sync up. Here is a great video from Google I/O which goes deep into the mechanics of a real-time MMO game:
Google I/O 2012 - GRITS: PvP Gaming with HTML5
With the 13KB limitation, you have a lot of room to work with in terms of pure JS; however, the art assets (sprites and audio) are highly constrained. For this reason, I decided to go with pixel art and a custom audio format. I am not the most artistically inclined person, so I opted to use a sprite someone else made on opengameart.org and then modified it to include each weapon (using GIMP). Here is what my sprite sheet/atlas looked like:


As you can see, each frame of animation has its own image, as well as each weapon. On the right, I merged the item/terrain atlas in the same PNG to save space. The player is 22x20px, and the items are 14x14px (some items were unused).

One of the biggest features I wanted to add was diagonal walking, but I just could not muster the artistic talent (not for lack of trying) to draw the character (let's call him Zed) facing new directions. So I cheated by skewing the image using ctx.setTransform():
function drawPlayer(x, y, name, health, dir, frame, weapon, kills) {

  // 22 x 20, with +10 x-offset
  var tanAngle = Math.tan(Math.PI / 4)
  var rowDir = [1, 0, 0, 0, 1, 2, 2, 2]
  var row = rowDir[dir]
  var col = frame == 3 ? 1 : frame
  col += (weapon + 1) * 7
  x += 10

  ctx.save()
  if (dir == 0 || dir == 1 || dir == 7) {
    // facing left
    ctx.translate(44, 0)
    ctx.scale(-1, 1)
    x = 22 - x
  }

  //draw character
  ctx.translate(x+11,y+10)
  
  if (dir%2==1) {
    // diagonal
    if (dir==1 || dir==7)
      ctx.setTransform(3.8, (dir == 1 || dir == 5 ? -1 : 1) * tanAngle, 0, 4, 400-x*4+22*4+11*4, 300+y*4+10*4)
    else
      ctx.setTransform(3.8, (dir == 5 ? -1 : 1) * tanAngle, 0, 4, 400+x*4+11*4, 300+y*4+10*4)
  }

  ctx.drawImage(image, col * 22, row * 20, 22, 20, -11, -10, 22, 20)
  ctx.restore()

}

Another thing to note is that I was unable to use off-screen canvas pre-rendering because it kept aliasing the image, even though I told it not to (a big problem for a pixel-art game):
ctx.webkitImageSmoothingEnabled = false
ctx.mozImageSmoothingEnabled = false

In order to get some real-time action going, I decided to use Socket.io (as it is allowed by the rules without counting against the file size). Socket.io is a WebSockets compatibility library which falls back to other transports when WebSockets isn't available (Socket.io is absolutely amazing!). Now, notice that the player has a lot of attributes associated with them: name, x, y, health, kills, weapon, direction, animation frame, isAttacking, and the keys that they are pressing (remember, everything is computed server-side). Every tick of the physics engine, we have to update all of the users' positions and send the data out to all the other players. This adds up to a lot of data, and it grows quadratically with the number of players in the arena, since every player's state has to reach every other player.

In order to be efficient with how much data we send per player, I decided to only send a `diff`: only what has changed in the game state since the last update. I shrank all variable names to one letter and devised a data structure that would efficiently provide a full diff. Here is what I came up with (small snippet):

// This diff will be sent and applied by the client to sync their arena
// diff[0-2] = players
// diff[0] = new players (append to end)
// diff[1] = del players indices (splice, starting in reverse order)
// diff[2] = player updates (i:index, updated attrs)
// diff[3-5] = bullets
// diff[6-8] = items
var diff = physics(++frame)

// don't send if empty
if (!diff.reduce(function (total, x) {
  return total + x.length
}, 0)) return

// clear out empty trailing arrays
var i = diff.length - 1
while (i >= 0 && diff[i].length == 0) {
  diff.splice(i, 1)
  i--
}

This added a lot of serialization/deserialization overhead, however it was worth it because it drastically reduced the data size. After setting up a diff-based structure, I decided to look into more ways of compressing the data. This website was quite helpful: buildnewgames.com/optimizing-websockets-bandwidth/. Additionally, I found this tool which looked quite promising: msgpack. Basically, it defines a way to represent JSON as binary data in a highly optimized way. Sadly, I was unable to use it, not because of its file size, but for lack of binary data support in Socket.io - #511. Socket.io isn't perfect, and I was disappointed that it didn't support some of the really useful data compression options available to raw websockets - #1148, #680.

In the end, I went with the diff strategy, which will hopefully be enough (haven't tested at large scale).
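
For completeness, here is a minimal sketch of how the client side might apply the player portion of such a diff (illustrative; the `players` array and attribute names are assumptions):

function applyDiff(diff) {
  var added = diff[0] || []
  var removed = diff[1] || []
  var updated = diff[2] || []

  // new players get appended to the end
  added.forEach(function (p) { players.push(p) })

  // removal indices arrive in reverse order, so splicing never shifts
  // an index we still need
  removed.forEach(function (idx) { players.splice(idx, 1) })

  // updates carry an index plus only the attributes that changed
  updated.forEach(function (u) {
    var p = players[u.i]
    for (var key in u) {
      if (key !== 'i') p[key] = u[key]
    }
  })
}

Bullets (diff[3-5]) and items (diff[6-8]) follow the same append / remove / update pattern.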

The last significant part of this game is the audio. Now, let me preface by saying that I made the audio myself, with no help, and zero prior experience. That being said, I think the audio actually turned out pretty good. Not great, but pretty good. Also, the audio data size (with decoder) ended up being ~1.3KB, which is outstanding considering that even the most trivial music goes into the hundreds of KB.

With only 13KB to work with, I looked at perhaps using someone else's music and just bitcrushing it. However, it quickly became apparent that I wasn't going to get enough length and quality out. So I decided to look at what others had done, and found this amazing resource: Chime Docs (Chime Hero). I also found this great tutorial on `chipping` techniques (for creating Chiptunes - classic 8-bit music). Based on this information, I decided to make my own chiptunes editor: senshi.zolmeister.com/audio.html

The editor is basic, and quite limited in what you can do, but it gets the job done. The key, though, is that it outputs algorithmically compressed audio, which leads to a one-minute song at <1.5KB.
In order to create the editor I followed the Chime Hero docs carefully, and from them figured out how to generate all of the other types of sound waves (this fiddle was also helpful: jsfiddle.net/CxPFw/1):

var samples = Math.floor(11025 / 2)
var bytes = []
for (var s = samples; s--;) {
  var byte;
  if (note.wave === 'square') {
    /* Square wave */
    byte = (Math.sin(s / 44100 * 1 * Math.PI * note.freq) > 0 ? 1 : -1)
  } else if (note.wave === 'sine') {
    /* sine wave */
    byte = Math.sin(s / 44100 * 2 * Math.PI * note.freq)
  } else if (note.wave === 'saw') {
    /* sawtooth wave */
    byte = sawtooth(s / 44100 * note.freq)
  } else if (note.wave === 'ramp') {
    /* ramp wave */
    byte = Math.abs(s % (note.freq) - note.freq) / note.freq
  }
   bytes.push(byte)
}

bytes = bytes.map(function (byte, s) {
  s = samples - s
  // normalize bytes
  return byte * 127 + 128
}).reduce(function (str, byte) {
  // encode the bytes
  return str + String.fromCharCode(byte);
}, '')

var player = new Audio('data:audio/wav;base64,UklGRjUrAABXQVZFZm10IBAAAAA\
   BAAEARKwAAESsAAABAAgAZGF0YREr' + btoa('\9\0' + bytes))
player.play()

function sawtooth(x) {
  var pi = Math.PI
  var tn = Math.ceil((x + pi) / (2 * pi));
  var y = ((x - tn * 2 * pi) + 2 * pi) / pi;
  return y
}

With this code, I am able to simply export an array of [freq, wave] data from the editor and then convert that into music (highly efficient). Also, for the astute musicians out there, I decided to use a D minor pentatonic scale for the note range, using this scale hertz chart as a reference.
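
As an illustration (this is not the editor's exact export format, just the idea), a short phrase in that range might export as [frequency, wave] pairs and be fed back through the generator above; playNote here is a hypothetical wrapper around the sample-generation loop:

var song = [
  [293.66, 'sine'],   // D4
  [349.23, 'square'], // F4
  [392.00, 'saw'],    // G4
  [440.00, 'sine']    // A4
]
song.forEach(function (note) {
  playNote({ freq: note[0], wave: note[1] })
})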

Now, if you hit the 'export' button in the editor, you may note that the output string in the console is actually ~12KB. This is because while the data is highly optimised, it is not yet compressed. For that, we go to JSCrush which compresses javascript by turning it into a string, and then compressing that string via pointers to heavily used sequences within the string. It does a fantastic job of compressing the output to a manageable size.

Now, after loading that code into the game, I realized that the compile time (to generate the waves from the frequency data) was quite long and blocked the UI, so I decided to offload it to an embedded Web Worker (the 'export' exposes a global 'a_audio' object, which is the not-yet-base64-encoded data string seen in the above code as 'bytes'):
<script id='audioworker' type='text/js-worker'>
_='  keW  t(eWvar t=<PI;var n=<ceil((e+t)/(2*t));var r=(e-n*2*t+2*t)/t; r}`=<floor(11025/2); s=e!nWj=nNeW„=[];e[1];&=e[0]; =0;Bif(==0†1Œ>0?1:-1$=1†2Œ$2W =tO&)$3W =<abs(Fs%&-&)/&}„.push( )} „})!tW tNt,nW e[n]?[t].concat(e[n]):[t]})},[])NeW e!tW e+t},0)/e‹})Ne,tWt=`-t; e*127+128})!tW e+Gt)},"");if(j‹===0WBj+=G128)}} e+j},"");Faudio="9\\0"+ s}Fr=[[  |K  |J  |D  |J  |K  |  1XXXX  KZ           Z[ 1  L   RK D  RJ  RK  Žz,R  R   LH          Y     T T T   Š             Q‚‚ ƒƒƒƒ  [ @ T @ T @   @ Q   Y   …  …      ˆ  @ T @ T @  LLKZ    J1  D1   JZKZ   1  DZ    JZDZH   Q‚ LL [z   @ T @   @^,ˆTL‡ ‡]];kFr)   2  R JZDZ  ,@ 3 z,R ],[ ~, U3, V3, [^, [z, 174.61, J1 D1 [^ ‰#‰3€ @440€ @V# @U3 [ @^  0  1  1  DZ T[ 392, KZJZ  z,Ž RJ  RD  R  R function Fbyte 440, return   [ ]        T T T       [   1 [  ^,   [z [^ [U3 [~ [V3  !.reduce( (e,#3  @~ $}else if(=&Ffreq<Math.@0 Bfor(Fs=`;Fs--;WD[   Fa_GString.fromCharCode(H[      …ŠJ   K   L    N.map( (O(Fs/44100*Q      ‚R3 T  U261.6V349.2W){X      Y                Z1 ^220`FsamplesjFtonekplayExported(z196|1  ~293.66Fwave=€ [ @392 [[   ‚    ƒ    „ sArr…     †W =<sinO‡       ZˆT @ T @ ‰ @U#[ @VŠ        ‹.lengthŒ*<PI*&)[     RŽR ^,RK        ';for(Y in $='ŽŒ‹Š‰ˆ‡†…„ƒ‚€~|zkj`^ZYXWVUTRQONLKJHGFDB@<&$#!                             ')with(_.split($[Y]))_=join(pop());eval(_)

postMessage({'setAudio': a_audio})
</script>

And the loading code:
var blob = new Blob([$('#audioworker').textContent], {type: "text/javascript"});
worker = new Worker(window.URL.createObjectURL(blob))
worker.onmessage = function (e) {
  var val = e.data.setAudio
  player = new Audio('data:audio/wav;base64,UklGRjUrAABXQVZFZm10IBAAAAABAAEARKwAAESsAAABAAgAZGF0YREr' + btoa(val))
  player.play()
}

($ is not jQuery, instead it uses the built-in selector: `$ = document.querySelector.bind(document)`)

It's worth noting that there is still some lag due to the creation of the audio element, which you can't offload to the Web Worker because it doesn't have access to the `Audio()` object, or the `btoa` function for that matter (#55663).

Monday, August 26, 2013

Tanzania, Africa - In 1.75 Million Particles

Source: github.com/Zolmeister/tanzania (page: tanzania.zolmeister.com/full)
Note: Requires WebGL

Alright, let me explain what you're looking at. If you fullscreen the app and have a 1080p monitor (1920x1080), then 1.75 million particles (1,753,920) will be generated and animated in real-time per image. Handling 1.75 million particles is no walk in the park, and without a decently fast GPU it may be quite slow. While it may not seem that complicated (as I thought at first), there is in fact a ridiculous amount of intense code involved. Let's begin!

So, I just came back from a vacation in Tanzania, Africa where I summited Mt. Kilimanjaro (the highest free-standing mountain in the world - 5,895 meters), went on safari, and spent a week at the beach in Zanzibar. During my trip, I took ~2k photos, of which I selected ~150 (plus a few from friends I went with). I take pride in my photos, and I wanted to showcase them in a special way, so I decided to make a particle-based image viewer.

My first attempt was to try to use the HTML5 canvas object. This failed quickly and spectacularly, taking about a minute to render a single frame. This is because I was generating 1.75mil objects, each with an x, y, and color attribute, and then trying to update all of those objects' x,y coordinates every frame.

For those that don't know, you get about 16ms to do math every frame. This is the magic number that equates to a 60fps video stream, which also happens to align with most monitors' refresh rates (60Hz), so even if you went over 60fps it wouldn't render any faster.

Let me put it this way. You can't even increment a number to 1.75mil in 16ms, and that's without doing any operations (on my cpu it took 1162ms). So the question is, how do you animate (change the x,y coordinates) 1.75mil particles? The GPU. GPUs, unlike CPUs, are really good at doing lots of small calculations in parallel. For example, your CPU probably has 4 cores, but your GPU may have 32, 64, or even 128 cores.

In order to take advantage of the GPU's ability to do massively parallel computing, we are going to need WebGL, and specifically GLSL (the OpenGL Shading Language). I opted to use the great Three.js WebGL framework. Three.js has quite poor documentation, but it has a great community and lots of examples to learn from. These examples were especially helpful (yes, you have to manually inspect the uncommented, unintuitive source).


I also learned a lot from these amazing tutorials: aerotwist.com/tutorials

Digression: here's how I took my ~2k photos and picked out a few.
Python script to save selected images while viewing them as a slide show
Image resize + crop bash script:
for file in *.jpg; do                                      
  convert -size 1624x1080 $file -resize '1624x1080^' -crop '1624x1080+0+0' -quality 90 +profile '*' $file
done

Thumbnail bash script:
for file in *.jpg; do         
  convert $file -resize 120x -quality 90 +profile '*' thumb/$file
done

File rename bash script:
for file in *.jpg; do
  mv "$file" "$file.tmp.jpg"
done
x=0
for file in *.jpg; do
  ((x++))
  mv "$file" "$x.jpg"
done

Alright, back to JavaScript. Let me explain a bit about how the GPU and WebGL work. You give WebGL a list of vertices (objects with x, y, z coordinates) that make up a 3D object mesh, and you give it a texture to be mapped onto those vertices; this gives you a colorful 3D object. With a ParticleSystem, you get a mesh (vertices) and a texture that is applied per vertex. This means you cannot add or remove particles easily (though you can hide them from view), so you need to create a mesh with as many vertices as you will need particles.

So, what happens is that these two things, the vertices and the texture, get passed into the Vertex Shader, and the Fragment Shader. First, everything goes to the Vertex Shader. The Vertex Shader may alter the position of vertices and move stuff around (this is important, as it will let us do animation on the GPU). Then the Fragment Shader takes over, and applies color to all the parts of the object, using the vertices as guides for how to color things (for shadows for example). Here is a great tutorial.

Shaders are coded in a language called GLSL. Yeah, that already sounds scary. But it's not too bad once you spend a few hours banging your head against a wall. Here is what my (shortened) vertex shader looks like:
<script id='vertexshader' type='x-shader/x-vertex'>
  varying vec3 vColor;
  uniform float amplitude;
  uniform int transition;
  uniform int fullscreen;
  
  void main() {
    
    vColor = color;
    vec3 newPosition = position;
    newPosition = newPosition * (amplitude + 1.0) + amplitude * newPosition;
    vec4 mvPosition = modelViewMatrix * vec4( newPosition, 1.0 );
    gl_PointSize = 1.0;
    gl_Position = projectionMatrix * mvPosition;
  }
</script>

And here is what my fragment shader looks like:

<script id='fragmentshader' type='x-shader/x-fragment'>
  varying vec3 vColor;
  
  void main() {
    gl_FragColor = vec4(vColor, 1.0);
  }
</script>


Easy. GLSL has 3 main variable qualifiers for values passed between JavaScript and the GPU:
  • varying - passed from the vertex shader to the fragment shader (used to pass attributes)
  • attribute - passed from JS to vertex shader only (per-vertex values)
  • uniform - global constant set by JS
Sometimes you can get away with animating a particle system by updating an 'attribute' value per particle each frame; however, that isn't feasible for us (updating 1.75 million values on the CPU every frame is exactly the loop we're trying to avoid).

The way I do animation (there are many ways) is to update a uniform (global constant) with a number from 0 to 1 depending on the frame I'm at. I use the sine function for this, which gives me a smooth sequence from 0 to 1 and back to 0 again.

// update the frame counter
frame += Math.PI * 0.02;

// update the amplitude based on the frame
uniforms.amplitude.value = Math.abs(Math.sin(frame))

Now, all I need to do to get cool animations is have each particle move with respect to the amplitude from its original location. Here is the second effect in the app (in the vertex shader):
newPosition = newPosition * (amplitude + 1.0) + amplitude * newPosition / 1.0;
newPosition = newPosition * (amplitude * abs(sin(distance(newPosition, vec3(0.0,0.0,0.0)) / 100.0))+ 1.0);

Now, as far as the fragment shader goes, you can see I am setting the color from an attribute provided by Three.js (`color`), which maps each image pixel's color onto its vertex. This (I think) is quite inefficient (but fast enough for me); the optimal way (I think) is to pass the image directly as a texture2D uniform and let the fragment shader sample it to determine its color (nucleal.com does that, I think). However, I couldn't figure out how to do this.

Here is the Three.js code for using a custom vertex and fragment shader:

    uniforms = {
      // This dictates the point of the animation sequence
      amplitude: {
        type: 'f', // float
        value: 0
      },
      // Dictates the transition animation. Ranges from 0-9
      transition: {
        type: 'i', // int
        value: 0
      }
    }

    // Load custom shaders
    var vShader = $('#vertexshader')
    var fShader = $('#fragmentshader')

    material = new THREE.ShaderMaterial({
      uniforms: uniforms,
      vertexShader: vShader.text(),
      fragmentShader: fShader.text(),
      vertexColors: true,
      depthTest: false,
      transparent: true
    })

    particles = new THREE.ParticleSystem(geometry, material)
    scene.add(particles)

Now, we're missing the `geometry` part of the particle system, as well as a way to display the particles with a 1:1 pixel ratio to the screen. The latter is solved by some field-of-view voodoo code:

camera = new THREE.PerspectiveCamera(90, width / height, 1, 100000)
// position camera for a 1:1 pixel unit ratio
var vFOV = camera.fov * (Math.PI / 180) // convert VERTICAL fov to radians
var targetZ = height / (2 * Math.tan(vFOV / 2))
    
// add 2 to fix rendering artifacts
camera.position.z = targetZ + 2

And the former is solved with a giant for-loop (don't actually do this):
geometry = new THREE.Geometry()
var vertices = []
for (var i = 0; i < 1750000; i++) {
  vertices.push(new THREE.Vector3(i, i * 20 % (i / 2), 0))
}
geometry.vertices = vertices

There are many things wrong with the code above, but the most important part is to notice that it's pushing a NEW object every time it runs, and this happens over 1 million times. This loop is extremely slow, takes up hundreds of megabytes of memory, and breaks the garbage collector (but more on that later). Not to mention that you can't have a loading bar, because it blocks the UI thread. Oh yeah, don't forget about colors! (This code is also ridiculously slow)
geometry = new THREE.Geometry()
var colors = []
for (var i = 0; i < 1750000 * 3; i += 3) {
  var col = new THREE.Color()
  col.setRGB(imgData[i], imgData[i+1], imgData[i+2])
  colors.push(col)
}
geometry.colors = colors

The first thing that came to mind that would fix both problems at once (sort of) was to use Web Workers. However, when passing data to and from web workers it normally gets copied via structured cloning (effectively serialized), which is horribly slow for this much data (but there's another way - more later).

We can speed things up by using plain objects instead of Three.js objects (they seemed to work the same):
var col = {r: 1, g: 2, b:3}
This is considerably faster, but still not fast enough. Well, it was fast enough for me for a bit, but then the app started randomly crashing. Turns out, I was using over 1GB of memory. No, I wasn't leaking memory; the problem was that the garbage collector couldn't handle that many objects, and the app crashed.

Here is a great video on memory management in the browser from Google I/O, and I'll explain a little bit about GC. JavaScript is a GC'd (garbage collected) language, which means you don't have to de-allocate the memory you use. Instead, you get two memory regions: short-term memory and long-term memory. Short-term memory is collected often, and objects that survive long enough (aren't collected) move into long-term memory. In order to determine what memory can be collected safely, the garbage collector goes through every object and checks which other objects can be reached from it (an object graph, if you will, like a tree with nodes). So when the GC runs on our app with 3 million+ objects, it crashes.

Finally, after much headache, user `bai` on #three.js (freenode IRC) saved the day with the suggestion to use BufferGeometry in Three.js instead of plain Geometry. BufferGeometry is exactly what I needed, as it exposes the raw typed JavaScript arrays used by WebGL, which means a huge performance increase and drastically reduced memory usage. I'm talking about going from 1GB of memory to 16MB.
// BufferGeometry is MUCH more memory efficient
geometry = new THREE.BufferGeometry()
geometry.dynamic = true
var imgWidth = imgSize.width
var imgHeight = imgSize.height
var num = imgWidth * imgHeight * 3
geometry.attributes.position = {
  itemSize: 3,
  array: new Float32Array(<this comes from web workers>),
  numItems: num
}
geometry.attributes.color = {
  itemSize: 3,
  array: new Float32Array(num),
  numItems: num
}

//update colors
geometry.attributes.color.needsUpdate = true

Ok. At this point, I had a reasonably fast application, but it still had a few rough edges. Specifically, the UI thread blocked when loading a new image (reading the image pixel by pixel and updating the geometry.attributes.color array - 1.75mil pixels). See, JavaScript is single-threaded, so when you have a long-running loop, the whole webpage freezes up. This is where Web Workers come in (I told you they would be back).

Web workers spawn a new OS thread, which means no UI blocking. However, we still have the issue of having to send data via object cloning. It turns out that there are special objects which are transferable (specifically ArrayBuffer objects), which pass by reference (vs copy); the catch is that they become unusable in the original thread once transferred. E.g. I pass a buffer to a worker, and I can no longer read it from my main thread, but I can in the worker thread. In order to understand how to take advantage of this, we need to understand JavaScript typed arrays.

Typed arrays in JavaScript consist of an ArrayBuffer (raw bytes that you can't touch directly) and a view which allows you to read and write that data. For example, if I have an ArrayBuffer, I can cheaply create a view on top of it with new Uint8ClampedArray(buffer), which lets me read and write data in the buffer. Now, let's look at how I pass the ArrayBuffers back and forth between the web worker thread and the main thread to offload heavy work.
(web worker code)
<script id='imageworker' type='text/js-worker'>
onmessage = function (e) {
  var pixels = new Uint8ClampedArray(e.data.pixels)
  var arr = new Float32Array(e.data.outputBuffer)
  var i = e.data.len
  
  while (--i >= 0) {
    arr[i*3] = pixels[i*4] / 255
    arr[i*3 + 1] = pixels[i*4+1] / 255
    arr[i*3 + 2] = pixels[i*4+2] / 255
  }
  
  postMessage(arr.buffer, [arr.buffer])
  return close()

};
</script>

(main thread code)

function loadImage(num, buffer, cb) {
  var img = new Image()
  img.onload = function () {
    var c = document.createElement('canvas')
    imgWidth = c.width = img.width
    imgHeight = c.height = img.height

    var ctx = c.getContext('2d')
    ctx.drawImage(img, 0, 0, imgWidth, imgHeight)
    var pixels = ctx.getImageData(0, 0, c.width, c.height).data
    var blob = new Blob([$('#imageworker').text()], {
      type: "text/javascript"
    });
    var worker = new Worker(window.URL.createObjectURL(blob))
    console.time('imageLoad')

    worker.onmessage = function (e) {
      console.timeEnd('imageLoad')

      workerBuffer = geometry.attributes.color.array
      geometry.attributes.color.array = new Float32Array(e.data)
      cb && cb()
    };

    worker.postMessage({
      pixels: pixels.buffer,
      outputBuffer: workerBuffer.buffer,
      len: pixels.length
    }, [pixels.buffer, workerBuffer.buffer])

  }
  img.src = 'imgs/' + num + '.jpg'
}

(Bonus: console.time(), console.timeEnd() - useful timing shortcuts)

And that's it! It wasn't easy, but I definitely learned a lot.