The goal of this milestone is to build a web proxy that the Mynij PWA will use to crawl websites from provided sitemaps (or RSS feeds) and to run online searches with searx. The proxy solves the problem of cross-origin requests in JavaScript by setting the appropriate headers in the HTTP response sent to Mynij.
Cross-origin request problem
CORS (Cross-Origin Resource Sharing) is a mechanism that allows resources on a web page to be requested from a domain other than the one the page originated from. By default, web browsers forbid this kind of "cross-domain" request under the same-origin security policy. CORS headers exchanged during AJAX requests define a way in which the browser and the server can interact to determine whether or not a cross-origin request is allowed.
In our case, Mynij needs to crawl various URLs which all come from different domains. This means that browsers will block the requests, because the responses lack the CORS headers that would allow the source website to interact with Mynij. To solve this problem, we need a proxy which forwards requests using a Python library such as requests or urllib, and sends the response back to Mynij with the required CORS headers.
The proxy will add the following header in the response:
Access-Control-Allow-Origin: https://mynij.app.officejs.com
The origin sent by the proxy in the response headers is the origin of the website that makes the request. This means that if the Mynij PWA URL is https://mynij.app.officejs.com, then proxy responses will always contain Access-Control-Allow-Origin: https://mynij.app.officejs.com.
This allows any website to use the proxy and make cross-origin requests without CORS issues.
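The header logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name cors_headers is hypothetical and does not come from the Mynij proxy source, and the added Vary header is a common practice for origin-dependent responses, not something the text above confirms.

```python
from typing import Dict, Mapping


def cors_headers(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Build CORS response headers that echo the caller's Origin.

    Hypothetical helper for illustration; not the actual Mynij proxy code.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    origin = lowered.get("origin")
    if origin is None:
        # No Origin header: not a cross-origin browser request.
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        # Assumption: tell caches that the response depends on Origin.
        "Vary": "Origin",
    }
```

Called with {"Origin": "https://mynij.app.officejs.com"}, the helper returns an Access-Control-Allow-Origin header carrying that same origin, which is the behaviour described for the Mynij PWA.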
Mynij Proxy
Mynij proxy is a Python proxy web server which forwards requests to third-party websites and ensures that responses will not be rejected by the originating web browser. The proxy also ensures a fast response and saves bandwidth by caching some requests. Proxy source files are hosted on lab.nexedi.com and were developed using Starlette, a lightweight ASGI framework/toolkit well suited to building high-performance asyncio services.
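To make the forward-and-cache idea concrete, here is a minimal sketch using only the Python standard library. It is not the Starlette-based implementation: the class name CachingFetcher, the in-memory dictionary and the 60-second time-to-live are all assumptions chosen for illustration, since the text does not say which requests are cached or for how long.

```python
import time
import urllib.request
from typing import Callable, Dict, Tuple


def fetch_url(url: str) -> bytes:
    """Fetch a URL server-side, where the browser's same-origin policy does not apply."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


class CachingFetcher:
    """Keep fetched bodies in memory for `ttl` seconds (hypothetical policy)."""

    def __init__(
        self,
        fetch: Callable[[str], bytes] = fetch_url,
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        # Maps URL -> (time the body was stored, body).
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh cache hit: no upstream request, so bandwidth is saved.
            return entry[1]
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

The fetch function and the clock are injectable so that the cache behaviour can be checked without network access.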
By deploying Mynij proxy with SlapOS and combining it with a Rapid.Space CDN, we ensure the proxy is available in different locations worldwide for crawling URLs, including in China. Rapid.Space CDN service accelerates content delivery by reducing the time to negotiate SSL/TLS sessions and by keeping a copy of content close to end-users.
Proxy deployment
To simplify proxy deployment, a Software Release for SlapOS was introduced. It automates the proxy build and deployment, including all required dependencies. These dependencies are mostly Python eggs:
- Gunicorn, a Python WSGI HTTP server
- Starlette, a lightweight ASGI web framework
- Httptools, a Python binding for the Node.js HTTP parser
Being a Python egg, Mynij Proxy can also be installed with python3 or pip3 using a version released on PyPI.
The source code of the Mynij Proxy Software Release is available on the Nexedi GitLab at https://lab.nexedi.com/Mynij/slapos-mynij/tree/master/software/mynij-proxy. It can be deployed in SlapOS using Theia or Webrunner. The picture below shows a deployment with Webrunner.

The URL connection parameter is used to access the proxy. A sample command to fetch web content using the wget utility is:
wget PROXY_BASE_URL/proxy?url=URL_TO_FETCH
With our deployed proxy, the command to get https://nexedi.com home page is then:
wget --no-check-certificate https://[2001:67c:1254:e:9::e702]:3001/proxy?url=https://nexedi.com -O index.html
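The same request URL can be built from Python. The sketch below percent-encodes the target so that a target URL containing its own "?" or "&" is not confused with the proxy's query string. The helper name build_proxy_url is hypothetical, and the sketch assumes the proxy decodes the url parameter in the standard way.

```python
from urllib.parse import urlencode


def build_proxy_url(proxy_base_url: str, target_url: str) -> str:
    """Return PROXY_BASE_URL/proxy?url=<percent-encoded target> (hypothetical helper)."""
    # urlencode percent-encodes reserved characters in the target URL,
    # so its own query parameters stay inside the single "url" value.
    query = urlencode({"url": target_url})
    return proxy_base_url.rstrip("/") + "/proxy?" + query
```

For example, build_proxy_url("https://[2001:67c:1254:e:9::e702]:3001", "https://nexedi.com") yields the deployed proxy address followed by /proxy?url=https%3A%2F%2Fnexedi.com.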
The proxy Software Release also deploys a SlapOS monitoring stack, which checks the status of all involved services using so-called "promises" (in the sense of Mark Burgess).

After the proxy is deployed with Webrunner, a CDN should be added. The document rapidspace-HowTo.Request.A.CDN explains how to request a CDN on Rapid.Space for the deployed proxy and make it accessible through IPv4 around the world, even though the proxy backend is hosted on IPv6 only.
Performance tests
We ran some performance tests to measure how the proxy behaves under many simultaneous requests, which will be the case whenever many users are building sitemaps. In the test below, we check the proxy availability by simulating 1000 users connecting to the server for 30 seconds. The test was done on a Mynij instance deployed inside a virtual machine running Linux with 1 GB of RAM and 4 CPU cores.
The server responded to 792642 requests in 30.07 seconds, which is about 26363.64 requests per second.
$ ./wrk -t12 -c1000 -d30s https://[2001:67c:1254:e:9::e702]:3001/ping
Running 30s test @ https://[2001:67c:1254:e:9::e702]:3001/ping
12 threads and 1000 connections
Latency 37.59ms 15.13ms 386.96ms 86.76%
Req/Sec 2.24k 337.05 3.28k 73.05%
792642 requests in 30.07s, 102.05MB read
Requests/sec: 26363.64
Transfer/sec: 3.39MB
In the second test, we deploy a local HTTP server to serve a small file at the URL http://localhost:8080/obj/script.o, then use the proxy to get that file, again with 1000 simultaneous users for 30 seconds. The server handles 12434 requests in 30 seconds, which is about 413.10 requests per second.
$ ./wrk -t12 -c1000 -d30s https://[2001:67c:1254:e:9::e702]:3001/proxy?url=http://localhost:8080/obj/script.o
Running 30s test @ https://[2001:67c:1254:e:9::e702]:3001/proxy?url=http://localhost:8080/obj/script.o
12 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 931.95ms 396.16ms 2.00s 72.63%
Req/Sec 62.84 32.59 270.00 64.59%
12434 requests in 30.10s, 293.84MB read
Socket errors: connect 0, read 0, write 0, timeout 1793
Requests/sec: 413.10
Transfer/sec: 9.76MB
Now we run the same test again to get the same small file from the file server, but this time without the proxy in between. The result shows that the server handles more requests, which can be explained by the fact that going through the proxy involves two HTTP servers and thus more processing time. Still, the proxy is mandatory for cross-origin requests and can provide the advantage of caching.
$ ./wrk -t12 -c1000 -d30s http://localhost:8080/obj/script.o
Running 30s test @ http://localhost:8080/obj/script.o
12 threads and 1000 connections
Latency 4.61ms 47.04ms 1.73s 99.37%
Req/Sec 235.98 235.49 1.66k 87.43%
30754 requests in 30.10s, 724.52MB read
Socket errors: connect 140, read 0, write 0, timeout 74
Requests/sec: 1021.78
Transfer/sec: 24.07MB
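The second and third runs can be compared with a short calculation. The figures below are copied from the wrk outputs above; the variable names are only illustrative. The result shows that throughput through the proxy is roughly 40% of direct throughput in this setup.

```python
# Figures copied from the wrk outputs of the second and third tests.
proxy_rps = 413.10          # Requests/sec through the proxy
direct_rps = 1021.78        # Requests/sec without the proxy
proxy_latency_ms = 931.95   # Average latency through the proxy
direct_latency_ms = 4.61    # Average latency without the proxy

# Share of the direct throughput that is retained when using the proxy.
throughput_ratio = proxy_rps / direct_rps
# How many times higher the average latency is through the proxy.
latency_factor = proxy_latency_ms / direct_latency_ms

print(f"throughput through proxy: {throughput_ratio:.1%} of direct access")
print(f"average latency through proxy: {latency_factor:.0f}x direct access")
```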
Mynij proxy is not faster than direct access, unlike caching proxies such as Apache Traffic Server. However, it can handle around 400 requests per second, which is good enough at the current stage of Mynij. To improve on this result and reduce latency while building sitemaps, Mynij can manage a swarm of proxy servers at the same time. If the number of requests becomes too high, all proxies configured in Mynij can share the load and scale.
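One simple way to share the load across a swarm is round-robin assignment, sketched below in Python. The text does not say which strategy Mynij actually uses, so both the strategy and the helper name assign_round_robin are assumptions made for illustration.

```python
from itertools import cycle
from typing import Dict, Iterable, List


def assign_round_robin(urls: Iterable[str], proxies: List[str]) -> Dict[str, List[str]]:
    """Spread URLs evenly over the configured proxies (assumed strategy)."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    assignment: Dict[str, List[str]] = {proxy: [] for proxy in proxies}
    # cycle() repeats the proxy list endlessly; zip() stops when URLs run out.
    for url, proxy in zip(urls, cycle(proxies)):
        assignment[proxy].append(url)
    return assignment
```

With three URLs and two proxies, the first proxy receives the first and third URL and the second proxy receives the second one, so no proxy gets more than one URL above its fair share.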