class: middle, center, invert # Technology to the Rescue #
.meta[Alexander Solovyov, CTO modnaKasta] ??? Hey everyone, I hope Andrew woke you up already. Let's go. --- class: invert # What's this all about ### ## .center[What works when you try to switch a car during a race] ??? - How I decided to structure this talk -- ## ## .center[And what doesn't] --- # What's modnaKasta - Shopping club - Founded in 2010 - Largest fashion-oriented ecommerce site in Ukraine - Hundreds of thousands clients - Hundreds of thousands orders - 1k+ RPS every day (more so during events) ??? - Just an internet shop - Everybody knows how to do that - Write some code, have some clients, sell some stuff, no big deal. --- # Early 2015 - Aging codebase - started in 2006 four teams ago - 80k+ lines of Python - Tons of support requests - Is a drag on business - hard to maintain - hard to fix - hard to improve - Black Friday 2014 was a commercial success and ***lots of downtime*** ??? - The "write some code" bit is not that easy as it could be, though. - Stuff's more complex than it seems - modnaKasta's Black Friday is the biggest one --- # Black Friday 2015 - 10 months of optimization work - Huge commercial success ??? - I started working there on January 2015 - We've spent whole 2015 grooming everything -- - One mistake and 3 hours of downtime --- # How do you fix that - Nobody knows original architecture ideas - Were there any ideas even? - The system itself is convoluted - Lots of logic in models, overridden `.save()` methods and QuerySets - Without caching it did at most 5 RPS on my laptop - *Rewrite!* ??? - Stuff's bad - You can go many ways - But we had no idea what's going on inside --- # How do you rewrite a site - Decide what the end state should be (high level) - Do it bit by bit - Smallest piece by smallest piece - Be very careful - Do not rush - Update your old system to support new one -- - **DO NOT RUSH** --- class: invert, middle, center # The Story ## Various things we did that worked out well ??? - Real content starts here --- # Page-by-page site rewrite - `Mar '15`: decision that rewrite is necessary - `Apr '15`: first actual bits of code for new site - `Nov '15`: first trial for a new site (heh) - `Feb '16`: new main page released - `Mar '16`: basket and order - `May '16`: campaign - `Nov '16`: checkout - `Dec '16`: order list ??? - In summer 2015 it was slow - Sped up in autumn - Winter was where we did stuff -- ### Black Friday '16 was OK :) --- # Single Page Application - Simpler to make rich interactions with user - Less data transfer/retrieval on page change - Lower load on the server - Lower load on the DB - Unified API for web and mobile apps - Slower initial load :( - Server-side rendering is a must - React, if you wonder ??? - SPA is controversial - I believe we mostly got it under control --- # Server-side rendering: Node.js ```js function handler(req, res) { res.writeHead(200, {"Content-Type": "text/html"}); render_to_string(req.url, (initial, content) => { // Simplest template ever, // just a wrapper with html/head res.end(render_template(initial, content)); }); } http.createServer(handler).listen(6000); ``` ??? - Most important stuff for SEO - Render your current page - Put it inside wrapper with `head` and return --- # render_to_string ```js function render_to_string(url, callback) { router.set_route(url); var comp = router.Root(data_store); // first time render to fire off all AJAX queries React.renderToString(comp); // wait for the queries to end xhr.current_queries.watch(function() { if (get_xhr_count() == 0) { callback(data_store, React.renderToString(comp)); } }); } ``` ??? - First renderToString triggers AJAX requests - Then we wait until they end - Then render second time with an actual data --- # Are we happy? - Manage a pool of Node.js render servers - Get results through HTTP API - Render everything twice ??? - Did a render pool - Bundle JS file inside of an app - Put it on disk on start - Run Node processes against it - Freaking hell -- # Not really --- # Clojure to the rescue! - Whole new app is in Clojure (obviously) - `.clj` - Clojure, `.cljs` - ClojureScript, `.cljc` - both - Front-end is in ClojureScript ``` (defc Example [] [:div.item {:on-click smile} [:span.inner "test"]]) ``` ??? - You can write files executed in both - This code returns React component --- # Server-side rendering ``` (defc Example [] [:div.item {:on-click smile} [:span.inner "test"]]) > (println (rum/render-html (Example)))
test
``` ??? - But on the server the same code returns simple function - So we can render it to a string - The best kind of a solution - a simple one --- class: invert, middle # Data ??? - Let's talk hard stuff --- # Meta data pairs .center.middle[
] ??? - That's my favorite story - Benchmark inside the team for other disasters - More or less understandable solution for KV-data - Okayish design gone horribly wrong --- # And we store various data there... ```sql =# SELECT k.key, k.id, COUNT(p.id) -# FROM product_metadatapair p -# JOIN product_metadatakey k ON p.key_id = k.id -# GROUP BY k.key, k.id -# ORDER BY COUNT(p.id); key | id | count ----------+----+---------- Объем | 18 | 26 -- volume | 3 | 27 -- empty string Описание | 17 | 176 -- description Номер | 2 | 443 -- number Pазмер | 1 | 3112 -- size with latin "P" Размер | 16 | 15212192 -- size (6 rows) ``` ??? - Empty key has empty values - First letter of "Pазмер" is a latin one --- # And stocks... .center.middle[
] ??? - Effectively MDP becomes a single SKU - Can single SKU have multiple stocks? Why m2m? --- # And basket! .center.middle[
] ??? - 25 mln records in basket item - 1 BI - 1 sku, but m2m again --- # Problems? - 22 queries to DB for 1 size -- - 110 queries to DB for 5 sizes -- - Let's fix it with cache -- background-image: url(img/facepalm.png) - Which depends on user's basket, so is cached per-user --- # How it should be .center.middle[
] ??? - No m2m relations --- # PostgreSQL - I love it - Just pure blessing - Allowed us to come fairly easily from the dark ages of pain to the days of sanity ??? - If your project is not using RDBMS as main db I'm almost certainly sorry for you --- # No ORM - ORMs over-fetch data (people are lazy) - Implicit behavior complicates understanding - ORMs are inflexible - ORMs are slow (they are piles of code) - ORMs are prone to errors (1+N queries, etc) - ORMs prevent people from understanding a data layout - One of your most valuable assets ??? - The only things ORMs are good for is composition --- # Composition ``` {:select [:name :phone] :from [:user_profile] :where [:= :id 1]} ``` - Just a regular map, compose how you want - Side-effect: people are learning SQL ??? - Regular Clojure map - Making helpers is just writing functions --- # ElasticSearch - Used for faceted filtration - Died in 3 seconds under 30% of production load - It just stopped answering for ever - Not even to a SIGKILL - People around kept saying "you want too much" ??? - Documentation is pretty bad - No good optimization guides - Explain generates huge JSON and no tool to understand it --- # Sneaky Cassandra caching! - When products are published - gather facet data from ES - put facet data into Cassandra - On request, first check Cassandra for data - Cache top level data and first level of every filter ??? - We cheated - Most of the queries are not touching ES --- # Kafka - Ordered messaging system - Crazy fast - Beautiful concept of topics/groups - Removes dependency on availability of neighbouring systems - Hard to overstate importance of this - Replaced custom APIs for data exchange between systems ??? - Kafka is my second favorite system after Postgres - Simple design - Orthogonal features - 5000 w/s --- # Before .center.middle[
] ??? - Custom HTTP APIs everywhere --- # After .center.middle[
] ??? - Hub appeared without anyone doing anything --- # Onyx - onyxplatform.org - Masterless distributed event stream framework - Describe all your flows with data - Using it for data exchange and publishing - For cache generation - Side-effects on events (sending emails etc) - Debugging can be hard sometimes ??? - Our Celery :) --- class: invert, middle # Platform --- # Functional programming & immutable data - Prevents in-place ad-hoc data mutation - Makes you write data processing in pipelines - Easier to test - Less confusion about behavior - Immutability removes whole classes of bugs - Makes your app simpler ??? - New technologies are here to help us write better programs - And better is *more correct* -- - Makes your *life* simpler --- # Clojure - Expressiveness - Speed - Sharing code between server and client - Hot code reload that works - Practical functional language - It's also fun! ??? - Simple - Fast - Crazy good community - High quality libs - FUN --- # REPL - Connect to running process - Not your regular IPython shell - Execute various code there - Experiment with code *in editor* - Update a function - Switch to different env - Why nobody does that in Python baffles me --- class: invert, middle # Monitoring --- # Riemann + Influx + Grafana
--- # Miniprofiler - Made by StackOverflow - Server-side libs in many languages
--- class: invert, middle # What did not work well --- # Moving fast (and breaking things) - You do a change too fast - It breaks your: - Business - Life - Will - Energy - Spine - I learned that by trial :) ??? Once upon a time I "fixed" reservation system in two days. It took us two weeks to gather pieces back. --- # Fixing old code ## If the code is sufficiently rotten, there is no fix, only replace. Learn what it does and write it yourself. --- class: invert, middle # Results --- # Performance - Median API response time - `18ms` - Median server-side page rendering time - `71ms` - Black Friday 2015: 18k online, 18 servers with Python, *dead* - Black Friday 2016: 22k online, 8 servers with Clojure, *alive* - Postgres is not breaking a sweat at 4k selects a second --- # Amounts of code - 6k loc of API - 13k loc of Front-end - 5k loc of event stream processor - Hard to compare --- # Conclusions - Rewriting is hard - Still can be done (and sometimes should) - Everybody should switch to immutable data