Many people tend to say they don't want to write tests. One of the reasons is usually that the tests are too slow. The tests can be slow because they are written in a bad way. They can also be slow because of slow components. One such component is usually a database.
The great thing about PostgreSQL is that all types of queries are transactional. It simply means that you can start a transaction, then run the test, which can add, delete and update all the data and database structure it wants. At the end of the integration test, there should be called rollback which just reverts all the changes. It means the next test will always have the same database structure and data before running, and you don't need to manually clear anything.
For running the integration tests we need a test database. One of the most important things when running test is speed. Tests should be fast, programmers don't like to wait ages just to know that there is something wrong.
We can also have a read only databases for the tests. Then you don't need to worry about the transactions, however you always need to ensure the tests won't change anything. Even if you assume your tests won't make any changes, it is always better to use a new transaction for each test and rollback at the end.
The main idea for fast integration tests using PostgreSQL is that those tests don't change anything in the database. If they don't change, we don't need to worry about some possible data loss when the database suddenly restarts. Then we can just restart the tests. Tests should prepare the data before running, assuming the database is in unknown state.
This database should be as fast as possible, even if it means losing data when some unusual things happen. Normally PostgreSQL works really great when someone turns off the server plug suddenly or kills the database process. It just doesn't lose the data.
However we really don't need this stuff when running the tests. The database can be loaded before running tests. If the database is suddenly shut down, we should restart the tests.
The simplest thing is to change a couple of settings which enable great secure writes, however it slows down the database. We don't need to have secure writes, they are only important when something crashes. Then we should restart all the components used for integration tests and load database before testing.
For testing I will use pgbench program which makes a test similar to TPC-B. The tests prepare the data in four tables and then performs a simple transaction:
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
where the params are randomly chosen during execution.
The database is created normally using SQL query:
CREATE DATABASE pbench;
Before running the tests, pgbench has to prepare initial data. This is done using the -i param. My computer is not very slow, so the default size of the database is too small, I used a quite larger database size using the -s 25 param. This way the database size is about 380MB, including indexes.
./pgbench -h localhost pbench -i -s 25
The first test will just run using standard PostgreSQL configuration settings and will run for 5 minutes (the -T 300 param).
./pgbench -h localhost -T 300 pbench
The initial results show about 80 transactions per second (tps).
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 23587
tps = 78.621158 (including connections establishing)
tps = 78.624383 (excluding connections establishing)
You may probably noticed the "number of clients" and "number of threads" values. It is the scenario where you have sequential tests, so all of them are run one by one. However integration tests written in a good way can be run in parallel, so let's run the pgbench once again, but with three threads and three clients.
./pgbench -h localhost -c 3 -j 3 -T 300 pbench
The results show that it is a little bit faster now:
number of clients: 3
number of threads: 3
duration: 300 s
number of transactions actually processed: 29782
tps = 99.268609 (including connections establishing)
tps = 99.273464 (excluding connections establishing)
Let's now change the PostgreSQL settings to some more dangerous, so it can lose some data when shut down suddenly, but in fact I don't care as all the data is loaded just before running the test.
I've written at the end of the postgresql.conf file the following lines:
fsync = off # turns forced synchronization on or off
synchronous_commit = off # synchronization level; on, off, or local
full_page_writes = off # recover from partial page writes
Those changes need a database restart, and after restarting PostgreSQL, I just run the pgbench tests once again.
All the results are in the following table:
number of clients and threads
1
2
3
normal settings
78 tps
80 tps
99 tps
dangerous settings
414 tps
905 tps
1215 tps
change ratio
531 %
1131 %
1227 %
As you can see, you can do three simple things to speed up your integration tests using PostgreSQL:
Change default PostgreSQL settings to speed the database up.
Change your tests to run in parallel.
Run each test in one transaction.
This way I managed to speed up the tests from 78 tps to 1215 tps. It means that the integration tests which normally run in 60 minutes, should now run in 4 minutes.
I've also played with many other settings which could have some impact on the PostgreSQL speed. They really have, however the impact is so small that I don't think it is worth mentioning here. Changing those three settings can make the PostgreSQL fast enough.
Richard and David presented a recent case study on an e-commerce hosting client.
Several Interchange catalogs drive their individually branded storefronts, on top of a standard single-server LAMP stack boosted by an SSD drive.
Last year the sites came under an intense Distributed Denial of Service attack which lasted nearly two weeks. End Point responded immediately and soon engaged third-party DDoS mitigation firms. This experience later prompted an Interchange caching implementation.
Cache population and expiration is difficult for any dynamic web application using sessions, and doubly so for e-commerce sites. Every shopping cart needs a session, but delaying session creation until the first POST submission enables efficient caching for most of the sitemap. Other Interchange caching improvements made it back into the upstream code.
I?ll keep brief my last post about this year?s Utah Open Source Conference.
I was asked to give on both day one and day two a talk called ?Website Performance Boot Camp? which carried this brief description:
What?s the difference between a snappy website and a sloth that you turn away from in frustration? A lot of little things, usually. It?s rarely worth doing 100% of the optimization you could do, but getting 75% of the way isn?t hard if you know where to look.
We?ll look at HTTP caching, compression, proxying, CDNs, CSS sprites, minification, and more, how to troubleshoot, and what?s best to leave alone when you have limited time or tolerance for risk.
Here is the video recording of the first time I presented the talk. (The technician noted its audio was ?a little hot?.)
Thanks again to the conference organizers and the other speakers and sponsors, and the nice venue Utah Valley University, for making it a great conference!
Many of my colleagues know that I'm fairly involved in animal rescue as a photographer and more recently as a foster. I recently became involved in HeARTs Speak, a non-profit, volunteer-driven organization that brings together artists (photographers and more) who volunteer at animal rescue organizations.
Frank the Foster
I worked with them to help launch a new website with a design from Enso Blue, which brings us to the point of the article. Given a choice of many platforms and tools, what tools did I use in development of a new site with the only restriction being how much time I was able to put into development and maintenance? Here's a quick rundown the tools I used for the new website.
Ruby on Rails: Here at End Point, we develop many applications in Ruby and Ruby on Rails. It's a platform that encourages decent code organization and efficient development. Another option that came up was PHP with WordPress. However, since the website required a custom application and voting process for joining members, I concluded that WordPress would be a bit of a hassle with this level of customization (though I'm personally a big fan of WordPress and its community).
RailsAdmin: I blog about RailsAdmin a lot because it's become my go-to tool for providing a thorough admin interface that integrates nicely with Devise (user authentication) and CanCan (user authorization).
RailsAdminImport. This is an open source Ruby on Rails gem that I developed for an End Point client to import data, because RailsAdmin doesn't include import functionality out of the box. By installing this gem into the HeARTs Speak application, I introduced the ability to import data from CSV easily. Read more about it here.
New Relic. I installed New Relic's free offering's for both Rails application monitoring and server monitoring. New Relic is a very popular performance analytics and monitoring tool in the Rails space. They offer paid membership levels as well, but I am satisfied with the free basic monitoring at this point.
Full Page Caching. Known Rails pitfalls include poor performance and a demanding server load. I added full page caching to all front-end facing pages to mitigate the effects of poor performance. I recently wrote about page caching in RailsAdmin here.
Google Analytics. This is an obvious and popular choice. Google Analytics offers so much in the way of traffic analysis, conversion analytics, and now even real-time tracking. If you aren't using it on your site, you should be!
WebPageTest. This is a nice service for examining performance of a website. Several of us End Pointers have mentioned it in blog articles before. If you aren't using a performance analysis tool like WebPageTest or YSlow, I'd highly recommend it.
Firebug and Screencast-O-Matic. Firebug is always a huge part of development for me. In this project, I used Screencast-O-Matic to provide an example of what it's like to work with Firebug to speed up the design iteration process, similar to process described in this blog article.
Most sites that use Interchange take advantage of Interchange's "more lists". These are built-in tools that support an Interchange "search" (either the search/scan action, or result of direct SQL via [query]) to make it very easy to paginate results. Under the hood, the more list is a drill-in to a cached "search object", so each page brings back a slice from the cache of the original search. There are extensive ways to modify the look and behavior of more lists and, with a bit of effort, they can be configured to meet design requirements.
Where more lists tend to fall short, however, is with respect to SEO. There are two primary SEO deficiencies that get business stakeholders' attention:
There is little control over the construction of the URLs for more lists. They leverage the scan actionmap and contain a hash key for the search object and numeric data to identify the slice and page location. They possess no intrinsic value in identifying the content they reference.
The search cache by default is ephemeral and session-specific. This means all those results beyond page 1 the search engine has cataloged will result in dead links for search users who try to land directly on the more-listed pages.
It is the latter issue that I wish to address because there is--and has been for some time now--a simple mechanism called "permanent more" to remedy the default behavior.
You can leverage "permanent more" by adding the boolean mv_more_permanent, or the shorthand pm, to your search conditions. E.g.:
Link:
<a href="[area search="
co=1
sf=category
se=Foo
op=rm
more=1
ml=5
pm=1
"]">All Foos</a>
Loop:
[loop search="
co=1
sf=category
se=Foo
op=rm
more=1
ml=5
pm=1
"]
...loop body with [more-list]...
[/loop]
Query:
[query
list=1
more=1
ml=10
pm=1
sql="SELECT * FROM products WHERE category LIKE '%Foo%'"
]
...same as loop but with 10 matches/page...
[/query]
If the initial search is defined with the "permanent more" setting, it will produce the following adjustments:
The hash key used to store and identify the search cache is deterministic based on the search conditions. Many searches for Interchange are category driven. Thus, all end users who wish to browse a category end up clicking identical links, which create duplicate search caches, belonging uniquely to them. With permanent more, they all share the same cache, with the same identifier. As long as the search conditions don't change, neither does the cache identifier. Even as the cache is refreshed with new executions of the search, the object remains in the same location. Thus, the results a search engine produced this morning reference links still valid now, tomorrow, or next week, provided they reference the same search conditions.
The cached search object has no session affinity. Any link referencing the cache with the correct hash key has access to the content.
Taken together, "permanent more" removes (for the most part, addressed later) dead links from more lists cataloged by search engines. There are, however, other benefits to "permanent more" beyond those intended as described above:
As stated in passing, standard Interchange search caching produces duplicate search objects for common search conditions. For a busy site, these caches can have an impact on storage. Typically, maintenance is implemented to clean up cache files for all such files whose age exceeds by some amount the session duration (standard is 48 hours). With permanent more, duplicate caches are eliminated. A cache location is reused by all users with the same search requirements, keeping data-storage requirements for caches to the minimum necessary. As searches change, ophaned caches can still easily be cleaned up as they will immediately start to age with no more access to them necessary for storage.
For the same reason that "permanent more" resolves search-engine links, it also resolves content management for individual sites using a reverse proxy for caching. Because most (and certainly the easiest) caching keys are based off of URL, the deterministic nature of the hash keys for "permanent more" allows assurance that the cached content in the proxy accurately reflects the search content over time, and that all users will hit the cached resource and not generate new, unique links with varying hash keys.
One shortcoming of "permanent more" to be aware of is the impact of changing data underneath the search. Even if search conditions do not change, the count and order of matching record sets may. So, e.g., enough products may be removed from a given category to cause the last page of a more list to become empty, which would cause any specific link into that page to become dead. More minor, but still a possibility, is the introduction or removal of products so that a particularly searched-for term has been "bumped" to another page within the search cache since the last time the search engine crawled the more lists. For searches backed by particularly volatile data, "permanent more" may not be sufficient to address search-engine or caching demands.
Finally, "permanent more" should be avoided for any search features that may cache data sensitive to an individual user. This is unlikely to happen as, under most circumstances, the configuration of the search itself will change based on the unique characteristics of the user executing the search (e.g., a username included in a query to review order history). However, it is still possible that context-sensitive information could be stored in the search object and, if so, all other users with access to the more lists would have access to that information.
PostgreSQL does as much work as it can in RAM, but sometimes it needs to (or thinks that it needs to) write things temporarily to disk. Typically, this happens on large or complex queries in which the required memory is greater than the work_mem setting.
This is usually an unwanted event: not only is going to disk much slower than keeping things in memory, but it can cause I/O contention. For very large, not-run-very-often queries, writing to disk can be warranted, but in most cases, you will want to adjust the work_mem setting. Keep in mind that this is very flexible setting, and can be adjusted globally (via the postgresql.conf file), per-user (via the ALTER USER command), and dynamically within a session (via the SET command). A good rule of thumb is to set it to something reasonable in your postgresql.conf (e.g. 8MB), and set it higher for specific users that are known to run complex queries. When you discover a particular query run by a normal user requires a lot of memory, adjust the work_mem for that particular query or set of queries.
How do you tell when you work_mem needs adjusting, or more to the point, when Postgres is writing files to disk? The key is the setting in postgresql.conf called log_temp_files. By default it is set to -1, which does no logging at all. Not very useful. A better setting is 0, which is my preferred setting: it logs all temporary files that are created. Setting log_temp_files to a positive number will only log entries that have an on-disk size greater than the given number (in kilobytes). Entries about temporary files used by Postgres will appear like this in your log file:
2011-01-12 16:33:34.175 EST LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp16501.0", size 130220032
The only important part is the size, in bytes. In the example above, the size is 124 MB, which is not that small of a file, especially as it may be created many, many times. So the question becomes, how can we quickly parse the files and get a sense of which queries are causing excess writes to disk? Enter the tail_n_mail program, which I recently tweaked to add a "tempfile" mode for just this purpose.
To enter this mode, just name your config file with "tempfile" in its name, and have it find the lines containing the temporary file information. It's also recommended you make use of the tempfile_limit parameter, which limits the results to the "top X" ones, as the report can get very verbose otherwise. An example config file and an example invocation via cron:
$cat tail_n_mail.tempfile.myserver.txt
## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Thu Nov 10 01:23:45 2011
MAILSUBJECT: Myserver tempfile sizes
EMAIL: greg@endpoint.com
FROM: postgres@myserver.com
INCLUDE: temporary file
TEMPFILE_LIMIT: 5
FILE: /var/log/pg_log/postgres-%Y-%m-%d.log
$crontab -l | grep tempfile
## Mail a report each morning about tempfile usage:
0 5 * * * bin/tail_n_mail tnm/tail_n_mail.tempfile.myserver.txt --quiet
For the client I wrote this for, we run this once a day and it mails us a nice report giving the worst tempfile offenders. The queries are broken down in three ways:
Largest overall temporary file size
Largest arithmetic mean (average) size
Largest total size across all the same query
Here is a slightly edited version of an actual tempfile report email:
Date: Mon Nov 7 06:39:57 2011 EST
Host: myserver.example.com
Total matches: 1342
Matches from [A] /var/log/pg_log/2011-11-08.log: 1241
Matches from [B] /var/log/pg_log/2011-11-09.log: 101
Not showing all lines: tempfile limit is 5
Top items by arithmetic mean | Top items by total size
----------------------------------+-------------------------------
860 MB (item 5, count is 1) | 17 GB (item 4, count is 447)
779 MB (item 1, count is 2) | 8 GB (item 2, count is 71)
597 MB (item 7, count is 1) | 6 GB (item 334, count is 378)
597 MB (item 8, count is 1) | 6 GB (item 46, count is 104)
596 MB (item 9, count is 1) | 5 GB (item 3, count is 63)
[1] From file B Count: 2
Arithmetic mean is 779.38 MB, total size is 1.52 GB
Smallest temp file size: 534.75 MB (2011-11-08 12:33:14.312 EST)
Largest temp file size: 1024.00 MB (2011-11-08 16:33:14.121 EST)
First: 2011-11-08 05:30:12.541 EST
Last: 2011-11-09 03:12:22.162 EST
SELECT ab.order_number, TO_CHAR(ab.creation_date, 'YYYY-MM-DD HH24:MI:SS') AS order_date,
FROM orders o
JOIN order_summary os ON (os.order_id = o.id)
JOIN customer c ON (o.customer = c.id)
ORDER BY creation_date DESC
[2] From file A Count: 71
Arithmetic mean is 8.31 MB, total size is 654 MB
Smallest temp file size: 12.12 MB (2011-11-08 06:12:15.012 EST)
Largest temp file size: 24.23 MB (2011-11-08 19:32:45.004 EST)
First: 2011-11-08 06:12:15.012 EST
Last: 2011-11-09 04:12:14.042 EST
CREATE TEMPORARY TABLE tmp_sales_by_month AS SELECT * FROM sales_by_month_view;
While it still needs a little polishing (such as showing which file each smallest/largest came from), it has already been an indispensible tool forfinding queries that causing I/O problems via frequent and/or large temporary files.
End Point recently started working with a new client (a startup in stealth mode, cannot name names, etc.) who is using PostgreSQL because of the great success some of the people starting the company have had with Postgres in previous companies. One of the things we recommend to our clients is a regular look at the database to see where the bottlenecks are. A good way to do this is by analyzing the logs. The two main tools for doing so are PGSI (Postgres System Impact) and pgfouine. We prefer PGSI for a few reasons: the output is better, it considers more factors, and it does not require you to munge your log_line_prefix setting quite as badly.
Both programs work basically the same: given a large number of log lines from Postgres, normalize the queries, see how long they took, and produce some pretty output.If you only want to look at the longest queries, it's usually enough to set your log_min_duration_statement to something sane (such as 200), and then run a daily tail_n_mail job against it. This is what we are doing with this client, and it sends a daily report that looks like this:
Date: Mon Aug 29 11:22:33 2011 UTC
Host: acme-postgres-1
Minimum duration: 2000 ms
Matches from /var/log/pg_log/postgres-2011-08-29.log: 7
[1] (from line 227)
2011-08-29 08:36:50 UTC postgres@maindb [25198]
LOG: duration: 276945.482 ms statement: COPY public.sales
(id, name, region, item, quantity) TO stdout;
[2] (from line 729)
2011-08-29 21:29:18 UTC tony@quadrant [17176]
LOG: duration: 8229.237 ms execute dbdpg_p29855_1: SELECT
id, singer, track FROM album JOIN artist ON artist.id =
album.singer WHERE id < 1000 AND track <> 1
However, the PGSI program was born of the need to look at all the queries in the database, not just the slowest-running ones; the cumulative effect of many short queries can have much more of an impact on the server than a smaller number of long-running queries. Thus, PGSI looks not only at how long a query takes to run, but how many times it has run in a certain period, as well as how often it runs. All of this information is put together to give a score to each normalized query, known as the "system impact". Like the costs on a Postgres explain plan, this is a unit-less number and of little importance in and of itself - the important thing is to compare it to the other queries to see the relative impact. We also have that report emailed out, it looks similar to this (this is a text version of the HTML produced):
Log file: /var/log/pg_log/postgres-2011-08-29.log
* SELECT (24)
* UPDATE (1)
Query System Impact : SELECT
Log activity from 2011-08-29 11:00:01 to 2011-08-29 11:15:01
+----------------------------------+
| System Impact: | 0.15 |
| Mean Duration: | 1230.95 ms |
| Median Duration: | 1224.70 ms |
| Total Count: | 411 |
| Mean Interval: | 4195 seconds |
| Std. Deviation: | 126.01 ms |
+---------------------------------+
SELECT *
FROM albums
WHERE track <> ? AND artist = ?
ORDER BY artist, track
At this point you may be wondering how we get all the queries into the log. This is done by setting log_min_duration_statement to 0. However, most (but not all!) clients do not want full logging 24 hours a day, as this creates some very large log files. So the solution we use is to analyze a slice of the day, only. It depends on the client, but we try for about 15 minutes during a busy time of day. Thus, the sequence of events is:
Turn on "full logging" by dropping log_min_duration_statement to zero
Some time later, set log_min_duration_statement back to what it was (e.g. 200)
Extract the logs from the time it was set to zero to when it was flipped back.
Run PGSI against the log subsection we pulled out
Mail the results out
All of this is run by cron. The first problem is how to update the postgresql.conf file and have Postgres re-read it, all automatically. As covered previously, we use the modify_postgres.pl script for this.
The exact incantation looks like this:
0 11 * * * perl bin/modify_postgres_conf --quiet
--pgconf /etc/postgresql/9.0/main/postgresql.conf
--change log_min_duration_statement=0
15 11 * * * perl bin/modify_postgres_conf --quiet
--pgconf /etc/postgresql/9.0/main/postgresql.conf
--change log_min_duration_statement=200 --no-comment
## The above are both one line each, but split for readability here
This changes log_min_duration_statement to 0 at 11AM, and then back to 200 15 minutes later. We use the --quiet argument as this is run from cron so we don't want any output from modify_postgres_conf on success. We do want a comment when we flip it to 0, as this is the temporary state and we want people viewing the postgresql.conf file at that time to realize it (or someone just doing a "git diff"). We don't want a comment when we flip it back, as the timestamp in the comment would cause git to think the file had changed.
Now for the tricky bit: extracting out just the section of logs that we want and sending it to PGSI. Here's the recipe I came up with for this client:
16 11 * * * tac `ls -1rt /var/log/pg_log/postgres*log
| tail -1`
| sed -n '/statement" changed to "200"/,/statement" changed to "0"/ p'
| tac
| bin/pgsi.pl --quiet > tmp/pgsi.html && bin/send_pgsi.pl
## Again, the above is all one line
What does this do? First, it finds the latest file in the /var/log/pg_log directory that starts with 'postgres' and ends with 'log'. Then it uses the tac program to spool the file backwards, one line at a time ('tac' is the opposite of 'cat'). Then it pipes that output to the sed program, which prints out all lines starting with the one where we changed the log_min_duration_statement to 200, and ending with the one where we changed it to 0 (the reverse of what we actually did, as we are reading it backwards). Finally, we use tac again to put the lines back in the correct order, pipe the output to pgsi, write the output to a temporary file, and then call a quick Perl script named send_pgsi.pl which mails the temporary HTML file to some interested parties.
Why do we use tac? Because we want to read the file backwards, so as to make sure we get the correct slice of log files as delimited by the log_min_duration_statement changes. If we simply started at the beginning of the file, we might encounter other similar changes that were made earlier and not by us.
All of this is not foolproof, of course, but it does not have to be, as it is very easy to run manually is something (for example the sed recipe) goes wrong, as the log file will still be there. Yes, it's also possible to grab the ranges in other ways (such as perl), but I find sed the quickest and easiest. As tempting as it was to write Yet Another Perl Script to extract the lines, sometimes a few chained Unix programs can do the job quite nicely.
I recently wrote about raw caching performance in Rails and advanced Rails performance techniques. In the latter article, I explained how to use a Rails low-level cache to store lists of things during the index or list request. This technique works well for list pages, but it doesn't necessarily apply to requests to an individual thing, or what is commonly referred to as the "show" action in Rails applications.
In my application, the "show" action loaded at ~200 ms/request with low concurrency, with the use of Rails fragment caching. And with high concurrency, the requests shot up to around 2000 ms/request. This wasn't cutting it! So, I pursued implementing full-page caching with a follow-up AJAX request, outlined by this diagram:
First, the fully-cached is loaded (quickly). Next, an AJAX request is made to retrieve access information. The access information returns a JSON object with information on whether or not there is a user, and if that user has edit access to that thing. If there is no user, the page stays as is. If there is a user, but he does not have edit permissions, the log out button is shown and the username is populated. If there is a user and he has edit permissions, the log out button is shown, the username is populated, and additional buttons requiring edit access are shown.
The Code
To cache the full page, I use the caches_page method, and cache only on requests of HTML format (other formats are not cached):
class ThingsController < ApplicationController
caches_page :show, :if => Proc.new { |c| c.request.format.html? }
...
end
My access level request looks something like this:
def accessibility
respond_to do |format|
format.json do
render :json => {
:logged_in => current_user ? current_user.to_json(:only => [:id, :username]) : false,
:can_edit => current_user ? Thing.find(params[:id]).can_edit?(current_user) : false }
end
end
end
My HTML has some bits of code sprinkled throughout it:
My jQuery AJAX request looks something like the code shown below. Note that I remove elements that do not apply to the current request:
$.ajax({
type: 'GET',
cache: false,
url: editability_path, //editability_path is defined in the HTML (a JavaScript variable)
dataType: "JSON",
error: function(xhr){
$('.require_editability,.require_loggged_in').remove();
},
success: function(results) {
if(results.logged_in) {
$('.require_logged_in').show();
$('#my_account').html(results.logged_in.username);
if(results.can_edit) {
$('.require_editability').show();
} else {
$('.require_editability').remove();
}
} else {
$('.require_editability,.require_loggged_in').remove();
}
}
});
And don't forget the sweeper to clear the fully cached page after edits (or other ActiveRecord callbacks):
class ThingSweeper < ActionController::Caching::Sweeper
observe Thing
def after_save(record)
expire_page :controller => :things, :action => :show, :id => record.id
end
end
Additional Notes
There are some additional notes to mention:
If a user were to hack the AJAX or JavaScript, server-side validation is still being performed when an "edit" action is submitted. In other words, if a hacker somehow enabled an edit button to show up and post an edit, a server-side response would prohibit the update because the hacker does not have appropriate accessibility.
HTML changes were made to accommodate this caching behavior, which was a bit tricky. HTML has to handle all potential use cases (no user, user & no edit access, user & edit access). jQuery itself can also be used to introduce new elements per use case.
The access level AJAX request is also hitting more low-level Rails caches: For example, the array of things that a user has edit permissions is cached and the cache is cleared with standard Rails sweepers. With this additional caching component, the access level AJAX request is hitting the database minimally.
Performance optimization scenarios such as this make an argument against inline editing of resources. If there were a backend admin interface to allow editing of things, full-page caching would be more straight-forward to implement.
Conclusion
With this functionality, fully cached pages are served with an average of less than 5 ms/request, and the AJAX accessibility request appears to be around 20 ms/request (although this is harder to test with simple command line tools). This is an improvement over the 200 ms/request initially implemented. Additionally, requests at a high concurrency don't bog down the system as much.
I see a lot of questions regarding Spree performance in the spree-user group, but they are rarely answered with metrics. I put together a quick script using the generic benchmark tool ab to review some data. Obviously, the answer to how well a site performs and scales is highly dependent on the host and the consumption of the web application, so the data here needs to be taken with a grain of salt. Another thing to note is that only two of the following use cases are running on Rails 3.0 — many of our current Spree clients are on Spree 0.11.2 or older. I also included one non-Spree Rails ecommerce application, in addition to a few non-Rails applications for comparison. All of the tests were run from my home network, so in theory there shouldn't be bias on performance tests for sites running on End Point servers.
We expect to see high performance on some of the sites with significant performance optimization. On smaller VPS, we expect to see the the server choke with higher concurrency.
Last week, I was coding in Perl for an Interchange project. I've been in and out of Perl and Ruby a lot lately. While I was working on the project, I came across the following bit of code and wanted to finally sit down and figure out how to use the map function in Perl on this bit of code.
my @options;
for my $obj (@$things) {
push @options, {
value => $obj->{a},
label => $obj->{b}
};
}
return @options;
I'm a big fan of Ruby's inject method and in general a fan of the Enumerable Module, but I have a brain block when it comes to using the map method in both Perl and Ruby. I spent a little time investigating and working on a small local Perl script to test the implementation of the map method. I came up with the following:
After that, I wanted to make sure the code change was justified. The Interchange application that is the source of this code is built for performance, so I wanted to ensure this change didn't hinder performance. It's been a while since I've done benchmarking in Perl, so I also had to refresh my memory regarding using the Benchmark module. I came up with:
#!/usr/bin/perl
use Benchmark;
my $count = 1000000;
my $things = [
{'a' => 123, 'b' => 456, 'c' => 789 },
{'a' => 456, 'b' => 789, 'c' => 123 }
];
#Test definitions as methods to mimic use in application
my $test1 = sub {
my @options;
for my $obj (@$things) {
push @options, {
value => $obj->{a},
label => $obj->{b}
};
}
return @options;
};
my $test2 = sub {
return [ map {
{
value => $_->{a},
label => $_->{b}
}
} @$things ];
};
#Benchmark tests & results.
$t0 = Benchmark->new;
$test1->() for(1..$count);
$t1 = Benchmark->new;
$td = timediff($t1, $t0);
print "the code for test 1 took:",timestr($td),"n";
$t0 = Benchmark->new;
$test2->() for(1..$count);
$t1 = Benchmark->new;
$td = timediff($t1, $t0);
print "the code for test 2 took:",timestr($td),"n";
The results were:
Test #
Before (For Loop)
After (Map)
1
5 sec
4 sec
2
5 sec
4 sec
3
5 sec
5 sec
4
5 sec
5 sec
5
6 sec
4 sec
6
6 sec
4 sec
7
6 sec
4 sec
8
5 sec
5 sec
9
5 sec
4 sec
10
5 sec
4 sec
Average
5.3 sec
4.3 sec
In this case, replacing the imperative programming style here with Functional programming (via map) yielded a small performance improvement, but the script executed each method 1,000,000 times, so the performance gain yielded by just one method call is very small. I doubt it's worth it go on a code cleanup rampage to update and test this, but it's good to keep in mind moving forward as small bits of the code are touched. I also wonder if the performance will vary when the size of $things changes — something I didn't test here. It was nice to practice using Perl's map method and Benchmark module. Yippee.
A few years back, I started working on a module named DBIx::Cache which would add a caching layer at the database driver level. The project that was driving it got put on hold indefinitely, so it's been on my long-term todo list to release what I did have to the public in the hope that someone else may find it useful. Hence, I've just released version 1.0.1 of DBIx::Cache. Consider it the closest thing Postgres has at the moment for query caching. :) The canonical webpage:
http://bucardo.org/wiki/DBIx-Cache
You can also grab it via git, either directly:
git clone git://bucardo.org/dbixcache.git/
or through the indispensable github:
https://github.com/bucardo/dbixcache
So, what does it do exactly? Well, the idea is that certain queries that are either repeated often and/or are very expensive to run should be cached somewhere, such that the database does not have to redo all the same work, just to return the same results over and over to the client application. Currently, the best you can hope for with Postgres is that things are in RAM from being run recently. DBIx::Cache changes this by caching the results somewhere else. The default destination is memcached.
DBIx::Cache acts as a transparent layer around your DBI calls. You can control which queries, or classes of queries get cached. Most of the basic DBI methods are overridden so that rather than query Postgres, they actually query memcached as needed (or other caching layer - could even query back into Postgres itself!). Let's look at a simple example:
use strict;
use warnings;
use Data::Dumper;
use DBIx::Cache;
use Cache::Memcached::Fast;
## Connect to an existing memcached server,
## and establish a default namespace
my $mc = Cache::Memcached::Fast->new(
{
servers => [ { address => 'localhost:11211' } ],
namespace => 'joy',
});
## Rather than DBI->connect, use DBIx->connect
## Tell it what to use as our caching source
## (the memcached server above)
my $dbh = DBIx::Cache->connect('', '', '',
{ RaiseError => 1,
dxc_cachehandle => $mc
});
## This is an expensive query, that takes 30 seconds to run:
my $SQL = 'SELECT * FROM analyze_sales_data()';
## Prepare this query
my $sth = $dbh->prepare($SQL);
## Run it ten times in a row.
## The first time takes 30 seconds, the other nine return instantly.
for (1..10) {
my $count = $sth->execute();
my $info = $sth->fetchall_arrayref({});
print Dumper $info;
}
In the above, the prepare($SQL) is actually calling the DBIx::Class::prepare method. This parses the query and tries to determine if it is cacheable or not, then stores that decision internally. Regardless of the result, it calls DBI::prepare (which is techincally DBD::Pg::prepare), and returns the result.The magic comes in the call to execute() later on. As you might imagine, this is also actually the DBIx::Class::execute() method. If the query is not cacheable, it simply runs it as normal and returns. If it is cacheable, and this is the first time it is run, DBIx::Class runs an EXPLAIN EXECUTE on the original statement, and parses out a list of all tables that are used in this query. Then it caches all of this information into memcached, so that subsequent runs using the same list of arguments to execute() don't need to do that work again.
Finally, we come to fetchall_arrayref(). The first time it is run, we simply call the parent methods and get the data back. Then we build unique keys and store the results of the query into memcached. Finally, we mark the execute() as fully cached. Thus, on subsequent calls to execute(), we don't actually execute anything on the database server, but simply return the count as stashed inside of memcached (in the case of execute, this is the number of affected rows). For the various fetch() methods, we do the same thing - rather than fetch things from the database (via DBI, DBD::Pg, and libpq), we get the results from memcached (frozen via Data::Dumper), and then unpack and return them. Since we don't actually need to do any work against the database, everything returns as fast as we can query memcached - which is in general very fast indeed.
Most of the above is working, but the piece that is not written is the cache invalidation. DBIx::Cache knows which tables go to which queries, so in theory you could have (for example), an UPDATE/INSERT/DELETE trigger on table X which calls DBIx::Cache and tells it to invalidate all items related to table X, so that the next call to prepare() or execute() or fetch() will not find any memcached matches and re-run the whole query and store the results. You could also simply handle that in your application, of course, and have it decide when to invalidate items.
It's been a while since I've really looked at the code, but as far as I can tell it is close to being able to actually use somewhere. :) Patches and questions welcome!
I'm a sucker for a good performance case study. So, when I came across a general request for performance improvement suggestions at Inspiredology, I couldn't help but experiment a bit.
The site runs on WordPress and is heavy on the graphics as it's a site geared towards web designers. I inquired to the site administrators about grabbing a static copy of their home page and using it for a case study on our blog. My tools of choice for optimization were webpagetest.org and YSlow.
Here are the results of a 4-step optimization in visual form:
Inspiredology's complete homepage.
The graph on the left shows the page load time in seconds for a first time view. Throughout optimization, page load time goes from 13.412 seconds to 9.212 seconds. Each step had a measurable impact. The graph on the right shows the page load time in seconds for a repeated view, and this goes from 7.329 seconds to 2.563 seconds throughout optimization. The first optimization step (CSS spriting and file combination) yielded a large performance improvement. I'm not sure why there's a slight performance decrease between step 3 and step 4.
And here's a summary of the changes involved in each step:
Step 1
Addition of CSS Sprites: I wrote about CSS Sprites a while back and A List Apart has an older but still relevant article on CSS Sprites here. Repeating elements like navigation components, icons, and buttons are suitable for CSS sprites. Article or page-specific images are not typically suitable for CSS sprites. For Inspiredology's site, I created two sprited images - one with a large amount of navigation components, and one with some of their large background images. You can find a great tool for building CSS rules from a sprited image here.
Combination of JS and CSS files, where applicable. Any JavaScript or CSS files that are included throughout the site are suitable for combination. Files that can't be combined include suckerfish JavaScript like Google Analytics or marketing service scripts.
Moved JavaScript requests to the bottom of the HTML. This is recommended because JavaScript requests block parallel downloading. Moving them to the bottom allows page elements to be downloaded and rendered first, followed by JavaScript loading.
Addition of expires headers and disabling ETags: These are standard optimization suggestions. Jon wrote about using these a bit here and here.
Step 4
Serving gzipped content with mod_deflate: Also a fairly standard optimization suggestion. Although, I should note I had some issues gzipping a couple of the files and since the site was in a temporary location, I didn't spend much time troubleshooting.
A bit more cleanup of rogue html and CSS files. In particular, there was one HTML file requested that didn't have any content in it and another that had JavaScript that I appended to the combined JavaScript file (combined.js).
A side-by-side comparison of webpagetest.org's original versus step 4 results highlights the reduction of requests in the waterfall and the large reduction in requests on the repeat view:
What Next?
At this point, webpagetest.org suggests the following changes:
Gzipping the remaining components has a potential of reducing total bytes of the first request by ~10%.
Additional image compression has the potential of reducing total bytes of the first request by about ~6%. This metric is based on their image compression check: "JPEG - Within 10% of a photoshop quality 50 will pass, up to 50% larger will warn and anything larger than that will fail." Quite a few of Inspiredology's jpgs did not pass this test and could be optimized further.
Use a CDN. This is a common optimization suggestion, but the cost of a CDN isn't always justified for smaller sites.
I would suggest:
Revisiting CSS spriting to further optimize. I only spent a short time spriting and didn't work out all the kinks. There were a few requests that I didn't sprite because they were repeating elements, but repeating elements can be sprited together. Another 5 requests might be eliminated with additional CSS spriting.
Server-optimization: Inspiredology runs on WordPress. We've used the wp-cache plugin for a couple of our clients running WordPress, which I believe helps. But note that the case study presented here is a static page with static assets, so there is obviously a huge gain to be had by optimizing serving images, CSS, and JavaScript.
Database optimization: Again, there's no database in play in this static page experiment. But there's always room for improvement on database optimization. Josh recently made performance improvements for one of our clients running on Rails with postgreSQL using pgsi, our open source postgreSQL performance reporting tool, and had outrageously impressive benchmarked improvements.
I just read an article about CSS selectors. The combined.css file I created for this case study has 2000 lines. Although there might be only a small win with optimization here, surely optimization and cleanup of that file can be beneficial.
I recently wrote about several jQuery tips, including performance optimization techniques. This isn't going to improve the serving of static assets, but it would be another customer-facing enhancement that can improve the usability of the site.
If you work on Internet systems all day like we do, there's a good chance you use some sort of monitoring software. Almost every business knows they need monitoring. If you're a small company or organization, you probably started out with something free like Nagios. Or maybe you're a really small company and prefer to outsource your alerts to a web service like Pingdom. Either way, you understand that it's important to know when your websites and mailservers are down. But do you monitor with purpose?
All too often I encounter installations where the Systems Administrator has spent countless hours setting up their checks, making sure their thresholds and notifications work as designed, without really considering what their response might be in the face of disaster (or an inconvenient page at 3am). Operations folk have been trained to make sure their systems are pingable, their CPU temperature is running cool and the system load is at a reasonable level. But what do you do when that alert comes in because the website load is running at 10 for the last 15 minutes? Is that bad? How can you be certain?
The art of monitoring isn't simply reactive in nature. A good SysAdmin will understand that real monitoring takes an active presence. Talk to your DBAs, software engineers and architects. Learn how the various components of your system(s) interact and relate, both in good times and bad. Review your performance trends (graphs) to see how each metric evolves over time. Without understanding the functional scope of your systems, you can't expect to set meaningful thresholds on them.
Last but not least, every alert should be actionable. Getting paged because your application server is down is useless unless you have the proper remediation path documented and tested. Know what actions are needed, who should perform them, and what parties to escalate to in case the remediation fails. Focusing your energies on purposeful monitoring results in fewer false alarms, faster recovery from failures and regression, and an acute understanding of your entire application stack.
I've recently been jumping back and forth between YUI and jQuery on several different client projects. I prefer working with jQuery, but whenever I work with another framework, I realize what's out there and how I should continue to improve my skills in my preferred framework. I read up on jQuery tips and put together a summary of common tips I need to follow more explained here in an ecommerce demo.
The Setup
Before we get started, some notes on the ecommerce demo and performance testing:
console.time(?task?) and console.timeEnd(?task?) are used to debug task runtime for performance measurement
The performance numbers provided in the article were measured in Chrome (Ubuntu) where the average of 10 tests is reported. In all the tests, an additional for loop was added as part of the test to see measurable performance differences. See the notes at the bottom of the article on performance differences between Chrome and Firefox.
A screenshot from the demo app.
1. The first tip I came across was a recommendation to use a for loop instead of jQuery's each method. I start off with some un-optimized code to test that loops through our products JSON object and renders the images to the page. In this case, the use of the for loop instead of each doesn't give us a significant performance difference.
Products displayed with a for loop instead of jQuery's each method.
2. The next "I Can't Believe I'm Not Using this jQuery Technique" I found was a recommendation to use data tag. Although I've read about the data tag, I haven't worked with it enough to use it consistently. With this code, we assign product data (name, price) to each product link as it?s added to the DOM. Upon clicking a product, instead of traversing through the products array, we render the "featured_product" contents based on it's data. The data tag is recommended over assigning values to arbitrary HTML tag attributes such as assigning our product <a> and <img> the name and price values to title or alt attributes.
jQuery's data tag is used to populate the right side of the page as a product is clicked.
3. The next tip I came across frequently was a recommendation to cache your selectors, shown in the example below. A selector $('div.products a.featured') is created and used when users like to view "Featured" Items. This gave me a 33% performance gain.
Multiple products added to test onclick for identifying "featured" products.
4. Another tip I came across was the recommendation on using context in jQuery selectors, described here in the jQuery documentation. In one example, I try updating the $('div.products') selector to set the context as $(?div.wrapper?), but saw performance worsen here. In another example, I added "this" as a context for populating the featured product, but again saw performance worsen slightly here. In this case, performance gain will depend on your original selector, but it's worth testing.
5. Another common tip I came across that "I can't believe I don't follow" is to use an id selector instead of a class. I?ve admittedly read this several times, but again it's a practice that I sometimes forget about. In my ecommerce setup, I add a loop to add the products 10x to our DOM, and with a change of selector from 'div.products' to 'div#products', I saw a small performance improvement.
6. I found a recommendation to minimize the DOM minimally. This is a tip that I typically follow, but another one that's easily forgotten. In our ecommerce setup, I create a single point of appending to the div#products selector after generating my product links and data. This gave a ~25% performance gain.
7. Another tip I came across was to use event delegation in jQuery. The idea is that your event has more context to work with than general selectors. You can access the event and manipulate it?s parent. I found that $(e.target).parent() is the same as manipulating $(this) performance-wise. It?s likely that using one or the other is much faster than using a general DOM selector such as $(div.product' + id).
var featured_product = $('div#featured_product');
$('div#products a').click(function(e) {
var product = $(e.target).parent(); // same as $(this)
featured_product
.html('<h1>' + product.data('name') + '</h1>')
.append(product.find('img').clone());
});
8. One tip that I've never seen before is to use ".end()" in chaining. Instead of reselecting the $('div.products') region, I use "find()" to apply a css style, then traverse up the chain to find another set of products to apply a css style, and repeat. This change gave a small performance bump, but you might tend to use the cached selectors described in Tip #3 instead of the following.
9. I found several examples of writing your own selectors and how easy it is! I wrote two selectors with the following to identify products under $20.00 and products over $1000.00.
Products priced over $1000 are highlighted with a custom selector.
10. I also found several examples of how to write your own chain methods. In this example, I create two chain methods to set the product background to white or grey and update my under20 & over1000 methods to use this new chain method. The nice thing about creating your own chain methods is that these methods can be easily modified in the future with minimal code changes because it follows the DRY principle. I'm not sure if it's intended, but the use of my custom chain method did not work with the end() chain method described in Tip #8.
Products priced under $20 are highlighted with a custom selector and custom chain method.
Again, the product images for this demo app are from DryIcons.com and the final application code can be found here. The application was also deployed on Heroku to verify that the code works in IE [8], Chrome, and Firefox.
In Case You are Interested
During development of the demo, I found a significant performance differences between Chrome and Firefox. See the following tests:
I recently had the privilege of reading and reviewing the book PostgreSQL 9.0 High Performance by Greg Smith. While the title of the book suggests that it may be relevant only to PostgreSQL 9.0, there is in fact a wealth of information to be found which is relevant for all community supported versions of Postgres.
Acheiving the highest performance with PostgreSQL is definitely something which touches all layers of the stack, from your specific disk hardware, OS and filesystem to the database configuration, connection/data access patterns, and queries in use. This book gathers up a lot of the information and advice that I've seen bandied about on the IRC channel and the PostgreSQL mailing lists and presents it in one place.
While seemingly related, I believe some of the main points of the book could be summed up as:
Measure, don't guess. From the early chapters which cover the lowest-level considerations, such as disk hardware/configuration to the later chapters which cover such topics as query optimization, replication and partitioning, considerable emphasis is placed on determining the metrics by which to measure performance before/after specific changes. This is the only way to determine the impact the changes you make have.
Tailor to your specific needs/workflows. While there are many good rules of thumb out there when it comes to configuration/tuning, this book emphasizes the process of determining/refining those more general numbers to tailoring configuration/setup to your specific database's needs.
Review the information the database system itself gives you. Information provided by the pg_stat_* views can be useful in identifying bottlenecks in queries, unused/underused indexes.
This book also introduced me to a few goodies which I had not encountered previously. One of the more interesting ones is the pg_buffercache contrib module. This suite of functions allows you to peek at the internals of the shared_buffers cache to get a feel for which relations are heavily accessed on a block-by-block basis. The examples in the book show this being used to more accurately size shared_buffers based on the actual number of accesses to specific portions of different relations.
I found the book to be well-written (always a plus when reading technical books) and felt it covered quite a bit of depth given its ambitious scope. Overall, it was an informative and enjoyable read.