* standby: add standby mode
In standby mode PageSpeed is off, except that it still serves .pagespeed. resources and
interprets PageSpeed query parameters. This is equivalent to "off" in
mod_pagespeed.
With this change, "off" in mod_pagespeed is deprecated, and people should use
"standby" instead.
* add test file
* rewrite doc
* get tests to pass under memcached
* doc: minor display tweak
* build release tarball: open source it
* change it from being given a branch to working on the current checkout
* change it to work with create_distro_tarball, which was already opensourced
* make it build openssl 1.0.2 if needed
* doc ubuntu 14 dep
build_development_apache.sh flakes about 21% of the time, due to the apr
makefile not handling parallelism properly when running the install target.
Explicitly set -j1 when installing apr.
Fixes pagespeed/ngx_pagespeed#1338
This should fix the ngx_pagespeed build on travis, because the ngx_pagespeed build is using devel.
(This is actually something I had originally done as part of 91662d08a, but because part of that change involved removing our exporting tools I hand-exported the change and missed this file.)
* many files had no license comments at all
* some files had license comments suggesting that they weren't open source
(like a terse "all rights reserved") when they actually are open source.
* all our files are licensed under apache and should be marked as such
ngxpagespeed.com/install serves a redirect to build_ngx_pagespeed.sh on github. Now that we develop on master and trunk-tracking is gone, fix it to point at master.
Already fixed (manually) on the live site.
These are the scripts and Makefiles we've been using to develop mod_pagespeed. These were in a google-internal repo instead of being open sourced for complicated internal reasons, but now mod_pagespeed developers inside and outside google can use the same tools and flow.
This change adds a bunch of make targets. To see them, look at the big comment at the top of devel/Makefile. To run them, cd to devel/ and run make:
cd devel/
make apache_debug_smoke_test
This change also adds 'checkin' tests, which are a way to run all our tests together, so we can be confident a change doesn't break any of them. To run checkin tests:
cd devel/
./checkin
They're pretty slow: about 2hr on my machine. Definitely don't bother running checkin tests until unit tests and system tests have passed first.
malloc, but makes sure that padding bytes at the very end are deterministically
zero.
Reason: Testing on native builds on Debian Sid suggested that *something* in
our compression stack was apparently letting some of the padding bytes
influence the output. (Noticeable on PngOptimizerTest.ValidPngs).
This works around the issue.
(From libpng docs on changes in 1.6.x:
"The library now issues an error if the application attempts to set a
transform after it calls png_read_update_info() or if it attempts to call
both png_read_update_info() and png_start_read_image() or to call either
of them more than once.")
mutable_output_partition(i) and change call-sites as needed.
Add an AtomicBool used for checking that we should not be modifying
the CachedResult in a RewriteContext after it is serialized to the
cache, and a new private method RewriteContext::CheckNotFrozen() to
check it.
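The freeze-check pattern described above can be sketched like this (a minimal Python sketch with illustrative names, not the actual C++ RewriteContext/CachedResult code; a threading.Event stands in for the AtomicBool):

```python
import threading

class CachedResult:
    """Toy stand-in for the serialized rewrite result."""
    def __init__(self):
        self.optimized_url = None

class RewriteContext:
    def __init__(self):
        self._result = CachedResult()
        # Set once the result has been serialized to the cache; any
        # mutation after that point is a bug we want to catch loudly.
        self._frozen = threading.Event()

    def mutable_output_partition(self):
        self._check_not_frozen()
        return self._result

    def serialize_to_cache(self):
        self._frozen.set()
        return {"optimized_url": self._result.optimized_url}

    def _check_not_frozen(self):
        # Plays the role of RewriteContext::CheckNotFrozen().
        assert not self._frozen.is_set(), \
            "CachedResult modified after serialization"

ctx = RewriteContext()
ctx.mutable_output_partition().optimized_url = "a.pagespeed.ce.x.css"
ctx.serialize_to_cache()
try:
    ctx.mutable_output_partition()
except AssertionError:
    print("caught late mutation")
```

The point is that every mutation path funnels through mutable_output_partition(), so the check has a single choke point.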
This is needed to build on Debian sid against system libpng, which is always 1.6.
Also stop using png_sizeof; it's the same as sizeof in practice,
and is gone in 1.6.
a) we can easily & selectively purge the HTTP cache by deleting a directory
b) doing this doesn't also clear the metadata cache, which needs to remain fresh to repro the bug.
1) RateController::Fetch would try IncrementIfCanTriggerFetch, and fail since there were too many outstanding background fetches for the host. Note that "too many" is 1 in the case of prefork.
2) A previous fetch would finish, notice there is nothing queued, and so do nothing else.
3) RateController::Fetch would EnqueueFetchIfWithinThreshold, but there is no longer an outstanding fetch for the host to push it out of the queue on completion.
The change just locks the HostFetchInfo for the duration, and adds a bunch of lock annotation for that. (This does mean we hold two locks now).
(This was probably more visible in loadtests since this is per-host and that has lots
of hosts).
Also fix an occasional DCHECK about current_global_fetch_queue_size_, where we could end up dequeuing the entry and decrementing it before the increment due to another race --- thanks to Josh for pointing it out.
- Work-around various SELinux permission type stuff.
- Force .js to have the correct MIME type.
- os_redirector.sh should load build_env as well as shell_utils.
StringPiece is eventually going to be replaced with std::string_view,
which has a somewhat different API. Adjust our use of StringPiece to
avoid some things that are going away.
by making sure that other_rewrite_driver gets cleaned up: the test sets
RewriteTestBase into managed mode, which basically means it does
rewrite_driver = NewRewriteDriver
other_rewrite_driver = NewRewriterDriver
... but then nothing actually does anything with other_rewrite_driver, making it
stick around until factory shutdown does emergency cleanup.
(see https://github.com/pagespeed/mod_pagespeed/issues/1421)
The remote-config tests used to abuse netcat as a kind of server. They
would run one netcat for each possible response we wanted to test, each
on its own port since we couldn't have netcat read the request line to
make a custom response, which meant we needed a lot of ports.
Additionally they depended on a flag that is only available in the
debian distribution of netcat, so testing on other platforms meant
copying in debian's patch.
This change replaces the hacky fragile use of netcat with a server
written in python as a relatively simple wrapper around select.select().
Also, this adds four more tests for verifying that remote config does
the right thing when the remote config server becomes
available (rcport8), when we have a stale copy of the config (rcport9,
which fails), when the config 403s (rcport10), and when the config
initially 403s but then 200s (rcport11).
* In a followup I'll rename the numbered ports to better names, but I
think it's less confusing if I keep them the same here.
* "pathological_server" probably isn't the best name, since most of what
it's doing isn't actually that bad. On the other hand, having full
control over what we send out allows us to be as pathological as we
want to if we want to use this to test anything else.
nginx side of the change: https://github.com/pagespeed/ngx_pagespeed/pull/1293
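The select.select() approach is small enough to sketch; this is an illustrative one-response toy, not the actual pathological_server code:

```python
import select
import socket
import threading

# Bind first so the client can't race the server thread.
listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))   # any free port
listener.listen(1)
port = listener.getsockname()[1]

CANNED = b"HTTP/1.1 403 Forbidden\r\nContent-Length: 0\r\n\r\n"

def serve_one():
    # Wait for the connection with select.select(), then answer it.
    # Unlike netcat, we get to read the request line here, so one
    # server could dispatch custom responses instead of one port
    # per response.
    readable, _, _ = select.select([listener], [], [], 5.0)
    if listener in readable:
        conn, _ = listener.accept()
        conn.recv(65536)          # the request (could be parsed)
        conn.sendall(CANNED)
        conn.close()

t = threading.Thread(target=serve_one)
t.start()
client = socket.create_connection(("127.0.0.1", port), timeout=5)
client.sendall(b"GET /config HTTP/1.0\r\n\r\n")
reply = client.recv(65536)
client.close()
t.join()
listener.close()
print(reply.split(b"\r\n")[0].decode())
```

Having the request available is exactly what lets the real server collapse many netcat-on-a-port hacks into one process.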
AsyncFetch::HandleDone(success=false) can't pass success=false to
HeadersComplete. Success being false means "we've encountered an error"
but the headers may or may not have been updated to match. For example,
if the CacheUrlAsyncFetcher doesn't find something it will set
status=false and set up 404 headers, but if SendFallbackResponse aborts
a response for having the wrong content type then it sets status=false
while leaving 200 headers.
We need to make sure that error headers are sent whenever status=false,
and the only place we can make sure of that is in AsyncFetch::Done
itself, because HandleDone is too late and HandleHeadersComplete can't
see status.
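The invariant above can be sketched in a few lines (Python-ified names loosely mirroring Done/HandleHeadersComplete/HandleDone; this is a sketch of the idea, not the real AsyncFetch):

```python
class AsyncFetch:
    def __init__(self):
        self.status_code = 200
        self.headers_complete_called = False

    def done(self, success: bool):
        if not success and self.status_code < 400:
            # A fetcher set success=False but left non-error headers
            # (e.g. SendFallbackResponse aborting on content type).
            # This is the only place we can fix them up, since
            # HandleHeadersComplete can't see `success` and
            # HandleDone runs too late.
            self.status_code = 500  # illustrative error status
        self.headers_complete()
        self.handle_done(success)

    def headers_complete(self):
        self.headers_complete_called = True

    def handle_done(self, success):
        pass

f = AsyncFetch()
f.done(False)
print(f.status_code)   # → 500: error headers guaranteed
```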
This fixes a flake in resource_content_type_html.sh with ngx_pagespeed
where when the deadline alarm was hit, which mostly happens under
valgrind, it would send out the headers for the request with no body.
I've verified the fix by making sure that the test passes if you have
set test_instant_fetch_rewrite_deadline=true, and also that while we
previously got 5 flakes in 133 runs under valgrind now we don't get any.
(I previously tried to fix this on the ngx_pagespeed end with
pagespeed/ngx_pagespeed#1307 , but that doesn't work consistently
because ngx_pagespeed races to send out headers immediately after
receiving HeadersComplete.)
Split apache/system_test.sh into apache/system_tests/*, the same way
automatic/ and system/ are divided.
I haven't done any reordering or cleanups: I want to first get it split, then
come back and make it tidy, or else keeping this outstanding CL synced without
merge conflicts will be a huge pain.
The main advantage of splitting the tests is that it allows us to run a single
test with TEST_TO_RUN=filename when debugging, but it also makes future
refactors simpler.
A change about two years ago broke check_not and check_not_from so that they didn't actually
exit on error anymore, because their failing exit status was swallowed by
the subshell. Luckily there aren't any tests that have gone stale in the meantime!
This was added, found to be too flaky, and has been disabled for over a year. It probably doesn't work anymore, and if we want to revive it we'll probably do it a different way.
(Also, I want to make changes to this part of the test script, and having stale code around makes it hard to make sure I haven't broken the stale code.)
* Switch blocking rewrite_test to use the larger Beach.jpeg. On my VM the other
image was completing too quickly.
* Make process_scope test fetch a directory that has an index.html, not
a 403.
Split system/system_test.sh into system/system_tests/*, the same way
automatic/system_test.sh is divided.
I haven't done any reordering or cleanups: I want to first get it split, then
come back and make it tidy, or else keeping this outstanding CL synced without
merge conflicts will be a huge pain.
The main advantage of splitting the tests is that it allows us to run a single
test with TEST_TO_RUN=filename when debugging, but it also makes future
refactors simpler.
type, such as html, pdf, or something we can't parse.
This has a very slight negative effect on performance for low-entropy URLs
in siege tests (very close to the noise level), and a fairly significant
one on high-entropy URLs, such as HTML with random query params. The idea
is to avoid filling the cache with them.
People can set TEST_TO_RUN=test-name to run only a specific test.
This only works for tests that are set up to use run_test, which is currently
only the ones under automatic/, but I'm planning to convert system/ etc in a
followup.
I didn't test 7ecf90cff properly, and it was broken in three ways:
* It didn't pass in REDIRECT_STATUS, so php wouldn't start.
  (When you start php-cgi from the command line this isn't
  needed, but when php-cgi detects it's being run
  non-interactively it refuses to start without it.)
* A test that was expecting to talk to flush.example.com in one
  case and noflush.example.com in the other was always getting
  noflush.example.com.
* The 'check if php is working' command worked, but it was using
an endpoint with a 2s sleep when it didn't need that, which
slowed down the tests.
Don't DCHECK on computing query-options from a bad URL. Handle
responsive_images corner-case with an escaped URL. In particular don't
DCHECK.
Allow make_show_ads_async to run even when the <script> tag is the only
tag in the document. In particular don't DCHECK.
when shutting_down. That doesn't just leak it, it leaks the rewrite driver,
which we then have to try to force-shutdown which may also try to delete the
RewriteContext if it's used in fetch, potentially blowing up on stats access.
* Add install/start_php.sh that starts a php server on the specified port,
killing an existing one if need be.
* Change the follow-flushes test to first check that the page can be fetched.
* Refactor the arguments to check_flushing to reduce duplication.
nginx side of the change is: https://github.com/pagespeed/ngx_pagespeed/pull/1288
by always registering cohort, and deciding whether we need it at read time.
Also add the corresponding example page and integration test.
Fix it on nginx by adding a separate hook for post-property-cache init, as
it's actually not ready in StartParse w/ProxyFetch (while it is with Apache)
Also remove some needless quoting that was pointed out in review.
Supports both MOVED and ASKING. When receiving MOVED, pulls down the cluster mapping with CLUSTER SLOTS so that it can send queries to the right servers going forward
Most of the work by yeputons, with final cleanup, testing, and CLUSTER SLOTS support by jefftk
The way you used to install apache2 configuration on Debian-based systems
was to put it in /etc/apache2/conf.d/, but starting with Ubuntu 14 LTS
there's a new system where you install it to /etc/apache2/conf-available/ and
then run a2enconf to symlink it to /etc/apache2/conf-enabled. On these
systems we weren't installing pagespeed-libraries.conf in a way that made
it usable, because when we put it in conf.d it just was never loaded.
Switch to installing to /etc/apache2/conf-available/, and then use a2enconf
if it's available. If a2enconf isn't available, manually symlink to conf.d.
Fixes: https://github.com/pagespeed/mod_pagespeed/issues/1389
be due to signals being sent & caught, and a slightly-too-tight
timeout.
The root cause of the interrupt problem was lack of EINTR handling in
apr_memcache2.c. In general in Apache there is no signal handling so
this does not affect any kind of production use (at least when not
shutting down), but affects our unit tests in some environments. As far
as I can tell there was only one point where the EINTR handling needed
to be added, but I left behind more debug printing, via
fprintf(stderr...), which came through in unit tests. No
message-handlers are available from inside apr_memcache2.c.
Backs out the change to send an ascii reset sequence to the memcache.
It's totally unnecessary, exposes the ascii protocol of memcached at
the wrong layer of abstraction, and adds further risk to the test.
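The EINTR fix boils down to "retry the call if a signal interrupted it". A minimal sketch of that pattern in Python (illustrative; the real fix is C code inside apr_memcache2.c):

```python
import errno

def read_retrying_eintr(read_fn):
    """Retry a blocking call that fails with EINTR instead of
    treating the interruption as a real I/O error."""
    while True:
        try:
            return read_fn()
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
            # A signal interrupted the call; just retry.

# Fake a read that gets interrupted twice before succeeding:
calls = {"n": 0}
def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return b"VALUE key 0 5\r\nhello\r\nEND\r\n"

print(read_retrying_eintr(flaky_read)[:5])   # → b'VALUE'
```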
The current cache-cleaning is broken because it only deletes the main
cache, and not any alternate caches. Also cache-cleaning can be quite
slow, so be a little noisier while doing it, factor it out into a
script.
(Change by jmarantz@)
* With rel=preload the hint tells us what type of resource it is, and if
urls have been preserved for that type we should not strip it.
* If the rel=preload type isn't image, script, or style we shouldn't
strip it, because those are the only urls we change.
* The filter was originally written to use src= when it should have used
href=, which meant it removed hints it shouldn't have.
* Minor cleanup: change 'type *name' to 'type* name' to match local style.
* Minor cleanup: change NULL to nullptr.
Fixes: https://github.com/pagespeed/mod_pagespeed/issues/1392
Fixes: https://github.com/pagespeed/mod_pagespeed/issues/1393
For backporting to v33 for 1.11.33.4 I instead had a minimal version without the cleanups or refactoring: https://github.com/pagespeed/mod_pagespeed/pull/1394
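The stripping decision the bullets describe can be sketched as a small predicate (hypothetical helper, not the real C++ filter, which also handles the src= vs href= fix):

```python
def should_strip_hint(rel_type, preserve_urls):
    """Decide whether a <link rel=preload> hint may be stripped.
    `rel_type` is the resource type the hint declares;
    `preserve_urls` maps type -> whether URLs of that type are
    preserved by configuration."""
    if rel_type not in ("image", "script", "style"):
        # We only rewrite these types, so leave other hints alone.
        return False
    # If URLs of this type are preserved, the hint is still valid
    # and must not be stripped either.
    return not preserve_urls.get(rel_type, False)

print(should_strip_hint("font", {}))                  # → False
print(should_strip_hint("script", {"script": True}))  # → False
print(should_strip_hint("script", {}))                # → True
```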
cause gRPC stuff to happen if the CentralControllerPort is set.
There is currently a known deadlock with the controller, which will be
fixed in a subsequent change.
not be enabled during most of our tests, which should instead match the
configuration used by default for our userbase.
Requires https://github.com/pagespeed/ngx_pagespeed/pull/1278 to be
checked in concurrently.
Note that this change breaks backward compatibility: specs like `::host:::port` are now rejected, whereas previously all "empty" pieces of the spec between colons were ignored.
These are the changes I needed to make to get build-release-platform
to work on the buildbots:
* fcgid is not available on centos5, so do not require it
* httpd on centos5 is in /usr/sbin, which is not in PATH
* httpd -M on centos5 prints its output to stderr
* we have set -x so you cannot do foo ; if [ $? != 0 ] ...
* we now depend on having a c compiler newer than gcc 4.1,
so mess with the PATH to get us the one from scientific
linux
* but devtoolset includes a version of sudo that does not
support -E, so rename that so it is not used
* do not delete release/ for one architecture when you
start building the next one
* remove the scp to centos-buildbot since we keep
everything now
* git pull --ff-only only makes sense on a branch, not
when you are at a tag
* put a linebreak in EOF) so we do not screw up syntax
highlighting in emacs
This change creates infrastructure for Redis Cluster tests and a few simple smoke tests. More tests will come in future changes, when Redis Cluster is actually implemented.
This refactoring is necessary because in Redis Cluster we should be able to keep several connections to Redis at once and switch between them.
* Move connection-handling functions in RedisCache to RedisCache::Connection
* Create RedisCache::RedisCommand() which issues command, handles locks and validates response instead of the caller, thus simplifying the code.
builds.
I noticed this when testing the controller which can wake up a bunch of
blocked rewrites as part of the shutdown process. In general these
messages are useless/worrying for end users, plus we DTRT by
canceling the Function cleanly anyway. The gRPC specific issue will be
discussed when that code lands.
* Add system_specs.cc which aggregates structs for holding information about external cache servers (host, port) and corresponding parsing methods.
* Update SystemRewriteOptions and SystemCaches so that they use new structs instead of RedisServerSpec (which is removed)
* Parsing of server spec is done differently than in AprMemCache: it does not accept things like `:host:port` or `host:::port`.
This change aims to make further integration of Redis Cluster configuration easier.
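The stricter parsing described in the last bullet might look like this (a sketch with a hypothetical function name, not the code in system_specs.cc; IPv6 literals are ignored for simplicity):

```python
def parse_server_spec(spec):
    """Parse 'host:port' strictly: empty pieces are rejected
    rather than silently ignored."""
    parts = spec.split(":")
    if len(parts) != 2 or not parts[0] or not parts[1].isdigit():
        raise ValueError("bad server spec: %r" % spec)
    return parts[0], int(parts[1])

print(parse_server_spec("redis.example.com:6379"))
# → ('redis.example.com', 6379)

# These were previously tolerated but are now errors:
for bad in (":host:6379", "host:::6379", ""):
    try:
        parse_server_spec(bad)
    except ValueError as e:
        print(e)
```

Failing at config-load time, rather than at cache start, is what makes bad specs visible to the user early.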
It was not waiting until RedisNotRespondingServerThread actually received a connection, which resulted in TcpServerThreadForTesting failure. Other tests do not need to explicitly wait on that because they rely on the server thread's answer to the client.
It didn't cover the one way that actually works, and
of the mechanisms it reported only one was used, and
only if enabled by an option that's not documented for
MPS/NPS, and only for Safari and obsolete Chrome, and
was using a mechanism that's likely to cause trouble.
It never worked in mod_pagespeed/ngx_pagespeed, and adds a lot of complexity due to how it's structured all over the place. More immediately, much of the older code dealing with prefetching is used only by it.
(There are some loose ends this still keeps wrt RewriteOptions::kFlushSubresources, and
perhaps something else that I missed, but this should be the bulk of it...)
* IsHealthy() is guaranteed to not lock for a long time.
* If one thread is currently connecting to Redis, other threads will fast-fail instead of blocking until connection succeeds/fails.
* Similarly, if connection is dropped at some point, only one thread will start re-connecting, others will fast-fail.
* RedisCache() ctor now needs ThreadSystem instead of single AbstractMutex.
Install FastCGI & libphp into root server if not already there.
Add -clean switch to optionally clean up /var/www/html prior to trying
to install over those directories.
Rename 'apache_system_test' to 'apache_root_test' to (hopefully) reduce
confusion. Sadly this test is not working yet.
Make all the ports disjoint for root_test/build_release_platform, so
don't get into a situation where a failed build_release_platform leaves
root apache2 owning a port required by our checkin tests.
* Now all custom servers (e.g. RedisGetRespondingServerThread) are started via RedisCacheTest::StartCustomServer<>, ports are also booked by RedisCacheTest.
* Get rid of cache_.reset() lines in tests in favor of several custom methods in the fixture.
* Make kTimeoutUs checked by static_assert, not CHECK
* Get rid of the explicit `sleep 2` followed by a check for server availability; instead check availability every 0.1s until 2s pass. This gives a significant speedup in practice, as memcached/redis typically start up instantly.
* Move shell_utils.sh to open source.
* Implement wait_cmd_with_timeout for use in check_for_leaks and start_background_server - it uses $SECONDS instead of counting iterations so commands used can take arbitrary amount of time to run.
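The deadline-based waiting described above (the shell helper uses $SECONDS) looks roughly like this in Python (a sketch, not shell_utils.sh itself):

```python
import time

def wait_with_deadline(check, timeout_s, poll_s=0.1):
    """Poll `check` until it succeeds or `timeout_s` of wall-clock
    time elapses. Using a deadline instead of counting iterations
    means a slow `check` can't silently stretch the timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll_s)
    return False

# A server that is up immediately is detected on the first poll,
# which is why dropping the fixed `sleep 2` is a real speedup:
print(wait_with_deadline(lambda: True, timeout_s=2.0))   # → True
```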
process-scope options, and thus generate warnings, into separate scripts
with quarantined config.
Also (because this bit me during testing of this change): increase the
serf timeout when running under valgrind, and also change the
unconditional long timeout in mod_pagespeed_test/ipro/instant/wait to
only do so under valgrind. This was made much easier by an earlier
refactor of the way configuration options get plumbed through our
makefiles.
Mechanically/high-volume-wise this required duplicating some of the build/
and yasm/ build bits, since they're in the main Chrome repo.
More messy is the url/ situation: a version is available, but it's not completely
up to date (in fact it's older than our base and what we were using from svn),
so we need to do weird compatibility massaging.
(I've also picked up newest RE2, which required some sync up changes).
* Change order of declarations and definitions for better grouping.
* Move comment regarding reconnection strategy to header's beginning.
* Better comment formatting.
debug.conf.template into pagespeed.conf, using gmake functions. Add an
ending delimiter to all the sed-patterns for customizing the
configuration templates.
The refactoring reduces slightly the number of times we must repeat
stanzas involving OPT_* et al in our makefiles. More reductions are
needed.
The ending delimiter (\b) solves a problem where the pattern "#REDIS"
was a substring of "#REDIS_LOADTEST", and sed would break the template.
To work around this problem I used "#RED_LOADTEST" but now you don't
find that variable with a csearch for "REDIS".
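The substring problem and the \b fix are easy to demonstrate with Python's re (same word-boundary semantics as sed's \b; template text is illustrative):

```python
import re

template = ("#REDIS RedisServer goes here\n"
            "#REDIS_LOADTEST loadtest config goes here\n")

# Naive substitution also clobbers the longer marker, mangling
# #REDIS_LOADTEST:
naive = template.replace("#REDIS", "RedisServer localhost:6379")

# With the ending \b delimiter, "#REDIS" only matches when not
# followed by another word character, so the longer marker survives:
safe = re.sub(r"#REDIS\b", "RedisServer localhost:6379", template)

print("#REDIS_LOADTEST" in naive)   # → False: got mangled
print("#REDIS_LOADTEST" in safe)    # → True: left intact
```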
* Add RedisCache::GetStatus() which prints result of INFO command.
* Add test for GetStatus().
* Make GetStatus() and statistics from CacheStats wrappers around RedisCache available from admin panel.
All Redis-statistics-related stuff should be done in this change.
By default we don't respect Vary headers we find on resources. People who do want us to respect them can set 'RespectVary on' in their configuration. The cache-extender was not considering the value of the RespectVary option in deciding whether things were cacheable, and was instead assuming RespectVary=true.
Fixes https://github.com/pagespeed/mod_pagespeed/issues/1373
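The fixed check amounts to consulting the option before treating Vary as blocking. A simplified sketch (the real logic also special-cases things like Vary: Accept-Encoding):

```python
def is_cache_extendable(headers, respect_vary):
    """Only treat a Vary header as blocking cache extension when
    RespectVary is on. `headers` is a dict of response headers."""
    vary = headers.get("Vary")
    if vary and respect_vary:
        return False
    return True

# The pre-fix behavior effectively always passed respect_vary=True:
print(is_cache_extendable({"Vary": "User-Agent"}, respect_vary=False))
# → True: default config now extends these resources again
```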
* Make AddMessageToBuffer() non-virtual, as nobody ever overrides it.
* Add version of AddMessageToBuffer() which accepts file:line arguments and adds them to the log line stored.
* Use the latter in ApacheMessageHandler.
Nginx side of this change: https://github.com/pagespeed/ngx_pagespeed/pull/1255 .
Related to 3e31b0c5d6.
This method is to be called in the subclass's destructor. The goal is to avoid a race
condition on the vptr between the server thread and the destructors, which
change the vptr. That also guarantees that all member fields of the subclass
are destructed after the server thread finishes.
Snapshot the SHM cache to disk about every five minutes, and restore on restart. Stop writing changes to the shm metadata cache through to the file cache.
Nginx side of the change, which includes a system test: https://github.com/pagespeed/ngx_pagespeed/pull/930
(No system test for Apache because what we're testing is completely shared between the two of them and we don't already have restart tests for Apache.)
* Add install/run_program_with_ext_caches.sh script.
* Make install/start_background_server.sh not override SERVER_CMD if one of sanity checks fails, that way it does not break anything if accidentally run after another start_background_server.sh.
* Fix outdated comments in install/start_background_server.sh.
This is cleaner code-wise, and makes it easier to do additional things like properly order dependencies, have different domain policy for preload hints, etc.
(It's also not as expensive computationally as I first thought, since the CSS parser has code for just parsing imports)
Testcase for this is TEST_F(PushPreloadFilterTest, IndirectCollected)
Should be done in this commit:
* New configuration variable for enabling single Redis server as cache backend. It should work everywhere where we have Memcache as an option.
* All system and leak tests should run under both Redis and Memcache.
* Unit tests for configuration of Redis (system_caches_test.cc).
* All other hook-ups of existing Redis adapter functionality which are not listed under "will be done in follow-ups" below.
Will be done in follow-ups:
* Full hook-up of statistics for Redis cache, including graphs on admin site (if any).
* Configurable reconnection timeout.
* Load stress tests.
* HTML documentation updates and update of pagespeed.conf.template.
* Features of Redis adapter: support for Redis Cluster, configurable timeouts for operations, MultiGet support, etc.
* Possibly: parse Memcached configuration in the same way, e.g. check correctness during config load, not actual cache start.
* Connect() is no longer available; use the newly introduced StartUp().
* StartUp() tries to connect and enables reconnections until ShutDown().
* Reconnection is done inside Get/Put/Delete - either instantly after communication failure, or after some delay if last reconnection attempt did not succeed.
* IsHealthy() returns true when reconnection becomes possible, so that the caller calls Get/Put/Delete and initiates reconnection.
* TcpServerThreadForTesting::PickListenPortOnce now picks a specific port.
This test does not really test anything from SystemCaches. Also done:
* Refactor out BlockingCallback from SystemCachesTest to a separate file so
it can be used both in system_caches_test and apr_mem_cache_test.
* Introduce a SystemCachesExternalCacheTestBase fixture which holds tests (in a
form of tests helpers) that are common for memcached and Redis. It also holds
virtual functions which configure either server.
* All non-memcached specific tests are moved to that fixture with cut-paste.
* Implement SystemCachesMemCacheTest so that all old tests are run with memcached.
to ignore them (e.g. don't drop Content-Encoding: gzip if the content is
actually gzipped).
Fixes #1371
Also add two small checks to the ipro gzipped-content recording tests.
We added an experimental option to extend the cache lifetime of
resources that explicitly set a TTL below our minimum. It looks to me like this
can be retired now that the experiment is done.
The option was never exposed in open source, so it's safe to fully remove it.
In 89efe99ad we changed the way we downloaded the closure compiler to switch
to using a tagged release, but the JAR file you get with a release has a
different filename. Update closure/download.sh to expect the new filename
format.
* Add FlushAll() public method to RedisCache and call it right after connecting to server in RedisCacheTest
* Add CacheKeyPrepender wrapper for CacheInterface
* Make AprMemCache add prefixes to all keys added to Memcached. Prefix right now is simply a test case + test name. Flushing Memcached requires us to further modify our apr_memcached fork, which we decided to avoid.
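The CacheKeyPrepender idea is a thin delegating wrapper; a minimal sketch (a dict stands in for the real CacheInterface backend):

```python
class CacheKeyPrepender:
    """Delegate to another cache, prepending a fixed prefix (e.g.
    test case + test name) to every key, so tests sharing one
    memcached don't see each other's entries."""
    def __init__(self, prefix, backend):
        self._prefix = prefix
        self._backend = backend

    def put(self, key, value):
        self._backend[self._prefix + key] = value

    def get(self, key):
        return self._backend.get(self._prefix + key)

shared = {}   # stands in for the one shared memcached
a = CacheKeyPrepender("TestA/", shared)
b = CacheKeyPrepender("TestB/", shared)
a.put("k", "1")
print(b.get("k"))   # → None: isolated despite the shared backend
```

This sidesteps flushing the server between tests, which for memcached would have required further changes to the apr_memcached fork.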
When debugging system test flakes that manifest as fetch_until timeouts it's
helpful to have the failed input to fetch_until available in the failure
message, because that's what people put in bug reports (and is all that's
available on blaze). This change makes fetch_until print:
Fetched file: 681 bytes (
... contents of fetch file ...
)
on timeouts. If the fetched file is binary we just dump the first 256 characters, as hex, along with how big it is and its guessed mime type:
Fetched file: 98051 bytes, image/jpeg; charset=binary (
00000000: ffd8 ffe0 0010 4a46 4946 0001 0100 0001 ......JFIF......
00000010: 0001 0000 ffdb 0043 0005 0304 0404 0305 .......C........
00000020: 0404 0405 0505 0607 0c08 0707 0707 0f0b ................
[snip]
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
...
)
Inspired by https://github.com/pagespeed/mod_pagespeed/issues/1359
InMemoryCache is a maximally simple implementation of CacheInterface,
for debugging and faking purposes.
This changelist is more for me (Egor) to get feedback than to actually submit the
code into Piper, because I doubt there are any direct applications. The goal is to ensure that I understand what's going on in caches and that I haven't missed anything important.
* Shared code now lives as a helper script named run_program_with_server.sh.
* No special `-multi` mode anymore: all arguments are passed directly to `eval` and invoked as if they were typed inside the script. Essentially it enables `-multi` by default. We were not using it anywhere except an abandoned experimental branch anyway, so it's OK to break backwards compatibility.
Without this CL, if the cache cleaner runs for more than the interval
at which we start cache cleaning, the second run would see the first's
lock as stale and begin another run. This CL makes the cleaner touch
its lock file every N (=1000) operations so that a cleaner that's still
making progress won't have its lock broken.
Because a big chunk of cache cleaning is one call to GetDirInfo that
does a huge recursive search, I had to extend GetDirInfo's API so you
can pass in a callback to run between FS operations.
Fixes https://github.com/pagespeed/mod_pagespeed/issues/1337
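The touch-every-N-operations idea can be sketched like this (illustrative names; the real cleaner and its lock live in the C++ file cache code):

```python
import os
import tempfile
import time

def clean_cache(entries, lock_path, touch_every=1000):
    """Refresh the lock file's mtime every `touch_every` operations
    so a long-running-but-live cleaner isn't mistaken for a stale
    one and won't have its lock broken by a second run."""
    for i, entry in enumerate(entries):
        if i % touch_every == 0:
            os.utime(lock_path, None)   # "still making progress"
        # ... stat/delete `entry` here ...

fd, lock_path = tempfile.mkstemp()
os.close(fd)
old_mtime = os.stat(lock_path).st_mtime
time.sleep(0.05)
clean_cache(range(2500), lock_path)   # 2500 ops => 3 touches
print(os.stat(lock_path).st_mtime >= old_mtime)   # → True
```

This is also why GetDirInfo needed a between-operations callback: the touch has to happen inside the long recursive scan, not just between top-level phases.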
delete all the statics" --- code earlier than ~ApacheRewriteDriverFactory,
which includes ShutDown and code calling PthreadSharedMem::DestroySegment,
as otherwise it will spam logs with warnings about "Attempt to destroy unknown
SHM segment"
(ref https://github.com/pagespeed/mod_pagespeed/issues/1349)
* Add hiredis to dependencies and write .gyp for it
* Add run_program_with_redis.sh (copied from run_program_with_memcache.sh)
* Add RedisCache and one simple unit test for it
Things to do:
* Add more tests
* Add configuration variables for Redis cache so it can be enabled
* Understand purpose of memcache-related flags and variables in test Makefiles
and ensure Redis has same, if needed
If what we're going to inline into CSS or JS starts with the gzip magic
bytes (file signature) then it's very likely to be gzipped and very
unlikely to be valid CSS or JS, so we should abort the inlining.
Fixes the amount of https://github.com/pagespeed/mod_pagespeed/issues/1307
that I could repro. If we later see mangling files in-place (or combining)
then we can extend this.
(These show up in debug logs --- I was expecting them to show up
in debug comments as well, and while that didn't work out, no reason
not to keep this).
* Adds Connection: to the list of headers that contain fields separated by ","
* Marks Alt-Svc and Alternate-Protocol as hop-by-hop, so we will treat them as such.
* Adds sanitization of headers marked as hop-by-hop in Connection: headers as per rfc.
* Moves initialization of 'AtExitManager' out of the css filter to the more central ProcessContext, as this change adds another dependency on it -- and we must initialize it exactly once.
* Changes the header definitions to be lazy initialized to avoid extra allocations when manipulating headers, while avoiding static initialization.
(From https://github.com/pagespeed/mod_pagespeed/pull/1195)
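The sanitization the bullets describe (drop headers named in Connection:, plus the known hop-by-hop set including the two newly marked ones) can be sketched as:

```python
# Always hop-by-hop per RFC 7230, plus the two this change adds:
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailer", "transfer-encoding",
    "upgrade", "alt-svc", "alternate-protocol",
}

def sanitize(headers):
    """Remove hop-by-hop headers, including any named (comma-
    separated) in Connection:. `headers` is a list of (name, value)
    pairs; sketch only, not the real header code."""
    doomed = set(HOP_BY_HOP)
    for name, value in headers:
        if name.lower() == "connection":
            doomed.update(t.strip().lower() for t in value.split(","))
    return [(n, v) for n, v in headers if n.lower() not in doomed]

out = sanitize([("Connection", "close, X-Private"),
                ("X-Private", "secret"),
                ("Alt-Svc", "h2=:443"),
                ("Content-Type", "text/html")])
print(out)   # → [('Content-Type', 'text/html')]
```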
defer_js was using Image objects to preload scripts for WebKit in a way that is
harmful in modern browsers. Currently this isn't actually needed in Chrome,
since the preload scanner already finds our deferred scripts, but Chrome is
thinking of removing that [1]. This change switches us to the new standard,
rel=preload, which we will need once Chrome updates its scanner not to preload
things with invalid type attributes.
Fixes https://github.com/pagespeed/mod_pagespeed/issues/1054
[1] https://bugs.chromium.org/p/chromium/issues/detail?id=623109
via:*google*.
Completes the fix to
https://github.com/pagespeed/ngx_pagespeed/issues/1149
A challenge here is to make sure we test all the egress points, e.g.
- pagespeed resources: cached and reconstructed on demand
- fallbacks of various sorts
- ipro resources: cached and reconstructed on demand
- loaded from LoadFromFile
also we must make sure we don't cache the 'public' based on the request
headers.
Partially fixes https://github.com/pagespeed/ngx_pagespeed/issues/1149
General strategy:
- adjust general mechanism for computing output response header from input response headers
to incorporate 'public' on input.
A challenge here is to make sure we test all the egress points, e.g.
- pagespeed resources: cached and reconstructed on demand
- fallbacks of various sorts
- ipro resources: cached and reconstructed on demand
- loaded from LoadFromFile
is included in something w/o explicitly restricted media.
Key thing to note here: empty media_ means everything, and we're doing
intersection, so there is room for confusion between nothing and
everything.
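The nothing-vs-everything pitfall can be made concrete with a small sketch (Python, illustrative only; the real code is C++): an empty media list means "all media", so a naive set intersection would wrongly turn "everything ∩ everything" into "nothing".

```python
def intersect_media(a, b):
    """Intersect two media lists where an empty list means 'all media'.

    A naive set intersection would treat [] as 'no media'; here []
    acts as the identity element instead.
    """
    if not a:
        return list(b)   # a matches everything: result is just b
    if not b:
        return list(a)   # b matches everything: result is just a
    return [m for m in a if m in b]
```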
otherwise we end up rejecting combine + rewrite CSS URLs. As part of this,
split ProxyMode semantics into two pieces: the decoding bit (which we want),
and the part that forced absolufication (which we don't want here, but other
applications do).
and then sync it back to per-ServerContext options. Should fix the
regression with ImageMaxRewritesAtOnce
(https://github.com/pagespeed/mod_pagespeed/issues/1305)
and give RewriteDriverFactory and all the ServerContexts a
consistent view of their value.
Apache, but was only fixed in ngx_pagespeed today, 5/11/16, in
https://github.com/pagespeed/ngx_pagespeed/pull/1193
To test a fix to this bug, I need to alter files in the htdocs
directory during the test. This is source-controlled. We were doing
this before with mod_pagespeed_test/cache_flush but somehow getting
away with it (or just getting lucky with timing). But this is really
a bad practice.
So in this pull-request I also change the infrastructure to simply copy
the cache_flush/ and purge/ subdirectories to htdocs/. The rest of the
directories get symlinked.
This must be committed alongside https://github.com/pagespeed/ngx_pagespeed/pull/1197
dropping our fd reference to it when the ApacheRewriteDriverFactory[1] is
destroyed, so it doesn't hang around between the runs.
This also ensures that we exit if we get cleaned up, even if there are
some httpd children hanging around for some reason keeping the fd alive ---
seems to happen with leftover php-cgi processes for me when doing smoke tests
with 2.4, which ends up wedging as we loop waiting for the processes to exit.
[1] We don't want to do it like that for SystemRewriteDriverFactory, since
that can get hot-destroyed when reloading config, as tests verify.
This is because I want to use it under an ultra-optional flag in MPS, but the default
stat names (e.g. all-requests) would be quite confusing for something that won't be
turned on by most people.
chain of CachedResult -> Dependencies -> InputInfo and can then
safely have a Dependencies in CachedResult, rather than a cycle.
This is needed for kind of dumb reasons, though: normal RewriteContext
operation forces things to go through CachedResult while I really only
want to run on the slot chain, but it doesn't seem worth a major refactor
to avoid that.
Note: resource_tag_scanner_test.cc should go with previous commit,
input_info.proto with this one.
net/instaweb/rewriter/rewrite_driver_speed_test.cc, which is a proper
microbenchmark sharing the same infrastructure requirements.
apache/speed_test.cc was annoying because it was slow during unit
tests, and printed lots of error messages that slowed it further and
were distracting when running interactively. And its output was less
useful than speed-tests using the microbenchmark system.
explicitly:
1. AddInstrumentation, which needs to be completely disabled in AMP.
2. responsive_images & rewrite_domains, which need to be partially
disabled in AMP.
Partially addresses:
https://github.com/pagespeed/mod_pagespeed/issues/1263 . It's possible
that this commit fully addresses that bug, but validation is needed.
pure virtual. This will allow the contexts to have completely different
implementations for RPC or "local" access, which is a pre-req for gRPC.
This also removes some redundancy in the controller class hierarchy.
This change adds two processes: a controller, and a babysitter. The
controller process is intended to host cheesy's controller work. The
babysitter process just waits around in case the controller dies and
restarts it if it does.
In order to exit when the host process exits, the controller watches a
pipe. If reading from that pipe gets an EOF, then it knows the master
quit and it can quit itself. If the master wants to quit the controller
to load a new one with an updated configuration, it can write a byte to
the pipe, which the controller will see and exit.
Reviewed in https://github.com/pagespeed/mod_pagespeed/pull/1260
Companion nginx change: https://github.com/pagespeed/ngx_pagespeed/pull/1113
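The pipe protocol described above can be demonstrated in a few lines. This is a Python sketch of the mechanism only, not the actual implementation: the controller blocks reading the pipe; EOF means the master died, while a written byte means the master wants the controller to exit for a config reload.

```python
import os

def wait_for_shutdown(read_fd):
    """Block until the master signals shutdown via the pipe.

    Returns 'master_died' if the write end was closed (EOF), or
    'reload_requested' if the master wrote a byte.
    """
    data = os.read(read_fd, 1)   # blocks until a byte arrives or EOF
    return "master_died" if data == b"" else "reload_requested"

# Demo: the master closing its end of the pipe produces EOF.
r, w = os.pipe()
os.close(w)                      # simulate the master process exiting
print(wait_for_shutdown(r))      # -> master_died
```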
out that we weren't running scheduler-based locks while running the
tasks on a driver's private scheduler.
Includes a new unit-test that spins forever without this change.
Change to having ApacheFetch block using the scheduler rather than
TimedWait on its own condvar, whenever there is a RewriteDriver. This
allows in-place deadlines to work, and eliminates the 'abandoned' state
of ApacheFetch, simplifying its code and usage.
Added a Sequence* to CacheUrlAsyncFetcher in order to synchronize the
memcached cache response with the request-thread.
This depends on x509_check_host, which was added in openssl 1.0.2. If we want
to support older versions of openssl we need to parse certs ourself, like svn
does, which is a bunch of moderately tricky security sensitive code that I'd
really rather avoid. We normally build against boringssl, which has this
function, but we also prepare a tarball build that intends to link against
system openssl. So anyone using that build process will need to upgrade to
1.0.2.
Don't incorrectly rely on boringssl directly from the sha1 code; instead go through our selector target (and similarly use it to find the includes) to find the ssl lib one is supposed to use.
Fixes https://github.com/pagespeed/mod_pagespeed/issues/1139 too
Creating an instaweb handler will run MakeRequestUrl, which assumes that
request->unparsed_uri is non-null. So move the creation to after where
we check that it's non-null. To be safe, move it all the way down to
where it's first needed, in case some other validity checks end up being
relevant.
Fixes #1248
our behavior when one exists.
Was motivated by https://github.com/pagespeed/mod_pagespeed/issues/1238, but I can't
seem to reproduce the behavior Jeff saw.
If anyone has any ideas as to what I am missing, I would appreciate it.
(Still going ahead with this since it does fix one case and adds more tests.)
minification & gzip compression.
Maintain the semantic that when decompressing, if we lost all our reduction,
we remove the x-original-content-length.
If we have multiple levels of compression, we retain the largest value
of x-original-content-length.
Compute x-original-content-length across resource rewrites even if we don't
have a content-length header on the origin, but we do have the entire content
loaded.
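A sketch of the intended header arithmetic (Python, illustrative; the header name is real but the helper functions are hypothetical, and headers are modeled as a plain dict):

```python
def on_decompress(headers, decompressed_size):
    """After decompressing: if we've lost all our reduction (the payload
    is back to at least its recorded original size), drop the header."""
    xocl = headers.get("x-original-content-length")
    if xocl is not None and decompressed_size >= int(xocl):
        del headers["x-original-content-length"]

def on_compress(headers, pre_compression_size):
    """Before another compression layer: record the original size,
    retaining the largest value if the header is already present."""
    xocl = int(headers.get("x-original-content-length", 0))
    headers["x-original-content-length"] = str(max(xocl, pre_compression_size))
```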
decompressed even when the cached response is compressed and the request
does not have accept-encoding. This was found by observing flakiness in
valgrind-system tests.
Fix flaw in the version of HTTPCache::Put that takes ResponseHeaders*
and mutates it unexpectedly, by adding compression headers.
Fix a flaw in InflatingFetch that Reset didn't reset all the cached
boolean bits. I don't think this was the cause of anything in
production because we only use Reset in tests (where we re-use a Fetch
object).
Default behaviour is to strip subresource links that are in scope for pagespeed;
these are the resources that are not disallowed and that are on valid domains in the domain lawyer.
Added can_modify_url flag to HtmlParse and CanModifyUrl function to the HtmlFilter which
indicates whether urls can be rewritten by the parser and thus should be removed.
This is tested in strip_subresource_hints_filter_test.cc, since it is only used by the
strip subresource hints feature right now. This should be moved to the HtmlParse tests.
DetermineEnabledFilters has been rolled up into DetermineFiltersBehaviour, which also
determines can_modify_url for all the filters and possible future "behaviors".
Added new option to explicitly prevent the default behaviour:
ModPreserveSubresourceHints on/off
For Ubuntu, a check has been added for the new document root location, /var/www/html,
instead of /var/www.
don't display them.
Move content-type lookup until right before its use, which (a) is
more intuitive and (b) looks likely to be faster in terms of map setup.
concurrently.
Print elapsed time properly for subshell tests. Fix typo in SVG
reference which was causing spurious timeout warnings in nginx tests,
and then flagged by nginx system tests.
Remove nginx test flakiness by using fetch_until to wait for an image to get
small, indicating an ipro rewrite is done, rather than testing for
image_ongoing_rewrites to be zero. The old technique could fail because
we might check that stat before the image starts being rewritten.
Note: with this change, nginx system tests no longer flake for me,
whereas previously about 10% of the time it would flake on "IPRO flow
uses cache as expected". That was also suppressed for valgrind runs,
which is no longer needed.
Even after this change, nginx system tests under valgrind still flake
with "Fetch timed out" log messages, which I am adding to the
suppressions, and with "Embed image configuration in rewritten image
URL.", where the recursive-wget result is not optimized. I think that
might be a real user-facing bug, and I will report it to the nginx list.
The system_cache_path is logged on startup and on some errors. It has
newlines in it, used as a separator. This means you get log lines like:
2015/12/17 13:27:00 [info] 105645#0: \
[ngx_pagespeed 1.10.0.0-7582] Initializing shared memory for path: \
/home/jefftk/ngx_pagespeed/test/tmp/file-cache/
flush
.
This change makes us use spaces instead so you get:
2015/12/17 13:27:00 [info] 105645#0: \
[ngx_pagespeed 1.10.0.0-7582] Initializing shared memory for path: \
/home/jefftk/ngx_pagespeed/test/tmp/file-cache/ flush .
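The fix amounts to substituting the separator before logging. A trivial Python sketch (the real code is C++ and this helper name is made up):

```python
def loggable_cache_path(system_cache_path):
    """Flatten the newline-separated cache path spec onto one log line."""
    return system_cache_path.replace("\n", " ")
```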
The new version of serf uses apr_sockaddr_ip_getbuf to add IP addresses to log
messages, but this function is new in APR 1.3. This means we get a runtime
error on CentOS 5, where Apache fails to start.
We didn't catch this on our buildbot because one of our dependencies needs to be
fetched over svn with svn 1.8 or higher, and installing the new svn brought
along a new apr (1.5) and everything worked.
Fixes#1224
for now disable combine_css. Proper resolution (which will
make it win in some circumstances) will be a follow up,
but likely quite a bit more complex.
Note that combine_js isn't affected by this since it makes an entirely
new script element for combination (as well as new ones for evals), rather
than using the one from the first input.
Addresses Otto's reduction in issue #1215
Change the non-webp cache key to avoid picking up any that we previously messed up.
Also restructure the code a bit to lower the risk of this occurring again.
Part of issue #1216
and don't complain about too old a gcc in that case.
Get rid of -Wno-unused-but-set-variable --- clang warns about the flag,
and the gcc warning it disables actually seems useful. Adjust the code
instead.
When serf_bucket_response_status gets an ssl error it can close the connection
and call CallbackDone(false). This nulls async_fetch_.
Only try to use async_fetch_ if serf_bucket_response_status returns success.
When running with an old version of curl (like 7.15.5 on the centos
buildbot) if you leave off the http:// on a proxied fetch curl doesn't
add it. So it will send:
GET messages-allowed.example.com/mod_pagespeed_message HTTP/1.1
Which Apache will reject with:
HTTP/1.1 400 Bad Request
Not sure why this didn't turn up with the buildbots running tests.
default_hdrs_check to 'loose', where they currently aren't set. These are the
current implicit defaults, so this is a no-op change in preparation to flip
those defaults.
When reading large images a chunk at a time, callgrind indicated a
significant amount of time spent appending bytes to a dynamically
expanding buffer.
Note: this adds a call to fstat so we know the size of the file before
we start to read it.
Another note: the only way I could find to implement this reading into
std::string involved using std::string::resize, which nulls the bytes we
are about to write. That seems like a shame but it is what it is :).
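The same stat-then-preallocate pattern, sketched in Python for illustration (the actual code is C++ using fstat and std::string::resize; this only shows the idea of sizing the buffer up front instead of growing it append-by-append):

```python
import os

def read_file_preallocated(path):
    """Read a whole file into a buffer sized up front via fstat,
    avoiding repeated growth of a dynamically expanding buffer."""
    with open(path, "rb", buffering=0) as f:
        size = os.fstat(f.fileno()).st_size  # the extra fstat noted above
        buf = bytearray(size)                # zero-fills, like string::resize
        view, done = memoryview(buf), 0
        while done < size:
            n = f.readinto(view[done:])      # read directly into the buffer
            if not n:                        # EOF earlier than stat said
                break
            done += n
    return bytes(buf[:done])
```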
of making ngx_pagespeed build with the same (needed for
people using recent-distro gcc >= 5.x with our PSOL), so people
who build psol from source on new gcc don't have trouble.
impedance mismatch between mock-time and real-time required for the
spammer tests doesn't work well with valgrind, and only luckily works
in non-valgrind.
Virtually all libpng versions through 1.6.18, 1.5.23, 1.4.16, 1.2.53, and
1.0.63, respectively, have a potential out-of-bounds read in
png_set_tIME()/png_convert_to_rfc1123() and an out-of-bounds write in
png_get_PLTE()/png_set_PLTE(). The former vulnerability has been assigned ID
CVE-2015-7981 and the latter CVE-2015-8126. Both are fixed in versions 1.6.19,
1.5.24, 1.4.17, 1.2.54, and 1.0.64, released on 12 November 2015.
read to see if the new value is different.
This means we will do more cache reads but fewer cache writes most of
the time.
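The check-before-write pattern can be sketched like this (Python, with a hypothetical minimal cache interface standing in for the real one):

```python
class DictCache:
    """Minimal stand-in for the real cache interface (hypothetical)."""
    def __init__(self):
        self.data = {}
        self.writes = 0
    def get(self, key):
        return self.data.get(key)
    def put(self, key, value):
        self.data[key] = value
        self.writes += 1

def put_if_changed(cache, key, new_value):
    """Read first; only write when the stored value actually differs.

    More cache reads, fewer cache writes -- the right trade when
    values rarely change and writes are the costly operation.
    """
    if cache.get(key) == new_value:
        return False          # unchanged: skip the write
    cache.put(key, new_value)
    return True
```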
Skip optimization-locks for on-the-fly resources, which makes no sense
for cheap uncacheable optimizations.
- Move the Apache tests for IPRO + MPD to system/system_test.sh
- Add a flag 'trusted_input_' in ProxyFetch to allow ngx_pagespeed
to transform html but not proxy external html fetched via MPD.
MPS-side of the fix for:
https://github.com/pagespeed/ngx_pagespeed/issues/1015
already listed in instaweb.gyp:instaweb_system, and resource_tag_scanner.cc is
also in instaweb_rewriter.
(Having the dupes is a problem when one tries to link things together properly).
https://github.com/pagespeed/mod_pagespeed/issues/1149
Note that IPRO+MPD was working (and optimizing) in MPS; it just wasn't
getting the right UA/Accept-header bits propagated into the
request_properties, so webp transcoding didn't happen.
fixes #1048
in the case where
if (strncmp(MS_VALUE, conn->buffer, MS_VALUE_LEN) == 0) {
and
else if (strncmp(MS_END, conn->buffer, MS_END_LEN) == 0) {
both fail, it was possible for queries_sent to never decrement.
This patch sets rv to APR_EGENERAL in this case, decrements the queries_sent,
and closes the connection.
According to the trace from betabrand this is where the hang is.
Patch applied from apr dev mailing list
(http://www.mail-archive.com/dev%40apr.apache.org/msg26265.html)
Also adds a test for this path. The test is dependent on a new print to
stderr in apr_memcache2, but the real test is that we don't hang in this
situation.
Before this change we would call ap_rwrite() etc from the rewrite thread, and if
that blocked for a while we might not have any rewrite threads available to
serve other requests. With this change, all potentially blocking Apache calls
always happen on the request thread. Writes are buffered in ApacheFetch until
the resource is complete, and then sent out in one go.
Most uses of this (ex: IPRO) were already not streaming, so we don't lose that
much by buffering. We are doing more copying than we were, and to evaluate this
impact I ran "siege http://localhost:8080/$testimage -c200 -t1h" both with and
without the change, to really stress test this piece of ipro. This test pulls a
single 1.1MB ipro optimized image with 200 concurrent readers for 1hr, and
should give us a worst-case indication of the slowdown buffering causes.
before qps: 393.11
after qps: 392.20
A 0.2% worst-case slowdown is not bad at all; buffering seems to not be a
problem.
This fixes some instances of #1048, but we're not sure yet whether it fixes all
of them. It definitely doesn't fix ones due to slow filesystems, but we haven't
seen that version in the wild.
For uses of ApacheFetch that we know will always be synchronous we disable this
buffering, which is slightly more efficient.
1) Restore the separate TTL for metadata on 4xx. The removal of it is what caused
regression test failures: a CSS linking to a missing image meant that we would try to
re-check the CSS every 5 minutes, which meant that the cache warming of optimizations from
1st and 2nd runs would be gone by the 3rd run. This restores the state of the world before.
2) Fix the check failures in fallback code, and nonsense logic around it.
Before this CL, we had two notions of kinds of failure:
1) At HTTP cache level, there were a few kRecentFetch* variants that HttpCache could remember, applying a different TTL based on the kind of failure.
2) Resource had FetchResponseStatus, which had slightly different categories, which were not perfectly aligned with the HTTPCache's, and had their own TTL policy, expressed largely via code inside RewriteContext::AddRecheckDependency, that was not quite aligned with HTTPCache's, either.
3) Resource's FetchResponseStatus classification was completely lost when there was a cache hit rather than fresh fetch, and it wasn't really possible to restore it accurately, since, again, HTTPCache had somewhat different notions of kinds of failures.
This CL moves the FetchResponseStatus enum up to HTTP level, expands it to cover all the cases that matter to both layers. TTLs are also centralized there, via HttpCacheFailurePolicy struct, as is classification of failures --- which was done slightly differently in different spots, too.
(Though this still doesn't fix in_place_resource_recorder having its own logic... but maybe I can iterate on that)
- beacon events for activity in the mobile menu.
- Tweak the handling of menu open and menu close events (in particular making sure
to use a consistent source of truth rather than independently toggling several settings).
- Tweak the presentation of the menu dropdown image to be a bit more like what seems to be
standard practice: point right until opened, then point down. Close all submenus on navigation
(this last decision is debatable UX, but at least makes us consistent between iframe and
non-iframe modes).
On newer versions of opensuse (due to openssl 1.0.2 maybe?)
`pkg-config --libs libssl` no longer returns -lcrypto, but
`pkg-config --libs openssl` does. This change is required to allow linking with
system libs on suse tumbleweed.
closes #1117
Scheduler to allow adding an alarm with the lock already held.
It also required a bugfix for a latent bug in mem_lock_manager.cc
about capturing a temp StringPiece in the map when querying it via
operator[]; it must be queried using find().
In this wrapper we use the scheduler-mutex for protecting the
MemLockManager, but we ensure that we are not holding that mutex when
we call the NamedLock callbacks. All of the code in this CL is around
doing that, and connecting the wakeups/alarms of the scheduler with
those needed by the MemLockManager.
Another commit will follow, which tests the migration of the system-test and
unit-test framework to use this lock manager.
[work by jmarantz; committed by jefftk]
Before this change the ordering of closure library files inside our
compiled js files would change depending on which computer the build
ran on. This would lead to a lot of noise on commits, where unrelated
js 'changes' would get included in commits despite no actual changes
in the js source.
By explicitly telling closure compiler about all the js files in a
deterministic order the files it generate become deterministic as
well. This is a bit of a hack, sticking a find | sort | sed
into the gyp file, but I think it's the least hacky way to fix it.
* pulls ApacheFetch out into its own file
* creates a set of mock apache functions that log to a global variable
* extends NullCondvarCapableMutex to support condvars
* comments on, but does not fix, bug in cookie stripping
- Classify mobilization as JS-using filter, and teach SupportNoScript filter about it.
Note that this doesn't fix noscript in iframe mode.
- Actually apply the disabling-JS-producing-filters-for-XHR policy in mod_pagespeed (and ngx_pagespeed, I think),
by implementing it at RewriteQuery level (where it's more logical) rather than in the 'merge configs' helper used
by ProxyInterface.
(Rather than needing separate WriteThroughHTTPCache). Port over
all the WriteThroughHTTPCache tests to show it works. The plan is to
kill WriteThroughHTTPCache in follow up changes, and also to simplify
HTTPCache API to take advantage of removal, dropping virtualness in the
class.
Note that this change does not incorporate the new lock manager into
any production code; it just adds a test.
To add this to a single-process server it needs a wrapper (or update)
to add thread safety. To add this to MPS/NPS it needs a server/client
wrapper for serialization and an RPC infrastructure.
By itself this new lock manager does not provide a measurable benefit,
but it enables us to gauge popularity of a resource and use that to
avoid optimizing unpopular images.
1) Add 3x to default densities based on my PSS user tests.
2) Update doc to note things we have fixed and change densities in example.
3) Add another example to the mod_pagespeed_example/ file.
production code to use non-blocking versions. Leaves the blocking
interfaces in SchedulerBasedAbstractLock which remains accessible to
tests for blocking implementations.
Adds 'const' qualifier to NamedLock::name().
tag) rather than before the first image on the page. Also inserts the script
pretty-much unconditionally (if we see the jquery slider script in head we don't
bother, but merely having no images doesn't prevent script insertion).
Before this CL, any srcsets would be ignored and thus browsers might load the non-preview or non-lazy version from srcset. We came upon this bug while talking about how ResponsiveImageFilter would interact with these filters. But this was also a bug for any user-added srcsets.
Incorporate more config data into the key used for sharing CachePath
objects so that we don't share cache objects between vhosts that have
purging enabled/disabled, and we don't share between those that have
different cache-flush filenames.
https://github.com/pagespeed/mod_pagespeed/issues/1077
While there is nothing technically wrong with caching and rewriting empty resources, there's not much value and we have run into situations where we unexpectedly produce empty resources.
Fixes issue 1050.
* Rename GoogleUrl::Escape() -> EscapeQueryParam(), Unescape() -> UnescapeQueryParam() to clarify their limited scope.
* Change GoogleUrl::CopyAndAddEscapedQueryParam() -> CopyAndAddQueryParam() and do the escaping in the function, this seems like a safer and easier interface to use and most callsites were explicitly calling the escaper anyway.
* const StringPiece& -> StringPiece
* Add some more tests for CopyAndAddEscapedQueryParam().
* Don't reference the wrong error code in case of WebPMuxAssemble failure.
* Create and use no-arg overload of MultipleFrameReader::Initialize() method.
* In GifFrameReader, downgrade messages about image data errors to be INFO messages. (vchudnov)
the background fetch was trying to run mobilize_label_filter, but that filter
was failing to label the DOM elements because
driver()->request_properties()->IsMobile() was false for the subfetch (though it
was true for the enclosing page).
Added a new GoogleUrl::Sanitize function to help for general cases where this comes up (we need to make sure that a URL is free of whitespace, '"' and other chars which may be safely escaped).
commit from jmaessen:
Generate menus on the server side using mobilize_menu_render_filter doing a
second fetch of the HTML in the background and running mobilize_label_filter and
mobilize_menu_filter on the results. Inject the navigation bar on the server
side and adapt the mobilize_nav.js to properly handle the presence of these
elements on the rewritten page.
Specifically, we now do the following:
1) If the highest resolution image is below the inline threshold (or some lower threshold), use that high-res image inline as the src (no srcset). This includes the case where the image had native dimensions == height & width attributes and was below the inline threshold.
2) Else, disable inlining and just add a normal srcset.
Notably here, we never mix inline images with responsive images. Technically, it is possible to mix the two (have a 1x version with data URL, but 2x without). However, it seems like this is (A) wasteful for the 2x browsers, (B) complicated to understand and (C) unlikely to be particularly common/much value to support.
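The decision logic in (1) and (2) can be sketched as follows (Python; the function, data shapes, and threshold handling are illustrative, not the actual implementation):

```python
def choose_markup(variants, inline_threshold):
    """variants: list of (url, descriptor, size_bytes), lowest resolution first.

    Rule 1: if the highest-resolution variant fits under the inline
    threshold, inline it as the src and emit no srcset at all.
    Rule 2: otherwise disable inlining and emit a normal srcset.
    Inline images and srcset are never mixed.
    """
    url, descriptor, size = variants[-1]        # highest-resolution variant
    if size <= inline_threshold:
        return {"src": "data:inlined:" + url, "srcset": None}
    return {
        "src": variants[0][0],                  # lowest-res as the plain src
        "srcset": ", ".join(f"{u} {d}" for u, d, _ in variants),
    }
```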
and improve its docs a bit.
Only do caching in ApacheSlurp in proxy suffix setups, and
not slurp or testproxy modes which we usually use for loadtests.
Make sure that the map and call buttons fit in the header bar exactly, and make them the same size.
This change helps http://www.ligo.co.uk. (huibao)
Update the BoringSSL library to a recent commit, update build files (asm, err_data.c) to be from chromium.
add UPDATING_BORINGSSL document for information on how to update the BoringSSL library
Inserts srcset attribute with multiple resolutions of the image.
Based on the prototype Josh presented at Velocity, I've added testing, properly static'd the JS and fixed many bugs discovered in the process.
There are still many TODOs including:
1) Adding system tests
2) Deciding on what resolutions to add (currently adding x2 and x4, we probably want to make an option for this and test for sensible defaults)
3) Don't add srcset if all images are the same contents.
if (*):
use no ua dependent optimizations, large screen
if (Chrome/Firefox/Gecko/...):
use ll,ii,dj
if (Chrome versions that support webp):
use ll,ii,dj,jw,ws
if (Various old browsers, tablets):
use no ua dependent optimizations, large screen
if (mobile):
use no ua dependent optimizations, small screen
This CL changes it to:
if (*):
use no ua dependent optimizations, large screen
if (Chrome/Firefox/Gecko/...):
use ll,ii,dj
if (accept header has 'webp'):
use ll,ii,dj,jw,ws
if (Various old browsers, tablets):
use no ua dependent optimizations, large screen
if (mobile):
use no ua dependent optimizations, small screen
The motivation for this change is that IE Edge reports as a version of Chrome that would support WebP but doesn't actually support WebP. (jefftk)
candidates, rather than hardcoded to just grab one. Of course in the
current usage we just grab one.
This is in preparation for prototyping a logo-picker GUI.
Also use swap-with-last to remove elements from the middle of an array
rather than nulling out and having to copy over a new array.
are two <head> tags, one of them unclosed. We now ignore the head tags. We
also put in some more debug logging that yields useful information when flushes
prevent us from annotating the DOM (this allowed me to find the bug on a 3-flush
page).
Also update pagespeed_libraries.conf to match current released libraries.
non-navigational. For simplicity of configuration we simply store a
comma-separated list with optional leading + or -, and match both classes and
ids against names on the list.
flags anymore, they just complicate the code. Meanwhile, the way we actually
are doing mobilization doesn't match the filter's model of labeling, causing us
to (say) label content inside a navigation region even though we roundly ignore
that fact (and the labeling is probably wrong). Finally, it looks like we might
have been classifying while logging training samples, which is definitely,
definitely wrong and would cause us to over-fit our past classifiers rather than
ignoring them completely as we ought to.
the navigation to encompass them. This is a starting point, but only labels
whole enclosing <div>s; we see instances where the titles are part of an
enclosing div that also contains other content, and that will require us to make
finer-grained decisions (and perhaps increase the set of tags subject to
labeling, since there are titles that are not found in <div>-like structures).
We must append ?PageSpeed=off to non-pagespeed URLs because otherwise IPRO might rewrite the source and lead to a nonsensical source map. However, IPRO does not modify .pagespeed. URLs, so we can safely avoid this query param. Furthermore, the param does not make sense for .pagespeed. resources and seems to 404.
Fixes issue 1043.
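The rule above, as a sketch (Python; this helper is hypothetical, but the query parameter and the `.pagespeed.` URL marker are the real ones):

```python
def source_map_fetch_url(url):
    """URL to reference when building a source map.

    .pagespeed. resources are never modified by IPRO (and seem to 404
    with the query param), so they are used as-is; everything else gets
    ?PageSpeed=off so IPRO can't rewrite the source out from under the map.
    """
    if ".pagespeed." in url:
        return url
    sep = "&" if "?" in url else "?"
    return url + sep + "PageSpeed=off"
```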
Somewhat unrelated, I added a new field, CachedResult::is_inline_output_resource, and modified RewriteContext::CreateOutputResourceForCachedOutput to recreate InlineOutputResources if appropriate. Previously it was creating vanilla OutputResources for all requests, leading to confusing server state.
Additionally, I added a new field, ResourceSlot::preserve_urls, to distinguish between uses of set_disable_render() which meant (this slot was already rendered, so don't re-render) and (preserve_urls is on, so don't edit URL). Before this CL all ResourceSlot::Render() implementations edited URLs, but the new ones update inline data, so they should be used even if preserve_urls is enabled.
Background:
Before prev rev inline CSS was being modified in two different ways in different filters:
1. Editing the characters_node.mutable_contents() directly.
2. Replacing the characters node from the parent element.
So after doing #2, another filter might come along and try to write to the original Characters node (#1), which is now dead. While this won't crash or leak it won't perform the second filter's optimization.
This change makes it so that the optimization of inline CSS follows more closely the slot used for external resources, and therefore adds more robustness to the rewriting infrastructure.
contain nested links (indicating a serious error in the source html). This
avoids eg. content-classifying the text of a nav entry when it's in its own div.
this scenario, we do *not* want the origin-mapping to strip the proxy
suffix, because the result would be 'loopbacked' which we don't want.
The loopback fetcher has a good mechanism for this:
RequestContext::IsSessionAuthorizedFetchOrigin. We just need to
populate that in RewriteOptionsManager::PrepareRequest, which is easy.
The pain-point here was to change all the call-sites for
PrepareRequest to pass in the request context, if there is one.
in its id (and we double-label the element, which is bad). This fixes that;
I thought I could get away with unescaped ids, since the spec seems to claim that
stuff that needs escaping is naughty -- but that's not how things work out in
the big bad world. Tests added (yes, they failed). Glad I put that CHECK in
there. (jmaessen)
Update deps for BoringSSL (jcrowell)
Some of the CSS in mobilize.css had to be split off to a different
file because it works only in conjunction with the layout algorithm,
and would bork a desktop site without it -- this represents the bulk
of the files.
Improve heuristic to recognize sprites so we can give
background-images on empty divs a min-height. Isolate the
sprite-recognizer into its own method so we can later test it, etc.
Add heuristic to recognize slideshows. Currently it just looks at
class=nivoSlider but we can expand it obviously. We don't want to
blindly mess up any styling inside a slideshow.
Rip out absolute positioning, solving a variety of problems.
get rid of a bunch of duplicate formatting that was happening in various of the
handler classes (and a lot of "%s" format strings, which is what sent me down
this hole in the first place; still plenty of those left in our code).
The big savings here should be 2- or 3-fold formatting in the
mock_message_handler and some double-formatting in the system message handler
subclasses.
Noticed we didn't have these while investigating the fact that the JS filter minifies inline JS on the main parsing thread, blocking it, while the CSS filter does it async using a ResourceSlot.
Attempt to correct negative margin-bottom for resized images to make
slide-shows work.
Change handling of float:right;clear:right; to use display:block and
avoid reordering nodes.
Don't account for the vertical space consumed by visibility:hidden
absolute nodes.
opt mode. That way if any of these are firing we will see them in the
error logs. This seems categorically better. (sligocki)
Make it possible to update static asset config dynamically (morlovich)
Compiled code updates
Document how an empty return from Resource::url() means that the resource has no URL.
I surveyed the call sites of Resource::url() and only a handful should be called for inline resources. Among those, several were simply for logging. The rest are addressed here: specifically, adding the URL to the CachedResult for URL purging (does not apply to inline data) and trying to find base and trim URLs in CssFilter.
The clever bit here is that if an element already has an id, we attempt to use
that id in creating the ids for its children. The hope here is that those
labels will be consistent even as the page updates, particularly for areas
around the edges of the page (navigation, header, footer).
Also update generated JavaScript and libraries configuration.
excludes the same-classification signal because it dominates the classification
task, and is redundant with parent->child propagation in the labeling filter.
So this is intended just to lower the rate of "parent is content, child is
navigational" that we see if our input data doesn't support it.
I also updated the decision trees based on pre-CL training runs. New trees
based on the new signals will be tested next.
Two changes were necessary:
1) Change CssFilter to mutate the Characters Node contents, instead of replacing the node, so that the downstream CriticalSelectorFilter could read and write the new contents.
2) Register slots for inline rewrites so that they will be consistently processed in the correct order.
Note that the proper way to fix (1) is probably to use information from the InlineResourceSlot to Render the change in CriticalSelectorFilter, but that is a bit more complicated, for now we continue to use the saved Characters Node and expect that earlier filters will not replace it.
* Before, TEMPDIR was used in a few tests, was global to the user, and wasn't
ever cleared. Tests that worked in TEMPDIR and ran in both Apache and Nginx
could conflict when run in parallel, and tests that wanted to be free from the
influence of previous runs of the same test would use 12640 suffixes on files.
* Make there be one TEMPDIR for Apache and a separate one for Nginx that get
cleared before every run. This means we don't need the 12640 suffixes anymore.
* TEMPDIR should now only be used for test infrastructure and TESTTMP should be
used by individual tests. When running parallel tests, each test has its own
TESTTMP under TEMPDIR, so they won't collide.
* The cache flush test previously assumed it had exclusive write access to
/mod_pagespeed_test/cache_flush/, which was wrong. Now it copies the files it
needs into /mod_pagespeed_test/cache_flush/12640 and works there.
* Simplify the filenames for parallelization logs and temporary directories
to be just the name of the test, and not include the PID of the process
executing the test.
* Nginx side of the change: https://github.com/pagespeed/ngx_pagespeed/pull/855
GIF images: fix quirks mode; pad in scanline interface
This CL:
(a) tweaks the quirks mode as described in b/16539233, and
(b) causes the GIF reader returned by CreateScanlineReader() to pad GIF frames when they are smaller than the GIF image (or, in GIF terminology, to pad GIF images when they are smaller than the GIF screen).
The latter is done by means of a new MultipleFramePaddingReader class to implement the actual padding. This allows FrameToScanlineReaderAdapter to focus on simply converting the function calls between the two APIs.
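As a rough illustration of what the padding entails (a hypothetical sketch, not the MultipleFramePaddingReader API): a frame smaller than the GIF screen is copied onto a full-size canvas at its offset, with the remainder filled by a background color.

```python
def pad_frame(frame, frame_w, frame_h, x0, y0, screen_w, screen_h, bg):
    """Place a smaller GIF frame onto the full screen at offset (x0, y0),
    filling the uncovered area with the background color bg.

    Illustrative sketch only; names and layout are assumptions, not the
    real MultipleFramePaddingReader interface."""
    # Start with a screen-sized canvas of background pixels.
    screen = [[bg] * screen_w for _ in range(screen_h)]
    # Copy the frame's pixels into place.
    for y in range(frame_h):
        for x in range(frame_w):
            screen[y0 + y][x0 + x] = frame[y][x]
    return screen
```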
There are actually two specific problems here:
1) Critical CSS Section: Inline non-critical CSS is not being stripped if rewrite_css runs on it.
2) Full CSS Section: Unoptimized versions of inline CSS are being used instead of versions optimized by rewrite_css.
I have a pretty good idea for why this is happening and I'm working on a fix.
non-externally-settable option rather than doing it when debug is switched on. Compute relative density of text and image tags within links. Update the navigational classifier to use the new signals.
proxy suffix rewriting.
- In ProxySlurp, apply DomainRewriteFilter's Location:
fix up directly on 3xx, since normal HTML rewriting
won't actually run on a non-200.
- Permit proxy suffix rewriting to work across different
schemes.
After a discussion on www-style@w3.org, it appears that moving @font-face above all rulesets does not affect the meaning, so we keep the font-faces in a separate list (like @import and @charsets).
These may be nested inside of @media rules, so I had to re-work the parsing framework to allow parsing at-rules inside of @media.
to be tested without bothering to set up shared memory.
Fix a bug found in our first-char-of-line encoding scheme for message-type in the
shared circular buffer (and test it).
Fixes Issue 998
accept webp. Modern Chrome sends Accept:image/webp, and we should
check that before rewriting img URLs to point to webp files.
Proactively fixes predicted Issue 978
I had to pivot and leave IE9 enabled for these features to avoid
diving into a regold nightmare in the FlushEarlyFlow testing, which
for some reason is testing defer_js on IE9.
Distinguish between statement labels and object properties in JsTokenizer.
This is an obscure case I discovered we weren't handling correctly.
1) In "{foo:{}/x/i}", the "foo:" is a statement label, so the "{}" is an empty
block, so the "/x/i" is a regex literal. We already get this correct.
2) In "z={foo:{}/x/i}", the "foo:" is an object property, so the "{}" is an
object literal, so the "/x/i" is division. We were getting this case wrong
before by failing to distinguish it from case (1). Fortunately the fix is
very simple.
3) Finally, in "z=function(){foo:{}/x/i}", we're back to a statement label and
a regex literal. However, previously we were throwing away the BkHdr parse
state that allowed us to distinguish this from case (2). Fixing case (2)
requires us to retain that state to avoid breaking this case (unfortunately,
this causes a little bit of code churn in the unit tests).
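A toy model of the distinction (illustrative only; not the real JsTokenizer, which tracks richer parse state): push each "{" as either a block or an object literal, treat a preceding ":" as a label inside blocks but a property inside object literals, and classify a "/" right after "}" as a regex only if the brace it closed was a block.

```python
import re

def first_slash_meaning(js):
    """Classify the first '/' in js as 'regex' or 'division'.

    Hypothetical sketch of the fix described above: brace contexts are
    kept on a stack so that case (3) is not confused with case (2)."""
    stack, prev, last_closed = [], None, None
    for tok in re.findall(r"function|[A-Za-z_]\w*|[{}()=:/;,]", js):
        if tok == "{":
            if prev == "=":
                stack.append("object")        # RHS of assignment
            elif prev == ":":
                # Label vs. property depends on the enclosing context.
                stack.append(stack[-1] if stack else "block")
            else:
                stack.append("block")         # statement / function body
        elif tok == "}":
            last_closed = stack.pop()
        elif tok == "/":
            if prev == "}":
                return "regex" if last_closed == "block" else "division"
            return "division" if prev and prev[0].isalpha() else "regex"
        prev = tok
    return None
```

Run against the three cases above, this yields regex, division, regex respectively.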
1) Skip over nested {}, [] and ()
2) Skip over strings, identifiers, at-rules and other "any" values that may contain escapes like \}.
These cases were partially covered by some of the Skip* functions, but not all of them.
Proper recovery for at-keywords in values lists was pretty unclear in the spec, but from a test in a variety of browsers, it appears that they do not follow the same recovery procedure as at-rules. Specifically at-rules are supposed to be terminated by the first of ;, the next {} block or a closing }. But almost all browsers interpret
.foo { font: bold 20px monospace @foo; color: green }
as simply:
.foo { color: green }
throwing away the font property because one of the properties wasn't valid, but also considering the ; to end the entire declaration, not just the unexpected at-keyword.
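The observed browser behavior can be sketched like this (a hypothetical helper, not the real parser): any declaration containing an at-keyword is dropped wholesale, and parsing resumes after the next ";".

```python
def recover_declarations(declarations):
    """Drop any declaration containing an at-keyword; keep the rest.

    Illustrative sketch of the recovery behavior observed in browsers,
    as described above; not the actual CSS parser code."""
    kept = []
    for decl in declarations.split(";"):
        if "@" in decl:
            continue  # the whole declaration is thrown away
        if decl.strip():
            kept.append(decl.strip())
    return kept
```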
<style>@import foo</style>
would be completely removed from a page. I can't think of any reason that that would break a page, since this is only true for malformed expressions, but it seems like it's better if we don't touch things we can't parse.
To make this work I also had to add an error for when ParseImport hits an unexpected token and remove one for when ParseMediaQueries hits EOF (because @imports technically do not need to be closed by a ; at the end of the file).
Added decision tree logic to rewriter/. This only includes code for constructing
and running the decision tree, but not the actual decision trees we will use for
classification.
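Running such a tree is straightforward; a minimal sketch (the tuple layout and feature names are assumptions, not the actual rewriter/ representation):

```python
def predict(node, features):
    """Walk a decision tree: internal nodes are (feature, threshold,
    left, right) tuples, leaves are plain scores. Descend left when the
    feature value is <= threshold, right otherwise."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if features[feature] <= threshold else right
    return node

# Example tree: classify by link density, then word count.
tree = ("link_density", 0.5, 0.1, ("word_count", 10, 0.4, 0.9))
```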
Don't CHECK-fail in IsSafeToRewrite.
Use OptiPNG and multiple configurations to improve Scanline PNG writer
This CL improves the Scanline PNG writer by
(1) using OptiPNG
(2) trying 4 sets of configurations and using the best one.
On a test set of 10,389 GIF and PNG images, the average size was reduced from 8,551 to 8,186 bytes, a 4.3% improvement.
-----
Change LOG level to INFO for images with size different from that of frames
This is a quick fix to silence the log. A thorough fix is to pad the frame when it's smaller than the image.
this is really groundwork for more sophisticated signal gathering. Eventually
we'll have to defer labeling any given element until we can compute some global
properties (such as count of words / characters / links).
When printing an error message that cache-purging is not enabled:
1. Print the correct syntax for each server (nginx vs apache)
2. Point the user to the doc explaining the feature in more detail
3. Allow C++ to emit HTML formatting to make all that look better.
Clean up the whole CSS world, using CSS for more of the styling rather than
inlining styles. Avoid using "em" for font-sizes as it seems to make
embedded monospace look too small, instead using "12px" as our base size for
most text.
This gets our whole admin site using just sans-serif and (in limited
places) monospace. There used to be other serif fonts mixed in and it
looked pretty bad.
In the purge table, draw a horizontal rule above it. Omit the table
if it's empty. Omit the sort-order button if there's just one
element. Put the URLs in the second column and the dates in the first
column, since the dates are fixed width.
Generally make most stuff smaller.
due to loopbackroutefetcher (or FetchOriginDomain), logging both the URL
as the destination server would see it and the connection host/IP.
Fixes mod_pagespeed issue 786
every time we want to log something. This is more efficient, and also helps
us fail on startup if we change the name of a stat at some point in the future,
rather than at some hard-to-predict point after requests start processing.
Specifically, allow invalid UTF-8 characters in comments and string
literals, which is where they usually show up.
The main upside to this change is that it lets us (usually) get away
with passing unconverted Latin1-encoded input into JsTokenizer, even
though JsTokenizer requires UTF-8 input. The main downside is that in
order to make this work, given RE2's limitations, I have introduced
some, um, subtlety into the way string literals are matched. "Subtle"
is not something that one wants one's code to be. So I'd appreciate
feedback on whether there's a better way to do this. (mdsteele)
bar-charts. Refactor the code a little to make it more compliant and
easier to follow.
Clean up the language around message filtering.
Count the initial change from 0 as an update when comparing frequency
of statistics updates.
a { _height: expression((parentNode.offsetHeight-17)) }
Turns out there was a bug where ParseAny() would double count parentheses. For example, starting on "(2 + 3) 9 7)" it should skip over just "(2 + 3)", but before this CL it would skip all the way over the entire string by counting the first ( twice.
Improves non-recoverable CSS parsing failures from 137/2729 to 79/2729. (net/instaweb/rewriter:css_minify_count)
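The corrected skipping logic looks roughly like this (a sketch, not the actual ParseAny() code): the opening delimiter is consumed exactly once before the loop, so it cannot be double counted.

```python
def skip_balanced(s, pos):
    """Return the index just past one CSS 'any' value starting at s[pos].

    If s[pos] opens a (, [ or {, skip to just past the matching closer;
    otherwise skip a single character. The bug described above amounted
    to counting the opening delimiter twice."""
    pairs = {"(": ")", "[": "]", "{": "}"}
    if s[pos] not in pairs:
        return pos + 1
    stack = [pairs[s[pos]]]
    pos += 1                      # consume the opener exactly once
    while pos < len(s) and stack:
        c = s[pos]
        if c in pairs:
            stack.append(pairs[c])
        elif c == stack[-1]:
            stack.pop()
        pos += 1
    return pos
```

On "(2 + 3) 9 7)" this skips over just "(2 + 3)" as intended.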
Clean up some confusing shell pipes that were expected to fail at the
end of the pipe.
Eliminate some redundant "|| fail" which is no longer needed now that we
run tests with "set -e" and "trap".
Adds a compile dependency for the static_rewriter .o files on the .a,
so we don't crash nginx tests when we change class layouts. This is a
little conservative but recompiling those 2 object files is cheap.
This must be committed with https://github.com/pagespeed/ngx_pagespeed/pull/782
1) Move instructions for debugging rewrite_proxy_test.sh into scope where PID is defined so the instructions work.
2) Remove TODOne TODO.
3) Fix really easy TODO.
time before the StartElement is flushed. It achieves the same effect
for browsers as DeleteSavingChildren, but is a lot simpler because it
doesn't alter the in-memory DOM, it just affects the way it's
serialized. Downstream filters of course would still see the element,
though the HtmlWriterFilter will deliberately ignore it.
We store this 'invisible' bit into the 'style' enum, which prior to
this CL was called close_style, CloseStyle, etc. Most of the changes
in the CL are for that trivial rename.
debugger. Add the attributes to the HtmlStartElement and HtmlEndElement
events, so we know more than just the tag name when calling
DebugPrintQueue.
Simplify the implementation of DeleteNode, eliminating a redundant
state variable.
CSS. In some contexts it seems like the inlined CSS is unminified and
includes the boilerplate Apache comments, so
"head -25" doesn't have what we are searching for.
JS updates for new compiler release.
required wrapping all the commands that were expected to fail in
functions (check_not, check_error_code) that check for failure and
swallow it to avoid aborting the script.
Resource::IsSafeToRewrite, which I'd already started work on when Shawn flagged
it as a TODO in his CL. Note the TODO: I've fixed the most-used call site, but
need to fix the others (mostly resource combiners).
the output we are testing is not simply empty, or an unrelated error.
Factor out the different ways we test for '200 OK' into some common helper
functions.
variables. This required being a little more careful with how we
account for stats on a write-through, since we can't add negative
corrections.
Note that there was some regolding as the current results counted
expirations for both L1 and L2. I think that was golded wrong: the
expirations should only count once per lookup for the entire
write-through flow.
Unit test hygiene; Fix a couple of bugs where the driver script wasn't very
robust to unexpected inputs, at least one of which was causing a false positive
on a broken test.
- Write on configuration read callback.
- Read in AsyncFetch::HeadersComplete, invoked after property cache
lookup (which proceeds in parallel to config lookup) completes.
This is done by gating successful property cache callback invocation
on having the configuration read/request headers computation complete.
1) URL would be too long,
2) Rewriting was disallowed,
3) Domain not authorized (we'd expect this to trigger on inputs first, right?)
4) Input resource was not cacheable, 404, etc.
Create an HttpOptions structure for code in the http/ directory (and future uses) that wants to access options.
Store a copy of HttpOptions in RequestContext. These contexts are a good way to get options into the ResponseHeaders constructor in AsyncFetch.
Store a pointer to RewriteOptions in Resource. Resources also create ResponseHeaders and storing a version in here is useful.
I tried to pass in the correct custom options in each case, but in many places I could not access custom options. These are all marked with TODOs (or Notes explaining why default options are fine). A follow-up will pipe in custom options in those cases that need them and verify that the rest do not. Note: I believe that this does not regress anything. Just exposed the places where we are using default rather than custom options.
Also TODO is getting rid of the default ResponseHeaders constructor and forcing all construction to explicitly pass in options.
Fix msan failure triggered by gif_square.
The uninitialized value appears to come from giflib5 not clearing or copying the flag when creating a new ColorMap object.
the framework will do it automatically, and do so without violating
thread safety:
Before this it was possible for a nested image rewrite 1 to be
rewriting and touching parent CachedResult from a slow-rewrite thread,
while a second one was being Propagate()d on main rewrite thread touching
the same thing. Now all the propagation is on main rewrite thread.
Fixes tsan failure.
As per previous discussions, I am also changing the MultipleFrameReader
interface so that Get{Image,Frame}Spec() assign a copy rather than a
pointer. Some of the other changes made it obvious this was the time to do it.
This CL depends on the changes in
https://code.google.com/p/page-speed/source/detail?r=2578
(checked-in, thank you bmcquade@!) and in
http://page-speed-codereview.appspot.com/1170001/ to support, in
ModPageSpeed-land, the giflib encryption target needed by the tests.
SystemMessageHandler. So we can keep the format of messages in Apache and Nginx
consistent and easier to modify. Add unit test for
AdminSite::MessageHistoryHandler.
printing functionality. This should mean that slots can ignore the
RewriteDriver.
Also modifies RewriteDriver::Propagate(...) to propagate errors stored in the
CachedResult into the context. I *think* this makes this the only necessary
caller to RewriteContext::AddDebugMessages (one of the overloadings of
RewriteContext::AddDebugMessage, which I've renamed).
types for. Also keep InPlaceRewriteContext from serving 200/OK when
in-place rewrites fail.
LoadFromFile needs to be able to get the content type from the extension, so
for files with unknown extensions we should fall back to our normal fetcher.
Fixes https://github.com/pagespeed/ngx_pagespeed/issues/719 which was
originally reported as an ngx_pagespeed bug but applies to both nginx
and apache.
Add debug message for image classification
If the "debug" filter is turned on, insert information into the HTML about whether the image is a photo and whether it has transparent pixels.
Message sample: <!--Image has [no ]transparent pixels and is [not ]sensitive to compression noise.-->
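Composing that comment from the two classification results can be sketched as follows (a hypothetical helper mirroring the sample format above, not the actual filter code):

```python
def image_debug_comment(has_transparency, noise_sensitive):
    """Build the debug comment from the sample format above.

    has_transparency: image has transparent pixels.
    noise_sensitive: image is sensitive to compression noise (photo-like).
    """
    return (
        "<!--Image has %stransparent pixels and is %ssensitive "
        "to compression noise.-->"
        % ("" if has_transparency else "no ",
           "" if noise_sensitive else "not ")
    )
```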
Change by vchudnov:
Change libwebp dependency to come from the official libwebp repository.
This means moving a bunch of thread-related code from apache/ to system/. While we're at it, also move handling of use_per_vhost_statistics, install_crash_handler, and rate_limit_background_fetches to system/ so they're no longer duplicated between nginx and apache.
I also partly cleaned up the way we handle options that are set on the factory, getting the duplication out of apache/ and nginx/ and moving the code into system/.
nps change: https://github.com/pagespeed/ngx_pagespeed/pull/731
Add TODOs for some other failures that require more general fixes (unauthorized files, URL too long, etc.)
Refactor CSS flattening test debug framework to be used by all inheritors of css_rewrite_test_base.
Insert comment with critical images at bottom of page in debug filter. Also modify the IsCriticalImage calls to take a StringPiece instead of a const GoogleString&.
Add debug messages for HTML resource tag rewrites that expire. (jmarantz)
Clean up a bunch of const StringPiece& and const GoogleString&
arguments to StringPiece in test bases. (sligocki)
requiring Statistics subclasses to implement AddHistogram and pass it a
mutex however it wants.
This reduces the amount of implicit dependencies and lets us once
again remove NullThreadSystem from our production path.
To avoid duplicating all of the content of classes that were basically
the same physical implementation of two different interfaces (Impl), I
added template helpers VarTemplate and UpDownTemplate. These use a
common Impl class so that each Statistics implementation doesn't have
to grow to accommodate this disinheritance.
DCHECK-failing on a negative Add.
Add a new UpDownCounter in parallel which allows Set and negative increment,
using that where required.
The plan, after this, is to do a massive rename of Variable->Counter. And
UpDownCounter to Variable, or something like that.
Fix for windows build bots for core libs fixing our python2_6 dependency
since Chromium seems to have removed it from its DEPS. We still get
it from Chromium but from its tools directory directly. Also fixed
formatting to be consistent - whitespace (and parens) changes only.
have seen flakiness in nginx we think might be correlated.
We will re-introduce this when we have time to dive in deep and make
sure it is robust in nginx.
url by putting IPv6 addresses in brackets. This was causing consistent and
baffling smoke test failures on instaweb.com and its VMs. As a result this
change will need to be patched onto the 1.8 release branch.
testing a library that does exhaustive switch statements without defaults on the
enum values, but were testing values outside the enum range. What happens in
this case is undefined, so we shouldn't test it against an expected value that
assumes fall-through.
Also use the atomic version where non-atomic versions used to be used (but atomic seems better).
Also fix up data_to_c to actually fail (and thus cause a Make failure) if file writing fails.
In r3029 we changed ProxyFetch to use RewriteDomain mappings on Location
headers. This was to support Adwords experiments that we're no longer
pursuing, and is not necessary any longer. It also creates a bug in
ngx_pagespeed, where people don't expect MapRewriteDomain to affect redirects.
Reverting this back to how it was before.
Fixes https://github.com/pagespeed/ngx_pagespeed/issues/656
since it requires client code to use the same flag.
It also causes trouble for LOG(FOO) but that needs further investigation
to be certain of the cause.
changed the invariant on rewrite_context to cache the parent's driver field
locally rather than chasing the parent chain looking for a non-NULL context.
Note: this switches us from using "protobuf_lite" to using
"protobuf_full_do_not_use", and makes 64-bit Release
libmod_pagespeed.so 639k larger (8.5M vs 7.9M). Is this worth it? We
could also maintain our own hand-crafted serializer for the data in
the protobufs for a lot less bytes if we care.
addition to the one done at page.onload time, so that first image of a
slideshow (or similar stuff) gets detected as critical.
The CriticalImagesBeaconFilter now adds an onload handler which can be checked
by LazyloadImagesFilter and DelayImagesFitler to ensure that they add an onload
handler only if the previously added onload handler was added by the beaconing
filter. The LazyloadImagesFilter and DelayImagesFitler also need to now add
back the image-onload-check-for-criticality whenever applicable.
FastWildcardGroup emitted, and it's not in base_core, so link fails in
html_minifier_main_dependency_check. Since that class isn't used outside tests
(RewriteOptions provides an own subclass), just move it to tests.
so that oversize writes don't get lost.
Also avoids some of the messages about oversize writes being lost (issue 907),
so I feel comfortable downgrading that warning -> info.
(no more scoped_ptr<RewriteOptions>*). Ultimately, this interface
cleanup enables us to hang onto the QueryOptions object a little longer
so we can use it to check for other query-params that are not represented
in Options.
necessary. We do this by plumbing the actual webp variations we'll require up
into the CachedResult from deep in the bowels of image.cc (the image optimizer),
where final format decisions are made.
hosts for system tests by properly
fixing the test that hardcoded modpagespeed.com to use
@@TEST_PROXY_ORIGIN@@ instead, then rename
@@TEST_PROXY_ORIGIN@@ to @@TEST-PROXY-ORIGIN@@ so that
our config file checks don't complain about using _
in host name.
reverse proxies. Because of the weird filter ordering in Apache
we actually end up looking at headers twice: first to see if it was gzip'd
by a reverse proxy, and next for complete headers (while in nginx we just
get the full headers the first time).
Addresses issue 896.
Include the full option name in the output from OptionsToString,
aligning the columns for the values based on the longest name.
Include the invalidation timestamp in the output as well.
Remove the "InlineUnauth:" prefix from the ToString implementation
for ResourceCategorySet, as adding that extra prefix means (a) it is
not similar to other ToString methods and (b) we cannot parse what
we print.
Add testing for OptionsToString.
we were writing to css_base_gurl_/css_trim_gurl_ in
RewriteSingle simultaneously to reading them in
AbsolutifyIfNeeded called from HandleDeadline.
We do this by basically just computing the right value
on demand from available information and the initial value.
This is made somewhat more tricky since OutputResource::url
isn't thread-safe.
rather than via separate filter. This avoids trouble with
hashes from nested rewriters changing when files are updated.
This reordering does make it impossible for me to keep one
regression test, and makes a couple no longer relevant.
use it exclusively for in-place optimization requests, serving webp images with
an appropriate Vary: accept header accordingly. Note in particular that the
"webp" user agent is no longer singularly magical, which required a bit of test
fixing. This is part 1 of 3-4. The next part will eliminate Vary: headers in a
number of cases where the underlying image will not actually change format.
Part 3 will disable URL mutation for images in IPRO CSS requests. Finally, we
may optionally add Cache-Control:private for IE to avoid must-validate behavior
for cached Vary:Accept images.
Patterns on every test method. Instead do it once per
process and inject into the factory. This complicates
the factory slightly but speeds up debug tests a lot.
and probably more under valgrind.
with test-pattern "RewriteQuer*":
old: 61 tests from RewriteQueryTest (3750 ms total)
new: 61 tests from RewriteQueryTest (575 ms total)
I had noticed the speed degradation right away while
debugging today.
My approach here was difficult for multi-server deployment
owners when they upgraded, and Josh suggested a different
approach that doesn't have that downside.
resolution. The way in which merges actually occur for global
options with vhost overrides is:
1. parse and compute options for global
2. parse and compute options for vhost
3. construct blank new options.
4. Merge global into new options
5. Merge local into new options
Thus the prioritization I previously wrote was exactly backwards.
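The five steps above reduce to a simple merge order, sketched here with plain dicts (illustrative only; the real RewriteOptions::Merge is more involved): later merges win, so the vhost's values must be merged last.

```python
def effective_options(global_opts, vhost_opts):
    """Compute effective options per the merge order above: start from
    blank options, merge global in first, then the vhost, so vhost
    settings override global ones."""
    merged = {}                 # step 3: blank new options
    merged.update(global_opts)  # step 4: merge global into new options
    merged.update(vhost_opts)   # step 5: merge local into new options
    return merged
```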
ModPagespeedInlineUnauthorizedResourcesExperimental <true|false>
to
ModPagespeedInlineResourcesWithoutExplicitAuthorization <off|comma-separated-list-of-resource-types>
where comma-separated-list-of-resource-types can only contain strings from the set {Script,Stylesheet}.
have a single global one inside RewriteDriverFactory, and just
pass in options manually when using it for URL decoding.
This is a humongous memory win when there are lots of VHosts ---
my 30,000 vhost test config's usage goes down from
1.368GiB to 847MiB, which is about 38% savings.
Unfortunately, this is rather brittle in design: RewriteDrivers
are connected to ServerContexts heavily, which means the decoding
driver is connected to one as well, and we have to be careful to make
sure to always pass in the right options when using it.
content hash, which can cause variable names to change if the input is updated,
potentially causing HTML/JS inconsistencies.
Fixes https://code.google.com/p/modpagespeed/issues/detail?id=881
Unfortunately this changes variable names so care must be taken when
deploying on multi-server deployments.
our current implementation of AprCreateThreadCompatiblePool does
not establish a mutex for the allocator.
See apr/memory/unix/apr_pools.c:134
I confirmed in the debugger that allocator->mutex is NULL.
I believe this was leading to a race when I tried testing with
ModPagespeedMemcachedThreads set to 0. In that scenario, we will
initiate memcache operations from the apache request thread for
pcache lookups, and from the rewrite thread for metadata and
http cache lookups. During load-tests I found 4 spinning httpd
processes, and debugging them I found them spinning in apache pool
cleanup. Meanwhile we were writing "waiting for property cache"
log entries 1 per second. After making this change I can't repro
this in load-tests.
The ModPagespeedMemcachedThreads default is 1, though, so most
of the accesses get throttled through a single thread for
end-users, but I found an exception in
SystemCaches::GetFilesystemMetadataCache, which provides a
blocking interface. Our load-testing does not hit this situation
because it doesn't enable LoadFromFile.
Early reports from one user indicate that this does resolve the issue.
Fixes Issue 885.
getting nested bit set, which could cause inconsistency in JS Combiner
resource hashes when reconstruction of minification missed the deadline
(which it should not honor due to nested bit).
Fixes https://code.google.com/p/modpagespeed/issues/detail?id=880
In another CL I'm working on, I ran into problems when I added a comment at the end of the JS file because that might change the result of ProfitableToRewrite().
AFAICT, there's no advantage to this being lazily rewritten. Additionally, now that we explicitly rewrite, we can avoid all the mutable variables.
This is a rewrite of JsLexer meant to solve the intermittent problems
we've been having with JS minification. Unlike the previous design,
it's not a pure lexer, but instead keeps track of an abbreviated sort
of parse state -- not enough to generate a full parse tree, but enough
to accurately detect regex literals and semicolon insertion. It
passes all of the existing JS minification tests, as well as some new
ones that the old minifier couldn't hope to pass, and it even does a
slightly better job of removing linebreaks because its semicolon
insertion detection is more accurate.
would break the fonts filter, since other filters would create a plain
UrlInputResource for the same slot in the HTML, and then would fail
to do anything with it since the results are private.
Do this by adding a notion of a ResourceProvider which stakes out
the portion of URL namespace as out of reach of normal CreateResource.
Also uses this notion to tweak debug filter messages produced, in
particular fixing production of misleading warnings about authorization
for fonts API resources.
Change the way filter-enabling is computed in the presence of Preserve
to be functional, rather than having a conflict-resolution phase
mutate the filter set, as that is destructive to subdirectory
overrides where we might want to turn preservation off.
Add a semantic to have an explicit extend_cache_xxx setting override preserve-URLs for that type.
This establishes a hierarchy of configuration precedence:
a. explicit forbid is permanent all the way down the hierarchy and cannot be overridden
b. "lower level" configs (vhost, query-params, subdirectories) override higher level
c. explicit filter setting overrides preserve
d. preserve overrides rewrite-level
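The hierarchy can be sketched as a lookup function (a hypothetical helper, not the RewriteOptions code; rule (b) is reflected by the caller merging lower-level configs into the explicit-settings map before calling):

```python
def filter_enabled(filt, forbidden, explicit, preserved, level_enabled):
    """Resolve whether a filter is enabled under the precedence rules:
    forbid > explicit setting > preserve > rewrite-level default."""
    if filt in forbidden:
        return False                 # (a) forbid is permanent
    if filt in explicit:
        return explicit[filt]        # (c) explicit setting beats preserve
    if filt in preserved:
        return False                 # (d) preserve beats rewrite level
    return filt in level_enabled     # rewrite-level default
```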
to stash the rewrite driver for recycling and not just when we
begin a new parse. Without this case I observed us keeping
~100MiB of junk around on my 10K vhosts + traffic setup.
SSL fetches getting 400 in some cases:
1) We were not forwarding the explicit Host: header to the SNI
host name, which mattered when the URL was different (e.g.
when LoopBackRouteFetcher did its thing)
2) SNI hostname should not include the port; so fetches from non-standard
ports weren't working.
so ngx_pagespeed can get it too.
To keep the helpful error_message from SerfUrlAsyncFetcher, wire it
through RewriteOptions as a new-fangled error_detail.
The basic approach involves allowing filters (mostly inlining filters) to declare that they allow unauthorized resources to be processed. And, in case a resource is found to be unauthorized but allowed by the filter, it is fetched and stored into a separate cache-key-space prefixed with unauth:// (or unauths://). Checks and tests should ensure that we never end up using such a resource's URL directly anywhere.
CSS metadata cache key -- at least if css image inlining is enabled.
Remove the support files for a system test -- this is easier covered
in a unit test.
Insight's libpagespeed: we no longer use its DEPS file,
and now only use it for googleurl_noconv and some 3rd party
libraries.
This did require yoinking its base/ build system, but
with that we really can update chromium version independently.
browsers, if serve_rewritten_webp_urls_to_any_agent is off.
For non-.pagespeed. URLs, don't consider the cached result valid if it
contains vary:accept and content-type:image/webp, and the
request-headers don't include accept/image:webp.
Fixes Issue 846, and offers a more complete resolution to Issue 848.
code portion of Insight's libpagespeed since we do not actually
use it any more. This did require yoinking its copy of
pagespeed_overrides.gypi here (almost identically, modulo stuff
like our own path and duplicate chromium_version%).
The result is that we can now bump up Chromium version w/o worrying
about circular dependencies. Whee!
We still use libpagespeed_deps and some libraries from Insight's
repo, however; some of that should probably be cleaned up
in a follow up.
since streaming HTML like that can cause MPS to deadlock in property
cache fetch if we're using memcached.
To be more detailed, the deadlock chain is:
Cache hit on html -> RecordingFetch -> ApacheWriter
-> MPS HTML apache filter -> blocking pcache lookup, where the last
step deadlocks when using a memcached cache, since CacheBatcher will not
issue a new lookup until callbacks for the current batch complete.
as pointed out in mod_pagespeed issue 840. To enable that,
clearly establish invariant on lock status from RewriteDriver
callbacks (not held), and make it follow that.
involving filters deleting slots (e.g. CSS combiner) and multiple
flush windows --- it's supposed to be called only at the very end.
This could potentially lead to multiple beacon insertions, and also
to a really confused HTML parser, since we were trying to do
insertion-at-body-end at a midpoint of the document, which spammed loadtest
with "HtmlElement Parents of ..." warnings.
Allow multiple calls to CreateShmMetadataCache(default) without giving an error
Users of 1.7.30.1 are seeing 'Default shared memory cache: Cache named
pagespeed_default_shm already exists' in their error log. This is not
actually a problem, and is caused by LookupShmMetadataCache returning
NULL for a given name even after a successful call with that same name
to CreateShmMetadataCache, due to LookupShmMetadataCache needing
cache_to_use to be set (which doesn't happen until RootInit). Just
silently accept these additional calls to CreateShmMetadataCache and
do nothing.
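The fix amounts to making the create call idempotent for an already-registered name; a minimal sketch (hypothetical class and method names, not the real C++ interface):

```python
class ShmCacheRegistry:
    """Sketch of tolerating duplicate creates: a second create() for an
    existing cache name is silently accepted as a no-op, mirroring the
    CreateShmMetadataCache behavior described above."""

    def __init__(self):
        self._caches = {}

    def create(self, name, size_kb):
        if name in self._caches:
            # Silently accept the duplicate call instead of reporting
            # "Cache named ... already exists".
            return self._caches[name]
        self._caches[name] = {"name": name, "size_kb": size_kb}
        return self._caches[name]
```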
I made this an option because we believe that it's better
to serve origin resources for webp URLs to non-webp-capable browsers so
that link-sharing does not fail. The reason that this change is necessary
is to support sites that want to use proxy-caches that cannot work with
Vary:Accept, in which case they will not be able to downgrade responses
to old browsers.
Fixes Issue 848.
See also Issue 846, which means that setting this option 'off' doesn't
currently work, since our http cache lookups currently do not
incorporate any bits indicating whether the client can support webp. Thus
setting the default of the new option to 'on' for now.
to the original behavior when we're done running deferred scripts.
Unfortunately the page we load may override our overrides (calling through to us
as it does so)! So we can't just put the original functions back. As a result
we check object state and a flag to see whether we should simply revert to the
original behavior or continue to behave as if overridden.
call DeleteFetchInfoIfPossible and have it destroy the map
entry and HostFetchInfoPtr + HostFetchInfo simultaneously with
a new call to ::Fetch having grabbed the entries and released
RateController::mutex_ but not yet incremented the outstanding
events count.
Serf, apr and OpenSSL into pagespeed_automatic.a (plus a few
console data files that got lost).
This should simplify ngx_pagespeed build a bit, and now
that we do symbol renaming make it possible to use HTTPS
there, since symbol renaming should prevent OpenSSL inside
pagespeed_automatic.a from clashing with any in ngx_pagespeed
my pending change of including OpenSSL into psol.a: doing that
would make the renaming take ~5:50, which is insanely slow, and parallelization
brings it down to ~1:21.
colliding with symbols in nginx, do a processing run on libpagespeed_automatic.a
renaming most of non-C++ symbols.
(Part of this snuck into r3636 --- the change in net/instaweb/automatic/Makefile
due to an oversight on my part)
(thankfully not currently possible in {mod,ngx}_pagespeed):
Do not cache empty content when a proactive background
refresh fetch is triggered by a HEAD request.
and rewriter/distributed_rewrite_context_test
In response, force all users to provide mutexes to MockTimer's constructor, rather than as a separate
optional call to set_mutex.
(Also pulls in some cleanups and an experimental flag).
Any virtual host that does not already have a shared memory metadata cache will share a default 50MB one. Users can configure the size of this cache by setting DefaultSharedMemoryCacheKB and they can disable it entirely by setting that to 0.
In order to do this I had to:
1) Create a new helper function get_stat()
2) Move wget results from the root $OUTDIR to a subdirectory (so that that subdirectory could be rm'd without nuking everything in the $OUTDIR). In this case we need to keep stats outputs from previous runs, but fetch_until needs to rm -rf wget output dir between runs.
Note: As far as we can tell this doesn't fix the issue that Huibao found, but this is my first step in de-flaking this test and cleaning up some technical debt.
combine_javascript is now a core filter, and convert_png_to_jpeg is core and
added to rewrite_images. Critical css and webp conversion are not yet switched
on by default by this CL, and more measurement is required for move_css_to_head
(default in PSS) vs move_css_above_scripts vs doing nothing.
virtual method override with an instance of the templatized base class
variable as an argument. This change simplifies the semantics by
reducing the number of virtual methods and eliminating the need to
have a base class instance as an argument.
The only thing I know left that doesn't support this feature is CSS-flattening which can change the base URL for the contents and thus it's not clear whether we should still try to preserve URL relativity.
flow when there is a Cache-Control header added upstream of mod_pagespeed.
This was never needed: we have a different mechanism for correcting
the caching headers in HTML.
contexts into system/.
This is a prerequisite for the change I'm working on to turn on SharedMemCache
by default, which would happen in SystemRewriteDriverFactory::PostConfig().
element has multiple url-valued attributes.
The main fix is that if someone is doing their own lazyloading we can optimize
something like <img src="data://" big-src="realimage.jpg"> with custom
UrlValuedAttributes.
This also means we can now optimize "poster" on <video>, "longdesc" on <frame>,
<iframe>, and <img>, "formaction" on <input>, and "cite" on <body>.
Fixes Issue 466.
just unit tests):
- Add a mechanism to RequestContext to permit session-specific
permitted fetch origin domains, to be used on top of IsDomainKnown
- Teach GoogleFontServiceInputResource to ask for fetches of
{http://,https://}fonts.googleapis.com, as needed.
- Teach LoopbackRouteFetcher to honor the RequestContext whitelist.
More detailed error codes from the scanline interface classes.
This change defines a ScanlineStatus class that is used to report the
success or error of various operations in the Scanline{Reader,Writer}Interface
classes and of the CreateScanline{Reader,Writer} utilities. The old APIs are still
present for backwards compatibility, though it would probably be a good idea in
follow-up changes to convert the clients to the new convention and get rid of the old APIs.
due to a dropped request after 10 seconds, rather than 5 minutes.
Otherwise you can get into a situation where pages with 5 minute TTL on
resources can never be fully optimized, and even if they have longer TTL
will take 5 minutes to optimize.
Note that this will definitely affect pages with >500 resources, or whatever
our max queue length is, but it can affect smaller pages as well because
this count is kept across all pages.
Also make InsertNodeAtBodyEnd insert before </body> even if the open tag has been flushed already, by adding a new public CanAppendChild method to HtmlParse.
in local storage. They are included only in the image's hash and that
is now used as part of the local storage lookup key (for images). We
don't mutate the pagespeed_lsc_url because that is used as a fallback
when the image isn't found in local storage.
We should really fix this to be more efficient in parsing in some way, but for now this should help people with default settings (especially those with cheap shared hosting).
AFAICT only the two directives I moved over to debug.conf.template were being tested. Perhaps this .htaccess file was there just to test that .htaccess config works? In that case, we now have plenty under mod_pagespeed_test/ that should cover that.
No need to ConsiderResponseHeaders in InPlaceResourceRecorder::DoneAndSetHeaders if we've already failed (by jefftk@)
Add header dependency to rdestl needed for google3 (by jbelmonte@)
I default the option to off so that this CL doesn't get overwhelmed by all the test changes. A follow-up CL will turn this option on and fix all tests.
This also leaves some pathways out (for example, the image combining path). But I didn't want to bloat this CL too much.
Use StringMultiMap::RemoveAllFromSortedArray to make
Headers::RemoveAllFromSortedArray faster.
This CL is a prereq of fixing bug 664 because that will add a lot more
removing of "hop-by-hop" headers, and I don't want to risk a
performance degradation. See
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html for details on
the list of headers we must strip.
Instead of having a bunch of bools with slightly different semantics
and complicated logic to determine when to delete and when to ->Signal,
just use a categorized refcount (e.g. active rewrites, detached rewrites,
etc.), and check IsDone before and after changing it.
There is a bit of a semantic change: previously ->Cleanup would roughly set a
"now it's safe to care about refcount 0" bit, now it's actually a deref,
so it's no longer sort-of idempotent.
There are also a few more checks; in particular you can no longer just
call StartParse lots of times without doing a FinishParse, a few
tests stumbled on that. ->ReleaseDriver also complains when it's handed
something that has non-zero refcounts, so most spots that used it are now
calling ->Cleanup() instead.
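The categorized-refcount idea can be sketched as a counter per category with cleanup at total zero; this is a hedged Python illustration (class and category names are hypothetical stand-ins for the RewriteDriver machinery):

```python
import collections

class RefCountedDriver:
    """Sketch: one counter per category (e.g. 'active_rewrites',
    'detached_rewrites', 'user'); the driver is recycled only when the
    total across all categories drops to zero."""

    def __init__(self):
        self.refs = collections.Counter()
        self.cleaned_up = False

    def add_ref(self, category):
        self.refs[category] += 1

    def drop_ref(self, category):
        assert self.refs[category] > 0, "unbalanced deref for " + category
        self.refs[category] -= 1
        if self.is_done():
            self.cleaned_up = True  # stand-in for recycling the driver

    def is_done(self):
        return sum(self.refs.values()) == 0
```

This replaces "a bunch of bools with slightly different semantics" with a single invariant checkable before and after every change.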
saves some runtime (not specifically measured) by reducing the number of .as_string()
calls needed to make temps for use in Lookup.
It costs some memory by forcing an extra StringPiece object
for use as the key, while still keeping the GoogleString
backing-store for that string in the value.
This has one small interface change: name(index) returns
a StringPiece now rather than a const char*.
Also a new method RemoveAllFromSortedArray was added, along with a
speed test illustrating why it's better to use that than a
StringInsensitiveSet to implement RemoveAll.
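The technique can be sketched as follows; a minimal Python version (the function shape is hypothetical, the real C++ code works on StringMultiMap internals) using binary search against the pre-sorted removal array instead of building a temporary case-insensitive set:

```python
import bisect

def remove_all_from_sorted_array(pairs, sorted_names_lower):
    """Keep only (name, value) pairs whose lowercased name is absent from
    an already-sorted, lowercased removal array."""
    out = []
    for name, value in pairs:
        key = name.lower()
        i = bisect.bisect_left(sorted_names_lower, key)
        if i < len(sorted_names_lower) and sorted_names_lower[i] == key:
            continue  # name is in the removal list; drop it
        out.append((name, value))
    return out
```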
status and (if applicable) a nonce. This simplifies a bunch of control flow and
makes it less ad hoc. Also changes the contract of PrepareForBeaconInsertion to
shift a bunch of checking to the caller, since those checks are more
class-specific than we initially supposed.
effectively min(max(A, B), A) so if A > B then A would always be used,
resulting in setting or getting an invalid index. What we want is just
min(A, B) which is the new code. Note that the DCHECK and the fact that
A would always be <= B means that this couldn't trigger but it is wrong.
2. Removed an unnecessary template parameter by replacing it with the known
type, which is actually more type safe.
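The arithmetic of the bug is easy to verify; min(max(A, B), A) always evaluates to A regardless of B, so the intended clamp never applied:

```python
def clamped_index_buggy(a, b):
    # Old code: min(max(a, b), a) simplifies to a for all inputs,
    # so when a > b an invalid index could be produced.
    return min(max(a, b), a)

def clamped_index_fixed(a, b):
    # New code: a plain min(a, b) clamp.
    return min(a, b)
```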
Also clarified some tests on GoogleUrl demonstrating a really strange behavior we have there. Follow-up CL coming to fix that wacky behavior, right now I just want to demonstrate it in tests.
- kExpectCached to indicate that everything should come from cache.
- kInputHtml and kOutputHtml so that methods can know if they're being
called when generating the HTML input or output, such as ...
- CssDebugMessage virtual method that tests can override to inject the
debug message they expect after the rewritten CSS.
2. Added debug messages to CSS flattening that explain why flattening
failed. It will be generally useful for any debug messages. It
necessitated a new field in the cached results so that we can emit
the reason when we're emitting the cached value. Also modified the
flattening tests to actually test that things are coming from the
cache as originally intended (kExpectSuccess | kNoStatCheck is an
ugly incomplete hack).
If the origin returns non html content while flushing early, trigger a redirect to a noscript version of the page with flush early disabled (by poojatandon@)
The major pain here was wiring query param-based options into ConsoleHandler(). We usually only check them in the HTML filter or .pagespeed./in-place resource handlers.
bugs due to embedded nuls.
This does slow down the symbol_table_speed_test significantly, by as much
as 15-20%, but doesn't seem noticeable on html_parse_speed_test.
Based on a sample load test run (with LogLevel info):
$ wc -l pagespeed.messages
4979581
$ sed 's/.*\]//' pagespeed.messages | sed 's/http[^ ]*//' | sort | uniq -c | sort -n | tail -20
9199 Reset: flush in a script.
11547 HTTPCache key= remembering not-cacheable status for 299 seconds.
x 14139 Rewrite script to
22405 HTTPCache key= remembering not-found status for 299 seconds.
x 22732 Starting to rewrite images in CSS in
28065 RenderedImage not found in cache
x 57399 Found script with src
82402 Cache entry is expired:
x 88018 Cannot sprite: no explicit dimensions
x 88580 Could not rewrite resource in-place because URL is not in cache:
x 102364 Could not rewrite resource in-place:
x 118189 Found image URL
x 119582 Attempting to sprite css background.
x 196582 Fetch succeeded for status=200
x 204861 Fetching resource
x 214974 Attempting to save resource in-place:
x 328035 Fetch complete:
x 328166 Initiating async fetch for
x 637666 Serving rewritten resource in-place:
x 828610 Trying to serve rewritten resource in-place:
(x's added to messages I've removed)
Also removed a few others:
$ grep -c "Inline element not outlined" pagespeed.messages
152899
$ grep -c "Unlocking lock" pagespeed.messages
134177
$ grep -c "Locking (lock" pagespeed.messages
134177
$ grep -c "Successfully rewrote CSS" pagespeed.messages
20238
Reducing messages 4979581 -> 1188193 (76%).
There are still 2 tests in base that depend on targets in util and sharedmem, it'd be nice to break that dependency, but at least the rest of the tests don't depend.
unit-test.
Move ModPagespeedDangerPermitFetchFromUnknownHosts into system_rewrite_options, and expand the general-purpose option processing in mod_pagespeed to enforce strict/non-strict process-scoped settings.
Add missing virtual dtor in SystemServerContext
Put all the options for creating fetchers into the fetcher map key.
All of these are the same in Apache and Nginx except for DefaultTimer() where Apache uses an AprTimer and Nginx uses a PosixTimer. I think PosixTimer is a good enough default that it should be in system/, however, and Apache can just override it to get an AprTimer.
if I purge at time X then resources at time X will be invalid. Previously
resources at time X would be valid, but X-1 would be invalid. This reduces
regolding when we switch to PurgeSet.
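The boundary change is just an inclusive comparison; a one-line sketch (hypothetical function name mirroring RewriteOptions::IsUrlPurged):

```python
def is_url_purged(resource_timestamp_ms, purge_timestamp_ms):
    """A purge at time X now invalidates resources fetched at time X as
    well (<= rather than <), per the semantics described above."""
    return resource_timestamp_ms <= purge_timestamp_ms
```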
ApacheProxyFetch, and in InPlaceResourceContextTest's FakeFetch. Add
a unit-test and a system-test for cache-purging with IPRO.
Fix the URL used for the InPlaceDataRecorder in Apache to have the
?ModPagespeed query-params stripped.
Remove method ResponseHeaders::IsDateLaterThan and instead incorporate
that into RewriteOptions::IsUrlPurged.
Simplify PropertyCache application code in proxy_fetch.cc by using Sequences.
Now all three functions Done(), ConnectProxyFetch() and Detach() will run on a
sequence to avoid the possibility of races across them (pulkitg)
- We need a dummy file for rewriter_html to get a binary on OS X
- rewriter_html (and html_minifier_main) should be relying on
pagespeed_http_core, not pagespeed_http (as the latter in particular
needs RE2 which isn't officially supported on Windows)
and new configurations in parallel in the master process. These are independent
instantiations and shouldn't try to share segments. By adding an instance
number to the segment name that is incremented every time a PthreadSharedMem is
created, segment names won't overlap.
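The naming scheme can be sketched as a monotonically increasing instance counter folded into each segment name; this Python sketch is illustrative only (the real code increments per PthreadSharedMem instance in C++):

```python
import itertools

# Hypothetical process-wide instance counter, bumped for each new
# shared-memory instantiation so old and new configurations never
# produce colliding segment names.
_instance_counter = itertools.count(1)

def segment_name(base_name):
    return "%s.instance_%d" % (base_name, next(_instance_counter))
```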
This change has been in ngx_pagespeed in a forked version of the files since March [1] but as it should work with Apache and other ports it would be good to merge it upstream.
[1] https://github.com/pagespeed/ngx_pagespeed/pull/179
net/instaweb/rewriter over to pagespeed/kernel/html/.
That also required pushing the wildcard support up util -> base_core, since comment
removal is configured using them.
Original author: morlovich
We were using iframe to download js files in parallel. But this causes
all frame busting techniques to fail and ends up in reloading of pages
infinitely.
Previously, this stat was being incremented in the Put() and "should we rewrite this file?" paths. Those cases are not really things expiring out of the cache so much as never ending up in the cache in the first place. This puts all the HTTP cache stats together and allows us to actually post an info message noting that specific URLs expired out of cache.
many of these files are currently compiled without all the warnings
we usually use. These edits are needed to get them to compile with the
default flags.
- Save selector pcache data for the two flows.
- For critical_css, move link-adding script before <body> (makes testing easier).
- For flush_early, test both critical_css_filter and critical_selector_filter.
The critical_css_filter will be going away eventually.
This CL fixes:
1) Actually unmap the released segments
2) Properly release shm cache segments
3) Properly release local stats segments
[1] children wrap up current requests and exit
add all styles (critical_css_filter has this).
Move ImageFormat to scanline_interface.h
Make ImageFormat, its string representation, and its associated MIME type be derived from a Single Point of Truth (SPOT) via macros. The latter two can be obtained via utility functions defined in the scanline_interface.
Fixed an error in the macro collision check for the pixel format macros in scanline_interface.h.
Regularized the macro names.
is looking for doesn't exist.
- Have a default min caching TTL and use this value to cache resources that
have an explicit cache value set, which is less than the chosen min value.
Hide this behind an option until the experiments are done. This is intentionally
kept separate from implicit_cache_ttl as that applies to resources that
do not have an explicit cache control set.
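The interaction between the new minimum and the existing implicit TTL might look like this; a hedged sketch with a hypothetical signature (the option names track the description above):

```python
def effective_cache_ttl_ms(explicit_ttl_ms, min_cache_ttl_ms, implicit_cache_ttl_ms):
    """Resources with an explicit TTL below the configured minimum are
    cached for the minimum; resources with no explicit cache control at
    all still fall back to implicit_cache_ttl, which is kept separate."""
    if explicit_ttl_ms is None:
        return implicit_cache_ttl_ms
    return max(explicit_ttl_ms, min_cache_ttl_ms)
```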
with some corresponding API changes to CacheableResourceBase and
AsyncFetchWithLock.
Not integrated anywhere, nor is the rest of the framework taught about the
new url() vs. cache_key() distinction.
Fix checkin tests by switching from 1.gif to an optimizable image.
Our 'optimization' of cache-extending 1.gif broke I think because
gstatic started serving 1-year cache lifetimes.
This change to the test is no longer necessary, as we can now proxy
long-TTL files. It is better to use a more reliably present gif as our
test file.
early. The old technique worked up to Chrome/26, but broke in Chrome/27.
2. Use max of resource cache TTL and implicit cache ttl for caching the
resource when rewrite
Partition() fails and goes through AddRecheckDependency().
Add 2 levels of testing for this:
1) Test in installer.include that there are no @@'s left in pagespeed.conf file.
2) Actually restart httpd after purge/install of new binary on VMs, this will check that the default configuration does not cause httpd to crash.
Also added a fix to PSOL binary building.
Fixes issue 742.
Use the About page instead of the Google's homepage in SerfUrlAsyncFetcherTest,
to prevent the test failing by getting a redirect (e.g. to ccTLD outside US).
--- Expires: was left at 1 year rather than matching the actual TTL.
This mostly didn't matter since max-age is supposed to win, but made
for some ultra-confusing debugging sessions.
Fixing it exposed a unit error in a test in RewriteContextTest:
CachingHeaders can't parse headers for 1 million year validity,
and now we are providing them for both Expires: and Max-Age,
while previously Expires: of one year ended up getting picked up.
correctly.
- Add apache system tests
- Fix issue because of which we weren't setting the critical images correctly in the split_html_helper_filter.
- Add an option to add a meta referer in the split filter.
though I mostly want it for fonts stuff. Along the way, make UrlInputResource
stop pretending it doesn't want a RewriteDriver, and update one test that
passed a NULL one to it. Oh, and make its constructor private, since you're
not supposed to create these directly, either.
updates and avoid hammering the disk.
Note that if we have 20 processes we can still write 20 qps to the disk
which is quite high. So perhaps this needs to be scaled back, but let's
see how this goes.
instead depending on DownstreamCachePurgeLocationPrefix for deciding
whether the feature is enabled or not. Also removing
DownstreamCacheLifetimeMs option completely. (anupama)
Fix system-test break due to a changed TTL at gstatic. (jmarantz)
* Changing DownstreamCachePurgePathPrefix to DownstreamCachePurgeLocationPrefix which takes a host:port location in addition to the path prefix. We add the host:port combination as a known-domain to the DomainLawyer, so that purge requests are sent directly to this domain.
* Modified the purge-url generation logic to use the DownstreamCachePurgeLocationPrefix everywhere.
* Added system tests (with and without varnish installed)
change, retaining no-store, must-revalidate etc from whatever settings
the user might have had.
Changes HttpAttributes::kNoCache from having max-age=0 in it as well,
moving that concatenation to another constant.
(2) Update console documentation to fit with new console.
(3) Link from graphs to documentation sections which explain the metric and possible remedies.
When referer matches a known pattern, do blocking rewrite. When a new header 'X-PSA-Blocking-Rewrite-Mode: slow' is present force the blocking rewrite to wait for async events.
* Adding a "PURGE" request method to be used for downstream cache purging in a later CL (e.g. Varnish-cache-purges from Apache). Note that Purge method support is not being added to Fastnet/Hope paths because it does not seem to be relevant.
* Modified RewiterDriver to use the purge-method (e.g. PURGE) in the purge-request. Also, added a few basic stat checks for number of purges issued. (System test modifications will need varnish installation and will be done in a follow up CL.)
# Rename ScanSplitHtmlRequest to DecodeSplitHtmlRequest, and make it no-op when split isn't enabled. Also, add to OS flow.
* Adding a DownstreamCachingDirectives class that uses PS-CapabilityList request header value to decide whether certain optimization-related capabilities such as image inlining, webp etc. are to be supported for this request. Such headers are required in order to allow downstream caching layers to dictate the optimizations that will be done on this request (in sync with their cache fragmentation logic).
* Changing RequestProperties to refer to DownstreamCachingDirectives in addition to DeviceProperties in order to decide whether an optimization is supported or not. Note that only few properties are supported via DownstreamCachingDirectives right now. This will be expanded in future CLs.
Add ModPagespeedLogDir for use by statistics logging file and other future logs (for example, log of failing URLs).
Fixes issue 559 and allows us to enable statistics logging for users.
The error is:
chcon: can't apply partial context to unlabeled file ...
AFAICT, this has been failing for $(MOD_PAGESPEED_CACHE), but succeeding for $(MOD_PAGESPEED_CACHE)-alt, and thus the Makefile succeeded because the last command in the for loop succeeded; but this failed when I added $(MOD_PAGESPEED_LOG) to the list.
Somewhat improve log messages when things aren't combinable;
longer term it would be helpful to surface them in a more
accessible way, however.
From ksimbili:
Remove old escaping method for tags which has been there since blink
time.
This was there to solve the early closure of script tag due to
</script> inside the string.
for example:
<script> var str="</script>"; </script>
will throw an error.
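The problem the old escaping addressed is easy to demonstrate; a Python sketch of the conventional workaround (the HTML parser closes the element on the literal sequence `</script>`, so it is commonly broken up as `<\/script>`):

```python
def escape_script_content(js_source):
    """Illustration only: replace a literal '</script>' inside inline JS
    with '<\\/script>' so the HTML parser doesn't end the element early."""
    return js_source.replace("</script>", "<\\/script>")
```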
-Add support for Above the fold scripts execution.
-Move the code in deferJs, so that the costly operation of setting attribute is
done only when needed.
Fix weird bug where variable lists and timestamp lists would sometimes have different lengths.
Force variables to have values at every timestamp. If they did not have any stored in the log, return 0.
+Fix accidental rollback.
This just plumbs the nonce generator into the rest of our infrastructure; I
decided to split off the actual *use* of the nonce generator in beaconing into a
separate revision for your sanity and mine. This one touches a lot of files
fairly minimally; the next will touch a few files but make some largish changes
(especially in tests).
This removes the weird split between ServerContext and Resource here, and removes the
need for the cache code to be aware of non-cache-using things like data: and LoadFromFile resources.
This new class will also be a target for factoring out code from UrlInputResource,
so another subclass, FontApiResource, can share most of it.
Unfortunately, we can't quite remove Resource::UsesHttpCache, since it has external uses,
but at least it lets me remove Resource::rewrite_options().
JS is enabled. This CL fixes the recently found chrome27 issue where
Image.src() does not apply in the <head> portion of HTML. (mpalem)
Fix the valgrind failure caused by a recent change by copying the rewrite
option to a member variable. (mpalem)
the request header, we add a referer field in the request header.
Also we do not add any referer for any original html request. (hujie)
PurgeSet: Enforce monotonically increasing times coming in from the API (jmarantz)
Using Rendered Dimensions Filter to resize images. (poojatandon)
fetchers that we do use to the asynchronous one, and removing redundant
and unused ones. In some cases the remaining implementation needed to
pickup a few fixes from the removed one.
Log image size for non-background rewrites too.
Save the size in the metadata for doing the logging.
Also log low res image type.
image_rewrite_filter.cc has the functional change.
Rest are just for testing it.
Fixed bug in size of option_id_to_property_array_: it was sized after
the base properties were added, to all_properties_->size(), but more
properties could later be added by subclasses.
Addresses Issue 707 in which filters that returned
kTooBusy had their RewriteContext::Render() method called, causing
empty image URLs to be returned.
With this change, if a context was too busy to rewrite then it (and
its parents!) will not render. This means that if just one image of a
CSS file fails to rewrite, then the CSS as a whole will not rewrite.
creating/destroying nodes.
Effect on the speed bench (with std::map --- other implementations
are likely to be improved more) is ~5% speed up of LRUReplaceSameValue,
and ~10% speed up of LRUGets
Tweaks to lazyload images -
1) Set the buffer to 200 pixels after onload, so that images just below the
fold are pre-loaded.
2) Attach the onscroll handler even in the onload setting, so scrolls before
onload trigger image loads.
3) Remove the set timeout on onload since the interaction issues with inline
preview no longer exist.
Change the Fallback PropertyPage suffix for query params and base
path. Update kStatusCodePropertyName and
kIsLazyloadScriptInsertedPropertyName in fallback property page also. (pulkitg)
Moving definition of "status" into the "#if SERF_HTTPS_FETCHING" block because
SERF_HTTPS_FETCHING is going to be set to 0 in ngx_pagespeed for now, and we don't want
"status" to become an unused variable during compilation.
Use the latest WebP in mod_pagespeed.
Change by slamm:
critical_css_beacon_filter:
Avoid adding empty value (becomes undefined in JavaScript)
for CSS file with no selectors.
Add test for a CSS file with no selectors.
Refactor tests to eliminate some boilerplate.
Update CSS parser
2) Added RequestContext::TimingInfo::{FirstByteReturned,GetTimeToFirstByte}
Replace remaining usages of LoggingInfo::timing_info with RequestContext::TimingInfo.
3) Moving of shared_mem* files to new location.
RewriteOptions have been added to indicate whether the feature is enabled or not, how much rewriting (before the response is served out) qualifies an entry for being retained in the cache, and the path to be used for cache purging. If the feature is enabled, if there is significant rewriting remaining to be done (after the response is served out), we detect this case in ReleaseRewriteDriver and then initiate a cache purge. The cache purge in the nginx case is a loopback request that hits the same server (which has a configured proxy_cache).
Some obvious TODOs for future CLs: Support cache purging variations with other pagespeed-internal caches, and other external caching layers (varnish) in a generic manner. Fine tune the "rewritten-percentage" computation.
Removed the SetRequestHeaders call from ProxyFetch constructor to avoid
race conditions in request_headers_ usage.
Introduced a check for request_headers_ being NULL when SetRequestHeaders
is called, so that we know if there are other race conditions in our code.
2. Fix pagespeed_automatic_smoke_test w/clean make tree (Makefile).
3. When doing a blocking rewrite, wait for pending events too.
moving the parser, except for two files --- charset_util and
countdown_timer --- that are not actually needed at this level,
and which got pushed up to :util.
Also drop the unneeded dependency of util_core on protobufs.
Removed several calls in our code which were basically completely redundant: IsCacheable() && IsProxyCacheable().
Annotated with TODOs several places where we are currently using IsPrivateCacheable(), but should be using IsProxyCacheable().
Flushing critical CSS early as base64 encoded HTML caused page bloat.
Fixing it here by flushing critical css rules as innerHTML inside
a script element that is flushed early. This results in better compression.
Change by ksimbili:
Move mod_pagespeed_console_test away from some deprecated APIs.
- Add time to first paint to the beacon.
Change by ksimbili:
- js_defer: The change to pass the event object
to the event handlers for the events that we
fire(DOMContentLoaded, load).
From http://git.chromium.org/gitweb/?p=chromium/deps/openssl.git;a=summary
before: 2013-03-26 digit@chromium.org Restore the x509_hash_name_alpgorithm_change patch.
after: 2013-01-19 tfarina@chromium.org Remove <(library) usage from openssl.gyp.
The 2013-03-26 version results in valgrind failures on 7/10 runs:
valgrind --leak-check=full \
.../src/out/Debug/mod_pagespeed_test \
--gtest_filter="Serf*Goog*"
Rolling back to the 2013-01-19 version results in all valgrind tests working.
directly to pagespeed/. This is in preparation of moving
this directory, as it makes it much easier to see what non-moved
dependencies it still has (and will make it not refer to net/instaweb
immediately after the move).
I did not re-alphabetize mixtures of net/instaweb/util/public
and pagespeed/kernel/base includes since I expect the
former to convert to latter.
we always want to be threaded these days, and accordingly
adjust the Serf fetcher tests to not try to use the non-threaded path.
This required making sure that stats get adjusted before Done() is called.
The fetcher code itself isn't yet simplified, however.
until the DOM content is loaded. This delays the download of the scripts
to a later point as well. Instead, try to download a predetermined
number of scripts early (currently 3) but defer execution.
wind up looking at the rewrite_options_ after they are freed.
Remove the RewriteDriver parent_ pointer and the options() indirection
as it appears we have no such guarantee that parent drivers outlive
their children, particularly with freshening.
Rollback attempt to fix some double-load of CSS in flush early filter,
as it has issues with regression where there is no start head, but
there are link tags in prehead.
just set the options correctly in tests that used it.
Also kill RewriteDriver::mutable_options() which was only
used to implement set_rewrite_deadline_ms
child drivers can instantiate Resource objects that need to continue
to have valid rewrite_options() objects after the child drivers have
been retired.
(I want the methods that actually deal with elements to be accurately
identified since some of them will need to deal with fully parsed
conditional comments as well).
First step in making these interfaces more modular; it also helps make sense of the TEST_Fs, which are used only to test the logger without needing the rest of the interface.
deep-copy them for every level of .htaccess.
Benchmarking indicates that this more than doubles the speed of
option-merging when only one of the sides of the merge has a non-empty
DomainLawyer.
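The fast path behind that speedup can be sketched roughly like this (a hypothetical simplification, not the real DomainLawyer merge logic): when one side of the merge is empty, share the existing object instead of deep-copying it.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DomainLawyer: just a bag of domain mappings.
struct Lawyer {
  std::vector<std::string> domains;
  bool empty() const { return domains.empty(); }
};

// Options holds its lawyer behind a shared pointer so merging with an empty
// side can share storage instead of deep-copying on every .htaccess level.
struct Options {
  std::shared_ptr<const Lawyer> lawyer = std::make_shared<Lawyer>();

  void Merge(const Options& other) {
    if (other.lawyer->empty()) {
      return;  // Nothing to merge; keep our (possibly shared) copy.
    }
    if (lawyer->empty()) {
      lawyer = other.lawyer;  // Share the pointer; no deep copy.
      return;
    }
    // Slow path: both sides non-empty, so do a real deep merge.
    auto merged = std::make_shared<Lawyer>(*lawyer);
    merged->domains.insert(merged->domains.end(),
                           other.lawyer->domains.begin(),
                           other.lawyer->domains.end());
    lawyer = std::move(merged);
  }
};
```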
new class CachingHeaders which refactors the implementation in
PageSpeed Insights in a way that it can be efficiently used in both places. (jmarantz)
When an image is rewritten by the CSS image rewriter, the context info
does NOT include all the flags that are needed to make the image
rewriting decisions. Fix this by using the same path to set the resource
context as the regular image rewriting. (mpalem)
Clean up a couple TODOs in critical images finder.
* Modifies UpdateCriticalImagesCacheEntry to not take ownership of the StringSets it is passed, matching behavior of critical selector finder.
* Rename critical_image_set to html_critical_image_set.
Change by gee:
Add implementation for MemberCallback_1_1.
Change by morlovich:
Implement FinishParse in terms of FinishParseAsync, to remove
code dupe (and differences --- sync version wasn't updating a stat!).
More integrated distributed rewrite tests. Instead of mock fetches
from the distributed task the distributed task is run using a test
fetcher that calls RewriteTestBase's other_driver()->FetchResource().
This also lets us test the shared cache interaction between the ingress
and rewrite tasks.
Move img tag counting from logging_html_filter to a new dom_stats_filter.
Also add counting of external css, script nodes and the number of critical images
used on the page.
earlier code which was inserting the javascript code multiple times in the
page, which ended up clearing the state of the map, and caused us to not find
the high_res_src in the map, leading to blank images.
- Get rid of the experimental mode. We now have 3 modes
1) For desktop - we rewrite inplace
2 & 3) For mobile - insert low res and end of flush window, and either load
high res at end of body, or load it lazily.
Also increased the default logging interval from 3s to 1 minute, so that we don't fill up the logfile quite so fast. 1 minute seems reasonable for the console use case, but we can always tweak as needed in the future.
We don't want to use RE2 for targets that Insights
wants, since RE2 isn't directly portable to Windows,
while PSI is. In practice that means we want to use
kernel.gyp:base, not kernel.gyp:util (as the latter
includes RE2) from those targets (and perhaps pull
in hashtables directly if we need them).
specialization, a curried version for member functions. This seems the
most direly needed version in PSOL, which has an explosion of pure virtual
"Callback" inner classes which need a single argument.
1) Fixing bug with flushing critical css early.
Issue: We were flushing critical css as well as the external css file.
2) Removing the small optimization which doesn't defer inline scripts
until an external script is encountered.
Reason: There are few sites where the inline scripts at the start
create a blocking external script using document.write.
Enable required filters when critical_css is enabled on mod_pagespeed. Also,
enable rewrite_css when FlattenCssImports is enabled, since the latter is
otherwise meaningless. Note that we do NOT enable filters if they've explicitly
been forbidden - so it's possible to run the critical css filter chain in
isolation (without import flattening and css rewriting) if you want to by
forbidding css rewriting.
(ie not PSS).
Fix knock-on bugs related to the treatment of missing critical image data. We
experimented with various settings (including having no data mean "everything
critical" regardless of whether data was pending or not). We are for now going
with the PSS behavior, where if critical image data just hasn't arrived yet we
assume that images are NOT critical.
enable rewrite_css when FlattenCssImports is enabled, since the latter is
otherwise meaningless. Note that we do NOT enable filters if they've explicitly
been forbidden - so it's possible to run the critical css filter chain in
isolation (without import flattening and css rewriting) if you want to by
forbidding css rewriting.
directive but with hard-wired code in instaweb_handler(). The reason
for this is so that we work even with WordPress's mod_rewrite rules
that send it into a rewrite loop unless they are disabled for the
beacon location; hard-wiring it means we don't have to ask all WP
admins to modify their pagespeed.conf.
critical selector family:
- For beacons, we don't want to include alternates in the computation
- For critical-selector, we want to omit alternates from the
critical portion, but keep them in the lazy-load portion.
Also replace just-added IsStylesheetNotAlternate with semantically clearer
IsAlternateStylesheet.
Noscript nodes are present in cache html body. Fixing the xpath
computation.
Also wildcard the ignore for build system gunk in pagespeed/,
so it doesn't have to be updated every time.
- Stop using functions in what's supposed to be plain sh code
- Pass in LASTCHANGE.in in install.gyp as well, to make that also work
from tarball build
does not report last revision of repo but rather last changed revision
in a path, which should make release building immune to changes to other
branches.
Added RequestContext::TimingInfo, which PSOL code should use to
communicate timing events. Use for fetch and request start/end events.
Change by jmaessen:
Fix webp output image quality test to use a webp-capable user agent.
Looking through the version history, it appears that this test *never*
worked as described -- none of the images on the page were being converted
to webp, and the file wildcard at the end was matching the fetched *html*
file, whose size (for these purposes) we don't care about at all.
use the IsMobile property to determine whether to use small screen
quality for image compression. (mpalem)
Put the explicit dependency on gtest.gyp back into
pagespeed_automatic_test and mod_pagespeed_test. That change caused those
test binaries to run no tests. (jmarantz)
RewriteDriver is overholding rewrite_mutex slightly less)
There were 2 kinds of failures:
- In TrimRepeatedOptimizableDelayed, whether we hit a repeated
rewrite or just 2 cache hits is non-deterministic, so relax the
test to reflect that.
- In a bunch of invalidation tests, whether the re-fetch of input
would be reported as an insert or an identical reinsert depended on
the extent to which the mock clock moved, as that could change the
Date: header. Ensure the clock moves to produce determinism.
Reduce unnecessary cohort lookup:
1. Remove blink cohort lookup for FlushEarlyFlow
2. Only do kBeaconCohort for ServerContext::HandleBeacon
Change by ksimbili:
Fixing the bug with flushing critical css early.
Issue:
<link> tags are written inside <script> tags which makes the bytes
flushed useless.
Add DCHECK on range after SharedMemHistogram is loaded from SHM.
I was getting a problem where ChildInit() was called for a stat
that we apparently hadn't set in the parent init. I think this
check will help to make sure it is noticed ASAP at least in debug.
Bringing cache html diff to the same state as in current Blink (rahulbansal)
Nomenclature changes in class FallbackPropertyPage; Added more
documentation to explain the option added for usage of fallback
properties values. (pulkitg)
Change cache ownership model so the factory/server-context etc is responsible
for deleting everything. This allows more complex sharing models without
needing to make CacheCopy, which actually had a (small) per-lookup overhead,
and another layer of annoying nested callbacks in the debugger.
version, by making sure to capture the state of CSS after the filter preceding us
has had a chance to run.
This also tightens a ridiculously long lock hold of rewrite_mutex (which had some
risks) and kills a tiny bit of dead code.
Reason: This crash happens because of a race between Collector::AddPostLookupTask() and Collector::Done(). PostLookupTasks can be called even before fallback_property_page is set.
Properly deal with CSS inside <noscript> blocks, by not keeping it in summaries,
and splitting it into separate portions in the load-everything trailer; with
the loader JS adjusted to understand that.
is to avoid calling NewRewriterInfo() except in SetRewriterLoggingStatus().
2. LogImageBackgroundRewriteActivity() still calls NewRewriterInfo() but it
should not be updating the application status anyway so that is unchanged.
3. Set status to ACTIVE in ImageRewriteFilter so that we can start using its
status counts.
creating thread-systems.
This change moves the implementation to platform.cc and contrives to
include that cc files only in testing libraries, and not in production
code.
Remove ThreadSystem::CreateThreadSystem, changing many call-sites to
reference Platform::CreateThreadSystem.
via property cache; by just rewriting in per-slot rendering. This
should make the full CSS and partial CSS use the same images
(except in weird corner cases involving on-demand reconstruction of
full CSS); but does make this a lot more sensitive to expiration of
original CSS.
Unfortunately it doesn't fully fix the <noscript>/IE-comment behavior
(which I was hoping it would), as the delayed version still doesn't
distinguish them; but Jan has suggested a solution that can be done
in a follow up.
a) Separate hashes in the cookie using '!' not ',' since our test framework
treats commas as field separators.
b) Removed the 'has_url' parameter because it turns out that CSS that
needed the pagespeed_lsc_* attributes added already had them but
they were removed when copying attributes from the link to the style
and it seems silly to delete them and re-add them (and it eliminated the code
wart that was 'has_url').
Move RewriterStats::RewriterHtmlStatus into enums.proto as
RewriterHtmlApplication::Status.
Also update pagespeed_libraries.conf, and a re-compile
of optimized version of mod_pagespeed_console.js with newer
closure compiler.
In addition I cleaned up font minifying to use CssMinify methods rather than ToString() methods and fixed a bug where we weren't setting bytes_from_original_buffer_ in copy constructor and assignment operator for Css::Value.
Fixes Issue 508.
Move RewriterInfo::RewriterApplicationStatus to new file, enums.proto,
as RewriterApplication::Status.
Change by wmatthews:
Replace calls to scoped_ptr(NULL) with calls to scoped_ptr()
Remove buggy 3-arg implementation of write_handler_response() that is only used in one place. Just used the 4-arg version there to simplify things. (sligocki)
- Don't enable ourselves if there is no critical selector info
yet.
- Make sure to ask rewrite_driver to write out the DOM cohort
if we're on.
- Keep track of media, base URL (will be needed to fix absolufication
in cases that flattening/CSS filter somehow failed).
- Encode media in the output correctly.
This speeds up a new speed-test by >2x.
Time(ns) CPU(ns) Iterations
before: BM_RewriteOptionsMergeAllDisabled 13300 13200 50000
after: BM_RewriteOptionsMergeAllDisabled 5623 5600 100000
We were setting the protocol number of the response headers to 1.1
even though the request was for 1.0. This caused Apache to chunk
encode the output when it should not have for a 1.0 request. This
caused old versions of wget that did not support chunk encoding to not
close a keep-alive connection. So now we don't set the protocol number
at all, as we have no reason to.
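The invariant the fix restores can be stated in a line or two (a sketch, not the actual header-handling code): chunked transfer-coding exists only in HTTP/1.1, so a response to an HTTP/1.0 request must never be chunked. Leaving the response protocol unset lets Apache mirror the request's version and pick legal framing.

```cpp
// Sketch: chunked transfer-coding is only legal when the request itself
// speaks HTTP/1.1 (or a later 1.x). An HTTP/1.0 client such as old wget
// cannot parse chunked bodies, which is the failure described above.
bool MayUseChunkedEncoding(int req_major, int req_minor) {
  return req_major == 1 && req_minor >= 1;
}
```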
Add ImageJpegNumProgressiveScansForSmallScreens setting
Change by jmaessen:
A sample html file for critical_css beacon testing purposes, and eventually a system test.
Change by nikhilmadan:
Fix new rewriter HTML logging to really log if logging_info() is overriden.
Change by bmcquade:
Make IsLiteralTag static since it doesn't depend on member data and I want to call it
in a context where I don't have an HtmlParse instance.
We were setting 1.1 which caused Apache to use chunked encoding, even if the client was 1.0.
- Move CacheKeySuffix implementation from CssSummarizerBase to
subclasses, so the beacon filter get the empty suffix it needs,
and have CriticalSelectorFilter provide a proper implementation
that works for it.
- Fix case where JsDetectableSelector() is empty
- Fix handling on unparseable regions.
(from morlovich)
HTML keyword analysis, offering a modest speed improvement. There are more
interesting speed improvements elsewhere in the code base. (jmarantz)
Track the number of img tags in a HTML page. (bharathbhushan)
rolling a testing cohort. Also, remove unused (and unmet) decl from
beacon_critical_images_finder.h. This looks like it was omitted from
old CL by mistake. (jmaessen)
Re-use ApacheProxyFetch for in-place-resource-optimization in mod_pagespeed. This
will help resolve a corner case where we spun printing a 'waiting' message under some
conditions. (jmarantz)
code duplication between ApacheRewriteDriverFactory and NgxRewriteDriverFactory,
and between ApacheServerContext and NgxServerContext. This should also help a
lot with future ports. Empty for now; in several followups I'll start moving
functionality in.
by delaying it until Render(); which also fixes SummariesDone not being called on
a cache hit case(!).
Add some state information it can use (which can help the main filter know
when to invalidate pcache entries), and expose it to the subclasses, tweaking
the access API to results as well, to use status rather than NULLness to denote
success/failure.
Add some +debug output.
We noticed !ie in the wild (presumably a browser hack). While fixing it I also noticed that we would parse things like: "prop: foo !important bar" as "prop:foo !important; bar", so added checks as well that each declaration explicitly ends in ';' or '}'.
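The added strictness can be sketched as a small check (hypothetical helper, not the real CSS parser code): a declaration's value must be followed, after whitespace, by ';' or '}'.

```cpp
#include <string>

// Sketch: only accept a declaration whose value is followed by ';' or '}',
// so "prop: foo !important bar" is flagged as a parse error instead of
// being silently re-serialized as "prop: foo !important; bar".
bool DeclarationTerminated(const std::string& css, size_t value_end) {
  size_t i = css.find_first_not_of(" \t\r\n", value_end);
  if (i == std::string::npos) return false;
  return css[i] == ';' || css[i] == '}';
}
```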
Track number of WebP conversion timeouts, as well as duration/count of WebP successes and failures, broken down separately by source image type and by transparency status. (vchudnov)
approach for mod_pagespeed. Likely to be wrong in some ways, but I
want us to have some common ground for how components communicate.
(So probably this should have a ServerContext hookup as well...)
selector instrumentation and for its use to produce actual critical CSS.
The data encoding is still rough, and it will likely need some refinement
to get HaveSummaries() to be actually able to render.
Don't attempt to rebuild mod_pagespeed.so in between every system
test. Just stop, wait for Apache to relinquish the port, and start. (jmarantz)
Add possibility to update a statistic from waveform. (vagababov)
memory leaks in error cases they expose, as well as the
API awkwardness of CreateShmMetadataCache.
Also add a very basic integration test for ShmCache.
the former sets the HTML element attribute as we need, whereas the latter does
not always.
Also cleanup the JS code to cater for data being missing unexpectedly, and the
C++ code to cater for an element not having been processed when we expected.
ApacheRewriteDriverFactory into a new class, SystemCaches,
so it can be re-used by ngx_pagespeed and can be unit tested.
(Tests will be in a follow up change, mostly done).
Requires distinguishing Cache misses from kRememberNotCacheable in CacheUrlAsyncFetcher response. I do this by passing back a 501 status code for cache misses. Only if that status code is seen do we record the resource.
2. Remove no-longer-used code.
3. Add config option for metadata_cache_staleness_threshold_ms.
4. Logging for decode rewritten urls filter.
5. Tweaks to Logging for LazyloadImages filter.
6. Add X-UA-Compatible header to Flush Early Flow Response Headers, so that
it doesn't break in the defer JS code for IE9.
Deprecate the use of this.nextScriptIndexInHtml_ in js_defer.js as it is no longer needed; the same functionality is achieved by opt_last_index. Each high priority and normal priority defer javascript execution will have different psa_not_processed nodes. (pulkitg)
Log more metadata cache and rewriting related fields. (sriharis)
Added an optional third argument for MapProxyDomain that rewrites optimized resources to the third argument's location (such as a CDN).
This is in response to Issue 599.
Change the delay images filter in the non inplace path to use
the image map and add each image in a separate script tag. This
is enabled by default when aggressive mobile rewriters are enabled
and the request is from a mobile user agent.
Also revert accidental premature export of cache_html
actual stop command (graceful is a restart, not stop,
and the init script doesn't support graceful-stop)
Also don't use popen -c, it's not available on Centos5.
Stopping is still flaky with this, but seems manageable
with manual intervention.
combine css, combine js, move css to head.
2. Enabling disable_javascript filter for cache html flow and pass through
case.
3. Making the strip comments filter retain the panel comment while
rewriting the cached html.
Don't spew the contents of detached RewriteContexts during the
shutdown report of leaked drivers; that seems to crash intermittently.
That needs to be investigated; left a TODO.
- Avoid temporary GoogleUrls (older gcc's view these
as a copy).
- We need to have js_minify build by the packing step,
so the packaging targets need a dependency on that.
- Don't pass in a warning flag gcc < 4.5 doesn't
understand on such versions
- Provide a wrapper script around 'ar' that can be
used on system with GNU binutils < 2.18 which do not
have thin archive support.
Fix webp conversion: +lossless fallback, -global quality defaults, +small webp tweaks
Have lossless webp fallback to other lossless conversions.
If --convert_to_webp_lossless is set, the fallback logic is now:
for pngs: webp->png->gif
for gifs, if --convert_gif_to_png is also set: webp->png->gif
No longer warn nor set defaults if global webp/jpeg quality are not set.
Small webp tweaks
Change by (anonymized internal code improver):
Fix return type of a method.
Includes small refactor for looking at stat-patterns after a warm fetch in the ipro tests.
One unresolved issue: add user-settable option for how to set the TTL on the
IPRO-rewritten resources when the origin was loaded via file.
2. Consolidate the semantics of the rewriters header so that it is always just a
reflection of the contents of the applied rewriters string in the log record.
To generate these files I used a mac running 10.8:
$ cd httpd-2.4.3/
$ sudo ln -s \
    /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain \
    /Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.8.xctoolchain
$ ./configure
This generated include/ap_config_auto.h and include/ap_config_layout.h which
I've copied here with no changes.
be set via just setting option values, with possibility
of extensibility via subclasses.
Add unit test coverage.
(Not touching enum extensibility here, but it should be mainly a matter
of virtualizing the convertors).
- Don't write a cache invalidation timestamp of '0'. The transmission of
invalidation timestamp from Statistic to Option was problematic as
stats self-init to 0 and don't have an unset case.
- Add some CHECKs for valid InputInfo timestamps on LoadFromFile resources.
- Changes by alexfh:
- With a clang fallthrough warning
- Fix the fallthrough_intended macro
we happen to have some cache data because we'll populate the FileInputResource
contents & headers, but leave the timestamp unedited.
Also put in some backup code to try to grab the timestamp on demand, but
it should never be reached.
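The sentinel problem described above can be sketched in a few lines (hypothetical names; the real option/statistic plumbing is more involved): statistics self-initialize to 0 with no distinct "unset" state, so a 0 timestamp must be treated as "never written" rather than propagated into the options.

```cpp
#include <cstdint>

// Sketch: a cache-invalidation timestamp of 0 is indistinguishable from a
// statistic that was never set, so it must not be transmitted to options.
constexpr int64_t kTimestampUnset = 0;

bool ShouldPropagateInvalidationTimestamp(int64_t stat_value) {
  return stat_value != kTimestampUnset;
}
```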
Removed inconsistent (& hard-to-search-for) virtual methods on
Resource: IsCacheable and IsCacheableTypeOfResource in favor of
UseHttpCache.
Fix a case in ReadAsync where a callback would not be called.
Also refactored the unplugged code so that we determine the server
context only if we need it and only once.
2. Documentation for the PreserveURLs flags.
mod_pagespeed and ngx_pagespeed into a separate class.
This is step 1 out of ??? of refactors to reduce code dupe
between mod_pagespeed and ngx_pagespeed and make some
things more testable.
(Step 2 will be to share some of the configuration parsing
code).
- Some refactors of blink/critical image computations
Change by jmarantz:
- Turn off 'extend_cache' for the embed-config tests. This was causing some
flakiness in the system-test looking for .ic. and .cf. resources when depending
on timing we can get .ce. resources.
In the embed_config test, wait for the actual rewrite we were looking
for. Just looking for two .pagespeed. instances can lead to flakiness
because we might successfully cache-extend but not yet have the image
rewrite done.
Make the blink diff computation condition more explicit and prevent unnecessary rewriting for change detection in cache miss case.
Also remove the use of propagate_blink_cache_deletes option -- we always do it.
Allow specifying a CacheInterface when adding cohorts to the
PropertyCache, so that different cohorts may be backed by different
cache implementations.
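A minimal sketch of that shape (hypothetical names, not the real PropertyCache API): cohorts registered with their own CacheInterface use it, and everything else falls back to the default cache.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a cache backend.
struct CacheInterface { std::string name; };

// Sketch: each cohort may be backed by its own CacheInterface; lookups
// fall back to the default cache when no per-cohort backing was supplied.
class PropertyCacheSketch {
 public:
  explicit PropertyCacheSketch(CacheInterface* default_cache)
      : default_cache_(default_cache) {}

  void AddCohortWithCache(const std::string& cohort, CacheInterface* cache) {
    cohort_caches_[cohort] = cache;
  }
  CacheInterface* CacheFor(const std::string& cohort) const {
    auto it = cohort_caches_.find(cohort);
    return it == cohort_caches_.end() ? default_cache_ : it->second;
  }

 private:
  CacheInterface* default_cache_;
  std::map<std::string, CacheInterface*> cohort_caches_;
};
```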
ModPagespeed off disables HTML rewriting but still serves .pagespeed. resources and can be turned on via query parameters. ModPagespeed unplugged returns from all Apache handlers as soon as possible, and therefore .pagespeed. resources are no longer served and query parameters are not parsed.
done on nested drivers on behalf of another rewrite, as it causes
two problems:
1) In the HTML path, we can end up with CombineJs failing to apply and
caching that it got a private input, making it thereafter not
apply for a while.
2) In the fetch path, it can cause variable names to get out of sync(!),
which is a correctness issue.
of a filter. Preserves the existing semantics of the applied_rewriter string
field, maintaining it in parallel with the rewriter_info message array.
An example of using this new structure is provided in the JavaScript filter.
Major TODOs:
1) We don't distinguish between cache misses and RememberNotCacheable in handle_as_inplace(), thus non-cacheable resources will be recorded (extra string copy) every time they are loaded in Apache (with IPRO on).
2) We should initiate IPRO rewrite immediately after storing input in cache (currently it takes 3 loads to get a resource IPRO rewritten).
making it use AtomicInt32 directly. This makes it
easier to port to other platforms.
Cleanup some of the AtomicInt32 API: making the naming
more compliant with the style guide, and make the barrier
semantics explicit. Fix a couple spots that needed barriers
as a result.
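The explicit-barrier naming can be sketched with std::atomic (a simplified model, not the actual PSOL class or its exact method set): relaxed ordering for pure counter updates, acquire/release where data is published between threads.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of an AtomicInt32 whose names make barrier semantics explicit:
// NoBarrier* operations use relaxed ordering (fine for counters), while
// value()/set_value() act as acquire/release barriers for call sites that
// publish data between threads and therefore need ordering guarantees.
class AtomicInt32 {
 public:
  explicit AtomicInt32(int32_t v = 0) : value_(v) {}

  int32_t value() const { return value_.load(std::memory_order_acquire); }
  void set_value(int32_t v) { value_.store(v, std::memory_order_release); }

  // Returns the post-increment value.
  int32_t NoBarrierIncrement(int32_t amount) {
    return value_.fetch_add(amount, std::memory_order_relaxed) + amount;
  }

 private:
  std::atomic<int32_t> value_;
};
```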
leaked driver was:
http://domain_hyperlinks_off.example.com/mod_pagespeed_test/rewrite_domains.html,
which had a detached rewrite with partition key
rname/ce_1jfHXnFYzU_EA9cv6iin/http://src.example.com/image.png@@_
Of course there is no such thing as src.example.com/image.png but we were
trying to fetch it from the live internet and failing. Based on the
normally-stable behavior of our system I think we were doing that
properly, but we must have been doing that too slowly on some test
machines, and therefore leaking the driver due to our time-bound
shutdown flow. So I added a debug.conf.template change to PassThrough
mode to avoid attempting to fetch the resource in the first place.
Also fixes races because we were not setting psatest as the blocking-rewrite key
for some VirtualHosts that need it.
which take into account both query-params & the spdy-ness of a
request.
Add using_spdy to RequestContext, and add request_rec* to a new
ApacheRequestContext subclass so we don't have to squirrel that around
as much.
This is all supportive to a later change to centralizing handling of
query-param processing and enable us to use
ResourceFetch::BlockingFetch in unit-tests to get it to better
correlate with the way things work in mod_pagespeed.
trouble with parallel builds. We need any included protos
to be compiled in advance as the build target also does the
sed'ing of the .proto -- and the processed result in genfiles/
is what's needed for the include, not the original.
2. Add mod_pagespeed_prefetch_start script for the cases where no resources
get flushed early.
3. Issue 582: do nothing in our Apache map-to-storage handler if we are in
proxy mode as it causes all requests for pagespeed URLs to fail with a 403
Forbidden status.
Add logging for property cache lookup information. This includes whether each
each cohort was a hit/miss and the properties found while looking up cohorts.
backward compatible for Legacy decoder to understand Non-legacy encoding
and Non-Legacy decoder to understand the Legacy encoding.
2. Add User Agent related information to Metadata cache keys.
3. Fix portability problems in hostname_util.cc.
Add HTTPS fetching support in Serf. Note that the Serf support is
dependent on OpenSSL, which is large and has a complex license. We do
not include it in our open-source distribution by default, and it must
be compiled from source after some small patches to source & config
files. (jmarantz)
The main change is to add a DCHECK that the request_context is non-NULL in the HTTP cache callback (SetTimingMs())
The rest of the CL fixes tests by synthesizing test request contexts or propagating request contexts as appropriate.
fetch_until $URL
wget $URL > foo
to
fetch_until -save $URL
I have only seen a couple of these flake, but decided to fix them all at once. Also did a lot more cleanup in system tests like consistently cleaning up the OUTDIR before each fetch_until to avoid getting index.html.1 files and "set -u" to avoid really annoying failure modes from typos in variable names.
The directives now work by forbidding filters that modify URLs. Image, CSS, and javascript minification still take place, but they are prevented from rewriting the URL.
xpaths. Do the same in split_html filter while matching xpaths. (bharathbhushan)
Fix memory-leak in in-place resource optimization system test (jmarantz)
We currently add pagespeed_low_res_src (and increment num_inline_preview_images)
for anything that belongs to the Image semantic category and this might be a
link href in some extreme cases.
This change fixes the bug that occurs due to the mismatch in expected number of
inline_preview_images and the number of low res images acted upon by delay_image_filter.
Previously, in such cases, we used to end up placing the delay-images-js and the low-res-data-maps
only at the end of the body and not at the end of every flush window.
Response headers containing no-transform on non-html resources will be obeyed. The resources may still be cached, but they won't be optimized. This addresses issue 587.
This should fix periodic failures of blink_flow_critical_line_test
and server_context_test, though unfortunately does mean we lose a
DCheck in LogRecord::log_system()
Also add AbstractMutex::DCheckUnlocked() and implement it in the
debugging CheckingThreadSystem::Mutex wrapper, to cleanup
the test in ~LogRecord.
back along with a cache miss. The intent of this change is to add
monitoring to investigate whether a better cache "freshening" policy would
increase the cache hit rate. (gee)
Remove non-compliant optional-args and non-const last-args. (jmarantz)
no longer just block whenever invoked!), though the
thread/queue setup is very coarse.
Also fix a dangling memory access to message body --- the
Apache -> MPS filter was getting the AsyncFetch deleted by
calling ::Done a bit earlier than MPS -> Apache filter
was run for the last time and was accessing that.
convert_jpeg_to_progressive, now that it's in the core set.
Changed some 'furious' experiment test infrastructure, which was
exploiting the difference between explicitly setting progressive-jpeg
and not, to explicitly *disable* progressive_jpeg in the opposite
experiment so the test results come out the same.
Do not lazyload images inside marquee tags since they scroll and hence the
position determined may be wrong.
Handle a couple of TODOs for split_html.
Moving sync creation before the critical section in Detach so that it is
created before the collector is deleted.
A) Our guess of when to use quirks mode is != to the browser's, and if we "fix" something from quirks-mode format to actual CSS format, we could change the look of the site (oy) and
B) Our logic for when we decide to use quirks-mode isn't even consistent. We do it for CSS found in HTML, but if we get a request for the same CSS from the same UA, but don't have the result in cache, we will redo the parse, this time in quirks-mode (just to be safe).
I think the only reason that we originally used quirks-mode parsing was that it allowed us to parse more stylesheets, but now with "unparsed section" handling, we can still parse these stylesheets, just ignoring the quirky sections.
Always send blink js in split instead of falling back to deferjs-js when xpaths are absent.
Support logging for split_html rewriter.
This fixes broken pages on WPT benchmarks when critical line info was
absent but critical images are present. It might fix other issues too. (bharathbhushan)
In sites like www.lifed.com, ad iframes are loaded from the same domain (lifed)
and our addition of the GA snippet (in case of live experiments) causes extra pageviews
to be recorded for these pages, causing analytics data to be distorted. (anupama)
Run 'apache_release_test' without symlinks so we see whether testdata
is complete. (jmarantz)
Don't fail AprMemCache* tests when memcached is not running and the
port is unspecified. (jmarantz)
Image doesn't get rewritten when we turn it off with headers.
wget: missing URL
Usage: wget [OPTION]... [URL]...
Try `wget --help' for more options.
install/system_test.sh: line 103: --header=ModPagespeedFilters:-rewrite_images: command not found
Add the following blink timing metrics to our advanced beacon (with the mentioned query parameters):
CRITICAL_DATA_RECEIVED (b_cdr)
CRITICAL_IMAGES_INLINED (b_cii)
CRITICAL_IMAGES_LOW_RES_LOADED (b_cilrl)
CRITICAL_IMAGES_HI_RES_LOADED (b_cihrl)
NON_CACHEABLE_DATA_LOADED (b_ncdl)
NON_CRITICAL_LOADED (b_ncl)
Also, change the recorded start time to use window.performance.timing.navigationStart whenever available.
This directive provides control over how long mod_pagespeed will wait to optimize a resource per flush window before putting the optimization task in the background (for future requests) and returning the unoptimized resource to the user.
deliver an https resource with a fetcher that doesn't do https.
It used to basically just drop the request on the floor, not calling
the callback. This resulted in 5-second freezes until timeout and a
leaked rewrite driver on apache_debug_smoke_test with the not-quite-
done-yet change to add the feature to mod_pagespeed.
For example: #MyForm\.myfield { ... } should not be re-serialized as #MyForm.myfield { ... } (which has different semantic meaning).
I also separated identifier escaping from string escaping to avoid making that so strict.
Make SimpleSelector::ToString() always return an escaped identifier and don't double escape it.
Fixes issue 574.
prevents filters from being turned on via query parameters or headers
if the filter is not already enabled. This allows site admins to
lock off filters that they never want to run, for example for
performance or security reasons.
comments. Note that we already had a filter named 'debug' but it
only enabled debugging features in other filters.
Adjust the testing of css_combine_filter's debug mode and fix its
custom Flush messages (which were wrong at end-of-doc).
tests in mock_timer_test.cc into mock_scheduler_test.
Make mock_time_cache always-active in TestRewriteDriver, though it's
dormant unless awakened by RewriteTestBase::SetCacheDelayUs.
Remove direct access to the MockTimer, as time should now only be
adjusted via the scheduler. Add helper methods to advance time in the
scheduler to RewriteTestBase and TestRewriteDriverFactory.
"testing" with a MapOriginDomain http://localhost:8080https://localhost:8443!)
- It's insufficient to check whether connections are using SSL, since
with an actual SPDY connection the slave connections won't be (but they'll still
have handy SPDY version info).
- Teach LoopbackRouterFetcher to deal with different schemes, now that
https is a possibility. Also run it after mod_spdy_fetcher, since we don't
want to be messing with the hostname, and mod_spdy_fetcher has its own
"talk to ourselves!" mechanism.
- Produce the EOS bucket separately: batching it with the data seems to
upset the header reader in Apache, and doing them together is problematic in
cases where the last data read is speculative.
- Images are flushed using an inline script, and this inline script is the first thing in the flushed head section so that it will not get blocked by other resources.
- Scripts will be flushed only if defer_javascript is disabled.
StringPiece::ToString(), and StringPiece::as_string() made in
contexts where it is more efficient to use the original string
directly.
Factor out some stats collection hacks into a helper function. Add
a TEST: message where one was missing. (jmarantz)
element. This is needed for a subsequent change, which moves the
deferJs/blinkJs code to the bottom of the HTML rather than into the <head>. (ksimbili)
Make cache_url_async_fetcher_test.cc compile on g++ 4.1 by avoiding the passing
of empty-constructed class instances with private copy-ctors.
downgrade "shutting down child" from Error to Info so this will happen
silently.
Address http://code.google.com/p/modpagespeed/issues/detail?id=568 --
Keep track of which process first noticed a cache.flush and only
print the message once.
(r161115), but don't flip over the switch yet, as we also need libpagespeed
to be updated, which it should be able to do after this is in.
The bulk of the change is in needing to indirect to scoped_ptr.h as it's
now in a new place, plus a version check for the stl_util.h rename and
some build system adjustments, the most notable of which is passing
the Chromium version from gclient to gyp to a #define.
Most of the work for this change was done by Libo Song (lsong@)
to mod_rewrite mangling. Rather than looking at the path in the request
(which might be affected by mod_rewrite), we look at the full URL we
stored in the note, which we must then decompose with GoogleUrl. To
avoid adding an extra url-parsing step, we do that only in the logic
that handles the url from the note after we failed to parse it as a
.pagespeed. resource. (jmarantz)
Update RewriteDriver method documentation to reflect reality (matterbury)
These parameters prevent filters from modifying URLs when rewriting
HTML documents. The filters are loaded into memory but not applied in
the HTML rewriting path.
merging a few test-classes together that only added helper methods.
Finish moving the non-trivial RewriteContextTestBase methods into the
cc file. (jmarantz)
Outlining defer_javascript code which will be served from the server
or GStatic (ksimbili)
- Moving GA js snippet to end of body and simplifying our furious snippet to not carry anything other than the setSiteSpeedSampleRate call and the custom variable setting for denoting the experiment.
Clean()ing up the driver while we were still waiting for it.
This in particular resulted in check failures of waiting != kNoWait
in CheckForCompletionAsync during load test w/worker MPM, as
a 2nd resource request would begin waiting for completion of a RewriteDriver
released by ResourceFetch of the first one, while the first one was
still waiting on it.
Also make sure to capture the cleanup bool in FetchCompleteImpl
properly within the critical section.
This is similar to how mod_pagespeed parses query_parameters and request headers for options. The major difference is that response headers are only parsed for HTML pages, not resources.
Fixes issue 160.
operation by later filters. Update existing filters with that
functionality, and add tests to show that we don't half-do cache
extending pointlessly like we used to before this change.
Also fix the cache extension count stat; it was counting up
when -any- filter optimized the slot, not just the cache extender.
(arguably justifiable, but less so given filter ID logging).
This does not yet attempt to deal with the case where we actually
want to run a filter even if some of its input slots are marked
as unsafe for further processing. (Will be needed for JS combiner
when library identification is on).
timescale of health checks to 30 seconds, and to revert the squelching of
log messages. While the log messages may eventually fill disks, at
~5 messages every 30 seconds that would take a long time, and it's very
nice to have the messages for their forensic value.
so we get a consistent view.
In most cases the canceling of RewriteContexts is done centrally in
RewriteDriver, deleting the contexts rather than adding them to the
queue. The exception is CssFilter, which holds onto a created
RewriteContext after attempting to initiate it, so in that case we
must proactively avoid adding the filter.
In all cases we unit-test the filters both for HTML-Rewrite & Fetch
with unhealthy caches.
Added an id->enum lookup function for use by forthcoming directives.
2. More system-test tweaks to get better diags info.
3. Another attempt at making QueuedWorkerPoolTest less racy.
attributes (mtime timestamp + contents hash) in the metadata cache with
a key that includes the server hostname and the file path. This way we
don't have to configure anything special, and different hosts don't
stomp on each other's attributes. This works whether we're using
memcached or the file system for the metadata cache, so there's no need
for configuration or enabling.
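The keying scheme above can be sketched as follows (the "host:path" layout and the helper name are assumptions for illustration, not the real C++ format):

```shell
#!/bin/sh
# Hypothetical sketch of the per-host file-attribute cache key.
# Including the server hostname in the key means two servers sharing
# one memcached never stomp on each other's entries for the same path.
file_attr_key() {
  host="$1"; path="$2"
  printf '%s:%s\n' "$host" "$path"
}

file_attr_key web1 /var/www/style.css   # -> web1:/var/www/style.css
file_attr_key web2 /var/www/style.css   # -> web2:/var/www/style.css
```

The value stored under such a key would be the mtime-plus-contents-hash pair described above.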
the implementation from AprFileSystem). Make it testable in open-source.
Also make StdioFileSystem the default for Apache. The Apr file system
accumulates (effectively leaking) memory into a pool, and fixing it would require
allocating a pointless temporary pool on just about every operation on top of its
existing wasteful lock acquisitions, making it considerably less efficient than just
asking the OS.
(The bulk of the change is just injecting a new constructor argument to
StdioFileSystem all over the place).
But make sure that we can still reconstruct rewritten resources when
they are fetched, even though we will have to do the fetch from
origin & re-run the optimization each time.
Also adds a CacheInterface::ShutDown() method, though it's not used yet. This will be needed soon.
method for determining health.
Use a thread for all cache operations in AsyncCache, and simplify that
class and its testing to allow only one concurrent thread.
up as a crash during a load test. buffer_ was checked at the top of
ApacheMessageHandler::MessageVImpl but used later on without a lock.
Restore change 2090 which had accidentally gotten squashed:
Unbreak the 2-server setup of load tests (morlovich)
Switch from 'check' to 'check_from' in a few places to
help debug some breakage. Fix that breakage -- In Apache 2.4 we are seeing
W/"0-gzip" as the etag. (jmarantz)
context. Also refactor mod_instaweb.cc to use GetQueryOptions from
ServerContext, and move the XmlHttpRequest check to request_headers. (sriharis)
Make lazyload not load the images if they are hidden. Load them after onload. (ksimbili)
style= attributes (issue 535).
This changes how we construct the CacheKeySuffix(), which is used to
deal with the possibility of relative URLs inside inline CSS getting resolved
differently in different contexts:
- We don't even need the suffix if there is no url() in the style
attr.
- If there is a url(), use the absolutified version of the style=
as the disambiguator, rather than the directory itself. This way, absolute
URLs, or relative URLs that resolve to the same thing, will end up sharing
a cache key.
the 'real' and the 'other' driver in this class: added a SwapServer
method to swap between the many 'real' and 'other' variables, so that
we can fetch using either driver.
Previously: <link rel=stylesheet href=... title=foo> -> <style>...</style>
Now: We do not inline that style because it has unexpected attributes.
Other possibilities might be to transfer the extra attributes to the <style> tag, but I don't think we can do that for all of them (for example title seems to only be talked about for external CSS).
Exposed by testing for issue 523.
two arguments, as it makes it easier to deal with initialization order
dependencies. This makes it possible for Srihari not to have to jump
through hoops in the test portion of his L1-metadata shadowing
fix.
Remove redundant string conversions: calls of string::c_str(), StringPiece::as_string() (qrczak)
Fix ajax rewriting always retrying and recomputing the result
when sharding is on, by centralizing the domain-lawyer based
normalization logic, and fixing a lookup in RewriteContext to use it. (morlovich)
Get is guaranteed to return AFTER the callback has been invoked.
b) Added CacheInterface::IsMachineLocal so that we can tell if the
cache is local (private) to the server [well, as best we can tell;
with some implementations like AprMemCache it can be difficult or
impossible to stop others from using your cache].
The pedantic filter is used to help prevent pagespeed from breaking HTML4 compliance.
Specifically, it adds default type attributes to script and style tags where they have been
omitted.
and <ModPagespeedIf !spdy> actually work.
Also make it possible to turn on spdy-configuration with a header,
X-PSA-Optimize-For-SPDY, so that people who use something other than
mod_spdy to terminate SPDY may be able to use this, and so I can test
this without requiring hard-to-build tools like spdycat.
This is the code part of issue 450.
IsAnyFilterRequiringScriptExecutionEnabled() to not use pointers to
member-functions. (rmathew)
Rollback parallel download of js files in Firefox and IE using iframe
technique. (nikhilmadan)
that the same memcached instance can be used with different file-cache
backing store. This also mates the CacheBatcher with the
memcached-connection, so we can batch requests across different
vhosts.
Use the new key_value_codec and FallbackCache infrastructure in util
rather than having that functionality embedded in the memcache
interface code.
Eliminate apr_mem_cache layer, although its unit-test still lives.
After this CL a follow-up will 'g4 mv' apr_mem_cache_servers.* to
apr_mem_cache.*, but this incremental step will be easier to review.
Add a new fallback_cache implementation, abstracting out into a re-usable utility class
some of the infrastructure used to route large cache Puts away from memcached and to
the local file system.
2. Added a new test script for blink.
3. Fix a bug in the callback for blink cache delete propagation.
4. Update change detection hash in logging mode.
standard cdn. Right now only a sample configuration is included; we are working
on documentation and a more thorough default configuration that can redirect
all libraries hosted by ajax.googleapis.com.
RefCountedObj inheritance to containment to rein in direct uses of
the underlying GoogleString*. All the ugly behavior still exists
for performance reasons, but is a bit more structured in the way it's
called.
- Remove a comment that wasn't meant to be left in
(but just to spark discussion during codereview)
- Extract out the statistics + console handlers into
own routines.
and construct special spdy-specific configuration object.
(Not actually used during rewriting yet, however)
Also adds ?config and ?spdy_config modes for mod_pagespeed_statistics,
as they are needed for testing.
2. Flush blink JS early if split_html is enabled.
3. Refactored ParseSingleImport into a new ParseNextImport function and
reimplemented the former using the latter. Required for coming change.
- Fixing a bunch of integration issues:
- Making panel_loader.js work with Split.
- Correcting the initializations of panelLoader.
- Cleanup: Remove panel_config.proto.
already set up when actual request comes. (pulkitg)
Added a filter which counts the number of rewritten resources on the
page and writes the count to the property cache. This filter should
be the first post-render filter, and can be used for further scanning
of rewritten content.
Remove html_parser_types.h as it has proven IWYU-unfriendly.
Setting charset from FlushEarlyInfoFinder if it is not set already. (mmohabey)
Add an experimental filter for applying smart-diff. Adding smart-diff
to the diff detection flow, only in logging mode. (guptaa)
Logging of extend_cache_*, *_combine, and minification filters for css and js. (anupama)
2. Resolve a character-set encoding issue with blink and IE9 with ISO-8859-1
content: Strip out "content-type" META tags from original HTML, if any;
insert an appropriate one. Send a UTF-8 BOM before the HTML content so that
IE9 recognizes it as UTF-8.
3. Enable Lazyload images for blink.
4. Avoid pages going to quirks mode in IE if deferJs is enabled.
5. Fixing parallel downloads with firefox and IE.
6. Resolve issues with Blink and IE9. Fix below-the-fold table-layout in IE
using a workaround. Undefine document.all in IE during DeferJs.
This adds no new functionality, but enforces a stack discipline for the
explicit initialization of this data.
This is in preparation for a future change where the static parts of Options will
be moved into static structures so they don't have to be copied every time
RewriteOptions are created, which can occur multiple times per request.
- User Agent field added to the Blink Info proto.
Now, User Agent supports WhiteList and BlackList determination too.
(poojatandon, snagori)
- Fix lazyload images for IE in quirks mode. (nikhilmadan)
2. New version of the mod_pagespeed console.
3. If the flush-subresources filter is enabled, the DNS prefetch filter will
flush the DNS prefetch calls in the flush early flow.
Return the original size of an optimized resource in the
X-Original-Content-Length header. (mdw)
Rename ResourceManagerTestBase -> RewriteTestBase. (piatek)
Moving furious-related "need_to_store_experiment_data" from
RewriteDriver to RewriteOptions. (Since the furious id is in
RewriteOptions anyway, adding this related info here should be fine.) (anupama)
Add flag to run html diff detection in logging mode - in this mode,
diff is computed, but if a mismatch is found, critical line is not
recomputed. This mode is used only for logging purposes. (guptaa)
vector, which saves exactly 2 pointers worth of storage per
element (as well as an allocation). This results in ~7.1% memory
savings on single-shot parsing of the giant HTML Nikhil
extracted (1007.7MiB -> 936.0MiB)
(real numbers likely smaller due to shedding stuff on flush, and
less pathological HTML)
to infinities, add two extra buckets.
This is because extending the first bucket from [0, ...)
(a common case) to (-inf, ...) made it impossible to estimate
the median if it fell there, which is common for some timing
diagrams.
In Apache, we can have distinct caches per VirtualHost
(ResourceManager), so all of the caches structures currently
owned and referenced in factories should really be held in the
ResourceManager instead.
This was earlier dealt with ad hoc, with redundant pointers and
assertions, and had grown unmaintainable.
This CL enables the proper management of memcached connections, per vhost,
and independent from the file-caches.
I believe it also makes it easier to adopt property-cache capabilities
in mod_pagespeed properly, though that was not the original intent of
this change.
SetSuggestedNumBuckets, since I want to make them
have different values in ShmStatistics, and as I feel
that MaxBuckets doesn't clearly communicate what it does.
Also some misc. cleanups of neighboring code and comments.
2. More flexible logging.
3. Quote url= in noscript redirect.
4. Making the content type charset meta tag detection logic common between
meta_tag_filter and suppress_prehead_filter.
5. Provide the ability to directly set url of html slots as well as css slots.
empty and dns prefetch info does not get populated, so the Insert Dns Prefetch filter
never gets applied. Moved the SaveOriginalResponseHeader() call to
suppress_prehead_filter's EndDocument. (pulkitg)
- Currently, the defer_js, lazy_load and inline_preview_images filters remove the src,
so the insert dns filter does not get applied to those domains.
Moved insert_dns_filter above all post-render filters. (pulkitg)
- Fix wget version check to accept versions > 1.12 as well (jud)
2. Copy Google's favicon.ico and logo.gif to www.modpagespeed.com/do_not_modify/
with content-hash so that we can make sure they do not change and can be
used in serf_url_async_fetcher_test in perpetuity.
3. Add a utility method for getting an attribute's escaped value from
an element.
Decoding the url fails with urls like 1.js?a#12296;=en
Now we don't decode the url in the src attribute; we just change the
attribute name to orig_src. (ksimbili)
Rollback updates to suppress_prehead_filter.cc pending further
investigation (morlovich)
Better URL sanitization. (jefftk)
(instead of ?ModPagespeed=off) that disables js-inserting filters and
inserts a <link rel=canonical href=original_url> in the head. This also
sets the stage for using cookies to prevent redirects on every request. (sriharis)
+others
Disable lazyload if the browser doesn't support inlining. Also, don't call ComputeCriticalImages if the user agent doesn't support inlining. (nikhilmadan)
Add an option to control the maximum size of an image in CSS that is converted to webp. (nikhilmadan)
Add new flush early tests to test.gyp. Missed this in the previous CL. (nikhilmadan)
a default implementation, which is to loop over the entries
and call Get, but it also has a more interesting implementation
for memcached, which supports multiget.
to view global stats if per-VHost is on. Fix HTML timing histogram.
Add some rudimentary testing for the per-vhost stats flag.
(Fetcher and cache stats still global only).
flow tests independent of browsers by introducing dummy user agent strings.
(mmohabey)
- Flush all the css in the html early with flush subresources filter instead of flushing css only in the head.
(pulkitg)
ModPagespeedUsePerVHostStatistics. Setup works, and mod_pagespeed_statistics
prints the local data. Some stats are global-only for now, however,
and there is no way of viewing global stats if per-vhost is on
for this revision.
subresources early is enabled. (ksimbili)
- Don't update the kSubresourcesPropertyName value in the property cache if the user
agent does not support flush_subresources. (pulkitg)
- blink: Splitting the StripNonCacheableFilter into 2 separate filters.
This is being done since just the stripping of non cacheable content is
required for the HTML diff detection part, but other things being done in
this filter are not. (guptaa)
- Blink logging improvements (poojatandon)
- Disable insert_dns_prefetch for non-chrome browsers.
The interaction between insert_dns_prefetch, which applies to all browsers, and
collect_subresources, which applies only to chrome, was causing FSE to not
apply on WPT benchmark runs. (bharathbhushan)
- Use base CriticalImagesFinder by default in the resource manager instead of NULL pointer.
(jud)
- Suppressing a LOG(DFATAL) in blink flow, since the new html diff detection feature has scripts
which are not deferred. (guptaa)
- Remove stale dependency in automatic/Makefile; it's gone since we now use
our own url_to_filename_encoder.
- Do not report timings beacon in blink when doing testing, for better
determinism. (rmathew)
- Introduce a flag to override the cache-time for cacheable resources in
Blink. (rmathew)
- js_defer: Moving the following code out of experimental:
* Parallel download of scripts.
* Also not deferring inline scripts until a script with src is found.
(rmathew)
This was originally added when the parser was young and had blanked a few CSS files simply because it could not parse them. Now it has grown to the point where I do not expect blanking the page to be a failure mode.
"‘hex_buffer[0]’ may be used uninitialized in this function"
The warning is a false positive, as hex_buffer[0] is only read in
the kFirstDigit case, and we transition to kFirstDigit only after writing
to hex_buffer[0] (but gcc can't prove that).
- blink: First attempt at automatically detecting when the publisher html has
changed. This is still experimental and the approach needs more
experimentation and iteration. (rahulbansal)
RewriteSingle().
This introduced a race condition where the RewriteDriver which was used to
fetch the resource is deleted by the time we call Render(). Thus the
RewriteOptions are also deleted, and resource->url() uses old RewriteOptions
which have already been freed.
It also fixes another race condition where SaveOriginalHeaders might be
getting modified in one thread while being copied in another.
(mmohabey)
- If the resource is cacheable, then the second request doesn't download
anything extra.
- If the resource is not cacheable, then it is better to place the
script request as soon as possible. Many ad requests are of this
kind. (ksimbili)
Only onload should be waiting for async scripts to finish; nothing
else needs to wait.
- So far we were delaying the DOMContentLoaded event, which was delaying
the browser onload for a few sites. (ksimbili)
Adding the limit on combined resource size.
This is being added to do experiments to figure out the optimal size for the combined resource. (ksimbili)
within a single process only (so not within multi-process servers
like Apache). This is mostly so that existing shared-memory histogram
implementation can be used inside tests, but could also be potentially
a useful way of bootstrapping deployments.
(framework only, no user option yet) [nikhilmadan]
- Some slight markup fixes in statistics.
- Adding gflag for report unload time. [satyanarayana]
- Ensure that the client domain rewriter script gets
appended even when body spans multiple flush windows.
[rahulbansal]
- Keep track of applied rewriters. [mmohabey]
Subchanges:
* std::string -> GoogleString in apache dir.
* #ifndef out references to LogMessageHandler which doesn't seem to work in Google3
* RewriteOptions::ToString() -> RewriteOptions::OptionsToString() because it was originally named the same as static RewriteOptions::ToString(Foo foo) methods which caused "shadowing error" when trying to compile ApacheConfig.
This CL contains the actual changes for HTTPCache invalidation of URL
patterns and also the rewrite options changes for this. I can split
the latter into a separate CL. (sriharis)
Adding a comment in rewrite_query.cc indicating that the current logic
causes css inlining-threshold to change when only
ModPagespeedImageInlineMaxBytes header is specified. (anupama)
Done by bvb working with sarahdw, divided roughly as follows:
bvb: tests, integration with Statistics constructor, SharedMemVariable changes, UpdateAndDumpIfRequired, Logger ctor/dtor, responding to CL comments
sarahdw: DumpConsoleVarsToWriter, IsIgnoredVariable
Turn memcached tests back on. apr_memcache was not
initializing properly because the thread_count being passed in
was negative. This was due to AutoDetectThreadCount not
being called until after the cache was initialized. I moved
that init to the ctor to make it work properly.
- First cut of url rewriting on the client side. (rahulbansal)
- Move the lazyload JS to before the first image instead of always putting it in
the head. The other side effect is that we don't insert the JS code when the
page doesn't have any images. Also, do stricter checking for dfcg slideshows.
(nikhilmadan).
- Initialize in ctors suspiciously uninitialized member variables. (piatek)
- Comment out memcached system test till we get to the bottom of it. (jmarantz)
- Rewrite CSS in the Characters() hook. Since we are guaranteed by HtmlParse that
we will always get only one Characters() block in a <style> element, we can
rewrite that inline script immediately.
Obviates the need for css_filter_across_flush variable, since we no longer fail
to rewrite CSS which spans across a flush boundary. (sligocki)
- Switch apache_furious_test to using fetch_until, avoiding potential future flakiness
(jefftk)
- Fixing cache_extender counter for number of cache extended resources on
a page. (anupama)
- Adding an option to disable the blink dashboard, which'll be used for
validation.
- Added a LOG(DFATAL) to find out if any script is found without
pagespeed_no_defer. (ksimbili)
- Make Furious extendable; added Merge to Furious to make Clone easier to
override in subclasses (mukerjee)
Done by bvb working with sarahdw, divided roughly as follows:
bvb: tests, scraping and parsing data, responding to CL comments
sarahdw: tests, DOM manipulation, Charts API.
inline_only_critical_images, defer_js, and blink.
- Decisions about cache extension for resources happen in about two places:
first we have to decide which urls to consider and then once we've fetched
them we have to make sure that it's really ok to cache it. Previously the
code only fetched resources for further consideration from urls found in
attributes that are supposed to contain images, stylesheets, and scripts.
This change expands consideration to all attributes that are expected to
indicate resources.
This is only an intermediate step, however, because once we fetch the
resource one of our checks is that the content type is an image, css, or
javascript. Which means that while we will now identify <audio src=...> and
fetch the url, we won't actually cache-extend it because it's not on the
whitelist. In standard usage this change won't make any changes to what
resources end up being cache extended, though more resources will be fetched
from the origin server, potentially large audio and video files. After this
change is in, I'll be coming back to change
CacheExtender::RewriteLoadedResource to have a more complete whitelist for
what should be considered for cache extension.
- Expanded system testing for url-valued attribute detection and pulled it into
its own script.
- Add an experimental filter, experiment_spdy, for forcing some resources onto
https and hence, for a capable client, SPDY. This is temporary and will be
removed after more testing.
Include the index of the active furious spec in resource urls, and
when handling a resource request parse it out and use it to apply
experiment settings.
This changes the pagespeed resource url format from:
LEAF.pagespeed.FILTER.HASH.EXT
to
LEAF.pagespeed.EXPT.FILTER.HASH.EXT
... change too long to be displayed here ... (jefftk)
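A rough sketch of pulling the fields back out of the new leaf format (the sample leaf, the experiment index "1", and the filter id "cf" are made up for illustration; the real decoding lives in the C++ resource namer, not in shell):

```shell
#!/bin/sh
# Hypothetical example leaf in the new format:
#   LEAF.pagespeed.EXPT.FILTER.HASH.EXT
leaf="styles.css.pagespeed.1.cf.Hash123.css"

rest="${leaf#*.pagespeed.}"              # strip "LEAF.pagespeed." prefix
expt="${rest%%.*}";   rest="${rest#*.}"  # experiment spec index
filter="${rest%%.*}"; rest="${rest#*.}"  # filter id
hash="${rest%%.*}"                       # content hash
ext="${rest##*.}"                        # extension

echo "expt=$expt filter=$filter hash=$hash ext=$ext"
# -> expt=1 filter=cf hash=Hash123 ext=css
```

The extra EXPT segment is what lets a resource request be matched back to the experiment settings it was generated under.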
Experimental code to measure the preview-images gain. Always inline
low quality data URIs in the image tag itself. Replace a low quality
data URI with the high quality image only after onload of that image is
triggered. (pulkitg)
Blocking-rewrite cleanup (bharathbhushan)
offsets. It did not actually work, as it was done
when creating the cache in the parent process anyway,
before any forking, and it makes the unit tests
fail on some newer Linux distributions.
(See issue 453)
libpagespeed versions that have it on for chromium_code,
as it can cause binary compatibility problems (showing up
as crashes in serf tests with gcc >= 4.7 --- see MPS
issue 453).
scripts are done executing. Also cleans up "psanode" elements from the DOM at
the end.
2) Flush Subresource Early Filter.
3) Trigger onload only if browser onload is triggered. Else wait for browser.
4) Remove the old prioritize visible content options.
5) Place js request for all js files for now. We will need to do some marking
of nodes to indicate a resource is cacheable or not by the browser.
Converted "GoogleString& foo_;" -> "GoogleString foo_;" because I think most cases do not guarantee that the string reference they are initialized with will live long enough.
Converted "Foo& foo_;" -> "Foo* foo_;", which acts identically, but makes it much more clear from the call that a pointer is being saved and must be valid for some lifetime.
Use RewriteOptions::x_header_value() instead of the extra version parameter in ProxyInterface.
Set default mod_pagespeed header value in ApacheConfig constructor instead of using "__VERSION__" dummy value and requiring user to replace.
2. timing.proto deprecated and placed inside logging.proto.
3. Use full URL based matching for blink.
4. Set a limit on the max HTML rewritten by blink.
5. Option to apply blink on all URLs if families is empty.
6. Fixed memory leak in proxy_fetch.cc
7. Added utility function to determine the charset of a CSS resource.
of the resource tag scanner to identify other element attributes as being
url-valued in addition to the default set from the html4 and html5 specs.
Fixes issue 437
- Rewrote the client/session -> experiment interface for furious to be
extensible (mukerjee)
- Changed /mod_pagespeed_statistics handler to print all variables with space padding.
(sarahdw, bvb)
- delay_images: fix problems with multiple flush windows (pulkitg)
families then we replace this' blink families with src's. (sriharis)
getElementById() should return the node which has already been processed by
defer_js, not the node which has not yet been processed. (pulkitg)
Make blink option parsing tolerant of extraneous whitespaces. (sriharis)
Freshen modpagespeed.com, giving it some logos & links to other project docs (jmarantz)
in Apache: since with prefork we can't really have many
concurrent connections, we will need to drop things far
more aggressively. Given that, remembering failure for 5
minutes would clearly be too much, so mark those drops
with a header and give them a separate 10 second TTL.
(poojatandon)
- add_instrumentation: Add the tail script only once even if
more than one body tag is present. (satyanarayana)
- add_instrumentation: Support incorporating experiment ID
into the beacon (31066430)
- lazyload_images: While replacing the image src create a new
element and replace, instead of setting the src, since that
doesn't seem to always work in chrome. Also, change the setTimeout
to 0 since we just want to make sure we don't block. (nikhilmadan)
- defer_js: Fix handling of some additions of listeners
after firing of deferJs.Dom_Ready (pulkitg)
- defer_js: Fix bug with async scripts in IE (pulkitg)
- blink: Make the default blink behaviour be to apply on all urls
(with default cache time). (sriharis)
always know the final mimetype, we employ conservative strategies (e.g. the //<![CDATA[ hack)
when we don't think we know for sure. We can also be sure to close <link>
tags.
Fixes Issue 439
1) Make check() call ["$@"] instead of [eval "$@"] so that we don't need double quoting or other tricks.
2) Make apache_system_test.sh source system_test.sh so that (A) we only need to run one executable to do testing and (B) we can share the shell function definitions (which had started to skew).
3) Other minor cleanup, like grep -> fgrep or egrep where I noticed it, to make it clear what we are testing for.
drop the requirement for using pollable fetchers inside
Apache, as we should no longer be relying on the poll
functionality anyway (instead using RewriteDriver
to block while resource fetches happen in another
thread).
So .ttf files would not be cache extended, because they were not a recognized content type, but .html (or .swf, etc.) files would be cache extended as .txt.
This came up because when CssFilter falls back to CssTagScanner, it no longer knows the context of URLs it is trying to rewrite, so it passes all to image rewrite filter and cache extender. We don't want to cache extend non-CSS/JS/images.
- Change the default for ModPagespeedFileCachePath from
/var/mod_pagespeed/cache/ to /var/cache/mod_pagespeed
- Remove ModPagespeedGeneratedFilePrefix, which we had reserved
for future use and now don't intend to use.
generated using document.write). (ksimbili)
Moving some of the image parameters from options to rewriters, as this
was causing a lot of confusion in the bridge code. (satyanarayana)
Rename blocking-rewrite prefix used for testing to "psatest".
Strip authorization: on the proxy host. (morlovich)
Make RewriteDriver's fully_rewrite_on_flush controllable by a HTTP header. If
X-PSA-Blocking-Rewrite request header's value matches the flag configured value,
fully_rewrite_on_flush will be set to true. (bharathbhushan)
/mod_pagespeed_message should display timestamp in local time, not GMT.
Fixes Issue 448.
- Issue:
when body.innerHTML is set in JS, the old references to the DOM
become invalid.
This change will evaluate the current location every time. (ksimbili)
backward compatibility for kRecompressImages (poojatandon)
- Make sure all async scripts are executed before window.onload when
doing defer_js (pulkitg)
Stop emitting Warning: "Unicode value 0x0 is not valid for interchange" since we are now handling the error.
In preservation mode, allow ruleset to be passed through verbatim if there was an error parsing values.
Also added a lot more tests for non-UTF8 character encodings.
Fixes issue 426.
Update modpagespeed.com documentation links. (sligocki)
Expose UniLib::IsInterchangeValid(char32 codepoint) in UnicodeText to
help us remove spurious warnings from sites using \0 in CSS files as
an IE targeting hack. (sligocki)
Trigger property cache lookup after we have rewrite options. This
enables us to use rewrite options signature in the key and allows
cache invalidation when the options change. (guptaa)
appropriate versions of system libpng (just like
libpagespeed's does for its bundled version --- it'll
probably make sense at some point to only have one
libpng.gyp/libpng copy between us and libpagespeed).
Just use "</body>" to detect the end of critical HTML in Blink Critical
Line. Avoids "Marker not found" errors (and non-application of Blink)
when the HTML is even slightly malformed. (rmathew)
Now each CSS ValidateRewrite() should be of one of 4 types:
kExpectSuccess <- Successfully rewrote and changed URL.
kExpectNoChange <- Successfully rewrote, left URL alone.
kExpectFallback <- Parsing failed, fallback rewrite succeeded.
kExpectFailure <- Parsing failed, fallback rewrite failed (or was disabled).
to recognize Linux3 (and FreeBSD!) as platforms that want
make, but not new enough that it removes features used
in .gyp files of our dependencies. Note that this change
also removes the indirection of gyp version via Chrome's
DEPS since I think it would be too risky to touch base/
Move several CssFilter methods into CssFilter::Context to avoid the spaghetti of calls going back and forth between the two classes.
Moved css_image_rewriter_async.cc -> css_image_rewriter.cc.
script syntax accordingly.
Note that in this change, mod_pagespeed will not work quite right yet because
it doesn't generally have accurate content-type when it runs.
- Add ability in instrumentation filter to report unload time. (satyanarayana)
- Adding option to allow blink for mobile devices. If allowed for a site, blink
will get applied to all mobile devices for that site (guptaa).
- Adding ability to use fixed desktop user agent for fetches in blink critical line in case of cache misses.
(rahulbansal)
- Fix the crashes on the loadtest in ProxyFetch. original_content_fetch_ is not guaranteed to exist once HandleDone()
is called, so NULL the pointer since we may call HandleWrite() after calling HandleDone(). (nikhilmadan)
result in the initial rule being treated as unparseable, which in the
case of @import resulted in the import not being absolutified and so
breaking the CSS when in proxy mode.
LATER: It turns out that the underlying problem is that CssMinify::
AbsolutifyUrls wasn't absolutifying URLs in unparseable *selectors*,
it was only handling declarations. Fixing that resulted in the @import
being absolutified but not minified, so I've left in the BOM stripping
and reinserting so that we ignore it (the BOM) for parsing purposes.
2. Fixed a bug where POST request parameters were getting lost when using
custom options set via headers or query parameters.
MaxSegmentLength was misnamed MaxUrlSegmentSize in rewrite_option_names.gperf
IncreaseSpeedTracking was missing from rewrite_option_names.gperf
Deprecated Img directives actually no longer work.
Fixes issue 431.
number of requests are active for that domain. It first queues up
requests up to a limit, triggering them when possible, and then starts
dropping them on the floor. (nikhilmadan)
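The queue-then-drop behavior described above can be sketched roughly like so (class and method names are hypothetical; the real rate-controlling fetcher is more involved and thread-safe):

```cpp
#include <queue>
#include <string>

// Hypothetical sketch of per-domain fetch throttling: allow a fixed
// number of outstanding fetches, queue up to a limit, then drop.
class DomainFetchLimiter {
 public:
  DomainFetchLimiter(int max_active, int max_queued)
      : max_active_(max_active), max_queued_(max_queued), active_(0) {}

  // Returns true if the fetch may start now; false if it was queued,
  // or silently dropped when the queue is already full.
  bool RequestFetch(const std::string& url) {
    if (active_ < max_active_) {
      ++active_;
      return true;
    }
    if (static_cast<int>(queue_.size()) < max_queued_) {
      queue_.push(url);
    }
    // else: dropped on the floor.
    return false;
  }

  // Called when a fetch finishes; triggers a queued fetch if any.
  // Returns the URL of the fetch started, or "" if none was queued.
  std::string FetchDone() {
    --active_;
    if (queue_.empty()) return "";
    std::string next = queue_.front();
    queue_.pop();
    ++active_;
    return next;
  }

  int queued() const { return static_cast<int>(queue_.size()); }

 private:
  int max_active_;
  int max_queued_;
  int active_;
  std::queue<std::string> queue_;
};
```

With max_active=1 and max_queued=1, a second request is queued and a third is dropped; completing the active fetch promotes the queued one.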
Fixed an issue with JsonFetch (pulkitg)
reflow detection tweaks (sriharis)
Added a DeleteProperty() function in PropertyCache to delete the
PropertyValue from PropertyValue map. (pulkitg)
filter (scan_filter.cc) with the value saved in the RewriteDriver so
that's available to other filters. scan_filter.cc was chosen because
it is guaranteed to run and runs first (currently at least).
2. Fixed the experiment code to instantiate a RewriteOptions on each request.
filters to explicitly specify both to Write, rather than giving it at
construction for only -some- spots and scattering SetType's to
cleanup. Make sure we get the content-type right for all filters (the
combiners got it wrong before) and charset for single-resource ones
(nothing got it right before). This does, however, require some
conservatism in length checking as we don't know extension length at
OutputResource construction time. This helps avoid extension
corruption in metadata cache when a request has a valid and known
extension. (This is in response to MPS issue 412, remembering issue
192.) (morlovich)
Rewrite domains of hyperlinks, but do not shard them. Fixes
http://code.google.com/p/modpagespeed/issues/detail?id=428 (jmarantz)
getBoundingClientRect() seems to calculate them incorrectly. (nikhilmadan)
- defer_js: Pass event to the event handlers. Now we try to pass only the load event to all
onload event handlers. (ksimbili)
- Export preScriptHeights_ in detect_reflow.js since this is needed in
detect_reflow_extension.js (sriharis)
- Adding unit tests for blink critical line. (rahulbansal)
pack things tighter, store quotes more efficiently, and avoid
one debugging field that's not really needed (as one can just
use a pointer instead).
On one gigantinormous HTML document of doom it reduces our parser memory
use (335.2MB -> 313.9MB, or about 6%), though less if you
consider fragmentation.
More mechanically, it makes HTMLElement go 96->80 bytes,
and HTMLElement::Attribute go 48 -> 40.
We want to rewrite introspective javascript in the ajax flow.
We add a boolean 'relocatable' to the metadata cache, and set it to false in
RewriteJavascript if we see introspective javascript. In Propagate we only
modify the urls for slots whose CachedResult has relocatable=true.
jefftk:
Fixing bad choice of sed separator in staging_except_module
because we're likely to break it. Adds
ModPagespeedAvoidRenamingIntrospectiveJavascript (jefftk)
- Allow users to choose which custom variable slot to use for furious.
Adds ModPagespeedExperimentVariable (nforman)
- Deterministic Js Filter for measurements (mmohabey)
- Add naming info to cache objects for debugging/logging (nikhilmadan)
- Fix updating of the implicit cache TTL for a resource already cached in our
caches when the default implicit cache TTL for that domain has been changed
in the meantime. (pradnya)
- Mechanism to have Option<>s that do not affect RewriteOptions signature
computation. (sriharis)
- Remove some useless string copies.
- Some stats for blink critical line (guptaa)
- Handle no-JS redirector directly in blink critical line to avoid
escaping (rahulbansal)
- Use user agent matcher from resource manager in proxy interface (ksimbili)
- Restore the AddBaseTag filter for some experimentation.
- Add ApplyPlatformSpecificConfiguration hook to RewriteDriverFactory (morlovich)
- defer_js: Set psa_not_processed attribute to node inserted through doc.write()
(ksimbili)
- defer_js: SetAttribute only if the node is an element and not nodes like text,
comment etc (ksimbili)
- Support per family non cacheable elements in blink flow critical line. (rahulbansal)
- Cleanup some dead (one use was a no-op, which was also killed) Set*Fetcher APIs
in RewriteDriver (morlovich)
image dimension parsing (while respecting browser behavior). Additional testing
and tweaking for insert_ga_filter. Changes to js_defer script to allow more
scripts to be deferred safely (this filter is still far from safe).
- Fix some missing calls to ComputeCaching (nikhilmadan)
- Blacklist IE8 for defer Js and blink (ksimbili)
- In Apache, only run the property cache on the L2,
as otherwise it would have difficulty tracking stability.
(nikhilmadan)
- Using HtmlDetector for blink (guptaa)
- Adding charset to the blink response. (guptaa)
- Adding more mutexing to property cache. (rahulbansal)
- Document security implications of some code (morlovich).
with it on output yet). I've also de-virtualized
Resource::ComputeContentType since it's not overridden
anywhere, and given the virtualness of SetType it makes
for a confusing subclass API.
Also tweak DetermineContentTypeAndCharset to set sane output
when failing, as that makes my initial attempt at using it
in Resource not crash, which suggests it's a more robust
API.
valid/invalid values. This is intended to help fix
the problem of out-of-date data in metadata L1s
shadowing valid data in metadata L2s (and to eventually
replace the HTTP-specific solution for a similar issue).
Fixes an IE issue. In IE, if a script has textContent set before src is set,
then src is downloaded and executed before the script node is added to the DOM.
This results in getElementsByTagName('script') returning the wrong result.
(sriharis) Two changes in detect_reflow.js:
1) Export the getReflowElementHeight function.
2) Expose a function that returns true when reflow detection is done.
ModPagespeedDomainRewriteHyperlinks per popular demand.
Adding a LOG(INFO) while entering BlinkFlowCriticalLine temporarily
since this is super useful while debugging. (rahulbansal)
Use layout marker in blink critical line flow (rahulbansal)
2.2.x and 2.4.2+ versions of the module. Unfortunately
for the RPM the upgrade frequently can't add the new
loading sequence smoothly as the LoadModule is in a
likely-customized file.
Rename response_headers_ptr to mutable_response_headers per protobuf convention.
Make it return NULL when the response-headers have been flushed. Provide a new
const version response_headers() which is available forever. (jmarantz)
Flush document.write buffer as soon as getElementById is encountered.
- The change is based on general observation. This doesn't fix all the
scenarios though.
- I have tested all the popular sites where getElementById was
returning null due to document.write. This fix works with all those.
- Also, we now maintain the state that deferJs is currently in. (ksimbili)
I think this is a relic of async-mode testing. But none of these tests were even reading the GetParam() any more so they were all running the same test twice.
and apache.gyp --- make the latter only
contain things that are common between the
module and unit tests, and rename it to
instaweb_apr.gyp accordingly. This is also
significant because it means things in mod_pagespeed.gyp
now are exactly the things that rely on httpd
headers (and hence need to be built separately
for Apache 2.2 and Apache 2.4)
Stripping out unprintable characters in panel_filter.
Add a maximum size limit for low res image generation in InlinePreviewImages.
Added a basic test to check that changing wildcard groups in rewrite_options changes the option signature.
HtmlElement::Attribute::DecodedValueOrNull() (jmarantz)
onload attributes inside html tags were not deferred so far. This
change fixes that by assuming only JS appears in the onload attributes. This
change also moves the deferJs code to the HEAD tag, so that onloads can be
added to the deferJs queue. (ksimbili)
we do not try to access 'this' once we have released the mutex.
2. Include allow_resources_ and retain_comments_ in RewriteOptions
ComputeSignature.
3. onload attributes inside html tags were not deferred so far; fix that by
assuming that only JS appears in all the onload attributes. Also move the
deferJs code to the HEAD tag, so that onloads can be added to the deferJs queue.
2. Insert DelayImages javascript at the end of the last modified image node
instead of end of body.
3. Start downloading high quality image just after all low quality images are
transferred to client.
4. Fixed bug with failing to absolutify URLs in unparseable CSS, or in
parseable CSS when image rewriting is disabled (that is, when all of
recompress_images, left_trim_urls, extend_cache_images, & sprite_images
are disabled).
added a new attribute pagespeed_no_defer for script nodes that should not be deferred
(ksimbili)
- Handling the scenario when BlinkCriticalLineData is present in the cache.
(rahulbansal)
work. Should not be an extra dangling object risk since
preceding code to fetcher use touches the
RewriteOptions, which are owned by the RewriteDriver.
had a CacheFetcher set on it, which was used only by AJAX
rewrites, while the corresponding ResourceManager had
the serf fetcher. This moves that cached rewriter
management to ajax rewriter. This revealed that we were
doing testing of ajax rewrites w/o HTTP caching
of inputs at non-resource level, which did not match
reality. The ajax test is therefore extensively
regolded (in some cases making expectations actually
match the comments!).
was causing 404s for those browsers, particularly when fetching concatenated JS
and CSS files. We now decode these urls correctly.
Also, a bunch of fixit work to use push_back rather than append, since it is
(most importantly) more readable, and I am assured it yields somewhat nicer
code with the GNU STL string library.
- Add UsageDataReporter to permit dashboard-like
reporting of error messages in principle (not currently hooked up) (sligocki)
- IWYU fixes + some cleanups in blink (rahulbansal)
- Complete integration of ClientState into ProxyFetch flow. (mdw)
- Some experiment integration in insert_ga_filter (nforman)
- Fix js_defer.js to not use .id but more robust getAttribute("id") (ksimbili)
- Move js_defer test files from mod_pagespeed_example to mod_pagespeed_test (guptaa)
- Fix inappropriate escaping of attributes in lazyloadimages (nikhilmadan)
- Refinements to readyState, doScroll, document.{open,close} handling in js_defer (ksimbili)
- Fixes in js_defer for multiple body/html tags.
* Handle non-js script nodes inserted dynamically
* Factor out code from the meta tag filter for use by the css filter.
* HTML files for js_defer tests.
In lazyload_images, re-check all visible elements after onload since some elements may have been moved around by re-flows / DOM manipulations. Also, on detecting the jquery slider, load all images loaded lazily before and disable rewriting.
as 5 of these for just about every RewriteDriver, so if we make
a set on every construction it's actually hot enough to be
worth a percentage point or so on a profile in case where
we can't reuse the RewriteDriver.
This change results in a ~30% speed up in RewriteDriver
construction.
(nikhilmadan)
- Add support for conditional refreshening resources in the UrlInputResource flow.
(nikhilmadan)
- Get rid of UrlNamer::ConfigureCustomOptions which isn't used any more.
(nikhilmadan)
- Store fallback_http_value() in a class variable in UrlResourceFetchCallback.
(nikhilmadan)
- Add extra parameter in the beacon to indicate if the request originated from
an iframe (satyanarayana)
- Do a redirect inside <noscript> for blink. (ksimbili, rahulbansal)
- In the blink flow, disable all rewriters if there is a json cache miss since the
same fetched content is used to compute the json. (nikhilmadan)
background requests. (nikhilmadan)
Changed the constructor of ProxyFetch to accept RewriteDriver. New
RewriteDriver will be constructed in ProxyFetch if no RewriteDriver is
passed. This change is needed because blink flow requires
RewriteDriver to update the property cache and with current
implementation RewriteDriver is not present there. (nikhilmadan)
Some pending cleanup: removed dead code in BlinkFlow. (rahulbansal)
Adding option for retaining the input image color sampling for jpeg
images. (satyanarayana)
1. Basic plumbing for new BlinkCriticalLineFlow and BlinkFilter
2. Refactoring the old enable_blink flag and use it to guard the new flow. (rahulbansal)
Don't apply lazyload images on <input> tags since the onload event is
not fired for them (nikhilmadan)
Added tracking for total input resources size (for comparison with bytes saved) and number of uses of minified JS and standardized the meaning of blocks_minified_ (now a block does not have to actually get smaller, just be successfully minified).
Also added some testing for these stats.
cases where SharedAsyncFetch'es are chained (nikhilmadan)
- Fix invalid memory access of request_headers in HandleWrite
of fallback-value fetches since request_headers may already have
been deleted. Modified the test case and was able to reproduce the
problem without the fix. (nikhilmadan)
agents for blink. Now the UA matcher is created in the factory
and is re-used in the drivers (rather than making a new one for each driver).
(sriharis)
- Remove FF blacklist for deferjs now that the bug in its handling is fixed.
(rahulbansal)
- Correct the use of StartParse in BlinkFlow to check if the call succeeded
before doing anything further. (nikhilmadan)
it requires using -Duse_system_apache_dev=1 (and perhaps
-Dsystem_include_path_httpd) as our own headers are for
2.2.x.
Not 100% stable yet -- have a server crash on one
of our integration tests to chase down, still.
(And we will likely want to tweak our logging, since now
Apache includes some of the things we were adding ourselves)
Also remove the old/unused APR statistics class rather
than porting it.
Refactoring to serve javascript strings using javascript_url_manager.
Fixed Property cache lookup if url contains query params.
Added support to use the property cache results even before the proxy_fetch is set.
Don't let CacheUrlAsyncFetcher store responses into the cache if HandleDone is called with false.
have an alarm cleaned up by Signal(), 'this' could get deleted from
under Scheduler::CondVarCallbackTimeout::RunAlarm if the user callback
relinquished the mutex.
(detected by asan)
Change the constructor of ProxyFetch to accept RewriteDriver.
Blacklist DeferJavascript on firefox again.
Per-domain browser blacklist pattern for blink.
Add check on url blacklist in Blink flow.
- Don't warn about options we handle inside mod_pagespeed,
and not with generic option handling
- Proper error message when the value is incorrect
- Fix confusion surrounding MessageBufferSize -- the code
was setting an inactive option in ApacheConfig that didn't do
anything, rather than the proper ApacheRewriteDriverFactory one.
To reduce confusion, remove the still-unused ApacheConfig one.
waiting_alarms_to_dispatch list got deleted in a timeout
from a different thread, by giving the responsibility to
delete the object to ::Signal in that scenario.
to eliminate dangling reference problems where the original text
is no longer valid. This is comparable to current behavior with
parseable text, which is copied into UnicodeText objects.
which, it turns out, doesn't add any value for any of our existing
testcases.
When I had added the tag-bag I added a functional testcase, but that
testcase was solved in a better way since then, rendering the tag-bag
obsolete.
The code was sort of a mixture of allowing and not allowing NULL statistics or variables, such that if NULL was ever actually passed in for statistics it would cause spurious failures in unexpected places; now it clearly fails immediately.
Etag headers to update the outgoing request headers, so that we can
potentially get a quick 304 from the backend and avoid having to
download the entire response from the origin backend again. (nikhilmadan)
Add client-id infrastructure into meta_data constants & proxy_fetch. (piatek)
Using the shiny new SetOptionFromName mechanism to set options in
mod_instaweb.cc. (sriharis)
- Undo accidental reverts of OpenCV and furious changes.
- Fix all rewrites running on main rewrite thread (and not
low-priority one) when fetching, which could cause latency spikes
on expensive external rewrites and cause deadlines to not take effect.
- Implement load shedding in QueuedWorkerPool (at level of sequences).
This is useful after the above, as otherwise it's easy to get us to make
an ever-growing queue. Use this in RewriteContext.
This is supported in PSA core now, but not yet hooked in in mod_pagespeed.
- Use ASSERT_DEATH_IF_SUPPORTED and not ASSERT_DEATH since
googletest doesn't support death tests on FreeBSD.
- Don't build the dependency-check binary if we're building
with external ICU, as the check purposefully hardcodes .a
paths.
- Rename Resource::ContentsValid() -> HttpStatusOk() (sligocki)
- Allow lazyload images filter to load images on onload instead of onscroll which is configurable via RewriteOptions
(nikhilmadan)
- Make lazyload_images_filter less CPU intensive by re-using the position of previous elements in the page, and don't
lazyload already inlined images. (nikhilmadan)
- Insert low res images at the end of body tag only if defer js and lazyload
filters are enabled because low res images will be blocked by js and other
images which are below the fold. If either the LazyLoadFilter or the
DeferJavascriptFilter is off, place low res images in the image tag itself.
(pulkitg)
- Adding ability to serve blink.js via gstatic. (guptaa)
- Add CanFetchFallbackToOriginal() implemented in terms of new
virtual OptimizationOnly() to centralize logic on
whether we can discard rewrites (e.g. for load-shedding), and
to make it possible to distinguish non-optimization filters
for which we do not want to do this (morlovich)
- Make the cached_resource_fetches stat update properly again (morlovich)
in ProxyInterfaceTest.
Note in particular that the property cache fetch starts as
soon as we see the URL, but we don't block our flow waiting
for it until it's time to start processing HTML events in
filters. So any RewriteOptions lookup & HTTP fetch for content
happens in parallel with the cache.
calling OutputResource::UrlEvenIfHashNotSet in normal rewrite thread
and a filter calling OutputResource::BeginWrite via ResourceManager::Write
in a low-priority rewrite thread by removing some logging.
This is hard to trigger now, but will be more common once fetches
use low-priority rewrite thread.
Fixed a bug where not finding a LeafPostion would send a negative number to StringPiece's length argument.
Rework them to instead return NULL if the URL is invalid or the method doesn't make sense for this URL type (Ex: PathSansQuery() for data URL).
Also reworked DCHECKs to LOG(DFATAL) so that we log ERRORs if we hit this in production (should never happen).
merely causes resource dumps to be saved but does not make them loadable
from disk. This way we do not have to worry about it affecting behavior,
while the command-line tests can still snoop around on the output.
- The timestamp argument to ResourceManager::Write (which had
no effect any more but people still took effort to implement)
- Proto fields that are no longer used.
- The old sync implementation of RewriteSingleResourceFilter --- it's
now basically a stub class.
- Use RE2 in blink templates (gagansingh)
- insert_ga examples (nforman)
- Cleanup tests to not rely on store_outputs_in_file_system (morlovich)
- Avoid doing full PNG optimization when we just want to read in
a gif for spriting. This can be a huge gain in some pathological
cases (morlovich)
render response and extract above the fold images information which will be
used by ajax rewrite context callback. (pulkitg)
Avoid caching privately-cacheable resources (morlovich)
based on a .conf file entry. Regardless of that conf file
setting, resolve a bug where if the origin serves us gzipped
content when we didn't ask for it, inflate it as needed. (jmarantz)
Fixing an innerHTML-with-script-tag bug in IE with a hack which seems to
work well. This solution is well explained in
http://allofetechnical.wordpress.com/2010/05/21/ies-innerhtml-method-with-script-and-style-tags/
(ksimbili)
In the ajax flow, set the Date header to now when serving rewritten
resources so that it looks cleaner. Previously, we were setting the
date header of the response to the time of fetch of the rewritten
resource. So a resource with a cache TTL of 300 seconds,
which was rewritten and put into cache one day ago, was served with a
Date set to 1 day ago and a Cache-Control max-age of 1 day + 300
seconds. (nikhilmadan)
Call SetTransformToLowRes when resizing mobile low quality images.
Error out in case of larger mobile delay images. (bolian)
Add support for serving stale responses in the rewrite context flow if the fetch fails.
It took me ages to figure out that I need to set set_store_outputs_in_file_system(false). (nikhilmadan)
empty or non-standard URLs in there. Rather, leave them as-is and
leave it up to the browser to handle. In particular, a data URL
was causing the entire process to abort.
uncacheable panels. (gagansingh)
Added a few DelayImageFilter flags in RewriteOptions and rewrite_gflags.cc (pulkitg)
Add missing 'check' calls to make sure we don't add X-Extra-Header twice. (jmarantz)
Rework RewriteOptions::Merge to merge src into this, rather than one,two into
this.
This resolves a potential performance issue because of the way wildcards
are combined. Because our basic mechanism for merging RewriteOptions
was dst=(one+two), we defined Clone in terms of CopyFrom, and CopyFrom
via Merge(one,one). When 'one' has wildcards, they get doubled in
size.
The reason we used dst=one+two in the first place is that it matches the .htaccess
semantics in Apache. But it's easy to define that in terms of dst+=one,
dst+=two, so that's what this CL does, simplifying the Merge function.
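A toy illustration of the wildcard-doubling problem described above (Options here is a hypothetical stand-in, not the real RewriteOptions API):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for RewriteOptions: just a wildcard list.
struct Options {
  std::vector<std::string> wildcards;

  // New semantics: merge src into *this (dst += src).
  void Merge(const Options& src) {
    wildcards.insert(wildcards.end(),
                     src.wildcards.begin(), src.wildcards.end());
  }
};

// Old semantics: dst = one + two. Defining CopyFrom via
// Merge(one, one) under these semantics doubles 'one's wildcards.
Options OldStyleMerge(const Options& one, const Options& two) {
  Options dst;
  dst.Merge(one);
  dst.Merge(two);
  return dst;
}
```

Under the old scheme, cloning an Options with two wildcards yields four; with dst+=src semantics, a clone is a plain copy and the list stays the same size.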
b) Eliminate a few checks/dchecks and a spammy log line [pradnya]
c) Log blink requests [rahulbansal]
d) Tidy up some of the blink flow [rahulbansal]
* Don't store layout in cache if status code is not kOk
* Change the order of sending cookies so that the output is consistent
accept-encoding:gzip if it's not already there, and inflate
content as it's being streamed. (jmarantz)
Converted delay images javascript into js modules. (pulkitg)
Supporting gif and webp formats in ajax flow. (satyanarayana)
Cleaning up the output state. output_contents_ doesn't make any sense
if output_valid_ is set to false. Also, all image optimizers assume
that the output buffer is empty. Since we were not clearing
output_contents_, the server was failing image_converter's
DCHECK(out->empty()). (satyanarayana)
a query.
b) When checking if a URL can be turned into an output resource by a
proxy rewriter, also check that the decoded base url will be valid
since that's used when processing/regenerating the resource and if
it's invalid these will fail (or worse, crash). This can happen when
we are given a corrupted URL. After this change we will simply
forward the request to the back-end server.
rewriting doesn't succeed, in FetchTryFallback we set the cache ttl to
5 minutes instead of leaving it as the original. (nikhilmadan)
Strip Last-Modified in ProxyFetch if we are rewriting
html. Conditional validation via If-Modified-Since is wrong since we
are rewriting the html. (nikhilmadan)
gunzip content for users that wanted cleartext but the origin gave
us gzip. (jmarantz)
Add support to return stale values from cache in case a fetch fails in
the proxy flow. I'm hoping to incorporate this into the rewrite
context flow as well, but I'll do that later. (nikhilmadan)
filling in the value from our table if any. Consolidate our two
similar code-paths that were parsing this line.
Remove now-superfluous include of <cstdio>
Better DCHECK-fu for AsyncFetch::HeadersComplete, which should do a live
check and LOG(DFATAL) for double-calling it.
Remove DCHECK for non-success status code in AsyncFetch::Done because it makes
it impossible to handle a timeout that happens during streaming.
so we don't end up failing MPS URLs originating from an incompatible
version on a site we're proxying. (morlovich)
add "cookie2" http headers constant (morlovich)
Parse HTTP header first-lines where the reason-phrase is missing,
filling in the value from our table if any. Consolidate our two
similar code-paths that were parsing this line. (jmarantz)
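A rough sketch of parsing a first-line whose reason-phrase is missing (names and the default-phrase table are hypothetical; the real code consolidates this into shared header-parsing logic):

```cpp
#include <map>
#include <sstream>
#include <string>

// Parsed HTTP response first-line.
struct StatusLine {
  std::string version;
  int code = 0;
  std::string reason;
};

// Parse e.g. "HTTP/1.1 404" or "HTTP/1.1 200 OK". When the
// reason-phrase is absent, fill it in from a small default table.
bool ParseStatusLine(const std::string& line, StatusLine* out) {
  static const std::map<int, std::string> kDefaultReasons = {
      {200, "OK"}, {304, "Not Modified"}, {404, "Not Found"}};
  std::istringstream in(line);
  if (!(in >> out->version >> out->code)) return false;
  std::getline(in, out->reason);
  // Trim the leading space left between the code and the phrase.
  if (!out->reason.empty() && out->reason[0] == ' ') {
    out->reason.erase(0, 1);
  }
  if (out->reason.empty()) {
    auto it = kDefaultReasons.find(out->code);
    if (it != kDefaultReasons.end()) out->reason = it->second;
  }
  return true;
}
```

"HTTP/1.1 404" parses with reason filled in as "Not Found", while an explicit phrase is passed through untouched.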
Provide an API to prepend to pre_render_filters_.
b) Fix bom stripping not happening on fetch. The problem was that
this was accessing resources() in WritePiece, and we don't
actually set those up in the async fetch flow. (ResourceCombiner
likely needs heavy refactoring at this point.) So instead of relying
on it, just pass in the index of the fragment to WritePiece.
c) Address a regression in rewriting of multiply rewritten resources whereby
we would fetch rewritten resources from the ORIGINAL domain, not the
encoded domain, which wouldn't work when pagespeed then tried to fetch them
rather than rewrite them on-the-fly like it ought to do.
in the annotation text. Remove the html & newline tags. Add tests for
more cases, even ones that are not currently working like they should, which
are marked with TODOs.
This avoids creating jobs with 0 inputs.
This also incorporates a small refactor of matterbury's that makes generic
AddNestedContext work properly in css_filter.
Add NullSharedMem, to make it easier to build on non-Linux, and
fix a bunch of fallback behavior bugs that are exposed with it:
- Check whether SharedCircularBuffer init succeeded before setting
it on ApacheMessageHandler. Also make CreateSegment failure fail
SharedCircularBuffer init.
- Change how we handle fallback in SharedMemHistogram. We can't set
the mutex to NULL to signal it, as a bunch of Histogram methods
use it. So instead set it to a NullMutex, set buffer_ to NULL, and
make sure we check the buffer_ everywhere. (morlovich)
b) issue 356: Early detection of an unfetchable input resource to
short-circuit wasteful rewriting attempt when we cannot fetch it.
In particular, the serf fetcher cannot fetch https so give up early.
in Apache (as there is no cache set on the factory); and limit
the scope of that operation, to disallow writes of failures but
still let successes through. This is important to at least make
/some/ progress on ITK, which likes to kill processes a lot
(as often as every single request on smoke tests.)
b) Refactor:
Added second version of MapOrigin that takes a GoogleUrl to avoid
redundant conversions of GoogleUrl -> StringPiece -> GoogleUrl.
Also eliminated unnecessary initialization of the return argument of
MapOrigin, since MapOrigin guarantees to set it if it returns true.
(since pagespeed library is now fixed everywhere). Simplify handling of
script body elements, since there should now be at most one
HtmlCharactersNode in any rewritable script body. Get rid of an
often-huge string copy after minification of an inline script by
swap()-ing the rewritten string into the DOM. (jmaessen)
single-resource filters & non-data-URLs. Only data URLs &
combinations get hashed. The hashes will then be divided into 64
buckets using the 1st characters to name a subdirectory, just in case
someone has large numbers of combined or minified files.
a close-tr should never close a table. There are countless other forced
hierarchies that this CL does not yet address.
Avoid emitting auto-generated close tags for elements that do not
require them, such as <p> and <li>.
Retain the characters from misplaced close-tags.
This allows us to parse and re-serialize a ton of unparseable CSS from the Alexa-100; see examples in css_filter_test.cc.
Specifically, it reduced the number of unparseable CSS files in the Alexa-100 from 53 to 25 (out of 177 total).
Add "explicit_close_tags" as a query-param-enablable filter for debug usage. (jmarantz)
Add "sprite_images" to the list of filters for automated load testing (jmarantz)
will auto-close the <font> when we see the </i>. When we
later see the </font>, we should squelch it because we are
already in "debt" one "</font>". Keep track of this debt in a
Bag (map<string,int>).
- document.write() improvements in deferjs filter (atulvasu)
- Fix crash on in-place rewriting of URL in blacklist in generic
proxy (nikhilmadan)
- Fix crash in generic ProxyInterface when having invalid
ModPagespeed params (bharathbhushan)
- Factor out the "is it really HTML?" sanity-checker into
a separate helper class in automatic/
Remove special handling for <tag<foo and <tag attr<foo --- it bypasses
the literal mode handling for <script>, and doesn't match what
browsers do anyway. (morlovich)
doc tweaks for SplitStringPieceToVector (jmaessen)
whitespace, all adjacent line-sep tokens will be delivered in a single
StringPiece. Unlike whitespace, line-separators can serve a role in
evaluating the lexical stream against the language grammar.
- Added overloaded version of CssMinify::Stylesheet that takes an URL
for absolutifying relative URLs in the CSS being minified (matterbury).
- APIs for extracting charsets from headers (matterbury)
- Don't enable a dangerous transformation that's not usable
standalone under AllFilters.
- Fix threading use when handling failures in ProxyFetch.
is not too happy about that idiom.
Add a warning in the Makefile near the gperf files that anyone who adds
one should send a message to get other developers to run 'make svn_update'.
Add some relative URL tests for domain_rewrite_filter.
Filter to disable scripts by converting input html:
<script src="1.js">var a = 1...</script>
to:
<noscript disabled="true"><script src="1.js">var a = 1...</script></noscript>
(gagansingh)
Add parse error reporting in a few cases. (sligocki)
JS Embedding in PSA. Attempt to import defer_js code in this fashion. (atulvasu)
Clean up rewriter_gflags.cc a bit. 1) Error message was doubled. 2) We
were setting rewrite_level and rewriters. (sligocki)
options signature, reducing the dependence on options-version.
Change the management/inheritance of those two-letter codes in
RewriteFilter and its derived classes to avoid duplicating those
strings in each class.
use by the new @import flattener. (matterbury)
"Fix" URLs whose paths start with double-slashes, which was seen on
the web; it's cleaner if we canonicalize.
Image doesn't need to decode a file to get its format and dimensions;
so we can delay actual decompression until we actually need it.
As a bonus, this also makes us stop trusting Content-Type headers.
- Various test filters for more complex Url namers (matterbury)
- Optimize data: URL creation a bit, also relevant for rewriting of
style attributes (morlovich)
- Support offloading partitioning to a different thread, and do it
for the spriter, to avoid potential latency spikes. (morlovich)
with the original served with shortened TTL and the rewrite
going on in background when not met.
- Fix passing of RewriteOptions to AddPlatformSpecificRewritePasses (gagansingh)
- Add API hooks to disable standard HTML serialization output
to permit filters that construct it fully (gagansingh)
Stop forcing all resources we serve to be no-store. This is breaking back button cacheability.
We leave no-store in the Cache-Control if it was originally there for security reasons (the whole point of no-store is to protect sensitive information from being saved on disk), but if it wasn't in the original, we don't add it.
An interesting question is whether we should set a separate inlining threshold for css vs html, and if so how we make sure to do the right thing for inline CSS (<style> tags). (jmaessen)
a +-encoded set of URLs for combinations. Single-URL variant entry-points
are also exposed.
Use the native encoder for each filter in the test infrastructure.
Also, recovers from an unrelated SVN hiccup.
in a candidate combination and the one in combination_ being different
by killing combination_ and making the multiple candidates all share the
same library.
is not usable to safely add alarms that can be cancelled
cannot recognize the window between
when the adapter got invoked as an alarm callback and
when the user-supplied function ran --- so it cannot cancel
things safely during that window (as it's already too late to
call CancelAlarm as the alarm object got deleted).
So instead add a QueuedAlarm helper, and use that for
ProxyFetch's idle input detection.
Make idle flush time configurable. (morlovich)
Inject fetches at proxy_fetch level when the input connection
is idle for 10ms. This is in particular helpful for getting the first
congestion window worth of content out quickly when it doesn't
have enough content for the flush_html_filter to suggest it's
worth flushing ASAP. (morlovich)
hook it up in CacheUrlAsyncFetcher. Unfortunately not doable
cleanly for UrlInputResource until the sync path is dead. (morlovich)
Eliminated all hard-wired absolute paths in tests, changing them to
calls to Encode(), in preparation for a change to allow testing of
non-standard UrlNamer encoding/decoding. (matterbury)
Remove redundant UpdateDateHeader, using SetTimeHeader (equivalent) instead.
Simplify IsDateLaterThan to just use fetch_time_ms() (depends on ComputeCaching() having been called first).
safe by checking the actual type (matterbury)
Take advantage of the metadata cache in the fetch reconstruct path
to be able to serve from cache when the request is given with the
wrong hash or the rewrite failed. Provide a hook ---
StartFetchReconstruction() which should hopefully enable this
code to be usable for AJAX rewrites as well. (morlovich)
metadata produced on the fetch path:
- CSS combiner crash due to trying to access empty
partition produced by it.
- Fetch path produced inconsistent partly filled-in input
tables if an input failed to fetch.
one base that can be reused, and defensively make the validity
method pure virtual, so that people who want to add more sites
that will access the cache (e.g. me) do not accidentally end
up missing it --- and with refactor have an easier way of getting
it right.
Also add the rewrite_options() override to FileInputResource;
it was not previously used there for rather subtle reasons (which are in
fact no longer true after the refactor); add a unit test to exercise that.
- Start returning 304s in generic Page Speed Automatic proxy flow (not Apache one)
if the cached response has a Last-Modified / Etag header which matches the one in
the request. (nikhilmadan)
- Write metadata cache on reconstruction fetches as well.
getting lost when serving slurps via the AsyncFetch interface, as they were
happening after the first write, and hence after HeadersComplete().
This is done by buffering writes, since headers like content-length and gzip-content-length
simply cannot be computed until all bytes from the slurp have been produced.
Goal: Start tests at non-0 time.
This should make tests more robust. It came up because I was comparing times in sec to times in ms. But they were turning out to be the same because both were 0!
not just a mocktimer, at cost of requiring a redundant
mock timer setting in places that need auto-advance. (morlovich)
Add mutex protection to the locking methods in MemFileSystem. (morlovich)
Propagate globally set rewrite options as default
settings. The merge order is:
global options from command-line / defaults.
domain-specific options
query-param options
RequestHeader options
Generally the later entries in this sequence win.
Start porting domain rewriting in CSS to filters other than
cache_extender (css_* filters). Just break out the transformer and
put it in CssTagScanner for now. (sligocki)
to it, to make it always reflect reality. Doing so exposed that
ResourceManager::LockForCreation in blocking mode lied and claimed that it
grabbed the lock when it actually timed out. So remove that bool return
and corresponding CHECK, and make this behavior explicit.
Add GoogleUrl::Reset(base_url, relative_url) methods.
Append a hash of the image files in a sprite to the cache key instead of the names themselves in order to keep the cache key down to a reasonable size. (nforman)
Make merge_libraries.sh work with thin libraries (used by newer gyp
versions) by using 'ar p' instead of 'ar x' (which doesn't work with
thin libs). (morlovich)
Remove redundant string::c_str() calls, in contexts which
expect a string or StringPiece. (qrczak)
Remove an assert that fired while debugging Apache -- it appears that
we can get a pool-initiated destruction of an uninitiated ResourceManager
after the RM has already been initiated, or, perhaps, after the Factory has
been destroyed. This occurred with an "apachectl stop" so I suspect the
latter.
which was in two places.
Fix races between callback deletion and check of self-destruct
bit in Function::Can{Run,Cancel}. These were causing heavy crashing
on WIP async flush + scheduler thread setup. (morlovich)
ScheduleThread class. This will be required once we don't block, as
nothing will be able to advance scheduler timeouts otherwise. (morlovich)
Use gperf table for rewrite_options filter names, eliminating the need
to construct a bunch of string-sets when constructing RewriteOptions.
factory once all resource managers get cleaned up on config-check ---
otherwise we end up with a factory with dangling statistics and pointers
to freed pools (on Ubuntu 10 and other systems, the factory gets
destroyed as well when the .so is fully unloaded).
ResourceManagers allowed in that factory. This simplifies the
lifetime of many objects, particularly relating to shared memory
and statistics.
Add Apache-specific derivation of ResourceManager.
Add Apache Cache object, which owns the caching & locking mechanisms.
We will instantiate a new ApacheCache for each distinct file_cache_path.
Some mechanism will be created to ensure compatibility of other
configuration parameters that the cache is sensitive to, to make sure
they are consistent between multiple file-cache paths.
Fix merge array-indexing problem with a heterogeneous merge. The plan
is to eliminate this altogether by avoiding heterogeneous merges. But for
now fix it locally.
be responsible for making MockScheduler (and not ResourceManagerTestBase).
Note that this does introduce some contention issues in threaded servers;
but it is a necessary step as something has to run the scheduler...
from this one.
Make lazy-starting of worker-thread even lazier, exploiting the recent
update to Worker to make Start() behave correctly if already started.
The cost is an extra mutex lock/unlock every time we want to start
cleaning cache. The benefit is simplifying the startup sequence.
Strip out unused functionality from timer_based_abstract_lock, and add
in callback-based functionality instead. Write and test a prototype
replacement for timer_based_abstract_lock called
scheduler_based_abstract_lock. Future revisions will transition uses
of timer_based_abstract_lock across to scheduler_based_abstract_lock,
and strip out the uses of the blocking calls. I've run into enough
subtle hitches along the way that I'm trying to turn out small working
CLs. (jmaessen)
For example, given the headers:
    Content-Encoding: gzip, even-better-compression
    Vary: Accept-Encoding, User-Agent
RemoveAll("Content-Encoding") leaves:
    Vary: Accept-Encoding
    Vary: User-Agent
While I was at it, I added an optimization that the map_ doesn't need to be
NULLed after Remove*-ing.
Make Scheduler::BlockingTimedWait responsive to other unrelated scheduler actions, in line with Scheduler::AwaitWakeupUntilUs. This was arguably a bug.
Decouple SchedulerBlockingFunction behavior from the scheduler mutex (except for the actual blocking waiting for something to happen part, which necessarily enters the scheduler mutex). (jmaessen)
incorrectly by factoring out URL decoding + filter code checking
code from DecodeOutputResource so it can be shared by
IsPagespeedResource, and do some cleanups as fallout: Change
DecodeOutputResource to take a GoogleUrl and use IsPagespeedResource
instead of DecodeOutputResource where appropriate. (morlovich)
- Split rewrite processing into two threads, so that
expensive/time-consuming parts can be more easily cancelled
for shutdown or (potentially) load-shedding, and to reduce
convoy effects.
- Port RewriteContextTest.CombinationRewriteWithDelay to use
scheduler alarms and not timer alarms;
GC RewriteDriver::BlockingTimedWait
relied on this particular misbehavior. Basically, we should block the
requested time if there are no outstanding alarms, and be woken up
when an alarm arrives. Also fixed mistreatment of quiescence in
mock_scheduler.cc that was exposed by the other bug fixes. Finally,
re-factor blocking class from rewrite_driver into scheduler as we need
to use it in various other places (including in tests). (jmaessen)
Declare independence from the tyranny of Apache pool destruction semantics by
passing NULL for the pool into the Serf constructor.
be implemented asynchronously internally, and use Scheduler to block
where needed.
Note that this required severe surgery to mock time to make it
far more deterministic, which in turn required making us use the
same deadline under Valgrind. Also adjusted the ProcessAlarms interface
slightly to prevent a race with checking of done flag.
as long as it's the same Statistics* object.
mod_pagespeed_create_server_config gets called only once with an https
configuration, so only one Factory is created.
But by the time pagespeed_post_config is called, there are two server_rec*
objects in the chain, and they both point to the same factory. Another
possible remedy is to make a temp set<ApacheRewriteDriverFactory*> to
avoid initializing the same factory twice. But this fix is a little easier.
Split off global statistics into rewrite_stats.cc -- previously they
were embedded in ResourceManager.
Factor out construction of a ResourceManager object into
RewriteDriver::CreateResourceManager() in anticipation of having more
than one per RewriteDriver. At the moment, though, we still support
only one per driver.
- Some infrastructure for cachability of HTML; not active in mod_pagespeed
yet (sligocki)
- Parsing of some more complex HTTP methods (atulvasu)
- Some small refactorings (jmaessen)
the unit and absoluteness; the latter of which revealed that
MockScheduler was parent-calling with relative and not absolute
values. Fixing that made tests slower, so I tweaked the sleep
time down to cut it back down again.
It is incorrect to assume that the work_queue_ is empty at
this point. It's possible for a sequence to get some
jobs queued but not get a chance to run at the time
we get to shutdown. In that case, active_ = false, so
we skip the loop and hit the check with a non-empty queue.
specify all the template syntax.
Have Function enforce exactly one of Run or Cancel being called.
Move the decision about whether to delete the Function* object after
calling to the Function object itself, so anyone can decide whether
they want theirs to be auto-deleted or not.
locking convention internally for Alarms and externally for the Functions that
we pass in as alarm callbacks, which made the locking much simpler overall. So
overall a plus I think. (morlovich)
rather than be assigned a QueuedWorker.
This required allowing for a 2-phase shutdown process in QueuedWorkerPool.
In Phase 1 all threads are quiesced and the system goes into shutdown mode.
In Phase 2 the data structures are deleted.
seg-fault 5% of the time. We need to avoid referencing locals in
NextFunction() after signalling that sequence shutdown is OK.
Keep track of whether shutdown is currently occurring and thus we should
not initiate handling of any new requests.
scheduler. We use the internal condition variable solely to orchestrate
deadline handling. This also migrates the mock_scheduler to the new framework.
This is the minimal drop-in replacement for the old scheduler. Still TODO:
tests for the new functionality. This is likely to reveal problems. In
particular, I think I will need to change the locking assumptions for the
TimedWait, AddAlarm, and AddAlarmFunction callbacks to require that they hold
the scheduler lock for their entire execution (they'll be able to call scheduler
methods from within the callback as well). (jmaessen)
Fix ResponseHeaders.ParseFirstLine to accept more than one word as part of the
reason phrase. Previously, a 301 Moved Permanently showed up as 301 Moved, a
404 Not Found as 404 Not, etc. (nikhilmadan)
Added a new field, cache-invalidation timestamp, to rewrite options. (pulkitg)
with those without them by making them both go through ReleaseRewriteDriver;
this way they'll all be available in active_rewrite_drivers_ during
shutdown. This is also nice because at least one location was incorrectly
using ReleaseRewriteDriver on something that could have custom options;
luckily that code path was impossible to hit... (morlovich)
Make IsValidAndCacheable() const (possible now since
ResponseHeaders::VaryCacheable() no longer tries to ComputeCaching()). (morlovich)
Add support for keeping multiple workers in the resource manager.
This is dependent on cl/23240808 to aggregate the multiple workers into a single
Waveform. Another option that would require more work is to allow a waveform per
worker.
The big change here was that I had to change the allocation-flow in unit tests
for rewrite drivers so they could get worker assignments at an appropriate time.
This had a bit of fallout but all of it was in tests. (jmarantz)
minor logic errors spotted while trying to testcase it.
(Also, there is no need to call both .Clear() and .Reset() on ResourceCombiners
as Reset calls Clear --- see the comments...) (morlovich)
The filter ordering mistake was that we had the html-writer filter
before the js_inline filter.
The bug was that we were eliminating whitespace in a script tag when
the script-tag was in fact not rewritable due to a flush between the
opening and closing element tags.
entry-point ExecuteFlushIfRequested which must be called
externally.
This corrects a problem I had potentially introduced where
SerfUrlAsyncFetcher would -- due to an internal buffer limit in
serf -- call ParseText on 8k buffers repeatedly until EWOULDBLOCK,
and then call Flush.
So this change calls HtmlParse::ExecuteFlushIfRequested
from our Fetcher-flush rather than calling it directly from ParseText.
This ensures that we don't excessively flush when there are pending
bytes.
line numbers. This was because they'd call ::InfoHere, which depends
on the parser's position, from the rewrite thread.
Instead, add a RewriteDriver::InfoAt(RewriteContext, char*, ...),
implemented in terms of a new ResourceSlot::LocationString, which
prints location information for the input slots.
before data. It was previously calling the fetched_content_writer
prior to having the headers established, which is bad for streaming
HTML -- we want to know that it's html before we start processing it.
add a unit test for that. (rewriter_google_test was covering that,
but it's only in google3 flow, and its output is harder to read) (morlovich)
Keep track of dependencies on nested rewrites
for figuring out cache validity. Generate additional
dependencies to force input re-check if I/O fails. (morlovich)
I introduced fixing Vary: handling for async -- we don't want to
touch the headers if we might not even have a valid response.
While at it, fix inconsistency in behavior of IsVaryCacheable, which
could call ComputeCaching unlike every other method (IsCacheable in
particular); and adjust a test that was missing a ComputeCaching call
after manual header manipulation. (morlovich)
the "delete everything" signal handler run in non-main threads.
This is done by providing an ApacheThreadSystem that does some
signal masking for work threads (and doing that manually for
serf thread for now). (morlovich)
Add some re-usable DelayedFunction* template classes to bind context (0
to 3 objects) to a closure. This eliminates a whole suite of
cumbersome one-time-use classes.
In the future we can investigate doing this for Fetch callbacks and
Cache callbacks as well, but that would involve adding some templating
for Run arguments.
Scheduler with MockScheduler.
Note that this change should be non-functional -- it's just a
more intuitive (I hope) way to organize the mock time
abstraction.
Disable protocol trimming, since it causes IE8 to double-fetch all urls.
Fixes (or at least papers over)
http://code.google.com/p/modpagespeed/issues/detail?id=314. (jmaessen)
ResourceCombiner's UseAsyncFlow, which could lead to check
failures in obscure circumstances due to picking up old-style
cached results[1]. Since ResourceCombiner has a filter pointer,
just make it use the filter's asynchronous bit instead, to
avoid the problem. Added a testcase, but it's rather brittle,
unfortunately.
[1] With ResourceCombiner thinking we're sync, it was picking up
a cached result Combine, which meant a call to it
from Partition wasn't actually writing anything out, so
::Rewrite would try to do work --- but the code in THAT is
only prepared to handle single-partition things for fetches,
not multiple-partition ones. (morlovich)
Fix RewriteContextTest.CombinationRewriteWithDelay to not fail under
valgrind by explicitly setting the deadline in the RewriteContextTest
SetUp() method. (jmarantz)
It's not safe to access ::CanRewrite from ::Partition as at this point
we may already be detached; so do the checking in Render()
(which might mean we may end up throwing out our work in some unfortunate
circumstances) (morlovich)
Clean file cache in a background thread. Added Worker::IsBusy to
help test it.
Also adds a RefCountedOwner to help share SlowWorker between
ApacheRewriteDriverFactories. (morlovich)
Fixing TestCircular() in SharedCircularBufferTestBase, with multiple processes. (fangfei)
Add missing cleanup in this test (so we don't get valgrind
warnings about potential leaks in checkin tests) and adjust
comment to match reality --- this isn't really using multiple
processes at all. (morlovich)
service and mod_pagespeed. (pradnya)
Detect actual webp-capable user agents. Test handling of webp user-agents, and also do better unit testing of webp rewriting. (jmaessen)
Adding AllExceptQuery() method to GoogleUrl
Also adding AllAfterQuery, CopyAndAddQueryParam. Finally, added
unimplemented static methods. (jhoch)
Group SupportsImageInlining and NotSupportImageInlining tests. (fangfei)
and it looks like the image might possibly be convertible to webp judging solely
from its url, we request conversion to webp. This yields a resource_context
(and hence a rewritten url) distinct from any rewritten url that can be seen by
a non-webp-capable browser. This is true even if, ultimately, we actually serve
a non-webp image. Thus with the flag set we will attempt to rewrite every jpeg
twice, once for webp-capable browsers and once for non-webp-capable browsers.
That's mostly OK, but we pay the image resize cost twice for resized jpegs
rather than resizing once and recompressing the resulting bitmap using both
compression methods.
Note that the user agent matcher doesn't include webp-capable user agents yet.
That change is on its way.
http://code.google.com/p/modpagespeed/issues/detail?id=143). This is now the
recommended technique for shrinking images on the OpenCV Wiki. (The default,
bilinear interpolation, is recommended for *enlarging* images.) Note that the
results look blurrier when magnified so that you can see the pixels, but are
less subject to moire effects for image details like hair, and slanted edges
look less jaggy. As another plus, they compress slightly better.
by treating it largely as a data: URL, but skipping the
caching steps. Somewhat inefficient, however. (morlovich)
Remove these explicit Render-on-Flush calls.
Doesn't look like we'll be trying mixed async + sync
deployment, and they're making the testing less realistic. (morlovich)
Fix a crash in image rewriter in async flow when Rendering
when Partition() failed due to missing input. Add unit tests
for this in various filters. (Was covered by an integration test before). (morlovich)
to incomplete types in the header. (For some reason clang warns
about it under iwyu, but not under normal compilation). (morlovich)
Fix cached rewrite of inliner output CHECK'ing. (morlovich)
Changed the time-advancement paradigm to request time-advancement from
the main thread, but do so by queuing a task to advance it. This guarantees
that the outstanding tasks will run first.
rewrite fails during fetch. (morlovich)
Add comment on alternative mechanism for helping maintain image-search rank,
and outline the implementation of that method.
Previously FileInputResources were being arbitrarily cached for 5 minutes.
Now the files are statted each time we consider an OutputPartition. If any file's mtime has changed, the OutputPartition is invalidated.
Fixes all XFAIL tests in rewrite_context_test.cc except ones for nested resources. Those are not being dealt with yet by our OutputPartition caching scheme. CLs to come.
This probably only affects the async flow :/
This is currently limited to external CSS only (not inline), and
might only rewrite a single instance of the same URL when cold. (morlovich)
Add some additional tests for various rewrites inside CSS.
Fix the external CSS cache extender testcase to not clear the fetcher;
doing that made it rely on caching done by the inline portion to operate,
making it hard to diagnose failures independently. (morlovich)
Fix for hanging in chained + nested case --- we need to propagate
nested rewrites in order to start successors, so we can't
wait for all nested rewrites to finish to start propagating them,
as some nested rewrites may be successors. (morlovich)
Tweak the code a bit to help avoid spurious tsan warnings. (morlovich)
Rewire how we handle rewrites in asynchronous case so that we have
a place to hook in the starting of nested jobs and can run after them.
Next step will be to change image_rewriter_ in Context to be a
CssImageRewriterAsync, invoking it from Context::RewriteImages,
and having it feed rewrite contexts from other filters to
Context::RegisterNested. (morlovich)
Do more careful error-checking of libjpeg results. Most of these shouldn't fail according to the docs (they should only do that if we temporarily stop providing input, which we don't do), but pagespeed was seeing related failures and there's no actual harm in coding this more conservatively. (jmaessen)
use in single-threaded tests, which is most of them. A
MockTimer will be constructed with one of those but a proper
mutex can be established after construction.
Move the ownership of the MockTimer out of the MemFileSystem (where
IMO it never belonged in the first place).
Add Worker::ShutDown and use it to avoid idle callbacks potentially
referring to already deleted RewriteDrivers (likely cause of the
sporadic tap failure on image test). (morlovich)
condition variables in mock time.
Two mechanisms are provided to work with mock time, though only one is used at
this point.
Method #1 is to use a new idle-callback provided by the Worker class. Use that
to signal the condition-variable and allow the TimedWait in
RewriteDriver::Render to return.
Method #2, not in use yet, is to allow the various phases of RewriteContext and
MockUrlFetcher to advance time, and use new MockTimer alarms to signal the
condition-variable to complete the TimedWait as well.
the filter type from CommonFilter to RewriteFilter. This
thins the interface a little because the filter_prefix is
available as RewriteFilter::id(). There are no filters
suitable for combining that are CommonFilters but not
RewriteFilters.
RewriteFilter. What I'd found was that every new
RewriteFilter that I wrote had to have the exact same fetch
method which has to pass a number of arguments through to
RewriteContext::Fetch. It's easier just to provide a
mechanism to generate a RewriteContext.
- deferring slot update till Render(), which is run in
the HTML thread.
- inserting RewriteContext* into initiated_rewrites_ *before*
initializing it
- mutex-protecting the wait_fetcher, and putting it in
pass-through mode in
ResourceManagerTestBase::CallFetcherCallbacks, so that
chained rewrites work.
[Thu Jun 02 11:33:39 2011] [warn] [mod_pagespeed 0.9.0.0-738 @18408] Invalid filter name in ModPagespeedFilters: extend_cache%5C
We don't want to escape arguments to fetch_until/fetch_fail since they get quoted already. So instead quote them in the
places we do need them --- direct wget invocation. (morlovich)
Move mem_debug.cc to util -- it's not apache-specific -- it's totally generic to C++.
Move mem_clean_up.cc to rewriter -- it's not apache-specific, it
applies to Pagespeed Automatic as well.
Add testing infrastructure for OpenSuse (morlovich)
RewriteContext is done until its successors have been
initiated.
Adjust deadlines for valgrind runs and debug builds so that
our tests tend to pass. Further adjustments may be needed. E.g.
when running under a real debugger we may want to extend them
way out.
eliminates the need to synchronize within a RewriteContext.
Instead we give synchronization responsibility to the RewriteDriver.
The usage of the task-queueing has three benefits:
1. Rewrites happen in a separate thread that can continue after
HTML has been flushed.
2. The mechanism to queue pending tasks offers a much more
debuggable implementation, at least in testing, than
when we were doing all the rewrite activity in mock
callbacks, which resulted in very deep call-trees.
3. The memory management of RewriteContexts themselves is
now a strict ownership model where the RewriteDriver
stays alive at least as long as the longest-running Rewrite,
and deletes all the Rewrites on Cleanup.
per style guidelines, adding accessors at a few levels of abstraction.
Factored out custom code in a few tests that set up counters and
delayed fetchers; the class supports that with class methods now.
Still at issue: the way that RewriteDrivers are externally managed; this
CL does not attempt to change that.
Move Validate methods out-of-line. Virtualize ParseUrl method. Add
'WaitForCompletion' method in RewriteDriver in anticipation of a CL
that implements it, and call it from the test infrastructure to
facilitate testing. (jmarantz)
Lower the # of iterations of threadsafe cache test when running under
valgrind, as the test can otherwise easily take > 10 minutes. Use the
logic from dynamic_annotations.h to detect that case (adding a forwarding
header for it). (morlovich)
private variables in ResourceManagerTestBase. Remove no-longer-needed
infrastructure for making 2 specific mutexes in factories. Use a
factory for instantiating the ThreadSystem.
class has admittedly gotten too complex. It needs to be simplified in various
ways, such as:
- moving all the state held in boolean variables to a
State enum
- providing some Print() scaffolding to clearly see
what's going on in the debugger
- reducing the amount of recursion through callbacks and
use a scheduler to queue up actions on the RewriteContext
in response to Fetcher, Cache, and NestedContext callbacks.
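The first proposed simplification, collapsing a pile of boolean flags into a single State enum with a Print() hook, can be sketched like this. The state names and transition rule are invented for illustration; the real RewriteContext states differ.

```python
# Sketch of replacing independent boolean state variables with one
# State enum: a single transition point is far easier to follow in a
# debugger than booleans set from many callbacks.
from enum import Enum


class State(Enum):
    INITIAL = 1
    FETCHING = 2
    REWRITING = 3
    DONE = 4


class RewriteContextSketch:
    def __init__(self):
        self.state = State.INITIAL

    def advance(self, new_state):
        # One choke point for all transitions; forward-only by construction.
        assert new_state.value >= self.state.value, "no backward transitions"
        self.state = new_state

    def Print(self):
        # Scaffolding to see at a glance what is going on in the debugger.
        return f"RewriteContext<{self.state.name}>"
```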
Document that HtmlElement::AttributeValue("foo") returns NULL if either:
* no attribute "foo" exists or
* attribute foo has no value.
And touch up various things along the way while checking that that assumption was not missed.
In addition, I fixed a crash that would occur if you had a binary (value-less) attribute and called IntAttributeValue() on it.
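The documented contract and the crash fix can be modeled in a few lines. This is a minimal sketch mirroring the described behavior, not the real HtmlElement API: attribute lookup returns nothing both when the attribute is absent and when it is value-less, and integer conversion guards against the value-less case instead of crashing.

```python
# Sketch of the AttributeValue()/IntAttributeValue() contract described
# above. A binary (value-less) attribute is stored with value None.

class HtmlElementSketch:
    def __init__(self, attrs):
        # attrs maps name -> value; a binary attribute maps to None.
        self._attrs = attrs

    def attribute_value(self, name):
        # None if the attribute is absent OR has no value.
        return self._attrs.get(name)

    def int_attribute_value(self, name):
        value = self.attribute_value(name)
        if value is None:
            return None   # guard against the crash on binary attributes
        try:
            return int(value)
        except ValueError:
            return None
```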
Add script for generating bot-list table and sort bots (fangfei)
Tweak code for detecting XHTML (mdsteele)
Tweak slurping code to allow testing of domain sharding
between config check and actual startup, so we have to clear the statistics
object manually.
Fix CHECK "var != NULL. Variable not found: serf_fetch_request_count" when
no factory has us enabled by default, but statistics are on and we are enabled
via vhosts or query param, by making sure to initialize statistics if they are on
even if MPS itself is off...
functor to transform URLs. Use that to implement the URL absolutifier
without changing the interface to it, so we don't need to update
call-sites. The test now fully covers the separate transformer
and absolutifier.
version of gcc which is gcc 4.4. Other versions of gcc should not
get errors promoted. (morlovich)
Re-interpret StringPieceToVector's delims arg as a set, rather than a
sequence, adding testcases to support that. All our current usages were
single-character so this change has no current effect. (fangfei)
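The set-of-delimiters semantics can be illustrated with a small stand-in function (the name is hypothetical; it is not the real StringPieceToVector signature): every character in the delimiter string splits individually, rather than the whole string acting as one separator.

```python
# Sketch of interpreting the delims argument as a set of single-character
# delimiters, as described above.

def split_on_any(text, delims, omit_empty=True):
    pieces, current = [], []
    for ch in text:
        if ch in delims:
            if current or not omit_empty:
                pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current or not omit_empty:
        pieces.append("".join(current))
    return pieces
```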
- Some API cleanups (jmarantz)
- Testing code for experimental async rewrite model paths (jmarantz)
- updated some library signatures for new pagespeed version's minifier.
- Thread APIs (morlovich)
- Thread safety fix in Serf fetcher (morlovich)
Adds RetainComments directive, as a user-visible option, allowing users to
turn on the 'remove_comments' directive but keep intact comments that match
any of a set of wildcards.
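The interaction between remove_comments and a RetainComments-style whitelist can be sketched as follows: comments matching any retain wildcard survive, the rest are stripped. Using shell-style `*`/`?` wildcards via `fnmatch` is an assumption about the wildcard dialect; the function name is invented.

```python
# Sketch of wildcard-based comment retention as described above.
from fnmatch import fnmatch


def filter_comments(comments, retain_patterns):
    # Keep a comment if it matches ANY retain wildcard; drop the rest,
    # as remove_comments would.
    return [c for c in comments
            if any(fnmatch(c, p) for p in retain_patterns)]
```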
can create resources after the RewriteDriver is removed.
Notes that we still leave RewriteDriver entry-points which make it
simpler to call at existing call-sites, because it knows about the
RewriteOptions.
Move ReadAsync back to ResourceManager (where it was born) and let it
use the async cache interface, rather than the blocking wrapper.
- Header file cleanups (jmarantz, jmaessen)
- Cleaner statistics init sequence (jmarantz)
- Avoid need for conditionals on statistics use by supplying a
dummy object when real ones aren't available (jmarantz).
More thorough tests for timer_based_abstract_lock. The obvious properties were mostly being tested in subclasses, but Maks had run into issues with early / incorrect termination of the blocking methods. Those are particularly tricky to test, and the code here is a bit delicate: we have to stand up a second thread, and have the main testing thread test that the second thread is blocking in the way we expect, then kill and clean up that second thread. (jmaessen)
1) The server was handling multiple domains
2) It got two consecutive resource fetches on the same process
3) They were both cache misses.
In such a case, the second request was erroneously denied by domain checks.
necessary but not sufficient. There are still promoted compilation
warnings in the css parser which need to be cleaned up.
The most basic change here is to upgrade the Chromium revision to
one that has a working Mac 'gclient' implementation.
- Don't crash in statistics when attach in child process fails.
- Change how we do shared memory (using a shared anonymous mapping passed down
to children) to avoid the permission problem in the first place.
lower-level value cache interface rather than the higher-level http one.
This removes the overhead of packaging things up as http messages
(which was also potentially confusing), and will make it possible to
use a different backend cache for it and http cache in the future.
Also add a forgotten file from the testcases from 597.
claims to be XHTML but really isn't still gets proper DOM created for
it. The xhtml-sensitivity is still applied on filters such as
elide_attributes and remove_quotes that need to avoid breaking
xhtml-compliant pages.
Fixes Issue 252.
concept of on-the-fly resources, and taking advantage of that. Now that
we have more than 2 kinds of resources, actually store the kind in
OutputResource rather than a bool.
Add base_url validation to CreateInputResource*
- No data needs to be stored in the encoder instance
- The encoders read/write a protobuf 'ResourceContext'
- The escaper is not part of the encoder hierarchy but is simply
used by it
- The multipart encoder is part of the encoder hierarchy
Separate the 'encoder' implementation from the encoded data, passing
explicit 'data' objects through the RewriteSingleResource call-stack
as needed.
Collateral Damage:
Made SimpleStats statistics_ available to all ResourceManagerTestBase
subtests.
Changed the initialization order in RewriteDriver so that
the filters needed to decode rewritten resources are available
after ResourceManagerTestBase::SetUp().
Moved some bool option shadow fields out of img_rewrite_filter
and instead get those option-values directly from the options
as needed. This was required due to
Removed RewriteFilter::CreateOutputResourceFromResource which
was tested but not otherwise used.
Remove excessive info output from url_left_trim_filter. (nforman)
Fix a rare crash in CssCombiner. (morlovich)
Permit timed-out serf requests to continue to spew data. (jmaessen)
a future optimization)
- Cleanup the API to RewriteSingleResourceFilter a bit, so that
the CSS filter can talk to image filter without knowing about its
encoder usage.
via a VirtualHost proxy (async)
Clean/optimize up SymbolTable and HtmlName. Allocate SymbolTable
memory in its own pool. We don't use 'arena' because we don't need
to call destructors for char*. Stop interning keywords expressed
in their canonical (lower-cased) form as we have a perma-string
available from html_name.gperf. This will reduce the amount of
hashing and allocation done on many HTML files.
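A toy model of the interning behavior described above: each distinct string is stored once and repeated lookups return the same stored object, which is what cuts hashing and allocation churn. The real SymbolTable pools raw char* storage; this only models the interning contract.

```python
# Minimal sketch of string interning as done by a SymbolTable.

class SymbolTable:
    def __init__(self):
        self._interned = {}

    def intern(self, s):
        # Return the canonical copy if we have seen this string before;
        # otherwise store this one as the canonical copy.
        existing = self._interned.get(s)
        if existing is not None:
            return existing
        self._interned[s] = s
        return s
```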
eliminating the need for each filter to initialize its own member variables
for Atoms. This also makes it cheaper to construct the filter-chains and
the lexer itself, and achieves case-insensitivity for our rewriting filters
while allowing our data representation to retain the case-sensitive text
so that we can make case-folding an option.
This is phase 3 of a multi-phase fix of an issue where we case-fold
case-sensitive XML that's masquerading as HTML by lying about its
content-type.
Phase 1: Add HtmlName construct with tests (done: CL 19353690)
Phase 2: Change symbol-table to case-sensitive, but for
the short term, lower-case while lexing: CL 19343411
Phase 3: Change the rewriters to use keywords rather than enums.
This requires changes in Page Speed and Ads as well.
Phase 4: Stop lower-casing while lexing, and eliminate this method.
Phase 5: Turn case-folding off in Writer, fixing issues with flash sites
Phase 6: Better content-detection of XML.
we can update Page Speed's version of instaweb without complicating
its build flow. Note that we still employ dense_hash_map in the
CSS parser, which Page Speed does not depend on.
can both keep the case-preserved name, and do fast word
comparisons for tag matching. We still case-fold on output
so this should have essentially no functional difference.
Moved the keyword comparison from being atom-based to being enum-based,
eliminating the need for each filter to initialize its own member variables
for Atoms. This also makes it cheaper to construct the filter-chains and
the lexer itself.
Use condition variables for thread synchronization in serf_url_async_fetcher.
This makes worker threads notably more responsive. Also uses thread join rather
than multi-thread unlock/lock for termination detection (should work reliably
everywhere), and fixes subtle logic & locking bugs during termination that
caused the main thread to shut down prematurely when fetches were in transit to
the async thread. (jmaessen)
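The termination pattern described above can be sketched like this: the worker blocks on a condition variable, and the main thread signals shutdown and then uses a thread join (rather than a cross-thread unlock/lock handoff) to detect that the worker has really exited. Class and method names are illustrative.

```python
# Sketch of condition-variable wakeup plus join-based termination
# detection, per the change described above.
import threading


class Worker:
    def __init__(self):
        self._cv = threading.Condition()
        self._shutdown = False
        self._thread = threading.Thread(target=self._run)
        self.finished = False

    def start(self):
        self._thread.start()

    def _run(self):
        with self._cv:
            while not self._shutdown:
                self._cv.wait()   # responsive wakeup on notify
        self.finished = True

    def stop(self):
        with self._cv:
            self._shutdown = True
            self._cv.notify_all()
        # Join is a reliable termination signal on every platform,
        # unlike multi-thread unlock/lock tricks.
        self._thread.join()
```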
locale issues. Also can use StringPiece which is advantageous when
source is a std::string.
Clean up gperf usage a little. Make an explicit name for the
generated table to reduce the non-obvious magic in the C++ section of
our gperf file.
- Fix bug with some malformed image width/height attributes
- Pool data structure for some upcoming refactors (jmaessen)
- Some tweaks to rewrite result metadata cache APIs
In particular, don't put in the same image more than once with two
different resolutions, which causes mod_pagespeed to make page-load
time worse. And do include a png that can be losslessly re-encoded.
of the origin domain. Admins will often (at our recommendation)
origin-map their internet domain to 'localhost' so that their apache
server will serve their own resource requests via loopback rather than
going out to the load-balancer and back in to another server.
But we shouldn't allow an HTML reference of 'localhost' -- or any
other origin resource -- to sneak through and be evaluated on the
Apache server.
- Fix bug 192, mishandling of URLs with trailing things, by
404ing those with junk and accepting (but not mis-caching) those with
queries (morlovich)
- Work on infrastructure for partitioning (jmarantz)
Issue 176: Relax 250 character restriction on URL segment size by
overriding Apache httpd map_to_storage hook. This can be overridden
in the config.
Issue 161: Allow config file prevention of combining files across paths.
Don't use an overly expensive data structure here, as it slows down parsing considerably; avoiding it is a ~15-18% speedup on parsing alone, and an overall ~4% speedup on the hot path.
once, I thought it best to build just enough so that folks could comment on the
design and testing before I proceeded. Hopefully this CL is fairly manageable;
it can go in on its own (though nothing presently depends on any of the code).
not touch it; previously the image rewriter used to just store a 304, and then
notice the problem when fetching the image. Now we just mark it inside an
rname/ entry, without writing out a bogus content entry pointlessly.
Make the CSS filter take advantage of this, and test that. Also, make the tests
somewhat less bool-heavy.
RequestHeaders.
Note that the parsing functionality is now split into a separate class,
ResponseHeaderParser, that needs to be instantiated explicitly from
any class that needs to parse HTTP streams.
This does not switch to using protobufs for serialization yet -
that's a cache-busting move that requires us to do something specific
to ensure that we don't try to read the old-format files.
Move more functionality from meta_data.cc to response_headers.cc and
request_headers.cc.
Get rid of the now-obsolete simple_meta_data[_test].(cc|h)
mod_headers from corrupting our caching headers. Rather than
depending on a late-running output-filter for fixing up our headers,
instead try to remove mod_headers and mod_expires from the filter
chain via a late-running add-filter hook.
I no longer depend on string literals in mod_rewrite.c but have
taken matters into my own hands by squirreling away the original
URL in request->notes, which provides me a string-map.
So now I let mod_rewrite do whatever it wants, and instead examine
the original URL only once, before mod_rewrite. Note that the user
can, with a bogus rewrite rule, completely screw up mod_pagespeed
now, because the authorization module will work with the output
of mod_rewrite. In fact this was happening in my test because I
was rewriting to a URL that did not start with a /.
Adds in a new testcase to explicitly validate a claim made on the discussion
list -- that we can rewrite CSS files across domains that have been mapped
together via ModPagespeedMapRewriteDomain. Note that no functional changes were
needed, but I did a minor refactor in the test to pull together all the
'yellow' and 'blue' css text into char[] constants.
Fix a corner-case combining CSS files from different paths where a file
that changes the resolved base needs to be backed out because it makes the
URL too big. This was resulting in an extra level of hierarchy in the
encoded names, and could create functional issues on multi-server setups
where rewritten URL decoding is needed for functionality.
Make ScriptTagScanner understand various attributes that determine
whether something is JavaScript or not, whether it should run, and to
decode the various non-blocking execution modes.
Port existing filters to use ScriptTagScanner rather than ad-hoc JS
detection code that was different in all of them, and add testcases for
instances where the previous checks failed.
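Classification along the lines ScriptTagScanner performs can be sketched as follows: the type attribute decides whether the element is JavaScript at all, and async/defer select a non-blocking execution mode. The MIME values come from common practice for `<script>` elements; the function name and return shape are invented.

```python
# Sketch of script-tag classification: is it JavaScript, and which
# execution mode applies?

JS_MIME_TYPES = {"text/javascript", "application/javascript",
                 "application/x-javascript"}


def classify_script(attrs):
    # attrs maps attribute name -> value; value-less attributes map to None.
    # A missing or empty type attribute is treated as JavaScript.
    mime = (attrs.get("type") or "text/javascript").strip().lower()
    if mime not in JS_MIME_TYPES:
        return ("not-js", None)
    if "async" in attrs:
        return ("js", "async")
    if "defer" in attrs:
        return ("js", "defer")
    return ("js", "blocking")
```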
would make the URL too large, and would also change its 'resolved
base'. The change to the resolved base was not properly backed out,
leaving the extra level of hierarchy in the path.
Forcibly remove caching and last-modified headers on rewritten HTML files,
as we have cache-extended all the resources on the HTML files, so it's
dangerous to cache the HTML files. They should be rewritten each time
to give a chance for changed origin resources to be rewritten in.
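The header scrub described above amounts to dropping the caching-related response headers from rewritten HTML and replacing them with a no-cache policy. Header names are standard HTTP; the function itself and the exact replacement value are a hypothetical illustration.

```python
# Sketch of forcibly removing caching and last-modified headers from
# rewritten HTML, since its cache-extended resource URLs must be
# re-derived on every page load.

UNCACHEABLE_HTML_HEADERS = {"cache-control", "expires", "last-modified"}


def scrub_html_caching_headers(headers):
    kept = {k: v for k, v in headers.items()
            if k.lower() not in UNCACHEABLE_HTML_HEADERS}
    kept["Cache-Control"] = "no-cache, max-age=0"
    return kept
```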
Fixed race-condition in async serf fetching that caused rare crashes
and server-load issues with URLs that time out. This primarily happened
with the browser-proxy pointing to the Apache server and the mod_pagespeed
slurp mode turned on, but the URL being requested was actually on the
Apache server.
Add several new statistics for measuring cache activity, rewriting overhead,
and option-parsing overhead (for .htaccess).
Add locking of resources during rewriting, so that multiple processes don't
try to optimize the same css, js, or images at the same time.
Add more system-testing for caching headers.
Add memory-leak checking for the open-source unit tests.
Fix leaks both in the serf library (as a patch)
and in our serf fetcher.
Change the fix_third_party_memory_leaks script to /bin/sh.
Refactor the check_for_leaks script to generate more concise and
timely output.
Stop destroying the AprStatistics pool -- it destroys the global mutexes
when they must still remain alive.
test for memory leaks in live Apache.
Add directive to disable statistics:
ModPagespeedStatistics off
Fix inline Javascript for XHTML by adding CDATA tags.
Reduce severity of 'missing body tag' in add_head from Error to Info.
by changing the missing body message to be an information notice
and not an error, as requested.
The bulk of the change is the testing infrastructure to verify this
(and similar changes)
Fixes combining options.
Fixes bug parsing content type with char-set.
Updates example .htaccess file with some of the entries you can put in .htaccess.
ModPagespeedAllow wildcard_expr
ModPagespeedDisallow wildcard_expr
These are usable from the conf file or htaccess.
Fix CentOS 5.4 compile problem.
Fixes compatibility between elide_attributes and xhtml.
Add support for legacy URLs so we can continue to serve any old ones
that were captured somehow (e.g. a search engine or browser bookmark).
When rewriting resources, encode ? and & embedded in origin URLs to
avoid having the new URL misinterpreted as having query params.
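The escaping step can be sketched with a minimal encoder. The exact escape alphabet mod_pagespeed uses differs; this only illustrates the idea that `?` and `&` in the origin URL must not survive verbatim into the rewritten URL.

```python
# Sketch of encoding '?' and '&' embedded in origin URLs so the
# rewritten URL is not misread as carrying query parameters.
# '%' is escaped first so the encoding stays reversible.

def encode_for_rewritten_url(origin_leaf):
    return (origin_leaf.replace("%", "%25")
                       .replace("?", "%3F")
                       .replace("&", "%26"))
```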
Add robustness and breadth to content-type parsing to improve the
reach of cache extension.
When fetching URLs, time them out and cancel them if they hang.
Remove obsolete dependence on protobufs and remove an implemented
but unused string buffer class.
Change format of generated URLs to lead with the original URL
with minimal encoding, and then follow with ".pagespeed." then
other information required for decoding and cache extension.
make us better internet citizens, and allows us to fix a case of Issue 85
where we were rewriting our own documents.
Also begins to plumb through the referer, but that is not working yet.
When a URL fetch fails, wait 5 minutes before trying again.
Add lock expiration based on timestamp to avoid failing resource loads due to stale locks
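Timestamp-based lock expiry boils down to a staleness check: a lock older than its timeout is treated as abandoned and may be stolen, so a crashed worker cannot block resource loads forever. The timeout value below is illustrative, not mod_pagespeed's actual setting.

```python
# Sketch of lock expiration based on timestamp, per the change above.

STALE_LOCK_TIMEOUT_MS = 30_000   # illustrative timeout


def lock_is_stale(lock_timestamp_ms, now_ms,
                  timeout_ms=STALE_LOCK_TIMEOUT_MS):
    # A holder that has not refreshed the lock within the timeout is
    # presumed dead; the lock may be taken over.
    return now_ms - lock_timestamp_ms > timeout_ms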
Avoid optimizing more than 8 images (by default) at the same time.
Scan first bytes in output filter to see if incoming bytes actually
look like HTML, and pass the bytes directly to the next filter if
they don't. This should reduce the wasted CPU trying to rewrite HTML
when the source is actually gzipped or a mis-typed image.
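The sniffing step can be sketched with a couple of byte checks: gzip streams start with a well-known magic number, and HTML (or XML) starts with `<` after optional whitespace. The heuristics here are illustrative; the real filter's checks are more thorough.

```python
# Sketch of first-bytes sniffing: pass non-HTML content through to the
# next filter untouched instead of wasting CPU trying to rewrite it.

def looks_like_html(first_bytes):
    if first_bytes[:2] == b"\x1f\x8b":   # gzip magic number
        return False
    stripped = first_bytes.lstrip()
    # Tags, a doctype, or an <?xml declaration all begin with '<'.
    return stripped[:1] == b"<"
```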
Avoid rewriting resources with empty name e.g. <script src="">.
Add stress test script.
Avoid warning on <?xml ...>
Add script install_apxs.sh to assist users installing against custom-built Apache
already-rewritten resource). Now with tests (the only difference to last time
is resource_manager_test). I also wasted quite a bit of time with a bum Apache
config before verifying that it does appear to resolve
http://code.google.com/p/modpagespeed/issues/detail?id=3 .
on the Issues page:
Issue #30: Very slow memory leak
Issue #18: Apache log level is ignored
Issue #11: After installing module server becomes unusable
Issue #10: A whole lot of HTTPD processes
Issue #5: Incorrectly handling not-quoted font-family names with spaces
Issue #5 was resolved by removing "rewrite_css" from the Core filters
set until we fix the problem properly. If you are not using font-family
names with spaces you may be able to re-enable the filter manually.
filter filename encoding scheme. Also pulled image_dim into a separate class,
cleaning up testing a bit in the process. Finally, fixed a rare crash bug in
img_rewrite_filter due to a mis-ordered callback.
on cache_url_async_fetcher to do it for us. This eliminates some extra
string copying. Update rewriter to accommodate change to nested cache
ownership model.
Add write-through cache implementation, and change the ownership model
so when we compose caches, the outer cache owns the inner one.
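The write-through composition with that ownership model can be sketched as follows: reads consult the outer (fast) cache first, misses fall through to the inner cache and are promoted, and writes go to both. Class names are illustrative, not the real cache interfaces.

```python
# Sketch of a write-through cache where the outer cache owns the inner one.

class DictCache:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class WriteThroughCache:
    def __init__(self, outer, inner):
        # The composed cache takes ownership of both layers; the outer
        # layer conceptually owns the inner one.
        self.outer, self.inner = outer, inner

    def get(self, key):
        value = self.outer.get(key)
        if value is None:
            value = self.inner.get(key)
            if value is not None:
                self.outer.put(key, value)   # promote on inner-cache hit
        return value

    def put(self, key, value):
        self.outer.put(key, value)           # write through both layers
        self.inner.put(key, value)
```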
`mod_pagespeed` is an open-source Apache module created by Google to help Make the Web Faster by rewriting web pages to reduce latency and bandwidth.
To see ngx_pagespeed in action, with example pages for each of the
optimizations, see our <a href="http://ngxpagespeed.com">demonstration site</a>.
mod_pagespeed releases are available as [precompiled linux packages](https://modpagespeed.com/doc/download) or as [source](https://modpagespeed.com/doc/build_mod_pagespeed_from_source). (See [Release Notes](https://modpagespeed.com/doc/release_notes) for information about bugs fixed)
## How to build
mod_pagespeed is an open-source Apache module which automatically applies web performance best practices to pages and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.
mod_pagespeed is built on the PageSpeed Optimization Libraries, deployed across 100,000+ websites, and provided by popular hosting and CDN providers such as DreamHost, GoDaddy, EdgeCast, and others. There are 40+ available optimization filters, which include:
- Image optimization, compression, and resizing
- CSS & JavaScript concatenation, minification, and inlining
- Cache extension, domain sharding, and domain rewriting
- Deferred loading of JavaScript and image resources
## How to use
Curious to learn more about mod_pagespeed? Check out our GDL episode below, which covers the history of the project, an architectural overview of how mod_pagespeed works under the hood, and a number of operational tips and best practices for deploying mod_pagespeed.
# cp: cannot stat ‘/tmp/instaweb.vRV047/mod_pagespeed-test-jmarantz/install/net/instaweb/genfiles/conf/pagespeed_libraries.conf’: No such file or directory
# TODO(jefftk): this also depends on prepare_release.sh and install-glucid.sh,