Friday, July 25, 2008
Amazon S3 Fast downloads to EC2 using Curl
Our company is building an application that depends heavily on Amazon's cloud web services: Simple Storage Service (S3), Elastic Compute Cloud (EC2), and Simple Queue Service (SQS). We are using Java for a lot of the business logic and Groovy for the 'glue code', interacting with frameworks, etc.
Anyway, I've spent some time tuning downloads and found that shelling out to 'curl' gets me roughly an order of magnitude faster download times than using jets3t or the lower-level HttpClient. Note that this speed-up only occurs when moving an S3 object to an EC2 instance, not when moving it outside the cloud (to my laptop, for instance).
For some reason uploads using jets3t are very fast. Our guess at this point is that HttpClient (which jets3t depends upon) is causing the slowdown because it either can't deal with, or hasn't been configured to deal with, the extra-large packet sizes that AWS allows within its cloud.
Being a developer on a schedule, I punted and shelled out to curl, using a signed URL that jets3t provides for my S3 object GET.
Here is the pseudo-code in Groovy (would be trivial to convert to plain Java) for the shell-out:
S3Service s3service = ... // Inject an instance of JetS3t S3Service
File file = ... // File representing download location on disk
S3Bucket bucket = ... // Bucket object (could just be a string)
String key = ... // Key of the S3 object to download
long low = ... // First byte of the optional range (inclusive)
long hi = ... // Last byte of the optional range (inclusive)

// Sign a GET URL that expires 60 seconds from now.
Date date = s3service.getCurrentTimeWithOffset()
long secondsSinceEpoch = (date.time / 1000) + 60L
String url = S3Service.createSignedUrl('GET', bucket.name, key, null, null, s3service.AWSCredentials, secondsSinceEpoch, false, false)

def cmd = ['curl']
// I break up large downloads so here is an optional byte range.
cmd += ['--range', "${low}-${hi}"]
cmd += ['--show-error']
cmd += ['--connect-timeout', '30']
cmd += ['--retry', '5']
cmd += ['--output', file.absolutePath]
cmd += [url]

Process p = cmd.execute()
// Drain stdout/stderr so curl cannot block on a full pipe buffer.
p.consumeProcessOutput(System.out, System.err)
int exitCode = p.waitFor()
if (exitCode != 0) {
    throw new IllegalStateException("Curl process exited with error code ${exitCode}")
}
LOG.info("${file.name} download completed")
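The optional byte range in the shell-out comes from splitting a large object into chunks. How the chunks are chosen is not shown in this post, so the following is only a minimal sketch in plain Java, assuming the object size is already known (for example from the object's details). The class name `ByteRanges` and the `chunkBytes` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits an object of known size into inclusive byte ranges for curl's --range option. */
final class ByteRanges {

    /**
     * @param totalBytes size of the object in bytes
     * @param chunkBytes maximum size of each range (hypothetical tuning knob)
     * @return ranges formatted as "low-hi", with both ends inclusive
     */
    static List<String> split(long totalBytes, long chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and chunkBytes > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long low = 0; low < totalBytes; low += chunkBytes) {
            long hi = Math.min(low + chunkBytes, totalBytes) - 1; // inclusive upper bound
            ranges.add(low + "-" + hi);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 10-byte object split into chunks of at most 4 bytes.
        System.out.println(split(10, 4)); // prints [0-3, 4-7, 8-9]
    }
}
```

Each range would presumably be downloaded to its own output file and the pieces concatenated in order afterwards; that reassembly step is an assumption and is not covered here.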
One final note: the shell-out will catch a curl process error, but not many of the errors that you could experience when working with S3. For example, if the key does not exist, the curl process succeeds but the downloaded file contains Amazon's XML error response instead of the intended file. So it is your responsibility to first call s3service.getObjectDetails(..) to make sure the object exists, and then to check the downloaded content length (and possibly content type) to ensure that you received your object and not an error.
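The length check described above could look something like the following. This is a hedged sketch in plain Java rather than the code we actually run: `DownloadCheck`, `verify` and `startsLikeS3Error` are hypothetical names, and the expected length is assumed to come from the object details fetched beforehand (or from `hi - low + 1` for a ranged download). The error detection is only a heuristic that looks for the `<Error>` root element S3 uses in its XML error bodies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Post-download sanity check for a file fetched with curl (hypothetical helper). */
final class DownloadCheck {

    /** Throws if the file on disk is not exactly expectedBytes long. */
    static void verify(Path file, long expectedBytes) throws IOException {
        long actual = Files.size(file);
        if (actual == expectedBytes) {
            return;
        }
        String reason = startsLikeS3Error(file)
                ? "file appears to hold an S3 XML error response"
                : "length mismatch";
        throw new IOException(file + ": expected " + expectedBytes
                + " bytes but found " + actual + " (" + reason + ")");
    }

    /** Heuristic: looks for the <Error> root element in the first 256 bytes. */
    static boolean startsLikeS3Error(Path file) throws IOException {
        byte[] head = new byte[256];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(head, 0, head.length); // requires Java 9 or later
        }
        return new String(head, 0, n, StandardCharsets.ISO_8859_1).contains("<Error>");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("s3-download", ".bin");
        Files.write(file, new byte[] {1, 2, 3});
        verify(file, 3); // passes silently because the sizes match
        Files.delete(file);
    }
}
```

Checking the size first keeps the common case cheap; the file is only opened when the size is wrong, to give a more useful error message.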
Apple Time Capsule Disk Naming Breaks Backups
It took me a while to figure this one out, so I'll blog it here to help raise awareness.
I bought an Apple Time Capsule to add to my home network. I loved the idea of wireless backups using the cool Time Machine software. Anyway, I kept having issues where I could see the Time Capsule's disk in the Finder but the backups would fail with a warning about not being able to mount the disk.
There are a lot of long posts out on the web, but here are two things I tried:
1: Some people were having problems when the Time Capsule disk name was long or contained odd characters
It appears that the shared disk name (not the wireless network name) needs to be fairly short, roughly 25 characters or fewer. This is unfortunate because Apple software tends to default to a long name when you first set up a Time Capsule (e.g. Joe Owner's Time Capsule).
Link to original thread on issue #1
-- note: this alone did not fix the issue for me but still seemed like a good idea
2: Delete the last entries in the sparse bundle
This one seemed to do the trick: it got my Time Machine process past the 'processing' stage and into the actual backup (now at 1.2 GB out of 2.2 GB for the current job). The good thing is that all my old backups (except the one day I deleted) still exist.
Here is a quote and a link to the thread that helped:
I just had the same problem. That error can be caused when Time Machine is attempting to hard link to a previously corrupted backup.
Try this:
1) Connect to the shared drive from your Mac
2) Mount the .sparsebundle on your Mac
3) Inside the .sparsebundle, expand the folder "Backups.backupdb"
4) Sort the contents by date
5) Delete the link "Latest" and also the most recent incremental backup folder (e.g. "2008-06-18-075750")
6) Unmount the .sparsebundle
7) Go into Time Machine Preferences, choose "Change Disk..."
8) Select the shared drive (not the sparsebundle)
9) Click Start Backup. After "Preparing" for a little while, it starts backing up again.
10) Click Enter Time Machine when it's done backing up. All of your old backups should be visible (whew!)
Link to original thread on issue #2
Of course, the very first thing you should do is make sure you have the latest OS updates on the machines you are trying to back up. You should also make sure the firmware on your Time Capsule is up to date.
One final detail is that I also have an Airport Express attached to the network to extend the range of wireless reception and to share our printer. I'm not sure whether this contributed to the issue (probably not).
Monday, May 19, 2008
Beyond Compare for OS X?
Many developers moving from Windows to OS X will find that one valued application, "Beyond Compare", has no Mac equivalent.
Recently a new visual diff application has come out that works very well (for a 1.0). It integrates nicely with TextMate and the OS X terminal. While not as feature-complete and polished as Beyond Compare, it does give you the meat & potatoes of what you need: folder and file diffing.
I encourage you to try it out, but remember this is 1.0 software. For example, see my screenshot of a folder diff. Notice how the file and folder names are abbreviated even when there is ample room for display.
Friday, May 2, 2008
IntelliJ IDEA, OS X Leopard Spaces, Fixed with Java SE 6
Looks like the Java 1.6 update finally fixes the Java Swing/Spaces issues that have been so annoying for the last 10 months or so. Yay.
Monday, March 31, 2008
IntelliJ IDEA, OS X Leopard Spaces, pretty good solution
As of this writing some Java-based apps still don't work well with the Spaces feature in OS X 10.5.x (Leopard). Specifically, if you are in a different space than the one where the IDEA window currently resides, command-tabbing to IDEA won't switch you to IDEA's space. Another side effect is that clicking the IDEA icon in the Dock won't bring you to the correct space either.
I noticed a while back that if you had two IDEA projects open in the same space then the problem was solved: Spaces would then work properly with IDEA. The downside was the pain-in-the-butt factor of having a blank second project open (extra memory and thread resources).
I also noticed that having any of IDEA's modal dialog windows open (like preferences) solved the problem. Today it finally dawned on me to try IDEA's help window and, yes, it worked!
Solution
Simply open up the IDEA help window and put it behind your working project window. It won't work if you minimize the help window - it has to be open. Works great and doesn't suck resources.
Another solution that works, if you don't mind it, is to float one of the windows (e.g. the debug window). This works well if you have two monitors. Simply click the "float" button on any of the windows and move it out of the way.
Let me know if you have any other ideas or if there is a permanent solution available.