Embedded metadata in an image file, such as EXIF and IPTC information, tells us a lot about a photograph and the context in which it was created: when the shot was taken, with which device, at what location, what the light was like, … That is useful, if not essential, information if you’re trying to automate the creation of the story and context around, let’s say, a holiday. Sadly (and if you ask me, almost criminally), social media services such as Twitter, Facebook, Instagram, … delete exactly that information from a photograph the moment you upload it to their service. Actually, the only social network doing this right is Google Plus.
What does EXIF data tell us?
Have a look at the list below. Here’s some of the EXIF (exchangeable image file format) information included in a photo taken with my Samsung Galaxy Nexus:
- Camera: Samsung Galaxy Nexus
- Date taken: October 4, 2013 4:18:13PM
- GPS Latitude, Longitude & Altitude: 38.792833 degrees North 9.389820 degrees West 477 m Above Sea Level (Time Stamp 15:16:37)
- Create date: 2013:10:04 16:18:13
- Modified date: 2013:10:04 16:18:13
- Date/Time original: 2013:10:04 16:18:13
- Image Unique ID: OAEL01
- Exposure Time: 1/1848
- ISO: 50, 0, 0
- F Number: 2.75
- Aperture Value: 2.59
- Brightness Value: 0
- Shutter Speed Value: 1/1802
- Light Source: Fine Weather
- Scene Capture Type: Standard
- Flash: No Flash
- White Balance: Auto
This information (where, when, camera/device, lighting conditions, whether the flash fired) is included by almost every smartphone. However, an image might not contain location information if you’ve explicitly told your phone to leave it out. If you want to try this with one of your own photographs, Jeffrey’s Exif viewer is a great tool to do so.
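To give an idea of how software can use these fields, here is a minimal Python sketch that turns two of the raw values into something a program can work with. It assumes the date arrives as a string in the "YYYY:MM:DD HH:MM:SS" form shown in the list, and that the GPS position is stored as degrees, minutes and seconds plus a hemisphere letter. The function names are my own, and the degree/minute/second numbers are back-computed from the decimal coordinates above purely for illustration; they were not read from the actual file.

```python
from datetime import datetime


def parse_exif_datetime(value: str) -> datetime:
    """Parse the 'YYYY:MM:DD HH:MM:SS' form used by EXIF date fields."""
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")


def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -decimal if ref.upper() in ("S", "W") else decimal


# Illustrative values only (assumed, see the note above).
taken = parse_exif_datetime("2013:10:04 16:18:13")
latitude = dms_to_decimal(38, 47, 34.2, "N")
longitude = dms_to_decimal(9, 23, 23.352, "W")
print(taken.isoformat(), round(latitude, 6), round(longitude, 6))
```

Once the values are in this shape, sorting photos by time or plotting them on a map is straightforward.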
What does IPTC data tell us?
If a photographer wants to add still more information, they can do so later using the IPTC standard: copyright information, contact details, title & description for the photo, a detailed location (street name, city, country) of the photo, tags about the photograph, the type of scene depicted, categories it belongs to (sports, news, art), whether and when the photograph was digitally manipulated, … You can see an example of that in the photo below:
You see, there’s tons of helpful information about the photograph and its context in IPTC & EXIF data. This info is especially useful for software, as software can’t see that this is a photo of “a yellow tram, on a reasonably sunny day, in Portugal, in HDR style.” It can, however, read the meta information to learn just that. Alas, when you upload a photograph to Twitter, Facebook, Instagram, … the majority of this metadata is stripped out of the image. The image file shown on these sites no longer contains any reference as to who took the picture, where, when, … If you upload an image taken two days ago in Hasselt, it’s suddenly created ‘now’, and Facebook even has the audacity to ask you to add a location to show where it was taken. All our precious context: gone!
Which social media sites remove embedded metadata?
Which social media sites are guilty of this? Well, all but Google+. Here’s a nice overview that’s the result of a study by members of the Photo Metadata Working Group of the IPTC and contributors to the photo metadata survey of controlledvocabulary.com. (I’ve recently noticed that even Wikimedia does not include EXIF or IPTC information in resized versions of the images they host. Where to file a ticket for that?)
Why do they remove embedded metadata from images?
Why would these social media sites want to get rid of your photographs’ embedded metadata? Maybe it’s hard to preserve this information? No, not really. ImageMagick, one of the most popular automated resizing tools, supports EXIF & IPTC. If you use -resize and want to remove EXIF information, you actually need to add an extra option (-strip). Might it be to preserve your privacy? After all, a picture could have the coordinates of your home in it. Not quite that either, because then they’d at least leave the creation date in place, right? So you can later see your holiday pictures in order? Then why?
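As a short aside on the ImageMagick point: the sketch below builds the two command lines side by side, so you can see that dropping the metadata is the variant that needs something extra. It only assembles the argument list and runs nothing. The helper name is mine, and I’m assuming the classic `convert` binary; newer ImageMagick installs may expect `magick` instead.

```python
def resize_command(src: str, dst: str, size: str, strip_metadata: bool = False) -> list[str]:
    """Build an ImageMagick argument list for a resize; nothing is executed here."""
    command = ["convert", src, "-resize", size]
    if strip_metadata:
        # -strip removes embedded profiles and comments from the output file.
        command.append("-strip")
    command.append(dst)
    return command


# Metadata survives a plain resize; removing it takes the extra option.
print(" ".join(resize_command("tram.jpg", "tram_small.jpg", "50%")))
print(" ".join(resize_command("tram.jpg", "tram_small.jpg", "50%", strip_metadata=True)))
```

The file names are placeholders. To actually run a command, you could pass the list to `subprocess.run`.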
1. To lock you in.
You want to know when a photograph was originally taken? No way to find out, except sometimes by going through these companies’ APIs. Same for location. Facebook happily strips out the EXIF GPS location information, and the time the photo was taken, but then asks you to tag the location and fill in the date taken. Why?
Firstly, they don’t want location information in geocoordinates. They want it in their format, linked to – let’s say – the Facebook page for Starbucks Antwerp. As Facebook has stripped out the EXIF data, any other service asking for the photo’s location will get ‘Starbucks Antwerp, as listed here on Facebook’ returned.
Secondly, what if you later download your own photographs to use them somewhere else? All information has been removed from the images, so they are pretty much useless. So you’d better stick to Facebook. That’s a lock-in. (So don’t use Facebook as the only hosting / backup service for your holiday pictures!)
2. To claim license rights to as many images as possible.
Facebook, Twitter, Instagram, … all rely heavily on the ‘resharing’ of content. The more content that can be shared, the better. To get as much content (and thus traffic > page views > ad views > profit) as possible, they’ve decided to just forget about such a ridiculous thing as ‘copyright’.
Maybe Facebook’s Terms offer a clue? “For content that is covered by intellectual property rights, like photos and videos (IP content), you specifically give us the following permission, subject to your privacy and application settings: you grant us a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any IP content that you post on or in connection with Facebook (IP License). This IP License ends when you delete your IP content or your account unless your content has been shared with others, and they have not deleted it.”
This translates as: they can do whatever they want with your photographs, and they can sell that right on to whomever they please. Now, let’s say you download an image I’ve posted online under a Creative Commons Attribution Non-commercial license, and post it to your personal account on Facebook. You mention I’ve taken it. All fine, you’re adhering to the license. You have no right, however, to grant Facebook any rights to my image. Facebook does not care. They just remove the embedded metadata which contains the copyright information and the credit to me as the author. Facebook then suddenly has a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to an image I’ve never uploaded. Best then, right, to strip out the metadata that proves they don’t have the right to claim this license.
What can’t we do without embedded metadata?
I wasn’t looking into this because I wanted to complain about big companies not adhering to copyright licenses (I have G+ for that). I was looking into images and their embedded metadata because I wanted to know what extra ‘context’ information we could use. As there usually is little or no embedded metadata remaining, this is what we won’t be able to do:
- Know for sure when a photograph was taken. Often, we’ll only have information about when a photograph was uploaded. Very annoying if those are holiday shots you post when home again.
- Know for sure where a photograph was taken. As we can no longer see the phone’s GPS coordinates, we won’t know where the photo was taken. (Unless you’ve specifically added that information when uploading to Facebook/Instagram/… .)
- Know if you have taken a particular photograph, or are sharing someone else’s lolcat. I would assume that a photograph you’ve taken is more important to you than a random lolcat you shared, even though the lolcat has twice as many comments. However, as we’re no longer able to see the device the photo was taken with, or the creator’s information, we are clueless as to who made the original image.
- Easily track duplicates. You’ve posted a photo to Twitter. Five minutes later, you post it to Flickr with a slightly different description. How are we supposed to know whether these photographs are one and the same, or show two different subjects?
- Make an educated guess as to whether a photograph was taken ‘indoors’ or ‘outdoors’. Based on the lighting information included in the EXIF data (white balance, ISO value, shutter time and aperture), we can try to compute what the light environment was like, and thus whether a photograph was taken indoors or outdoors.
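To make that last point a bit more concrete, here is a rough Python sketch of how such a guess could work. It uses the standard photographic exposure value, normalised to ISO 100, which combines the f-number, exposure time and ISO into a single brightness figure; bright daylight typically lands around EV 15, while indoor scenes sit far lower. The cut-off of 12 is my own assumption, not a calibrated value, and a real classifier would want more signals than this (white balance, flash, and so on).

```python
import math


def exposure_value_iso100(f_number: float, exposure_time_s: float, iso: float) -> float:
    """Exposure value normalised to ISO 100: log2(N^2 / t) - log2(ISO / 100)."""
    return math.log2(f_number ** 2 / exposure_time_s) - math.log2(iso / 100)


def guess_scene(ev100: float, outdoor_threshold: float = 12.0) -> str:
    """Very crude guess; the threshold is an assumed cut-off, not a measured one."""
    return "outdoors" if ev100 >= outdoor_threshold else "indoors"


# Values from the Galaxy Nexus photo listed earlier: f/2.75, 1/1848 s, ISO 50.
ev = exposure_value_iso100(2.75, 1 / 1848, 50)
print(round(ev, 1), guess_scene(ev))
```

For the tram photo this gives an exposure value close to what you would expect on a sunny day, which matches the ‘Fine Weather’ light source in its EXIF data.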
My conclusion? I (largely) agree with the Embedded Metadata Manifesto. There’s no excuse for social media sites stripping out all embedded metadata, except maybe them trying to lock their users in and avoid copyright issues. These sites need to be called out on this. And, if I may indulge in a bit of wishful thinking, they should actually be doing the opposite of stripping out metadata: adding information to the images. Take Twitter, for instance: if the image doesn’t have a metadata description yet, why not add the text of the tweet? Surely, it will contain some information about the photo. There’s no author yet? Add that user’s Twitter handle. You’ve told Facebook the photo was taken at Starbucks, rather than 38.792833, 9.389820? Then why don’t they add that address to the metadata too? They’d be building a world richer in information, and full of attribution.
That, and I’m near-concluding that maybe, with its slightly different license (non-transferable) and EXIF data still in place, Google Plus isn’t that bad a place to store my holiday photographs after all.