I’ve added a new passport MRZ resolver, as it’s been a frequently requested feature – most of the tools available are online only, and whilst I’m sure they don’t scrape any data, it’s always good to be careful.
IPTC and PDF metadata handling was a little lacking previously – it was getting the main fields but getting stuck on some of the (admittedly less useful) ones. It should gather it all now. I’ve also made MI a lot better at grabbing creation-type dates (still be careful with these!)
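As a side note on those creation dates: PDFs store theirs in the spec’s own `D:YYYYMMDDHHmmSS` format with an optional timezone suffix, which is part of why so many tools mangle them. This isn’t MI’s actual code, just a minimal sketch of turning one into a usable datetime:

```python
import re
from datetime import datetime, timedelta, timezone

def parse_pdf_date(raw):
    """Parse a PDF-style date string like D:20230415093000+01'00'.

    Trailing components are optional per the PDF spec, so missing
    parts default to their earliest value (month/day -> 1, time -> 0).
    Returns None if the string doesn't look like a PDF date at all.
    """
    m = re.match(
        r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
        r"([Zz]|[+-]\d{2}'?\d{2}'?)?",
        raw,
    )
    if not m:
        return None
    year = int(m.group(1))
    month = int(m.group(2) or 1)
    day = int(m.group(3) or 1)
    hour = int(m.group(4) or 0)
    minute = int(m.group(5) or 0)
    second = int(m.group(6) or 0)
    tz = m.group(7)
    if tz and tz not in ("Z", "z"):
        # Offsets arrive as +HH'mm' -- strip the apostrophes first.
        sign = 1 if tz[0] == "+" else -1
        offs = tz[1:].replace("'", "")
        tzinfo = timezone(
            sign * timedelta(hours=int(offs[:2]), minutes=int(offs[2:] or 0))
        )
    else:
        tzinfo = timezone.utc
    return datetime(year, month, day, hour, minute, second, tzinfo=tzinfo)
```

Real-world files bend the format in all sorts of ways (missing apostrophes, truncated offsets), so treat anything parsed out of one with the same suspicion as the rest of the metadata.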
v0.7 – Callahan
Passport MRZ resolver. This is extremely alpha, but was requested by a few companies using MI as most tools for that are online.
IPTC metadata now displays properly.
All fields added/calculated by MI (and not part of the metadata) now start with a *
MI tries harder to find a creation date – still be careful with this.
More metadata from PDF files, and removed some useless values.
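For anyone curious how an MRZ gets validated: the machine-readable lines carry check digits computed with the ICAO 9303 scheme – repeating weights of 7, 3, 1 over character values (digits as themselves, A–Z as 10–35, the `<` filler as 0), summed mod 10. This isn’t MI’s internal code, just a sketch of that check:

```python
def mrz_check_digit(field):
    """ICAO 9303 check digit: weights 7,3,1 repeating; digits keep
    their value, A-Z map to 10-35, and the '<' filler counts as 0."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field.upper()):
        if ch.isdigit():
            val = int(ch)
        elif ch == "<":
            val = 0
        else:
            val = ord(ch) - ord("A") + 10
        total += val * weights[i % 3]
    return total % 10
```

The sample document number in the ICAO spec, `L898902C3`, works out to a check digit of 6 – a handy sanity test if you’re rolling your own.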
I’ve been doing Open Source Intelligence (OSINT) gathering professionally for quite a while now – back from the days when it was just a GeoCities check to see if the subject had been ridiculous enough to create a page dedicated to their criminal enterprise (surprisingly/unsurprisingly, this was very common). However, I’ve only recently shuffled my way into the OSINT ‘scene’ which has popped up on reddit/twitter/forums. It’s great to see that there’s a network of people passionate about the subject area and there is a lot of great sharing and caring going on.
However, as with any community, a lot of buzzwords/phrases creep in, it starts to become a ‘club’, and newcomers flock in with wildly varying levels of experience. It’s great that things are opening up as it all adds new perspectives, but since getting involved I’ve seen a lot of people posting methodologies and suggestions which are…well, pretty much just illegal. There’s no way to sugar-coat it, and it doesn’t matter what jurisdiction you’re in: some of what is being shared as ‘OSINT methodologies’ falls directly into either harassment, stalking or fraud.
The main offender for this is ‘Social Engineering’.
Social Engineering – what it is.
The Wikipedia entry for Social Engineering (the best we can get to the current consensus for the word) is:
For anyone unsure of what Social Engineering is (which seems to be a lot of people), this video is the single best explanation:
As you can see, it’s basically lying to someone to get what you want by invoking the most holy trinity of the BLT. This is otherwise known as fraud.
Here’s the UK and the US definitions of fraud in case anyone has forgotten:
Fraud Act 2006 (Section 2)
[A person commits fraud if they make…] a false representation, dishonestly, knowing that the representation was or might be untrue or misleading, with intent to make a gain for himself or another, to cause loss to another or to expose another to risk of loss.
Title 18 of the US Code (it’s a bit trickier in the US, as there are many laws to choose from which cover it):
Whoever falsely and willfully represents himself to be a citizen of the United States shall be fined under this title or imprisoned not more than three years, or both.
(Many more under Chapter 47, Chapter 63 and stated cases)
Let’s take the UK definition and apply it to our man Crash Override. He commits a false representation straight out by saying he’s “Mr Eddie Vedder from accounting”. He knows this is untrue and misleading, and he does this for gain (to get access to the modem number). It’s pretty straightforward.
As you can see, the definition of social engineering is just fraud by another name. Now some might start arguing that ‘manipulation’ doesn’t have to involve fraud, but I honestly can’t think of a ‘manipulation’ which wouldn’t be fraudulent in some way by most legal systems.
That isn’t what *I* mean though.
In computer security discussions, the term ‘social engineering’ is well understood – it’s phishing scams and ransomware attacks. However, this term, which most people seem to understand in the context of compsec, somehow gets distorted when we talk about OSINT – I’ve seen posts with things like ‘if that doesn’t work, try a bit of social engineering to see if you can find out x’ or ‘I couldn’t find out anything online so I used social engineering to get what I needed’.
Now I don’t think that people are quite recommending ransomware or similar – it’s more likely one of the below:
The writer doesn’t really understand what ‘social engineering’ is and just uses it as a buzz word for anything from adding a subject as a friend on Facebook to holding their spouse hostage.
They’re using it to reference social media OSINT methodologies.
The writer doesn’t want to say ‘lie to them to get what you want’.
Now the first one is a distinct possibility – we’ve all heard countless people use phrases in that sort of clunky, ‘I’ve-only-heard-this-at-a-conference’ way and I feel that a lot of people are using the term Social Engineering to sound a bit more ‘exciting’. It certainly sounds cooler than ‘…and then I looked at his Facebook feed until my eyes turned into sandpaper-y cubes.’
The second I think is mostly a way for OSINT practitioners to flag up that they do ‘social media stuff’ as well as Experian checks. I’ve been to a number of conferences recently where a worried director exclaims ‘won’t someone think of the social media platforms!’ after too much talking about any other type of OSINT service (or whilst the speaker is just taking a breath).
More than that, I think some ‘OSINT evangelists’ are also trying to push such language in a marketing sense as well – the whole ‘we’re willing to go to the very edge of legality/I can kill a man with my intersects alone’ vibe sells contracts unfortunately.
I know. Why is this important?
Mostly because I feel there’s still that misunderstanding that using the word OSINT makes you somehow exempt from the usual rules. It’s the same as the ‘if it’s on the net then I can do what I like with it’ misconception.
Everyone who has been involved in this field in a professional setting knows that isn’t the case, but unfortunately a lot of newcomers seem to believe that there’s some sort of magical get-out-of-jail-free card available under the umbrella of ‘doing OSINT’. It’s always been a problem, but using ‘criminal’ terms like social engineering in a ‘valid tactic’ sort of way starts to muddy the water more than ever.
This is compounded by a mix-up between what is acceptable when hired to do security/pentesting, and what is acceptable without such a contract – this may seem obvious to some, but for newcomers it’s not. They read the @OSINT_BLACK_OPS_SN1PER_HACK3R tweet ‘wasn’t getting anywhere on my new contract so did a bit of social engineering and now I’m the CEO’s dentist’ and think ‘I’m learning how to do OSINT gathering! I’ll social engineer my local gardening club and see what Margery is really up to!’
Instead of the 50k bonus and as much mouthwash as he can swig for the rest of his life, our intrepid newbie ends up having awkward bedtime chats with cellmate and fellow stalker Profusely Sweaty Greg, all the while wondering why the magical shield of ‘just OSINT’ing’ didn’t protect him.
For some, this whole post will seem very patronising and obvious – to others it will seem like pedantry. I understand this, and I’m not trying to say that we should jealously guard our secrets or make the community any less welcoming to newcomers. But for those who are experienced, I’d really like to ask you to share your knowledge responsibly and throw in a quick comment the next time you see someone getting told to ‘do a bit of social engineering’ or similar.
Version 0.6 has been a bit of a turning point for the project – I’d say that Metadata Interrogator now does what I wanted it to do when I set out to create it. I wouldn’t say it’s reached ‘v1.0’ yet, but it analyses a whole raft of metadata beyond just the usual EXIF data, it has a useful timeline, and it highlights interesting data both in comparison between files and across a whole data set.
As it’s got to that stage, I don’t think it’s necessary to do quite so many small iterations, and so the move between 0.6 and 0.7 is going to be a big one and take some time to do. The below is a sort of road map on where I want to take the project.
Performance – This is the big one, currently it takes far too long to load up and analyse files. To try to get better performance (and maybe a smaller file size?!) I’ll try to port it across to Nuitka (https://nuitka.net). I’ve heard fantastic things about it, and I’m sure it’ll improve things no end. Early tests have not ended well however, so time will tell if I can gather together enough goat entrails and black candles to make this work.
Release on Mac and Linux – I’m holding off on this till I sort out Nuitka, but it’s a main priority after that.
Installer version – Whether this is necessary depends on the gains I get from Nuitka – if they’re impressive I’ll keep it portable, however if not I’ll add a version that can be installed which will give faster access for regular users.
Pre-computed Analysis – One thing I’m very interested in is running analysis on large data sets of files to work out what ‘normal’ metadata drift on a single device looks like. As in, how many (and which) fields change as one smartphone takes 100 different pictures. This will then be built into the software to highlight where the differences aren’t ‘normal’.
Overhaul the UI – The UI works, but it’s still a bit janky in parts. This isn’t a top priority as it all seems to work roughly fine for me, but it’s not ideal.
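To make the pre-computed analysis idea above a bit more concrete, here’s a rough sketch of measuring which metadata fields ‘drift’ across a sample of files – the function names and the baseline-profile idea are hypothetical, not anything in MI yet:

```python
def field_drift(records):
    """Given metadata dicts from files claiming the same source device,
    return a dict mapping each field name to the number of distinct
    values seen for it -- fields with a count > 1 have 'drifted'."""
    drift = {}
    fields = set().union(*(r.keys() for r in records))
    for field in fields:
        values = {r.get(field) for r in records}
        drift[field] = len(values)
    return drift

def unusual_fields(records, stable_fields):
    """Flag fields that a baseline profile says should stay constant
    for this device, but which nevertheless vary across the sample."""
    drift = field_drift(records)
    return sorted(f for f in stable_fields if drift.get(f, 0) > 1)
```

The interesting part is building the baseline: run this over a big corpus of genuine files per device model, and any submitted set where a ‘stable’ field wobbles becomes worth a closer look.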
As always, any comments or suggestions would be great.
In version 0.4 I’ve included basic timeline functionality. I’ve added this feature as a quick way to visualise the creation dates of a large number of files, and to show up any gaps in the dates.
I created this due to three things I discovered during my time working in counter-fraud:
Fraudsters often use templates created months/years ago as the basis for verification documents.
Fraudsters will often forget to change a date (or miss one out entirely) in their submission documents.
It’s easy to overlook ‘wrong’ dates when scanning through dozens of files.
The idea is that this feature will allow you to quickly notice any file dates which are far out of your expected norm, as well as if there are lengthy gaps between files. Hopefully that will help you identify any with odd times attached to them, but be aware that metadata/exif sometimes isn’t very reliable with recording times and dates, so be careful how you use the information.
One final thing to note is that I’ve stopped files with the date and time of '1980-01-01 00:00:00' from showing up – this seems to be a Microsoft default date and was skewing a lot of timelines.
If you have any ideas of how this feature can be improved, feel free to send me a note on the contact page.
Currently, Metadata Interrogator comes in at a completely-unreasonable-in-this-day-and-age 76MB file size. As promised, an attempt was made to reduce the size of Metadata Interrogator by using a different table GUI (pure Tkinter rather than Pandastable). Unfortunately, even with my best efforts the file size reduction wasn’t that significant – 76MB to 55MB.
Whilst it is a reduction in size (and I might be able to shave off another 5–10MB), for the reduction in functionality (and extra work in maintaining two versions) it’s not really worth it.
Now that we’re in the age of widespread broadband, terabyte USB sticks and colour TVs I feel it’s a roughly acceptable file size, although do get in touch if you have a really burning use-case for a much smaller file size.
Going forward, I’ll be working on new functionality and optimising the load. I’ll also release a zipped package version which should run faster and still stay relatively portable.
Digital Document Forensics (DDF) is what Metadata Interrogator is all about, really: trying to gather as much information as possible from a file – especially any ‘hidden’ attributes that might give us clues to who/what/where/when/how the file was made.
This has obvious utility in a number of sectors, but if you’re interested in using it for counter-fraud/customer validation/KYC then you might be interested in an online course that I run on the subject. Your keen forensic senses might be able to warn you this is a slightly promotional post.
At the moment, the course doesn’t use metadata interrogator as I’m not quite happy it’s stable enough (although hopefully that will change soon!) and the course itself is much wider in scope than just gathering metadata. You also get a fancy certificate.
The training course covers (amongst other things):
PDF, MS Office and Image file specific analysis
File Signature analysis/magic numbers
Best practice of evidence handling (Hashing/storage)
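To give a flavour of the file signature part: magic number analysis boils down to comparing the first few bytes of a file against known signatures and flagging any mismatch with the claimed extension. A stripped-down sketch – the signature table here is just a small illustrative subset, not the full list a real tool would carry:

```python
# First-bytes signatures ("magic numbers") for a few common formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP container (also DOCX/XLSX/ODT...)",
}

def identify(header):
    """Match the opening bytes of a file against known magic numbers.
    A mismatch with the file's extension is a classic sign of a
    renamed or tampered-with file."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"
```

Note the ZIP entry: modern Office documents are ZIP containers under the bonnet, so a ‘.docx’ that doesn’t start with `PK` deserves immediate suspicion.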
You may or may not have noticed that there are tons of MakerNote fields that come up when photos are analysed. Some of these are followed by a descriptor, some just have something like 0x0002 and then a jumble of numbers and letters as the result. Whilst it sounds unlikely, this isn’t my awful programming causing this.
The MakerNote fields are custom fields allowed within the Exif standard, which let device manufacturers store whatever they want in them. For some reason, these fields are also a jealously guarded secret by companies – some just aren’t documented, and others are encrypted. Even with my biggest and shiniest security hat on, I can’t really understand what they could be storing in them that requires such secrecy – I can’t think what you’d need to store on a file that would give insight into anything proprietary or that would have security implications.
Various efforts have gone into trying to decipher what these fields mean (and the ones known are included in Metadata Interrogator) however some are very difficult to guess. If there’s anyone out there that wants to team up on trying to decipher the fields of common devices, please get in touch.
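For anyone tempted to join in: many MakerNotes are just a TIFF-style IFD buried in the Exif blob – a 2-byte entry count followed by 12-byte (tag, type, count, value/offset) entries – though each maker adds its own header, offsets and byte order (and sometimes encryption). Assuming a plain little-endian IFD with no maker-specific header, walking the raw entries looks roughly like this:

```python
import struct

def parse_ifd_entries(data, offset=0, little_endian=True):
    """Walk a TIFF-style IFD: a 2-byte entry count followed by 12-byte
    entries of (tag, type, count, value/offset). Returns a list of
    (tag, type, count, raw_value_bytes) tuples; decoding the raw value
    properly depends on the type and on maker-specific offset quirks."""
    end = "<" if little_endian else ">"
    (n,) = struct.unpack_from(end + "H", data, offset)
    entries = []
    for i in range(n):
        base = offset + 2 + 12 * i
        tag, typ, count = struct.unpack_from(end + "HHI", data, base)
        raw = data[base + 8 : base + 12]  # inline value, or offset to it
        entries.append((tag, typ, count, raw))
    return entries
```

This is exactly why undocumented tags show up as `0x0002` plus a jumble of bytes: the structure parses cleanly, but without the maker’s tag table there’s no way to know what the value means.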