I’ve added in a new passport MRZ resolver, as it’s been a much-requested function. Most of the ones available are online only, and whilst I’m sure they don’t scrape any data, it’s always good to be careful.
IPTC and PDF metadata handling was a little lacking previously. It picked up the main fields but got stuck on some of the (admittedly less useful) ones. It should gather everything now. I’ve also made MI much better at grabbing creation-type dates (still be careful with this!).
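For anyone wondering what an MRZ resolver has to check, the machine readable zone protects its key fields with check digits defined in ICAO Doc 9303. The sketch below is not MI’s code. It is a minimal Python illustration of that calculation, and the function names are hypothetical. Each character is mapped to a value: digits count as themselves, A to Z count as 10 to 35, and the ‘<’ filler counts as 0. The values are then weighted 7, 3, 1 repeating, and the sum is taken modulo 10.

```python
# Character set used in MRZ fields (digits, capital letters, '<' filler).
_DIGITS = "0123456789"


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value (ICAO Doc 9303 scheme)."""
    if ch in _DIGITS:
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weight each character 7, 3, 1 (repeating), sum, and take the result modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def mrz_field_is_valid(field: str, check: str) -> bool:
    """True if the printed check character matches the computed check digit."""
    return check in _DIGITS and mrz_check_digit(field) == int(check)
```

On the ICAO specimen passport, the document number L898902C3 gives 6 and the date of birth 740812 gives 2. Both match the digits printed after those fields.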
v0.7 – Callahan
Passport MRZ resolver. This is extremely alpha, but was requested by a few companies using MI as most tools for that are online.
IPTC metadata now displays properly.
All fields added/calculated by MI (and not part of the metadata) now start with a *
MI tries harder to find a creation date – still be careful with this.
More metadata from PDF files, and removed some useless values.
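Here is a rough idea of what ‘trying harder’ to find a creation date can look like. This minimal sketch walks a list of candidate fields in priority order and returns the first one that parses. The field names and their order are assumptions for illustration, not what MI actually uses. It handles two formats: the EXIF style (YYYY:MM:DD HH:MM:SS) and the PDF style (D:YYYYMMDDHHmmSS, with any timezone suffix ignored). A parseable date isn’t necessarily a truthful one, so treat the result with care.

```python
from datetime import datetime
from typing import Optional

# Hypothetical priority order: common EXIF/PDF tag names, assumed for illustration.
CANDIDATE_FIELDS = ("DateTimeOriginal", "CreateDate", "CreationDate")


def parse_metadata_date(value: str) -> Optional[datetime]:
    """Parse an EXIF-style or PDF-style date string; timezone suffixes are ignored."""
    value = value.strip()
    try:
        if value.startswith("D:"):
            # PDF dates: D:YYYYMMDDHHmmSS followed by an optional offset such as +01'00'.
            # Only full 14-digit timestamps are accepted in this sketch.
            return datetime.strptime(value[2:16], "%Y%m%d%H%M%S")
        # EXIF dates use colons in the date part: YYYY:MM:DD HH:MM:SS
        return datetime.strptime(value[:19], "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None


def best_creation_date(metadata: dict) -> Optional[datetime]:
    """Return the first parseable date among the candidate fields, in priority order."""
    for field in CANDIDATE_FIELDS:
        raw = metadata.get(field)
        if isinstance(raw, str):
            parsed = parse_metadata_date(raw)
            if parsed is not None:
                return parsed
    return None
```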
I’ve been doing Open Source Intelligence (OSINT) gathering professionally for quite a while now – back from the days when it was just a GeoCities check to see if the subject had been ridiculous enough to create a page dedicated to their criminal enterprise (surprisingly/unsurprisingly, this was very common). However, I’ve only recently shuffled my way into the OSINT ‘scene’ which has popped up on Reddit, Twitter and forums. It’s great to see that there’s a network of people passionate about the subject area, and there’s a lot of great sharing and caring going on.
However, as with any community, a lot of buzzwords and phrases creep in, it starts to become a ‘club’, and newcomers flock in with wildly varying levels of experience. It’s great that things are opening up, as it all adds new perspectives, but since getting involved I’ve seen a lot of people posting methodologies and suggestions which are…well, pretty much just illegal. There’s no way to sugar-coat it, and it doesn’t matter what jurisdiction you’re in: some of what is being shared as ‘OSINT methodologies’ falls directly into harassment, stalking or fraud.
The main offender for this is ‘Social Engineering‘.
Social Engineering – what it is.
The Wikipedia entry for Social Engineering (the closest we can get to a current consensus for the term) is:
For anyone unsure of what Social Engineering is (which seems to be a lot of people), this video is the single best explanation:
As you can see, it’s basically lying to someone to get what you want by invoking the most holy trinity of the BLT. This is otherwise known as fraud.
Here are the UK and US definitions of fraud, in case anyone has forgotten:
Fraud Act 2006 (Section 2)
[A person commits fraud if they make…] a false representation, dishonestly, knowing that the representation was or might be untrue or misleading, with intent to make a gain for himself or another, to cause loss to another or to expose another to risk of loss.
US Code Title 18 (it’s a bit trickier in the US, as there are many laws to choose from which cover it), e.g. Section 911:
Whoever falsely and willfully represents himself to be a citizen of the United States shall be fined under this title or imprisoned not more than three years, or both.
(Many more under Chapter 47, Chapter 63 and stated cases)
Let’s take the UK definition and apply it to our man Crash Override. He makes a false representation straight out by saying he’s “Mr Eddie Vedder from accounting”. He knows this is untrue and misleading, and he does it for gain (to get access to the modem number). It’s pretty straightforward.
As you can see, the definition of social engineering is just fraud by another name. Now some might start arguing that ‘manipulation’ doesn’t have to involve fraud, but I honestly can’t think of a ‘manipulation’ which wouldn’t be fraudulent in some way by most legal systems.
That isn’t what *I* mean though.
In computer security discussions, the term ‘social engineering’ is well understood – it’s the phishing scams and ransomware attacks. However, this term, which most people seem to understand in the context of compsec, somehow gets distorted when we talk about OSINT – I’ve seen posts with things like ‘if that doesn’t work try a bit of social engineering to see if you can find out x’ or ‘I couldn’t find out anything online so I used social engineering to get what I needed’.
Now I don’t think that people are quite recommending ransomware or similar – it’s more likely one of the below:
The writer doesn’t really understand what ‘social engineering’ is and just uses it as a buzz word for anything from adding a subject as a friend on Facebook to holding their spouse hostage.
They’re using it to reference social media OSINT methodologies.
The writer doesn’t want to say ‘lie to them to get what you want’.
Now the first one is a distinct possibility – we’ve all heard countless people use phrases in that sort of clunky, ‘I’ve-only-heard-this-at-a-conference’ way and I feel that a lot of people are using the term Social Engineering to sound a bit more ‘exciting’. It certainly sounds cooler than ‘…and then I looked at his Facebook feed until my eyes turned into sandpaper-y cubes.’
The second I think is mostly a way for OSINT practitioners to flag up that they do ‘social media stuff’ as well as Experian checks. I’ve been to a number of conferences recently where a worried director exclaims ‘won’t someone think of the social media platforms!’ after too much talking about any other type of OSINT service (or whilst the speaker is just taking a breath).
More than that, I think some ‘OSINT evangelists’ are also trying to push such language in a marketing sense as well – the whole ‘we’re willing to go to the very edge of legality/I can kill a man with my intersects alone’ vibe sells contracts unfortunately.
I know. Why is this important?
Mostly because I feel there’s still that misunderstanding that using the word OSINT makes you somehow exempt from the usual rules. It’s the same as the ‘if it’s on the net then I can do what I like with it’ misconception.
Everyone who has been involved in this field in a professional setting knows that isn’t the case, but unfortunately a lot of newcomers seem to believe that there’s some sort of magical get-out-of-jail-free card available under the umbrella of ‘doing OSINT’. It’s always been a problem, but using ‘criminal’ terms like social engineering in a ‘valid tactic’ sort of way starts to muddy the water more than ever.
This is compounded by a mix-up between what is acceptable when hired to do security/pentesting and what is acceptable without such a contract. This may seem obvious to some, but for newcomers it’s not. They see @OSINT_BLACK_OPS_SN1PER_HACK3R tweet ‘wasn’t getting anywhere on my new contract so did a bit of social engineering and now I’m the CEO’s dentist’ and think ‘I’m learning how to do OSINT gathering! I’ll social engineer my local gardening club and see what Margery is really up to!’
Instead of the 50k bonus and as much mouthwash as he can swig for the rest of his life, our intrepid newbie ends up having awkward bedtime chats with cellmate and fellow stalker Profusely Sweaty Greg, all the while wondering why the magical shield of ‘just OSINT’ing’ didn’t protect him.
For some, this whole post will seem very patronising and obvious; to others it will seem like pedantry. I understand this, and I’m not trying to say that we should jealously guard our secrets or make the community any less welcoming to newcomers. But for those who are experienced, I’d really like to ask you to share your knowledge responsibly and throw in a quick comment the next time you see someone being told to ‘do a bit of social engineering’ or similar.
Version 0.6 has been a bit of a turning point for the project – I’d say that Metadata Interrogator now does what I wanted it to do when I set out to create it. I wouldn’t say it’s reached ‘v1.0’ yet, but it analyses a whole raft of metadata beyond the usual EXIF data, it has a useful timeline, and it highlights some interesting data, both when comparing files and across a whole data set.
As it’s got to that stage, I don’t think it’s necessary to do quite so many small iterations, and so the move between 0.6 and 0.7 is going to be a big one and take some time to do. The below is a sort of road map on where I want to take the project.
Performance – This is the big one; currently it takes far too long to load and analyse files. To try to get better performance (and maybe a smaller file size?!) I’ll try to port it across to Nuitka (https://nuitka.net). I’ve heard fantastic things about it, and I’m sure it’ll improve things no end. Early tests haven’t gone well, however, so time will tell if I can gather together enough goat entrails and black candles to make this work.
Release on Mac and Linux – I’m holding off on this till I sort out Nuitka, but it’s a main priority after that.
Installer version – Whether this is necessary depends on the gains I get from Nuitka – if they’re impressive I’ll keep it portable, however if not I’ll add a version that can be installed which will give faster access for regular users.
Pre-computed Analysis – One thing I’m very interested in is running analysis on large data sets of files to work out what ‘normal’ metadata drift on a single device looks like, i.e. how many (and which) fields change as one smartphone takes 100 different pictures. This will then be built into the software to highlight where the differences aren’t ‘normal’.
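To make ‘normal drift’ a bit more concrete, here is one possible way to measure it. This is a minimal sketch, not a description of how MI will do it. It assumes each photo’s metadata is already a flat dict of field name to value and that the photos are in capture order. It measures how often each field changes between consecutive shots. It then flags fields in a new set whose change rate sits well away from the baseline. The function names and the 0.25 tolerance are arbitrary placeholders.

```python
from collections import Counter
from typing import Dict, List


def field_change_rates(photos: List[Dict[str, str]]) -> Dict[str, float]:
    """For metadata dicts from one device (in capture order), return the fraction
    of consecutive pairs in which each field's value changed."""
    if len(photos) < 2:
        return {}
    fields = set().union(*photos)  # every field name seen in any photo
    changes: Counter = Counter()
    for prev, curr in zip(photos, photos[1:]):
        for field in fields:
            # A field present in one photo but missing from the next counts as a change.
            if prev.get(field) != curr.get(field):
                changes[field] += 1
    pairs = len(photos) - 1
    return {field: changes[field] / pairs for field in sorted(fields)}


def flag_unusual(baseline: Dict[str, float], observed: Dict[str, float],
                 tolerance: float = 0.25) -> List[str]:
    """Fields whose observed change rate differs from the baseline by more than `tolerance`."""
    return sorted(field for field, rate in observed.items()
                  if abs(rate - baseline.get(field, 0.0)) > tolerance)
```

Here `Make` normally never changes between shots from one phone. A set where it changes every time would be flagged, while a field like `ISO` that drifts naturally would not.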
Overhaul the UI – The UI works, but it’s still a bit janky in parts. This isn’t a top priority as it all seems to work roughly fine for me, but it’s not ideal.
As always, any comments or suggestions would be great.
This release is a particularly big one (like the last) and has some major changes in it. Firstly, I’m now using matplotlib for the analysis timeline, so it looks much, much better and allows for things like colour coding. Whilst it does bring a small performance drop, I’ll be using it for more analysis graphics in the future, so I think it’s worthwhile. I’ve also made a lot of improvements to the data set analysis and match analysis functions.
This will probably be the last incremental release for a while – I’ll be publishing a road map shortly which shows what’s next on the agenda, but I feel it’s in a state now where Metadata Interrogator fulfils all of my original needs.
Migrated the timeline to matplotlib – this slightly slows down the loading time, but I think it’s well worth it and will be used for more analysis graphics in the future.
The timeline has been drastically improved, with a much better layout and ability to handle high numbers of files.
Colour coding for different file types has been implemented on the timeline.
The data set analysis function has been improved, and now lists file types in the set.
The file comparison function has been improved to more accurately show differences.
Minor performance improvements across the board, especially in analysis times.
This release of Metadata Interrogator is a bit of a turning point – it’s a lot more stable now and I feel that most of the bugs have been ironed out. I’ve also started on the ‘Data set analysis’ feature – this analyses all the files currently in the data table and looks for things of interest. At the moment it’s very basic, but I’ll be expanding it soon.
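As an illustration of the sort of thing a data set pass can look for, here is a minimal sketch (not MI’s implementation). It counts file types and reports any field value shared by more than one file, such as the same author on documents that supposedly came from different people. It assumes each row is a flat dict of strings with a hypothetical ‘FileName’ key.

```python
from collections import Counter, defaultdict
from pathlib import PurePath
from typing import Dict, List, Tuple


def summarise_data_set(
    rows: List[Dict[str, str]],
) -> Tuple[Counter, Dict[Tuple[str, str], List[str]]]:
    """Return (file type counts, {(field, value): [files sharing it]})."""
    # File types by lower-cased extension; files without one are grouped as "(none)".
    file_types = Counter(PurePath(r["FileName"]).suffix.lower() or "(none)" for r in rows)

    seen: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for r in rows:
        for field, value in r.items():
            if field != "FileName" and value:  # skip the name itself and empty values
                seen[(field, value)].append(r["FileName"])

    # Keep only values that appear in more than one file.
    shared = {key: names for key, names in seen.items() if len(names) > 1}
    return file_types, shared
```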
As always, if you have any requests for features in this exif and metadata analyser, I’d be very interested to hear them and then complain about how difficult it would be to implement them.
Basic Data Set Analysis; currently extremely basic, but it’s being worked on.
The only good bug is a dead bug, and I’ve squashed a lot of bugs.
New fancy icons.
MI should throw up more errors if you try to do something…erroneous.
Some very minor performance improvements.
Get it here: https://github.com/globalcrimadmin/metadata/releases/download/v0.5/Metadata.Interrogator.v05.exe
In version 0.4 I’ve included basic timeline functionality. I’ve added this feature as a quick way to visualise the creation dates of a large number of files and show up any gaps in the dates.
I created this due to three things I discovered during my time working in counter-fraud:
Fraudsters often use templates created months/years ago as the basis for verification documents.
Fraudsters will often forget to change/miss a date or two out of their submission documents.
It’s easy to overlook ‘wrong’ dates when scanning through dozens of files.
The idea is that this feature will allow you to quickly notice any file dates which are far outside your expected norm, as well as any lengthy gaps between files. Hopefully that will help you identify files with odd times attached to them, but be aware that metadata/EXIF isn’t always reliable at recording times and dates, so be careful how you use the information.
One final thing to note is that I’ve stopped files with the date and time ‘1980-01-01 00:00:00’ from showing up – this seems to be a Microsoft default date and was skewing a lot of timelines.
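Dropping the 1980 default date and spotting lengthy gaps between files can be sketched in a few lines. This minimal example assumes you already have a mapping of file name to creation datetime. The gap threshold is whatever ‘far outside your expected norm’ means for the case in hand, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# The Microsoft 'default' date that skews timelines.
MS_DEFAULT_DATE = datetime(1980, 1, 1, 0, 0, 0)


def timeline_gaps(dates: Dict[str, datetime],
                  min_gap: timedelta) -> List[Tuple[str, str, timedelta]]:
    """Sort files by date (ignoring the default date) and return every pair of
    consecutive files separated by at least `min_gap`."""
    ordered = sorted((d, name) for name, d in dates.items() if d != MS_DEFAULT_DATE)
    gaps = []
    for (d1, n1), (d2, n2) in zip(ordered, ordered[1:]):
        if d2 - d1 >= min_gap:
            gaps.append((n1, n2, d2 - d1))
    return gaps
```

In a set of monthly payslips, one file dated nearly two years before the others shows up as a single large gap at the start of the timeline.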
If you have any ideas of how this feature can be improved, feel free to send me a note on the contact page.
This release of Metadata Interrogator is mostly bug fixes and performance related – take note of the decision to remove the ‘default’ date that appears on some files (mostly Microsoft Office ones). This was skewing a lot of timelines and added nothing to the analysis – if there’s demand for it, I’ll add in an option to keep them.
Fixed files with no date information crashing timeline.
Removed files from timeline with Microsoft ‘default’ date (1980-01-01 00:00:00)
This release includes basic timeline functionality – it’s useful in a number of scenarios, but mostly to highlight if there were lengthy gaps between two files being created. This version also includes much better match analysis – hopefully it’s of more use now, as it analyses more exif fields and gives you a better comparison. I’ll keep improving both of these functions in the coming weeks.
Basic Timeline Analysis!
Much improved Match Analysis.
User feedback when processing files (so you know it hasn’t frozen!).
Ho, Ho, Ho, Now I have a machi…Happy Christmas! The best Christmas present of all has come:
Version 0.3 (named McClane – obviously) is released with two new, major features!
First and foremost, I’ve added in the Hachoir metadata libraries – these drag out even more metadata from a wider variety of files. Pretty much every major file type is now supported, with metadata on EXEs, video and audio files as well as all the previous ones. There may be some duplication in fields, which I’ll whittle down in future releases.
The second feature is basic comparison functionality to see what the differences are between two files. I hope to improve on this substantially in future releases, but it’s a start.
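At its simplest, a comparison like this is a field-by-field diff of two metadata dicts. Here is a minimal sketch, assuming flat dicts of field name to value. The function name is hypothetical, and this isn’t MI’s code.

```python
from typing import Any, Dict, Tuple


def compare_metadata(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (value_in_a, value_in_b)} for every field that differs.
    A field missing from one file is reported with None on that side."""
    diffs = {}
    for field in sorted(set(a) | set(b)):
        va, vb = a.get(field), b.get(field)
        if va != vb:
            diffs[field] = (va, vb)
    return diffs
```

In this sketch, a field that’s missing looks the same as one that’s present but set to None. A fuller comparison would probably want to tell the two apart.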
I’ve also cleaned up the UI substantially now that we know we’re sticking to the Pandastable version (see Trimming Down the Metadata Interrogator) and now that I’m happier with how it’s going I’ve laid the groundwork for easier expansion.
Added Hachoir parsing!
Rudimentary file comparison
Added Settings panel to help adjust the layout better.
Squashed a lot of bugs.
GUI resizing works properly now.
A few more (hopefully descriptive) error messages instead of it just quietly not working.
Complete separation of GUI and analysis in the code.