How a Linux Server Gets Turned into a Zombie
A very techie forensic analysis, but interesting.
merkelcellcancer • August 16, 2007 1:51 PM
I thought Macs and Linux boxes were somewhat safe from this type of activity.
Jojo • August 16, 2007 2:00 PM
No, they are not… it depends on your administration abilities (update frequency) and the default settings of your provider.
Nicola • August 16, 2007 2:20 PM
As a linux domestic user (and a previous windows user), I must say that a linux workstation is far more secure than a windows one (and not only because linux has a 3-4% share of the desktop market)… but this applies mostly to viruses, worms and trojans, while in this case the server was directly attacked by an intruder and, as stated by Jojo, it depends on your abilities, on your ISP’s policies and, I would add, on the intruder’s abilities and determination…
But this case is about a server, a segment where the differences between windows and linux are perhaps smaller (desktop linux has policies similar to the server versions, but I hope that windows servers have tighter rules!).
nzruss • August 16, 2007 2:27 PM
you can make the most secure operating system in the world as insecure as you like, either through the wrong settings or poor administration.
I’d have to say that in my experience the DEFAULT settings on major Linux distributions are more ‘secure’ (i.e. fewer or no open ports by default, and limited user accounts rather than Admin/root by default).
But again, if you’re running a server exposed to the internet, people will prod it. If you don’t need to serve the internet, get a NAT router (and configure it properly) to hide behind.
Tremaine Lea • August 16, 2007 2:34 PM
Interestingly, the rootkit/tools appear to remain accessible.
Alan • August 16, 2007 2:50 PM
The whole directory is full of script-kiddy goodness.
More things to add to my rootkit detection scripts.
Carlo Graziani • August 16, 2007 2:51 PM
Setting syslog to log remotely to some fortress machine — an OpenBSD box, or Linux running minimal services and with no user other than the sysadmins — can be a useful forensic tool for this kind of situation. Deleting or modifying files in the local /var/log won’t compromise the audit trail, and even modifying /etc/syslog.conf is risky, since changes in logging cadence at the fortress box might be noticed. Certainly, the initial break-in would be much harder to conceal.
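For reference, the central logging Carlo describes takes only a line of configuration on each monitored machine; the hostname below is a placeholder for your hardened log host:

```
# /etc/syslog.conf on each monitored machine (loghost is an example name):
# forward every facility/priority to the fortress box over UDP port 514
*.*    @loghost.example.com
```

The fortress machine’s syslogd then has to be started with remote reception enabled (`-r` on the classic Linux sysklogd), and ideally firewalled so only your own machines can reach port 514.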
antibozo • August 16, 2007 3:27 PM
That writeup is a great example of how not to do this kind of investigation.
First thing after determining you’re hacked: collect last used times of all objects in filesystem, followed by last modified and inode change times (or all together, using an appropriate tool). Only then should you start looking at things; otherwise you won’t be able to construct a timeline of the intruder’s activity. Once you start poking around the /var/.x directories you are throwing away important timestamp information, which can help you a lot in determining the full extent of your compromise if, for example, the intruder installed a keylogger or trojaned ssh client. Ideally, you also conduct your investigation as unobtrusively as possible until you’re ready to actually contain the intrusion; tipping off the intruder that you know about the intrusion is an invitation for some of them to dd if=/dev/zero of=/dev/sda your system. Similarly, wgets and DNS lookups should be performed from an unrelated system as this activity may be monitored.
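As a rough sketch of that first step (GNU findutils assumed; paths are placeholders, and on a real case the output would go to removable or remote media, never the suspect disk itself):

```shell
# Collect atime/mtime/ctime for every object in one pass, BEFORE any
# other investigation touches the filesystem and destroys the timeline.
TARGET=${TARGET:-/}
OUTDIR=${OUTDIR:-/tmp/ir}
mkdir -p "$OUTDIR"
# %A@ = last access, %T@ = last modification, %C@ = inode change time
# (epoch seconds), so a single pass preserves all three timelines
find "$TARGET" -xdev -printf '%A@ %T@ %C@ %p\n' > "$OUTDIR/timestamps.txt"
# sort by atime to see what the intruder's tools touched most recently
sort -rn "$OUTDIR/timestamps.txt" | head -50 > "$OUTDIR/recent-atimes.txt"
```

Only after this snapshot exists is it reasonably safe to start reading files in /var/.x and the like.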
Nix • August 16, 2007 4:02 PM
Also: if you can, unplug your box from the net to minimize the damage it can do; then image its block devices and do the analysis on the imaged devices, mounted read-only through a loopback filesystem or in a virtual machine if possible. (The writeup author was using a virtual machine for a lot of this, apparently.)
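A minimal sketch of that imaging step (the device name and paths are examples; in practice you would write the image to separate media and hash it so you can later prove it hasn’t changed):

```shell
# Image the suspect partition: noerror keeps going past bad sectors,
# sync pads short reads so byte offsets in the image stay aligned
dd if=/dev/sda1 of=/evidence/sda1.img bs=64k conv=noerror,sync
# record a hash of the image at capture time
sha1sum /evidence/sda1.img > /evidence/sda1.img.sha1
# on the analysis box, mount the copy read-only via loopback
mkdir -p /mnt/suspect
mount -o ro,loop,noexec,nodev /evidence/sda1.img /mnt/suspect
```

The `noexec,nodev` options are cheap insurance against accidentally running the intruder’s binaries during analysis.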
(And, Bruce, I’m disappointed. A `very techie’ writeup would have disassembled the binaries and worked out what they really do. 🙂 )
antibozo • August 16, 2007 4:17 PM
Nix> Also: if you can, unplug your box from the net to minimize the damage it can do
Depending on the compromise, there are scenarios where this is not a good idea. It’s advisable to know the scope of the compromise before you do anything that will tip off the intruder. If the intruder got onto ten other boxes using knowledge from the one you know about, the damage resulting from taking the one you know about off the net can be much worse (e.g. intruder decides to wipe out the other ten boxes) than anything the intruder actually does on the one box in the time it takes you to figure out where else he was able to get to.
Brandioch Conner • August 16, 2007 4:56 PM
The problem is that ANYTHING you do could alert the intruder.
Search for atimes? That would alert him.
ls -R? That would alert him.
Following that, you couldn’t even reboot a router. Or shut the box down to swap a power supply.
Christoph Zurnieden • August 16, 2007 5:25 PM
Depending on the compromise, there are scenarios where this is not a good idea.
There are no such scenarios unless you are doing an official forensic analysis with all the necessary judicial permissions.
The largest problem is that you don’t know what your zombie does—it might publish copies of the American constitution or child pornography; the first might get you sent to northern Cuba, the second will get you into even more serious trouble.
So, if you think that you have been cracked: pull the plug immediately! Either literally, or by overwriting and deleting the kernel and rebooting (‘/bin/rm’ might be compromised; ‘echo’ is a builtin in most shells, which might also be compromised). It might even be necessary to tell the police that one of your servers has been cracked (you may use the term “hacked” instead, it’s more common to the public) and that you can’t stop it. That might be helpful in case of a prosecution.
Because it is sometimes very complicated, if not impossible, to determine the exact time of the breach, it is also very hard to determine which backup is the last clean one.
One way to avoid that trouble is to take a full backup of all binaries in use with every backup, or better, to keep cryptographic hashes of the binaries on the backups and compare them with the original hashes every time you do a backup. The latter will alert you if something, eh, curious happens. At least if nobody has found a way to circumvent it or, more probably, you messed up the implementation.
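A sketch of the hash-comparison idea (paths and the manifest location are placeholders; the manifest itself must live somewhere the intruder cannot rewrite, e.g. on the backup media):

```shell
MANIFEST=${MANIFEST:-/backup/manifest.sha256}
BINDIRS=${BINDIRS:-"/bin /sbin /usr/bin /usr/sbin"}
mkdir -p "$(dirname "$MANIFEST")"
# verify the previous manifest first: any modified or missing binary
# fails the check and should be investigated before trusting this backup
if [ -f "$MANIFEST" ]; then
    sha256sum --check --quiet "$MANIFEST" \
        || echo "ALERT: binaries changed since last backup" >&2
fi
# then record the current state for the next run
find $BINDIRS -type f -exec sha256sum {} + > "$MANIFEST"
```

As the comment says, this only alarms you if the manifest itself is trustworthy; hashes stored on the compromised box are worthless.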
antibozo • August 16, 2007 5:31 PM
Brandioch Conner> The problem is that ANYTHING you do could alert the intruder.
Obviously. But taking the box off the net is a lot more likely to alert him than ls -lu, especially if just before that you do a bunch of nslookups and wgets from the compromised host.
In addition, there are other reasons not to disconnect the box from the network. For one thing, you may be unable to collect all the information you need about the intruder’s network connections before they time out. For another, if you have NFS mounts on the box, taking it off the net impairs your ability to collect system information since things start hanging.
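For what it’s worth, capturing that volatile connection state takes only seconds and little that is visible to the intruder (output paths are examples; ideally the tools come from known-good read-only media, since the system’s own netstat may be trojaned):

```shell
OUTDIR=${OUTDIR:-/tmp/ir}
mkdir -p "$OUTDIR"
# active connections and the processes holding them; fall back to ss
# on systems without net-tools
{ netstat -anp || ss -tunap; } > "$OUTDIR/connections.txt" 2>/dev/null
# open files/sockets per process, and the ARP cache for nearby hosts
lsof -nP > "$OUTDIR/lsof.txt" 2>/dev/null
arp -an  > "$OUTDIR/arp.txt"  2>/dev/null
```

Those remote addresses are exactly what you later grep for in firewall logs to find other compromised hosts.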
The general rule is to be as unobtrusive as possible until you’ve sussed out the full extent of the compromise and are ready to contain, then contain all the compromised boxes simultaneously so you’re not playing whack-a-mole with the intruder. The major exception to this is if you have a web defacement–in that case, the intruder expects you to take the box off the net, and the PR harm usually warrants doing so as quickly as possible.
Brandioch Conner> Following that, you couldn’t even reboot a router. Or shut the box down to swap a power supply.
Indeed, if you have a mass compromise, those are risky activities, and even if the intruder doesn’t notice, you may lose important information about the compromise. But if you know you have a compromise, dealing with it is usually the highest priority, so you shouldn’t be swapping power supplies. If you don’t know about a compromise, you just go about your normal maintenance activity.
Lawrence D'Oliveiro • August 16, 2007 6:26 PM
A machine belonging to a client of mine was cracked in a similar way several years ago. The giveaway was also similar, through the “ls” command: the cracker had created a directory in /home, which didn’t show up with “ls /home”, but did with “ls /home/*”. In those days, Red Hat installations tended to have all services enabled by default, and they got in through a vulnerability in RPC (port 111), which we weren’t even using.
There’s still the mystery of how they broke in with this attack. Though, as soon as anybody says “PHP”, I tend to shout “There’s your problem!” 🙂
Brandioch Conner • August 16, 2007 6:42 PM
It looks like you’re confusing a few different issues.
#1. Identify which machines are cracked.
#2. Identify the code on the machines that the cracker put there.
#3. Learn more about the cracker’s existing network.
#4. Protect yourself from his reprisals.
In my experience, 99.99% of the time the cracker doesn’t care whether the machine goes off-line. He’s managing thousands of them. He doesn’t even care if one is disinfected. Or a dozen. Mostly these are Windows machines spewing spam, hosting files or taking part in a DDoS.
In the other (very rare) cases, he’s looking for something specific. In those cases he would care whether you started doing anything unusual.
You can’t say that he would look for A & B & C & D … but not E. At least you cannot say so without providing some explanation on WHY he would take some steps but not others. They’re all about the same difficulty. He’s already cracked the machine.
“But if you know you have a compromise, dealing with it is usually the highest priority, so you shouldn’t be swapping power supplies. If you don’t know about a compromise, you just go about your normal maintenance activity.”
No, you didn’t understand. My example was that, given your scenario, even if the admin was NOT aware his machines had been cracked, his regular maintenance routine COULD cause the cracker to retaliate as you describe.
Now, how many stories have you heard where someone rebooted a firewall and suddenly 10 of his servers were wiped? I haven’t heard any.
antibozo • August 16, 2007 6:45 PM
Lawrence D’Oliveiro> they got in through a vulnerability in RPC (port 111), which we weren’t even using.
That may have been the NFS statd vulnerability which accounted for a lot of Linux compromises in Red Hat 6.x days. See this URL for discussion and exploit code:
That’s not a vulnerability in RPC per se; it’s a vulnerability in an RPC service–the NFS status daemon. RPC is a protocol and set of libraries for performing remote procedure calls.
Someone • August 16, 2007 7:13 PM
Oh goodness, here we go. Only half an article there: as others here have noticed, the consideration that by interfering you’d alert the kiddie is another factor.
Other things to consider: if you’ve only just noticed how one insecure service you’re running got cracked months ago, who’s to say that’s the only kit installed? It needs reiterating: there is no chance of repairing this box; it needs a full imaging for forensics, then a wipe & reinstall.
Why don’t these black-hat idiots work from proper, reasonably up-to-date sources anyway? I’ve detected a cracked box because its netstat didn’t understand -p; that’s par for the course. But here we have an ls that’s a million miles from ls, and “smbd” running with -a, an option which the real thing does not have. Crappy disguise.
There’s nothing particularly imaginative in this writeup. Some years ago when I was more into this kind of thing, I knew of articles talking about injecting kernel modules (even in non-modular kernels) and the ability to divert exec*() calls to different inodes than open() calls – think about it, you could md5sum the binary and it would show up clean but still execute a nasty instead.
Our narrator here is nothing like paranoid enough; are you?
antibozo • August 16, 2007 7:40 PM
Brandioch Conner> It looks like you’re confusing a few different issues.
The issues you list are indeed various issues to consider. I am not in the least confusing them, however.
Brandioch Conner> In my experience, 99.99% of the time the cracker doesn’t care whether the machine goes off-line.
With the Windows botnets, perhaps, but not with the far less common Linux compromises, where, in the very least, the intruders generally try to get back in over the next few days after containment.
Brandioch Conner> In the other (very rare) cases, he’s looking for something specific. In those cases he would care whether you started doing anything unusual. You can’t say that he would look for A & B & C & D … but not E.
If you’re talking about how skillfully the intruder can deduce that I know about him, yes, that is difficult to gauge a priori, but after doing the basic stuff unobtrusively one can form some judgments based on the types of tools one finds. It takes a fairly highly skilled intruder to figure out that you’re onto him if you’re reasonably quiet about it and don’t name your tool “collect-forensic-intrusion-info.sh”. But when the box goes off the net, even the clumsiest intruder gets a thwack from the clue stick.
But yes, in the worst case, a hair-trigger paranoid intruder might wipe the box when you log in to check on disk space, even though you don’t know about the intrusion; there’s not much you can do about that, but dealing with intrusions is always a case of making the best out of a bad situation.
Brandioch Conner> No, you didn’t understand. My example was that, given your scenario, even if the admin was NOT aware his machines had been cracked, his regular maintenance routine COULD cause the cracker to retaliate as you describe.
I did understand, and I agreed with you. The discussion here is about what to do when you know you have a compromise, not what to do when you don’t know.
Brandioch Conner> Now, how many stories have you heard where someone rebooted a firewall and suddenly 10 of his servers were wiped? I haven’t heard any.
And of course you wouldn’t hear any such stories, because when a gang of systems goes south all of a sudden in the normal course of business, people don’t usually assume it’s because of a mass compromise. When all the compromised boxes are toast, it’s not easy to tell that a compromise even occurred, and very few admins will even think of looking; instead they’ll schedule a maintenance call with their UPS vendor.
I can tell you that such scenarios occur, though I don’t know of one with the level of caprice in your example. The point, however, is not to attempt direct combat with someone if you don’t know where he is, why, or how he got there. Your best bet for effective containment is to work as quickly and unobtrusively on the system as possible to collect all the information you might need, then analyze that on another system without mucking with the compromised box in any obvious way. The knee-jerk pull-it-off-the-net strategy rarely helps anything (if there’s a DDoS going on, or, as I mentioned earlier, a defacement, it may be warranted; otherwise it usually doesn’t really harm the attacker as you yourself pointed out, and, for all you know, he’s been there for three months, so what’s another few hours?), definitely impairs your ability to find out what he’s doing and how he got there, and may result in reprisal, so what’s the point? In those cases where the box you know about really is the only box that had the vulnerability, you may luck out. Otherwise, they’re either already in on other boxes (maybe the one you know about was a secondary compromise), or worse, you have a common vulnerability on a bunch of other exposed systems and he’s in on those in full knowledge you detected him, while you’re off playing with the one box you took offline.
What you want to do is find out the compromise vector, find out the scope of the vulnerability and the compromise, and contain. Pulling one box off the net is not an effective strategy for this goal–no more so than shooting the first soldier you see in the forest is a good way to combat an unknown number of soldiers nearby. Superior reconnaissance can win the battle with far less destruction, and reconnaissance requires stealth and finesse.
xrey • August 16, 2007 8:06 PM
Regarding pull-the-plug vs. don’t-pull-the-plug:
If a sysadmin who I’ve hired to manage my data realizes that a box is compromised and doesn’t pull the network connection immediately, that sysadmin should be out of a job.
The cracker may have stolen some of your data–that’s why you need to immediately image the drive and do forensics later–but there’s a good chance the cracker has not stolen your data yet. Better to not lose any more.
Networks go down all the time, so a cracker isn’t going to start reprisals just because he can’t get to your box temporarily. If that was the standard MO, we’d see zombies being formatted on a regular basis right after network downtime. How often does that happen?
In this case, the cracker was most likely interested in a zombie rather than the contents of the machine. However, once you start trying to back trace to the cracker, he now has some incentive to perform real damage.
Maybe the best solution is to have a cloned honey-pot ready to go. From the outside, it will look and act like the compromised machine, but all the sensitive material should be replaced with something innocuous.
antibozo • August 16, 2007 8:55 PM
xrey> The cracker may have stolen some of your data–that’s why you need to immediately image the drive and do forensics later–but there’s a good chance the cracker has not stolen your data yet. Better to not lose any more.
Actually, the probability you’ve lost data, and how much of it, is completely unknown to you at the time of discovery. If he was there for three months, again, a few hours don’t make any difference.
In your scenario, if indeed the compromised box is the only box that has your supersecret data on it, maybe your argument has merit. Otherwise, the moment you pull that plug, the race is on between your identifying and containing every other compromised host and the intruder’s stealing every other secret he has access to but hasn’t had time to collect yet. Once you start that race, the intruder has an incentive to steal as much as he can in the shortest possible time. Before the race, however, he’ll work at his own pace while trying to stay under the radar, and it’s to your advantage to keep him in that mode if there’s any possibility he’s into other boxes you don’t know about.
Knowing this, a reasonable person will start the race at a time of his choosing, after assessing the situation, not before. The trick is to start and end the race at the same moment, which you can do only if you know the full extent of the compromise. Doing this properly involves setting up systems to cover critical services on compromised hosts ahead of time, then taking all of the compromised and vulnerable hosts (you have to know the vulnerability to know which these are) offline at the same moment, pointing the remaining boxes at the new critical service systems, patching the vulnerable systems that weren’t compromised, and quarantining the compromised ones for recovery, usually involving a complete reinstall. Note how dramatically different this is from pulling the network cable on one box.
xrey> Networks go down all the time, so a cracker isn’t going to start reprisals just because he can’t get to your box temporarily.
The sort of intruder who will toast your system can also tell the difference between your network dropping for ten minutes at 0200 for an IOS upgrade, and the single box that he saw root log into five minutes ago suddenly becoming unreachable. But really, threat of reprisal (and it’s not even reprisal–it’s ass-covering) is only one of many reasons to forgo the knee-jerk route (or unroute, ha ha), and instead invest some time and energy toward understanding the situation before reacting. Knowledge enables you to take an advantageous, offensive position before the combat starts, rather than end up in a defensive position because you attacked blindly without learning your enemy’s position first.
xrey • August 16, 2007 9:28 PM
Actually, if there are more boxes with supersecret data, then you have to pull those too. Depending on the size of your institution, that may involve pulling a lot of network cables. It sucks, but I’ve been there.
You have to do it simultaneously and immediately. You need to prevent the cracker from entering any of your boxes with backdoors that he’s had months building.
If your institution can’t pull everything, then at the very least, you should pull all the machines listed in /etc/hosts and any machine with any of the same user accounts listed in /etc/passwd. You have to assume that the cracker has used the compromised box to learn more about the interior of your network, and you have to assume that the cracker may have intercepted user accounts and passwords, including root passwords.
Depending on how much time your institution wants to spend doing forensics, at the bare minimum, you must change all root and user passwords for all the boxes after they have been scanned for compromise.
Well, I won’t tell you, personally, what to do, antibozo. You can play amateur detective with your own boxes all you want. However, if you are administering someone else’s machines, good luck explaining why you left compromised machines vulnerable for “another few hours,” during which time the cracker could do anything he wanted.
The cracker might be a script kiddie or it might be a h4xxor g0d, so it’s better to not give him a chance to act. Physically removing the machine(s) from the network is the only way to be sure.
antibozo • August 16, 2007 9:58 PM
xrey> Actually, if there are more boxes with supersecret data, then you have to pull those too. Depending on the size of your institution, that may involve pulling a lot of network cables. It sucks, but I’ve been there.
I see. So you don’t work at an organization that has requirements about downtime tolerance. I suppose you’d take all the potentially compromised systems off the network immediately if they were, say, controlling a city-wide electric power grid?
xrey> Well, I won’t tell you, personally, what to do, antibozo. You can play amateur detective with your own boxes all you want. However, if you are administering someone else’s machines, good luck explaining why you left compromised machines vulnerable for “another few hours,” during which time the cracker could do anything he wanted.
I’ll ignore the tone of that comment insofar as to note that I don’t have any difficulty explaining the logic of my procedures to my clients, and to inform you that nothing I am saying is particularly iconoclastic. I think you’ll find, if you look above the “amateur detective” class, that there are people who understand that this field is far more complex than can be handled with an always-pull-the-network-cable approach. As I’ve intimated, there are cases where that’s appropriate, and there are cases where it isn’t. Yes, most Windows virus infections may be in the former class. Most Linux compromises are not.
xrey> Physically removing the machine(s) from the network is the only way to be sure.
Sure of what? In cases where someone left a logic bomb, physically removing the machine from the network may trigger a system wipe. Meanwhile, if I lose my chance to find out what IPs the attacker was last working from by pulling the network too early, I’m throwing away valuable information I could use to identify other compromised hosts by inspecting firewall logs.
Nothing is certain, but knowledge tends to triumph over ignorance. Procedures that emphasize knowledge gathering facilitate intelligent incident response. One-rule-fits-all, knowledge-free strategies may work in a lot of cases, but when they fail, they fail spectacularly. In some organizations, this results in people dying, so some of us are willing to do some extra work to avoid that.
Brandioch Conner • August 16, 2007 10:57 PM
“If you’re talking about how skillfully the intruder can deduce that I know about him, yes, that is difficult to gauge a priori, but after doing the basic stuff unobtrusively one can form some judgments based on the types of tools one finds.”
“Sure of what? In cases where someone left a logic bomb, physically removing the machine from the network may trigger a system wipe.”
Yep, you’re arguing against anything other than your position by assuming that the cracker has taken specific steps to maximize the damage in anything you disagree with …
… while simultaneously claiming that the cracker would NOT have done anything similar in the scenario you are advocating.
Meanwhile, in the real world …
“And of course you wouldn’t hear any such stories, because when a gang of systems goes south all of a sudden in the normal course of business, people don’t usually assume it’s because of a mass compromise.”
Actually, you would. Individual machines can fail…
A bad update can be applied to a bunch of machines…
A bad machine can flood the network making it appear that a bunch of machines are down…
But having a bunch of machines suddenly wipe themselves? That would be a story and a half.
You might want to familiarize yourself with Bruce’s “movie plot threat” concept. It’s what you’re practicing right now.
Meanwhile, the advice to follow best practices still stands. Disconnect the machine. Image it or whatever later, but disconnect it first. The best admins will not concern themselves with “movie plot threats”. They’ll follow best practices instead.
antibozo • August 16, 2007 11:41 PM
Brandioch Conner> you’re arguing against anything other than your position by assuming that the cracker has taken specific steps to maximize the damage in anything you disagree with
Actually, I’m arguing against assuming anything specific about what the intruder has done, and arguing for observation before action. But you’d have to read what I wrote to understand that, and note that it was a response to an absolute statement (“it’s the only way to be sure”) that is demonstrably false.
I’m also arguing against panic-driven autonomic response, by the way. Again, what horrible thing do you think happens between time [compromise + x] and [compromise + x + y], where x is unknown and y is time to gather knowledge of the situation, that necessarily outweighs the value of the knowledge gained during y for orchestrating an instantaneous containment? Why do you think it always correct to pull one box off the network before you’ve even figured out how it was compromised, when a hundred other boxes may have the same vulnerability?
Brandioch Conner> But having a bunch of machines suddenly wipe themselves? That would be a story and a half.
It would, if the sysadmins who observed it had the skill to even think of it. Most sysadmins need to be specifically trained to consider compromise as a possible cause of bad system behavior. Their natural tendency is to assume an environmental (“power surge”) or common software cause (“NFS problems”), because that’s the usual case (as far as they know). I’m not talking about the Brandioch Conners of the world, here; I’m talking about the other 98%. Yes, for them, the best advice might be to pull the network cable, leave the facility, and go get a new job as a crane operator.
Brandioch Conner> You might want to familiarize yourself with Bruce’s “movie plot threat” concept.
If you want to drag Bruce into it, you might want to familiarize yourself with his “focus on intelligence gathering, and don’t defend blindly against specific tactics because the attackers will adapt” advice.
Pat Cahalan • August 17, 2007 12:27 AM
@ Brandioch, Antibozo, xrey
re: first response
I think you’re arguing way past base cases, my friends.
If someone cracks into my personal *NIX box, I’m going to unplug the thing from the network and the wall. Then I’m going to yank the hard drive out of the machine, plug it into another box, fsck it and examine it if and when I decide I feel like a good murder mystery. You can leave it up if you like, but I don’t believe overmuch in performing forensics on a live machine, since you already don’t know what it’s doing. You may learn something interesting, you may not, but whatever it is doing it’s going to keep doing while you’re digging around in its guts.
If you’re dealing with a mission critical box or something that has proprietary, secret, or private data on it, you should already know what the hell you’re supposed to do (and you should have already communicated that to any stakeholders that may be affected by downtime). In most cases, I think a reasonable security plan would say, “Kill the box immediately, and anything it has been talking to”.
antibozo> I suppose you’d take all the potentially compromised systems off the network immediately if they were, say, controlling a city-wide electric power grid?
If I don’t have an emergency response plan that considers that I may have to forcibly shut down a major control center (or have it shut down for me), for whatever the root cause, and I’m responsible for something as important as a city power grid, I’m going to quietly go work on my resume. If I had a plan but it was rejected, I’m digging up the documentation of who rejected it and why, printing it out, and making sure that leaves with me in case I can’t get back in the building later.
antibozo> Meanwhile, if I lose my chance to find out what IPs the attacker was last working from by pulling the network too early, I’m throwing away valuable information I could use to identify other compromised hosts by inspecting firewall logs.
I’ll have to disagree with you here on this one, AB. Someone who’s going to build a self destruct is probably going to do a good enough job of covering his/her tracks that you’re not going to get much out of the machine, forensically speaking. Since the box is untrusted, you can’t rely on the logs anyway (most rootkits I’ve seen turn off logging as the first matter of course anyway). If you don’t have some method of analyzing the network traffic outside the host itself, like router logs, it’s probably better to shut the beast down.
And, if they’re not good enough to wipe tracks properly, you’ll get plenty of dirt off of examining the contents of the drive and the swap file while doing your examination using a known trusted OS platform.
Chris Rutherford • August 17, 2007 2:42 AM
Bruce, what has happened to you? First you join up with BT and then you start using derogatory words like ‘Techie’ to describe intelligent people. As far as I’m concerned, the article you commented on isn’t in the slightest bit ‘Techie’.
antibozo • August 17, 2007 3:53 AM
Pat Cahalan> If someone cracks into my personal *NIX box, I’m going to unplug the thing from the network and the wall. Then I’m going to yank the hard drive out of the machine, plug it into another box, fsck it and examine it if and when I decide I feel like a good murder mystery.
I’m reluctant to go into specific tactics for a specific yet vaguely defined scenario for a number of reasons, but here are a few thoughts about that course of action:
Pat Cahalan> You can leave it up if you like, but I don’t believe overmuch in performing forensics on a live machine, since you already don’t know what it’s doing.
Well, I wouldn’t leave it up indefinitely, but I’d certainly take the time to identify everything I reasonably could about live network connections before yanking any network cables. And I wouldn’t crash it until I got everything useful I could out of the live system, and believe me, that’s actually quite a lot, because rootkits generally suck.
Pat Cahalan> In most cases, I think a reasonable security plan would say, “Kill the box immediately, and anything it has been talking to”.
In most cases, it depends on the box, the function, and the type of compromise. Take a look at the NIST incident response documents (e.g. 800-61, 800-86) for some idea of the range of possibilities here. In most scenarios, a partial network containment would be initiated early on, but there are always exceptional circumstances, and every incident is unique.
Pat Calahan> If I don’t have an emergency response plan that considers that I may have to forcibly shut down a major control center (or have it shut down for me), for whatever the root cause, and I’m responsible for something as important as a city power grid, I’m going to quietly go work on my resume.
I think you erroneously assume here that the incident response team, or any computer security professional for that matter, ever had any input into the emergency response plan for the compromised system. Incident response teams are like firefighters; they’re called in when there’s a fire. They aren’t consulted when the facility was built or the fire detection and suppression system was installed; they’re just expected to deal with the fire when it happens, and in many cases the people who built the facility didn’t follow the fire code, if you will. Few organizations have an inspection regimen for software and systems that parallels the physical inspection that covers facilities, so it’s way worse for response teams than for firefighters. On the plus side, incident responders have much safer jobs… ;^)
antibozo> Meanwhile, if I lose my chance to find out what IPs the attacker was last working from by pulling the network too early, I’m throwing away valuable information I could use to identify other compromised hosts by inspecting firewall logs.
Pat Calahan> Someone who’s going to build a self destruct is probably going to do a good enough job of covering his/her tracks that you’re not going to get much out of the machine, forensically speaking.
I mean that once you pull the network cable you have limited time to collect the network information in all cases, not just those where a logic bomb is involved. Sorry that wasn’t clear.
And, as observed elsewhere, by pulling the cable you start the game clock, so you’d better be ready to play.
Pat Calahan> Since the box is untrusted, you can’t rely on the logs anyway
That doesn’t mean you ignore them.
Pat Calahan> (most rootkits I’ve seen turn off logging as the first matter of course anyway).
You’d be amazed at how often people screw up this simple step.
In practice, there aren’t any good rootkits. Suckit was fairly innovative at the time, but it has major flaws, and a competent incident responder can detect it easily and build a network scanner for it. Yes, the sysadmin may be fooled by it. That’s why step 1 for the sysadmin is not "pull the network cable"; it’s "call the incident response team". After all, a lot of the time, the box that’s acting funny isn’t compromised at all.
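The network-scanner idea above can be sketched roughly as follows. The port number and banner are invented for illustration; in a real incident the signature would come from analyzing the captured rootkit, not guesswork:

```python
# Hypothetical sketch: once a rootkit's backdoor has been characterized
# (a listening port and a response signature, learned from a captured
# sample), sweeping for other infected hosts is straightforward.
# The port number and banner below are invented for illustration.
import socket

def probe(host, port=55555, signature=b"EVIL-BANNER", timeout=2.0):
    """Return True if host answers on the backdoor port with the signature."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            return s.recv(64).startswith(signature)
    except OSError:
        return False

def sweep(hosts, **kwargs):
    """Return the subset of hosts that look infected."""
    return [h for h in hosts if probe(h, **kwargs)]
```

This would be run from a clean workstation against the local subnets, with results fed back into the containment plan.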
Pat Calahan> And, if they’re not good enough to wipe tracks properly, you’ll get plenty of dirt off of examining the contents of the drive and the swap file while doing your examination using a known trusted OS platform.
I’m afraid you are vastly understating the volume of information that is lost when a system is crashed. Just consider what the core image of an intruder’s shell can be worth. Don’t count on being able to reconstruct anything useful out of the page-sized confetti fragments strewn all over the swap device. And a lot of systems have enough core these days that they never even touch swap.
New systems are an issue with disk imaging as well. How many 300GB system disk images do you feel like scanning through and trying to keep in online storage? And now that sysadmins don’t know how to partition disks any more (“oh, I just made one giant root partition”) you have almost no help from the disk layout in narrowing your search. Believe me, you’re going to find out a lot more a helluva lot faster looking at the live system than you are trying to scrape through a massive disk, after you’ve pulled it and found another box to plug it into. Sorry, but the notion of crashing a box without a preliminary exam in anything other than the most extreme case is totally insane.
Nice chatting with you, Pat. :^)
Kristine • August 17, 2007 4:47 AM
Time to re-read my Zombie Survival Guide. Especially the section on penguin zombies.
Colossal Squid • August 17, 2007 5:42 AM
Interesting discussion, could anyone point me towards best practice resources for an amateur administering a Linux home server?
Brandioch Conner • August 17, 2007 9:03 AM
“If you want to drag Bruce into it, you might want to familiarize yourself with his “focus on intelligence gathering, and don’t defend blindly against specific tactics because the attackers will adapt” advice.”
And that contradicts what I said … how? Please be specific.
Remember, the attack has already occurred and been successful. Bruce seems to advocate “intelligence gathering” prior to the attack as a means of preventing the attack.
You do know that “prior” and “after” are different, don’t you?
“It would, if the sysadmins who observed it had the skill to even think of it.”
Ah, so now the scenarios you describe also depend upon admins who are so incompetent that they don’t suspect anything when 10 of their machines suddenly wipe themselves.
Congratulations, you’ve just moved from “movie plot threat” to “BAD movie plot threat”. You know, the ones that violate various laws of physics or that depend upon what I like to call “Hollywood Coincidences” to succeed.
Yeah, that’s the reason why your scenario has never been reported “in the wild”. The times it happens, the admin running those machines thinks it’s perfectly normal for multiple machines to simultaneously wipe themselves.
And that admin does NOT read any tech publications nor does he have ANY tech friends whom he would talk to that would tell him any differently.
Pat Cahalan • August 17, 2007 9:41 AM
- I assume you don’t perform any remote admin work for clients from your personal *NIX box, because all your remote credentials may have been compromised in this scenario.
You’re right, I don’t 🙂 I don’t do any adminning from any box that runs services.
- You just threw away oodles of information from the /proc filesystem, cores of the intruder’s processes, in-core log information, and kernel data structures that would help you determine the scope of the situation you’re dealing with.
Yes, you’re correct. Saving the machine state information is useful.
Well, I wouldn’t leave it up indefinitely, but I’d certainly take the time to identify everything I reasonably could about live network connections before yanking any network cables.
Sorry, I just re-read my post and I realize what I was trying to say was “each incident’s response should be context dependent”, but then I went and discussed my response without providing full context, so on the whole I get credit for a bad post 🙂
If a machine is compromised, saying “you should always do this [list]” vs “you should always do this other [list]” isn’t the argument you should be having.
The argument should be, “Given a system which provides service [foo], how do I plan to respond given [hack, fire, power outage, etc], who needs to know what pieces of information, etc.” If you’re responding to a security incident using your own [security and/or sysadmin] template without knowing what the consequences are to your organization, your template may be very useful in doing forensics, but might be totally inappropriate for your customer.
In most cases, it depends on the box, the function, and the type of compromise.
That was my intended point. You’re arguing a predominantly forensic standpoint, Brandioch is arguing a predominantly preventative standpoint, but you both have good ideas that may be more appropriate given the context.
You’d be amazed at how often people screw up this simple step.
Not really, we’ve had to deal with a couple of intrusions and this step is often broken, that’s true. Using a log host helps a lot even when logging is the first thing they disable, since they can’t affect the existing logs if you can read them elsewhere 🙂
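A log host makes the silence itself detectable: if a machine that normally logs steadily goes quiet, the gap is a signal. A hypothetical sketch of a gap detector over timestamps collected on the remote log host (all thresholds and hostnames are illustrative):

```python
# Hypothetical sketch: on a central log host, sudden silence from a
# machine that normally logs steadily suggests logging was switched
# off. Scan per-host timestamps for gaps above a threshold.
from datetime import datetime, timedelta

def find_gaps(entries, max_gap=timedelta(minutes=30)):
    """entries: (host, datetime) pairs; returns {host: [(gap_start, gap_end)]}."""
    by_host = {}
    for host, ts in sorted(entries):
        by_host.setdefault(host, []).append(ts)
    gaps = {}
    for host, stamps in by_host.items():
        for a, b in zip(stamps, stamps[1:]):
            if b - a > max_gap:
                gaps.setdefault(host, []).append((a, b))
    return gaps

# Illustrative data: web1 goes quiet for almost two hours.
sample = [("web1", datetime(2007, 8, 16, 1, 0)),
          ("web1", datetime(2007, 8, 16, 1, 5)),
          ("web1", datetime(2007, 8, 16, 3, 0)),
          ("db1", datetime(2007, 8, 16, 1, 0))]
print(find_gaps(sample))
```

The threshold would need tuning per host, since a quiet box legitimately logs far less often than a busy one.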
I’m afraid you are vastly understating the volume of information that is lost when a system is crashed.
Yes and no. From a forensic standpoint, you can learn a lot. You can learn a great many things about the intruder, certainly. You can learn specifically what tools were used to break into your machine, sure. You can find out whose account was hacked, where the compromise may have occurred, what other machines the host was talking to, etc.
However, forensics may not be an issue. Aside from “being a good citizen”, you may not care where the attack came from, or what other machines were attacked by the box. You may have the router logs, and already know what traffic has been coming from the box. You may know that the box was running a web service with a PHP backend and already have a pretty good idea how the box was hacked, even if you don’t know the specific details. You may be in a race for time to prevent the box from attacking other things inside your DMZ, if the box has access to your management net. We can go on and on, but “shutting the box off” (and losing forensics data) may be vastly more appropriate than “leaving the box on” (and allowing the machine to run in a nasty state until you can figure out what’s going on).
My point (albeit not well articulated in the last post) is that arguing about tactics before acknowledging the strategic situation is… well, not the right place to start.
Paul Glover • August 17, 2007 10:16 AM
@merkelcellcancer: there’s a significant difference between “somewhat safe” and “invulnerable”. I trust my Linux boxen and the Mac infinitely more than any Windows box I’m unfortunate enough to have to use, but I also know they’re only “somewhat safe” and I’d be crazy to assume I could never have a security breach.
There’s some really useful advice here in the comments. My gut reaction to a discovered compromise would have been “yank the power, image the drive and examine what was on it”. Which, as antibozo points out, would mean I threw away everything in /proc, plus the raw memory image, both of which would be invaluable to have around during the subsequent forensics work.
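A rough sketch of that volatile-data collection, ordered per RFC 3227’s "order of volatility" (most ephemeral first); the path list is illustrative, not exhaustive:

```python
# Rough sketch of a live-collection plan, ordered per RFC 3227's
# "order of volatility": grab the most ephemeral data first, and write
# it off-box (removable media or the network), never to the suspect
# disk. The path list is illustrative, not exhaustive.
VOLATILE_SOURCES = [
    "/proc/net/tcp",    # live TCP connections
    "/proc/net/arp",    # neighbor cache
    "/proc/modules",    # loaded kernel modules (LKM rootkits)
    "/proc/mounts",     # unexpected or hidden mounts
]

def collection_plan(suspect_pids):
    """Return an ordered list of paths to copy before any shutdown."""
    per_pid = ["cmdline", "environ", "maps", "status"]
    plan = list(VOLATILE_SOURCES)
    for pid in suspect_pids:
        plan += ["/proc/%d/%s" % (pid, name) for name in per_pid]
    return plan

print(collection_plan([4242]))
```

Only after walking a plan like this would you image the disk and pull power.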
Todd Knarr • August 17, 2007 10:43 AM
xrey: your sysadmin may be being smart by not disconnecting the box on the spot. Think about this: you know at least one box is compromised, but how do you know that only one box is compromised? There’s two conflicting priorities when you’ve got a compromise on your hands. One is containing the damage and limiting the attacker’s ability to go any further. The other is figuring out what he’s done, how he did it and what else he’s doing. One of the more reliable ways of doing that’s to watch what the attacker’s doing (now that you know he’s there). If he’s compromised other machines on your network taking the machine you know about down suddenly may make him lie low until he sees if he’s been detected, where if everything appears normal he may continue accessing machines unaware that you’re actively looking for his traffic.
What to do depends on the balance between those two priorities, and how confident your sysadmin is that he can monitor the intruder without being noticed himself.
antibozo • August 17, 2007 11:38 AM
Brandioch, your vitriolic tone has become tiresome, so I’ll keep it short here:
“Before the attack” is not a well-defined event in your analogy. Knowledge that a box has been compromised is not analogous to a plane hitting the World Trade Center; it’s analogous to finding out some individual is a member of a terrorist cell. How you respond to that knowledge depends on a lot of things, and you don’t necessarily go out and arrest the one individual you know about the instant you find out.
As to your undying faith in the skill of sysadmins, I’m not in a position to talk about specific things I’ve witnessed handling incidents. But I will repeat something I said earlier which perhaps you didn’t read carefully:
antibozo> I can tell you that such scenarios occur, though I don’t know of one with the level of caprice in your example.
Whether exactly that event has been reported is irrelevant. What I’m trying to do is to get you to think beyond the autonomic response, and toward developing an understanding of a compromise before you walk in and start making things worse for someone. You’ll probably be okay because you sound like a reasonably intelligent, if needlessly snide, person, and I suspect your extreme position is more posturing than how you’d actually behave when faced with a real compromise. But it’s too bad you’re not willing to engage in a reasonable discussion about it.
Brandioch Conner • August 17, 2007 12:13 PM
“”Before the attack” is not a well-defined event in your analogy.”
What part do you have trouble with?
I’ll give you a pass on “the”.
“Knowledge that a box has been compromised is not analogous to a plane hitting the World Trade Center; it’s analogous to finding out some individual is a member of a terrorist cell.”
Only in your mind. And since you’ve already lost, you’d rather argue by analogy. I’m not going to fall for that.
The box was cracked. The attack had succeeded. And you’ve even postulated that MULTIPLE attacks have succeeded and that MULTIPLE boxes are cracked.
And then you lapse into your bad “movie plot threat” scenario.
But let’s see what else I can get out of you …
“As to your undying faith in the skill of sysadmins, I’m not in a position to talk about specific things I’ve witnessed handling incidents.”
Awesome. You’ve gone from bad “movie plot threats” to “I have secret knowledge that proves I’m right but I can’t tell you because it’s a secret”.
Yeah. And your “secret” encryption algorithm “proves” you’re right but it cannot ever be released. How many times have we heard that.
Can you do better?
“What I’m trying to do is to get you to think beyond the autonomic response, and toward developing an understanding of a compromise before you walk in and start making things worse for someone.”
Awww. Now you’re back to your bad “movie plot threat” again.
I’ve already addressed that. You can make up any scenario you want to. You’ve already brought up “logic bombs” in case the machine is disconnected from the Internet. Yet you always insist that the cracker will NOT do anything that would invalidate your position. That is “movie plot threat” in a nutshell.
“You’ll probably be okay because you sound like a reasonably intelligent, if needlessly snide, person, and I suspect your extreme position is more posturing than how you’d actually behave when faced with a real compromise.”
It’s not “snide”, it’s “cynical”.
I think you’re a troll. And I’m having fun mocking the troll.
Yes, my “extreme position” of … disconnecting the box as soon as you find out that it was cracked.
Let me guess. Now you’ll try to contort a scenario where …
A. Every box in a life-or-death situation is cracked.
B. A single box is cracked in a life-or-death situation … and that single box is a single point of failure.
Yeah, the designers of those systems must have attended the same schools that the admins did. You know, the ones you claim would not suspect anything odd about 10 machines suddenly wiping themselves.
“But it’s too bad you’re not willing to engage in a reasonable discussion about it.”
It’s the word “reasonable” as used by you (my dear troll) that is the problem.
You want to argue by analogy.
You want to claim that the admins are too dumb to suspect 10 boxes wiping themselves would be … odd.
You do not want a “reasonable discussion”. You want to troll this forum.
You want to “discuss” Hollywood Coincidences and Movie Plot Threats as if they were real.
xrey • August 17, 2007 12:28 PM
I agree that there is much to be said for watching a cracker and giving him enough rope to hang himself. However, I’m thinking about a situation where you have machines with sensitive data (including your credentials to get to other machines). Anything that the cracker downloads, alters, or destroys between time x (compromise discovered) and time x+y (when the network cable is pulled) is a gift to the cracker.
Like I said, you also have to assume that the cracker has had the run of your internal network for some time, so you must also pull the network cables for machines in your sub-network or department.
Obviously IBM is not going to pull their T1 to their corporate headquarters for one compromised box, but any user with an account on that box might now be an attack vector.
As for the technical question of leaving the machine powered on or powered off, I’ll defer to the expertise of forensic pros. Antibozo makes a strong case for leaving it on.
At the time of discovery, you only have one data point: this machine has a backdoor and will respond to commands issued by an attacker. If you remove the network cable, it can no longer respond to commands.
If you leave it connected then you are responsible for any commands that get through. Whether the cracker noticed your observation or he just happened to issue a command between x and x+y, the result is the same.
There might be a logic bomb set to go off at a specific time, or if it remains off the network for a certain length of time, but that goes into the realm of movie plot threat.
If you really want to be a “good citizen” and track down the hacker, then by all means set up a honeypot without any sensitive data. Even better, set up a virtualized clone of your cracked box and perform a man-in-the-middle attack against the cracker, taking care of what you allow to be sent back.
From my perspective, my priority is protecting data, so any sensitive data should be quarantined. The priority of an incident response team or a three-letter-agency might be to catch the hacker.
Also, it’s easy to be taken in by the “rookie mistakes” made by the cracker and become over confident. What if the crack is in fact professional industrial espionage made to look like a common script kiddie botnet?
antibozo • August 17, 2007 1:01 PM
Pat Calahan> I don’t do any adminning from any box that runs services.
Oh, good. And you don’t ssh into any box that runs services with X11 forwarding enabled either, right? ;^)
Pat Calahan> If a machine is compromised, saying “you should always do this [list]” vs “you should always do this other [list]” isn’t the argument you should be having.
I know it’s a lot of noise to slog through, but I think you’ll find that if you go back and read everything I’ve written above, you won’t find me saying you should always do anything in particular; rather you’ll see that my position is that there are always exceptional circumstances and that no single course of action is necessarily correct in every case. In other words, [always] think, and try to observe, before acting.
Pat Calahan> You’re arguing a predominantly forensic standpoint, Brandioch is arguing a predominantly preventative standpoint, but you both have good ideas that may be more appropriate given the context.
I think that’s not accurate–I’m arguing a containment-driven standpoint, not a forensic one. The overarching context here is a Linux compromise, which may thus involve a box used to conduct remote admin functions, thus containing the incident may involve a lot of other systems.
The first operational goal in a compromise is containment, and the first step toward that goal is not necessarily pulling the network cable on the box you know about. That would be a greedy algorithm approach, and, as I’m sure you know, with complex problems, greedy algorithms are not always the most efficient. In some cases it can lead to a nasty game of whack-a-mole. You’ll win in the end, one way or another, but if it takes a week instead of a day to contain, and a month instead of a week for the sysadmins to recover (because they have to reinstall more systems), that’s a lot of time wasted that could have been spent on preventing the next compromise.
Pat Calahan> You may have the router logs, and already know what traffic has been coming from the box. You may know that the box was running a web service with a PHP backend and already have a pretty good idea how the box was hacked, even if you don’t know the specific details.
If you already have that kind of knowledge, indeed, you have a lot less to do on the compromised box before acting. Again, all I’m saying is, “take some time to understand the situation before acting”.
Pat Calahan> Using a log host helps a lot
Pat Calahan> You may be in a race for time to prevent the box from attacking other things inside your DMZ, if the box has access to your management net.
Definitely–you’re always in a race for time. But until you investigate, you don’t know where your opponent is in that race. There’s no point running a leg of the race you’ve already lost. If the intruder already got onto your management net and has his sticky little paws on all your switches and servers, there’s really no point in unplugging the network cable of the one box–you may need to go for the jugular at that point and hit a kill switch on the whole computer room. And if you’re going to have to do that, you may not want to give the attacker any lead time by taking finger-in-the-dike measures beforehand.
A lot of intruders are slow and unfocused. Some of them are lousy typists, and are running five network scans in parallel with working on your box and a dozen others. There’s even a reasonable chance the intruder is asleep at the time the compromise is discovered. If you’re focused and swift, you have a decent chance of finding out everything you need to know while the intruder does nothing of significance, and as long as he is unaware you’re working on the problem, he has no reason to hurry.
I should also clarify, as I think others have misconstrued my comments, that I’m not generally advocating delaying action in order to observe the intruder’s activity to try to learn more about his behavior and whereabouts; rather I’m arguing that you may wish to delay action until you’ve had a chance to survey your own systems, starting with the compromised box you know about, to understand the full scope of your compromise, so you can perform an instantaneous containment, closing all the doors at the same time. Sometimes this is a good idea and sometimes it isn’t, and experience helps judgment, but don’t rule it out just because someone told you to always pull the network cable immediately on a compromised box.
As Bugs Bunny would say, “It’s time for a little stragety.”
Todd Knarr • August 17, 2007 1:06 PM
xrey: yes, you can assume the intruder’s had the run of your internal network. But how much? Which other machines has he compromised? How did he compromise them, and how’s he accessing them now? One of the worst situations to be in is to obviously slam the door on the one machine you knew about, giving the intruder solid confirmation that you found him, only to find out a week later about the 2 other machines he’d compromised that you didn’t notice the first time. Before you had an intruder, now you have an intruder out to teach you a lesson.
Yes, he may have downloaded data. But how much? Has he already downloaded everything, meaning there’s no more data at risk locally? Or has he spent all his time gaining further access and not gone looking locally for sensitive data? In those cases, the data most at risk may not be on a machine affected by your shutting down the machine you’ve found compromised.
And your last is exactly correct. That’s one of the reasons my first reaction is to not do anything. I approach any evidence I find asking “Is this something the intruder wanted me to find? And if it is, why?”. If I were an intruder, one thing I’d want to set up is a tripwire (or several) to tell me if some nosy sysadmin’s started poking around where he shouldn’t be.
Pat Cahalan • August 17, 2007 1:18 PM
Oh, good. And you don’t ssh into any box that runs services with X11 forwarding enabled either, right? ;^)
Brrr… X11 forwarding makes my skin crawl. Have you been reading my blog?
But until you investigate, you don’t know where your opponent is in that race. There’s no point running a leg of the race you’ve already lost.
Sure, I’ve just seen people forget what race they’re actually running. One anecdote involves a sysadmin of my acquaintance spending the lion’s share of two weeks digging into a compromised host searching for forensic clues, when we already knew that the box in question hadn’t done anything to any other boxes (yay for the router logs there), and the real question of precisely how it got hacked was moot: it was running about four known-to-be-exploitable services. Who cared exactly why it was exploited? We weren’t going to redeploy it in that state anyway 🙂 Sure, he got a lot of intellectual satisfaction out of the exercise, but there were other (more pressing) things he ought to have been working on.
I should also clarify, as I think others have misconstrued my comments, that I’m not generally advocating delaying action in order to observe the intruder’s activity to try to learn more about his behavior and whereabouts; rather I’m arguing that you may wish to delay action
Sounds like we’re both actually on about the same page, we just both wrote posts that weren’t clear enough in the first place 🙂
xrey • August 17, 2007 1:59 PM
I agree with just about everything you said; it’s very important to figure out how he cracked your system, how far he got, etc.
However, I disagree that the best method is to let him crack your system even more. I think the best method is forensic analysis of the disconnected machine or of its image.
We have to use classic cost/benefit analysis to determine how many other machines we must also disconnect. The cracker now knows about all the machines listed in “/etc/hosts” so that’s probably a good place to start.
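As a sketch of that starting point, here is a hypothetical helper that pulls candidate hostnames for containment review out of the compromised box’s /etc/hosts (the sample content is made up):

```python
# Hypothetical helper: pull candidate hostnames for containment review
# out of the compromised box's /etc/hosts, skipping comments and
# loopback entries. Sample content below is made up.
def hosts_to_review(hosts_text):
    """Return non-loopback hostnames from /etc/hosts content, in file order."""
    skip = {"localhost", "localhost.localdomain"}
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("127."):
            continue
        names += [n for n in fields[1:] if n not in skip and n not in names]
    return names

sample = """127.0.0.1   localhost
10.0.0.7    build build.example.com   # CI box
10.0.0.9    fileserver
"""
print(hosts_to_review(sample))
```

The same idea extends to known_hosts files and shell histories, which usually name far more machines than /etc/hosts does.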
Forensics takes time, so we’ll have to do a cost/benefit analysis of how much effort we want to put into answering which questions.
This may take weeks. If that’s unacceptable, then I hope we set up backup systems ahead of time.
When the first machine gets disconnected you have a finite time to do the rest, but I don’t think it’s as short a fuse as some make it out to be. After all, single machines go down all the time for all the normal reasons.
People on this board smarter than me have pointed out that you need to have these procedures worked out ahead of time. If I have a hand in crafting the procedures, my vote is that it should say “pull the network cable of the compromised machine and any machine within radius R as soon as a compromise is detected.”
I am skeptical that even the most “733t” sysadmin can watch a compromised machine without letting the cracker know he is watching and prevent the cracker from doing more damage. How many processes can a sysadmin monitor at once?
And if my sysadmin(s) are busy playing spy-vs-spy over Machine A, they could miss the fact that the user credentials stolen via a key-logger from Machine A have now just logged into Machines B, C, and D which have all just opened up outboard connections….
If the cracker set up a trip-wire, how is he going to know I’m poking around on un-plugged machines?
I’ll leave it up to others more knowledgeable than myself to address the legal questions:
1) Is a system which you left on-line and manipulated after compromise detection now inadmissible as evidence?
2) If I decide to leave the machine connected, am I legally liable for proprietary information stolen after detection (hospital records, etc.)?
Of course, information that is literally “Secret” in the DoD sense is never put on externally connected computers in the first place. Government agencies have rules for handling Classified information–assuming you belong to one of the three normal branches of government (I’m looking at you, Cheney).
antibozo • August 17, 2007 2:43 PM
xrey> This may take weeks.
With any decent organization, it’s on the order of two or three hours from discovery to analysis of the first box, and from there another hour to hit a dozen or so more with one sysadmin working on that part of the problem in coordination with an incident responder. But you can double those times if the sysadmin yoinked the network cable before calling, and triple them if he rebooted.
The majority of the initial time is the sysadmin contacting the incident response team, the team getting the admin up to speed on procedures, downloading tools, etc. In parallel with that is discussion with sysadmins and managers about the network architecture and determining current critical service placement and downtime tolerance so that as the scope is narrowed down, the containment procedure can be ready to go in short order.
It will never, ever, take weeks to contain. Assuming proper staffing, two days at the outside for a complex system that provides life-and-property functions and hence has stringent downtime requirements, and that may involve partial containment at a much earlier mark. In most other scenarios, three to four hours to contain, on the short side if the vector is found quickly.
xrey> Is a system which you left on-line and manipulated after compromise detection now inadmissable as evidence?
Criminal cases are extremely rare, and the objective in a criminal investigation is not necessarily containment of the compromise; it’s establishment of sufficient evidence to conduct a prosecution. This may be inimical to containment and is really a whole issue unto itself. I suggest you take a look at NIST 800-86, “Guide to Integrating Forensic Techniques into Incident Response”, as a starting point, though, if you’re interested. You might want to compare the forensic process discussed there with the incident response life cycle discussed in NIST 800-61, “Computer Security Incident Handling Guide”.
xrey> if my sysadmin(s) are busy playing spy-vs-spy over Machine A, they could miss the fact that the user credentials stolen via a key-logger from Machine A have now just logged into Machines B, C, and D which have all just opened up outboard connections….
If your sysadmins are busy playing whack-a-mole logging into B, C, and D to check for illegitimate logins (which may have been wiped out of wtmp), they could miss the fact that the .bash_history file on A shows the attacker logging into E, F, and G, and the fact that the attacker logged into A from 192.168.1.1 and your router logs show connections to the ssh port on H, I, and J from that same IP, and the fact that the attacker initially got into A using a vulnerability in service S, and K, L, and M are all running that service as well. If you don’t have incident responders who can figure out those things from A rapidly, yes, a scattershot approach may be your only option, but rest assured that plenty of organizations have capable incident response teams.
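The correlation described above can be sketched as a small merge over those three sources: the attacker’s shell history, router logs keyed on the attacker’s IP, and an inventory of hosts running the exploited service. All the hostnames, IPs, and the service label below are invented, echoing a subset of the placeholder names in the paragraph above:

```python
# Hypothetical sketch of the correlation described above: merge
# candidate hosts from three evidence sources into one scoping map.
# All names below are invented for illustration.
import re

def scope_candidates(history, router_log, attacker_ip, service_map, service):
    """Return {host: {evidence source, ...}} for containment planning."""
    cands = {}
    for m in re.finditer(r"ssh\s+(?:\w+@)?(\S+)", history):
        cands.setdefault(m.group(1), set()).add("shell-history")
    for src_ip, dst_host in router_log:          # (source IP, destination)
        if src_ip == attacker_ip:
            cands.setdefault(dst_host, set()).add("router-log")
    for host, svc in service_map.items():        # host -> service it runs
        if svc == service:
            cands.setdefault(host, set()).add("same-service")
    return cands

# Illustrative inputs.
history = "w\nssh admin@hostE\nssh hostF\nls -la /tmp\n"
router_log = [("192.168.1.1", "hostH"), ("10.9.9.9", "hostZ")]
service_map = {"hostK": "S", "hostQ": "T"}
cands = scope_candidates(history, router_log, "192.168.1.1", service_map, "S")
print(cands)
```

A host that shows up under more than one evidence source would naturally move to the top of the containment list.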
xrey> If I decide to leave the machine connected, am I legally liable for proprietary information stolen after detection (hospital records, etc.)?
If you decide to disconnect the machine, are you legally liable for proprietary information stolen at an increased rate from other systems because you just told the intruder he has to finish his task as quickly as possible?
Really, if legal liability in the type of scenario you’re talking about is your concern, your best ass-covering strategy is to down every external connection at your perimeter and kill power on every box, damn the consequences.
xrey • August 17, 2007 4:33 PM
antibozo> “It will never, ever, take weeks to contain.”
It did at my last job! A *NIX box was cracked in an adjacent group. The decision was made to unplug every *NIX box in the whole department. This caused an immediate work stoppage because most of us needed the network to function.
Dozens of engineers sitting on their hands for a couple of weeks resulted in millions of dollars of lost productivity. It helps if your employer has deep pockets!
But more bureaucracy also means more higher-ups putting in their requirements.
These weren’t mission-critical boxes, to be sure, or else there would have been a back-up in place.
Each box had to be scanned for vulnerabilities, all patches had to be up-to-date, and everyone had to change all their passwords.
In the interim, at least we were sure that none of our machines was spilling any data that hadn't already been compromised. It was a drastic decision, but I had to admit it was the right one. And it sure was a wake-up call for everyone.
We had to assume that the compromised Machine A may have been used to crack B, which may have been used to crack C, etc.
However, there was no need to un-plug machines X, Y, and Z in another department, because there were no user accounts overlapping. That doesn’t mean that IT wasn’t watching them more closely.
Later I became the sysadmin for my group and shored up a lot of the machines (with guidance from upper level IT).
In the years that I was sysadmin, we were never cracked again, and it wasn't for lack of trying. We detected attempted connections over every port and IP address at least once a month.
Either I was such a good administrator that I made our boxes impenetrable, or I was so inept that I didn’t realize when we were cracked 😉
But if any of my machines had been cracked, I would not have hesitated to un-plug all my boxes from the network.
Brandioch Conner • August 17, 2007 8:32 PM
I have to agree with you. Unplugging is best. Immediately, if at all possible.
And you'll notice that "antibozo" has changed his position from (paraphrased) "unplugging will destroy your machines" to (paraphrased) "unplugging will reduce the amount of information you can gather".
With the unstated assumption that losing that block of information will be more harmful to you in some way than allowing the cracked box to remain in his control.
antibozo • August 18, 2007 12:53 AM
xrey, that’s a pretty brutal story, and you have my sympathy. But technically, the weeks of downtime you suffered were eradication and recovery, not containment. Containment happened when all those boxes were disconnected (or at least, one hopes so). In any case, it sounds like you got some useful knowledge out of the experience.
I really recommend you take a look at NIST 800-61. It covers a lot of the issues we’ve discussed here, and may put your experience in perspective. You can find the NIST 800 documents here:
supersaurus • August 18, 2007 7:17 PM
tools question: there are tools, e.g. tripwire, you can run to find out what happened in the filesystem. leaving aside the shutdown/pull the plug or not argument, do you use things like that at some point in the investigation (yes, I know, it matters how, where and when you set up the database and how you do the checks)?
supersaurus • August 18, 2007 7:19 PM
for the original poster: did you ever find out how the cracker got in?
antibozo • August 18, 2007 8:29 PM
supersaurus> tools question: there are tools, e.g. tripwire, you can run to find out what happened in the filesystem. leaving aside the shutdown/pull the plug or not argument, do you use things like that at some point in the investigation (yes, I know, it matters how, where and when you set up the database and how you do the checks)?
For the most part tools like tripwire are used for either detecting a compromise or trying to recover a system after containment without reinstallation (rarely advisable, but sometimes necessary). They are not as useful for figuring out what really happened; they tell you that a file has been modified, but not when.
What is more useful during initial investigation are tools that build a chronology of filesystem activity, e.g. TCT/sleuthkit mactime. This mode of investigation is not foolproof because it is possible for an intruder to tweak timestamps deliberately, but this is very rare in practice, at least nowadays. A filesystem chronology helps you correlate intrusion activity with logs, both on and off the system, and shell history files, and gives you a rough idea of the sequence of actions the intruder performed. It also gives you an approximate time at which the intrusion began, and may be useful for developing signatures for checking for intrusion on other hosts. For example, if the first thing the intruder did was use wget to pull down a toolkit, and no one else uses wget regularly, you can check the last access time of the wget binary on all your systems to build a list of suspect systems that should be initial candidates for further investigation and possible containment. Timestamps also help when dealing with password sniffers; the last access time of the sniffer log usually tells you the last time the intruder looked at it, and passwords logged before that time are thus more suspect than those logged later.
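The wget last-access check described above can be scripted; a minimal sketch, assuming GNU stat (the `%X` format prints atime as epoch seconds) and an illustrative host list:

```shell
# Print a file's last access time as epoch seconds (GNU stat).
# A host where the wget binary was read recently becomes a candidate
# for closer inspection.
check_atime() {
    stat -c '%X' "$1"
}

# Across an inventory (host names are illustrative):
#   for h in hostA hostB hostC; do
#       printf '%s: ' "$h"
#       ssh "$h" stat -c '%X' /usr/bin/wget
#   done
```

Comparing those times against the approximate start of the intrusion gives you the initial list of suspect systems.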
Chronology tools are only useful, however, if the timestamps are pristine at the time they are collected, which is why timestamp collection is something that should happen as soon after detection of a compromise as possible. Every time a user does an ls, executes a binary, accesses a config file, etc., last access timestamps get changed. Normal activity can quickly erode the access chronology from the filesystem, making investigation significantly more difficult. If the sysadmin tracks down a sniffer log and cats it before collecting last access times, evidence of what the intruder knows is destroyed. Timestamps may also be reset by backups, cron jobs (the updatedb process that builds the fast find database for the locate command, for example), patching, and especially a reboot. A system that has been rebooted after initial compromise is a lot harder to analyze.
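Collecting the timestamps early, as described, can be done read-only in one pass; a minimal sketch assuming GNU find (the mount point and the off-host destination are illustrative):

```shell
# Walk one filesystem (-xdev) and emit atime/mtime/ctime (%A@/%T@/%C@,
# epoch seconds), owner, mode, size, and path for every file, without
# modifying anything. Send the output OFF the suspect host so the
# collection itself doesn't disturb the evidence.
collect_times() {
    find "$1" -xdev -printf '%A@ %T@ %C@ %u %m %s %p\n' 2>/dev/null
}

# e.g.: collect_times / | ssh analyst@fortress 'cat > timestamps.txt'
```

Sorting that output by any of the three timestamp columns gives a crude chronology even before sleuthkit enters the picture.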
Of course, the ability to build a useful chronology is severely hampered by unsynchronized clocks. It’s important to measure the clock skew on a system during initial investigation so that timestamps can be adjusted to correlate with the rest of the world, and it’s an important practice for all sysadmins to keep system clocks synchronized. NTP is your friend.
If you’re interested in this field, download sleuthkit and play with it. See if you can independently construct a timeline of recent activity on your box as an exercise.
Gopi • August 19, 2007 12:12 PM
Does it really take that long to pull the plug, rip out the drive, image it, and put it back in and turn the machine back on, if you want to avoid tipping the person off?
If imaging would take too long, I just thought of a cool piece of hardware somebody should build. It would do a man-in-the-middle on your IDE or SATA bus. It would do a block copy as fast as possible, while passing through read requests from the CPU. If there was a write request, it would read and store the original block first.
Another idea would be to take advantage of sleep mode. If you can configure your desktop to do a full write-system-state-to-disk sleep (hibernation), you can pull off all the system state for analysis when you block-image the drive.
Finally, FireWire has some interesting direct-DMA capabilities. You can use it to read and write arbitrary memory; this can be done without the OS kernel's approval if the FireWire hardware's permissions are set the "right" ("wrong"?) way. You could likely build some interesting forensics tools that didn't involve the remote kernel knowing what was up. I believe FreeBSD has FireWire kernel debugging that works even if the kernel has fallen over and died.
antibozo • August 19, 2007 12:36 PM
Gopi> Does it really take that long to pull the plug, rip out the drive, image it, and put it back in and turn the machine back on, if you want to avoid tipping the person off?
Yes, in general, especially with modern disks. A 300GB disk takes 50 minutes to image at 100MB/s, which is a generous transfer rate. And killing the box means throwing away lots of volatile information and completely disrupting the environment both for the system and the intruder. In contrast, collecting file metadata on a similar-sized disk would typically only take 5-10 minutes and cause no disruption.
Furthermore, the incident handler is often working remotely from the compromised system, so getting a disk image to the handler for analysis may be constrained by network bandwidth, and talking the sysadmin through the necessary steps can be iffy.
Gopi> If imaging would take too long, I just thought of a cool piece of hardware somebody should build. It would do a man-in-the-middle on your IDE or SATA bus. It would do a block copy as fast as possible, while passing through read requests from the CPU. If there was a write request, it would read and store the original block first.
Satisfactory live disk imaging for non-evidentiary purposes can be accomplished with dd and netcat, and doesn’t require taking the system down. On LVM systems, snapshots can be used to maintain consistency. In cases where you have a hot-pluggable server with RAID 1, you can do an instantaneous image without killing the box by failing one side of the mirror and pulling the disk. These are sometimes useful tactics, but again, from an initial investigation standpoint, the priority is a good filesystem chronology, and it’s usually a lot faster and less resource-intensive to collect file metadata on the live box than work with an image.
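A sketch of the dd-and-netcat approach described above (the device, port, host, and volume names are all illustrative; the listener and LVM commands are shown as comments since they require root and a second machine):

```shell
# On the analysis box, start a listener to receive the image:
#   nc -l 9000 > sda.img
#
# On the compromised host, stream the block device without writing
# anything to local disk:
#   dd if=/dev/sda bs=1M | nc analysis-host 9000
#
# On LVM systems, snapshot first so the image is internally consistent:
#   lvcreate -s -L 2G -n forensic-snap /dev/vg0/root
#   dd if=/dev/vg0/forensic-snap bs=1M | nc analysis-host 9000

# dd copies blocks verbatim; verifying the copy against the source
# confirms the transfer was faithful.
image_and_verify() {
    dd if="$1" of="$2" bs=1M 2>/dev/null
    cmp -s "$1" "$2"
}
```

Checksumming both ends (e.g. with sha1sum) serves the same purpose when the source and image are on different machines.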
I believe devices similar to what you describe exist, however.
Your other suggestions about sleep mode and firewire are interesting. As far as sleep mode, I don’t believe I’ve ever seen a Linux server that implements a hibernation mode, but I can see it being workable with Windows laptops. I don’t do much investigation on Windows, so maybe someone else can comment.
Brandioch Conner • August 19, 2007 4:42 PM
“Does it really take that long to pull the plug, rip out the drive, image it, and put it back in and turn the machine back on, if you want to avoid tipping the person off?”
Does it even matter? “antibozo’s” original position was that the cracker would destroy other systems if he should become aware that you are aware of his activity.
“antibozo” has since abandoned that position.
"Another idea would be to take advantage of sleep mode. If you can configure your desktop to do a full write-system-state-to-disk sleep (hibernation), you can pull off all the system state for analysis when you block-image the drive."
Yep. But the question is whether the information you gain from such would be that much more useful than the information gained through just searching the hard drive after you shut down the box.
It’s called “analysis paralysis”.
You have to keep your eyes on the final goal. Will that information change the approach you’ll be taking? Would the lack of that information result in your failure to attain that goal?
In every case you’ll run into you’ll have a pretty good idea of how the cracker got in after you run a few basic checks.
After that it’s just a question of what data he had access to and what other machines inside your operation could he have attacked.
Don’t waste your time over-thinking this. Digging through a few gigabytes of data (RAM dump + files) that won’t change your course of action is not an effective use of your time.
Instead, think through the various scenarios now. Know how your systems can be attacked and how you’d detect that (and prevent it) now.
If a system does get cracked, how quickly would your detection system alert you? Can you automatically isolate that system from your others?
Armchair 'expert' • August 22, 2007 10:28 PM
I’m nothing but an armchair security expert (perhaps “security enthusiast” is a better term) but I’ve found this discussion interesting.
I feel those of you arguing against Antibozo should re-read his posts. Consider the logic of them.
From what I read he’s not arguing for heroic detective work, but that any action should be considered and weighed based on your own situation. (Perhaps he’s arguing against a default reaction?)
If you do First Aid here in Australia you’d learn the “DRABC” acronym.
Notice that the first step in the acronym is 'Danger'.
In other words you need to do some kind of assessment of the danger in any situation before acting. In the First Aid situation you don’t want to get yourself hurt when trying to help somebody else.
So, consider the bigger picture, I guess; don't rush into a decision which, at worst, could make the situation worse (a cracker "race", other boxes compromised, perhaps your whole organisation down, or even some kind of logic bomb!).
Eric Windisch • August 24, 2007 1:41 PM
I deal with cracked machines fairly frequently, being involved in web hosting. Our customers are responsible for their own security, and are not always vigilant. (We protect ourselves with various network security measures, but that's another matter.)
Virtual machines have significantly affected how we deal with these things. We run Xen and use LVM on Linux. From our host, we can create disk snapshots and examine the filesystem without the guest OS being shutdown or aware. We do this not just for hacked hosts, but to handle DMCA and spam complaints.
In fact, when possible, we snapshot the filesystem, investigate, then alert the intruder inviting him to “clean up”! Afterwards, we perform a filesystem comparison, to see if there was anything we missed on the initial investigation.
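The snapshot-then-compare workflow could be sketched roughly as follows (the volume group, guest volume, and mount points are hypothetical, and the lvcreate/mount steps run as root on the Xen host):

```shell
# Snapshot the guest's logical volume before and after the intruder's
# "cleanup", mounting each snapshot read-only so the guest never sees it:
#   lvcreate -s -L 1G -n before /dev/vg0/guest
#   mount -o ro /dev/vg0/before /mnt/before
#   ... intruder is invited to clean up ...
#   lvcreate -s -L 1G -n after /dev/vg0/guest
#   mount -o ro /dev/vg0/after /mnt/after

# The comparison itself is an ordinary recursive diff of the two trees;
# -r recurses, -q reports only which files differ or exist on one side.
compare_snapshots() {
    diff -rq "$1" "$2"
}
```

Anything the intruder "cleaned up" shows up in the diff, which is what catches the items the initial investigation missed.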
Of course there are things that can only be done from inside the running guest, and we're hoping that some new introspection tools currently in development, such as XenKIMODO, will allow greater security analysis from our hosts.