About Computer (In)Security
Overview about About Computer (In)Security, Trusted Computing Base, Bad Usability of Programming Languages, Too much Software Written in Unsafe Languages, Backdoors, Purposeful Vulnerabilities in Software, Miscellaneous
Trusted Computing Base[edit]
A secure and trustworthy computer is a computer that only does what you (think you) tell it to.
Computers are insecure and not trustworthy and there is nothing we can do about that. Computers are "universal machines" (see Von Neumann, Turing). They can be instructed to do anything they can do according to the laws of physics. It is important to understand that a computer can't discern between "good" and "bad" instructions, bits are bits. There are two general problems:
(1) There are a lot of people who can instruct a given machine to do something (anything!), we need to trust them all, or verify everything, which is impossible.
If you read or write all instructions (kernel, software, firmware) yourself, study the hardware design (so you understand what an instruction actually does) and compare that with the actual hardware and then make sure you are the only one who can access the machine and give it new instructions, you need to trust that other people only give the machine instructions to do things that you want it to do. You can tell it to accept other, external untrusted instructions in a restricted, controlled fashion (like "render that html but other wise don't touch my system") but this is next to impossible to do reliably because of point 2). Otherwise you could never connect that machine to a network or transfer any input from another system to it ("input" and "instructions" has to be understood in a very broad sense, it is not just about executable code, it can as well be media files or hardware devices that are being attached).
The "Trusted Computing Base" (TCB) refers to all instructions and hardware used to restrict a "universal machine" from accepting and acting on (other) arbitrary instructions. It is used to set a certain security policy to enforce that a computer only does what these trusted instructions allow it to. "Trusted" simply means that we have to trust it, not that it is wise to trust it. The TCB tries to restrict what instructions can do but it consists of instructions itself. It can't restrict or police itself. Any flaw whatsoever in the TCB directly results in a compromise of the security of the system.
(2) The machines themselves and the instructions are incredibly complex.
Even in an ideal world with mathematically verified hardware and software we may think we are giving instructions to do A while in reality the instructions tell it to do B, or also B. Machines can not make "mistakes" but they certainly can do unexpected things...
We can't give machines instructions in our language (programming language) it needs to be in machine code. The translation is so complex that we need another machine to do that for us, and to create that machine another was used and so on. This expands our chain of trust to basically all machines and humans involved to the first compiler created by an assembler which in turn was created by hand-written machine code or earlier even punched cards.
Hardware is fallible, hardware failure can result in entirely unpredictable results. Cosmic rays "flipping a bit" or just bad memory and extreme temperatures? Look up "Bit-squatting" for an example. Software depends on the hardware to function perfectly, what happens if the most basic assumptions are betrayed?
Another example for complexity, side channel attacks: they are very difficult to protect against and poorly understood by most software developers. E.g. through power or CPU load analysis otherwise isolated parts outside the TCB can influence or eavesdrop on privileged encryption. In other words, determining what is part of the TCB and what is definitely not, is a very hard problem. See for example Spectre and Meltdown.
If that wasn't enough we also have the problem that circuit layouts, microcode and firmware of most vendors is proprietary (nonfree). Part because of competition concerns, part because they are afraid of patents... In any case, this makes verification even more difficult that it already is. Further "hardware" (which includes firmware, i.e. software) is becoming more and more capable (#complex and dangerous). Features like Trusted Computing when used to prevent ring 0 access from the user and owner of the computer, EFI that has Internet connectivity and can update itself or Intel RMM / Intel AMT / Intel ME which can grant remote access that is invisible to the user and operating system (OS) combined with a less than perfect track record (e.g. see Intel related research by Invisible Things Lab) doesn't exactly spell trustworthy and 100% dependable.
Conclusion: The TCB can never really be trustworthy. The source code of every currently usable OS kernel alone is too complex and large to completely audit and make error free, not just for a single human but even for large groups like the "Linux community". But let's assume we solve that (e.g. through using microkernels and formal verification): How do you make sure compiled binaries are actually doing what the source intended? Or, how can you verify that complex hardware and integrated circuits are actually built according to their intended design? For all the verification and auditing processes we are dependent on other complex computer systems that would have to be trusted unconditionally. Bootstrapping trust is a chicken and egg problem. We would have to be able to verify systems with just looking at them with our bare eyes and hands or build/verify all systems necessary to bootstrap a modern compiler and hardware development platform. This may have been possible for the first "computers" in the first half of the 19th century but not anymore.
Since there's nothing we can do about that, what else can we do then?
We need to design our systems in a way that makes it no longer necessary to trust them 100%. We only need to trust that it is good enough and that it is astronomically unlikely that multiple diverse systems are untrustworthy in the same way.
If you don't want a computer to be able to tell anyone your location or identity better make sure the computer doesn't know either... The Kicksecure security design in some ways mimics the security design of the Tor network itself: Don't trust any single entity to be trustworthy. We only rely on the fact that it is unlikely that all entities (nodes, computers) are compromised and colluding. To be precise, that's a goal that is not achieved yet with Kicksecure alone. One could also say that the actual TCB of such a "system" (actually multiple systems) becomes the design, arrangement and usage policy which is very well possible for every user to comprehend and verify.
This could be compared to the "Air Gap" used on most high security networks. They assume that the TCBs are not trustworthy and work around that using a simple and easily verifiable policy that basically eliminates the complete attack surface or hardware and software bugs and even protects against most backdoors (for example, a subverted PRNG could still result in weak crypto being exported from the trusted network where it can be recorded and "cracked" by an adversary. A strong pyhiscal isolation based system could then encrypt data twice on different systems using different PRNG implementations to protect against such attacks.)
Physical Isolation in the sense of Kicksecure is not a new idea, see Verifiable Computer Security and Hardware: Issues by William D. Young. September 1991 (PDF!) page 18 for a summary. It seems like the idea was rediscovered by Kicksecure (independently, and we came up with the same term), to our knowledge Kicksecure is currently the only project following this approach in a defined way.
Security Mindset of Open Source Software Ecosystem[edit]
Upstream distributions (such as Debian) can mostly only package available upstream software but not re-write most. Though, a lot upstream software was created with priorities and not necessarily highest security in mind. A lot required functionality is only available through "legacy" software written in memory-unsafe programming languages. The whole Open Source software ecosystem was never primarily focused on highest possible security to begin with.
36C3 - Das nützlich-unbedenklich Spektrum does not sound promising on general trends in computer security. Quoting freely. "More and more companies (such as google chrome to pick one example among a trend) move to an approach where security bug reports are disregarded unless a proof of concept exploit is being provided. Mitigation methods for bad, potentially exploitable code are preferred over actual security bug fixes because there is too much source code, complexity is too high and the total number of bugs is unmanageable."
See also Linux User Experience versus Commercial Operating Systems to learn about organizational issues in the Open Source ecosystem.
See also Dev/Operating System to learn about which operating systems were considered for Kicksecure.
Example:
https://github.com/Ride-The-Lightning/RTL/issues/1351#issuecomment-1928677703
[edit]
- https://bootstrappable.org/
- https://en.wikipedia.org/wiki/Bootstrappable_builds
- https://stagex.tools/
- trusting trust
- bootstrappable builds
Bad Usability of Programming Languages[edit]
Too much software which is part of the TCB is written in very old programming languages such as assembler, C and C++. These programming languages are unsafe and difficult to master. There are many ways to make mistakes. Review of software written in such languages is difficult even if the author had good intentions. It is getting worse if the author had malicious intentions. Auditing exiting source code can be super difficult, see Obfuscated C Code Contest. Most times it is easier to write source code than to understand source code written by a third party. Geeks will disagree however due to the flood of security issues it should be abundantly proven that there is rarely security bug free source code. Therefore the conclusion is undeniable that there are safety and usability issues in programming languages and the overall tool chain.
Even easier programming languages such as ruby are still too difficult to learn. Therefore the number of hobbyists motivated/capable to use them is limited. Programming languages that focus on usability, run usability studies and iterate according to usability research, that can be understood by as many people as possible are yet to be invented.
Too much Software Written in Unsafe Languages[edit]
This is partially because by the time the software was written there were no easier / more suitable programming languages available. Also performance considerations were part of this. Nowadays too few people are sufficiently aware, funded and/or motivated to replace this software with rewrites in languages with higher safety and usability. In the very strong opinion of the author, for "perfect security" a lot old source code written in unsafe languages would have to be rewritten in safe and easy (yet to be invented) programming languages to keep the amount of code in the TCB which is written in unsafe languages at a minimum to make it manageable to audit and free of security issues. This however is very unlikely to happen.
Too Much Source Code[edit]
It's open source. Don't like it? Why don't you fork it?Common discussion pattern.
To address that, some of the reasons for the invention of Open Source need to be recapitulated. In the past, someone would have said:
You've got the binary. Just use a disassembler and then look at the disassembly.Illustrative discussion.
However, disassembly is not an acceptable answer, that anyone would take serious, because the disassembly is:
- Not considered source code.
- Too complex.
- Optimized.
- Architecture-dependent.
- Too much code.
Obfuscated source code (minified, without comments, all in one line, with nonsensical variable and function names) is also unacceptable.
However, what's happening nowadays is...
Want the source code? Here you go. 5 million lines of it.Illustrative discussion.
Millions of lines of source code (that on top of that keeps constantly changing, called updates) is beyond overwhelming, simply impractical and infeasible.
- Binary closed source: infeasible
- Excessive open-source code: infeasible
A closed-source binary almost equates to an overwhelming amount of source code. Many of the original advantages of Open Source are diminished.
A new concept must be introduced: fork-friendly
Source code is also changing too fast. When you talk about quick changes, you're dealing with another level of impracticality in terms of understanding it. You can never fully understand it; even with infinite time, it changes so quickly that you never have a chance to catch up. You finally understand one version, but it's already changed 20 versions later, making your knowledge irrelevant.
The underlying principles:
- Understandability
- Changeability
- Maintainability
- Trustability
- Securability
These require known quantities. A constantly changing piece of code can never be trusted because, as mentioned earlier, it’s never actually defined. Before you can decide if it's trustworthy, you have to know what it is and what it does.
Securability is also affected by the lines of code. The more code real estate, the more elements there are to secure. It quickly becomes impractical for small teams.
If the code constantly changes, this becomes simply impossible. You will be behind every single release on security as new code and new bugs are introduced faster than anyone can analyze.
Proposed concept: For any given number of Open Source developers and any required level of security and trustability, there exists a certain maximum number of lines of code. The Freedom Software security community does not have the manpower to ensure trustworthy, maintainable code if there’s too much of it.
Manpower is not just developers multiplied by time but rather developers multiplied by developer quality multiplied by time. This leads to the conclusion that if we want trustworthy code, we need to vastly simplify it.
If we want security:
- We need to expect fewer features.
- We cannot "have it all."
- We cannot compete with commercial software comprising millions of lines of complex code but delivering thousands of features.
Trustworthy software cannot be built on top of untrustworthy software stacks. Nothing build on top of Android could ever be truly trustworthy. Too many millions of lines of code, authored by a company whose interests don't align with privacy and Freedom Security. See also Mobile Devices Privacy and Security and Mobile Operating System Comparison.
The same applies to mainstream Open Source web browsers such as Mozilla Firefox, Chromium. They are unsecurable, unmaintainable, unchangeable (according to improved privacy and security), untrustworthy, and insecure (by any definition of security), and their operation remains unknowable due perpetually ongoing rapid source code changes with no end in sight due to No Final Specification.
Librewolf (a fork of Firefox, with an emphasis on privacy and security) developers for instance, has no realistic chance of fully understanding Firefox or its operations. They have to rebase (integrate) in whatever changes come from upstream (Mozilla, original developers) almost blindly because essential security updates are intertwined with new functionalities and operational changes.
For example, feature request Radio Silence by Default for Browser Startup and Background Connections aka "Disable Phone Home" #1779 is simply not doable. No community software fork of Firefox can tame Firefox. It's simply impossible.
Is Firefox really an appropriate default browser for Qubes? No, it's not, there is no suitable alternative, see also Kicksecure Default Browser - Development Considerations.
Firefox is not fork-friendly, see Firefox Potential Legal Risk. At time of writing, feature request Add an Unbranding Option in Firefox Settings has seen no interest and there is little likelihood of getting implemented.
Scattered Attention[edit]
One major issue with Open Source security is the sheer volume of code. This vastness leads to scattered attention and a lack of focused, comprehensive reviews. There is a practical limit to the amount of code, multiplied by its complexity, that the community can effectively audit and maintain.
The overwhelming number of tools and projects available makes it difficult to choose between them, further diluting focus. XKCD illustrates this problem humorously.
The only feasible long-term solution seems to be a significant simplification of the software development process. Without this, the Open Source community risks being overshadowed by large corporations that have the resources to manage complexity on a much larger scale.
Code Bloat[edit]
- https://www.positech.co.uk/cliffsblog/2022/06/05/code-bloat-has-become-astronomical/
- https://en.m.wikipedia.org/wiki/The_Mythical_Man-Month
- https://en.wikipedia.org/w/index.php?title=Criticism_of_Linux
Lack of Minimalism[edit]
Trusted Computing Base (TCB) should be kept minimal.
Security-critical code [...] A successful attack against any of these components could compromise the system’s security. This code can be thought of as the Trusted Computing Base (TCB) of Qubes OS. One of the main goals of the project is to keep the TCB to an absolute minimum.https://www.qubes-os.org/doc/security-critical-code/
The size of the current TCB is on the order of hundreds of thousands of lines of C code, which is several orders of magnitude less than other OSes. (In Windows, Linux, and Mac OSes, the amount of trusted code is typically on the order of tens of millions of lines of C code.)https://www.qubes-os.org/doc/security-critical-code/
considering that SeaBIOS is just 50k lines of clean code (of which maybe just a few thousands would be active at specific configuration)https://github.com/microsoft/mu/issues/52#issuecomment-448993481
Too Much Complexity[edit]
Open Source projects such as the major browsers (Firefox, Chrome), and the mobile operating system Android, have grown too large for anyone to fully, independently, continously security audit and/or and permanently software fork without eventually needing to rebase from the upstream source.
- What is upstream? The original developer, such as Mozilla for Firefox.
- What is a software fork? A software fork occurs when developers take a copy of the source code from one software project and start independent development.
- What is a rebase? Rebase refers to the process of integrating changes from the upstream source into the forked version.
- Why is rebase necessary? It ensures that the forked version stays current with improvements and developments made by the original developers.
- What happens if no rebase occurs? Without rebasing, websites could become inaccessible over time as web standards evolve. Security vulnerabilities fixed by the original developers would remain unpatched in the fork, leaving the forked version responsible for maintaining compatibility and security.
Popular, well-maintained forks of browsers, such as Tor Browser and Mullvad Browser, are often based on the ESR (extended support release) version of Firefox. The ESR version introduces fewer changes over time, giving Linux distributions, enterprises, and forks more time to test and adapt their patches. However, even the ESR version has a limited lifespan, and once deprecated, forks must adapt to newer ESR versions to stay viable.
Most forks of mobile operating systems are based on Android, which also faces similar challenges of maintaining security and feature parity with the original.
Due to the constant maintenance required to stay in sync with upstream changes, many forks face eventual breakdowns or abandonment.
- Why? The sheer size of these codebases—millions of lines of code—and the speed of development make it nearly impossible to keep up with the necessary reviews and updates.
Although open source remains beneficial and critical, the complexity of modern projects has somewhat undermined its practical independence.
No Final Specification[edit]
The fundamental issue with modern software—whether in browsers, operating systems, or web standards—is that there is no final specification. The nature of these technologies ensures they will keep evolving for the next 100 years, and likely beyond, with no end in sight. As a result, we are trapped in a perpetual cycle of updates, patches, and rewrites, chasing the latest standards and features.
This never-ending evolution is problematic for several reasons:
- Endless Development: Software is never truly "finished." There is always a new feature, a new security vulnerability, or a new standard to implement. Developers must continuously adapt, but this often feels like chasing one's own tail—never quite catching up.
- Constant Maintenance: The upkeep required for complex systems like browsers or operating systems is immense. Without regular updates, these systems quickly fall behind on web standards, security patches, and user expectations. Yet, keeping up requires constant nursing, which can become overwhelming for even the largest teams.
- Fragmented Focus: The vast amount of Open Source code multiplies this problem. As more tools and projects emerge, it becomes increasingly difficult for developers to focus and contribute meaningfully. The attention of the community is scattered across too many projects, diluting the potential for deep, thorough reviews and improvements.
This situation leads to a paradox: while Open Source software offers freedom and flexibility, the sheer volume and complexity can undermine its potential. Maintaining forks, ensuring security, and complying with evolving standards are all enormous burdens that grow over time.
In the absence of a final specification, the only feasible solution seems to be simplifying the process. However, if the current trend continues, large corporations with significant resources will dominate software development, leaving smaller teams and independent developers struggling to keep up.
While Open Source remains valuable and critical, the endless cycle of development risks turning its greatest strength—flexibility—into a major obstacle. Without a clear path to simplification, the next 100 years will see more of the same: constant change with no final destination.
No Review Process[edit]
https://www.bleepingcomputer.com/news/security/qbittorrent-fixes-flaw-exposing-users-to-mitm-attacks-for-14-years/
Economic Considerations[edit]
One of the key challenges facing Open Source software is the lack of sustainable Open Source business models. While Open Source development has flourished in terms of contributions and innovation, finding viable economic models to support it remains elusive. This is true not only for individual software projects, but for the entire ecosystem, which includes frameworks, libraries, and services that rely on constant maintenance and development.
- Limited Revenue Streams: Many Open Source projects rely on donations, sponsorships, or community support. However, these revenue streams are often insufficient to cover the ongoing costs of development, security patches, and long-term support.
- Corporate Dominance: In the absence of robust business models, large corporations have stepped in, leveraging Open Source technologies while contributing minimally back to the ecosystem. Companies can afford to hire dedicated developers to work on Open Source projects, giving them a significant advantage in shaping the direction of these technologies. This dominance risks undermining the decentralized, community-driven nature of Open Source.
- Ecosystem-Wide Challenges: Beyond software, there is even less clarity on business models for Open-source Hardware. Developing and maintaining hardware—whether it’s microcontrollers, boards, or entire devices—requires significant upfront investment, supply chains, and manufacturing infrastructure. Open Source hardware projects often struggle to gain traction due to the high costs and lack of financial backing.
Without robust business models, Open Source projects are increasingly dependent on voluntary contributions or the goodwill of large companies. This creates an economic imbalance where only projects backed by large corporations or strong communities can survive, leaving others to wither or become abandoned.
The long-term health of the Open Source ecosystem will require innovative business models that address not only the software but the entire infrastructure that supports it, including Open-source Hardware. Until then, the economic challenges will persist, with more projects struggling to find the resources needed to maintain and evolve in a sustainable way.
Purposeful Vulnerabilities in Software[edit]
In 2021, researchers at the University of Minnesota released a paper that revealed they had purposefully tested the feasibility of introducing vulnerabilities (use-after-free bugs) in open source software (OSS) -- in this case the Linux kernel -- via "hypocrite commits." They defined these commits as "...seemingly beneficial commits that in fact introduce other critical issues." [1] While this research is ethically questionable, it prompts doubt about the OSS claim that this development approach produces more reliable, secure and higher-quality software, particularly since the "many eyeballs" approach is suggested to increase the rate at which bugs are identified and fixed. The paper is significant because the Linux kernel powers billions of devices world-wide and the hypocrite commits were not identified until the researcher's paper was published.
The researchers identified three primary reasons for why hypocrite commits to the kernel were possible: [1]
(1) OSS is open by nature, so anyone from anywhere, including malicious ones, can submit patches. (2) Due to the overwhelming patches and performance issues, it is impractical for maintainers to accept preventive patches for “immature vulnerabilities”. (3) OSS like the Linux kernel is extremely complex, so the patch-review process often misses introduced vulnerabilities that involve complicated semantics and contexts.
Suggested mitigations included: [1]
- increasing committer liability and accountability
- employment of advanced static-analysis tools
- high-coverage or directed dynamic testing (such as fuzzers)
- accepting patches for high-risk immature vulnerabilities
- raising awareness of the risk of hypocrite commits
- auditing of public patches by a larger number of people (not just maintainers)
This example illustrates that malicious committers to OSS may have had multiple opportunities to purposefully introduce stealthy vulnerabilities in countless software tools and projects. It is feasible that high-profile OSS focused on privacy and anonymity might have become a target for similar attempts. Successful attempts have the potential to be significant, because they could potentially exist for a long period and affect a countless number of users. As outlined by the researchers, it is possible to mitigate these underhanded methods by updating the code of conduct for OSS and developing/improving tools for patch testing and verification. [1]
Backdoors[edit]
Table: Finding Backdoors in Freedom Software vs Non-Freedom Software
Non-Freedom Software (precompiled binaries) | Freedom Software (source-available) | |
---|---|---|
Original source code is reviewable | No | Yes |
Compiled binary file can be decompiled into disassembly | Yes | Yes |
Regular pre-compiled binaries | Depends [2] | Yes |
Obfuscation (anti-disassembly, anti-debugging, anti-VM) [3] is usually not used | Depends [4] | Yes [5] |
Price for security audit searching for backdoors | Very high [6] | Lower |
Difference between precompiled version and self-compiled version | Unavailable [7] | Small or none [8] |
Reverse-engineering is not required | No | Yes |
Assembler language skills required | Much more | Less |
Always legal to decompile / reverse-engineer | No [9] [10] | Yes [11] |
Possibility of catching backdoors via observing incoming/outgoing Internet connections | Very difficult [12] | Very difficult [12] |
Convenience of spotting backdoors | Lowest convenience [13] | Very high convenience [14] |
Difficulty of spotting intentional, malicious backdoors [15] [16] | Much higher difficulty [17] | Much lower difficulty [18] |
Difficulty of spotting a "bugdoor" [19] | Much higher difficulty [20] | Lower difficulty |
Third parties can legally release a software fork, a patched version without the backdoor | No [21] | Yes [22] |
Third parties can potentially make (possibly illegal) modifications like disabling serial key checks [23] | Yes | Yes |
Software is always modifiable | No [24] | Yes |
Third parties can use static code analysis tools | No | Yes |
Third parties can judge source code quality | No | Yes |
Third parties can find logic bugs in the source code | No | Yes |
Third parties can find logic bugs in the disassembly | Yes | Yes |
Benefits from population-scale scrutiny | No | Yes |
Third parties can benefit from debug symbols during analysis | Depends [25] | Yes |
Display source code intermixed with disassembly | No | Yes [26] |
Effort to audit subsequent releases | Almost same [27] | Usually lower [28] |
Forum discussion: Finding Backdoors in Freedom Software vs Non-Freedom Software |
Spotting backdoors is already very difficult in Freedom Software where the full source code is available to the general public. Spotting backdoors in non-freedom software composed of obfuscated binaries is exponentially more difficult. [29] [30] [31] [32] [33] [34] [35] [36] [37]
To further improve the situation in the future, the Freedom Software community is working on the Reproducible Builds project. Quote:
Reproducible builds are a set of software development practices that create an independently-verifiable path from source to binary code.
Whilst anyone may inspect the source code of free and open source software for malicious flaws, most software is distributed pre-compiled with no method to confirm whether they correspond.
This incentivises attacks on developers who release software, not only via traditional exploitation, but also in the forms of political influence, blackmail or even threats of violence.
This is particularly a concern for developers collaborating on privacy or security software: attacking these typically result in compromising particularly politically-sensitive targets such as dissidents, journalists and whistleblowers, as well as anyone wishing to communicate securely under a repressive regime.
Whilst individual developers are a natural target, it additionally encourages attacks on build infrastructure as an successful attack would provide access to a large number of downstream computer systems. By modifying the generated binaries here instead of modifying the upstream source code, illicit changes are essentially invisible to its original authors and users alike.
The motivation behind the Reproducible Builds project is therefore to allow verification that no vulnerabilities or backdoors have been introduced during this compilation process. By promising identical results are always generated from a given source, this allows multiple third parties to come to a consensus on a “correct” result, highlighting any deviations as suspect and worthy of scrutiny.
This ability to notice if a developer has been compromised then deters such threats or attacks occurring in the first place as any compromise would be quickly detected. This offers comfort to front-liners that they not only can be threatened, but they would not be coerced into exploiting or exposing their colleagues or end-users.
Several free software projects already, or will soon, provide reproducible builds.
Solution[edit]
awareness mindset movement new generation
See Also[edit]
- grsecurity's Brad Spengler: Interview Notes
- The Invisible Things Lab's blog: Trusting Hardware
- The Invisible Things Lab's blog: More Thoughts on CPU backdoors
- On the Feasibility of Stealthily Introducing Vulnerabilities in Open-Source Software via Hypocrite Commits
- Open-source Hardware
- stateless computer - State considered harmful - A proposal for a stateless laptop
- Betrusted: Open Hardware, Open Source, auditable, mobile Computer / Precursor
Footnotes[edit]
- ↑ 1.0 1.1 1.2 1.3 https://raw.githubusercontent.com/QiushiWu/qiushiwu.github.io/main/papers/OpenSourceInsecurity.pdf
- ↑ Some use binary obfuscators.
- ↑ https://resources.infosecinstitute.com/topic/anti-disassembly-anti-debugging-and-anti-vm/
- ↑ Some use obfuscation.
- ↑ An Open Source application binary could be obfuscated in theory. However, depending on the application and the context -- like not being an Open Source obfuscator -- that would be highly suspicious. An Open Source application using obfuscators would probably be criticized in public, get scrutinized, and lose user trust.
- ↑
This is because non-freedom software is usually only available as a pre-compiled, possibly obfuscated binary. Using an anti-decompiler:
- Auditors can only look at the disassembly and cannot compare a pre-compiled version from the software vendor with a self-compiled version from source code.
- There is no source code that is well-written, well-commented, and easily readable by design.
- ↑ Since there is no source code, one cannot self-build one's own binary.
- ↑
- small: for non-reproducible builds (or reproducible builds with bugs)
- none: for reproducible builds
- ↑ Decompilation is often expressly forbidden by license agreements of proprietary software.
- ↑ Skype used DMCA (Digital Millenium Copyright Act) to shut down reverse engineering of Skype
- ↑ Decompilation is always legal and permitted in the license agreements of Freedom Software.
- ↑ 12.0 12.1 This is very difficult because most outgoing connections are encrypted by default. At some point the content must be available to the computer in an unencrypted (plain text) format, but accessing that is not trivial. When running a suspected malicious application, local traffic analyzers like Wireshark cannot be trusted. The reason is the malicious application might have compromised the host operating system and be hiding that information from the traffic analyzer or through a backdoor. One possible option might be running the application inside a virtual machine, but many malicious applications actively attempt to detect this configuration. If a virtual machine is identified, they avoid performing malicious activities to avoid being detected. Ultimately this might be possible, but it is still very difficult.
- ↑ It is necessary to decompile the binary and read "gibberish", or try to catch malicious traffic originating from the software under review. As an example, consider how few people would have decompiled Microsoft Office and kept doing that for every upgrade.
- ↑
It is possible to:
- Audit the source code and confirm it is free of backdoors.
- Compare the precompiled binary with a self-built binary and audit the difference. Ideally, and in future, there will be no difference (thanks to the Reproducible Builds project) or only a small difference (due to non-determinism introduced during compilation, such as timestamps).
- ↑ An example of a malicious backdoor is a hardcoded username and password or login key only known by the software vendor. In this circumstance there is no plausible deniability for the software vendor.
- ↑ List of malicious backdoors in wikipedia.
- ↑ Requires strong disassembly auditing skills.
- ↑ If for example hardcoded login credentials were in the published source code, that would be easy to spot. If the published source code is different from the actual source code used by the developer to compile the binary, that difference would stand out when comparing pre-compiled binaries from the software vendor with self-compiled binaries by an auditor.
- ↑ A "bugdoor" is a vulnerability that can be abused to gain unauthorized access. It also provides plausible deniability for the software vendor. See also: Obfuscated C Code Contest.
- ↑ Such issues are hard to spot in the source code, but even harder to spot in the disassembly.
- ↑ This is forbidden in the license agreement. Due to lack of source code, no serious development is possible.
- ↑ Since source code is already available under a license that permits software forks and redistribution.
- ↑ This entry is to differentiate from the concept immediately above. Pre-compiled proprietary software is often modified by third parties for the purposes of privacy, game modifications, and exploitation.
- ↑ For example, Intel ME could not be disabled in Intel CPUs yet. At the time of writing, a Freedom Software re-implementation of Intel microcode is unavailable.
- ↑ Some may publish debug symbols.
- ↑
objdump
with parameter-S
/--source
- How does objdump manage to display source code with the -S option?
- ↑ It is possible to review the disassembly, but that effort is duplicated for subsequent releases. The disassembly is not optimized to change as little as possible or to be easily understood by humans. If the compiled version added new optimizations or compilation flags changed, that creates a much bigger diff of the disassembly.
- ↑ After the initial audit of a source-available binary, it is possible to follow changes in the source code. To audit any newer releases, an auditor can compare the source code of the initially audited version with the new version. Unless there was a huge code refactoring or complete rewrite, the audit effort for subsequent versions is lower.
- ↑
The consensus is the assembler low level programming language is more difficult than other higher level abstraction programming languages. Example web search terms:
assembler easy
,assembler easier
,assembler difficult
. - ↑
Source code written in higher level abstraction programming languages such as C and C++ are compiled to object code using a compiler. See this article for an introduction and this image.
Source code written in lower level abstraction programming language assembler is converted to object code using an assembler. See the same article above and this image.
Reverse engineering is very difficult for a reasonably complex program that is written in C or C++, where the source code is unavailable; that can be deduced from the high price for it. It is possible to decompile (meaning re-convert) the object code back to C with a decompiler like Boomerang. To put a price tag on it, consider this quote -- Boomerang: Help! I've lost my source code:
How much will it cost? You should expect to pay a significant amount of money for source recovery. The process is a long and intensive one. Depending on individual circumstances, the quality, quantity and size of artifacts, you can expect to pay upwards of US$15,000 per man-month.
- ↑
The following resources try to solve the question of how to disassemble a binary (byte code) into assembly source code and re-assemble (convert) to binary.
- Tricks to Reassemble Disassembly
- IDA pro asm instructions change
- Why there are not any disassemblers that can generate re-assemblable asm code?
- Recompile the asm file IDA pro created
- Superset Disassembly: Statically Rewriting x86 Binaries Without Heuristics
- GitHub: Guide to disassemble - disassemble.md
- How to disassemble a binary executable in Linux to get the assembly code?
- Use GCC and objdump to disassemble any hex to assembly code
1. Take a hello world assembler source code.
2. Assemble.
nasm -felf64 hello.asm
3. Link.
ld hello.o -o hello
4. objdump (optional).
objdump -d hello
5. Exercise for the reader: disassemble
hello
and re-assemble. - ↑
The GNU Hello program source file
hello.c
at the time of writing contains170
lines. Theobjdump -d /usr/bin/hello
on Debian buster has2757
lines.Install package(s)
hello
following these instructions1 Platform specific notice.
- Kicksecure: No special notice.
- Kicksecure-Qubes: In Template.
2 Update the package lists and upgrade the system .
sudo apt update && sudo apt full-upgrade
3 Install the
hello
package(s).Using
apt
command line--no-install-recommends
option is in most cases optional.sudo apt install --no-install-recommends hello
4 Platform specific notice.
- Kicksecure: No special notice.
- Kicksecure-Qubes: Shut down Template and restart App Qubes based on it as per Qubes Template Modification .
5 Done.
The procedure of installing package(s)
hello
is complete.objdump -d /usr/bin/hello
2757
- Consider all the Debian package maintainer scripts. Clearly these are easier to review as is, since most of them are written in
sh
orbash
. Review would be difficult if these were converted to a program written in C, and were closed source and precompiled. - Similarly, it is far preferable for OnionShare to stay Open Source and written in python, rather than the project being turned into a precompiled binary.
low
-median
-high
- Python: 77 - 92 - 108
- Python: 51 - 79 - 107
- C: 56 - 80 - 100
- C: 49 - 80 - 112
- Malware reverse engineering: 101 - 124 - 149
- Assembler: 81 - 104 - 131
- Malware Analyst: 75 - 115 - 141
Quote research paper Android Mobile OS Snooping By Samsung, Xiaomi, Huawei and Realme Handsets:
Reverse Engineering A fairly substantial amount of non-trivial reverse engineering is generally required in order to decrypt messages and to at least partially decode the binary plaintext. 1) Handset Rooting: The first step is to gain a shell on the handset with elevated privileges, i.e. in the case of Android to root the handset. This allows us then to (i) obtain copies of the system apps and their data, (ii) use a debugger to instrument and modify running apps (e.g. to extract encryption keys from memory and bypass security checks), and (iii) install a trusted SSL root certificate to allow HTTPS decryption, as we explain below. Rooting typically requires unlocking the bootloader to facilitate access to the so-called fastboot mode, disabling boot image verification and patching the system image. Unlocking the bootloader is often the hardest of these steps, since many handset manufacturers discourage bootloader unlocking. Some, such as Oppo, go so far as to entirely remove fastboot mode (the relevant code is not compiled into the bootloader). The importance of this is that it effectively places a constraint on the handset manufacturers/ mobile OSes that we can analyse. Xiaomi and Realme provide special tools to unlock the bootloader, with Xiaomi requiring registering user details and waiting a week before unlocking. Huawei require a handset-specific unlock code, but no longer supply such codes. To unlock the bootloader on the Huawei handset studied here, we needed to open the case and short the test point pads on the circuit board, in order to boot the device into the Huawei equivalent of Qualcomm’s Emergency Download (EDL) mode. In EDL mode, the bootloader itself can be patched to reset the unlock code to a known value (we used a commercial service for this), and thereby enable unlocking of the bootloader.
Decompiling and Instrumentation On a rooted handset, the Android application packages (APKs) of the apps on the /system disk partition can be extracted, unzipped and decompiled. While the bytecode of Android Java apps can be readily decompiled, the code is almost always deliberately obfuscated in order to deter reverse engineering. As a result, reverse engineering the encryption and binary encoding in an app can feel a little like exploring a darkened maze. Perhaps unsurprisingly, this is frequently a time-consuming process, even for experienced researchers/practitioners. It is often very helpful to connect to a running system app using a debugger, so as to view variable values, extract encryption keys from memory, etc.
The research paper describes in far more detail the highly complicated technical challenges of reverse engineering.
We believe security software like Kicksecure needs to remain Open Source and independent. Would you help sustain and grow the project? Learn more about our 12 year success story and maybe DONATE!