How to reverse engineer a program which has no documentation (open-source)

Accepted answer
Score: 11

Michael Feathers' "Working Effectively with Legacy Code" is a superb starting point for such endeavors. It's not particularly language-dependent (his examples are in several non-Python languages, but the techniques and mindset DO extend pretty well to Python and just about any other language).

The key point is that you want to understand the code for a reason: modifying it and/or porting it. So instrumenting the legacy code, with batteries and scaffolding of tests and tracing/logging, is the crucial path on the long, hard slog to understanding the code and modifying it safely and responsibly.
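As a rough illustration of the tracing/logging scaffolding, here is a minimal Python sketch (not taken from Feathers' book) that wraps an existing function in a decorator logging each call's arguments and return value. `traced` and `legacy_discount` are hypothetical names made up for this example; only the standard `functools` and `logging` modules are used.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("trace")


def traced(func):
    """Log each call to func with its arguments and return value."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        log.debug("-> %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("<- %s returned %r", func.__qualname__, result)
        return result
    return wrapper


@traced
def legacy_discount(price, customer_type):
    # Hypothetical legacy function whose behavior we want to observe.
    if customer_type == "gold":
        return price * 0.8
    return price


legacy_discount(100, "gold")  # logs the call and the returned 80.0
```

Because the decorator passes the return value through unchanged, it can be applied temporarily to existing functions and removed once you understand them. Note that this sketch does not log exceptions.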

Feathers suggests heuristics and techniques for where to focus your efforts and how to get started when the code is a total mess (hence "legacy"): no docs, or misleading docs (describing something quite different, maybe in subtle ways, from what the code actually DOES), no tests, and an untestable-without-refactoring tangle of spaghetti dependencies. This may seem an extreme case, but anybody who's spent a long-ish career in programming knows it's actually more common than anyone would like ;-).
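One of the dependency-breaking techniques Feathers describes is "Subclass and Override Method": isolate the hard-wired dependency in its own method, then override that method in a test-only subclass. A minimal Python sketch, assuming a hypothetical `PriceReport` class whose `exchange_rate` method makes a network call (the URL is a placeholder and is never actually contacted here):

```python
import urllib.request


class PriceReport:
    """Hypothetical legacy class with a hard-wired external dependency."""

    def exchange_rate(self):
        # Network call: slow, non-deterministic, and the reason this class is hard to test.
        with urllib.request.urlopen("https://example.com/usd-eur") as resp:
            return float(resp.read())

    def total_in_euros(self, dollar_amounts):
        # The logic we actually want to pin down with tests.
        return round(sum(dollar_amounts) * self.exchange_rate(), 2)


class TestablePriceReport(PriceReport):
    """Test-only subclass: override just the dependency, keep everything else."""

    def exchange_rate(self):
        return 0.5


print(TestablePriceReport().total_in_euros([10, 20]))  # 15.0
```

The production logic in `total_in_euros` is exercised unchanged; only the seam (`exchange_rate`) is replaced.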

Score: 5
  • In the past I have used 'Python call graph' to understand the source structure.
  • Use a debugger, e.g. pdb, to walk through the code.
  • Try reading the code again after a one-day break; that also helps.
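Call-graph tools record which functions call which. The core idea can be sketched with only the standard library by installing a profile hook with `sys.setprofile` and recording a caller-to-callee edge every time a Python function is entered. This is a rough approximation, not how any particular tool is implemented: it records bare function names (so same-named functions in different modules are merged) and ignores C functions. `record_call_graph`, `report`, `parse` and `total` are hypothetical names for this example.

```python
import sys
from collections import defaultdict


def record_call_graph(entry_point, *args, **kwargs):
    """Run entry_point and return ({caller: {callees}}, result) for Python-level calls."""
    edges = defaultdict(set)

    def profiler(frame, event, arg):
        # 'call' fires when a Python function is entered; C functions emit 'c_call' instead.
        if event == "call" and frame.f_back is not None:
            edges[frame.f_back.f_code.co_name].add(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        result = entry_point(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook, even if the code raises
    return dict(edges), result


# Hypothetical "legacy" code to explore.
def parse(text):
    return text.split(",")


def total(items):
    result = 0
    for item in items:
        result += int(item)
    return result


def report(text):
    return total(parse(text))


edges, value = record_call_graph(report, "1,2,3")
print(sorted(edges["report"]))  # ['parse', 'total']
```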


Score: 5

I would recommend generating some documentation with epydoc (http://epydoc.sourceforge.net/). Of course, if no docstrings exist, the result will be poor, but it will give you at least one view of your application, and you'll be able to navigate the classes more easily.

Then you can try to document things yourself whenever you understand something new, and regenerate the docs. It is never too late to start.
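To keep track of that "document as you go" loop, it can help to list which functions and classes still lack docstrings. A small sketch using the standard `inspect` module; `undocumented` is a hypothetical helper, and the throwaway `legacy` module built with `exec` stands in for one of your own modules, which you would normally just `import`.

```python
import inspect
import types


def undocumented(module):
    """Return sorted names of functions and classes defined in module that lack a docstring."""
    missing = []
    for name, obj in inspect.getmembers(module):
        is_candidate = inspect.isfunction(obj) or inspect.isclass(obj)
        # Skip names that were merely imported into the module from elsewhere.
        if is_candidate and getattr(obj, "__module__", None) == module.__name__:
            if not obj.__doc__:
                missing.append(name)
    return sorted(missing)


# Stand-in for a real legacy module.
legacy = types.ModuleType("legacy")
exec('''
def parse(text):
    """Split a comma-separated string."""
    return text.split(",")

def total(items):
    return sum(int(x) for x in items)

class Report:
    pass
''', legacy.__dict__)

print(undocumented(legacy))  # ['Report', 'total']
```

Rerunning it after each documentation session gives a rough measure of progress.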

I hope it helps.

Score: 3

You are lucky it's in Python, which is easy to read. But of course it is possible to write tricky, hard-to-understand code in Python as well.

The steps are:

  1. Run the software, learn to use it, and understand its features at least a little bit.
  2. Read through the tests, if any.
  3. Read through the code.
  4. When you encounter code you don't understand, set a breakpoint there and step through the code, looking at what it does.
  5. If there aren't any tests, or the test coverage is low, write tests to increase the test coverage. It's a good way to learn the system.
  6. Repeat until you feel you have a vague grip on the code. A vague grip is all you need if you are going to manage the code. You'll get a good grip once you start actually working with the code. For a big system that can take years, so don't try to understand it all first.
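For step 5, Michael Feathers calls tests like these "characterization tests": they record what the code currently does, whether or not that behavior is correct, so you notice when a later change alters it. A minimal `unittest` sketch against a hypothetical `legacy_format_name` function:

```python
import unittest


def legacy_format_name(first, last):
    # Hypothetical legacy function; we test what it does, not what it "should" do.
    if not last:
        return first.strip().title()
    return f"{last.strip().upper()}, {first.strip().title()}"


class CharacterizationTests(unittest.TestCase):
    """Pin down current behavior, discovered by calling the code and recording the results."""

    def test_full_name(self):
        self.assertEqual(legacy_format_name(" ada ", "lovelace"), "LOVELACE, Ada")

    def test_missing_last_name(self):
        self.assertEqual(legacy_format_name("grace", ""), "Grace")
```

Saved as, e.g., `test_legacy.py`, it can be run with `python -m unittest`. A failing test after a change tells you something about the code either way.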

There are tools that can help you. As Stephen C says, an IDE is a good idea. I'll explain why:

Many editors analyze the code. This typically gives you code completion, but more importantly in this case, it lets you just ctrl-click on a variable to see where it comes from. This really speeds things up when you want to understand other people's code.
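Outside an IDE, the standard `inspect` module can answer the same "where does this come from?" question for any function, class, or module that has Python source. A minimal sketch; `where_defined` is a hypothetical helper name, and `json.dumps` is just a convenient standard-library target:

```python
import inspect
import json


def where_defined(obj):
    """Return (source file, first line number) for a Python function, class, or module.

    Raises TypeError for objects implemented in C (e.g. len), which have no Python source.
    """
    path = inspect.getsourcefile(obj)
    _, line = inspect.getsourcelines(obj)
    return path, line


print(where_defined(json.dumps))
```

The exact path and line number printed depend on your Python installation.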

Also, you need to learn a debugger. In tricky parts of the code, you will have to step through them in a debugger to see what the code actually does. Python's pdb works, but many IDEs have integrated debuggers, which make debugging easier.

That's it. Good luck.

Score: 2

I have had to do a lot of this in my job. What works for me may be different from what works for you, but I'll share my experience.

I start by trying to identify the data structures being used and drawing diagrams showing the relationships between them. Not necessarily something formal like UML, but a sketch on paper that you understand and that lets you see the overall structure of the data being manipulated by the program. Only once I have some view of the data structures being used do I start trying to understand how the data is being manipulated.
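When the data structures are nested dicts, lists, and plain objects, one quick way to get material for that sketch is to print the "shape" of a live value, for example one captured at a breakpoint. A rough sketch; `shape`, `Order` and the sample data are hypothetical, and it assumes lists are homogeneous, so only the first element is inspected:

```python
def shape(value, max_depth=4):
    """Describe the nested types inside value, down to max_depth levels."""
    if max_depth == 0:
        return type(value).__name__
    if isinstance(value, dict):
        return {str(k): shape(v, max_depth - 1) for k, v in value.items()}
    if isinstance(value, list):
        # Assumption: lists are homogeneous, so the first element is representative.
        return [shape(value[0], max_depth - 1)] if value else []
    if hasattr(value, "__dict__") and not isinstance(value, type):
        # Plain objects: describe their instance attributes under the class name.
        return {type(value).__name__: shape(vars(value), max_depth - 1)}
    return type(value).__name__


class Order:
    # Hypothetical domain object found in the legacy code.
    def __init__(self, customer, lines):
        self.customer = customer
        self.lines = lines


order = Order("ACME", [{"sku": "A1", "qty": 2}])
print(shape(order))
# {'Order': {'customer': 'str', 'lines': [{'sku': 'str', 'qty': 'int'}]}}
```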

Second, for a large body of software, sometimes you need to just attack bite-sized pieces at first. You won't get an overall understanding straight away, but if you understand small parts in detail and keep chipping away, eventually all the pieces fall together.

I combine these two approaches, switching between them when I am getting overly frustrated or bored. Regular walks around the block are recommended :) I find this gets me good results in the end.

Good luck!

Score: 1

pyreverse from Logilab and PyNSource from Andy Bulka are also helpful for generating UML diagrams.
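To see the raw information such diagrams are built from, the standard `ast` module can list each class with its base classes and methods without importing (or running) the code. This is only a rough sketch, not how pyreverse or PyNSource work internally; `class_summary` is a hypothetical name, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def class_summary(source):
    """Map each class defined in source to its base classes and method names."""
    tree = ast.parse(source)
    summary = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            summary[node.name] = {
                # ast.unparse turns each base-class expression back into source text.
                "bases": [ast.unparse(base) for base in node.bases],
                "methods": [
                    item.name
                    for item in node.body
                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                ],
            }
    return summary


legacy_source = '''
class Shape:
    def area(self): ...

class Square(Shape):
    def __init__(self, side): self.side = side
    def area(self): return self.side ** 2
'''

for name, info in class_summary(legacy_source).items():
    print(name, info["bases"], info["methods"])
```

For a real project you would read each file's contents and pass them to `class_summary`, then turn the resulting outline into a diagram.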

Score: 0

I'd start with a good Python IDE. See the answers to this question.

Score: 0

Enterprise Architect by Sparx Systems is very good at processing a source directory and generating class diagrams. It is not free, but it is very reasonably priced for what you get. (I am not associated with this company in any way; I've just been a satisfied user of their product for several years.)
