Skip to content

katja-kolos/program_analysis

Repository files navigation

dynamicslicing

Installation

pip install -r requirements.txt
pip install -e . 

NB: works with an updated version of pip. Earlier versions require setup.py or setup.cfg, which are missing. The newer version can config from the .toml file instead.

Milestones

Deliverables for progress meetings throughout the semester

Milestone 1

Goal: getting to know DynaPyt, the basic framework for analysis

Task 1

A dynamic analysis in DynaPyt that each time a variable is being written to, prints the value being written.

This analysis is implemented in src/dynamicslicing/trace_writes.py.

Test:

./milestone1_task1.sh

Task 2

A function that, given the path of a Python file and a set of code lines to keep, removes nodes from the syntax tree that are not on these lines.

This function is implemented in src/dynamicslicing/utils.py as the remove_lines function.

Test:

python3 milestone1_task2.py --filename "tests/milestone1/program.py.copy" --lines 1 2 4 5 9

Note: using SimpleStatementLine. For some reason leave_BaseStatement did not seem to be considered for visits, as if the name of the function was odd.

Milestone 2

Goal: implement dynamic program slicing based on data-flow dependencies.

Test: pytest tests --only tests/milestone2 from the top dynamicslisingdirectory.

Alternative testing for debug purposes: python milestone2_tmp.py from the top dynamicslisingdirectory.

Idea:

  • find the slicing criterion line;
  • identify all generated variables, as defined in Live Variables analysis. Add those variables to the gen set;
  • go back to the preceding statement in dataflow.
    • Remove variables that are killed (=assigned to, appended to etc) in this statement, from the gen set.
    • For all variables that are generated in this statement, add them to the gen set.
    • If the statement adds to the gen set, add the node to sentences_to_keep.

Example: gen = {a, b, x} a = c + 2 #adds c, removes a x = y + x + 1 #adds x and y

Two implementations have been tried:

  1. assuming a linear dataflow (enough for Milestone 2).
  2. a dynamic dataflow based on the execution flow from DynaPyt, as traced with (read and write hooks).

The first implementation allows to consider all possible paths without paying attention to the actual values of variables.

The second one is limited to the actual execution flow and eliminates all the lines (nodes) that were not visited.

Note: if the function slice_me() accepted arguments or could be influenced from the outside context, dynamic executions would cover different paths; fuzzy testing could have been employed to study possible dataflows. As the behaviour of the function is deterministic, we assume that the only execution flow is the dataflow.

Milestone 3

Goal: implement the complete slicing algorithm, including both data-flow and control-flow dependencies.

Test: pytest tests --only tests/milestone3 from the top dynamicslisingdirectory.

Alternative testing for debug purposes: python milestone3_tmp.py from the top dynamicslisingdirectory.

New: keeping parent nodes, if any of their children are kept. Partly implemented in the dynamic part of the analysis (child nodes' line numbers saved at execution with the corresponding parent nodes' numbers). Partly implemented in LibCST in the RemoveLineByNum class (leave_If, leave_For, leave_While, leave_SimpleStatementLine hooks) (utils.py).

Limitations:

  • most probably does not work with more deep hierarchical structures/more complicated branching; recursive treatment needs to be introduced for that;
  • wasting resources on analysing the code outside the slice_me() function (post-filtering allows modification to the lines inside the slice_me subtree only);
  • works with line numbers and not node iids; although treating nodes larger than one line is possible thanks to location.start and location.end, treating nodes smaller than one line (if such test would be produced) is complicated;
  • works with variable names and hence does not treat correctly alias names as in the two additional tests to Milestone 2 from Beatriz;
  • the whole code was produced in test-driven development and only covers the syntax represented in our tests.

About

Course Project, UniStuttgart, WiSe2324

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published