Posted by Aditya Kumar – Software program Engineer
Context
Binary format utilizing an emblem order file (also referred to as binary order file or linker order file) is a well known link-time optimization. The linker makes use of the order of symbols so as file to put out symbols within the binary. Order file based mostly binary format improves utility launch time in addition to different essential person journeys. Order file era is often a multi-step course of the place builders use completely different instruments at each stage. We’re offering a unified set of instruments and documentation that can enable each native app developer to leverage this optimization. Each Android app builders and the AOSP group can profit from the instruments.
Background
Supply code is often structured to facilitate software program improvement and comprehension. The format of features and variables in a binary can be impacted by their relative ordering within the supply code. The binary format impacts utility efficiency because the working system has no manner of realizing which symbols can be required in future and usually makes use of spatial locality as one of many value fashions for prefetching subsequent pages.
However the order of symbols in a binary could not replicate this system execution order. When an utility executes, fetching symbols that aren’t current in reminiscence would lead to web page faults. For instance, think about the next program:
// Take a look at.cpp
int foo() { /* */ } int bar() { /* */ } // Different features... int principal() { bar(); foo();}
Which will get compiled into:
# Take a look at.app page_x: _foo page_y: _bar # Different symbols page_z:_main
When Take a look at.app begins, its entrypoint _main is fetched first then _bar adopted by _foo. Executing Take a look at.app can result in web page faults for fetching every operate. Evaluate this to the next binary format the place all of the features are situated in the identical web page (assuming the features are sufficiently small).
# Take a look at.app page_1: _main page_1: _bar page_1: _foo # Different symbols
On this case when _main will get fetched, _bar and _foo can get fetched within the reminiscence on the similar time. In case these symbols are giant and they’re situated in consecutive pages, there’s a excessive likelihood the working system could prefetch these pages leading to much less web page faults.
As a result of execution order of features throughout an utility lifecycle could rely on numerous elements it’s inconceivable to have a singular order of symbols that’s most effective. Thankfully, utility startup sequence is pretty deterministic and steady basically. And it is usually doable to construct a binary having a desired image order with the assistance of linkers like lld which is the default linker for Android NDK toolchain.
Order file is a textual content file containing a listing of symbols. The linker makes use of the order of symbols in order file to put out symbols within the binary. An order file having features that get known as throughout the app startup sequence can scale back web page faults leading to improved launch time. Order recordsdata can enhance the launch time of cell purposes by greater than 2%. The advantages of order recordsdata are extra significant on bigger apps and decrease finish units. A extra mature order file era system can enhance different essential person journeys.
Design
The order file era entails the next steps
- Acquire app startup sequence utilizing compiler instrumentation method
- Use compiler instrumentation to report each operate invocation
- Run the instrumented binary to gather launch sequence in a (binary) profraw file
- Generate order file from the profraw recordsdata
- Validate order file
- Merge a number of order recordsdata into one
- Recompile the app with the merged order file
Overview
The order file era relies on LLVM’s compiler instrumentation course of. LLVM has a stage to generate the order file then recompile the supply code utilizing the order file.
Acquire app startup sequence
The supply code is instrumented by passing -forder-file-instrumentation to the compiler. Moreover, the -orderfile-write-mapping flag can be required for the compiler to generate a mapping file. The mapping file is generated throughout compilation and it’s used whereas processing the profraw file. The mapping file reveals the mapping from MD5 hash to operate image (as proven under).
# Mapping file MD5 db956436e78dd5fa principal MD5 83bff1e88ac48f32 _GLOBAL__sub_I_main.cpp MD5 c943255f95351375 _Z5mergePiiii MD5 d2d2238cf08db816 _Z9mergeSortPiii MD5 11ed18006e729e73 _Z4partPiii MD5 3e897b5ee8bebbd1 _Z9quickSortPiii
The profile (profraw file) is generated each time the instrumented utility is executed. The profile information within the profraw file accommodates the MD5 hash of the features executed in chronological order. The profraw file doesn’t have duplicate entries as a result of every operate solely outputs its MD5 hash on first invocation. A typical run of binary containing the features listed within the mapping file above can have the next profraw entries.
# Profraw file 00000000 32 8f c4 8a e8 f1 bf 83 fa d5 8d e7 36 64 95 db |2...........6d..| 00000010 16 b8 8d f0 8c 23 d2 d2 75 13 35 95 5f 25 43 c9 |.....#..u.5._%C.| 00000020 d1 bb be e8 5e 7b 89 3e 00 00 00 00 00 00 00 00 |....^{.>........| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
With a purpose to discover the operate names similar to the MD5 hashes in a profraw file, a corresponding mapping file is used.
Word: The compiler instrumentation for order recordsdata (-forder-file-instrumentation) solely works when an optimization flag (01, 02, 03, 0s, 0z) is handed. So, if -O0 (compiler flag usually used for debug builds) is handed, the compiler is not going to instrument the binary. In precept, one ought to use the identical optimization flag for instrumentation that’s utilized in delivery launch binaries.
The Android NDK repository has scripts that automate the order file era given a mapping file and an order file.
Recompiling with Order File
After getting an order file, you present the trail of the order file to the linker utilizing the –symbol-ordering-file flag.
Detailed design
Creating Order File Construct Property
The Android Open Supply Undertaking (AOSP) makes use of a construct system known as soong so we are able to leverage this construct system to go the flags as mandatory. The order file construct property has 4 principal fields:
- instrumentation
- load_order_file
- order_file_path
- cflags
The cflags are supposed to add different mandatory flags (like mapping flags) throughout compilation. The load_order_file and order_file_path tells the construct system to recompile with the order file reasonably than set it to the profiling stage. The order recordsdata have to be in saved in considered one of two paths:
- toolchain/pgo-profiles/orderfiles
- vendor/google_data/pgo_profile/orderfiles
# Profiling orderfile: { instrumentation: true, load_order_file: false, order_file_path: "", cflags: [ "-mllvm", "-orderfile-write-mapping=<filename>-mapping.txt", ], } #Recompiling with Order File orderfile: { instrumentation: true, load_order_file: true, order_file_path: "<filename>.orderfile", }
Creating order recordsdata
We offer a python script to create an order file from a mapping file and a profraw file. The script additionally permits eradicating a selected image or creating an order file till a selected image.
Script Flags:
- Profile file (–profile-file):
- Description: The profile file generated by working a binary compiled with -forder-file-instrumentation
- Mapping file (–mapping-file):
- Description: The mapping file generated throughout compilation that maps MD5 hashes to image names
- Output file (–output):
- Description: The output file title for the order file. Default Title: default.orderfile
- Deny Checklist (–denylist):
- Description: Symbols that you just wish to exclude from the order file
- Final image (–last-symbol):
- Description: The order file will finish on the handed final image and ignore the symbols after it. If you need an order file just for startup, you need to go the final startup image. Final-symbol has precedence over leftover so we are going to output till the final image and ignore the leftover flag.
- Leftover symbols (–leftover):
- Description: Some symbols (features) won’t have been executed so they won’t seem within the profile file. If you need these symbols in your order file, you need to use this flag and it’ll add them on the finish.
Validating order recordsdata
As soon as we get an order file for a library or binary, we have to test whether it is legitimate based mostly on a set of standards. Some order recordsdata might not be of fine high quality so they’re higher discarded. This may occur because of a number of causes like utility terminated unexpectedly, the runtime couldn’t write the entire profraw file earlier than exiting, an undesired code-sequence was collected within the profile, and many others. To automate this course of, we offer a python script that may assist builders test for:
- Partial order that must be within the order file
- Symbols that should be current so as file
- Symbols that shouldn’t be current so as file
- Minimal variety of symbols to make an order file
Script Flags:
- Order file (–order-file):
- Description: The order file you might be validating on the under standards.
- Partial Order (–partial):
- Description: A partial order of symbols that have to be held within the order file.
- Allowed Lists (–allowlist):
- Description: Symbols that have to be current within the order file.
- Denied Lists (–denylist):
- Description: Symbols that shouldn’t be within the order file. Denylist flag has precedence over allowlist.
- Minimal Variety of Entries (–min):
- Description: Minimal variety of symbols wanted for an order file
Merging order recordsdata
At the next degree, the order file symbols in a group of order recordsdata approximate a partial order (poset) of operate names with order outlined by time of execution. Throughout completely different runs of an utility, the order recordsdata may need variations. These variations might be because of OS, system class, construct model, person configurations and many others. Nevertheless, the linker can solely take one order file to construct an utility. With a purpose to have one order file that gives the specified advantages, we have to merge these order recordsdata right into a single order file. The merging algorithm additionally must be environment friendly in order to not decelerate the construct time. There are non-linear clustering algorithms that will not scale effectively for merging giant numbers of order recordsdata, every having many symbols. We offer an environment friendly merging algorithm that converges shortly. The algorithm permits for customizable parameters, such that builders can tune the result.
Merging N partial order units may be completed both pessimistically (merging a collection of order recordsdata all the best way till there’s one order file left) or optimistically (merging all of them without delay). The pessimistic strategy may be inefficient in addition to sub-optimal. Because of this, it’s higher to work with all N partial order units without delay. With a purpose to have an environment friendly implementation it helps to characterize all N posets with a weighted directed Graph (V,E) the place:
- V: Parts of partial order units (symbols) and the variety of occasions it seems in numerous partial order units. Word that the frequency of vertices could also be higher than the sum of all incoming edges due to invocations from uninstrumented elements of binary, dependency injection and many others.
- E (V1 -> V2): An edge happens if the ingredient of V2 instantly succeeds V1 in any partial order set with its weight being the variety of occasions this occurs.
For a binary executable, there’s one root (e.g., principal) vertex, however shared libraries may need many roots based mostly on which features are known as within the binary utilizing them. The graph will get difficult if the applying has threads as they often lead to cycles. To have a topological order, cycles are eliminated by preferring the best likelihood path over others. A Depth-First traversal that selects the best weighted edge serves the aim.
Eradicating Cycles:
- Mark again edges throughout a Depth-First traversal - For every Cycle (c): - Add the weights of all in-edges of every vertex (v) within the cycle excluding the perimeters within the cycle - Take away the cycle edge pointing **to** the vertex with highest sum
After cycles are eliminated, the identical depth first traversal offers a topological order (the order file) when all of the ahead edges are eliminated. Basically, the algorithm computes a minimum-spanning-tree of maximal weights and traverses the tree in topological order.
Producing an order:
printOrderUtil(G, n, order): - If n was visited: - return - Add n to the top of order - Kind all out edges based mostly on weight - For each out_edge (n, v): - printOrderUtil(G, v, order) printOrder(G): - Get all roots - order = [] - For every root r: - printOrderUtil(G, r, order) - return order
Instance:
Given the next order recordsdata:
- principal -> b -> c -> d
- principal -> a -> c
- principal -> e -> f
- principal -> b
- principal -> b
- principal -> c -> b
The graph to the proper is obtained by eradicating cycles.
- DFS: principal -> b-> c -> b
- Again edge: c -> b
- Cycle: b -> c-> b
- Cycle edges: [b -> c, c -> b]
- b’s sum of in-edges is 3
- c’s sum of in-edges is 2
- This suggests b can be traversed from the next frequency edge, so c -> b is eliminated
- Ignore ahead edges a->c, main->c
- The DFS of the acyclic graph on the proper will produce an order file principal -> b -> c -> d -> a -> e -> f after ignoring the ahead edges.
Amassing order recordsdata for Android Apps (Java, Kotlin)
The order file instrumentation and profile information assortment is barely enabled for C/C++ purposes. Because of this, it can not profit Java or Kotlin purposes. Nevertheless, Android apps that ship compiled C/C++ libraries can profit from order file.
To generate order file for libraries which might be utilized by Java/Kotlin purposes, we have to invoke the runtime strategies (known as as a part of order file instrumentation) on the proper locations. There are three features that customers should name:
- __llvm_profile_set_filename(char *f): Set the title of the file the place profraw information can be dumped.
- __llvm_profile_initialize_file: Initialize the file set by __llvm_profile_set_filename
- __llvm_orderfile_dump: Dumps the profile(order file information) collected whereas working instrumented binary
Equally, the compiler and linker flags needs to be added to construct configurations. We offer template construct system recordsdata e.g, CMakeLists.txt to compile with the right flags and add a operate to dump the order recordsdata when the Java/Kotlin utility calls it.
# CMakeLists.txt set(GENERATE_PROFILES ON) #set(USE_PROFILE "${CMAKE_SOURCE_DIR}/demo.orderfile") add_library(orderfiledemo SHARED orderfile.cpp) target_link_libraries(orderfiledemo log) if(GENERATE_PROFILES) # Producing profiles require any optimization flag other than -O0. # The mapping file will not generate and the profile instrumentation does not work with out an optimization flag. target_compile_options( orderfiledemo PRIVATE -forder-file-instrumentation -O2 -mllvm -orderfile-write-mapping=mapping.txt ) target_link_options( orderfiledemo PRIVATE -forder-file-instrumentation ) target_compile_definitions(orderfiledemo PRIVATE GENERATE_PROFILES) elseif(USE_PROFILE) target_compile_options( orderfiledemo PRIVATE -Wl,--image-ordering-file=${USE_PROFILE} -Wl,--no-warn-image-ordering ) target_link_options( orderfiledemo PRIVATE -Wl,--image-ordering-file=${USE_PROFILE} -Wl,--no-warn-image-ordering ) endif()
We additionally present a pattern app to dump order recordsdata from a Kotlin utility. The pattern app creates a shared library known as “orderfiledemo” and invokes the DumpProfileDataIfNeeded operate to dump the order file. This library may be taken out of this pattern app and may be repurposed for different purposes.
// Order File Library #if outlined(GENERATE_PROFILES) extern "C" int __llvm_profile_set_filename(const char *); extern "C" int __llvm_profile_initialize_file(void); extern "C" int __llvm_orderfile_dump(void); #endif void DumpProfileDataIfNeeded(const char *temp_dir) { #if outlined(GENERATE_PROFILES) char profile_location[PATH_MAX] = {}; snprintf(profile_location, sizeof(profile_location), "%s/demo.output", temp_dir); __llvm_profile_set_filename(profile_location); __llvm_profile_initialize_file(); __llvm_orderfile_dump(); __android_log_print(ANDROID_LOG_DEBUG, kLogTag, "Wrote profile information to %s", profile_location); #else __android_log_print(ANDROID_LOG_DEBUG, kLogTag, "Didn't write profile information as a result of the app was not " "constructed for profile era"); #endif } extern "C" JNIEXPORT void JNICALL Java_com_example_orderfiledemo_MainActivity_runWorkload(JNIEnv *env, jobject /* this */, jstring temp_dir) { DumpProfileDataIfNeeded(env->GetStringUTFChars(temp_dir, 0)); }
# Kotlin Software class MainActivity : AppCompatActivity() { non-public lateinit var binding: ActivityMainBinding override enjoyable onCreate(savedInstanceState: Bundle?) { tremendous.onCreate(savedInstanceState) binding = ActivityMainBinding.inflate(layoutInflater) setContentView(binding.root) runWorkload(applicationContext.cacheDir.toString()) binding.sampleText.textual content = "Hey, world!" } /** * A local technique that's carried out by the 'orderfiledemo' native library, * which is packaged with this utility. */ exterior enjoyable runWorkload(tempDir: String) companion object { // Used to load the 'orderfiledemo' library on utility startup. init { System.loadLibrary("orderfiledemo") } } }
Limitation
order file era solely works for native binaries. The validation and merging scripts will work for any set of order recordsdata.