     <?xml version="1.0"?>
     <!--
         * Licensed to the Apache Software Foundation (ASF) under one
         * or more contributor license agreements.  See the NOTICE file
         * distributed with this work for additional information
         * regarding copyright ownership.  The ASF licenses this file
         * to you under the Apache License, Version 2.0 (the
         * "License"); you may not use this file except in compliance
         * with the License.  You may obtain a copy of the License at
         * 
         *   http://www.apache.org/licenses/LICENSE-2.0
         * 
         * Unless required by applicable law or agreed to in writing,
         * software distributed under the License is distributed on an
         * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
         * KIND, either express or implied.  See the License for the
         * specific language governing permissions and limitations
         * under the License.    
     -->
     <document>
     
       <properties>
         <title>Byte Code Engineering Library (BCEL)</title>
       </properties>
     
       <body>
     
       <section name="Abstract">
       <p>
         Extensions and improvements of the programming language Java and
         its related execution environment (Java Virtual Machine, JVM) are
         the subject of a large number of research projects and
         proposals. There are projects, for instance, to add parameterized
         types to Java, to implement <a
         href="http://www.eclipse.org/aspectj/">Aspect-Oriented Programming</a>, to
         perform sophisticated static analysis, and to improve the run-time
         performance.
       </p>
     
       <p>
         Since Java classes are compiled into portable binary class files
         (called <em>byte code</em>), the most convenient and
         platform-independent way to implement these improvements is not
         to write a new compiler or change the JVM, but to transform
         the byte code. These transformations can be performed either
         after compile time or at load time. Many programmers do
         this by implementing their own specialized byte code manipulation
         tools, which, however, offer little re-usability.
       </p>
     
       <p>
         To deal with the necessary class file transformations, we
         introduce an API that helps developers to conveniently implement
         their transformations.
       </p>
       </section>
     
       <section name="1 Introduction">
       <p>
         The <a href="http://java.sun.com/">Java</a> language has become
         very popular and many research projects deal with further
         improvements of the language or its run-time behavior. The
         possibility to extend a language with new concepts is surely a
         desirable feature, but the implementation issues should be hidden
         from the user. Fortunately, the concepts of the Java Virtual
         Machine permit the user-transparent implementation of such
         extensions with relatively little effort.
       </p>
     
       <p>
         Because the target language of Java is an interpreted language
         with a small and easy-to-understand set of instructions (the
         <em>byte code</em>), developers can implement and test their
         concepts in a very elegant way. One can write a plug-in
         replacement for the system's <em>class loader</em> which is
         responsible for dynamically loading class files at run-time and
         passing the byte code to the Virtual Machine (see section ).
         Class loaders may thus be used to intercept the loading process
         and transform classes before they are actually executed by the
         JVM. While the original class files always remain unaltered, the
         behavior of the class loader may be reconfigured for every
         execution or instrumented dynamically.
       </p>
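       <p>
         The interception idea can be sketched with the standard
         <tt>java.lang.ClassLoader</tt> API alone. The following is only a
         minimal illustration under stated assumptions, not the loader shipped
         with any library: the class name <tt>TransformingClassLoader</tt>, the
         name prefix used to select classes, and the <tt>transform()</tt> hook
         are hypothetical, and the hook returns the byte code unchanged.
       </p>

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical loader: classes whose names start with a prefix are transformed before definition. */
class TransformingClassLoader extends ClassLoader {
    private final String prefix;

    TransformingClassLoader(ClassLoader parent, String prefix) {
        super(parent);
        this.prefix = prefix;
    }

    /** Transformation hook; this sketch returns the byte code unchanged. */
    protected byte[] transform(String className, byte[] byteCode) {
        return byteCode;
    }

    /** Reads the original, unaltered class file as a resource. */
    byte[] readClassBytes(String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("No class file for " + className);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n; (n = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.startsWith(prefix)) {
            return super.loadClass(name, resolve);      // usual parent delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                byte[] code;
                try {
                    code = transform(name, readClassBytes(name));
                } catch (RuntimeException e) {
                    throw new ClassNotFoundException(name, e);
                }
                loaded = defineClass(name, code, 0, code.length);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

       <p>
         Because the class file on disk is only read, never written, the
         original stays unaltered; a subclass that overrides
         <tt>transform()</tt> changes what the Virtual Machine actually sees.
       </p>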
       
       <p>
         The <font face="helvetica,arial">BCEL</font> API (Byte Code
         Engineering Library), formerly known as JavaClass, is a toolkit
         for the static analysis and dynamic creation or transformation of
         Java class files. It enables developers to implement the desired
         features on a high level of abstraction without handling all the
         internal details of the Java class file format and thus
         re-inventing the wheel every time. <font face="helvetica,arial">BCEL
         </font> is written entirely in Java and freely available under the
         terms of the <a href="license.html">Apache Software License</a>.
       </p>
     
       <p>
         This manual is structured as follows: We give a brief description
         of the Java Virtual Machine and the class file format in <a
         href="#2 The Java Virtual Machine">section 2</a>. <a href="#3 The
         BCEL API">Section 3</a> introduces the <font
         face="helvetica,arial">BCEL</font> API. <a href="#4 Application
         areas">Section 4</a> describes some typical application areas and
         example projects. The appendix contains code examples that are too
         long to be presented in the main part of this paper. All examples
         are included in the downloadable distribution.
       </p>
     
       </section>
     
       <section name="2 The Java Virtual Machine">
       <p>
         Readers already familiar with the Java Virtual Machine and the
         Java class file format may want to skip this section and proceed
         with <a href="#3 The BCEL API">section 3</a>.
       </p>
     
       <p>
         Programs written in the Java language are compiled into a portable
         binary format called <em>byte code</em>. Every class is
         represented by a single class file containing class related data
         and byte code instructions. These files are loaded dynamically
         into an interpreter (<a
         href="http://java.sun.com/docs/books/vmspec/index.html">Java
         Virtual Machine</a>, or JVM for short) and executed.
       </p>
     
       <p>
         <a href="#Figure 1">Figure 1</a> illustrates the procedure of
         compiling and executing a Java class: The source file
         (<tt>HelloWorld.java</tt>) is compiled into a Java class file
         (<tt>HelloWorld.class</tt>), loaded by the byte code interpreter
         and executed. In order to implement additional features,
         researchers may want to transform class files (drawn with bold
         lines) before they are actually executed. This application area
         is one of the main topics of this article.
       </p>
       
       <p align="center">
       <a name="Figure 1">
       <img src="images/jvm.gif"/>
       <br/>
       Figure 1: Compilation and execution of Java classes</a>
       </p>
         
       <p>
         Note that the use of the general term "Java" implies in fact two
         meanings: on the one hand, Java as a programming language, on the
         other hand, the Java Virtual Machine, which is not necessarily
         targeted by the Java language exclusively, but may be used by <a
         href="http://www.robert-tolksdorf.de/vmlanguages.html">other
         languages</a> as well. We assume the reader to be familiar with
         the Java language and to have a general understanding of the
         Virtual Machine.
       </p>
     
       </section>
     
       <section name="2.1 Java class file format">
       <p>
         Giving a full overview of the design issues of the Java class file
         format and the associated byte code instructions is beyond the
         scope of this paper. We will just give a brief introduction
         covering the details that are necessary for understanding the rest
         of this paper. The format of class files and the byte code
         instruction set are described in more detail in the <a
         href="http://java.sun.com/docs/books/vmspec/index.html">Java
         Virtual Machine Specification</a>. In particular, we will not deal
         with the security constraints that the Java Virtual Machine has to
         check at run-time, i.e., the byte code verifier.
       </p>
     
       <p>
         <a href="#Figure 2">Figure 2</a> shows a simplified example of the
         contents of a Java class file: It starts with a header containing
         a "magic number" (<tt>0xCAFEBABE</tt>) and the version number,
         followed by the <em>constant pool</em>, which can be roughly
         thought of as the text segment of an executable, the <em>access
         rights</em> of the class encoded by a bit mask, a list of
         interfaces implemented by the class, lists containing the fields
         and methods of the class, and finally the <em>class
         attributes</em>, e.g.,  the <tt>SourceFile</tt> attribute telling
         the name of the source file. Attributes are a way of putting
         additional, user-defined information into class file data
         structures. For example, a custom class loader may evaluate such
         attribute data in order to perform its transformations. The JVM
         specification declares that unknown, i.e., user-defined attributes
         must be ignored by any Virtual Machine implementation.
       </p>
     
       <p align="center">
       <a name="Figure 2">
       <img src="images/classfile.gif"/>
       <br/>
       Figure 2: Java class file format</a>
       </p>    
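       <p>
         The fixed-size start of this layout can be read with plain
         <tt>java.io</tt> classes. The following is a minimal sketch and not
         part of the API described later; the class name
         <tt>ClassFileHeader</tt> and its fields are hypothetical, and
         everything after the constant pool count is ignored.
       </p>

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper that reads only the first ten bytes of a class file. */
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    /** One more than the number of constant pool entries; index 0 is unused. */
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classFile) {
        // DataInputStream reads big-endian values, which matches the class file format.
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            return new ClassFileHeader(minor, major, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```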
     
       <p>
         Because all of the information needed to dynamically resolve the
         symbolic references to classes, fields and methods at run-time is
         coded with string constants, the constant pool in fact makes up
         the largest portion of an average class file, approximately
         60%. This makes the constant pool an easy target for code
         manipulation. The byte code instructions themselves make up
         just 12%.
       </p>
       
       <p>
         The right upper box shows a "zoomed" excerpt of the constant pool,
         while the rounded box below depicts some instructions that are
         contained within a method of the example class. These
         instructions represent the straightforward translation of the
         well-known statement:
       </p>
     
       <p align="center">
         <source>System.out.println("Hello, world");</source>
       </p>
     
       <p>
         The first instruction loads the contents of the field <tt>out</tt>
         of class <tt>java.lang.System</tt> onto the operand stack. This is
         an instance of the class <tt>java.io.PrintStream</tt>. The
         <tt>ldc</tt> ("Load constant") pushes a reference to the string
         "Hello, world" on the stack. The next instruction invokes the
         instance method <tt>println</tt>, which takes both values as
         parameters (instance methods always implicitly take an instance
         reference as their first argument).
       </p>
       
       <p>
         Instructions, other data structures within the class file and
         constants themselves may refer to constants in the constant pool.
         Such references are implemented via fixed indexes encoded directly
         into the instructions. This is illustrated for some items of the
         figure emphasized with a surrounding box.
       </p>
       
       <p>
         For example, the <tt>invokevirtual</tt> instruction refers to a
         <tt>MethodRef</tt> constant that contains information about the
         name of the called method, the signature (i.e., the encoded
         argument and return types), and to which class the method belongs.
         In fact, as emphasized by the boxed value, the <tt>MethodRef</tt>
         constant itself just refers to other entries holding the real
         data, e.g., it refers to a <tt>ConstantClass</tt> entry containing
         a symbolic reference to the class <tt>java.io.PrintStream</tt>.
         To keep the class file compact, such constants are typically
         shared by different instructions and other constant pool
         entries. Similarly, a field is represented by a <tt>Fieldref</tt>
         constant that includes information about the name, the type and
         the containing class of the field.
       </p>
     
       <p>
         The constant pool basically holds the following types of
         constants: References to methods, fields and classes, strings,
         integers, floats, longs, and doubles.
       </p>
       
       </section>
       
       <section name="2.2 Byte code instruction set">
       <p>
         The JVM is a stack-oriented interpreter that creates a local stack
         frame of fixed size for every method invocation. The size of the
         local stack has to be computed by the compiler. Values may also be
         stored intermediately in a frame area containing <em>local
         variables</em> which can be used like a set of registers. These
         local variables are numbered from 0 to 65535, i.e., a method has
         a maximum of 65536 local variables. The stack frames
         of the caller and the callee overlap, i.e., the caller
         pushes arguments onto the operand stack and the called method
         receives them in local variables.
       </p>
       
       <p>
         The byte code instruction set currently consists of 212
         instructions; 44 opcodes are marked as reserved and may be used
         for future extensions or intermediate optimizations within the
         Virtual Machine. The instruction set can be roughly grouped as
         follows:
       </p>
       
       <p>
         <b>Stack operations:</b> Constants can be pushed onto the stack
          either by loading them from the constant pool with the
          <tt>ldc</tt> instruction or with special "short-cut"
          instructions where the operand is encoded into the instructions,
          e.g.,  <tt>iconst_0</tt> or <tt>bipush</tt> (push byte value).
       </p>
       
       <p>
         <b>Arithmetic operations:</b> The instruction set of the Java
            Virtual Machine distinguishes its operand types using different
            instructions to operate on values of specific type. Arithmetic
            operations starting with <tt>i</tt>, for example, denote an
            integer operation, e.g., <tt>iadd</tt>, which adds two integers
            and pushes the result back on the stack. The Java types
            <tt>boolean</tt>, <tt>byte</tt>, <tt>short</tt>, and
            <tt>char</tt> are handled as integers by the JVM.
       </p>
         
       <p>
         <b>Control flow:</b> There are branch instructions like
          <tt>goto</tt>, and <tt>if_icmpeq</tt>, which compares two integers
          for equality. There is also a <tt>jsr</tt> (jump to sub-routine)
          and <tt>ret</tt> pair of instructions that is used to implement
          the <tt>finally</tt> clause of <tt>try-catch</tt> blocks.
          Exceptions may be thrown with the <tt>athrow</tt> instruction.
          Branch targets are coded as offsets from the current byte code
          position, i.e., with an integer number.
       </p>
       
       <p>
         <b>Load and store operations</b> for local variables like
           <tt>iload</tt> and <tt>istore</tt>. There are also array
           operations like <tt>iastore</tt> which stores an integer value
           into an array.
       </p>
       
       <p>
         <b>Field access:</b> The value of an instance field may be
          retrieved with <tt>getfield</tt> and written with
          <tt>putfield</tt>. For static fields, there are
          <tt>getstatic</tt> and <tt>putstatic</tt> counterparts.
       </p>
       
       <p>
         <b>Method invocation:</b> Methods may either be called statically via
          <tt>invokestatic</tt> or be bound virtually with the
          <tt>invokevirtual</tt> instruction. Super class methods and
          private methods are invoked with <tt>invokespecial</tt>.
          Interface methods are a special case and are invoked with
          <tt>invokeinterface</tt>.
       </p>
         
       <p>
         <b>Object allocation:</b> Class instances are allocated with the
           <tt>new</tt> instruction, arrays of basic type like
           <tt>int[]</tt> with <tt>newarray</tt>, arrays of references like
           <tt>String[][]</tt> with <tt>anewarray</tt> or
           <tt>multianewarray</tt>.
       </p>
       
       <p>
         <b>Conversion and type checking:</b> For stack operands of basic
           type there exist casting operations like <tt>f2i</tt> which
           converts a float value into an integer. The validity of a type
           cast may be checked with <tt>checkcast</tt> and the
           <tt>instanceof</tt> operator can be directly mapped to the
           equally named instruction.
       </p>
     
       <p>
         Most instructions have a fixed length, but there are also some
         variable-length instructions: In particular, the
         <tt>lookupswitch</tt> and <tt>tableswitch</tt> instructions, which
         are used to implement <tt>switch()</tt> statements.  Since the
         number of <tt>case</tt> clauses may vary, these instructions
         contain a variable number of branch targets.
       </p>
     
       <p>
         We will not list all byte code instructions here, since these are
         explained in detail in the <a
         href="http://java.sun.com/docs/books/vmspec/index.html">JVM
         specification</a>. The opcode names are mostly self-explanatory,
         so understanding the following code examples should be fairly
         intuitive.
       </p>
     
       </section>
     
       <section name="2.3 Method code">
       <p>
         Non-abstract (and non-native) methods contain an attribute
         "<tt>Code</tt>" that holds the following data: The maximum size of
         the method's stack frame, the number of local variables and an
         array of byte code instructions. Optionally, it may also contain
         information about the names of local variables and source file
         line numbers that can be used by a debugger.
       </p>
       
       <p>
         Whenever an exception is raised during execution, the JVM performs
         exception handling by looking into a table of exception
         handlers. The table marks handlers, i.e., code chunks, to be
         responsible for exceptions of certain types that are raised within
         a given area of the byte code. When there is no appropriate
         handler the exception is propagated back to the caller of the
         method. The handler information is itself stored in an attribute
         contained within the <tt>Code</tt> attribute.
       </p>
       
       </section>
       
       <section name="2.4 Byte code offsets">
       <p>
         Targets of branch instructions like <tt>goto</tt> are encoded as
         relative offsets in the array of byte codes. Exception handlers
         and local variables refer to absolute addresses within the byte
         code.  The former contain references to the start and the end of
         the <tt>try</tt> block, and to the start of the handler code. The
         latter mark the range in which a local variable is valid, i.e.,
         its scope. This makes it difficult to insert or delete code areas
         on this level of abstraction, since one has to recompute the
         offsets every time and update the referring objects. We will see
         in <a href="#3.3 ClassGen">section 3.3</a> how <font
         face="helvetica,arial">BCEL</font> remedies this restriction.
       </p>
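       <p>
         The bookkeeping involved can be made concrete with a little
         arithmetic. The sketch below is illustrative only; the class
         <tt>BranchOffsets</tt> and its methods are hypothetical. It assumes
         that bytes are inserted immediately before the instruction at a given
         address, and that a branch to that address should keep pointing at
         the original instruction.
       </p>

```java
/** Hypothetical helper showing why relative branch offsets must be recomputed. */
final class BranchOffsets {
    /** Absolute target = address of the branch instruction + its signed relative offset. */
    static int absoluteTarget(int branchPc, int relativeOffset) {
        return branchPc + relativeOffset;
    }

    /**
     * Returns the relative offset a branch needs after `inserted` bytes have been
     * placed in front of the instruction at address `insertAt`.
     */
    static int adjustOffset(int branchPc, int relativeOffset, int insertAt, int inserted) {
        int target = absoluteTarget(branchPc, relativeOffset);
        // Every address at or behind the insertion point moves by the inserted length.
        int newBranchPc = branchPc >= insertAt ? branchPc + inserted : branchPc;
        int newTarget = target >= insertAt ? target + inserted : target;
        return newTarget - newBranchPc;
    }

    public static void main(String[] args) {
        // A branch at address 1 with offset 7 targets address 8.
        System.out.println(absoluteTarget(1, 7));
        // Inserting 3 bytes at address 4 moves only the target, so the offset grows.
        System.out.println(adjustOffset(1, 7, 4, 3));
        // A branch at address 5 moves together with its target; its offset is unchanged.
        System.out.println(adjustOffset(5, 11, 4, 3));
    }
}
```

       <p>
         Exception handler ranges and local variable ranges hold absolute
         addresses, so each of their bounds would need the same shift.
       </p>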
     
       </section>
     
       <section name="2.5 Type information">
       <p>
         Java is a type-safe language and the information about the types
         of fields, local variables, and methods is stored in so-called
         <em>signatures</em>. These are strings stored in the constant pool
         and encoded in a special format. For example, the argument and
         return types of the <tt>main</tt> method
       </p>
     
       <p align="center">
       <source>public static void main(String[] argv)</source>
       </p>
     
       <p>
       are represented by the signature
       </p>
     
       <p align="center">
       <source>([Ljava/lang/String;)V</source>
       </p>
     
       <p>
         Classes are internally represented by strings like
         <tt>"java/lang/String"</tt>, basic types like <tt>float</tt> by an
         integer number. Within signatures they are represented by single
         characters, e.g., <tt>I</tt>, for integer. Arrays are denoted with
         a <tt>[</tt> at the start of the signature.
       </p>
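       <p>
         To illustrate the encoding, the following sketch decodes a method
         signature back into readable Java type names. It is a simplified
         illustration rather than a complete parser: the class
         <tt>Signatures</tt> is hypothetical, the single-character codes are
         those of the JVM specification, and malformed input is not handled
         beyond a basic check.
       </p>

```java
/** Hypothetical decoder for JVM type and method signatures. */
final class Signatures {
    /** Decodes the type starting at pos[0] and advances pos[0] past it. */
    private static String type(String s, int[] pos) {
        char c = s.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return type(s, pos) + "[]";       // array of the following type
            case 'L': {                                  // class name terminated by ';'
                int end = s.indexOf(';', pos[0]);
                String name = s.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default: throw new IllegalArgumentException("Bad signature: " + s);
        }
    }

    /** Renders a method signature as "returnType (argumentTypes)". */
    static String methodToString(String signature) {
        int[] pos = {1};                                 // skip '('
        StringBuilder args = new StringBuilder();
        while (signature.charAt(pos[0]) != ')') {
            if (args.length() > 0) {
                args.append(", ");
            }
            args.append(type(signature, pos));
        }
        pos[0]++;                                        // skip ')'
        return type(signature, pos) + " (" + args + ")";
    }

    public static void main(String[] args) {
        System.out.println(methodToString("([Ljava/lang/String;)V"));
    }
}
```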
     
       </section>
     
       <section name="2.6 Code example">
       <p>
         The following example program prompts for a number and prints
         its factorial. The <tt>readLine()</tt> method reading from the
         standard input may raise an <tt>IOException</tt>, and if a
         malformed number is passed to <tt>parseInt()</tt>, it throws a
         <tt>NumberFormatException</tt>. Thus, the critical area of code
         must be enclosed in a <tt>try-catch</tt> block.
       </p>
       
       <source>  
         import java.io.*;
     
         public class Factorial {
           private static BufferedReader in = new BufferedReader(new
                                     InputStreamReader(System.in));
     
           public static final int fac(int n) {
             return (n == 0)? 1 : n * fac(n - 1);
           }
     
           public static final int readInt() {
             int n = 4711;
             try {
               System.out.print("Please enter a number> ");
               n = Integer.parseInt(in.readLine());
             } catch(IOException e1) { System.err.println(e1); }
             catch(NumberFormatException e2) { System.err.println(e2); }
             return n;
           }
     
           public static void main(String[] argv) {
             int n = readInt();
             System.out.println("Factorial of " + n + " is " + fac(n));
           }
         }
       </source>
     
       <p>
         This code example typically compiles to the following chunks of
         byte code:
       </p>
       
       <source>
         0:  iload_0
         1:  ifne            #8
         4:  iconst_1
         5:  goto            #16
         8:  iload_0
         9:  iload_0
         10: iconst_1
         11: isub
         12: invokestatic    Factorial.fac (I)I (12)
         15: imul
         16: ireturn
     
         LocalVariable(start_pc = 0, length = 16, index = 0:int n)
       </source>
     
       <p><b>fac():</b>
         The method <tt>fac</tt> has only one local variable, the argument
         <tt>n</tt>, stored at index 0. This variable's scope ranges from
         the start of the byte code sequence to the very end.  If the value
         of <tt>n</tt> (the value fetched with <tt>iload_0</tt>) is not
         equal to 0, the <tt>ifne</tt> instruction branches to the byte
         code at offset 8, otherwise a 1 is pushed onto the operand stack
         and the control flow branches to the final return.  For ease of
         reading, the offsets of the branch instructions, which are
         actually relative, are displayed as absolute addresses in these
         examples.
       </p>
       
       <p>
         If recursion has to continue, the arguments for the multiplication
         (<tt>n</tt> and <tt>fac(n - 1)</tt>) are evaluated and the results
         pushed onto the operand stack.  After the multiplication operation
         has been performed the function returns the computed value from
         the top of the stack.
       </p>
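       <p>
         To make the role of the operand stack tangible, the byte code of
         <tt>fac</tt> can be replayed by hand. The following sketch is a toy
         simulation written for this listing only, not a general interpreter;
         the class <tt>FacInterpreter</tt> is hypothetical, and each
         <tt>case</tt> label is the absolute offset shown in the listing.
       </p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical step-by-step replay of the byte code of Factorial.fac. */
final class FacInterpreter {
    static int fac(int n) {
        int[] locals = {n};                          // local variable 0 holds n
        Deque<Integer> stack = new ArrayDeque<>();   // operand stack of this frame
        int pc = 0;
        while (true) {
            switch (pc) {
                case 0:  stack.push(locals[0]); pc = 1; break;           // iload_0
                case 1:  pc = (stack.pop() != 0) ? 8 : 4; break;         // ifne #8
                case 4:  stack.push(1); pc = 5; break;                   // iconst_1
                case 5:  pc = 16; break;                                 // goto #16
                case 8:  stack.push(locals[0]); pc = 9; break;           // iload_0
                case 9:  stack.push(locals[0]); pc = 10; break;          // iload_0
                case 10: stack.push(1); pc = 11; break;                  // iconst_1
                case 11: {                                               // isub
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left - right);
                    pc = 12;
                    break;
                }
                case 12: stack.push(fac(stack.pop())); pc = 15; break;   // invokestatic fac
                case 15: {                                               // imul
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    pc = 16;
                    break;
                }
                case 16: return stack.pop();                             // ireturn
                default: throw new IllegalStateException("No instruction at " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fac(5));
    }
}
```

       <p>
         The recursive call at offset 12 consumes its argument from the
         caller's stack and leaves the result there, mirroring how the callee
         receives the value in its own local variable 0.
       </p>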
     
       <source>
         0:  sipush        4711
         3:  istore_0
         4:  getstatic     java.lang.System.out Ljava/io/PrintStream;
         7:  ldc           "Please enter a number> "
         9:  invokevirtual java.io.PrintStream.print (Ljava/lang/String;)V
         12: getstatic     Factorial.in Ljava/io/BufferedReader;
         15: invokevirtual java.io.BufferedReader.readLine ()Ljava/lang/String;
         18: invokestatic  java.lang.Integer.parseInt (Ljava/lang/String;)I
         21: istore_0
         22: goto          #44
         25: astore_1
         26: getstatic     java.lang.System.err Ljava/io/PrintStream;
         29: aload_1
         30: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V
         33: goto          #44
         36: astore_1
         37: getstatic     java.lang.System.err Ljava/io/PrintStream;
         40: aload_1
         41: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V 
         44: iload_0
         45: ireturn
     
         Exception handler(s) = 
         From    To      Handler Type
         4       22      25      java.io.IOException(6)
         4       22      36      NumberFormatException(10)
       </source>
       
       <p><b>readInt():</b> First the local variable <tt>n</tt> (at index 0)
         is initialized to the value 4711.  The next instruction,
         <tt>getstatic</tt>, loads the reference held by the static
         <tt>System.out</tt> field onto the stack. Then a string is loaded
         and printed, a number read from the standard input and assigned to
         <tt>n</tt>.
       </p>
     
       <p>
         If one of the called methods (<tt>readLine()</tt> and
         <tt>parseInt()</tt>) throws an exception, the Java Virtual Machine
         calls one of the declared exception handlers, depending on the
         type of the exception.  The <tt>try</tt>-clause itself does not
         produce any code, it merely defines the range in which the
         subsequent handlers are active. In the example, the specified
         source code area maps to a byte code area ranging from offset 4
         (inclusive) to 22 (exclusive).  If no exception has occurred
         ("normal" execution flow) the <tt>goto</tt> instructions branch
         past the handler code. There the value of <tt>n</tt> is loaded
         and returned.
       </p>
     
       <p>
         The handler for <tt>java.io.IOException</tt> starts at
         offset 25. It simply prints the error and branches back to the
         normal execution flow, i.e., as if no exception had occurred.
       </p>
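       <p>
         The handler lookup for <tt>readInt()</tt> can be sketched as a
         search through the exception table shown above. This is a simplified
         model under stated assumptions, not the Virtual Machine's actual
         implementation: the class <tt>ExceptionTable</tt> is hypothetical,
         handler types are given as loaded <tt>Class</tt> objects instead of
         constant pool indexes, and the first matching entry wins.
       </p>

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the handler search in a method's exception table. */
final class ExceptionTable {
    static final class Entry {
        final int startPc;      // inclusive
        final int endPc;        // exclusive
        final int handlerPc;
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    /** Returns the handler offset, or -1 if the exception propagates to the caller. */
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry entry : table) {
            boolean inRange = pc >= entry.startPc && pc < entry.endPc;
            if (inRange && entry.catchType.isInstance(thrown)) {
                return entry.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Entry> readInt = Arrays.asList(
            new Entry(4, 22, 25, IOException.class),
            new Entry(4, 22, 36, NumberFormatException.class));
        // parseInt() is invoked at offset 18 and fails with a NumberFormatException.
        System.out.println(findHandler(readInt, 18, new NumberFormatException()));
    }
}
```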
     
       </section>
     
       <section name="3 The BCEL API">
       <p>
         The <font face="helvetica,arial">BCEL</font> API abstracts from
         the concrete circumstances of the Java Virtual Machine and how to
         read and write binary Java class files. The API mainly consists
         of three parts:
       </p>
     
       <p>
     
         <ol type="1">
         <li> A package that contains classes that describe "static"
         constraints of class files, i.e., classes that reflect the class file
         format and are not intended for byte code modifications. The classes
         may be used to read and write class files from or to a file.  This is
         useful especially for analyzing Java classes without having the
         source files at hand.  The main data structure is called
         <tt>JavaClass</tt> which contains methods, fields, etc.</li>
     
         <li> A package to dynamically generate or modify
         <tt>JavaClass</tt> or <tt>Method</tt> objects.  It may be used to
         insert analysis code, to strip unnecessary information from class
         files, or to implement the code generator back-end of a Java
         compiler.</li>
     
         <li> Various code examples and utilities like a class file viewer,
         a tool to convert class files into HTML, and a converter from
         class files to the <a
         href="http://mrl.nyu.edu/~meyer/jasmin/">Jasmin</a> assembly
         language.</li>
         </ol>
       </p>
       </section>
       
       <section name="3.1 JavaClass">
       <p>
         The "static" component of the <font
          face="helvetica,arial">BCEL</font> API resides in the package
          <tt>org.apache.bcel.classfile</tt> and closely represents class
          files. All of the binary components and data structures declared
          in the <a
          href="http://java.sun.com/docs/books/vmspec/index.html">JVM
          specification</a> and described in section <a
          href="#2 The Java Virtual Machine">2</a> are mapped to classes.
     
          <a href="#Figure 3">Figure 3</a> shows a UML diagram of the
          hierarchy of classes of the <font face="helvetica,arial">BCEL
          </font>API. <a href="#Figure 8">Figure 8</a> in the appendix also
          shows a detailed diagram of the <tt>ConstantPool</tt> components.
       </p>
       
       <p align="center">
       <a name="Figure 3">
       <img src="images/javaclass.gif"/> <br/>
       Figure 3: UML diagram for the JavaClass API</a>
       </p>
     
       <p>
         The top-level data structure is <tt>JavaClass</tt>, which in most
         cases is created by a <tt>ClassParser</tt> object that is capable
         of parsing binary class files. A <tt>JavaClass</tt> object
         basically consists of fields, methods, symbolic references to the
         super class and to the implemented interfaces.
       </p>
       
       <p>
         The constant pool serves as some kind of central repository and is
         thus of outstanding importance for all components.
         <tt>ConstantPool</tt> objects contain an array of fixed size of
         <tt>Constant</tt> entries, which may be retrieved via the
         <tt>getConstant()</tt> method taking an integer index as argument.
         Indexes to the constant pool may be contained in instructions as
         well as in other components of a class file and in constant pool
         entries themselves.
       </p>
       
       <p>
         Methods and fields contain a signature, symbolically defining
         their types.  Access flags like <tt>public static final</tt> occur
         in several places and are encoded by an integer bit mask, e.g.,
         <tt>public static final</tt> corresponds to the Java expression
       </p>
     
     
       <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source>
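       <p>
         How such a bit mask is built and queried can be tried out with the
         standard library alone. In the sketch below the class
         <tt>AccessFlags</tt> and the helper <tt>isSet()</tt> are hypothetical;
         the three numeric values are those given by the JVM specification,
         and <tt>java.lang.reflect.Modifier</tt> uses the same bits for these
         flags, so it can render the mask as text.
       </p>

```java
import java.lang.reflect.Modifier;

/** Hypothetical demonstration of access flags as an integer bit mask. */
final class AccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_STATIC = 0x0008;
    static final int ACC_FINAL  = 0x0010;

    /** Tests whether a single flag bit is present in the mask. */
    static boolean isSet(int accessFlags, int flag) {
        return (accessFlags & flag) != 0;
    }

    public static void main(String[] args) {
        int accessFlags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
        System.out.println(Integer.toHexString(accessFlags));
        System.out.println(Modifier.toString(accessFlags));
        System.out.println(isSet(accessFlags, ACC_STATIC));
    }
}
```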
     
       <p>
         As mentioned in <a href="#2.1 Java class file format">section
         2.1</a> already, several components may contain <em>attribute</em>
         objects: classes, fields, methods, and <tt>Code</tt> objects
         (introduced in <a href="#2.3 Method code">section 2.3</a>).  The
         latter is an attribute itself that contains the actual byte code
         array, the maximum stack size, the number of local variables, a
         table of handled exceptions, and some optional debugging
         information coded as <tt>LineNumberTable</tt> and
         <tt>LocalVariableTable</tt> attributes. Attributes are in general
         specific to some data structure, i.e., no two components share the
         same kind of attribute, though this is not explicitly
         forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped
         with the component they belong to.
       </p>
     
       </section>
       
       <section name="3.2 Class repository">
       <p>
         Using the provided <tt>Repository</tt> class, reading class files into
         a <tt>JavaClass</tt> object is quite simple:
       </p>
     
       <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source>
     
       <p>
         The repository also contains methods providing the dynamic equivalent
         of the <tt>instanceof</tt> operator, and other useful routines:
       </p>
     
       <source>
       if(Repository.instanceOf(clazz, super_class)) {
         ...
       }</source>
     
       </section>
       
       <section name="3.2.1 Accessing class file data">
     
       <p>
         Information within the class file components may be accessed like
         Java Beans via intuitive set/get methods. All of them also define
         a <tt>toString()</tt> method so that implementing a simple class
         viewer is very easy. In fact all of the examples used here have
         been produced this way:
       </p>
     
       <source>
       System.out.println(clazz);
       printCode(clazz.getMethods());
       ...
       public static void printCode(Method[] methods) {
         for(int i=0; i < methods.length; i++) {
           System.out.println(methods[i]);
     
           Code code = methods[i].getCode();
           if(code != null) // Non-abstract method
             System.out.println(code);
         }
       }
       </source>
     
       </section>
     
       <section name="3.2.2 Analyzing class data">
       <p>
         Last but not least, <font face="helvetica,arial">BCEL</font>
         supports the <em>Visitor</em> design pattern, so one can write
         visitor objects to traverse and analyze the contents of a class
         file. Included in the distribution is a class
         <tt>JasminVisitor</tt> that converts class files into the <a
         href="http://mrl.nyu.edu/~meyer/jasmin/">Jasmin</a>
         assembler language.
       </p>
     
       </section>
     
       <section name="3.3 ClassGen">
       <p>
         This part of the API (package <tt>org.apache.bcel.generic</tt>)
         supplies an abstraction level for creating or transforming class
         files dynamically. It makes the static constraints of Java class
         files like the hard-coded byte code addresses "generic". The
         generic constant pool, for example, is implemented by the class
         <tt>ConstantPoolGen</tt> which offers methods for adding different
         types of constants. Accordingly, <tt>ClassGen</tt> offers an
         interface to add methods, fields, and attributes.
          <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API.
       </p>
     
       <p align="center">
         <a name="Figure 4">
         <img src="images/classgen.gif"/>
         <br/>
         Figure 4: UML diagram of the ClassGen API</a>
       </p>
     
       </section>
     
       <section name="3.3.1 Types">
       <p>
         We abstract from the concrete details of the type signature syntax
         (see <a href="#2.5 Type information">2.5</a>) by introducing the
         <tt>Type</tt> class, which is used, for example, by methods to
         define their return and argument types. Concrete sub-classes are
         <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt>,
         the last of which consists of the element type and the number of
         dimensions. For commonly used types the class offers some
         predefined constants. For example, the method signature of the
         <tt>main</tt> method as shown in 
         <a href="#2.5 Type information">section 2.5</a> is represented by:
       </p>
     
       <source>
       Type   return_type = Type.VOID;
       Type[] arg_types   = new Type[] { new ArrayType(Type.STRING, 1) };
       </source>
     
       <p>
         <tt>Type</tt> also contains methods to convert types into textual
         signatures and vice versa. The sub-classes contain implementations
         of the routines and constraints specified by the Java Language
         Specification.
       </p>
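
       <p>
         To see what such a textual signature looks like, the following
         sketch builds the descriptor of <tt>main</tt> with the JDK's
         <tt>java.lang.invoke.MethodType</tt> and parses it back again. It
         is only an analogy to the conversions offered by <tt>Type</tt> and
         does not use <font face="helvetica,arial">BCEL</font> at all; the
         class name <tt>SignatureDemo</tt> is made up for this example:
       </p>

       <source>
       import java.lang.invoke.MethodType;

       // Hypothetical demo class, not part of any library.
       public class SignatureDemo {
         // Descriptor of "void main(String[])" in class file notation.
         public static String mainDescriptor() {
           return MethodType.methodType(void.class, String[].class)
                            .toMethodDescriptorString();
         }

         public static void main(String[] args) {
           String descriptor = mainDescriptor();
           System.out.println(descriptor); // ([Ljava/lang/String;)V

           // And back again: from the textual signature to type objects.
           MethodType parsed = MethodType.fromMethodDescriptorString(
               descriptor, SignatureDemo.class.getClassLoader());
           System.out.println(parsed.returnType());     // void
           System.out.println(parsed.parameterType(0)); // class [Ljava.lang.String;
         }
       }
       </source>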
       </section>
     
       <section name="3.3.2 Generic fields and methods">
       <p>
         Fields are represented by <tt>FieldGen</tt> objects, which may be
         freely modified by the user. If they have the access rights
         <tt>static final</tt>, i.e., are constants and of basic type, they
         may optionally have an initializing value.
       </p>
       
       <p>
         Generic methods contain methods to add exceptions the method may
         throw, local variables, and exception handlers. The latter two are
         represented by user-configurable objects as well. Because
         exception handlers and local variables contain references to byte
         code addresses, they also take the role of an <em>instruction
         targeter</em> in our terminology. Instruction targeters contain a
         method <tt>updateTarget()</tt> to redirect a reference. This is
         somewhat related to the Observer design pattern. Generic
         (non-abstract) methods refer to <em>instruction lists</em> that
         consist of instruction objects. References to byte code addresses
         are implemented by handles to instruction objects. If the list is
         updated the instruction targeters will be informed about it. This
         is explained in more detail in the following sections.
       </p>
       
       <p>
         The maximum stack size needed by the method and the maximum number
         of local variables used may be set manually or computed
         automatically via the <tt>setMaxStack()</tt> and
         <tt>setMaxLocals()</tt> methods.
       </p>
     
       </section>
     
       <section name="3.3.3 Instructions">
       <p>
         Modeling instructions as objects may look somewhat odd at first
         sight, but in fact enables programmers to obtain a high-level view
         upon control flow without handling details like concrete byte code
         offsets.  Instructions consist of an opcode (sometimes called
         tag), their length in bytes and an offset (or index) within the
         byte code. Since many instructions are immutable (stack operators,
         e.g.), the <tt>InstructionConstants</tt> interface offers
         shareable predefined "fly-weight" constants to use.
       </p>
       
       <p>
         Instructions are grouped via sub-classing; the type hierarchy of
         instruction classes is illustrated by the (incomplete) figure in the
         appendix. The most important family of instructions are the
         <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to
         targets somewhere within the byte code. Obviously, this makes them
         candidates for playing an <tt>InstructionTargeter</tt> role,
         too. Instructions are further grouped by the interfaces they
         implement; there are, e.g., <tt>TypedInstruction</tt>s that are
         associated with a specific type like <tt>ldc</tt>, or
         <tt>ExceptionThrower</tt> instructions that may raise exceptions
         when executed.
       </p>
       
       <p>
         All instructions can be traversed via <tt>accept(Visitor v)</tt>
         methods, i.e., the Visitor design pattern. These methods use a
         small trick, however, that makes it possible to merge the handling
         of certain instruction groups. The <tt>accept()</tt> methods do
         not only call the corresponding <tt>visit()</tt> method, but call
         the <tt>visit()</tt> methods of their respective super classes and
         implemented interfaces first, i.e., the most specific
         <tt>visit()</tt> call is last. Thus one can group the handling of,
         say, all <tt>BranchInstruction</tt>s into one single method.
       </p>
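
       <p>
         The calling order described above can be pictured with a heavily
         reduced, stand-alone model. The <tt>Visitor</tt> interface and the
         <tt>Goto</tt> class below are hypothetical stand-ins written for
         this sketch, not the actual
         <font face="helvetica,arial">BCEL</font> classes:
       </p>

       <source>
       // Hypothetical demo class, not part of any library.
       public class AcceptOrderDemo {
         // Reduced visitor with one group method and one specific method.
         interface Visitor {
           void visitBranchInstruction(Object insn);
           void visitGOTO(Object insn);
         }

         // Stand-in for a concrete branch instruction.
         static class Goto {
           void accept(Visitor v) {
             v.visitBranchInstruction(this); // general group first
             v.visitGOTO(this);              // most specific call last
           }
         }

         // Records the order in which the visit methods are invoked.
         public static String visitOrder() {
           final StringBuilder log = new StringBuilder();
           new Goto().accept(new Visitor() {
             public void visitBranchInstruction(Object insn) { log.append("branch;"); }
             public void visitGOTO(Object insn)              { log.append("goto;"); }
           });
           return log.toString();
         }

         public static void main(String[] args) {
           System.out.println(visitOrder()); // branch;goto;
         }
       }
       </source>

       <p>
         A visitor that only cares about branches in general would put its
         logic into the group method and leave the specific method empty.
       </p>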
       
       <p>
         For debugging purposes it may even make sense to "invent" your own
         instructions. In a sophisticated code generator like the one used
         as a backend of the <a href="http://barat.sourceforge.net">Barat
         framework</a> for static analysis one often has to insert
         temporary <tt>nop</tt> (No operation) instructions. When examining
         the produced code it may be very difficult to track back where the
         <tt>nop</tt> was actually inserted. One could think of a derived
         <tt>nop2</tt> instruction that contains additional debugging
         information. When the instruction list is dumped to byte code, the
         extra data is simply dropped.
       </p>
       
       <p>
         One could also think of new byte code instructions operating on
         complex numbers that are replaced by normal byte code upon
         load-time or are recognized by a new JVM.
       </p>
       
       </section>
     
       <section name="3.3.4 Instruction lists">
       <p>
         An <em>instruction list</em> is implemented by a list of
         <em>instruction handles</em> encapsulating instruction objects.
         References to instructions in the list are thus not implemented by
         direct pointers to instructions but by pointers to instruction
         <em>handles</em>. This makes appending, inserting and deleting
         areas of code very simple and also allows us to reuse immutable
         instruction objects (fly-weight objects). Since we use symbolic
         references, computation of concrete byte code offsets does not
         need to occur until finalization, i.e., until the user has
         finished the process of generating or transforming code. We will
         use the terms instruction handle and instruction synonymously
         throughout the rest of the paper. Instruction handles may contain
         additional user-defined data using the <tt>addAttribute()</tt>
         method.
       </p>
       
       <p>
         <b>Appending:</b> One can append instructions or other instruction
         lists anywhere to an existing list. The instructions are appended
         after the given instruction handle. All append methods return a
         new instruction handle which may then be used as the target of a
         branch instruction, e.g.:
       </p>
     
       <source>
       InstructionList il = new InstructionList();
       ...
       GOTO g = new GOTO(null);
       il.append(g);
       ...
       // Use immutable fly-weight object
       InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
       g.setTarget(ih);
       </source>
     
       <p>
         <b>Inserting:</b> Instructions may be inserted anywhere into an
         existing list. They are inserted before the given instruction
         handle. All insert methods return a new instruction handle which
         may then be used as the start address of an exception handler, for
         example.
       </p>
     
       <source>
       InstructionHandle start = il.insert(insertion_point,
                                           InstructionConstants.NOP);
       ...
       mg.addExceptionHandler(start, end, handler, "java.io.IOException");
       </source>
     
       <p>
         <b>Deleting:</b> Deletion of instructions is also very
         straightforward; all instruction handles and the contained
         instructions within a given range are removed from the instruction
         list and disposed. The <tt>delete()</tt> method may however throw
         a <tt>TargetLostException</tt> when there are instruction
         targeters still referencing one of the deleted instructions. The
         user is forced to handle such exceptions in a <tt>try-catch</tt>
         clause and redirect these references elsewhere. The <em>peep
         hole</em> optimizer described in the appendix gives a detailed
         example for this.
       </p>
     
       <source>
       try {
         il.delete(first, last);
       } catch(TargetLostException e) {
         InstructionHandle[] targets = e.getTargets();
         for(int i=0; i < targets.length; i++) {
           InstructionTargeter[] targeters = targets[i].getTargeters();
           for(int j=0; j < targeters.length; j++)
              targeters[j].updateTarget(targets[i], new_target);
         }
       }
       </source>
     
       <p>
         <b>Finalizing:</b> When the instruction list is ready to be dumped
         to pure byte code, all symbolic references must be mapped to real
         byte code offsets. This is done by the <tt>getByteCode()</tt>
         method which is called by default by
         <tt>MethodGen.getMethod()</tt>. Afterwards you should call
         <tt>dispose()</tt> so that the instruction handles can be reused
         internally. This helps to improve memory usage.
       </p>
       
       <source>
       InstructionList il = new InstructionList();
     
       ClassGen  cg = new ClassGen("HelloWorld", "java.lang.Object",
                                   "<generated>", ACC_PUBLIC | ACC_SUPER,
                                   null);
       ConstantPoolGen cp = cg.getConstantPool(); // constant pool created by cg
       MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC,
                                    Type.VOID, new Type[] { 
                                      new ArrayType(Type.STRING, 1) 
                                    }, new String[] { "argv" },
                                    "main", "HelloWorld", il, cp);
       ...
       cg.addMethod(mg.getMethod());
       il.dispose(); // Reuse instruction handles of list
       </source>
     
       </section>
     
       <section name="3.3.5 Code example revisited">
       <p>
         Using instruction lists gives us a generic view upon the code: In
         <a href="#Figure 5">Figure 5</a> we again present the code chunk
         of the <tt>readInt()</tt> method of the factorial example in section
         <a href="#2.6 Code example">2.6</a>: The local variables
         <tt>n</tt> and <tt>e1</tt> both hold two references to
         instructions, defining their scope.  There are two <tt>goto</tt>s
         branching to the <tt>iload</tt> at the end of the method. One of
         the exception handlers is displayed, too: it references the start
         and the end of the <tt>try</tt> block and also the exception
         handler code.
       </p>
       
       <p align="center">
         <a name="Figure 5">
         <img src="images/il.gif"/>
         <br/>
         Figure 5: Instruction list for <tt>readInt()</tt> method</a>
       </p>
       
       </section>
       
       <section name="3.3.6 Instruction factories">
       <p>
         To simplify the creation of certain instructions the user can use
         the supplied <tt>InstructionFactory</tt> class which offers a lot
         of useful methods to create instructions from
         scratch. Alternatively, one can also use <em>compound
         instructions</em>: When producing byte code, some patterns
         typically occur very frequently, for instance the compilation of
         arithmetic or comparison expressions. You certainly do not want
         to rewrite the code that translates such expressions into byte
         code in every place they may appear. In order to support this, the
         <font face="helvetica,arial">BCEL</font> API includes a <em>compound
         instruction</em> (an interface with a single
         <tt>getInstructionList()</tt> method). Instances of classes
         implementing this interface may be used in any place where normal
         instructions would occur, particularly in append operations.
       </p>
     
       <p>
         <b>Example: Pushing constants</b> Pushing constants onto the
         operand stack may be coded in different ways. As explained in <a
         href="#2.2 Byte code instruction set">section 2.2</a> there are
         some "short-cut" instructions that can be used to make the
         produced byte code more compact. The smallest instruction to push
         a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>, other
         possibilities are <tt>bipush</tt> (can be used to push values
         between -128 and 127), <tt>sipush</tt> (between -32768 and 32767),
         or <tt>ldc</tt> (load constant from constant pool).
       </p>
       
       <p>
         Instead of repeatedly selecting the most compact instruction in,
         say, a switch, one can use the compound <tt>PUSH</tt> instruction
         whenever pushing a constant number or string. It will produce the
         appropriate byte code instruction and insert entries into the
         constant pool if necessary.
       </p>
     
       <source>
       InstructionFactory f  = new InstructionFactory(class_gen);
       InstructionList    il = new InstructionList();
       ...
       il.append(new PUSH(cp, "Hello, world"));
       il.append(new PUSH(cp, 4711));
       ...
       il.append(f.createPrintln("Hello World"));
       ...
       il.append(f.createReturn(type));
       </source>
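
       <p>
         The kind of selection that a compound instruction such as
         <tt>PUSH</tt> has to make for integer constants can be pictured
         roughly as follows. This stand-alone sketch merely returns the
         mnemonic as a string and is not the library's implementation;
         <tt>PushSelector</tt>, <tt>selectOpcode()</tt>, and
         <tt>inRange()</tt> are hypothetical names:
       </p>

       <source>
       // Hypothetical demo class, not part of any library.
       public class PushSelector {
         // True when clamping the value to [low, high] leaves it unchanged.
         private static boolean inRange(int value, int low, int high) {
           return Math.max(low, Math.min(value, high)) == value;
         }

         // Mnemonic of the most compact instruction pushing the given int.
         public static String selectOpcode(int value) {
           if (inRange(value, -1, 5)) {
             return value == -1 ? "iconst_m1" : "iconst_" + value;
           }
           if (inRange(value, Byte.MIN_VALUE, Byte.MAX_VALUE)) {
             return "bipush";   // one-byte operand
           }
           if (inRange(value, Short.MIN_VALUE, Short.MAX_VALUE)) {
             return "sipush";   // two-byte operand
           }
           return "ldc";        // needs a constant pool entry
         }

         public static void main(String[] args) {
           System.out.println(selectOpcode(1));      // iconst_1
           System.out.println(selectOpcode(100));    // bipush
           System.out.println(selectOpcode(4711));   // sipush
           System.out.println(selectOpcode(100000)); // ldc
         }
       }
       </source>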
     
       </section>
           
       <section name="3.3.7 Code patterns using regular expressions">
       <p>
         When transforming code, for instance during optimization or when
         inserting analysis method calls, one typically searches for
         certain patterns of code to perform the transformation at. To
         simplify handling such situations <font
         face="helvetica,arial">BCEL </font>introduces a special feature:
         One can search for given code patterns within an instruction list
         using <em>regular expressions</em>. In such expressions,
         instructions are represented by their opcode names, e.g.,
         <tt>LDC</tt>; one may also use their respective super classes, e.g.,
         "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>,
         <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus,
         the expression
       </p>
       
       <source>"NOP+(ILOAD|ALOAD)*"</source>
     
       <p>
         represents a piece of code consisting of at least one <tt>NOP</tt>
         followed by a possibly empty sequence of <tt>ILOAD</tt> and
         <tt>ALOAD</tt> instructions.
       </p>
     
       <p>
         The <tt>search()</tt> method of class
         <tt>org.apache.bcel.util.InstructionFinder</tt> takes a regular
         expression and a starting point as arguments and returns an
         iterator describing the area of matched instructions. Additional
         constraints to the matching area of instructions, which cannot be
         implemented via regular expressions, may be expressed via <em>code
         constraint</em> objects.
       </p>
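
       <p>
         One way to picture how regular expressions can be applied to
         instructions is to encode every instruction as a single character
         and to run an ordinary <tt>java.util.regex</tt> pattern over the
         resulting string. The sketch below illustrates this idea only; it
         makes no claim about how <tt>InstructionFinder</tt> works
         internally, and the one-character codes are invented for the
         example:
       </p>

       <source>
       import java.util.regex.Matcher;
       import java.util.regex.Pattern;

       // Hypothetical demo class, not part of any library.
       public class OpcodePatternDemo {
         // Invented one-character codes for three opcodes.
         private static final String NOP = "n", ILOAD = "i", ALOAD = "a";

         // Returns {start, end} (end exclusive) of the first match, or null.
         public static int[] firstMatch(String encodedCode) {
           // "NOP+(ILOAD|ALOAD)*" translated into the character encoding.
           Pattern p = Pattern.compile(NOP + "+(" + ILOAD + "|" + ALOAD + ")*");
           Matcher m = p.matcher(encodedCode);
           return m.find() ? new int[] { m.start(), m.end() } : null;
         }

         public static void main(String[] args) {
           // Six instructions: other, NOP, NOP, ILOAD, ALOAD, other
           int[] range = firstMatch("xnniar");
           System.out.println(range[0] + ".." + range[1]); // 1..5
         }
       }
       </source>

       <p>
         Because each instruction occupies exactly one character, the
         character positions of a match translate directly into positions
         in the instruction list.
       </p>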
       
       </section>
       
       <section name="3.3.8 Example: Optimizing boolean expressions">
       <p>
         In Java, the boolean values <tt>true</tt> and <tt>false</tt> are
         mapped to 1 and to 0, respectively. Thus, the simplest way to
         evaluate boolean expressions is to push a 1 or a 0 onto the operand
         stack depending on the truth value of the expression. But this
         way, the subsequent combination of boolean expressions (with
         <tt>&&</tt>, e.g.) yields long chunks of code that push
         lots of 1s and 0s onto the stack.
       </p>
     
       <p>
         When the code has been finalized these chunks can be optimized
         with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt>
         (e.g.  the comparison of two integers: <tt>if_icmpeq</tt>) that
         either produces a 1 or a 0 on the stack and is followed by an
         <tt>ifne</tt> instruction (branch if stack value is not 0) may be
         replaced by the <tt>IfInstruction</tt> with its branch target
         replaced by the target of the <tt>ifne</tt> instruction:
       </p>
     
       <source>
       CodeConstraint constraint = new CodeConstraint() {
         public boolean checkCode(InstructionHandle[] match) {
           IfInstruction if1 = (IfInstruction)match[0].getInstruction();
           GOTO          g   = (GOTO)match[2].getInstruction();
           return (if1.getTarget() == match[3]) &&
                  (g.getTarget() == match[4]);
         }  
       };
     
       InstructionFinder f    = new InstructionFinder(il);
       String            pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)";
     
       for(Iterator e = f.search(pat, constraint); e.hasNext(); ) {
         InstructionHandle[] match = (InstructionHandle[])e.next();
         ...
         // Branch instructions are held by BranchHandle objects
         ((BranchHandle)match[0]).setTarget(((BranchHandle)match[5]).getTarget()); // Update target
         ...
         try {
           il.delete(match[1], match[5]);
         } catch(TargetLostException ex) { ... }
       }
       </source>
     
       <p>
         The applied code constraint object ensures that the matched code
         really corresponds to the targeted expression pattern. Subsequent
         application of this algorithm removes all unnecessary stack
         operations and branch instructions from the byte code. If any of
         the deleted instructions is still referenced by an
         <tt>InstructionTargeter</tt> object, the reference has to be
         updated in the <tt>catch</tt>-clause.
       </p>
     
       <p>
         <b>Example application:</b>
         The expression:
       </p>
     
       <source>
       if((a == null) || (i < 2))
         System.out.println("Ooops");
       </source>
     
       <p>
         can be mapped to both of the chunks of byte code shown in <a
         href="#Figure 6">figure 6</a>. The left column represents the
         unoptimized code while the right column displays the same code
         after the peep hole algorithm has been applied:
       </p>
       
       <p align="center"><a name="Figure 6">
       <table>
       <tr>
       <td valign="top"><pre>
     5:  aload_0
     6:  ifnull        #13
     9:  iconst_0
     10: goto          #14
     13: iconst_1
     14: nop
     15: ifne          #36
     18: iload_1
     19: iconst_2
     20: if_icmplt     #27
     23: iconst_0
     24: goto          #28
     27: iconst_1
     28: nop
     29: ifne          #36
     32: iconst_0
     33: goto          #37
     36: iconst_1
     37: nop
     38: ifeq          #52
     41: getstatic     System.out
     44: ldc           "Ooops"
     46: invokevirtual println
     52: return
       </pre></td>
       <td valign="top"><pre>
     10: aload_0
     11: ifnull        #19
     14: iload_1
     15: iconst_2
     16: if_icmpge     #27
     19: getstatic     System.out
     22: ldc           "Ooops"
     24: invokevirtual println
     27: return
       </pre></td>
       </tr>
       </table>
       </a>
       </p>
     
       </section>
       
       <section name="4 Application areas">
       <p>
         There are many possible application areas for <font
         face="helvetica,arial">BCEL</font> ranging from class
         browsers, profilers, byte code optimizers, and compilers to
         sophisticated run-time analysis tools and extensions to the Java
         language.
       </p>
     
       <p>
         Compilers like the <a
         href="http://barat.sourceforge.net">Barat</a> compiler use <font
         face="helvetica,arial">BCEL</font> to implement a byte code
         generating back end. Other possible application areas are the
         static analysis of byte code or examining the run-time behavior of
         classes by inserting calls to profiling methods into the
         code. Further examples are extending Java with Eiffel-like
         assertions, automated delegation, or with the concepts of <a
         href="http://www.eclipse.org/aspectj/">Aspect-Oriented Programming</a>.<br/> A
         list of projects using <font face="helvetica,arial">BCEL</font> can
         be found <a href="projects.html">here</a>.
       </p>
     
       </section>
     
       <section name="4.1 Class loaders">
       <p>
         Class loaders are responsible for loading class files from the
         file system or other resources and passing the byte code to the
         Virtual Machine. A custom <tt>ClassLoader</tt> object may be used
         to intercept the standard procedure of loading a class, i.e., the
         system class loader, and perform some transformations before
         actually passing the byte code to the JVM.
       </p>
       
       <p>
         A possible scenario is described in <a href="#Figure 7">figure
         7</a>:
         During run-time the Virtual Machine requests a custom class loader
         to load a given class. But before the JVM actually sees the byte
         code, the class loader makes a "side-step" and performs some
         transformation to the class. To make sure that the modified byte
         code is still valid and does not violate any of the JVM's rules it
         is checked by the verifier before the JVM finally executes it.
       </p>
       
       <p align="center">
         <a name="Figure 7">
         <img src="images/classloader.gif"/>
         <br/>
         Figure 7: Class loaders
         </a>
       </p>
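
       <p>
         A minimal sketch of such a loader, written against the JDK only,
         might look as follows. The class name <tt>TransformingLoader</tt>,
         the <tt>transform()</tt> hook, and the idea of reading class files
         from a separate directory are assumptions made for this example;
         the hook is where a library such as
         <font face="helvetica,arial">BCEL</font> would modify the byte
         code:
       </p>

       <source>
       import java.io.IOException;
       import java.nio.file.Files;
       import java.nio.file.Path;

       // Hypothetical demo class, not part of any library.
       public class TransformingLoader extends ClassLoader {
         // Directory with class files that the parent loader cannot see.
         private final Path classDir;

         public TransformingLoader(Path classDir) {
           this.classDir = classDir;
         }

         // The "side-step": this placeholder returns the byte code unchanged.
         protected byte[] transform(String className, byte[] classFile) {
           return classFile;
         }

         // Invoked by loadClass() once the parent loader failed to find the class.
         @Override
         protected Class findClass(String name) throws ClassNotFoundException {
           Path file = classDir.resolve(name.replace('.', '/') + ".class");
           try {
             byte[] modified = transform(name, Files.readAllBytes(file));
             // Hand the (possibly modified) bytes to the JVM.
             return defineClass(name, modified, 0, modified.length);
           } catch (IOException e) {
             throw new ClassNotFoundException(name, e);
           }
         }
       }
       </source>

       <p>
         Calling <tt>loadClass()</tt> on such a loader first delegates to
         the parent loader, so only classes that exist solely in the given
         directory pass through <tt>transform()</tt>.
       </p>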
     
       <p>
         Using class loaders is an elegant way of extending the Java
         Virtual Machine with new features without actually modifying it.
         This concept enables developers to use <em>load-time
         reflection</em> to implement their ideas as opposed to the static
         reflection supported by the <a
         href="http://java.sun.com/j2se/1.3/docs/guide/reflection/index.html">Java
         Reflection API</a>. Load-time transformations supply the user with
         a new level of abstraction. Users are not strictly tied to the
         static constraints of the original authors of the classes but may
         customize the applications with third-party code in order to
         benefit from new features. Such transformations may be executed on
         demand and neither interfere with other users, nor alter the
         original byte code. In fact, class loaders may even create classes
         <em>ad hoc</em> without loading a file at all.<br/> <font
         face="helvetica,arial">BCEL</font> already has built-in support for
         dynamically creating classes; an example is the ProxyCreator class.
       </p>
       
       </section>
       
       <section name="4.1.1 Example: Poor Man's Genericity">
       <p>
         The former "Poor Man's Genericity" project that extended Java with
         parameterized classes, for example, used <font
         face="helvetica,arial">BCEL</font> in two places to generate
         instances of parameterized classes: During compile-time (with the
         standard <tt>javac</tt> with some slightly changed classes) and at
         run-time using a custom class loader. The compiler puts some
         additional type information into class files (attributes) which is
         evaluated at load-time by the class loader. The class loader
         performs some transformations on the loaded class and passes them
         to the VM. The following algorithm illustrates how the load method
         of the class loader fulfills the request for a parameterized
         class, e.g., <tt>Stack<String></tt>:
       </p>
       
       <p>
         <ol type="1">
         <li> Search for class <tt>Stack</tt>, load it, and check for a
         certain class attribute containing additional type
         information. The attribute defines the "real" name of the
         class, i.e., <tt>Stack<A></tt>.</li>
     
         <li>Replace all occurrences of and references to the formal type
         <tt>A</tt> with references to the actual type <tt>String</tt>. For
         example the method
     
         <source>
         void push(A obj) { ... }
         </source>
       
         <p>
           becomes
         </p>
     
         <source>
         void push(String obj) { ... }
         </source>
         </li>
     
         <li> Return the resulting class to the Virtual Machine.</li>
         </ol>
       </p>
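
       <p>
         At the level of class file signatures, step 2 of the list above
         amounts to rewriting type descriptors. The following stand-alone
         sketch shows this for a single method descriptor; it is not the
         code of the original project, and <tt>FormalTypeSubstitution</tt>
         and <tt>substitute()</tt> are hypothetical names. The plain string
         replacement is deliberately naive and would also hit a class whose
         name merely ends in the formal type's name:
       </p>

       <source>
       // Hypothetical demo class, not part of any library.
       public class FormalTypeSubstitution {
         // Replaces the formal type in a method descriptor by the actual type.
         public static String substitute(String descriptor,
                                         String formal, String actual) {
           String from = "L" + formal.replace('.', '/') + ";";
           String to   = "L" + actual.replace('.', '/') + ";";
           return descriptor.replace(from, to);
         }

         public static void main(String[] args) {
           // void push(A obj)  becomes  void push(String obj)
           System.out.println(substitute("(LA;)V", "A", "java.lang.String"));
           // prints (Ljava/lang/String;)V
         }
       }
       </source>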
       
       </section>
     
       <section name="A Appendix"/>
     
       <section name="HelloWorldBuilder">
       <p>
       The following program reads a name from the standard input and
       prints a friendly "Hello". Since the <tt>readLine()</tt> method may
       throw an <tt>IOException</tt> it is enclosed by a <tt>try-catch</tt>
       clause.
       </p>
     
       <source>
       import java.io.*;
     
       public class HelloWorld {
         public static void main(String[] argv) {
           BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
           String name = null;
     
           try {
             System.out.print("Please enter your name> ");
             name = in.readLine();
           } catch(IOException e) { return; }
     
           System.out.println("Hello, " + name);
         }
       }
       </source>
     
       <p>
       We will sketch here how the above Java class can be created from
       scratch using the <font face="helvetica,arial">BCEL</font> API. For
       ease of reading we will use textual signatures and not create them
       dynamically. For example, the signature
       </p>
     
       <source>"(Ljava/lang/String;)Ljava/lang/StringBuffer;"</source>
     
       <p>
        can actually be created with
       </p>
     
       <source>Type.getMethodSignature(Type.STRINGBUFFER, new Type[] { Type.STRING });</source>
     
       <p><b>Initialization:</b>
       First we create an empty class and an instruction list:
       </p>
     
       <source>
       ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object",
                                  "<generated>", ACC_PUBLIC | ACC_SUPER,
                                  null);
       ConstantPoolGen cp = cg.getConstantPool(); // cg creates constant pool
       InstructionList il = new InstructionList();
       </source>
     
       <p>
     We then create the main method, supplying the method's name and the
     symbolic type signature encoded with <tt>Type</tt> objects.
       </p>
     
       <source>
       MethodGen  mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, // access flags
                                     Type.VOID,               // return type
                                     new Type[] {             // argument types
                                       new ArrayType(Type.STRING, 1) },
                                     new String[] { "argv" }, // arg names
                                     "main", "HelloWorld",    // method, class
                                     il, cp);
       InstructionFactory factory = new InstructionFactory(cg);
       </source>
     
       <p>
       We now define some often used types:
       </p>
     
       <source>
       ObjectType i_stream = new ObjectType("java.io.InputStream");
       ObjectType p_stream = new ObjectType("java.io.PrintStream");
       </source>
     
       <p><b>Create variables <tt>in</tt> and <tt>name</tt>:</b> We call
       the constructors, i.e., execute
       <tt>new BufferedReader(new InputStreamReader(System.in))</tt>. The reference
       to the <tt>BufferedReader</tt> object stays on top of the stack and
       is stored in the newly allocated <tt>in</tt> variable.
       </p>
     
       <source>
       il.append(factory.createNew("java.io.BufferedReader"));
       il.append(InstructionConstants.DUP); // Use predefined constant
       il.append(factory.createNew("java.io.InputStreamReader"));
       il.append(InstructionConstants.DUP);
       il.append(factory.createFieldAccess("java.lang.System", "in", i_stream,
                                           Constants.GETSTATIC));
       il.append(factory.createInvoke("java.io.InputStreamReader", "<init>",
                                      Type.VOID, new Type[] { i_stream },
                                      Constants.INVOKESPECIAL));
       il.append(factory.createInvoke("java.io.BufferedReader", "<init>", Type.VOID,
                                      new Type[] {new ObjectType("java.io.Reader")},
                                      Constants.INVOKESPECIAL));
     
       LocalVariableGen lg = mg.addLocalVariable("in",
                               new ObjectType("java.io.BufferedReader"), null, null);
       int in = lg.getIndex();
       lg.setStart(il.append(new ASTORE(in))); // "in" valid from here
       </source>
     
       <p>
       Create local variable <tt>name</tt> and  initialize it to <tt>null</tt>.
       </p>
     
       <source>
       lg = mg.addLocalVariable("name", Type.STRING, null, null);
       int name = lg.getIndex();
       il.append(InstructionConstants.ACONST_NULL);
       lg.setStart(il.append(new ASTORE(name))); // "name" valid from here
       </source>
     
       <p><b>Create try-catch block:</b> We remember the start of the
       block, print a prompt, read a line from the standard input, and
       store it in the variable <tt>name</tt>.
       </p>
     
       <source>
       InstructionHandle try_start =
         il.append(factory.createFieldAccess("java.lang.System", "out", p_stream,
                                             Constants.GETSTATIC));
     
       il.append(new PUSH(cp, "Please enter your name> "));
       il.append(factory.createInvoke("java.io.PrintStream", "print", Type.VOID, 
                                      new Type[] { Type.STRING },
                                      Constants.INVOKEVIRTUAL));
       il.append(new ALOAD(in));
       il.append(factory.createInvoke("java.io.BufferedReader", "readLine",
                                      Type.STRING, Type.NO_ARGS,
                                      Constants.INVOKEVIRTUAL));
       il.append(new ASTORE(name));
       </source>
     
       <p>
       During normal execution we jump past the exception handler. The target
       address is not known yet, so the <tt>GOTO</tt> is created with a
       <tt>null</tt> target for now.
       </p>
     
       <source>
       GOTO g = new GOTO(null);
       InstructionHandle try_end = il.append(g);
       </source>
     
       <p>
       We add the exception handler, which simply returns from the method, and
       register it for the instructions from <tt>try_start</tt> to <tt>try_end</tt>.
       </p>
     
       <source>
       InstructionHandle handler = il.append(InstructionConstants.RETURN);
       mg.addExceptionHandler(try_start, try_end, handler, "java.io.IOException");
       </source>
     
       <p>
       The "normal" code continues here, so we can now set the branch target of the <tt>GOTO</tt>.
       </p>
     
       <source>
       InstructionHandle ih =
         il.append(factory.createFieldAccess("java.lang.System", "out", p_stream,
                                             Constants.GETSTATIC));
       g.setTarget(ih);
       </source>
     
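       <p><b>Printing "Hello":</b>
       String concatenation is compiled to <tt>StringBuffer</tt> operations
       here, as older Java compilers do (newer compilers use
       <tt>StringBuilder</tt> or <tt>invokedynamic</tt> instead).
       </p>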
     
       <source>
       il.append(factory.createNew(Type.STRINGBUFFER));
       il.append(InstructionConstants.DUP);
       il.append(new PUSH(cp, "Hello, "));
       il.append(factory.createInvoke("java.lang.StringBuffer", "<init>",
                                      Type.VOID, new Type[] { Type.STRING },
                                      Constants.INVOKESPECIAL));
       il.append(new ALOAD(name));
       il.append(factory.createInvoke("java.lang.StringBuffer", "append",
                                      Type.STRINGBUFFER, new Type[] { Type.STRING },
                                      Constants.INVOKEVIRTUAL));
       il.append(factory.createInvoke("java.lang.StringBuffer", "toString",
                                      Type.STRING, Type.NO_ARGS,
                                      Constants.INVOKEVIRTUAL));
         
       il.append(factory.createInvoke("java.io.PrintStream", "println",
                                      Type.VOID, new Type[] { Type.STRING },
                                      Constants.INVOKEVIRTUAL));
       il.append(InstructionConstants.RETURN);
       </source>
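
       <p>
       For orientation, the instruction sequence above corresponds roughly to
       the following hand-written Java. This is only a sketch: the class
       <tt>HelloConcat</tt> and the method <tt>greet</tt> are hypothetical
       names introduced for illustration and are not part of the generated
       class.
       </p>

       <source>
       public class HelloConcat {
         /** Source-level equivalent of the StringBuffer instructions. */
         static String greet(String name) {
           // new StringBuffer("Hello, ") ... append(name) ... toString()
           return new StringBuffer("Hello, ").append(name).toString();
         }

         public static void main(String[] argv) {
           String name = "World"; // stands in for the line read from standard input
           System.out.println(greet(name)); // prints "Hello, World"
         }
       }
       </source>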
     
     
       <p><b>Finalization:</b> Finally, we have to set the maximum stack
       size, which would otherwise have to be tracked by hand while the
       code is emitted; <tt>setMaxStack()</tt> computes it from the
       instruction list. We also add a default constructor to the class,
       which is empty in this case.
       </p>
     
       <source>
       mg.setMaxStack();
       cg.addMethod(mg.getMethod());
       il.dispose(); // Allow instruction handles to be reused
       cg.addEmptyConstructor(ACC_PUBLIC);
       </source>
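
       <p>
       To give an idea of what computing the stack size involves, the
       following sketch tallies the operand stack depth for the straight-line
       instructions from the <tt>GOTO</tt> target to the final
       <tt>RETURN</tt>. The class <tt>MaxStackSketch</tt> and the hand-written
       table of stack deltas are assumptions made for illustration; this is
       not how BCEL implements <tt>setMaxStack()</tt>, which also has to
       account for branches and exception handlers.
       </p>

       <source>
       public class MaxStackSketch {
         /** Largest operand stack depth reached by a straight-line sequence. */
         static int maxDepth(int[] deltas) {
           int depth = 0, max = 0;
           for (int delta : deltas) {
             depth += delta;              // apply the instruction's net stack effect
             max = Math.max(max, depth);  // remember the high-water mark
           }
           return max;
         }

         public static void main(String[] argv) {
           int[] hello = {
             +1, // getstatic System.out
             +1, // new StringBuffer
             +1, // dup
             +1, // push "Hello, "
             -2, // invokespecial StringBuffer(String): pops reference and argument
             +1, // aload name
             -1, // invokevirtual append(String): pops 2, pushes 1
              0, // invokevirtual toString(): pops 1, pushes 1
             -2, // invokevirtual println(String): pops reference and argument
              0  // return
           };
           System.out.println(maxDepth(hello)); // prints 4
         }
       }
       </source>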
     
       <p>
       Last but not least, we dump the <tt>JavaClass</tt> object to the file
       <tt>HelloWorld.class</tt>.
       </p>
     
       <source>
       try {
         cg.getJavaClass().dump("HelloWorld.class");
       } catch(java.io.IOException e) { System.err.println(e); }
       </source>
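
       <p>
       As a quick sanity check on the dumped file, one might verify that it
       starts with the class file magic number <tt>0xCAFEBABE</tt>. The
       following is a minimal sketch that uses only the JDK; the class
       <tt>ClassFileCheck</tt> and the method <tt>hasClassMagic</tt> are
       hypothetical helpers, not part of BCEL.
       </p>

       <source>
       import java.io.IOException;
       import java.nio.file.Files;
       import java.nio.file.Paths;
       import java.util.Arrays;

       public class ClassFileCheck {
         private static final byte[] MAGIC =
           { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };

         /** True if the bytes start with the class file magic number. */
         static boolean hasClassMagic(byte[] bytes) {
           // copyOf pads short inputs with zeros, so they simply fail the comparison
           return Arrays.equals(Arrays.copyOf(bytes, 4), MAGIC);
         }

         public static void main(String[] argv) throws IOException {
           if (argv.length == 0) {
             System.err.println("usage: java ClassFileCheck HelloWorld.class");
             return;
           }
           byte[] bytes = Files.readAllBytes(Paths.get(argv[0]));
           System.out.println(argv[0] +
             (hasClassMagic(bytes) ? ": class file" : ": not a class file"));
         }
       }
       </source>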
     
      </section>
     
      <section name="Peephole optimizer">
      <p>
      This class implements a simple peephole optimizer that removes NOP
      instructions from every non-abstract, non-native method of the given
      class and dumps the result to a new class file.
      </p>
     
      <source>
     import java.io.*;
     
     import java.util.Iterator;
     import org.apache.bcel.classfile.*;
     import org.apache.bcel.generic.*;
     import org.apache.bcel.Repository;
     import org.apache.bcel.util.InstructionFinder;
     
     public class Peephole {
       public static void main(String[] argv) {
         try {
           /* Load the class from CLASSPATH.
            */
           JavaClass       clazz   = Repository.lookupClass(argv[0]);
           Method[]        methods = clazz.getMethods();
           ConstantPoolGen cp      = new ConstantPoolGen(clazz.getConstantPool());
     
           for(int i=0; i < methods.length; i++) {
             if(!(methods[i].isAbstract() || methods[i].isNative())) {
               MethodGen mg       = new MethodGen(methods[i],
                                    clazz.getClassName(), cp);
               Method    stripped = removeNOPs(mg);
           
               if(stripped != null)     // Any NOPs stripped?
                 methods[i] = stripped; // Overwrite with stripped method
             }
           }
     
           /* Dump the class to "class name"_.class
            */
           clazz.setConstantPool(cp.getFinalConstantPool());
           clazz.dump(clazz.getClassName() + "_.class");
         } catch(Exception e) { e.printStackTrace(); }
       }
     
       private static final Method removeNOPs(MethodGen mg) {
         InstructionList   il    = mg.getInstructionList();
         InstructionFinder f     = new InstructionFinder(il);
         String            pat   = "NOP+"; // Find at least one NOP
         InstructionHandle next  = null;
         int               count = 0;
     
         for(Iterator iter = f.search(pat); iter.hasNext(); ) {
           InstructionHandle[] match = (InstructionHandle[])iter.next();
           InstructionHandle   first = match[0];
           InstructionHandle   last  = match[match.length - 1];
           
           /* Some nasty Java compilers may add NOP at end of method.
            */
           if((next = last.getNext()) == null)
             break;
     
           count += match.length;
     
           /* Delete NOPs and redirect any references to them to the following
            * (non-nop) instruction.
            */
           try {
             il.delete(first, last);
           } catch(TargetLostException e) {
             InstructionHandle[] targets = e.getTargets();
             for(int i=0; i < targets.length; i++) {
               InstructionTargeter[] targeters = targets[i].getTargeters();
           
               for(int j=0; j < targeters.length; j++)
                 targeters[j].updateTarget(targets[i], next);
             }
           }
         }
     
         Method m = null;
         
         if(count > 0) {
           System.out.println("Removed " + count + " NOP instructions from method " +
                              mg.getName());
           m = mg.getMethod();
         }
     
         il.dispose(); // Reuse instruction handles
         return m;
       }
     }
      </source>
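
      <p>
      The redirection step in the <tt>catch</tt> block can be pictured with a
      toy model that needs no BCEL at all. In this sketch, which is an
      illustration and not BCEL's implementation, instructions are plain
      mnemonics, branch targets are array indices, and the class
      <tt>NopRemovalSketch</tt> is a hypothetical name. As in
      <tt>Peephole</tt>, a run of NOPs at the very end is left alone because
      there is no following instruction to redirect to.
      </p>

      <source>
     import java.util.Arrays;

     public class NopRemovalSketch {
       final String[] ops;     // mnemonic of each instruction
       final int[]    targets; // index of the branch target, or -1 for non-branches

       NopRemovalSketch(String[] ops, int[] targets) {
         this.ops     = ops;
         this.targets = targets;
       }

       /** Copy without NOPs; branches into a NOP run move to the next survivor. */
       NopRemovalSketch withoutNops() {
         int n = ops.length;

         /* Skip a trailing run of NOPs: nothing follows it.
          */
         int end = n;
         while(end > 0) {
           if(!ops[end - 1].equals("NOP"))
             break;
           end--;
         }

         boolean[] deleted = new boolean[n];
         for(int i=0; i < end; i++)
           deleted[i] = ops[i].equals("NOP");

         /* newIndex[i] is the slot old instruction i ends up in; for a deleted
          * NOP it is the slot of the next surviving instruction.
          */
         int[] newIndex = new int[n];
         int   kept     = 0;
         for(int i=0; i < n; i++) {
           newIndex[i] = kept;
           if(!deleted[i])
             kept++;
         }

         String[] newOps     = new String[kept];
         int[]    newTargets = new int[kept];
         for(int i=0; i < n; i++) {
           if(deleted[i])
             continue;
           newOps[newIndex[i]]     = ops[i];
           newTargets[newIndex[i]] = (targets[i] < 0)? -1 : newIndex[targets[i]];
         }

         return new NopRemovalSketch(newOps, newTargets);
       }

       public static void main(String[] argv) {
         NopRemovalSketch code = new NopRemovalSketch(
           new String[] { "ILOAD", "IFEQ", "NOP", "NOP", "IINC", "RETURN" },
           new int[]    {  -1,      2,      -1,    -1,    -1,     -1      });
         NopRemovalSketch stripped = code.withoutNops();

         System.out.println(Arrays.toString(stripped.ops));     // [ILOAD, IFEQ, IINC, RETURN]
         System.out.println(Arrays.toString(stripped.targets)); // [-1, 2, -1, -1]
       }
     }
      </source>

      <p>
      The <tt>IFEQ</tt> that pointed at the first NOP (index 2) now points at
      <tt>IINC</tt>, which has moved into that slot.
      </p>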
      </section>
     
       <section name="BCELifier">
       <p>
       If you want to learn how a particular construct is generated with
       BCEL, write a Java program that uses it and compile it as usual.
       Then run <tt>BCELifier</tt> on the resulting class, for example with
       <tt>java org.apache.bcel.util.BCELifier HelloWorld</tt>: it prints the
       Java source of a class that creates that very input class using BCEL.<br/>
       (Think about this sentence for a while, or just try it ...)
       </p>
       </section>
     
       <section name="Constant pool UML diagram">
     
       <p align="center">
         <a name="Figure 8">
         <img src="images/constantpool.gif"/>
         <br/>
         Figure 8: UML diagram for constant pool classes
         </a>
       </p> 
       </section>
     </body>
     </document>