Metadata
Metadata is defined as "data about data", and in this case you can think of it
as a deeper level of data about system-level attributes. Metadata is the key
to the simpler programming model that the CLR supports.
The metadata is generated by a compiler and stored automatically in an EXE or
DLL. It's in binary, but the framework offers an API to export metadata to and
from an XML schema or a COM type library. An XML schema export might be useful,
for example, in extracting version and compile information for a repository of
components. Here are some of the items in the metadata defined for the .NET framework:
- Description of a deployment unit (called an assembly)
  - Name, version, culture (which could determine, for example, the default user language)
  - A public key for verification
  - Types exported by the assembly
  - Dependencies - other assemblies which this assembly depends upon
  - Security permissions needed to run
- Base classes and interfaces used by the assembly
- Custom attributes
  - User defined (inserted by the developer)
  - Compiler defined (inserted by the compiler to indicate something special about the language)
Some of these, such as the custom attributes, are optional - the required
ones are all managed automatically by the tools.
The metadata is one of the ways the CLR can support a wide variety of tools.
Here are some of the possible consumers of .NET metadata:
- Designers
- Debuggers
- Profilers
- Proxy Generators
- Other compilers (to find out how to use a component in their language)
- Type / Object browsers
- Schema generators
Compilers are some of the most extensive users of metadata. For example, a
compiler can examine a module produced by a different compiler and use the metadata
for cross-language type import. It can also produce metadata about its own compiled
modules, including such elements as a flag indicating that a module was compiled
for debugging, or a language-specific marker.
Even information that might appear in a tool tip can be embedded in metadata.
This extensible data store about a compiled module greatly facilitates the simpler
deployment available under the .NET Framework. An API, called the Reflection
API, is available for scanning and manipulating metadata elements.
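As a rough illustration of the kind of metadata listed earlier, the following sketch uses the Reflection API to read an assembly's identity, dependencies, and exported types. It assumes a current release of the framework, and the module name MetadataDemo is hypothetical. The assembly inspected is simply the one that defines String, and the output varies from machine to machine.

```vb
Imports System
Imports System.Reflection

Module MetadataDemo
    Sub Main()
        ' Any loaded assembly will do; here we take the one that defines String.
        Dim asm As Assembly = GetType(String).Assembly
        Dim identity As AssemblyName = asm.GetName()

        ' Identity recorded in the manifest: name, version, culture.
        Console.WriteLine("Name:    " & identity.Name)
        Console.WriteLine("Version: " & identity.Version.ToString())
        If identity.CultureInfo IsNot Nothing Then
            Console.WriteLine("Culture: " & identity.CultureInfo.DisplayName)
        End If

        ' Dependencies: other assemblies this one references (may be none).
        For Each dep As AssemblyName In asm.GetReferencedAssemblies()
            Console.WriteLine("Depends on: " & dep.FullName)
        Next

        ' The first few exported types, each with its recorded base class.
        Dim shown As Integer = 0
        For Each t As Type In asm.GetExportedTypes()
            Dim baseName As String = "(none)"
            If t.BaseType IsNot Nothing Then baseName = t.BaseType.FullName
            Console.WriteLine(t.FullName & " : " & baseName)
            shown += 1
            If shown = 5 Then Exit For
        Next
    End Sub
End Module
```

Nothing here parses source code or header files; everything printed comes from the metadata stored in the compiled assembly.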
Multiple Language Integration and Support
The most ambitious aspect of the CLR is that it is designed to support multiple
languages and allow unprecedented levels of integration among those languages.
By enforcing a common type system, and by having complete control over interface
calls, the CLR allows languages to work together more transparently than ever
before.
Previously, one language could instantiate and use components written in another
language by using COM. Sometimes calling conventions were difficult to manage,
especially when Visual Basic was involved, but it could generally be made to
work. However, subclassing a component written in a different language required
a sophisticated wrapper, and only advanced developers did such work.
It is straightforward in the .NET Framework to use one language to subclass a
class implemented in another language. A class written in Visual Basic can inherit
from a base class written in C++, or in COBOL for that matter (at least one major
vendor is at work on a COBOL implementation for .NET). The VB program doesn't
even need to know the language used for the base class, and we're talking full
implementation inheritance, with none of the problems that would require
recompilation when the base class changes.
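A small sketch can make this concrete. The VB class below inherits from Collection(Of String), a framework class that, as far as we know, is written in C# rather than VB. The class name LoggingList and module name InheritanceDemo are hypothetical, and the example assumes a release of the framework that includes generic collections.

```vb
Imports System
Imports System.Collections.ObjectModel

' The base class lives in the framework class library and was compiled from
' another language; VB needs only its metadata in order to inherit from it.
Public Class LoggingList
    Inherits Collection(Of String)

    ' Override an inherited method, then call the base implementation.
    Protected Overrides Sub InsertItem(ByVal index As Integer, ByVal item As String)
        Console.WriteLine("Adding: " & item)
        MyBase.InsertItem(index, item)
    End Sub
End Class

Module InheritanceDemo
    Sub Main()
        Dim names As New LoggingList()
        names.Add("first")     ' Add is inherited and routes through InsertItem
        names.Add("second")
        Console.WriteLine("Count: " & names.Count.ToString())   ' Count: 2
    End Sub
End Module
```

The VB code never states what language the base class was written in; the Inherits statement looks the same either way.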
How can this work? The information furnished by metadata makes it possible. There
is no Interface Definition Language (IDL) in .NET because none is needed. A class
interface looks the same, regardless of the language that generated it. The CLR
uses metadata to manage all the interfaces and calling conventions between languages.
This has major implications; mixed language programming teams become far more
feasible than before. And it becomes less necessary to force developers who are
perfectly comfortable in one language to adopt another just to fit into a development
effort. Cross-language inheritance promises to open up architectural options
that never existed before.
One Microsoft person summed this up by saying that, as far as they are concerned,
with .NET, the language used becomes a "lifestyle choice". While there will always
be benefits to programming teams using a common language, .NET raises the practicality
of mixed-language projects.
A Common Type System
A key piece of functionality that enables Multiple Language Support is a Common
Type System, in which all commonly used data types, even base types such as Longs
and Booleans, are actually implemented as objects. Coercion among types can now
be done at a lower level for more consistency between languages. And, since all
languages are using the same library of types, calling one language from another
doesn't require type conversion or weird calling conventions.
This results in the need for some readjustment, particularly for Visual Basic
developers. For example, what we called an Integer in VB6 and earlier is now
known as a Short in Visual Basic.NET. The adjustment is well worth the effort
in order to bring VB in line with everything else - and, as a by-product, other
languages get the same support for strings that VB has always had. (There are
many details on adjustments for Visual Basic developers in Chapter 4.)
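The following sketch shows what "base types are objects" looks like in practice, assuming a current release of Visual Basic.NET. The module name TypeSystemDemo is hypothetical. The VB keywords Integer, Short, and Boolean turn out to be names for framework types shared by every .NET language.

```vb
Imports System

Module TypeSystemDemo
    Sub Main()
        Dim count As Integer = 42
        Dim small As Short = 7
        Dim flag As Boolean = True

        ' VB keywords are aliases for types in the common type system.
        Console.WriteLine(count.GetType().FullName)   ' System.Int32
        Console.WriteLine(small.GetType().FullName)   ' System.Int16
        Console.WriteLine(flag.GetType().FullName)    ' System.Boolean

        ' Even a base type has methods and can be treated as an Object.
        Dim boxed As Object = count
        Console.WriteLine(boxed.ToString())           ' 42

        ' Short is the 16-bit type that VB6 called Integer.
        Console.WriteLine(Short.MaxValue)             ' 32767
        Console.WriteLine(Integer.MaxValue)           ' 2147483647
    End Sub
End Module
```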
Namespaces
One of the most important concepts in .NET is namespaces. They help organize
object libraries and hierarchies, simplify object references, prevent ambiguity
when referring to objects, and control the scope of object identifiers.
Namespaces are discussed in more detail in Chapter 6. For now, it is useful to
know that class libraries are normally referenced in each language before they
are used. The reference allows the types to be used in the code with abbreviations
instead of detailed library references. In VB, this is done with an Imports statement,
which can be thought of as similar in concept to checking a box in the
References dialog in Visual Basic 6. For example, a typical VB form module in
.NET might have the following lines at the beginning:
Imports System.WinForms
Imports MyDebug = System.Diagnostics.Debug
The first line simply makes all of the standard form properties
and methods available to the code in the form module. The second line illustrates
the use of an alias. A branch of the object hierarchy can thus receive its own identifier,
which is only valid in that code module. Instead of referring to the System.Diagnostics.Debug
class, the code in this module would refer to MyDebug.
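Here is a minimal sketch of the alias in use, written as a console module rather than a form so that it stands alone. The module name AliasDemo is hypothetical. Note that Debug output goes to the debugger or an attached trace listener, not to the console.

```vb
Imports System
Imports MyDebug = System.Diagnostics.Debug

Module AliasDemo
    Sub Main()
        ' MyDebug is just another name for System.Diagnostics.Debug in this file.
        MyDebug.WriteLine("Starting up")
        MyDebug.Assert(1 + 1 = 2, "Arithmetic is broken")

        ' Without the alias, the same call needs the longer name.
        System.Diagnostics.Debug.WriteLine("Starting up (full name)")

        Console.WriteLine("Done")
    End Sub
End Module
```

Because the alias is declared per file, another module in the same project could bind MyDebug to something else, or not define it at all.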
Deployment and Execution
With all the intelligence in the .NET framework, there is a lot more going on
at execution time with the CLR than we are accustomed to. Programs or components
can't just load and go - there are several things that must happen for the whole
structure to work.
Start with an Assembly, Build to an Application
The unit of deployment, as previously mentioned, is an assembly. It can consist
of one or more files and is self-describing. It contains a "manifest" which holds
the metadata describing everything exported from the assembly, and what's needed
to deploy and run the assembly.
An assembly has its own version. Assemblies are combined to become applications.
An application has one or more assemblies, and also may contain application-specific
files or data. Applications may have their own private versions of assemblies,
and may be configured to prefer their private version to any shared versions.
Execution
Source code modules for an assembly are compiled (at development time) into the
CLR's Intermediate Language (IL). Then IL is compiled into native code before
execution. That compilation can take place in several ways, and at various times.
Normally, however, compilation into native code is done only once and the results
are cached for future use.
The CLR contains a couple of just-in-time (JIT) compilers, which convert IL to
native code (binary code targeted at a specific machine processor). One is called
"Econo-JIT" and it has very fast compilation, but produces un-optimized code.
It is useful when the code, such as script in a batch file, will likely be thrown
away and regenerated. The other is the standard JIT compiler, which operates
a bit more slowly but performs a high level of optimization. It is used for most
systems.
The JIT compilers produce code that is targeted at the specific processor on
the machine. This is one of the reasons applications in .NET would normally be
distributed in compiled IL, allowing processor-specific optimizations in native
code to be done by the .NET compilers on a particular machine. An installation
of a package can be set to pre-compile the IL code into native code during the
installation if required.
Scripting also fits into this model, actually being compiled before it is used.
In current systems, interpreted script (in Active Server Pages or the Windows
Scripting Host, for example) is never compiled. But in .NET, such script is sent
through a language compiler the first time it is accessed, and transformed into
IL. Then the IL is immediately transformed into native code, and cached for future
use. Scripts are created in .NET the way they are now, with any editor you like,
and require no explicit compilation step. The compilation is handled in the background,
and is managed automatically so that a change to the script results in appropriate
recompilation. VBScript developers are now encouraged to migrate to Visual Basic
for web development, which is now the default language for producing ASP.NET
pages.
With software compiled into a processor-independent intermediate language, .NET
makes it possible to achieve future platform independence. It is architecturally
possible for a CLR to be produced for platforms based on other processors or
other operating systems, which would enable applications produced on Windows
2000 to run on them. Microsoft has not emphasized this in their announcements,
but the capabilities of the CLR parallel in some respects those of the Java virtual
machine, which is designed for platform independence.
The Next Layer - .NET Framework Base Classes
The next layer down in the framework provides the services and object models
for data, input/output, security, and so forth. The next generation of ADO, called
ADO.NET, resides here (though there will also be an updated version of regular
ADO in .NET to provide compatibility for older code). Also included is the core
functionality that lets you work with XML, including the parsers and XSL transformer.
Much of the functionality that a programmer might think of as being part of a
language has been moved to the framework classes. For example, the Visual Basic
function Sqr for extracting a square root is no longer available in .NET. It has
been replaced by the System.Math.Sqrt method in the framework classes.
It's important to emphasize that all languages based on the .NET framework have
these framework classes available. That means that COBOL, for example, can use
the same function mentioned above for getting a square root. This makes such
base functionality widely available and highly consistent across languages. All
calls to Sqrt look essentially the same (allowing for syntactical differences
among languages) and access the same underlying code.
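In Visual Basic.NET, the replacement for Sqr looks like the following sketch. The module name SqrtDemo is hypothetical. Any other .NET language would call the same System.Math.Sqrt method with its own syntax.

```vb
Imports System

Module SqrtDemo
    Sub Main()
        ' With Imports System in effect, Math.Sqrt is enough.
        Dim root As Double = Math.Sqrt(16.0)
        Console.WriteLine(root)                           ' 4

        ' The fully qualified name works without any Imports statement.
        Console.WriteLine(System.Math.Sqrt(2.0) > 1.41)   ' True
    End Sub
End Module
```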
As a side note, a programming shop can create their own classes for core functionality,
such as globally available, pre-compiled functions. This custom functionality
can then be referenced in code in the same way as built-in .NET functionality.
Much of the functionality in the base framework classes resides in a vast namespace
called System. The System.Math.Sqrt method was mentioned above. Here are just
a few other examples of the subsections of the System namespace, which actually
contains dozens of such subcategories:
The list above merely begins to hint at the capabilities in the System namespace. Chapter 6 will examine the System namespace and other framework classes in more detail, and you can find a full list of .NET namespaces in Appendix A.