UAST - Unified Abstract Syntax Tree (IDEA 插件开发)

In IDEA there is strange language called UAST, which can be applied to all JVM languages such as Java, Kotlin etc.

 

https://plugins.jetbrains.com/docs/intellij/uast.html

 

UAST - Unified Abstract Syntax Tree

Last modified: 01 July 2022

UAST (Unified Abstract Syntax Tree) is an abstraction layer on the PSI of different JVM languages. It provides a unified API for working with common language elements like classes and method declarations, literal values, and control flow operators.

Motivation

Different JVM languages have their own PSI, but many IDE features like inspections, gutter markers, reference injection, and many others work the same way for all these languages.

Using UAST allows providing features that will work across all supported JVM languages using a single implementation.

Presentation Writing IntelliJ Plugins for Kotlin offers a thorough overview of using UAST in real-world scenarios.

When should I use UAST?

For plugins, that should work for all JVM languages in the same way.

Some known examples are:

Which languages are supported?

  • Java: full support

  • Kotlin: full support

  • Scala: beta, but full support

  • Groovy: declarations only, method bodies not supported

What about modifying PSI?

UAST is a read-only API. There are experimental UastCodeGenerationPlugin and JvmElementActionsFactory classes, but they are currently not recommended for external usage.

Working with UAST

The base element of UAST is UElement. All common base sub-interfaces are located in the declarations and expressions directories of the uast module.

All these sub-interfaces provide methods to get the information about common syntax elements: UClass about class declarations, UIfExpression about conditional expressions, and so on.

PSI to UAST Conversion

To obtain UAST for given PsiElement of one of supported languages, use UastFacade class or UastContextKt.toUElement() method (org.jetbrains.uast.toUElement for Kotlin).

To convert PsiElement to the specific UElement, use one of the following approaches:

  • for simple conversion:

  • for conversion to one of different given options:

  • in some cases, PsiElement could represent several UElements. For instance, the parameter of a primary constructor in Kotlin is UField and UParameter at the same time. When needing all options, use:

UAST to PSI Conversion

Sometimes it's required to get from the UElement back to sources of the underlying language. For that purpose, UElement#sourcePsi property returns the corresponding PsiElement of the original language.

The sourcePsi is a "physical" PsiElement, and it is mostly used for getting text ranges in the original file (e.g., for highlighting). Avoid casting the sourcePsi to specific classes because it means falling back from the UAST abstraction to the language-specific PSI. Some UElement are "virtual" and thus do not have sourcePsi. For some UElement, the sourcePsi could be different from the element from which the UElement was obtained.

Also, there is a UElement#javaPsi property that returns a "Java-like" PsiElement. It is a "fake" PsiElement to make different JVM languages emulate Java language to keep compatibility with Java-API. For instance, when calling MethodReferencesSearch.search(PsiMethod), only Java natively provides PsiMethod; other JVM languages thus provide a "fake" PsiMethod via UMethod#javaPsi.

Note that UElement#javaPsi is physical for Java only. Thus UElement#sourcePsi should be used to obtain text-range or an anchor element for inspection warnings/gutter marker placement.

In short:

sourcePsi:

  • is physical: represents a real existing PsiElement in the sources of the original language

  • can be used for highlighting, PSI modifications, creating smart-pointers, etc.

  • should not be cast unless absolutely required (for instance, handling a language-specific case)

javaPsi:

  • should be used only as a representation of JVM-visible declarations: PsiClassPsiMethodPsiField for getting their names, types, parameters, etc., or to pass them to methods that accept Java-PSI declarations

  • not guaranteed to be physical: could not exist in sources

  • is not modifiable: calling modification methods could throw exceptions for non-Java languages

Note: both sourcePsi and javaPsi can be converted back to the UElement.

UAST Visitors

In UAST there is no unified way to get children of the UElement (though it is possible to get its parent via UElement#uastParent). Thus, the only way to walk the UAST as a tree is passing the UastVisitor to UElement.accept() method.

Note: there is a convention in UAST-visitors that a visitor will not be passed to children if visit*() returns true. Otherwise, UastVisitor will continue the walk into depth.

UastVisitor can be converted to PsiElementVisitor using UastVisitorAdapter or UastHintedVisitorAdapter. The latter is preferable as it offers better performance and more predictable results.

As a general rule, it's recommended to abstain from using UastVisitor: if you don't need to process many UElements of different types and if the structure of elements is not very important, then it is better to walk the PSI-tree using PsiElementVisitor and convert each PsiElement to its corresponding UAST explicitly via UastContext.toUElement().

UAST Performance Hints

UAST is not a zero-cost abstraction: some methods could be unexpectedly expensive for some languages, so be careful with optimizations because it could yield the opposite effect.

Converting to UElement also could require resolve for some languages in some cases, again, possibly unexpectedly expensive. Converting to UAST should be performed only when necessary. For instance, converting the whole PsiFile to UFile and then walk it solely to collect UMethod declarations is inefficient. Instead, walk the PsiFile and convert each encountered matching element to UMethod explicitly.

UAST is lazy when you pass visitors to UElement.accept() or getting UElement#uastParent.

For really hard performance optimisation consider using UastLanguagePlugin.getPossiblePsiSourceTypes() to pre-filter PsiElements before converting them to UAST.

UAST Caveats

ULiteralExpression should not be used for strings

ULiteralExpression represents literal values like numbers, booleans, and string. Although string values are also literals, ULiteralExpression is not very handy to work with them. For instance, it doesn't handle Kotlin's string interpolations. To process string literals when evaluating their value or to perform language injection, use UInjectionHost instead.

sourcePsi and javaPsi, psi and UElement as PSI

For historical reasons, the relations between UElement and PsiElement are complicated. Some UElements implement PsiElement; for instance, UMethod implements PsiMethod. It is strongly discouraged to use UElement as PsiElement, and Plugin DevKit provides a corresponding inspection (Plugin DevKit | Code | UElement as PsiElement usage). This "implements" is considered deprecated and might be removed in the future.

Also, there is UElement#psi property; it returns the same element as javaPsi or the sourcePsi. As it is hard to guess what will be returned, it is also deprecated.

Thus sourcePsi and javaPsi should be the only ways to obtain PsiElement from UElement. See the corresponding section.

Should I use UMethod or PsiMethod, UClass or PsiClass ?

UAST provides a unified way to represent JVM compatible declarations via UMethodUFieldUClass, and so on. But at the same time, all JVM language plugins implement PsiMethodPsiClass, and so on to be compatible with Java. These implementations could be obtained via UElement#javaPsi property.

So the question is: "What should I use to represent the Java-declaration in my code?". The answer is: We encourage using PsiMethodPsiClass as common interfaces for Java-declarations regardless of the JVM language and discourage exposing the UAST interfaces in the API.

Note: for method bodies, there are no such alternatives, so exposing, for instance, the UExpression is not discouraged. Still, consider exposing the raw PsiElement instead.

UAST/PSI Tree Structure Mismatch

UAST is an abstraction level on top of PSI of different languages and tries to build a unified tree (see Inspecting UAST Tree). It leads to the fact that the tree structure could seriously diverge between UAST and original language, so no ancestor-descendant relation preserving is guaranteed.

For instance, the results of:

could be different, not only in the number of elements, but also in their order.

Using UAST in Plugins

To register extensions applicable to UAST, specify language="UAST" in plugin.xml.

Inspecting UAST Tree

To inspect UAST Tree, invoke internal action Tools | Internal Actions | UAST | Dump UAST Tree (By Each PsiElement).

Inspections

Use AbstractBaseUastLocalInspectionTool as base class. If inspection targets only a subset of default types (UFileUClassUField, and UMethod), specify UElements as hints in overloaded constructor to improve performance.

Line Marker

Use UastUtils.getUParentForIdentifier() or UAnnotationUtils.getIdentifierAnnotationOwner() for annotations to obtain suitable "identifier" element (see Line Marker Provider for details).

posted @ 2020-09-07 01:10  beansoft  阅读(389)  评论(0编辑  收藏  举报