PrimeTime and Timing Derate

We learned OCV in a few other posts. However, running OCV at 2 different PVT corners may not always be practical. For example, consider the voltage drops seen on a chip due to IR. We may not be able to get a lib at the particular voltage corner that results after accounting for the voltage drop due to IR. Similarly for temperature, we may not be able to get a lib for the exact temperature that results after accounting for on chip heating. Also, even if we are able to get these libs, OCV analysis requires running at 2 extreme corners. If we do not want to run analysis at 2 different corners for OCV, we can run it at 1 corner only by specifying derating. Derating is an alternate approach where we speed up or slow down certain paths so that they can indirectly achieve the same results as OCV. Derating is basically applying a multiplying factor to each gate delay so that the delay can be scaled up (multiplying factor > 1) or scaled down (multiplying factor < 1). The advantage of derate is that each and every gate in the design can now be customized to have a particular delay on it. With OCV analysis, we weren’t able to do this, as the flow just chose b/w WC and BC lib and applied one or the other to each gate in the design. Here, we first choose a nominal voltage for which we have a library available, and then apply derate to achieve the effects of Voltage and Temperature variations.

There are different kinds of derates:

  • Timing derates: When we run STA at a particular voltage/temperature, we assume the same voltage and temperature on all transistors. However, because of IR drop, temperature variation and aging, we know that not all transistors will be at the same voltage/temperature. So, we apply timing derating. We apply these timing derates as “voltage guardband derates”. Even though we say voltage, we include the effects of temperature and aging, so that the “voltage derate” covers all of these. In PT flow, these derates are specified via “set_timing_derate -pocvm_guardband” or “set_timing_derate -aocvm_guardband” => This is explained later. By default, set_timing_derate specifies the OCV derate, which covers local process variations only. On top of that we apply either the aocv or pocv voltage guardband derate, which accounts for Voltage+Temperature+reliability derates.
  • POCVM distance derates: Only applied on clocks. This is additional derate on top of voltage derate above.
  • LDE (Layout dependent Effect) derates: provided by foundry. Applied as incremental derate
  • MixedVt / MixedLg derates: Cells with different threshold voltage (Vt) or different transistor “Length” (Lg) have delays which don’t scale the same way across process, i.e. process is not perfectly correlated across different VT. So LVT might be at SS -3σ corner, but ULVT, instead of being exactly at SS -3σ corner, may be a little bit faster or slower. e.g. in a slow-slow corner, a capture clock built from LVT cells might be slightly faster than the slow-slow corner predicts due to Vt mistracking. This VT mistracking is not OCV related. OCV models local process variations, while MixedVt models global process corner correlation. We model this MixedVt correlation effect via derate.
  • Margining derates: Other derates used for margining

What derating factor to apply for ocv/aocv/pocv is derived by running monte carlo sims.

1. set_timing_derate => It allows us to adjust delays by a multiplying factor. It automatically sets op cond to ocv mode. The derating factor applies only to the specified objects, which can be a collection of cells, lib_cells or nets. If no objects are specified, then derating applies to the whole design. report_timing -derate shows timing with the derating factor shown for each cell/net in the path. We do not derate slews or constraint arcs (as they are not supported for AOCV or POCV), but the set_timing_derate cmd does have options for setting these.

options:

-early/-late => unlike other cmds, there is no default here. We have to specify -early to apply derating on early (shortest delay) path, and -late for late (longest delay) path. We need to have separate cmd for early and late, else derating will get applied to only 1. The tool applies worst-case delay adjustments, both early and late, at the same time. For example, for a setup check, it applies late derating on the launch clock path and data path, and early derating on the capture clock path. For a hold check, it does the opposite. We get these derating values from simulations. First, we try to find worst/best case voltage drop on transistor power pins (after accounting for off chip IR drop, PMU overshoot/undershoot and on chip IR drop) and then apply derating accordingly.

  • Early derating: We apply early derating corresponding to the voltage level which would be with off chip IR drop only. This is the absolute highest voltage that can be seen by any transistor on die. Then we add extra derate to account for temperature offset. We apply same early derate for both clk and data path.
  • Late derating: We apply late derating corresponding to the voltage level which would be with off chip IR drop + on chip IR/power_switch drop + reliability margin (due to VT shift seen for transistors with low activity). This is the absolute lowest voltage that can be seen by any transistor on die. Here, we apply slightly different derate for clk and data path. For clock path, we don’t add the reliability margin, since clk is always switching, so there is no reliability penalty that the clk path incurs. So, clk path sees a slightly higher voltage.
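As a rough illustration of how the early and late voltage levels described above could be worked out, here is a minimal Python sketch. All the numbers (supply, IR drops, reliability margin) and the function name are hypothetical, and temperature offset and PMU overshoot/undershoot are left out to keep it short.

```python
# Hypothetical numbers in volts, for illustration only.
def derate_voltages(v_supply, off_chip_ir, on_chip_ir, reliability_margin):
    """Return the voltage levels that the early/late derates should correspond to."""
    v_early = v_supply - off_chip_ir                   # highest voltage any transistor sees
    v_late_clk = v_supply - off_chip_ir - on_chip_ir   # clock path: no reliability margin
    v_late_data = v_late_clk - reliability_margin      # data path: lowest voltage on die
    return {"early": v_early, "late_clock": v_late_clk, "late_data": v_late_data}

levels = derate_voltages(v_supply=0.90, off_chip_ir=0.01,
                         on_chip_ir=0.03, reliability_margin=0.01)
for name, volts in levels.items():
    print(f"{name}: {volts:.3f} V")
```

The derate factors themselves would still come from simulations at these voltages; the sketch only shows which voltage each derate is tied to.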

NOTE: since we apply these early/late derates, we want the nominal voltage at which we are going to run STA to be around these early/late voltages. If our library’s nominal voltage is too far from these early/late voltages, then we have to apply large derating, which may not produce accurate results.

-cell_delay/-net_delay => By default, derating applies to both cell and net delays, but not to cell timing check constraints. This allows derating to apply only to cell or net delays. -cell_check allows derating to be applied to cell timing check constraints (setup/hold constraints)

-data/-clock => By default, delays are derated in both data paths and clock paths. This allows derating to be applied to only data or clock

-rise/-fall => By default, both rising and falling delays are derated. This allows derating to be applied to cell/net delays that end with a rising or falling transition only

There are many more options that we’ll see later (including -aocvm_guardband/-pocvm_guardband options). If the options -aocvm_guardband/-pocvm_guardband are not used, then the above derating cmd sets OCV derate, which only accounts for local process variation. Voltage/Temperature and Reliability derates are captured via additional derate specified with -aocvm_guardband/-pocvm_guardband options. Thus both OCV derate and aocv/pocv guardband derate are needed to account for all PVT+reliability variations.

ex: set_timing_derate -early 0.9; set_timing_derate -late 1.2 => The first command decreases all early (shortest-path) cell and net delays by 10 percent, such as those in the data path of a hold check (and clk path of setup check). The second command increases late (longest-path) cell and net delays by 20 percent, such as those in the data path of a setup check (and clk path of hold check). These two adjustments result in a more conservative analysis. We should use derating < 1 for early and >1 for late, since we are trying to simulate worst case ocv that can happen. Derating gets applied to whole design, as we did not specify any object.
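To see how the 0.9/1.2 derates above make the analysis more conservative, here is a small Python sketch of a setup and a hold check. The path delays, clock period and setup/hold times are hypothetical, the function names are mine, and clock reconvergence pessimism removal is ignored, so this is only an illustration of which path gets which derate.

```python
EARLY, LATE = 0.9, 1.2   # derate factors from the example above

def setup_slack(launch_clk, data, capture_clk, period, t_setup, early=EARLY, late=LATE):
    """Setup: late derate on launch clock + data path, early derate on capture clock."""
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - t_setup   # constraint itself is not derated
    return required - arrival

def hold_slack(launch_clk, data, capture_clk, t_hold, early=EARLY, late=LATE):
    """Hold: the opposite assignment (early on launch clock + data, late on capture clock)."""
    arrival = (launch_clk + data) * early
    required = capture_clk * late + t_hold
    return arrival - required

# Hypothetical delays in ns.
derated = setup_slack(launch_clk=1.0, data=2.0, capture_clk=1.0, period=4.0, t_setup=0.1)
underated = setup_slack(1.0, 2.0, 1.0, 4.0, 0.1, early=1.0, late=1.0)
print(f"setup slack with derate: {derated:.3f}, without: {underated:.3f}")
print(f"hold slack with derate: {hold_slack(1.0, 2.0, 1.0, 0.05):.3f}")
```

In both checks the derated slack is smaller than the underated one, which is the intended pessimism.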

ex: set_timing_derate -increment -cell_delay -data -rise -late 0.0171 [get_lib_cells { TSM_svt_ff_1v_125c_cbest/SDF_NOM_D1_SVT}] => applies derating of 1.7% only to lib cell specified for rise dirn, and long delay path. -increment adds this derating on top of any derating that is already applied globally or to this cell earlier.

ex: set_timing_derate -cell_delay -net_delay -late 1.05 [get_cells top/H1] => sets a late derating factor on all cells and nets in the hierarchical cell top/H1, including its lower-level hierarchical cells

report_timing_derate => shows derates set for all cells in design. This is very long list so better to redirect it to some o/p file

AOCV and POCV: OCV is OK, but it doesn’t model advanced levels of variations for 65nm and below, which results in overdesign. OCV allows us to model different derating for diff cells (by using set_timing_derate cmd), but fails to capture other factors. To mitigate some of these OCV issues, advanced forms of OCV came into picture. AOCV (advanced OCV) was used earlier, but even more advanced POCV (parametric OCV) is used now. We will go over details of both

PT has app_var variables which allow advanced OCV and parametric OCV. To report all app_var, we can use this cmd:

pt_shell > report_app_var *ocv* => reports all aocv and pocv app_var settings

AOCV: timing_aocvm_enable_analysis => setting it to true enables AOCV => needs AOCV tables in a side file

POCV: timing_pocvm_enable_analysis => setting it to true enables POCV => needs POCV side file or liberty file

AOCV: Advanced on chip variation

OCV doesn’t handle below factors:

  1. path depth => variation reduces on long paths due to statistical canceling. Even if each cell has a lot of variation, the variations are random in nature, so they can be +ve or -ve. The more gates in a path, the higher the chances that +ve and -ve variations will cancel out, resulting in a very low level of relative variation. So, path depth is a factor only for random variations on die.
  2. path distance => variation increases as a path travels across more die area. This is based on the simple silicon observation that close by structures have less variation, but structures far apart have larger variation. That is why in analog circuits, matching transistors are placed as close as possible to each other to minimize variations b/w them. So, path distance is a factor only for systematic spatial variations on die. Even if you have more gates in a design, if they are close by then spatial variations will be very low compared to when these gates are far apart. In other words, these variations are more or less correlated depending on the proximity of the gates to each other. So, total variation in any path is a function of both random variation and spatial variation. However, random variation dominates, so path distance based variation is not very critical.
  3. different cell types => variation depends on the transistors used in cells. Lower width transistors have more variation than larger width ones. However, cell level derating is already captured in the simple derating cmd above, as it allows us to set different derates for different kinds of cells.
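The path depth effect in item 1 can be sketched numerically. Assuming every stage is identical and its random variation is an independent gaussian (both simplifying assumptions, and the function name is hypothetical), the path mean grows with N while the path sigma grows only with sqrt(N):

```python
import math

def path_relative_sigma(n_stages, cell_mean, cell_sigma):
    """sigma/mean of a path made of n identical stages with independent variation."""
    path_mean = n_stages * cell_mean
    path_sigma = math.sqrt(n_stages) * cell_sigma   # independent variations add in quadrature
    return path_sigma / path_mean

# Hypothetical cell: 100 ps mean delay, 5 ps sigma.
for n in (1, 4, 16, 64):
    print(n, path_relative_sigma(n, cell_mean=100.0, cell_sigma=5.0))
```

The relative variation halves every time the depth quadruples, which is why a single flat OCV derate is pessimistic for deep paths.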

AOCV was proposed to provide path depth and distance based derating factors to capture random and systematic on-die variation effects respectively. Instead of applying a constant derating factor to any cell as in OCV, we adjust the derating factor for a given cell based on path distance and depth. This is provided in the form of a 2D table in a file. AOCV provides a single number for the delay of a path based on the derating used (the derating value itself is taken from the 2D table based on path depth and path distance for that cell). It only models delay variation, but does not model transition or constraint variation. Thus AOCV is the same as OCV except for the derating added for the above 2 factors.

Both GBA and PBA AOCV can be performed. GBA AOCV reduces pessimism by some compared to GBA OCV, which may be sufficient for signoff. If few paths still fail, PBA AOCV can be run on selected failing paths, which reduces pessimism even further.

AOCV flow:

set_app_var read_parasitics_load_locations true => this is so that read_parasitics can read location info from spef file

read_parasitics file.spef => To use distance based derating specified in the aocv file below, we need the physical location of various gates, nets, etc. This info is contained in SPEF files, and can be read via read_parasitics cmd. If we have a hierarchical flow, where there are separate spef files for blocks and for top level, PT can automatically determine the correct offset and orientation of blocks. However, if it fails, we can specify it manually via extra args in read_parasitics cmd.

set_app_var timing_aocvm_enable_analysis true => sets GBA AOCV analysis

read_ocvm file.aocvm => reads derating from this aocvm file. It has 2D table of derates with depth and distance as index (It can also be 1D table with either depth or distance as index, although this will give less accurate results). aocv derate take precedence over ocv derate specified for any cell, as it’s more accurate. Syntax of this file is explained under pocv section below.
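A 2D derate lookup by depth and distance can be pictured with the Python sketch below. The table values and axis points are made up, and the bilinear interpolation with clamping at the table edges is my assumption for illustration, not a statement of how PT treats in-between points.

```python
from bisect import bisect_right

# Hypothetical late-derate table: rows indexed by distance, columns by depth.
DEPTH = [1, 2, 4, 8]
DISTANCE = [0, 1000, 2000]
TABLE = [
    [1.12, 1.10, 1.08, 1.06],   # distance 0
    [1.13, 1.11, 1.09, 1.07],   # distance 1000
    [1.14, 1.12, 1.10, 1.08],   # distance 2000
]

def _bracket(axis, x):
    """Return (low index, high index, fraction) for x clamped to the axis range."""
    x = min(max(x, axis[0]), axis[-1])
    hi = min(bisect_right(axis, x), len(axis) - 1)
    lo = max(hi - 1, 0)
    span = axis[hi] - axis[lo]
    frac = 0.0 if span == 0 else (x - axis[lo]) / span
    return lo, hi, frac

def aocv_derate(depth, distance):
    """Bilinear interpolation of the derate table (assumed behavior)."""
    d_lo, d_hi, d_f = _bracket(DEPTH, depth)
    s_lo, s_hi, s_f = _bracket(DISTANCE, distance)
    near = TABLE[s_lo][d_lo] * (1 - d_f) + TABLE[s_lo][d_hi] * d_f
    far = TABLE[s_hi][d_lo] * (1 - d_f) + TABLE[s_hi][d_hi] * d_f
    return near * (1 - s_f) + far * s_f

print(aocv_derate(depth=1, distance=0))
print(aocv_derate(depth=3, distance=500))
print(aocv_derate(depth=100, distance=99999))   # clamped to the last table entry
```

In this made-up table the late derate shrinks with depth and grows with distance, matching the two effects described earlier.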

set_timing_derate -aocvm_guardband -early 0.95 => this applies an additional guardband to model non process related effects (IR drop, margin, etc) in the aocv flow. For fast paths, we reduce delays by a further 5%. Final derate = aocv_derate * guardband_derate. set_timing_derate -increment adds derate on top of this derate (instead of multiplying, it adds). So, Final derate = aocv_derate * guardband_derate + incremental_derate. In any one cmd, either a guardband derate or an incremental derate can be given, as the two options are mutually exclusive.

set_timing_derate -aocvm_guardband -late 1.04 => For slow paths, we increased delays by 4%.
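The way the guardband and incremental derates combine with the table derate can be written out as a tiny Python sketch. The guardband values 0.95 and 1.04 come from the commands above; the table derates 0.93 and 1.08 and the function name are hypothetical.

```python
def final_aocv_derate(aocv_derate, guardband=1.0, incremental=0.0):
    """Final derate = aocv_derate * guardband_derate + incremental_derate."""
    return aocv_derate * guardband + incremental

early = final_aocv_derate(0.93, guardband=0.95)   # fast paths get faster
late = final_aocv_derate(1.08, guardband=1.04)    # slow paths get slower
print(f"early: {early:.4f}, late: {late:.4f}")
```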

update_timing => performs timing update

report_ocvm -type aocvm => reports number of cells/nets that aocvm deratings were applied on, in summarized table. Any cell/net not-annotated with derate is listed here.

report_ocvm -type aocvm I0/U123 => If object list specified, derating on specific instances or arcs reported

report_timing => shows timing paths with aocv analysis. -derate shows derate factor too

POCV: Parametric on chip variation

POCV is even more advanced, and a radical departure from conventional methods. Here, timing values are stored not as one single number but rather as statistical quantities (i.e. as a gaussian distribution with mean mu and std dev sigma). These statistical distributions are propagated along the path, and probability/statistics theorems are applied to come up with a statistical distribution for the final delay at the end point of a path. AOCV models deratings only for delay, but the POCV statistical method is applied not only for delay, but also for transition variation for each cell on a path. It also models constraint variation (setup/hold times on flops), as these vary too depending on variation within the cell, as well as transition variation on clk and data pins. mu and sigma values are stored in lib files for each cell for delay, transition and constraint (only if provided for flops). By default, only delay variation is enabled. Transition variation and constraint variation have to be enabled to get a better match with HSPICE monte carlo sims. Timing values can be reported at any N sigma corner (since sigma is known). Usually, we report it at 3 sigma, as that implies roughly 99.9% of the dies (99.87% for a one-sided gaussian tail) will be within 3 sigma for that timing path (i.e. only about 0.1% of chips will fail for that path).
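The link between an N sigma corner and the fraction of dies covered can be checked with a short Python sketch. It assumes a pure gaussian delay distribution and a one-sided tail (a late path fails only when it is too slow); the function names and the 500/10 ps numbers are hypothetical.

```python
import math

def one_sided_yield(n_sigma):
    """Fraction of a gaussian population that lies below mean + n_sigma * sigma."""
    return 0.5 * (1.0 + math.erf(n_sigma / math.sqrt(2.0)))

def corner_delay(mean, sigma, n_sigma=3.0, late=True):
    """Delay reported at the N sigma corner: slower for late paths, faster for early."""
    return mean + n_sigma * sigma if late else mean - n_sigma * sigma

for n in (1, 2, 3):
    print(n, one_sided_yield(n))
print(corner_delay(mean=500.0, sigma=10.0))              # late path at 3 sigma
print(corner_delay(mean=500.0, sigma=10.0, late=False))  # early path at 3 sigma
```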

POCV takes care of path depth automatically, as it propagates each distribution as an independent random variable. So, statistical cancellation takes care of path depth. Path distance is handled by using distance based AOCV tables. So, these tables are 1D in case of POCV (as opposed to 2D tables in AOCV).

Lower VDD, as found in low nm tech, increases sensitivity to delay, transition and constraint variation (as Vdd is close to Vth, a small change in Vdd causes large changes in current). POCV accounts for all this sensitivity, and prevents overdesign during PnR. POCV run with GBA provides better correlation with PBA, as it reduces pessimism in GBA. On the other hand, with AOCV, exhaustive PBA had to be run at signoff as GBA has inbuilt pessimism, increasing run time. Tight GBA-PBA correlation in POCV avoids the need to run exhaustive PBA.

POCV is run in PT as regular flow. The only extra step is reading variation information from liberty files, or from sidefiles in AOCV like table. Then timing is reported at specific sigma corner.

POCV input data: 2 methods: One is providing a sidefile for distance based derating (or a single sigma value, called single coefficient), and the other is a liberty file with sigma values across i/p slew rate and o/p load. Derate in POCV can be applied to both mean and sigma values. Derate is applied to mean values available in .lib file, and to sigma values available in .lib file or in sidefile as single coeff.

1. sidefile with POCV single coefficient: This sidefile is just an extension of the AOCV table format in version 4.0 (this is the same Synopsys file format that read_ocvm reads in the AOCV flow). It can either have distance based derate, or constant coefficient for sigma. Syntax is as below: (file1.pocvm or file1.aocvm or any name)

version: 4.0

ocvm_type: pocvm => it can be aocvm or pocvm
object_type: lib_cell => this can be design, lib_cell or hier cell
rf_type: rise fall => for both rise/fall

voltage: 0.9 => this allows voltage based derating where diff derate can be applied based on voltage the cell is at.
delay_type: cell => this can be cell or net. For net, object_spec is empty to apply it for all nets
derate_type: early => early means it’s applied on shortest paths (for setup, clk paths are early, while for hold, data paths are early => to model worst case scenario)
path_type: clock => below derating is applied only for cells in clk path (applicable only for setup). For cells in data path for early (applicable only for hold), we may specify a different derating (<1).
object_spec: -quiet TSM_LIB/* => applies to all cells. For particular cells, we can do TSM_LIB/invx1*. -quiet prevents warnings from showing up
distance: 0 5000000 10000000 20000000 30000000 40000000 => this specifies distance for derating purpose
table: 1.000000 0.990442 0.986483 0.980885 0.976588 0.972967 => since type is early (fast paths), derates are < 1 to model worst scenario.

coefficient: 0.05 => this specifies single coefficient which is sigma divided by mean = sigma/mu (random variation coeff normalized with mu). This is specified if we want to do single coeff based POCV for our timing runs, instead of more accurate liberty based sigma. However, coeff and distance are mutually exclusive, you can specify only one of them. Different values can be specified for diff cells, etc. Usually more accurate lib files used to provide sigma, instead of providing coefficient here.

ocvm_type: pocvm
object_type: lib_cell
rf_type: rise fall
delay_type: cell
derate_type: late => late means it's applied on longest paths
path_type: clock => below derating only for cells on clk path (applicable only for hold). We specify derating separately for cells on data path for late (applicable only for setup)
object_spec: -quiet TSM_LIB/*
distance: 0 5000000 10000000 20000000 30000000 40000000
table: 1.000000 1.009558 1.013517 1.019115 1.023412 1.027033

=> since type is late (slow paths), derates are > 1 to model worst scenario.

coefficient: 0.02
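To get a feel for the two sidefile tables above, the Python sketch below looks up a distance based derate and converts a single coefficient into a sigma. The distance points and derate values are copied from the tables; the linear interpolation b/w points and the clamping at both ends are assumptions made for illustration, and the function names are hypothetical.

```python
# Distance points and derates copied from the early and late tables above.
DISTANCE = [0, 5000000, 10000000, 20000000, 30000000, 40000000]
EARLY = [1.000000, 0.990442, 0.986483, 0.980885, 0.976588, 0.972967]
LATE = [1.000000, 1.009558, 1.013517, 1.019115, 1.023412, 1.027033]

def distance_derate(distance, table):
    """Linear interpolation between table points, clamped at both ends (assumed)."""
    if distance <= DISTANCE[0]:
        return table[0]
    if distance >= DISTANCE[-1]:
        return table[-1]
    for i in range(1, len(DISTANCE)):
        if distance <= DISTANCE[i]:
            x0, x1 = DISTANCE[i - 1], DISTANCE[i]
            frac = (distance - x0) / (x1 - x0)
            return table[i - 1] + frac * (table[i] - table[i - 1])

def sigma_from_coefficient(mean_delay, coefficient):
    """Single coefficient POCV: coefficient = sigma / mu, so sigma = coefficient * mu."""
    return coefficient * mean_delay

print(distance_derate(15000000, EARLY), distance_derate(15000000, LATE))
print(sigma_from_coefficient(mean_delay=100.0, coefficient=0.05))
```

Note that a real sidefile entry carries either a distance table or a coefficient, not both; the two helpers are shown together only for convenience.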

2. LVF (liberty variation format) file: These files have additional groups which contain sigma info for delay, transition and constraint variation. They may also have distance based derating values here, instead of in a sidefile (using ocv_derate group)

format of this explained in Liberty section

POCV flow:

set_app_var read_parasitics_load_locations true => this is so that read_parasitics can read location info from spef file

read_parasitics file.spef => reads parasitics along with location info from the spef file, same as in the AOCV flow above

set_app_var timing_pocvm_enable_analysis true => sets GBA POCV analysis

set_app_var timing_pocvm_corner_sigma 3 => sets corner value for timing to be 3 sigma. It can be set to 5 sigma for more conservative analysis

set_app_var timing_enable_slew_variation true => to enable transition variation (i/p slew variation affects delay variation as well as o/p transition variation). Optional but recommended for better accuracy at < 16nm

set_app_var timing_enable_constraint_variation true => to enable constraint variation (setup/hold, rec/rem, clkgating variation). Optional but recommended for better accuracy at < 16nm

read_ocvm file.pocvm => reads single coeff or distance based derating from side file based on what’s available

set_timing_derate -pocvm_guardband -early 0.95 => For fast paths, we reduce delays by a further 5%. POCV guardband is applied on both mean delay and sigma delay (AOCV guardband is only for mean delay, as there’s no concept of sigma in AOCV). If we want to derate only sigma delay, we can scale the pocvm coefficient from the sidefile or liberty file (w/o modifying the value directly in the sidefile or liberty file) by using “set_timing_derate -pocvm_coefficient_scale_factor 1.03”, which scales only sigma and not mean delay. However, pocvm coeff scaling is applied on top of the guardband for sigma delay.

  • final_derate_mean = pocv_derate * guardband_derate + incremental_derate
  • final_derate_sigma = guardband_derate * pocvm_coefficient_scale_factor
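The two formulas above can be applied to a single cell as in the Python sketch below. The guardband 1.04 and scale factor 1.03 are the values used in this section; the cell mean/sigma and the distance derate are hypothetical, and the slew based adjustment of sigma that the tool applies is not modeled.

```python
def final_derate_mean(pocv_derate, guardband, incremental=0.0):
    """Mean derate = pocv (distance) derate * guardband + incremental derate."""
    return pocv_derate * guardband + incremental

def final_derate_sigma(guardband, coeff_scale=1.0):
    """Sigma derate = guardband * pocvm coefficient scale factor."""
    return guardband * coeff_scale

def derated_cell_delay(mean, sigma, pocv_derate, guardband,
                       incremental=0.0, coeff_scale=1.0, n_sigma=3.0):
    """Return derated mean, derated sigma and the late N sigma corner delay."""
    m = mean * final_derate_mean(pocv_derate, guardband, incremental)
    s = sigma * final_derate_sigma(guardband, coeff_scale)
    return m, s, m + n_sigma * s

# Hypothetical late cell: 100 ps mean, 5 ps sigma, distance derate 1.019115.
m, s, corner = derated_cell_delay(100.0, 5.0, pocv_derate=1.019115,
                                  guardband=1.04, coeff_scale=1.03)
print(f"mean={m:.3f} sigma={s:.3f} 3-sigma corner={corner:.3f}")
```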

set_timing_derate -pocvm_guardband -late 1.04 => For slow paths, we increased delays by 4%

update_timing => performs timing update

report_ocvm -type pocvm => reports summarized pocvm deratings applied on cells/nets. If object list provided, it shows coeff and distance based derating picked from sidefile or LVF

report_timing => shows timing paths with pocv analysis. -derate shows derate factor too. However, now we may want to see both mean and sigma delays (since sigma delays are taken into account when reporting slack). Slack is no longer a simple difference b/w required arrival time and actual arrival time, since the variations of these combine as the square root of the sum of squares (we are now dealing with statistical quantities). To see both mean and sigma delays, set this app var:

set_app_var variation_report_timing_increment_format delay_variation => Now report_timing -derate will show 2 columns: mean (delay w/o variation) and sensit (sigma or delay variation). Incremental time column for that arc should equal mean +/- 3*sensit (+ or - depending on slow(max) or fast(min) paths). mean and sensit are with derating already applied. Apart from the incremental columns, there are also path columns, which show both mean and sensit again. Mean here is the cumulative mean up to that point (sum of all means), while sensit is the cumulative sensitivity up to that point (sqrt of the sum of squares of all sigmas). These help to verify various path delays and how they contribute to overall delays. There are also statistical corrections applied to get numbers to add up. There is also statistical graph pessimism applied in timing analysis. Latch borrowing also needs to be treated differently when in POCV mode.
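The relation b/w the incremental and path columns can be mimicked with a simplified Python sketch. It assumes independent arcs, a 3 sigma corner and none of the tool’s extra corrections; the arc values and the function name are hypothetical. It also shows why a correction is needed: the per-arc increments do not simply sum to the path value.

```python
import math

def pocv_path_columns(arcs, n_sigma=3.0, late=True):
    """arcs: list of (mean, sensit) per arc. Returns one dict of column values per arc."""
    sign = 1.0 if late else -1.0
    rows, cum_mean, cum_var = [], 0.0, 0.0
    for mean, sensit in arcs:
        cum_mean += mean              # path mean: plain sum of arc means
        cum_var += sensit ** 2        # path sensit: root of the sum of squares
        cum_sensit = math.sqrt(cum_var)
        rows.append({
            "incr": mean + sign * n_sigma * sensit,
            "path_mean": cum_mean,
            "path_sensit": cum_sensit,
            "path": cum_mean + sign * n_sigma * cum_sensit,
        })
    return rows

rows = pocv_path_columns([(100.0, 3.0), (100.0, 4.0)])   # hypothetical 2 arc late path
sum_of_incr = sum(r["incr"] for r in rows)
print(rows[-1]["path_mean"], rows[-1]["path_sensit"], rows[-1]["path"], sum_of_incr)
```

Here the path value at 3 sigma is smaller than the sum of the per-arc increments, because the two sigmas combine in quadrature instead of adding linearly.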

  • final cell_delay mean derated = Cell_delay_mean * final_derate_mean = cell_delay * ( “POCVM guardband” * “POCVM distance derate” + “Incremental derate” )
  • final cell_delay sigma derated = cell_delay_sigma_adjusted * final_derate_sigma = cell_delay_sigma_adjusted * ( “POCVM guardband” * pocvm_coefficient_scale_factor) => cell_delay_sigma here is adjusted from the original sigma number reported in liberty file (if LVF used) by accounting for the fact that transition variation on i/p will affect delay variation (as well as o/p transition variation) depending on correlation b/w transition and delay. There is a proprietary formula applied by Synopsys to come up with the adjusted sigma number.

Now update_timing runs pocv, and report_timing shows timing paths with pocv analysis.

report_delay_calculation -from buf1/A -to buf1/Z -derate => This shows detailed calculation of cell mean delay and cell sigma delay with derating. This is useful for debug.

POCV analysis with xtalk: POCV analysis is only applied on non-SI cell delay. However, POCV can indirectly change crosstalk delta delay due to the difference in timing window alignment.

POCV characterization: POCV data can be generated via Synopsys SiliconSmart tool. It can generate sigma for use in LVF, or coeff for use in pocv side file. Distance based derating in sidefile can’t be generated via any tool, and is derived from silicon measurements.