The ClockClock Project
FPGA
With any FPGA design it is important to outline what you want it to do before you start. In this case I needed to create something that could accept commands over I2C and step the motors accordingly.
I settled on the commands being a number of steps and a value corresponding to the period between steps. I originally thought about making the controller fancier with automatic ramping of the steppers but it turned out not to be necessary and would just complicate coordination between the hands.
I also wanted to be able to queue up a series of commands. That way the timing of the Qwiic transfers wouldn't matter, as each queued command would just be executed one after another.
Finally, I needed the design to figure out when steps would be issued and enable/disable the motors accordingly to save power.
The Animator
To start off, I created a module that would control a single stepper motor. This module would accept a command to step so many steps with a specified delay between each step. It would then generate the appropriate direction and step signals for the stepper motor driver.
Here is the code for the module.
module animator (
    input clk, // clock
    input rst, // reset
    signed input stepCount[16], // it can be negative to indicate direction
    input delayCycles[16], // cycles between each step
    input newAnimation, // flag for new animation
    output busy, // flag the animator is busy and won't accept animations
    output step, // step signal for the driver
    output direction // direction signal for the driver
  ) {
  .clk(clk) {
    // The driver requires each "step" pulse to be at least 1us so we make them 2us
    pulse_extender stepExt(#MIN_PULSE_TIME(2000));
    dff dirCt[8]; // counter for waiting after changing direction. 200ns delay required
    dff counter[16+8]; // counter for delaying between steps. The +8 is the pre-divider
    .rst(rst) {
      fsm state = {IDLE, DIR_WAIT, STEP};
      dff dir; // saved direction of the motor
      dff delayCt[16]; // saved delay count
      dff steps[16]; // saved number of steps (absolute value)
    }
  }
  always {
    busy = state.q != state.IDLE; // busy when not idle
    step = stepExt.out; // step output is the extended pulse
    stepExt.in = 0; // default to no new step
    direction = dir.q; // output the saved direction
    case (state.q) {
      state.IDLE:
        if (newAnimation && stepCount != 0) { // if new animation with steps (skip 0 step animations)
          state.d = state.DIR_WAIT; // move to next state
          dir.d = stepCount[stepCount.WIDTH-1]; // direction is the sign of the step input (0 = positive, 1 = negative)
          steps.d = stepCount[stepCount.WIDTH-1] ? -stepCount : stepCount; // save the absolute value of stepCount
          delayCt.d = delayCycles; // save the number of delay cycles
        }
      state.DIR_WAIT:
        dirCt.d = dirCt.q + 1; // wait for the direction output after it changed
        if (&dirCt.q) { // if done waiting
          state.d = state.STEP; // move to stepping state
        }
      state.STEP:
        counter.d = counter.q + 1; // increment step delay counter
        if (counter.q[counter.WIDTH-1-:16] == delayCt.q) { // if counter has reached the delay count
          counter.d = 0; // reset counter
          stepExt.in = 1; // send a pulse
          steps.d = steps.q - 1; // decrement the number of steps remaining
          if (steps.q == 1) { // if no more are left
            state.d = state.IDLE; // return to idle
          }
        }
    }
  }
}
The first thing to note is that the stepper controller I used required that the direction input be stable for at least 200ns before and after the rising edge of the step input. It also required that the step pulse have a minimum time high or low of 1us.
The step pulse width is easily achieved using the pulse_extender module from the component library. This module takes single cycle pulses and extends them to the specified length. In this case I set the length to be 2us to be nice and safe.
Once a new animation command is received, the direction output is set and the module waits for the dirCt counter to overflow. This counter holds 256 values and when using a 100MHz clock, that means it waits 2.56us. This is significantly longer than the 200ns required but it ensures that there are no timing issues with the long wires. Tightening the timing here wouldn’t make any performance difference either.
The stepping state increments a counter and issues a step each time the counter's upper 16 bits reach the saved delay count. The lower 8 bits act as a pre-divider, so each increment of delayCycles in the animation command adds 256 clock cycles of delay. This pre-divider allows delayCycles to stay relatively small at 16 bits while still allowing for a very wide range of speeds.
Even with this pre-divider, I found the lowest value delayCycles could safely be set to is around 760. That corresponds to 8 seconds for a full rotation.
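The timing figures above can be checked with a little arithmetic. The sketch below is Python rather than Lucid, and the helper names are made up for illustration. It assumes the 100MHz clock mentioned above and ignores the single extra cycle the step counter spends resetting.

```python
CLOCK_HZ = 100_000_000  # 100MHz clock, as stated in the text


def ns_to_cycles(nanoseconds):
    """Number of clock cycles that fit in the given number of nanoseconds."""
    return nanoseconds * CLOCK_HZ // 1_000_000_000


def counter_span_seconds(counter_bits):
    """Time for a counter of this width to run through all of its values."""
    return (2 ** counter_bits) / CLOCK_HZ


def step_period_seconds(delay_cycles, prediv_bits=8):
    """Approximate time between steps: each unit of delay is 2**prediv_bits clocks."""
    return delay_cycles * (2 ** prediv_bits) / CLOCK_HZ


print("cycles needed for 200ns:", ns_to_cycles(200))
print("cycles in a 2us pulse:", ns_to_cycles(2000))
print(f"dirCt wait: {counter_span_seconds(8) * 1e6:.2f} us")
print(f"step period at delayCycles=760: {step_period_seconds(760) * 1e3:.3f} ms")
print(f"step period at delayCycles=65535: {step_period_seconds(65535) * 1e3:.1f} ms")
```

The 200ns requirement works out to only 20 clock cycles, so the 256-cycle wait has a wide margin. A delayCycles value of 760 gives a step roughly every 1.95ms, while the maximum 16-bit value stretches that to about 168ms.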
Enable Gate
The next module to tackle is the enable gate. This module is responsible for gating, or blocking, the new animation flag to the animators while the motors are being enabled. It also makes sure the motors stay enabled long enough after an animation to complete their last step.
The module takes in the new animation pending flags for 12 different motors. It then enables the motors and waits a while, 42ms, for the drivers to re-energize the motors and the motors to settle.
After this period, it allows the pending animation flag to pass onto the animators.
While any animator in the group is running, it keeps the motors enabled. Once the last one is done, it keeps the motors enabled for another 42ms before disabling them.
Here’s the code for the module.
module enable_gate (
    input clk, // clock
    input rst, // reset
    output new_animation[12], // output to the animators
    input fifo_empty[12], // input from fifos (pending animations)
    input animator_busy[12], // input from animators
    output enable // output to the stepper drivers
  ) {
  .clk(clk) {
    .rst(rst) {
      dff onCtr[22]; // counter to ensure motors are fully on (22bits ~ 42ms)
      dff offCtr[22]; // counter to keep motors on after animators finish (22bits ~ 42ms)
    }
  }
  sig running; // value used to know when the motors should be on
  always {
    // run when we have pending animations or are actively running
    running = |(animator_busy | ~fifo_empty);
    // enable flag is set when running or onCtr isn't 0 (it is reset after offCtr overflows)
    enable = running || (onCtr.q != 0);
    // pass on new_animation flag only when onCtr is full
    new_animation = ~fifo_empty & 12x{&onCtr.q};
    if (running) {
      offCtr.d = 0; // reset off counter
      if (!&onCtr.q) { // if not full
        onCtr.d = onCtr.q + 1; // increment onCtr
      }
    } else { // not running
      if (!&offCtr.q) { // if offCtr not full
        offCtr.d = offCtr.q + 1; // increment offCtr
      } else { // if offCtr is full
        onCtr.d = 0; // reset the onCtr
      }
    }
  }
}
When the module is sitting idle, onCtr is 0 and offCtr will max out. When a pending animation is detected (the fifo isn’t empty), the enable output is set and onCtr is incremented each cycle.
Once onCtr is full, the new_animation flags are passed through.
Once all the animations have been performed, offCtr is incremented. Once it reaches its maximum value, onCtr is reset which disables the motors.
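To make the counter hand-off easier to follow, here is a small cycle-by-cycle model of the gate. It is a Python sketch written for illustration, not a translation the author provided, and the class and method names are invented. The counter width is a parameter so the behaviour can be watched with a tiny 3-bit counter instead of the real 22 bits.

```python
class EnableGateModel:
    """Behavioral model of one enable gate (illustrative, not the Lucid source)."""

    def __init__(self, counter_bits=22):
        self.max_count = (1 << counter_bits) - 1  # value at which a counter is "full"
        self.on_ctr = 0
        self.off_ctr = 0

    def tick(self, fifo_empty, animator_busy):
        """Advance one clock cycle. Returns (enable, new_animation) for that cycle."""
        # Outputs are computed from the register values before this cycle's update.
        running = any(busy or not empty
                      for busy, empty in zip(animator_busy, fifo_empty))
        on_full = self.on_ctr == self.max_count
        off_full = self.off_ctr == self.max_count
        enable = running or self.on_ctr != 0
        new_animation = [on_full and not empty for empty in fifo_empty]

        # Register updates, mirroring the if/else in the always block.
        if running:
            self.off_ctr = 0
            if not on_full:
                self.on_ctr += 1
        elif not off_full:
            self.off_ctr += 1
        else:
            self.on_ctr = 0
        return enable, new_animation


# One motor with a pending animation and a 3-bit counter.
gate = EnableGateModel(counter_bits=3)
waited = 0
while not gate.tick([False], [False])[1][0]:
    waited += 1
print("cycles before the flag passes:", waited)
```

With the real 22-bit counters at 100MHz the same wait is 4,194,304 cycles, which is where the roughly 42ms figure comes from.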
An interesting line to look at is the first line in the always block.
running = |(animator_busy | ~fifo_empty);
This line can be a bit cryptic if you aren’t familiar with bitwise reduction operators. The goal of this line is to take the 12 animator_busy signals and the 12 fifo_empty signals and turn them into a single bit.
First, we can think about a single case. Any one motor is running if the fifo isn’t empty or it is currently busy. This can be taken care of by animator_busy | ~fifo_empty. A single pipe (vertical bar, |) is a bitwise OR. This will OR each of the bits of the two operands together keeping the bit width the same. The tilde (~) is a bitwise inversion. This flips each of the bits in fifo_empty.
After those operations we now have a 12 bit wide signal that says when each animator is running. However, we need to condense this into a single bit. The OR reduction operator is used here. The pipe operator, when placed in front of a value without a preceding value, will OR all the bits in the signal together and output a single bit.
In this case, that means if any of the motors are running, running will be 1.
Later on in the module, I use the AND reduction operator to check if all the bits in a signal are 1 (aka the max value). This works the same way as the OR reduction operator but ANDs every bit together. Basically, it is 1 if they all are 1 and 0 otherwise.
You can also use the caret (^) to perform an XOR reduction which will be 1 if there are an odd number of 1s.
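If it helps to see the three reductions outside of an HDL, they can be mimicked on plain integers. This is a Python sketch with invented helper names. Python integers have no fixed width, so the width is passed in explicitly and the value is masked to it.

```python
def or_reduce(value, width):
    """1 if any of the low `width` bits is set, else 0 (like |value)."""
    mask = (1 << width) - 1
    return int((value & mask) != 0)


def and_reduce(value, width):
    """1 if all of the low `width` bits are set, else 0 (like &value)."""
    mask = (1 << width) - 1
    return int((value & mask) == mask)


def xor_reduce(value, width):
    """1 if an odd number of the low `width` bits are set, else 0 (like ^value)."""
    mask = (1 << width) - 1
    return bin(value & mask).count("1") & 1


WIDTH = 12
MASK = (1 << WIDTH) - 1

animator_busy = 0b000000000100  # only motor 2 is busy
fifo_empty = 0b111111111111     # every fifo is empty

# Same shape as: running = |(animator_busy | ~fifo_empty);
running = or_reduce(animator_busy | (~fifo_empty & MASK), WIDTH)
print("running:", running)
```

Here running comes out as 1 because a single busy animator is enough to set one bit of the 12-bit intermediate value.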
Qwiic
We are now going to look at the top level module which takes care of the Qwiic interface and glues everything together.
Let’s just jump into it.
module au_top (
    input clk, // 100MHz clock
    input rst_n, // reset button (active low)
    output led [8], // 8 user controllable LEDs
    input usb_rx, // USB->Serial input
    output usb_tx, // USB->Serial output
    output step[48], // step output to motors
    output dir[48], // direction output to motors
    output enable[4], // enable output to motors (one per digit)
    inout sda, // Qwiic SDA
    input scl // Qwiic SCL
  ) {
  sig rst; // reset signal
  .clk(clk) {
    // The reset conditioner is used to synchronize the reset signal to the FPGA
    // clock. This ensures the entire FPGA comes out of reset at the same time.
    reset_conditioner reset_cond;
    dff ani_id[6]; // saved ID for the motor
    signed dff ani_steps[16]; // saved number of steps
    dff ani_delay[16]; // saved delay counts
    dff byteCt; // byte flag for 16bit numbers
    .rst(rst) {
      i2c_peripheral qwiic (.sda(sda), .scl(scl)); // i2c peripheral module for qwiic interface
      dff ledReg[8]; // reg to hold the LED values (useful for qwiic testing)
      fsm state = {IDLE, ENABLE, LED, ANIMATION_STEPS, ANIMATION_DELAY, ANIMATION_PUT};
      animator animators[48]; // need 48 individual animators (one per motor)
      // need one fifo per animator, 32 bits wide for 16 bit steps and 16 bit delay
      // the 128 depth is definitely overkill and 16 would probably be plenty for the
      // current usage.
      fifo ani_fifos[48] (#SIZE(32), #DEPTH(128));
      enable_gate gates[4]; // modules to control the enable signals (one per digit)
    }
  }
  var i;
  always {
    reset_cond.in = ~rst_n; // input raw inverted reset signal
    rst = reset_cond.out; // conditioned reset
    led = ledReg.q; // output ledReg to the leds
    usb_tx = usb_rx; // echo the serial data
    // the ~ here flips every motor direction so positive steps would go clock-wise
    // the 48hAAAAAAAAAAAA constant has every other bit flipped so the geared motors
    // and direct drive motors will turn the hands the same way
    dir = animators.direction ^ ~48hAAAAAAAAAAAA;
    step = animators.step;
    enable = ~gates.enable; // enable of the controllers is active low so invert the bits
    qwiic.tx_data = 8bx; // this design is "write only" and never sends data to the microcontroller
    qwiic.tx_enable = 0; // never send data
    // combined groups of 12 motors for the four enable gates
    for (i = 0; i < 4; i++) {
      gates.fifo_empty[i] = ani_fifos.empty[i*12+:12];
      gates.animator_busy[i] = animators.busy[i*12+:12];
      animators.newAnimation[i*12+:12] = gates.new_animation[i];
      // only remove a value from the fifo when the gate passes the new_animation flag
      // and the animator isn't busy
      ani_fifos.rget[i*12+:12] = gates.new_animation[i] & ~animators.busy[i*12+:12];
    }
    // for each motor split the fifo output to the animator signals
    for (i = 0; i < 48; i++) {
      animators.stepCount[i] = ani_fifos.dout[i][15:0];
      animators.delayCycles[i] = ani_fifos.dout[i][31:16];
    }
    // default to no new animations
    ani_fifos.wput = 48b0;
    // always input the saved delay and steps
    // this line takes the two values, joins them, packs them into a 1x32 array,
    // and finally duplicates it 48 times into a 48x32 array
    // essentially, it just feeds the same 32 bits to each of the 48 fifos
    ani_fifos.din = 48x{{c{ani_delay.q, ani_steps.q}}};
    case (state.q) {
      state.IDLE:
        byteCt.d = 0;
        if (qwiic.rx_valid) { // new data
          case (qwiic.rx_data) { // case on the value
            8hFF: state.d = state.LED; // make "address" FF the LEDs for testing
            default:
              ani_id.d = qwiic.rx_data[5:0]; // default to "address" as the motor id
              state.d = state.ANIMATION_STEPS;
          }
        }
      state.LED:
        if (qwiic.rx_valid) { // if new data
          state.d = state.IDLE; // return to idle
          ledReg.d = qwiic.rx_data; // show value on the LEDs
        }
      state.ANIMATION_STEPS:
        if (qwiic.rx_valid) { // if new data
          ani_steps.d = c{ani_steps.q[7:0], qwiic.rx_data}; // save byte and shift old byte
          byteCt.d = ~byteCt.q; // flip byte counter
          if (byteCt.q == 1) { // if second byte
            state.d = state.ANIMATION_DELAY; // go to delay capture state
          }
        }
      state.ANIMATION_DELAY:
        if (qwiic.rx_valid) { // if new data
          ani_delay.d = c{ani_delay.q[7:0], qwiic.rx_data}; // save byte and shift old byte
          byteCt.d = ~byteCt.q; // flip byte counter
          if (byteCt.q == 1) { // if second byte
            state.d = state.ANIMATION_PUT; // go to put state
          }
        }
      state.ANIMATION_PUT:
        state.d = state.IDLE; // return to idle
        ani_fifos.wput[ani_id.q] = 1; // put the new animation into the correct fifo
    }
    if (qwiic.stop) { // if I2C stop condition is detected
      state.d = state.IDLE; // reset to IDLE
    }
  }
}
The Qwiic interface is handled by the i2c_peripheral module. This module is a bit complicated since it breaks out the start/stop signals and requires you to provide direction when it should accept data or send data.
For our case, we can simplify it a lot by only reading in data. The important flags become rx_valid which tells us a new byte has been read in and stop that says the I2C transaction was stopped and we should reset. The output rx_data has the value of the byte read in when rx_valid is high.
If you want to respond to something, you need to monitor the start, next, and write flags. On the next clock cycle you can set tx_enable to 1 and provide data to send on tx_data. This will cause the module to write that byte instead of listen for one.
The start flag signals that your ID was detected on the bus. At the same time that this is set, the write flag tells you whether the ID byte requested a read (write is 0) or a write (write is 1).
Again, we can ignore all this for this design.
The protocol I used is simple: the first byte of each packet is the address, followed by the command’s data. For addresses 0-47, four bytes are expected. The first two are the step count and the second two are the delay count. Address 8hFF is special in that it only expects one byte after it and is used to set the LEDs on the Au. This is useful for testing the Qwiic bus.
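On the controller side, building these packets is mostly a matter of byte packing. The sketch below is Python with invented function names and it is not the code used in the project. It assumes the high byte of each 16-bit value is sent first, which is what the shift in the ANIMATION_STEPS and ANIMATION_DELAY states implies, and it leaves out the actual I2C write call since that depends on the microcontroller and library in use.

```python
import struct

LED_ADDRESS = 0xFF  # special "address" that sets the LEDs


def animation_packet(motor_id, steps, delay):
    """Pack one 5-byte animation: address, signed 16-bit steps, unsigned 16-bit delay."""
    if not 0 <= motor_id <= 47:
        raise ValueError("motor_id must be in 0..47")
    # ">" selects big-endian with no padding: B = uint8, h = int16, H = uint16
    return struct.pack(">BhH", motor_id, steps, delay)


def led_packet(value):
    """Pack the 2-byte LED test command."""
    return bytes([LED_ADDRESS, value & 0xFF])


packet = animation_packet(3, -100, 760)
print(packet.hex(" "))
```

For motor 3 stepping 100 steps in the negative direction with a delay of 760, the packet is the five bytes 03 ff 9c 02 f8.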
I also made it so that you don’t need to start/stop the I2C transaction for each animation. Every 5-byte packet is a valid animation and the state machine returns to idle after the last byte is received, ready for the next packet. This allows you to send all 48 motors a new animation in a single transaction.
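The way a long transaction gets carved into commands can be mirrored in software. The following Python sketch walks a byte string the same way the state machine does; the function name and the tuple layout are invented for illustration, and it assumes the high byte of each 16-bit value arrives first, as the shift in the capture states suggests. A packet cut short by the stop condition is simply dropped, matching the reset to IDLE on stop.

```python
def decode_transaction(data):
    """Split the bytes of one I2C transaction into LED and animation commands."""
    commands = []
    i = 0
    while i < len(data):
        address = data[i]
        if address == 0xFF:
            if i + 2 > len(data):
                break  # stop arrived before the LED value
            commands.append(("led", data[i + 1]))
            i += 2
        else:
            if i + 5 > len(data):
                break  # stop arrived mid-packet, nothing is queued
            motor_id = address & 0x3F  # only the low 6 bits are kept as the ID
            steps = int.from_bytes(data[i + 1:i + 3], "big", signed=True)
            delay = int.from_bytes(data[i + 3:i + 5], "big")
            commands.append(("animation", motor_id, steps, delay))
            i += 5
    return commands


transaction = bytes([0x03, 0xFF, 0x9C, 0x02, 0xF8, 0xFF, 0x55])
for command in decode_transaction(transaction):
    print(command)
```

Note that the 0xFF inside the first packet is data, not an address. Only the byte seen while idle is treated as an address, which is why packets can be sent back to back.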
FIFOs
This design is set up with 48 FIFOs to hold additional animations while the animators are busy. These are created from the fifo component in the component library.
Each FIFO is 32 bits wide and 128 entries deep. The 32 bits are split into 16 for the delay and 16 for the step counts. 128 entries deep is overkill for the current usage but would allow for many short animations to be stacked if I wanted to implement ramping and faster movements on the other side. The Au has plenty of built-in block RAM to fit all this anyways.
The FIFO follows a first-word-fallthrough style: when empty is 0, indicating there is data, the value is already available on dout. Setting rget to 1 will remove the entry and show the next entry on the following clock cycle.
To supply data to the FIFO, simply put the data on din and set wput to 1. You may also want to check that full isn’t 1, or your data may be ignored.
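The first-word-fallthrough behaviour is easy to get wrong if you are used to FIFOs that need a read request before data appears, so here is a tiny behavioral model. It is a Python sketch with invented names that only mimics the port behaviour described above and says nothing about how the library component is built internally. In particular, dropping a write when the FIFO is already full is an assumption based on the warning about ignored data.

```python
from collections import deque


class FwftFifoModel:
    """Behavioral model of a first-word-fallthrough FIFO (illustrative only)."""

    def __init__(self, depth=128):
        self.depth = depth
        self.items = deque()

    @property
    def empty(self):
        return len(self.items) == 0

    @property
    def full(self):
        return len(self.items) >= self.depth

    @property
    def dout(self):
        # The oldest entry is visible without asking for it.
        return self.items[0] if self.items else None

    def tick(self, din=None, wput=False, rget=False):
        """Advance one clock cycle with the given inputs."""
        was_full = self.full  # assumption: a write into a full FIFO is dropped
        if rget and self.items:
            self.items.popleft()
        if wput and not was_full:
            self.items.append(din)


def pack_word(steps, delay):
    """Join delay (upper 16 bits) and steps (lower 16 bits) into one 32-bit word."""
    return ((delay & 0xFFFF) << 16) | (steps & 0xFFFF)


fifo = FwftFifoModel(depth=2)
fifo.tick(din=pack_word(-100, 760), wput=True)
print(hex(fifo.dout))  # the entry is readable as soon as empty goes low
```

The packing helper follows the split used in the top level module, where bits 15:0 carry the step count and bits 31:16 carry the delay.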
Bit Assignments
In the beginning of the always block there is quite a bit of array/bit manipulation.
In Lucid, you can conveniently create arrays of modules and have their ports packed into arrays. In some cases of our design, like the step outputs, we can directly assign these arrays as the bits perfectly line up.
In other cases, we need to split them out into subsections. When this happens it is convenient to use for loops. Remember that for loops can’t be realized in hardware and need to have a fixed number of iterations so they can be unrolled during synthesis. They are simply a way to write things more compactly.
For example, the first for loop goes through four iterations with i being 0 through 3.
The first line will evaluate to the following for the first iteration.
gates.fifo_empty[0] = ani_fifos.empty[0+:12];
The [0+:12] bit selector means starting at bit 0, select 12 bits above it. So bits 0-11 are selected.
In the next iteration, it will evaluate to the following.
gates.fifo_empty[1] = ani_fifos.empty[12+:12];
Here, the second enable gate gets the bits 12-23.
All four iterations could be listed out.
gates.fifo_empty[0] = ani_fifos.empty[0+:12];
gates.fifo_empty[1] = ani_fifos.empty[12+:12];
gates.fifo_empty[2] = ani_fifos.empty[24+:12];
gates.fifo_empty[3] = ani_fifos.empty[36+:12];
This would create an identical circuit in the FPGA. However, I’m sure you’ll agree that this is cumbersome to type out and maintain.
It is very common to use the start/width bit selectors in for loops instead of the start/stop bit selectors. This is because you can’t use start/stop selectors with non-constant values.
The start/width selector used above ensures that the width of the selection is always 12 bits wide. You can’t realize a signal that changes width in hardware as you can’t spontaneously create or remove connections.
In this case I used the up variant of the selector by using the +:. You can also use -: to use the down variant of the selector. This selects the start bit and the bits below it.
For example, [11-:12] is the same as [0+:12]. They both select bits 0-11.
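The two selector variants can be mimicked with shifts and masks, which makes the equivalence easy to check. This is a Python sketch with invented helper names, shown only to illustrate which bits each selector picks.

```python
def select_up(value, start, width):
    """Like value[start+:width]: `width` bits starting at `start` and going up."""
    return (value >> start) & ((1 << width) - 1)


def select_down(value, start, width):
    """Like value[start-:width]: `width` bits starting at `start` and going down."""
    low = start - width + 1
    return (value >> low) & ((1 << width) - 1)


# A 48-bit example value with a different pattern in each group of 12 bits.
empty = 0xDDDCCCBBBAAA

# Same slicing as the first for loop: one 12-bit group per enable gate.
groups = [select_up(empty, i * 12, 12) for i in range(4)]
print([hex(group) for group in groups])
```

The four groups come out as 0xaaa, 0xbbb, 0xccc, and 0xddd, and select_down(empty, 11, 12) returns the same 0xaaa as select_up(empty, 0, 12).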
Pin Assignments
At this point you may be wondering how the step and dir signals map to the IO pins on the Au.
This mapping is defined in a constraint file. In this case they are in the clockclock.acf file. The acf extension is for Alchitry Constraint File. This format is very simple and allows you to specify the pin names as the pins on the Alchitry boards instead of the FPGA. For example, A2 maps to the second pin of the top left header (bank A) on the Au.
If you open this file you’ll see a whole bunch of lines that look like this.
pin step[0] A2;
pin dir[0] A3;
Each IO port needs to be mapped to a physical pin. The format is the pin keyword followed by the signal name and finally the physical pin location.
You can also add the pullup or pulldown keyword to add an internal pullup/down resistor to the pin. However, pulldown is ignored on the Cu as the Lattice FPGA doesn’t have internal pulldown resistors.
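With 96 step and dir lines to keep track of, a quick script can help catch a pin that was assigned twice. The sketch below is Python and only understands the simple pin lines shown above. Placing the optional pullup or pulldown keyword after the pin name is an assumption, and the sda line and the duplicated A3 assignment in the example are made up to exercise the check.

```python
import re

# pin <signal> <location> [pullup|pulldown];
PIN_LINE = re.compile(
    r"^\s*pin\s+(\w+(?:\[\d+\])?)\s+(\w+)(?:\s+(pullup|pulldown))?\s*;"
)


def parse_pin_line(line):
    """Return a dict for a pin line, or None if the line is something else."""
    match = PIN_LINE.match(line)
    if match is None:
        return None
    signal, pin, pull = match.groups()
    return {"signal": signal, "pin": pin, "pull": pull}


def duplicate_pins(lines):
    """Return the physical pins that appear in more than one pin line."""
    seen, duplicates = set(), []
    for line in lines:
        entry = parse_pin_line(line)
        if entry is None:
            continue
        if entry["pin"] in seen and entry["pin"] not in duplicates:
            duplicates.append(entry["pin"])
        seen.add(entry["pin"])
    return duplicates


example = [
    "pin step[0] A2;",
    "pin dir[0] A3;",
    "pin sda A24 pullup;",  # hypothetical line, pin name invented
    "pin step[1] A3;",      # deliberate mistake for the demo
]
print(duplicate_pins(example))
```

Running it on the example reports A3, the pin that was deliberately assigned to two signals.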
Most of the pins on an FPGA are fully interchangeable and the pinout I used for the clock was super arbitrary with the exception of the Qwiic signals since they are wired to the Qwiic connector.
All that was important for this project was that I kept them all straight.