
|
Computing Degree Revision Notes
|
Here are revision notes for my software engineering degree. I found
them useful, maybe you will as well. Note that I make no guarantees about
the accuracy of them. Use at your own risk!
Artificial Intelligence (AI)
Search
Start from initial state and search for goal state.
-
Depth First Search - May not find solution. Max. depth should usually be
specified to stop it from getting trapped in a dead end.
-
Breadth First Search - Will find shortest solution.
Both DFS and BFS are not v.efficient
-
Hill Climbing - version of DFS where the most promising node is selected
to expand. Great efficiency gains can be made as not all the search space
has to be stored. It could however get trapped.
-
Best First Search - mixture of DFS and BFS. The most promising of all the
nodes seen so far is selected to expand.
-
A* - specialised version of Best First Search where we take the cost of
getting to the current node into account. We select best node by how good
the node looks, and by how expensive the path to this node is. A* is good
if we are interested in the quality of a solution.
f(x) = g(x) + h(x) - where g(x) is the cost of the cheapest known path
from the source to x, and h(x) is the heuristic estimate of the cost of
getting from node x to the goal node.
Heuristics
Heuristic knowledge can be used to guide the search process. Weak heuristics
involve very general search strategies, with little domain knowledge. They
can be used in a wide range of domains Strong heuristics involve a lot
of specific domain knowledge, and can only be used on a limited number
of domains.
Knowledge Representation
-
Resolution Refutation - We want to convert logical statements to clause
form. This is a flat representation with no embedded components, and all
quantifiers have been separated out.
-
Eliminate <->
-
Eliminate ->
-
Reduce scope of ¬ to a single atom
-
Rename variables so quantifiers bind to a unique variable
-
Eliminate existential quantifiers (skolemzing) - Each existentially quantified
variable must be replaced by a skolem function that is defined in terms
of the universally quantified variables that it is in the scope of.
-
Move quantifiers to left of formula
-
Eliminate prefix (all variables are now assumed to be universally quantified)
-
Transform into a conjunction of clauses
-
Standardize clauses by renaming variables
-
Unification
-
Resolve clauses - Negate clause that you want to prove and add it to the
set of clauses
Answer extraction - Add the goal and complement of it to the set of clauses.
Tag the goal and don't select the goal as a complementary literal. It's
variables are changed as a result of substitutions. At the end this tagged
clause should be the only one left
Concurrency
Mutual Exclusion Problem
-
ME Property - The processes are mutually exclusive if at any one time,
no two processes are in their critical sections.
-
Deadlock freedom (and livelock freedom) - deadlock is where the process
remains in it's waiting stage, and will never be executed
-
No Starvation - A process is starved if other process can enter their critical
sections many times, but this process cannot enter it.
-
Success when no contention - If no other processes need to enter the critical
section, then a process which does, should be allowed to enter it.
Peterson's algorithm is correct
Semaphores
These are a higher level mechanism for concurrency.
Busy Wait Semaphores
-
Init(S)>=0
-
Wait(S) If S > 0 then S=S-1 else suspend the execution of this process.
The process is said to be suspended on the semaphore S.
-
Signal(S) If there are processes that have been suspended on this semaphore,
wake one of them else S=S+1
Blocked Queue Semaphores
-
Init(S)>=0
-
Wait(S) - If S > 0 then S=S-1 else suspend the execution of this process
and put it into FIFO queue. enqueue(qs,p)
-
Signal(S) If qs is not empty then dequeue(qs,p) and wake P else S=S+1
At any point of execution of a program using semaphore S, S>=0 and S=Init(S)
+ #signals - #waiting operations
Bounded Buffer
Example of producer-consumer problem
Deadlock Model
The following conditions are necessary for deadlock.
-
ME
-
Hold and Wait - If a process wants a resource that is already being used
then it will have to wait.
-
No Pre-emption
-
Circular Wait
Deadlock Prevention
-
Hold and Wait - 1. Get all or none - you get all resources or you get none
2. Get some - Release all - Get others.
-
Pre-emption
-
Circular Wait - in order to prevent circular waiting all resources must
be numbered. The processes must then request resources from smaller to
bigger
Resource Allocation Graphs
Deadlock Detection
Databases
Candidate Keys
Primary Keys are a specialised form of Candidate Key. If R is a relation
then a Candidate Key (C) for R is a subset of R's attributes such that:
-
No two tupples of R have the same value for C. The Candidate Key is a unique
identifier for the tupple.
-
No proper subset of C has a uniqueness property. C is irreducible.
Candidate keys are used by the DBMS to pinpoint a specific tuple. Each
tuple may have multiple candidate keys. For example a table of atomic elements
may have a candidate key for the element symbol, and a candidate key for
the element number. A superkey is a relation that contains at least one
candidate key as a subset.
Primary Key
One of the relation's Candidate Keys must be chosen to be the Primary Key.
If the relation has more than one Candidate Key (rare) then it is generally
up to the user to choose which one will be the Primary Key. No component
of a Primary Key may accept nulls.
Foreign Key
The Foreign Key of a table must match the primary key of another table.
Every value of a foreign key must appear as a value of a matching Candidate
Key. However the primary key of the foreign table is not required to be
a foreign key of the first table.
Referential Integrity
The database must not contain any unmatched foreign key values. If B referenced
A, then A must exist.
Relational Algebra
Relation algebra is a theoretical language with operations that work on
one or more relations without changing the original relations.
-
Closure - the output from one relation is another relation.
-
Union - Two relations can be unioned if they have the same set of attribute
name, which are defined on the same domain.
-
Intersection - The intersection of the relations that have the same set
of attribute names.
-
Difference - Minus
-
Product - Times
-
Restriction - selects the rows that satisfy the condition.
-
Projection - chooses the columns that match a condition.
-
Natural Join - Joins two relations that have matching attributes.
-
Division
Intersection and Times are associative and commutative. Minus is not.
Functional Dependencies
A functional dependency is a many-to-one relationship from one set of attributes
to another in a given relationship. FD's tell us the meaning of data. i.e.
every supplier is located in only one city. FD's represent integrity constraints.
They are checked by the DBMS at every update. The left hand side of a FD
is called a determinant. The right hand side is called the dependent.
For efficiency reasons we want to be able to find the smallest set of
FD's for the database. An irreducible set of dependencies has the following
properties:
-
RHS (the dependent) of every FD in S has just one attribute
-
LHS (the determinant) is irreducible
-
No FD can be discarded without changing the closure of S+
In order to reduce a set of FD's the following rules can be used.
-
Reflexivity - If B is a subset of A then A->B
-
Augmentation - If A->B then AC->BC
-
Transitivity - If A->B and B->C then A->C
-
Self-determination - A->A
-
Decomposition - A->BC then A->B and A->C
-
Union - If A->B and A->C then A->BC
-
Composition - A->B and C->D then AC->BD
Normalisation
-
1NF - every field contains atomic data.
-
2NF - every non-key field is irreducibly dependent on the primary key.
-
3NF - All non-key fields (if any) are (1) mutually independent and (2)
irreducibly dependent on the primary key.
-
BCNF - Only determinants (LHS of FD) are candidate keys - There are only
dependency arrows out of the candidate keys.
Normalisation is always reversible as no information is lost. Only redundant
and duplication is removed. If the relations are not normalised to a sufficient
extent the Insert, Update and Delete problems can occur.
Calculating Closure
To calculate the closure {A,B}+ of attributes {A,B} using the FD's A->BC,
E->CF, B->E, CD->EF
-
Add A,B to the result
-
Loop around the FD's, If the left hand side is a subset of the closure
so far, add the attributes to result.
-
Keep looping round the FD's until no new attributes can be added.
To tell if the FD X->Y follows under S (set of FD's). Calculate the closure
of X+ of X under S. If Y is a subset of this closure then X->Y follows.
Functional Programming
Important Higher Order Functions
A higher order function that accepts another function as input.
foldr f s [] = s
foldr f s (a:x) = f a (foldr f s x)
map - applies a function to each item of list
map :: (t->u)->[t]->[u]
map f [] = []
map f (a:x) = f a : map f x
filter - returns the elements of the list that satisfy a condition
filter :: (t->Bool)->[t]->[t]
filter p [] = []
filter p (a:x) | p a = a : filter p x
| otherwise = filter p x
A polymorphic function can involve more than one type of variable.
Other standard functions
-
length - returns length of list
-
concat - flattens a list of lists into a list
-
take, drop - takes or drops elements from a list
Currying
Currying is when a function that accepts two or more arguments is made
to accept arguments one at a time.
Grammar
Phrase Structure Grammar
G = (N,T,P,S)
N= finite set of non-terminal symbols
T= finite set of terminal symbols (such that N intersection T = {),
and write V=NUT)
P is a finite set of pairs called production rules
S= start symbol
-
Immediate derivation - New string that can be derived from old string by
applying one production rule.
-
Derivation -New String that can be derived from old string by applying
production rules a number of times
-
Sentinal form - This is any string that can be derived from start symbol.
Made up of either terminals and/or non terminals
-
Sentence - Sentinal form consisting of only non-terminal symbols.
-
Language L(G) - set of all sentences generated by G
Deterministic Finite State Automaton
A deterministic FSA is a 5 tuple
M = (K, T, t, k, F) ---- Example M=(K,T,t,k0,F)
K= finite non-empty set of states
T= finite input alphabet
t= (possibly partial) mapping of KxT into K
k (inK) is the initial state
F (inK) is the set of final states |
K={k1, k2, k3, k4}
T={0,1}
F={k1}
t(k1,0) = k3
t(k1,1) = k2 etc |
Non-Deterministic Finite State Automaton
A non-deterministic FSA is a 5 tuple
M = (K, T, t, k, F) ---- Example M=(K,T,t,k0,F)
K= finite non-empty set of states
T= finite input alphabet
t= mapping of KxT into subsets of K
k (inK) is the initial state
F (inK) is the set of final states |
K={k0, k1, k2, k3, k4}
T={a,b}
F={k2,k4}
t(k0,a) = {k0,k3}
t(k0,b) = {k0,k1} etc |
Top Down Parsing
Generates a left most derivation tree. May need to eliminate left recursion.
Predictive Parser
G = (N,T,P,S)
N= {S,A}
T= {a,b,c}
P={S -> aS (a)
S -> bA (b)
A -> d (d)
A -> ccA} (c) |
Predictive Parser Table
|
a |
b |
c |
d |
S |
S -> aS |
S -> bA |
|
|
A |
|
|
A -> ccA |
A -> d |
|
Above: The look ahead value is in paremphasis. It is an LL(1) grammar
as each starts with a different terminal symbol.
Input |
-Stack- |
Action |
[a]abccd$ |
S$
|
apply S->aS |
[a]abccd$ |
aS$
|
pop&scan |
a[a]bccd$ |
S$
|
apply S->aS |
a[a]bccd$ |
aS$
|
pop&scan |
aa[b]ccd$ |
S$
|
apply S->bA |
aa[b]ccd$ |
bA$
|
pop&scan |
aab[c]cd$ |
A$
|
apply A->ccA |
aab[c]cd$ |
ccA$
|
pop&scan |
aabc[c]d$ |
cA$
|
pop&scan |
aabcc[d]$ |
A$
|
apply A->d |
aabcc[d]$ |
d$
|
pop&scan |
aabccd[$] |
$
|
String is valid |
Java
Graphs
Various representations
-
2D array - v.fast to check if edge exists, and to add edges.
-
List of edges
-
Adjacency list representation
Algorithms
-
Dijkstra's Algorithm for shortest paths - keeps array of shortest know
distances to goal node
-
Floyd-Fulkerson - finds paths through graph to give largest network flow.
-
Formula Checker - Add 2 edges for each pair in formula. If there is a path
from + to - version of any variable then formula is false.
-
Warshalls algorithm - Finds shortest path?
Java
-
Observers - addObserver(Observer o), notifyObservers(), notifyObserver(Observer
O)
-
Client server system
-
GUI's
For more Java information see my Java source code page.
Networks
A network is a system of inter-connected communicating entities.
Data Transmission
-
Amplitude is the height of the waves
-
Wavelength is the length of the waves.
-
Frequency is the number of complete units per unit time
-
Phase is the relative position in time of a fixed point in a signal.
A codec converts analog data to digital data.
-
Duplex - In a full duplex system the two signals can be sent at different
frequency, and then the correct frequencies extracted at each end.
-
Attenuation - the amount of reduction in signal strength due to boosting
a weakened signal
-
Delay - the time of propagation
-
Noise - Thermal, intermodulation (signal interferes with itself), crosstalk
(signal overlay), impulse (everything else, electrical). Noise is measured
as a ratio in decibels.
Different media are used to transmit signals. Either fixed wire or free
medium
-
Two wire - one line carries ground, the other carries the signal
-
Twisted pair - reduced interference, use repeater every 2-3km
-
Coaxial Cable - big bandwidth, up to 10,000 voice signals.
-
Optical fibre
Then there are free media
-
Terrestrial microwaves - subject to interference
-
Satellite microwaves
-
Radio
-
Infrared
Network Performance
We need to be able to measure network performance. We need to be able to
work out reliability, availability, usage, response time and job loss.
We can use system modelling to predict how a network will perform. Several
types of model.
-
Deterministic - We know what will happen and when
-
Non-Deterministic - We don't know what will happen and when - uses random
time e.g. component failure
-
Continuous event - weather system
-
Discrete event - things happen at a set time (e.g. light bulb breaking)
-
Continuous time
-
Discrete time - Things happen at a set CPU clock cycle
Roulette simulation - probability of a particular event occurring only
depends on the current state
Network Architecture
OSI seven layer model
-
Application later - communication between end user applications
-
Presentation layer - data encryption, compression
-
Session layer - Deals with synchronisation of the dialogue, between users
at each end.
-
Transport layer - this layer is based on an end-to-end connection
-
Network layer - establishes and maintains connection between end points
-
Data link layer - bits -> frames. Provides error correction.
-
Physical layer - check for bit errors, convert bits into signals
Protocols
Used for communication between entities in different systems.
Data Integrity
Data can be corrupted, and integrity checks are needed to detect this corruption.
Parity can be used to check errors. If seven bits are used to represent
a character then an 8th bit can be added so that the number of 1's is always
even. If a single bit is corrupted then the number of 1's will no longer
be even and the error will be detected (even parity).
Checksums are a more sophisticated way of checking for corruption. The
numerical values can be added, and then the lower digits of the sum can
be sent as a checksum. e.g. if the total is 67835, we might send he value
835 as the checksum. A cyclic redundancy check or frame check sequence
can be used, these use a more complex mathematical equation to compute
a checking value, and they are often useful for dealing with certain error
patterns.
In order to help correct errors as well as detect them hamming codes
and convolution codes can be used. Convolution codes operate on continuous
streams of data rather than blocks. The value of each encoded bit is dependent
on the number of preceding bits.
Data may be transmitted 'vertically'. This will make the use of hamming
codes more effective as a burst of interference will affect fewer bits
from each block.
Redundancy, Routing and re-routing, and acknowledgement and resend are
other ways of ensuring data integrity.
Security
-
Static data access - restrict access to host, or material including passwords,
read/write protection and firewalls.
-
Dynamic data access - measures to prevent data from being read, or altered
while in transit. Uses encryption and authentication.
Border routers are simple gateways that use simple filtering. Bastion hosts
link the internal and external networks at an application level (www or
ftp). These can prevent connections at the lower TCP level, which prevents
most types of hacking tools.
Networks
LANs are normally manages by a single organisation, whereas a WAN would
be managed by a number of organisations.
-
Ethernet - computers connected by coaxial cable. Frames of data are transmitted
to all computers. The computer whose address matches the destination of
the frame can receive the frame. All ethernet frames are preceded by a
preamble of 62 alternating bits. If two frames are sent by two different
computers at the same time then a collision will occur, and the preamble
will be corrupted. If a computer detects that another computer is sending
a frame of data, then it will not send any other frames. If a collision
is detected then the transmitting machines will wait a random amount of
time before resending the data.
-
Cambridge ring - All the computers are connected in a loop. Each computer
(host) has a repeater for the signal. Data moves round the ring in a train
of mini-packets. Each mini packet has a destination address and a full/empty
flag. When the train reaches a host which is the destination of one of
the mini packets, the packet is removed from the train, and the packet
header is sent on with the full/empty flag set to 0. A monitor (a separate
machine) controls the data train. When it first encounters a packet it
sets a flag to say that it has seen the packet. If it sees the packet again
then the packet is removed from the train. To transmit a packet a host
adds it to the end of the train.
-
Token Ring Network - This type of network uses a token to determine which
host has the authority to send data. It adds data to the token, and this
travels round to it's destination. A host wanting to send data must wait
for a free token. The monitors main job is to look for lost-token errors.
-
Aloha - Packet based radio system used by Uni of Hawaii. Two frequencies
are used, one for sending from the host, and one for sending to the host.
If a collision occurs with two islands sending to the host, the the host
will not be able to send an acknowledgement. The senders will then wait
for a random amount of time before resending their messages.
Operating Systems
OS Components
-
Process Management
-
Main Memory management
-
Secondary storage management
-
I/O System Management
-
File Management
-
Protection System
-
Networking
-
Command-Interpreter System
Typical Layered Approach
-
User Programs
-
Buffering for input and output devices
-
Operator console device drivers
-
Memory management
-
CPU Scheduling
-
Hardware
Memory Protection
For each program two registers can be used for memory protection. The base
register holds the smallest legal memory address. The limit register holds
the size of the range.
Device Management
-
Direct I/O with Polling - The CPU regularly checks the controller state
register to see if the device is idle, busy or has completed it's requested
operation. If it is idle then the CPU can place a command in it's Command
Register.
-
Interrupts - When the device has completed it's task it interrupts the
CPU. The interrupt handler determines which device interrupted it and acts
on the interrupt.
Privileged Instructions
These are instructions that only the OS can use. They control enabling
and disabling interrupts, switching between processes, accessing memory
protection registers, performing I/O and halting the CPU. These instructions
can only be used in Supervisor mode, not User mode.
Interrupts
An interrupt is a signal that causes the control unit to branch to a specific
location to execute code to service the occurrence of an external condition.
The first level interrupt handler determines the source of an interrupt,
then services the interrupt.
Dispatcher
This allocates the CPU amongst the various processes.
Scheduler
-
Long term controls the degree of multi programming i.e. the number of processes
in memory.
-
Medium term scheduler - occasionally removes processes from memory if the
memory becomes over allocated. Also swaps processes into memory.
-
Short term Scheduler - performed by the CPU scheduler. It decided which
process should be executed next. In pre-emptive multitasking if a process
arrives with a length of less than the remaining current process, then
the current process will be removed from execution.
Types of Scheduling Algorithm
-
First Come First Serve
-
Shortest Job first - non-preemptive is when the process cannot be pre-empted
until it completes it's CPU burst. Pre-emptive is where if a process arrives
with a less remaining time than the currently executing process, the currently
executing process is removed from memory
-
Priority Scheduling
-
Round Robin - Each process gets a small amount of CPU time. It is then
pre-empted and added to the end of the ready queue.
Memory management
-
Overlays - only keep in memory those instructions that are needed at any
given time. Implemented by user, no OS support is needed.
-
Logical / Physical Addressing - The programs see the logical or virtual
address. The physical address is only seen by the memory management unit.
In the MMU the value of the relocation register is added to the address
generated by the user process at the time it is sent to memory.
-
Swapping - process can be swapped out of memory and into a backing store.
It is brought back when it needs to be executed.
-
Page Table - held in memory and maps between logical and physical memory.
-
Segmentation - Program is split into logical units such as main program,
functions, stacks etc.
Virtual Memory
-
Demand Paging - A page is brought into memory whenever it is needed. When
there is a reference to a page a page fault is generated. If the page is
not in memory then the page will be swapped into memory. Usually another
page will have to be swapped out of memory in order to make way for the
new page. There are a number of algorithms to decide which page to swap
out of memory.
-
FIFO
-
Optimal - replaces page that will not be used for longest period of time.
Used to identify how well other algorithms perform.
-
Least Recently Used
Secondary Storage
There are a number of disk scheduling algorithms.
-
First Come Fist Served
-
Shortest Seek Time First
-
Scan - The disk arm starts at one end of the disk are sweeps across it
to the other end servicing requests as it passes them. It then goes back
the other way.
-
C-Scan - Treats the cylinder as a circular list that wraps from the last
cylinder to the first.
-
C-Look - Arm only goes as far as the last request in each direction, then
reverses immediately and goes back.
Software Engineering
Management
-
Team - work towards a collective goal, collective ego
-
Group - individual goals and egos
Belbin's management roles - Chairman, Shaper, Plant, Evaluator, Company
worker, Resource instigator, Team worker, Finisher progress
Software Problems
-
Sophisticated hardware has outpaced out ability to use it's potential
-
Our ability to build new programs cannot keep pace with demand
-
Ability to maintain existing programs cannot keep pace with demand.
Software Crisis
-
Schedule and cost estimates are often inaccurate
-
Productivity of software has not kept pace with demand
-
Quality is often inadequate
-
Existing software is difficult to maintain
Risk
Risks involve uncertainty and loss. We can never be 100% sure that they
will occur, but if they do then we will experience an unwanted loss.
-
Examine potential damage and probability
-
They affect cost, schedule and performance
-
We need to decide a level at which the risks will cause the project to
terminate
Schedules
Slipage, slacktime, Pert and Gantt charts
Requirements
A requirement is something that a customer needs.
-
Functional requirements - Interactions between the system and it's environment
-
Non Functional requirements - Restrictions on system, e.g. specific hardware
or software or performance requirements
Cost Estimation
Cocomo produces an estimate of man months and developer schedule in months.
It does this by taking LOC and cost drivers into account. There are three
types of cocomo
-
Basic - takes LOC into account
-
Intermediate - uses LOC and cost drivers
-
Advanced - Also includes departmental cost breakdown
Each software project can be either Organic (small in-house project), embedded
(tight constraints, often tightly coupled to hardware) and semidetached
(mixture of the two). There are four types of cost drivers - product, computer,
personnel and project.
Design
Design is applying techniques and principles for the purpose of identifying
the system in sufficient detail to permit it's physical implementation.
It is the foundation for implementation, testing, maintenance and reuse.
Design consists of the following concepts
-
Abstraction
-
Refinement
-
Information hiding
-
Structured programming
-
Modularity - It is easier to solve lots of little problems, divide and
conquer
-
Reusability (concept supported by all the others)
Cohesion is the degree to which modules perform one, and only one function.
Coupling is the degree to which the module is connected to other modules
in the system. Design is essential for quality software.
DFD and JSP can be used. In JSP *-iteration 0-option/branch
OO Design
Objects (data structures), operations (specification of process), messages
(invocation method). Encapsulation - user of object only needs to know
what the object does, not how it does it. Polymorphism - the ability to
take several forms.
SCM
This is the tracking and control of activities. It begins when software
development starts, and ends when the software is taken out of use. SCM
consists of Identifying Change, Controlling Change, Auditing Change and
Reporting Change. The baseline is the foundation of SCM.
Testing
Testing is a process that leads to debugging. Verification - Are we designing
the system right? Validation - Are we designing the right system. Technical
reviews (Walkthroughs)
-
Unit Testing - usually white box, branch coverage, boundary value analysis
and equivalence classes. Uses drivers (main programs) and stubs (replace
subordinate modules that are not being tested)
-
Integration testing - (design) Mostly black box
-
Validation - (requirements) black box, demonstrates tractability to requirements.
-
System testing - validates software once it has been incorporated into
a larger system.
Regression testing - re-doing tests when the program has changed, in order
to verify that it is still correct.
The three standard methods of debugging are brute force, backtracking
and cause elimination. When testing, code statement coverage, branch coverage
and condition coverage can be used.
Metrics
-
McCalls quality factors - Audibility, accuracy, completeness, consistency,
data communality, error tolerance, execution efficiency, expandability,
generality, hardware independence, reliability, testability, modularity
etc.
-
FURPS - functionality, usability, reliability, performance and supportability.
Process Metrics:
-
G/Q/M - Goal/Question/Metrics
-
QFD - Quality Functional Deployment - needed for identifying control points
of quality assurance to be carried through the design
-
CMM - Compatibility Maturity Model - 5 levels of process maturity, 1.Initial,
Repeatable, Defined, Management, 5.Optimising. The companies are independently
assessed.
McCabe's Cyclomatic Number
Number of linear independent paths in control flow graph. V(G) = e-n+2
For programs consisting of more than one function (p functions) V(G)
= e-n+(2*p) It is a good idea for the value to be <=7. Need to be
able to construct Control Flow Graphs (program graphs)
Projects Home
Copyright © 1997 - 2023
Privacy policy
Hosted by 1&1
|